Our end goal is to draw inferences about a population. To do that, we first need to learn about probability, because it is the foundation of statistical inference, predictive models, and machine learning algorithms.
Probability distributions help us find the probability that an event occurs, and different types of distribution apply under different conditions.
In this article, I will discuss:
1. The basics of the probability distribution
2. The ideas and properties of different types of discrete distributions
3. The formulas and how to calculate different types of discrete distributions manually, with examples, so you know what goes on behind the scenes
4. The implementation of different types of discrete distributions in R
I will start with some basic concepts and gradually move to the types of discrete distributions and their R implementations.
I will explain each type of distribution with an example. I think the best way to learn them is by working through examples. You will find many articles and books out there with wordy descriptions, but I find them hard to understand without plenty of hands-on examples. So I will focus primarily on doing: each topic will be discussed with examples. Fewer words and more work.
Random Variables
When there is no way to tell which outcome of an experiment will occur, the outcome is random, even though exactly one outcome happens. Take the sum of 5 rolls of a die: when you roll the die, you do not know its outcome. Or when you draw a card from a deck at random, you do not know which card you are drawing. These are examples of random variables.
There are two types of random variables:
- Discrete random variables
- Continuous random variables
In this article, I will explain the discrete distributions and their types in detail. Before diving into the distributions, it is important to understand a few terms that come up throughout the discussion.
Discrete Distribution
Let’s understand some important basic concepts with simple examples.
A discrete distribution is defined as the mapping of probabilities to all the values taken by a discrete random variable. The probability distribution is specified in terms of a function called the probability mass function (PMF).
The PMF is the core of the probability distribution. We will calculate the PMF for each type of distribution. Please learn it well before moving to the next section.
Here is an example. Toss a fair coin three times. You can get 1 head and 2 tails, 2 heads and 1 tail, and so on. How many different combinations are possible? There are eight combinations possible:
S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
Here S is called sample space. It is equally likely to have either of these combinations from this sample space from 3 tosses.
Assume that X is the number of heads observed.
What is the PMF of X?
X has support Sx = {0, 1, 2, 3}. That means X can occur either zero times or once or twice or thrice.

When the outcome is TTT, X is zero (look at the sample space S above). So, the probability of X being zero is 1/8.
fx(0) = 1/8
Look at the sample space again: the probability of having three heads (HHH) in all three tosses is also 1/8.
fx(3) = 1/8
There are 3 possible outcomes where H comes out 1 time only out of 8 possible combinations. So,
fx(1) = 3/8
There are 3 possible outcomes where H comes out 2 times out of 8 possible combinations. So,
fx(2) = 3/8
So, here is the representation of PMF in a table:
| x | fx(x) |
|---|-------|
| 0 | 1/8 |
| 1 | 3/8 |
| 2 | 3/8 |
| 3 | 1/8 |
If you sum up all the probabilities in the table above, they add up to one: 1/8 + 3/8 + 3/8 + 1/8 = 1.
Let’s calculate the mean of X:
µ = 0 * 1/8 + 1 * 3/8 + 2 * 3/8 + 3 * 1/8 = 1.5
That means if we repeat this experiment many times independently, the sample mean will come out close to 1.5. More repetitions will bring a better approximation. It can be inferred that X is 1.5 “on the average” or “in the long run”.
Notice, how we calculated the mean. If we generalize it in a formula, it will look like this:
E(X) = x1 * p1 + x2 * p2 + … + xn * pn
Where x1, x2, …, xn are the possible values of X (here, the possible numbers of heads) and p1, p2, …, pn are the corresponding probabilities. The mean is also called the mathematical expectation, E(X).
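Here is a quick R sketch of that calculation for the coin-toss PMF above (the vector names x and p are just illustrative, not from any package):
x <- 0:3                        # possible numbers of heads
p <- c(1/8, 3/8, 3/8, 1/8)      # the PMF from the table above
sum(x * p)                      # E(X) = 1.5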
Variance and Standard Deviation
These are two of the core parameters for understanding a distribution. The variance is defined as the degree to which a random variable is scattered around its mean, the mathematical expectation:
Var(X) = σ² = E[(X - µ)²] = Σ (xi - µ)² * pi
The variance is also denoted as sigma squared (σ²). Let’s calculate the variance of the coin toss example above. Remember we calculated the mean or expectation E as 1.5. We will use that to calculate the variance now:
σ² = (0 - 1.5)² * 1/8 + (1 - 1.5)² * 3/8 + (2 - 1.5)² * 3/8 + (3 - 1.5)² * 1/8 = 0.75
The standard deviation is simply the square root of the variance. So,
σ = √0.75 ≈ 0.866
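A quick R check of both quantities, a sketch using the same illustrative x and p vectors as before:
x <- 0:3; p <- c(1/8, 3/8, 3/8, 1/8)
mu <- sum(x * p)            # mean, 1.5
sum((x - mu)^2 * p)         # variance, 0.75
sqrt(sum((x - mu)^2 * p))   # standard deviation, ~0.866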
There is one more important function, we need to learn. That is the cumulative distribution function(CDF) that will be used in the distribution a lot.
Cumulative Distribution Function (CDF)
Denoted as F(x), the CDF is the probability that the random variable X is less than or equal to x. In our coin toss example, F(2) is the probability of getting 2 heads or fewer.
So it is the cumulative sum: fx(0) + fx(1) + fx(2) = 1/8 + 3/8 + 3/8 = 7/8
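In R, the CDF of a discrete random variable can be obtained by cumulatively summing the PMF (again a sketch with an illustrative p vector):
p <- c(1/8, 3/8, 3/8, 1/8)   # fx(0), fx(1), fx(2), fx(3)
cumsum(p)                    # F(0), F(1), F(2), F(3): 0.125 0.500 0.875 1.000
cumsum(p)[3]                 # F(2) = 7/8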
These are all the basic functions. Now, Let’s dive into the distributions.
Different Types of Discrete Distributions
There are several types of discrete distributions. Here I will talk about some major types of discrete distributions with examples:
Uniform Distribution
This is the simplest distribution. It will be easier to understand if you see an example first. If you roll a die once, the probability of getting 1, 2, 3, 4, 5, or 6 is the same, 1/6.

This type of distribution is called the uniform distribution.
The probability mass function (PMF) is:
fx(x) = 1/m
Where m is the number of possible values. A die has six sides, so here m is 6 and the PMF is 1/6 or 0.1667.
The cumulative distribution function (CDF) is:
F(x) = x/m
Here, x is a possible value of the random variable. For example, if you roll a die once, what is the probability of rolling a 2 or less? Here x = 2, so the CDF is 2/6 or 1/3.
The formula for the mean is:
E(X) = (m + 1)/2
The mean of rolling a die is (6 + 1)/2 or 3.5.
The formula for the variance is:
Var(X) = (m² - 1)/12
The variance of this example is:
(6² - 1)/12 = 35/12
That comes out to be about 2.92.
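R has no built-in discrete uniform functions, but the PMF, CDF, mean, and variance for the die example are one-liners. A minimal sketch, assuming m = 6:
m <- 6
1/m            # PMF of any single face, 0.1667
(1:m)/m        # CDF values F(1) ... F(6)
(m + 1)/2      # mean, 3.5
(m^2 - 1)/12   # variance, ~2.92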
Input values of a uniform distribution can be a range as well.
For example, a random student is taking a test where the full score is 15, and the passing score is 10. You know that the student passed. Find the CDF of the student’s score.
Here is the solution.
You know that the passing score is 10. So the student’s score must be between 10 and 15 inclusive. The student’s score might be anything from 10 to 15. The student may get 10, 11, 12, 13, 14, or 15. If you count, the number of the possible scores is 6.
The CDF at the lowest possible score, F(10), should be:
F(10) = (10 - 10 + 1) / (15 - 10 + 1) = 1/6
So, it’s 1/6 or 0.1667.
The general formula is:
fx(x) = 1/(b - a + 1) for a ≤ x ≤ b
Here the random variable takes integer values in the range (a, b) where b > a, and m is (b - a + 1). You can calculate the variance using this m with the formula I mentioned before; the mean of such a range is (a + b)/2.
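For the student-score example, a quick R sketch (the variable names a, b, and m are just illustrative):
a <- 10; b <- 15
m <- b - a + 1   # 6 possible scores
1/m              # probability of any one score, 0.1667
(a + b)/2        # mean score, 12.5
(m^2 - 1)/12     # variance, ~2.92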
Binomial Distribution
Think of an exam where you can pass or fail. A Covid test, that can be positive or negative. Tossing a fair coin, the outcome can be a ‘head’ or a ‘tail’. In all these cases there are only two possible outcomes, “yes/no”.
The probabilities are p and q, where p is the probability that the event occurs and q = 1 - p is the probability that it does not.
Here is a complete example
Suppose you are taking a test without any preparation. There are 15 multiple-choice questions. Each question has four options. You need to get 10 questions right to pass. What is the probability that you will get exactly 10 questions right?
Let’s dissect the question.
Here,
the number of questions is 15. So, n = 15.
Each question has four options. That means the probability of getting one question right is 0.25 (p = 0.25).
That means the probability of not getting one question right is 1-p or 0.75.
You need to get 10 questions(k=10) right out of 15.
The formula to calculate the PMF of the binomial distribution is:
fx(k) = C(n, k) * p^k * (1 - p)^(n - k)
Where,
C(n, k) = n! / (k! * (n - k)!)
Plugging in the values:
fx(10) = C(15, 10) * (0.25)^10 * (0.75)^5
After calculation, the probability that you will get exactly 10 questions right is 0.00067961!
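To see what goes on behind the scenes, you can evaluate the formula directly in R with the choose function before using the built-in distribution functions:
choose(15, 10) * 0.25^10 * 0.75^5
# 0.0006796051, matching the dbinom result below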
If you know how to calculate the PMF, you can easily calculate the CDF. Let’s see how to use R to do the binomial distribution.
R has the ‘dbinom’ function to calculate the PMF, the ‘pbinom’ function to find the CDF, and the ‘rbinom’ function to generate random numbers from a binomial distribution.
It will be clearer with examples in a bit.
Here, I will use the dbinom function to calculate the probability that you get 10 questions right out of 15.
n=15; p=0.25; k=10
dbinom(k, size=n, prob=p )
Output:
0.00067961
What is the probability that you will get at most 10 questions right?
That means you can get 0, 1, 2, and so on up to 10 questions right. You could do it by calculating the PMF for 0, 1, 2, …, 10 individually and summing them up, but R has a great function, pbinom, that does this job in one line of code.
I will do it using the pbinom function.
pbinom(10, size=n, prob=p )
Output:
0.9998847
What is the probability that you will pass the test? That means you will get 10, 11, 12, 13, 14, or 15 questions right.
There are two ways to do that. You can either calculate the PMF for 10 to 15 and sum them up or calculate the CDF for 15 and 10 individually and subtract. Here I will show both of them.
First, let’s calculate the probability of getting 10, 11, 12, 13, 14, and 15 individually and sum them up using the dbinom function.
sum(dbinom(10:15, size=n, prob = p))
Output:
0.000794949
Now, we will do the same thing using the cumulative distribution function: calculate the cumulative probability of getting up to 15 questions right, then subtract the cumulative probability of getting up to 9 questions right. The difference is the probability of getting 10 to 15 questions right.
pbinom(15, size=n, prob=p) - pbinom(9, size=n, prob=p)
Output:
0.000794949
Lastly, generate 30 random numbers from a binomial distribution where n = 15 and p=0.25.
rbinom(30, size = 15, prob = 0.25)
Output:
6 2 3 4 3 2 6 5 5 2 4 2 8 6 4 4 1 1 3 1 3 2 5 2 7 4 4 3 5 4
Generating a set of random numbers is common in statistics and research.
The Mean and Variance of the Binomial Distribution
The formula for the mean is np, and
the formula for the variance is np(1-p).
In our example, where you have to choose the answer to each question from 4 options, the probability of getting one question right is 0.25.
The mean of the distribution is 15*0.25 = 3.75
The variance is np(1-p) = 15 * 0.25 * (1–0.25) = 2.8125
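The same two quantities in R, a quick sketch with the values from this example:
n <- 15; p <- 0.25
n * p            # mean, 3.75
n * p * (1 - p)  # variance, 2.8125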
Hypergeometric Distribution
In the hypergeometric distribution, the data is selected without replacement, unlike the binomial distribution. So, each pick depends on the previous outcomes; the trials are not independent. Let’s work on an example to understand it clearly.
Suppose a basket has 8 black balls and 5 white balls. I randomly picked three balls from it. What is the probability that I picked 1 white ball?
We want the probability of picking exactly 1 white ball, which means the other 2 balls will be black.
The formula for picking 1 white ball and 2 black balls is (choosing 2 black balls from 8 black balls) * (choosing 1 white ball from 5 white balls) / (choosing total 3 balls out of total 13 balls).
Here is the mathematical representation:
fx(1) = [C(5, 1) * C(8, 2)] / C(13, 3)
Where,
C(n, k) = n! / (k! * (n - k)!) is the number of ways to choose k items from n.
You can calculate it in R directly using the ‘choose’ function like this:
(choose(8, 2) * choose(5, 1)) / choose(13, 3)
Output:
0.4895105
Or, R has the dhyper function to calculate that. It takes the number of white balls (the ones we are interested in), denoted ‘m’; the number of black balls (the ones we are not interested in), denoted ‘n’; and the number of balls picked, denoted ‘k’.
M <- 5; N <- 8; K <- 3
dhyper(1, m = M, n = N, k = K) # We put 1 first because we want the probability of picking 1 white ball
Output:
0.4895105
What is the probability that at most 2 balls will be black?
This means, the probability of picking 0 black balls plus the probability of picking 1 black ball plus the probability of picking 2 black balls.
R has phyper function for doing that. The value of m and n will be different this time. Because we are interested in black balls now. So, m will be the total number of black balls and n will be the total number of white balls.
M <- 8; N <- 5; K <- 3
phyper(2, m = M, n = N, k = K) # We put 2 first because we want the probability of picking at most 2 black balls
Output:
0.8041958
Or you can find the probability of getting 0, 1, and 2 black balls individually and sum them up using the dhyper function:
M <- 8; N <- 5; K <- 3
sum(dhyper(0:2, m = M, n = N, k = K))
Output:
0.8041958
Geometric Distribution
In the geometric distribution, the random variable is the number of failed independent trials before the first occurrence of an event.
Let’s take the previous example, 8 black balls and 5 white balls in a basket. What is the probability that 1 white ball will be drawn after 3 black balls? Balls are returned after each pick.
We denote the success probability as p and q = 1 - p. Here, p is the probability of getting a white ball in one pick.
The total number of balls is (8 + 5) = 13 and the number of white balls is 5. So, the probability of picking a white ball in one pick is 5/13, and q = (1 - p) = 8/13. The formula is:
fx(x) = (1 - p)^x * p
Here, x is the number of failed attempts. Plugging in the values, the probability of picking a white ball after 3 black balls, i.e. 3 failed attempts, is:
fx(3) = (8/13)^3 * (5/13)
The result comes out to be 0.08963. You can do the same calculation using R. R has a dgeom function available to do it very easily. It will take the number of failed attempts and the probability p as the parameters:
dgeom(3, prob = 5/13)
Output:
0.08963272
What is the probability of picking a white ball after a maximum of 3 failed attempts?
That means we need to calculate the probability of picking a white ball after 0, 1, 2, and 3 failed attempts and sum them up.
sum(dgeom(0:3, prob = 5/13))
Output:
0.8565877
We can do the same using the pgeom function.
pgeom(3, prob = 5/13)
Output:
0.8565877
The Mean and Variance of Geometric Distribution
The formula for the mean (the expected number of failures before the first success) is:
E(X) = (1 - p)/p
The probability p in our example above is 5/13 ≈ 0.384.
The mean is: (1 - 0.384)/0.384 ≈ 1.6
The formula for the variance is:
Var(X) = (1 - p)/p²
The variance is:
(1 - 0.384)/0.384²
The variance comes out to be about 4.16.
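A quick R check of both quantities, a sketch using p = 5/13 from the example:
p <- 5/13
(1 - p)/p    # mean, 1.6
(1 - p)/p^2  # variance, 4.16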
Negative Binomial Distribution
The negative binomial distribution describes the number of independent trials needed before an event occurs a specified number of times. Let’s learn it by working through an example.
Suppose a basketball player is shooting free throws. The probability of success on each shot is 0.5. What is the probability that the 3rd successful shot comes on the 7th attempt?
Here, the random variable is the number of failures before the 3rd success. We are looking for the probability of getting the 3rd success on the 7th trial, which means 4 failures along the way.
So, x= 7, the number of trials
r = 3 (the number of successes)
Probability p is 0.5 and q = (1-p) or 0.5.
The formula for the negative binomial distribution is:
fx(x) = C(x - 1, r - 1) * p^r * (1 - p)^(x - r)
Now, plug in the values and calculate it. I choose to do it in R:
choose(7-1, 3-1) * (0.5)^3 * (0.5)^4
Output:
0.1171875
You can use the dnbinom function in R to do the same. It takes the number of failures, the number of successes, and the probability p as the parameters.
r=3
dnbinom(4, size = r, prob = 0.5)
Output:
0.1171875
What is the probability of getting 0 through 7 failures before 3 successes?
You can rephrase this same question as,
“What is the probability of getting at most 7 failures before 3 successes?”
Like the previous examples, you can calculate it by calculating 0, 1, 2, up to 7 failures individually and sum them up.
r=3
sum(dnbinom(0:7, size = r, prob = 0.5))
Output:
0.9453125
Or, you can use the pnbinom function:
pnbinom(7, size=3, prob = 0.5)
Output:
0.9453125
The Mean and Variance of the Negative Binomial Distribution
The formula for the mean of the negative binomial distribution (the expected number of failures before the r-th success) is:
E(X) = r * (1 - p)/p
The mean of the example distribution above is:
3 * (1 - 0.5)/0.5
The mean comes out to be 3.
The formula for the variance of the negative binomial distribution is:
Var(X) = r * (1 - p)/p²
Plugging in the values of r and p:
3 * (1 - 0.5)/0.5²
The variance of the above example distribution comes out to be 6.
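In R, a quick sketch using the values from the free-throw example:
r <- 3; p <- 0.5
r * (1 - p)/p    # mean number of misses before 3 makes, 3
r * (1 - p)/p^2  # variance, 6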
Poisson Distribution
This type of distribution is used to model the number of times a specified event occurs during a fixed period of time.
Suppose an appliance store receives 10 customers between 7 pm and 9 pm every day on average. What is the probability of getting 9 customers in that period on a given day?
Here, the average number of customers is 10, so:
λ = 10
The formula for the Poisson distribution is:
fx(x) = (e^(-λ) * λ^x) / x!
Plugging in the values of lambda and x:
fx(9) = (e^(-10) * 10^9) / 9!
The probability of getting 9 customers in a given day is 0.12511.
dpois(9, lambda = 10)
Output:
0.12511
What is the probability of that store receiving 10 to 15 customers on a given day during that same period (7 pm to 9 pm)?
I will show it in two ways. First, find the probabilities of the store getting 10, 11, 12, and so on up to 15 customers and sum them up using the dpois function.
sum(dpois(10:15, lambda = 10))
Output:
0.4933299
The same can be done using the ppois function: find the cumulative probability of getting up to 15 customers, then subtract the cumulative probability of getting up to 9 customers. The difference is the probability of getting 10 to 15 customers.
ppois(15, lambda = 10)-ppois(9, lambda = 10)
Output:
0.4933299
You can also generate a set of random numbers with the same lambda value using the rpois function:
rpois(30, lambda = 10)
Output:
11 12 6 14 7 12 11 15 9 9 14 8 18 15 15 14 14 9 15 7 8 13 8 12 11 11 9 15 11 12
The Mean and Variance of the Distribution
For the Poisson distribution, both the mean and the variance are lambda.
So, for the example above, the mean and variance both are 10.
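You can sanity-check this with a quick simulation in R: with a large enough sample, both the sample mean and the sample variance land close to lambda. A sketch, with an arbitrary sample size:
x <- rpois(100000, lambda = 10)
mean(x)  # close to 10
var(x)   # close to 10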
Conclusion
Congratulations! If you worked through all the examples above and understood the distributions, you have learned a very important topic in statistics. It is equally important in data analytics, data science, and artificial intelligence. I will write a separate article on continuous probability distributions. Please feel free to ask if you have any questions regarding any of the distributions above.
Feel free to follow me on Twitter and like my Facebook page.