A Complete Overview of the Probability Distributions with Examples, R Implementation

Our end goal is to draw inferences about a population. We first need to learn about probability because it is the underlying base for statistical inference, predictive models, and machine learning algorithms.

Different types of probability distributions help to find the probability of the occurrence of an event. Different types of distribution work in different conditions.

In this article, I will discuss:

1. The basics of probability distributions
2. The ideas and properties of different types of distributions
3. The formulas for different types of discrete and continuous distributions and how to calculate them, with examples
4. Implementation of different types of distributions in R

I will start with some basic concepts and slowly move to the types of distributions and their R implementation.

I will explain each type of distribution with examples. I think the best way to learn them is by working through examples. You will find many articles and books out there with wordy descriptions, but I find them hard to follow without hands-on examples. So I will focus primarily on doing: each topic will be discussed with examples. Fewer words and more work.

Random Variables

During an experiment, when there is no way to tell which outcome is coming, the outcome is random, even though only one outcome will occur. For example, the sum of 5 rolls of a die: when you roll a die, you do not know its outcome. Or when you draw a card from a deck at random, you do not know which card you are drawing. These are examples of random variables.

There are two types of random variables:

  1. Discrete random variables
  2. Continuous random variables

Discrete Distribution

Let’s understand some important basic concepts with simple examples.

A discrete distribution is defined as the mapping of probabilities to all the values taken by a discrete random variable. The probability distribution is specified in terms of a function called the probability mass function (PMF).

PMF is the core of the probability distribution. We will calculate PMF for each type of distribution. Please learn it well before moving to the next section.

Here is an example. Toss a fair coin three times. You can get 1 head and 2 tails, 2 heads and 1 tail, and so on. How many different combinations are possible? There are eight combinations possible:

S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}

Here S is called the sample space. Each of these eight combinations from 3 tosses is equally likely.

Assume that X is the number of heads observed.

What is the PMF of X?

X has support Sx = {0, 1, 2, 3}. That means heads can appear zero times, once, twice, or three times.

When the outcome is TTT, X is zero (look at the sample space S above). So, the probability of X being zero is 1/8.

fx(0) = 1/8

Look at the sample space again: the probability of having three heads (HHH) in three tosses is 1/8.

fx(3) = 1/8

There are 3 possible outcomes where H comes out 1 time only out of 8 possible combinations. So,

fx(1) = 3/8

There are 3 possible outcomes where H comes out 2 times out of 8 possible combinations. So,

fx(2) = 3/8

So, here is the representation of PMF in a table:

x        0     1     2     3
fx(x)   1/8   3/8   3/8   1/8

If you sum up all the probabilities in the table above, they add up to one: 1/8 + 3/8 + 3/8 + 1/8 = 1.

Let’s calculate the mean of X:

µ = 0 * 1/8 + 1 * 3/8 + 2 * 3/8 + 3 * 1/8 = 1.5

That means if we repeat this experiment many times independently, the sample mean will come out to be close to 1.5. More repetitions will give a better approximation. It can be inferred that X is 1.5 “on the average” or “in the long run”.

Notice, how we calculated the mean. If we generalize it in a formula, it will look like this:

E(X) = x1p1 + x2p2 + … + xnpn

Where x1, x2, …, xn are the values X can take (here, the possible numbers of heads) and p1, p2, …, pn are the corresponding probabilities. The mean is also called the mathematical expectation, E(X).

Variance and Standard Deviation

Variance and standard deviation are two of the core parameters for understanding a distribution. The variance is defined as the degree to which a random variable is scattered around its mean, the mathematical expectation.

Var(X) = σ² = E[(X - µ)²] = (x1 - µ)²p1 + (x2 - µ)²p2 + … + (xn - µ)²pn

The variance is also denoted as sigma squared (σ²). Let’s calculate the variance of the coin toss example above. Remember, we calculated the mean or expectation E(X) as 1.5. We will use that to calculate the variance now.

σ² = (0 - 1.5)² * 1/8 + (1 - 1.5)² * 3/8 + (2 - 1.5)² * 3/8 + (3 - 1.5)² * 1/8 = 0.75

The standard deviation is simply the square root of the variance. So,

σ = √0.75 ≈ 0.866

There is one more important function we need to learn: the cumulative distribution function (CDF), which will be used a lot with the distributions below.

Cumulative Distribution Function (CDF)

Denoted as F(x), it is the probability that a random variable X is less than or equal to x. In our coin toss example, F(2) is the probability of tossing 2 heads or fewer.

So it is the cumulative sum: fx(0) + fx(1) + fx(2) = 1/8 + 3/8 + 3/8 = 7/8
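
If you want to verify these numbers yourself, here is a minimal R sketch (not part of the original example, it only uses the PMF table above) that computes the mean, variance, standard deviation, and CDF of X:

x <- 0:3                          # possible numbers of heads
px <- c(1/8, 3/8, 3/8, 1/8)       # PMF from the table above
mu <- sum(x * px)                 # mean: 1.5
sigma2 <- sum((x - mu)^2 * px)    # variance: 0.75
sqrt(sigma2)                      # standard deviation: 0.866
cumsum(px)                        # CDF: 0.125 0.500 0.875 1.000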

These are all the basic functions. Now, Let’s dive into the distributions.

Different Types of Discrete Distributions

There are several types of discrete distributions. Here I will talk about some major types of discrete distributions with examples:

Uniform Distribution

This is the simplest distribution. It will be easier to understand if you see an example first. If you roll a die once, the probability of getting 1, 2, 3, 4, 5, or 6 is the same, 1/6.

This type of distribution is called the uniform distribution.

The Probability Mass Function PMF is:

fx(x) = 1/m, for x = 1, 2, …, m

Where ‘m’ is the number of possible values. A die has six sides, so here m is 6 and the PMF is 1/6 or 0.1667.

The Cumulative Distribution Function CDF is:

Fx(x) = x/m

Here, x is the value up to which we accumulate the probability. For example, if you roll a die once, what is the probability of rolling a 2 or less? Here x = 2, and the CDF is 2/6 or 1/3.

The Formula for Mean is:

µ = (m + 1)/2

The mean of rolling a die is (6 + 1)/2 or 3.5.

The Formula for Variance is:

σ² = (m² - 1)/12

The variance of this example is:

σ² = (6² - 1)/12 = 35/12

That comes out to be 2.92.
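
The article has not introduced any R function for the discrete uniform distribution, and base R does not ship one, so here is a small hand-rolled sketch of the die example using the formulas above:

m <- 6                 # number of sides on the die
1/m                    # PMF of any single face: 0.1667
2/m                    # CDF at x = 2: 0.3333
(m + 1)/2              # mean: 3.5
(m^2 - 1)/12           # variance: 2.9167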

Input values of a uniform distribution can be a range as well.

For example, a random student is taking a test where the full score is 15 and the passing score is 10. You know that the student passed. Find the probability of each possible score.

Here is the solution.

You know that the passing score is 10, so the student’s score must be between 10 and 15 inclusive. The student may get 10, 11, 12, 13, 14, or 15. If you count, the number of possible scores is 6.

The probability (PMF) of each score should be:

fx(x) = 1/(15 - 10 + 1) = 1/6, for x = 10, 11, …, 15

So, it’s 1/6 or 0.1667.

The general formula is:

fx(x) = 1/(b - a + 1), for x = a, a + 1, …, b

Here the random variable lies in the integer range [a, b] where b > a, and m = (b - a + 1). The variance formula above still applies with this m; the mean becomes (a + b)/2.
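
As a quick sketch, the same formulas applied to the student score example (a = 10, b = 15):

a <- 10; b <- 15
m <- b - a + 1         # 6 possible scores
1/m                    # probability of each score: 0.1667
(a + b)/2              # mean score: 12.5
(m^2 - 1)/12           # variance: 2.9167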

Binomial Distribution

Think of an exam where you can pass or fail. A Covid test can be positive or negative. Tossing a fair coin, the outcome can be a ‘head’ or a ‘tail’. In all these cases there are only two possible outcomes, “yes/no”.

The probabilities are p and q: p is the probability that the event occurs and q = 1 - p is the probability that it does not.

Here is a complete example

Suppose you are taking a test without any preparation. There are 15 multiple-choice questions, each with four options. You need to get 10 questions right to pass. What is the probability that you will get exactly 10 questions right?

Let’s dissect the question.

Here,

the number of questions is 15. So, n = 15.

Each question has four options. That means the probability of getting one question right is 0.25 (p = 0.25).

That means the probability of not getting one question right is 1-p or 0.75.

You need to get 10 questions(k=10) right out of 15.

The formula to calculate the PMF of the binomial distribution is:

P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)

Where,

C(n, k) = n! / (k! * (n - k)!)

Plugging in the values:

P(X = 10) = C(15, 10) * (0.25)^10 * (0.75)^5

After calculation, the probability that you will get exactly 10 questions right is 0.00067961!

If you know how to calculate the PMF, you can easily calculate the CDF. Let’s see how to use R to do the binomial distribution.

R has ‘dbinom’ function to calculate PMF, ‘pbinom’ function to find the CDF, and ‘rbinom’ function to generate a random distribution of binomial property.

It will be clearer with examples in a bit.

Here, I will use the dbinom function to calculate the probability that you get 10 questions right out of 15.

n=15; p=0.25; k=10
dbinom(k, size=n, prob=p )

Output:

0.00067961

What is the probability that you will get at most 10 questions right?

That means you can get 0, 1, 2, and so on, up to 10 questions right. You could do it by calculating the PMF for 0, 1, 2, …, 10 individually and summing them up, but R has a great function, pbinom, that does this job in one line of code.

I will do it using the pbinom function.

pbinom(10, size=n, prob=p )

Output:

0.9998847

What is the probability that you will pass the test? That means you will get 10, 11, 12, 13, 14, or 15 questions right.

There are two ways to do that. You can either calculate the PMF for 10 to 15 and sum them up or calculate the CDF for 15 and 10 individually and subtract. Here I will show both of them.

First, let’s calculate the probability of getting 10, 11, 12, 13, 14, and 15 individually and sum them up using the dbinom function.

sum(dbinom(10:15, size=n, prob = p))

Output:

0.000794949

Now, we will do the same thing using the cumulative distribution function: calculate the cumulative probability of getting up to 15 questions right, then subtract the cumulative probability of getting up to 9 questions right. The difference is the probability of getting 10 to 15 questions right.

pbinom(15, size=n, prob=p) - pbinom(9, size=n, prob=p)

Output:

0.000794949

Lastly, generate 30 random numbers from a binomial distribution where n = 15 and p=0.25.

rbinom(30, size = 15, prob = 0.25)

Output:

6 2 3 4 3 2 6 5 5 2 4 2 8 6 4 4 1 1 3 1 3 2 5 2 7 4 4 3 5 4

Generating a set of random numbers is common in statistics and research.

The Mean and Variance of the Binomial Distribution

The formula for the mean is np, and

the formula for the variance is np(1 - p).

In our example, where you have to choose an answer to each question from 4 options, the probability of getting one question right is 0.25.

The mean of the distribution is 15*0.25 = 3.75

The variance is np(1 - p) = 15 * 0.25 * (1 - 0.25) = 2.8125
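
A quick R sketch (not from the original article) checking these values, and comparing them with a large simulated sample:

n <- 15; p <- 0.25
n * p                  # mean: 3.75
n * p * (1 - p)        # variance: 2.8125
x <- rbinom(1e5, size = n, prob = p)
mean(x); var(x)        # should come out close to 3.75 and 2.8125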

Hypergeometric Distribution

In the hypergeometric distribution, the random data is selected without replacement, unlike the binomial distribution. So, each pick depends on the previous outcomes. Let’s work on an example to understand it clearly.

Suppose a basket has 8 black balls and 5 white balls. I randomly picked three balls from it. What is the probability that I picked 1 white ball?

That means we will calculate the probability that, of the three balls picked, 1 is white and the other 2 are black.

The probability of picking 1 white ball and 2 black balls is (ways of choosing 2 black balls from 8 black balls) * (ways of choosing 1 white ball from 5 white balls) / (ways of choosing 3 balls out of 13 balls in total).

Here is the mathematical representation:

P(X = 1) = [C(5, 1) * C(8, 2)] / C(13, 3)

Where,

C(n, k) = n! / (k! * (n - k)!)

You can calculate it in R directly using the ‘choose’ function like this:

(choose(8, 2) * choose(5, 1)) / choose(13, 3)

Output:

0.4895105

Or, R has a dhyper function to calculate that. It takes the number of balls of the type we are interested in (white, denoted ‘m’), the number of balls of the other type (black, denoted ‘n’), and the number of balls picked (denoted ‘k’).

M <- 5; N <- 8; K <- 3
dhyper(1, m = M, n = N, k = K) #We put 1 in the beginning because we want to know the probability of picking 1 white ball.

Output:

0.4895105

What is the probability that at most 2 balls will be black?

This means, the probability of picking 0 black balls plus the probability of picking 1 black ball plus the probability of picking 2 black balls.

R has a phyper function for doing that. The values of m and n will be different this time because we are now interested in the black balls. So, m will be the total number of black balls and n will be the total number of white balls.

M <- 8; N <- 5; K <- 3
phyper(2, m = M, n = N, k = K)#We put 2 in the beginning because we are looking to find the probability of picking at most 2 black balls

Output:

0.8041958

Or you can find the probability of getting 0, 1, and 2 black balls individually and sum them up using the dhyper function:

M <- 8; N <- 5; K <- 3
sum(dhyper(0:2, m = M, n = N, k = K))

Output:

0.8041958
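
R also has an rhyper function to draw random hypergeometric counts, analogous to the rbinom and rpois calls used elsewhere in this article. A small sketch with the same basket (8 black balls, 5 white balls), drawing 3 balls each time and counting the black ones:

M <- 8; N <- 5; K <- 3
rhyper(10, m = M, n = N, k = K)   # 10 simulated counts of black balls among the 3 drawn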

Geometric Distribution

The number of failed independent trials before the first occurrence of an event is the random variable in the geometric distribution.

Let’s take the previous example: 8 black balls and 5 white balls in a basket. What is the probability that a white ball will be drawn only after 3 black balls? This time, each ball is returned to the basket after it is picked.

We denote the probability as p and q as 1-p. Here, probability p will be the probability of getting a white ball in one pick.

The total number of balls is (8+5) = 13 and the number of white balls is 5. So, the probability of picking a white ball in one pick is 5/13. q should be (1-p) or 8/13. The formula is:

P(X = x) = (1 - p)^x * p

Here, x is the number of failed attempts. Plugging in the values, the probability to pick a white ball after 3 black balls or 3 failed attempts is:

P(X = 3) = (8/13)^3 * (5/13)

The result comes out to be 0.08963. You can do the same calculation using R. R has a dgeom function available to do it very easily. It will take the number of failed attempts and the probability p as the parameters:

dgeom(3, prob = 5/13)

Output:

0.08963272

What is the probability of picking a white ball after a maximum of 3 failed attempts?

That means we need to calculate the probability of picking a white ball after 0, 1, 2, and 3 failed attempts and sum them up.

sum(dgeom(0:3, prob = 5/13))

Output:

0.8565877

We can do the same using the pgeom function.

pgeom(3, prob = 5/13)

Output:

0.8565877

The Mean and Variance of Geometric Distribution

The formula for mean is:

µ = (1 - p)/p

The probability p in our example above is 5/13 = 0.384.

The mean is: (1 - 0.384)/0.384 = 1.6

The formula for variance is:

σ² = (1 - p)/p²

The variance is:

σ² = (1 - 0.384)/(0.384)² ≈ 4.16

The variance comes out to be: 4.16
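
Here is a minimal R check of these two values; the simulated sample at the end is my own sanity check and is not part of the original example:

p <- 5/13
(1 - p)/p              # mean: 1.6
(1 - p)/p^2            # variance: 4.16
x <- rgeom(1e5, prob = p)
mean(x); var(x)        # should come out close to 1.6 and 4.16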

Negative Binomial Distribution

The random variable here is the number of failed independent trials before an event occurs a specified number of times. Let’s learn it by working on an example.

Suppose a basketball player is shooting free throws. The probability of making each shot is 0.5. What is the probability that the player makes the 3rd successful shot on the 7th attempt?

We are looking for the probability of getting the 3rd success on the 7th trial; equivalently, 4 failures before the 3rd success.

So, x= 7, the number of trials

r = 3 (the number of successes)

Probability p is 0.5 and q = (1-p) or 0.5.

The formula for negative binomial distribution is:

P(X = x) = C(x - 1, r - 1) * p^r * (1 - p)^(x - r)

Now, plug in the values and calculate them. I choose to do it in R:

choose(7-1, 3-1) * (0.5)^3 * (0.5)^4

Output:

0.1171875

You can use the dnbinom function in R to do the same. It takes the number of failures (here 7 - 3 = 4), the number of successes (size), and the probability p as parameters.

r=3
dnbinom(4, size = r, prob = 0.5)

Output:

0.1171875

What is the probability of getting 0 through 7 failures before 3 successes?

You can rephrase this same question as,

“What is the probability of getting at most 7 failures before 3 successes?”

Like the previous examples, you can calculate it by calculating 0, 1, 2, up to 7 failures individually and sum them up.

r=3
sum(dnbinom(0:7, size = r, prob = 0.5))

Output:

0.9453125

Or, you can use the pnbinom function:

pnbinom(7, size=3, prob = 0.5)

Output:

0.9453125

The Mean and Variance of the Negative Binomial Distribution

The formula for the mean of a negative binomial distribution (counting the number of failures, which is the convention dnbinom uses) is:

µ = r(1 - p)/p

The mean of the example distribution above is:

µ = 3 * (1 - 0.5)/0.5 = 3

The mean comes out to be 3.

The formula for the variance of the negative binomial distribution is:

σ² = r(1 - p)/p²

Plugging in the values of r and p:

σ² = 3 * (1 - 0.5)/(0.5)² = 6

The variance of the above example distribution comes out to be 6.
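
A short R sketch (my own check, assuming the “number of failures” convention that dnbinom uses) to verify these values against a simulated sample:

r <- 3; p <- 0.5
r * (1 - p)/p          # mean: 3
r * (1 - p)/p^2        # variance: 6
x <- rnbinom(1e5, size = r, prob = p)
mean(x); var(x)        # should come out close to 3 and 6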

Poisson Distribution

This type of distribution is used to model the frequency of a specified event that occurs during a particular period of time.

Suppose an appliance store receives 10 customers between 7 pm and 9 pm every day on average. What is the probability of getting 9 customers in that period on a given day?

Here, the average number of customers is 10. so,

λ = 10

The formula for Poisson distribution is:

P(X = x) = (e^(-λ) * λ^x) / x!

Plugging in the values of lambda and x:

P(X = 9) = (e^(-10) * 10^9) / 9! ≈ 0.12511

The probability of getting 9 customers in a given day is 0.12511.

dpois(9, lambda = 10)

Output:

0.12511

What is the probability of that store receiving 10 to 15 customers in a given day in that same period (7 pm to 9 pm)?

I will show it in two ways. First, find the probabilities of that store of getting 10, 11, 12 up to 15 customers and sum up using the dpois function.

sum(dpois(10:15, lambda = 10))

Output:

0.4933299

The same can be done using the ppois function. Find the cumulative probability of getting up to 15 customers, then subtract the cumulative probability of getting up to 9 customers. The difference is the probability of getting 10 to 15 customers.

ppois(15, lambda = 10)-ppois(9, lambda = 10)

Output:

0.4933299

You can also generate a set of random numbers with the same lambda value using the rpois function:

rpois(30, lambda = 10)

Output:

11 12  6 14  7 12 11 15  9  9 14  8 18 15 15 14 14  9 15  7  8 13  8 12 11 11  9 15 11 12

The Mean and Variance of the Poisson Distribution

The mean and variance both are lambda for Poisson distribution.

So, for the example above, the mean and variance both are 10.
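
You can check this with a quick simulation sketch in R:

x <- rpois(1e5, lambda = 10)
mean(x); var(x)        # both should come out close to 10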

Continuous Distributions — Uniform

This is the simplest continuous distribution and is very similar to the discrete uniform distribution.

With a continuous uniform distribution on a range [a, b], the probability density of a random variable X is uniform throughout the range. Let x be a random number between 0 and 100. Here ‘a’ is 0 and ‘b’ is 100.

The probability density is the same anywhere between 0 and 100: 1/(b - a) = 1/100 = 0.01. It’s uniform throughout the range. The density at x = 10 is 0.01, and the density at x = 90 is also 0.01. This function is called the probability density function (PDF).

So, we can say that the PDF of any number occurring randomly from 0 to 100 is 0.01.

What is the probability that the number is 10 or less?

For a continuous variable, that is the area under the density curve from 0 to 10. As this is a uniform distribution, the density is 0.01 everywhere, so the probability of the number falling between 0 and 10 is 0.01 * 10 = 0.1.

This is called a cumulative distribution function. Makes sense, right? It is cumulative.

These two concepts PDF and CDF will be used over and over again in each of the distribution methods.
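
As a quick sketch of the 0-to-100 example, R’s dunif and punif functions (introduced properly in the stock price example below) give exactly these numbers:

dunif(10, min = 0, max = 100)   # PDF: 0.01
punif(10, min = 0, max = 100)   # CDF: 0.1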

The formula for the probability density function (PDF) for the random variable X for a uniform continuous distribution is:

f(x) = 1/(b - a), for a ≤ x ≤ b (and 0 otherwise)

I already explained an example before while explaining the concepts of PDF and CDF above. Let’s have a look at one more example and the implementation with R.

Suppose that the stock price of a certain company follows a uniform distribution between 50 and 90 dollars. What is the probability density at a stock price of 63 dollars?

You can calculate it very easily for a uniform distribution, because the density is the same throughout the range, as I explained before. But here I want to introduce the R functions for the uniform distribution.

We can easily do it by using the ‘dunif’ function, which takes the value, the minimum, and the maximum. We can calculate the PDF with this one line of code.

dunif(63, min = 50, max = 90)

Output:

0.025

What is the probability that the stock price will be 60 dollars or less?

Here we need the cumulative distribution function (CDF). The probability that the stock price is 60 dollars or less is the area under the density from 50 to 60, which is (60 - 50)/(90 - 50) = 0.25.

R has the ‘punif’ function to calculate the CDF like this.

punif(60, min = 50, max=90)

Output:

0.25

What is the probability that the stock price is at least 70 dollars?

The stock price being at least 70 dollars means the stock price is 70 dollars or more, so we need the probability of the range from 70 to 90 dollars.

Look, the total probability is 1. If we subtract the CDF of 70 (that is, the probability of the price being between 50 and 70), we get the probability of the price being 70 or more.

1 - punif(70, min = 50, max = 90)

Output:

0.5

That’s all for the uniform continuous distribution.

Normal Distribution

This is the most common continuous distribution. The famous bell-shaped curve, or Gaussian distribution, represents the normal distribution.

The normal distribution is determined completely by the mean and the standard deviation. If two normal distributions have the same mean and standard deviation, they are identical.

Like uniform continuous distribution, normal distribution also involves a range. The probability is the area under the curve.

[Figure: three normal distributions with the same mean and different standard deviations]

[Figure: three normal distributions with the same standard deviation and different means]

The area under a complete bell-shaped curve is 1.

The reason the normal distribution is so important and widely used is that a lot of natural phenomena follow it. For example, the age, height, and BMI of a representative sample should roughly follow a normal distribution.

Let’s work on some example problems in R

Suppose the ages in a population are approximately normally distributed, ranging from about 12 to 72, with a mean of 40 and a standard deviation of 8. What is the probability density for a random person being 35 years old?

R has the ‘dnorm’ function that takes the value (in this example 35), the mean, and the standard deviation as inputs and gives you the PDF.

dnorm(35, mean = 40, sd = 8)

Output:

0.041

The probability density for a random person being 35 years old is 0.041 in the above-mentioned normal distribution.

What is the probability that a random person is at most 30 years old?

That means the probability of a random person being 30 years old or less, which is the cumulative probability up to 30. We will use the ‘pnorm’ function, which calculates the CDF of a normal distribution.

pnorm(30, mean = 40, sd = 8)

Output:

0.106

What is the probability that a random person is at least 30 years old in the same distribution?

Here we need to calculate the probability of a random person being 30 years old or more.

As we know, the total probability is always 1. If we subtract the CDF of 30 (that is, the probability of being younger than 30) from the total probability, we get the probability of being 30 or older.

1 - pnorm(30, mean = 40, sd = 8)

Output:

0.894

This type of analysis gives a much clearer idea about a population than just the mean and standard deviation, right?

Have you heard of the 68–95–99.7 rule?

This rule of a normal distribution makes probabilistic inference a lot easier.

68.27% of the variables lie within one standard deviation of the mean.

95.45% of the variables lie within two standard deviations of the mean.

99.73% of the variables lie within three standard deviations of the mean.

Let’s test it!

Calculate the probability of a random person being within two standard deviations of the mean.

Here we need to calculate the probability of a random person being between mean - 2*sd and mean + 2*sd. This is also a CDF calculation: if we find the CDF of mean + 2*sd and subtract the CDF of mean - 2*sd, we get the probability of being in between.

mu= 40
sigma = 8
pnorm(mu + 2*sigma, mean = 40, sd = 8) - pnorm(mu - 2*sigma, mean = 40, sd = 8)

Output:

0.9545

or 95.45%. Do you see it? For a perfectly normal distribution, 95.45% of the values lie within two standard deviations of the mean. You can prove the other two in the same way. Please try it yourself.

All this time we calculated the probabilities of a random variable.

Now we will do the otherwise.

That means the probability will now be given, and we will find the corresponding value of the variable. Here is an example.

Suppose you are given a probability of 75% and you need to know the corresponding age, the 75th percentile. It will be clearer after we do the calculation.

qnorm(0.75, mean = 40, sd = 8)

Output:

45.4

That means the CDF of 45.4 is 75%: the probability that a random person in this distribution is 45.4 years old or younger is 75%.
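
As a quick sanity check (my addition), pnorm should roughly invert qnorm:

pnorm(45.4, mean = 40, sd = 8)   # approximately 0.75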

Lastly, I want to demonstrate how to generate a set of normally distributed data and plot them.

Suppose we want to generate 1000 random numbers that have a mean of 75 and a standard deviation of 11.

y = rnorm(1000, mean = 75, sd = 11)

This line of code will generate a set of one thousand numbers with a mean of about 75 and an sd of about 11. I am not showing the output here because it would take too much space. Instead, let’s plot it.

I will round the numbers before plotting so that equal values can be counted together and the curve looks smoother.

y = round(y)
plot(table(y), type = "h")

Output:

[Plot: frequencies of the rounded values, forming an approximate bell-shaped curve]

Here, table(y) gives you the frequency of each number. Please check the output of table(y), if this is new to you.

This curve will look more normal and more smooth if you generate more data.

Exponential Continuous Distribution

The range of the exponential distribution is from zero to positive infinity. This is defined by a single parameter, the mean number of occurrences per unit of time which is denoted by lambda.

This distribution is commonly used in queuing theory for the distribution of waiting times. It is mainly about the amount of time until a certain event occurs: the time between patients entering a hospital, the length of time between arrivals or sales calls, or the amount of time until an earthquake happens.

The PDF can be calculated manually using the following formula:

f(x) = λ * e^(-λx), for x ≥ 0 (and 0 otherwise)

In this article, I will use R to calculate the PDFs and CDFs.

Let’s work on an example to learn it better.

Suppose 25 customers arrive per hour at a retail store on average. If a customer arrived just now, what is the probability that the next customer will arrive within the next 3 minutes?

Here, the rate is 25 per hour. So, we need to be careful about the unit.

Notice it is asking about the span of the next 3 minutes, which is 3/60 of an hour. We need to calculate the CDF. R has the ‘pexp’ function to do that for the exponential continuous distribution.

pexp(3/60, rate = 25)

Output:

0.7134
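
As a sketch, you can reproduce this number by hand from the exponential CDF F(x) = 1 - e^(-λx), which is what pexp computes:

lambda <- 25          # rate per hour
x <- 3/60             # 3 minutes expressed in hours
1 - exp(-lambda * x)  # about 0.7135, matching pexp above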

What is the probability of the next customer arriving in the next 3 to 7 minutes?

For this, we need to calculate the CDF of 7 and subtract the CDF of 3 from it. That should provide us with the CDF in between.

pexp(7/60, rate = 25) - pexp(3/60, rate = 25)

Output:

0.2323

The probability of the next customer arriving between 3 to 7 minutes is 23.23%.

Conclusion

Congratulations! If you finished all the examples above and understood the distributions, you have learned a very important topic in statistics. It is equally important in data analytics, data science, and artificial intelligence. There are several other types of distributions out there, and it’s pretty hard to learn and remember all of them at once. My suggestion is to learn the most common ones and look up the rest in books or on Google whenever necessary.

Feel free to follow me on Twitter and like my Facebook page.
