The Gaussian distribution is the most important probability distribution in statistics, and it matters just as much in machine learning, because many natural phenomena, such as the height of a population, blood pressure, shoe size, and education measures like exam performance, tend to follow a Gaussian distribution.
I am sure you have heard this term and know it to some extent. If not, do not worry; this article will explain it clearly. I found some amazing visuals in Professor Andrew Ng's machine learning course on Coursera. He knows how to break a topic into small pieces, make it easier, and explain it in detail.
He used some visuals that made it easy to understand the Gaussian distribution and its relationship with its parameters, such as the mean, standard deviation, and variance.
In this article, I took some of the visuals from his course and used them here to explain the Gaussian distribution in detail.
Gaussian Distribution
Gaussian distribution is a synonym for normal distribution; they are the same thing. Say S is a set of random values whose probability distribution looks like the picture below.
This is a bell-shaped curve. If a probability distribution plot forms a bell-shaped curve like the one above, and the mean, median, and mode of the sample are equal, that distribution is called a normal distribution or a Gaussian distribution.
The Gaussian distribution is parameterized by two quantities:
a. The mean and
b. The variance
The mean mu is the center of the distribution, and the standard deviation sigma determines the width of the curve.
So the Gaussian density is highest at the mean mu, and the further we go from the mean, the lower the Gaussian density gets.
Here is the formula for the Gaussian distribution:
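In standard notation, that formula is:

p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)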
The left side of this equation reads as the probability density of x, parameterized by mu and sigma squared. This is the formula for the bell-shaped curve, where sigma squared is called the variance.
How the Gaussian Distribution Relates to the Mean and Standard Deviation
In this section, I will show some pictures that will give you a clear idea of how mu and sigma relate to the bell curve. I will show three pictures where mu is fixed at zero and sigma varies.
Notice how the shape and range of the curves change with different sigma.
This is the probability distribution of a set of random numbers with mu equal to 0 and sigma equal to 1.
In this picture, mu is 0, which means the highest probability density is around 0, and sigma is 1, which means the width of the curve is 1.
Notice that the height of the curve is about 0.4 (the peak of a Gaussian is 1/(sigma * sqrt(2*pi))) and the range is -4 to 4 (look at the x-axis). The variance sigma squared is 1.
Here is another set of random numbers with a mu of 0 and a sigma of 0.5.
Because mu is 0, the highest probability density is again around 0, and sigma is 0.5, so the width of the curve is 0.5. The variance sigma squared becomes 0.25.
As the width of the curve is half that of the previous curve, the height doubles. The range changed to -2 to 2 (x-axis), which is half that of the previous picture.
In this picture, sigma is 2 and mu is 0, as in the previous two pictures.
Compare it to figure 1, where sigma was 1. This time the height is half that of figure 1, because the width doubled as sigma doubled.
The variance sigma squared is 4, four times bigger than in figure 1. Look at the range on the x-axis: it's -8 to 8.
This example is a bit different from the previous three.
Here, we changed mu to 3 and kept sigma at 0.5, as in figure 2. So the shape of the curve is exactly the same as in figure 2, but the center shifted to 3. Now the highest density is around 3.
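If you want to reproduce curves like these yourself, here is a minimal sketch using numpy and matplotlib (the parameter pairs below simply match the four figures described above; the styling of the course plots will differ):

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density p(x; mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-8, 8, 1000)
# (mu, sigma) pairs matching figures 1-4: sigma = 1, 0.5, 2, then a shifted mean
params = [(0, 1), (0, 0.5), (0, 2), (3, 0.5)]

for mu, sigma in params:
    plt.plot(x, gaussian_pdf(x, mu, sigma), label=f"mu={mu}, sigma={sigma}")

plt.xlabel("x")
plt.ylabel("probability density")
plt.legend()
plt.show()
```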
Look at all four curves above. They change shape with different values of sigma, but the area under each curve stays the same.
One important property of a probability distribution is that the area under the curve integrates to one.
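You can check this numerically. A quick sketch with scipy (assuming scipy is installed), integrating the density over the whole real line for the same four parameter pairs:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# The total area under a Gaussian density is 1, whatever mu and sigma are.
for mu, sigma in [(0, 1), (0, 0.5), (0, 2), (3, 0.5)]:
    area, _ = quad(norm(loc=mu, scale=sigma).pdf, -np.inf, np.inf)
    print(f"mu={mu}, sigma={sigma}: area = {area:.6f}")  # ~1.000000 every time
```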
Parameter Estimation
Suppose we have a series of data. How do we estimate mu (the mean), sigma (the standard deviation), and sigma squared (the variance)?
Calculating mu is straightforward: it's simply the average. Take the sum of all the data and divide it by the total number of data points.
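In symbols:

\mu = \frac{1}{m} \sum_{i=1}^{m} x_i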
Here, xi is a single value in the dataset and m is the total number of data points.
The formula for the variance (sigma squared) is:
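\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu)^2

(This is the 1/m form used in Ng's course; the unbiased sample variance divides by m - 1 instead.)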
The standard deviation sigma is simply the square root of the variance.
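In code, these estimates are one-liners. A minimal sketch with numpy (the data array below is made up purely for illustration):

```python
import numpy as np

data = np.array([2.1, 2.5, 3.6, 4.0, 4.4, 5.1, 5.9])  # any 1-D dataset

mu = data.mean()         # (1/m) * sum(x_i)
variance = data.var()    # (1/m) * sum((x_i - mu)^2); pass ddof=1 for the m-1 version
sigma = data.std()       # square root of the variance

print(mu, variance, sigma)
```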
Multivariate Gaussian Distribution
Instead of having one set of data, what if we have two sets and need a multivariate Gaussian distribution? Suppose we have two variables, x1 and x2.
Modeling p(x1) and p(x2) separately is probably not a good way to capture the combined effect of both variables. In that case, you would want to combine them and model a single p(x).
Here is the formula for the multivariate Gaussian density:
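In standard notation, with x and mu as n-dimensional vectors:

p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} \, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^{\top} \Sigma^{-1} (x-\mu)\right)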
The summation-like symbol in this equation can be confusing! It is not a sum: Sigma here is the n x n covariance matrix, and the bars around it denote its determinant.
Visual Representation of Multivariate Gaussian Distribution
In this section, we will see the visual representation of the multivariate Gaussian distribution and how the shape of the curve changes with mu, sigma, and the correlation between the variables.
Start with a Standard Normal Distribution
This picture represents the probability distribution of a multivariate Gaussian where the mu of both x1 and x2 is zero.
Please don't get confused by the summation-like symbol here either. That is the covariance matrix Sigma, which in this case is the identity matrix. The 1s on the diagonal are the variances (sigma squared) of x1 and x2, which are also 1 here because sigma is 1 for both. The zeros off the diagonal are the covariances between x1 and x2, so x1 and x2 are not correlated in this case.
The picture here is simple. In both the x1 and x2 directions, the highest probability density is at 0, as mu is zero.
The dark red color area in the center shows the highest probability density area. The probability density keeps going lower in the lighter red, yellow, green, and cyan areas. It’s the lowest in the dark blue color zone.
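Here is a minimal sketch of how a surface like this can be evaluated and plotted with scipy.stats.multivariate_normal (the course itself uses Octave/MATLAB, so this is just one way to reproduce the picture):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])          # means of x1 and x2
Sigma = np.array([[1.0, 0.0],      # diagonal: variances of x1 and x2
                  [0.0, 1.0]])     # off-diagonal: covariance (0 = uncorrelated)

x1, x2 = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
density = multivariate_normal(mean=mu, cov=Sigma).pdf(np.dstack((x1, x2)))

plt.contourf(x1, x2, density, levels=30, cmap="jet")  # dark red = highest density
plt.xlabel("x1")
plt.ylabel("x2")
plt.colorbar(label="probability density")
plt.show()
```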
Change the Standard Deviation Sigma
Now, let's see what happens if the sigma values shrink a little. Here they are 0.6 for both x1 and x2.
As I mentioned before, the area under the curve has to integrate to 1. So when the standard deviation sigma shrinks, the range also shrinks, and at the same time the height of the curve increases to keep the area the same.
In contrast, when sigma is larger, the variability is wider, so the height of the curve is lower.
Look at figure 6: this change in the height and range of the curve is very similar to what we saw in the single-variable Gaussian figures earlier.
The sigma values for x1 and x2 will not always be the same. Let's check a few cases like that.
Here in figure 7, sigma for x1 is 0.6, and sigma for x2 is 1.
So the contour looks like an ellipse. It shrunk along x1 because the standard deviation sigma for x1 is smaller now.
In figure 8, it is the opposite of the previous picture.
The sigma for x1 is double the sigma for x2.
x1 has a much wider range this time! So the ellipse changed its direction.
Change the Correlation Factor Between the Variables
This is a completely different scenario. In figure 9, the off-diagonal values are not zeros anymore; they are 0.5. This shows that x1 and x2 are correlated by a factor of 0.5.
The ellipse now runs along a diagonal. x1 and x2 grow together because they are positively correlated.
When x1 is large, x2 is also large, and when x1 is small, x2 is also small.
In figure 10, the correlation between x1 and x2 is even bigger, 0.8!
So the ellipse is steeper!
Almost all of the probability lies in a narrow region. The distribution also looks tall and thin.
In all the pictures above, the correlation between x1 and x2 was either positive or zero. Let's see an example where the correlation is negative.
In figure 11, the correlation between x1 and x2 is -0.8.
You can see that the probability lies in a narrow range again, but this time when x1 is bigger, x2 is smaller, and when x1 is smaller, x2 is bigger.
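A convenient way to build all of these covariance matrices is from the two sigmas and a single correlation coefficient rho. A small sketch (the helper function below is just for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

def covariance(sigma1, sigma2, rho):
    """Covariance matrix for two variables with standard deviations
    sigma1 and sigma2 and correlation coefficient rho."""
    return np.array([[sigma1 ** 2,            rho * sigma1 * sigma2],
                     [rho * sigma1 * sigma2,  sigma2 ** 2]])

# Positive correlation (0.8): the ellipse tilts so x1 and x2 grow together.
pos = multivariate_normal(mean=[0, 0], cov=covariance(1.0, 1.0, 0.8))
# Negative correlation (-0.8): a large x1 goes with a small x2, and vice versa.
neg = multivariate_normal(mean=[0, 0], cov=covariance(1.0, 1.0, -0.8))

print(pos.pdf([1, 1]), neg.pdf([1, 1]))  # the positively correlated density is much higher at (1, 1)
```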
Finally, Check Some Different Means (Mu)
We kept the value of mu at 0 the whole time. Let's see how the curve changes with a different mu.
In figure 12, mu is zero for x1 and mu is 0.5 for x2.
Look at the range in the picture. The center of the curve has shifted away from zero for x2 now.
The center position, the area of highest probability density, is now at 0.5 in the x2 direction.
In figure 13, mu is 1.5 for x1 and -0.5 for x2.
The center of the highest probability in the x1 direction is 1.5, and at the same time the center of the highest probability in the x2 direction is -0.5.
Overall, the whole curve shifted.
Conclusion
I hope this article was helpful for understanding the Gaussian distribution and its characteristics clearly. I tried to present and explain the relationship of the curve to its different parameters. Hopefully, when you use the Gaussian distribution in statistics or machine learning, it will be much easier now.
Feel free to follow me on Twitter and like my Facebook page.
#statistics #datascience #machinelearning