The confidence interval (CI) is an important concept in statistics and data science. In this article, I explain what a confidence interval is, how to calculate it, and its most important characteristics.
A confidence interval (CI) is a range of values built around the best estimate of a statistical parameter. It comes with a confidence level, which is expressed as a percentage. A 95% confidence interval means we are 95% confident that the population parameter lies within the interval, in the sense that intervals built this way capture the parameter in about 95% of repeated samples.
Interpretation of confidence intervals
Here is a statement:
“In a sample of 659 parents with toddlers, 540, about 85 percent, stated they use a car seat for all travel with their toddler. From these results, a 95% confidence interval was provided, going from about 82.3 percent up to 87.7 percent.”
This statement means we are 95% confident that the population proportion of parents who use a car seat for all travel with their toddler is between 82.3% and 87.7%. If we took many samples of this size from the population and built an interval from each one, about 95% of those intervals would contain the true population proportion.
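Can we say that this particular confidence interval (82.3%, 87.7%) contains the true population proportion? The answer is unknown. The population proportion is a fixed but unknown value, so this interval either contains it or does not. It is important to remember that 95% confidence is not a 95% probability that this one interval contains the true value; it describes how often the method succeeds over many samples.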
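The repeated-sampling reading of "95% confidence" can be checked with a small simulation. This is a minimal sketch, not part of the original study: the true proportion of 0.85, the function names `wald_interval` and `coverage`, and the number of trials are all assumptions chosen for illustration.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Return (low, high) for a proportion using estimate +/- z * SE."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n yes/no answers from the hypothetical population.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials

# true_p = 0.85 is an assumed population value, used only for illustration.
print(coverage(true_p=0.85, n=659, trials=2000))
```

The printed fraction should land near 0.95. Each individual interval still either contains the assumed true value or misses it; the 95% describes the procedure, not any single interval.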
Why Is the Confidence Interval Important?
It is important because, most of the time, it is not possible to collect data from every single person in a population. In the example above, the sample size was 659. We estimated the population proportion of parents with toddlers who use a car seat for all travel from a sample of 659 parents. We could not get the data from all the parents with toddlers. So, we calculate the proportion from our available sample and allow for a margin of error. With that margin of error, we get a range. This range is called a confidence interval. A confidence interval is a way to express how well the sample data represent the total population. You can calculate a confidence interval for any confidence level below 100%, but 95% is the most common.
How to Calculate the Confidence Interval
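The formula for the confidence interval is:

$$\text{best estimate} \pm z\text{-score} \times SE$$

Here, SE is the standard error of the estimate, and the z-score depends on the chosen confidence level.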
We normally want a high confidence level such as 90%, 95%, or 99%. The higher the confidence level (CL), the lower the precision, because the interval gets wider. In the example above, the best estimate is 85%. We can calculate the standard error (SE) of the estimate from the following formula:

$$SE = \sqrt{\frac{p_1 (1 - p_1)}{n}}$$
In the equation above, p1 is the best estimate and n is the sample size. Here is a table of z-scores for a few commonly used confidence levels.
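The z-scores for common confidence levels can also be reproduced in code. The sketch below assumes a two-sided interval based on the standard normal distribution and uses Python's standard-library `statistics.NormalDist`; the helper name `z_score` is made up for this example.

```python
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    # Split the leftover probability equally between the two tails.
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  z = {z_score(level):.3f}")
```

For 99%, this gives a value slightly above the 2.57 used later in this article, so results computed with it can differ in the second decimal place.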
Plugging in all the values,

$$0.85 \pm 1.96 \times \sqrt{\frac{0.85 \times (1 - 0.85)}{659}} \approx 0.85 \pm 0.027$$

The confidence interval comes out to be 82.3% to 87.7%.
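The same calculation can be sketched in a few lines of Python. The function name `proportion_ci` is hypothetical; the inputs are the values from the car seat example, with 1.96 as the z-score for a 95% confidence level.

```python
import math

def proportion_ci(p1, n, z):
    """Confidence interval for a proportion: p1 +/- z * SE."""
    se = math.sqrt(p1 * (1 - p1) / n)  # standard error of the estimate
    margin = z * se                    # margin of error
    return p1 - margin, p1 + margin

low, high = proportion_ci(p1=0.85, n=659, z=1.96)
print(f"{low:.1%} to {high:.1%}")  # 82.3% to 87.7%
```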
The range of the CI is wider for a higher CL
In the same way, we can calculate a 99% confidence interval. You only need to change the z-score. From the table above, the z-score for a 99% confidence level is 2.57 (2.576 to three decimals). Plugging that value into the confidence interval formula, the 99% confidence interval is 81.43% to 88.57%. The confidence interval is wider for a higher confidence level.
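To see the trend across confidence levels, the sketch below computes the interval width for the same estimate and sample size at three levels. It is an illustration under assumptions: `interval_width` is a made-up helper, and the z-scores come from `statistics.NormalDist` rather than from a rounded table, so the 99% figures differ slightly from those obtained with 2.57.

```python
import math
from statistics import NormalDist

def interval_width(p1, n, confidence_level):
    """Width of the p1 +/- z * SE interval at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = math.sqrt(p1 * (1 - p1) / n)
    return 2 * z * se

for level in (0.90, 0.95, 0.99):
    width = interval_width(p1=0.85, n=659, confidence_level=level)
    print(f"{level:.0%} CI width: {width:.2%}")
```

The width grows with the confidence level because only the z-score changes while the standard error stays the same.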
In the picture above, ‘mu’ in the middle is the best estimate and sigma is the standard deviation. In our example, the standard error plays the role of sigma, because the population standard deviation is unknown. The picture shows that about 68% of a normal distribution lies within one sigma of ‘mu’, which corresponds to a 68% confidence interval. In the same way, a 95% confidence interval and a 99.7% confidence interval span about 2 sigmas and 3 sigmas on either side of ‘mu’. Another way to express it is that about 95% and 99.7% of the distribution lie within 2 sigmas and 3 sigmas of ‘mu’. If this picture is confusing, please don’t worry about it; calculate intervals for different confidence levels from the formulas above using the z-score table. That will give you an idea of the trend.
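The 68%, 95%, and 99.7% figures can be checked numerically. This is a small sketch that assumes a standard normal distribution; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.1%}")
```

Two sigmas cover slightly more than 95%, which is why the 95% z-score is 1.96 rather than exactly 2.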
The CI Is Narrower for a Bigger Sample
The bigger the sample size, the more precise the confidence interval. Let’s check it with the example of parents with toddlers. Let’s assume the best estimate remains the same, 0.85, but the sample size is 1500 instead of 659. Now, plug this new sample size into the formula and calculate a 95% confidence interval.
The 95% confidence interval becomes 83.2% to 86.8%, which is narrower than 82.3% to 87.7%.
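The effect of sample size can be sketched the same way. The helper `margin_of_error` is a made-up name, and the third sample size (four times 659) is not from the example; it is added only to show that quadrupling the sample halves the margin of error.

```python
import math

def margin_of_error(p1, n, z=1.96):
    """Half-width of the 95% interval: z * sqrt(p1 * (1 - p1) / n)."""
    return z * math.sqrt(p1 * (1 - p1) / n)

p1 = 0.85
for n in (659, 1500, 4 * 659):
    m = margin_of_error(p1, n)
    print(f"n={n}: {p1 - m:.1%} to {p1 + m:.1%} (margin {m:.2%})")
```

Because n sits under a square root, the interval narrows slowly: a sample four times as large only cuts the margin of error in half.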
I hope this was helpful.