A p-test is a statistical method for testing the validity of a commonly accepted claim about a population. That commonly accepted claim is called the null hypothesis. Based on the p-value, we either reject or fail to reject the null hypothesis.
Key Characteristics To Remember
- The smaller the p-value, the stronger the evidence that the null hypothesis should be rejected.
- The p-test statistic approximately follows a standard normal distribution when the sample size is large enough. The sample can be called large enough when, under the null hypothesis, it is expected to contain at least 10 positive and at least 10 negative answers. Please see the example below for a clearer explanation.
Understanding The p-test With An Example
Here is the research question:
‘In the previous year, 52% of parents believed that electronics and social media were the cause of their teenager’s lack of sleep. Do more parents today believe that their teenager’s lack of sleep is caused by electronics and social media?’
This question is taken from the course ‘Inferential Statistical Analysis with Python’ on Coursera. In this question, we are asked to test whether there is a significant increase in the proportion of parents who believe that electronics and social media are the cause of their teenager’s lack of sleep. Here is the step-by-step process of performing this test:
Set up the null hypothesis. In any hypothesis test, we need to set up the hypotheses before collecting any data. Researchers set up two hypotheses. The first one is the null hypothesis, which is the belief or premise that researchers want to test and possibly reject. In the example above, the null hypothesis is p = 0.52, where p is the population proportion of parents who hold this belief, because last year 52% of parents believed that electronics and social media were causing their teenager’s lack of sleep.
Define the alternative hypothesis. Look at the research question again. We need to find out if more parents today believe that electronics and social media are the cause of the lack of sleep. That means the alternative hypothesis is p > 0.52.
After conducting the p-test, if we have enough evidence to reject the null hypothesis, we will accept the alternative hypothesis.
Choose the significance level. Most of the time, researchers choose 0.05, which corresponds to a 95% confidence level. The significance level is the cutoff for the p-value: a p-value less than or equal to 0.05 means that, if the null hypothesis were true, there would be at most a 5% probability of getting a result at least as extreme as the one observed. Such a result is called statistically significant, and it gives enough evidence to reject the null hypothesis. For this example, we will use the significance level 0.05.
Collect the data. After defining the hypotheses and the significance level, we collect the data. For this example, Mott’s Children’s Hospital collected the data and found the following:
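The significance level can also be read as a long-run error rate: if the null hypothesis is true, a test run at a significance level of 0.05 wrongly rejects it in about 5% of samples. The minimal Python sketch below checks this by simulation. It assumes the null value p = 0.52 and the sample size n = 1018 from this example; the function names, the random seed and the number of simulated studies are illustrative choices, not part of the original analysis.

```python
import math
import random
from statistics import NormalDist

def one_sided_p_value(p_hat, p0, n):
    """Right-tailed p-value of the one-proportion z-test (H1: p > p0)."""
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null
    z = (p_hat - p0) / se
    return 1 - NormalDist().cdf(z)     # area to the right of z

def simulated_rejection_rate(p0=0.52, n=1018, alpha=0.05, studies=2000, seed=1):
    """Fraction of simulated studies, generated with the null TRUE, that reject it."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(studies):
        # Each simulated parent answers "yes" with probability p0, so H0 holds.
        yes = sum(rng.random() < p0 for _ in range(n))
        if one_sided_p_value(yes / n, p0, n) <= alpha:
            rejections += 1
    return rejections / studies

rate = simulated_rejection_rate()
print(f"Rejected a true null hypothesis in {rate:.1%} of simulated studies")
```

The printed rate should land close to 5%, which is exactly what a 0.05 significance level promises: the chance of a false rejection, not the chance that the results are "not random".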
‘A random sample of 1018 parents with a teenager was taken. 56% of them said that they believe electronics and social media was the cause of their teenager’s lack of sleep.’
Check the standard assumptions for the p-test. There are two assumptions:
- We need a random sample.
- We need a large enough sample size to ensure that the distribution of sample proportions is approximately normal.
How do we know if the sample is large enough? n × p needs to be at least 10, and n × (1 − p) also needs to be at least 10. Here, p is 0.52, because 0.52 is the value in our null hypothesis, and n is the sample size, in this case 1018.
n × p = 1018 × 0.52 ≈ 529
n × (1 − p) = 1018 × (1 − 0.52) ≈ 489
The sample was random, and both values are well above 10, so both assumptions are satisfied. Now we can proceed with the p-test.
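The large-sample check above can be written as a small Python helper. This is a sketch under the rule of thumb stated in this article (both expected counts at least 10); the function name `sample_is_large_enough` is hypothetical. Note that the random-sampling assumption cannot be verified from the numbers alone; it depends on how the survey was conducted.

```python
def sample_is_large_enough(n, p0, minimum=10):
    """Check the large-sample condition n*p0 >= 10 and n*(1 - p0) >= 10."""
    expected_yes = n * p0        # expected "yes" answers under the null
    expected_no = n * (1 - p0)   # expected "no" answers under the null
    return expected_yes >= minimum and expected_no >= minimum

n, p0 = 1018, 0.52
print(round(n * p0, 2), round(n * (1 - p0), 2))  # 529.36 488.64
print(sample_is_large_enough(n, p0))             # True
```

With a much smaller survey, say 15 parents, n × p would be only 7.8 and the check would fail, so the normal approximation used in the next step would not be trustworthy.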
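Calculate the p-value. First, we calculate the test statistic Z. In words, the formula is:
Z = (best estimate − hypothesized estimate) / standard error of the estimate
If I use the symbols instead of words, where p̂ is the sample proportion, p₀ is the null value, and n is the sample size:
$$Z = \frac{\hat{p} - p_0}{SE}$$
The formula for the standard error (SE), computed under the null hypothesis, is:
$$SE = \sqrt{\frac{p_0(1 - p_0)}{n}}$$
Plugging in the values:
$$SE = \sqrt{\frac{0.52 \times (1 - 0.52)}{1018}} \approx 0.0157$$
The standard error comes out to be 0.0157. The test statistic Z is:
$$Z = \frac{0.56 - 0.52}{0.015658} \approx 2.555$$
Here the unrounded standard error (about 0.015658) is used; dividing by the rounded value 0.0157 would give about 2.548 instead.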
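The standard error and test statistic for this example can be reproduced in a few lines of Python. This minimal sketch uses only the standard library and the values from the survey (p̂ = 0.56, p₀ = 0.52, n = 1018); the variable names are illustrative.

```python
import math

p0 = 0.52      # null value (last year's proportion)
p_hat = 0.56   # sample proportion from the survey
n = 1018       # sample size

# Standard error of the sample proportion, computed under the null hypothesis.
se = math.sqrt(p0 * (1 - p0) / n)

# Number of null standard errors by which the sample proportion exceeds p0.
z = (p_hat - p0) / se

print(f"SE = {se:.4f}")  # SE = 0.0157
print(f"Z  = {z:.3f}")   # Z  = 2.555
```

Keeping the unrounded `se` in the division is what produces 2.555 rather than the slightly smaller value you get from the rounded 0.0157.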
The test statistic Z is 2.555. This means that the observed sample proportion (0.56, or 56%) is 2.555 null standard errors above the null value (0.52, or 52%).
In the picture above, the hashed area is the p-value. Because the alternative hypothesis is p > 0.52, this is a right-tailed test, so the p-value is the area under the standard normal curve to the right of Z = 2.555. Using the Z test statistic, the p-value comes out to about 0.0053. You can find this p-value using a programming language like Python or a z-table.
Come up with the conclusions. As the p-value (0.0053) is less than the significance level (0.05), we have enough evidence to reject the null hypothesis. So, we conclude in favor of the alternative hypothesis: more than 52% of parents today believe that electronics and social media are the cause of their teenager’s lack of sleep.
This example showed how to conduct a p-test for a single population proportion.
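One way to get the p-value in Python is the standard library's `statistics.NormalDist`, which gives the standard normal cumulative distribution function (`scipy.stats.norm.sf(z)` is an equivalent option if SciPy is installed). The sketch below assumes the right-tailed alternative p > 0.52 used in this example.

```python
from statistics import NormalDist

z = 2.555  # test statistic from the previous step

# Right-tailed test (H1: p > 0.52): the p-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)
print(f"p-value = {p_value:.4f}")  # p-value = 0.0053
```

A z-table gives the same answer: look up the area to the left of 2.555 (about 0.9947) and subtract it from 1.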
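The whole procedure, from the large-sample check to the decision, can be wrapped in one reusable function. This is a minimal sketch of the right-tailed one-proportion z-test described in this article, not a library API; the function name `proportion_z_test_greater` is hypothetical, and it assumes the random-sample condition has already been satisfied by the study design.

```python
import math
from statistics import NormalDist

def proportion_z_test_greater(p_hat, p0, n, alpha=0.05):
    """One-proportion z-test of H0: p = p0 against H1: p > p0."""
    # Large-sample condition from the assumptions step.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("sample too small for the normal approximation")
    se = math.sqrt(p0 * (1 - p0) / n)   # null standard error
    z = (p_hat - p0) / se               # test statistic
    p_value = 1 - NormalDist().cdf(z)   # right-tail area
    reject = p_value <= alpha           # decision rule
    return z, p_value, reject

z, p_value, reject = proportion_z_test_greater(p_hat=0.56, p0=0.52, n=1018)
print(f"Z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
# Z = 2.555, p-value = 0.0053, reject H0: True
```

Raising an error when the sample is too small keeps the function from silently reporting a p-value that relies on an invalid normal approximation.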