In this activity, you will explore the data provided and conduct a hypothesis test.
- The purpose of this project is to demonstrate knowledge of how to conduct a two-sample hypothesis test.
- The goal is to apply descriptive statistics and hypothesis testing in Python.
In the dataset, device is a categorical variable with the labels iPhone and Android.
To perform this analysis, you must convert each label into an integer: 1 for an iPhone user and 2 for an Android user. The encoded values are stored in a new variable, device_new.
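Here is a minimal sketch of this encoding with pandas. It assumes the data is loaded into a DataFrame named df with a device column. The rows and the drives column below are hypothetical stand-ins for the real dataset:

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has more rows and columns.
df = pd.DataFrame({
    "device": ["iPhone", "Android", "iPhone", "Android", "iPhone"],
    "drives": [45, 30, 60, 52, 38],
})

# Map each device label to an integer: 1 for iPhone, 2 for Android,
# and store the result in a new column named device_new.
map_dictionary = {"iPhone": 1, "Android": 2}
df["device_new"] = df["device"].map(map_dictionary)

print(df[["device", "device_new"]])
```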
Based on the mean number of drives in each group, it appears that drivers who use an iPhone to interact with the application have a higher number of drives on average. However, this difference might arise from random sampling rather than reflect a true difference in the number of drives. To assess whether the difference is statistically significant, you can conduct a hypothesis test.
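The group averages can be computed by grouping on device_new and taking the mean of the drive counts, as in the sketch below. The column name drives and all values are assumptions for illustration, not the real data:

```python
import pandas as pd

# Hypothetical data standing in for the real dataset.
df = pd.DataFrame({
    "device": ["iPhone", "Android", "iPhone", "Android", "iPhone", "Android"],
    "drives": [45, 30, 60, 52, 38, 41],
})
df["device_new"] = df["device"].map({"iPhone": 1, "Android": 2})

# Mean number of drives for each device group (1 = iPhone, 2 = Android).
mean_drives = df.groupby("device_new")["drives"].mean()
print(mean_drives)
```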
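Recall the steps for conducting a hypothesis test: (1) state the null and alternative hypotheses, (2) choose a significance level, (3) find the p-value, and (4) reject or fail to reject the null hypothesis.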
$H_0$: There is no difference in the mean number of drives between iPhone users and Android users.
$H_A$: There is a difference in the mean number of drives between iPhone users and Android users.
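In symbols, write $\mu_{\text{iPhone}}$ and $\mu_{\text{Android}}$ for the population mean number of drives in each group (notation introduced here for illustration). The hypotheses then form a two-sided test:

$$
H_0:\ \mu_{\text{iPhone}} = \mu_{\text{Android}} \qquad
H_A:\ \mu_{\text{iPhone}} \neq \mu_{\text{Android}}
$$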
The significance level is set to 5% ($\alpha = 0.05$).
Technical note: The default for the argument equal_var in stats.ttest_ind() is True, which assumes that the population variances are equal. This equal-variance assumption might not hold in practice (that is, there is no strong reason to assume that the two groups have the same variance). You can relax this assumption by setting equal_var to False, in which case stats.ttest_ind() performs the unequal-variances t-test, known as Welch's t-test. Refer to the scipy t-test documentation for more information.
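Here is a minimal sketch of Welch's t-test with scipy. The two samples are simulated Poisson counts, assumed purely for illustration, so the printed statistic and p-value will not match this activity's result:

```python
import numpy as np
from scipy import stats

# Hypothetical drive counts for each group. In the real analysis these would be
# the drive counts for iPhone users (device_new == 1) and Android users
# (device_new == 2); the column holding them is an assumption here.
rng = np.random.default_rng(42)
iphone_drives = rng.poisson(lam=70, size=200)
android_drives = rng.poisson(lam=66, size=200)

# equal_var=False runs Welch's t-test, which does not assume equal population variances.
t_stat, p_value = stats.ttest_ind(a=iphone_drives, b=android_drives, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Setting equal_var=False costs little when the variances happen to be equal. It protects against misleading p-values when they are not.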
The p-value (0.14) is greater than the significance level (0.05), so you fail to reject the null hypothesis.
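The decision rule can be written as a small helper. The function name hypothesis_decision is hypothetical. It uses the common convention of rejecting the null hypothesis only when the p-value is strictly below $\alpha$:

```python
def hypothesis_decision(p_value: float, alpha: float = 0.05) -> str:
    """Return the decision for a hypothesis test at significance level alpha."""
    # Reject H0 only when the p-value falls below the significance level.
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# The p-value reported in this activity is 0.14.
print(hypothesis_decision(0.14))  # fail to reject the null hypothesis
```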
In conclusion, at the 5% significance level there is no statistically significant difference in the mean number of drives between iPhone users and Android users.