Hypothesis-testing

In this activity, you will explore the data provided and conduct a hypothesis test.

In the dataset, device is a categorical variable with the labels iPhone and Android.

In order to perform this analysis, you must turn each label into an integer. The following code assigns a 1 for an iPhone user and a 2 for Android. It assigns this label back to the variable device_new.

Mapping text labels to numerical codes

Image
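The mapping step described above can be sketched as follows. This is a minimal illustration, not the original notebook code: it assumes the data lives in a pandas DataFrame named `df` and that `device_new` is stored as a new column. The four sample rows and the name `map_dictionary` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is not reproduced here.
df = pd.DataFrame({"device": ["iPhone", "Android", "iPhone", "Android"]})

# Map each text label to an integer code: 1 for iPhone, 2 for Android.
map_dictionary = {"iPhone": 1, "Android": 2}

# Store the integer codes in a new column, leaving the original labels intact.
df["device_new"] = df["device"].map(map_dictionary)

print(df)
```

Note that `Series.map` returns `NaN` for any label missing from the dictionary, so an unexpected device value would show up as a missing code rather than raising an error.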

Explore the relationship between device type and the number of drives

Image
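One way to compare the two groups is to compute the mean number of drives per device code. The sketch below assumes a DataFrame `df` with the `device_new` column and a numeric column named `drives`; the column name `drives` and all values are made up for illustration and do not reproduce the averages from the actual dataset.

```python
import pandas as pd

# Made-up values for illustration only (1 = iPhone, 2 = Android).
df = pd.DataFrame({
    "device_new": [1, 2, 1, 2, 1, 2],
    "drives": [70, 60, 50, 80, 90, 40],
})

# Group rows by device code and average the number of drives in each group.
mean_drives = df.groupby("device_new")["drives"].mean()

print(mean_drives)
```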

Based on the averages shown, it appears that drivers who use an iPhone device to interact with the application have a higher number of drives on average. However, this difference might arise from random sampling, rather than being a true difference in the number of drives. To assess whether the difference is statistically significant, you can conduct a hypothesis test.

Hypothesis Testing

Recall the steps for conducting a hypothesis test: state the null and alternative hypotheses, choose a significance level, find the p-value, and reject or fail to reject the null hypothesis.

The hypotheses for this test are:

𝐻0: There is no difference in the mean number of drives between Android users and iPhone users.
𝐻𝐴: There is a difference in the mean number of drives between Android users and iPhone users.

The significance level is set at 5%.

Technical note: The default for the argument equal_var in stats.ttest_ind() is True, which assumes population variances are equal. This equal variance assumption might not hold in practice (that is, there is no strong reason to assume that the two groups have the same variance); you can relax this assumption by setting equal_var to False, and stats.ttest_ind() will perform the unequal variances 𝑡-test (known as Welch’s 𝑡-test). Refer to the scipy t-test documentation for more information.
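To make the effect of equal_var concrete, the sketch below computes the Welch statistic by hand, using the difference in sample means divided by the square root of the sum of each sample variance over its sample size, and checks it against stats.ttest_ind() with equal_var=False. The two small arrays are arbitrary illustrative numbers, not values from the dataset.

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative samples with different sizes and spreads.
a = np.array([12.0, 15.0, 11.0, 18.0, 14.0, 16.0])
b = np.array([10.0, 22.0, 9.0, 25.0, 8.0])

# Welch's statistic: each group keeps its own sample variance (ddof=1).
standard_error = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
t_manual = (a.mean() - b.mean()) / standard_error

# The same test through scipy, with and without the equal-variance assumption.
welch = stats.ttest_ind(a, b, equal_var=False)
student = stats.ttest_ind(a, b, equal_var=True)

print("manual Welch t:", t_manual)
print("scipy Welch t: ", welch.statistic, "p =", welch.pvalue)
print("pooled t:      ", student.statistic, "p =", student.pvalue)
```

The pooled version replaces the two separate variances with a single pooled estimate, which is why its statistic and p-value can differ when the groups have unequal variances or sizes.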

Two-sample hypothesis test

Image
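The two-sample test can be sketched as below. This is an illustration under stated assumptions rather than the original code: the DataFrame `df`, the column name `drives`, and the synthetic Poisson-distributed values are all hypothetical, so the printed p-value will not match the result reported for the real data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (1 = iPhone, 2 = Android); not the real dataset.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "device_new": np.repeat([1, 2], [200, 150]),
    "drives": np.concatenate([rng.poisson(68, 200), rng.poisson(66, 150)]),
})

# Isolate the number of drives for each device group.
iphone = df.loc[df["device_new"] == 1, "drives"]
android = df.loc[df["device_new"] == 2, "drives"]

# Welch's t-test: equal_var=False drops the equal-variance assumption.
t_stat, p_value = stats.ttest_ind(a=iphone, b=android, equal_var=False)

# Compare the p-value with the 5% significance level.
alpha = 0.05
decision = "reject" if p_value < alpha else "fail to reject"

print(f"t = {t_stat:.3f}, p = {p_value:.3f}: {decision} the null hypothesis")
```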

The p-value (0.14) is greater than the 5% significance level, so you fail to reject the null hypothesis.
In conclusion, there is no statistically significant difference in the mean number of drives between Android users and iPhone users.
