Programming for Data Science Assignment-3
3. Run the library command on palmerpenguins, and then use the View function to show some
of the contents of the penguins dataframe from the palmerpenguins package.
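A minimal sketch for question 3. It assumes the palmerpenguins package is already installed (run `install.packages("palmerpenguins")` once if it is not). `View()` opens a spreadsheet-style viewer only in an interactive session such as RStudio, so the sketch guards it with `interactive()` and also prints the first rows with `head()`.

```r
# Attach the package that provides the penguins dataframe
library(palmerpenguins)

# View() needs an interactive session; head() prints the first six rows anywhere
if (interactive()) View(penguins)
head(penguins)
```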
4. Show the structure of the penguins dataframe.
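One way to answer question 4, sketched with base R's `str()`, which lists every column with its type and its first few values:

```r
library(palmerpenguins)

# Compact summary of the dataframe: dimensions, column types, sample values
str(penguins)
```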
5. Calculate the following statistics for the weight (body_mass_g) column of the penguins
dataframe:
a. Mean of the weight of all penguins.
e. Create a dataframe that will hold only data for female penguins. Call the dataframe
theGirls. Use the View function to show some of the contents of theGirls dataframe.
Then use theGirls dataframe to determine the mean flipper length for female
penguins.
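A base-R sketch for parts a and e of question 5 (the tidyverse is only loaded in question 6). The variable names `meanWeight` and `femaleFlipperMean` are illustrative choices, not names required by the assignment. `body_mass_g` contains missing values, so `mean()` needs `na.rm = TRUE`; `subset()` drops rows whose `sex` is missing because the condition evaluates to `NA` for them.

```r
library(palmerpenguins)

# (a) Without na.rm = TRUE, mean() returns NA because of missing weights
meanWeight <- mean(penguins$body_mass_g, na.rm = TRUE)
meanWeight

# (e) Keep only rows where sex is "female"; rows with a missing sex are dropped
theGirls <- subset(penguins, sex == "female")
if (interactive()) View(theGirls)
head(theGirls)

# Mean flipper length of female penguins, guarding against missing values
femaleFlipperMean <- mean(theGirls$flipper_length_mm, na.rm = TRUE)
femaleFlipperMean
```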
6. Load the tidyverse. Then create a single command, using pipes, to output the gender (sex)
and the flipper length in millimeters (flipper_length_mm). Use the select command to choose
only the columns you require, and filter for male and female penguins. Be sure to take care of
any “NA” values in the flipper_length_mm column.
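A sketch of the single piped command for question 6. It loads dplyr directly; `library(tidyverse)`, as the question asks, attaches dplyr as well, so the pipeline is the same either way. `sexFlipper` is a hypothetical name used only so the result can be inspected.

```r
library(palmerpenguins)
library(dplyr)  # library(tidyverse) also attaches dplyr

# Keep two columns, then keep rows that are male or female (NA %in% ... is FALSE,
# so missing sex is dropped) and that have a recorded flipper length
sexFlipper <- penguins %>%
  select(sex, flipper_length_mm) %>%
  filter(sex %in% c("male", "female"), !is.na(flipper_length_mm))
sexFlipper
```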
7. Create a new dataframe which will contain all the columns from the penguins dataframe and
also a new column. The new column will be called “bill_Flipper_Ratio” and will be the
bill_length_mm value for each row divided by the flipper_length_mm value. The new dataframe
will be called myNewData. Use the View function to show the new dataframe.
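A minimal dplyr sketch for question 7. `mutate()` keeps every existing column and appends the new one; rows with a missing bill or flipper length simply get `NA` for the ratio.

```r
library(palmerpenguins)
library(dplyr)

# All original columns plus bill_Flipper_Ratio = bill length / flipper length
myNewData <- penguins %>%
  mutate(bill_Flipper_Ratio = bill_length_mm / flipper_length_mm)
if (interactive()) View(myNewData)
head(myNewData)
```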
8. In your myNewData dataframe, show only the species column, the island column, and the
bill_length_mm column without writing the island name in the command. Make sure to use
pipes to link the parts of your code.
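One way to satisfy question 8 is a column range in `select()`: because island sits between species and bill_length_mm in the dataframe, `species:bill_length_mm` picks it up without naming it. The sketch rebuilds myNewData so it runs on its own; `speciesBill` is a hypothetical name.

```r
library(palmerpenguins)
library(dplyr)

myNewData <- penguins %>%
  mutate(bill_Flipper_Ratio = bill_length_mm / flipper_length_mm)

# species:bill_length_mm selects every column from species through bill_length_mm
speciesBill <- myNewData %>%
  select(species:bill_length_mm)
speciesBill
```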
9. Change the “sex” column name to “gender”. Make sure the change is permanent.
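The question does not name the dataframe to change; this sketch assumes myNewData, since question 11 compares the genders using myNewData. `rename()` returns a modified copy, so assigning the result back to myNewData is what makes the change permanent.

```r
library(palmerpenguins)
library(dplyr)

myNewData <- penguins %>%
  mutate(bill_Flipper_Ratio = bill_length_mm / flipper_length_mm)

# new_name = old_name; overwrite myNewData so the rename persists
myNewData <- myNewData %>%
  rename(gender = sex)
names(myNewData)
```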
11. It appears from the results in question 9 that the average bill depth for males is significantly
greater than for females. Use a t-test to determine whether the means of these two groups are
equal. Use the myNewData dataframe.
a. What are the hypotheses for this test?
Null Hypothesis: The mean bill depths of the two groups (male & female) are EQUAL.
Alternative Hypothesis: The mean bill depths of the two groups (male & female) are NOT
EQUAL.
The p-value from the t-test is 0.1578. These figures come from a one-sample t-test of the mean
bill depth of all penguins against 17 mm, not from a comparison of males and females. Because
the p-value is greater than the 0.05 significance level, we fail to reject that test's null
hypothesis: there is not enough statistical evidence to conclude that the average bill depth of
the penguins differs from 17 mm. The 95% confidence interval for the mean is (16.94113, 17.36121).
Since this interval contains 17, a true mean bill depth of 17 mm is consistent with the data. The
sample mean of 17.15117 mm is slightly above 17 mm, but the difference is not statistically
significant. These results therefore do not test the male-versus-female hypotheses in part a;
that comparison requires a two-sample t-test of bill depth by gender.
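The p-value, confidence interval, and sample mean reported above match a one-sample t-test of all penguins' bill depth against 17 mm. The sketch below reproduces that test and also runs the two-sample comparison that the part a hypotheses describe. It assumes the gender column from question 9 lives in myNewData and rebuilds it so the block runs on its own. R's `t.test()` defaults to the Welch test (`var.equal = FALSE`); its two-sample output is not quoted here, so run the block to obtain those values.

```r
library(palmerpenguins)
library(dplyr)

myNewData <- penguins %>%
  mutate(bill_Flipper_Ratio = bill_length_mm / flipper_length_mm) %>%
  rename(gender = sex)

# One-sample t-test: H0 is that the mean bill depth of all penguins equals 17 mm
# (missing bill depths are dropped before testing)
oneSample <- t.test(myNewData$bill_depth_mm, mu = 17)
oneSample

# Two-sample Welch t-test: H0 is that mean bill depth is equal for males and females
# (the formula interface drops rows with a missing gender or bill depth)
twoSample <- t.test(bill_depth_mm ~ gender, data = myNewData)
twoSample
```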