
Programming for Data Science Assignment-3

The document outlines a series of tasks involving data analysis with the palmerpenguins package in R, including installing packages, calculating statistics, and performing t-tests. It covers data manipulation, visualization, and hypothesis testing related to penguin characteristics. The document emphasizes the importance of interpreting statistical results and understanding the implications of p-values in hypothesis testing.

Uploaded by Faisal Mohammed
Copyright © All Rights Reserved

1. Install the tidyverse (show a screenshot).

2. Install the palmerpenguins package (another screenshot).
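Steps 1 and 2 can be carried out from the R console. The sketch below installs each package only when it is missing. The CRAN mirror URL passed to `repos` is an assumption; any CRAN mirror works.

```r
# Install tidyverse and palmerpenguins only if they are not already present.
# requireNamespace() returns FALSE, without attaching anything, when a
# package cannot be found.
for (pkg in c("tidyverse", "palmerpenguins")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
}
```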

3. Run the library command on palmerpenguins, and then use the View function to show some
of the contents of the penguins dataframe from the palmerpenguins package.
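A minimal sketch of step 3. `View()` opens a spreadsheet-style viewer and needs an interactive session, so the `interactive()` check, which prints the first six rows instead when the code runs as a script, is an illustrative addition.

```r
library(palmerpenguins)

# View() needs an interactive session; fall back to printing when scripted
if (interactive()) {
  View(penguins)
} else {
  print(head(penguins))
}
```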
4. Show the structure of the penguins dataframe.
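For step 4, base R's `str()` lists every column with its type and first few values; `dplyr::glimpse()` gives a similar, more compact view. The penguins dataframe has 344 rows and 8 columns.

```r
library(palmerpenguins)

# One line per column: name, type, and the first few values
str(penguins)
```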
5. Calculate the following statistics for the weight (body_mass_g) column of the penguins
dataframe:
a. Mean of the weight of all penguins.

b. Variance of the weight of all penguins.

c. Standard deviation of the weight of all penguins.


d. Median of the flipper length (flipper_length_mm) column for all penguins.
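Items 5a to 5d can be computed with base R. Both `body_mass_g` and `flipper_length_mm` contain missing values, so `na.rm = TRUE` is needed; without it each result would be `NA`. The variable names (`mean_mass`, `var_mass`, `sd_mass`, `median_flip`) are illustrative.

```r
library(palmerpenguins)

# na.rm = TRUE skips the missing measurements; otherwise each result is NA
mean_mass   <- mean(penguins$body_mass_g, na.rm = TRUE)          # 5a
var_mass    <- var(penguins$body_mass_g, na.rm = TRUE)           # 5b
sd_mass     <- sd(penguins$body_mass_g, na.rm = TRUE)            # 5c
median_flip <- median(penguins$flipper_length_mm, na.rm = TRUE)  # 5d

print(c(mean = mean_mass, variance = var_mass, sd = sd_mass,
        median_flipper = median_flip))
```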

e. Create a dataframe that will hold only data for female penguins. Call the dataframe
theGirls. Use the View function to show some of the contents of theGirls dataframe.
Then use theGirls dataframe to determine the mean flipper length for female
penguins.
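A sketch of item 5e using base R's `subset()`, which drops rows where the condition evaluates to `NA`, so penguins with unknown sex are excluded. `dplyr::filter(penguins, sex == "female")` would return the same rows. The name `female_flipper` and the `interactive()` fallback are illustrative additions.

```r
library(palmerpenguins)

# subset() keeps rows where the condition is TRUE and drops NA results,
# so penguins with unknown sex are excluded
theGirls <- subset(penguins, sex == "female")

if (interactive()) View(theGirls) else print(head(theGirls))

# Mean flipper length of the female penguins
female_flipper <- mean(theGirls$flipper_length_mm, na.rm = TRUE)
print(female_flipper)
```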
6. Load the tidyverse. Then create a single command, using pipes, to output the gender (sex)
and the flipper length in millimeters (flipper_length_mm). Use the select command to choose
only the columns you require, and filter for the male and female genders. Be sure to take care of
any “NA” values in the flipper_length_mm column.
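One pipe-based solution for question 6, assuming dplyr is attached (`library(tidyverse)` attaches it along with the other core packages). `sex %in% c("male", "female")` is `FALSE` for a missing sex, so that filter also removes the `NA` genders. The name `sex_flipper` is illustrative.

```r
library(palmerpenguins)
library(dplyr)   # also attached by library(tidyverse)

sex_flipper <- penguins %>%
  select(sex, flipper_length_mm) %>%      # keep only the two columns needed
  filter(sex %in% c("male", "female"),    # %in% is FALSE for NA, so NA sex is dropped
         !is.na(flipper_length_mm))       # drop missing flipper lengths

print(sex_flipper)
```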
7. Create a new dataframe that contains all the columns from the penguins dataframe plus
a new column. The new column will be called “bill_Flipper_Ratio” and will be the
bill_length_mm value for each row divided by the flipper_length_mm value. Call the new dataframe
myNewData. Use the View function to show the new dataframe.
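Question 7 maps naturally onto `dplyr::mutate()`, which keeps every existing column and appends the new one; rows with a missing bill or flipper length get `NA` in the ratio. The `interactive()` check is an illustrative addition so the code also runs as a script.

```r
library(palmerpenguins)
library(dplyr)

# mutate() appends bill_Flipper_Ratio after the original eight columns
myNewData <- penguins %>%
  mutate(bill_Flipper_Ratio = bill_length_mm / flipper_length_mm)

if (interactive()) View(myNewData) else print(head(myNewData))
```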
8. In your myNewData dataframe, show only the species column, the island column, and the
bill_length_mm column without writing the island column name in the command. Make sure to use
pipes to link the parts of your code.
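Because `species`, `island`, and `bill_length_mm` are adjacent columns, the range `species:bill_length_mm` inside `select()` picks all three without naming `island`. The sketch rebuilds `myNewData` first so it runs on its own; `three_cols` is an illustrative name.

```r
library(palmerpenguins)
library(dplyr)

# Rebuild myNewData as in question 7
myNewData <- penguins %>%
  mutate(bill_Flipper_Ratio = bill_length_mm / flipper_length_mm)

# A column range selects species, island and bill_length_mm
three_cols <- myNewData %>%
  select(species:bill_length_mm)

print(three_cols)
```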
9. Change the “sex” column name to “gender”. Make sure the change is permanent.

The View output shows that the column name has been permanently changed.
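A sketch of question 9 with `dplyr::rename()`, whose arguments take the form `new_name = old_name`. Assigning the result back to `myNewData` is what makes the change persist for the rest of the session. Applying the rename to `myNewData` rather than `penguins` is an assumption, based on questions 10 and 11 working with that dataframe.

```r
library(palmerpenguins)
library(dplyr)

# Rebuild myNewData as in question 7
myNewData <- penguins %>%
  mutate(bill_Flipper_Ratio = bill_length_mm / flipper_length_mm)

# rename(new_name = old_name); reassigning makes the change stick
myNewData <- myNewData %>%
  rename(gender = sex)

if (interactive()) View(myNewData) else print(names(myNewData))
```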


10. Find the average bill depth for female, male, and NA genders. You must have output for all
three categories.
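For question 10, `group_by()` treats `NA` as its own group by default, so the summary has one row each for female, male, and `NA`. `na.rm = TRUE` is still needed because some penguins have no bill depth measurement. The sketch rebuilds `myNewData` with the renamed column so it runs on its own; `depth_by_gender` is an illustrative name.

```r
library(palmerpenguins)
library(dplyr)

myNewData <- penguins %>%
  mutate(bill_Flipper_Ratio = bill_length_mm / flipper_length_mm) %>%
  rename(gender = sex)

# group_by() keeps NA as a separate group, giving female, male and NA rows
depth_by_gender <- myNewData %>%
  group_by(gender) %>%
  summarise(mean_bill_depth = mean(bill_depth_mm, na.rm = TRUE))

print(depth_by_gender)
```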

11. From the results in question 10, it looks as though the average bill depth for males is significantly
greater than for females. Use a t-test to determine whether the means are equal for these two
groups. Use the myNewData dataframe.
a. What are the hypotheses for this test?
Null hypothesis: The mean bill depths of the two groups (male and female) are EQUAL.
Alternative hypothesis: The mean bill depths of the two groups (male and female) are NOT EQUAL.

b. What is the default confidence level for t-tests in R?


The default confidence level for R's t.test() is 95% (conf.level = 0.95).

c. Run the t-test and show your results.
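A sketch of the two-sample test for question 11c. With the formula interface, `t.test()` drops rows whose `gender` or `bill_depth_mm` is missing, and by default it runs Welch's test (`var.equal = FALSE`) with a two-sided alternative and a 95% confidence level. `bill_test` is an illustrative name.

```r
library(palmerpenguins)
library(dplyr)

myNewData <- penguins %>%
  mutate(bill_Flipper_Ratio = bill_length_mm / flipper_length_mm) %>%
  rename(gender = sex)

# Welch two-sample t-test of bill depth by gender (two-sided, 95% CI);
# rows with NA gender or NA bill depth are dropped by the formula interface
bill_test <- t.test(bill_depth_mm ~ gender, data = myNewData)
print(bill_test)
```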


d. Interpret your results.
From the t-test, the p-value is 2.036e-12, which is far below the 0.05 significance level. This is strong
evidence against the null hypothesis, so we reject it: there is a statistically significant difference
in mean bill depth between male and female penguins.
12. Suppose you think that the average bill depth of all penguins, male and female together, is equal
to 17 mm. Run a t-test to determine whether this is correct.

a. What are your hypotheses?


Null hypothesis: The mean bill depth of the penguins (both genders together) is equal to 17 mm.
Alternative hypothesis: The mean bill depth of the penguins (both genders together) is not equal to 17 mm.

b. Which type of t-test is this?


This is a one-sample t-test, since it compares the sample mean with a specified value (17 mm).

c. Run the t-test
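For question 12c, the one-sample test passes the bill depths as `x` and the hypothesized mean as `mu`; `t.test()` removes the missing values before computing the statistic. `depth_test` is an illustrative name.

```r
library(palmerpenguins)
library(dplyr)

myNewData <- penguins %>%
  mutate(bill_Flipper_Ratio = bill_length_mm / flipper_length_mm) %>%
  rename(gender = sex)

# One-sample, two-sided t-test of H0: mean bill depth = 17 mm
depth_test <- t.test(myNewData$bill_depth_mm, mu = 17)
print(depth_test)
```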

d. Interpret the output.

The p-value from the t-test is 0.1578, which is greater than the 0.05 significance level, so we fail
to reject the null hypothesis. There is not enough statistical evidence to conclude that the mean
bill depth of the penguins differs from 17 mm. The 95% confidence interval for the mean is
(16.94113, 17.36121); because this interval contains 17, it is consistent with a true mean bill depth
of 17 mm. The sample mean of 17.15117 mm is slightly above 17 mm, but the difference is not
statistically significant.
