Experiment 5

The document demonstrates various techniques for loading, cleaning, and summarizing data in R. It shows how to import data from CSV and Excel files, remove rows with missing values, replace missing values with means, remove duplicate rows, and drop specific rows. Descriptive statistics like mean, median, mode, variance, and quantiles are calculated on the loaded data.

Uploaded by

anjanisvecw2004

EXPERIMENT 5

#LOAD DATA FROM DIFFERENT FILES LIKE CSV AND EXCEL IN R

getwd()

setwd("C:/Users/HP/Desktop/R LAB PROGRAMS")

OUTPUT:

> getwd()

[1] "C:/Users/HP/Documents"

> setwd("C:/Users/HP/Desktop/R LAB PROGRAMS")

#IMPORTING DATA USING READ.CSV

f1 <- read.csv("marks.csv")

#display the data frame

f1

#number of columns in the data frame (nrow(f1) gives the number of rows)

length(f1)

#summary of the data frame

summary(f1)

OUTPUT:

> f1

Student.ID Name JAVA.MARKS DM.MARKS OS.MARKS CGPA

1 101 Alice 85 90 88 3.8

2 102 Bob 78 85 80 3.4

3 103 Charlie 92 88 95 4.0

4 104 David 75 82 77 3.2

5 105 Emily 88 95 89 3.9

6 106 Frank 80 78 82 3.1

7 107 Grace 95 92 97 4.2

8 108 Henry 70 75 67 2.9

9 109 Irene 89 87 91 3.7

10 110 Jack 84 91 86 3.6

> length(f1)

[1] 6
> summary(f1)

Student.ID Name JAVA.MARKS DM.MARKS

Min. :101.0 Length:10 Min. :70.00 Min. :75.00
1st Qu.:103.2 Class :character 1st Qu.:78.50 1st Qu.:82.75
Median :105.5 Mode :character Median :84.50 Median :87.50
Mean :105.5 Mean :83.60 Mean :86.30
3rd Qu.:107.8 3rd Qu.:88.75 3rd Qu.:90.75
Max. :110.0 Max. :95.00 Max. :95.00
OS.MARKS CGPA
Min. :67.0 Min. :2.900
1st Qu.:80.5 1st Qu.:3.250
Median :87.0 Median :3.650
Mean :85.2 Mean :3.580
3rd Qu.:90.5 3rd Qu.:3.875
Max. :97.0 Max. :4.200
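
As the output above shows, length() on a data frame counts columns, not rows. The sketch below rebuilds the printed table from inline text so it runs without marks.csv. The name marks_csv and the exact header line are assumptions, since the layout of the original file is not shown.

```r
# Rebuild the marks table inline (values copied from the printed data frame).
# The header line is an assumption about how marks.csv is laid out.
marks_csv <- "Student.ID,Name,JAVA.MARKS,DM.MARKS,OS.MARKS,CGPA
101,Alice,85,90,88,3.8
102,Bob,78,85,80,3.4
103,Charlie,92,88,95,4.0
104,David,75,82,77,3.2
105,Emily,88,95,89,3.9
106,Frank,80,78,82,3.1
107,Grace,95,92,97,4.2
108,Henry,70,75,67,2.9
109,Irene,89,87,91,3.7
110,Jack,84,91,86,3.6"

marks <- read.csv(text = marks_csv)

length(marks)  # number of columns
ncol(marks)    # same value as length() for a data frame
nrow(marks)    # number of rows (students)
dim(marks)     # rows and columns together
str(marks)     # column names and types
```

For this table, nrow() is the call that reports the ten students.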

#DESCRIPTIVE STATISTICS

#DATA DESCRIPTION

#COMPUTING MEAN VALUE

mean = mean(f1$CGPA)

mean

#computing median

median = median(f1$CGPA)

median

#computing mode

install.packages("modeest")

library(modeest)

mode = mfv(f1$CGPA)

print(mode)

cat("mean, median & mode of the CGPA are: ")

print(mean)

print(median)

print(mode)

OUTPUT:

> mean

[1] 3.58

> median

[1] 3.65

package ‘modeest’ successfully unpacked and MD5 sums checked

> print(mode)

[1] 2.9 3.1 3.2 3.4 3.6 3.7 3.8 3.9 4.0 4.2

mean, median & mode of the CGPA are:

[1] 3.58

[1] 3.65

[1] 2.9 3.1 3.2 3.4 3.6 3.7 3.8 3.9 4.0 4.2
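
mfv() returns all ten CGPA values because each one occurs exactly once, so every value ties for the mode. A minimal base R sketch of the same idea follows. The helper name stat_mode and the grades vector are hypothetical and are not part of modeest or of marks.csv.

```r
# Hypothetical helper: most frequent value(s) of a numeric vector, base R only.
stat_mode <- function(x) {
  counts <- table(x)                           # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]  # value(s) with the highest count
  as.numeric(top)                              # table() stores the values as text
}

cgpa <- c(3.8, 3.4, 4.0, 3.2, 3.9, 3.1, 4.2, 2.9, 3.7, 3.6)
stat_mode(cgpa)    # every value occurs once, so all ten are returned

grades <- c(3.4, 3.8, 3.4, 4.0, 3.4, 3.8)  # illustrative data only
stat_mode(grades)  # 3.4 occurs three times
```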

##MEASURES OF VARIABILITY

max = max(f1$JAVA.MARKS)

cat("maximum value: \n",max)

min = min(f1$JAVA.MARKS)

cat("minimum value: \n",min)

OUTPUT:

maximum value:

95

minimum value:

70

#calculate range

range(f1$JAVA.MARKS)

#calculate range, difference between max and min

ranged = max - min

cat("Difference between max and min, Range is : ")

ranged

range(f1$JAVA.MARKS)

max(f1$JAVA.MARKS)

OUTPUT:

> range(f1$JAVA.MARKS)

[1] 70 95

Difference between max and min, Range is : > ranged

[1] 25

> range(f1$JAVA.MARKS)

[1] 70 95
> max(f1$JAVA.MARKS)

[1] 95

#Data variability

#Variance and standard deviation

cat("Variance of the java marks: ")

variance = var(f1$JAVA.MARKS)

variance

stdevq = sd(f1$JAVA.MARKS)

meanvq = mean(f1$JAVA.MARKS)

#coefficient of variation (standard deviation as a percentage of the mean)

cv = (stdevq/meanvq)* 100

cv

cat("Variance of the java marks: ",variance)

cat("Standard deviation of the java marks: ", stdevq)

varianced = var(f1$DM.MARKS)

stdeva = sd(f1$DM.MARKS)

meannd = mean(f1$DM.MARKS)

meannd

stdeva

cv1 = (stdeva/meannd)*100

cv1

cat("Variance of the dm marks: ",varianced)

cat("Standard deviation of dm marks: ",stdeva)

cat("S.D in java and dm marks: ",stdevq,stdeva)

print("Data variability in java and dm marks")

if(cv > cv1){

print("Java marks have more variability than dm")

}else{

print("Dm marks have more variability than java")

}

OUTPUT:
Variance of the java marks:

[1] 61.6

> cv

[1] 9.388238

Variance of the java marks: 61.6

Standard deviation of the java marks: 7.848567

> meannd

[1] 86.3

> stdeva

[1] 6.360468

> cv1

[1] 7.370183

Variance of the dm marks: 40.45556

Standard deviation of dm marks: 6.360468

S.D in java and dm marks: 7.848567 6.360468

[1] "Data variability in java and dm marks"

[1] "Java marks have more variability than dm"
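
The comparison above uses the coefficient of variation (standard deviation divided by the mean, times 100), which puts the two subjects on a common scale. The sketch below wraps that formula in a helper. The name coef_var is hypothetical, and the two vectors are copied from the printed marks table.

```r
# Hypothetical helper: coefficient of variation as a percentage.
coef_var <- function(x) sd(x) / mean(x) * 100

# Marks copied from the printed data frame.
java <- c(85, 78, 92, 75, 88, 80, 95, 70, 89, 84)
dm   <- c(90, 85, 88, 82, 95, 78, 92, 75, 87, 91)

cv_java <- coef_var(java)
cv_dm   <- coef_var(dm)

cat("CV of java marks:", round(cv_java, 2), "\n")
cat("CV of dm marks:", round(cv_dm, 2), "\n")

if (cv_java > cv_dm) {
  print("Java marks have more variability than dm")
} else {
  print("Dm marks have more variability than java")
}
```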

##QUANTILES OF THE DATA

quartiles = quantile(f1$JAVA.MARKS)

quartiles

probs = seq(0,1,0.25)

quantile(f1$JAVA.MARKS,probs)

OUTPUT:

> quartiles

0% 25% 50% 75% 100%

70.00 78.50 84.50 88.75 95.00

> quantile(f1$JAVA.MARKS,probs)

0% 25% 50% 75% 100%

70.00 78.50 84.50 88.75 95.00

#DECILES
probs = seq(0,1,0.1)

quantile(f1$JAVA.MARKS,probs)

OUTPUT:

> quantile(f1$JAVA.MARKS,probs)

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

70.0 74.5 77.4 79.4 82.4 84.5 86.2 88.3 89.6 92.3 95.0

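#PERCENTILES

#note: a step of 0.1 gives deciles; true percentiles need seq(0,1,0.01)

h = c(1:2000)

#probs1 is identical to probs from the deciles step, which the call below uses

probs1 = seq(0,1,0.1)

quantile(h,probs)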

OUTPUT:

> quantile(h,probs)

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

1.0 200.9 400.8 600.7 800.6 1000.5 1200.4 1400.3 1600.2 1800.1 2000.0
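
The output above still shows deciles of h. The sketch below computes actual percentiles with a step of 0.01, then checks one value by hand. The hand check assumes quantile() is left at its default method (type = 7).

```r
h <- 1:2000

# Percentiles: 101 cut points from 0% to 100% in steps of 1%.
pct <- quantile(h, probs = seq(0, 1, 0.01))
pct[c(2, 51, 100)]   # the 1st, 50th and 99th percentiles

# With the default method (type = 7), the p-th quantile sits at position
# 1 + p * (n - 1) in the sorted data. For h, each value equals its position.
n <- length(h)
p <- 0.1
1 + p * (n - 1)      # matches the 10% entry in the decile output
```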

#IMPORTING DATA USING READ_EXCEL

library("readxl")

path = "C:/Users/HP/Desktop/R LAB PROGRAMS/student-marks.xlsx"

#skip=4 skips the first four rows of the sheet (the header and three data rows),

#so the next row (student 104) is used as the column names

ps = read_excel(path , skip=4)

ps

OUTPUT:

> ps

# A tibble: 6 × 6

`104` David `75` `82` `77` `3.2`

<dbl> <chr> <dbl> <dbl> <dbl> <dbl>

1 105 Emily 88 95 89 3.9

2 106 Frank 80 78 82 3.1

3 107 Grace 95 92 97 4.2

4 108 Henry 70 75 67 2.9

5 109 Irene 89 87 91 3.7

6 110 Jack 84 91 86 3.6
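
The tibble above has 104, David and so on as column names because skip counts the header row too. The sketch below reproduces the effect with read.csv() on inline text, so it runs without the Excel file. This is only an analogy to read_excel(), and the inline table assumes the sheet holds the same ten rows as marks.csv under one header line.

```r
# Same ten rows as the marks table, embedded as text (assumed layout).
marks_csv <- "Student.ID,Name,JAVA.MARKS,DM.MARKS,OS.MARKS,CGPA
101,Alice,85,90,88,3.8
102,Bob,78,85,80,3.4
103,Charlie,92,88,95,4.0
104,David,75,82,77,3.2
105,Emily,88,95,89,3.9
106,Frank,80,78,82,3.1
107,Grace,95,92,97,4.2
108,Henry,70,75,67,2.9
109,Irene,89,87,91,3.7
110,Jack,84,91,86,3.6"

full    <- read.csv(text = marks_csv)            # header row becomes the column names
skipped <- read.csv(text = marks_csv, skip = 4)  # header and rows 101 to 103 are skipped

names(full)
names(skipped)   # the row for student 104 is now treated as the header
nrow(skipped)    # six rows remain (students 105 to 110)
```

To keep the real header, leave skip at 0 and filter rows after loading.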

#DATA CLEANING IN R

#Data cleaning refers to the process of transforming raw data into data that is suitable to perform operations

#Method 1: Remove Rows With Missing Values

library(dplyr)

#remove rows with any missing values

df <- data.frame(

PatientID = c(1, 2, 3, 4, 5),

PatientName = c("John Doe", "Jane Smith", "Bob Johnson", NA, "Charlie Brown"),

Age = c(35, 28, NA, 60, 42),

AdmissionDate = as.Date(c("2024-01-01", "2024-02-15", "2024-03-10", "2024-04-05", "2024-05-20")),

DischargeDate = as.Date(c("2024-01-10", "2024-03-05", NA, "2024-05-15", "2024-06-10")),

Diagnosis = c("Flu", "Fractured Leg", "Pneumonia", "Hypertension", "Appendicitis"),

RoomNumber = c(101, NA, 310, 415, 520),

DoctorName = c("Dr. Smith", "Dr. Johnson", "Dr. Davis", "Dr. Wilson", NA)

)

print(df)

df %>% na.omit()

OUTPUT:

> print(df)

PatientID PatientName Age AdmissionDate DischargeDate Diagnosis RoomNumber

1 1 John Doe 35 2024-01-01 2024-01-10 Flu 101

2 2 Jane Smith 28 2024-02-15 2024-03-05 Fractured Leg NA

3 3 Bob Johnson NA 2024-03-10 <NA> Pneumonia 310

4 4 <NA> 60 2024-04-05 2024-05-15 Hypertension 415

5 5 Charlie Brown 42 2024-05-20 2024-06-10 Appendicitis 520

DoctorName

1 Dr. Smith

2 Dr. Johnson

3 Dr. Davis

4 Dr. Wilson

5 <NA>

> df %>% na.omit()

PatientID PatientName Age AdmissionDate DischargeDate Diagnosis RoomNumber

1 1 John Doe 35 2024-01-01 2024-01-10 Flu 101

DoctorName

1 Dr. Smith
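
na.omit() keeps only one patient because four of the five rows have a missing value somewhere. Counting the gaps before dropping rows shows how much data would be lost. The sketch below uses base R only, on a smaller data frame named patients that holds a subset of the columns above.

```r
# Illustrative subset of the patient columns used above.
patients <- data.frame(
  PatientID   = c(1, 2, 3, 4, 5),
  PatientName = c("John Doe", "Jane Smith", "Bob Johnson", NA, "Charlie Brown"),
  Age         = c(35, 28, NA, 60, 42),
  RoomNumber  = c(101, NA, 310, 415, 520)
)

colSums(is.na(patients))        # missing values per column
sum(!complete.cases(patients))  # rows that na.omit() would drop
nrow(na.omit(patients))         # rows that survive
```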

#replace missing values

#replace each missing numeric value with the mean of its column

df1 <- df %>% mutate(across(where(is.numeric), ~ifelse(is.na(.), mean(., na.rm = TRUE), .)))

print(df1)

OUTPUT:

  PatientID   PatientName   Age AdmissionDate DischargeDate     Diagnosis RoomNumber  DoctorName
1         1      John Doe 35.00    2024-01-01    2024-01-10           Flu      101.0   Dr. Smith
2         2    Jane Smith 28.00    2024-02-15    2024-03-05 Fractured Leg      336.5 Dr. Johnson
3         3   Bob Johnson 41.25    2024-03-10          <NA>     Pneumonia      310.0   Dr. Davis
4         4          <NA> 60.00    2024-04-05    2024-05-15  Hypertension      415.0  Dr. Wilson
5         5 Charlie Brown 42.00    2024-05-20    2024-06-10  Appendicitis      520.0        <NA>
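
Only the numeric columns (Age and RoomNumber) are filled. Names and dates stay NA. The same mean imputation can be sketched in base R without dplyr. The names age, age_filled and df_small are illustrative.

```r
age <- c(35, 28, NA, 60, 42)

# Replace each NA with the mean of the observed values.
age_filled <- ifelse(is.na(age), mean(age, na.rm = TRUE), age)
age_filled   # the NA becomes 41.25, the mean of 35, 28, 60 and 42

# The same idea applied to every numeric column of a data frame.
df_small <- data.frame(Age = age, RoomNumber = c(101, NA, 310, 415, 520))
num_cols <- sapply(df_small, is.numeric)
df_small[num_cols] <- lapply(df_small[num_cols], function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)  # fill gaps with the column mean
  x
})
df_small
```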

#ignore missing values when computing the mean of a vector

zz=c(23,29,NA,9,21.19)
zz
length(zz)
mean(zz)
M=mean(zz,na.rm=T)
print(M)
OUTPUT:

> zz=c(23,29,NA,9,21.19)

> zz

[1] 23.00 29.00 NA 9.00 21.19

> length(zz)

[1] 5

> mean(zz)

[1] NA

> M=mean(zz,na.rm=T)

> print(M)

[1] 20.5475

#Remove rows with missing values from a data frame

df <- data.frame(

Name = c("Alice", "Jack", "Bob", "Anny", "Christie"),

Age = c(24, 35, 23, 32, 40),

City = c("New York", "Los Angeles", NA, "San Francisco", "Seattle"),

Grade = c("A", "B", "C", "A", "B")

)

print(df)

new_df<- df %>% na.omit()

print(new_df)

OUTPUT:

> print(df)

Name Age City Grade

1 Alice 24 New York A

2 Jack 35 Los Angeles B

3 Bob 23 <NA> C

4 Anny 32 San Francisco A

5 Christie 40 Seattle B

> new_df<- df %>% na.omit()

> print(new_df)

Name Age City Grade

1 Alice 24 New York A

2 Jack 35 Los Angeles B

4 Anny 32 San Francisco A

5 Christie 40 Seattle B

#Remove duplicate rows using dataFrame

df_with_duplicates <- data.frame(

Name = c("Jack", "Jane", "Jack", "Jane", "Bob", "Alice", "Christie"),

Age = c(25, 30, 25, 30, 22, 28, 35),

City = c("New York", "Los Angeles", "New York", "Los Angeles", "Chicago", "San Francisco", "Seattle"),

Grade = c("A", "B", "A", "B", "C", "A", "B")

)

print(df_with_duplicates)

new_df<- df_with_duplicates %>% distinct(.keep_all = TRUE)

print(new_df)

OUTPUT:

> print(df_with_duplicates)

Name Age City Grade

1 Jack 25 New York A

2 Jane 30 Los Angeles B

3 Jack 25 New York A

4 Jane 30 Los Angeles B

5 Bob 22 Chicago C

6 Alice 28 San Francisco A

7 Christie 35 Seattle B

> new_df<- df_with_duplicates %>% distinct(.keep_all = TRUE)

> print(new_df)

Name Age City Grade

1 Jack 25 New York A

2 Jane 30 Los Angeles B

3 Bob 22 Chicago C

4 Alice 28 San Francisco A

5 Christie 35 Seattle B
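
distinct() keeps the first copy of each repeated row. Base R can do the same with duplicated() and unique(), sketched below on a shortened version of the data. The data frame name people is illustrative.

```r
people <- data.frame(
  Name = c("Jack", "Jane", "Jack", "Jane", "Bob"),
  Age  = c(25, 30, 25, 30, 22)
)

duplicated(people)              # TRUE marks a repeat of an earlier row
people[!duplicated(people), ]   # keeps the first copy of each row
unique(people)                  # same rows as the line above
```

Unlike the distinct() output above, the base R result keeps the original row names (1, 2, 5) instead of renumbering them.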

#Drop rows with missing values (drop_na() comes from the tidyr package)

library(tidyr)
df <- data.frame(
Name = c("John", "Jane", "Bob", "Alice", "Charlie"),
Age = c(25, 30, 22, 28, 35),
City = c("New York", "Los Angeles", NA, "San Francisco", "Seattle"),
Grade = c("A", "B", "C", "A", "B")
)
df %>% drop_na()

OUTPUT:

Name Age City Grade

1 John 25 New York A

2 Jane 30 Los Angeles B

3 Alice 28 San Francisco A

4 Charlie 35 Seattle B

#drop rows with missing values in specific columns

df %>% drop_na(City)

OUTPUT:

Name Age City Grade

1 John 25 New York A

2 Jane 30 Los Angeles B

3 Alice 28 San Francisco A

4 Charlie 35 Seattle B
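
Both calls above give the same result because City is the only column in df with a missing value. The sketch below uses a data frame where they would differ. It relies on base R equivalents rather than tidyr, and the NA in Age is added purely for illustration.

```r
students <- data.frame(
  Name = c("John", "Jane", "Bob", "Alice"),
  Age  = c(25, NA, 22, 28),                      # illustrative NA, not in the data above
  City = c("New York", "Los Angeles", NA, "San Francisco")
)

# Any column missing: comparable to drop_na() with no arguments.
students[complete.cases(students), ]   # drops Jane and Bob

# Only City checked: comparable to drop_na(City).
students[!is.na(students$City), ]      # drops only Bob
```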
#to get column names, types and the first values
glimpse(df)

OUTPUT:

Rows: 5

Columns: 4

$ Name <chr> "John", "Jane", "Bob", "Alice", "Charlie"

$ Age <dbl> 25, 30, 22, 28, 35

$ City <chr> "New York", "Los Angeles", NA, "San Francisco", "Seattle"

$ Grade <chr> "A", "B", "C", "A", "B"

#bind rows

df1 <- data.frame(

Name = c("Aarav", "Aditi", "Arjun", "Ananya", "Ayush"),
Age = c(25, 30, 22, 28, 35),
City = c("Mumbai", "Delhi", "Bangalore", "Kolkata", "Chennai"),
Grade = c("A", "B", "C", "A", "B")
)
df2 <- data.frame(
Name = c("Bhavya", "Chirag", "Deepika", "Dhruv", "Esha"),
Age = c(27, 32, 24, 30, 28),
City = c("Jaipur", "Ahmedabad", "Lucknow", "Hyderabad", "Pune"),
Grade = c("B", "C", "A", "B", "A")
)
df1
df2
com<- bind_rows(df1,df2)
print(com)

OUTPUT:

Name Age City Grade

1 Aarav 25 Mumbai A
2 Aditi 30 Delhi B

3 Arjun 22 Bangalore C

4 Ananya 28 Kolkata A

5 Ayush 35 Chennai B

Name Age City Grade

1 Bhavya 27 Jaipur B

2 Chirag 32 Ahmedabad C

3 Deepika 24 Lucknow A

4 Dhruv 30 Hyderabad B

5 Esha 28 Pune A

Name Age City Grade

1 Aarav 25 Mumbai A

2 Aditi 30 Delhi B

3 Arjun 22 Bangalore C

4 Ananya 28 Kolkata A

5 Ayush 35 Chennai B

6 Bhavya 27 Jaipur B

7 Chirag 32 Ahmedabad C

8 Deepika 24 Lucknow A

9 Dhruv 30 Hyderabad B

10 Esha 28 Pune A
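
bind_rows() stacks the two data frames because they share the same columns. A base R sketch with rbind() follows, using shortened versions of df1 and df2. The data frame df3 is hypothetical and only shows what happens when the column names differ.

```r
df1 <- data.frame(Name = c("Aarav", "Aditi"), Age = c(25, 30))
df2 <- data.frame(Name = c("Bhavya", "Chirag"), Age = c(27, 32))

com <- rbind(df1, df2)   # requires the same column names in both data frames
print(com)
nrow(com)

# rbind() stops with an error when the column names do not match.
df3 <- data.frame(Name = "Esha", City = "Pune")
result <- tryCatch(
  rbind(df1, df3),
  error = function(e) "rbind failed: column names differ"
)
result
```

dplyr's bind_rows() is more forgiving here: it matches columns by name and fills the ones missing on either side with NA.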
