R Programming

© 2016 SMART Training Resources Pvt. Ltd.


Overview

1. The Basics

2. R Data Structures

3. Data Input/Output

4. In-Built Functions

5. Data Visualization
What R does and does not
What R does:
• data handling and storage: numeric, textual
• matrix algebra
• hash tables and regular expressions
• high-level data analytic and statistical functions
• classes ("OO")
• graphics
• programming language: loops, branching, subroutines
What R does not:
• is not a database, but connects to DBMSs
• has no graphical user interfaces, but connects to Java, TclTk
• language interpreter can be very slow, but allows calling your own C/C++ code
• no spreadsheet view of data, but connects to Excel / MsOffice
• no professional / commercial support
R and statistics
• Packaging: a crucial infrastructure to efficiently
produce, load and keep consistent software libraries
from (many) different sources / authors
• Statistics: most packages deal with statistics and data
analysis
• State of the art: many statistical researchers provide
their methods as R packages
R as a Calculator
> 1550+2000
[1] 3550
or various calculations in the same row
> 2+3; 5*9; 6-6
[1] 5
[1] 45
[1] 0
> log2(32)
[1] 5
> sqrt(2)
[1] 1.414214
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
> plot(sin(seq(0, 2*pi, length=100)))
[Figure: sin(seq(0, 2 * pi, length = 100)) plotted against Index (0 to 100); y axis from -1.0 to 1.0]
Variables
> i = 81                                     # numeric
> sqrt(i)
[1] 9
> prov = "All that Glitters are not Gold"    # character string
> sub("Glitters", "Glisters", prov)
[1] "All that Glisters are not Gold"
> 1>2                                        # logical
[1] FALSE
Object orientation
Primitive (or: atomic) data types in R are:
• numeric (integer, double, complex)
• character
• logical
• function (a basic object type, though not an atomic vector type)
Numbers in R: NaN and NA
• NaN (not a number)
• NA (missing value)
o Basic handling of missing values
> x
[1] 1 2 3 4 5 6 7 8 NA
> mean(x)
[1] NA
> mean(x,na.rm=TRUE)
[1] 4.5
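The difference between the two special values can be checked directly. A minimal sketch: the vector x mirrors the slide, while nan_value is a name made up for illustration.

```r
# NaN arises from undefined numeric operations; NA marks a missing value.
nan_value <- 0 / 0            # NaN
x <- c(1:8, NA)               # same values as in the slide above

is.nan(nan_value)             # TRUE
is.na(nan_value)              # TRUE: is.na() also flags NaN
is.nan(NA)                    # FALSE: NA is not NaN

sum(is.na(x))                 # number of missing entries in x
mean(x, na.rm = TRUE)         # mean of the non-missing entries
```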
Objects in R

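• Objects in R obtain values by assignment.
• This is conventionally done with the gets arrow, <-. The equal sign, =, also assigns at the top level (several examples in these slides use it), but <- is the usual style.
• Objects can be of different kinds.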
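The two operators can be tried side by side. A short sketch, in which every object name (a, b, demo_arg, demo_obj) is made up for illustration:

```r
# At the top level both forms assign a value.
a <- 5
b = 5
identical(a, b)                   # TRUE

# Inside a function call they differ:
# `=` only labels an argument, `<-` creates an object as a side effect.
total1 <- sum(demo_arg = 1:3)     # demo_arg is just an argument label
exists("demo_arg")                # FALSE, unless it was defined elsewhere
total2 <- sum(demo_obj <- 1:3)    # demo_obj is created in the workspace
demo_obj
```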
R Data Structures

Vector
Matrix
Array
Factor
Data Frame
List
Vectors
• vector: an ordered collection of data of the same type

> a = c(1,2,3)
> a*2
[1] 2 4 6

• In R, a single number is the special case of a vector with 1 element.
• Other vector types: character strings, logical
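The "same type" rule and the single-number case can be seen in a few lines. This is only an illustrative sketch; the names mixed, nums and single are assumptions, not part of the slides.

```r
# Mixing types in c() coerces all elements to one common type.
mixed <- c(1, "a", TRUE)
class(mixed)                  # "character"

# Arithmetic is applied element by element.
nums <- c(1, 2, 3)
nums * 2                      # 2 4 6

# A single number is a vector of length 1.
single <- 42
length(single)                # 1
is.vector(single)             # TRUE
```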
Vectors
• Create a vector
> x <- 1:10
• Give the elements some names
> names(x) <- c("first","second","third","fourth","fifth")
• Select elements based on another vector
> i <- c(1,5)
> x[i]
first fifth
    1     5
> x[-c(i,8)]
second  third fourth   <NA>   <NA>   <NA>   <NA>
     2      3      4      6      7      9     10
Matrices

• matrix: a rectangular table of data of the same type
• array: 3-, 4-, ... dimensional matrix
• example: the red and green foreground and background values for 20000 spots on 120 chips: a 4 x 20000 x 120 (3D) array.
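A scaled-down version of the chip example may make the layout easier to picture. The sketch below assumes 4 channels x 5 spots x 3 chips with placeholder values; the channel labels and the name chips are hypothetical.

```r
# Scaled-down stand-in for the 4 x 20000 x 120 array:
# 4 channels x 5 spots x 3 chips, filled with placeholder values 1..60.
channels <- c("red.fg", "red.bg", "green.fg", "green.bg")  # hypothetical labels
chips <- array(seq_len(4 * 5 * 3), dim = c(4, 5, 3),
               dimnames = list(channel = channels, spot = NULL, chip = NULL))

dim(chips)                    # 4 5 3
chips["red.fg", 2, 3]         # one value, addressed by channel, spot, chip
dim(chips[, , 1])             # everything for chip 1: a 4 x 5 matrix
```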
Matrices
• Create an array
> x <- array(1:10, dim = c(2, 5))
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> attributes(x)
$dim
[1] 2 5
> dim(x)
[1] 2 5
Matrices
• Set column or row names
> colnames(x) <- c("col1", "col2", "col3", "col4", "col5")
> x
     col1 col2 col3 col4 col5
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> colnames(x)[1] <- "column1"
> x
     column1 col2 col3 col4 col5
[1,]       1    3    5    7    9
[2,]       2    4    6    8   10
Matrix
• Set row and columns names using dimnames
> dimnames(x) <- list(c("first", "second"), colnames(x))
> x
       column1 col2 col3 col4 col5
first        1    3    5    7    9
second       2    4    6    8   10
• Setting dimension names
> dimnames(x) <- list(my.rows = c("first", "second"), my.cols = NULL)
> x
        my.cols
my.rows  [,1] [,2] [,3] [,4] [,5]
  first     1    3    5    7    9
  second    2    4    6    8   10
Lists
• vector: an ordered collection of data of the same type.
> a = c(7,5,1)
> a[2]
[1] 5
• list: an ordered collection of data of arbitrary types.
> doe = list(name="john",age=28,married=FALSE)
> doe$name
[1] "john"
> doe$age
[1] 28
• Typically, vector elements are accessed by their index (an integer),
list elements by their name (a character string). But both types
support both access methods.
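Both access methods can be tried on both types. A brief sketch: doe follows the slide, while the element names given to the vector a are assumptions added for illustration.

```r
# Named vector: access by index or by name.
a <- c(first = 7, second = 5, third = 1)   # names are illustrative
a[2]                          # by index
a["second"]                   # by name

# List: access by name or by index.
doe <- list(name = "john", age = 28, married = FALSE)
doe$age                       # by name
doe[["age"]]                  # by name, equivalent to $
doe[[2]]                      # by index
doe[2]                        # single brackets return a sub-list, not the element
```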
Data frames
• data frame: represents the typical data table that researchers come up with, like a spreadsheet.
• It is a rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types.

Example:
> a
      localisation tumorsize progress
XX348     proximal       6.3    FALSE
XX234       distal       8.0     TRUE
XX987     proximal      10.0    FALSE
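The slides do not show how the table was built, so the following is only one possible way to construct it, using data.frame() with the values from the example.

```r
# Build the example table column by column; each column has one type.
a <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(FALSE, TRUE, FALSE),
  row.names    = c("XX348", "XX234", "XX987")
)

sapply(a, class)              # type of each column
a["XX234", "tumorsize"]       # single cell by row and column name
a[a$progress, ]               # rows where progress is TRUE
```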
Factors
• A character string can contain arbitrary text. Sometimes it is useful
to use a limited vocabulary, with a small number of allowed
words. A factor is a variable that can only take such a limited
number of values, which are called levels.
• Example
• a family of two girls (1) and four boys (0),
> kids = factor(c(1,0,1,0,0,0),levels=c(0,1),
labels=c("boy","girl"))
> kids
[1] girl boy girl boy boy boy
Levels: boy girl
> class(kids)
[1] "factor"
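A few standard functions show what the factor stores. This sketch reuses the kids example from the slide.

```r
kids <- factor(c(1, 0, 1, 0, 0, 0), levels = c(0, 1),
               labels = c("boy", "girl"))

levels(kids)                  # the allowed values: "boy" "girl"
table(kids)                   # counts per level: 4 boys, 2 girls
as.integer(kids)              # internal codes: 2 1 2 1 1 1
```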
Data Input/Output
Directory management
• dir() list files in directory
• setwd(path) set working directory
• getwd() get working directory
• ?files File and Directory Manipulation

Standard ASCII Format


• read.csv read comma-delimited file
• write.csv write comma-delimited file
Reading

> sets <- read.csv("Sets_All.csv", header = TRUE)
> sets$Ordered.Year <- ordered(sets$Year)
> sets$SpotCd.Fac <- factor(sets$SpotCd, exclude = NULL)
> spotted.sets <- sets[sets$Sp1Cd == 2, ]
> write.csv(spotted.sets, file = "spotted.txt", row.names = FALSE)
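Since Sets_All.csv is not available here, a self-contained round trip can be sketched with a temporary file. The data frame demo and its column names are hypothetical.

```r
# Hypothetical data; the column names are chosen only for illustration.
demo <- data.frame(Year = c(2014, 2015, 2016), Count = c(10, 12, 9))

# Write to a temporary file, without the row names column.
path <- tempfile(fileext = ".csv")
write.csv(demo, file = path, row.names = FALSE)

# Read it back; the first line is used as the header.
back <- read.csv(path, header = TRUE)
str(back)

unlink(path)                  # remove the temporary file
```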
Data Visualization

• plot() is the main graphing function
• Automatically produces simple plots for vectors, functions or data frames
Sample Data Set
Plotting a Vector

• plot(v) will plot the elements of the vector v against their index
# Plot height for each observation
> plot(dataset$Height)
# Plot values against their ranks
> plot(sort(dataset$Height))
Common Parameters for plot()
• Specifying labels:
o main: provides a title
o xlab: label for the x axis
o ylab: label for the y axis
• Specifying range limits
o xlim: 2-element vector giving the range for the x axis
o ylim: 2-element vector giving the range for the y axis
• Example
o plot(sort(dataset$Height), ylim = c(120,200), ylab = "Height (in cm)", xlab = "Rank", main = "Distribution of Heights")
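The sample data set is not reproduced in these slides, so the sketch below uses a small hypothetical stand-in for dataset with made-up heights, and sets both axis ranges explicitly.

```r
# Hypothetical stand-in for the sample data set (heights in cm).
dataset <- data.frame(Height = c(158, 172, 165, 181, 149, 176, 169, 190))

heights <- sort(dataset$Height)
plot(heights,
     xlim = c(1, length(heights)),   # x axis: ranks 1..n
     ylim = c(120, 200),             # y axis: height range
     xlab = "Rank", ylab = "Height (in cm)",
     main = "Distribution of Heights")
```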
Plotting Two Vectors

• plot() can pair elements from 2 vectors to produce x-y coordinates
• plot() and pairs() can also produce composite plots that pair all the variables in a data frame.
• Example
o plot(dataset$Hip, dataset$Waist, xlab = "Hip", ylab = "Waist", main = "Circumference (in cm)", pch = 2, col = "blue")
Histograms

• Generated by the hist() function
• The parameter breaks is key
o Specifies the (suggested) number of categories to plot
or
o Specifies the breakpoints for each category
• The xlab, ylab, xlim, ylim options work as expected
• Example
o hist(dataset$bp.sys, col = "lightblue", xlab = "Systolic Blood Pressure", main = "Blood Pressure")
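The two uses of breaks can be compared without drawing anything by passing plot = FALSE and inspecting the returned object. The blood pressure values below are hypothetical.

```r
bp <- c(112, 118, 121, 125, 129, 133, 138, 142, 147, 155)  # hypothetical values

# breaks as a single number: a suggested number of categories
h1 <- hist(bp, breaks = 5, plot = FALSE)
h1$breaks                     # breakpoints actually chosen by R

# breaks as a vector: explicit breakpoints
h2 <- hist(bp, breaks = seq(110, 160, by = 10), plot = FALSE)
h2$breaks                     # 110 120 130 140 150 160
h2$counts                     # 2 3 2 2 1
```

With a single number, R treats the value as a suggestion and may pick a different number of categories; with a vector, the breakpoints are used as given.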
End of Session
Thank you…
