Lab 1: Introduction to R

Mónica Rojas Martínez, Alexander Perera Lluna

October 2024

1. Starting with R
R is a scientific software package developed by the R Foundation for Statistical Computing and distributed
from the R website.
The name is partly based on the (first) names of the first two R authors (Robert Gentleman and Ross Ihaka),
and partly a play on the name of the Bell Labs language S. S is a very high-level language and an environment
for data analysis and graphics. The earliest beginnings of S came from discussions in the spring of 1976
among a group of five people at Bell Labs: Rick Becker, John Chambers, Doug Dunn, Paul Tukey, and
Graham Wilkinson. We can regard S as a language with three current implementations or “engines”: the
“old S engine” (S version 3; S-Plus 3.x and 4.x), the “new S engine” (S version 4; S-Plus 5.x and above), and
R. Given this understanding, asking for “the differences between R and S” really amounts to asking for the
specifics of the R implementation of the S language, i.e., the difference between the R and S engines. R is
just a particular implementation of the S language.
Although the R language differs from MATLAB, its functionality is similar, and it offers powerful
packages for pattern recognition and multivariate analysis. It has a strong community with very active forums,
and the language is under continuous development. The implementation is very portable and requires
relatively few machine resources. Ports exist for
many members of the Unix family of operating systems, including AIX, FreeBSD, GNU/Linux, HP-UX, Irix,
MacOS X, Solaris and Tru64. In addition, there is a version for Microsoft Windows (98 onwards).
A number of user interfaces are available for R, including ESS (Emacs Speaks Statistics, a mode for Emacs), Rcmdr
and RKWard. R distributions for OS X and Microsoft Windows are packaged with custom GUIs.

1.1. Lab requirements

In order to follow this laboratory, the following items are required (see footnote 1):
1. A computer
2. R installed from http://www.R-project.org
3. A text editor or RStudio. We strongly recommend using RStudio in this class.
4. A .pdf reader (e.g. Adobe Acrobat Reader) to read this text in electronic format
5. A graphical environment (Windows, or X11 on Unix-like systems)

1.2 Starting R
Under Windows, just follow Start -> All Programs -> R -> R. On Mac OS X and Microsoft Windows,
R comes with a native GUI; on GNU/Linux and other UNIX implementations, a GUI can be started later
from the command line interface.
Footnote 1: You can install any package into R through the command install.packages("package name"). To build a base installation
of R on an Ubuntu Linux computer, you can use sudo apt-get install r-recommended. You can download RStudio from
http://www.rstudio.com/products/rstudio/download/

Important!! R is case sensitive!!!

Under UNIX systems, to start R, open a terminal and type R:

guino@marmota:~$ R

R version 4.3.1 (2023-06-16 ucrt) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

To use RStudio, type rstudio at the prompt of a terminal window.

1.3 Getting help, finding functions

Once R is started we get a prompt, much like MATLAB’s. The syntax is also close to MATLAB’s, with some
differences. Perhaps the first thing to learn in any language is how to ask for help. In R we use the command
help(). For instance, to check the syntax of getwd(), you can type
help("getwd")

## starting httpd help server ... done

The command help.start() will launch the help system within a browser in a different window. You can
try help(print) and execute your first Hello World:
print("Hello World")

## [1] "Hello World"

A very useful way to find help is with the example() function. This will return a sample of different example
uses for a given command, for instance:
example(ls)

##
## ls> .Ob <- 1
##
## ls> ls(pattern = "O")
## character(0)
##
## ls> ls(pattern= "O", all.names = TRUE) # also shows ".[foo]"
## [1] ".Ob"
##
## ls> # shows an empty list because inside myfunc no variables are defined
## ls> myfunc <- function() {ls()}

##
## ls> myfunc()
## character(0)
##
## ls> # define a local variable inside myfunc
## ls> myfunc <- function() {y <- 1; ls()}
##
## ls> myfunc() # shows "y"
## [1] "y"
The help system can be searched for a given pattern through the help.search() command.
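When you do not remember the exact name of a function, a pattern search is usually the quickest way in. The following is a minimal sketch of two search tools, apropos() and help.search(); the search patterns are only illustrative choices.

```r
# apropos() lists objects whose names match a pattern
print(apropos("^read\\.(csv|table)$"))

# help.search() looks through the installed documentation for a pattern;
# restricting it to one package keeps the search quick
hits <- help.search("standard deviation", package = "stats")
print(class(hits))

# ?sd is shorthand for help("sd")
```

Printing hits itself (instead of its class) shows the matching help pages.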

1.4 Current Path

It is convenient to set the current path to the workspace directory where you will place all your data. Create a directory
where you will store all data for this lab (e.g. ~/r-files/ in a UNIX environment or My Documents\r-files
in Windows). To set the path you can use the command setwd(), as in setwd("~/r-files/"),
or use File -> Change dir... in the Windows GUI.
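As a minimal sketch of how getwd() and setwd() work together, the example below uses a hypothetical r-files directory created inside R's temporary folder, so it does not touch your real files. In the lab you would use your own directory instead.

```r
# Hypothetical lab directory inside R's per-session temporary folder,
# so the example does not touch your real files.
lab_dir <- file.path(tempdir(), "r-files")
dir.create(lab_dir, showWarnings = FALSE)

old_dir <- setwd(lab_dir)   # setwd() returns the previous directory (invisibly)
current <- getwd()          # where are we now?
print(basename(current))

setwd(old_dir)              # go back to where we started
```

Keeping the value returned by setwd() is a simple way to restore the previous directory afterwards.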

1.5 R Session workspace

An assignment is done with the <- operator. To assign a value to a variable a, just execute
a <- pi
sqrt(2) -> b
a

## [1] 3.141593
b

## [1] 1.414214
After the assignments, list the objects that live in memory with the objects() function:
objects()
Then you can remove objects from your session with the command rm():
rm(myfunc,a)
objects()

## [1] "b"

1.6 Saving and restoring the workspace

It is easy to save variables into an R data file (.RData), for instance,
xx <- runif(20)
yy <- list(a = 1, b = TRUE, c = "oops")
save(xx, yy, file = "xxyy.Rdata")

To save the complete workspace you can use the save.image() function:

save.image(file="backup.RData")
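Saved objects are restored with the load() function. The sketch below shows a full save and restore cycle; it writes to a temporary file (an assumption made here to keep the example free of side effects) rather than to the working directory.

```r
# Use a temporary file so the example leaves no traces in the working directory.
rdata_file <- tempfile(fileext = ".RData")

xx <- runif(20)
yy <- list(a = 1, b = TRUE, c = "oops")
xx_saved <- xx                      # keep a copy to compare against later

save(xx, yy, file = rdata_file)     # write both objects to disk
rm(xx, yy)                          # remove them from the session

restored <- load(rdata_file)        # load() returns the names of the restored objects
print(restored)
print(identical(xx, xx_saved))
```

A file written by save.image() can be restored with load() in the same way.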

1.7 Quitting R, batch processing

You can quit the R session with the q() command.

Also, R can be called for batch processing from the shell or command line as follows (see footnote 2),
$ R [options] [< infile] [> outfile]
where the commands to process are found in infile and the output is logged to outfile.

2. Simple types
2.1 Vectors
As in many other languages, the simplest structure is a numeric vector, defined with the help of the concatenate
c() function,
x <- c( 2.3, 3, 5, 7, 7.5, 7, 5, 4)
y <- c(x,3,x)

Note that the following code has the same effect:

c( 2.3, 3, 5, 7, 7.5, 7, 5, 4) -> x

Simple arithmetic works as in other languages: +, -, *, /, ^, as well as typical functions such as log, exp, sin, cos, tan, sqrt.
A vector with regular spacing can be generated with the seq() function:
sq <- seq(5, -2, by = -0.5)

Note that arguments in R are matched mainly by name and, in the absence of a name, by position, so the following
code would produce the same effect,
sq <- seq( by = -0.5, to = -2,
from = 5)
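Arithmetic on vectors is applied element by element, which is worth seeing once. The following sketch reuses the vector x from above; the names doubled, roots, shifted, sq1 and sq2 are introduced only for this illustration.

```r
x <- c(2.3, 3, 5, 7, 7.5, 7, 5, 4)

# Arithmetic and functions work element by element
doubled <- 2 * x
roots   <- sqrt(x)

# A shorter vector is recycled to match the longer one
shifted <- x + c(0, 10)

# Positional and named arguments give the same sequence
sq1 <- seq(5, -2, by = -0.5)
sq2 <- seq(by = -0.5, to = -2, from = 5)
print(identical(sq1, sq2))
print(length(sq1))
```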

2.2 Logical Vectors

Comparing a vector with a number yields a logical vector with one value per element. Test the effect:
big <- x > 5
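Logical vectors can be counted, summarised and combined. This is a small sketch using the same x as before; the variable mid is a name made up for this example.

```r
x <- c(2.3, 3, 5, 7, 7.5, 7, 5, 4)
big <- x > 5

print(big)            # one TRUE/FALSE per element of x
print(sum(big))       # TRUE counts as 1, so this counts the elements above 5
print(any(x > 7))     # is at least one element above 7?
print(all(x > 2))     # are all elements above 2?

# & and | combine conditions element by element
mid <- x >= 3 & x <= 5
print(which(mid))     # positions where the condition holds
```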

2.3 Data types

R functions behave differently depending on the class of the input object. For instance, the c() function also concatenates
character strings, and the function summary() behaves differently for them,
y <- c(2, -3, 3, -1, 3, 5, 6, 12)
cv <- c("X1", "Y2", "X1", "Y1", "X2", "Y3", "X2", "Y3")
summary(cv)

## Length Class Mode

## 8 character character
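To see the contrast, it may help to inspect the class of each vector and to call summary() on the numeric one as well. The sketch below also shows what happens when types are mixed inside c(); the vector mixed is a hypothetical example.

```r
y  <- c(2, -3, 3, -1, 3, 5, 6, 12)
cv <- c("X1", "Y2", "X1", "Y1", "X2", "Y3", "X2", "Y3")

print(class(y))     # "numeric"
print(class(cv))    # "character"

# For a numeric vector, summary() reports quantiles and the mean
print(summary(y))

# Mixing types in c() silently converts everything to the most general type
mixed <- c(1, "a", TRUE)
print(class(mixed))
```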

2.4 Indexing
Any object can be indexed, whether vectors, arrays or data frames, as you will see later on. The elements of any
vector are retrieved through the [ operator, as follows,
x[1]

## [1] 2.3
Footnote 2: This example is for the UNIX case.

x[3]

## [1] 5
x[c(3,5)]

## [1] 5.0 7.5

x[big]

## [1] 7.0 7.5 7.0

x[which(big)]

## [1] 7.0 7.5 7.0

(x+10)[5]

## [1] 17.5
Note that R also supports indexing by name, which helps with metadata management. For example:
geneexp <- c(1.3, -4, 3,4, 0.3)
geneexp

## [1] 1.3 -4.0 3.0 4.0 0.3

names(geneexp) <- c("ACP1","COX2","F7","F11","ABG1")
geneexp

## ACP1 COX2 F7 F11 ABG1

## 1.3 -4.0 3.0 4.0 0.3
coag_factors <- geneexp[c("F7","F11")]
coag_factors

## F7 F11
## 3 4
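Two further indexing forms are used later in this lab and are worth trying now: negative indices and logical conditions. A minimal sketch with the same gene expression values, where the names without_first and upregulated are chosen only for illustration:

```r
geneexp <- c(ACP1 = 1.3, COX2 = -4, F7 = 3, F11 = 4, ABG1 = 0.3)

# Negative indices drop elements instead of selecting them
without_first <- geneexp[-1]
print(names(without_first))

# A logical condition can be used directly as an index
upregulated <- geneexp[geneexp > 1]
print(upregulated)

# Indexing also works on the left-hand side of an assignment
geneexp["COX2"] <- 0
print(geneexp["COX2"])
```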

2.5. Factors
Factors are a special and useful type, as they are the natural form for representing categorical data. Any
vector can be converted to a factor with the as.factor() function, or created with the factor() function,
cvf <- as.factor(cv)
print(cvf)

## [1] X1 Y2 X1 Y1 X2 Y3 X2 Y3
## Levels: X1 X2 Y1 Y2 Y3
summary(cvf)

## X1 X2 Y1 Y2 Y3
## 2 2 1 1 2
Factors can be very useful from a practical perspective. For example, assume you have measured the
FVII concentration in blood for individuals carrying different diseases, and the disease of each individual is recorded as follows:
disease <- c("Hemophilia A", "Thrombocytopenia", "Ehlers-Danlos syndrome",
"Hemophilia", "Ehlers-Danlos syndrome", "Thrombocytopenia", "Vasculitis",
"Vasculitis", "Vasculitis", "Thrombocytopenia", "Myeloproliferative",
"Thrombocytopenia", "Myeloproliferative", "Hemophilia", "Thrombocytopenia",

"Ehlers-Danlos syndrome",
"Thrombocytopenia", "Thrombocytopenia", "Thrombocytopenia", "Vasculitis",
"Hemophilia A", "Ehlers-Danlos syndrome", "Thrombocytopenia", "Myeloproliferative",
"Ehlers-Danlos syndrome", "Hemophilia B", "Hhemophilia A", "Vasculitis",
"Ehlers-Danlos syndrome", "Vasculitis")
diseasef <- factor(disease)
diseasef

## [1] Hemophilia A Thrombocytopenia Ehlers-Danlos syndrome

## [4] Hemophilia Ehlers-Danlos syndrome Thrombocytopenia
## [7] Vasculitis Vasculitis Vasculitis
## [10] Thrombocytopenia Myeloproliferative Thrombocytopenia
## [13] Myeloproliferative Hemophilia Thrombocytopenia
## [16] Ehlers-Danlos syndrome Thrombocytopenia Thrombocytopenia
## [19] Thrombocytopenia Vasculitis Hemophilia A
## [22] Ehlers-Danlos syndrome Thrombocytopenia Myeloproliferative
## [25] Ehlers-Danlos syndrome Hemophilia B Hhemophilia A
## [28] Vasculitis Ehlers-Danlos syndrome Vasculitis
## 8 Levels: Ehlers-Danlos syndrome Hemophilia Hemophilia A ... Vasculitis
summary(diseasef)

## Ehlers-Danlos syndrome Hemophilia Hemophilia A

## 6 2 2
## Hemophilia B Hhemophilia A Myeloproliferative
## 1 1 3
## Thrombocytopenia Vasculitis
## 9 6
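The summary lists labels such as "Hhemophilia A" next to "Hemophilia A", which look like spelling variants of one category. The lab text does not say whether this is intended, so the following is only a sketch, on a small hypothetical vector, of how such labels could be inspected and merged if they were indeed typos.

```r
# Small hypothetical sample with one misspelled label
disease_small <- c("Hemophilia A", "Vasculitis", "Hhemophilia A",
                   "Vasculitis", "Hemophilia A")
f <- factor(disease_small)

print(levels(f))    # the distinct categories, sorted alphabetically
print(table(f))     # counts per category

# Renaming a level to an existing name merges the two categories
levels(f)[levels(f) == "Hhemophilia A"] <- "Hemophilia A"
print(nlevels(f))
print(table(f))
```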

3. Not so simple types

3.1 Arrays
Arrays can be easily created by changing the dim attribute of a vector. Suppose that z is a random vector of 10
elements, which can be set through the following assignment,
z <- rnorm(10)
z

## [1] 0.05753091 -0.86792728 -0.53033853 1.72361002 -2.18436755 0.01387335

## [7] 0.32248995 -0.37494762 -0.95641824 1.17342998
dim(z) <- c(5,2)
z

## [,1] [,2]
## [1,] 0.05753091 0.01387335
## [2,] -0.86792728 0.32248995
## [3,] -0.53033853 -0.37494762
## [4,] 1.72361002 -0.95641824
## [5,] -2.18436755 1.17342998
Matrices filled with a constant can be created with the array() function (e.g. array(0,c(5,2)) for a 5 x 2 matrix of zeros). As with vectors,
elements of an array can be accessed through the [] operator:
z[3,2]

## [1] -0.3749476

z[,2]

## [1] 0.01387335 0.32248995 -0.37494762 -0.95641824 1.17342998

z[1,]

## [1] 0.05753091 0.01387335

Arrays can be extended and joined with other arrays through the cbind() and rbind() functions for column
and row binding, respectively. The transpose of an array is obtained through the t() function.
tz <- t(z)
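A short sketch of cbind(), rbind() and t() on a small matrix follows. It uses a hypothetical matrix m with fixed values instead of the random z, so that the dimensions are easy to follow.

```r
m <- array(1:6, c(3, 2))        # 3 x 2 matrix, filled column by column

wider  <- cbind(m, c(7, 8, 9))  # add a third column
taller <- rbind(m, c(0, 0))     # add a fourth row

print(dim(wider))
print(dim(taller))
print(dim(t(m)))                # transposing swaps the dimensions
```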

3.3 Matrix operators

A number of operators exist for matrix manipulation. For multiplying two matrices, the %*% operator is
used:
tz%*%z

## [,1] [,2]
## [1,] 8.780160 -4.291945
## [2,] -4.291945 2.536452
Other functions of interest for matrix-matrix, matrix-vector and single-matrix operations are diag(), crossprod(), solve(),
eigen(), svd(), etc.
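As a minimal sketch of some of these functions, the example below solves a small linear system. The matrix A and the vector b are arbitrary values chosen for the illustration.

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(1, 2)

sol <- solve(A, b)                 # solves A %*% sol = b
print(all.equal(as.vector(A %*% sol), b))

# crossprod(A) is a shortcut for t(A) %*% A
print(all.equal(crossprod(A), t(A) %*% A))

# diag() extracts the diagonal of a matrix, or builds an identity matrix
print(diag(A))
print(all.equal(A %*% solve(A), diag(2)))
```

Comparisons use all.equal() rather than == because floating-point results may differ in the last digits.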

3.4 Lists and Data Frames

A list is an important R object consisting of an ordered collection of objects known as components. There
is no particular need for the components to be of the same type, so a list can hold different classes of objects,
like cell arrays in MATLAB. The components are accessed through the [[]] operator for index numbers or the
$ operator for component names.
l <- list(name="FVII",type="Coagulation Factor", range=c(100,100000))
l[[2]]

## [1] "Coagulation Factor"

l$type

## [1] "Coagulation Factor"

l[[3]][2]

## [1] 1e+05
ty <- "type"
l[[ty]]

## [1] "Coagulation Factor"

You can print a more human readable output for complex objects with help of the str() function:
str(l)

## List of 3
## $ name : chr "FVII"
## $ type : chr "Coagulation Factor"
## $ range: num [1:2] 1e+02 1e+05

Lists can be concatenated with the c() function.
A data frame is a special list of class “data.frame”, which can be regarded as a matrix with columns
possibly of different modes (or types) and attributes. It can be displayed in matrix form, and its rows and
columns can be extracted using matrix indexing as seen in section 3.1.
A data frame can be created by importing a csv file, or with the data.frame() function.
Let’s define a variable containing the FVII levels (in some arbitrary units),
fvii <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
59, 46, 58, 43)

Then we can create a data frame with the factor variable containing the disease types and the FVII vector
containing the expression levels for the individuals in each disease type:
blood <- data.frame(Disease=diseasef, FVII=fvii)
# Use head() or tail() for displaying the first/last rows
head(blood)

## Disease FVII
## 1 Hemophilia A 60
## 2 Thrombocytopenia 49
## 3 Ehlers-Danlos syndrome 40
## 4 Hemophilia 61
## 5 Ehlers-Danlos syndrome 64
## 6 Thrombocytopenia 60
The head() function displays just the first six rows; see also the tail() function. Remember that the
columns are retrieved through the $ operator.
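The usual ways of accessing a data frame can be sketched as follows. The data frame blood_small is a hypothetical, shortened stand-in for blood, with made-up rows, so that the example runs on its own.

```r
# Small hypothetical data frame with the same layout as `blood`
blood_small <- data.frame(
  Disease = factor(c("Vasculitis", "Hemophilia A", "Vasculitis", "Hemophilia B")),
  FVII    = c(59, 60, 54, 46)
)

print(blood_small$FVII)                 # one column, as a vector
print(blood_small[2, ])                 # one row, matrix-style indexing
print(dim(blood_small))                 # rows and columns

# Rows can be selected with a logical condition on a column
high <- blood_small[blood_small$FVII > 55, ]
print(high)

# New columns are added by assignment
blood_small$Low <- blood_small$FVII < 50
print(names(blood_small))
```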

4. I/O
An important part of R deals with reading data from files. The easiest way to read a file is through the
read.table() function. This function reads a delimited text file (such as a .csv file) and generates a data frame in the R session. Other
functions in the same family are read.csv() and read.delim().
Be advised that reading very large data files with read.table() is resource-expensive and non-optimal. For
large files, lower-level primitives such as scan() should be used.
There are tools for interactively editing objects like data frames. This is useful for making small changes to
the data in the R session.
An object editor can be invoked with the edit() or fix() functions, e.g. blood <- edit(blood),
which is equivalent to executing fix(blood).
An example using read.table() is presented below
myDF <- read.table('iris.csv') # read file 'iris.csv' from the current working directory
str(myDF) # describe the structure of myDF

## 'data.frame': 151 obs. of 1 variable:

## $ V1: chr "sepal_length,sepal_width,petal_length,petal_width,class" "5.1,3.5,1.4,0.2,Iris-setosa" "
myDF <- read.table('iris.csv', sep = ',') #read file by setting the field separator
#character as ','
str(myDF)

## 'data.frame': 151 obs. of 5 variables:

## $ V1: chr "sepal_length" "5.1" "4.9" "4.7" ...
## $ V2: chr "sepal_width" "3.5" "3.0" "3.2" ...
## $ V3: chr "petal_length" "1.4" "1.4" "1.3" ...
## $ V4: chr "petal_width" "0.2" "0.2" "0.2" ...
## $ V5: chr "class" "Iris-setosa" "Iris-setosa" "Iris-setosa" ...
myDF <- read.table('iris.csv', sep = ',', head = TRUE) #read file by setting the field
#separator character as ',' and header as TRUE
str(myDF)

## 'data.frame': 150 obs. of 5 variables:

## $ sepal_length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ sepal_width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ petal_length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ petal_width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ class : chr "Iris-setosa" "Iris-setosa" "Iris-setosa" "Iris-setosa" ...
Note that the resulting data frame depends on how you set the function parameters. For example, the
separator character sep in this file is a comma. Leaving this parameter unspecified results in a data frame
with only one variable (one column), given that the default separator is white space. Likewise, data files
that employ headers should be read by setting the parameter header (abbreviated to head in the call above) to TRUE. Otherwise, the header line
is read as the first data row, and R automatically sets the names of the variables to ‘V1’, ‘V2’, etc. Besides, the type of
the variables ‘V1’, ‘V2’, etc. can be wrongly interpreted, as in this example (e.g. the first variable should be numeric,
but it is read as character when the separator or the header is not well specified).
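If you do not have iris.csv at hand, the behaviour can be reproduced with a tiny file. The sketch below writes a hypothetical three-row CSV to a temporary file and reads it back in two equivalent ways; the file contents are invented for the illustration.

```r
# Write a tiny hypothetical CSV file to a temporary location
csv_file <- tempfile(fileext = ".csv")
writeLines(c("sepal_length,sepal_width,class",
             "5.1,3.5,Iris-setosa",
             "4.9,3.0,Iris-setosa",
             "7.0,3.2,Iris-versicolor"),
           csv_file)

# read.csv() is read.table() with sep = "," and header = TRUE as defaults
df1 <- read.csv(csv_file)
df2 <- read.table(csv_file, sep = ",", header = TRUE)

print(all.equal(df1, df2))
str(df1)
```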

5. Control statements
5.1. Conditional
The language has available a conditional construction of the form
if ( 1==2 ) { print("yes") } else { print("no") }

## [1] "no"
The conditional expression must evaluate to a single logical value. Comparison operators are
>=, >, <, <=, == and !=. The logical operators && and || combine single logical values, whereas & and | apply element-wise to vectors.
There is a vectorized version of the if/else construct, the ifelse() function. This has the form
ifelse(condition, a, b) and returns a vector of the same length as condition, with elements a[i] if
condition[i] is true, otherwise b[i].
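The difference between if/else and ifelse() can be sketched with the vector x from section 2.1. The labels "big" and "small" are arbitrary choices for this example.

```r
x <- c(2.3, 3, 5, 7, 7.5, 7, 5, 4)

# ifelse() evaluates the condition for every element at once
size <- ifelse(x > 5, "big", "small")
print(size)

# if/else needs a single TRUE or FALSE, so summarise the vector first
if (any(x > 7) && length(x) > 1) {
  print("at least one value above 7")
} else {
  print("no value above 7")
}
```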

5.2 Loops
Loops are quite similar to those in other programming languages. There is a for loop construction, which has the
form
for ( ic in c("joan","helena","maria") ) { print(ic) }

## [1] "joan"
## [1] "helena"
## [1] "maria"
For loops are used much less often than in compiled languages, as R provides some compact forms for
iterating over objects, like apply(), sweep(), mapply(), tapply(), and others.
There are also other constructs, such as repeat and while. In the following example the loop stops when k
reaches 0, because 0 is treated as FALSE:
k<-3; while ( k ) { print(k <- k-1) }

## [1] 2
## [1] 1
## [1] 0
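For completeness, here is a minimal sketch of repeat together with break and next. The variable visited is introduced only to record which values the loop body reaches.

```r
# repeat has no condition of its own; leave the loop with break
k <- 3
visited <- c()
repeat {
  k <- k - 1
  if (k == 1) next          # skip the rest of this iteration
  visited <- c(visited, k)
  if (k <= 0) break         # stop once k reaches 0
}
print(visited)
```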

6. Defining functions
As hinted before, R allows the user to create functions; this is a way to extend the functionality of R towards
our interests. The definition of a simple function is shown below:
FunctionName <- function(x,y) { x+y }
FunctionName(3,4)

## [1] 7
Another example is the following function, which implements the expression

$$f(k) = \sum_{x=1}^{k-1} x$$

through the following code,

SerialSum <- function(k)
{
out <- 0
while (k)
out <- out + ( k <- k-1 )
out
}
SerialSum(3)

## [1] 3
SerialSum(5)

## [1] 10

Important!! Note that the return value of the function is the result of the last expression in the function.
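The same sum can also be written without an explicit loop. The sketch below compares the loop-based function with a hypothetical vectorized variant, here called SerialSumVec, which relies on sum() and seq_len().

```r
# Loop-based version, as in the text
SerialSum <- function(k) {
  out <- 0
  while (k)
    out <- out + (k <- k - 1)
  out
}

# Hypothetical vectorized alternative: add up the integers 1, ..., k-1
SerialSumVec <- function(k) {
  sum(seq_len(k - 1))
}

print(SerialSum(5))
print(SerialSumVec(5))
```

In both functions the value of the last evaluated expression is what the function returns.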

Once a function is defined, it is easy to check its definition just typing the name of the function, e.g.
SerialSum

## function(k)
## {
## out <- 0
## while (k)
## out <- out + ( k <- k-1 )
## out
## }
## <bytecode: 0x000002b6f3ab6af8>
There is also the possibility of defining binary operators through the following syntax:
> "%!%" <- function(X, y) { ... }
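As a concrete but purely illustrative instance of this syntax, the sketch below defines a hypothetical operator %+% that joins two strings. Neither the operator name nor its behaviour comes from the lab text.

```r
# Hypothetical operator %+%: glue two strings together with a space
"%+%" <- function(x, y) {
  paste(x, y)
}

greeting <- "Hello" %+% "World"
print(greeting)
```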

6.1. apply() family functions and friends
In R there is a family of iterators known as the apply() family. This set of functions does most of the
work when an iterator is needed, avoiding in most cases the use of a for loop.
Once a function is defined, it is really easy to iterate it over a vector, such as:
myfun <- function(x) sqrt(x)*x
sapply(1:10, myfun)

## [1] 1.000000 2.828427 5.196152 8.000000 11.180340 14.696938 18.520259

## [8] 22.627417 27.000000 31.622777
In this case, sapply() applies your function to each element of the vector in the first argument. Also,
it simplifies the output, so it is reformatted as a vector.
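To make the simplification step visible, it may help to compare sapply() with its relatives lapply() and vapply(). A minimal sketch on a shorter input:

```r
myfun <- function(x) sqrt(x) * x

as_list   <- lapply(1:3, myfun)              # always returns a list
as_vector <- sapply(1:3, myfun)              # simplified to a numeric vector
checked   <- vapply(1:3, myfun, numeric(1))  # like sapply(), but the result type is declared

print(class(as_list))
print(as_vector)
print(identical(as_vector, checked))
```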
The apply() function iterates a function over the first or second margin (rows or columns) of an array or data frame in an
easy way. Here follows an example with the iris dataset.
data(iris)
head(iris)

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species

## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
apply(iris[,-5],2,sd)

## Sepal.Length Sepal.Width Petal.Length Petal.Width

## 0.8280661 0.4358663 1.7652982 0.7622377
apply(iris[1:10,-5],1,summary)

## 1 2 3 4 5 6 7 8 9 10
## Min. 0.20 0.200 0.200 0.200 0.20 0.400 0.300 0.200 0.200 0.10
## 1st Qu. 1.10 1.100 1.025 1.175 1.10 1.375 1.125 1.175 1.100 1.15
## Median 2.45 2.200 2.250 2.300 2.50 2.800 2.400 2.450 2.150 2.30
## Mean 2.55 2.375 2.350 2.350 2.55 2.850 2.425 2.525 2.225 2.40
## 3rd Qu. 3.90 3.475 3.575 3.475 3.95 4.275 3.700 3.800 3.275 3.55
## Max. 5.10 4.900 4.700 4.600 5.00 5.400 4.600 5.000 4.400 4.90
• EX1. Please try to explain the function calls and the output generated.
A very useful function is split(), which retrieves the strata of a data frame given an
input factor. The output is a list with one stratum per element, named after the levels of the factor. An
example with the iris dataset is:
iris.strata <- split(iris,iris$Species)
length(iris.strata)

## [1] 3
names(iris.strata)

## [1] "setosa" "versicolor" "virginica"

summary(iris.strata$versicolor)

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## Min. :4.900 Min. :2.000 Min. :3.00 Min. :1.000 setosa : 0
## 1st Qu.:5.600 1st Qu.:2.525 1st Qu.:4.00 1st Qu.:1.200 versicolor:50
## Median :5.900 Median :2.800 Median :4.35 Median :1.300 virginica : 0
## Mean :5.936 Mean :2.770 Mean :4.26 Mean :1.326
## 3rd Qu.:6.300 3rd Qu.:3.000 3rd Qu.:4.60 3rd Qu.:1.500
## Max. :7.000 Max. :3.400 Max. :5.10 Max. :1.800
The by() function extends the functionality of split() so that we can apply a function to each stratum:
by(iris[,1:2],iris$Species,summary)

## iris$Species: setosa
## Sepal.Length Sepal.Width
## Min. :4.300 Min. :2.300
## 1st Qu.:4.800 1st Qu.:3.200
## Median :5.000 Median :3.400
## Mean :5.006 Mean :3.428
## 3rd Qu.:5.200 3rd Qu.:3.675
## Max. :5.800 Max. :4.400
## ------------------------------------------------------------
## iris$Species: versicolor
## Sepal.Length Sepal.Width
## Min. :4.900 Min. :2.000
## 1st Qu.:5.600 1st Qu.:2.525
## Median :5.900 Median :2.800
## Mean :5.936 Mean :2.770
## 3rd Qu.:6.300 3rd Qu.:3.000
## Max. :7.000 Max. :3.400
## ------------------------------------------------------------
## iris$Species: virginica
## Sepal.Length Sepal.Width
## Min. :4.900 Min. :2.200
## 1st Qu.:6.225 1st Qu.:2.800
## Median :6.500 Median :3.000
## Mean :6.588 Mean :2.974
## 3rd Qu.:6.900 3rd Qu.:3.175
## Max. :7.900 Max. :3.800
Retrieve the fvii and diseasef variables defined in sections 3.4 and 2.5. It is easy to calculate the mean concentration
in blood for each disease, e.g. using the function by() (please look at the help of the function).
by(fvii, diseasef, mean)

## diseasef: Ehlers-Danlos syndrome

## [1] 54
## ------------------------------------------------------------
## diseasef: Hemophilia
## [1] 61
## ------------------------------------------------------------
## diseasef: Hemophilia A
## [1] 54.5
## ------------------------------------------------------------
## diseasef: Hemophilia B
## [1] 46
## ------------------------------------------------------------

## diseasef: Hhemophilia A
## [1] 59
## ------------------------------------------------------------
## diseasef: Myeloproliferative
## [1] 58
## ------------------------------------------------------------
## diseasef: Thrombocytopenia
## [1] 53.22222
## ------------------------------------------------------------
## diseasef: Vasculitis
## [1] 54.83333
• Ex2: Compute the same values through the tapply() function.
• Ex3: Can you explain the function of tapply()?
A powerful function is the aggregate() function, which accepts the R formula interface for easy computations:
aggregate( . ~ Species, iris, mean)

## Species Sepal.Length Sepal.Width Petal.Length Petal.Width

## 1 setosa 5.006 3.428 1.462 0.246
## 2 versicolor 5.936 2.770 4.260 1.326
## 3 virginica 6.588 2.974 5.552 2.026
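In the formula above, the dot stands for all the remaining variables. The formula can also name the variables explicitly, as in this short sketch; the choice of variables and of median() as the summary function is arbitrary.

```r
data(iris)

# Mean of one variable per species
one_var <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
print(one_var)

# Two variables on the left-hand side, combined with cbind()
two_vars <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species,
                      data = iris, FUN = median)
print(two_vars)
```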

7. Graphical Output
The most frequently used plotting function is the plot() function. This is a generic function that will behave
differently depending on the type or mode of the object in the first argument.
Any graphical output will be sent to the current graphics device. A graphics device can be a window
on the X11 system or the Windows system, or a file such as a pdf. Please take a while to look at help(Devices).
Standard devices in a GNU/Linux system are X11, jpeg, png, pdf, pictex, xfig, bitmap and postscript.
If the argument is numeric, like the fvii variable, plot() draws its values against their index. With this code,
plot(fvii,col=diseasef,pch=16)
we obtain figure 1.

Figure 1: Plot of the FVII expression for all diseases (x axis: Index; y axis: fvii).
Other standard functions for plotting include hist(), dotchart(), image(), contour() and persp(). There are also
some lower-level primitives like points(), lines(), text(), abline(), polygon(), legend(), and
others. See also the help of the plot() and par() functions.
By default, graphics produced with the built-in capabilities of R are not interactive; however, additional packages (see
section 8) can be installed and activated for interactive and dynamic graphics. One of these packages is the
GGobi package by Swayne, Cook and Buja, which can be found online at http://www.ggobi.org. These plotting
libraries can be accessed from R via a package named rggobi, described at http://www.ggobi.org/rggobi.
A nice and easy addition is the playwith package (see footnote 3).
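The interplay between devices, high-level and low-level plotting functions can be sketched as follows. The data (values, groups) and the output file are hypothetical stand-ins; in the lab you would plot fvii and diseasef instead.

```r
# Hypothetical values standing in for the FVII measurements
values <- c(60, 49, 40, 61, 64, 60)
groups <- factor(c("A", "B", "C", "A", "C", "B"))

plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)                                  # open a pdf graphics device

plot(values, col = groups, pch = 16,
     xlab = "Index", ylab = "FVII")             # high-level plot
abline(h = mean(values), lty = 2)               # low-level: dashed line at the mean
legend("bottomright", legend = levels(groups),
       col = seq_along(levels(groups)), pch = 16)

dev.off()                                       # close the device and write the file
print(file.exists(plot_file))
```

Without the pdf() and dev.off() calls, the same commands draw on the screen device.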

8. Packages
Packages are sets of functions, data and documentation for specific purposes. To check which packages are
installed at your site, issue the following command
> library()
To know which packages are currently loaded, write
search()

## [1] ".GlobalEnv" "package:stats" "package:graphics"

## [4] "package:grDevices" "package:utils" "package:datasets"
## [7] "package:methods" "Autoloads" "package:base"
There are thousands of contributed packages for R, which are written by different authors. Each package
is usually specialized in a particular method, data or problem. A complete list of packages for R
and their description can be found online from the R website, at http://cran.r-project.org . A particular
set of packages which is specialized in bioinformatics is named Bioconductor, and is found online at
http://www.bioconductor.org.
To install a package in Windows, follow the menu in R: Packages -> Install Packages,
choose a CRAN mirror and then the package. If you have a local package (in the form of a
zip file) use Packages -> Install Packages from local zip file. In RStudio follow the menu
Tools -> Install Packages, choose a mirror and search for the package you are looking for in the
Packages box. From a command line, use install.packages("packagename").
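In scripts it is often convenient to install a package only when it is missing. The helper below, ensure_package(), is a hypothetical function written for this sketch and is not part of R; it relies on requireNamespace() and install.packages(), and assumes the CRAN mirror https://cloud.r-project.org.

```r
# Hypothetical helper: report whether a package can be loaded, and
# optionally install it from CRAN when it is missing.
ensure_package <- function(pkg, install = FALSE,
                           repos = "https://cloud.r-project.org") {
  available <- requireNamespace(pkg, quietly = TRUE)
  if (!available && install) {
    install.packages(pkg, repos = repos)
    available <- requireNamespace(pkg, quietly = TRUE)
  }
  available
}

print(ensure_package("stats"))      # ships with every R installation
print(ensure_package("mlbench"))    # TRUE only if mlbench is already installed
```

With install = TRUE the helper attempts the installation, which needs an internet connection.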

9. Non-guided work with datasets

Please check the output of the data() function. Search for its help and explain its functionality. Also, practice
installing other packages. One example is the package mlbench, which is a collection of artificial and
real-world machine learning benchmark problems, including, e.g., several data sets from the UCI repository.
You can learn more about mlbench on its CRAN page.
If it is not installed, you can install this package as follows:
install.packages("mlbench",repos = "http://cran.us.r-project.org")

You can load the library as follows:

# load the library
library(mlbench)

## Warning: package 'mlbench' was built under R version 4.3.3

To see a list of the datasets available in this library, you can type:
Footnote 3: If the playwith package does not compile due to a missing GTK2 library, install the libgtk2.0-dev package through sudo apt-get install libgtk2.0-dev

# list the contents of the library
library(help = "mlbench")

Search in both libraries: the pre-installed package datasets and the package mlbench. Choose a dataset and:
• Describe the size and type of the dataset, including the number of variables and observations.
• Try to describe the meaning of each variable and its type.
• Try to obtain some statistical description of the dataset.
• Practice reporting your results both numerically and graphically.
• Make sure you get ready for the upcoming questionnaire!

No ratings yet
How To Install Tecplot
38 pages
5 - UNIT-3 Working With Linux OS
No ratings yet
5 - UNIT-3 Working With Linux OS
5 pages
BSD (01 - 2009) - Explore NetBSD
No ratings yet
BSD (01 - 2009) - Explore NetBSD
68 pages
Proxy Troubleshooting
No ratings yet
Proxy Troubleshooting
163 pages
Hardware-Software Debugging Techniques For Reconfigurable Systems-on-Chip
No ratings yet
Hardware-Software Debugging Techniques For Reconfigurable Systems-on-Chip
6 pages
Erp Self Training
No ratings yet
Erp Self Training
2 pages
00 DLC Info
No ratings yet
00 DLC Info
14 pages
论文句子生成器
100% (1)
论文句子生成器
8 pages
Computer Programming: SECT-M1082
No ratings yet
Computer Programming: SECT-M1082
22 pages
Volante For SWIFT
No ratings yet
Volante For SWIFT
2 pages
Samihan Resume
No ratings yet
Samihan Resume
1 page
Wiki Magnusbilling Org en Source
No ratings yet
Wiki Magnusbilling Org en Source
121 pages
Touch Panel Designer - Manual v1.0.6.0
No ratings yet
Touch Panel Designer - Manual v1.0.6.0
14 pages
Insurance Premium Prediction
No ratings yet
Insurance Premium Prediction
12 pages
D3PLOT Tips: Ls-Dyna Environment
No ratings yet
D3PLOT Tips: Ls-Dyna Environment
52 pages
SR - Containers in Object Oriented ABAP
No ratings yet
SR - Containers in Object Oriented ABAP
55 pages
Anyconnect Guide
No ratings yet
Anyconnect Guide
46 pages
Whats New SBO 10
No ratings yet
Whats New SBO 10
18 pages
SentinelOne - Battle Card - March 2021
No ratings yet
SentinelOne - Battle Card - March 2021
3 pages
AWP Practical 5
No ratings yet
AWP Practical 5
11 pages
Thesis Theme For Free
100% (2)
Thesis Theme For Free
6 pages
Zoho Exp
No ratings yet
Zoho Exp
5 pages
What Is Online Platform?
No ratings yet
What Is Online Platform?
51 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

L1 Intro R

Uploaded by

L1 Intro R

Uploaded by

Lab 1: Introduction to R

Mónica Rojas Martínez, Alexander Perera Lluna

1.1 Lab requirements

On UNIX systems, start R by opening a terminal and typing R.

R version 4.3.1 (2023-06-16 ucrt) -- "Beagle Scouts"

R is free software and comes with ABSOLUTELY NO WARRANTY.

To use RStudio, type rstudio at the prompt of a terminal window.

1.3 Getting help, finding functions

## starting httpd help server ... done

## [1] "Hello World"
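As an illustration of finding functions from the console, the sketch below uses the base R helpers `apropos()`, `exists()` and `args()`. The search pattern is chosen only for this example, and the interactive help calls are left as comments so that the block also runs in batch mode.

```r
# Interactive documentation lookups (commented out for batch runs):
# help(mean)
# ?mean
# help.start()   # starts the HTML help server

# Find objects on the search path whose names start with "mean"
hits <- apropos("^mean")
print(hits)

# Check that a function exists and inspect its arguments
print(exists("mean"))
print(args(sd))
```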

1.4 Current Path
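The working directory can be inspected and changed with base R's `getwd()` and `setwd()`. A minimal sketch, where `tempdir()` is used only as a safe example target:

```r
# Show the current working directory and remember it
old_dir <- getwd()
print(old_dir)

# Move to another directory (tempdir() is just an example target)
setwd(tempdir())
new_dir <- getwd()
print(new_dir)

# List the files found there, then return to the original directory
print(list.files())
setwd(old_dir)
```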

1.5 R Session workspace
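The workspace is the set of objects defined in the current session. A short sketch of listing and removing them with `ls()` and `rm()`; the object names are illustrative:

```r
# Create a few objects in the workspace
x <- 1:10
msg <- "Hello World"

# List the objects currently defined
print(ls())

# Remove one object and confirm that it is gone
rm(x)
print(exists("x", inherits = FALSE))
```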

1.6 Saving and restoring the workspace

To save the complete workspace you can use the
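A minimal sketch of saving and restoring objects, assuming base R's `save()`, `save.image()` and `load()`. The object and file names are illustrative.

```r
# An object to keep between sessions (illustrative values)
scores <- c(5.0, 7.5)
ws_file <- file.path(tempdir(), "lab1_workspace.RData")

# save.image(file = ...) would write every object in the global workspace;
# save() writes only the objects named explicitly
save(scores, file = ws_file)

# Remove the object, then restore it from disk
rm(scores)
load(ws_file)
print(scores)
```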

1.7 Quitting from R, batch processing

Note that the following code has the same effect:

2.2 Logical Vectors
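Logical vectors usually arise from element-wise comparisons. The sketch below uses illustrative values and shows how such vectors are combined, counted and used for selection:

```r
# Numeric vector (illustrative values)
x <- c(1.3, -4.0, 3.0, 4.0, 0.3)

# An element-wise comparison yields a logical vector
is_pos <- x > 0
print(is_pos)

# Combine conditions with & (and), | (or), ! (not)
in_range <- x > 0 & x < 3.5
print(in_range)

# Logical vectors select elements and can be counted with sum()
print(x[in_range])
print(sum(is_pos))
```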

2.3 Data types

## Length Class Mode

## [1] 5.0 7.5

## [1] 7.0 7.5 7.0

## [1] 7.0 7.5 7.0

## [1] 1.3 -4.0 3.0 4.0 0.3

## ACP1 COX2 F7 F11 ABG1

## [1] Hemophilia A Thrombocytopenia Ehlers-Danlos syndrome

## Ehlers-Danlos syndrome Hemophilia Hemophilia A
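The data types in this section can be sketched as follows. The gene and disease names are taken from the output above; the numeric values and the pairing of names with values are assumptions made for illustration.

```r
# Basic types as reported by class()
print(class(3.5))
print(class("F7"))
print(class(TRUE))

# A named numeric vector (values are illustrative)
expr <- c(ACP1 = 1.3, COX2 = -4.0, F7 = 3.0, F11 = 4.0, ABG1 = 0.3)
print(expr["F7"])

# A factor stores categorical data as integer codes plus a set of levels
disease <- c("Hemophilia A", "Thrombocytopenia",
             "Ehlers-Danlos syndrome", "Hemophilia A")
diseasef <- factor(disease)
print(levels(diseasef))
print(table(diseasef))
```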

3. Not so simple types

## [1] 0.05753091 -0.86792728 -0.53033853 1.72361002 -2.18436755 0.01387335

## [1] 0.01387335 0.32248995 -0.37494762 -0.95641824 1.17342998

## [1] 0.05753091 0.01387335
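Vector indexing of the kind shown in the output above can be sketched as follows. The seed and the vector length are assumptions, so the printed numbers will differ from those above.

```r
set.seed(1)            # arbitrary seed so the run is repeatable
v <- rnorm(10)

print(v[1:3])          # first three elements
print(v[6:10])         # elements 6 to 10
print(v[c(1, 6)])      # elements 1 and 6
print(v[-1])           # everything except the first element
print(v[v > 0])        # elements satisfying a condition
```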

3.3 Matrix operators
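A brief sketch of the common matrix operators in base R, using small illustrative matrices. Note that `*` is element-wise, while `%*%` is the matrix product.

```r
A <- matrix(1:4, nrow = 2)   # filled column by column: rows are (1, 3) and (2, 4)
B <- diag(2)                 # 2 x 2 identity matrix

print(A * A)      # element-wise product
print(A %*% B)    # matrix product
print(t(A))       # transpose
print(det(A))     # determinant
print(solve(A))   # inverse
```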

3.4 Lists and Data Frames

## [1] "Coagulation Factor"

## [1] "Coagulation Factor"

## [1] "Coagulation Factor"
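The same list component can be reached in three equivalent ways, which would produce three identical outputs like those above. In this sketch the list and its field names are hypothetical:

```r
# A list can mix types (field names and values are illustrative)
gene <- list(name = "Coagulation Factor", symbol = "F7", chromosome = 13)

print(gene$name)        # access by name with $
print(gene[["name"]])   # access by name with [[ ]]
print(gene[[1]])        # access by position

# Single brackets return a sub-list rather than the element itself
print(class(gene["name"]))
```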

## 'data.frame': 151 obs. of 1 variable:

## 'data.frame': 151 obs. of 5 variables:

## 'data.frame': 150 obs. of 5 variables:
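A data frame with 150 observations of 5 variables matches the built-in `iris` dataset, which is assumed here. The sketch inspects its structure and selects rows and columns; the 7.5 threshold is arbitrary.

```r
data(iris)

str(iris)                 # compact structure summary
print(dim(iris))          # number of rows and columns
print(head(iris, 3))      # first three rows

# Rows where Sepal.Length exceeds 7.5, keeping two columns
print(iris[iris$Sepal.Length > 7.5, c("Sepal.Length", "Species")])

# A column is an ordinary vector
print(mean(iris$Petal.Width))
```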

through the following code,

## [1] 1.000000 2.828427 5.196152 8.000000 11.180340 14.696938 18.520259
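The values printed above agree with x * sqrt(x) for x = 1, ..., 7. Assuming that is the quantity being computed, the sketch below compares an explicit loop, a vectorized expression and `sapply()`:

```r
x <- 1:7

# Explicit loop
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i] * sqrt(x[i])
}

# Vectorized arithmetic gives the same result without a loop
y_vec <- x * sqrt(x)

# sapply() applies a function to each element
y_apply <- sapply(x, function(v) v * sqrt(v))

print(y_vec)
```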

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species

## Sepal.Length Sepal.Width Petal.Length Petal.Width

## [1] "setosa" "versicolor" "virginica"

## diseasef: Ehlers-Danlos syndrome

## Species Sepal.Length Sepal.Width Petal.Length Petal.Width
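A per-group summary with the column layout shown above can be produced with `aggregate()`. This is a sketch assuming the `iris` data and the mean as the summary function:

```r
data(iris)

# Mean of every numeric column, grouped by Species
means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(means)

# tapply() does the same for a single column
print(tapply(iris$Sepal.Length, iris$Species, mean))
```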

Figure 1: Plot of the FVII expression for all diseases.

## [1] ".GlobalEnv" "package:stats" "package:graphics"
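The search path lists the environments and attached packages in which R looks up names. A small sketch, using the `stats` package only because it ships with R:

```r
# Environments and packages currently on the search path
attached <- search()
print(attached)

# library() attaches a package; stats is normally attached already
library(stats)
print("package:stats" %in% search())
```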

9. Non-guided work with datasets

You can load the library as follows:

## Warning: package 'mlbench' was built under R version 4.3.3
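A guarded sketch of loading `mlbench`, which is not part of base R and must be installed from CRAN first. The `PimaIndiansDiabetes` dataset is chosen here only as an example of what the package provides.

```r
# Check availability without attaching the package
has_mlbench <- requireNamespace("mlbench", quietly = TRUE)

if (has_mlbench) {
  library(mlbench)
  data(PimaIndiansDiabetes)   # example dataset shipped with mlbench
  str(PimaIndiansDiabetes)
} else {
  message("mlbench is not installed; install.packages(\"mlbench\") fetches it from CRAN")
}
```

The "built under R version" warning above only reports that the package binary was built with a newer R release than the one running.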
