Lab 1: Introduction to R

Mónica Rojas Martínez, Alexander Perera Lluna

October 2024

1. Starting with R
R is a scientific software package developed by the R Foundation for Statistical Computing and distributed
from the R website.
The name is partly based on the (first) names of the first two R authors (Robert Gentleman and Ross Ihaka),
and partly a play on the name of the Bell Labs language S. S is a very high-level language and an environment
for data analysis and graphics. The earliest beginnings of S came from discussions in the spring of 1976
among a group of five people at Bell Labs: Rick Becker, John Chambers, Doug Dunn, Paul Tukey, and
Graham Wilkinson. We can regard S as a language with three current implementations or “engines”: the
“old S engine” (S version 3; S-Plus 3.x and 4.x), the “new S engine” (S version 4; S-Plus 5.x and above), and
R. Given this understanding, asking for “the differences between R and S” really amounts to asking for the
specifics of the R implementation of the S language, i.e., the difference between the R and S engines. R is
just a particular implementation of the S language.
Although the R language differs somewhat from MATLAB, its functionality is similar and it offers powerful
packages for pattern recognition and multivariate analysis. It has a strong community with very active forums,
and the language is under continuous development. The language has been implemented in a
very portable fashion and in a way that requires relatively few machine resources. Ports exist for
many members of the Unix family of operating systems, including AIX, FreeBSD, GNU/Linux, HP-UX, Irix,
MacOS X, Solaris and Tru64. In addition, there is a version for Microsoft Windows (version 98 onwards).
A number of user interfaces are available for R, such as ESS (Emacs Speaks Statistics, a mode for Emacs), Rcmdr
and RKWard. R distributions for OS X and Microsoft Windows are packaged with custom GUIs.

1.1. Lab requirements
In order to follow this laboratory, the following items are required:
1. A computer
2. R installed from http://www.R-project.org
3. A text editor or RStudio. We strongly recommend using RStudio in this class.
4. A .pdf reader (e.g. Adobe Acrobat Reader) to read this text in electronic format
5. A graphical environment (Windows, or X11 on a Unix-like OS)

Note: You can install any package into R through the command install.packages("package name"). To build a base installation
of R on an Ubuntu Linux computer, you can use sudo apt-get install r-recommended. You can download RStudio from
http://www.rstudio.com/products/rstudio/download/

1.2 Starting R
Under Windows, just follow: Start -> All Programs -> R -> R. On Mac OS X and Microsoft Windows,
R comes with a native GUI; on GNU/Linux and other UNIX implementations, a GUI can be started later on
from the command line interface.

Important!! R is case sensitive!!!

Under UNIX systems, to start R, open a terminal and type R:

guino@marmota:~$ R

R version 4.3.1 (2023-06-16 ucrt) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

To use RStudio, type rstudio at the prompt of a terminal window.

1.3 Getting help, finding functions


Once R is started we have a prompt, very much like MATLAB’s. The syntax is also close to MATLAB’s, with some
differences. Maybe the first thing to learn in any language is how to ask for help. In R we use the command
help(). For instance, to check the syntax of getwd(), you can type
help("getwd")

## starting httpd help server ... done


The command help.start() will launch the help system in a browser, in a separate window. You can
try help(print) and then execute your first Hello World:
print("Hello World")

## [1] "Hello World"


A very useful way to find help is the example() function. It runs the examples from the help page of a
given command, for instance:
example(ls)

##
## ls> .Ob <- 1
##
## ls> ls(pattern = "O")
## character(0)
##
## ls> ls(pattern= "O", all.names = TRUE) # also shows ".[foo]"
## [1] ".Ob"
##
## ls> # shows an empty list because inside myfunc no variables are defined
## ls> myfunc <- function() {ls()}

##
## ls> myfunc()
## character(0)
##
## ls> # define a local variable inside myfunc
## ls> myfunc <- function() {y <- 1; ls()}
##
## ls> myfunc() # shows "y"
## [1] "y"
The help system can be searched for a given pattern through the help.search() command.
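As a small sketch of searching for functions, the lines below use apropos(), which matches a pattern against the names of objects on the search path. The pattern "^read\\." is only an illustrative choice; help.search() is left as a comment because it opens a results viewer.

```r
# Names on the search path that start with "read."
hits <- apropos("^read\\.")
head(hits)
"read.table" %in% hits

# Searching the installed help pages by topic (opens a results viewer):
# help.search("linear model")
# ??"linear model"        # shortcut for help.search()
```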

1.4 Current Path


It is convenient to set the current path to the workspace directory where you will place all your data. Create a
directory where you will store all data for this lab (e.g. ~/r-files/ in a UNIX environment or My Documents\r-files
in Windows). To set the path you can use the command setwd(), as in setwd("~/r-files/"),
or use File -> Change Dir... in the Windows GUI.
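A minimal sketch of the same steps from within R follows. The directory is created under tempdir() only so that the example runs anywhere; the name lab_dir is an assumption made for illustration, and in the lab you would use your own r-files directory.

```r
# Hypothetical lab directory inside the session's temporary folder
lab_dir <- file.path(tempdir(), "r-files")
dir.create(lab_dir, showWarnings = FALSE)

old_dir <- setwd(lab_dir)   # setwd() invisibly returns the previous directory
getwd()                     # now points to the lab directory
setwd(old_dir)              # go back to where we started
```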

1.5 R Session workspace


An assignment is done with the <- operator. To assign a value to a variable a, just execute
a <- pi
sqrt(2) -> b
a

## [1] 3.141593
b

## [1] 1.414214
After the assignments, list the objects that live in memory with the objects() function:
objects()
Then you can remove objects from your session with the command rm():
rm(myfunc,a)
objects()

## [1] "b"

1.6 Saving and restoring the workspace


It is easy to save variables into an R data file (.RData), for instance,
xx <- runif(20)
yy <- list(a = 1, b = TRUE, c = "oops")
save(xx, yy, file = "xxyy.Rdata")

To save the complete workspace you can use the save.image() function:


save.image(file="backup.RData")
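Restoring is done with load(). The sketch below saves two objects, removes them and reads them back; the file is written to tempdir() as an assumption, so that nothing is left in your working directory.

```r
xx <- runif(20)
yy <- list(a = 1, b = TRUE, c = "oops")

rdata_file <- file.path(tempdir(), "xxyy.RData")
save(xx, yy, file = rdata_file)

rm(xx, yy)                    # the objects are gone from the session
restored <- load(rdata_file)  # load() returns the names of the restored objects
restored
length(xx)
```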

1.7 Quitting R, batch processing


You can quit the R session with the q() command.

Also, R can be called for batch processing from the shell or command line as follows [2]:
$ R [options] [< infile] [> outfile]
where the commands to process are found in infile and the output is logged to outfile.

2. Simple types
2.1 Vectors
As in many other languages, the simplest structure is a numeric vector, defined with the help of the concatenate
c() function,
x <- c( 2.3, 3, 5, 7, 7.5, 7, 5, 4)
y <- c(x,3,x)

Note that the following code has the same effect:


c( 2.3, 3, 5, 7, 7.5, 7, 5, 4) -> x

Simple arithmetic is as in other languages: +, -, *, /, ^, as well as typical functions such as log, exp, sin, cos, tan, sqrt.
A vector with regular spacing can be generated with the seq() function:
sq <-seq( 5, -2, by = -0.5)

Note that arguments in R are matched mainly by name and, in the absence of a name, by position, so the following
code would produce the same effect,
sq <- seq( by = -0.5, to = -2,
from = 5)
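The following sketch checks both statements above: arithmetic works element by element on a whole vector, and the two seq() calls give the same result. The names sq_by_order and sq_by_name are introduced here only for the comparison.

```r
x <- c(2.3, 3, 5, 7, 7.5, 7, 5, 4)
x * 2                 # every element is doubled
sqrt(x)               # functions are applied element by element too

sq_by_order <- seq(5, -2, by = -0.5)
sq_by_name  <- seq(by = -0.5, to = -2, from = 5)
identical(sq_by_order, sq_by_name)
length(sq_by_order)   # 15 values, from 5 down to -2
```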

2.2 Logical Vectors


Test the effect of comparing a vector with a number:
big <- x > 5
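A short sketch of what the comparison returns, using the vector x from section 2.1. The result has one logical value per element, and logical values can be counted because TRUE behaves as 1 in arithmetic.

```r
x <- c(2.3, 3, 5, 7, 7.5, 7, 5, 4)
big <- x > 5
big          # FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
sum(big)     # number of elements above 5
any(big)     # is at least one element above 5?
all(big)     # are all elements above 5?
```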

2.3 Data types


R functions behave differently depending on the class of the input object. For instance, the c() function also concatenates
character strings, and the function summary() will then behave differently,
y <- c(2, -3, 3, -1, 3, 5, 6, 12)
cv <- c("X1", "Y2", "X1", "Y1", "X2", "Y3", "X2", "Y3")
summary(cv)

##    Length     Class      Mode
##         8 character character
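To see the contrast, the sketch below calls class() and summary() on the numeric vector y as well. The vector mixed is a hypothetical example showing that c() converts everything to a common type.

```r
y  <- c(2, -3, 3, -1, 3, 5, 6, 12)
cv <- c("X1", "Y2", "X1", "Y1", "X2", "Y3", "X2", "Y3")

class(y)      # "numeric"
class(cv)     # "character"
summary(y)    # numeric input: minimum, quartiles, mean and maximum

mixed <- c(1, "a", TRUE)   # hypothetical mixed input
class(mixed)               # everything becomes character
```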

2.4 Indexing
Any object can be indexed, whether a vector, an array or a data frame, as you will see later on. The elements of any
vector are retrieved through the [ operator, as follows:
x[1]

## [1] 2.3

[2] This example is for the UNIX case (footnote to section 1.7).
x[3]

## [1] 5
x[c(3,5)]

## [1] 5.0 7.5


x[big]

## [1] 7.0 7.5 7.0


x[which(big)]

## [1] 7.0 7.5 7.0


(x+10)[5]

## [1] 17.5
Note that R also supports indexing by name, which helps with metadata management, for example:
geneexp <- c(1.3, -4, 3,4, 0.3)
geneexp

## [1] 1.3 -4.0 3.0 4.0 0.3


names(geneexp) <- c("ACP1","COX2","F7","F11","ABG1")
geneexp

## ACP1 COX2   F7  F11 ABG1
##  1.3 -4.0  3.0  4.0  0.3
coag_factors <- geneexp[c("F7","F11")]
coag_factors

## F7 F11
## 3 4
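A few more indexing forms are sketched below on the vector x from section 2.1: negative indices, ranges, combined conditions and replacement. The copy x2 is introduced only so that x itself is left unchanged.

```r
x <- c(2.3, 3, 5, 7, 7.5, 7, 5, 4)

x[-1]                # everything except the first element
x[2:4]               # a range of positions
x[x > 5 & x < 7.5]   # elements that satisfy both conditions

x2 <- x
x2[x2 > 7] <- 7      # indexing on the left-hand side replaces the selected elements
x2
```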

2.5. Factors
Factors are a special and useful type, as they are the natural form for representing categorical data. Any
vector can be converted to a factor with the as.factor() function, or created with the factor() function,
cvf <- as.factor(cv)
print(cvf)

## [1] X1 Y2 X1 Y1 X2 Y3 X2 Y3
## Levels: X1 X2 Y1 Y2 Y3
summary(cvf)

## X1 X2 Y1 Y2 Y3
## 2 2 1 1 2
Factors can be very useful from a practical perspective. For example, assume you have measured the FVII
concentration in blood of individuals with different diseases, and a vector holds the disease of each individual:
disease <- c("Hemophilia A", "Thrombocytopenia", "Ehlers-Danlos syndrome",
"Hemophilia", "Ehlers-Danlos syndrome", "Thrombocytopenia", "Vasculitis",
"Vasculitis", "Vasculitis", "Thrombocytopenia", "Myeloproliferative",
"Thrombocytopenia", "Myeloproliferative", "Hemophilia", "Thrombocytopenia",

"Ehlers-Danlos syndrome",
"Thrombocytopenia", "Thrombocytopenia", "Thrombocytopenia", "Vasculitis",
"Hemophilia A", "Ehlers-Danlos syndrome", "Thrombocytopenia", "Myeloproliferative",
"Ehlers-Danlos syndrome", "Hemophilia B", "Hhemophilia A", "Vasculitis",
"Ehlers-Danlos syndrome", "Vasculitis")
diseasef <- factor(disease)
diseasef

##  [1] Hemophilia A           Thrombocytopenia       Ehlers-Danlos syndrome
##  [4] Hemophilia             Ehlers-Danlos syndrome Thrombocytopenia
##  [7] Vasculitis             Vasculitis             Vasculitis
## [10] Thrombocytopenia       Myeloproliferative     Thrombocytopenia
## [13] Myeloproliferative     Hemophilia             Thrombocytopenia
## [16] Ehlers-Danlos syndrome Thrombocytopenia       Thrombocytopenia
## [19] Thrombocytopenia       Vasculitis             Hemophilia A
## [22] Ehlers-Danlos syndrome Thrombocytopenia       Myeloproliferative
## [25] Ehlers-Danlos syndrome Hemophilia B           Hhemophilia A
## [28] Vasculitis             Ehlers-Danlos syndrome Vasculitis
## 8 Levels: Ehlers-Danlos syndrome Hemophilia Hemophilia A ... Vasculitis
summary(diseasef)

## Ehlers-Danlos syndrome             Hemophilia           Hemophilia A
##                      6                      2                      2
##           Hemophilia B          Hhemophilia A     Myeloproliferative
##                      1                      1                      3
##       Thrombocytopenia             Vasculitis
##                      9                      6
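The summary above makes labelling problems visible, such as the level "Hhemophilia A". Assuming that this level is a misspelling of "Hemophilia A" (the lab text does not say so), a level can be renamed and merged as sketched below. The vector d is a small illustrative sample, not the lab data.

```r
# Small illustrative sample (not the full disease vector)
d <- factor(c("Hemophilia A", "Hhemophilia A", "Vasculitis", "Hemophilia A"))
levels(d)

# Assumption: "Hhemophilia A" is a misspelling of "Hemophilia A".
# Giving two levels the same name merges them.
levels(d)[levels(d) == "Hhemophilia A"] <- "Hemophilia A"
levels(d)
table(d)
```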

3. Not so simple types


3.1 Arrays
Arrays can easily be created by changing the dim attribute of a vector. Suppose that z is a random vector of 10
elements, which can be set through the following assignment,
z <- rnorm(10)
z

##  [1]  0.05753091 -0.86792728 -0.53033853  1.72361002 -2.18436755  0.01387335
##  [7]  0.32248995 -0.37494762 -0.95641824  1.17342998
dim(z) <- c(5,2)
z

##             [,1]        [,2]
## [1,]  0.05753091  0.01387335
## [2,] -0.86792728  0.32248995
## [3,] -0.53033853 -0.37494762
## [4,]  1.72361002 -0.95641824
## [5,] -2.18436755  1.17342998
Arrays filled with a constant (zeros here) can be created with the array() function (e.g. array(0,c(5,2))). As with vectors,
elements of arrays can be accessed through the [] operator:
z[3,2]

## [1] -0.3749476

z[,2]

## [1] 0.01387335 0.32248995 -0.37494762 -0.95641824 1.17342998


z[1,]

## [1] 0.05753091 0.01387335


Arrays can be extended and joined to other arrays through the cbind() and rbind() functions, for column
and row binding, respectively. The transpose of an array is obtained through the t() function.
tz <- t(z)
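A small sketch of array(), cbind(), rbind() and t() working together. The matrices m, m2 and m3 are illustrative names and values, not part of the lab data.

```r
m  <- array(0, c(2, 2))        # 2 x 2 array of zeros
m2 <- cbind(m, c(1, 2))        # add a column: 2 x 3
m3 <- rbind(m2, c(7, 8, 9))    # add a row:    3 x 3
dim(m3)
m3
t(m3)[3, ]                     # third row of the transpose = third column of m3
```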

3.3 Matrix operators


A number of operators exist for matrix manipulation. For multiplying two matrices, the %*% operator is
used:
tz%*%z

## [,1] [,2]
## [1,] 8.780160 -4.291945
## [2,] -4.291945 2.536452
Other functions of interest for matrix-matrix, matrix-vector and single-matrix operations are diag(), crossprod(), solve(),
eigen(), svd(), etc.
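Some of these functions are sketched below on a small symmetric matrix. The matrix A and the vector b are hypothetical values chosen for illustration.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # symmetric 2 x 2 matrix
b <- c(1, 2)

x_sol <- solve(A, b)                   # solves A %*% x = b
A %*% x_sol                            # reproduces b, up to rounding

all.equal(crossprod(A), t(A) %*% A)    # crossprod(A) computes t(A) %*% A
diag(2)                                # 2 x 2 identity matrix
eigen(A)$values                        # eigenvalues, largest first
```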

3.4 Lists and Data Frames


A list is an important R object consisting of an ordered collection of objects known as components. There
is no particular need for the components to be of the same type, so a list can hold different classes of objects,
like cell arrays in MATLAB. The components are accessed through the [[]] operator, by index number or by name, or through the
$ operator, by name.
l <- list(name="FVII",type="Coagulation Factor", range=c(100,100000))
l[[2]]

## [1] "Coagulation Factor"


l$type

## [1] "Coagulation Factor"


l[[3]][2]

## [1] 1e+05
ty <- "type"
l[[ty]]

## [1] "Coagulation Factor"


You can print a more human readable output for complex objects with help of the str() function:
str(l)

## List of 3
## $ name : chr "FVII"
## $ type : chr "Coagulation Factor"
## $ range: num [1:2] 1e+02 1e+05

Lists can be concatenated with the c() function.
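As a minimal sketch of list concatenation, the list from the example above is split into two parts and joined again. The component units is a hypothetical addition, shown only to illustrate adding a component by name.

```r
l1 <- list(name = "FVII", type = "Coagulation Factor")
l2 <- list(range = c(100, 100000))

l12 <- c(l1, l2)        # one list with three components
names(l12)

l12$units <- "a.u."     # hypothetical extra component, added by name
length(l12)
```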
A data frame is a special list of class “data.frame”, which can mainly be regarded as a matrix with columns
possibly of different modes (or types) and attributes. It can be displayed in matrix form, and its rows and
columns extracted using matrix indexing as seen in section 3.1.
A data frame can be created by importing a csv file, or with the data.frame() function.
Let’s define a variable containing the FVII levels (in some arbitrary units),
fvii <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
59, 46, 58, 43)

Then we can create a data frame from the factor variable containing the disease types and the FVII vector
containing the level measured for the individual with each disease:
blood <- data.frame(Disease=diseasef, FVII=fvii)
# Use head() or tail() to display the first/last rows
head(blood)

##                  Disease FVII
## 1           Hemophilia A   60
## 2       Thrombocytopenia   49
## 3 Ehlers-Danlos syndrome   40
## 4             Hemophilia   61
## 5 Ehlers-Danlos syndrome   64
## 6       Thrombocytopenia   60
The head() function displays just the first six rows; see also the tail() function. Remember that the
columns of a data frame are retrieved through the $ operator.
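The access forms mentioned above are sketched here on a tiny data frame. The data frame small and its values are made up for illustration and are not the blood data.

```r
# Tiny illustrative data frame (hypothetical values)
small <- data.frame(Disease = factor(c("Vasculitis", "Hemophilia", "Vasculitis")),
                    FVII = c(59, 61, 54))

small$FVII                          # a column, by name
small[2, ]                          # second row, matrix-style indexing
small[small$FVII > 55, "Disease"]   # diseases of the rows with FVII above 55
mean(small$FVII[small$Disease == "Vasculitis"])
```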

4. I/O
An important part of R deals with reading data from files. The easiest way to read a file is through the
read.table() function. This function reads a file in table format (for example a .csv file) and creates a data frame in the R session. Other
functions in the same family are read.csv() and read.delim().
Be advised that reading very large data files with read.table() is resource-expensive and non-optimal. For
large files, lower-level primitives such as scan() should be used.
There are tools for interactively editing objects like data frames. This is useful for making small changes to
the data in the R session.
An object editor can be invoked with the edit() or fix() functions, e.g. blood <- edit(blood),
which is equivalent to executing fix(blood).
An example using read.table() is presented below:
myDF <- read.table('iris.csv') # read file 'iris.csv' from the current working directory
str(myDF) # describe the structure of myDF

## 'data.frame': 151 obs. of 1 variable:
## $ V1: chr "sepal_length,sepal_width,petal_length,petal_width,class" "5.1,3.5,1.4,0.2,Iris-setosa" "
myDF <- read.table('iris.csv', sep = ',') #read file by setting the field separator
#character as ','
str(myDF)

## 'data.frame': 151 obs. of 5 variables:

## $ V1: chr "sepal_length" "5.1" "4.9" "4.7" ...
## $ V2: chr "sepal_width" "3.5" "3.0" "3.2" ...
## $ V3: chr "petal_length" "1.4" "1.4" "1.3" ...
## $ V4: chr "petal_width" "0.2" "0.2" "0.2" ...
## $ V5: chr "class" "Iris-setosa" "Iris-setosa" "Iris-setosa" ...
myDF <- read.table('iris.csv', sep = ',', head = TRUE) # read file by setting the field
# separator character as ',' and header as TRUE
str(myDF)

## 'data.frame': 150 obs. of 5 variables:
## $ sepal_length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ sepal_width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ petal_length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ petal_width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ class : chr "Iris-setosa" "Iris-setosa" "Iris-setosa" "Iris-setosa" ...
Note that the resulting data frame depends on how you set the function parameters. For example, the
separator character sep in this file is a comma. Leaving this parameter unspecified results in a data frame
with only one variable (one column), because the default separator is white space. Likewise, data files
that have a header line should be read by setting the parameter header to TRUE (R accepts the abbreviation head
through partial argument matching). Otherwise, the header will be read as the first data row,
and R will automatically set the names of the variables to ‘V1’, ‘V2’, etc. Besides, the type of
the variables ‘V1’, ‘V2’, etc. can be wrongly inferred, as in this example (e.g. the first variable is numeric,
but it is read as character when the separator or the header is not well specified).
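The effect of these parameters can be reproduced without the iris.csv file. The sketch below writes a tiny comma-separated file to a temporary location (the file content is made up for illustration) and reads it with and without header = TRUE. It assumes R 4.0 or later, where text columns are read as character rather than factor.

```r
# Write a tiny comma-separated file to a temporary location (hypothetical data)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("length,width,class",
             "5.1,3.5,setosa",
             "4.9,3.0,setosa"), csv_file)

no_header   <- read.table(csv_file, sep = ",")                  # header line read as data
with_header <- read.table(csv_file, sep = ",", header = TRUE)   # header line read as names

dim(no_header)               # 3 rows: the header became a data row
sapply(no_header, class)     # every column is character
dim(with_header)             # 2 rows
sapply(with_header, class)   # numeric, numeric, character

same <- read.csv(csv_file)   # read.csv() defaults to sep = "," and header = TRUE
names(same)
```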

5. Control statements
5.1. Conditional
The language has available a conditional construction of the form
if ( 1==2 ) { print("yes") } else { print("no") }

## [1] "no"
The conditional expression must evaluate to a single logical value. Comparison operators are
>=, >, <, <=, == and !=. The logical operators && and || combine single logical values, whereas the & and | operators apply element-wise to vectors.
There is a vectorized version of the if/else construct, the ifelse() function. This has the form
ifelse(condition, a, b) and returns a vector of the same length as condition, with elements a[i] if
condition[i] is true, otherwise b[i].
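The difference between the element-wise and the single-value operators, and the use of ifelse(), can be sketched as follows. The vector v and the labels "big" and "small" are illustrative choices.

```r
v <- c(2.3, 3, 5, 7, 7.5)

v > 3 & v < 7.5                        # element-wise: one result per element
size <- ifelse(v > 5, "big", "small")  # vectorized if/else
size

if (length(v) > 3 && all(v > 0)) {     # && combines two single logical values
  print("long positive vector")
}
```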

5.2 Loops
Loops are quite similar to those of other programming languages. There is a for loop construction, which has the
form
for ( ic in c("joan","helena","maria") ) { print(ic) }

## [1] "joan"
## [1] "helena"
## [1] "maria"
Explicit for loops are found much less often than in compiled languages, as R provides some compact forms for
object iteration, like apply(), sweep(), mapply(), tapply(), and others.
There are also other looping constructs, such as repeat and while:

k<-3; while ( k ) { print(k <- k-1) }

## [1] 2
## [1] 1
## [1] 0
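For completeness, here is a small sketch of repeat, which loops until break is called, and of next, which skips to the following iteration. The counters k and total are illustrative.

```r
k <- 0
repeat {
  k <- k + 1
  if (k >= 3) break        # repeat has no condition of its own: leave with break
}
k

total <- 0
for (i in seq_len(10)) {
  if (i %% 2 == 0) next    # skip even numbers
  total <- total + i
}
total                      # 1 + 3 + 5 + 7 + 9
```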

6. Defining functions
As hinted before, R allows the user to create functions; this is a way to extend the functionality of R towards
our interests. The definition of a simple function is shown below:
FunctionName <- function(x,y) { x+y }
FunctionName(3,4)

## [1] 7
Another example is the following function, that implements the following expression,

$$f(k) = \sum_{x=1}^{k-1} x$$

through the following code,


SerialSum <- function(k)
{
  out <- 0
  while (k)
    out <- out + ( k <- k-1 )
  out
}
SerialSum(3)

## [1] 3
SerialSum(5)

## [1] 10

Important!! Note that the return value of the function is the result of the last expression in the function.
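The same sum has the closed form k(k-1)/2, which gives a quick way to check the results above. The function name SerialSumClosed is introduced here for illustration; it also shows an explicit return(), which is optional in R.

```r
SerialSumClosed <- function(k) {
  # closed form of 1 + 2 + ... + (k - 1)
  return(k * (k - 1) / 2)
}

SerialSumClosed(3)
SerialSumClosed(5)
sum(seq_len(5 - 1))   # vectorized alternative for k = 5
```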

Once a function is defined, it is easy to check its definition by just typing the name of the function, e.g.
SerialSum

## function(k)
## {
## out <- 0
## while (k)
## out <- out + ( k <- k-1 )
## out
## }
## <bytecode: 0x000002b6f3ab6af8>
There is also the possibility to define binary operators through the following syntax:
"%!%" <- function(x, y) { ... }
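As a concrete sketch of this syntax, the operator below joins two strings with a space. The operator name %+% and its behaviour are hypothetical choices made for this illustration.

```r
# Hypothetical operator, defined only for illustration
"%+%" <- function(a, b) paste(a, b)

greeting <- "Hello" %+% "World"
greeting
```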

6.1. apply() family functions and friends
In R there is a family of iterators known as the apply() family. This set of functions does most of the
work when an iterator is needed, avoiding in most cases the use of a for loop.
Once a function is defined, it is really easy to iterate it over a vector, such as:
myfun <- function(x) sqrt(x)*x
sapply(1:10, myfun)

##  [1]  1.000000  2.828427  5.196152  8.000000 11.180340 14.696938 18.520259
##  [8] 22.627417 27.000000 31.622777
In this case, sapply() applies your function to each element of the vector given as first argument. It also
simplifies the output, so it is returned as a vector.
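The simplification step is easier to see next to lapply(), which always returns a list, and vapply(), where the type of each result is declared. The sketch below uses a shorter input, 1:3, and the result names are illustrative.

```r
myfun <- function(x) sqrt(x) * x

res_list <- lapply(1:3, myfun)               # always a list
res_vec  <- sapply(1:3, myfun)               # simplified to a numeric vector
res_safe <- vapply(1:3, myfun, numeric(1))   # like sapply(), with a declared result type

str(res_list)
res_vec
```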
The apply() function iterates a function over the first or second margin (rows or columns) of an array or data frame in an
easy way. Here follows an example with the iris dataset.
data(iris)
head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
apply(iris[,-5],2,sd)

## Sepal.Length  Sepal.Width Petal.Length  Petal.Width
##    0.8280661    0.4358663    1.7652982    0.7622377
apply(iris[1:10,-5],1,summary)

##            1     2     3     4    5     6     7     8     9   10
## Min.    0.20 0.200 0.200 0.200 0.20 0.400 0.300 0.200 0.200 0.10
## 1st Qu. 1.10 1.100 1.025 1.175 1.10 1.375 1.125 1.175 1.100 1.15
## Median  2.45 2.200 2.250 2.300 2.50 2.800 2.400 2.450 2.150 2.30
## Mean    2.55 2.375 2.350 2.350 2.55 2.850 2.425 2.525 2.225 2.40
## 3rd Qu. 3.90 3.475 3.575 3.475 3.95 4.275 3.700 3.800 3.275 3.55
## Max.    5.10 4.900 4.700 4.600 5.00 5.400 4.600 5.000 4.400 4.90
• EX1. Please try to explain the function calls and the output generated.
A very useful function is split(), which retrieves the strata of a data frame given an
input factor. The output is a list with one stratum in each element, named after the levels of the factor. An
example with the iris dataset is:
iris.strata <- split(iris,iris$Species)
length(iris.strata)

## [1] 3
names(iris.strata)

## [1] "setosa" "versicolor" "virginica"


summary(iris.strata$versicolor)

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## Min. :4.900 Min. :2.000 Min. :3.00 Min. :1.000 setosa : 0
## 1st Qu.:5.600 1st Qu.:2.525 1st Qu.:4.00 1st Qu.:1.200 versicolor:50
## Median :5.900 Median :2.800 Median :4.35 Median :1.300 virginica : 0
## Mean :5.936 Mean :2.770 Mean :4.26 Mean :1.326
## 3rd Qu.:6.300 3rd Qu.:3.000 3rd Qu.:4.60 3rd Qu.:1.500
## Max. :7.000 Max. :3.400 Max. :5.10 Max. :1.800
The by() function extends the functionality of split() so that we can apply a function to each stratum:
by(iris[,1:2],iris$Species,summary)

## iris$Species: setosa
## Sepal.Length Sepal.Width
## Min. :4.300 Min. :2.300
## 1st Qu.:4.800 1st Qu.:3.200
## Median :5.000 Median :3.400
## Mean :5.006 Mean :3.428
## 3rd Qu.:5.200 3rd Qu.:3.675
## Max. :5.800 Max. :4.400
## ------------------------------------------------------------
## iris$Species: versicolor
## Sepal.Length Sepal.Width
## Min. :4.900 Min. :2.000
## 1st Qu.:5.600 1st Qu.:2.525
## Median :5.900 Median :2.800
## Mean :5.936 Mean :2.770
## 3rd Qu.:6.300 3rd Qu.:3.000
## Max. :7.000 Max. :3.400
## ------------------------------------------------------------
## iris$Species: virginica
## Sepal.Length Sepal.Width
## Min. :4.900 Min. :2.200
## 1st Qu.:6.225 1st Qu.:2.800
## Median :6.500 Median :3.000
## Mean :6.588 Mean :2.974
## 3rd Qu.:6.900 3rd Qu.:3.175
## Max. :7.900 Max. :3.800
Retrieve the diseasef and fvii variables defined in sections 2.5 and 3.4. It is easy to calculate the mean concentration
in blood for each disease, e.g. using the function by() (please look at the help of the function).
by(fvii, diseasef, mean)

## diseasef: Ehlers-Danlos syndrome
## [1] 54
## ------------------------------------------------------------
## diseasef: Hemophilia
## [1] 61
## ------------------------------------------------------------
## diseasef: Hemophilia A
## [1] 54.5
## ------------------------------------------------------------
## diseasef: Hemophilia B
## [1] 46
## ------------------------------------------------------------

## diseasef: Hhemophilia A
## [1] 59
## ------------------------------------------------------------
## diseasef: Myeloproliferative
## [1] 58
## ------------------------------------------------------------
## diseasef: Thrombocytopenia
## [1] 53.22222
## ------------------------------------------------------------
## diseasef: Vasculitis
## [1] 54.83333
• Ex2: Compute the same values through the tapply() function.
• Ex3: Can you explain the function of tapply()?
A powerful function is the aggregate() function, which accepts the R formula interface for easy computations:
aggregate( . ~ Species, iris, mean)

##      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     setosa        5.006       3.428        1.462       0.246
## 2 versicolor        5.936       2.770        4.260       1.326
## 3  virginica        6.588       2.974        5.552       2.026
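In the formula above, the dot stands for all the other variables. As a sketch, the left-hand side can also name the response variables explicitly; the object names agg_one and agg_two are illustrative.

```r
data(iris)

# one response variable instead of "."
agg_one <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
agg_one

# two response variables, bound together with cbind()
agg_two <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris, FUN = mean)
agg_two
```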

7. Graphical Output
The most frequently used plotting function is the plot() function. This is a generic function that will behave
differently depending on the type or mode of the object in the first argument.
Any graphical output will be diverted to the current graphics device. A graphics device can be just a window
on the X11 system or Windows system, or a file such as a pdf. Please take a while to look at help(Devices).
Standard devices in a GNU/Linux system are X11, jpeg, png, pdf, pictex, xfig, bitmap and postscript.
If the argument is numeric, like the fvii variable, its values are plotted against their index. So with this code,
plot(fvii,col=diseasef,pch=16)
[Figure 1: Plot of the FVII expression for all diseases. x axis: Index (0 to 30); y axis: fvii (40 to 70).]

we obtain figure 1.
Other standard plotting functions include hist(), dotchart(), image(), contour() and persp(), together with
some lower-level primitives like points(), lines(), text(), abline(), polygon(), legend(), and
others. See also the help of the plot() and par() functions.
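A minimal sketch that combines a file device with high-level and low-level plotting functions is given below. The data vals and grp are hypothetical, and the pdf is written to tempdir() so that the example leaves no file in your working directory.

```r
# Hypothetical measurements and groups, for illustration only
vals <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69)
grp  <- factor(rep(c("A", "B"), each = 5))

pdf_file <- file.path(tempdir(), "plot-sketch.pdf")
pdf(pdf_file)                              # open a pdf graphics device
plot(vals, col = grp, pch = 16, xlab = "Index", ylab = "value")
abline(h = mean(vals), lty = 2)            # low-level primitive: dashed line at the mean
legend("topleft", legend = levels(grp), col = seq_along(levels(grp)), pch = 16)
hist(vals, main = "Histogram of vals")     # second page of the same file
dev.off()                                  # close the device so the file is written

file.exists(pdf_file)
```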
By default, graphics produced with the built-in capabilities of R are not interactive; however, additional packages (see
section 8) can be installed and activated for interactive and dynamic graphics. One of these packages is the
GGobi package by Swayne, Cook and Buja, which can be found online at http://www.ggobi.org. These plotting
libraries can be accessed from R via a package named rggobi, described at http://www.ggobi.org/rggobi.
A nice and easy addition is the playwith package [3].

8. Packages
Packages are sets of functions, data and documentation for specific purposes. To check which packages are
installed at your site, issue the following command:
library()
To know which packages are currently loaded, write
search()

## [1] ".GlobalEnv"        "package:stats"     "package:graphics"
## [4] "package:grDevices" "package:utils"     "package:datasets"
## [7] "package:methods"   "Autoloads"         "package:base"
There are hundreds of contributed packages for R, which are written by different authors. Each package
is usually specialized in a particular method, data or problem. A complete list of standard packages in R
and their description can be found online on the CRAN website, at http://cran.r-project.org. A particular
set of packages specialized in bioinformatics is named Bioconductor, and is found online at
http://www.bioconductor.org.
To install a package in Windows, follow the menu in R: Packages -> Install Packages,
choose a CRAN mirror and then the package. If you have a local package (in the form of a
zip file) use Packages -> Install Packages from local zip file. In RStudio follow the menu
Tools -> Install Packages, choose a mirror and search for the package you are looking for in the
Packages box. From the command line, use install.packages("packagename").
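A common pattern, sketched below, is to install a package only when it is missing and then load it. The package name "stats" is used as a stand-in because it ships with R, so nothing is downloaded when the sketch runs; replace it with the package you need.

```r
pkg <- "stats"   # stand-in: a package that ships with R

if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)              # only reached when the package is missing
}
library(pkg, character.only = TRUE)  # character.only: pkg holds the name as a string

pkg_loaded <- paste0("package:", pkg) %in% search()
pkg_loaded
```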

9. Non-guided work with datasets


Please check the output of the data() function. Search for its help and explain its functionality. Also, practice
downloading other libraries. One example is the package mlbench which is a collection of artificial and
real-world machine learning benchmark problems, including, e.g., several data sets from the UCI repository.
You can learn more about the mlbench library on the mlbench CRAN page.
If not installed, you can install this library as follows:
install.packages("mlbench",repos = "http://cran.us.r-project.org")

You can load the library as follows:


# load the library
library(mlbench)

## Warning: package 'mlbench' was built under R version 4.3.3


To see a list of the datasets available in this library, you can type:
# list the contents of the library
library(help = "mlbench")

[3] If the playwith package does not compile because of a GTK2 problem, install the libgtk2.0-dev package through sudo apt-get install libgtk2.0-dev (footnote to section 7).

Search in both libraries, the pre-installed package datasets and the package mlbench. Choose a dataset and:
• Describe the size and type of the dataset, including the number of variables and observations.
• Try to describe the meaning of each variable and its type.
• Try to obtain some statistical description of the dataset.
• Practice reporting your results both numerically and graphically.
• Make sure you get ready for the upcoming questionnaire!
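The first steps of the list above could be sketched as follows. The airquality dataset from the pre-installed datasets package is used here only as an example; choose your own dataset for the actual work.

```r
data(airquality)               # from the pre-installed 'datasets' package

dim(airquality)                # number of observations and variables
str(airquality)                # type of each variable
summary(airquality$Temp)       # numerical description of one variable
colSums(is.na(airquality))     # missing values per variable
```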
