MLlab5th
R’s base data structures can be organized by their dimensionality (1D, 2D, or nD)
and by whether they are homogeneous (all elements must be of the same type) or
heterogeneous (the elements may be of different types). This gives rise to the five
data structures most frequently used in data analysis. The following table
gives a clear overview of these data structures (1).
     Homogeneous   Heterogeneous
1D   Vector        List
2D   Matrix        Dataframe
nD   Array
Vector:
A vector is an ordered collection of values of a basic data type, with a given length.
The key point is that all the elements of a vector must be of the same data type, i.e.
vectors are homogeneous data structures. If values of different types are combined, R
coerces them to a common type. Vectors are one-dimensional data structures.
> X <- c(1, 3, 5, 7, 8)
> typeof(X)
[1] "double"
> length(X)
[1] 5
> x <- c(1, 5.4, TRUE, "hello")   # mixed types are coerced to character
> typeof(x)
[1] "character"
> length(x)
[1] 4
> x
[1] "1"     "5.4"   "TRUE"  "hello"
> X
[1] 1 3 5 7 8
> x <- c(0, 2, 4, 6, 8, 10)
> x[-1]   # access all except the 1st element
[1]  2  4  6  8 10
Accessing elements using logical vector as index:
When we use a logical vector for indexing, the elements at the positions where the
logical vector is TRUE are returned. This useful feature helps us filter a vector, as
shown below.
> x[c(TRUE)]   # a single TRUE is recycled, so every element is returned
[1]  0  2  4  6  8 10
[1]  6  8 10
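To make the filtering idea concrete, here is a minimal sketch. It assumes the vector holds the values 0, 2, 4, 6, 8, 10 seen in the output above; the condition x > 5 and the names keep, big and every_other are illustrative choices, not taken from the original lab sheet.

```r
# Assumed vector, matching the values printed in the transcript
x <- c(0, 2, 4, 6, 8, 10)

# A logical vector of the same length selects the positions marked TRUE
keep <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
x[keep]

# A comparison builds such a logical vector automatically
big <- x[x > 5]
big

# A shorter logical vector is recycled: c(TRUE, FALSE) keeps every other element
every_other <- x[c(TRUE, FALSE)]
every_other
```

Writing the condition inside the brackets is usually preferred over a hand-written TRUE/FALSE vector, because it keeps working when the length of the vector changes.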
Modifying vectors
We can modify a vector using the assignment operator. We can use the techniques
discussed above to access specific elements and modify them. If we want to
truncate the vector, we can reassign it to a subset of itself.
> x <- c(-3, -2, -1, 0, 1, 2)
> x
[1] -3 -2 -1  0  1  2
> x[2] <- 0
> x
[1] -3  0 -1  0  1  2
> x[x < 0] <- 5   # set elements less than 0 to 5
> x
[1] 5 0 5 0 1 2
> x <- x[1:4]     # truncate x to the first 4 elements
> x
[1] 5 0 5 0
Lists:
A list is a generic object consisting of an ordered collection of objects. Lists are
heterogeneous data structures. They are also one-dimensional data structures. A list
can contain vectors, matrices, character strings, functions, and so on.
Creating Lists
A list can be created using the list() function.
> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
> str(x)
List of 3
 $ a: num 2.5
 $ b: logi TRUE
 $ c: int [1:3] 1 2 3
[[1]]
[1] 2.5
[[2]]
[1] TRUE
[[3]]
[1] 1 2 3
[[2]]
[[3]]
[1] 4
[[2]]
[,1] [,2] [,3]
[1,] 1 3 -1
[2,] 2 4 9
[[3]]
[[3]][[1]]
[1] "Red"
[[3]][[2]]
[1] 12.3
$Matrix
$Misc
$Misc[[1]]
[1] "Red"
$Misc[[2]]
[1] 12.3
$Misc[[1]]
[1] "Red"
$Misc[[2]]
[1] 12.3
> print(data_list[1])
$Monat
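As a sketch of how list elements can be accessed, the example below builds a small named list and retrieves elements with single brackets, double brackets, and the $ operator. The list name info and its contents are hypothetical and chosen only for illustration.

```r
# Hypothetical list; names and values are illustrative only
info <- list(Numbers = c(1, 2, 3),
             Matrix  = matrix(1:4, nrow = 2),
             Misc    = list("Red", 12.3))

sub_list <- info["Misc"]     # single brackets return a list of length 1
element  <- info[["Misc"]]   # double brackets return the element itself
same     <- info$Misc        # $ is shorthand for [[ ]] with a name

colour <- info$Misc[[1]]     # drill into the nested list
colour
```

The distinction matters in practice: info["Misc"] is still a list, so it prints with the $Misc prefix, while info[["Misc"]] is the inner object and can be used directly.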
Merging Lists
> num_list <- list(1,2,3,4,5)
> day_list <- list("Mon","Tue","Wed", "Thurs", "Fri")
> merge_list <- c(num_list, day_list)
> merge_list
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
[[6]]
[1] "Mon"
[[7]]
[1] "Tue"
[[8]]
[1] "Wed"
[[9]]
[1] "Thurs"
[[10]]
[1] "Fri"
Matrices:
A matrix is a rectangular arrangement of numbers in rows and columns. In a matrix,
rows run horizontally and columns run vertically. Matrices are two-dimensional,
homogeneous data structures.
Now, let’s see how to create a matrix in R. To create a matrix, use the function
matrix(). Its first argument is the vector of elements; you then pass the number of
rows (nrow) and the number of columns (ncol) you want. An important point to
remember is that, by default, matrices are filled in column-wise order.
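The column-wise default can be seen in a short sketch. The matrix names A and B and the values 1 to 9 are illustrative assumptions; matrix(), nrow, ncol and byrow are standard base R.

```r
# Elements are filled column by column by default
A <- matrix(1:9, nrow = 3, ncol = 3)
A

# byrow = TRUE fills row by row instead
B <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
B

dim(A)    # number of rows and columns
A[2, 3]   # row 2, column 3 of the column-wise matrix
B[2, 3]   # row 2, column 3 of the row-wise matrix
```

With the same input vector, the two matrices are transposes of each other, which is a quick way to check which fill order was used.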
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> A[c(3,2),]   # leaving the column field blank selects all columns
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    2    5    8
> A[c(TRUE,FALSE,TRUE),c(TRUE,TRUE,FALSE)]
     [,1] [,2]
[1,]    1    4
[2,]    3    6
> A[A>5]
[1] 6 7 8 9
> x
  A B C
X 1 4 7
Y 2 5 8
Z 3 6 9
> x[,"A"]
X Y Z
1 2 3
> x[TRUE,c("A","C")]
  A C
X 1 7
Y 2 8
Z 3 9
> x[2:3,c("A","C")]
  A C
Y 2 8
Z 3 9
Dataframes:
Dataframes are generic data objects in R that are used to store tabular data.
Dataframes are the most popular data objects in R programming because tabular
data is a familiar way to view data. They are two-dimensional, heterogeneous data
structures. A dataframe is a list of vectors of equal length (5).
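The transcript in this section prints a dataframe x whose definition is not shown. The sketch below is one way such a dataframe might be created; the column values are taken from that printed output, and stringsAsFactors = TRUE is an assumption made because Name is displayed as a factor.

```r
# Reconstruction under assumptions: values copied from the printed output,
# stringsAsFactors = TRUE assumed so that Name becomes a factor
x <- data.frame(SN = 1:2,
                Age = c(21, 15),
                Name = c("John", "Dora"),
                stringsAsFactors = TRUE)

is.list(x)         # a dataframe is a list of columns
sapply(x, class)   # each column keeps its own type
dim(x)             # number of rows and columns
```

Because each column is a separate vector, the columns can differ in type, while all values inside one column share a single type.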
> ncol(df)
[1] 3
> nrow(df)
[1] 3
> str(x)
'data.frame': 2 obs. of 3 variables:
 $ SN  : int 1 2
 $ Age : num 21 15
 $ Name: Factor w/ 2 levels "Dora","John": 2 1
> x$Name
[1] John Dora
> x[["Name"]]
[1] John Dora
> x[[3]]
[1] John Dora
> trees <- data.frame("Girth" = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11, 11,
11.1, 11.2), "Height" = c(70, 65, 63, 72, 81, 83, 66, 75, 80, 75), "Volume"
= c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2, 22.6, 19.9))
> str(trees)
'data.frame': 10 obs. of 3 variables:
$ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2
$ Height: num 70 65 63 72 81 83 66 75 80 75
$ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9
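> head(trees,n=3)
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
> trees[2:5,]
  Girth Height Volume
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
5  10.7     81   18.8
> trees[trees$Height > 80,]
  Girth Height Volume
5  10.7     81   18.8
6  10.8     83   19.7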
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x[1,"Age"] <- 20
> x
  SN Age Name
1  1  20 John
2  2  15 Dora
Arrays:
Arrays are R data objects that can store data in more than two dimensions.
Arrays are n-dimensional data structures. For example, if we create an array of
dimensions (2, 3, 3), it creates 3 rectangular matrices, each with 2 rows and 3
columns. They are homogeneous data structures.
Now, let’s see how to create arrays in R. To create an array, use the function
array(). Its arguments are the vector of elements and a vector containing the
dimensions of the array (dim).
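A minimal sketch of the (2, 3, 3) example described above, followed by a named array. The variable names arr, named, layer2 and value are illustrative, and the construction call for the named array is an assumption; only array(), dim and dimnames are standard base R.

```r
# 2 rows, 3 columns, 3 layers: three 2 x 3 matrices stacked together
arr <- array(1:18, dim = c(2, 3, 3))
dim(arr)

# Optional dimnames make slices easier to read (labels are assumed)
named <- array(1:27, dim = c(3, 3, 3),
               dimnames = list(c("r1", "r2", "r3"),
                               c("c1", "c2", "c3"),
                               c("m1", "m2", "m3")))

layer2 <- arr[, , 2]                 # the second 2 x 3 matrix
value  <- named["r2", "c1", "m3"]    # a single element selected by name
value
```

Leaving an index blank, as in arr[, , 2], selects everything along that dimension, in the same way as for matrices.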
, , 2
     [,1] [,2]
[1,]    5    7
[2,]    6    8
,,2
,,3
, , m1
   c1 c2 c3
r1  1  4  7
r2  2  5  8
r3  3  6  9
, , m2
   c1 c2 c3
r1 10 13 16
r2 11 14 17
r3 12 15 18
, , m3
   c1 c2 c3
r1 19 22 25
r2 20 23 26
r3 21 24 27
[1] 20
> named_array[c(1,2),c(2,3),c(1,2)]
#first and second row, second and third column of first and second matrices
, , m1
[[2]]
[1] "CSE"
[[3]]
[1] 316129510013
[[4]]
function (x) .Primitive("sin")
Matrix:
> M=matrix(c(1,2,3,4,5,6,7,8,9,10),nrow=2,ncol=5,byrow=TRUE)
> print(M)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
> v1=c(1,2,3)
> v2=c(4,5,6,7,8,9)
> arr=array(c(v1,v2),dim=c(3,3,2))
> print(arr)
, , 1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
, , 2
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
Dataframes:
> costs=data.frame(
+ name=c("carrot","apple","sugar"),
+ costPerKG=c(50.00,60.00,39.50),
+ QuantityAvailableinKGs=c(10,5,50))
> print(costs)
    name costPerKG QuantityAvailableinKGs
1 carrot      50.0                     10
2  apple      60.0                      5
3  sugar      39.5                     50