Statistical Models Using R
R supports not only branching and looping but also modular programming via
functions. To boost efficiency, R can be integrated with procedures written in
C, C++, .NET, Python, and Fortran.
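As a minimal sketch of how branching, looping, and functions fit together in R, the example below defines a small function that labels each number as even or odd. The function name `classify_parity` and its input are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: label each number in a vector as "even" or "odd".
classify_parity <- function(values) {
  labels <- character(length(values))   # pre-allocate the result vector
  for (i in seq_along(values)) {        # looping over positions
    if (values[i] %% 2 == 0) {          # branching on the remainder
      labels[i] <- "even"
    } else {
      labels[i] <- "odd"
    }
  }
  labels                                # last expression is the return value
}

result <- classify_parity(c(1, 2, 3, 4))
print(result)
```

In everyday R code the same result is usually obtained without an explicit loop, for example with `ifelse(values %% 2 == 0, "even", "odd")`. The loop is kept here only to show the control structures.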
In the modern world, data analysts, statisticians, and marketers frequently use
R as a tool for accessing, cleaning, and presenting data.
Why use R?
Open Source: R is an open-source language, meaning it's freely available for anyone
to use, modify, and distribute.
Cross-Language Integration: R integrates readily with other languages
and tools.
Community Support: The R community is large and active. This makes it easier for
users to find help and learn from others.
Applications of R
1. Statistical Analysis: R is widely used for statistical analysis in fields such as
academia, pharmaceuticals, finance, and social sciences. It offers a rich set of
statistical tools for hypothesis testing, regression analysis, time series analysis,
and more.
2. Data Visualization: R is a popular choice for data visualization due to
packages like ggplot2, which enable users to create a wide variety of
high-quality plots and charts for exploring and presenting data.
3. Machine Learning: R provides several packages for machine learning, such as
caret, mlr, and TensorFlow, making it suitable for tasks like classification,
regression, clustering, and dimensionality reduction.
4. Data Mining: R can be used for data mining tasks such as association rule
mining, cluster analysis, and anomaly detection. Packages like arules and
cluster facilitate these tasks.
5. Bioinformatics: R is extensively used in bioinformatics for analyzing and
visualizing biological data, including DNA sequencing data.
6. Social Network Analysis: R offers packages like igraph and network for
analyzing and visualizing social networks and complex networks.
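To make the statistical analysis use case concrete, here is a small sketch using only base R and the built-in `mtcars` dataset. The choice of dataset and of the variables `mpg`, `wt`, and `am` is an assumption made for illustration, not something taken from the text above.

```r
# mtcars ships with base R (datasets package), so no extra packages are needed.

# Regression analysis: model fuel economy (mpg) as a linear function of weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)
slope <- coef(fit)[["wt"]]        # estimated change in mpg per 1000 lb of weight

# Hypothesis testing: Welch two-sample t-test comparing mpg between
# automatic (am = 0) and manual (am = 1) transmissions.
test <- t.test(mpg ~ am, data = mtcars)

print(slope)
print(test$p.value)
```

Calling `summary(fit)` prints the full regression table, including standard errors and R-squared.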
Data Types in R
Basic Data Types   Values                                     Examples
Numeric            Set of all real numbers                    numeric_value <- 3.14
Character          "a", "b", "c", ..., "@", "#", "$", ...,    character_value <- "Hello Geeks"
                   "1", "2", ... etc.
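The two basic types can be inspected with `class()`. The short sketch below reuses the variable names from the examples above. The variables `quoted_one` and `converted` are additions made here to show that a digit in quotes is a character value until it is converted.

```r
numeric_value <- 3.14
character_value <- "Hello Geeks"

numeric_class <- class(numeric_value)       # "numeric"
character_class <- class(character_value)   # "character"

# A digit wrapped in quotes is a character value, not a number.
quoted_one <- "1"
is_number <- is.numeric(quoted_one)         # FALSE

# as.numeric() converts it so that arithmetic works.
converted <- as.numeric(quoted_one) + 1     # 2

print(numeric_class)
print(character_class)
print(is_number)
print(converted)
```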
R's base data structures are often organized by their dimensionality (1D, 2D, or
nD) and by whether they are homogeneous (all elements must be of the same
type) or heterogeneous (elements may be of different types). This gives
rise to the three data structures most frequently used in data analysis:
• Vectors
• Dataframes
• Matrices
Vectors
Example:
X = c(1, 3, 5, 7, 8)
print(X)
Output:
[1] 1 3 5 7 8
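A vector is one-dimensional and homogeneous. The sketch below reuses the vector `X` from the example and shows 1-based indexing, element-wise arithmetic, and what happens when values of different types are combined. The variable names other than `X` are assumptions for illustration.

```r
X <- c(1, 3, 5, 7, 8)

n <- length(X)        # number of elements: 5
second <- X[2]        # indexing starts at 1, so this is 3
doubled <- X * 2      # arithmetic is applied element by element

# Mixing types in c() coerces every element to a common type.
# Here the number and the logical value both become character strings.
mixed <- c(1, "a", TRUE)
mixed_class <- class(mixed)

print(doubled)
print(mixed)
print(mixed_class)
```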
Dataframes
Dataframes are generic R data objects used to store tabular data. They are
among the most popular data objects in R programming because tabular data is
familiar and easy to read. They are two-dimensional, heterogeneous data
structures: lists of vectors of equal length.
• A dataframe must have column names, and every row must have a
unique name.
• Each column must have the same number of items.
• Each item in a single column must be of the same data type.
• Different columns may have different data types.
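The rules above can be checked on a small dataframe. The sketch below builds one from three vectors of equal length. The object name `people` and its contents are made-up illustration data, not taken from the original example.

```r
# Hypothetical illustration data: three columns of equal length,
# each column with its own data type.
people <- data.frame(
  name   = c("Asha", "Ben", "Chen"),   # character column
  age    = c(28, 35, 41),              # numeric column
  member = c(TRUE, FALSE, TRUE),       # logical column
  stringsAsFactors = FALSE             # keep strings as character in older R versions
)

col_types <- sapply(people, class)     # one type per column
row_ids <- rownames(people)            # default row names "1", "2", "3" are unique

print(people)
print(col_types)
print(dim(people))                     # rows, then columns
```

Passing vectors of different lengths to `data.frame()` raises an error unless the shorter ones can be recycled exactly, which is how R enforces the equal-length rule.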
Example:
print(df)
Output:
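Matrices
Example:
A = matrix(
  # the nine values to place in the matrix
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3, ncol = 3,
  # fill the matrix in row-wise order
  byrow = TRUE
)
print(A)
Output:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9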
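The effect of the `byrow` argument is easiest to see by building the same matrix both ways. This is a small sketch with variable names (`by_row`, `by_col`) chosen here for illustration.

```r
values <- 1:9

# byrow = TRUE fills the first row, then the second, and so on.
by_row <- matrix(values, nrow = 3, ncol = 3, byrow = TRUE)

# The default (byrow = FALSE) fills column by column instead.
by_col <- matrix(values, nrow = 3, ncol = 3)

first_row_by_row <- by_row[1, ]   # 1 2 3
first_row_by_col <- by_col[1, ]   # 1 4 7

# Filling by row gives the transpose of filling by column.
same_as_transpose <- all(by_row == t(by_col))

print(by_row)
print(by_col)
print(same_as_transpose)
```

Unlike a dataframe, a matrix is homogeneous: every element shares one type.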