BDA LAB Programs
ii) Pseudo-Distributed Mode
iii) Fully-Distributed Mode
Prerequisites
Supported Platforms
Windows is also a supported platform, but the following steps are for Linux only. To set up
Hadoop on Windows, see the Hadoop wiki page.
Required Software
ssh must be installed and sshd must be running to use the Hadoop scripts that manage
remote Hadoop daemons.
Installing Software
If your cluster doesn't have the requisite software you will need to install it.
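On Ubuntu Linux, for example, the required packages can be installed with the commands below (as in the standard Hadoop single-node setup guide):
$ sudo apt-get install ssh
$ sudo apt-get install rsync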
Download
To get a Hadoop distribution, download a recent stable release from one of the Apache
Download Mirrors.
Unpack the downloaded Hadoop distribution. In the distribution, edit the file
etc/hadoop/hadoop-env.sh to define some parameters as follows:
export JAVA_HOME=/usr/java/latest
$ bin/hadoop
This will display the usage documentation for the hadoop script. Now you are ready to start
your Hadoop cluster in one of the three supported modes:
Local (Standalone) Mode
Pseudo-Distributed Mode
Fully-Distributed Mode
The following example copies the unpacked conf directory to use as input and then finds
and displays every match of the given regular expression. Output is written to the given
output directory.
$ mkdir input
$ cp etc/hadoop/*.xml input
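The commands that actually run this standalone example are not shown above; following the standard Hadoop single-node guide they are roughly as follows (the examples jar name must match your Hadoop version):
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
$ cat output/*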
3. Installing SSH
Configuring SSH
# First log in as hduser (and from now on use only the hduser account for the further steps)
$ sudo su hduser
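The key-generation commands are not shown in this excerpt; the usual sequence for passwordless SSH to localhost (an assumption based on the standard single-node setup) is:
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost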
4. Disabling IPv6
Since Hadoop does not work over IPv6, we should disable it. Another reason is that Hadoop
has been developed and tested only on IPv4 stacks, so Hadoop nodes can communicate only
over an IPv4 network. (Once you have disabled IPv6 on your machine, you need to reboot it
for the change to take effect; if you do not know how, run sudo reboot.)
To disable IPv6 on your Linux machine, update /etc/sysctl.conf by appending the following
lines at the end of the file. Open sysctl.conf with the following command:
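The editor command itself is not shown; with nano, for example (any of the editors mentioned in the tip below works), it would be:
$ sudo nano /etc/sysctl.conf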
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
Tip: You can use nano, gedit, or vi to update all the text files needed for this configuration.
First, download Apache Hadoop 2.6.0 (i.e. hadoop-2.6.0.tar.gz) or a later version from the
Apache download mirrors. You can also use the latest stable Hadoop release to get the newest
features as well as recent bug fixes. Choose the location where you want to place your
Hadoop installation; here we use /usr/local/hadoop.
$ cd Downloads
# Extract the Hadoop source; run the following commands (adjust the path to where Hadoop was downloaded)
$ cd /usr/local/
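The actual extraction commands are not shown in this excerpt; a typical sequence (assuming the archive sits in ~/Downloads and the hduser user and hadoop group created earlier) would be:
$ sudo tar xvzf ~/Downloads/hadoop-2.6.0.tar.gz
$ sudo mv hadoop-2.6.0 hadoop
$ sudo chown -R hduser:hadoop /usr/local/hadoop
The environment variables below are then typically appended to hduser's ~/.bashrc and applied with source ~/.bashrc: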
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
b. Update /usr/local/hadoop/etc/hadoop/hadoop-env.sh and set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
c. move to /usr/local/hadoop/etc/hadoop
$ cd /usr/local/hadoop/etc/hadoop
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
7. Format Namenode
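The format command itself is not shown here; for this setup it is typically:
$ hdfs namenode -format
Start HDFS daemons: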
$ start-dfs.sh
Start MapReduce daemons:
$ start-yarn.sh
Instead of both of the above commands you can also use start-all.sh, but it is now deprecated,
so it is not recommended for Hadoop operations.
9. Track/Monitor/Verify
$ jps
If you wish to track Hadoop MapReduce as well as HDFS, you can try exploring the Hadoop
web views of the ResourceManager and the NameNode, which are commonly used by Hadoop
administrators. Open your default browser and visit the following links.
For ResourceManager – http://localhost:8088
For NameNode – http://localhost:50070
If jps lists the NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
daemons, then congratulations! You have successfully installed Apache Hadoop on your
Ubuntu machine. Happy Hadooping!
iii) Fully-Distributed Mode
Prerequisites
Install and configure single-node Hadoop on the machine that will serve as our master node.
Step 3A: Decide the hostnames of the nodes to be configured in the further steps. We will
name the master node HadoopMaster and the two slave nodes HadoopSlave1 and HadoopSlave2,
and record them in the /etc/hosts file. After deciding the hostname of each node, assign the
names by updating the hostnames (you can skip this step if you do not want to set up names).
Add all host names to the /etc/hosts file on every machine (master and slave nodes).
192.168.2.14 HadoopMaster
192.168.2.15 HadoopSlave1
192.168.2.16 HadoopSlave2
Step 3B: Create the hadoop group and the hduser user on all machines (if not already created).
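The commands themselves are not shown here; the usual ones (an assumption based on the single-node setup) are:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser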
Step 3C: Install rsync for sharing the Hadoop source with the rest of the machines:
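The install command is not shown in the excerpt; on Ubuntu it would be:
$ sudo apt-get install rsync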
Step 3D: To make the above changes take effect, we need to reboot all of the machines.
$ sudo reboot
Changes:
1. Update core-site.xml
## Paste these lines into the <configuration> tag, or just update the existing entry by replacing localhost with HadoopMaster
<property>
<name>fs.default.name</name>
<value>hdfs://HadoopMaster:9000</value>
</property>
2. Update hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
## Paste/Update these lines into <configuration> tag
3. Update yarn-site.xml
Update this file by changing the hostname from localhost to HadoopMaster in the following
three properties:
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>HadoopMaster:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>HadoopMaster:8035</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>HadoopMaster:8050</value>
</property>
## Paste/Update these lines into <configuration> tag
4. Update Mapred-site.xml
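The property change for mapred-site.xml is not included in this excerpt; multi-node guides of this kind usually point the job tracker at the master node, for example (an assumption, not confirmed by the original):
<property>
<name>mapreduce.jobtracker.address</name>
<value>HadoopMaster:54311</value>
</property>
## Paste/Update these lines into <configuration> tag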
5. Update masters
HadoopMaster
## Add name of master nodes
6. Update slaves
HadoopSlave1
HadoopSlave2
## Add name of slave nodes
# On the HadoopSlave1 machine (repeat on HadoopSlave2)
$ sudo mkdir /usr/local/hadoop
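The remaining commands for sharing the Hadoop source are not shown; typically the new directory is handed over to hduser and the installation is pushed from the master with rsync (assumed commands):
$ sudo chown -R hduser:hadoop /usr/local/hadoop
# From the HadoopMaster machine
$ sudo rsync -avxP /usr/local/hadoop/ hduser@HadoopSlave1:/usr/local/hadoop/
$ sudo rsync -avxP /usr/local/hadoop/ hduser@HadoopSlave2:/usr/local/hadoop/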
☑ Applying Master node specific Hadoop configuration: (Only for master nodes)
These are configurations to be applied on the Hadoop master nodes (since we have only one
master node, they will be applied to just that node).
step a: Remove the existing Hadoop data folder (which was created during the single-node
Hadoop setup):
$ sudo rm -rf /usr/local/hadoop_tmp/
step b: Recreate the /usr/local/hadoop_tmp/hdfs directory and create the NameNode directory
(/usr/local/hadoop_tmp/hdfs/namenode):
$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
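The directory also has to be writable by hduser; a typical follow-up command (assumption) is:
$ sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/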
☑ Applying Slave node specific Hadoop configuration: (Only for slave nodes)
Since we have two slave nodes, we will apply the following changes on the HadoopSlave1 and
HadoopSlave2 nodes.
step a: Remove the existing Hadoop data folder (which was created during the single-node
Hadoop setup):
$ sudo rm -rf /usr/local/hadoop_tmp/hdfs/
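step b (not shown in this excerpt): recreate the DataNode directory on each slave, typically with the following assumed commands, mirroring step b on the master:
$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
$ sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/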
☑ Copying the SSH key to set up passwordless SSH access from the master to the slave nodes:
Run the following command to share the public SSH key ($HOME/.ssh/id_rsa.pub of the
HadoopMaster node) with the authorized_keys files ($HOME/.ssh/authorized_keys) of
hduser@HadoopSlave1 and hduser@HadoopSlave2.
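The command itself is not shown; with ssh-copy-id it would be (assumed):
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@HadoopSlave1
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@HadoopSlave2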
hduser@HadoopMaster:/usr/local/hadoop$ start-dfs.sh
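The YARN daemons are started the same way; this command is not shown in the excerpt but is needed before jps will list the ResourceManager and NodeManager:
hduser@HadoopMaster:/usr/local/hadoop$ start-yarn.sh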
hduser@HadoopMaster: jps
hduser@HadoopSlave1: jps
hduser@HadoopSlave2: jps
(The running services shown for HadoopSlave1 will be the same on all slave nodes configured
in the Hadoop cluster.)
If you wish to track Hadoop MapReduce as well as HDFS, you can also try exploring Hadoop web
view of ResourceManager and NameNode which are usually used by hadoop administrators. Open
your default browser and visit to the following links from any of the node.
If you are getting similar output on the master and slave nodes, then congratulations! You
have successfully installed Apache Hadoop on your cluster.
3. Implement the following file management tasks in Hadoop: adding files and directories, retrieving files, and deleting files.
Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and
copies them into HDFS using command line utilities such as the ones shown below.
$ hadoop version
> Result
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
$ start-dfs.sh
$ start-yarn.sh
$ jps
$ hadoop fs
The above command lists the available options of the Hadoop file system shell, such as -ls, -mkdir, -put, -get, -cat and -rm.
$ export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_101
$ export PATH=${JAVA_HOME}/bin:${PATH}
$ export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
Next, create an input directory in HDFS for the text files you want to store.
OR
If you want to create the directory with a single command, use the following.
For example, if your file wordfile.txt is in your home directory, load this file into the HDFS
directory /wordcount/input (the commands for all of these steps are sketched below).
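The HDFS commands themselves are not listed in this excerpt; a typical sequence for these file management tasks (the paths follow the example above) is:
$ hadoop fs -mkdir /wordcount
$ hadoop fs -mkdir /wordcount/input
# OR create the nested directories with a single command
$ hadoop fs -mkdir -p /wordcount/input
# add (load) the local file into HDFS
$ hadoop fs -put ~/wordfile.txt /wordcount/input
# list and retrieve files from HDFS
$ hadoop fs -ls /wordcount/input
$ hadoop fs -get /wordcount/input/wordfile.txt ~/wordfile_copy.txt
# delete a file from HDFS
$ hadoop fs -rm /wordcount/input/wordfile.txt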
Solution:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
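The commands to compile and run WordCount are not shown here; with HADOOP_CLASSPATH pointing at tools.jar as set earlier, a typical sequence (the HDFS paths are assumptions) is:
$ hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class
$ hadoop jar wc.jar WordCount /wordcount/input /wordcount/output
$ hdfs dfs -cat /wordcount/output/part-r-00000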
Input.txt(input file)
Apache > Hadoop > Apache Hadoop 3.0.0-alpha1 Wiki | git | Last Published: 2016-08-30 |
Version: 3.0.0-alpha1 General Overview Single Node Setup Cluster Setup commands
Reference FileSystem Shell Compatibility Interface Classification FileSystem Specification
Common CLI Mini Cluster Etc ……………
Output:
(jar/executable 1
(multi-terabyte 1
(see 2
(thousands 1
A 1
Architecture 2
Distributed 1
File 1
Guide) 1
Guide). 1
HDFS 1
Hadoop 3
MRAppMaster 1
MapReduce 4
Minimally, 1
NodeManager 1
ResourceManager 1
3. Write a Map Reduce program that mines weather data.
Solution:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Reducer;
public class MaxTemperature {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: MaxTemperature <input path> <output path>");
System.exit(-1);
}
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "max temperature");
job.setJarByClass(MaxTemperature.class);
job.setJobName("Max temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
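The Mapper and Reducer classes referenced by the driver are not included in this excerpt. A minimal sketch consistent with the sample input and output below (an assumption about how the records are parsed, not the original code) could be written as two top-level classes in the same package:

// MaxTemperatureMapper.java / MaxTemperatureReducer.java
// Assumed record format from the sample input: yyyy,[+/-]TTQ where TT is the
// temperature value and the trailing digit Q is treated as a quality flag and ignored.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();               // e.g. "1950,+941"
        String year = line.substring(0, 4);           // year used as the output key
        int sign = (line.charAt(5) == '-') ? -1 : 1;  // sign of the reading
        int airTemperature = sign * Integer.parseInt(line.substring(6, 8));
        context.write(new Text(year), new IntWritable(airTemperature));
    }
}

class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());  // keep the largest reading per year
        }
        context.write(key, new IntWritable(maxValue));
    }
}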
Input file:
1940,+761
1940,+341
1940,-041
1940,+221
1940,+481
1940,+921
1940,+981
1940,+861
1950,-241
1950,+521
1950,+041
1950,+041
1950,+941
1950,+761
1950,+101
1950,+101
1950,+401
1950,+041
1955,+841
1955,+621
1955,-021
1955,+761
1955,+261
1955,+981
1955,+941
1955,+301
1955,+961
1955,+721
1955,+721
1955,+881
1955,+981
1955,-061
1955,+361
1955,+581
1955,+941
Output:
1940 98
1950 94
1955 98
4a. Implement Linear Regression using R
library(datasets)
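The data frame InputData is never created in this excerpt; given that column 4 is renamed to Life.Exp and column 6 to HS.Grad below, it is presumably the built-in state.x77 dataset (an assumption, not the original code):
InputData <- as.data.frame(state.x77)   # assumed definition of InputData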
colnames(InputData)[4] = "Life.Exp"
colnames(InputData)[6] = "HS.Grad"
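The models fit1 and fit2 summarised below, and the Density variable used later in fit3, are also not defined in this excerpt; a plausible reconstruction (assumptions only) is:
# Population density, used in fit3 below (assumption)
InputData$Density <- InputData$Population * 1000 / InputData$Area
# Full model on all predictors, and a reduced model dropping Area (assumption)
fit1 <- lm(Life.Exp ~ ., data = InputData)
fit2 <- lm(Life.Exp ~ . - Area, data = InputData)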
summary(fit1)
#It appears higher populations are related to increased life expectancy, but otherwise
#we're not seeing much. Another kind of summary of the model can be obtained like this.
summary(fit2)
anova(fit1, fit2)
#As you can see, removing "Area" had no significant effect on the model (p = .4205).
#Compare the p-value to that for "Area" in the first summary table above.
fit3 <- lm(formula = Life.Exp ~ Population + Murder + HS.Grad + Frost + Density, data = InputData)
summary(fit3)
fit4 <- lm(formula = Life.Exp ~ Population + Murder + HS.Grad + Frost , data = InputData)
summary(fit4)
confint(fit4)
par(mfrow=c(2,2))
plot(fit1, 1)
plot(fit2, 1)
plot(fit3, 1)
plot(fit4, 1)
par(mfrow=c(1,1))
names(fit4)  # list the components of the fitted model object
Output:
(Console output: the summary() tables for fit1–fit4 with their coefficients, residuals and
significance codes; the anova(fit1, fit2) comparison showing that dropping Area has no
significant effect (p = .4205); a residual standard error of 0.7174 on 44 degrees of freedom;
and the 2.5 % / 97.5 % confidence intervals from confint(fit4). The numeric tables themselves
are not preserved in this excerpt.)
4b. Implement Logistic Regression using R
# Loading packages
library(caTools)
library(ROCR)
# Splitting dataset
split <- sample.split(mtcars, SplitRatio = 0.8)
split
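The training and test sets train_reg and test_reg used below are not created in this excerpt; presumably (assumption):
# Build the training and test sets from the split vector (assumption)
train_reg <- subset(mtcars, split == TRUE)
test_reg <- subset(mtcars, split == FALSE)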
# Training model
logistic_model <- glm(vs ~ wt + disp,
data = train_reg,
family = "binomial")
logistic_model
# Summary
summary(logistic_model)
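The vector predict_reg that is thresholded below is not computed in this excerpt; it would normally hold the predicted probabilities on the test set (assumption):
# Predicted probabilities on the test data (assumption)
predict_reg <- predict(logistic_model, test_reg, type = "response")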
# Changing probabilities
predict_reg <- ifelse(predict_reg >0.5, 1, 0)
# ROC-AUC Curve
ROCPred <- prediction(predict_reg, test_reg$vs)
ROCPer <- performance(ROCPred, measure = "tpr",
x.measure = "fpr")
# Plotting curve
plot(ROCPer)
plot(ROCPer, colorize = TRUE,
print.cutoffs.at = seq(0.1, by = 0.1),
main = "ROC CURVE")
abline(a = 0, b = 1)
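The confusion matrix and the final 0.775 value in the output below are not produced by the code above; a plausible reconstruction (an assumption; the 0.775 may be either accuracy or an AUC value) is:
# Confusion matrix of actual class vs predicted class (assumption)
table(test_reg$vs, predict_reg)
# Classification accuracy (assumption)
missing_classerr <- mean(predict_reg != test_reg$vs)
print(1 - missing_classerr)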
Output:
[1] TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE TRUE TRUE
(Console output: the fitted coefficients of logistic_model for (Intercept), wt and disp; the
summary() table with deviance residuals, significance codes and AIC: 18.692; and the predicted
probabilities for the test cars, ending with Maserati Bora 5.087486e-03. The numeric tables
are omitted here.)
   predict_reg
    0 1
  0 4 1
  1 1 3
[1] 0.775
5. Implement SVM/ Decision tree techniques
m<-read.csv("C:/Users/pradeep/OneDrive/datasets/students_placement_data.csv")
head(m) # Check the first 6 rows.
## Roll.No Gender Section SSC.Percentage inter_Diploma_percentage
## 1 1 M A 87.30 65.3
## 2 2 F B 89.00 92.4
## 3 3 F A 67.00 68.0
## 4 4 M A 71.00 70.4
## 5 5 M A 67.00 65.5
## 6 6 M A 81.26 68.0
## B.Tech_percentage Backlogs registered_for_.Placement_Training
## 1 40.00 18 NO
## 2 71.45 0 yes
## 3 45.26 13 yes
## 4 36.47 17 yes
## 5 42.52 17 yes
## 6 62.20 6 yes
## placement.status
## 1 Not placed
## 2 Placed
## 3 Not placed
## 4 Not placed
## 5 Not placed
## 6 Not placed
str(m) # Check the structure of the dataset
## 'data.frame': 117 obs. of 9 variables:
## $ Roll.No : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Gender : Factor w/ 2 levels "F","M": 2 1 1 2 2 2 2 1 2 2 ...
## $ Section : Factor w/ 2 levels "A","B": 1 2 1 1 1 1 1 1 1 1 ...
## $ SSC.Percentage : num 87.3 89 67 71 67 ...
## $ inter_Diploma_percentage : num 65.3 92.4 68 70.4 65.5 68 56.5 79.3 89.6 75.5 ...
## $ B.Tech_percentage : num 40 71.5 45.3 36.5 42.5 ...
## $ Backlogs : int 18 0 13 17 17 6 20 3 10 8 ...
## $ registered_for_.Placement_Training: Factor w/ 2 levels "NO","yes": 1 2 2 2 2 2 2 1 2 2 ...
## $ placement.status : Factor w/ 2 levels "Not placed","Placed": 1 2 1 1 1 1 1 1 1 1 ...
3) Step 3: Divide the data (117 observations) into training data and test data.
# We use the sample function to partition the data: here about 85 percent is training data and 15 percent is test data.
# Note that since "replace = TRUE", we may have a row sampled more than once.
n <- nrow(m)  # number of observations (117)
data_index = sample(1:n, size = round(0.85*n), replace = TRUE)
train_data=m[data_index,]
test_data=m[-data_index,]
str(train_data)
## 'data.frame': 99 obs. of 9 variables:
## $ Roll.No : int 44 6 84 77 30 36 69 40 73 64 ...
## $ Gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 1 2 ...
## $ Section : Factor w/ 2 levels "A","B": 2 1 1 1 2 2 1 2 2 1 ...
## $ SSC.Percentage : num 86 81.3 89 78 72 ...
## $ inter_Diploma_percentage : num 92.5 68 88.9 59 88.1 90 61 88.8 83.7 69.2 ...
## $ B.Tech_percentage : num 70.8 62.2 63 51.1 69.6 ...
## $ Backlogs : int 0 6 1 17 0 0 6 0 0 20 ...
## $ registered_for_.Placement_Training: Factor w/ 2 levels "NO","yes": 2 2 1 1 2 2 1 2 2 2 ...
## $ placement.status : Factor w/ 2 levels "Not placed","Placed": 1 1 1 1 2 2 1 1 1 1 ...
str(test_data)
## 'data.frame': 49 obs. of 9 variables:
## $ Roll.No : int 1 4 7 8 11 12 14 15 17 18 ...
## $ Gender : Factor w/ 2 levels "F","M": 2 2 2 1 1 2 1 2 1 1 ...
## $ Section : Factor w/ 2 levels "A","B": 1 1 1 1 2 1 2 1 2 2 ...
## $ SSC.Percentage : num 87.3 71 71 84.8 82.3 ...
## $ inter_Diploma_percentage : num 65.3 70.4 56.5 79.3 76.3 66 88.7 52.2 85 95.1 ...
## $ B.Tech_percentage : num 40 36.5 33.8 61 71.5 ...
## $ Backlogs : int 18 17 20 3 0 16 0 7 0 0 ...
## $ registered_for_.Placement_Training: Factor w/ 2 levels "NO","yes": 1 2 2 1 1 2 2 2 2 2 ...
## $ placement.status : Factor w/ 2 levels "Not placed","Placed": 1 1 1 1 2 1 2 1 1 2 ...
library(rpart)       # recursive partitioning for decision trees
library(rpart.plot)  # plotting rpart trees
stu_model <- rpart(formula = placement.status ~
Backlogs + Gender + B.Tech_percentage + SSC.Percentage + inter_Diploma_percentage,
data = train_data, method = "class", parms = list(split = "gini"))
type=5 means we want to show the split variable name in the interior nodes.
extra=2 means we want to display the classification rate at the node, expressed as the number
of correct classifications and the number of observations in the node.
rpart.plot(stu_model,type=5,extra = 2 )
7) Apply the model stu_model on our test data using predict function.
p<-predict(stu_model,test_data,type="class")
print(p)
## 1 4 7 8 11 12
## Not placed Not placed Not placed Not placed Not placed Not placed
## 14 15 17 18 19 21
## Placed Not placed Placed Placed Not placed Placed
## 23 26 31 32 34 35
## Not placed Not placed Placed Not placed Not placed Not placed
## 37 41 42 43 45 55
## Not placed Placed Not placed Not placed Not placed Not placed
## 56 57 58 59 60 63
## Placed Not placed Not placed Placed Placed Not placed
## 65 66 68 71 74 75
## Placed Placed Placed Placed Not placed Not placed
## 76 85 87 88 89 93
## Not placed Not placed Not placed Not placed Not placed Not placed
## 100 101 102 105 106 114
## Not placed Not placed Not placed Not placed Not placed Not placed
## 116
## Placed
## Levels: Not placed Placed
t<-table(test_data[,9],p)
print(t)
## p
## Not placed Placed
## Not placed 29 2
## Placed 6 12
Note: The diagonal elements of the matrix t are the correct predictions.
print(sum(diag(t))/sum(t))
## [1] 0.8367347
library("cluster")
m<-read.csv("C:/Users/pradeep/OneDrive/datasets/hclustdata.csv")
head(m)
## Name Gender SSC.Perc.entage inter.Diploma.perc
## 1 ARIGELA AVINASH M 87.30 65.3
## 2 BALADARI KEERTHANA F 89.00 92.4
## 3 BAVIRISETTI PRAVALIKA F 67.00 68.0
## 4 BODDU SAI BABA M 71.00 70.4
## 5 BONDAPALLISRINIVAS M 67.00 65.5
## 6 CH KANAKARAJU M 81.26 68.0
## B.Tech.perc Back.logs
## 1 40.00 18
## 2 71.45 0
## 3 45.26 13
## 4 36.47 17
## 5 42.52 17
## 6 62.20 6
Step 3a: Apply agglomerative hierarchical clustering with single link (MIN technique); code sketched below, after the group-average example.
Step 3b: Apply agglomerative hierarchical clustering with complete link (MAX technique); code sketched below.
Step 3c: Apply agglomerative hierarchical clustering with group average
clust3<-agnes(x = m,stand = TRUE,metric = "euclidean",method = "average")
pltree(clust3)
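The code for steps 3a and 3b is not shown in this excerpt; by analogy with clust3 above, minimal sketches (assumed) would be:
# Step 3a: single link (MIN) clustering (assumption, mirrors clust3)
clust1 <- agnes(x = m, stand = TRUE, metric = "euclidean", method = "single")
pltree(clust1)
# Step 3b: complete link (MAX) clustering (assumption, mirrors clust3)
clust2 <- agnes(x = m, stand = TRUE, metric = "euclidean", method = "complete")
pltree(clust2)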
#Installing ggplot2
#ggplot2 is an R package dedicated to data visualization. It can greatly improve the quality and
#aesthetics of your graphics, and will make you much more efficient in creating them.
install.packages("ggplot2")
# load ggplot2
library(ggplot2)
library(hrbrthemes)
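The plotting code for this first chart is not included in the excerpt; a minimal ggplot2 + hrbrthemes sketch on a built-in dataset (an assumption, since the original data and chart type are not shown) could be:
# basic scatter plot of the built-in mpg data, themed with hrbrthemes (assumption)
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
theme_ipsum()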
Output:
7b) Box plot
# Load ggplot2
library(ggplot2)
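The box plot code itself is missing from this excerpt; a minimal sketch on a built-in dataset (assumption) is:
# basic box plot of highway mileage by car class (assumption)
ggplot(mpg, aes(x = class, y = hwy)) +
geom_boxplot()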
output:
7c) Bar Plot
# Load ggplot2
library(ggplot2)
# Create data
data <- data.frame(
name=c("A","B","C","D","E") ,
value=c(3,12,5,18,45)
)
# Barplot
ggplot(data, aes(x=name, y=value)) +
geom_bar(stat = "identity")
Output:
7d) Histogram:
# library
library(ggplot2)
# dataset:
data=data.frame(value=rnorm(100))
# basic histogram
p <- ggplot(data, aes(x=value)) +
geom_histogram()
p  # display the histogram
Output: