DV - Unit 2
Data visualization working with R graphics
Pie chart:
In R, a pie chart is created with the pie() function, which takes a vector of
non-negative numbers as input. Additional parameters control the labels,
colors, title and so on.
Syntax
pie(x, labels, radius, main, col, clockwise)
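Parameter description:
• x is a vector of non-negative numeric values; each value becomes one slice.
• labels gives the text label of each slice.
• radius sets the radius of the pie, which is drawn inside a box whose sides
range from -1 to 1.
• main is the title of the chart.
• col is the vector of colors used for the slices.
• clockwise is a logical value: TRUE draws the slices clockwise, FALSE
anticlockwise.
# example data (illustrative values; the notes do not give the original ones):
x<-c(21,62,10,53)
labels<-c("product","sales","advertise","mark")
# color vector, one color per slice:
colour<-c("pink","blue","yellow","white")
pie(x,labels,main="Departments",col=colour)
# legend function:
legend("bottomright",labels,cex=0.7,fill=colour)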
Bar chart:
• R uses the barplot() function to create bar charts. The bars can be drawn
vertically or horizontally, and each bar can be given a different color.
Syntax
barplot(H,xlab,ylab,main, names.arg,col)
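Here H is a vector or matrix of the numeric values used as bar heights, xlab
and ylab are the axis labels, main is the chart title, names.arg is a vector
of the names shown under each bar on the x-axis, and col sets the bar colors.
We then use the barplot() function to create a bar chart of the values: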
a<-c(15,30,45,60) # bar heights (y-axis)
b<-c("A","B","C","D") # names shown under each bar (x-axis), passed as names.arg
barplot(a,names.arg=b)
Histogram:
• A histogram is a type of bar chart that shows how many values fall into
each of a set of value ranges. It is used to show the distribution of a
numeric variable. In R it is created with the hist() function.
To specify the range of values shown on the x-axis and y-axis, we can use the
xlim and ylim parameters.
The breaks parameter controls the number of bars and therefore their width;
a single number is treated as a suggestion.
# a is the numeric vector c(15,30,45,60) from the bar chart example
hist(a,xlab="value",ylab="point",main="histogram",col="orange"
,border="green",xlim=c(0,60),ylim=c(0,5),breaks=3)
A histogram is an area diagram that shows the frequency of each class
interval. In the context of histograms, the class intervals (ranges of
values) are called bins or classes. Each bar shows the number of data points
that fall within one class, so the higher the frequency of a class, the
taller its bar.
Example of a histogram:
From the table below, which lists the heights of trees in a region, we will
draw a histogram to illustrate how it is done. Let us first look at the
frequency table.
Height of trees (ft)    No. of trees
60-65                    3
65-70                    3
70-75                    8
75-80                   10
80-85                    5
85-90                    2
Here the heights of the trees are continuous data, the class intervals are
the bins, and the number of trees in each interval is the frequency.
[Figure: histogram of the tree heights in the frequency table]
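The tree-height frequency table can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the original notes: it copies the class intervals and frequencies from the table, computes the total number of trees and the class with the highest frequency, and prints a rough text histogram. The function name text_histogram is made up for this illustration.

```python
# Class intervals (bins) and frequencies copied from the tree-height table.
classes = [(60, 65), (65, 70), (70, 75), (75, 80), (80, 85), (85, 90)]
frequencies = [3, 3, 8, 10, 5, 2]

# Total number of observations and the class with the tallest bar.
total = sum(frequencies)
modal_class = classes[frequencies.index(max(frequencies))]


def text_histogram(classes, frequencies):
    """Return one text line per bin; the bar length equals the frequency."""
    lines = []
    for (low, high), count in zip(classes, frequencies):
        lines.append(f"{low}-{high} | {'#' * count} ({count})")
    return lines


for line in text_histogram(classes, frequencies):
    print(line)
print("total trees:", total)
print("class with the highest frequency:", modal_class)
```

The longest row of # marks corresponds to the tallest bar of the histogram.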
Histograms vs Bar Charts
In a bar graph, each bar represents one value or category. In a histogram,
on the other hand, each bar represents a range (class interval) of
continuous data.
In a bar graph, the x-axis need not be numerical; it can also be a category.
In a histogram, however, the x-axis is always quantitative, continuous data.
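The difference can be seen in how the bar heights are computed. The sketch below is an illustration in Python with made-up data: for a bar chart the observations are counted per category, while for a histogram the continuous values are first assigned to intervals and then counted per interval. The variable names, the bin width of 5 and the starting value of 60 are assumptions chosen for this example.

```python
from collections import Counter

# Bar chart: one bar per category (hypothetical department names).
departments = ["sales", "product", "sales", "advertise", "sales", "product"]
category_counts = Counter(departments)

# Histogram: one bar per interval of a continuous variable (hypothetical heights).
heights = [61.5, 64.2, 66.0, 71.3, 72.8, 78.9]
bin_start = 60  # lower edge of the first interval
bin_width = 5   # every interval covers 5 units

# Map each value to the lower edge of its interval, then count per interval.
interval_counts = Counter(
    bin_start + bin_width * int((h - bin_start) // bin_width) for h in heights
)

print(category_counts)  # counts keyed by category name
print(interval_counts)  # counts keyed by the lower edge of each interval
```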
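Line chart:
A line chart connects a series of points with line segments. In R it is
drawn with the plot() function.
Syntax
plot(v,type,col,xlab,ylab)
Description of the parameters
• v is a vector of numeric values.
• type selects how the values are drawn, for example "p" for points only,
"l" for lines only and "o" for both points and lines.
• col sets the color of the points and lines.
• xlab and ylab are the labels of the x-axis and y-axis.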
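x<-c(12,14,16,18,15,29,23,34)
plot(x,type="p") # points only
plot(x,type="l") # lines only
plot(x,type="o") # points and lines overplotted
plot(x,type="b") # both points and lines, with gaps around the points
plot(x,type="c") # the lines of type "b" without the points
plot(x,type="h") # histogram-like vertical lines
plot(x,type="s") # stair steps (horizontal first, then vertical)
plot(x,type="S") # stair steps (vertical first, then horizontal)
plot(x,type="n") # no plotting (empty axes only)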
# plotting the chart (border is not a parameter of plot(), so it is dropped):
plot(x,type="o",xlab="points",ylab="value",
col="pink",main="Line chart")
Scatter plot
A scatter plot displays the relationship between two numerical variables by
plotting one dot for each observation.
Each point represents the values of two variables: one variable is plotted
on the horizontal axis and the other on the vertical axis.
Syntax
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
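Description of the parameters:
• x is the data plotted along the horizontal axis.
• y is the data plotted along the vertical axis.
• main is the title of the plot.
• xlab and ylab are the labels of the horizontal and vertical axes.
• xlim and ylim are the limits of the x and y values shown.
• axes indicates whether both axes should be drawn.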
# data is assumed to be a data frame that contains the columns Age and Usage
file<-data[,c("Age","Usage")]
head(file)
plot(x=file$Age,y=file$Usage,main="Scatter plot",xlab="Age",
ylab="Usage",col="pink")
(or)
# creating dataset for scatterplot:
data<-data.frame(weight=c(3,5,4,2,2,5),
mileage=c(15,30,45,60,75,80))
data
Output:
  weight mileage
1      3      15
2      5      30
3      4      45
4      2      60
5      2      75
6      5      80
# plotting the dataset (first column on the x-axis, second on the y-axis):
plot(data,xlab="weight",ylab="mileage",main="scatterplot",col="red")
Point symbols (pch values) commonly used in R:
pch = 0, square
pch = 1, circle
pch = 2, triangle point up
pch = 3, plus
pch = 4, cross
pch = 5, diamond
pch = 6, triangle point down
pch = 7, square cross
pch = 8, star
pch = 9, diamond plus
pch = 10, circle plus
pch = 11, triangles up and down
pch = 12, square plus
pch = 13, circle cross
pch = 14, square and triangle down
pch = 15, filled square
pch = 16, filled circle
pch = 17, filled triangle point up
pch = 18, filled diamond
pch = 19, solid circle
pch = 20, bullet (smaller circle)
pch = 21, filled circle (border color from col, fill color from bg)
pch = 22, filled square (border color from col, fill color from bg)
pch = 23, filled diamond (border color from col, fill color from bg)
pch = 24, filled triangle point up (border color from col, fill color from bg)
pch = 25, filled triangle point down (border color from col, fill color from bg)
# pch=2 (triangle point up):
plot(data,xlab="weight",ylab="mileage",main="scatterplot"
,col="red",pch=2)
# pch=18 (filled diamond):
plot(data,xlab="weight",ylab="mileage",main="scatterplot"
,col="red",pch=18)
# applying limits to the x and y axes:
plot(data,xlab="weight",ylab="mileage",main="scatterplot"
,col="red",xlim=c(3,5),ylim=c(30,60))
Scatterplot Matrices
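When we have more than two variables and want to see the correlation between
each variable and the remaining ones, we use a scatterplot matrix.
We use the pairs() function to create a matrix of scatterplots.
Syntax
pairs(formula, data)
Here formula lists the variables to be plotted in pairs, and data is the
data frame from which the variables are taken.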
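# creating the dataset (values taken from the output shown below):
input<-data.frame(weight=c(3,5,4,2,2,5),
mileage=c(15,30,45,60,75,80),
cyl=c(12,14,8,23,45,60),
km=c(20,30,45,35,40,48))
input
Output:
  weight mileage cyl km
1      3      15  12 20
2      5      30  14 30
3      4      45   8 45
4      2      60  23 35
5      2      75  45 40
6      5      80  60 48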
# pairs of variables in a scatterplot matrix:
pairs(~weight+mileage+cyl+km, data=input)
# making a line graph using the dataset:
plot(input$cyl,input$km,type="l",xlab="cycle",ylab="kilometer",
main="Graph", col="blue")
# making a bar chart using the dataset:
x<-input$cyl
y<-input$km
barplot(x, names.arg=y,xlab="first", ylab="second",
col="red",border="green",main="barplot")
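ggplot2 package:
# installing a package:
install.packages("<package-name>")
install.packages("ggplot2")
library("ggplot2")
# ggplot2 histogram:
# geom="histogram" bins the values; I("red") sets a fixed fill color.
# Note: qplot() is deprecated in recent ggplot2 versions in favour of ggplot().
qplot(input$mileage,geom="histogram",xlab="mileage",ylab="count",
fill=I("red"),main="ggplot graph")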
The Jupyter Notebook is an open-source web application that allows you
to create and share documents that contain live code, equations,
visualizations and narrative text.
Uses include data cleaning and transformation, numerical simulation,
statistical modeling, data visualization, machine learning, and much more.
Matplotlib:
Matplotlib is one of the most popular Python packages used for data
visualization. It is a cross-platform library for making 2D plots from data in
arrays.
matplotlib.pyplot is a collection of functions that make matplotlib work like
MATLAB. Each pyplot function makes some change to a figure: e.g., creates a
figure, creates a plotting area in a figure, plots some lines in a plotting
area, decorates the plot with labels, etc.
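The way pyplot functions act step by step on one figure can be sketched as follows. This is an illustrative example, not taken from the original notes; the data values are made up, and the "Agg" backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption: no display needed
import matplotlib.pyplot as plt

fig = plt.figure()                      # creates a figure
ax = plt.subplot(1, 1, 1)               # creates a plotting area in that figure
lines = plt.plot([1, 2, 3], [2, 4, 6])  # plots a line in the plotting area
plt.xlabel("x")                         # decorates the plot with labels
plt.ylabel("y")

# Every call above acted on the same "current" figure and plotting area.
same_figure = plt.gcf() is fig
same_axes = plt.gca() is ax
print(same_figure, same_axes)
```

plt.gcf() and plt.gca() return the current figure and the current plotting area, which is what every pyplot function works on.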
Creating plot using matplotlib:
import matplotlib.pyplot as plt
x=[12,15,18,20,23]
y=[21,24,26,28,30]
plt.plot(x,y)
plt.show()
Adding Title
The title() function in the matplotlib.pyplot module is used to specify the
title of the visualization.
import matplotlib.pyplot as plt
x=[12,15,18,20,23]
y=[21,24,26,28,30]
plt.plot(x,y)
plt.title("graph")
plt.show()
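Adding font size, color, labels, limits, ticks, legend and grid:
# x and y are the lists from the previous example
plt.plot(x,y)
# title with font size and color
plt.title("graph", fontsize=50,color="red")
# axis labels
plt.xlabel("x axis")
plt.ylabel("y axis")
# limiting the range of the y-axis
plt.ylim(24,28)
# setting label names on the x-axis ticks
plt.xticks(x,labels=["a","b","c","d","e"])
# legend
plt.legend(["ABC"])
# grid() draws grid lines along the chosen axis
plt.grid(axis="x")
plt.grid(axis="y")
plt.show()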
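The same decorations can also be applied through matplotlib's object-oriented interface, where the methods are called on an axes object instead of on plt. The sketch below is an illustrative equivalent, not part of the original notes; the font size of 20 and the "Agg" backend (chosen so that no display is needed) are assumptions of this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption: no display needed
import matplotlib.pyplot as plt

x = [12, 15, 18, 20, 23]
y = [21, 24, 26, 28, 30]

fig, ax = plt.subplots()                          # one figure with one plotting area
ax.plot(x, y, label="ABC")                        # label is picked up by the legend
ax.set_title("graph", fontsize=20, color="red")   # title with font size and color
ax.set_xlabel("x axis")                           # axis labels
ax.set_ylabel("y axis")
ax.set_ylim(24, 28)                               # limit the y-axis range
ax.set_xticks(x)                                  # tick positions on the x-axis
ax.set_xticklabels(["a", "b", "c", "d", "e"])     # names shown at those ticks
ax.legend()                                       # legend built from the line labels
ax.grid(axis="both")                              # grid lines along both axes
```

In a script with a display, plt.show() would follow; in a notebook the figure is rendered at the end of the cell.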
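Creating a bar plot
The matplotlib API in Python provides the bar() function, which can be used
either MATLAB-style through pyplot (plt.bar) or through the object-oriented
API (on an axes object). The syntax of the bar() function to be used with the
axes is as follows:
ax.bar(x, height, width, bottom, align)
Here x gives the bar positions (or category names) and height gives the bar
heights. After the bars are drawn, the axes are labelled and the chart is
shown:
plt.xlabel("Subject")
plt.ylabel("Duration")
plt.title("bar chart")
plt.show()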
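A complete bar chart needs data and a call to bar() before the labels are set. The following is a minimal sketch with made-up data: the subject names and durations are hypothetical, since the notes do not give the original values, and the "Agg" backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption: no display needed
import matplotlib.pyplot as plt

# Hypothetical data: subjects on the x-axis, study duration as bar height.
subjects = ["Maths", "Physics", "Chemistry", "Biology"]
durations = [4, 3, 5, 2]

fig, ax = plt.subplots()
bars = ax.bar(subjects, durations, width=0.6, color="steelblue")
ax.set_xlabel("Subject")
ax.set_ylabel("Duration")
ax.set_title("bar chart")

# Each bar is a rectangle whose height is the corresponding duration.
heights = [bar.get_height() for bar in bars]
print(heights)
```

The MATLAB-style equivalent is plt.bar(subjects, durations) followed by the plt.xlabel(), plt.ylabel() and plt.title() calls.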
Histogram:
To create a histogram, the first step is to define the bins: divide the whole
range of the values into a series of intervals. Then count the values that
fall into each of the intervals.
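These steps can be sketched in Python as follows. The example is illustrative only: the values are made up, the bin edges are an assumption of this sketch, and the "Agg" backend is used so that no display is needed. The values are first counted per interval by hand, and then the same bin edges are passed to matplotlib's hist() function, which does the counting itself and draws the bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption: no display needed
import matplotlib.pyplot as plt

# Hypothetical values and the edges of the intervals (bins).
values = [61, 63, 64, 66, 68, 71, 72, 74, 76, 79, 83, 88]
bin_edges = [60, 65, 70, 75, 80, 85, 90]

# Count the values per interval. Each interval includes its lower edge and
# excludes its upper edge, except the last one, which includes both.
counts = [0] * (len(bin_edges) - 1)
for v in values:
    for i in range(len(counts)):
        low, high = bin_edges[i], bin_edges[i + 1]
        is_last = i == len(counts) - 1
        if low <= v < high or (is_last and v == high):
            counts[i] += 1
            break

# hist() applies the same binning rule and returns the counts it plotted.
fig, ax = plt.subplots()
n, edges, patches = ax.hist(values, bins=bin_edges, color="orange", edgecolor="green")
ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.set_title("histogram")
print(counts)
```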