
Pds-clg

The document outlines the curriculum for a Python for Data Science course at Gujarat Technological University, detailing various Python data types such as Numbers, Boolean, Strings, Lists, Tuples, Sets, and Dictionaries. It also covers the range() function, different coding styles in Python (Functional, Imperative, Object-oriented, Procedural), and the role of Python in data science. Additionally, it includes examples and explanations of string operations, slicing, and the significance of Python's flexibility in data science applications.

GUJARAT TECHNOLOGICAL UNIVERSITY

Affiliated
S. N. PATEL INSTITUTE OF TECHNOLOGY & RESEARCH
CENTRE, UMRAKH

Subject Name : PYTHON FOR DATA SCIENCE


Subject Code : 3150713
Branch : Computer Science & Engineering
Semester : 5th
Subject Teacher : Prof. Viralkumar M. Prajapati, Prof. Gaurav V. Patel

CO 1 – QUESTIONS & ANSWERS

Q.1. Explain the Data Types available in Python.

ANS:
A variable can hold values of different types. For example, a person's name is stored as a
string, whereas their id is stored as an integer. Python provides several standard data types,
each of which defines how its values are stored and which operations apply to them.
The data types defined in Python are given below:
• Numbers
o Integer
o Float
o Complex Number
• Boolean
• Sequence Type
o String
o List
o Tuple
• Set
• Dictionary

Numbers
• The Number types store numeric values.
• Integer, float, and complex values all belong to Python's Numbers data type.
• Python provides the type() function to find the data type of a variable.
• Python creates a Number object when a number is assigned to a variable.
• Python supports three types of numeric data.
o Int - An integer can be of any length, such as 10, 2, 29, -20, -150, etc. Python
places no restriction on the length of an integer. Its type is int.
o Float - Float stores floating-point numbers like 1.9, 9.902, 15.2, etc. It is
accurate to about 15 significant decimal digits.
o Complex - A complex number is an ordered pair written as x + yj, where x
and y denote the real and imaginary parts, respectively. Examples are
2.14j, 2.0 + 2.3j, etc. (Note: the imaginary part in Python is
denoted with the j suffix.)
Example:
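A minimal sketch of the three numeric types, using type() as described above. The variable names and values are illustrative only.

```python
# Inspect the three numeric types with the built-in type() function.
a = 10           # int
b = 9.902        # float
c = 2.0 + 2.3j   # complex; the imaginary part carries the j suffix

for value in (a, b, c):
    print(value, type(value))

# Integers have arbitrary length, so very large values do not overflow.
big = 2 ** 100
print(big)

# The real and imaginary parts of a complex number are stored as floats.
print(c.real, c.imag)
```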

Boolean
• The Boolean type provides two built-in values, True and False. These values are used
to determine whether a given statement is true or false.
• It is denoted by bool.
• When converted with bool(), any non-zero number is treated as True and 0 is treated as
False. A non-empty string, even 'False', is treated as True.
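The truth-value rules can be checked directly with bool(). This is a small sketch; the sample values are arbitrary.

```python
# bool is the type of True and False.
print(type(True))

# Numbers: zero is False, every other number is True.
zero_is = bool(0)
seven_is = bool(7)
negative_is = bool(-1)
print(zero_is, seven_is, negative_is)

# Strings: only the empty string is False, so the text "False" is True.
empty_is = bool("")
text_false_is = bool("False")
print(empty_is, text_false_is)

# bool is a subclass of int, so True behaves like 1 in arithmetic.
print(True + True)
```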
Q.2. Is String a mutable data type? Also explain the string operations length, indexing and
slicing in detail with an appropriate example.
ANS:
No. String in Python is immutable.
• A string is an ordered sequence of characters, such as "darshan", 'college', "282",
etc.
• In Python 3, strings are sequences of Unicode characters.
• A string can be written in single, double, or triple quotes.
• A string in triple quotes can span multiple lines.
• Square brackets can be used to access elements of the string, e.g. "Darshan"[1] gives
'a'. Characters can also be accessed with a negative index counted from the end, e.g.
"Darshan"[-1] gives 'n'.
• Because a string is immutable, its characters cannot be changed in place.
x = "SNPatel"
String         =  S  N  P  a  t  e  l
Index          =  0  1  2  3  4  5  6
Negative index = -7 -6 -5 -4 -3 -2 -1
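The following sketch shows indexing from both ends and what happens when code tries to modify a string in place. The string value is only an example.

```python
x = "SNPatel"

# Index 0 is the first character; index -1 is the last one.
print(x[0], x[-1])

# Strings do not support item assignment, so this raises TypeError.
try:
    x[0] = "s"
except TypeError as err:
    print("TypeError:", err)

# To "change" a string, build a new one from pieces of the old one.
y = "s" + x[1:]
print(y)
```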

Length:
len() is a built-in function rather than a string method, and we can use it to get the length
of a string. The count includes spaces.
x = "S N Patel"
print(len(x))
Output:
9

String Indexing & Slicing:


• A string is a sequence of individual characters, so individual characters can be
extracted using the item access operator ([]).
• The access operator ([]) is more versatile than that: it can extract not just one
character but an entire slice (subsequence).
• We can get a substring in Python using string slicing. We specify the start
index, end index, and step (colon-separated) to slice the string.
• Syntax:

x = "our string"
subx1 = x[StartIndex]
subx2 = x[StartIndex:EndIndex]
subx3 = x[StartIndex:EndIndex:Steps]

• The StartIndex, EndIndex, and Steps values must all be integers. Slicing extracts the
substring from StartIndex up to EndIndex (not including EndIndex), with Steps
defining the increment of the index. If we leave out StartIndex and EndIndex and set
Steps to -1, we get the reversed string.
 Example:
x = 'We are the students of SNPITRC, Umrakh.'

subx1 = x[0:6]
subx2 = x[23:30]
subx3 = x[32:]
subx4 = x[:19]
subx5 = x[::2]
subx6 = x[::-1]
print(subx1)
print(subx2)
print(subx3)
print(subx4)
print(subx5)
print(subx6)
Output
We are
SNPITRC
Umrakh.
We are the students
W r h tdnso NIR,Urk.
.hkarmU ,CRTIPNS fo stneduts eht era eW

Q.3. Write a Python code for slicing to fetch first name and last name from full name of
person and display it.
ANS:

name = input("Enter your full name: ")

# split() breaks the name at each space; the unpacking below assumes exactly two words
fullname = name.split(" ")

firstname, lastname = fullname

print("Your First Name is", firstname)
print("Your Last Name is", lastname)
Output
Enter your full name: Viral Prajapti
Your First Name is Viral
Your Last Name is Prajapti
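Since the question asks for slicing, here is an alternative sketch that locates the spaces with find() and rfind() and then slices around them. It uses a fixed sample name instead of input(), and the variable names are illustrative.

```python
full_name = "Viral Prajapti"   # sample value taken from the output above

first_space = full_name.find(" ")    # index of the first space, or -1 if none
last_space = full_name.rfind(" ")    # index of the last space, or -1 if none

if first_space == -1:
    # Only one word was entered: treat it as the first name.
    first_name, last_name = full_name, ""
else:
    first_name = full_name[:first_space]      # everything before the first space
    last_name = full_name[last_space + 1:]    # everything after the last space

print("Your First Name is", first_name)
print("Your Last Name is", last_name)
```

Unlike the unpacking version, this also works when the name contains a middle name, because only the first and last words are sliced out.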

Q.4. Explain range() function with suitable examples.

ANS:
The Python range() function returns a sequence of numbers in a given range. Its most common
use is to iterate over a sequence of numbers in a Python loop. For example, here we
print the numbers from 0 to 4.

for i in range(5):
    print(i, end=" ")
print()
Output
0 1 2 3 4

Syntax of Python range() function: range(start, stop, step)


Parameter :
start: [ optional ] start value of the sequence
stop: next value after the end value of the sequence
step: [ optional ] integer value, denoting the difference between any two numbers in the sequence
Return : Returns an object that represents a sequence of numbers

In simple terms, range() lets the user generate a series of numbers within a given range.
Depending on how many arguments the user passes to the function, the user can decide where
that series of numbers begins and ends, as well as how big the difference is between one
number and the next. The range() function can be called in 3 ways.
• range(stop) takes one argument:
When the user calls range() with one argument, the user gets a series of numbers that
starts at 0 and includes every whole number up to, but not including, the number
provided as the stop.

for i in range(6):
    print(i, end=" ")
print()
Output
0 1 2 3 4 5

• range(start, stop) takes two arguments:

When the user calls range() with two arguments, the user decides not only where the
series of numbers stops but also where it starts, so the series doesn't have to start at 0
every time. Users can generate a series of numbers from X up to, but not including, Y
using range(X, Y).
for i in range(5, 20):
    print(i, end=" ")
Output
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

• range(start, stop, step) takes three arguments:

When the user calls range() with three arguments, the user can choose not only where the
series of numbers starts and stops, but also how big the difference is between one
number and the next. If the user doesn't provide a step, range() behaves as if the step
is 1. In this example, we print the even numbers from 0 up to, but not including, 10,
so we start at 0 (start = 0) and stop the series at 10 (stop = 10).
To print even numbers, the difference between one number and the next must be 2
(step = 2). With this step we get the output 0, 2, 4, 6, 8.

for i in range(0, 10, 2):
    print(i, end=" ")
print()

Output
0 2 4 6 8
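A few further properties of range() follow from the same start/stop/step rules. The sketch below uses arbitrary values; list() is used only to make the generated numbers visible.

```python
# A negative step counts downwards; the stop value is still excluded.
countdown = list(range(10, 0, -2))
print(countdown)

# range() supports len() and membership tests without building a list.
evens = range(0, 10, 2)
how_many = len(evens)
has_six = 6 in evens
print(how_many, has_six)

# If start is not below stop (for a positive step), the range is empty.
empty = list(range(5, 5))
print(empty)
```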

Q.5. Compare and summarize four different coding styles supported by Python language. List
Advantages of Python
ANS:
Python is a programming language that is being adopted more and more widely throughout
the world. Some programming languages, such as Java, are built mainly around the
object-oriented style of coding. Python, in contrast, gives users the flexibility to
choose among different coding styles.

Different coding styles can be chosen for different problems. There are four different
coding styles offered by Python. They are:
• Functional
• Imperative
• Object-oriented
• Procedural

Functional coding:
• In the functional style of coding, every statement is treated as a mathematical
equation, and mutable (changeable) data is avoided.
• Many programmers prefer this style of coding for recursion and lambda
calculus.
• The merit of functional coding is that it works well for parallel processing, as
there is no state to consider.
• This style of coding is mostly preferred by academics and data scientists.
• Example:

my_list = [1, 5, 4, 6, 8, 11, 3, 12]

# map() applies the lambda to every item without changing my_list
new_list = list(map(lambda x: x * 2, my_list))
print(new_list)

Imperative coding:
• Computation is performed as a direct change to the state of the program.
• The imperative style of coding is adopted when we have to manipulate data
structures.
• This style expresses the program in a simple manner.
• It is mostly used by data scientists because it is easy to demonstrate.
• Example:

my_list = [1, 5, 4, 6, 8, 11, 3, 12]
total = 0            # named total so the built-in sum() is not shadowed
for x in my_list:
    total += x       # each step changes the program state
print(total)

Object-oriented coding:
• This type of coding relies on data fields, which are treated as objects.
• These can be manipulated only through prescribed methods.
• Python doesn't support this paradigm completely, because features such as data
hiding are not enforced by the language; attributes are private only by naming convention.
• This type of coding also supports code reuse.
• Many other programming languages employ this type of coding.
• Example:

class Word:
    def test(self):
        print("Python")

string = Word()
string.test()

Procedural coding:
• Most people begin learning a language through procedural code,
where the tasks proceed a step at a time.
• It is the simplest form of coding.
• It is mostly used by non-programmers, as it is a simple way to accomplish simpler
and experimental tasks.
• It is used for iteration, sequencing, selection, and modularization.
• Example:
my_list = [1, 5, 4, 6, 8, 11, 3, 12]

def add(numbers):
    total = 0
    for x in numbers:
        total += x
    return total

print(add(my_list))

Q.6. Explain List, Tuple, Set and Dictionary in Python with example.

ANS:
List: A list is an ordered collection of elements. The items in a list are separated by commas.
Items in the list can be accessed by their index, e.g. a[0] is 1. The list is mutable, so items
can be added, updated, and deleted. Because a list is dynamic, it typically needs somewhat more
memory than a tuple holding the same items. The list can store any type of data element, e.g.
int, str, etc.

a = [1,2,3,4,5,6]
print(type(a))

Output
<class 'list'>

A list is preferable to a tuple when we need to add or delete items in the data structure. This
flexibility comes with a small overhead compared with a tuple, so a list is the better choice
only when the contents actually need to change.

Tuple: A tuple is a collection of data elements like a list. The items in a tuple are separated
by commas. However, the major difference is that a tuple is immutable. Items in the tuple can be
accessed by index, e.g. a[0] is 1. Items can't be added, deleted, or updated once the tuple is
defined. The tuple can store any type of data element, e.g. int, str, etc.

a = (1,2,3,4,5,6)
print(type(a))
print(len(a))
Output
<class 'tuple'>
6
The tuple is preferred over a list when we need to deal with a fixed set of values, e.g. ('MON',
'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN'). Because a tuple is immutable, it can be slightly
more efficient than a list in memory use and creation time.

Set: A set is an unordered collection of data elements. Items in a set are separated by commas.
Because a set is unordered, its items can't be accessed by index. A set does not allow
duplicates, so adding an existing element to the set makes no change.
However, items in a set can be deleted/removed.

a = {1,2,3,4,5,6}
print(type(a))
print(len(a))
Output
<class 'set'>
6

A set is really useful when a programmer needs to check whether an item exists in the data
structure. A membership test on a set takes constant time on average, whereas a tuple or list
must be searched item by item.

Dictionary (dict): A dictionary is a data type that stores data as key: value pairs. A dictionary
is written in curly brackets with comma-separated key: value pairs. A dict allows the addition,
update, and deletion of items, and since Python 3.7 it preserves the order in which items were
inserted. A dict allows retrieving values using keys. Keys and values can be accessed as follows:
a.keys() gives all the keys, and a.values() returns all the values. A key is unique within a
dictionary, meaning a key can't be duplicated in the same dictionary. A value can be retrieved by
providing the corresponding key. A key can be any hashable (immutable) type, e.g. int, str, or
tuple. Values can be of any type, e.g. int, str, list, set, or tuple.

a = {'a':1,'b':2,'c':3}
print(type(a))
print(len(a))
Output
<class 'dict'>
3
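The differences in mutability, duplicates, and key-based access described above can be seen side by side in a short sketch. All names and values here are illustrative.

```python
# List: mutable, so items can be added and updated in place.
a_list = [1, 2, 3]
a_list.append(4)
a_list[0] = 10

# Tuple: immutable, so item assignment raises TypeError.
a_tuple = (1, 2, 3)
try:
    a_tuple[0] = 10
except TypeError as err:
    print("tuple:", err)

# Set: no duplicates, no indexing; items can be added and removed.
a_set = {1, 2, 3}
a_set.add(2)        # already present, so nothing changes
a_set.discard(3)

# Dictionary: values are looked up, updated, and deleted by key.
a_dict = {'a': 1, 'b': 2}
a_dict['c'] = 3     # add a new key
a_dict['a'] = 100   # update an existing key
del a_dict['b']     # delete a key

print(a_list, a_tuple, a_set, a_dict)
print(list(a_dict.keys()), list(a_dict.values()))
```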
CO 2 – QUESTIONS & ANSWERS

Q.7. What is the role of Python in Data science? Explain sampling in terms of data science?
Ans:
Python is a flexible and capable choice for data science because it supports so many
third-party libraries devoted to the task.
• The following points help you better understand why Python is such a good choice for many
data science needs.
1. Considering the shifting profile of data scientists
2. Working with a multipurpose, simple, and efficient language
Python is the vision of a single person, Guido van Rossum, who started the language in
December 1989 as a replacement for the ABC language.
Python goes far beyond ABC, however: it can be used to create applications of all types and,
in contrast to ABC, boasts four programming styles (programming paradigms).
Functional
• Treats every statement as a mathematical equation and avoids any form of state or
mutable data
• The main advantage of this approach is having no side effects to consider
• This coding style lends itself better than the others to parallel processing because there is
no state to consider
• Many developers prefer this coding style for recursion and for lambda calculus
Imperative
• Performs computations as a direct change to program state
• This style is especially useful when manipulating data structures and produces elegant
but simple code
Object-oriented
• Relies on data fields that are treated as objects and manipulated only through prescribed
methods
• Python doesn't fully support this coding form because it can't implement features such as
data hiding
• This is a useful coding style for complex applications because it supports encapsulation
and polymorphism
Procedural
• Treats tasks as step-by-step iterations where common tasks are placed in functions that
are called as needed

Python has a distinctive advantage: it is easy to use when it comes to quantitative and
analytical computing.
In data science, Python is widely used and is a favorite tool, being a flexible and
open-source language.
Its massive libraries are used for data manipulation and are easy to learn, even for a
beginner data analyst.
Apart from being platform independent, it also integrates easily with existing
infrastructure, which can be used to solve the most complex problems.
Python is preferred over other data science tools because of following features,
⮩ Powerful and Easy to use
⮩ Open Source
⮩ Choice of Libraries
⮩ Flexibility
⮩ Visualization and Graphics
⮩ Well supported

Q.8. Discuss why python is a first choice for data scientists?


Ans:
Python is the vision of a single person, Guido van Rossum, who started the language in
December 1989 as a replacement for the ABC language.
Python goes far beyond ABC, however: it can be used to create applications of all types and,
in contrast to ABC, boasts four programming styles (programming paradigms)
⮩ Functional :
▪ Treats every statement as a mathematical equation and avoids any form of state or mutable
data
▪ The main advantage of this approach is having no side effects to consider.
▪ This coding style lends itself better than the others to parallel processing because there is no
state to consider.
▪ Many developers prefer this coding style for recursion and for lambda calculus.
⮩ Imperative :
▪ Performs computations as a direct change to program state.
▪ This style is especially useful when manipulating data structures and produces elegant but
simple code.
⮩ Object-oriented :
▪ Relies on data fields that are treated as objects and manipulated only through prescribed
methods.
▪ Python doesn’t fully support this coding form because it can’t implement features such as
data hiding.
▪ This is useful coding style for complex applications because it supports encapsulation and
polymorphism.
⮩ Procedural :
▪ Treats tasks as step-by-step iterations where common tasks are placed in functions that are
called as needed.
Q.9. Discuss the role of indentation in python.
Ans:
Python Indentation
Indentation refers to the spaces at the beginning of a code line.

Where in other programming languages the indentation in code is for readability only, the
indentation in Python is very important.

Python uses indentation to indicate a block of code.

if 5 > 2:
    print("Five is greater than two!")

Python will give you an error if you skip the indentation:


The number of spaces is up to you as a programmer, but it has to be at least one.

Python provides no braces to indicate blocks of code for class and function definitions or flow
control. Blocks of code are denoted by line indentation, which is rigidly enforced.
The number of spaces in the indentation is variable, but all statements within the block must be
indented the same amount.
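The error for a missing indentation can be observed without crashing the program by passing the unindented source text to the built-in compile() function. This is only a sketch; the exact wording of the error message varies between Python versions.

```python
# The same "if" statement as above, but with the print line left unindented.
source = 'if 5 > 2:\nprint("Five is greater than two!")\n'

try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except IndentationError as err:
    # IndentationError is a subclass of SyntaxError.
    outcome = "IndentationError: " + err.msg

print(outcome)
```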

Q.10. Which are the basic activities we performed as a part of data science pipeline?
Summarize and explain in brief.
Ans:
The data science pipeline requires the data scientist to follow particular steps in the
preparation, analysis and presentation of the data.
The general steps in the pipeline are
⮩ Preparing the data
▪ The data we access from various sources may not come directly in a structured format.
▪ We need to transform the data into a structured format.
▪ Transformation may require changing data types, changing the order in which data appears,
and even creating entries for missing data
⮩ Performing data analysis
▪ Results of the data analysis should be provable and consistent.
▪ Sometimes a single approach may not provide the desired output, and we need to use multiple
algorithms to get the result.
▪ The use of trial and error is part of the data science art.
⮩ Learning from data
▪ As we iterate through various statistical analysis methods and apply algorithms to detect
patterns, we begin learning from the data.
▪ The data might not tell the story that you originally thought it would.
⮩ Visualizing
⮩ Obtaining insights
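The preparation and analysis steps can be sketched on a tiny, made-up data set using only the standard library. The raw values, the choice to fill gaps with the mean, and all names below are assumptions for illustration; real projects would more likely use pandas.

```python
from statistics import mean

# Raw data as it might arrive from a source: text values with gaps.
raw = ["21", "19", "", "25", "n/a", "23"]

# Preparing the data: change the data type and mark unusable entries as None.
values = [float(v) if v.replace(".", "", 1).isdigit() else None for v in raw]

# Create entries for the missing data, here by filling with the mean of known values.
known = [v for v in values if v is not None]
fill = mean(known)
prepared = [v if v is not None else fill for v in values]

# Performing a very simple analysis on the prepared data.
average = mean(prepared)
spread = max(prepared) - min(prepared)
print(prepared)
print(average, spread)
```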
Q.11. Summarize the characteristics of NumPy, Pandas, Scikit-Learn and matplotlib libraries
along with their usage in brief.
Ans:
1) NumPy
NumPy is used to perform fundamental scientific computing.
The NumPy library provides the means for performing n-dimensional array manipulation,
which is critical for data science work.
NumPy provides functions that include support for linear algebra, Fourier transformation,
random-number generation and many more.
2) pandas
pandas is a fast, powerful, flexible and easy to use open source data analysis and
manipulation tool, built on top of the Python programming language.
it offers data structures and operations for manipulating numerical tables and time series.
The library is optimized to perform data science tasks especially fast and efficiently.
The basic principle behind pandas is to provide data analysis and modelling support for
Python that is similar to other languages such as R.
3) matplotlib
The matplotlib library gives a MATLAB like interface for creating data presentations of
the analysis.
The library was initially limited to 2-D output, but it still provides the means to express
analysis graphically.
Without such a library, it would be hard to create output that people outside the data science
community could easily understand.
4) Scikit-learn
The Scikit-learn library is one of many Scikit libraries that build on the capabilities provided
by NumPy and SciPy to allow Python developers to perform domain specific tasks.
Scikit-learn library focuses on data mining and data analysis, it provides access to following
sort of functionality:
⮩ Classification
⮩ Regression
⮩ Clustering
⮩ Dimensionality reduction
⮩ Model selection
⮩ Pre-processing
Scikit-learn is the most important library we are going to learn in this subject
Q.12. What is HTML parsing?
Ans:
HTML parsing is the process of reading the markup of an HTML document and turning it into a
structured tree of elements that a program can search and navigate.
BeautifulSoup is a Python library for parsing HTML and XML documents. It provides a
convenient way to extract and navigate data from HTML documents, making it a popular choice
among developers for web scraping and data extraction tasks.

One reason for its popularity is its ease of use. BeautifulSoup provides a simple and intuitive API
that makes it easy to extract data from HTML documents. It also supports a wide range of parsing
strategies and can handle malformed HTML documents with ease.

In the following example, we show how to use BeautifulSoup to extract every quote from a
website.
Program:
import requests
from bs4 import BeautifulSoup

url = 'https://quotes.toscrape.com/'

response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

quotes = soup.find_all('div', {'class': 'quote'})


for quote in quotes:
    text = quote.find('span', {'class': 'text'}).text
    author = quote.find('small', {'class': 'author'}).text
    print(text)
    print(author)
Here we start by passing the response content to BeautifulSoup's constructor along with the
parsing strategy 'html.parser'.

Then we use the find_all() method to extract all div elements with a class attribute of 'quote'. For
each quote, we extract the quote text and author name using the find() method and print them to
the console.
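To see what a parser does underneath, the same idea can be sketched offline with the standard library's html.parser module, which reports each tag and piece of text as it is read. The QuoteParser class and the one-line HTML snippet are made up for illustration and are not part of BeautifulSoup.

```python
from html.parser import HTMLParser


class QuoteParser(HTMLParser):
    """Collects the text inside <span class="text"> elements."""

    def __init__(self):
        super().__init__()
        self.in_text = False
        self.quotes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "span" and ("class", "text") in attrs:
            self.in_text = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_text = False

    def handle_data(self, data):
        if self.in_text:
            self.quotes.append(data.strip())


html = ('<div class="quote"><span class="text">Be curious.</span>'
        '<small class="author">A. Nonymous</small></div>')

parser = QuoteParser()
parser.feed(html)
print(parser.quotes)
```

BeautifulSoup builds a full searchable tree on top of this kind of event stream, which is why find() and find_all() are much more convenient for real pages.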

Q.13. Explain Web Scraping with Example using Beautiful Soup library.

Ans:
Web scraping can be implemented with Beautiful Soup, a Python library for web scraping.
The steps involved in web scraping are:

1. Send an HTTP request to the URL of the webpage you want to access. The server
responds to the request by returning the HTML content of the webpage. For this task, we
will use a third-party HTTP library for Python, requests.
2. Once we have accessed the HTML content, we are left with the task of parsing the data.
Since most of the HTML data is nested, we cannot extract data simply through string
processing. One needs a parser which can create a nested/tree structure of the HTML
data. There are many HTML parser libraries available; one that parses pages the way a
web browser does is html5lib.
3. Now, all we need to do is navigate and search the parse tree that we created, i.e. tree
traversal. For this task, we will be using another third-party Python library, Beautiful
Soup. It is a Python library for pulling data out of HTML and XML files.

Program:
# Python program to scrape a website
# and save the quotes from it to a CSV file
import requests
from bs4 import BeautifulSoup
import csv

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)

soup = BeautifulSoup(r.content, 'html5lib')

quotes = []  # a list to store quotes

table = soup.find('div', attrs={'id': 'all_quotes'})

# each matching div holds one quote card
for row in table.find_all('div',
        attrs={'class': 'col-6 col-lg-3 text-center margin-30px-bottom sm-margin-30px-top'}):
    quote = {}
    quote['theme'] = row.h5.text
    quote['url'] = row.a['href']
    quote['img'] = row.img['src']
    quote['lines'] = row.img['alt'].split(" #")[0]
    quote['author'] = row.img['alt'].split(" #")[1]
    quotes.append(quote)

# write one CSV row per quote
filename = 'inspirational_quotes.csv'
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f, ['theme', 'url', 'img', 'lines', 'author'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)

CO 3 – QUESTIONS & ANSWERS

Q.14. Explain Slicing rows and columns with example.

Q.15. What do you mean by missing values? Explain the different ways to handle the missing value
with example. Explain how to deal with missing data in Pandas.
Q.16. What are Categorical Variables? Explain with example.
Q.17. Write a python program to read data from CSV files using pandas.
Q.18. Write a python program to read data from text files in form of table.
Q.19. Explain DataFrame in Pandas with example.
Q.20. Define term n-gram. Explain the TF-IDF techniques.
Q.21. Compare the numpy and pandas on the basis of their characteristics and usage.
Q.22. Differentiate rand and randn function in Numpy.
Q.23. What kind of data is analyzed with the Bag of words model? Explain it with example.
Q.24. Write a brief note on NetworkX library.
Q.25. Describe date time transformation using datetime module.

CO 4 – QUESTIONS & ANSWERS

Q.26. List the features of matplotlib.


Ans:
Most people visualize information better when they see it in graphic versus textual format.
Graphics help people see relationships and make comparisons with greater ease.
Fortunately, Python makes the task of converting textual data into graphics relatively easy using
libraries. One of the most commonly used libraries for this is MatPlotLib.
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python.
A graph or chart is simply a visual representation of numeric data.
MatPlotLib supports a large number of graph and chart types.
We can choose any of the common graphs, such as line charts, histograms, scatter plots, etc.
If we want to show how various data elements contribute towards a whole, we should use a pie
chart.
If we want to compare data elements, we should use a bar chart.
If we want to show the distribution of elements, we should use histograms.

Q.27. List various types of graph/chart available in the pyplot of matplotlib library for data
visualization.
Ans:
The kind of graph we choose determines how people view the associated data, so choosing
the right graph from the outset is important.
For example,

⮩ If we want to show how various data elements contribute towards a whole, we should use
pie chart.

⮩ If we want to compare data elements, we should use bar chart.

⮩ If we want to show distribution of elements, we should use histograms.

⮩ If we want to depict groups in elements, we should use boxplots.

⮩ If we want to find patterns in data, we should use scatterplots.

⮩ If we want to display trends over time, we should use line chart.

⮩ If we want to display geographical data, we should use basemap.

⮩ If we want to display network, we should use networkx.


All the above graphs are in our syllabus, and we are going to cover all of them in this
Unit.
We are also going to cover some other libraries and plot types which are not in the syllabus,
like seaborn, plotly, cufflinks and choropleth maps.

Q.28. Explain pie chart plot with appropriate examples.

Ans:
A pie chart focuses on showing parts of a whole. The entire pie is 100 percent, and the
question is how much of that percentage each value occupies.
pieChartDemo.py
import matplotlib.pyplot as plt
# %matplotlib notebook   (IPython magic: uncomment only inside a Jupyter notebook)
values = [305,201,805,35,436]
l = ['Food','Travel','Accommodation','Misc','Shopping']
c = ['b','g','r','c','m']
e = [0,0.2,0,0,0]   # pull the second slice (Travel) out of the pie
plt.pie(values,colors=c,labels=l,explode=e)
plt.show()
Output:

Q.29. Explain scatterplots with example.

Ans:
A scatter plot is a type of plot that shows the data as a collection of points.
The position of a point depends on its two-dimensional value, where each value is a position
on either the horizontal or vertical dimension.
It is really useful for studying the relationship/pattern between variables.
Program:
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
df = pd.read_csv('insurance.csv')
plt.scatter(df['bmi'], df['charges'])
plt.show()
Output:

To find a specific pattern in the data, we can further divide the data and plot a scatter plot
for each part.
We can do this with the help of the groupby method of DataFrame, and then use tuple
unpacking while looping over the groups.
We can specify the marker, color, and size of the marker with the help of the marker, color
and s parameters respectively.

Q.30. Explain time series plot with appropriate examples. (OR)
What do you mean by time series data? How can we plot it? Explain it with example to plot
trend over time.
Ans:
Observations recorded over time can be considered a Time Series.
Visualization plays an important role in time series analysis and forecasting.
Time Series plots can provide valuable diagnostics to identify temporal structures like
trends, cycles, and seasonality.
In order to create a Time Series we first need to get the date range, which can be created
with the help of the datetime and pandas libraries.
Program:
import pandas as pd
import datetime as dt
start_date = dt.datetime(2020, 8, 28)
end_date = dt.datetime(2020, 9, 5)
daterange = pd.date_range(start_date, end_date)
print(daterange)
Output:
DatetimeIndex(['2020-08-28', '2020-08-29', '2020-08-30', '2020-08-31',
               '2020-09-01', '2020-09-02', '2020-09-03', '2020-09-04',
               '2020-09-05'],
              dtype='datetime64[ns]', freq='D')

We can use some more parameters for the date_range() function, such as

⮩ freq, to specify the frequency at which we want the date range (default is ‘D’ for days)

⮩ periods, the number of periods to generate between start/end or from start with freq.
We can also create a date range from a start date, periods and freq.
Some of the important possible values for freq are

⮩ D, for calendar day

⮩ W, for week

⮩ M, for month

⮩ Y, for year

⮩ H, for hour

⮩ T/min, for minute

⮩ S, for seconds

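⮩ L, for milliseconds

Note: pandas 2.2 and later deprecate several of these aliases in favour of ME (month end),
YE (year end), h, min, s and ms.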
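What a daily date range contains can also be sketched with the standard datetime module alone, which makes the roles of the start date, end date and frequency explicit. This is not how pandas implements date_range(); the variable names are illustrative.

```python
import datetime as dt

start_date = dt.date(2020, 8, 28)
end_date = dt.date(2020, 9, 5)

# Daily frequency: one entry per calendar day, both end points included.
n_days = (end_date - start_date).days + 1
days = [start_date + dt.timedelta(days=i) for i in range(n_days)]

# Weekly frequency: every 7th day counted from the start date.
weekly = days[::7]

print([d.isoformat() for d in days])
print([d.isoformat() for d in weekly])
```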

Q.31. Compare bar graph, box-plot and histogram with respect to their applicability in data
visualization.
Ans:
Histogram | Bar Graph
The histogram is a graphical representation that uses bars to display the frequency of numerical data. | The bar graph is a graphical representation of data that uses bars to compare different categories of data.
Distribution of non-discrete variables. | Comparison of discrete variables.
Bars touch each other, so there are no spaces between bars. | Bars never touch each other, so there are spaces between bars.
In this type of graph, elements are grouped so that they are considered as ranges. | In this type of graph, elements are taken as individual entities.
Histogram width may vary. | The bar chart is mostly of equal width.
To display the frequency of occurrences. | To compare different categories of data.
In Histogram, the data points are grouped and rendered based on their bin value. | In the Bar graph, each data point is rendered as a separate bar.
The items of the Histogram are numbers, which should be categorized to represent a data range. | In the Bar graph, items should be considered as individual entities.
In Histogram, we cannot rearrange the blocks. | In the Bar graph, it is common to rearrange the blocks, from highest to lowest.

Q.32. What is the use of scatter-plot in data visualization? Can we draw trendline in
scatter-plot? Explain it with example.

Ans:
A scatter plot is used to study the relationship or pattern between two variables. Yes, a
trend line can be drawn on it. To draw a scatter trend line using matplotlib, we can use the
polyfit() and poly1d() functions of numpy to get the trend line points.
Steps

• Set the figure size and adjust the padding between and around the subplots.
• Create x and y data points using numpy.
• Create a figure and a set of subplots.
• Plot x and y data points using the scatter() method.
• Find the trend line data points using the polyfit() and poly1d() functions.
• Plot x and p(x) data points using the plot() method.
• To display the figure, use the show() method.

Example
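import numpy as np
from matplotlib import pyplot as plt

# Figure size and automatic padding around the subplots
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True

# Random x and y data points
x = np.random.rand(100)
y = np.random.rand(100)

fig, ax = plt.subplots()
# Scatter plot, coloured by the x value
ax.scatter(x, y, c=x, cmap='plasma')

# Fit a degree-1 polynomial (a straight line) and wrap it as a callable
z = np.polyfit(x, y, 1)
p = np.poly1d(z)

# Draw the trend line over the scatter plot
ax.plot(x, p(x), "r-o")
plt.show()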

Output
Q.33. Explain Labels, Annotation and Legends in MatPlotLib.
Ans:
To fully document our graph, we have to resort to labels, annotations and legends.
Each of these elements has a different purpose, as follows:

⮩ Label : provides identification of a particular data element or grouping, making it easy for
the viewer to know the name or kind of data illustrated.

⮩ Annotation : augments the information the viewer can immediately see about the data with
notes, sources or other useful information.

⮩ Legend : presents a listing of the data groups within the graph and often provides cues
(such as line type or color) to identify the line with the data.
Program:
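import matplotlib.pyplot as plt
%matplotlib inline
values1 = [5,8,9,4,1,6,7,2,3,8]
values2 = [8,3,2,7,6,1,4,9,8,5]
plt.plot(range(1,11),values1)
plt.plot(range(1,11),values2)
# Labels for the two axes
plt.xlabel('Roll No')
plt.ylabel('CPI')
# Annotation text placed at the point (5, 1); newer matplotlib takes the text as the first argument instead of s=
plt.annotate('Lowest CPI', xy=(5, 1))
# Legend entries in plotting order; loc=4 is the lower right corner
plt.legend(['CX','CY'],loc=4)
plt.show()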
Output:

Q.34. Explain hist() function with code.


Ans:
Histograms categorize data by breaking it into bins, where each bin contains a subset of the
data range.
A Histogram then displays the number of items in each bin so that you can see the
distribution of data and the progression of data from bin to bin.
The hist() function plots a histogram. It computes and draws the histogram of x, where x is an
array or a sequence of arrays.
Program:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib notebook
cpis = np.random.randint(0,10,100)
plt.hist(cpis,bins=10, histtype='stepfilled',align='mid',label='CPI Hist')
plt.legend()
plt.show()
Output:

Q.35. Explain bar() function with code.
Ans:
Bar charts make comparing values easy. Wide bars and segregated measurements emphasize
the difference between values, rather than the flow of one value to another as a line graph does.
The bar() function, used with an axes object, is as follows:
ax.bar(x, height, width, bottom, align)
The function makes a bar plot with each bar bounded by the rectangle (x - width/2, x + width/2,
bottom, bottom + height).
Program:
import matplotlib.pyplot as plt
%matplotlib notebook
x = [1,2,3,4,5]                      # bar positions
y = [5.9,6.2,3.2,8.9,9.7]            # bar heights
l = ['1st','2nd','3rd','4th','5th']  # tick label for each bar
c = ['b','g','r','c','m']            # colour of each bar
w = [0.5,0.6,0.3,0.8,0.9]            # width of each bar
plt.title('Sem wise spi')
# tick_label places the semester names under the bars
plt.bar(x,y,color=c,tick_label=l,width=w)
plt.show()
Output:
Q.36. Write a simple python program that draws a line graph where x = [1,2,3,4] and y = [1,4,9,16] and
gives both axis label as "X-axis" and "Y-axis".
Ans:
# importing the required module
import matplotlib.pyplot as plt
# x axis values
x = [1,2,3,4]
# corresponding y axis values
y = [1,4,9,16]
# plotting the points
plt.plot(x, y)
# naming the x axis
plt.xlabel('X-axis')
# naming the y axis
plt.ylabel('Y-axis')
# giving a title to my graph
plt.title('My first graph!')
# function to show the plot
plt.show()
Q.37. Explain Box plot with example.
Ans:
Boxplots provide a means of depicting groups of numbers through their quartiles.
Quartiles are the three points that divide a group into four equal parts.
In a boxplot, the data is divided into 4 parts using these 3 points (25th percentile, median, 75th
percentile).
A boxplot is mainly used to detect outliers in the data. Let's see an example where we need a
boxplot.
We have a dataset of the time taken to check papers, and we want to find the
faculty who take either much more or much less time than the others to check a paper.
We can specify other parameters like
widths, which specifies the width of the box
notch, default is False
vert, set to False if you want a horizontal graph
Program:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Time taken by each faculty member to check a paper
timetaken = pd.Series([50,45,52,63,70,21,56,68,54,57,35,62,65,92,32])
plt.boxplot(timetaken)
plt.show()
Output:
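
The three quartile points and the outliers that a boxplot draws can also be computed directly. The sketch below uses numpy on the same time-taken values and assumes the common 1.5 × IQR rule for outliers, which is also the default whisker length of matplotlib's boxplot().

```python
import numpy as np

# Same time-taken values as in the program above.
timetaken = np.array([50, 45, 52, 63, 70, 21, 56, 68, 54, 57, 35, 62, 65, 92, 32])

# The three points that split the data into four parts.
q1, median, q3 = np.percentile(timetaken, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Assumed rule: values beyond 1.5 * IQR from the box are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = timetaken[(timetaken < lower_fence) | (timetaken > upper_fence)]

print(q1, median, q3)
print(outliers)
```

With these values the faculty who took 21 and 92 units of time fall outside the fences, which matches the goal of finding those who take very little or very much time.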
Q.38. Write a Python program to create a pie chart with a title showing the popularity of programming
languages.
Ans:
import matplotlib.pyplot as pyplot
# Manual data setup
labels = ('Python', 'Java', 'JavaScript', 'C#', 'PHP', 'C,C++', 'R')
sizes = [29.9, 19.1, 8.2, 7.3, 6.2, 5.9, 3.7]
# pie chart setup
pyplot.pie(sizes, labels=labels, autopct='%1.f%%', counterclock=False, startangle=105)
# title of the chart
pyplot.title('Popularity of Programming Languages')
# layout configuration
pyplot.ylabel('Usage in %')
pyplot.xlabel('Programming Languages')
# Save the chart file
#pyplot.savefig('matplotlib_pie_chart01.png', dpi=300)
# Print the chart
pyplot.show()

Output:
CO 5 – QUESTIONS & ANSWERS

Q.39. What is Scikit-learn? Or: List and explain the interfaces of Scikit-learn.
Ans:
Understanding how classes work is an important prerequisite for being able to use the
Scikit-learn package appropriately.
Scikit-learn is the package for machine learning and data science experimentation favored by
most data scientists.
It contains a wide range of well-established learning algorithms, error functions and testing
procedures.
Install : conda install scikit-learn
There are four class types covering all the basic machine-learning functionalities:

⮩ Classifying

⮩ Regressing

⮩ Grouping by clusters

⮩ Transforming data
Even though each base class has specific methods and attributes, the core functionalities for
data processing and machine learning are guaranteed by one or more series of methods and
attributes.
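
As a rough sketch of the four class types listed above, each can be exercised on tiny made-up data. The specific estimators chosen here (KNeighborsClassifier, LinearRegression, KMeans and StandardScaler) are illustrative picks only, and the data values are invented.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Tiny made-up dataset: one feature, two well-separated groups.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
labels = np.array([0, 0, 0, 1, 1, 1])   # class labels for classification
target = 2.0 * X.ravel() + 1.0          # numeric target for regression

# Classifying: fit on labeled data, then predict a class.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
predicted_class = clf.predict([[2.5]])

# Regressing: fit on numeric targets, then predict a number.
reg = LinearRegression().fit(X, target)
predicted_value = reg.predict([[4.0]])

# Grouping by clusters: no labels are given, the groups are discovered.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_

# Transforming data: fit learns the statistics, transform applies them.
X_scaled = StandardScaler().fit_transform(X)

print(predicted_class, predicted_value, cluster_ids)
```

All four follow the same pattern of calling fit() first and then predict() or transform(), which is what makes the interfaces interchangeable.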

Q.40. What is Data Wrangling process? Define exploratory data analysis. Why is EDA required
in data analysis? Explain the steps needed to perform data wrangling.
Ans:
Exploratory Data Analysis (EDA) refers to the critical process of performing initial
investigations on data so as to discover patterns, spot anomalies, test hypotheses and
check assumptions with the help of summary statistics and graphical representations.
EDA was developed at Bell Labs by John Tukey, a mathematician and statistician who
wanted to promote more questions and actions on data based on the data itself.
In one of his famous writings, Tukey said:
“The only way humans can do BETTER than computers is to take a chance of doing WORSE
than them.”
This statement explains why, as a data scientist, your role and tools are not limited to
automatic learning algorithms but also extend to manual and creative exploratory tasks.
Computers are unbeatable at optimizing, but humans are strong at discovery by taking
unexpected routes and trying unlikely but very effective solutions.
With EDA we:
⮩ Describe data
⮩ Closely explore data distributions
⮩ Understand the relationships between variables
⮩ Notice unusual or unexpected situations
⮩ Place the data into groups
⮩ Notice unexpected patterns within the group
⮩ Take note of group differences
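
Several of the EDA activities listed above map onto one-line pandas calls. The sketch below uses a small made-up DataFrame; the column names and the threshold for an "unusual" value are ad hoc choices for illustration only.

```python
import pandas as pd

# Small made-up dataset used only for illustration.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "hours": [2, 3, 4, 6, 7, 40],      # 40 is a deliberately unusual value
    "score": [50, 55, 61, 70, 74, 78],
})

# Describe data: count, mean, std, min, quartiles and max per numeric column.
summary = df.describe()

# Understand the relationships between variables.
correlations = df[["hours", "score"]].corr()

# Place the data into groups and take note of group differences.
group_means = df.groupby("group")[["hours", "score"]].mean()

# Notice unusual situations: an ad hoc rule flagging values above
# twice the 75th percentile of the column.
unusual = df[df["hours"] > df["hours"].quantile(0.75) * 2]

print(summary, correlations, group_means, unusual, sep="\n\n")
```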

Q.41. What do you mean by covariance? What is the importance of covariance in data analysis?
Explain it with example.
Ans:
Covariance

⮩ Covariance is a measure used to determine how much two variables change in tandem.

⮩ The unit of covariance is the product of the units of the two variables.

⮩ Covariance is affected by a change in scale. The value of covariance lies between -∞ and +∞.

⮩ Pandas has a built-in DataFrame function named cov() to find the covariance.
E.g.,
print(df.cov())
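
The point that covariance is affected by a change in scale can be checked with a short sketch. The heights and weights below are made-up values; expressing the same heights in centimetres instead of metres multiplies the covariance by 100 even though the relationship is unchanged.

```python
import pandas as pd

# Made-up data: height in metres and weight in kilograms.
df = pd.DataFrame({
    "height_m": [1.50, 1.60, 1.70, 1.80],
    "weight_kg": [50.0, 58.0, 66.0, 74.0],
})
cov_metres = df.cov().loc["height_m", "weight_kg"]

# Same heights expressed in centimetres: the covariance grows 100 times.
df_cm = df.assign(height_cm=df["height_m"] * 100)
cov_centimetres = df_cm.cov().loc["height_cm", "weight_kg"]

print(cov_metres, cov_centimetres)
```

A positive value means the two variables tend to increase together, but its magnitude depends on the units, so it cannot be compared across different pairs of variables.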

Q.42. List different ways of defining descriptive statistics for numeric data. Explain them in brief.
Ans:
We can use the pandas built-in function value_counts() to obtain the frequency of each value of a
categorical variable.
Program:
print(df["Sex"].value_counts())
Output:
male 577
female 314
Name: Sex, dtype: int64
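For numeric columns, describe() reports the count, mean, standard deviation, minimum, quartiles
and maximum.
Program:
print(df.describe())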
Output:
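
The individual descriptive statistics for numeric data can also be requested one at a time. The sketch below uses a made-up Series of ages standing in for a numeric column of the DataFrame; the values are invented for illustration.

```python
import pandas as pd

# Made-up ages, standing in for a numeric column of a DataFrame.
ages = pd.Series([22, 38, 26, 35, 35, 54, 2, 27])

print(ages.mean())      # central tendency: arithmetic mean
print(ages.median())    # central tendency: middle value
print(ages.std())       # dispersion: sample standard deviation
print(ages.min(), ages.max())            # range of the data
print(ages.quantile([0.25, 0.5, 0.75]))  # quartiles
```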

Q.43. What is the use of hash function in EDA? Express various hashing trick along with example.
Ans:
Most machine learning algorithms use numeric inputs. If our data contains text instead, we
need to convert that text into numeric values first, which can be done using the hashing trick.
For example, consider a dataset that contains a text column with values such as Male/Female.
We cannot apply machine learning algorithms to text like Male/Female; we need numeric
values instead. One thing we can do here is assign a number to each word
and replace the word with that number.
If we assign 1 to male and 0 to female, we can use ML algorithms on the resulting numeric
dataset.
In machine learning, feature hashing, also known as the hashing trick, is a fast and
space-efficient way of vectorising features, i.e. turning arbitrary features into indices in a vector or
matrix.
When dealing with text, one of the most useful solutions provided by the Scikit‐learn package
is the hashing trick.
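
The idea of turning arbitrary features into indices in a vector can be sketched in plain Python. This is a simplified illustration, not Scikit-learn's implementation: the function name hash_features, the vector length of 8 and the use of md5 are all assumptions made for the example. Scikit-learn provides the technique through its HashingVectorizer and FeatureHasher classes.

```python
import hashlib

def hash_features(tokens, n_features=8):
    """Map tokens to a fixed-length count vector using a hash function."""
    vector = [0] * n_features
    for token in tokens:
        # md5 is used only because it is stable across runs; it is an
        # illustrative choice, not what Scikit-learn uses internally.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        index = int(digest, 16) % n_features   # hash value -> column index
        vector[index] += 1
    return vector

# Hypothetical rows of text features.
row1 = hash_features(["male", "surat", "engineer"])
row2 = hash_features(["female", "surat", "doctor"])
print(row1)
print(row2)
```

No dictionary of known words has to be stored, which is what makes the approach space-efficient; the trade-off is that two different words can collide on the same index.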

Q.44. Differentiate Supervised and Unsupervised learning.
Ans:

Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
A supervised learning model takes direct feedback to check whether it is predicting the correct output. | An unsupervised learning model does not take any feedback.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find hidden patterns and useful insights in an unknown dataset.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
A supervised learning model generally produces a more accurate result. | An unsupervised learning model may give a less accurate result compared to supervised learning.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision tree, Bayesian Logic, etc. | It includes various algorithms such as K-means clustering and the Apriori algorithm.
Q.45. Explain Regression with example. Or Define the regression problem. How can it be solved using
SciKit-learn?
Ans:
Regression is a supervised learning task in which a continuous numeric output is predicted from
one or more input variables.
Program:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
%matplotlib inline
df = pd.read_csv('iphones.csv')
lr = LinearRegression()
# scikit-learn expects a 2-D feature array, hence reshape(-1,1)
X = df['iphonenumber'].values.reshape(-1,1)
y = df['price']
lr.fit(X,y)
lr.predict([[14]]) # price for the next iphone
predicted = lr.predict(X)
# actual prices in red, fitted prices in blue
plt.scatter(df['iphonenumber'],df['price'],c='r')
plt.scatter(df['iphonenumber'],predicted,c='b')

input csv file:

Output:
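
Because the iphones.csv file is not included here, the following self-contained sketch applies the same steps to invented numbers. The model numbers and prices are made up for illustration only and do not come from the original dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: model number vs. price, roughly on a straight line.
model_number = np.array([5, 6, 7, 8, 10, 11, 12]).reshape(-1, 1)
price = np.array([500, 560, 610, 670, 790, 850, 900])

lr = LinearRegression()
lr.fit(model_number, price)          # learn slope and intercept

slope = lr.coef_[0]
intercept = lr.intercept_
next_price = lr.predict([[14]])[0]   # extrapolate to an unseen model number

print(slope, intercept, next_price)
```

The fitted line is price = slope × model number + intercept, so the prediction for model 14 is simply that line evaluated at 14.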
Q.46. Define covariance and correlation.
Ans:
Covariance

⮩ Covariance is a measure used to determine how much two variables change in tandem.

⮩ The unit of covariance is the product of the units of the two variables.

⮩ Covariance is affected by a change in scale. The value of covariance lies between -∞ and +∞.

⮩ Pandas has a built-in DataFrame function named cov() to find the covariance.
Correlation

⮩ The correlation between two variables is a normalized version of the covariance.

⮩ The value of the correlation coefficient is always between -1 and 1.

⮩ Once we’ve normalized the metric to the -1 to 1 scale, we can make meaningful statements
and compare correlations.
Program Code:
print(df.cov())
print(df.corr())
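
The statement that correlation is a normalized covariance can be verified numerically: dividing the covariance by the product of the two standard deviations gives the Pearson correlation coefficient. The sketch below uses made-up paired observations and numpy.

```python
import numpy as np

# Made-up paired observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Sample covariance (ddof=1) taken from the 2x2 covariance matrix.
cov_xy = np.cov(x, y)[0, 1]

# Correlation = covariance divided by the product of standard deviations.
corr_manual = cov_xy / (x.std(ddof=1) * y.std(ddof=1))
corr_numpy = np.corrcoef(x, y)[0, 1]

print(cov_xy, corr_manual, corr_numpy)
```

The manual value and the one from np.corrcoef() agree, and unlike the covariance the result always stays within -1 and 1 regardless of the units used.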
