Python Libraries
Python Libraries for Data Science
Many popular Python toolboxes/libraries:
• NumPy
• SciPy
• Pandas
• SciKit-Learn
Visualization libraries
• matplotlib
• Seaborn
All these libraries are installed on the SCC.
NumPy:
The core functionality of NumPy is its "ndarray" (n-dimensional array) data structure. These arrays are strided views on memory.
Link: http://www.numpy.org/
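To make "strided views on memory" concrete, here is a small sketch (assuming a standard NumPy install) showing that an ndarray is a block of memory plus a per-axis byte step (its strides), and that transposes and slices are views onto the same memory rather than copies.

```python
import numpy as np

# A 3x4 array of 64-bit integers (8 bytes per element), stored row by row
a = np.arange(12, dtype=np.int64).reshape(3, 4)
print(a.strides)                # bytes to step along each axis: (32, 8)

# Transposing only swaps the strides; no data is copied
t = a.T
print(t.strides)                # (8, 32)
print(np.shares_memory(a, t))   # True

# Taking every other column is also a view, with a doubled column stride
every_other = a[:, ::2]
print(every_other.strides)      # (32, 16)

# Writing through the view changes the original array
every_other[0, 0] = 100
print(a[0, 0])                  # 100
```

Because views share memory, operations like slicing and transposing are cheap; call `.copy()` when an independent array is needed.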
• Here are some of the functions defined in the NumPy library:
1. zeros(shape[, dtype, order]) - Return a new array of the given shape and type, filled with zeros.
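A short sketch of `np.zeros` with each of its parameters: `shape`, the optional `dtype` (float64 by default) and the optional memory `order` ("C" row-major by default, or "F" column-major).

```python
import numpy as np

# Default dtype is float64
z = np.zeros((2, 3))
print(z)
print(z.dtype)                   # float64

# An explicit dtype gives integer zeros
zi = np.zeros(4, dtype=np.int32)
print(zi)                        # [0 0 0 0]

# order="F" stores the data column by column (Fortran order) instead of row by row (C order)
zf = np.zeros((2, 3), order="F")
print(zf.flags["F_CONTIGUOUS"])  # True
```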
SciPy:
SciPy is a collection of algorithms for linear algebra, differential equations,
numerical integration, optimization, statistics and more
built on NumPy
Link: https://www.scipy.org/scipylib/
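As an illustration of three of the areas listed above (numerical integration, optimization and differential equations), here is a minimal sketch using `scipy.integrate` and `scipy.optimize`; the functions being integrated, minimized and solved are arbitrary examples chosen because their exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: integral of sin(x) from 0 to pi (exact value is 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)
print(area)

# Optimization: minimum of (x - 3)^2 + 1, which lies at x = 3
res = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)
print(res.x, res.fun)

# Differential equation: dy/dt = -y with y(0) = 1, whose exact solution is exp(-t)
sol = integrate.solve_ivp(lambda t, y: -y, (0, 1), [1.0], rtol=1e-8, atol=1e-10)
print(sol.y[0, -1], np.exp(-1))
```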
Features of SciPy:
The main feature of the SciPy library is that it is built on NumPy and makes heavy use of NumPy arrays.
SciPy is a library that uses NumPy to solve mathematical problems. It uses NumPy arrays as its basic data structure and comes with modules for various commonly used tasks in scientific programming.
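The sketch below illustrates that SciPy functions take and return plain NumPy arrays: `scipy.linalg.solve` solves a small linear system and `scipy.stats.describe` summarizes an array of random samples (the matrix and the sample distribution are made up for the example).

```python
import numpy as np
from scipy import linalg, stats

# Linear algebra: solve the system A @ x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)
print(type(x), x)           # a NumPy ndarray: [2. 3.]

# Statistics: summary of a NumPy array of samples
rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=1000)
summary = stats.describe(samples)
print(summary.nobs, summary.mean, summary.variance)
```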
Pandas:
Link: http://pandas.pydata.org/
Key Features of Pandas
Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High performance merging and joining of data
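Several of the features above can be seen together in one small sketch. The table (its column names, index labels and values) is made up purely for illustration; the calls (`isna`, `fillna`, `.loc`, `drop`, `groupby`, `pivot_table`, `merge`) are standard pandas methods.

```python
import numpy as np
import pandas as pd

# Small made-up table; the column names and values are hypothetical
df = pd.DataFrame(
    {
        "dept": ["A", "A", "B", "B", "C"],
        "year": [2020, 2021, 2020, 2021, 2020],
        "sales": [10.0, np.nan, 7.0, 9.0, 4.0],
    },
    index=["r1", "r2", "r3", "r4", "r5"],  # customized (label) index
)

# Missing data: count and fill NaN values
print(df["sales"].isna().sum())   # 1
df["sales"] = df["sales"].fillna(0.0)

# Label-based slicing with .loc (both endpoints are included)
print(df.loc["r2":"r4", ["dept", "sales"]])

# Insert and delete columns
df["bonus"] = df["sales"] * 0.1
df = df.drop(columns="bonus")

# Data alignment: arithmetic matches rows by index label, not by position
s1 = pd.Series([1, 2, 3], index=["x", "y", "z"])
s2 = pd.Series([10, 20], index=["z", "x"])
aligned = s1 + s2                 # x: 21.0, y: NaN, z: 13.0
print(aligned)

# Group by for aggregation
totals = df.groupby("dept")["sales"].sum()
print(totals)

# Reshaping/pivoting: one row per dept, one column per year
wide = df.pivot_table(index="dept", columns="year", values="sales", aggfunc="sum")
print(wide)

# Merging/joining with another table on a shared key
managers = pd.DataFrame({"dept": ["A", "B", "C"], "manager": ["Ann", "Bob", "Cy"]})
merged = df.merge(managers, on="dept", how="left")
print(merged)
```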
SciKit-Learn:
provides machine learning algorithms: classification, regression, clustering,
model validation etc.
Link: http://scikit-learn.org/
It features various classification, regression and clustering algorithms
including support vector machines, random forests, gradient boosting, k-
means and DBSCAN, and is designed to interoperate with the Python
numerical and scientific libraries NumPy and SciPy.
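Here is a minimal sketch exercising several of the algorithms named above on the Iris dataset, which ships with scikit-learn so nothing needs to be downloaded. The split ratio, random seeds and DBSCAN parameters are arbitrary choices for illustration, not tuned values.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Classification: every estimator is trained with fit() and evaluated with score()
classifiers = {
    "SVM": SVC(),
    "Random forest": RandomForestClassifier(random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}
accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)
    print(f"{name}: test accuracy {accuracies[name]:.2f}")

# Clustering: no labels are used; fit_predict() returns a cluster id per sample
X_scaled = StandardScaler().fit_transform(X)
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled)  # -1 marks noise points
print(set(kmeans_labels), set(dbscan_labels))
```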
Advantages of using Scikit-Learn:
Scikit-learn provides a clean and consistent interface to a wide range of models.
It provides you with many options for each model, but also chooses sensible
defaults.
Its documentation is exceptional, and it helps you to understand the models
as well as how to use them properly.
It is actively developed.
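The consistent interface and sensible defaults can be sketched as follows: three unrelated model families, built with (almost) default settings, are evaluated by the same `cross_val_score` call. The choice of models and of 5-fold cross-validation is an assumption made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three different model families, constructed with (almost) default settings
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "k-nearest neighbours": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# The same call works for every model because they share the fit/predict interface
mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation accuracy
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {mean_scores[name]:.2f}")

# Defaults are visible (and overridable) through get_params()
print(KNeighborsClassifier().get_params()["n_neighbors"])  # 5
```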
matplotlib:
Python 2D plotting library that produces publication-quality figures in a
variety of hardcopy formats
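A minimal sketch of a 2D plot written to several hardcopy formats. It assumes a headless environment, hence the non-interactive "Agg" backend, and saves into a temporary directory; the file name `waves` is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, works without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("A simple 2D line plot")
ax.legend()

# The same figure is saved in several formats; the file extension selects the format
out_dir = tempfile.mkdtemp()
paths = [os.path.join(out_dir, f"waves.{ext}") for ext in ("png", "pdf", "svg")]
for path in paths:
    fig.savefig(path, dpi=300)
plt.close(fig)
print(paths)
```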
Link: https://matplotlib.org/
Seaborn:
based on matplotlib
Link: https://seaborn.pydata.org/
The main aim of Seaborn is to make visualization a vital part of exploring and
understanding data. Its dataset-oriented plotting functions operate on arrays
and data-frames containing whole datasets. The library is ideal for examining
relationships among multiple variables.
Highlights:
Automatic estimation as well as the plotting of linear regression models
Comfortable views of the overall structure of complex datasets
Eases building complex visualizations using high-level abstractions for
structuring multi-plot grids
Options for visualizing bivariate or univariate distributions
Specialized support for using categorical variables
Loading Python Libraries
# Import Python libraries
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns
Reading data using pandas
# Read a CSV file
df = pd.read_csv("http://rcs.bu.edu/examples/python/data_analysis/Salaries.csv")
Note: read_csv() has many optional arguments to fine-tune the data import process. Other file formats can be read with similar pandas functions:
pd.read_excel('myfile.xlsx',sheet_name='Sheet1', index_col=None,
na_values=['NA'])
pd.read_stata('myfile.dta')
pd.read_sas('myfile.sas7bdat')
pd.read_hdf('myfile.h5','df')
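A few of `read_csv`'s optional arguments in action. To keep the sketch self-contained it reads a small, made-up CSV text through `io.StringIO` instead of a real file or URL; the column names and values are hypothetical, and a path or URL string would be passed in exactly the same place.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file or URL
csv_text = """id,name,score,grade
1,Ann,91.5,A
2,Bob,NA,B
3,Cy,78.0,NA
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="id",                   # use the "id" column as the row index
    usecols=["id", "name", "score"],  # load only these columns
    na_values=["NA"],                 # treat "NA" as missing (pandas also does this by default)
)
print(df)
print(df.dtypes)  # name: object, score: float64
```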
Exploring data frames
# List the first 5 records
df.head()
(Output: a table showing the first 5 rows of df)
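Besides `head()`, a few other calls are commonly used for a first look at a DataFrame. The sketch below uses a small made-up DataFrame as a stand-in for the data read above, so it runs without network access.

```python
import numpy as np
import pandas as pd

# Made-up DataFrame with 8 rows standing in for real data
df = pd.DataFrame({
    "name": list("ABCDEFGH"),
    "value": np.arange(8) * 10,
})

first5 = df.head()    # first 5 rows by default
first3 = df.head(3)   # or any number of rows
last2 = df.tail(2)    # last rows
print(first5)
print(df.shape)       # (8, 2): rows, columns
print(df.dtypes)
print(df.describe())  # count, mean, std, min, quartiles and max of numeric columns
```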