NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn (sklearn)

The document provides an overview of key Python libraries for data analytics and science, including NumPy, Pandas, Matplotlib, and Seaborn. It highlights the functionalities and differences between NumPy and Pandas, detailing their respective data structures, operations, and performance characteristics. Additionally, it covers data visualization techniques using Matplotlib and Seaborn, outlining various plot types and their applications.

Uploaded by

rgrewal112233
Copyright
© © All Rights Reserved

Python Data Analytics/Science: NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn (sklearn)

NumPy
• NumPy: Foundation of Python data analytics
• Library for creating N-dimensional arrays of various data types
• More efficient than lists: Arrays are homogeneous (all elements have the same type) and are stored in contiguous memory locations
• Vectorization: Apply a function/operation to an entire array at once, without writing a for loop, e.g. result = arr + 5
• Broadcasting: Combine arrays of different but compatible shapes, e.g. result = arr_2d + arr_1d
• Linear algebra: Matrix operations, Eigenvalues/Eigenvectors, etc.
• Statistical functions: Mean, Median, Standard deviation, Covariance, etc. (NumPy has no built-in mode function; SciPy and Pandas provide one)
• Random number generation: From uniform, normal, binomial distributions
• Missing data: Missing values can be stored as NaN and skipped with NaN-aware functions such as np.nanmean
• Compatibility: With Pandas, Matplotlib, SciPy, TensorFlow, etc.
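The features listed above can be tried out in a few lines. The following is a minimal sketch, not a complete tour: the array values, the seed, and all variable names other than result, arr, arr_2d and arr_1d are made up for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Vectorization: the addition is applied to every element, no for loop needed.
result = arr + 5                      # [6, 7, 8, 9]

# Broadcasting: the 1D array is applied to each row of the 2D array.
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])
arr_1d = np.array([10, 20, 30])
broadcast_sum = arr_2d + arr_1d       # [[11, 22, 33], [14, 25, 36]]

# Linear algebra: eigenvalues and eigenvectors of a small diagonal matrix.
matrix = np.array([[2.0, 0.0],
                   [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(matrix)

# Statistical functions.
mean_value = np.mean(arr)             # 2.5
median_value = np.median(arr)         # 2.5
std_value = np.std(arr)               # population standard deviation

# Random number generation with a seeded generator (the seed is arbitrary).
rng = np.random.default_rng(0)
uniform_sample = rng.uniform(0, 1, size=3)
normal_sample = rng.normal(loc=0, scale=1, size=3)
binomial_sample = rng.binomial(n=10, p=0.5, size=3)

# Missing data: NaN marks a missing value; nanmean skips it.
with_gap = np.array([1.0, np.nan, 3.0])
mean_ignoring_nan = np.nanmean(with_gap)   # 2.0
```

Broadcasting works here because the trailing dimension of arr_2d (3) matches the length of arr_1d; shapes that do not line up this way raise a ValueError.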
NumPy Arrays
• Main ways of creating a NumPy array:
  • Transform a Python list
  • Use built-in functions
  • Generate random data
• Indexing and Selection: Single element, Slicing, Broadcasting, 2D indexing and selection, Conditional selection
• Operations on arrays: Arithmetic, Universal functions, Summary statistics, 2D arrays
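As a rough illustration of the topics above, the sketch below creates arrays in the three ways mentioned and then indexes and operates on them. All values and variable names are invented for the example.

```python
import numpy as np

# Three ways to create an array.
from_list = np.array([10, 20, 30, 40, 50])        # transform a Python list
zeros = np.zeros((2, 3))                          # built-in function
evenly_spaced = np.arange(0, 10, 2)               # built-in function: 0, 2, 4, 6, 8
rng = np.random.default_rng(42)                   # the seed is arbitrary
random_ints = rng.integers(0, 100, size=(2, 3))   # random data

# Indexing and selection.
first = from_list[0]                # single element: 10
middle = from_list[1:4]             # slice: [20, 30, 40]
grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
cell = grid[1, 2]                   # row 1, column 2: 6
top_left = grid[:2, :2]             # 2D slice: [[1, 2], [4, 5]]
large = from_list[from_list > 25]   # conditional selection: [30, 40, 50]

# Broadcasting a value into a slice (on a copy, so from_list is unchanged).
filled = from_list.copy()
filled[:2] = 0                      # [0, 0, 30, 40, 50]

# Operations.
doubled = from_list * 2             # arithmetic
roots = np.sqrt(grid)               # universal function (ufunc)
total = from_list.sum()             # summary statistic: 150
column_sums = grid.sum(axis=0)      # per-column sums of a 2D array: [12, 15, 18]
```

The copy before the slice assignment matters: a slice of a NumPy array is a view, so assigning into it changes the original array as well.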
Pandas
• Pandas: Library for Data Analysis
• Extremely powerful and flexible table (DataFrame) system built on top of NumPy
• Computationally efficient for tabular data, since column operations are vectorized on top of NumPy
• Features
  • Read/Write data: Many formats supported (e.g. CSV, Excel, JSON, SQL)
  • Indexing, Applying logic, Sub-setting, etc.
  • Handle missing data
  • Adjust and restructure data
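A small sketch of these features follows. It reads CSV text from an in-memory buffer instead of a file so that it is self-contained; the column names and values are invented for illustration.

```python
import io

import pandas as pd

# Read: parse CSV text from an in-memory buffer (a stand-in for a file path).
csv_text = "name,age,city\nAnn,34,Oslo\nBob,,Lima\nCid,29,Oslo\n"
df = pd.read_csv(io.StringIO(csv_text))

# Sub-setting and applying logic: rows where city is Oslo, two columns only.
oslo = df.loc[df["city"] == "Oslo", ["name", "age"]]

# Missing data: Bob's age is NaN; count it, then fill it with the mean age.
missing_count = df["age"].isna().sum()             # 1
filled = df.fillna({"age": df["age"].mean()})      # mean of 34 and 29 is 31.5

# Restructure: sort by age and use the name column as the index.
restructured = filled.sort_values("age").set_index("name")

# Write: serialize back to CSV text (pass a path instead to write a file).
csv_out = restructured.to_csv()
```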
NumPy Compared to Pandas
| Aspect | NumPy | Pandas |
|---|---|---|
| Aim | Numerical computation using n-dimensional arrays | Data processing using Series and DataFrames |
| Data types | Mainly Integer, Float | Numeric, Text, Date |
| Performance | Very fast | Relatively slower |
| Indexing | Integer-based (e.g. array[0, 1]) | Additionally supports label-based indexing (e.g. df['age']) |
| Built-in operations | Numerical and linear-algebra related | Data analysis tools such as merging, sorting, joining, handling missing data, etc. |
| Time-series data | No dedicated tools (only the datetime64 data type) | Excellent support such as date-based indexing, shifting, resampling, etc. |
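The indexing and time-series differences in this comparison can be seen in a short sketch. The data values and names below are made up; only array[0, 1] and df['age'] come from the comparison itself.

```python
import numpy as np
import pandas as pd

# NumPy: elements are selected by integer position only.
array = np.array([[1, 2], [3, 4]])
value = array[0, 1]                          # row 0, column 1: 2

# Pandas: columns and rows can also be selected by label.
df = pd.DataFrame({"age": [31, 45], "city": ["Oslo", "Lima"]},
                  index=["ann", "bob"])
ages = df["age"]                             # column by label
bob_city = df.loc["bob", "city"]             # row and column by label: "Lima"
first_age = df.iloc[0, 0]                    # integer position still works: 31

# Time series: six daily values on a date index.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 12, 9, 14, 11, 13], index=dates)

jan_2 = sales.loc["2024-01-02"]              # date-based indexing: 12
previous_day = sales.shift(1)                # shifting: each value moves one day later
two_day_totals = sales.resample("2D").sum()  # resampling: 22, 23, 24
```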
Pandas: Main Topics
• Series and DataFrames
• Conditional filtering and useful methods
• Missing data
• Grouping operations
• Combining dataframes
• Text methods and Time methods
• Inputs and Outputs
Series
• Series: A data structure that holds an array of information along with a named index
• The named index distinguishes it from a NumPy array

NumPy array has a numeric index. Finding data using this index is not easy.
| Index | Data |
|---|---|
| 0 | 1776 |
| 1 | 1867 |
| 2 | 1821 |

Pandas Series has a labelled index. Finding data using this index is very easy.
| Labelled Index | Data |
|---|---|
| USA | 1776 |
| Canada | 1867 |
| England | 1821 |

Note: Data is internally still numerically organized! We can still use the numeric index, if we want.
| Numeric Index | Labelled Index | Data |
|---|---|---|
| 0 | USA | 1776 |
| 1 | Canada | 1867 |
| 2 | England | 1821 |
DataFrame
• DataFrame: Table of columns and rows that can be easily restructured/filtered

A single Series (Year):
| Index | Year |
|---|---|
| USA | 1776 |
| Canada | 1867 |
| England | 1821 |

Further Series with the same index (Pop and GDP):
| Index | Pop |
|---|---|
| USA | 328 |
| Canada | 38 |
| England | 126 |

| Index | GDP |
|---|---|
| USA | 20.5 |
| Canada | 1.7 |
| England | 3.9 |

DataFrame:
| Index | Year | Pop | GDP |
|---|---|---|---|
| USA | 1776 | 328 | 20.5 |
| Canada | 1867 | 38 | 1.7 |
| England | 1821 | 126 | 3.9 |

• So, DataFrame = Several Series that share the same index, like a spreadsheet
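The idea that a DataFrame is several Series sharing one index can be sketched directly with the Year, Pop and GDP values shown above. The variable names are illustrative.

```python
import pandas as pd

countries = ["USA", "Canada", "England"]
year = pd.Series([1776, 1867, 1821], index=countries)
pop = pd.Series([328, 38, 126], index=countries)
gdp = pd.Series([20.5, 1.7, 3.9], index=countries)

# Each Series becomes one column; rows are aligned on the shared index.
df = pd.DataFrame({"Year": year, "Pop": pop, "GDP": gdp})

canada_row = df.loc["Canada"]   # one row, returned as a Series
gdp_column = df["GDP"]          # one column, returned as a Series
```

Because alignment happens on the index labels rather than on position, the three Series would still combine correctly if their rows were listed in different orders.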
DataFrame
• Basic operations
  • Create a DataFrame
  • Select a column/multiple columns
  • Select a row/multiple rows
  • Insert a new column/row
• Advanced operations
  • Indexing
  • Filtering
  • Missing data
  • Grouping
  • Joining
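Each of the operations listed above maps to one or two Pandas calls. The sketch below walks through them on a small, invented table; the column names, values and the coaches table are all hypothetical.

```python
import numpy as np
import pandas as pd

# Create a DataFrame.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "team": ["red", "blue", "red", "blue"],
    "score": [90.0, np.nan, 70.0, 80.0],
})

# Select columns and rows.
one_column = df["score"]                    # Series
two_columns = df[["name", "score"]]         # DataFrame
first_row = df.iloc[0]                      # by position
first_two_rows = df.iloc[:2]

# Insert a new column and a new row.
df["passed"] = df["score"] >= 75            # a NaN score compares as False
new_row = pd.DataFrame([{"name": "Eve", "team": "red",
                         "score": 85.0, "passed": True}])
df = pd.concat([df, new_row], ignore_index=True)

# Indexing: use a column as the row labels.
by_name = df.set_index("name")
cid_score = by_name.loc["Cid", "score"]     # 70.0

# Filtering: combine conditions with & and parentheses.
red_passed = df[(df["team"] == "red") & df["passed"]]

# Missing data: drop rows with a missing score.
complete = df.dropna(subset=["score"])

# Grouping: mean score per team (NaN is skipped).
team_means = df.groupby("team")["score"].mean()

# Joining: attach a coach to each row via the shared "team" column.
coaches = pd.DataFrame({"team": ["red", "blue"], "coach": ["Kim", "Lee"]})
joined = df.merge(coaches, on="team", how="left")
```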
Matplotlib
• Data visualization is very important to quickly understand trends and relationships in the data
• Matplotlib: One of the most popular plotting libraries in Python
  • Grandfather of plotting and visualization libraries in Python
  • Seaborn and the built-in Pandas visualization are both built on top of Matplotlib
  • Heavily inspired by the plotting functions of MATLAB
• Two approaches: (1) Functional (2) Object-oriented (OOP)
• Main goals
  • Plot a functional relationship, e.g. y = 2x
  • Plot a relationship between raw data points: x = [1, 2, 3, 4] and y = [2, 4, 6, 8]
• Main types of plots
  • Line Plot: Great for showing functional relationships and continuous data
  • Scatter Plot: Useful for plotting raw data points and understanding the correlation between two variables
  • Bar Plot: Useful for categorical data to show comparisons between different groups
  • Histogram: Good for showing the distribution of a single variable
  • Pie Chart: Used for showing proportions or percentages of categories
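The two main goals above can be shown on one set of axes: a line for the function y = 2x and a scatter for the raw points. This is a sketch under one assumption: it selects the non-interactive Agg backend so that it runs without a display, which is not something the slides prescribe.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Goal 1: a functional relationship, y = 2x, drawn as a line plot.
x_dense = np.linspace(0, 4, 50)
plt.plot(x_dense, 2 * x_dense, label="y = 2x")

# Goal 2: raw data points, drawn as a scatter plot.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
plt.scatter(x, y, color="red", label="raw data points")

plt.xlabel("x")
plt.ylabel("y")
plt.legend()
# In an interactive session, plt.show() would display the figure here.
```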
Matplotlib Approaches
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

• Functional
  • plt.plot(x, y) # Plotting using a simple functional call
  • plt.xlabel('x-axis')
  • plt.ylabel('y-axis')
  • plt.show()
• Object-oriented
  • fig, ax = plt.subplots() # Create a figure and a set of subplots
  • ax.plot(x, y) # Plot on the axes object
  • ax.set_xlabel('x-axis') # Set label for x-axis
  • ax.set_ylabel('y-axis') # Set label for y-axis
  • plt.show()
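One practical difference between the two approaches shows up once a figure has more than one subplot. The sketch below is an illustration, not part of the slides: the second (bar) subplot and the Agg backend are assumptions made so the example is self-contained and runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

# Functional style: pyplot tracks a "current" figure and axes implicitly.
plt.figure()
plt.plot(x, y)
plt.xlabel("x-axis")
plt.ylabel("y-axis")
functional_axes = plt.gca()

# Object-oriented style: every call names the axes it acts on,
# which stays unambiguous when a figure has several subplots.
fig, (ax_left, ax_right) = plt.subplots(1, 2)
ax_left.plot(x, y)
ax_left.set_xlabel("x-axis")
ax_left.set_ylabel("y-axis")
ax_right.bar(x, y)
ax_right.set_xlabel("x-axis")
```

For a single quick plot the functional style is shorter; the object-oriented style tends to pay off when several axes need different labels or plot types.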
Seaborn
• Seaborn: Statistical plotting library
  • Built on top of Matplotlib, but uses a simpler, often one-line syntax
  • Can work directly with Pandas DataFrames
  • Easy to use, but offers less customization than Matplotlib
• Types of plots
  • Scatter plots: Relationship between two continuous variables (trends, correlations)
  • Distribution plots: How a single variable is distributed (patterns, skew, outliers), e.g. histogram, KDE plot
  • Categorical plots: Categorical variables and their relationships with continuous data, e.g. box plot, bar plot, count plot
  • Comparison plots: Compare two or more variables, e.g. pair plot
  • Matrix plots: Complex relationships in matrix form, e.g. heatmap
Data Visualization Summary
| Plot | Usage | Example | Code |
|---|---|---|---|
| Line plot | Trends over time periods/data points | Stock prices over a month | plt.plot([1, 2, 3], [4, 5, 6]); plt.show() |
| Scatter plot | Relationship between two numeric variables | House price versus area of the house | plt.scatter([1, 2, 3], [4, 5, 6]); plt.show() |
| Bar plot | Compare categories/groups with respect to numeric values | Sales across product categories | plt.bar(['A', 'B', 'C'], [4, 7, 1]); plt.show() |
| Histogram | Distribution of a single numeric variable | Distribution of ages in a country | plt.hist([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5], bins=5); plt.show() |
| Box plot | Distribution of data with reference to minimum, Q1, Q2, Q3, maximum | Income distribution across professions | plt.boxplot([7, 2, 5, 13, 9, 6]); plt.show() |
| Heatmap | Matrix to display values using colour intensity | Correlation matrix between height and weight | sns.heatmap(np.random.rand(5, 5), annot=True); plt.show() |
| Pie chart | Proportions/percentages among categories | Market share of mobile phone brands | plt.pie([40, 30, 20, 10], labels=['A', 'B', 'C', 'D'], autopct='%1.1f%%'); plt.show() |
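The one-line snippets in this summary assume that matplotlib.pyplot, numpy and seaborn have been imported as plt, np and sns. The heatmap snippet fills the matrix with random numbers; the sketch below instead computes an actual correlation matrix for the height and weight example. The data is synthetic (generated with an arbitrary seed and made-up coefficients), and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic height/weight data; the numbers are invented for illustration.
rng = np.random.default_rng(0)
height_cm = rng.normal(170, 10, size=100)
weight_kg = 0.9 * height_cm - 85 + rng.normal(0, 5, size=100)
df = pd.DataFrame({"height": height_cm, "weight": weight_kg})

# Pairwise Pearson correlations: a symmetric 2x2 matrix with ones on the diagonal.
corr = df.corr()

# annot=True writes each value into its cell; vmin/vmax fix the colour scale to [-1, 1].
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_title("Correlation between height and weight")
```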
