NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn (sklearn)

The document provides an overview of key Python libraries for data analytics and science, including NumPy, Pandas, Matplotlib, and Seaborn. It highlights the functionalities and differences between NumPy and Pandas, detailing their respective data structures, operations, and performance characteristics. Additionally, it covers data visualization techniques using Matplotlib and Seaborn, outlining various plot types and their applications.

Uploaded by

rgrewal112233


Python Data Analytics/Science:
NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn (sklearn)
NumPy
• NumPy: Foundation of Python Data Analytics
• Library for creating N-dimensional arrays of various data types
• More efficient than lists: Arrays are homogeneous (all elements are of the same type), unlike lists, and are stored in contiguous memory locations
• Vectorization: Apply a function/operation to an entire array at once, without needing a for loop, e.g. result = arr + 5
• Broadcasting: Combine arrays of different but compatible shapes, e.g. result = arr_2d + arr_1d
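• Linear algebra: Matrix operations, Eigenvalues/Eigenvectors, etc
• Statistical functions: Mean, Median, Standard deviation, Covariance, etc (Mode is not built into NumPy; it is available through SciPy)
• Random number generation: From uniform, normal, binomial distributions
• Missing data: Missing values can be represented as np.nan and skipped with NaN-aware functions such as np.nanmean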
• Compatibility: With Pandas, Matplotlib, SciPy, TensorFlow, etc
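The features listed above can be sketched in a few lines. This is a minimal illustration rather than a complete tour; the array values, the seed, and the variable names are assumptions chosen for the example.

```python
import numpy as np

# Vectorization: the addition is applied to every element, no Python loop.
arr = np.array([1, 2, 3, 4])
result = arr + 5                      # [6, 7, 8, 9]

# Broadcasting: the 1-D array is stretched across each row of the 2-D array.
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])
arr_1d = np.array([10, 20, 30])
broadcast_sum = arr_2d + arr_1d       # shape (2, 3)

# Linear algebra: eigenvalues and eigenvectors of a small diagonal matrix.
matrix = np.array([[2.0, 0.0],
                   [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(matrix)

# Missing data: np.nan marks the gap, np.nanmean skips it.
data = np.array([1.0, np.nan, 3.0])
mean_ignoring_nan = np.nanmean(data)  # 2.0

# Random numbers: a seeded generator makes the draws reproducible.
rng = np.random.default_rng(seed=0)
normal_draws = rng.normal(loc=0.0, scale=1.0, size=5)
binomial_draws = rng.binomial(n=10, p=0.5, size=5)
```

Note that a plain `np.mean(data)` would return `nan` here, which is why the NaN-aware variant is used.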
NumPy Arrays
• Main ways of creating a NumPy array:
• Transform Python list
• Use built-in functions
• Generate random data
• Indexing and Selection: Single element, Slicing, Broadcasting, 2D
indexing and selection, Conditional selection
• Operations on arrays: Arithmetic, Universal functions, Summary
statistics, 2D arrays
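A short sketch of the creation, selection, and operation topics listed above. All values, shapes, and names are assumptions picked for illustration.

```python
import numpy as np

# Creating arrays
from_list = np.array([10, 20, 30, 40, 50])    # transform a Python list
built_in = np.arange(0, 10, 2)                 # built-in function: 0, 2, 4, 6, 8
zeros_2d = np.zeros((2, 3))                    # built-in function: 2x3 array of zeros
rng = np.random.default_rng(seed=42)
random_data = rng.integers(low=0, high=100, size=(3, 4))  # random integers

# Indexing and selection
single = from_list[1]                # single element: 20
sliced = from_list[1:4]              # slicing: 20, 30, 40
overwritten = from_list.copy()
overwritten[:2] = 0                  # broadcasting a scalar into a slice
grid = np.arange(1, 10).reshape(3, 3)
element_2d = grid[1, 2]              # 2D indexing: row 1, column 2
sub_grid = grid[:2, 1:]              # 2D selection: first two rows, last two columns
large = from_list[from_list > 25]    # conditional selection: 30, 40, 50

# Operations on arrays
doubled = from_list * 2              # arithmetic
roots = np.sqrt(grid)                # universal function, applied element-wise
total = grid.sum()                   # summary statistic over the whole array
column_means = grid.mean(axis=0)     # summary statistic per column of a 2D array
```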
Pandas
• Pandas: Library for Data Analysis
• Extremely powerful and flexible table (DataFrame) system built on top
of NumPy
• Computationally very efficient
• Features
• Read/Write data – Many formats supported
• Indexing, Applying logic, Sub-setting, etc
• Handle missing data
• Adjust and restructure data
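As a rough sketch of these features, the example below reads a small CSV, fills its gaps, filters it, and writes it back out. The CSV content is invented for illustration, and an in-memory string stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical data; io.StringIO stands in for a real file path.
csv_text = """name,age,city
Ann,34,Oslo
Bob,,Lima
Cara,29,
"""
df = pd.read_csv(io.StringIO(csv_text))            # read

missing_per_column = df.isna().sum()               # count missing values per column

# Handle missing data: fill age with the column mean, city with a placeholder.
df_filled = df.fillna({"age": df["age"].mean(), "city": "Unknown"})

over_30 = df_filled[df_filled["age"] > 30]         # apply logic / sub-setting

by_name = df_filled.set_index("name")              # restructure: use name as the index
csv_out = by_name.to_csv()                         # write (returns a string when no path is given)
```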
NumPy Compared to Pandas
NumPy
• Aim: Numerical computation using n-dimensional arrays
• Data types: Mainly Integer, Float
• Performance: Very fast
• Indexing: Integer-based (e.g. array[0, 1])
• Built-in operations: Numerical and linear-algebra related
• Time-series data: Very limited support (a datetime64 data type, but no date-based indexing or resampling)
Pandas
• Aim: Data processing using Series and DataFrames
• Data types: Numeric, Text, Date
• Performance: Relatively slower
• Indexing: Additionally supports label-based indexing (e.g. df['age'])
• Built-in operations: Data analysis tools such as merging, sorting, joining, handling missing data, etc
• Time-series data: Excellent support such as date-based indexing, shifting, resampling, etc
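The indexing and time-series differences can be seen side by side in a small sketch. The data, dates, and column names below are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# NumPy: positions only.
array = np.array([[1, 2],
                  [3, 4]])
value = array[0, 1]                       # row 0, column 1

# Pandas: columns (and rows) can also be addressed by label.
df = pd.DataFrame({"age": [31, 45], "name": ["Ann", "Bob"]})
ages = df["age"]

# Pandas time series: date-based indexing, shifting, resampling.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 12, 9, 14, 11, 13], index=dates)
jan_second = sales.loc[pd.Timestamp("2024-01-02")]   # look up by date
shifted = sales.shift(1)                             # move values one day later
two_day_totals = sales.resample("2D").sum()          # aggregate into 2-day bins
```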
Pandas: Main Topics
• Series and DataFrames
• Conditional filtering and useful methods
• Missing data
• Grouping operations
• Combining dataframes
• Text methods and Time methods
• Inputs and Outputs
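Of the topics above, text and time methods are perhaps the least self-explanatory, so here is a brief sketch. The column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": [" ann ", "BOB", "Cara"],
    "joined": ["2023-01-15", "2023-06-30", "2024-02-01"],
})

# Text methods live under the .str accessor.
df["name_clean"] = df["name"].str.strip().str.title()

# Time methods live under the .dt accessor once the column is a datetime.
df["joined"] = pd.to_datetime(df["joined"])
df["join_year"] = df["joined"].dt.year
df["join_month"] = df["joined"].dt.month_name()
```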
Series
• Series: A data structure that holds an array of information along with
a named index
• The named index distinguishes it from a NumPy array
• NumPy array has a numeric index: finding data using this index is not easy
• Pandas Series has a labelled index: finding data using this index is very easy
• Note: Data is internally still numerically organized! We can still use the numeric index, if we want

Numeric Index   Labelled Index   Data
0               USA              1776
1               Canada           1867
2               England          1821
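The same three values can be held in a Series and looked up either way. This is a minimal sketch; the variable names are assumptions.

```python
import pandas as pd

# Values and labels taken from the slide.
founded = pd.Series([1776, 1867, 1821], index=["USA", "Canada", "England"])

by_label = founded["Canada"]       # look up through the labelled index
by_position = founded.iloc[1]      # the numeric position still works
underlying = founded.to_numpy()    # the data itself is a plain NumPy array
```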
DataFrame
• DataFrame: Table of columns and rows that can be easily
restructured/filtered
• Series: Year, indexed by USA, Canada, England
• Multiple Series with the same index: Year, Pop, GDP
• DataFrame: The Series combined into one table

Index     Year   Pop   GDP
USA       1776   328   20.5
Canada    1867   38    1.7
England   1821   126   3.9

• So, DataFrame = Several Series that share the same index, like a spreadsheet
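One way to see this is to build the three Series separately and then combine them. This is a sketch using the slide's values; `pd.concat` is just one of several ways to do the combination.

```python
import pandas as pd

countries = ["USA", "Canada", "England"]
year = pd.Series([1776, 1867, 1821], index=countries, name="Year")
pop = pd.Series([328, 38, 126], index=countries, name="Pop")
gdp = pd.Series([20.5, 1.7, 3.9], index=countries, name="GDP")

# Series that share an index line up row by row and become columns.
df = pd.concat([year, pop, gdp], axis=1)
```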
DataFrame
• Basic operations
• Create a dataframe
• Select a column/multiple columns
• Select a row/multiple rows
• Insert a new column/row
• Advanced Operations
• Indexing
• Filtering
• Missing data
• Grouping
• Joining
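The operations listed above might look like this in practice. This is a sketch, not a full reference: the "Examplia" row and its values are invented purely for illustration, and the Continent and Capital columns are added only so that grouping and joining have something to work on.

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame(
    {"Year": [1776, 1867, 1821], "Pop": [328, 38, 126], "GDP": [20.5, 1.7, 3.9]},
    index=["USA", "Canada", "England"],
)

# Basic operations
pop = df["Pop"]                      # select a column (Series)
year_gdp = df[["Year", "GDP"]]       # select multiple columns (DataFrame)
usa = df.loc["USA"]                  # select a row by label
first_two = df.iloc[0:2]             # select multiple rows by position
df["Continent"] = ["North America", "North America", "Europe"]  # insert a new column

# Insert a new row (hypothetical country and values).
new_row = pd.DataFrame(
    {"Year": [1900], "Pop": [10], "GDP": [0.5], "Continent": ["Nowhere"]},
    index=["Examplia"],
)
extended = pd.concat([df, new_row])

# Advanced operations
as_column = df.reset_index().rename(columns={"index": "Country"})  # indexing: index -> column
restored = as_column.set_index("Country")                          # indexing: column -> index
large = df[df["Pop"] > 100]                                        # filtering
with_gap = df.reindex(["USA", "Canada", "England", "Other"])       # "Other" row is all NaN
complete = with_gap.dropna()                                       # missing data: drop incomplete rows
pop_by_continent = df.groupby("Continent")["Pop"].sum()            # grouping
capitals = pd.DataFrame(
    {"Capital": ["Washington, D.C.", "Ottawa", "London"]},
    index=["USA", "Canada", "England"],
)
joined = df.join(capitals)                                         # joining on the shared index
```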
Matplotlib
• Data visualization is very important to quickly understand trends and relationships in the data
• Matplotlib: One of the most popular plotting libraries in Python
• Grandfather of plotting and visualization libraries in Python
• Seaborn and the built-in visualization of Pandas are both built on top of Matplotlib
• Heavily inspired by the plotting functions in MATLAB
• Two approaches: (1) Functional (2) OOP
• Main goals
• Plot a functional relationship, e.g. y = 2x
• Plot a relationship between raw data points: x = [1, 2, 3, 4] and y = [2, 4, 6, 8]
• Main types of plots
• Line Plot: Great for showing functional relationships and continuous data
• Scatter Plot: Useful for plotting raw data points and understanding the correlation between two variables
• Bar Plot: Useful for categorical data to show comparisons between different groups
• Histogram: Good for showing the distribution of a single variable
• Pie Chart: Used for showing proportions or percentages of categories
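The two main goals above, plotting a functional relationship and plotting raw data points, can be sketched as follows. The non-interactive "Agg" backend and the in-memory buffer are assumptions so that the example runs without a display; in an interactive session one would simply call plt.show().

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Goal 1: a functional relationship, y = 2x, drawn as a line plot.
x_line = np.linspace(0, 4, 50)
axes[0].plot(x_line, 2 * x_line)
axes[0].set_title("Line plot: y = 2x")

# Goal 2: raw data points, drawn as a scatter plot.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
axes[1].scatter(x, y)
axes[1].set_title("Scatter plot: raw data")

fig.tight_layout()

# Render to an in-memory PNG instead of showing a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```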
Matplotlib Approaches
• Common setup
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
• Functional
plt.plot(x, y)  # Plotting using a simple functional call
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.show()
• Object-oriented
fig, ax = plt.subplots()  # Create a figure and a set of subplots
ax.plot(x, y)  # Plot on the axes object
ax.set_xlabel('x-axis')  # Set label for x-axis
ax.set_ylabel('y-axis')  # Set label for y-axis
plt.show()
Seaborn
• Seaborn: Statistical plotting library
• Built on top of Matplotlib, but uses a simpler one-line syntax
• Can work directly with Pandas DataFrames
• Easy to use, but less customization possible as compared to Matplotlib
• Types of plots
• Scatter plots: Relationship between two continuous variables (Trends, correlations)
• Distribution plots: How a single variable is distributed (patterns, skew, outliers)
(Histogram, KDE plot)
• Categorical plots: Categorical variables and their relationships with continuous data
(Box plot, bar plot, count plot)
• Comparison plots: Compare two or more variables (pair plot)
• Matrix plots: Complex relationship in a matrix form (heatmap)
Data Visualization Summary
Code examples assume: import matplotlib.pyplot as plt, import numpy as np, import seaborn as sns
Line plot
• Usage: Trends over time periods/data points
• Example: Stock prices over a month
• Code: plt.plot([1, 2, 3], [4, 5, 6]); plt.show()
Scatter plot
• Usage: Relationship between two numeric variables
• Example: House price versus area of the house
• Code: plt.scatter([1, 2, 3], [4, 5, 6]); plt.show()
Bar plot
• Usage: Compare categories/groups with respect to numeric values
• Example: Sales across product categories
• Code: plt.bar(['A', 'B', 'C'], [4, 7, 1]); plt.show()
Histogram
• Usage: Distribution of a single numeric variable
• Example: Distribution of ages in a country
• Code: plt.hist([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5], bins=5); plt.show()
Box plot
• Usage: Distribution of data with reference to minimum, Q1, Q2, Q3, maximum
• Example: Income distribution across professions
• Code: plt.boxplot([7, 2, 5, 13, 9, 6]); plt.show()
Heatmap
• Usage: Matrix to display values using colour intensity
• Example: Correlation matrix between height and weight
• Code: sns.heatmap(np.random.rand(5, 5), annot=True); plt.show()
Pie chart
• Usage: Proportions/percentages among categories
• Example: Market share of mobile phone brands
• Code: plt.pie([40, 30, 20, 10], labels=['A', 'B', 'C', 'D'], autopct='%1.1f%%'); plt.show()
