
🐍

AP PYTHON
Mod 1
Basics:
You would usually use a lot of extra libraries to do all of the preprocessing of the data. So far, just NumPy and Pandas should be
enough. Going further, scikit-learn and PyTorch might be used, but those are not important for this mod.
So, what is data preprocessing? This is the core of AI & ML as everything runs on data. When the data is incomplete,
inconsistent, has outliers, or has not been standardized, you fix it. That's data cleaning, which is one half of preprocessing, the other
half being transforming. After that, the data should be ready to be run through the learning algorithm of an AI/ML/DL model.

What does Pandas do? Basically, it lets you read the file and do basic functions like df.describe() to know the features of the dataset.
Apart from that, almost everything you can do with Pandas can also be done with NumPy. One drawback with NumPy is that it is very
tedious to work with when it comes to strings and datatypes other than int/float; Pandas is easier to work with for text-based
datatypes.
What does NumPy do? After you read the files using Pandas, you do everything else in NumPy.

Data cleaning.
There are a lot of data cleaning methods and each tries to fix a different issue in the dataset. The only data cleaning that we need
here is dealing with missing data.
There are three ways to do it: you can delete the rows that have missing data, you can fill the missing values with the
mean/median/mode, or you can fill the missing data with an indicator - aka something like "Missing" (the last method is the worst of
the three; avoid it).

import numpy as np
import pandas as pd
from google.colab import files
file = files.upload() #upload the titanic.xls file
df = pd.read_excel("titanic.xls")
df.isnull().sum() #tells you the number of null values per feature/column
df.shape #(rows, cols)
new = df.dropna() #drops all rows that contain any null
new2 = df.dropna(subset=['body']) #drops only the rows with a null value in the 'body' col
new.isnull().sum()
new.shape
new2.isnull().sum()
new2.shape

new3 = df.fillna(69) #fills every null value with 69 - Nice!
#use inplace=True to make the change permanent on the original dataframe.

# you can also pass a dict of per-column fill values: values = {"Height": 0, "Weight": 1, "Country": 2, "Place": 3, "Number of days": 4, "stay": 5}
# and then call df.fillna(value=values)

Note : Axis = 0 means rows and Axis = 1 means columns

Normalization
Here, you only need MinMaxScaler, Z-score and simple feature scaling.
But what is normalization? It is explained a bit later.

Simple feature scaling = you take a column's elements and divide them by the max value of that column.

df["colname"]=df["colname"]/df["colname"].max()

MinMaxScaler = often used on the inputs and sometimes on the Y values (the outputs) of an ML/DL dataset; it gives you a normalized dataset in the
range [0,1] (or [-1,1] if you change the feature_range) - the highest value becomes 1 and the lowest becomes 0.

from sklearn.preprocessing import MinMaxScaler


mm1 = MinMaxScaler()
scaled = mm1.fit_transform(X) #X has to be a 2D numpy array (or a DataFrame of numbers)

#or you can do it manually: take every element, subtract the min value of the column and divide by the range of the column (max - min), as sketched below.
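A one-line sketch of that manual version on a Pandas column (assuming df and "colname" are the DataFrame and column you want to scale):

df["colname"] = (df["colname"] - df["colname"].min()) / (df["colname"].max() - df["colname"].min())
#after this, the column's minimum is 0 and its maximum is 1, same as MinMaxScaler's default range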

Z-score normalization = standardization.


It gets rid of the different ranges of different features and gives you real-number values without a fixed range limit like [0,1]. It is
much less affected by outliers than min-max normalization is. (Values at the mean become 0, values greater than the mean become
positive and values less than the mean become negative. NOTE - it is the mean that maps to 0, not the middle element by position.)

from sklearn.preprocessing import StandardScaler


ss1 = StandardScaler()
standardized = ss1.fit_transform(X) #X has to be a 2D numpy array (or a DataFrame of numbers)

#or,
df['colname']=(df['colname'] - df['colname'].mean())/df['colname'].std()

Mod 2
Numpy
You define it like this:

import numpy as np
a = np.array([1,2,3,4], dtype=int)
# or
k = np.arange(0,10,2) #starts from 0 up to 10; 0 is inclusive, 10 is not, and the step value is 2.
j = np.identity(2) #identity matrix of size 2
t = np.full(5,2) #1d: array of size 5 filled with 2. For 2d use np.full((2,2),2) - shape (2,2) filled with 2.

Changing a pandas DataFrame to a numpy array

np_new2 = new2.values
np_new2.shape
np_new2.dtype
np_new2.ndim

#to get random number or zeros or ones.
z=np.zeros((1,2))
o=np.ones((1,2))

k = np.random.randint(1,5,10) #gives 10 integers in the range [1,5) - 5 is excluded


p = np.random.rand(2,3) #(2,3) is the shape; gives uniform random values in [0,1)
j = np.random.randn(1,2) #(1,2) is the shape; gives samples from the standard normal distribution (not natural numbers)

Indexing in Numpy

#for any array.


<numpyarray_name>[<index>]
print(x[1]) #1d: second element. 2d: second row
print(x[1:,:]) #2d: all rows from index 1 onwards, all columns

#operations.
print(x+k) #two arrays of the same shape. or you can use np.add(x,k)

#likewise, there is np.subtract(x,k), np.multiply(x,k), np.divide(x,k), np.reciprocal(x)


#np.power(base, exp) and np.mod(dividend, divisor), which gives the remainder
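A small self-contained sketch tying the indexing and element-wise operations above together (the array names and values are just made up for illustration):

import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
k = np.array([[10, 20, 30], [40, 50, 60]])

print(x[1])                #second row -> [4 5 6]
print(x[1:, :])            #all rows from index 1 onwards, all columns
print(x + k)               #element-wise addition, same as np.add(x, k)
print(np.multiply(x, k))   #element-wise product
print(np.power(x, 2))      #element-wise square
print(np.mod(k, 7))        #element-wise remainder of k divided by 7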

Pandas

Basics

#refer to the above code snippets to see how to read a data file into a dataframe.
#now, to create a pandas dataframe on your own:
e = {"sno":[1,2,3,4,5], "Test":[1,2,3,4,5]}
e=pd.DataFrame(e)

Important operations (slicing, group by and joins)

#slicing.


#now there are two slicing functions, loc and iloc, and both do more or less the same
#thing but in different ways: iloc is based on integer positions and loc is based on label names.

#to get a single row.


print(e.loc[row_label]) #if there is no specific row name (cuz there usually won't be),
#just give the index of the row in int form (NOT a string).
#or, print(e.iloc[row_position])

#to get a single col


print(e.loc[:, "colname"]) #or, print(e.iloc[:, column_position])

#to get multiple stuff


e.loc[[1,2],["sno","Test"]] #and iloc is very similar to this as well.

#you can also have step values as well.


print(e.iloc[::2, 1:3]) #before, I was giving lists of exact labels;
#here I am just giving positions/slices. ::2 means all the rows with a step value of 2.

#a great example:
Report_Card.loc[(Report_Card["Name"] == "Benjamin Duran"),
["Lectures","Grades","Credits","Retake"]]

#Concatenation (here v is a second dataframe with columns "sno" and "Test2" - you can see that from the output below)
pd.concat(objs=[e,v])
#output:
sno Test Test2
0 1 1.0 NaN
1 2 2.0 NaN
2 3 3.0 NaN
3 4 4.0 NaN

4 5 5.0 NaN
0 1 NaN 1.0
1 2 NaN 2.0
2 3 NaN 3.0
3 4 NaN 4.0
#notice how the index is restarting.

pd.concat(objs=[e,v],ignore_index=True)
#output:
sno Test Test2
0 1 1.0 NaN
1 2 2.0 NaN
2 3 3.0 NaN
3 4 4.0 NaN
4 5 5.0 NaN
5 1 NaN 1.0
6 2 NaN 2.0
7 3 NaN 3.0
8 4 NaN 4.0
#you used to get similar results with print(e.append(v, ignore_index=True)), but DataFrame.append is deprecated (and removed in pandas 2.0), so stick to pd.concat.

#Joins

p = pd.merge(e,b,on="sno",how="outer")
#p is a new dataframe and e and b are the preexisting dataframes. on is the common col
#in both e and b. how is the type of join.
# all how values: left, right, outer, inner

j=pd.merge(e,v,how="outer",on="Test",sort=True)
#you can also sort them by the on col
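A hedged sketch filling in what b could look like (the merge above only assumes that e and b share an "sno" column; the values here are made up):

b = pd.DataFrame({"sno": [1, 2, 3, 6], "Marks": [55, 60, 70, 90]})  #hypothetical second frame
inner = pd.merge(e, b, on="sno", how="inner")  #keeps only the sno values present in both frames (1, 2, 3)
outer = pd.merge(e, b, on="sno", how="outer")  #keeps every sno from either frame, filling the gaps with NaN
print(inner)
print(outer)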

#GROUP BY

group1=j.groupby(["sno"])

#other stuff that you can do.


len(group1)
group1["Test"].nunique()
group1.groups #gives you all the groups.
group1.get_group(group_name) #gives that group.
group1["sno"].value_counts()
#you can also do stuff like sum, mean, max, min.
#using these you can answer more complex queries, e.g. (note: the boolean filter goes on the dataframe j, not on the groupby object):
j[j["Test"]==2] #gives all rows where Test = 2
j[j["Test"]==2]["sno"].sum() #now you are taking the sum of sno where Test = 2

for group_name, group_df in group1:
    print(group_name)
    print()
    print(group_df)
    print()
#to go through the groups one by one.

#there is also something called the agg function


#example: g1.agg({"Courses": "count", "Fee": "min", "Duration": "min", "Discount": "min"})
#valid aggregations include count, sum, min, max, mean, median, prod, std and var
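A small self-contained groupby + agg sketch (the DataFrame here is made up purely for illustration, it is not the notes' dataset):

sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "item":  ["pen", "book", "pen", "pen", "book"],
    "price": [10, 50, 12, 11, 55],
})
g = sales.groupby("store")
print(len(g))             #number of groups -> 2
print(g["price"].sum())   #total price per store
print(g.agg({"item": "count", "price": ["min", "max", "mean"]}))  #different aggregations per column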

Pandas Data Structures.

#series.
s = pd.Series() #that's empty

k=np.array([1,2,3,4])
kdf = pd.Series(k, index=[1,2,3,4]) #this is just a way to turn numpy stuff into pandas.
#index is not required but it is nice to have.
#you can also use a dictionary for this, where the keys become the index.

#scalar creation: you create a pandas series from just one value.
k = 69
kdf = pd.Series(k, index=[1,2,3,4,5]) #index is required here as it sets the size (the value is repeated for every index).

#you can get the values like normal slicing.


kdf[1:]

#or you can use conditions too.


kdf[kdf<3]

#between works too.


data_c['Salary'].between(12000, 20000, inclusive="both") #that is a dataframe column, not the series above; newer pandas wants "both"/"left"/"right"/"neither" for inclusive instead of True/False.

#to delete a col from a dataframe


del op["Test"]

#and you can use this too.


data_pt.describe()

#and you can turn the dataframe back to csv


dataframename.to_csv('your_file_name.csv')

CA3
mod 3 and 4

Mod 4
Linear regression - XW + B (a perceptron). X is the independent variable, W is the slope (weight) and B is the bias (intercept).
There could also be an error term, which ends up in an equation like XW + B ± C (that ± means plus or minus the error).
Residual sum of squares: SS_res = Σ(y_true - y_pred)²
Total sum of squares: SS_tot = Σ(y²) - (Σy)²/n
Degrees of freedom - learn after this.
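A quick numpy sketch of those two sums of squares and the R² score built from them, on made-up y values:

import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.3, 8.9])

ss_res = np.sum((y_true - y_pred) ** 2)                              #residual sum of squares
ss_tot = np.sum(y_true ** 2) - (np.sum(y_true) ** 2) / len(y_true)   #total sum of squares, same as sum((y - mean)^2)
r2 = 1 - ss_res / ss_tot                                             #R² of the fit
print(ss_res, ss_tot, r2)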

Logistic regression = linear reg works for real number values, but what if you want to classify into categorical data?
(basically the sigmoid, which is what logistic regression is built on)

The important terms:


The odds of an event = p/(1-p) (this gives a lot of awkward values, and you don't want weird values, right? So you take the log
of it, making it log(p/(1-p)), which is the log-odds (logit) of the event, not a probability.)

z = XW + B
and XW + B = log(p/(1-p))

and sigmoid = 1/(1+e^(-z)) = that converts z back into the probability p of the event.
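A tiny sketch of z and the sigmoid in numpy, with one made-up sample and made-up weights (just to show the shape of the formulas):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))   #1/(1+e^(-z))

X = np.array([0.5, 2.0])    #one sample with two features (example values)
W = np.array([1.2, -0.7])   #example weights
B = 0.3                     #example bias

z = X @ W + B               #z = XW + B, the log-odds log(p/(1-p))
p = sigmoid(z)              #probability of the event, back in (0, 1)
print(z, p)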

classification vs regression = you know this shit alr.


important models of classification: logistic regression, K-nearest neighbours, support vector machines, decision tree classifiers (can be split
into binary and multi-class)
important models of regression: linear reg, polynomial reg, support vector regression, decision tree regression (can be split into linear regression and
non-linear regression)

Support vector machine


It works for both linearly and non-linearly separable data, but only linearly separable data is in the portions, so we'll study only that.
So you have a bunch of points on the graph which are linearly separable and you want to draw a separating line with the biggest
possible distance (margin) to the nearest points on each side. That line is the maximum-margin hyperplane (in 2D it is just a line), and the
nearest points that fix its position are the support vectors. Other lines could also separate the classes, but with a smaller margin than the max-margin one.

So how do you find the SVM if you are given a set of values?
You plot the points and take the closest points to the separating line - those are the support vectors.
Let's assume the points are (1,2) and (3,4).
Now you have sv1 = (1,2) and sv2 = (3,4), and you append a bias term of 1 to each: sv1 = (1,2,1) and sv2 = (3,4,1).

The weight vector is a combination d1 * sv1 + d2 * sv2.


Now you dot that combination with sv1 and sv2, and each result must equal the polarity of that support vector (the equation with sv1 equals 1
because sv1 is the positive/yes class, and the equation with sv2 equals -1 because sv2 is the negative/no class):

1. d1 (1,2,1)(1,2,1) + d2(3,4,1)(1,2,1) = 1

2. d1 (1,2,1)(3,4,1) + d2(3,4,1)(3,4,1) = -1

so it equals to

1. d1(1,4,1) + d2(3,8,1) = 1

2. d1(3,8,1) + d2(9,16,1) =-1

now do dot product.


d1(1+4+1) + d2(3+8+1) = 1 (like that.)
so, 1. 6d1 +12d2 = 1

2. 12d1 +26d2 = -1

now multiply equation 1 by 2 and solve normally.

12d1 + 24d2 = 2
(-) 12d1 (-) 26d2 = (+) 1
so that gives you -2d2 = 3 ⇒ d2 = -3/2 = -1.5 (3/2 on the negative side.)
substituting back: 12d1 + 24(-3/2) = 2
12d1 = 38 ⇒ d1 = 38/12 = 19/6 ≈ 3.17 (for the positive side.)
now get the weights: w = Σ di * SVi

so;
3.17(1,2,1) + (-1.5)(3,4,1)
= (3.17, 6.33, 3.17) + (-4.5, -6, -1.5)
= (-1.33, 0.33, 1.67), so w1 = -1.33, w2 = 0.33 and the actual bias is the last entry, b = 1.67.

so the prediction formula is w1*x1 + w2*x2 + b (it gives +1 on sv1 and -1 on sv2, which checks out).
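A numpy sketch that re-does the worked example above and checks the numbers (np.linalg.solve handles the two simultaneous equations):

import numpy as np

sv1 = np.array([1, 2, 1])   #positive support vector with the bias term appended
sv2 = np.array([3, 4, 1])   #negative support vector with the bias term appended

A = np.array([[sv1 @ sv1, sv2 @ sv1],
              [sv1 @ sv2, sv2 @ sv2]])   #[[6, 12], [12, 26]]
b = np.array([1, -1])                    #polarities of the two support vectors
d1, d2 = np.linalg.solve(A, b)           #d1 ≈ 3.17, d2 = -1.5

w = d1 * sv1 + d2 * sv2                  #(w1, w2, bias) ≈ (-1.33, 0.33, 1.67)
print(d1, d2, w)
print(w[:2] @ np.array([1, 2]) + w[2])   #prediction for sv1 -> +1
print(w[:2] @ np.array([3, 4]) + w[2])   #prediction for sv2 -> -1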

Accuracy
put down the confusion matrix as usual and follow these formulas.
precision = tp/(tp+fp) (note: tp = true positive and fp = false positive)
recall = tp/(tp+fn)

accuracy = (tp+tn)/n
error rate = (fp+fn)/n
sensitivity = recall.

specificity = recall but with the negative class: tn/(tn+fp)
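A small sketch computing all of those metrics from raw confusion-matrix counts (the counts are made up):

tp, fp, fn, tn = 40, 10, 5, 45
n = tp + fp + fn + tn

precision   = tp / (tp + fp)
recall      = tp / (tp + fn)     #same thing as sensitivity
accuracy    = (tp + tn) / n
error_rate  = (fp + fn) / n
specificity = tn / (tn + fp)
print(precision, recall, accuracy, error_rate, specificity)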

KNN and KMean


K-means: you take a cluster and check all of the points in that cluster, and if they all (or almost all of them) belong to one class,
then future points which land in that cluster are assigned to the same class.

You take a point, subtract it from each cluster's centroid, and whichever centroid gives the least distance is the correct cluster (take the
absolute value if the difference is negative).
Then you take the mean of all of the points in each cluster to form that cluster's new centroid.

Now redo the whole thing with the new centroids.
Redo until the centroids stop moving (or you are satisfied). A one-iteration sketch is shown below.
KNN - you find the nearest pre-existing points to your test element and assign the test element to the class that has the majority among
those k nearest points (for k = 1, that is simply the class of the single closest point). That's all.
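A numpy sketch of one k-means assignment + update step on made-up 2D points (two clusters, hand-picked starting centroids):

import numpy as np

points    = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])   #initial guesses

#assignment step: each point goes to the centroid it is closest to
dists  = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = np.argmin(dists, axis=1)                  #-> [0, 0, 1, 1]

#update step: each centroid becomes the mean of the points assigned to it
new_centroids = np.array([points[labels == i].mean(axis=0) for i in range(2)])
print(labels)
print(new_centroids)
#repeat both steps with the new centroids until they stop moving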

Mod 3 (Data visualization with Matplotlib)


Always import matplotlib.pyplot as plt,
and import numpy as np (just to be sure).
to draw lines.

xaxis = np.array([0,10])
yaxis = np.array([1,100])
plt.plot(xaxis, yaxis)
plt.show()
#draws a line from (0,1) to (10,100)

plt.plot(x, y)

# Set limits and labels
plt.xlim([0, 10])
plt.ylim([-1, 1])
plt.xlabel('x-axis label')
plt.ylabel('y-axis label')

to draw it without lines.

plt.plot(xaxis, yaxis, '*') #marks the points with * (the marker string needs quotes)


#if xaxis is not given, then it will take the index values of the yaxis array as the xaxis
#plt.scatter()
#plt.bar()
#plt.hist()
#plt.boxplot()
#or, with pandas' df.plot(), just pass kind='line' (or 'bar', 'hist', 'box') and title= gives it a title

All markers:
'o' - circle
'*' - star
'-' - solid line
'--' - dashed line (':' is dotted)
'.' - point (dot)
',' - pixel (comma)
'+' - plus
's' - square
'D' - diamond
'd' - thin diamond ('p' is pentagon)
'h'/'H' - hexagon

'v' - triangle down ('^' is triangle up)


colour:
let's say you want the line to be in red; you add the colour letter to the format string, e.g. '-r' (solid red line) or ':r' (dotted red line)
w, k - white, black
r, g, b - red, green, blue
c, m, y - cyan, magenta, yellow
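For example, combining a marker, a line style and a colour in one format string (xaxis/yaxis are the arrays from earlier):

plt.plot(xaxis, yaxis, 'r--*')   #red dashed line with star markers
plt.plot(xaxis, yaxis, 'bo')     #blue circles only, no connecting line
plt.show()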

With Seaborn.
now you import seaborn as sns.

sns.distplot([1,2,3,4], hist=False) #distplot is deprecated in newer seaborn; sns.kdeplot / sns.displot do the same job
plt.show()
#shows you the distribution of any one numerical col
#or you can go like this:
sns.lineplot(x='col1', y='col2', data=data_variable)
plt.show()
#you can also use scatterplot(), histplot(), barplot(); the hue parameter gives a different colour for different values.

kde plot = kernel density estimate; it is the smoothed curve version of a histogram, showing the shape of the distribution.

sns.jointplot(x='col1',y='col2', data = data_variable)

plt.show()

sns.pairplot(data_variable)
and that plots all the possible pairwise combinations of plots for all of the numerical columns.
stripplot(x,y) is for one categorical and one numerical variable.

swarmplot(x,y) is basically the same as stripplot but with less overlapping.


violinplot(x,y) is like a box-and-whisker plot combined with a KDE (similar idea).
countplot(x) takes only an x value.

#for plt subplots.


fig, ax = plt.subplots(2, 2)
ax[0][1].plot(xaxis, yaxis) #plot whatever you want on that subplot
#the 0 and 1 denote the indexes (row 0, column 1) of the subplot you are drawing on.

Mod 5 (TensorFlow and Basics of ML)
TensorFlow is a library which is mainly used to handle high-complexity data and perform basic ML tasks (mostly for coding up the maths
functions which we don't need to understand at the moment). Many high-level ML/DL (and even AI) libraries, like Keras, are built
on top of TensorFlow (PyTorch is a separate framework that does a similar job). TensorFlow is the library that actually uses the CPU and GPU. The syntax sort of resembles C++.

Importing the library and a basic code:

import tensorflow as tf
tf.compat.v1.disable_eager_execution() #on TF 2.x the old Session API needs eager execution disabled first
with tf.compat.v1.Session() as s: #to run code
    a = tf.constant(60)
    b = tf.constant(9)
    nice = tf.add(a, b)
    print(nice) #the output is Tensor("Add_1:0", shape=(), dtype=int32), but you don't want it like that, right?
    print(s.run(nice)) #so the run function gives the output as 69, as a numpy scalar (not a numpy array, just one value)

Variables

with tf.compat.v1.Session() as s:
    test_var = tf.Variable(tf.zeros([3,2])) #here you are creating a variable, which can be changed, unlike constants
    var_inst = tf.compat.v1.global_variables_initializer() #and you are initializing the said variables
    s.run(var_inst)
    print(s.run(test_var))
    test_var = tf.zeros([5,2]) #rebinding the python name to a new tensor
    print(s.run(test_var))

#another example code:


tv1 = tf.Variable([1,2,3,4,5])
print(tv1)
tv1[1].assign(103) #sets index 1 to 103
print(tv1)
tv1.assign_sub([1,2,3,4,5]) #subtracts the values given in the list from the actual tv1 list, element-wise.
print(tv1)

And I just got to know that TensorFlow is not in the portions for this current finals, so don't study that. Moving on to the sklearn stuff.

For linear and log regression:

Import numpy, then use pandas to fetch the dataset and preprocess it, and after you are done, split it into
training and testing sets.

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

Using the sklearn.preprocessing module, import StandardScaler or MinMaxScaler and then normalize the data if required.

from sklearn.linear_model import LinearRegression


regressor = LinearRegression()
regressor.fit(X_train, y_train)
#to predict
results = regressor.predict(X_test)
pred = scaler.inverse_transform(results) #only needed if you scaled y earlier with a scaler object
#now you can do the accuracy check.
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, explained_variance_score, accuracy_score
print(mean_squared_error(y_test, pred)) #or any of the other regression metrics

for Log reg.

#the following code from above:


from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
#to
from sklearn.linear_model import LogisticRegression
regressor = LogisticRegression()
#and everything else is the same, all you need to do is change the functions

from sklearn.metrics import classification_report, confusion_matrix


cm = confusion_matrix(y_train, regressor.predict(X_train))
print(cm)
# ^ for confusion matrix

#for svm, keep everything literally the same as log reg but change the function.
from sklearn.svm import SVC
model=SVC()
model.fit(x_train, y_train)

#for KNN, there is one important parameter called n_neighbors, which is the number of neighbours the model is going to use (default is 5).
from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=3)

classifier.fit(x_train, y_train)

#For KMeans, you have n_clusters, which plays the same role as n_neighbors but for the number of clusters (default = 8).
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=i) #i = the number of clusters you want to try
kmeans.fit(X_train) #KMeans is unsupervised, so you only fit X, not y
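One common way to pick n_clusters is the "elbow" loop below; this is a sketch assuming X_train from the earlier train_test_split is the data you want to cluster:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

inertias = []
for i in range(1, 11):
    km = KMeans(n_clusters=i, random_state=0)
    km.fit(X_train)                 #unsupervised, so no y
    inertias.append(km.inertia_)    #within-cluster sum of squared distances

plt.plot(range(1, 11), inertias, 'o-')
plt.xlabel('n_clusters')
plt.ylabel('inertia')
plt.show()
#pick the n_clusters value where the curve stops dropping sharply (the "elbow")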

Learn how to plot stuff along with finding the values for these models and you are set.

