Machine Learning With Scikit-Learn: George Boorman


Machine learning with scikit-learn

George Boorman
Core Curriculum Manager, DataCamp
What is machine learning?
Machine learning is the process whereby:
Computers are given the ability to learn to make decisions from data without being explicitly programmed!

Examples of machine learning

Unsupervised learning
Uncovering hidden patterns from unlabeled data

Example:
Grouping customers into distinct categories (Clustering)
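
As a rough sketch of what clustering looks like in scikit-learn (the customer numbers and the choice of three segments below are made up for illustration), KMeans can group customers by their behaviour without ever seeing labels:

from sklearn.cluster import KMeans
import numpy as np

# hypothetical customers described by annual spend and number of purchases
customers = np.array([[200, 5], [220, 6],
                      [900, 40], [950, 42],
                      [60, 1], [80, 2]])

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(customers)  # each customer is assigned one of three cluster labels
print(labels)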

Supervised learning
The predicted values are known

Aim: Predict the target values of unseen data, given the features

Types of supervised learning
Classification: Target variable consists of categories
Regression: Target variable is continuous

Naming conventions
Feature = predictor variable = independent variable

Target variable = dependent variable = response variable

Before you use supervised learning
Requirements:
No missing values

Data in numeric format

Data stored in pandas DataFrame or NumPy array

Perform Exploratory Data Analysis (EDA) first
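
A minimal sketch of these checks with pandas (the file name below is hypothetical; churn_df matches the DataFrame used in the later examples):

import pandas as pd

churn_df = pd.read_csv("telecom_churn.csv")  # hypothetical file

print(churn_df.isna().sum())   # any missing values per column?
print(churn_df.dtypes)         # is every column numeric?
print(churn_df.describe())     # summary statistics as a first EDA step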

scikit-learn syntax
from sklearn.module import Model    # import a model class from the relevant module
model = Model()                     # instantiate the model
model.fit(X, y)                     # fit the model to the features X and target y
predictions = model.predict(X_new)  # predict labels for new, unseen data
print(predictions)

array([0, 0, 0, 0, 1, 0])

Let's practice!

The classification challenge

Classifying labels of unseen data
1. Build a model

2. Model learns from the labeled data we pass to it

3. Pass unlabeled data to the model as input

4. Model predicts the labels of the unseen data

Labeled data = training data

k-Nearest Neighbors
Predict the label of a data point by
Looking at the k closest labeled data points

Taking a majority vote
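
A minimal NumPy sketch of this idea (the points, labels, and new observation below are made up; the next slide uses scikit-learn's KNeighborsClassifier, which does this for us):

import numpy as np

# hypothetical labeled points (two features each) and their class labels
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 0.8]])
y = np.array([0, 0, 1, 1, 0])

new_point = np.array([1.4, 1.5])
k = 3

# distance from the new point to every labeled point
distances = np.linalg.norm(X - new_point, axis=1)

# labels of the k closest points, then a majority vote
nearest_labels = y[np.argsort(distances)[:k]]
prediction = np.bincount(nearest_labels).argmax()
print(prediction)  # 0, since the three nearest neighbours all have label 0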

Using scikit-learn to fit a classifier
from sklearn.neighbors import KNeighborsClassifier
X = churn_df[["total_day_charge", "total_eve_charge"]].values
y = churn_df["churn"].values
print(X.shape, y.shape)

(3333, 2) (3333,)
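
The shapes matter: scikit-learn expects the features X as a two-dimensional array of shape (n_samples, n_features) and the target y as a one-dimensional array with one label per observation; here there are 3333 observations, each with 2 features.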

knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(X, y)

Predicting on unlabeled data
import numpy as np

X_new = np.array([[56.8, 17.5],
                  [24.4, 24.1],
                  [50.1, 10.9]])
print(X_new.shape)

(3, 2)

predictions = knn.predict(X_new)
print('Predictions: {}'.format(predictions))

Predictions: [1 0 0]
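
A predicted value of 1 means the model expects that customer to churn, and 0 that it does not, since the target array was built from the churn column.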

Let's practice!

Measuring model performance

In classification, accuracy is a commonly used metric

Accuracy: the number of correct predictions divided by the total number of observations
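
As a minimal sketch (the label arrays below are made up), accuracy is simply the fraction of predictions that match the true labels:

import numpy as np

y_true = np.array([0, 1, 0, 1, 1])  # known labels
y_pred = np.array([0, 1, 0, 0, 1])  # hypothetical predictions, 4 of 5 correct
accuracy = np.mean(y_pred == y_true)
print(accuracy)  # 0.8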


Measuring model performance
How do we measure accuracy?

Could compute accuracy on the data used to fit the classifier

NOT indicative of ability to generalize

Computing accuracy

To evaluate how well a model generalizes, split the data into a training set and a test set, fit the classifier on the training set, then calculate its accuracy against the test set's labels.

Train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=21, stratify=y)
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))

0.8800599700149925
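
Passing stratify=y keeps the proportion of churners and non-churners the same in the training and test sets, random_state=21 makes the split reproducible, and knn.score returns the classifier's accuracy on the test set, roughly 88% here.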

Model complexity
Larger k = less complex model = can cause underfitting

Smaller k = more complex model = can lead to overfitting

Model complexity and over/underfitting
train_accuracies = {}
test_accuracies = {}
neighbors = np.arange(1, 26)
for neighbor in neighbors:
    knn = KNeighborsClassifier(n_neighbors=neighbor)
    knn.fit(X_train, y_train)
    train_accuracies[neighbor] = knn.score(X_train, y_train)
    test_accuracies[neighbor] = knn.score(X_test, y_test)

Plotting our results
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
plt.title("KNN: Varying Number of Neighbors")
plt.plot(neighbors, train_accuracies.values(), label="Training Accuracy")
plt.plot(neighbors, test_accuracies.values(), label="Testing Accuracy")
plt.legend()
plt.xlabel("Number of Neighbors")
plt.ylabel("Accuracy")
plt.show()

Model complexity curve
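
On this curve, training accuracy is typically highest for the smallest values of k and falls as k grows, while test accuracy tends to peak at an intermediate k; that peak suggests a value of k that balances overfitting and underfitting.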

Let's practice!