
Final Project


ADA 442 Statistical Learning | Classification

Author: Dr. Hakan Emekci


Created: 13 Nov 2023
Version: v23.01.01
Final Project Assignment
Objective
The objective of this project is to build a machine learning model that predicts
whether a client of a bank will subscribe to a term deposit. The dataset
used for this project is the Bank Marketing Data Set, which can be found
at https://archive.ics.uci.edu/ml/datasets/Bank+Marketing. Use the file
“bank-additional.csv”, which contains 10% of the examples (4119), randomly
selected from the full dataset, and 20 inputs.
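
As a starting point, loading the file might look like the minimal sketch below. It assumes the UCI file is semicolon-separated with quoted strings (check this against your download), and the helper name load_bank_data and the two sample rows are made up for illustration; only a few of the 20 input columns are shown.

```python
import io

import pandas as pd

# Illustrative, made-up rows in the assumed layout of the UCI file
# (semicolon-separated, quoted strings). Not real records.
SAMPLE = '''"age";"job";"marital";"duration";"y"
41;"services";"married";120;"no"
29;"technician";"single";640;"yes"
'''


def load_bank_data(source):
    """Read the bank marketing CSV and map the target y from no/yes to 0/1."""
    df = pd.read_csv(source, sep=";")
    df["y"] = df["y"].map({"no": 0, "yes": 1})
    return df


# In the project this would be: load_bank_data("bank-additional.csv")
df = load_bank_data(io.StringIO(SAMPLE))
print(df.shape)
print(df["y"].value_counts())
```

Checking the class counts of y early is useful because it tells you whether plain accuracy is a meaningful metric for this problem.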
The data relates to direct marketing campaigns of a Portuguese banking
institution. The campaigns were based on phone calls. Often, more than one
contact with the same client was required in order to assess whether the
product (a bank term deposit) would be subscribed (‘yes’) or not (‘no’). The
classification goal is to predict whether the client will subscribe to a term
deposit (variable y).

Source:
[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to
Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier,
62:22-31, June 2014

Requirements
Your project should meet the following requirements:
1. Data cleaning: Perform necessary data cleaning operations to make sure
the data is in a suitable format for analysis.
2. Data preprocessing: Perform necessary data preprocessing operations such
as feature scaling, encoding categorical variables, etc.
3. Feature selection: Use feature selection techniques to select the most
relevant features for the model.
4. Model selection: Compare the performance of at least three different
models (e.g., logistic regression, random forest, neural network) and choose
the best one based on evaluation metrics.
5. Hyperparameter tuning: Tune the hyperparameters of the selected model
to improve its performance.

6. Evaluation: Evaluate the performance of the final model using appropriate
evaluation metrics.
7. Deployment: Deploy the final model using streamlit and create a web
interface for the model.
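
The requirements above (preprocessing, feature selection, comparing three models, tuning, evaluation) can be sketched in one scikit-learn workflow. This is only an illustration under stated assumptions: the data is synthetic stand-in data rather than the bank dataset, the column names, the k=5 feature count, the parameter grids and the F1 scoring choice are all placeholders, and the helper make_pipeline is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumption): replace with the cleaned bank data.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "duration": rng.integers(0, 1000, n),
    "job": rng.choice(["admin.", "services", "technician"], n),
    "marital": rng.choice(["married", "single", "divorced"], n),
})
y = (X["duration"] + rng.normal(0, 200, n) > 600).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def make_pipeline(model):
    """Scaling + one-hot encoding, univariate feature selection, then the model."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["age", "duration"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["job", "marital"]),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("select", SelectKBest(score_func=f_classif, k=5)),
        ("model", model),
    ])


candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "neural_network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
}

# Model selection: mean 5-fold cross-validated F1 on the training split.
scores = {
    name: cross_val_score(make_pipeline(model), X_train, y_train, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

# Hyperparameter tuning of the selected model only (placeholder grids).
param_grids = {
    "logistic_regression": {"model__C": [0.1, 1.0, 10.0]},
    "random_forest": {"model__max_depth": [3, 6, None]},
    "neural_network": {"model__alpha": [1e-4, 1e-2]},
}
search = GridSearchCV(
    make_pipeline(candidates[best_name]), param_grids[best_name], cv=5, scoring="f1"
)
search.fit(X_train, y_train)

# Final evaluation on the held-out test split.
print(scores)
print(best_name, search.best_params_)
print(classification_report(y_test, search.predict(X_test)))
```

Keeping the preprocessing and feature selection inside the pipeline means they are refit on each cross-validation fold, which avoids leaking information from the validation folds into the training step.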

Grading
The project will be graded based on the following criteria:
• Data Cleaning (10%): The dataset should be thoroughly cleaned, and any
data quality issues should be addressed appropriately.
• Data Preprocessing (10%): The categorical variables should be appropri-
ately encoded, and numerical variables should be scaled if necessary.
• Feature Engineering (10%): New features should be created where appro-
priate.
• Model Selection (20%): Several models should be trained, and the best-
performing model should be selected based on appropriate metrics. Stu-
dents should evaluate the performance of their model using appropriate
metrics and compare it with other models. The selected model’s hyperpa-
rameters should be tuned using appropriate techniques.
• Creating Pipeline (20%): Students should create a pipeline that covers the
whole process.
• Deployment (30%): The selected model should be deployed using the
Streamlit framework, and the deployed model should be usable by end-
users.
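
For the deployment criterion, a Streamlit app could be organised roughly as in the sketch below. It assumes a fitted scikit-learn pipeline has already been pickled to a file; the file name model.pkl, the four input fields, the category options and the helper names build_input_row, label_for and run_app are all hypothetical and must be adapted to the features your own model was trained on.

```python
import pickle

import pandas as pd

MODEL_PATH = "model.pkl"  # hypothetical file name for the pickled, fitted pipeline


def build_input_row(age, job, marital, duration):
    """Pack the form values into a one-row DataFrame with the training column names."""
    return pd.DataFrame(
        [{"age": age, "job": job, "marital": marital, "duration": duration}]
    )


def label_for(prediction):
    """Translate the model's 0/1 output back to the dataset's no/yes labels."""
    return "yes" if int(prediction) == 1 else "no"


def run_app():
    # Imported here so the helpers above can be tested without Streamlit installed.
    import streamlit as st

    st.title("Term deposit subscription predictor")
    age = st.number_input("Age", min_value=18, max_value=100, value=35)
    job = st.selectbox("Job", ["admin.", "services", "technician"])
    marital = st.selectbox("Marital status", ["married", "single", "divorced"])
    duration = st.number_input("Last contact duration (seconds)", min_value=0, value=200)

    if st.button("Predict"):
        with open(MODEL_PATH, "rb") as f:
            model = pickle.load(f)
        row = build_input_row(age, job, marital, duration)
        st.write("Predicted subscription:", label_for(model.predict(row)[0]))


# In app.py, call run_app() at module level and start the app with:
#   streamlit run app.py
```

Separating the input-building logic from the user interface keeps the part that must match the training columns easy to test on its own.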

Submissions
Submit a Jupyter Notebook containing the code for the project. Make sure
to include sufficient documentation and comments in your code. Also, provide
instructions on how to run and interact with the deployed model in your
report file. Each student should submit three files, zipped into a single
file named “Group_0XX.zip”, on the LMS system:
• Jupyter Notebook (project.ipynb)
• PowerPoint Presentation (presentation.ppt) (Max 5 slides)
• A report (report.pdf) summarizing your findings and recommendations.
(Max 2 pages). Give the Streamlit cloud address of your project.
The deadline for submission is 24 December 2023 at 11:59 PM.

Instructions
• The project is open-book and open-internet. You are free to use any
resources available to you, but you are not allowed to collaborate with
other groups or to copy from other sources.

• This is a group project.
• Each student has to implement the required steps and come up with an
optimal solution.
• You are allowed to use any Python libraries for data analysis and modeling.
• You have to provide brief explanations of each step in the notebook and
presentation.
• The notebook should be well documented and easy to follow.
• Your performance will be evaluated based on the grading criteria mentioned
above.
• Academic honesty is expected from each student. Any act of plagiarism or
cheating will not be tolerated and will be reported to the university.
• All submissions must be made on or before the specified deadline. Late
submissions will not be accepted.
• Each group has to prepare a short presentation for their project. Selected
projects will be discussed in class as a demo.

Academic Honesty
• This is a group project. Each group is expected to work independently,
and each member of the group is expected to contribute equally to the project.
• Collaboration between groups is not allowed, and any instance of academic
dishonesty will be reported to the university authorities.
• You may use online resources and libraries, but you must cite them properly
in your code and presentation.
• Your code and presentation must be original and free of plagiarism.

Deadline
The deadline for submission is 24 December 2023 at 11:59 PM. Late submissions
will not be accepted.

Resources
You may find the following resources helpful:
• Our Lecture notes and notebooks on Teams
• [Pandas documentation] (https://pandas.pydata.org/docs/)
• [Scikit-learn documentation] (https://scikit-learn.org/stable/documentation.html)
• [Seaborn documentation] (https://seaborn.pydata.org/documentation.html)
• [Matplotlib documentation] (https://matplotlib.org/stable/contents.html)
• [Streamlit documentation] (https://streamlit.io/documentation)
• [Hyperparameter tuning with GridSearchCV]
(https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)
• [Understanding Precision-Recall in Scikit-Learn]
(https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html)
• [Data Cleaning and Preprocessing in Python]
(https://towardsdatascience.com/data-cleaning-and-preprocessing-techniques-for-your-machine-learning-project-ec50b8b7996b)
• [Machine Learning Pipeline: What is it and how to build one]
(https://towardsdatascience.com/machine-learning-pipeline-what-is-it-and-how-to-build-one-7fddc3413e1d)
• [How to Deploy Machine Learning Models with Streamlit]
(https://towardsdatascience.com/how-to-deploy-machine-learning-models-with-streamlit-379493145b58)
• [Machine Learning Project Checklist]
(https://www.kdnuggets.com/2018/05/general-ml-advice-project-checklist.html)
