Final Project
Source:
[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to
Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier,
62:22-31, June 2014
Requirements
Your project should meet the following requirements:
1. Data cleaning: Perform necessary data cleaning operations to make sure
the data is in a suitable format for analysis.
2. Data preprocessing: Perform necessary data preprocessing operations such
as feature scaling, encoding categorical variables, etc.
3. Feature selection: Use feature selection techniques to select the most
relevant features for the model.
4. Model selection: Compare the performance of at least three different
models (e.g., logistic regression, random forest, neural network) and choose
the best one based on evaluation metrics.
5. Hyperparameter tuning: Tune the hyperparameters of the selected model
to improve its performance.
6. Evaluation: Evaluate the performance of the final model using appropriate
evaluation metrics.
7. Deployment: Deploy the final model using Streamlit and create a web
interface for the model.
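
Steps 2 to 4 above can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: the toy DataFrame and its column names (`age`, `balance`, `job`, `y`) are hypothetical stand-ins for the real dataset, and the three candidate models and `k=3` are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the real dataset (assumed column names).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "balance": rng.normal(1000, 500, n),
    "job": rng.choice(["admin", "technician", "retired"], n),
})
df["y"] = (df["balance"] + rng.normal(0, 300, n) > 1000).astype(int)

X, y = df.drop(columns="y"), df["y"]
numeric_cols = ["age", "balance"]
categorical_cols = ["job"]

# Step 2: scale numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "knn": KNeighborsClassifier(),
}

# Steps 3 and 4: select features, then compare models with 5-fold cross-validation.
scores = {}
for name, model in candidates.items():
    pipe = Pipeline([
        ("preprocess", preprocess),
        ("select", SelectKBest(f_classif, k=3)),
        ("model", model),
    ])
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()

best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Keeping preprocessing and feature selection inside the pipeline means they are refitted on each training fold, so the cross-validated scores are not inflated by information from the validation fold.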
Grading
The project will be graded based on the following criteria:
• Data Cleaning (10%): The dataset should be thoroughly cleaned, and any
data quality issues should be addressed appropriately.
• Data Preprocessing (10%): The categorical variables should be appropriately
encoded, and numerical variables should be scaled if necessary.
• Feature Engineering (10%): New features should be created where appropriate.
• Model Selection (20%): Several models should be trained, and the
best-performing model should be selected based on appropriate metrics.
Students should evaluate the performance of their model using appropriate
metrics and compare it with other models. The selected model’s
hyperparameters should be tuned using appropriate techniques.
• Creating Pipeline (20%): Students should create a pipeline that covers the
entire process.
• Deployment (30%): The selected model should be deployed using the
Streamlit framework, and the deployed model should be usable by end-users.
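
The pipeline and tuning criteria above could be approached along these lines. This is a hedged sketch only: the synthetic data from `make_classification`, the parameter grid, the choice of F1 as the metric, and the file name `model.joblib` are all assumptions made for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset (assumption, not the real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One pipeline object holds every step, so tuning and deployment reuse it as-is.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# "<step>__<param>" addresses a hyperparameter of a pipeline step.
param_grid = {"model__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on held-out data only.
y_pred = search.predict(X_test)
test_f1 = f1_score(y_test, y_pred)
print(search.best_params_)
print(classification_report(y_test, y_pred))

# Persist the whole fitted pipeline so a web app can load it later.
model_path = os.path.join(tempfile.gettempdir(), "model.joblib")
joblib.dump(search.best_estimator_, model_path)
```

Saving `search.best_estimator_` rather than the bare model keeps the scaler and the classifier together, so the deployed app can pass raw inputs straight to `predict`.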
Submissions
Submit a Jupyter Notebook containing the code for the project. Make sure
to include sufficient documentation and comments in your code. Also, provide
a separate document with instructions on how to run and interact with the
deployed model in your report file. Each student should submit three files zipped
into a single file named “Group_0XX.zip” on the LMS:
• Jupyter Notebook (project.ipynb)
• PowerPoint Presentation (presentation.ppt) (Max 5 slides)
• A report (report.pdf) summarizing your findings and recommendations.
(Max 2 pages). Give the Streamlit cloud address of your project.
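
For the deployed app whose address goes in the report, a minimal Streamlit sketch might look like the following. It assumes a fitted scikit-learn pipeline was saved with joblib under the hypothetical name `model.joblib`, and the input fields (`age`, `job`, `balance`) and the helper names `make_input_frame` and `run_app` are placeholders; the real app should expose the columns the pipeline was trained on.

```python
import pandas as pd


def make_input_frame(age, job, balance):
    """Pack widget values into a one-row DataFrame with the training column names."""
    return pd.DataFrame([{"age": age, "job": job, "balance": balance}])


def run_app(model_path="model.joblib"):
    # Imported here so the helper above stays usable without Streamlit installed.
    import joblib
    import streamlit as st

    st.title("Bank telemarketing outcome predictor")
    age = st.number_input("Age", min_value=18, max_value=100, value=35)
    job = st.selectbox("Job", ["admin", "technician", "retired"])
    balance = st.number_input("Balance", value=1000.0)

    if st.button("Predict"):
        model = joblib.load(model_path)  # fitted scikit-learn pipeline (assumed)
        proba = model.predict_proba(make_input_frame(age, job, balance))[0, 1]
        st.write(f"Estimated probability of a positive outcome: {proba:.2f}")
```

Calling `run_app()` on the last line of a file such as `app.py` and starting it with `streamlit run app.py` would serve the form locally before it is published.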
The deadline for submission is 24 December 2023 at 11:59 PM.
Instructions
• The project is open-book and open-internet. You are free to use any
resources available to you, but you are not allowed to collaborate with
other groups or to copy from other sources.
• This is a group project.
• Each student has to implement the required steps and come up with an
optimal solution.
• You are allowed to use any Python libraries for data analysis and modeling.
• You have to provide brief explanations of each step in the notebook and
presentation.
• The notebook should be well documented and easy to follow.
• Your performance will be evaluated based on the grading criteria mentioned
above.
• Academic honesty is expected from each student. Any act of plagiarism or
cheating will not be tolerated and will be reported to the university.
• All submissions must be made on or before the specified deadline. Late
submissions will not be accepted.
• Each group has to prepare a short presentation for their project. Selected
projects will be discussed in class as a demo.
Academic Honesty
• This is a group project: each group is expected to work independently, and
each member of the group is expected to contribute equally to the project.
• Collaboration between groups is not allowed, and any instance of academic
dishonesty will be reported to the university authorities.
• You may use online resources and libraries, but you must cite them properly
in your code and presentation.
• Your code and presentation must be original and free of plagiarism.
Deadline
The deadline for submission is 24 December 2023 at 11:59 PM. Late submissions
will not be accepted.
Resources
You may find the following resources helpful:
• Our Lecture notes and notebooks on Teams
• [Pandas documentation] (https://pandas.pydata.org/docs/)
• [Scikit-learn documentation] (https://scikit-learn.org/stable/documentation.html)
• [Seaborn documentation] (https://seaborn.pydata.org/documentation.html)
• [Matplotlib documentation] (https://matplotlib.org/stable/contents.html)
• [Streamlit documentation] (https://streamlit.io/documentation)
• [Hyperparameter tuning with GridSearchCV]
(https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)
• [Understanding Precision-Recall in Scikit-Learn]
(https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html)
• [Data Cleaning and Preprocessing in Python]
(https://towardsdatascience.com/data-cleaning-and-preprocessing-techniques-for-your-machine-learning-project-ec50b8b7996b)
• [Machine Learning Pipeline: What is it and how to build one]
(https://towardsdatascience.com/machine-learning-pipeline-what-is-it-and-how-to-build-one-7fddc3413e1d)
• [How to Deploy Machine Learning Models with Streamlit]
(https://towardsdatascience.com/how-to-deploy-machine-learning-models-with-streamlit-379493145b58)
• [Machine Learning Project Checklist]
(https://www.kdnuggets.com/2018/05/general-ml-advice-project-checklist.html)