Welcome to my Master's Project Portfolio repository! This repository contains all the key projects, research work, and implementations I developed during my M.Sc. in Applied Statistics and Data Science at Jahangirnagar University, Dhaka, Bangladesh.
- About Me
- Projects
  - TweetGuard: Combining Transformer and Bi-LSTM Architectures for Fake News Detection
  - Unveiling Hidden Patterns: A Deep Learning Framework Utilizing PCA for Fraudulent Scheme Detection
  - AnimeLens: A Hybrid Approach for Personalized Anime Recommendations using Deep Collaborative Filtering and TF-IDF Content Based Filtering
  - Impact of Transformations in Modeling and Forecasting with ARIMA
  - Exploring Customer Segmentation through Dimensionality Reduction and Clustering: A Comparative Analysis with KMeans, Hierarchical (Agglomerative), and DBSCAN in Python
- Technologies Used
- Acknowledgments
- License
I am Kowshik Sankar Roy, a passionate Data Scientist with a background in Electronics and Communication Engineering and an M.Sc. in Applied Statistics and Data Science. My work focuses on data-driven solutions using machine learning, deep learning, and statistical modeling to solve real-world problems. Throughout my master's, I have developed various projects, which you can explore in this repository.
Feel free to connect with me for collaborations or discussions!
TweetGuard: Combining Transformer and Bi-LSTM Architectures for Fake News Detection
- Description: Developed TweetGuard, a hybrid model that combines Transformer and Bi-LSTM architectures to detect fake news in large collections of tweets.
- Key Contributions:
  - Designed a text preprocessing pipeline built on BERTweet tokenization.
  - Conducted ablation studies to evaluate the contribution of each model component.
  - Demonstrated superior accuracy on the TruthSeeker dataset.
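
As a rough illustration of the preprocessing step, the sketch below masks user mentions and links before tokenization, following the BERTweet convention of replacing them with `@USER` and `HTTPURL`. The regular expressions and the `normalize_tweet` helper are assumptions for illustration only; the project's actual pipeline may differ, and real tokenization would go through a pretrained BERTweet tokenizer.

```python
import re

# Assumed normalization rules, modeled on the BERTweet convention of
# masking user mentions and links; the project's real pipeline may differ.
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+|www\.\S+")
WHITESPACE = re.compile(r"\s+")


def normalize_tweet(text: str) -> str:
    """Mask URLs and mentions, then collapse repeated whitespace."""
    text = URL.sub("HTTPURL", text)    # links first, so '@' inside a URL is not re-matched
    text = MENTION.sub("@USER", text)  # then user handles
    return WHITESPACE.sub(" ", text).strip()


print(normalize_tweet("Breaking!!  @cnn says https://t.co/abc123 is   fake"))
# prints: Breaking!! @USER says HTTPURL is fake
```

Masking handles and links keeps the vocabulary small and stops the model from memorizing specific accounts or URLs.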
Unveiling Hidden Patterns: A Deep Learning Framework Utilizing PCA for Fraudulent Scheme Detection
- Description: Developed a deep learning model that uses PCA for dimensionality reduction and Bayesian optimization for hyperparameter tuning to detect fraud in supply chain data.
- Key Contributions:
  - Achieved a 94.71% fraud detection rate with 99.42% overall accuracy on the DataCo dataset.
  - Implemented SMOTE to handle class imbalance.
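
The idea behind SMOTE can be sketched in a few lines: each synthetic minority sample is placed at a random position on the segment between a real minority sample and one of its nearest minority neighbors. This is a simplified, standard-library illustration on made-up points, not the code used in the project; the `smote` function, its parameters, and the toy data are assumptions.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward near neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


fraud = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]  # toy minority class
new_points = smote(fraud, n_new=6)
print(len(fraud) + len(new_points))  # 10 minority samples after oversampling
```

Oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic records.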
AnimeLens: A Hybrid Approach for Personalized Anime Recommendations using Deep Collaborative Filtering and TF-IDF Content Based Filtering
- Description: Created a hybrid recommendation system that combines Deep Collaborative Filtering with TF-IDF Content-Based Filtering to provide personalized anime recommendations.
- Key Contributions:
  - Addressed the cold-start problem through hybrid filtering.
  - Improved the user experience by providing accurate, diverse recommendations.
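
One way to see how hybrid filtering helps with cold start: a title with no ratings has no collaborative score, but it still has a content similarity to titles the user liked. The sketch below computes TF-IDF cosine similarity over made-up genre tags and blends it with a hypothetical collaborative score. The weighting rule, the `alpha` value, and all names and data here are assumptions for illustration, not the AnimeLens implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # smoothed inverse document frequency, so no weight is ever zero
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def hybrid_score(cf_score, content_score, alpha=0.7):
    """Blend both signals; fall back to content alone when no CF score exists."""
    if cf_score is None:  # cold start: the title has no ratings yet
        return content_score
    return alpha * cf_score + (1 - alpha) * content_score


# Made-up catalog of genre tags and hypothetical collaborative scores.
catalog = {
    "title_a": "action adventure fantasy".split(),
    "title_b": "action fantasy magic".split(),
    "title_c": "romance school comedy".split(),
}
vecs = dict(zip(catalog, tfidf_vectors(list(catalog.values()))))
cf_scores = {"title_b": 0.9, "title_c": None}  # None marks a cold-start title

for name in ("title_b", "title_c"):
    content = cosine(vecs["title_a"], vecs[name])
    print(name, round(hybrid_score(cf_scores[name], content), 3))
```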
Impact of Transformations in Modeling and Forecasting with ARIMA
- Description: Analyzed time series data with ARIMA models, focusing on achieving stationarity through differencing and transformations. Used statistical tests (ADF and KPSS), model selection criteria, and forecast accuracy measures to select models and produce robust predictions.
- Key Contributions:
  - Achieved stationarity in time series data through systematic differencing and transformations.
  - Employed statistical tests (ADF, KPSS) to validate stationarity.
  - Selected ARIMA models using AIC, BIC, and forecasting accuracy measures.
  - Generated forecasts from the selected models to support decision-making.
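
To make the transformation step concrete, the sketch below applies a log transform followed by first differencing to a toy series with constant 10% growth, then inverts the transformation to return to the original scale. It also includes the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The series and the helper names are made up for illustration; in practice the stationarity tests and model fitting would come from a statistics library.

```python
import math


def log_difference(series):
    """Log-transform a positive series, then take first differences."""
    logs = [math.log(v) for v in series]
    return [logs[i] - logs[i - 1] for i in range(1, len(logs))]


def invert_log_difference(first_value, log_diffs):
    """Rebuild the original-scale series from its first value and log differences."""
    levels = [math.log(first_value)]
    for d in log_diffs:
        levels.append(levels[-1] + d)  # cumulative sum undoes the differencing
    return [math.exp(v) for v in levels]  # exp undoes the log


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_likelihood


series = [100.0, 110.0, 121.0, 133.1]  # toy data: 10% growth per step
diffs = log_difference(series)          # roughly constant, close to log(1.1)
restored = invert_log_difference(series[0], diffs)
print([round(d, 4) for d in diffs])
print([round(v, 1) for v in restored])
```

The exponential trend becomes a constant after the log difference, which is why this pair of transformations is a common first attempt when a series grows multiplicatively.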
Exploring Customer Segmentation through Dimensionality Reduction and Clustering: A Comparative Analysis with KMeans, Hierarchical (Agglomerative), and DBSCAN in Python
- Description: Performed unsupervised clustering on a grocery firm's customer data to identify distinct segments for targeted product development and personalized marketing. Used PCA for dimensionality reduction and compared K-Means, Hierarchical (Agglomerative) Clustering, and DBSCAN to find the best segmentation.
- Key Contributions:
  - Applied PCA to reduce dimensionality before clustering.
  - Compared K-Means, Hierarchical (Agglomerative) Clustering, and DBSCAN to identify the best customer clusters.
  - Evaluated each algorithm with clustering quality metrics such as the silhouette score and the Davies-Bouldin index.
  - Visualized the clusters and patterns using Matplotlib and Seaborn to provide business insights.
  - Provided actionable recommendations on product development and personalized marketing strategies based on the segmentation results.
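
The silhouette score used to compare clusterings can be written out directly. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the lowest mean distance to any other cluster; the point's score is `(b - a) / max(a, b)`, and the overall score is the average. The sketch below is a plain-Python illustration on made-up 2-D points with hand-written labels, not output from the project's K-Means, Agglomerative, or DBSCAN runs.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette coefficient; expects at least two distinct labels."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        # mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # lowest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated groups of customers in a 2-D feature space.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the groups

print(round(silhouette_score(points, good_labels), 3))
print(round(silhouette_score(points, bad_labels), 3))
```

Scores near 1 indicate compact, well-separated clusters, while scores near or below 0 indicate points assigned to the wrong cluster, so the first labeling scores much higher than the second.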
- Languages: Python, R
- Frameworks: TensorFlow, PyTorch, Scikit-learn
- Tools: Google Colab, Git, Jupyter Notebooks, RStudio, SPSS, Hadoop
- Techniques: Machine Learning, Deep Learning, Data Mining, Statistical Tests, Time Series Analysis, Big Data, Regression, Clustering
I would like to express my gratitude to my academic advisors, peers, and collaborators who contributed to the success of these projects. Special thanks to Jahangirnagar University for providing the platform to carry out this research.
This repository is licensed under the MIT License. See the LICENSE file for more details.