Performed feature selection using the F-score method to identify the most important features and improve the performance of a linear SVM machine learning model. The accuracy achieved was above 63%.

Chan2k20/Genotype-Data-Predictive-Classifier-Model

Project Title: Feature Selection and SVM Model for SNP Genotype Data

Description/Problem Statement: Feature selection was performed on a simulated single nucleotide polymorphism (SNP) genotype dataset of 29,623 SNPs, using the f-score method to identify the most informative features. The feature selection was implemented in Python without external libraries. The training dataset comprised 4,000 cases and 4,000 controls.
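The README does not spell out which F-score definition was used. As a minimal sketch, assuming the two-class F-score commonly paired with SVMs (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances), a library-free version could look like the following. The function names, the 0/1 label coding, and the numeric genotype coding (for example 0/1/2) are illustrative assumptions, not taken from the repository.

```python
def f_scores(rows, labels):
    """Return one F-score per column (hypothetical helper).

    rows   -- list of equal-length lists of numeric genotype codes
    labels -- list of 0 (control) / 1 (case), one per row
    Assumes each class has at least two samples.
    """
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = []
    for j in range(len(rows[0])):
        col_all = [r[j] for r in rows]
        col_pos = [r[j] for r in pos]
        col_neg = [r[j] for r in neg]
        mean_all = sum(col_all) / len(col_all)
        mean_pos = sum(col_pos) / len(col_pos)
        mean_neg = sum(col_neg) / len(col_neg)
        # Numerator: how far each class mean sits from the overall mean.
        between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
        # Denominator: within-class sample variances (n - 1 divisor).
        var_pos = sum((v - mean_pos) ** 2 for v in col_pos) / (len(col_pos) - 1)
        var_neg = sum((v - mean_neg) ** 2 for v in col_neg) / (len(col_neg) - 1)
        within = var_pos + var_neg
        if within == 0:
            # Constant within both classes: infinite score if the means differ.
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def top_k_columns(scores, k):
    """Indices (0-based) of the k highest-scoring columns, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

With this sketch, a column whose values are distributed identically in cases and controls scores 0, and larger scores indicate columns whose class means are further apart relative to their spread. How many columns to keep (`k`) is a tuning choice the README does not state.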

A linear Support Vector Machine (SVM) model was then trained on the features selected by the f-score method, with the objective of predicting the outcomes of 2,000 test individuals. The model was tuned toward a target accuracy above 63%.
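The README does not name the SVM implementation that was used. Purely as an illustration of training a linear SVM on a subset of columns, the sketch below minimises the L2-regularised hinge loss by batch sub-gradient descent so that it stays dependency-free. The function names, the +1/-1 label coding, and the hyperparameter values (`lam`, `lr`, `epochs`) are assumptions, not values from the project.

```python
def select_columns(rows, columns):
    """Keep only the chosen feature columns (0-based indices) of each row."""
    return [[r[j] for j in columns] for r in rows]


def train_linear_svm(rows, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit weights w and bias b; labels must be +1 or -1."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Gradient of the regulariser (lam / 2) * ||w||^2.
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(rows, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:
                # Only margin violators contribute to the hinge sub-gradient.
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(rows, w, b):
    """Predict +1 or -1 from the sign of the decision function."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
            for x in rows]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

In use, the training and test rows would both be passed through `select_columns` with the same column list before calling `train_linear_svm` and `predict`, so the model only ever sees the selected SNPs.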

The project output comprised the total number of features used and the column numbers of the selected features used for the final prediction.

Skills Utilized:

  • Feature Selection
  • Machine Learning (SVM)
  • Python Programming
  • Data Analysis
  • Model Optimization

Solution:

  • Implemented feature selection using the f-score method in Python without relying on external libraries.
  • Selected relevant features from the SNP genotype dataset to improve model performance and interpretability.
  • Constructed and trained a linear SVM model using the selected features to predict outcomes for test individuals.
  • Fine-tuned the SVM model to reach an accuracy above 63%.
  • Provided the total count of selected features and the corresponding column numbers used for the final prediction, facilitating transparency and reproducibility of the results.
