
Clustering of the TCGA dataset: data preprocessing, visualization and clustering with different algorithms. Done mostly with Python 3.7 and the Scikit-Learn library.


adrian-aleks/TCGA-clasterization


TCGA-clasterization

This repo contains a Jupyter notebook showing part of the work I did for my bachelor's thesis. The dataset and a more detailed description can be found at:

Short description

The data are part of the RNA-Seq (HiSeq) PANCAN data set: a random extraction of gene expression levels from patients with different types of tumor (BRCA, KIRC, COAD, LUAD and PRAD). Each of the 801 rows describes the gene expression profile of a particular patient. The aim of the analysis was to answer how well unsupervised learning algorithms can separate the different cancer types within the dataset, and whether there are any other clusters within or between the different kinds of cancer.
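As an illustration of this kind of analysis, here is a minimal sketch in Python with Scikit-Learn. It is not the notebook's code: the data are a synthetic stand-in generated with `make_blobs` (801 samples in 5 groups, with 500 features instead of the real number of genes), and the preprocessing (standardization plus PCA to 20 components), the algorithms (k-means, Ward agglomerative clustering, a Gaussian mixture) and the metric (adjusted Rand index against the known labels) are all assumptions chosen for the example.

```python
# Minimal sketch: compare several scikit-learn clustering algorithms on a
# synthetic stand-in for the expression matrix (801 samples, 5 classes).
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder (assumption): 801 "patients" x 500 "genes" in 5 groups.
# Replace X and y with the real expression matrix and tumor labels.
X, y = make_blobs(n_samples=801, n_features=500, centers=5, random_state=0)

# Preprocessing: standardize each feature, then reduce dimensionality with PCA.
reduce = make_pipeline(StandardScaler(), PCA(n_components=20, random_state=0))
X_reduced = reduce.fit_transform(X)

# One model per algorithm, each asked for 5 groups (one per tumor type).
algorithms = {
    "k-means": KMeans(n_clusters=5, n_init=10, random_state=0),
    "agglomerative (Ward)": AgglomerativeClustering(n_clusters=5, linkage="ward"),
    "Gaussian mixture": GaussianMixture(n_components=5, random_state=0),
}

scores = {}
for name, model in algorithms.items():
    # fit_predict returns one cluster ID per sample; the IDs are arbitrary,
    # so agreement with the true labels is measured with the
    # permutation-invariant adjusted Rand index (ARI).
    predicted = model.fit_predict(X_reduced)
    scores[name] = adjusted_rand_score(y, predicted)
    print(f"{name:>22}: ARI = {scores[name]:.3f}")
```

Because cluster IDs carry no meaning of their own, a permutation-invariant score such as the ARI is used instead of plain accuracy: a value near 1 means the clusters reproduce the known groups, while a value near 0 means chance-level agreement.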

How to run this

Simply click on this link!

Tools used

  • Python 3.7
  • Scikit-Learn library
  • Pandas
  • Matplotlib
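The listed tools typically fit together as in the hedged sketch below: Pandas builds a contingency table of clusters against tumor types, and Matplotlib draws the data projected onto the first two principal components, colored by cluster. The synthetic data (`make_blobs` with labels renamed to the five tumor types), the choice of k-means and the output file name `clusters_pca.png` are illustrative assumptions, not taken from the notebook.

```python
# Hedged sketch: Pandas contingency table plus Matplotlib PCA scatter plot.
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 801 samples in 5 groups named after the tumor types.
tumor_types = np.array(["BRCA", "KIRC", "COAD", "LUAD", "PRAD"])
X, y_idx = make_blobs(n_samples=801, n_features=500, centers=5, random_state=1)
labels = pd.Series(tumor_types[y_idx], name="tumor type")

# Standardize, cluster in the full feature space, and project to 2-D for plotting.
X_std = StandardScaler().fit_transform(X)
clusters = pd.Series(
    KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_std),
    name="cluster",
)
coords = PCA(n_components=2).fit_transform(X_std)

# Contingency table: number of patients of each tumor type in each cluster.
table = pd.crosstab(labels, clusters)
print(table)

# Scatter plot of the PCA projection, colored by cluster assignment.
fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(coords[:, 0], coords[:, 1], c=clusters, cmap="tab10", s=10)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_title("k-means clusters in PCA space")
fig.savefig("clusters_pca.png", dpi=150)
plt.close(fig)
```

A table in which each row has a single dominant column means that one cluster corresponds to one tumor type; rows split across several columns point to the "other clusters" the analysis looks for.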
