CENG3300 Lecture 10

This document provides an overview of machine learning techniques relevant to molecular engineering, including clustering, dimension reduction, and reinforcement learning. It discusses k-means clustering and how it is used to group similar chemical compounds or protein sequences. Principal component analysis (PCA) is introduced as an unsupervised technique that projects high-dimensional data into a lower-dimensional space while maximizing variance. Dimension reduction techniques like PCA and t-SNE are used for data visualization, noise removal, and preprocessing for machine learning models. Autoencoders and generative adversarial networks are also discussed in the context of dimension reduction. Reinforcement learning is defined as a method for self-learning complex decisions through simulated experiences and has seen success in chemistry applications.


Data Science for Molecular Engineering
Lecture 10
ILOs (Intended Learning Outcomes)
• Understand the concept and methods of clustering;
• Understand the motivation and methods for dimension reduction;
• Understand the elements of reinforcement learning;
Clustering
• Goal: partition unlabeled data into groups of similar data points
• When and why?
• Data organization (e.g. for easier search)
• Understand the underlying structure of data
• Preprocessing for further analysis
Applications of clustering
• Cluster chemical compounds for better organization
• Cluster protein sequences by function, or genes according to expression profile.
K-means clustering
• Partition the data around k centers

• Given a set of input points and a distance measure, find the positions of the k centers that minimize the sum of distances from each point to its nearest center (a short code sketch follows).
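
A minimal k-means sketch in Python with scikit-learn (an assumed library, not specified in the lecture); the feature matrix X and the choice of k = 3 are hypothetical placeholders for illustration.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # hypothetical data: 100 compounds x 5 descriptors

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])                 # cluster assignment of each point
print(kmeans.inertia_)                     # sum of squared distances to the nearest center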
Similar clustering algorithms
Principal component analysis
• What is PCA?
• Unsupervised learning technique for extracting variance structure
from high dimensional datasets

PCA is an orthogonal projection (transformation) of the data into a (possibly lower-dimensional) subspace such that the variance of the projected data is maximized.
Principal component analysis

In the simplest two-dimensional example, PCA rotates the axes of the coordinate system so that almost all of the variance lies along one new axis, and only that one feature needs to be kept.
Principal component analysis

PCA can be applied to higher-dimensional data as well


Principal component analysis
• Principal components (PC) are orthogonal directions that capture the
most variance in the data
• The first principal component is the direction of the greatest data variability
• The second PC is the direction orthogonal to the first PC with the greatest remaining variability
• And so on… (see the code sketch below)
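
A minimal PCA sketch with scikit-learn; the feature matrix X is a hypothetical placeholder, and keeping two components is an illustrative choice, not a value from the lecture.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))             # hypothetical data: 200 samples x 50 features

pca = PCA(n_components=2)                  # keep the first two principal components
X_2d = pca.fit_transform(X)                # project the data onto PC1 and PC2

print(pca.explained_variance_ratio_)       # fraction of variance captured by each PC
print(X_2d.shape)                          # (200, 2)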
What is PCA used for?
• Visualization
• Noise removal
• Feature transformation – lower dimension can lead to better
generalizability
• Preprocessing for machine learning models
t-distributed stochastic neighbor embedding (t-SNE)
• Non-linear dimension reduction technique
• Project high dimensional data onto a 2-D or 3-D space, where similar
points are modeled by similar lower dimensional points with high
probability
• One of the most popular methods for data visualization, especially for high-dimensional data (a short sketch follows)
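
A minimal t-SNE sketch with scikit-learn; the dataset X and the perplexity value are hypothetical placeholders chosen only for illustration.

import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 30))             # hypothetical data: 300 samples x 30 features

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_embedded = tsne.fit_transform(X)         # 2-D embedding suitable for plotting
print(X_embedded.shape)                    # (300, 2)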
t-SNE examples
Autoencoders
Dimension reduction techniques such as PCA and t-SNE transform the original features into new (latent) features, and can therefore be viewed as encoders.
Autoencoders add a decoder that recovers the original input from the latent features (a minimal sketch follows below).
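
A minimal autoencoder sketch in PyTorch (the framework, layer sizes, and 50-feature input are assumptions, not details from the lecture). The encoder compresses the input to a 2-D latent code; the decoder reconstructs the input from that code, and the reconstruction error drives training.

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_features=50, n_latent=2):
        super().__init__()
        # encoder: original features -> latent features
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(),
            nn.Linear(16, n_latent),
        )
        # decoder: latent features -> reconstruction of the original input
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 16), nn.ReLU(),
            nn.Linear(16, n_features),
        )

    def forward(self, x):
        z = self.encoder(x)                # latent (compressed) representation
        return self.decoder(z)             # reconstructed input

model = Autoencoder()
x = torch.randn(8, 50)                     # a hypothetical batch of 8 samples
loss = nn.MSELoss()(model(x), x)           # reconstruction error to minimize
loss.backward()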
Generative adversarial network (GAN)
Examples of AE and GAN
Reinforcement learning
• A method for self-learning to make complex decisions through numerous simulated experiences (a minimal Q-learning sketch follows)
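
A minimal tabular Q-learning sketch to illustrate the basic elements of RL: states, actions, rewards, and a policy learned from many simulated experiences. The toy chain environment and all parameter values are hypothetical, chosen only for illustration.

import numpy as np

n_states, n_actions = 5, 2                 # toy environment: a short 1-D chain
Q = np.zeros((n_states, n_actions))        # action-value estimates; the policy follows from Q
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount factor, exploration rate

def step(state, action):
    """Toy dynamics: action 1 moves right; reaching the last state gives reward +1."""
    next_state = min(state + action, n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

rng = np.random.default_rng(0)
for episode in range(500):                 # many simulated experiences
    state = 0
    while state != n_states - 1:
        # epsilon-greedy policy: explore occasionally, otherwise act greedily
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(Q[state].argmax())
        next_state, reward = step(state, action)
        # Q-learning update: move toward reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q)                                   # learned action values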
Striking successes have been achieved using RL
Similar concepts have been brought into chemistry
