Parallel K-Means Using Map Reduce On Big Data Cluster Analysis

The document discusses using MapReduce to perform parallel k-means clustering on big data. The mapping step assigns data points to the closest cluster center. The reducing step revises cluster centers by taking the mean of assigned data points. This mapping and reducing is done iteratively until cluster centers converge.

Uploaded by

sunita chalageri

Parallel K-means using MapReduce on Big Data Cluster Analysis

Big Data Computing Vu Pham Machine Learning Classification Algorithm


MapReducing 1 iteration of k-means
Classify: Assign observations to closest cluster center

Map: For each data point, given ({μj}, xi), emit (zi, xi), where zi is the index of the center closest to xi

Recenter: Revise cluster centers as mean of assigned observations

Reduce: Average over all points in cluster j (zi = j)
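Written out, the two steps of one iteration are the standard k-means updates. The symbol n_j, the number of points currently assigned to cluster j, is notation introduced here and does not appear on the slide.

```latex
z_i \leftarrow \arg\min_{j} \lVert \mu_j - x_i \rVert_2^2
\qquad
\mu_j \leftarrow \frac{1}{n_j} \sum_{i \,:\, z_i = j} x_i ,
\quad n_j = \lvert \{\, i : z_i = j \,\} \rvert
```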



Classification step as Map
Classify: Assign observations to closest cluster center
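The classification step might be sketched as a map function like the one below. This is a minimal illustration in plain Python, not tied to any particular MapReduce framework; the names `kmeans_map` and `squared_distance` are hypothetical, and squared Euclidean distance is assumed as the closeness measure.

```python
from typing import Iterator, Sequence, Tuple

Point = Tuple[float, ...]

def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_map(centers: Sequence[Point], x: Point) -> Iterator[Tuple[int, Point]]:
    """Emit (z, x), where z is the index of the center closest to x."""
    z = min(range(len(centers)), key=lambda j: squared_distance(centers[j], x))
    yield z, x

centers = [(0.0, 0.0), (10.0, 10.0)]
print(list(kmeans_map(centers, (1.0, 2.0))))  # [(0, (1.0, 2.0))]
```

Each call depends only on its own data point and the shared centers, which is why the step can run in parallel over data points.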



Recenter step as Reduce
Recenter: Revise cluster centers as mean of assigned observations

reduce(j, x_in_cluster_j : [x1, x3, …])
    sum = 0
    count = 0
    for x in x_in_cluster_j
        sum += x
        count += 1
    emit(j, sum/count)
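A runnable version of this reducer could look like the following sketch. It assumes each observation is a tuple of floats, so the sum is taken per coordinate; the name `kmeans_reduce` is hypothetical and no specific MapReduce library is implied.

```python
from typing import Iterable, Iterator, Tuple

Point = Tuple[float, ...]

def kmeans_reduce(j: int, points: Iterable[Point]) -> Iterator[Tuple[int, Point]]:
    """Emit (j, mean of all points assigned to cluster j)."""
    total = None
    count = 0
    for x in points:
        # Accumulate a coordinate-wise running sum.
        total = list(x) if total is None else [t + xi for t, xi in zip(total, x)]
        count += 1
    if count:  # a cluster that received no points emits nothing
        yield j, tuple(t / count for t in total)

print(list(kmeans_reduce(0, [(1.0, 2.0), (3.0, 4.0)])))  # [(0, (2.0, 3.0))]
```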




Distributed KMeans Iterative Clustering

Map: Find nearest center for each movie; key is center, value is movie

Reduce: Average ratings of the movies assigned to each center
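The iterative loop might be simulated in memory as in the sketch below. The movie titles, rating vectors, starting centers, tolerance, and function names are all made up for illustration; a real job would launch one MapReduce pass per iteration rather than a Python loop.

```python
from collections import defaultdict

def nearest(centers, ratings):
    """Index of the center closest to a movie's rating vector."""
    return min(range(len(centers)),
               key=lambda j: sum((c - r) ** 2 for c, r in zip(centers[j], ratings)))

def kmeans_iteration(centers, movies):
    groups = defaultdict(list)
    for title, ratings in movies.items():
        # Map + shuffle: key is the nearest center, value is the movie's ratings.
        groups[nearest(centers, ratings)].append(ratings)
    new_centers = list(centers)  # a center with no movies keeps its old position
    for j, rated in groups.items():
        # Reduce: average the ratings of the movies assigned to center j.
        new_centers[j] = tuple(sum(col) / len(rated) for col in zip(*rated))
    return new_centers

def kmeans(centers, movies, max_iters=100, tol=1e-9):
    """Repeat map/reduce passes until no center moves more than tol (squared)."""
    for _ in range(max_iters):
        new_centers = kmeans_iteration(centers, movies)
        shift = max(sum((a - b) ** 2 for a, b in zip(old, new))
                    for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers

movies = {"A": (5.0, 1.0), "B": (4.0, 2.0), "C": (1.0, 5.0), "D": (2.0, 4.0)}
print(kmeans([(5.0, 1.0), (1.0, 5.0)], movies))  # [(4.5, 1.5), (1.5, 4.5)]
```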



Summary of Parallel k-means using MapReduce

Map: classification step; data parallel over data points

Reduce: recompute means; data parallel over centers



Some practical considerations
k-means needs an iterative version of MapReduce
    Not standard formulation

Mapper needs to get data point and all centers
    A lot of data!
    Better implementation: mapper gets many data points, so the centers are read once per batch instead of once per point
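One way to realize the batched mapper is sketched below. Emitting a partial (sum, count) pair per cluster, rather than every point, is an additional assumption that goes beyond what the slide states; the names `batch_map` and `merge_reduce` are hypothetical.

```python
def batch_map(centers, points):
    """Process a whole split in one call: the centers are read once, and a
    single (sum, count) pair is emitted for each cluster seen in the split."""
    partial = {}
    for x in points:
        j = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(centers[c], x)))
        s, n = partial.get(j, ((0.0,) * len(x), 0))
        partial[j] = (tuple(si + xi for si, xi in zip(s, x)), n + 1)
    yield from partial.items()

def merge_reduce(j, partials):
    """Combine the partial sums for cluster j into its new center.
    Assumes at least one partial, as a reducer only runs for keys that exist."""
    total, count = None, 0
    for s, n in partials:
        total = s if total is None else tuple(t + si for t, si in zip(total, s))
        count += n
    yield j, tuple(t / count for t in total)

centers = [(0.0, 0.0), (10.0, 10.0)]
split_1 = dict(batch_map(centers, [(1.0, 1.0), (3.0, 3.0), (9.0, 9.0)]))
split_2 = dict(batch_map(centers, [(11.0, 11.0)]))
print(list(merge_reduce(1, [split_1[1], split_2[1]])))  # [(1, (10.0, 10.0))]
```

With this layout the data sent to the reducers grows with the number of clusters per split, not with the number of points.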



Conclusion

In this lecture, we have given an overview of cluster analysis and discussed the machine learning clustering algorithm k-means, implemented with MapReduce, for big data analytics.
