irhallac/blog-spark-naive-bayes-reuters

 
 

blog-spark-naive-bayes-reuters

A simple example of how to use Naive Bayes on Spark with the popular Reuters-21578 dataset. More information is on our blog: http://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/

Requirements

GravityLabs Goose (use our fork to be compatible with Scala 2.10):

$ git clone http://github.com/Chimpler/goose
$ cd goose
$ mvn install

Setup

$ ./download_reuters.sh

Running classification

$ sbt run    
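
For orientation, here is a minimal sketch of the kind of pipeline this step runs: hash each document's terms into a feature vector, train a multinomial Naive Bayes model with Spark MLlib, and predict the topic of new text. It is not the code in this repository. The object name `NaiveBayesSketch`, the helper `classify`, the toy corpus, the feature-space size, and the smoothing value are all assumptions made for illustration, and it assumes `spark-core` and `spark-mllib` are on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical sketch, not the repository's implementation.
object NaiveBayesSketch {
  // Toy corpus standing in for parsed Reuters articles: (topic, text).
  val docs: Seq[(String, String)] = Seq(
    ("grain", "wheat corn harvest export tonnes"),
    ("grain", "soybean wheat crop shipment"),
    ("gold", "gold ounce mine bullion price"),
    ("gold", "gold silver bullion ounce")
  )

  // Trains on the toy corpus and returns the predicted topic for `text`.
  def classify(sc: SparkContext, text: String): String = {
    // MLlib expects numeric labels, so map each topic to an index.
    val labels = docs.map(_._1).distinct
    val labelIndex = labels.zipWithIndex.toMap

    // Hash terms into a fixed-size term-frequency vector (size is an assumption).
    val tf = new HashingTF(1 << 12)

    val training = sc.parallelize(docs).map { case (topic, body) =>
      LabeledPoint(labelIndex(topic).toDouble, tf.transform(body.split(" ").toSeq))
    }

    // Second argument is the additive (Laplace) smoothing parameter.
    val model = NaiveBayes.train(training, 1.0)

    val predicted = model.predict(tf.transform(text.split(" ").toSeq))
    labels(predicted.toInt)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("nb-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(classify(sc, "wheat crop export"))
    finally sc.stop()
  }
}
```

With a real corpus, the whitespace split would be replaced by proper tokenization and stop-word removal, and the model would be evaluated on a held-out test split before being used on new pages.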

Examples

http://www.coinflation.com/coins/1942-1945-Silver-War-Nickel-Value.html => gold
http://www.businessweek.com/news/2014-06-10/china-using-dubai-style-fake-islands-to-reshape-south-china-sea => ship
http://en.wikipedia.org/wiki/Soybean => grain
http://en.wikipedia.org/wiki/Whole_wheat_bread => grain
http://en.wiktionary.org/wiki/cow => livestock
