BDA Assignment 3
Fault Tolerance | Replication of data across nodes | Resilient Distributed Datasets (RDDs)
Data Caching | No caching, relies on disk I/O | In-memory caching of intermediate data
Machine Learning | Limited support (Hadoop ML) | Advanced machine learning with MLlib
SQL Queries | Hive for SQL-like queries | Spark SQL for interactive queries
Data Sources Supported | HDFS, HBase, and others | HDFS, HBase, Cassandra, and others
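
The Fault Tolerance and Data Caching rows above can be illustrated with a small sketch. This is a toy model in plain Python, not Spark's API: the class name ToyRDD and its methods are hypothetical and exist only for this example. It assumes that an RDD-style dataset records how it was derived (its lineage) and can optionally keep its result in memory, so that a lost result is rebuilt by recomputation rather than read from a replica.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical toy stand-in for an RDD (not the Spark API).

    It stores its lineage (parent dataset + transformation) and,
    if caching is enabled, an in-memory copy of the computed result.
    """

    def __init__(self,
                 source: Optional[List[int]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None) -> None:
        self.source = source          # base data, stands in for stable storage
        self.parent = parent          # lineage: dataset this one derives from
        self.fn = fn                  # lineage: transformation applied to parent
        self.cache_enabled = False
        self.cached: Optional[List[int]] = None
        self.computations = 0         # how many times the data was (re)computed

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Transformations are lazy: only the lineage is recorded here.
        return ToyRDD(parent=self, fn=fn)

    def cache(self) -> "ToyRDD":
        self.cache_enabled = True
        return self

    def collect(self) -> List[int]:
        # Serve from memory when a cached copy exists.
        if self.cached is not None:
            return self.cached
        # Otherwise compute from lineage (or from the base data).
        self.computations += 1
        if self.parent is None:
            data = list(self.source or [])
        else:
            data = [self.fn(x) for x in self.parent.collect()]
        if self.cache_enabled:
            self.cached = data
        return data

    def lose_cache(self) -> None:
        # Simulates losing the in-memory copy, e.g. when a node fails.
        self.cached = None


base = ToyRDD(source=[1, 2, 3])
squares = base.map(lambda x: x * x).cache()

first = squares.collect()      # computed from lineage, then cached
second = squares.collect()     # served from the in-memory cache
squares.lose_cache()           # simulated failure
recovered = squares.collect()  # rebuilt by replaying the lineage
```

In this sketch the second collect() does no recomputation, which mirrors the in-memory caching row, and the result after lose_cache() is rebuilt from the recorded transformation, which mirrors RDD-based fault tolerance. Real Spark tracks lineage per partition and distributes the work across a cluster; none of that is modeled here.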
Core Complexity | Simpler cores optimized for parallel tasks | More complex cores optimized for sequential tasks