For Big Data Developers: Big Data Architecture on Microsoft Azure, with Elasticsearch and Databricks Explained [Seminar] Held in Tokyo
https://www.microsoftevents.com/profile/form/index.cfm?PKformID=0x8311627abcd
This document discusses Hortonworks Data Platform (HDP) updates and releases. It notes that HDP will have more frequent releases of components like Spark, Hive, and Ambari, while having longer release cycles for core Hadoop components. HDP 2.5 is highlighted as including interactive Hive queries using LLAP, enterprise Spark support in Zeppelin notebooks, real-time applications support in Storm and HBase/Phoenix, streamlined operations using Ambari, and dynamic security with Atlas and Ranger integration.
This document discusses the evolution of Hadoop and its use cases in the adtech industry. It describes how Hadoop was initially used primarily for batch processing via Hive and MapReduce. Over time, improvements like Tez, Presto, and Impala enabled faster interactive SQL queries on big data. The document also outlines how the Hadoop ecosystem is now used for real-time log collection, reporting, model generation, and more across the entire adtech stack. Key recent developments discussed include improvements in Hive like LLAP that enable sub-second SQL and ACID transactions, as well as tools like Cloudbreak for deploying Hadoop clusters in the cloud.
Dynamic Resource Allocation in Apache Spark (Yuta Imai)
Dynamic resource allocation in Apache Spark allows executors to be added or removed at runtime based on an application's workload. Extra executors are requested when the application has pending tasks, and idle executors are released to free resources for other applications. Allocation policies control when executors are requested or removed, based on factors such as the backlog of pending tasks and executor idle time. An external shuffle service keeps shuffle data available independently of the executors, so executors can be removed without losing shuffle output.
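As a rough sketch of how this is wired up (the executor counts and timeouts below are illustrative placeholders, not values from the deck), dynamic allocation and the external shuffle service can be enabled from a Scala SparkSession:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: enable dynamic allocation; min/max executors and timeouts are placeholders.
val spark = SparkSession.builder()
  .appName("dynamic-allocation-sketch")
  // Let Spark grow and shrink the executor pool with the workload.
  .config("spark.dynamicAllocation.enabled", "true")
  // External shuffle service so shuffle files outlive removed executors.
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  // Request more executors once tasks have been pending this long.
  .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  // Release executors that have been idle this long.
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .getOrCreate()
```

The same properties can also be set in spark-defaults.conf or on the spark-submit command line; for executor removal to be safe, the cluster's node managers must run the external shuffle service.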
This webinar covers how Apache Hive, familiar to users since Hadoop 1.x, has been sped up in recent years: the switch of the execution engine from MapReduce to Tez, ORC (a columnar file format with built-in indexes), Vectorization that makes full use of modern CPUs, query-plan optimization by the Cost Based Optimizer built on Apache Calcite, and LLAP, which delivers sub-second query responses. Each of these features can be enabled with just a few lines of configuration or a few commands; the session explains what mechanisms are at work behind them and how they are implemented.
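To illustrate how little configuration is involved (the host name, table, and exact property values below are placeholders, not settings from the webinar), a Scala client can switch these features on through HiveServer2 over JDBC:

```scala
import java.sql.DriverManager

object HiveSpeedupSketch {
  def main(args: Array[String]): Unit = {
    // The Hive JDBC driver must be on the classpath; host, port, and user are placeholders.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://hive-server:10000/default", "user", "")
    val stmt = conn.createStatement()

    // Run queries on Tez instead of MapReduce.
    stmt.execute("SET hive.execution.engine=tez")
    // Process rows in batches to exploit modern CPUs (works with ORC and similar formats).
    stmt.execute("SET hive.vectorized.execution.enabled=true")
    // Cost-based optimization of query plans via Apache Calcite.
    stmt.execute("SET hive.cbo.enable=true")
    // Prefer LLAP daemons for sub-second responses (the cluster must be running LLAP).
    stmt.execute("SET hive.llap.execution.mode=all")

    // ORC: columnar storage with built-in indexes and statistics.
    stmt.execute("CREATE TABLE IF NOT EXISTS events (id BIGINT, ts TIMESTAMP, url STRING) STORED AS ORC")

    stmt.close()
    conn.close()
  }
}
```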
This deck is about figuring out what to measure and how to benchmark it. It focuses on the idea of benchmarking rather than on specific commercial or open-source benchmarking tools.