bdcc-2.6
big data
Uploaded by
yexadat679
BDCC - 6 - Apache Drill
https://bdcc.santechz.com/unit-2/6-apache-drill

Apache Drill

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open-source version of Google's Dremel system, which is available as an infrastructure service called Google BigQuery. One explicitly stated design goal is that Drill can scale to 10,000 servers or more and process petabytes of data and trillions of records in seconds. Drill is an Apache top-level project.

Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.

Drill's datastore-aware optimizer automatically restructures a query plan to leverage the datastore's internal processing capabilities. In addition, Drill supports data locality, so it's a good idea to co-locate Drill and the datastore on the same nodes.

[Figure: Apache Drill with logos of supported datastores, including Hadoop, MongoDB and Windows Azure]

Drill removes the overhead of preparing data, so users can query the raw data in situ. There's no need to load the data, create and maintain schemas, or transform the data before it can be processed. Instead, simply include the path to a Hadoop directory, MongoDB collection or S3 bucket in the SQL query.

Drill leverages advanced query compilation and re-compilation techniques to maximize performance without requiring up-front schema knowledge.

Drill features a JSON data model that enables queries on complex/nested data as well as rapidly evolving structures commonly seen in modern applications and non-relational datastores.
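To make "include the path in the SQL query" concrete, here is a minimal Python sketch that only assembles query text; it does not contact Drill. The helper names `drill_table` and `join_query` and the join column `user_id` are hypothetical and chosen for illustration. The table references follow the plugin.workspace.`path` form that Drill queries use.

```python
def drill_table(plugin: str, workspace: str, path: str) -> str:
    """Return a table reference of the form plugin.workspace.`path`."""
    # Backticks quote the path, so a backtick inside the path is rejected
    # here instead of guessing at an escaping rule.
    if "`" in path:
        raise ValueError("path must not contain a backtick")
    return f"{plugin}.{workspace}.`{path}`"


def join_query(left: str, right: str, key: str) -> str:
    """Compose a join of two table references on a shared column."""
    return (
        f"SELECT * FROM {left} AS l "
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )


# A directory of log files and a MongoDB collection, addressed by path.
logs = drill_table("dfs", "root", "/web/logs")
users = drill_table("mongodb", "web", "users")

# user_id is a hypothetical column assumed to exist in both sources.
query = join_query(users, logs, "user_id")
print(query)
```

The point of the sketch is that both datastores appear in one statement, addressed only by plugin, workspace and path, with no schema declared beforehand.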
Drill also provides intuitive extensions to SQL so that you can easily query complex data. Drill is the only columnar query engine that supports complex data. It features an in-memory shredded columnar representation for complex data, which allows Drill to achieve columnar speed with the flexibility of an internal JSON document model.

    SELECT * FROM dfs.root.`/web/logs`;
    SELECT country, count(*) FROM mongodb.web.users GROUP BY country;
    SELECT timestamp ...

* Tools such as Tableau, Qlik, MicroStrategy, Spotfire, SAS and Excel can interact with non-relational datastores by leveraging Drill's JDBC and ODBC drivers.
* Developers can leverage Drill's simple REST API in their custom applications to create beautiful visualizations.

[Figure: logos of Tableau, Qlik, MicroStrategy, TIBCO Spotfire, Excel and SAS]

Drill's virtual datasets allow even the most complex, non-relational data to be mapped into BI-friendly structures which users can explore and visualize using their tool of choice.

Drill isn't the world's first query engine, but it's the first that combines both flexibility and speed. To achieve this, Drill features a radically different architecture that enables record-breaking performance without sacrificing the flexibility offered by the JSON document model. Drill's design includes:

* Columnar execution engine (the first ever to support complex data)
* Data-driven compilation and recompilation at execution time
* Specialized memory management that reduces memory footprint and eliminates garbage collections
* Locality-aware execution that reduces network traffic when Drill is co-located with the datastore
* Advanced cost-based optimizer that pushes processing into the datastore when possible
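As an illustration of the REST API mentioned above, the following Python sketch builds (but does not send) a query request. It assumes a default local installation with the web server on http://localhost:8047 and the `/query.json` endpoint; the function names are hypothetical, and the response shape with a `rows` list should be checked against the Drill version in use.

```python
import json
import urllib.request


def build_drill_request(
    sql: str, base_url: str = "http://localhost:8047"
) -> urllib.request.Request:
    """Build a POST request that submits one SQL query to Drill."""
    # base_url is an assumption: the default address of a local Drill.
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_from_response(body: bytes) -> list:
    """Extract the list of row objects from a JSON response body."""
    return json.loads(body.decode("utf-8")).get("rows", [])


request = build_drill_request(
    "SELECT country, count(*) FROM mongodb.web.users GROUP BY country"
)

# With a running Drill instance the request could be sent like this:
#   with urllib.request.urlopen(request) as response:
#       rows = rows_from_response(response.read())
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a real cluster.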
[Figure: Drill architecture. Clients (Tableau, Excel, Qlik, Web/Custom) query Apache Drill, which reads from the following datastores]

    NoSQL:      HBase, MongoDB, Kudu
    Search:     Elasticsearch
    Files:      NAS (NetApp, etc.), HDFS
    IaaS/PaaS:  Amazon S3
    Relational: Oracle, MySQL, SQL Server

INSTALLING AND USING APACHE DRILL

First we download Apache Drill

    wget http://apache.mirrors.hoobly.com/drill/drill-1.18.0/apache-drill-1.18.0.tar.gz

Then we extract it

    tar -xvzf apache-drill-1.18.0.tar.gz
    mv apache-drill-1.18.0 apache-drill

Then we launch it

    apache-drill/bin/drill-embedded

    hadoop@aaron-hadoop:~$ apache-drill/bin/drill-embedded
    Apache Drill 1.18.0
    "Data is the new oil. Ready to Drill some?"
    apache drill>

[Screenshot: Plugin Management page, listing Enabled Storage Plugins and Disabled Storage Plugins]

From the menu bar, select Query.

[Screenshot: Query page with the query type options SQL, Physical and Logical, and the sample SQL query below]

    SELECT * FROM cp.`employee.json` LIMIT 20

[Screenshot: query result]

The query returns results that are not usable as they stand, because the values come back as byte arrays. We convert the data from byte arrays to UTF8 types that are meaningful. We also store this query in a view.

    CREATE VIEW dfs.tmp.students AS
    SELECT CONVERT_FROM(row_key, 'UTF8') AS studentid,
           CONVERT_FROM(students.account.name, 'UTF8') AS name,
           CONVERT_FROM(students.address.state, 'UTF8') AS state,
           CONVERT_FROM(students.address.street, 'UTF8') AS street,
           CONVERT_FROM(students.address.zipcode, 'UTF8') AS zipcode
    FROM hbase.students;

We then query the view:

    SELECT * FROM dfs.tmp.students;

[Screenshot: query result]

The same conversion is applied to hbase.clicks. The statement ends with:

    ...
           CONVERT_FROM(clicks.clickinfo.url, 'UTF8') AS url
    FROM hbase.clicks;

Note: We write `time` within backquotes as it is an SQL keyword.
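The two ideas above, decoding byte arrays into UTF-8 text and backquoting a column name that collides with a keyword, can be pictured outside Drill with a short Python sketch. It is not Drill's implementation: the function names are hypothetical, the sample row is made up, and `RESERVED` is only a small illustrative subset of SQL keywords.

```python
# Small illustrative subset of keywords; not Drill's full reserved list.
RESERVED = {"time", "timestamp", "date"}


def convert_from_utf8(value: bytes) -> str:
    """Decode a raw byte array into text, as CONVERT_FROM(x, 'UTF8') does conceptually."""
    return value.decode("utf-8")


def decode_row(row: dict) -> dict:
    """Decode every byte-array value of one row into a readable string."""
    return {column: convert_from_utf8(raw) for column, raw in row.items()}


def quote_identifier(name: str) -> str:
    """Wrap a column name in backquotes when it collides with a keyword."""
    return f"`{name}`" if name.lower() in RESERVED else name


# Made-up sample row standing in for what a raw HBase scan returns.
raw_row = {"studentid": b"student1", "name": b"Alice", "state": b"CA"}
readable_row = decode_row(raw_row)

# Column list for a SELECT over click data; only `time` needs quoting.
columns = ", ".join(quote_identifier(c) for c in ["clickid", "time", "url"])
print(readable_row)
print(columns)
```

In the views above, Drill performs the decoding itself through CONVERT_FROM; the sketch only shows why the raw values are unreadable until they are decoded.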
Then we query the converted clicks data:

    SELECT * FROM dfs.tmp.clicks;

[Screenshot: query result]

Join the two tables together using a join

    SELECT * FROM
      (SELECT * FROM dfs.tmp.students) s
      LEFT JOIN
      (SELECT * FROM dfs.tmp.clicks) c
      ON s.studentid = c.studentid;

[Screenshot: join result, showing the student columns followed by the click columns]

Compiled by Aaron Stanislaus Johns