bdcc-2.6

big data

Uploaded by

yexadat679
Apache Drill

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open-source version of Google's Dremel system, which is available as an infrastructure service called Google BigQuery. One explicitly stated design goal is that Drill can scale to 10,000 servers or more and process petabytes of data and trillions of records in seconds. Drill is an Apache top-level project.

Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop. Drill's datastore-aware optimizer automatically restructures a query plan to leverage the datastore's internal processing capabilities. In addition, Drill supports data locality, so it is a good idea to co-locate Drill and the datastore on the same nodes.

[Figure: Apache Drill logo with supported datastores such as Hadoop, MongoDB, Amazon S3 and Windows Azure.]

Drill removes the usual preparation overhead so that users can query the raw data in situ. There is no need to load the data, create and maintain schemas, or transform the data before it can be processed. Instead, simply include the path to a Hadoop directory, MongoDB collection or S3 bucket in the SQL query. Drill leverages advanced query compilation and re-compilation techniques to maximize performance without requiring up-front schema knowledge.

Drill features a JSON data model that enables queries on complex/nested data as well as on the rapidly evolving structures commonly seen in modern applications and non-relational datastores.
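To picture what querying nested, evolving data without an up-front schema involves, here is a minimal Python sketch. It is only an illustration of the idea and not how Drill is implemented: the event records, field names and the helpers `flatten` and `discover_columns` are all made up for this example. Each record is flattened into dotted paths, and the set of columns is discovered from the data itself as records are read.

```python
def flatten(record, prefix=""):
    """Flatten a nested dict into {dotted.path: value} pairs."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def discover_columns(records):
    """Return the union of flattened field paths, in first-seen order."""
    columns = []
    for record in records:
        for path in flatten(record):
            if path not in columns:
                columns.append(path)
    return columns


# Hypothetical event records whose structure changes over time.
events = [
    {"user": "jdoe", "action": "login"},
    {"user": "asmith", "action": "click", "page": {"url": "/home"}},
    {"user": "jdoe", "action": "click", "page": {"url": "/cart", "ms": 120}},
]

columns = discover_columns(events)
# Records that lack a field simply get None in that column.
rows = [[flatten(event).get(column) for column in columns] for event in events]

print(columns)
for row in rows:
    print(row)
```

No schema is declared anywhere: fields that appear only in later records become new columns, and earlier records are padded with `None`.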
Drill also provides intuitive extensions to SQL so that you can easily query complex data. Drill is the only columnar query engine that supports complex data. It features an in-memory shredded columnar representation for complex data, which allows Drill to achieve columnar speed with the flexibility of an internal JSON document model.

    SELECT * FROM dfs.root.`/web/logs`;
    SELECT country, count(*) FROM mongodb.web.users GROUP BY country;
    SELECT timestamp ...

Users can use standard BI and analytics tools such as Tableau, Qlik, MicroStrategy, Spotfire, SAS and Excel to interact with non-relational datastores by leveraging Drill's JDBC and ODBC drivers. Developers can leverage Drill's simple REST API in their custom applications to create beautiful visualizations.

[Figure: logos of Tableau, Qlik, MicroStrategy, TIBCO Spotfire, Excel and SAS.]

Drill's virtual datasets allow even the most complex, non-relational data to be mapped into BI-friendly structures which users can explore and visualize using their tool of choice.

Drill isn't the world's first query engine, but it's the first that combines both flexibility and speed. To achieve this, Drill features a radically different architecture that enables record-breaking performance without sacrificing the flexibility offered by the JSON document model. Drill's design includes:

* Columnar execution engine (the first ever to support complex data!)
* Data-driven compilation and recompilation at execution time
* Specialized memory management that reduces memory footprint and eliminates garbage collections
* Locality-aware execution that reduces network traffic when Drill is co-located with the datastore
* Advanced cost-based optimizer that pushes processing into the datastore when possible
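The REST API mentioned above can be tried from a few lines of Python. This is a hedged sketch rather than a reference client: it assumes a Drill instance whose web server listens on `localhost:8047` (Drill's default web port) and posts a JSON body with `queryType` and `query` to the `query.json` endpoint. The helper names `build_query_request` and `run_query` are invented for this example.

```python
import json
import urllib.request

# Assumption: Drill runs locally with its web server on the default port.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_request(sql, base_url=DRILL_URL):
    """Build (but do not send) the HTTP POST that submits a SQL query."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url=DRILL_URL):
    """Send the query to a running Drill instance and return its rows."""
    with urllib.request.urlopen(build_query_request(sql, base_url)) as response:
        return json.load(response).get("rows", [])


# Building the request needs no running server; only run_query() does.
request = build_query_request(
    "SELECT country, count(*) FROM mongodb.web.users GROUP BY country"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling `run_query(...)` only works against a live Drill instance with the relevant storage plugin enabled; the rows it returns can then be handed to whatever charting code the application uses.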
[Figure: Tableau, Excel, Qlik and web/custom clients query through Apache Drill, which connects to NoSQL stores (HBase, MongoDB, Kudu), search (Elasticsearch), files (NAS such as NetApp, HDFS), IaaS/PaaS (Amazon S3) and relational databases (Oracle, MySQL, SQL Server).]

INSTALLING AND USING APACHE DRILL

First we download Apache Drill:

    wget http://apache.mirrors.hoobly.com/drill/drill-1.18.0/apache-drill-1.18.0.tar.gz

Then we extract it and rename the directory:

    tar -xvzf apache-drill-1.18.0.tar.gz
    mv apache-drill-1.18.0 apache-drill

Then we launch it in embedded mode:

    hadoop@aaron-hadoop:~$ apache-drill/bin/drill-embedded
    Apache Drill 1.18.0
    "Data is the new oil. Ready to Drill some?"
    apache drill>

[Screenshot: Drill web console, Plugin Management page listing Enabled Storage Plugins and Disabled Storage Plugins.]

From the menu bar, select Query. Enter a sample SQL query and leave the query type set to SQL (the other options are Physical and Logical):

    SELECT * FROM cp.`employee.json` LIMIT 20

[Screenshot: Query page of the Drill web console with the sample query and its results.]

Querying the HBase students table directly returns byte arrays, which are not usable as they stand. We convert the data from byte arrays to UTF8 values that are meaningful, and we store this query in a view:

    CREATE VIEW dfs.tmp.students AS
    SELECT CONVERT_FROM(row_key, 'UTF8') AS studentid,
           CONVERT_FROM(students.account.name, 'UTF8') AS name,
           CONVERT_FROM(students.address.state, 'UTF8') AS state,
           CONVERT_FROM(students.address.street, 'UTF8') AS street,
           CONVERT_FROM(students.address.zipcode, 'UTF8') AS zipcode
    FROM hbase.students;

    SELECT * FROM dfs.tmp.students;

[Screenshot: result of the query on dfs.tmp.students.]

The clicks table is converted in the same way:

    ...
           CONVERT_FROM(clicks.clickinfo.url, 'UTF8') AS url
    FROM hbase.clicks;

Note: We write `time` within backquotes as it is an SQL keyword.
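The byte-array conversion done with CONVERT_FROM(..., 'UTF8') in the views above amounts to decoding each stored value as UTF-8 text. The Python sketch below shows the same idea outside Drill. The row values are invented; only the column layout follows the students view, and the helpers `convert_from_utf8` and `to_view_row` are hypothetical names for this illustration.

```python
def convert_from_utf8(value):
    """Decode a stored byte array as UTF-8 text, passing None through."""
    return None if value is None else value.decode("utf-8")


# Made-up row shaped like the HBase students table: a row key plus the
# column families "account" and "address", with every value stored as bytes.
raw_student = {
    "row_key": b"student1",
    "account": {"name": b"Alice"},
    "address": {"state": b"CA", "street": b"123 Main St", "zipcode": b"94000"},
}


def to_view_row(raw):
    """Produce the readable columns that the students view exposes."""
    return {
        "studentid": convert_from_utf8(raw["row_key"]),
        "name": convert_from_utf8(raw["account"]["name"]),
        "state": convert_from_utf8(raw["address"]["state"]),
        "street": convert_from_utf8(raw["address"]["street"]),
        "zipcode": convert_from_utf8(raw["address"]["zipcode"]),
    }


print(raw_student["account"]["name"])   # raw bytes as stored
print(to_view_row(raw_student))         # decoded, readable values
```

Storing the conversion in a view, as the handout does, means the decoding is written once and every later query sees readable columns.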
    SELECT * FROM dfs.tmp.clicks;

[Screenshot: result of the query on dfs.tmp.clicks.]

Join the two views together using a LEFT JOIN on studentid:

    SELECT *
    FROM (SELECT * FROM dfs.tmp.students) s
    LEFT JOIN (SELECT * FROM dfs.tmp.clicks) c
    ON s.studentid = c.studentid;

[Screenshot: join result with columns studentid, name, state, street, zipcode, clickid, studentid, time and url.]

Compiled by Aaron Stanislaus Johns
https://bdcc.santechz.com/unit-2/6-apache-drill
