
Da 1

data analytics unit 1 aktu

Uploaded by

Puneet Sharma
Copyright: © All Rights Reserved
Part-I: Introduction to Data Analytics

1 Introduction To Big Data

What Is Big Data?

* Big Data is often described as extremely large data sets that have grown beyond the ability to manage and analyze them with traditional data processing tools.
* Big Data defines a situation in which data sets have grown to such enormous sizes that conventional information technologies can no longer effectively handle either the size of the data set or the scale and growth of the data set.
* In other words, the data set has grown so large that it is difficult to manage and even harder to garner value out of it.
* The primary difficulties are the acquisition, storage, searching, sharing, analytics, and visualization of data.
* Big Data has its roots in the scientific and medical communities, where the complex analysis of massive amounts of data has been done for drug development, physics modeling, and other forms of research, all of which involve large data sets.

These 4Vs (see Figure 1) [13] of Big Data lay out the path to analytics, with each having intrinsic value in the process of discovering value. Nevertheless, the complexity of Big Data does not end with just four dimensions. There are other factors at work as well: the processes that Big Data drives. These processes are a conglomeration of technologies and analytics that are used to define the value of data sources, which translates to actionable elements that move businesses forward.

Figure 1: Illustration of Big Data [14]. Panels: Volume (huge amount of data), Variety (different formats of data from various sources), Veracity (inconsistencies and uncertainty in data), Velocity (high speed of accumulation of data), Value (extract useful data).

Footnote (the Vs of Big Data):
* Volume: Organizations collect data from a variety of sources, including transactions, smart (IoT) devices, industrial equipment, videos, images, audio, social media and more. In the past, storing all that data would have been too costly, but cheaper storage using data lakes, Hadoop and the cloud has eased the burden.
* Velocity: With the growth in the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner. RFID tags, sensors and smart meters are driving the need to deal with these torrents of data in near-real time.
* Variety: Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio, stock ticker data and financial transactions.
* Veracity: Veracity refers to the quality of data. Because data comes from so many different sources, it is difficult to link, match, cleanse and transform data across systems. Businesses need to connect and correlate relationships, hierarchies and multiple data linkages. Otherwise, their data can quickly spiral out of control.
* Value: This refers to the value that the big data can provide, and it relates directly to what organizations can do with the collected data. It is often quantified as the value that the data might create.
* Volatility: It deals with how long the data is valid.
* Validity: It refers to the accuracy and correctness of data. Any data picked up for analysis needs to be accurate.
* Variability: In addition to the increasing ...

Many of those technologies or concepts are not new but have come to fall under the umbrella of Big Data. Best defined as analysis categories, these technologies and concepts include the following:

* Traditional business intelligence (BI). This consists of a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data. BI delivers actionable information, which helps enterprise users make better business decisions using fact-based support systems. BI works by using an in-depth analysis of detailed business data, provided by databases, application data, and other tangible data sources. In some circles, BI can provide historical, current, and predictive views of business operations.
* Data mining. This is a process in which data are analyzed from different perspectives and then turned into summary data that are deemed useful. Data mining is normally used with data at rest or with archival data. Data mining techniques focus on modeling and knowledge discovery for predictive, rather than purely descriptive, purposes, which makes it an ideal process for uncovering new patterns from large data sets.
* Statistical applications. These look at data using algorithms based on statistical principles and normally concentrate on data sets related to polls, census, and other static data sets. Statistical applications ideally deliver sample observations that can be used to study populated data sets for the purpose of estimating, testing, and predictive analysis. Empirical data, such as surveys and experimental reporting, are the primary sources for analyzable information.
* Predictive analysis. This is a subset of statistical applications in which data sets are examined to come up with predictions, based on trends and information gleaned from databases. Predictive analysis tends to be big in the financial and scientific worlds, where trending tends to drive predictions, once external elements are added to the data set. One of the main goals of predictive analysis is to identify the risks and opportunities for business processes, markets, and manufacturing.
* Data modeling. This is a conceptual application of analytics in which multiple "what-if" scenarios can be applied via algorithms to multiple data sets. Ideally, the modeled information changes based on the information made available to the algorithms, which then provide insight into the effects of the change on the data sets. Data modeling works hand in hand
with data visualization, in which uncovering information can help with a particular business endeavor.

The preceding analysis categories constitute only a portion of where Big Data is headed and why it has intrinsic value to business. That value is driven by the never-ending quest for a competitive advantage, encouraging organizations to turn to large repositories of corporate and external data to uncover trends, statistics, and other actionable information to help them decide on their next move. This has helped the concept of Big Data to gain popularity with technologists and executives alike, along with its associated tools, platforms, and analytics.

1.1 ARRIVAL OF ANALYTICS

* As analytics and research were applied to large data sets, scientists came to the conclusion that more is better; in this case, more data, more analysis, and more results. Researchers started to incorporate related data sets, unstructured data, archival data, and real-time data into the process.
* In the business world, Big Data is all about opportunity. According to IBM, every day we create 2.5 quintillion (2.5 x 10^18) bytes of data, so much that 90 percent of the data in the world today has been created in the last two years. These data come from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos posted online, transaction records of online purchases, and cell phone GPS signals, to name just a few. That is the catalyst for Big Data, along with the more important fact that all of these data have intrinsic value that can be extrapolated using analytics, algorithms, and other techniques.
* NOAA uses Big Data approaches to aid in climate, ecosystem, weather, and commercial research, while NASA uses Big Data for aeronautical and other research.
Pharmaceutical companies and energy companies have leveraged Big Data for more tangible results, such as drug testing and geophysical analysis. The New York Times has used Big Data tools for text analysis and Web mining, while the Walt Disney Company uses them to correlate and understand customer behavior in all of its stores and theme parks.
* Big Data is full of challenges, ranging from the technical to the conceptual to the operational, any of which can derail the ability to discover value and leverage what Big Data is all about.

2 Characteristics of Data

* Data is a collection of details in the form of figures, text, symbols, descriptions, etc.
* Data contains raw figures and facts. Information, unlike data, provides insights analyzed through the data collected.

Data has 3 characteristics:
1. Composition: The composition of data deals with the structure of data, i.e., the sources of data, the granularity, and the types and nature of data as to whether it is static or real-time streaming.
2. Condition: The condition of data deals with the state of data, i.e., "Can one use this data as is?" or "Does it require cleaning for further enhancement and enrichment?"
3. Context: The context of data deals with "Where has this data been generated?", "Why was this data generated?", "How sensitive is this data?", "What are the events associated with this data?".

3 Data Classification

The volume and overall size of the data set is only one portion of the Big Data equation. There is a growing consensus that both semi-structured and unstructured data sources contain business-critical information and must therefore be made accessible for both BI and operational needs.
It is also clear that the amount of relevant unstructured business data is not only growing but will continue to grow for the foreseeable future.

Data can be classified under several categories:
1. Structured data: Structured data are normally found in traditional databases (SQL or others) where data are organized into tables based on defined business rules. Structured data usually prove to be the easiest type of data to work with, simply because the data are defined and indexed, making access and filtering easier. For example, databases, spreadsheets, OLTP systems.
2. Semi-structured data: Semi-structured data fall between unstructured and structured data. Semi-structured data do not have a formal structure like a database with tables and relationships. However, unlike unstructured data, semi-structured data have tags or other markers to provide a hierarchy of records and fields, which define the data. For example, XML, JSON, email.
3. Unstructured data: Unstructured data, in contrast, normally have no BI behind them. Unstructured data are not organized into tables and cannot be natively used by applications or interpreted by a database. A good example of unstructured data would be a collection of binary image files. For example, memos, chat rooms, PowerPoint presentations, images, videos, letters, research, white papers, the body of an email, etc.

4 Introduction to Big Data Platform

Big data platforms refer to software technologies that are designed to manage and process large volumes of data, often in real-time or near-real-time.
These platforms are typically used by businesses and organizations that generate or collect massive amounts of data, such as social media companies, financial institutions, and healthcare providers.

There are several key components of big data platforms, including:
* Data storage: Big data platforms provide large-scale data storage capabilities, often utilizing distributed file systems or NoSQL databases (see footnote) to accommodate large amounts of data.
* Data processing: Big data platforms offer powerful data processing capabilities, often utilizing parallel processing, distributed computing, and real-time stream processing to analyze and transform data.
* Data analytics: Big data platforms provide advanced analytics capabilities, often utilizing machine learning algorithms, statistical models, and visualization tools to extract insights from large datasets.
* Data integration: Big data platforms allow for integration with other data sources, such as databases, APIs, and streaming data sources, to provide a unified view of data.

Some of the most popular big data platforms include Hadoop, Apache Spark, Apache Cassandra, Apache Storm, and Apache Kafka. These platforms are open source and freely available, making them accessible to organizations of all sizes.

Footnote: To overcome the DBMS schema, big data systems accept NoSQL. NoSQL is a method to manage and store unstructured and non-relational data, also known as "Not Only SQL" [15]; for example, the HBase database.

5 Need of Data Analytics

Data analytics is the process of examining and analyzing large sets of data to uncover useful insights, patterns and trends. There are several reasons why organizations and businesses need data analytics:
1. Better decision-making: Data analytics can provide valuable insights that enable organizations to make better-informed decisions. By analyzing data, organizations can identify patterns and trends that may not be visible through traditional methods of analysis.
2. Improved efficiency: Data analytics can help organizations optimize their operations and improve efficiency. By analyzing data on business processes, organizations can identify areas for improvement and streamline operations to reduce costs and increase productivity.
3. Enhanced customer experience: Data analytics can help organizations gain a better understanding of their customers and their preferences. By analyzing customer data, organizations can tailor their products and services to better meet customer needs, resulting in a more satisfying customer experience.
4. Competitive advantage: Data analytics can provide organizations with a competitive advantage by enabling them to make better-informed decisions and identify new opportunities for growth. By leveraging data analytics, organizations can stay ahead of their competitors and position themselves for success.
5. Risk management: Data analytics can help organizations identify potential risks and mitigate them before they become major issues. By analyzing data on business processes and operations, organizations can identify potential areas of risk and take steps to prevent them from occurring.

In summary, data analytics is essential for organizations looking to improve their decision-making, efficiency, customer experience, competitive advantage, and risk management. By leveraging the insights provided by data analytics, organizations can stay ahead of the curve and position themselves for long-term success.

6 Evolution of Data Analytics Scalability

The evolution of data analytics scalability reflects the need to process and analyze ever-increasing volumes of data:
1. Traditional databases: In the early days of data analytics, traditional databases were used to store and analyze data. These databases were limited in their ability to handle large volumes of data, which made them unsuitable for many analytics use cases.
2. Data warehouses: To address the limitations of traditional databases, data warehouses were developed in the 1990s.
Data warehouses were designed to store and manage large volumes of structured data, providing a more scalable solution for data analytics.
3. Hadoop and MapReduce: In the mid-2000s, Hadoop and MapReduce were developed as open-source solutions for big data processing. These technologies enabled organizations to store and analyze massive volumes of data in a distributed computing environment, making data analytics more scalable and cost-effective.
4. Cloud computing: With the rise of cloud computing in the 2010s, organizations were able to scale their data analytics infrastructure more easily and cost-effectively. Cloud-based data analytics platforms such as Amazon Web Services (AWS) and Microsoft Azure provided scalable storage and processing capabilities for big data.
5. Real-time analytics: With the growth of the Internet of Things (IoT) and other real-time data sources, the need for real-time analytics capabilities became increasingly important. Technologies such as Apache Kafka and Apache Spark Streaming were developed to enable real-time processing and analysis of streaming data.
6. Machine learning and AI: In recent years, machine learning and artificial intelligence (AI) have become key components of data analytics scalability. These technologies enable organizations to analyze and make predictions based on massive volumes of data, providing valuable insights for decision-making and business optimization.

Overall, the evolution of data analytics scalability has been driven by the need to process and analyze increasingly large and complex datasets.
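
To make the MapReduce idea from item 3 more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to word counting. It is only an illustration under stated assumptions: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are hypothetical, and nothing here uses the actual Hadoop API or runs on a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    """Run the three steps in sequence (hypothetical helper).

    In a distributed framework each document could be mapped on a
    different node; here everything runs in one process for clarity.
    """
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    docs = ["big data needs big tools", "data tools scale"]
    print(word_count(docs))
```

The point of the split is that the map and reduce steps are independent per document and per key, which is what lets a framework spread them across many machines.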
With the development of new technologies and approaches, organizations are now able to derive insights from data at a scale that would have been unimaginable just a few decades ago.

Figure 2: Illustration of types of analytics (descriptive, diagnostic, predictive, prescriptive).

7 What is Data Analytics?

Data analytics is the process of examining large sets of data to extract insights, identify patterns, and make informed decisions. It involves using various techniques, including statistical analysis, machine learning, and data visualization, to analyze data and draw conclusions from it.

Data analytics can be applied to different types of data, including structured data (e.g., data stored in databases) and unstructured data (e.g., social media posts, emails, and images). The goal of data analytics is to turn raw data into meaningful and actionable insights that can help organizations make better decisions and improve their operations.

Data analytics is used in many different fields, including business, healthcare, finance, marketing, and social sciences. It can help businesses identify opportunities for growth, optimize their marketing strategies, reduce costs, and improve customer experiences. In healthcare, data analytics can be used to predict and prevent diseases, improve patient outcomes, and optimize resource allocation.

Overall, data analytics is a powerful tool that enables organizations to make informed decisions in today's data-driven world.

Types of Data Analytics

There are five types of data analytics (see Figure 2):
1. Descriptive Analytics: What is happening in your business? It only gives insight into whether everything is going well or not in the business, without explaining the root cause.
2. Diagnostic Analytics: Why is it happening in your business?
It explains the root cause behind the outcome of descriptive analytics.
3. Predictive Analytics: Explains what is likely to happen in the future based on previous trends and patterns. It utilizes various statistical and machine learning algorithms to provide recommendations and answers to questions about what might happen in the future that cannot be answered by BI.
4. Prescriptive Analytics: Helps you to determine the best course of action to bypass or eliminate future issues. You can use prescriptive analytics to advise users on possible outcomes and what they should do to maximize their key metrics, i.e., business metrics.
5. Cognitive Analytics: Combines a number of intelligent techniques such as AI, ML, DL, etc. to apply human-brain-like intelligence to perform certain tasks.

8 Analytic processes and tools

There are several analytic processes and tools used in data analytics to extract insights from data. Here are some of the most commonly used:
1. Data collection: This involves gathering relevant data from various sources, including databases, data warehouses, and data lakes.
2. Data cleaning: Once the data is collected, it needs to be cleaned and preprocessed to remove any errors, duplicates, or inconsistencies.
3. Data integration: This involves combining data from different sources into a single unified dataset.
4. Data analysis: This is the core of data analytics, where various techniques, such as statistical analysis, machine learning, and data mining, are used to extract insights from the data.
5. Data visualization: Once the data has been analyzed, it is often visualized using graphs, charts, and other visual aids to make it easier to understand and communicate the findings.
6. Business intelligence (BI) tools: These are software tools that help organizations make sense of their data by providing dashboards, reports, and other tools for data visualization and analysis.
7. Big data tools: These are specialized tools designed to handle large volumes of data and process it efficiently. Examples include Apache Hadoop, Apache Spark, and Apache Storm.
8. Machine learning tools: These are tools that use algorithms to learn from data and make predictions or decisions based on that learning. Examples include scikit-learn, TensorFlow, and Keras.

Overall, the tools and processes used in data analytics are constantly evolving, driven by advances in technology and the increasing demand for data-driven insights in various industries.

9 Analysis vs Reporting

Analysis and reporting are two important aspects of data management and interpretation, but they serve different purposes.

Reporting involves the presentation of information in a standardized format, typically using charts, graphs, or tables. The purpose of reporting is to provide a clear and concise view of the data and to communicate key insights to stakeholders. Reporting is often used to give regular updates on business performance, highlight trends, or share key metrics with stakeholders.

Analysis, on the other hand, involves the exploration and interpretation of data to gain insights and make informed decisions. Analysis can reveal trends that may not be immediately apparent from simple reporting. Analysis often involves using techniques, modeling, and machine learning to extract insights from the data.

In summary, reporting is focused on presenting data in a clear and concise way, while analysis is focused on exploring and interpreting the data to gain insights and make decisions. Both reporting and analysis are important for effective data management, but they serve different purposes and require different tools.

10 Modern Data Analytic Tools

There are many modern data analytic tools available today that are designed to help organizations analyze and interpret large volumes of data. Here are some of the most popular ones:
1. Tableau: This is a popular data visualization tool that allows users to create interactive dashboards and reports from their data. It supports a wide range of data sources and is used by many organizations to quickly visualize and explore data.
2. Power BI: This is a business analytics service provided by Microsoft that allows users to create interactive visualizations and reports from their data. It integrates with other Microsoft products like Excel and SharePoint, making it a popular choice for organizations that use these tools.
3. Google Analytics: This is a free web analytics service provided by Google that allows users to track and analyze website traffic. It provides a wealth of data on user behavior, including pageviews, bounce rates, and conversion rates.
4. Apache Spark: This is a fast and powerful open-source data processing engine that can be used for large-scale data processing, machine learning, and graph processing. It supports multiple programming languages, including Java, Scala, and Python.
5. Python: This is a popular programming language for data analysis and machine learning. It has a large and active community that has developed many libraries and tools for data analysis, including pandas, NumPy, and scikit-learn.
6. R: This is another popular programming language for data analysis and statistical computing. It has a large library of statistical and graphical techniques and is used by many researchers and data analysts.
Overall, these are just a few examples of the many modern data analytic tools available today. Organizations can choose the tools that best fit their needs and use them to gain insights and make informed decisions based on their data.

11 Applications of Data Analytics

Data analytics has a wide range of applications across industries and organizations. The most common applications include:
1. Business intelligence: Data analytics is used to analyze data and generate insights that help organizations make data-driven decisions. Business intelligence tools and techniques are used to track key performance indicators (KPIs), monitor business processes, and identify trends and patterns.
2. Marketing: Data analytics is used to analyze customer behavior, preferences, and demographics to develop targeted marketing campaigns. This includes analyzing website traffic, social media engagement, and email marketing campaigns.
3. Healthcare: Data analytics is used in healthcare to analyze patient data and improve patient outcomes. This includes analyzing electronic health records (EHRs) to identify disease patterns and improve treatment plans, as well as analyzing clinical trial data to develop new treatments and drugs.
4. Finance: Data analytics is used in finance to analyze financial data and identify trends and patterns. This includes analyzing stock prices, predicting market trends, and identifying fraudulent activity.
5. Manufacturing: Data analytics is used in manufacturing to optimize production processes and improve product quality. This includes analyzing sensor data from production lines, predicting equipment failures, and identifying quality issues.
6. Human resources: Data analytics is used in human resources to analyze employee data and identify areas for improvement. This includes analyzing employee performance, identifying training needs, and predicting employee turnover.
7. Transportation: Data analytics is used in transportation to optimize logistics and improve customer service. This includes analyzing shipping data to optimize routes and delivery times, as well as analyzing customer data to improve the customer experience.

Overall, data analytics has a wide range of applications across industries and organizations, and is increasingly seen as a critical tool for success in the modern business world.

Part-II: Data Analytics Life-cycle

1 What is Data Analytics Life Cycle?

Data is precious in today's digital environment. It goes through several life stages, including creation, testing, processing, consumption, and reuse. These stages are mapped out in the Data Analytics Life Cycle for professionals working on data analytics initiatives. Each stage has its significance and characteristics.

1.1 Key roles for successful analytic projects

There are several key roles that are essential for successful analytic projects. These roles are:
* Project Sponsor: The project sponsor is the person who champions the project and is responsible for securing funding and resources. They are the driving force behind the project and are accountable for its success.
* Project Manager: The project manager is responsible for the overall execution of the project. They ensure that the project is completed on time, within budget, and meets the required quality standards.
* Data Analyst: The data analyst is responsible for collecting, analyzing, and interpreting data. They use statistical methods and software tools to identify patterns and relationships in the data, and to develop insights and recommendations.
* Data Scientist: The data scientist is responsible for developing predictive models. They use machine learning and other advanced techniques to analyze complex data sets and uncover hidden patterns and trends.
* Subject Matter Expert: The subject matter expert (SME) is an individual who has deep knowledge and expertise in a particular domain. They provide insights into the context and meaning of the data and help to ensure that the project aligns with the business objectives.
* IT Specialist: The IT specialist ensures that the necessary hardware and software are in place, and that the system is secure, scalable, and reliable.
* Business Analyst: The business analyst is responsible for understanding the business requirements and translating them into technical specifications. They work closely with the project manager and data analyst to ensure that the project meets the needs of the business.
* Quality Assurance Specialist: The quality assurance specialist is responsible for testing the project deliverables to ensure that they meet the required quality standards. They perform various tests and evaluations to identify defects and ensure that the system is functioning as intended.

Each of these roles is essential for the success of analytic projects, and the team must work together closely to achieve the project objectives.

1.2 Importance of Data Analytics Life Cycle

In today's digital-first world, data is of immense importance. It undergoes various stages throughout its life, during its creation, testing, processing, consumption, and reuse. The Data Analytics Lifecycle maps out these stages for professionals working on data analytics projects. These phases are arranged in a circular structure that forms the Data Analytics Lifecycle (see Figure 3). Each step has its significance and characteristics.

The Data Analytics Lifecycle is designed to portray the actual project correctly; the cycle is iterative. A step-by-step technique is needed to arrange the actions and tasks involved in gathering, processing, analyzing, and reusing data for assessing the information on big data. Data analysis is modifying, processing, and cleaning raw data to obtain useful, significant information that supports business decision-making.
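
The iterative nature of the lifecycle can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the `Phase` enum and the `next_phase` helper are hypothetical names, the six phase names are taken from Section 1.3, and the rule "advance if the phase's checkpoint is met, otherwise step back to an earlier phase" is a simplification of how a team would actually decide.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six lifecycle phases, numbered as in Section 1.3."""
    DISCOVERY = 1
    DATA_PREPARATION = 2
    MODEL_PLANNING = 3
    MODEL_BUILDING = 4
    COMMUNICATE_RESULTS = 5
    OPERATIONALIZE = 6


def next_phase(current, ready, fallback=None):
    """Pick the next phase to work on (hypothetical helper).

    If the current phase's checkpoint is met (ready=True), move forward,
    staying at the final phase once it is reached. Otherwise move back:
    to `fallback` if given, else to the immediately preceding phase.
    """
    if ready:
        return Phase(min(current + 1, Phase.OPERATIONALIZE))
    if fallback is None:
        return Phase(max(current - 1, Phase.DISCOVERY))
    if fallback > current:
        raise ValueError("fallback must not be later than the current phase")
    return Phase(fallback)


if __name__ == "__main__":
    phase = Phase.MODEL_BUILDING
    # The model is not good enough yet, so return to planning.
    print(next_phase(phase, ready=False).name)
    # The model meets the objective, so move on to communicating results.
    print(next_phase(phase, ready=True).name)
```

Modeling the backward step explicitly is what distinguishes this lifecycle from a strictly linear pipeline.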

1.3 Data Analytics Lifecycle Phases

There is no defined structure of the phases in the life cycle of Data Analytics; thus, there may not be uniformity in these steps. There can be some data professionals that follow additional steps, while there may be some who skip some stages altogether or work on different phases simultaneously. Let us discuss the various phases of the data analytics life cycle.

Figure 3: Illustration of phases of data analytics life cycle [12]. The figure annotates the transitions between phases with checkpoint questions, including "Do I have enough information to draft an analytic plan and share for peer review?", "Do I have enough good quality data to start building the model?", and "Have we failed for sure?".

1.3.1 Phase 1: Data Discovery

This phase is all about defining the data's purpose and how to achieve it by the end of the data analytics lifecycle. The stage consists of identifying the critical objectives a business is trying to discover by mapping out the data. During this process, the team learns about the business domain and checks whether the organization has worked on similar projects to refer to any learnings.

The team also evaluates technology, people, data, and time in this phase. For example, the team can use Excel while dealing with a small dataset. However, heftier tasks demand more rigid tools for data preparation and exploration. The team will need to use Python, R, Tableau Desktop or Tableau Prep, and other data-cleaning tools in such scenarios.

This phase's critical activities include framing the business problem, formulating initial hypotheses to test, and beginning to learn the data.

1.3.2 Phase 2: Data Preparation

In this phase, the experts' focus shifts from business requirements to information requirements. One of the essential aspects of this phase is ensuring data availability for processing. The stage encompasses collecting, processing, and cleansing the accumulated data.

1.3.3 Phase 3: Model Planning

This phase needs the availability of an analytic sandbox for the team to work with data and perform analytics throughout the project duration.
The team can load data in several ways:
* Extract, Transform, Load (ETL): It transforms the data based on a set of business rules before loading it into the sandbox.
* Extract, Load, Transform (ELT): It loads the data into the sandbox and then transforms it based on a set of business rules.
* Extract, Transform, Load, Transform (ETLT): It is the combination of ETL and ELT and has two transformation levels.

The team identifies variables for categorizing data, and identifies and corrects data errors. These can be anything, including missing data, illogical values, duplicates, and spelling errors. For example, the team imputes the average data score for categories with missing values. It enables more efficient data processing without skewing the data.

After cleaning the data, the team determines the techniques, methods, and workflow for building a model in the next phase. The team explores the data, identifies relations between data variables, and eventually devises a suitable model.

1.3.4 Phase 4: Model Building

The team develops testing, training, and production datasets in this phase. Further, the team builds and executes models meticulously as planned during the model planning phase. They test data and try to find answers to the given objectives. They use various statistical modeling methods such as regression techniques, decision trees, random forest models, and neural networks, and perform a trial run to determine whether the model corresponds to the datasets.

1.3.5 Phase 5: Communication and Publication of Results

This phase aims to determine whether the project results are a success or failure and to start collaborating with significant stakeholders. The team identifies the vital findings of their analysis, measures the associated business value, and creates a summarized narrative to convey the results to the stakeholders.

1.3.6 Phase 6: Operationalize/Measuring of Effectiveness

In this final phase, the team presents an in-depth report with coding, briefing, key findings, and technical documents and papers to the stakeholders.
Besides this, the data is moved to a live environment and monitored to measure the analysis's effectiveness. If the findings are in line with the objective, the results and reports are finalized. On the other hand, if they deviate from the set intent, the team moves backward in the lifecycle to any previous phase to change the input and get a different outcome.

1.4 Data Analytics Lifecycle Example

Consider an example of a retail store chain that wants to optimize its products' prices to boost its revenue. The store chain has thousands of products over hundreds of outlets, making it a highly complex scenario. Once you identify the store chain's objective, you find the data you need, prepare it, and go through the Data Analytics lifecycle process.

You observe different types of customers, such as ordinary customers and customers who buy in bulk. According to you, treating various types of customers differently can give you the solution. However, you don't have enough information about it and need to discuss this with the client.

In this case, you need to get the definition, find data, and conduct hypothesis testing to check whether various customer types impact the model results and get the right output. Once you are convinced by the model results, you can deploy the model and integrate it into the business, and you are all set to deploy the prices you think are the most optimal across the outlets of the store.

Subject Code: KIT601
B.TECH (SEM VI) THEORY EXAMINATION 2021-22
DATA ANALYTICS
Time: 3 Hours
Note: Attempt all Sections. If you require any missing data, then choose suitably.

SECTION A

1. Attempt all questions in brief.
• … the need of data analytics.
• … sampling di…
• Discuss the use of limited pass algorithm.
• What is the principle behind hierarchical clustering technique?
• … five R functions used in descriptive statistics.
• List the names of any 2 visualization tools.

SECTION B

2. Attempt any three of the following:
(a) Explain the process model and computation model for Big data platform.
(b) Explain the use and advantages of decision trees.
(c) Explain the architecture of data stream model.
(d) Illustrate the K-means algorithm in detail with its advantages.
(e) Differentiate between NoSQL and RDBMS databases.

SECTION C

3. Attempt any one part of the following: 10*1 = 10
(a) Explain the various phases of data analytics life cycle.
(b) Explain modern data analytics tools in detail.

4. Attempt any one part of the following: 10*1 = 10
(a) … various types of support vector and kernel methods of data …
(b) Compute the principal …

5. Attempt any one part of the following: 10*1 = 10
(a) Explain any one algorithm to count number of distinct elements in a …
(b) … stock market predictions in detail.

6. Attempt any one part of the following: 10*1 = 10
(a) Differentiate between CLIQUE and ProCLUS clustering.
(b) A database has 5 transactions. Let min_sup=60% and min_conf=80%. (The table of transactions and items bought is not legible in the source.)
    i) Find all frequent itemsets using Apriori algorithm.
    ii) List all the strong association rules (with support s and confidence c).

7. Attempt any one part of the following: 10*1 = 10
(a) Explain the HIVE architecture with its features in detail.
(b) Write R function to check whether the given number is prime or not.
