HW14
Data analysis
Data science
04/12/20
Q- W Mapreduce, Endgame
Q1 Storage Costs
a. Explain the different types of storage on AWS
b. Compare the storage costs between AWS, Azure and Google Cloud Platform
Helpful links:
• http://www.enterprisestorageforum.com/storage-management/cloud-storage-pricing.html
• https://www.slideshare.net/rightscale/cloud-storage-comparison-aws-vs-azure-vs-google-vs-ibm
Q2 Big Data Architecture
Explain the Big Data Architecture in
a. Microsoft Azure
b. Amazon AWS
c. Google Cloud Platform GCP
Links
http://azure.microsoft.com/en-us/documentation/services/hdinsight/
https://aws.amazon.com/big-data/getting-started/
https://cloud.google.com/products/big-data/
Q3 How Uber uses Big Data
Review the Big Data/Machine Learning Tools used by Uber
https://www.forbes.com/sites/janakirammsv/2019/06/26/managing-machine-learning-models-the-uber-way/#7e1cf82f4ae4
Q4 Netflix
How does Netflix use Machine Learning
https://becominghuman.ai/how-netflix-uses-ai-and-machine-learning-a087614630fe
DATA ANALYSIS 2
Question 1
storage class that offers high availability and object storage for frequently accessed data.
ii. AWS S3 Standard-IA (Infrequent Access) is the class for long-lived data that is accessed
less often.
iii. AWS S3 Glacier is the archival class. Once an S3 Lifecycle rule is set, the firm's data is moved
automatically to this storage class with no changes to the application. It is highly durable and
low in cost.
iv. AWS S3 RRS (Reduced Redundancy Storage) is an Amazon S3 storage option that lets users keep
non-critical data at a lower level of durability than standard Amazon S3 storage. It offers a free
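The lifecycle behaviour described in item iii can be written down as a configuration document. The snippet below is a minimal sketch, not a definitive setup: it builds a dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. The function name, the rule ID, the `logs/` prefix, and the 30- and 90-day thresholds are all assumptions chosen for illustration.

```python
def build_lifecycle_config(prefix, ia_after_days, glacier_after_days):
    """Build an S3 lifecycle configuration as a plain dictionary.

    All argument values used below are illustrative assumptions,
    not recommendations.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "archive-old-objects",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Move to infrequent-access storage first, then to Glacier.
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_lifecycle_config("logs/", ia_after_days=30, glacier_after_days=90)
print(config["Rules"][0]["Transitions"])
```

Applying this to a real bucket would mean passing `config` to `put_bucket_lifecycle_configuration` on a boto3 S3 client, which needs AWS credentials and is left out here.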
b) Comparison of the storage costs between Azure, AWS and Google Cloud Platform
The three providers arrive at broadly similar average estimated costs, although each one builds
up its total in a different way (Tiwary, 2019). Azure charges a premium for a
virtual machine, while AWS sets a limit on processing. Google Cloud also charges a service fee
but reduces the cost of the Active Directory service, which only raises more concerns that need
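The arithmetic behind such a comparison is simple: stored volume multiplied by a per-GB monthly price. The sketch below illustrates it under loud assumptions: the three prices are made-up placeholders, not quotes from any provider, and the function names are hypothetical. Real bills also include request, retrieval, and data transfer charges that this sketch ignores.

```python
# Placeholder per-GB monthly prices in USD. These are made-up numbers for
# illustration only; real prices vary by region, tier, and date.
PLACEHOLDER_PRICES = {"AWS": 0.03, "Azure": 0.02, "Google Cloud": 0.025}


def monthly_storage_cost(size_gb, price_per_gb):
    """Return the monthly cost in USD for storing size_gb gigabytes."""
    return round(size_gb * price_per_gb, 2)


def compare_costs(size_gb, prices):
    """Return (provider, monthly cost) pairs sorted from cheapest upward."""
    costs = [(name, monthly_storage_cost(size_gb, price))
             for name, price in prices.items()]
    return sorted(costs, key=lambda pair: pair[1])


for provider, cost in compare_costs(500, PLACEHOLDER_PRICES):
    print(f"{provider}: ${cost:.2f} per month")
```

Replacing the placeholder dictionary with current published prices for one storage class and one region would turn this into a like-for-like comparison.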
Question 2.
a. Microsoft Azure
• The main objective of a big data architecture is to track, evaluate, and handle
datasets that are too massive for traditional database systems.
• Data for batch processing operations can be stored in the cloud in various formats,
and very large files can be saved in high volumes. This kind of store is called a data lake.
• Huge big data sets are processed with batch jobs that filter, aggregate, and
otherwise prepare the data. The reading of source files and the processing and writing of results are
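A minimal sketch of the kind of job described above, which reads records, filters them, and then aggregates what is left. The record fields (`region`, `status`, `bytes`) and the function name are hypothetical, and a real job on Azure would read from files in cloud storage rather than from an in-memory list.

```python
from collections import defaultdict

# Hypothetical records standing in for rows read from source files.
records = [
    {"region": "east", "status": "ok", "bytes": 120},
    {"region": "west", "status": "error", "bytes": 300},
    {"region": "east", "status": "ok", "bytes": 80},
    {"region": "west", "status": "ok", "bytes": 50},
]


def run_batch(rows):
    """Drop failed rows, then sum bytes per region."""
    totals = defaultdict(int)
    for row in rows:
        if row["status"] != "ok":  # filter step
            continue
        totals[row["region"]] += row["bytes"]  # aggregate step
    return dict(totals)


summary = run_batch(records)
print(summary)
```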
b. Amazon AWS
• The provider gives the company the ability to store its information online. The data is held on
remote servers rather than on the company's own premises, and it is stored and viewed
online. Cloud storage is used where the company needs a huge amount of data space to
hold big data (Muangprathub, 2019). That is why companies buy computing services from providers,
including Amazon cloud services. Storing data in the cloud means that remote repositories
hold the data and apply protection measures to keep it from unauthorized persons.
• Data owners can store information online and can encrypt their files on the provider's servers. Such
techniques are valuable to businesses because their data is kept in a secure location that is hard
to steal from. The organization can also view and change its data remotely at any time.
• These days, big data across different sectors is growing rapidly, with Wikipedia being
among the prominent software players in the industry, processing hundreds of petabytes of data.
• The development of sophisticated data collection and storage systems, as well as the ability to
collect and analyze data to make important hospitality management decisions, have also been
greatly enhanced.
Question 3.
1. Michelangelo
It is a machine learning platform that has standardized workflows and tools across teams in
an end-to-end manner. It has made it easier for developers and data scientists around the business
2. Horovod
This is a distributed training framework that exploits GPUs. Uber open sourced Horovod and donated
it to the Linux Foundation's LF AI & Data Foundation.
3. Ludwig
It is the most popular machine learning tool at Uber. It is a free, open-source machine learning
toolbox that allows the user to train and test machine learning models without
writing code.
Question 4.
subscribers with shared interests to suggest what you are most likely to want to watch next,
so that you stay engaged and keep your subscription for longer.
2. Personalization and auto-generation of thumbnails: using millions of video frames from an existing
film or show as a starting point for generating thumbnails, Netflix annotates these images
and then ranks each one in an attempt to determine which thumbnails are most likely to result
in a click (Basu, 2019). These estimates are based on what other people who are similar to you
3. Location scouting for movie production: using data to help determine when and where to start
filming, given the planning and production constraints of the movie. Remember, this is
more of a data science optimization problem than just a machine learning system that
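The thumbnail rating described in item 2 can be sketched as a ranking by click rate. This is a minimal sketch under stated assumptions, not Netflix's actual method: the function name, the frame IDs, and the impression and click counts are all hypothetical, and the counts are assumed to come from viewers with similar tastes.

```python
def rank_thumbnails(stats):
    """Rank thumbnail IDs by observed click rate, highest first.

    stats maps a thumbnail ID to (impressions, clicks) among similar viewers.
    """
    def click_rate(item):
        impressions, clicks = item[1]
        # Thumbnails that were never shown get a rate of zero.
        return clicks / impressions if impressions else 0.0

    ranked = sorted(stats.items(), key=click_rate, reverse=True)
    return [thumb_id for thumb_id, _ in ranked]


# Hypothetical counts gathered from viewers with similar tastes.
similar_viewer_stats = {
    "frame_a": (1000, 50),
    "frame_b": (800, 80),
    "frame_c": (0, 0),
}
ranking = rank_thumbnails(similar_viewer_stats)
print(ranking)
```

A production system would also need to handle thumbnails with very few impressions, where a raw click rate is unreliable.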
References
Basu, S., Kaminski, J. W., Panepucci, E., Huang, C. Y., Warshamanage, R., Wang, M., & Wojdyla, J. A.
(2019). Automated data collection and real-time data analysis suite for serial synchrotron
Muangprathub, J., Boonnam, N., Kajornkasirat, S., Lekbangpong, N., Wanichsombat, A., & Nillaor, P.
(2019). IoT and agriculture data analysis for smart farm. Computers and electronics in
Tiwary, S., Levy, R., Gutenbrunner, P., Soto, F. S., Palaniappan, K. K., Deming, L., ... & Cox, J. (2019).