Data Science PDF
Preeti Panwar
DATA SCIENCE
Introduction
Data Science is a field of Big Data which seeks to extract meaningful information
from large amounts of complex data. It combines work from statistics and
computation in order to interpret data for the purpose of decision making. Data
science is the study of where information comes from, what it represents and how it
can be turned into a valuable resource in the creation of business and IT strategies.
Mining large amounts of structured and unstructured data to identify patterns can help
an organization rein in costs, increase efficiency, recognize new market
opportunities and increase its competitive advantage. It is a
multidisciplinary field of study with the goal of addressing the challenges in big data.
Data Scientist :
As the name implies, this role involves analyzing data using various tools
and techniques, which can include programming languages such as R,
Python and SQL.
Data Engineer :
The role of a Data Engineer includes working with huge amounts of data by accessing
it through large databases, running large-scale processing on the data and coming
up with inferences and results. The Data Engineer must be well versed in statistics
and programming languages, and should normally have a strong
background in software engineering.
Data Architect :
This person takes on a very high-level role in the organization when it comes to
working with data and deriving insights from it. The person creates the blueprint for
integrating, streamlining, centralizing and protecting the data that others can work
with. The Data Architect needs to have mastery of tools such as Hive, Pig
and Spark in order to work with different types of data.
Learn about the issue on the ground and ask the right questions. This is at the center
of what a Data Scientist does and forms the foundation for the later stages of the Data
Scientist’s role. Define the problem and convert it into a concrete framework which
can then be worked upon.
The Data Scientist has to collect enough data to make
sense of the problem at hand and to get a better grip on the issue with respect to the
time, money and resources needed to make the process successful.
Data can rarely be used in its original form. It needs to be processed, and various
methods exist to convert it into a usable format. This is an essential part of every Data
Scientist’s routine and consumes a major chunk of their time and resources.
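As a rough illustration of this processing step, the sketch below cleans a small
raw export using only the Python standard library. The sample data, the column
names and the clean_rows helper are all hypothetical, and dropping rows with a
missing amount is just one possible policy.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, inconsistent casing,
# one missing amount and one duplicate row.
RAW_CSV = """customer,city,amount
 Alice ,Delhi,120.50
Bob,mumbai,
Alice,Delhi,120.50
Carol, Pune ,80
"""


def clean_rows(raw_text):
    """Return de-duplicated rows with trimmed text and numeric amounts."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        customer = row["customer"].strip()
        city = row["city"].strip().title()
        amount_text = row["amount"].strip()
        # Assumption: rows without an amount are dropped.
        # A real project might impute a value instead.
        if not amount_text:
            continue
        record = (customer, city, float(amount_text))
        # Skip exact duplicates of rows already kept.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append({"customer": customer, "city": city, "amount": record[2]})
    return cleaned


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

In practice this kind of work is often done with a dedicated data library, but
the steps (trim, normalize, handle missing values, de-duplicate) stay the same.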
After the data has been processed and converted into a form that can be used in
the later stages, you need to explore it further to understand its characteristics,
including the obvious trends and correlations as well as the not-so-obvious
hidden relationships.
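One common way to check for a relationship between two variables during
exploration is the Pearson correlation coefficient. The sketch below computes it
by hand; the ad_spend and sales figures are made-up illustrative values, not
data from any real source.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical figures for illustration only.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 62, 85, 105]

r = pearson(ad_spend, sales)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A value close to 1 or -1 suggests a strong linear relationship, while a value
near 0 suggests little linear relationship. Correlation alone does not show
that one variable causes the other.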
This is where the magic happens. The Data Scientist deploys the various tools in their
arsenal, such as machine learning, statistics and probability, linear and logistic
regression and time-series analysis, in order to make sense of the data. At the
end of this step the Data Scientist should be able to gain valuable business insights
such as predictions, business process optimization and new ways of doing the same
old things.
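To make one of the techniques named above concrete, here is a minimal sketch of
simple linear regression fitted by ordinary least squares and used for a
one-step prediction. The months and revenue values are hypothetical, and
fit_line is an illustrative helper rather than a function from any library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly revenue figures for illustration only.
months = [1, 2, 3, 4, 5]
revenue = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, revenue)
forecast = slope * 6 + intercept  # predicted revenue for month 6
print(f"slope={slope:.2f}, intercept={intercept:.2f}, month 6 forecast={forecast:.2f}")
```

Real data is rarely this clean, so a fitted line should be checked against
held-out data before its predictions are trusted.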
At the end of the entire process, the findings need to be communicated to the
right stakeholders to lay the groundwork for the actions to be taken and for
deploying the decisions that are made.
SQL
SQL (Structured Query Language) is the language used to work with relational database
management systems. SQL is useful for data that follows a fixed format, such as the
row-and-column layout that is still used to hold huge amounts of data even in
today’s world of unstructured data. SQL is extensively used by database
administrators and developers alike.
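As a small illustration of SQL applied to row-and-column data, the sketch below
uses Python's built-in sqlite3 module with an in-memory database. The orders
table and its contents are hypothetical.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, city TEXT, amount REAL)")

# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Alice", "Delhi", 120.5),
        ("Bob", "Mumbai", 75.0),
        ("Carol", "Delhi", 80.0),
    ],
)

# Total order amount per city, sorted by city name.
totals = conn.execute(
    "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"
).fetchall()
conn.close()

for city, total in totals:
    print(city, total)
```

The same SELECT ... GROUP BY query would work largely unchanged on other
relational databases, which is part of why SQL remains so widely used.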
Python
Hadoop
This is a powerful open source tool used for big data applications. It has a huge
ecosystem that comprises some of the best tools for working with big data. You can
store data, run computations and deploy real-time analytics, among other
things, on big data through Hadoop and its ecosystem of tools.
SAS
SAS is a powerful business intelligence and analytical tool. It is a software suite for
extracting, analyzing and reporting on a wide range of data and deriving valuable
business insights from it. It includes a whole set of tools for working across the
various steps of converting data into business insights.
Tableau
This is a powerful data visualization, analysis and reporting tool. The best thing
about Tableau is that you don’t need any technical knowledge or programming skills
in order to derive valuable insights with it.
According to a survey, the most used tool in 2017 was Python (60% of respondents said
they had used it in the previous year), followed by R (46%) and SQL (42%). The top 10
tools are rounded out by TensorFlow, Amazon Web Services, Unix shell/awk,
Tableau, C/C++, NoSQL and MATLAB/Octave.
Amazon : Amazon is another global ecommerce and cloud computing giant that is
hiring data scientists on a big scale. It needs data scientists to understand the
customer mindset and to enhance the geographical reach of both its ecommerce and
cloud businesses, among other business-driven goals.
Visa : Visa is an online financial gateway for most companies and handles
hundreds of millions of transactions over the course of a regular day. As a result,
Visa has a huge requirement for data scientists to generate more revenue,
detect fraudulent transactions and customize products and services to customer
requirements, among other things.
Here are several reasons why data science is likely to remain an integral part of the
culture and economy of the world:
One of the reasons why data science is gaining so much attention is that
it allows brands to communicate their story in an engaging and powerful
manner. When brands and companies use this data comprehensively,
they can share their story with their target audience, thereby creating
a better brand connection. After all, nothing connects with consumers like an
effective and powerful story that can evoke human emotions.
Big Data is a new field that is constantly growing and evolving. With new
tools being developed on a regular basis, big data is helping brands and
organisations to solve complex problems in IT, human resources and resource
management in an effective and strategic manner. This means effective use of
resources, both material and non-material.
One of the most important aspects of data science is that its findings and results
can be applied to almost any sector, such as travel, healthcare and education.
Understanding the implications of data science can go a long way in
helping sectors to analyse their challenges and address them
effectively.
Data science is accessible to almost all sectors. There is a large amount of data
available in the world today, and how well it is used can spell the difference
between success and failure for brands and organisations. Using data
properly will hold the key to achieving goals for brands, especially in the
coming times.
Not only this, but internet search and recommender systems are also using
data science to improve their performance. It is therefore clear that big data
analytics has become one of the key ingredients for reaping both short-term and
long-term benefits.