Data Science Notes


UNIT 4 - Data science

1) What is Data Science


Data science is the domain of computer science in which we
extract insights from available data with the help of scientific
methods, algorithms and statistics.
Data science mainly revolves around analysing data. When it
comes to AI, this analysis helps in making the machine
intelligent enough to perform tasks by itself.

2) What are the sources of Data collection


There are various sources from which we can collect any type
of data required. The data collection process can be
categorized in two ways: offline and online.
Offline data collection    Online data collection
Sensors and cameras        Open-sourced government portals
                           (data.gov.in, india.gov.in)
Surveys                    World organisations' open-sourced
                           statistical websites
Interviews                 Reliable websites
Observations               Google Kaggle, web scraping

3) What are the points to be kept in mind while doing data collection

1. Only data that is available for public use should be
collected.
2. Personal datasets should only be used with the consent of
the owner.
3. One should never breach someone's privacy to collect data.
4. Data should only be taken from reliable sources, as data
collected from random sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of the data,
which helps in proper training of the AI model.

4) What are the different formats in which data is stored


Usually the data collected for Data Science is in the form of
tables. These tabular datasets can be stored in different
formats. Some of the commonly used formats are:
1. CSV: CSV stands for comma-separated values. It is a simple
file format used to store tabular data. Each line of the file is a
data record, and each record consists of one or more fields
separated by commas. Because the values in each record are
separated by commas, these files are known as CSV files.
2. Spreadsheet: A spreadsheet is a piece of paper or a
computer program used for accounting and recording data in
rows and columns into which information can be entered.
Microsoft Excel is a program that helps in creating
spreadsheets.
3. SQL: SQL stands for Structured Query Language. It is a
domain-specific programming language designed for managing
data held in different kinds of DBMS (Database Management
Systems). It is particularly useful for handling structured data.
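
The CSV and SQL descriptions above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed method: the sample records, the table name `marks`, and the helper functions `read_records` and `load_into_sql` are all hypothetical, and an in-memory SQLite database stands in for a full DBMS.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: a header line followed by three records,
# with the fields of each record separated by commas.
CSV_TEXT = """name,subject,marks
Asha,Maths,82
Ravi,Science,74
Meena,Maths,91
"""


def read_records(csv_text):
    """Parse CSV text into a list of dicts, one dict per data record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load_into_sql(records):
    """Store the records in an in-memory SQLite table (assumed name: marks)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE marks (name TEXT, subject TEXT, marks INTEGER)")
    conn.executemany(
        "INSERT INTO marks VALUES (:name, :subject, :marks)", records
    )
    return conn


records = read_records(CSV_TEXT)
conn = load_into_sql(records)

# Use SQL to select only the Maths records, highest marks first.
maths_rows = conn.execute(
    "SELECT name, marks FROM marks WHERE subject = ? ORDER BY marks DESC",
    ("Maths",),
).fetchall()
conn.close()

print(records[0])
print(maths_rows)
```

Each CSV line becomes one record whose fields are looked up by the column names in the header line, and the SQL query then filters and sorts the same structured data.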

5) Explain some applications of Data Science

There exist various applications of Data Science in today’s
world. Some of them are:
Fraud and Risk Detection: Banking companies learn to divide
and conquer data via customer profiling, past expenditures,
and other essential variables to analyse the probabilities of risk
and default. It also helps them to push their banking products
based on customers' purchasing power.
Genetics & Genomics: Data Science applications enable an
advanced level of treatment personalization through research
in genetics and genomics. Data science techniques allow
integration of different kinds of data with genomic data in
disease research, which provides a deeper understanding of
genetic issues in reactions to particular drugs and diseases.
Internet Search: Search engines (like Yahoo, Bing, Ask, AOL,
Google) make use of data science algorithms to deliver the best
result for our searched query in a fraction of a second.
Google processes more than 20 petabytes of data every day.
Targeted Advertising: The entire digital marketing spectrum,
starting from the display banners on various websites to the
digital billboards at airports, uses data science algorithms.
Advertisements can be targeted based on a user's past
behaviour.
Website Recommendations: Websites like Amazon not only help
users find relevant products from the billions of products
available with them but also add a lot to the user experience.
Internet giants like Amazon, Twitter, Google Play, Netflix,
LinkedIn, IMDB and many more use recommendation systems to
improve the user experience and to promote their products. The
recommendations are made based on a user's previous search
results.
Airline Route Planning: The airline industry uses Data Science
to identify strategic areas of improvement. Using Data
Science, airline companies can:
• Predict flight delays
• Decide which class of airplanes to buy
• Decide whether to land directly at the destination or take a
halt in between
• Effectively drive customer loyalty programs

6) What are packages in Python? Explain any 3 packages.


A collection of related modules saved under the same
directory and a common name is called a package. Some of the
open-source packages needed for Artificial Intelligence are:
• NumPy: Numerical array data handling package. It is used for
data analysis and calculations on large numerical data sets.
• Matplotlib: Data visualization package. It is used to produce
high-quality graphical representations of numerical data.
• Pandas: Pandas is a software library written for the Python
programming language for data manipulation and analysis. It
offers data structures and operations for manipulating
numerical tables and time series.
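
The definition of a package given above (modules saved together under one directory and one name) can be illustrated with a small sketch that uses only the standard library. The package name `mini_stats_demo` and its two modules are hypothetical and are created in a temporary directory purely for this example. Real packages such as NumPy are installed rather than built this way.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, chosen only for this illustration.
PACKAGE_NAME = "mini_stats_demo"


def build_demo_package(root: Path) -> None:
    """Create one directory holding an __init__.py file and two modules."""
    pkg_dir = root / PACKAGE_NAME
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "averages.py").write_text(
        "def mean(values):\n"
        "    return sum(values) / len(values)\n"
    )
    (pkg_dir / "ranges.py").write_text(
        "def value_range(values):\n"
        "    return max(values) - min(values)\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)  # let Python find the new directory
    importlib.invalidate_caches()
    try:
        # Both modules are reached through the shared package name.
        averages = importlib.import_module(f"{PACKAGE_NAME}.averages")
        ranges = importlib.import_module(f"{PACKAGE_NAME}.ranges")
        marks = [72, 85, 90, 65]
        mean_marks = averages.mean(marks)
        spread = ranges.value_range(marks)
    finally:
        sys.path.remove(tmp)

print(mean_marks, spread)
```

The two modules live in one directory and are imported as `mini_stats_demo.averages` and `mini_stats_demo.ranges`, which is the same dotted naming used for modules inside larger packages.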
