03-07-2024 - Data Science - Orientation Programme

Orientation Programme

On 03.07.2024

• Basic Concepts of Data Science

• Dr. C. Srivenkateswaran, Professor
Department of AI & Data Science
Rajalakshmi Institute of Technology (Autonomous)
Chennai 600 124, Tamil Nadu, India
Outline of the Presentation
• Difference between Computer Science & Data Science
• AI with Data Science
• Data Science
• Data Analytics
• Big Data
• Tools for Data Science
• Applications of Data Science
• Role of a Data Scientist
What is Science?

• The intellectual and practical activity encompassing the
systematic study of the structure and behavior of the physical and
natural world through observation and experiment.
Computer Science & Data Science
What is Computer Science?
• Computer science is the study of computing
technology, including hardware, software, and
algorithms. It focuses on the design,
development, and implementation of
computer systems, software applications,
and databases.
• Computer science covers a broad range of topics,
including programming languages, data
structures, algorithms, operating systems,
computer networks, and artificial intelligence.
What is Data Science?
• Data science is a multidisciplinary field that uses
statistical and computational methods to extract
insights and knowledge from data.
• It involves a combination of skills and knowledge from
various fields such as statistics, computer science,
mathematics, and domain expertise.
• Data science involves the entire process of data
collection, cleaning, exploration, analysis, and
interpretation.
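The stages named above (collection, cleaning, exploration, analysis, interpretation) can be sketched as a small pipeline. This is only an illustrative sketch using the Python standard library; the records, field names, and function names are hypothetical and not part of the presentation.

```python
from statistics import mean

# Collection: hypothetical raw readings; None marks a missing value.
raw_records = [
    {"city": "Chennai", "temp_c": 34.5},
    {"city": "Chennai", "temp_c": None},
    {"city": "Delhi", "temp_c": 39.0},
    {"city": "Delhi", "temp_c": 41.0},
    {"city": "Chennai", "temp_c": 33.5},
]

def clean(records):
    """Cleaning: drop records with a missing temperature."""
    return [r for r in records if r["temp_c"] is not None]

def explore(records):
    """Exploration: group the temperatures by city."""
    by_city = {}
    for r in records:
        by_city.setdefault(r["city"], []).append(r["temp_c"])
    return by_city

def analyze(by_city):
    """Analysis: compute the average temperature per city."""
    return {city: mean(temps) for city, temps in by_city.items()}

def interpret(averages):
    """Interpretation: turn the numbers into a plain statement."""
    hottest = max(averages, key=averages.get)
    return f"{hottest} has the highest average temperature in this sample."

averages = analyze(explore(clean(raw_records)))
summary = interpret(averages)
print(averages)
print(summary)
```

Each stage is a separate function here so that the order of the process is visible; a real project would usually rely on dedicated libraries for each step.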
Why Artificial Intelligence?
Data Science & Statistics
Concept of Data Science
Why AI Is Integrated with Data Science
What is data?
• Measurable units of information gathered or
captured from the activity of people, places, and
things.

• Data is everywhere; we need to handle and
store it properly, without any errors.
Facets of Data

• Big data and data science generate very large amounts of
data. This data comes in various types, and the
main categories are as follows:
• a) Structured
• b) Natural language
• c) Graph-based
• d) Streaming
• e) Unstructured
• f) Machine-generated
• g) Audio, video and images
Categories of Data
• Data can be categorized into two main types:
• Structured Data: Data organized into a specific
format, making it easy to search, analyze,
and process.
• Structured data is typically found in relational databases and
includes information such as numbers, dates, and
categories.
• Unstructured Data: Data that does not
conform to a specific structure or format. It may
include text documents, images, videos, and
other data that is not easily organized or analyzed
without additional processing.
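As a rough illustration of the difference, the sketch below reads the same facts once from a structured (CSV) form and once from free text. The sample data is hypothetical, and the regular expression stands in for the "additional processing" that unstructured data needs.

```python
import csv
import io
import re

# Structured: rows with fixed, named columns (hypothetical data).
structured_text = "name,age,city\nAsha,21,Chennai\nRavi,23,Madurai\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
# The fixed format allows a direct lookup by column name.
ages = [int(row["age"]) for row in rows]

# Unstructured: free text with no fixed fields (hypothetical data).
unstructured_text = (
    "Asha, who is 21, moved to Chennai. Ravi is 23 and lives in Madurai."
)
# Extra processing (here a regular expression) is needed to find the numbers.
ages_from_text = [int(n) for n in re.findall(r"\b\d+\b", unstructured_text)]

print(ages)
print(ages_from_text)
```

The pattern used for the free text only works because this sample is simple; real unstructured data generally needs far more involved processing.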
Examples of Data
• Statistics operate on variables, not on raw data.
• A variable is a function mapping data objects to
values.
• A variable is a named unit of data that is assigned
a value. If the value is modified, the name does not
change.
• Visualizations represent data.
• There are two types of data: qualitative and
quantitative.
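The idea of a variable as a function from data objects to values can be sketched in a few lines. The student records and the two function names below are hypothetical, chosen only for illustration.

```python
# Hypothetical data objects: each student is one record.
students = [
    {"name": "Asha", "height_cm": 158.5, "blood_group": "O+"},
    {"name": "Ravi", "height_cm": 171.0, "blood_group": "A+"},
]

# Each variable maps a data object to a value.
def height(student):
    """Quantitative variable: a measured number."""
    return student["height_cm"]

def blood_group(student):
    """Qualitative variable: a descriptive label."""
    return student["blood_group"]

heights = [height(s) for s in students]
groups = [blood_group(s) for s in students]
print(heights)
print(groups)
```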
Data structure
• A data structure is a way of organizing and storing data in a computer so
that it can be accessed and updated efficiently.
• These structures define how data elements are arranged and manipulated
within a program.
• Understanding data structures is crucial for developing efficient algorithms.
• There are several types of data structures, including:
• Arrays: A linear data structure where elements are stored sequentially.
• Linked lists: Another linear structure where each element is connected to
its previous and next adjacent elements.
• Stacks and queues: Linear structures with specific rules for adding and
removing elements.
• Trees and graphs: Non-linear structures that allow more complex
relationships between elements.
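The structures listed above can be tried out with a short Python sketch. It is a simplified illustration: the `Node` class is a hypothetical singly linked list (each node points only to the next one), and the graph is stored as an adjacency list in a dictionary.

```python
from collections import deque

# Array-like: a Python list stores elements in sequence, indexed by position.
scores = [59, 80, 60]
second_score = scores[1]

# Stack: last in, first out (add and remove at the same end).
stack = []
stack.append("a")
stack.append("b")
top = stack.pop()

# Queue: first in, first out (add at one end, remove from the other).
queue = deque()
queue.append("a")
queue.append("b")
front = queue.popleft()

# Linked list: each node holds a value and a reference to the next node.
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

head = Node(1, Node(2, Node(3)))
values = []
node = head
while node is not None:  # walk the chain from head to tail
    values.append(node.value)
    node = node.next_node

# Graph: adjacency list mapping each vertex to its neighbours.
graph = {"A": ["B", "C"], "B": ["C"], "C": []}
neighbours_of_a = graph["A"]

print(second_score, top, front, values, neighbours_of_a)
```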
Classification of Data
Types of Data
• Quantitative Data:
• Nature: Numerical values that
can be counted, measured, or expressed using numbers.
• Examples: Age, weight, temperature,
income, and test scores.

• Qualitative Data:
• Nature: Descriptive and conceptual. It
includes non-numerical information such as words, images, and
sounds.
• Examples: Interview transcripts,
observations, and open-ended survey responses.
What is Data Science?
• Data science is the in-depth study of massive
amounts of data. It involves extracting
meaningful insights from raw, structured, and
unstructured data, processed using the
scientific method, different technologies, and
algorithms.

• It is a multidisciplinary field that uses tools and
techniques to manipulate data so that you
can find something new and meaningful.
Need for Data Science:
Examples of Structured Data
• Online booking
• ATMs
• Inventory control systems
• Banking and accounting
Examples of Unstructured Data
• Sound recognition. Call centers use speech recognition to identify
customers and collect information about their queries and emotions.
• Image recognition. Online retailers take advantage of image
recognition so that customers can shop from their phones by
posting a photo of the desired item.
• Text analytics. Manufacturers make use of advanced text analytics
to examine warranty claims from customers and dealers and elicit
specific items of important information for further clustering and
processing.
• Chatbots. Using natural language processing (NLP) for text
analysis, chatbots help companies boost customer
satisfaction with their services. Depending on the question asked,
customers are routed to the corresponding representatives, who
can provide comprehensive answers.
Qualitative or Categorical Data
• Qualitative data cannot be measured or counted in the form of numbers.
• It consists of audio, images, symbols, or text. The gender
of a person (male, female, or others) is qualitative data.
• Qualitative data tells us about people's perceptions.
• It helps market researchers understand customers' tastes
and design their ideas and strategies accordingly.
• Other examples of qualitative data are:
• What language do you speak?
• Favorite holiday destination
• Opinion on something (agree, disagree, or neutral)
• Colors
Nominal Data
• Nominal data is used to label variables without any order or
quantitative value. We cannot perform arithmetic on it or
sort it into a meaningful order.
• “Nominal” comes from the Latin word “nomen,” which means
“name.”
• Examples of Nominal Data :
• Colour of hair (Blonde, red, Brown, Black, etc.)
• Marital status (Single, Widowed, Married)
• Nationality (Indian, German, American)
• Gender (Male, Female, Others)
• Eye Color (Black, Brown, etc.)
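Since nominal labels have no order and no numeric value, counting how often each label occurs is about all that can be done with them. The sketch below shows this with hypothetical survey answers.

```python
from collections import Counter

# Hypothetical survey answers for a nominal variable (hair colour).
hair_colour = ["Black", "Brown", "Black", "Blonde", "Black", "Brown"]

# Frequency of each label; no arithmetic on the labels themselves.
counts = Counter(hair_colour)

# The most frequent label (the mode) is a valid summary for nominal data.
most_common_label, frequency = counts.most_common(1)[0]

print(counts)
print(most_common_label, frequency)
```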
Ordinal Data
• Ordinal data has a natural ordering: each value has a
position on a scale.
It is used for observations such as customer
satisfaction or happiness, but we cannot do
arithmetic on the values.
• Ordinal data shows sequence, but the gaps between levels are not
necessarily equal, so statistics that rely on arithmetic, such as the
mean, are not meaningful; order-based measures such as the median still
apply. Compared to nominal data, ordinal data has an
order that nominal data lacks.
Examples of Ordinal Data
• When companies ask for feedback, experience, or
satisfaction on a scale of 1 to 10
• Letter grades in the exam (A, B, C, D, etc.)
• Ranking of people in a competition (First, Second, Third,
etc.)
• Economic Status (High, Medium, and Low)
• Education Level (Higher, Secondary, Primary)
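Ordinal values can be sorted and summarised by position even though arithmetic on them is not meaningful. The sketch below assumes the ordering Primary < Secondary < Higher for the education levels; the responses are hypothetical.

```python
from statistics import median_low

# Assumed ordering of the levels, from lowest to highest.
LEVELS = ["Primary", "Secondary", "Higher"]
rank = {level: position for position, level in enumerate(LEVELS)}

# Hypothetical responses.
responses = ["Higher", "Primary", "Secondary", "Secondary", "Higher"]

# Sorting follows the order of the levels, not alphabetical order.
ordered = sorted(responses, key=rank.get)

# A median needs only order: take the middle rank and map it back to a label.
median_level = LEVELS[median_low([rank[r] for r in responses])]

print(ordered)
print(median_level)
```

The ranks 0, 1, 2 are used only for ordering; they do not imply that the distance between the levels is equal.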
Quantitative Data
• Quantitative data is expressed in numerical values, which
makes it countable or measurable and suitable for statistical analysis.
This kind of data is also known as numerical data.
• It answers questions like “how much,” “how many,” and
“how often.” For example, the price of a phone, the
amount of RAM in a computer, and the height or weight of a person all fall
under quantitative data.
• Quantitative data can be used for statistical manipulation and
can be represented on a wide variety of graphs and
charts, such as bar graphs, histograms, scatter plots, box plots,
pie charts, and line graphs.
Examples of Quantitative Data:
• Height or weight of a person or object
• Room temperature
• Scores and marks (e.g., 59, 80, 60)
• Time
• Quantitative data is further classified into two
types: discrete and continuous.
Discrete Data
• The term discrete means distinct or separate. Discrete
data contains values that are integers or whole
numbers.
• The total number of students in a class is an example of
discrete data. These data can’t be broken into decimal or
fraction values.
• The discrete data are countable and have finite values; their
subdivision is not possible.
• These data are represented mainly by a bar graph, number
line, or frequency table.
Examples of Discrete Data
• Total number of students present in a class
• Cost of a cell phone (counted in whole currency units)
• Number of employees in a company
• The total number of players who participated in a
competition
• Days in a week
Continuous Data
• Continuous data can take fractional values. Examples include
the height of a person and
the length of an object.
• Continuous data represents information that can be divided
into ever finer levels. A continuous variable can take any
value within a range.
• The key difference between discrete and continuous data is
that discrete data contains only integers or whole numbers,
while continuous data can hold fractional values and is used to record
quantities such as temperature, height, width,
time, and speed.
Examples of Continuous Data
• Height of a person
• Speed of a vehicle
• “Time-taken” to finish the work
• Wi-Fi Frequency
• Market share price
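The distinction between discrete and continuous values can be checked roughly in code. The function name `looks_discrete` and the sample values are hypothetical, and the check is only a heuristic: measurements that happen to be whole numbers would also pass it.

```python
def looks_discrete(values):
    """Return True if every value is a whole number (a rough heuristic)."""
    return all(float(v).is_integer() for v in values)

students_per_class = [42, 38, 40]      # counts: discrete
heights_cm = [158.5, 171.0, 164.2]     # measurements: continuous

print(looks_discrete(students_per_class))
print(looks_discrete(heights_cm))
```

Whether a variable is discrete or continuous depends on what it measures, so a check like this can support that judgment but not replace it.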
Data Science Process
Applications of Data Science
Data Analytics
• Big Data Analytics
• Healthcare Analytics
• Text Analytics
• Speech Analytics
• Image & Video Analytics
• Business Analytics
• Human Resource Analytics
• Operations & Supply Chain Analytics
Types of Data Analytics
Tools Used in Data Science
• Anaconda Navigator is a desktop graphical user
interface (GUI) included in the Anaconda distribution
that simplifies package management and deployment
for data science and machine learning applications.
• It provides an easy-to-use interface for managing
various aspects of your data science workflow without
needing to use the command line.
Tools Used in Data Science (contd.)
• Jupyter Notebook: Launch this popular web application for creating
and sharing documents that contain live code, equations,
visualizations, and narrative text.
• JupyterLab: An advanced, extensible interface for interactive
computing.
• Spyder: An integrated development environment (IDE) specifically for
scientific programming in Python.
• RStudio: An IDE for R programming, offering tools for plotting, history,
debugging, and workspace management.
• VS Code: A versatile source-code editor with support for Python and
other languages.
• Orange: A component-based data mining framework and visual
programming tool for data analysis and visualization.
Libraries in Python
• Programming Languages:
• Python: Widely used for its simplicity and extensive libraries like
Pandas, NumPy, and Scikit-learn.
• R: Known for statistical analysis and visualization with packages like
ggplot2 and dplyr.
• Data Manipulation and Analysis:
• Pandas (Python): A library for data manipulation and analysis.
• NumPy (Python): Used for numerical computing.
• dplyr (R): A grammar of data manipulation.
• Machine Learning:
• Scikit-learn (Python): A library for machine learning.
• TensorFlow (Python): An open-source platform for machine learning.
• Keras (Python): A high-level neural networks API
• Data Visualization:
• Matplotlib (Python): A plotting library.
• Seaborn (Python): Based on Matplotlib, provides a high-level
interface for drawing attractive statistical graphics.
• ggplot2 (R): A system for declaratively creating graphics
• Integrated Development Environments (IDEs):
• Jupyter Notebook: An open-source web application that allows you to
create and share documents that contain live code, equations,
visualizations, and narrative text.
• RStudio: An integrated development environment for R.
• Big Data Tools:
• Apache Hadoop: A framework for processing large data sets in a
distributed computing environment.
• Apache Spark: An open-source unified analytics engine for large-scale
data processing.
• Database Management:
• SQL: A domain-specific language used in programming for managing
and manipulating relational databases.
• NoSQL Databases: Like MongoDB and Cassandra for handling large
volumes of unstructured data.
• Version Control:
• Git: A distributed version control system.
• Deep Learning
• TensorFlow: An end-to-end open-source platform for machine
learning.
• Keras: Provides a Python interface for artificial neural networks.
• PyTorch: Developed by Facebook's AI Research lab (FAIR), it provides
a flexible framework for deep learning research and development.
• Natural Language Processing (NLP)
• NLTK (Natural Language Toolkit): A suite of libraries and programs
for symbolic and statistical natural language processing.
• spaCy: An open-source software library for advanced NLP.
• TextBlob: Simplifies text processing and provides a consistent API for
diving into common NLP tasks.
• Data Storage and Retrieval
• SQLAlchemy: A SQL toolkit and Object-Relational
Mapping (ORM) library for Python.
• PyMongo: A native Python driver for MongoDB.
• h5py: Provides a Pythonic interface to the HDF5 binary
data format.
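To give a feel for the SQL entry in the list above, the sketch below uses `sqlite3` from the Python standard library. SQLite is chosen here only because it needs no installation; the table name, columns, and rows are hypothetical.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (name TEXT, department TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Asha", "AI & DS", 80), ("Ravi", "AI & DS", 60), ("Meena", "CSE", 59)],
)

# SQL describes what result is wanted: the average score per department.
rows = conn.execute(
    "SELECT department, AVG(score) FROM students "
    "GROUP BY department ORDER BY department"
).fetchall()
conn.close()

print(rows)
```

The same query would look much the same on other relational databases, although each system has its own dialect and connection library.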
Responsibilities
• Data Science:
• Definition: Data science involves using mathematics, statistics, programming, data analytics, AI,
and machine learning expertise to discover insights hidden in a dataset.

• Responsibilities: Data scientists collect and analyze data, develop machine learning models, and
share actionable insights with stakeholders. They work with raw and unstructured data, clean it,
and extract patterns and trends.

• Average Salary (as of July 2024): Data scientists can expect to earn an average of $157,000.
• Computer Science:
• Definition: Computer science is the study of computer hardware and software. It covers areas
like software engineering, programming, mathematics, probability, statistics, data analysis,
machine learning, and network design.

• Responsibilities: Computer scientists maintain infrastructure performance, design networks, and
develop new products. They understand how hardware and software work.
• Average Salary (as of July 2024): Computer scientists can expect to earn an average of $129,000.
Openings for Data Science
• Data Scientist
• Data Analyst
• Data Engineer
• Data Architect
• Data Administrator
• Business Analyst
• Business Intelligence Manager
• Data Storyteller
• Machine Learning Scientist
• Machine Learning Engineer
• Business Intelligence Developer
• Database Administrator
• Technology Specialized Roles
