Data Science Foundations

F/650/5562

W.M.Chirantha Jananath Thiwanka Kithulwatta


MPhil in Computer Science, BSc (Hons) in Software Engineering,
CTHE, MIEEE, MYSF, MIS, MSLAIHEE, MKLNOG

Course Details
▪ Unit Reference Number: F/650/5562

▪ Unit Title: Data Science Foundations

▪ Unit Level: 7

▪ Number of Credits: 20

Chapter 01
▪ Understand the scope of Data Science and the roles of Data Scientists

Overall Chapter Objectives
▪ To define the landscape of Data Science.

▪ To evaluate key topics of Data Science.

▪ To assess the role of a Data Scientist in comparison to other IT roles.

Lecture 1: Introduction to Data Science & Key
Disciplines
▪ Objectives of the lesson: To define the landscape of Data Science and evaluate its key topics,
focusing on the foundational skills required for a career in this field.

1. Introduction to Data Science

▪ Introduce the concept of Data Science and its relevance.
▪ Definition of Data Science: Data Science is an interdisciplinary field that uses scientific
methods, processes, algorithms, and systems to extract insights and knowledge from
structured and unstructured data.

▪ Why Data Science?: The vast amount of data generated today makes it necessary for
organizations to find meaningful patterns that inform decision-making. Data science offers
the tools to handle, analyze, and interpret this data.

▪ THINK …
▪ What’s the difference between traditional data analysis and Data Science? How does
machine learning fit into the picture?

▪ Importance of Data Science
▪ Data-Driven Decision Making: Businesses now rely on data to make informed decisions,
improve strategies, and gain competitive advantages.
▪ Automation and Efficiency: Data Science automates processes and enables machines to
make decisions (e.g., recommendation systems, chatbots).

▪ Predictive Analytics: The power of data science lies in its predictive capabilities, which can
forecast future trends based on historical data (e.g., stock market trends, demand
forecasting).

▪ Example:
▪ Netflix: Netflix uses data science to recommend shows to users by analyzing their
past viewing behavior.
▪ Healthcare: Predicting patient outcomes and disease trends using data from
electronic medical records.

▪ THINK …
▪ Can you think of another industry where data science plays a crucial role?

▪ Core Disciplines of Data Science
▪ Analytics: The process of examining data sets to uncover hidden patterns, correlations, and
trends.
▪ Statistics: Central to hypothesis testing, data interpretation, and deriving inferences from sample
data.
▪ Machine Learning: The study of algorithms that improve automatically through experience,
allowing computers to make data-driven decisions without being explicitly programmed.

▪ Example: In machine learning, algorithms like decision trees or neural networks can classify
images or predict outcomes based on input data. These models are trained on vast amounts
of data to "learn" patterns.

▪ Real-World Applications of Data Science
▪ Finance: Fraud detection, risk management, and algorithmic trading.
▪ Healthcare: Personalized medicine, disease prediction, and drug discovery.
▪ Retail: Customer behavior analysis, personalized marketing, and supply chain optimization.
▪ Technology: AI-driven services, like voice assistants (e.g., Siri, Alexa) and autonomous
vehicles.
▪ Social Media: Analyzing user data to improve engagement and target advertisements.

▪ Real-World Example 1:
▪ Amazon: Use of predictive analytics to suggest products to customers based on browsing and
purchasing history.
▪ Real-World Example 2:
▪ Google Maps: Uses data science to predict traffic patterns and provide optimized travel
routes.

▪ THINK …
▪ Which real-world data science applications have you noticed in your day-to-day life? How
do you think they work?

2. Mathematical and Statistical Skills

▪ Introduction to Mathematical and Statistical Skills for Data Science
▪ Importance of Mathematics and Statistics in Data Science: Mathematics provides the
theoretical foundation, while statistics helps make sense of data, draw conclusions, and identify
patterns.
▪ Key Role of These Skills in Data Science: Essential for building machine learning models,
analyzing large datasets, and making informed decisions based on probability and data analysis.

▪ THINK …
▪ Discuss how Google uses statistics to improve search results based on user behavior.

▪ Core Mathematical and Statistical Concepts
▪ Probability Theory
▪ Definition: Probability theory helps in quantifying uncertainty and modeling random
events in data.

▪ Key Concepts:
▪ Basic Probability: When all outcomes are equally likely, P(Event) = Number of favorable outcomes / Total number of outcomes.
▪ Conditional Probability: The probability of an event given that another event has occurred
(used in Bayesian analysis).
▪ Bayes’ Theorem: A method to update the probability of a hypothesis based on new evidence.

▪ Example:
▪ Calculate the probability of drawing a red card from a deck of cards, or determine the
chance of an email being spam based on certain features (common in spam filters).
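Both examples can be worked through in a few lines of Python. The sketch below is illustrative only: the spam prior and the word likelihoods are made-up numbers, and the single feature (the word "free") is an assumption chosen for the example, not something taken from a real spam filter.

```python
from fractions import Fraction

# Basic probability: 26 of the 52 cards in a standard deck are red.
p_red = Fraction(26, 52)

# Bayes' theorem for a spam filter. All three inputs are made-up
# illustrative values, not measurements from any real mail corpus.
p_spam = 0.2             # prior: P(spam)
p_word_given_spam = 0.6  # P(word "free" appears | spam)
p_word_given_ham = 0.05  # P(word "free" appears | not spam)

# Total probability of seeing the word in any email.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(f"P(red card) = {p_red} = {float(p_red):.2f}")
print(f"P(spam | 'free') = {p_spam_given_word:.2f}")
```

With these assumed inputs, seeing the word raises the spam probability from the 20% prior to 75%, which is the "update on new evidence" that Bayes' theorem describes.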

▪ Descriptive and Inferential Statistics
▪ Definition:
▪ Descriptive Statistics summarize or describe the main features of a dataset (mean,
median, mode).
▪ Inferential Statistics help make predictions or inferences about a population based on a
sample.

▪ Key Concepts:
▪ Mean, Median, and Mode: Measures of central tendency.
▪ Standard Deviation: Measures how spread out the values in a dataset are.
▪ Hypothesis Testing: A procedure for deciding whether sample data provide enough evidence to
reject a hypothesis about a population (e.g., t-test, chi-square test).
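The measures of central tendency and spread listed above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `daily_sales` figures are invented for illustration.

```python
import statistics

# Hypothetical daily sales figures (units sold), invented for illustration.
daily_sales = [12, 15, 15, 18, 20, 22, 15, 30]

mean_sales = statistics.mean(daily_sales)      # arithmetic average
median_sales = statistics.median(daily_sales)  # middle value of the sorted data
mode_sales = statistics.mode(daily_sales)      # most frequent value
# Sample standard deviation (divides by n - 1, not n).
std_sales = statistics.stdev(daily_sales)

print(f"mean={mean_sales}, median={median_sales}, mode={mode_sales}")
print(f"sample standard deviation={std_sales:.2f}")
```

The single large value (30) pulls the mean above the median, which is one reason to report both.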

▪ Example:
▪ Discuss how descriptive statistics can summarize sales data for a company, and how
inferential statistics can predict future sales based on a sample of historical data.

▪ Linear Algebra and Calculus
▪ Definition:
▪ Linear Algebra: Fundamental for understanding data matrices, transformations, and
optimization problems in machine learning.
▪ Calculus: Necessary for optimizing models (e.g., gradient descent in machine learning).

▪ Key Concepts:
▪ Vectors and Matrices: Representing data points and operations in a dataset.
▪ Eigenvalues and Eigenvectors: Used in dimensionality reduction techniques like Principal
Component Analysis (PCA).
▪ Derivatives: Essential for understanding optimization in machine learning (e.g., how to
minimize error functions).
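The following NumPy sketch touches each of these concepts on toy inputs. It is a hand-rolled illustration, not how a library such as scikit-learn implements PCA internally; the data matrix `X`, the function f(w) = (w - 3)^2, and the learning rate are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical 2-D data: the second feature is roughly twice the first.
X = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]])

# Center the data, then compute the covariance matrix of the two features.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh is intended for symmetric matrices; eigenvalues are returned in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# The eigenvector with the largest eigenvalue is the first principal component.
first_component = eigenvectors[:, -1]
explained_ratio = eigenvalues[-1] / eigenvalues.sum()

# Project each 2-D point onto that direction to get a 1-D representation.
X_reduced = X_centered @ first_component
print(f"Variance explained by first component: {explained_ratio:.4f}")

# Derivatives in optimization: minimize f(w) = (w - 3)^2 by gradient descent.
w = 0.0
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (w - 3)  # f'(w), the derivative of the error function
    w -= learning_rate * gradient
print(f"w after gradient descent: {w:.6f}")
```

Because the two features are almost perfectly correlated, one component retains nearly all of the variance, which is the situation in which dimensionality reduction loses little information.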

▪ Key Takeaways:
▪ Probability Theory: Helps in dealing with uncertainty and making predictions.
▪ Descriptive and Inferential Statistics: Provide tools for summarizing data and making
inferences about a population from a sample.
▪ Linear Algebra and Calculus: Crucial for understanding data structure and optimizing
models in machine learning.

▪ THINK …
▪ Which mathematical or statistical tool do you think is the most important for a Data
Scientist and why?

3. Coding & Applied Mathematics

▪ Coding & Applied Mathematics in Data Science
▪ Python & R Basics:
▪ Python: A popular language for data manipulation, analysis, and machine learning due to its
simplicity and wide support.
▪ R: Focused more on statistical analysis and visualization, but equally powerful.

▪ Libraries Used in Data Science:
▪ Python: Pandas (data manipulation), NumPy (numerical computing), Matplotlib/Seaborn
(visualization), scikit-learn (machine learning).
▪ R: dplyr (data manipulation), ggplot2 (visualization), caret (machine learning).

▪ THINK …
▪ Explain how Python is often used for data processing and modeling in real-world business
scenarios (e.g., retail, finance).

▪ Applied Mathematics for Machine Learning
▪ a) Linear Regression
▪ Definition: Linear regression models the relationship between a dependent variable and one or
more independent variables using a linear equation.
▪ Key Concepts:
▪ Equation: y = mx + b + error (for simple linear regression).
▪ Slope (m): The change in the dependent variable (y) for each one-unit increase in the
independent variable (x).
▪ Intercept (b): The value of y when x = 0.
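To make the slope and intercept concrete, the sketch below fits y = mx + b to five points using the ordinary least-squares formulas in plain Python. The data (advertising spend versus sales) are invented for illustration, and the variable names are assumptions for this example only.

```python
# Hypothetical data: advertising spend (x) and sales (y), invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: co-variation of x and y divided by the variation of x.
m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
# The fitted line always passes through the point (mean_x, mean_y).
b = mean_y - m * mean_x

# Residuals are the "error" term: the part of y the line does not explain.
residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]

print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
```

For these points the fitted slope is about 1.97, so each extra unit of x is associated with roughly two extra units of y.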

▪ b) Classification Algorithms
▪ Definition: Classification is the process of predicting a label or category for a given data point.
▪ Key Algorithms:
▪ Logistic Regression: Predicts probabilities and assigns labels based on thresholds (e.g., spam
vs. non-spam emails).
▪ Decision Trees: A tree-like model of decisions used for classification tasks.
▪ Example: Classify whether a customer will purchase a product based on their past behavior (e.g.,
predict 1 for purchase, 0 for no purchase).
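The thresholding step in logistic regression can be sketched as follows. The coefficients here are hypothetical values standing in for what a trained model would learn, and `predict_purchase` and its two features are names invented for this example.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, as if learned from past customer data.
INTERCEPT = -3.0
WEIGHT_VISITS = 0.5  # contribution of each previous site visit
WEIGHT_CART = 2.0    # contribution of having items in the cart (1 or 0)


def predict_purchase(visits: int, has_cart: int, threshold: float = 0.5):
    """Return (probability of purchase, predicted label 1 or 0)."""
    z = INTERCEPT + WEIGHT_VISITS * visits + WEIGHT_CART * has_cart
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


print(predict_purchase(visits=1, has_cart=0))  # low engagement
print(predict_purchase(visits=4, has_cart=1))  # high engagement
```

Raising or lowering `threshold` trades false positives against false negatives, so 0.5 is a common default rather than a rule.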

▪ c) Clustering Algorithms
▪ Definition: Clustering groups similar data points into clusters without predefined labels.
▪ Key Algorithm:
▪ K-Means Clustering: Partitions data into k clusters based on similarity.
▪ Example: Segment customers into different groups based on purchasing behavior (e.g., high-
spenders vs. low-spenders).
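A customer segmentation of this kind might look like the sketch below, which uses scikit-learn's `KMeans`. The six customers and their two features are invented for illustration; with real data the features would usually be scaled first so that no single column dominates the distance calculation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual spend, number of orders], invented for illustration.
customers = np.array([
    [200, 3], [250, 4], [220, 2],        # low spenders
    [5000, 40], [5200, 45], [4800, 38],  # high spenders
])

# n_init and random_state are set explicitly so the run is reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)

print("Cluster labels:", labels)
print("Cluster centers:\n", kmeans.cluster_centers_)
```

The numeric label assigned to each cluster is arbitrary; only the grouping itself is meaningful.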

▪ Live Coding: Basic Data Manipulation in Python
▪ Task 1: Loading a CSV File
▪ Library: Pandas (import pandas as pd)
▪ Code:
import pandas as pd
# Load a CSV file
data = pd.read_csv('sample_data.csv')
print(data.head())

▪ Task 2: Data Cleaning
▪ Handling missing values, basic filtering.
▪ Code:
# Check for missing values
print(data.isnull().sum())
# Drop rows with missing values
clean_data = data.dropna()
print(clean_data.head())
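Task 2 also mentions basic filtering, and dropping rows is not the only way to handle missing values. The sketch below shows both on a small invented table; `df`, its columns, and the threshold of 90 are assumptions standing in for sample_data.csv, whose contents are not shown in the lesson.

```python
import numpy as np
import pandas as pd

# Small hypothetical table standing in for sample_data.csv.
df = pd.DataFrame({
    "region": ["north", "south", "north", "east"],
    "sales": [100.0, np.nan, 250.0, 80.0],
})

# Alternative to dropna(): fill missing numeric values with the column median.
filled = df.copy()
filled["sales"] = filled["sales"].fillna(filled["sales"].median())

# Basic filtering: keep only rows where sales exceed a threshold.
high_sales = filled[filled["sales"] > 90]

print(filled)
print(high_sales)
```

Filling keeps every row at the cost of introducing estimated values, so whether to drop or fill depends on how much data is missing and why.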

▪ Task 3: Data Visualization
▪ Visualize the relationship between two variables.
▪ Code:
import matplotlib.pyplot as plt
# Scatter plot to visualize relationship between Large_LOH_Telomere and Number_of_Breakpoints
plt.scatter(data['Large_LOH_Telomere'], data['Number_of_Breakpoints'])
plt.xlabel('Large LOH Telomere')
plt.ylabel('Number of Breakpoints')
plt.title('Large LOH Telomere vs Number of Breakpoints')
plt.show()

▪ Updated Hands-On Exercise: Linear Regression Model
▪ In this section, we'll build a linear regression model to predict the "Number_of_Breakpoints" based
on several other variables like "Large_LOH_Telomere" and "Small_LOH_Centromere."
▪ STEP 01: Import Necessary Libraries
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

▪ STEP 02: Prepare the Data
▪ We’ll split the dataset into features (X) and the target variable (y).
# Features: All columns except 'Number_of_Breakpoints' and 'Label'
# (clean_data from Task 2 is used because LinearRegression cannot handle missing values)
X = clean_data.drop(columns=['Number_of_Breakpoints', 'Label'])
# Target: 'Number_of_Breakpoints'
y = clean_data['Number_of_Breakpoints']
# Split the data into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
▪ STEP 03: Train the Model
▪ Fit a linear regression model using the training data.
# Create and fit the model
model = LinearRegression()
model.fit(X_train, y_train)

▪ STEP 04: Make Predictions and Evaluate
▪ Now, we'll make predictions using the test set and evaluate the model.
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

▪ STEP 05: Interpretation
▪ Discuss the output MSE: A lower MSE indicates a better fit of the model to the test
data. Optionally, visualize the predicted vs actual values to see the performance of the model.
# Plot actual vs predicted values
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Number of Breakpoints')
plt.ylabel('Predicted Number of Breakpoints')
plt.title('Actual vs Predicted Breakpoints')
plt.show()
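For readers without access to sample_data.csv, the five steps can be tried end to end on synthetic data. This is a sketch under stated assumptions: the column names are borrowed from the exercise, but every value is randomly generated, and the "true" coefficients (3.0 and 1.5) are chosen here purely so the result can be checked.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lesson's dataset: all values are randomly generated.
rng = np.random.default_rng(42)
n_rows = 200
df = pd.DataFrame({
    "Large_LOH_Telomere": rng.normal(10, 2, n_rows),
    "Small_LOH_Centromere": rng.normal(5, 1, n_rows),
})
# The target is built as a known linear combination plus random noise.
df["Number_of_Breakpoints"] = (
    3.0 * df["Large_LOH_Telomere"]
    + 1.5 * df["Small_LOH_Centromere"]
    + rng.normal(0, 1, n_rows)
)

X = df[["Large_LOH_Telomere", "Small_LOH_Centromere"]]
y = df["Number_of_Breakpoints"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
rmse = mse ** 0.5  # same units as the target, often easier to interpret

print("Coefficients:", model.coef_)
print(f"MSE: {mse:.3f}, RMSE: {rmse:.3f}")
```

Because the noise was generated with a standard deviation of 1, an MSE near 1 is about the best any model could achieve here, which gives a reference point for judging what "low" means.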

4. Machine Learning & AI Overview

▪ Definition of AI and Machine Learning
▪ Artificial Intelligence (AI): Technology enabling machines to mimic human intelligence.
▪ Machine Learning (ML): A subset of AI that allows systems to learn and improve from
experience without being explicitly programmed.
▪ Importance in Today’s World
▪ Look at AI applications in daily life: virtual assistants, recommendation systems, and
healthcare advancements.

▪ Types of Machine Learning
▪ Supervised Learning
▪ Learning with labeled data (e.g., email spam detection).
▪ Examples: Image classification, fraud detection.
▪ Unsupervised Learning
▪ Learning without labeled data (e.g., clustering customers based on behavior).
▪ Examples: Market segmentation, recommendation systems.

▪ Role of AI in Data Science
▪ AI and Machine Learning in Data Science
▪ AI and ML enable predictive analytics and pattern recognition in massive datasets.
▪ Examples of AI-driven insights: Predictive maintenance in manufacturing,
recommendation systems in e-commerce.

5. Data Warehousing & Data Structures

▪ What is Data Warehousing?
▪ A data warehouse is a central repository where data is stored from various sources for
reporting, analysis, and decision-making.
▪ Note that unlike operational (transactional) databases, data warehouses are optimized for querying and
analysis rather than transaction processing.

▪ Key Concepts:
▪ Data Storage: How data from various operational databases and systems is consolidated in the
data warehouse.
▪ Data Extraction: Process of retrieving relevant data from the source systems, the first step of ETL (Extract,
Transform, Load).
▪ Data Transformation: Converting data into an appropriate format for analysis.

▪ Importance in Data Science:
▪ Data warehouses support business intelligence (BI) tools, allowing data scientists and analysts
to gain insights from large amounts of historical data.
▪ Mention real-world applications: retail, healthcare, finance industries, and how they use data
warehousing to make data-driven decisions.

▪ Overview of Data Structures:
▪ Why Data Structures Matter in Data Warehousing:
▪ Data structures define how data is stored, organized, and retrieved efficiently.
▪ In data warehouses, proper data structure design helps improve the speed and efficiency
of queries and data extraction.

▪ Types of Data Structures:
▪ Arrays: Fixed-size, contiguous blocks of memory. Good for simple, indexed data storage.
Example: Storing a list of sales for each day of the month.
▪ Linked Lists: A sequence of nodes, each containing data and a reference to the next
node. Useful for dynamic data structures.
▪ Trees: Hierarchical structures that store data in a parent-child relationship (e.g., binary
trees, B-trees). Efficient for searching and sorting. Example: Storing hierarchical company
data.
▪ Graphs: Consist of nodes (vertices) and edges (connections). Useful for representing
complex relationships like social networks or logistics networks.
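The four structures can be sketched in plain Python as follows. These are simplified illustrations: a Python list is a dynamic array rather than a fixed-size one, and the sales figures, `Node` class, `org_chart`, and `follows` graph are all invented for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

# Array-like: a Python list indexed by day of the month (hypothetical sales).
daily_sales = [120, 95, 130]
first_day = daily_sales[0]


# Linked list: each node holds data and a reference to the next node.
@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


head = Node(1, Node(2, Node(3)))


def linked_list_values(node):
    """Walk the list from the head and collect each node's value."""
    values = []
    while node is not None:
        values.append(node.value)
        node = node.next
    return values


# Tree: nested dictionaries model a parent-child company hierarchy.
org_chart = {"CEO": {"CTO": {"Engineer": {}}, "CFO": {}}}


def tree_depth(tree):
    """Number of levels in the hierarchy (0 for an empty tree)."""
    if not tree:
        return 0
    return 1 + max(tree_depth(child) for child in tree.values())


# Graph: an adjacency list maps each vertex to the vertices it points to.
follows = {"ana": ["ben", "cy"], "ben": ["cy"], "cy": []}


def reachable(graph, start):
    """Breadth-first search: every vertex that can be reached from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbor in graph[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


print(first_day, linked_list_values(head), tree_depth(org_chart))
print(reachable(follows, "ana"))
```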

▪ Data Extraction Process:
▪ Walk through the ETL process:
▪ Extract: Collecting raw data from multiple sources.
▪ Transform: Applying rules or functions to clean and standardize the data.
▪ Load: Loading the cleaned data into the data warehouse for analysis.
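The three ETL steps can be sketched with only the Python standard library. Everything here is a stand-in: an in-memory CSV string plays the role of a source system, an in-memory SQLite database plays the role of the data warehouse, and the `orders` table and its columns are invented for the example.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a source. An in-memory CSV string stands in
# for a real operational system here.
raw_csv = "order_id,customer,amount\n1, alice ,10.50\n2,BOB,\n3,carol,7.25\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: standardize customer names and drop rows with a missing amount.
clean_rows = [
    (int(r["order_id"]), r["customer"].strip().title(), float(r["amount"]))
    for r in rows
    if r["amount"].strip()
]

# Load: write the cleaned rows into a table. An in-memory SQLite database
# stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean_rows)
conn.commit()

# Once loaded, the data can be queried for analysis.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(f"Loaded {len(clean_rows)} rows, total amount = {total}")
```

Production pipelines typically add logging, validation, and incremental loading, but the extract, transform, load sequence stays the same.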

Thank
You!!!