Sushil 7th (1 PDF
An Internship report on
DATA SCIENCE
Submitted By
Sushil Meher
[ Roll No : 21134501032 ]
[ B.Tech (C.S.E) VIIth ]
Date-17/12/2024 SIGNATURE
Sushil Meher
CERTIFICATE
ACKNOWLEDGEMENT
We would like to express our deepest gratitude to everyone who offered their
help and kindness toward the completion of this project. We begin by thanking
M P Thapliyal, Department of Computer Science and Engineering, Hemvati
Nandan Bahuguna Garhwal University (A Central University), Srinagar
(Garhwal), Uttarakhand, our project instructor.
This work could not have been completed without his expertise and invaluable
guidance at every phase.
CERTIFICATE
ABSTRACT
The area of Machine Learning deals with the design of programs that can learn
rules from data, adapt to changes, and improve performance with experience. In
addition to being one of the initial dreams of Computer Science, Machine
Learning has become crucial as computers are expected to solve increasingly
complex problems and become more integrated into our daily lives. This is a
hard problem, since making a machine learn from its computational tasks
requires work at several levels, and complexities and ambiguities arise at each of
those levels.
This report studies how machine learning takes place and what methods it uses,
discusses the projects implemented during training and their applications, and
reviews the present and future status of machine learning.
CHAPTER 1: INTRODUCTION
Data science with artificial intelligence (AI) and machine learning (ML) is an
exciting and dynamic field. Training in it equips individuals with the skills to
extract valuable insights, automate decision-making processes, and unlock the
potential of data. This training encompasses a wide array of knowledge and
practical expertise.
TECHNICAL TRAINING PLATFORM
VS Code
Visual Studio Code (VS Code) is increasingly important in data science due to
its versatility, ease of use, and extensive extension ecosystem. Data scientists
can leverage VS Code for several critical tasks. It supports various
programming languages commonly used in data science, such as Python and R,
making it a unified environment for coding, data manipulation, and analysis. VS
Code's extensions enable integration with Jupyter notebooks, version control
systems, and data visualization libraries. It offers a streamlined interface for
writing code, running experiments, and collaborating with teams, making it an
invaluable tool for data scientists seeking efficiency and productivity in their
workflow.
Jupyter Notebook
Kaggle
Data Science, Artificial Intelligence (AI), and Machine Learning (ML) are
transformative fields that have revolutionized how businesses, organizations,
and researchers analyze and extract insights from data. In this introduction, we'll
explore the fundamental concepts and their interplay in these domains.
Before starting a data science project, one should keep several aspects in mind:
● Data Collection: Gathering relevant data from various sources, such as
databases, APIs, and sensors.
● Data Cleaning and Preprocessing: Ensuring data is accurate, complete, and
ready for analysis.
● Exploratory Data Analysis (EDA): Examining data visually and statistically
to discover patterns.
● Feature Engineering: Selecting or creating relevant variables for analysis.
● Machine Learning: Building predictive models and making data-driven
decisions.
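The stages listed above can be sketched end to end on a toy example. The snippet below is a minimal illustration, not the pipeline used during the training: the records, the field names (`hours`, `score`), and the helper functions are all hypothetical, and a simple least-squares line stands in for the machine learning step.

```python
from statistics import mean

# Data collection: a hypothetical in-memory dataset standing in for a database or API.
raw_records = [
    {"hours": 1.0, "score": 52.0},
    {"hours": 2.0, "score": 54.0},
    {"hours": None, "score": 60.0},   # incomplete record
    {"hours": 3.0, "score": 56.0},
    {"hours": 4.0, "score": 58.0},
]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def explore(records):
    """Exploratory data analysis: basic summary statistics per field."""
    return {
        key: {"min": min(r[key] for r in records),
              "max": max(r[key] for r in records),
              "mean": mean(r[key] for r in records)}
        for key in records[0]
    }

def add_features(records):
    """Feature engineering: derive a new variable from an existing one."""
    return [{**r, "hours_squared": r["hours"] ** 2} for r in records]

def fit_line(xs, ys):
    """Machine learning (simplest case): least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

cleaned = clean(raw_records)
summary = explore(cleaned)
featured = add_features(cleaned)
slope, intercept = fit_line([r["hours"] for r in featured],
                            [r["score"] for r in featured])
print(summary["hours"])
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

In practice, libraries such as Pandas and Scikit-Learn would replace these hand-written helpers; the plain-Python version is used here only to keep each stage visible.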
CHAPTER 3: HARDWARE AND SOFTWARE REQUIREMENT
HARDWARE REQUIRED:
SOFTWARE REQUIRED:
CHAPTER 4: TOOLS
4.1 Introduction
Fundamental tools in data science serve as the cornerstone for various tasks in
data analysis and machine learning. Python, a versatile programming language,
is the linchpin of the data science toolkit. It's complemented by Jupyter
Notebook, an interactive environment perfect for data exploration and
documentation. Pandas, a robust library, takes center stage in data manipulation
and analysis, particularly suited for structured data. Data visualization is
achieved through Matplotlib, a versatile plotting library, and Seaborn, which
simplifies creating appealing statistical graphics. For machine learning
endeavors, Scikit-Learn provides essential algorithms and tools, making it
accessible for beginners and powerful for experts. Version control is essential,
and Git is the industry standard for tracking code changes and collaboration.
These core tools empower data scientists to clean, explore, and analyze data, as
well as develop machine learning models. While more specialized tools may be
necessary for certain projects, these foundational tools remain indispensable and
are the starting point for anyone venturing into the field of data science.
4.2 Features
Data science involves collecting and analyzing data to derive insights,
employing machine learning for predictions, and using visualization to
communicate results.
CHAPTER 5: PYTHON
5.1 Introduction
Operators
Data Types
Variables:
Variables are used to store data. In Python, you can create a variable by
assigning a value to a name, like x = 10.
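As a short illustration of variables together with Python's built-in data types and operators, the following sketch assigns values of a few types and applies common operators. The variable names and values are arbitrary examples.

```python
# Variables are created by assignment; the type is inferred from the value.
x = 10                # int
price = 99.5          # float
name = "Alice"        # str
is_valid = True       # bool
marks = [72, 85, 90]  # list

# Arithmetic operators
total = x + 5         # addition
quotient = x / 4      # true division always returns a float
floor_q = x // 4      # floor division
remainder = x % 4     # modulo
power = x ** 2        # exponentiation

# Comparison, logical, and membership operators produce booleans
in_range = x > 5 and x <= 10
is_member = 85 in marks

print(type(x).__name__, type(price).__name__, type(name).__name__)
print(total, quotient, floor_q, remainder, power)
print(in_range, is_member)
```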
Conditional Statements:
Conditional statements allow you to make decisions in your code using if, elif
(else if), and else.
For example:
if x > 10:
    print("x is greater than 10")
elif x == 10:
    print("x is equal to 10")
else:
    print("x is less than 10")
Loops:
for i in range(5):
    print(x)
    x += 1
Functions:
Functions allow you to group code into reusable blocks. You can define a
function using the def keyword.
def greet(name):
    return "Hello, " + name + "!"  # build and return a greeting
message = greet("Alice")
print(message)
CHAPTER 6: STATISTICS
Statistics plays a vital role in various fields, including science, economics, and
social sciences, enabling researchers and analysts to make informed decisions,
test hypotheses, and build predictive models based on empirical evidence and
data patterns. In data science, statistics forms the basis for deriving actionable
insights from large datasets.
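➢ Descriptive Statistics: Summarizes data using measures such as mean,
median, mode, and variance.
➢ Inferential Statistics: Draws conclusions about populations from
samples.
➢ Probability Distributions: Describes how values are distributed, such as normal
and binomial.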
➢ Bayesian Statistics (Optional): Deals with uncertainty and probabilistic
modeling.
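To make the idea of probabilistic modeling concrete, the sketch below applies Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E), to update a belief after seeing evidence. The scenario and all the probabilities (a spam filter with made-up rates) are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes' theorem."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of emails are spam; the word "offer" appears in
# 60% of spam emails and in 5% of non-spam emails.
p_spam_given_offer = posterior(prior=0.20, likelihood=0.60, likelihood_given_not=0.05)
print(round(p_spam_given_offer, 3))
```

With these assumed rates, seeing the word raises the probability of spam from the 20% prior to roughly 75%.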
Mode
Code
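import pandas as pd
data = pd.Series([2, 3, 3, 5, 7, 7, 7, 9])  # sample data, values chosen for illustration
mode_data = data.mode()  # most frequent value(s), returned as a Series
print(mode_data)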
Mean
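import pandas as pd
data = pd.Series([2, 3, 3, 5, 7, 7, 7, 9])  # sample data, values chosen for illustration
mean_data = data.mean()  # arithmetic average of the values
print(mean_data)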
Median
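import pandas as pd
data = pd.Series([2, 3, 3, 5, 7, 7, 7, 9])  # sample data, values chosen for illustration
median_data = data.median()  # middle value of the sorted data
print(median_data)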
6.4 Probability Distribution
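As a sketch of what a probability distribution looks like in code, the following computes the probability mass function of a binomial distribution, P(X = k) = C(n, k) p^k (1 - p)^(n - k), using only the standard library. The choice of n = 10 tosses of a fair coin is an illustrative assumption.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5  # hypothetical example: 10 tosses of a fair coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(round(sum(pmf), 6))               # probabilities over all outcomes sum to 1
print(round(binomial_pmf(5, n, p), 4))  # chance of exactly 5 heads
expected = sum(k * prob for k, prob in enumerate(pmf))  # expected value, equals n * p
print(expected)
```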
CHAPTER 7: GRAPHS
7.2 Types of Graphs
Here are some common types of graphs used in data science:
➢ Bar Charts: Bar charts are used to display and compare categorical data.
They represent categories on one axis and the corresponding values on the
other, typically using vertical or horizontal bars.
➢ Scatter Plots:
Scatter plots show individual data points as dots on a two-dimensional plane.
They are used to visualize the relationship between two continuous variables
and identify patterns or correlations.
➢ Line Charts:
Line charts display data points as connected lines, often used to show trends or
changes in data over time.
➢ Pie Charts:
Pie charts represent parts of a whole, where each slice corresponds to a
percentage of the total. They are used to visualize the composition of a dataset.
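The four chart types described above can be drawn with Matplotlib, the plotting library mentioned in Chapter 4. The sketch below is a minimal example on made-up data; the category names, values, and output file name (`charts.png`) are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample data
categories = ["A", "B", "C", "D"]
values = [23, 45, 12, 30]
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].bar(categories, values)   # bar chart: compare categories
axes[0, 0].set_title("Bar Chart")

axes[0, 1].scatter(x, y)             # scatter plot: relationship between two variables
axes[0, 1].set_title("Scatter Plot")

axes[1, 0].plot(x, y, marker="o")    # line chart: trend along an ordered axis
axes[1, 0].set_title("Line Chart")

axes[1, 1].pie(values, labels=categories, autopct="%1.1f%%")  # pie chart: parts of a whole
axes[1, 1].set_title("Pie Chart")

fig.tight_layout()
fig.savefig("charts.png")
```

Placing the four charts in one figure makes it easy to compare how the same small dataset reads in each form.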
FINAL PROJECT
SNAPSHOT :
Helper Function :
Regularization Parameter :
OUTPUT :
OUTPUT :
CONCLUSION
FUTURE SCOPE
The future of data science, AI, and ML is poised for significant expansion and
impact. AI and ML will increasingly underpin advanced applications in various
sectors, from healthcare to finance. As data generation continues to explode,
data scientists will be at the forefront of managing and deriving insights from
big data. Ethical considerations surrounding AI fairness, transparency, and
accountability will become even more crucial.
Security and privacy challenges will intensify with increased reliance on AI and
data, demanding innovative solutions. Research and development will continue
to drive breakthroughs in AI algorithms and architectures. In summary, the
future of data science, AI, and ML holds immense potential for transforming
industries, improving decision-making, and shaping a technologically advanced
future.
REFERENCES
➢ https://www.kaggle.com/learn/overview
➢ https://www.edx.org/micromasters/data-science
➢ https://www.fast.ai/
➢ https://towardsdatascience.com/
➢ https://www.youtube.com/user/joshstarmer
➢ https://github.com/josephmisiti/awesome-machine-learning
➢ https://github.com/campusx-official/book-recommender-system/commit/678c7ab5a67adfcafaadf5b2924e4d04acafe9ac#diff5983284b94671de74632c367234334917d7e2de10e4be9c255afb37e33f5352e
➢ https://www.youtube.com/user/sentdex
➢ https://www.coursera.org/specializations/deep-learning
➢ https://github.com/ChristosChristofidis/awesome-deep-learning
➢ https://youtu.be/1YoD0fg3_EM?feature=shared