
Unit 1 DS

Uploaded by

Deepesh
1. Data Integration: Data integration means collecting data from different sources, such as
databases, Excel sheets, and APIs, and unifying it in one place. The benefit is that all the
data can be accessed together, which makes analysis easier. It is essential in real-world
applications such as business intelligence and decision-making.
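To make the idea concrete, here is a minimal sketch of integrating two sources using only the Python standard library. The sample data, the `customer_id` key, and the `integrate` helper are all hypothetical, chosen only for illustration; in practice a library such as Pandas is often used for this.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export (for example, from a spreadsheet).
CSV_TEXT = """customer_id,name
1,Asha
2,Ravi
3,Meera
"""

# Hypothetical source 2: a JSON payload (for example, from an API).
JSON_TEXT = '[{"customer_id": 1, "total_spent": 250.0}, {"customer_id": 3, "total_spent": 90.5}]'


def integrate(csv_text, json_text):
    """Combine both sources into one list of records keyed by customer_id."""
    unified = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cid = int(row["customer_id"])  # CSV values are strings, so align the key type
        unified[cid] = {"customer_id": cid, "name": row["name"], "total_spent": None}
    for item in json.loads(json_text):
        cid = item["customer_id"]
        # Keep records that appear only in the second source as well.
        record = unified.setdefault(
            cid, {"customer_id": cid, "name": None, "total_spent": None}
        )
        record["total_spent"] = item["total_spent"]
    return [unified[cid] for cid in sorted(unified)]


records = integrate(CSV_TEXT, JSON_TEXT)
```

Customers missing from one source keep `None` in the corresponding field, so the gap stays visible for later cleaning.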

2. Outlier Detection and Removal: Outliers are data points that are very different from, or
extreme compared with, the rest of the data. Detecting them matters because they can distort
statistics such as the mean and mislead a model. Detecting outliers and then removing or
otherwise handling them is important for model accuracy and performance.
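One common way to detect outliers is the interquartile range (IQR) rule, sketched below. The notes do not name a specific method, so the IQR rule, the multiplier `k=1.5`, and the sample data are assumptions for illustration.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (low, high) limits outside which a value counts as an outlier."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles Q1, Q2, Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Separate values into inliers and outliers using the IQR rule."""
    low, high = iqr_bounds(values, k)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


# Hypothetical measurements with one extreme value.
data = [10, 12, 11, 13, 12, 11, 95, 10, 12]
inliers, outliers = split_outliers(data)
```

Whether to drop the flagged points or handle them another way (for example, capping) depends on whether they are errors or genuine extreme observations.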

3. Nominal and Ordinal Data Scales:

o Nominal: Data is divided into categories that have no order, such as colors (red, blue,
green) or gender (male, female).

o Ordinal: The categories have a natural order, such as ranks (first, second, third) or a
rating scale (bad, average, good).
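The distinction matters when turning categories into numbers for a model. The sketch below shows one possible treatment: one-hot encoding for nominal data (so no false order is implied) and integer ranks for ordinal data. The helper names are hypothetical.

```python
def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector; no category ranks above another."""
    return [1 if value == c else 0 for c in categories]


def ordinal_encode(value, ordered_levels):
    """Encode an ordinal value as its position in the ordered list of levels."""
    return ordered_levels.index(value)


COLORS = ["red", "blue", "green"]      # nominal: order is arbitrary
RATINGS = ["bad", "average", "good"]   # ordinal: order is meaningful

color_vector = one_hot("blue", COLORS)
rating_rank = ordinal_encode("good", RATINGS)
```

Because `rating_rank` preserves order, comparisons such as "good is higher than average" stay valid, whereas comparing positions in `COLORS` would be meaningless.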

4. Advantages of Google Colab in machine learning:

o It is a free, cloud-based environment, so there is no need to worry about hardware
configuration.

o It gives access to GPUs and TPUs (subject to availability), which speeds up training.

o Everything runs in the browser and the computation happens on remote machines, so local
system resources are barely used.

o It supports collaboration, so multiple people can work on the same notebook.

5. Importance of Feature Selection: Feature selection means choosing the important features
that are relevant to the model. This improves model performance and makes training faster
because irrelevant data is removed.
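As a rough illustration, the sketch below keeps only features whose absolute Pearson correlation with the target reaches a threshold. This filter is just one of many feature selection techniques and is an assumption here; the feature names, data, and the `0.5` threshold are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)


def select_features(features, target, threshold=0.5):
    """Keep feature names whose |correlation| with the target meets the threshold."""
    return [
        name
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    ]


# Hypothetical dataset: predicting exam marks.
features = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "shoe_size": [7, 9, 8, 9, 7, 8],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
target = [35, 45, 50, 62, 70, 81]
selected = select_features(features, target)
```

A correlation filter only captures linear relationships with the target, so it is a starting point rather than a complete selection strategy.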

All of these concepts will be quite useful in your exam. Best of luck!

You said:

These ones also: What is Data Science? Explain its importance in today's world. Differentiate between
Data Science, Machine Learning, and Artificial Intelligence. What is Data Preprocessing? Explain the
key steps involved. Explain the importance of Data Cleaning and give methods to handle missing
data. Discuss Data Transformation techniques like Normalization and Standardization. What are the
different Data Scales? Explain each with examples. Explain the use and advantages of Google Colab in
Data Science. What is Sampling? Explain types like Random and Stratified Sampling. What is the role
of Python in Data Science? Discuss key features of Python. Explain the types of Data Visualizations
used in Data Science.

ChatGPT said:


1. What Data Science is and why it is important:

Data Science is a field that uses scientific methods, processes, and algorithms to understand
and make use of data. In today's digital world companies hold very large amounts of data, and
Data Science helps them understand it and derive insights from it. This makes decisions
data-driven, which is helpful in both business and technology.

2. Difference between Data Science, Machine Learning, and Artificial Intelligence:

o Data Science: A broad field that analyzes data to find insights.

o Machine Learning (ML): A subset of AI, and a core tool within Data Science, in which
algorithms let computers learn from data without being explicitly programmed.

o Artificial Intelligence (AI): A broad field that develops intelligent systems able to make
human-like decisions. Machine Learning is a subset of AI.

3. What Data Preprocessing is and its key steps:

Data Preprocessing means cleaning and processing raw data so that it is ready for a model.
The important steps are:

o Data Cleaning: Handling missing values and errors.

o Data Transformation: Normalizing or standardizing the data.

o Data Reduction: Removing unnecessary data.

o Feature Selection: Selecting the relevant features.

4. Importance of Data Cleaning and methods to handle missing data:

Data Cleaning is very important because errors or missing values in the data can lower the
model's accuracy. Methods to handle missing data:

o Removal: Rows or columns that contain a large number of missing values can be dropped.

o Imputation: Missing values are filled in with the mean, median, or mode.
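Both methods can be sketched with the standard library, as below. Missing values are represented here as `None`, and the helper names and sample data are hypothetical.

```python
import statistics


def drop_missing(rows):
    """Removal: keep only rows (dicts) that have no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def impute(values, strategy="mean"):
    """Imputation: fill None entries with the mean, median, or mode of the rest."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    elif strategy == "mode":
        fill = statistics.mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical columns with gaps.
ages = [20, None, 30, 40, None]
cities = ["Pune", "Delhi", None, "Pune"]

ages_filled = impute(ages, "mean")      # numeric column: mean or median
cities_filled = impute(cities, "mode")  # categorical column: mode

rows = [{"age": 20, "city": "Pune"}, {"age": None, "city": "Delhi"}]
complete_rows = drop_missing(rows)
```

Mean and median only make sense for numeric columns, while mode also works for categorical ones, which is why the two columns above use different strategies.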

5. Data Transformation techniques: Normalization and Standardization:

o Normalization: Rescaling data to a fixed range (e.g., 0 to 1). It is useful when features
have very different scales.

o Standardization: Transforming data so that the mean becomes 0 and the standard deviation
becomes 1. It works best when the distribution is approximately normal.
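In formula terms, min-max normalization computes (x - min) / (max - min) and standardization computes (x - mean) / standard deviation. A minimal sketch follows; the use of the population standard deviation and the sample income values are assumptions for illustration.

```python
import statistics


def min_max_normalize(values):
    """Rescale values to the range [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 using (x - mean) / std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical feature with a wide scale.
incomes = [20000, 35000, 50000, 80000]
normalized = min_max_normalize(incomes)
standardized = standardize(incomes)
```

After normalization the smallest value maps to 0 and the largest to 1; after standardization the values are centered on 0 but are not confined to a fixed range.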

6. Different Data Scales:

o Nominal: Categories without order (e.g., colors: red, blue).

o Ordinal: Categories with order (e.g., ranks: first, second).

o Interval: Ordered numeric values with equal, known intervals but no true zero (e.g.,
temperature in Celsius).

o Ratio: Like interval, but with a true zero (e.g., height, weight).

7. Use and advantages of Google Colab in Data Science:

It is a free cloud environment that provides GPUs for ML/DL tasks. Python and libraries such
as TensorFlow come pre-installed, which saves setup time. Collaboration is also easy, which is
beneficial for teamwork.

8. What Sampling is, and types such as Random and Stratified Sampling:

Sampling means selecting a subset of a population in order to draw conclusions about the
whole population.

o Random Sampling: Every element has an equal chance of being selected.

o Stratified Sampling: The population is divided into groups (strata), and proportionate
samples are taken from each group.
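The two approaches can be compared with the standard `random` module, as in the sketch below. The student population, the `year` grouping, the sampling fraction, and the fixed seed are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def random_sample(population, k, seed=0):
    """Simple random sampling: every element has an equal chance of selection."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    return rng.sample(population, k)


def stratified_sample(population, key, fraction, seed=0):
    """Stratified sampling: draw the same fraction from each group (stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # at least one item per stratum
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical population: 60 first-year and 40 second-year students.
students = [{"id": i, "year": "first" if i < 60 else "second"} for i in range(100)]

simple = random_sample(students, 10)
stratified = stratified_sample(students, key=lambda s: s["year"], fraction=0.1)
```

With a 10% fraction the stratified sample always contains 6 first-year and 4 second-year students, while the simple random sample's mix varies from draw to draw.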

9. Role of Python in Data Science and its key features:

Python is a popular language for Data Science because it has many libraries, such as Pandas,
NumPy, Matplotlib, and TensorFlow. Python is easy to learn, and its syntax is clean and
simple.

10. Types of Data Visualizations in Data Science:

• Bar Charts: Category-wise comparison.

• Line Charts: For showing trends over time.

• Scatter Plots: For visualizing the relationship between two variables.

• Histograms: For showing the distribution of data.
