Unit 1 DS
1. Data Integration: Data integration means collecting data from different sources, such as
databases, Excel sheets, and APIs, and unifying it in one place. The benefit is that all the
data can be accessed together, which makes analysis easier. It is essential in real-world
applications such as business intelligence and decision-making.
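As a rough illustration of unifying data from several sources, here is a minimal sketch in plain Python. The two source lists, the field names, and the `integrate` helper are all hypothetical and exist only for this example; a real project would more likely use a library such as pandas.

```python
# Hypothetical records from two sources, both keyed by customer_id.
crm_rows = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
]
sales_rows = [
    {"customer_id": 1, "total_spent": 250.0},
    {"customer_id": 3, "total_spent": 90.0},
]

def integrate(*sources, key="customer_id"):
    """Merge rows from several sources into one record per key value."""
    unified = {}
    for rows in sources:
        for row in rows:
            # Start a record for a new key, then add this source's fields.
            unified.setdefault(row[key], {}).update(row)
    return [unified[k] for k in sorted(unified)]

combined = integrate(crm_rows, sales_rows)
for record in combined:
    print(record)
```

Customers that appear in only one source keep only the fields that source provides, so the unified view can contain gaps that need handling later.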
2. Outlier Detection and Removal: Outliers are data points that are very different from, or
extreme compared with, the rest of the data. Detecting them is important because they can
distort the data, for example by skewing the mean. Detecting outliers and then removing or
otherwise handling them is important for a model's accuracy and performance.
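The notes do not name a specific detection method, so the sketch below assumes one common choice, the interquartile range (IQR) rule with the conventional multiplier of 1.5. The `iqr_outliers` helper and the sample scores are made up for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the IQR fences."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

# Hypothetical exam scores with one extreme value.
scores = [52, 55, 57, 58, 60, 61, 63, 64, 250]
kept, outliers = iqr_outliers(scores)
print("kept:", kept)
print("outliers:", outliers)
```

Whether to drop the flagged points or handle them another way (for example, capping them) depends on whether they are errors or genuine extreme observations.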
o Nominal: Data is divided into categories that have no order, such as colors (red, blue,
green) or gender (male, female).
o Ordinal: The categories have a natural order, such as ranks (first, second, third) or a
rating scale (bad, average, good).
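The difference between the two scales shows up in which operations make sense. In this small sketch, the `RATING_ORDER` mapping is an assumption introduced for the example: nominal values can only be counted or compared for equality, while ordinal values can also be sorted.

```python
# Nominal: labels only; counting is meaningful, ordering is not.
colors = ["red", "blue", "green", "blue"]
color_counts = {c: colors.count(c) for c in set(colors)}

# Ordinal: an explicit rank mapping (assumed for this example) encodes the order.
RATING_ORDER = {"bad": 0, "average": 1, "good": 2}
ratings = ["good", "bad", "average", "good"]
sorted_ratings = sorted(ratings, key=RATING_ORDER.get)
best = max(ratings, key=RATING_ORDER.get)

print(color_counts)
print(sorted_ratings, best)
```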
o It is a free, cloud-based environment, so there is no need to worry about hardware
configuration.
o It gives access to powerful GPUs and TPUs, which speeds up training.
o Everything runs in the browser, so the heavy computation does not consume local system resources.
o It supports collaboration, so multiple people can work on the same notebook.
All of these concepts will be quite useful in your exam. Best of luck!
You said:
These ones also: What is Data Science? Explain its importance in today's world. Differentiate between
Data Science, Machine Learning, and Artificial Intelligence. What is Data Preprocessing? Explain the
key steps involved. Explain the importance of Data Cleaning and give methods to handle missing
data. Discuss Data Transformation techniques like Normalization and Standardization. What are the
different Data Scales? Explain each with examples. Explain the use and advantages of Google Colab in
Data Science. What is Sampling? Explain types like Random and Stratified Sampling. What is the role
of Python in Data Science? Discuss key features of Python. Explain the types of Data Visualizations
used in Data Science.
ChatGPT said:
o Data Science: A broad field that analyzes data to find insights.
o Machine Learning (ML): A part of Data Science in which algorithms are built so that
computers can learn from data without being explicitly programmed.
o Artificial Intelligence (AI): A broader field that develops intelligent systems able to
make human-like decisions. Machine Learning is a subset of AI.
o Removal: Rows or columns that contain a large number of missing values can be
dropped.
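A minimal sketch of the removal approach, assuming missing values are represented as `None` and that a row is dropped when more than half of its fields are missing. The `drop_sparse_rows` helper, the threshold, and the sample rows are hypothetical.

```python
def drop_sparse_rows(rows, max_missing_fraction=0.5):
    """Keep rows whose share of missing (None) fields is within the limit.

    Assumes every row has at least one field.
    """
    kept = []
    for row in rows:
        missing = sum(1 for value in row.values() if value is None)
        if missing / len(row) <= max_missing_fraction:
            kept.append(row)
    return kept

rows = [
    {"age": 25, "city": "Pune", "income": 40000},
    {"age": None, "city": None, "income": None},   # all fields missing
    {"age": 31, "city": None, "income": 52000},    # one field missing
]
cleaned = drop_sparse_rows(rows)
print(cleaned)
```

The row with a single gap is kept here; removing it as well would discard usable information, which is why a threshold is used instead of dropping every incomplete row.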
o Normalization: Converting data to a fixed range (e.g., 0 to 1). This is useful when
the features are on very different scales.
o Standardization: Transforming data so that its mean becomes 0 and its standard
deviation becomes 1. This is useful when the distribution is normal.
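The two transformations above can be sketched with the standard formulas, min-max scaling `(x - min) / (max - min)` and the z-score `(x - mean) / std`. The function names and the sample heights are assumptions for this example, and the population standard deviation is used for simplicity.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly into the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and scale values so the mean is 0 and the std deviation is 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
normalized = min_max_normalize(heights_cm)
standardized = standardize(heights_cm)
print(normalized)
print(standardized)
```

Both helpers assume the values are not all identical, since that would make the denominator zero.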
o Interval: Ordered values with equal, known intervals but no true zero (e.g.,
temperature in Celsius).
o Ratio: Like interval, but with a true zero (e.g., height, weight).
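A short worked example of why the true zero matters. The temperatures and weights are made-up values, and the Kelvin conversion is included only to show that a ratio of Celsius readings depends on where zero happens to sit.

```python
def celsius_to_kelvin(c):
    return c + 273.15

# Interval scale: differences are meaningful, ratios are not.
t1_c, t2_c = 10.0, 20.0
difference_c = t2_c - t1_c          # a 10-degree difference is meaningful
naive_ratio = t2_c / t1_c           # 2.0, but 20 C is not "twice as hot" as 10 C
kelvin_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)  # about 1.035

# Ratio scale: zero means "none", so ratios are meaningful.
w1_kg, w2_kg = 40.0, 80.0
weight_ratio = w2_kg / w1_kg        # 2.0: 80 kg really is twice 40 kg

print(difference_c, naive_ratio, round(kelvin_ratio, 3), weight_ratio)
```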
8. What is Sampling, and types like Random and Stratified Sampling:
o Random Sampling: Every element has an equal chance of being selected.
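Simple random sampling as described above can be sketched with Python's standard `random` module. The population of 100 numbered items and the sample size of 10 are assumptions for this example, and the fixed seed is there only to make the run repeatable.

```python
import random

# Hypothetical population: items numbered 1 to 100.
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed only so the example is reproducible
# sample() draws without replacement, giving every item an equal chance.
sample = rng.sample(population, k=10)
print(sample)
```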