Data science is the domain of computer science in which we extract insights from available data with the help of scientific methods, algorithms and statistics. Data science mainly revolves around analyzing data; when it comes to AI, this analysis helps make the machine intelligent enough to perform tasks by itself.
2) What are the sources of data collection?
There exist various sources from which we can collect any type of data required. The data collection process can be categorized in two ways: offline and online.
Offline data collection:
• Sensors & camera
• Surveys
• Interviews
• Observations
Online data collection:
• Open-sourced Government Portals (data.gov.in, India.gov.in)
• World Organisations’ open-sourced statistical websites
• Reliable Websites
• Google Kaggle, web scraping
3) What are the points to be kept in mind while doing data collection?
1. Only data which is available for public usage should be taken up.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone’s privacy to collect data.
4. Data should only be taken from reliable sources, as data collected from random sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of data, which helps in proper training of the AI model.
4) What are the different formats in which data is stored?
Usually the data collected for Data Science is in the form of tables. These tabular datasets can be stored in different formats. Some of the commonly used formats are:
1. CSV: CSV stands for comma separated values. It is a simple file format used to store tabular data. Each line of the file is a data record, and each record consists of one or more fields separated by commas. Because the values in each record are separated by commas, these files are known as CSV files.
2. Spreadsheet: A spreadsheet is a piece of paper or a computer program used for accounting and recording data in rows and columns into which information can be entered. Microsoft Excel is a program which helps in creating spreadsheets.
3. SQL: SQL stands for Structured Query Language. It is a domain-specific language designed for managing data held in a relational DBMS (Database Management System). It is particularly useful in handling structured data.
5) Explain some applications of Data Science.
There exist various applications of Data Science in today’s world. Some of them are:
Fraud and Risk Detection: Banking companies learn to divide and conquer data via customer profiling, past expenditures and other essential variables to analyse the probabilities of risk and default. It also helps them to push their banking products based on the customer’s purchasing power.
Genetics & Genomics: Data Science applications enable an advanced level of treatment personalization through research in genetics and genomics. Data science techniques allow integration of different kinds of data with genomic data in disease research, which provides a deeper understanding of genetic issues in reactions to particular drugs and diseases.
Internet Search: Search engines (like Yahoo, Bing, Ask, AOL, Google) make use of data science algorithms to deliver the best result for our searched query in a fraction of a second. Google processes more than 20 petabytes of data every day.
Targeted Advertising: The entire digital marketing spectrum, from the display banners on various websites to the digital billboards at airports, uses data science algorithms. Advertisements can be targeted based on a user’s past behaviour.
Website Recommendations: Websites like Amazon not only help users find relevant products among the billions of products available with them but also add a lot to the user experience. Internet giants like Amazon, Twitter, Google Play, Netflix, LinkedIn, IMDb and many more use this system to improve the user experience and to promote their products. The recommendations are made based on previous search results for a user.
Airline Route Planning: The airline industry uses Data Science to identify strategic areas of improvement. Using Data Science, airline companies can:
• Predict flight delays
• Decide which class of airplanes to buy
• Decide whether to land directly at the destination or take a halt in between
• Effectively drive customer loyalty programs
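Going back to the storage formats from question 4, the idea of CSV records and SQL queries can be tried out with Python's standard library alone. The sketch below is only an illustration: the sample data, the table name students and the helper function names are all made up for this example and do not come from any real dataset.

```python
import csv
import io
import sqlite3

# Hypothetical sample data, invented for illustration only.
# The first line is the header; every other line is one data record.
CSV_TEXT = """name,city,marks
Asha,Delhi,82
Ravi,Mumbai,74
Meena,Delhi,91
"""


def read_csv_records(text):
    """Parse CSV text into a list of dicts, one dict per data record."""
    return list(csv.DictReader(io.StringIO(text)))


def load_into_sqlite(records):
    """Copy the records into an in-memory SQLite table called 'students'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, city TEXT, marks INTEGER)")
    conn.executemany(
        "INSERT INTO students (name, city, marks) VALUES (:name, :city, :marks)",
        records,
    )
    conn.commit()
    return conn


def average_marks_by_city(conn):
    """Use an SQL query to compute the average marks for each city."""
    rows = conn.execute(
        "SELECT city, AVG(marks) FROM students GROUP BY city ORDER BY city"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    records = read_csv_records(CSV_TEXT)
    conn = load_into_sqlite(records)
    print(records[0])
    print(average_marks_by_city(conn))
```

Each CSV line becomes one record whose fields were separated by commas, and the SQL query then works on the same data once it is stored as a structured table. SQLite is used here only because it ships with Python; other database systems accept similar queries.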
6) What are packages in Python? Explain any 3 packages.
A package is a collection of related modules saved in the same directory under a common name. Some of the open-source packages needed for Artificial Intelligence are:
• NumPy: Numerical array data handling package. It is used for data analysis and calculations on large numerical data sets.
• Matplotlib: Data visualization package. It is used to produce high-quality graphical representations of numerical data.
• Pandas: Pandas is a software library written for the Python programming language for data manipulation and analysis. It offers data structures and operations for manipulating numerical tables and time series.
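The definition of a package given above (related modules kept in one directory under one name) can be seen in action with a small sketch. It builds a tiny package in a temporary folder and imports modules from it. The package name demo_shapes, its two modules and the helper functions are hypothetical and exist only for this illustration; installed packages are laid out on disk in a broadly similar way.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Create a tiny package named 'demo_shapes' (hypothetical) inside root."""
    pkg_dir = Path(root) / "demo_shapes"
    pkg_dir.mkdir()
    # __init__.py marks the directory as a regular Python package.
    (pkg_dir / "__init__.py").write_text("")
    # Two related modules that live under the same package name.
    (pkg_dir / "square.py").write_text(
        "def area(side):\n    return side * side\n"
    )
    (pkg_dir / "rectangle.py").write_text(
        "def area(length, width):\n    return length * width\n"
    )
    return pkg_dir


def import_from_package(root, module_name):
    """Make root importable, then import one module from 'demo_shapes'."""
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module(f"demo_shapes.{module_name}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_package(root)
        square = import_from_package(root, "square")
        rectangle = import_from_package(root, "rectangle")
        print(square.area(4))
        print(rectangle.area(3, 5))
```

Both modules define a function called area, yet they do not clash, because each one is reached through its own module name inside the package.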