Ch 5 Summary
• Data refer to unorganised facts that can be processed to generate meaningful results or information.
• Hard Disk, SSD, CD/DVD, Pen Drive, Memory Card, etc. are some of the commonly used storage devices.
• The data processing cycle involves the input and storage of data, its processing, and the generation of output.
• Mean, Median, Mode, Range, and Standard Deviation are some of the statistical techniques used for data
summarisation.
• Median is the mid value when data are sorted in ascending/descending order.
• Standard deviation is the positive square root of the average of the squared differences of each value from the mean.
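The summarisation techniques listed above can be sketched with Python's standard `statistics` module. This is a minimal illustration, not the chapter's own code: the list `values` is made-up sample data, and `pstdev` (population standard deviation) is chosen on the assumption that "average of the squared differences" means dividing by the number of values.

```python
import math
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)            # sum of values / number of values
median = statistics.median(values)        # middle value of the sorted data
mode = statistics.mode(values)            # most frequently occurring value
value_range = max(values) - min(values)   # largest value minus smallest value

# Standard deviation computed directly from the definition:
# positive square root of the average squared difference from the mean.
squared_diffs = [(x - mean) ** 2 for x in values]
std_manual = math.sqrt(sum(squared_diffs) / len(values))

# The library function for the same (population) quantity.
std_library = statistics.pstdev(values)

print(mean, median, mode, value_range, std_manual, std_library)
```

Because the list has an even number of values, the median is the average of the two middle values (4 and 5).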
Ch 6 Summary
• An array is a data type that holds objects of the same datatype (numeric, textual, etc.). The elements of an array are
stored contiguously in memory. Each element of an array has an index or position value.
• NumPy is a Python library for scientific computing which stores data in a powerful n-dimensional ndarray object for
faster calculations.
• Each element of an array is referenced by the array name along with the index of that element.
• Arithmetic operations are performed element by element on two arrays when both arrays have the same shape.
• NumPy arrays are not expandable or extendable. Once a NumPy array is defined, the space it occupies in memory is
fixed and cannot be changed.
• numpy.loadtxt() and numpy.genfromtxt() are functions used to load data from files. The savetxt() function is used
to save a NumPy array to a text file.
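The points above about indexing, same-shape arithmetic, fixed size and file input/output can be sketched as follows. This is a small example under stated assumptions: the arrays `a` and `b` and the file name `marks.txt` are hypothetical, and the file is written to a temporary directory so the sketch does not touch existing files.

```python
import os
import tempfile

import numpy as np

# Two hypothetical arrays of the same shape and datatype.
a = np.array([10, 20, 30, 40])
b = np.array([1, 2, 3, 4])

third = a[2]        # element referenced by array name and index (indices start at 0)
total = a + b       # element-wise addition, valid because the shapes match
product = a * b     # element-wise multiplication

# An existing array cannot grow in place: np.append returns a NEW array
# and leaves the original unchanged.
extended = np.append(a, 50)

# Save an array to a text file and load it back.
path = os.path.join(tempfile.mkdtemp(), "marks.txt")
np.savetxt(path, total, fmt="%d")
loaded = np.loadtxt(path, dtype=int)

print(third, total, product, a.size, extended.size, loaded)
```

`numpy.genfromtxt()` could be used in place of `numpy.loadtxt()` here; it is typically preferred when the file may contain missing values.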
Ch 7 Summary
• A file system suffers from Data Redundancy, Data Inconsistency, Data Isolation, Data Dependence and a lack of
Controlled Data Sharing.
• Database Management System (DBMS) is a software to create and manage databases. A database is a collection of
tables.
• Database schema is the design of a database.
• A database constraint is a restriction on the type of data that can be inserted into the table.
• A query is a request to a database for information retrieval and data manipulation (insertion, deletion or update). It
is written in Structured Query Language (SQL).
• Relational DBMS (RDBMS) is used to store data in related tables. Rows and columns of a table are called tuples and
attributes respectively. A table is referred to as a relation.
• Restrictions on data stored in an RDBMS are applied by the use of keys such as Candidate Key, Primary Key, Composite
Primary Key, Foreign Key.
• Each column in a table represents a feature (attribute) of a record. A table stores the information for an entity
whereas a row represents a record.
• Each row in a table represents a record. A tuple is a collection of attribute values that makes a record unique.
• A tuple is a unique entity whereas attribute values can be duplicate in the table.
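How keys restrict the data that a table accepts can be sketched with Python's built-in `sqlite3` module. Note the assumptions: SQLite is used here only because it needs no server, so it stands in for whichever RDBMS is actually in use, and the `guardian` and `student` tables, their columns and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical relations: each student row refers to a guardian row.
conn.execute(
    "CREATE TABLE guardian (guid INTEGER PRIMARY KEY, gname TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE student (
           roll_no INTEGER PRIMARY KEY,             -- primary key: unique per tuple
           sname   TEXT NOT NULL,
           guid    INTEGER REFERENCES guardian(guid) -- foreign key
       )"""
)

conn.execute("INSERT INTO guardian VALUES (1, 'Asha')")
conn.execute("INSERT INTO student VALUES (101, 'Ravi', 1)")


def try_insert(sql):
    """Run an INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# Same roll_no as an existing tuple: violates the primary key.
duplicate_key = try_insert("INSERT INTO student VALUES (101, 'Meena', 1)")
# guid 99 does not exist in guardian: violates the foreign key.
missing_parent = try_insert("INSERT INTO student VALUES (102, 'Meena', 99)")
# New roll_no and an existing guardian: accepted.
valid_row = try_insert("INSERT INTO student VALUES (102, 'Meena', 1)")

print(duplicate_key, missing_parent, valid_row)
```

The attribute value `1` in the `guid` column appears in two student rows, while each `roll_no` appears only once, which matches the point that tuples are unique but attribute values may repeat.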
Ch 8 Summary
• Database is a collection of related tables. MySQL is a ‘relational’ DBMS. A table is a collection of rows and columns,
where each row is a record and columns describe the feature of records.
• SQL is the standard language for most RDBMS. SQL keywords are case insensitive.
• The USE statement is used to make the specified database the active database.
• Every attribute in a CREATE TABLE statement must have a name and a datatype.
• The ALTER TABLE statement is used to make changes in the structure of a table, such as adding or removing column(s) or
changing the datatype of column(s).
• The DESC statement with table name shows the structure of the table.
• The SELECT statement is used to retrieve data from one or more database tables.
• SELECT * FROM table_name displays data from all the attributes of that table.
• DISTINCT clause is used to eliminate repetition and display the values only once.
• The BETWEEN operator defines the range of values inclusive of boundary values.
• The IN operator selects values that match any value in the given list of values.
• The LIKE clause is used for pattern matching. % and _ are the two wildcard characters. The percent (%) symbol is used to
represent zero or more characters. The underscore (_) symbol is used to represent exactly one character.
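The query clauses summarised above can be tried out with a short sketch. It uses Python's built-in `sqlite3` module rather than MySQL, purely so that it runs without a database server; statements such as USE and DESC are MySQL statements and are not shown because SQLite does not provide them. The `student` table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, sname TEXT, city TEXT, marks INTEGER)"
)
# Hypothetical rows used only for illustration.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Anita", "Delhi", 82),
        (2, "Arun", "Pune", 67),
        (3, "Bina", "Delhi", 90),
        (4, "Amit", "Agra", 75),
    ],
)


def column(sql):
    """Return the first column of every row produced by the query."""
    return [row[0] for row in conn.execute(sql)]


# SELECT * returns every attribute of the table.
all_rows = conn.execute("SELECT * FROM student").fetchall()

# DISTINCT lists each city only once.
cities = column("SELECT DISTINCT city FROM student ORDER BY city")

# BETWEEN includes both boundary values (75 and 90).
in_range = column(
    "SELECT sname FROM student WHERE marks BETWEEN 75 AND 90 ORDER BY roll_no"
)

# IN matches any value in the given list.
in_list = column(
    "SELECT sname FROM student WHERE city IN ('Pune', 'Agra') ORDER BY roll_no"
)

# LIKE with %: names starting with 'A', followed by zero or more characters.
starts_with_a = column(
    "SELECT sname FROM student WHERE sname LIKE 'A%' ORDER BY roll_no"
)

# LIKE with _: 'A' followed by exactly three characters.
four_letters = column(
    "SELECT sname FROM student WHERE sname LIKE 'A___' ORDER BY roll_no"
)

print(cities, in_range, in_list, starts_with_a, four_letters)
```

Each query ends with ORDER BY so that the order of the results does not depend on how the database happens to store the rows.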