Sample Paper For Preparation
The following topics have been, or will be, discussed in detail in class. Answers
written in the ESE should be equally detailed.
Question: Explain the difference between CHAR and VARCHAR data types in MySQL.
● CHAR: CHAR is a fixed-length character data type in MySQL. A CHAR(M) column is
stored at a fixed length of M characters, where M can be 0 to 255 and defaults
to 1 if omitted. Shorter values are right-padded with spaces when stored, and
those trailing spaces are removed on retrieval unless the
PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.
● VARCHAR: VARCHAR is a variable-length character data type in MySQL. A
VARCHAR(M) column can store up to M characters and uses only as much storage as
the actual data needs, plus a 1- or 2-byte length prefix. Values are not padded
with spaces.
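The difference is easiest to see with trailing spaces. The following is a minimal sketch, assuming MySQL's default SQL mode (PAD_CHAR_TO_FULL_LENGTH not enabled) and a hypothetical table char_demo whose name and columns are illustrative only:

```sql
-- Hypothetical demo table: one fixed-length and one variable-length column.
CREATE TABLE char_demo (
    fixed_code CHAR(4),
    var_code   VARCHAR(4)
);

-- Insert the same value, including two trailing spaces, into both columns.
INSERT INTO char_demo (fixed_code, var_code) VALUES ('ab  ', 'ab  ');

-- Wrap each value in brackets so that any trailing spaces become visible.
SELECT CONCAT('[', fixed_code, ']') AS fixed_shown,
       CONCAT('[', var_code, ']')   AS var_shown
FROM char_demo;
```

Under the default SQL mode, fixed_shown should be [ab] and var_shown should be [ab  ], because only the VARCHAR column keeps the trailing spaces on retrieval.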
Question: Describe the purpose and syntax of the SELECT statement in MySQL. Provide
an example query.
The SELECT statement is used to retrieve data from one or more tables in a MySQL
database. Its basic syntax is as follows:
Example query:
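A minimal sketch of the general form and one example, assuming a hypothetical employees table with first_name, last_name, salary, and department_id columns (these names are assumptions, not taken from a specific schema):

```sql
-- General form (every clause after FROM is optional):
-- SELECT column1, column2, ...
-- FROM table_name
-- WHERE condition
-- ORDER BY column1 [ASC | DESC]
-- LIMIT row_count;

-- Example: names and salaries of employees in department 10
-- who earn more than 50000, highest paid first.
SELECT first_name, last_name, salary
FROM employees
WHERE department_id = 10
  AND salary > 50000
ORDER BY salary DESC;
```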
Question: Explain the purpose of the GROUP BY clause in MySQL. Provide an example
query demonstrating its usage.
The GROUP BY clause is used in conjunction with aggregate functions (such as SUM,
AVG, COUNT, etc.) to group the result set by one or more columns. It is typically used to
perform operations on groups of rows rather than on individual rows.
Example query:
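A minimal sketch, assuming hypothetical products and orders tables in which orders.product_id is declared as a foreign key; the quantity column and all data types are assumptions made for illustration:

```sql
-- Hypothetical tables; names and types are assumptions.
CREATE TABLE products (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    order_id   INT PRIMARY KEY,
    product_id INT NOT NULL,
    quantity   INT NOT NULL,
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- One output row per product: number of orders and total units ordered.
SELECT product_id,
       COUNT(*)      AS order_count,
       SUM(quantity) AS total_quantity
FROM orders
GROUP BY product_id;
```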
In this example, the product_id column in the orders table is a foreign key that
references the product_id column in the products table. This ensures that every
product_id in the orders table must exist in the products table, maintaining referential
integrity.
Scenario:
Question: You have a table "products" with columns product_id, product_name, and
price. Write a SQL query to find the top 5 most expensive products.
SELECT product_id, product_name, price
FROM products
ORDER BY price DESC
LIMIT 5;
This query retrieves the product_id, product_name, and price columns from the products
table and sorts the results in descending order based on the price. The LIMIT clause
ensures that only the top 5 results are returned.
Question: Assume you have two tables "employees" and "departments" with relevant
columns. Write a SQL query to find the department with the highest average salary.
SELECT department_name
FROM departments
JOIN employees ON departments.department_id = employees.department_id
GROUP BY department_name
ORDER BY AVG(salary) DESC
LIMIT 1;
This query joins the employees and departments tables on the department_id column
and calculates the average salary for each department using the GROUP BY clause. It
then orders the results in descending order based on the average salary and selects the
first row using the LIMIT clause to retrieve the department with the highest average
salary.
Question: Consider a situation where you need to update a column "status" in a table
"orders" to 'Completed' for all orders where the order_date is before '2023-01-01'. Write a
SQL query to perform this update.
To update the status column in the orders table to 'Completed' for all orders where the
order_date is before '2023-01-01', we can use the following SQL query:
UPDATE orders
SET status = 'Completed'
WHERE order_date < '2023-01-01';
The WHERE clause restricts the update to rows whose order_date is earlier than
'2023-01-01'. Without it, the statement would set status to 'Completed' for every
row in the table.
Question: Assume you have a table "transactions" with columns transaction_id, amount,
and transaction_date. Write a SQL query to calculate the total transaction amount for
each month, sorted by month in ascending order.
To calculate the total transaction amount for each month and sort the results by month
in ascending order, we can use the following SQL query:
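One way to write it, as a sketch consistent with the explanation that follows (the aliases txn_month and total_amount are assumptions):

```sql
-- Reduce each date to 'YYYY-MM', then total the amounts per month.
SELECT DATE_FORMAT(transaction_date, '%Y-%m') AS txn_month,
       SUM(amount)                            AS total_amount
FROM transactions
GROUP BY DATE_FORMAT(transaction_date, '%Y-%m')
ORDER BY txn_month ASC;
```

Because the month is formatted as year first and then a two-digit month, sorting the text value also sorts the months chronologically.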
This query uses the DATE_FORMAT function to extract the year and month from the
transaction_date column and groups the transactions by month using the GROUP BY
clause. It then calculates the total transaction amount for each month using the SUM
function and sorts the results by month in ascending order.
Question: You are tasked with designing a database for an e-commerce website. The
database needs to store information about customers, products, orders, and shipments.
Provide a high-level schema design for the database, including tables, primary keys, and
relationships between tables.
Customers Table:
Products Table:
Orders Table:
Order_Items Table:
Shipments Table:
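A possible high-level schema for these five tables is sketched below. Every column name and data type is an assumption chosen for illustration; a real design would depend on the requirements of the website.

```sql
CREATE TABLE customers (
    customer_id INT AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE products (
    product_id   INT AUTO_INCREMENT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    price        DECIMAL(10, 2) NOT NULL
);

-- Each order belongs to exactly one customer.
CREATE TABLE orders (
    order_id    INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date  DATE NOT NULL,
    status      VARCHAR(20) NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Junction table resolving the many-to-many link between orders and products.
CREATE TABLE order_items (
    order_id   INT NOT NULL,
    product_id INT NOT NULL,
    quantity   INT NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id)   REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- An order may be delivered in one or more shipments.
CREATE TABLE shipments (
    shipment_id     INT AUTO_INCREMENT PRIMARY KEY,
    order_id        INT NOT NULL,
    shipment_date   DATE,
    tracking_number VARCHAR(50),
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);
```

In this sketch a customer has many orders, an order has many order items, each order item refers to one product, and an order has one or more shipments.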
Question: You are hired by a retail company to optimize their database queries for
improved performance. After analyzing the database, you find that certain queries are
slow due to lack of indexing. Identify the tables and columns that would benefit from
indexing, and explain why. Provide SQL statements to create the necessary indexes.
Based on the analysis, the tables and columns that would benefit from indexing are:
Indexes on these columns will speed up queries that involve filtering or sorting by date,
such as retrieving orders or transactions within a specific date range.
These indexes will improve query performance by allowing the database engine to
quickly locate rows based on the indexed columns, reducing the need for full table
scans.
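A minimal sketch of such indexes, assuming the date columns in question are orders.order_date and transactions.transaction_date from the earlier questions (the index names are also assumptions):

```sql
-- Assumed date columns, taken from the tables used in the earlier questions.
CREATE INDEX idx_orders_order_date
    ON orders (order_date);
CREATE INDEX idx_transactions_transaction_date
    ON transactions (transaction_date);

-- EXPLAIN shows whether a date-range query is able to use the new index.
EXPLAIN
SELECT *
FROM orders
WHERE order_date >= '2023-01-01'
  AND order_date <  '2023-02-01';
```

The range condition compares the bare column with constants. Wrapping the column in a function, for example DATE_FORMAT(order_date, ...), in the WHERE clause would generally prevent an ordinary index on order_date from being used.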
1 mark:
1. Write a SQL statement to rename the table countries to country_new.
ALTER TABLE countries RENAME TO country_new;
3. Write a SQL statement to add a column ID as the first column of the table
locations.
ALTER TABLE locations ADD COLUMN ID INT FIRST;
4. Write a SQL statement to add a column region_id after state_province to the
table locations.
ALTER TABLE locations ADD COLUMN region_id INT AFTER state_province;
5. Write a SQL statement to change the data type of the column country_id to
integer in the table locations.
ALTER TABLE locations MODIFY COLUMN country_id INT;
6. Write a SQL statement to drop the column city from the table locations.
ALTER TABLE locations DROP COLUMN city;
8. Write a SQL statement to add a primary key on the column location_id in the
locations table.
ALTER TABLE locations ADD PRIMARY KEY (location_id);
10. Write a SQL statement to drop the existing primary key from the table
locations, defined on the combination of columns location_id and country_id.
ALTER TABLE locations DROP PRIMARY KEY;
11. Write a SQL statement to add a foreign key on the job_id column of the
job_history table referencing the primary key job_id of the jobs table.
ALTER TABLE job_history ADD FOREIGN KEY (job_id)
REFERENCES jobs(job_id);
12. Write a SQL statement to add a foreign key constraint named fk_job_id on the
job_id column of the job_history table referencing the primary key job_id of the
jobs table.
ALTER TABLE job_history ADD CONSTRAINT fk_job_id FOREIGN KEY (job_id)
REFERENCES jobs(job_id);
K-Means:
a. Describe the K-Means algorithm and its objective function.
Answer : The K-Means algorithm is an iterative clustering algorithm that aims to
partition a dataset into k clusters, where each data point belongs to the cluster with the
nearest mean. Its objective function is to minimize the sum of squared distances
between data points and their respective cluster centroids.
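Written out, a common form of this objective is the within-cluster sum of squares. The notation is an assumption introduced here: C_j is the set of points assigned to cluster j, and \mu_j is its centroid, that is, the mean of the points in C_j.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Each iteration alternates between assigning every point to its nearest centroid and recomputing each centroid as the mean of its cluster. Neither step can increase J, which is why the algorithm converges.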
b. How does the K-Means algorithm initialize cluster centroids? Discuss the impact of
different initialization strategies.
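Answer : The K-Means algorithm typically initializes cluster centroids either by
picking k data points at random or with a method such as K-Means++, which chooses
initial centroids that are spread far apart. Because K-Means converges only to a
local minimum, different initializations can lead to different final cluster
assignments and convergence rates. A common remedy is to run the algorithm several
times and keep the result with the lowest objective value.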
c. Explain how the number of clusters (k) is determined in the K-Means algorithm.
Answer : The number of clusters (k) in K-Means is often determined using techniques
such as the elbow method, silhouette score, or domain knowledge. These methods help
identify the optimal value of k that balances cluster compactness and separation.
d. Discuss the strengths and weaknesses of the K-Means algorithm. Provide examples
of scenarios where K-Means might perform well and where it might fail.
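Answer : Strengths of K-Means include its simplicity, scalability to large datasets,
and effectiveness on compact, roughly spherical clusters of similar size, for example
segmenting customers by a few numeric features. It tends to fail on non-convex or
irregularly shaped clusters such as concentric rings or crescents, on clusters with
very different sizes or densities, and in the presence of outliers. It also requires
k to be chosen in advance, and its result can be sensitive to the initial centroid
positions.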
Hierarchical Clustering:
a. Compare and contrast agglomerative and divisive hierarchical clustering approaches.
Answer :
b. Explain the linkage criteria used in hierarchical clustering (e.g., single-linkage,
complete-linkage, average-linkage). How do these criteria impact the resulting
dendrogram?
Answer :
c. Discuss the concept of dendrogram and how it is used to interpret hierarchical
clustering results.
Answer :
d. What are the advantages and disadvantages of hierarchical clustering compared to
K-Means clustering?
Answer :
Answer Hierarchical Clustering:
a. Agglomerative hierarchical clustering starts with each data point as a separate
cluster and iteratively merges the closest clusters until only one cluster remains.
Divisive hierarchical clustering, on the other hand, starts with all data points in one
cluster and recursively splits clusters until each data point is in its own cluster.
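b. Linkage criteria determine how the distance between clusters is calculated when
merging or splitting them. Single-linkage considers the shortest distance between
points in two clusters, complete-linkage considers the maximum distance, and
average-linkage considers the average distance. In the dendrogram, single-linkage
tends to produce long, chained clusters, while complete-linkage tends to produce
compact clusters of similar diameter; average-linkage falls between the two.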
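The three linkage criteria can be written as follows. The notation is an assumption introduced here: A and B are two clusters, and d(a, b) is the distance between two individual points.

```latex
\begin{aligned}
d_{\text{single}}(A, B)   &= \min_{a \in A,\, b \in B} d(a, b) \\
d_{\text{complete}}(A, B) &= \max_{a \in A,\, b \in B} d(a, b) \\
d_{\text{average}}(A, B)  &= \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b)
\end{aligned}
```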
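c. A dendrogram is a tree-like diagram that records the hierarchical clustering
process: each leaf is a data point, each internal node is a merge (or split), and the
height of a node shows the dissimilarity at which it occurred. Cutting the dendrogram
at a chosen height yields a flat clustering, so it can be used to choose the number
of clusters and to see which clusters are most closely related.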
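d. Hierarchical clustering does not require the number of clusters to be specified in
advance, and it produces a hierarchical structure that can be visualized using
dendrograms, allowing for flexible interpretation of cluster relationships. However,
it is computationally expensive and less scalable than K-Means, especially for large
datasets, because standard agglomerative methods need the pairwise distances between
all points. An early merge or split also cannot be undone later.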
Apriori Algorithm:
a. Describe the Apriori algorithm for association rule mining. What are frequent itemsets
and association rules?
Answer :
b. Explain the concept of support, confidence, and lift in association rule mining. How
are these metrics used to evaluate the quality of association rules?
Answer :
c. Discuss the role of the Apriori property in reducing the search space during frequent
itemset generation.
Answer :
d. Provide examples of real-world applications where the Apriori algorithm can be
applied for market basket analysis or recommendation systems.
Answer :