Report
Contents
Problem 01
Problem 02
Problem 03
Problem 4
Problem 01
import pandas as pd
import matplotlib.pyplot as plt

# Define the column names based on the dataset description
columns = [
    "mpg", "cylinders", "displacement", "horsepower",
    "weight", "acceleration", "model_year", "origin", "car_name"
]

# Load the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
df = pd.read_csv(
    url,
    sep=r"\s+",  # whitespace-delimited file (delim_whitespace is deprecated in recent pandas)
    names=columns,
    na_values='?'
)

# Display the structure of the dataset
print("Dataset structure:")
df.info()  # info() prints its summary directly and returns None
print("\nFirst few rows of the dataset:")
print(df.head())
Code explanation:
The pd.read_csv() function loaded the dataset directly from the
URL, using whitespace as a delimiter.
Entries recorded as '?' were treated as missing values through the
na_values parameter.
The info() method summarized the dataset structure, including
data types and non-null counts for each column.
The head() function displayed the first few rows to understand
the dataset content.
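As a minimal offline sketch of the loading step described above, the snippet below parses a few made-up whitespace-separated rows from an in-memory string instead of the real URL. The sample rows and the three-column layout are assumptions for illustration only. It shows how '?' entries become NaN and how that affects the column's data type.

```python
import io

import pandas as pd

# Made-up rows in a whitespace-separated layout; '?' marks a missing value.
sample = io.StringIO(
    "18.0  130.0  3504.0\n"
    "25.0  ?      2046.0\n"
    "15.0  165.0  3693.0\n"
)

demo = pd.read_csv(
    sample,
    sep=r"\s+",  # one or more whitespace characters
    names=["mpg", "horsepower", "weight"],  # hypothetical subset of columns
    na_values="?",  # treat '?' as NaN
)

# Count missing values per column and inspect the inferred types.
missing_per_column = demo.isna().sum()
print(missing_per_column)
print(demo.dtypes)
```

Because the '?' is converted to NaN rather than kept as text, the horsepower column stays numeric, which is what allows later statistics to be computed on it.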
print(df.describe())
Code explanation:
The median() function was applied to the mpg and weight
columns to compute their respective medians.
Median MPG and weight values provide insights into the typical
fuel efficiency and size of vehicles in the dataset, unaffected by
extreme values.
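The median step described above is not shown in this section, so the following is only a sketch of how it might look. The DataFrame values are made up, and the last weight is a deliberate outlier to illustrate why the median is less affected by extreme values than the mean.

```python
import pandas as pd

# Made-up values; the last weight is a deliberately extreme outlier.
demo = pd.DataFrame({
    "mpg": [18.0, 22.0, 25.0, 30.0, 31.0],
    "weight": [3500, 2900, 2600, 2100, 9000],
})

# median() and mean() applied to the two columns of interest.
medians = demo[["mpg", "weight"]].median()
means = demo[["mpg", "weight"]].mean()

print("Medians:")
print(medians)
print("Means:")
print(means)
```

In this toy data the single outlier pulls the mean weight well above the median weight, while the median stays at a typical value.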
# Create a scatter plot with MPG on the y-axis and Weight on the x-axis
plt.figure(figsize=(8, 6))
plt.scatter(df['weight'], df['mpg'], alpha=0.7, color='blue')
plt.title("Relationship between Car Weight and Fuel Efficiency (MPG)",
          fontsize=14)
plt.xlabel("Car Weight", fontsize=12)
plt.ylabel("Miles per Gallon (MPG)", fontsize=12)
plt.grid(alpha=0.3)
plt.show()
Explanation:
The correlation results indicate strong negative relationships between
MPG and displacement (-0.804) and between MPG and weight (-0.832).
As engine displacement (size) and car weight increase, fuel efficiency
(MPG) tends to decrease. This is consistent with larger engines
consuming more fuel and heavier cars requiring more energy to move,
and it aligns with the general automotive principle that heavier, more
powerful vehicles are less fuel-efficient.
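The code that produced these coefficients is not shown in this section. A minimal sketch of how such values could be computed with pandas follows. The data are made up, so the printed numbers will not match the report's -0.804 and -0.832.

```python
import pandas as pd

# Made-up values in which heavier, larger-engined cars have lower mpg.
demo = pd.DataFrame({
    "mpg": [30.0, 26.0, 22.0, 18.0, 14.0],
    "displacement": [90.0, 120.0, 200.0, 300.0, 350.0],
    "weight": [2000, 2400, 2900, 3400, 4100],
})

# Series.corr computes the Pearson correlation coefficient by default.
mpg_vs_displacement = demo["mpg"].corr(demo["displacement"])
mpg_vs_weight = demo["mpg"].corr(demo["weight"])

print(f"mpg vs displacement: {mpg_vs_displacement:.3f}")
print(f"mpg vs weight: {mpg_vs_weight:.3f}")
```

A coefficient close to -1 means that the two variables move in opposite directions along a nearly straight line. On its own it does not say why.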
Problem 02
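import pandas as pd
import matplotlib.pyplot as plt

# file_path must point to the reviews CSV file (its value is not shown in this section)
DF = pd.read_csv(file_path)
print(DF.head())
print("\nList of columns:")
print(DF.columns)
# Print the data type of each column
for column in DF.columns:
    print(f"\n{column}: {DF[column].dtype}")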
Code explanation:
# Create a new column 'word_count' that stores the number of words in each review (text column)
DF['word_count'] = DF['text'].apply(lambda text: len(text.split()))
print(DF[['text', 'word_count']].head())
Code explanation:
Used the apply() method with a lambda function to split the text in the
text column into words using .split(), and counted the resulting words
with len().
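To make the word-count step concrete, here is a small self-contained sketch on made-up review text. The three sample strings are assumptions, not rows from the report's dataset.

```python
import pandas as pd

# Made-up reviews, including an empty one.
demo = pd.DataFrame({
    "text": ["Great product", "Not what I expected at all", ""]
})

# split() with no arguments splits on any run of whitespace,
# so an empty string yields an empty list and a count of 0.
demo["word_count"] = demo["text"].apply(lambda text: len(text.split()))

print(demo)
```

Note that this approach assumes every entry in the text column is a string. A missing value (NaN) would raise an error and would need to be handled first.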
conditions = [
Code explanation:
Used a for loop to iterate over the DataFrame rows with iterrows().
Checked conditions for word count (>200, 50–200, <50) and appended
the corresponding categories (long, medium, short) to a list.
Added this list as a new column.
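length_categories = []
# Classify each review by its word count: >200 long, 50-200 medium, <50 short
for _, row in DF.iterrows():
    if row['word_count'] > 200:
        length_categories.append('long')
    elif row['word_count'] >= 50:
        length_categories.append('medium')
    else:
        length_categories.append('short')
DF['length_category_for_loop'] = length_categories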
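length_categories = []
index = 0
# Same classification as the for loop, driven by an explicit row index
while index < len(DF):
    word_count = DF['word_count'].iloc[index]
    if word_count > 200:
        length_categories.append('long')
    elif word_count >= 50:
        length_categories.append('medium')
    else:
        length_categories.append('short')
    index += 1
DF['length_category_while_loop'] = length_categories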
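The vectorized version of this classification survives in this section only as the fragment `conditions = [`. The sketch below shows one way such a column could be built. It assumes `numpy.select` was the technique used and reuses the thresholds stated above, with made-up word counts chosen to hit the boundaries.

```python
import numpy as np
import pandas as pd

# Made-up word counts chosen to hit the category boundaries.
demo = pd.DataFrame({"word_count": [10, 50, 200, 201]})

# Conditions are checked in order; the first one that is True wins.
conditions = [
    demo["word_count"] > 200,   # long
    demo["word_count"] >= 50,   # medium (50 to 200)
]
choices = ["long", "medium"]

# Rows matching no condition fall back to the default, 'short'.
demo["length_category_vectorized"] = np.select(
    conditions, choices, default="short"
)

print(demo)
```

Unlike the for and while loops, this evaluates each condition on the whole column at once, which generally scales better on large DataFrames.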
length_category_counts = DF['length_category_vectorized'].value_counts()
plt.figure(figsize=(7, 7))
plt.show()
Problem 03
import pandas as pd
import matplotlib.pyplot as plt
url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv"
df = pd.read_csv(url)
print("Dataset structure:")
df.info()  # info() prints its summary directly and returns None
print(df.head())
correlation = df["Median"].corr(df["Unemployment_rate"])
plt.figure(figsize=(10, 6))
plt.ylabel('Unemployment Rate')
plt.grid(True)
plt.show()
# Filter the dataset for majors with a Median salary above $60,000 and Unemployment_rate below 5%
# Group by Major_category and calculate the average Median salary for each category
grouped_data = df.groupby("Major_category")["Median"].mean()
print(grouped_data)
plt.xlabel('Major Category')
plt.xticks(rotation=90)
plt.show()
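The filtering and grouping steps named in the comments above can be sketched on a small made-up table. The rows, the variable name `high_pay_low_unemployment`, and the assumption that Unemployment_rate is stored as a fraction (so 5% is 0.05) are all illustrative, not taken from the report.

```python
import pandas as pd

# Made-up rows using the column names that appear in the report's code.
demo = pd.DataFrame({
    "Major": ["A", "B", "C", "D"],
    "Major_category": ["Engineering", "Engineering", "Arts", "Arts"],
    "Median": [75000, 62000, 30000, 40000],
    "Unemployment_rate": [0.02, 0.08, 0.06, 0.04],
})

# Boolean filter: both conditions must hold, so they are combined with &.
# Assumes Unemployment_rate is a fraction, so 5% is written as 0.05.
high_pay_low_unemployment = demo[
    (demo["Median"] > 60000) & (demo["Unemployment_rate"] < 0.05)
]

# Average Median salary per major category.
category_means = demo.groupby("Major_category")["Median"].mean()

print(high_pay_low_unemployment)
print(category_means)
```

Each condition needs its own parentheses because `&` binds more tightly than the comparison operators in Python.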
Problem 4
import sqlite3
conn = sqlite3.connect('company.db')
cursor = conn.cursor()
cursor.execute('''
);
''')
cursor.execute('''
branch_id INTEGER,
);
''')
conn.commit()
conn.close()
conn = sqlite3.connect('company.db')
cursor = conn.cursor()
cursor.executemany('''
VALUES (?, ?, ?, ?, ?, ?)
''', [
(102, '12 Hill Avenue', 'M1 5GH', 'Manchester', 'Greater Manchester', 'UK'),
(104, '19 Lake View', 'B4 2DD', 'Birmingham', 'West Midlands', 'UK')
])
cursor.executemany('''
VALUES (?, ?, ?)
''', [
])
conn.commit()
conn.close()
conn = sqlite3.connect('company.db')
cursor = conn.cursor()
cursor.execute('''
FROM teams
''')
teams_with_locations = cursor.fetchall()
for team in teams_with_locations:
    print(team)
cursor.execute('''
FROM branches
''')
branches_without_teams = cursor.fetchall()
for branch in branches_without_teams:
    print(branch)
conn.close()
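Most of the SQL in this section is truncated, so the following is only a self-contained sketch of the same workflow under an assumed schema. The column names, the single team row, and the join queries are hypothetical. Only the two branch rows and the two result variable names come from the code above. It uses an in-memory database so nothing is written to company.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, nothing saved to disk
cursor = conn.cursor()

# Hypothetical schema: every column name here is an assumption.
cursor.execute("""
    CREATE TABLE branches (
        branch_id INTEGER PRIMARY KEY,
        street TEXT, postcode TEXT, city TEXT, county TEXT, country TEXT
    )
""")
cursor.execute("""
    CREATE TABLE teams (
        team_id INTEGER PRIMARY KEY,
        team_name TEXT,
        branch_id INTEGER,
        FOREIGN KEY (branch_id) REFERENCES branches (branch_id)
    )
""")

cursor.executemany(
    "INSERT INTO branches VALUES (?, ?, ?, ?, ?, ?)",
    [
        (102, "12 Hill Avenue", "M1 5GH", "Manchester", "Greater Manchester", "UK"),
        (104, "19 Lake View", "B4 2DD", "Birmingham", "West Midlands", "UK"),
    ],
)
cursor.executemany(
    "INSERT INTO teams VALUES (?, ?, ?)",
    [(1, "Analytics", 102)],  # made-up team row
)
conn.commit()

# Teams together with the city of their branch (inner join).
cursor.execute("""
    SELECT t.team_name, b.city
    FROM teams AS t
    JOIN branches AS b ON b.branch_id = t.branch_id
""")
teams_with_locations = cursor.fetchall()

# Branches that have no team: the left join keeps unmatched branches,
# and the IS NULL test selects exactly those.
cursor.execute("""
    SELECT b.branch_id, b.city
    FROM branches AS b
    LEFT JOIN teams AS t ON t.branch_id = b.branch_id
    WHERE t.team_id IS NULL
""")
branches_without_teams = cursor.fetchall()

print(teams_with_locations)
print(branches_without_teams)
conn.close()
```

The `?` placeholders let sqlite3 bind the values safely instead of formatting them into the SQL string, which is the same pattern the executemany calls above rely on.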