Day 60

The document outlines a problem statement requiring the calculation of the salary difference between the highest salaries in the 'engineering' and 'marketing' departments using two datasets: employees and departments. It includes sample data, schema designs, and a SQL query to achieve the desired output. The expected result is the absolute difference in maximum salaries from the specified departments.

Uploaded by

Lapi Lapil


Scenario Based Interview Question

Ganesh. R

Problem Statement:
We have two datasets: one for employees and one for
departments. Your task is to calculate the difference
between the maximum salaries in the 'engineering'
and 'marketing' departments.
Input Table Data

# Employee Data
employee_data = [
    (1, "John", "Doe", 60000, 1),
    (2, "Jane", "Smith", 55000, 2),
    (3, "Emily", "Johnson", 70000, 1),
    (4, "Michael", "Brown", 80000, 3),
    (5, "Chris", "Davis", 45000, 4),
    (6, "Anna", "Wilson", 52000, 5),
]

# Schema design
# salary is IntegerType because the sample salaries are Python ints;
# FloatType would reject them during schema verification.
employee_schema = StructType(
    [
        StructField("id", IntegerType(), True),
        StructField("first_name", StringType(), True),
        StructField("last_name", StringType(), True),
        StructField("salary", IntegerType(), True),
        StructField("department_id", IntegerType(), True),
    ]
)
Input Table Data

# Department Data
department_data = [
    (1, "engineering"),
    (2, "human resource"),
    (3, "operation"),
    (4, "marketing"),
    (5, "sales"),
    (6, "customer care"),
]

# Schema design
department_schema = StructType(
    [
        StructField("id", IntegerType(), True),
        StructField("department", StringType(), True),
    ]
)
Output Table

Diff
25000
Problem Statement:

We have two datasets: one for employees and one for departments. Your task is to calculate the
difference between the maximum salaries in the 'engineering' and 'marketing' departments.

Employee Dataset:

id: Unique identifier for each employee.
first_name: Employee's first name.
last_name: Employee's last name.
salary: Employee's salary.
department_id: Foreign key to the department the employee belongs to.

Department Dataset:

id: Unique identifier for each department.
department: Name of the department.

Write a query that calculates the difference between the highest salaries found in the marketing and engineering departments. Output just the absolute difference in salaries.

# Assumes a notebook environment (such as Databricks) where `spark` is already defined.
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Employee Data
employee_data = [
    (1, "John", "Doe", 60000, 1),
    (2, "Jane", "Smith", 55000, 2),
    (3, "Emily", "Johnson", 70000, 1),
    (4, "Michael", "Brown", 80000, 3),
    (5, "Chris", "Davis", 45000, 4),
    (6, "Anna", "Wilson", 52000, 5),
]

employee_schema = StructType(
    [
        StructField("id", IntegerType(), True),
        StructField("first_name", StringType(), True),
        StructField("last_name", StringType(), True),
        StructField("salary", IntegerType(), True),
        StructField("department_id", IntegerType(), True),
    ]
)

employee_df = spark.createDataFrame(employee_data, schema=employee_schema)

# Department Data
department_data = [
    (1, "engineering"),
    (2, "human resource"),
    (3, "operation"),
    (4, "marketing"),
    (5, "sales"),
    (6, "customer care"),
]

department_schema = StructType(
    [
        StructField("id", IntegerType(), True),
        StructField("department", StringType(), True),
    ]
)

department_df = spark.createDataFrame(department_data, schema=department_schema)

# Show both DataFrames
# display() is a Databricks notebook helper; outside Databricks, use show() instead.
employee_df.display()
department_df.display()

# Register temporary views so the DataFrames can be queried with SQL
employee_df.createOrReplaceTempView("employee")
department_df.createOrReplaceTempView("dept")

%sql
WITH CTE AS (
    SELECT
        D.department,
        MAX(E.salary) AS Max_Salary
    FROM
        employee E
        JOIN dept D ON E.department_id = D.id
    WHERE
        D.department IN ('engineering', 'marketing')
    GROUP BY
        D.department
)
SELECT
    MAX(Max_Salary) - MIN(Max_Salary) AS Diff
FROM
    CTE
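
The query above can be checked without a Spark cluster. The following is a minimal sketch that runs an equivalent query in an in-memory SQLite database through Python's standard sqlite3 module. SQLite standing in for Spark SQL is an assumption made only for this check, and the helper name salary_diff is hypothetical (it does not appear in the original).

```python
import sqlite3

# Sample rows copied from the input tables above.
employees = [
    (1, "John", "Doe", 60000, 1),
    (2, "Jane", "Smith", 55000, 2),
    (3, "Emily", "Johnson", 70000, 1),
    (4, "Michael", "Brown", 80000, 3),
    (5, "Chris", "Davis", 45000, 4),
    (6, "Anna", "Wilson", 52000, 5),
]
departments = [
    (1, "engineering"),
    (2, "human resource"),
    (3, "operation"),
    (4, "marketing"),
    (5, "sales"),
    (6, "customer care"),
]

# Same logic as the Spark SQL query: per-department maximum, then max minus min.
QUERY = """
WITH cte AS (
    SELECT d.department, MAX(e.salary) AS max_salary
    FROM employee e
    JOIN dept d ON e.department_id = d.id
    WHERE d.department IN ('engineering', 'marketing')
    GROUP BY d.department
)
SELECT MAX(max_salary) - MIN(max_salary) AS diff
FROM cte
"""


def salary_diff(employee_rows, department_rows):
    """Hypothetical helper: load the rows into SQLite and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employee (id INTEGER, first_name TEXT, "
            "last_name TEXT, salary INTEGER, department_id INTEGER)"
        )
        conn.execute("CREATE TABLE dept (id INTEGER, department TEXT)")
        conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", employee_rows)
        conn.executemany("INSERT INTO dept VALUES (?, ?)", department_rows)
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()


print(salary_diff(employees, departments))  # 25000
```

For the sample data, the highest engineering salary is 70000 and the highest marketing salary is 45000, so the query returns 25000.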

# Import under a namespace so Spark's max/min do not shadow Python's built-ins
from pyspark.sql import functions as F

# Step 1: Join Employee and Department DataFrames
joined_df = employee_df.join(
    department_df, employee_df.department_id == department_df.id
)

# Step 2: Filter by departments 'engineering' and 'marketing'
filtered_df = joined_df.filter(
    department_df.department.isin("engineering", "marketing")
)

# Step 3: Group by department and calculate the maximum salary for each department
max_salary_df = filtered_df.groupBy(department_df.department).agg(
    F.max("salary").alias("Max_Salary")
)

# Step 4: Find the difference between max and min salaries from the grouped result.
# The alias goes on the column expression; DataFrame.alias() would not rename the column.
salary_diff_df = max_salary_df.agg(
    (F.max("Max_Salary") - F.min("Max_Salary")).alias("Diff")
)

# Display the result
salary_diff_df.display()

Expected Output:

The absolute difference between the highest salary in 'engineering' (70000) and the highest salary in 'marketing' (45000), which is 25000 for the sample data.

Breakdown:

joined_df: Joins the employee and department DataFrames by matching the employee's department_id to the department's id.

filtered_df: Keeps only the rows whose department is 'engineering' or 'marketing'.

max_salary_df: Groups the data by department and computes the maximum salary for each.

salary_diff_df: Aggregates to find the difference between the max and min of the maximum salaries. With exactly two departments, this equals the absolute difference between them.
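
The four steps in the breakdown can also be mirrored in plain Python, which makes the logic easy to trace by hand. This is only a sketch under stated assumptions: the function name max_salary_diff and its parameters are hypothetical, and it uses lists of tuples in place of Spark DataFrames.

```python
# Sample rows copied from the input tables above.
employees = [
    (1, "John", "Doe", 60000, 1),
    (2, "Jane", "Smith", 55000, 2),
    (3, "Emily", "Johnson", 70000, 1),
    (4, "Michael", "Brown", 80000, 3),
    (5, "Chris", "Davis", 45000, 4),
    (6, "Anna", "Wilson", 52000, 5),
]
departments = [
    (1, "engineering"),
    (2, "human resource"),
    (3, "operation"),
    (4, "marketing"),
    (5, "sales"),
    (6, "customer care"),
]


def max_salary_diff(employee_rows, department_rows,
                    dept_a="engineering", dept_b="marketing"):
    """Hypothetical helper mirroring the join / filter / group / diff steps."""
    # Step 1 (join): look up each employee's department name by department id.
    dept_name = {dept_id: name for dept_id, name in department_rows}

    # Steps 2 and 3 (filter + group): keep only the two target departments
    # and track the highest salary seen in each.
    max_salary = {}
    for _id, _first, _last, salary, dept_id in employee_rows:
        name = dept_name.get(dept_id)
        if name in (dept_a, dept_b):
            max_salary[name] = max(salary, max_salary.get(name, salary))

    # Step 4 (difference): None when either department has no employees,
    # otherwise the absolute difference of the two maxima.
    if dept_a not in max_salary or dept_b not in max_salary:
        return None
    return abs(max_salary[dept_a] - max_salary[dept_b])


print(max_salary_diff(employees, departments))  # 25000
```

One design difference from the DataFrame version: when only one of the two departments has employees, max minus min over a single group yields 0, whereas this sketch returns None so the missing department is visible to the caller.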
