Task - Level1 Database Module


General instructions

1. The approach to solving each problem is entirely up to the candidate.
2. Include Draw.io diagrams for the workflows and the application architecture.
3. Push every configuration and all code to a private Git repository.
4. You are not permitted to share this document with anyone, including your colleagues.

1. Problem: We have been given some turbine failure data and need to perform some analytics
on it.
a. Create a sap data table with the following columns:
i. Alert_id
ii. Turbine_id
iii. Alert_start_date
iv. Alert_end_date
v. Farm_name (name of the farm where the turbine is located)
vi. Fail_component (name of the component that failed)
vii. Fail_window
viii. Fail_within_fail_window
ix. fail_within_ninty_days
x. fn_check_gen
b. Check the count of null turbines.
c. Count the rows where Fail_within_fail_window is `yes`, as a new column `fail_window`.
d. Count the rows where Fail_within_fail_window is `yes` or `no`, as a new column
`non_pending`.
e. Count the rows where fail_within_ninty_days is `yes`, as a new column `TP`.
f. Count the rows where fn_check_gen is `yes`, as a new column `FN`.
g. Create a new column fp = non_pending - TP.
h. Calculate precision(%) = TP * 100 / non_pending.
i. Calculate fail_window(%) = fail_window * 100 / non_pending.
j. Calculate reliability(%) = TP * 100 / non_pending.

File link: turbine_data.csv
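
Steps b through j can be computed in one aggregate query. The following is a minimal sketch, not a definitive solution. It assumes the data has been loaded into a table named `alerts` (a hypothetical name) in SQLite through Python's `sqlite3` module, that the flag columns hold the literal lowercase strings `yes` and `no`, and that any other value counts as pending. The sample rows are invented for illustration and are not taken from turbine_data.csv; only the columns the metrics need are created.

```python
import sqlite3

# Invented sample rows for illustration only (not from turbine_data.csv):
# (alert_id, turbine_id, fail_within_fail_window, fail_within_ninty_days, fn_check_gen)
SAMPLE_ROWS = [
    (1, "T1", "yes", "yes", "no"),
    (2, "T2", "no", "no", "yes"),
    (3, None, "yes", "yes", "no"),
    (4, "T3", None, "no", "no"),  # neither yes nor no: treated as pending
]

# SQLite evaluates comparisons to 1/0 (or NULL), so SUM() counts matching rows.
METRICS_SQL = """
WITH counts AS (
    SELECT
        SUM(turbine_id IS NULL)                       AS null_turbines,
        SUM(fail_within_fail_window = 'yes')          AS fail_window,
        SUM(fail_within_fail_window IN ('yes', 'no')) AS non_pending,
        SUM(fail_within_ninty_days = 'yes')           AS tp,
        SUM(fn_check_gen = 'yes')                     AS fn
    FROM alerts
)
SELECT *,
       non_pending - tp                  AS fp,
       tp * 100.0 / non_pending          AS precision_pct,
       fail_window * 100.0 / non_pending AS fail_window_pct,
       tp * 100.0 / non_pending          AS reliability_pct
FROM counts
"""


def turbine_metrics(rows):
    """Load rows into an in-memory table and return the metrics as a dict."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE alerts (alert_id INTEGER, turbine_id TEXT, "
        "fail_within_fail_window TEXT, fail_within_ninty_days TEXT, "
        "fn_check_gen TEXT)"
    )
    conn.executemany("INSERT INTO alerts VALUES (?, ?, ?, ?, ?)", rows)
    result = dict(conn.execute(METRICS_SQL).fetchone())
    conn.close()
    return result


metrics = turbine_metrics(SAMPLE_ROWS)
print(metrics)
```

Multiplying by `100.0` rather than `100` keeps the percentages from being truncated by integer division. The same query text should carry over to other SQL engines with minor changes, although the boolean-to-integer `SUM` trick may need to be rewritten as `SUM(CASE WHEN ... THEN 1 ELSE 0 END)`.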

2. Problem:
Table 1: IoT devices submit a live status code from 0 to 5 to the database every second.
Columns: “timestamp”, “source_device”, “status”
Table 2: Hierarchical metadata of the IoT devices, with levels 1-5:
- Company (L1) -> country (L2) -> region/state (L3) -> hub (L4) -> device (L5)
Columns: “child_id”, “level”, “parent_id”
Analytical Rules:
- For any second, the status of a device is whatever is recorded in the
database, since every IoT device updates its status every second.

- For any second, the status of a hub is the status that occurs most often
among its underlying IoT devices. If two statuses occur equally often, the
larger one is taken. The same rule applies at every higher level.
For example:
- A hub (h1) has 4 underlying IoT devices and 3 of them have status 2,
so the status of h1 for that second is 2.
- A hub (h2) has 4 underlying IoT devices; for a given second, 2 of them
have status 2 and 2 of them have status 3, so the status of h2 for that
second is 3.

- For any longer duration, such as 10 minutes, the status that was held
continuously for the longest time is the status for the whole duration. If two
statuses were active for the same length of time, the greater status is the result.
For example: a device has status 2 for 2 continuous seconds, then status 4
for the next 30 continuous seconds, and finally status 2 again for the
next 28 seconds, so for those 60 seconds (1 minute) the status of the device is
4.
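
The two aggregation rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, statuses are plain integers, and "most continuous" is read as the longest uninterrupted run (the 60-second example gives the same answer if total time per status is used instead, so the intended reading should be confirmed). In the h1 call, the fourth device's status of 1 is invented, since the example only fixes three of the four values.

```python
from collections import Counter
from itertools import groupby


def majority_status(statuses):
    """Per-second rule: most frequent status; ties go to the larger code.

    Raises ValueError on an empty input.
    """
    counts = Counter(statuses)
    return max(counts, key=lambda status: (counts[status], status))


def longest_run_status(per_second):
    """Duration rule: status of the longest unbroken run; ties go to the larger code.

    `per_second` is the ordered sequence of one status per second.
    Raises ValueError on an empty input.
    """
    runs = [(sum(1 for _ in group), status) for status, group in groupby(per_second)]
    return max(runs)[1]


print(majority_status([2, 2, 2, 1]))                      # h1 example -> 2
print(majority_status([2, 2, 3, 3]))                      # h2 example -> 3
print(longest_run_status([2] * 2 + [4] * 30 + [2] * 28))  # 60-second example -> 4
```

Both functions break ties by comparing tuples, so the status code itself is the second sort key and the larger code wins whenever counts or run lengths are equal.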

Views:
Get analytics over the data:
- Status of each device at every level, for every second.
- Status of each device at every level, for every 5 minutes.
- Status of each device at every level, for a given duration (can be a
procedure, or a query with the duration as a changeable parameter).
Data File: data.csv
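
For the per-second view, one possible way to propagate statuses up the hierarchy is sketched below. It assumes that each parent's status is derived from its direct children (the alternative reading, using all leaf devices under a node, would need a different traversal), that the top-level row has no parent (`None`), and that the hierarchy rows arrive as `(child_id, level, parent_id)` tuples. The node names and statuses in the usage example are invented.

```python
from collections import Counter, defaultdict


def majority_status(statuses):
    """Most frequent status; ties go to the larger code."""
    counts = Counter(statuses)
    return max(counts, key=lambda status: (counts[status], status))


def rollup_second(hierarchy, device_status):
    """Return {node_id: status} for every level, for a single second.

    hierarchy: iterable of (child_id, level, parent_id) rows.
    device_status: {device_id: status} for the level-5 devices in that second.
    """
    children = defaultdict(list)
    nodes_by_level = defaultdict(list)
    for child_id, level, parent_id in hierarchy:
        nodes_by_level[level].append(child_id)
        if parent_id is not None:
            children[parent_id].append(child_id)

    status = dict(device_status)
    # Work upward from hubs (L4) to the company (L1) so children are resolved first.
    for level in (4, 3, 2, 1):
        for node in nodes_by_level[level]:
            known = [status[c] for c in children.get(node, []) if c in status]
            if known:
                status[node] = majority_status(known)
    return status


# Invented hierarchy: one company, one country, one region, two hubs, eight devices.
HIERARCHY = [("c1", 1, None), ("in", 2, "c1"), ("r1", 3, "in"),
             ("h1", 4, "r1"), ("h2", 4, "r1")]
HIERARCHY += [(f"d{i}", 5, "h1") for i in range(1, 5)]
HIERARCHY += [(f"d{i}", 5, "h2") for i in range(5, 9)]
DEVICES = {"d1": 2, "d2": 2, "d3": 2, "d4": 1, "d5": 2, "d6": 2, "d7": 3, "d8": 3}

print(rollup_second(HIERARCHY, DEVICES))
```

Running this once per timestamp yields the per-second status of every node; the 5-minute and arbitrary-duration views can then be built by applying the duration rule to each node's per-second series.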

3. Problem:
Description: There are multiple files containing the statuses of some devices at different
times. All the files are CSV files (with columns: rn (row number), id, ts (timestamp),
status).

Aim: We need to generate a single file from these, sorted by the
timestamp and id columns, in that order of priority.
The time taken to process is the main benchmark.
Hint:
- Read all the files using multiprocessing.
- Use merge sort to sort the data.
File: data.csv hie_data.csv
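
The hints can be combined roughly as follows. This is a sketch under several assumptions rather than a tuned solution: every file has a header row with the columns `rn`, `id`, `ts`, `status`; `ts` is stored in a format that sorts correctly as text (for example ISO 8601); and each file fits in memory. Each worker sorts its own file with Python's built-in sort (Timsort, a merge-sort hybrid), and the sorted runs are then combined with a k-way merge via `heapq.merge`. The function names and the `workers` parameter are hypothetical.

```python
import csv
import heapq
import time
from multiprocessing import Pool

FIELDS = ["rn", "id", "ts", "status"]  # column names from the task description


def sort_key(row):
    # Assumption: ts and id compare correctly as strings.
    # Convert them here (e.g. to datetime or int) if they do not.
    return (row["ts"], row["id"])


def load_sorted(path):
    """Read one CSV file and return its rows sorted by (ts, id)."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    rows.sort(key=sort_key)
    return rows


def merge_files(paths, out_path, workers=4):
    """Sort each file in parallel, merge the sorted runs, return elapsed seconds."""
    start = time.perf_counter()
    if workers > 1:
        with Pool(workers) as pool:
            runs = pool.map(load_sorted, paths)
    else:
        # Sequential fallback: avoids process start-up cost for small inputs.
        runs = [load_sorted(path) for path in paths]

    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        # heapq.merge lazily performs the k-way merge step of merge sort.
        writer.writerows(heapq.merge(*runs, key=sort_key))
    return time.perf_counter() - start
```

When the parallel path is used from a script, the call to `merge_files` should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. The returned elapsed time covers reading, sorting, merging, and writing, which matches the benchmark named in the aim.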
