CI Entrance Test
Questions
Suggestions/Tips:
1 Think through the key metrics and the significant elements or components involved, with some reasonable assumptions to enable you to proceed with your thought process
2 Be clear on the rationale/reference for the assumptions
Q1 Generate monthly reports of DAU (daily unique login users), Daily buyers, Daily orders and Daily GMV
Break down the reports by location
(Template & Queries are required)
WITH dau AS (
  -- Daily unique login users per location
  SELECT
    FORMAT_DATE('%Y-%m', DATE(login_datetime)) AS month,
    DATE(login_datetime) AS date_,
    dim_user.default_delivery_address_state AS state,
    dim_user.default_delivery_address_city AS city,
    COUNT(DISTINCT dwd_login_event.user_id) AS daily_dau
  FROM dwd_login_event
  JOIN dim_user
    USING (user_id)
  GROUP BY month, date_, state, city),
monthly_dau AS (
  -- Monthly average of the daily values
  SELECT
    month,
    state,
    city,
    AVG(daily_dau) AS avg_dau
  FROM
    dau
  GROUP BY
    month, state, city),
orders AS (
  -- Daily buyers, orders and GMV per location; these daily values are averaged per month downstream
  SELECT
    FORMAT_DATE('%Y-%m', DATE(create_datetime)) AS month,
    DATE(create_datetime) AS date_,
    buyer_shipping_address_state AS state,
    buyer_shipping_address_city AS city,
    COUNT(DISTINCT buyer_id) AS avg_daily_unique_buyers,
    COUNT(DISTINCT order_id) AS avg_daily_orders,
    SUM(gmv_usd) AS avg_daily_gmv
  FROM
    order_item_mart
  GROUP BY
    month, date_, state, city),
monthly_orders AS (
SELECT
month,
Column                   Data Type  Description
month                    STRING     Month of the report in YYYY-MM format.
state                    STRING     State of the user's default or buyer's shipping address.
city                     STRING     City of the user's default or buyer's shipping address.
avg_dau                  FLOAT      Monthly average of daily unique login users (DAU).
avg_daily_unique_buyers  FLOAT      Monthly average of daily unique buyers.
avg_daily_orders         FLOAT      Monthly average of daily orders.
avg_daily_gmv            FLOAT      Monthly average of daily Gross Merchandise Value (GMV).
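To make the "monthly average of daily values" logic concrete, here is a minimal Python sketch of the avg_dau calculation. The function name, the tuple layout and the sample events are illustrative assumptions, not part of the test data. As in a SQL AVG over daily rows, the average is taken over days that have at least one login.

```python
from collections import defaultdict
from datetime import datetime


def monthly_avg_dau(login_events):
    """login_events: list of (user_id, login_datetime, state, city) tuples."""
    # Step 1: distinct users per (date, state, city), i.e. the daily DAU.
    daily_users = defaultdict(set)
    for user_id, login_dt, state, city in login_events:
        daily_users[(login_dt.date(), state, city)].add(user_id)

    # Step 2: collect the daily DAU values per (month, state, city).
    daily_counts = defaultdict(list)
    for (day, state, city), users in daily_users.items():
        daily_counts[(day.strftime("%Y-%m"), state, city)].append(len(users))

    # Step 3: average the daily values within each month.
    return {key: sum(vals) / len(vals) for key, vals in daily_counts.items()}


# Made-up sample events for illustration only.
events = [
    ("u1", datetime(2024, 1, 1, 9), "CA", "LA"),
    ("u1", datetime(2024, 1, 1, 18), "CA", "LA"),  # same user, same day: counted once
    ("u2", datetime(2024, 1, 1, 10), "CA", "LA"),
    ("u1", datetime(2024, 1, 2, 8), "CA", "LA"),
]
result = monthly_avg_dau(events)
```

Day one has two distinct users and day two has one, so the January average for this location is 1.5.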
Q2 Generate a report of daily new customers and cohort retention from first order month
(Template & Queries are required)
WITH first_order AS (
  -- The month of a buyer's first order defines the cohort
  SELECT
    buyer_id,
    FORMAT_DATE('%Y-%m', MIN(DATE(create_datetime))) AS cohort_month
  FROM order_item_mart
  GROUP BY buyer_id
),
cohort_size AS (
  -- Cohort size is computed once per cohort month; counting it inside the
  -- per-order-month grouping would make every retention rate equal 1
  SELECT
    cohort_month,
    COUNT(DISTINCT buyer_id) AS total_new_customers_acquired
  FROM first_order
  GROUP BY cohort_month
),
cohort_retention AS (
  SELECT
    f.cohort_month,
    FORMAT_DATE('%Y-%m', DATE(o.create_datetime)) AS order_month,
    DATE_DIFF(DATE(o.create_datetime), PARSE_DATE('%Y-%m', f.cohort_month),
      MONTH) AS month_number,
    COUNT(DISTINCT o.buyer_id) AS retained_customers
  FROM
    first_order f
  LEFT JOIN
    order_item_mart o
    USING (buyer_id)
  GROUP BY
    f.cohort_month, order_month, month_number
)
SELECT
  r.cohort_month,
  s.total_new_customers_acquired,
  r.order_month,
  r.month_number,
  r.retained_customers,
  ROUND(SAFE_DIVIDE(r.retained_customers, s.total_new_customers_acquired), 4) AS
    retention_rate
FROM
  cohort_retention r
JOIN
  cohort_size s
  ON r.cohort_month = s.cohort_month
ORDER BY
  r.cohort_month, r.order_month;
Column Name                   Data Type  Description
cohort_month                  STRING     The month in which new customers made their first purchase.
total_new_customers_acquired  INT        The total number of new customers acquired in the cohort month.
order_month                   STRING     The month in which subsequent orders were made (YYYY-MM).
month_number                  INT        The month difference from the cohort month to the order month.
retained_customers            INT        The number of customers from the cohort who made purchases in the order month.
retention_rate                FLOAT      The ratio of retained customers to total new customers acquired.
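The cohort logic can be checked on a small example. The sketch below is a plain-Python illustration, not the submitted answer; the function name, the (buyer_id, order_date) layout and the sample orders are assumptions made for the example. The cohort size is fixed per cohort month, and the retention rate divides the buyers active in a later month by that size.

```python
from collections import defaultdict
from datetime import date


def cohort_retention(orders):
    """orders: list of (buyer_id, order_date) pairs."""
    # First order month per buyer, stored as a (year, month) tuple.
    first_month = {}
    for buyer, day in orders:
        month = (day.year, day.month)
        if buyer not in first_month or month < first_month[buyer]:
            first_month[buyer] = month

    # Cohort size: distinct buyers whose first order falls in that month.
    cohort_buyers = defaultdict(set)
    for buyer, month in first_month.items():
        cohort_buyers[month].add(buyer)

    # Distinct buyers active in each (cohort, months since first order) cell.
    retained = defaultdict(set)
    for buyer, day in orders:
        cohort = first_month[buyer]
        month_number = (day.year - cohort[0]) * 12 + (day.month - cohort[1])
        retained[(cohort, month_number)].add(buyer)

    return {
        (f"{cohort[0]}-{cohort[1]:02d}", n): round(len(b) / len(cohort_buyers[cohort]), 4)
        for (cohort, n), b in retained.items()
    }


# Made-up sample orders for illustration only.
sample_orders = [
    ("a", date(2024, 1, 5)),
    ("b", date(2024, 1, 20)),
    ("a", date(2024, 2, 3)),
    ("c", date(2024, 2, 10)),
]
rates = cohort_retention(sample_orders)
```

In this sample the January cohort has two buyers, one of whom orders again in February, so month 1 retention for that cohort is 0.5.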
Q3 Find the top 10 items by orders per seller segment by month, together with average selling price/item, min price, max price, and orders and GMV coverage
Seller segments are defined by:
- Short Tail (>20 average daily orders)
- Mid Tail (between 10 - 20 average daily orders)
- Long Tail (< 10 average daily orders)
(Queries are required)
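The segment thresholds above can be sketched in Python as follows. The function names are hypothetical, and averaging only over days on which the seller had at least one order is an assumption, since the question does not say which days count toward the average.

```python
def average_daily_orders(order_ids_by_day):
    """order_ids_by_day: dict mapping a day to the set of order ids on that day.

    Assumption: the average is taken over days with at least one order.
    """
    if not order_ids_by_day:
        return 0.0
    total_orders = sum(len(ids) for ids in order_ids_by_day.values())
    return total_orders / len(order_ids_by_day)


def seller_segment(avg_daily_orders):
    """Map average daily orders to the segment names used in the question."""
    if avg_daily_orders > 20:
        return "Short Tail"
    if avg_daily_orders >= 10:
        return "Mid Tail"
    return "Long Tail"
```

For example, a seller with three orders on one day and one order on another averages 2.0 orders per day and falls in the Long Tail segment.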
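WITH seller_segment AS (
  -- Segments are defined on average daily orders, not total orders.
  -- Assumption: the average is taken over days with at least one order.
  SELECT
    seller_id,
    CASE
      WHEN SAFE_DIVIDE(COUNT(DISTINCT order_id),
        COUNT(DISTINCT DATE(create_datetime))) > 20 THEN 'Short Tail'
      WHEN SAFE_DIVIDE(COUNT(DISTINCT order_id),
        COUNT(DISTINCT DATE(create_datetime))) >= 10 THEN 'Mid Tail'
      ELSE 'Long Tail'
    END AS segment
  FROM
    order_item_mart
  GROUP BY
    seller_id
),
monthly_stats AS (
  SELECT
    o.item_id,
    FORMAT_DATE('%Y-%m', DATE(o.create_datetime)) AS order_month,
    COUNT(DISTINCT o.order_id) AS total_orders,
    SUM(o.gmv_usd) AS total_gmv,
    -- SAFE_DIVIDE returns NULL instead of failing when item_amount is 0
    AVG(SAFE_DIVIDE(o.gmv_usd, o.item_amount)) AS avg_price,
    MIN(SAFE_DIVIDE(o.gmv_usd, o.item_amount)) AS min_price,
    MAX(SAFE_DIVIDE(o.gmv_usd, o.item_amount)) AS max_price,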
1. Frequency of Purchase:
- Frequent Buyers: Purchased > 5 times
- Occasional Buyers: Purchased 2 to 5 times
- New Buyers: First purchase
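2. Monetary Value:
- High-Value Customers: Total GMV > 500 USD
- Medium-Value Customers: Total GMV between 100 and 500 USD
- Low-Value Customers: Total GMV < 100 USD
3. Recency of Purchase: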
- Active Customers: Purchased within the last month
- At-Risk Customers: Last purchase 1-3 months ago
- Inactive Customers: No purchases in the last 3 months
Rationale
This approach utilizes the RFM model, which is a popular method used to segment customers based on their purchasing behavior. This helps identify distinct groups of users, allowing for tailored campaigns.
Example
Frequent Buyers & High Value: optimize their marketing strategies to enhance retention and increase average order value. Campaigns may include exclusive offers or loyalty programs designed to reward these valuable customers.
Recent Buyers: They’re often more likely to buy again, so targeted follow-up communications or tailored promotions can help maintain their interest.
Note
While the answer below uses SQL queries, I would consider scoring customers on these three dimensions using Python as a more flexible solution. This would also allow developing more comprehensive scoring systems and adding other attributes, such as engagement metrics and demographic information, which would give a better understanding of customer profiles.
WITH buyer_stats AS (
  -- Assumed per-buyer aggregates; the original definition of this CTE is not shown
  SELECT
    buyer_id,
    COUNT(DISTINCT order_id) AS total_orders,
    SUM(gmv_usd) AS total_gmv,
    MAX(DATE(create_datetime)) AS last_order_date
  FROM
    order_item_mart
  GROUP BY
    buyer_id
),
buyer_segments AS (
  SELECT
    buyer_id,
    total_orders,
    total_gmv,
    last_order_date,
    CASE
      WHEN total_orders > 5 THEN 'Frequent'
      WHEN total_orders > 1 THEN 'Occasional'
      ELSE 'New'  -- exactly one order, i.e. the first purchase
    END AS frequency,
    CASE
      WHEN total_gmv > 500 THEN 'High-Value'
      WHEN total_gmv BETWEEN 100 AND 500 THEN 'Medium-Value'
      ELSE 'Low-Value'
    END AS monetary,
    CASE
      WHEN last_order_date >= DATE_SUB(CURRENT_DATE(),
        INTERVAL 1 MONTH) THEN 'Active'
      WHEN last_order_date >= DATE_SUB(CURRENT_DATE(),
        INTERVAL 3 MONTH) THEN 'At-Risk'
      ELSE 'Inactive'
    END AS recency
  FROM
    buyer_stats
)
SELECT
  frequency,
  monetary,
  recency,
  COUNT(buyer_id) AS number_of_buyers,
  AVG(total_orders) AS avg_orders,
  AVG(total_gmv) AS avg_gmv
FROM
  buyer_segments
GROUP BY
  frequency, monetary, recency
ORDER BY
  avg_gmv DESC, frequency, monetary, recency
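As a rough illustration of the Python scoring idea mentioned in the note, the sketch below turns the three dimensions into 1 to 3 scores and sums them. The function name, the 1 to 3 scale, the use of 30 and 90 days to approximate one and three months, and the treatment of a single order as "New" are all assumptions made for this example; the GMV and order-count thresholds are the ones used in the query.

```python
from datetime import date


def rfm_scores(total_orders, total_gmv, last_order_date, today):
    """Score one buyer from 1 (weakest) to 3 (strongest) on each RFM dimension."""
    # Frequency: more than 5 orders is Frequent, 2 to 5 Occasional, 1 New.
    frequency = 3 if total_orders > 5 else 2 if total_orders > 1 else 1

    # Monetary: thresholds of 500 and 100 on total GMV in USD.
    monetary = 3 if total_gmv > 500 else 2 if total_gmv >= 100 else 1

    # Recency: 30 and 90 days stand in for one and three months (assumption).
    days_since_last_order = (today - last_order_date).days
    recency = 3 if days_since_last_order <= 30 else 2 if days_since_last_order <= 90 else 1

    return {
        "recency": recency,
        "frequency": frequency,
        "monetary": monetary,
        "total": recency + frequency + monetary,
    }


# Made-up buyers for illustration only.
loyal = rfm_scores(8, 650.0, date(2024, 6, 20), today=date(2024, 7, 1))
lapsed = rfm_scores(1, 40.0, date(2024, 1, 1), today=date(2024, 7, 1))
```

A combined score like this makes it straightforward to add further weighted attributes later, which is harder to maintain in nested CASE expressions.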
Q5 Open-Ended Question
Build a Projection Model (From Orders, GMV to P&L) For TikTokShop for 2025
To build a comprehensive projection model for TikTokShop’s performance in 2025, I would combine historical data, relevant market trends, and specific performance factors. This approach would allow us to segment projections by item category, shop metrics (like follower counts), and make comparisons with competitor platforms such as Shopee and Lazada. Below is a detailed breakdown of my proposed methodology:
1. Establish a Baseline
Data Collection & Cleaning: Gather historical data from 2023 to 2024 on key metrics like daily orders, month
merchandise value (GMV), average order value (AOV), and other relevant financial metrics. Clean and prep
Identify Patterns and Seasonality: Visualize historical data to identify trends over time. This includes analyz
sales spikes during campaigns, holidays, or specific promotional periods. Using seasonal decomposition te
into trend, seasonal, and residual components to better understand how different factors impact overall per
Statistical Testing: Run statistical tests to assess stationarity in the time series data. If non-stationary, differ
the mean and variance before moving forward with further analysis.
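Differencing is one common way to stabilize the mean of a non-stationary series. Below is a minimal plain-Python sketch; the function name and the sample figures are illustrative assumptions, and a formal stationarity test (for example an augmented Dickey-Fuller test) is not implemented here.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist.

    lag=1 removes a linear trend; a lag equal to the season length
    (for example 7 for daily data with a weekly cycle) removes a stable
    seasonal pattern.
    """
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Made-up daily order counts with an upward trend.
trend_removed = difference([10, 12, 15, 19])

# Made-up series with a two-step seasonal pattern.
season_removed = difference([10, 20, 12, 23, 14, 26], lag=2)
```

The differenced series is shorter than the input by `lag` points, so forecasts made on it have to be cumulated back to the original scale.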
2. Forecasting Techniques
Model Selection: To find the optimal forecasting model, I would explore several approaches:
Linear Regression: Identify independent variables (e.g., MAUs, marketing spend) that may impact the depe
fitting a regression model, we can estimate how changes in each variable (e.g., a 1% increase in MAUs) mi
Moving Averages: Use simple or weighted moving averages to smooth out short-term fluctuations, highlight
moving average, for example, can be useful to visualize underlying trends by minimizing seasonal noise.
Advanced Time Series Models: Models like ARIMA or Holt-Winters Exponential Smoothing account for sea
more accurate forecasts. These models assign greater weight to recent data, which is particularly valuable
conditions.
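Two of the smoothing ideas above can be sketched in a few lines of plain Python. Note that the second function is simple exponential smoothing, a much reduced relative of Holt-Winters without trend or seasonal terms; it is shown only to illustrate how recent observations receive more weight. Function names and sample values are assumptions for the example.

```python
def simple_moving_average(series, window):
    """Average of the last `window` points, starting once a full window exists."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; alpha in (0, 1] is the weight on the newest point."""
    smoothed = [series[0]]  # initialise with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up values for illustration only.
sma = simple_moving_average([1, 2, 3, 4, 5], window=3)
ses = exponential_smoothing([10, 20, 30], alpha=0.5)
```

A larger alpha reacts faster to recent changes but smooths less, so it would normally be tuned against a held-out period rather than fixed in advance.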
Revenue Streams: Estimate platform fees (e.g., 10% commission on GMV) and advertising revenue potent
growing. For example, if ad engagement trends are rising, we could see an increase in ad revenue as part o
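The step from orders to GMV to P&L reduces to a few multiplications once the forecasts exist. The sketch below is a simplified illustration: the 10% commission is the example rate mentioned above, while the function name, the single operating-cost figure and the sample inputs are hypothetical placeholders rather than actual TikTokShop numbers.

```python
def project_pnl(orders, aov, commission_rate=0.10, ad_revenue=0.0, operating_costs=0.0):
    """Project a single period from forecast orders down to operating profit."""
    gmv = orders * aov                      # GMV = orders x average order value
    platform_fees = gmv * commission_rate   # commission earned on GMV
    revenue = platform_fees + ad_revenue    # total platform revenue
    return {
        "gmv": gmv,
        "platform_fees": platform_fees,
        "revenue": revenue,
        "operating_profit": revenue - operating_costs,
    }


# Hypothetical inputs for illustration only.
month = project_pnl(orders=1000, aov=20.0, ad_revenue=500.0, operating_costs=1500.0)
```

Running the same function per month and per item category, with forecast orders and AOV as inputs, would give the segmented projection described in the introduction.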
der Note: for bundle deal, the bundle is split into the actual item, and the actual itemid of the physical item is recorded
e order Note: for bundle deal, the bundle is split into the actual item, and the actual model id of the physical item is recorded
t performed on
m web portal (web include PC and Mobile Web)
d (epoch time)
local time in string)
ed / accepted by the buyer after receiving all the parcels for the order (local time in string)
completed and escrow started
completed and escrow started
d by the buyer (local time in string)
system based on the buyer and seller location, parcel size
Null is Unknown