Real Python Interview Questions
Microsoft, Flipkart
Data Analyst Interview Questions
(0-3 Years)
17-19 LPA
Python Questions
1. How do you implement memoization in Python to optimize
recursive functions?
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))  # Output: 55
Explanation:
• Memoization stores the results of expensive function calls and returns the cached result when
the same inputs occur again.
• @lru_cache is a built-in Python decorator that handles this caching automatically.
• In recursive algorithms like Fibonacci, it reduces time complexity from exponential to linear.
Tip:
Use @lru_cache from functools for dynamic programming problems or when repeated function
calls occur with the same inputs — it improves both performance and readability.
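To see the caching at work, lru_cache also exposes cache_info(); a small check, restating the example above:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

fibonacci(10)
# Only 11 distinct inputs (n = 0..10) were computed; every repeat was a cache hit
print(fibonacci.cache_info())
```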
2. What is the difference between a generator and an iterator in Python?
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))  # Output: 3
print(next(gen))  # Output: 2

class Countdown:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

it = Countdown(3)
for i in it:
    print(i)  # Output: 3, 2, 1
Explanation:
Generator: A function using yield to return items one at a time. It automatically manages state
and raises StopIteration for you.
Iterator: An object implementing __iter__() and __next__(), where you track state (like self.n)
manually.
Both allow lazy evaluation, but generators are shorter and more Pythonic for simple use cases.
Tip:
Use generators when performance and memory efficiency matter, especially with large datasets
or streaming data — they're a common interview expectation for scalable solutions.
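A quick way to see the memory difference the tip describes (exact sizes vary by interpreter, so treat the numbers as indicative):

```python
import sys

# A materialized list holds all 100,000 results at once;
# the equivalent generator holds only its current state.
nums_list = [x * x for x in range(100_000)]
nums_gen = (x * x for x in range(100_000))

print(sys.getsizeof(nums_list))  # hundreds of kilobytes
print(sys.getsizeof(nums_gen))   # a small, constant size
print(sum(nums_gen))             # items produced one at a time
```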
3. How do you use *args and **kwargs in a Python decorator?
import functools

def logger_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Log function name and arguments
        print(f"Calling {func.__name__} with args: {args} and kwargs: {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result}")
        return result
    return wrapper

@logger_decorator
def compute_sum(a, b, c=0):
    return a + b + c

# Usage
compute_sum(1, 2, c=3)
Explanation:
• The decorator logger_decorator demonstrates the use of *args and **kwargs to pass an
arbitrary number of positional and keyword arguments to the decorated function.
• *args collects extra positional arguments, while **kwargs collects extra keyword arguments.
• Using functools.wraps(func) preserves the original function’s metadata (like its name and
docstring).
Tip:
When creating decorators, always consider how to transparently pass through all arguments and
preserve metadata. This makes your decorators versatile and easier to debug, especially in larger
codebases.
4. How do you perform custom aggregation on a pandas
groupby() object?
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'IT', 'HR', 'HR', 'Finance'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [80000, 90000, 60000, 65000, 70000],
    'Bonus': [5000, 6000, 3000, 3200, 4000]
})

result = df.groupby('Department').agg({
    'Salary': ['mean', 'max'],
    'Bonus': lambda x: x.mean()
})
print(result)
Explanation:
• groupby() groups the data based on unique values in the Department column.
• .agg() applies multiple functions per column. Built-ins like 'mean', 'max' and custom
ones (like lambda) can be mixed.
• Here, we calculate average salary, max salary, and average bonus manually via a
lambda.
Tip:
Use named aggregation, e.g. agg(avg_salary=('Salary', 'mean')), to give result columns clear
names in a single step.
5. What is broadcasting in NumPy, and why is it preferred over loops?
Explanation:
Broadcasting lets NumPy apply element-wise operations across arrays of different but compatible
shapes without explicit loops; the smaller array is virtually stretched to match the larger one's shape.
Always prefer broadcasting and vectorized operations in NumPy instead of loops — it reduces
code complexity and significantly boosts performance, especially in large-scale numerical
computations.
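The code sample for this question did not survive in the source; a minimal sketch of broadcasting (the array values are illustrative):

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3, 4])       # shape (4,)

# Broadcasting stretches both to shape (3, 4); no Python loop involved
grid = col + row
print(grid.shape)  # (3, 4)
print(grid[1])     # [11 12 13 14]
```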
6. How do you create a plot with dual y-axes using
Matplotlib?
Definition:
x = [1, 2, 3, 4, 5]
sales = [100, 120, 130, 150, 170]
profit_margin = [10, 12, 15, 18, 20]
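The plotting code itself is missing from the extracted text; a sketch of the usual twinx() approach (axis labels are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
sales = [100, 120, 130, 150, 170]
profit_margin = [10, 12, 15, 18, 20]

fig, ax1 = plt.subplots()
ax1.plot(x, sales, color="tab:blue")
ax1.set_ylabel("Sales", color="tab:blue")

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.plot(x, profit_margin, color="tab:red")
ax2.set_ylabel("Profit Margin (%)", color="tab:red")

fig.savefig("dual_axes.png")
```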
Explanation:
• ax1.twinx() creates a second Axes sharing the same x-axis, so two series with different scales
(sales vs. profit margin) can share one figure.
• Each axis keeps its own y-scale and label; matching colors tie each line to its axis.
Tip:
Dual y-axes are helpful when comparing variables with different units/scales. But avoid overuse
— too many axes can confuse readers. Always color-code and label both axes clearly.
7. How do you write a Python decorator that accepts
arguments?
def repeat(n):
    def decorator(func):
        def wrapper(*args, **kwargs):
            for _ in range(n):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

@repeat(3)
def greet(name):
    print(f"Hello, {name}!")

greet("Alice")
# Output:
# Hello, Alice!
# Hello, Alice!
# Hello, Alice!
Explanation:
• To make a decorator accept arguments, you nest functions three levels deep:
1. Outer function: accepts the decorator argument (n)
2. Middle function: accepts the function being decorated (func)
3. Inner function: defines the wrapper logic
• @repeat(3) runs the greet() function 3 times using the value passed to the decorator.
Tip:
Always remember: decorators with arguments require three nested functions. This
pattern is widely used in logging, retry mechanisms, authentication checks, and more.
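The retry pattern the tip mentions follows the same three-level shape; a sketch (the exception type and delay are illustrative choices):

```python
import time
import functools

def retry(times, delay=0.1):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except ValueError:  # illustrative: retry only on ValueError
                    if attempt == times - 1:
                        raise  # out of attempts, re-raise
                    time.sleep(delay)
        return wrapper
    return decorator

attempts = []

@retry(3, delay=0)
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("transient failure")
    return "ok"

print(flaky())  # succeeds on the third attempt
```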
8. How does NumPy vectorization compare to traditional Python
loops in terms of performance?
import numpy as np
import time
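The benchmark code is missing from the source; a minimal sketch of the comparison the explanation describes:

```python
import numpy as np
import time

n = 1_000_000
data = list(range(n))
arr = np.arange(n)

start = time.perf_counter()
squares_loop = [x * x for x in data]  # one element at a time, in Python
loop_time = time.perf_counter() - start

start = time.perf_counter()
squares_vec = arr * arr  # whole array at once, in optimized C
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```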
Explanation:
• Python list comprehensions iterate one element at a time — they're readable but slower
for large computations.
• NumPy vectorization leverages optimized C-backed operations and processes entire arrays
in bulk.
• In this example, squaring a million numbers is 5–50x faster with NumPy depending on your
machine.
Tip:
For large datasets and numerical operations, always prefer vectorized NumPy operations over
loops — it’s one of the most important optimizations for any data-heavy or ML workload.
9. How do you filter rows in a pandas DataFrame using
the .query() method?
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'HR'],
    'Salary': [80000, 60000, 70000, 90000, 65000],
    'Experience': [5, 2, 3, 7, 1]
})

result = df.query("Department == 'IT' and Salary > 85000")
print(result)
Explanation:
• .query() lets you filter rows using a string-based expression, similar to SQL WHERE clauses.
• It avoids long boolean indexing chains like df[(df['Department'] == 'IT') & (df['Salary'] >
85000)].
• Internally, it parses the query string and evaluates it efficiently using pandas’ eval engine.
Tip:
Use .query() for cleaner, more readable code, especially when filtering using multiple conditions.
Just remember: column names with spaces must be enclosed in backticks ( ` ` ).
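For instance, with a hypothetical column whose name contains a space:

```python
import pandas as pd

df = pd.DataFrame({
    'Employee Name': ['Alice', 'Bob'],
    'Salary': [80000, 60000]
})

# Backticks let .query() reference a column whose name contains a space
high = df.query("`Employee Name` == 'Alice' and Salary > 70000")
print(high)
```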
10. What is the difference between Seaborn and Matplotlib, and
when should you use each?
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Sample data
df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'HR'],
    'Salary': [80000, 60000, 70000, 90000, 65000],
    'Experience': [5, 2, 3, 7, 1]
})
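The plotting calls themselves are missing from the source; a sketch of how the two libraries compare on the same data (file names are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'HR'],
    'Salary': [80000, 60000, 70000, 90000, 65000],
})

# Seaborn: grouping and styling in one high-level call (mean is the default estimator)
sns.barplot(data=df, x='Department', y='Salary')
plt.savefig('seaborn_salary.png')
plt.clf()

# Matplotlib: aggregate manually, then plot
means = df.groupby('Department')['Salary'].mean()
plt.bar(means.index, means.values)
plt.ylabel('Salary')
plt.savefig('matplotlib_salary.png')
```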
Explanation:
• Matplotlib is the base library for plotting in Python. It offers full control but requires more
code for formatting.
• Seaborn is built on top of Matplotlib and provides high-level functions with beautiful default
themes, especially for statistical plots.
• In the examples above, Seaborn handles grouping and aesthetics automatically, while
Matplotlib requires manual steps.
Tip:
Use Seaborn for quick, publication-ready plots and Matplotlib when you need fine-grained
control over axes, ticks, annotations, or figure composition. In practice, they are often used
together.
11. How do you use MultiIndex in pandas and reshape data
using stack() and unstack()?
Definition:
import pandas as pd

# Sample data
df = pd.DataFrame({'Department': ['IT', 'IT', 'HR', 'HR'],
                   'Year': [2023, 2024, 2023, 2024],
                   'Salary': [80000, 85000, 60000, 63000]})

# Set MultiIndex
df_multi = df.set_index(['Department', 'Year'])

# Pivot the Year level into columns
reshaped = df_multi['Salary'].unstack('Year')
print(reshaped)
Explanation:
• set_index() creates a MultiIndex from Department and Year, giving you hierarchical
row labels.
• unstack() pivots one level of the index (here: Year) into columns.
• You can use stack() to reverse this operation — flattening columns back into a deeper
index.
Tip:
Use MultiIndex and stack()/unstack() when working with time series, pivot tables, or
grouped data across multiple dimensions — they provide powerful reshaping without loops
or manual joins.
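To make the stack()/unstack() round trip concrete (sample data assumed):

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'IT', 'HR', 'HR'],
    'Year': [2023, 2024, 2023, 2024],
    'Salary': [80000, 85000, 60000, 63000]
})

df_multi = df.set_index(['Department', 'Year'])
reshaped = df_multi['Salary'].unstack('Year')  # Year level becomes columns
restored = reshaped.stack()                    # ...and back into the index

print(reshaped)
# The round trip recovers the original values
print(restored.sort_index().tolist() == df_multi['Salary'].sort_index().tolist())  # True
```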
12. How do you define and use custom exceptions in
Python, and integrate them with logging?
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(message)s')

class InvalidSalaryError(Exception):
    """Raised when a salary value is invalid."""
    pass

def process_salary(salary):
    if salary < 0:
        raise InvalidSalaryError(f"Salary cannot be negative: {salary}")
    return salary

try:
    process_salary(-50000)
except InvalidSalaryError as e:
    logging.exception("Handled exception:")
Explanation:
• Subclassing Exception creates a domain-specific error type (InvalidSalaryError) that callers
can catch precisely.
• logging.exception() records the message plus the full traceback, so the error is traced without
crashing the program.
Tip:
Use custom exceptions to clarify error intent and logging to trace and debug errors without
breaking execution flow — especially important in APIs, ETL jobs, or multi-layer systems.
13. What is the difference between is and == in Python?
a = [1, 2, 3]
b = a
c = [1, 2, 3]

print(a == c)  # Output: True (same values)
print(a is c)  # Output: False (different objects)
print(a is b)  # Output: True (b points to the same object)
Explanation:
• == checks value equality — whether the contents are the same.
• is checks identity — whether two variables point to the same object in memory.
Tip:
Use == to compare values, and is to check identity (e.g., if x is None). Overusing is for equality
checks is a common bug in beginner code.
14. How do you check if a key exists in a dictionary?
my_dict = {'name': 'Alice', 'age': 25}

print('name' in my_dict)  # Output: True
print(my_dict.get('city', 'N/A'))  # Output: N/A
Explanation:
• The in operator checks whether a key exists without raising an error.
• dict.get(key, default) returns a fallback value instead of raising KeyError when the key is
absent.
Tip:
Avoid using try...except KeyError for control flow unless absolutely necessary — use 'key' in dict
instead.
15. How do you create a list using a list comprehension?
squares = [x ** 2 for x in range(5)]
print(squares)  # Output: [0, 1, 4, 9, 16]
Explanation:
• List comprehension is a compact way to build lists using a single line of code.
• It’s equivalent to writing a for loop and appending to a list.
Tip:
Use list comprehensions for cleaner, faster code — and add conditions like [x for x in range(10) if
x % 2 == 0] for filtered results.
16. How do you remove duplicates from a list?
my_list = [1, 2, 2, 3, 4, 4, 5]
unique = list(set(my_list))
print(unique) # Output: [1, 2, 3, 4, 5] (order not guaranteed)
Explanation:
Converting a list to a set removes duplicates since sets can’t have repeating values.
Tip:
To preserve order, use:
list(dict.fromkeys(my_list))
17. How do you apply a function to a pandas DataFrame column using .apply()?
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 90, 95]
})

df['Grade'] = df['Score'].apply(lambda s: 'A' if s >= 90 else 'B')
print(df)
Explanation:
• .apply() runs a function on every element of a Series (or on rows/columns of a DataFrame).
• Here a lambda maps each Score to a letter grade, producing a new Grade column.
Tip:
When applying functions across rows with multiple columns involved, use axis=1:
df.apply(lambda row: row['Score'] + len(row['Name']), axis=1)
18. How do you use groupby().transform() to perform group-
level operations while keeping original row structure?
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B'],
    'Player': ['P1', 'P2', 'P3', 'P4'],
    'Score': [10, 20, 30, 40]
})

df['Team_Avg'] = df.groupby('Team')['Score'].transform('mean')
print(df)
Explanation:
• transform() returns a result aligned to the original rows, unlike agg(), which collapses each
group to one row.
• Here every player gets their team's average score in a new column, keeping the row structure
intact.
Tip:
Use transform() for feature engineering, like z-score, ratio, or percent contribution
within a group — while keeping the DataFrame intact.
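For example, the z-score pattern mentioned in the tip, with sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B'],
    'Score': [10, 20, 30, 40]
})

# Standardize each score within its own team
grouped = df.groupby('Team')['Score']
df['Z_Score'] = (df['Score'] - grouped.transform('mean')) / grouped.transform('std')
print(df)
```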
19. How do you use .pipe() in pandas to write clean,
modular transformation chains?
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [82, 91, 78]
})

# One transformation (illustrative example)
def add_passed_flag(df):
    df['Passed'] = df['Score'] >= 80
    return df

# Another transformation
def boost_scores(df, increment):
    df['Score'] += increment
    return df

result = df.pipe(add_passed_flag).pipe(boost_scores, increment=5)
print(result)
Explanation:
• .pipe() allows you to chain custom functions in a clear, functional style — each function
receives the DataFrame as input.
• Unlike deeply nested function calls, pipe() maintains readability.
• Parameters like increment=5 are passed after the DataFrame.
Tip:
Use .pipe() to build modular, testable ETL or preprocessing steps — especially useful when
functions are reused across pipelines or notebooks.
Data Analysis/Scenario-Based Questions
20. Scenario: You receive a list of user login records. Some users
logged in multiple times on the same day. Write a Python
script to find users who logged in more than once per day.
from collections import defaultdict

logins = [
    ("alice", "2024-05-10"),
    ("bob", "2024-05-10"),
    ("alice", "2024-05-10"),
    ("charlie", "2024-05-11"),
    ("alice", "2024-05-11"),
    ("bob", "2024-05-11"),
    ("bob", "2024-05-11")
]

counts = defaultdict(int)
for user, date in logins:
    counts[(user, date)] += 1

repeat_logins = [pair for pair, count in counts.items() if count > 1]
print(repeat_logins)  # [('alice', '2024-05-10'), ('bob', '2024-05-11')]
Explanation:
We use a defaultdict(int) to count (user, date) pairs.
This simulates a grouping operation — similar to groupby in pandas but done in pure Python.
Finally, we filter keys where the count exceeds 1.
Tip:
Scenario questions like this test your ability to simulate SQL or pandas logic using core Python.
Prioritize clarity: use collections like defaultdict or Counter for clean logic instead of nested loops.
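The same logic with collections.Counter, as the tip suggests (a sketch):

```python
from collections import Counter

logins = [
    ("alice", "2024-05-10"),
    ("bob", "2024-05-10"),
    ("alice", "2024-05-10"),
    ("charlie", "2024-05-11"),
    ("alice", "2024-05-11"),
    ("bob", "2024-05-11"),
    ("bob", "2024-05-11"),
]

# Counter does the (user, date) tallying in one call
counts = Counter(logins)
repeat_logins = [pair for pair, n in counts.items() if n > 1]
print(repeat_logins)  # [('alice', '2024-05-10'), ('bob', '2024-05-11')]
```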
21. Scenario: You are given a system log file. Each line contains
a timestamp and a log level. Your task is to parse the file and
extract all lines where the log level is "ERROR"
# Sample log content (contents illustrative)
sample_logs = """2024-05-10 10:00:01 [INFO] Service started
2024-05-10 10:01:12 [ERROR] Failed to connect to database
2024-05-10 10:02:33 [WARNING] Low disk space
2024-05-10 10:03:45 [ERROR] Timeout while calling API"""

# Writing the logs to a file (in a real case, logs.txt would already exist)
with open('logs.txt', 'w') as f:
    f.write(sample_logs.strip())

def extract_error_logs(file_path):
    with open(file_path, 'r') as f:
        return [line.strip() for line in f if '[ERROR]' in line]

for line in extract_error_logs('logs.txt'):
    print(line)
Explanation:
• This script reads the file line-by-line and filters entries that contain "[ERROR]".
• .strip() removes newline characters or extra spaces.
• This is a typical pre-processing step before sending alerts, logging metrics, or dashboarding.
22. Scenario: You are given a list of user records. Some names
have extra spaces, inconsistent casing, or are missing. Write a
script to clean the data.
raw_users = [
    {'name': ' Alice ', 'email': 'alice@example.com'},
    {'name': 'bob', 'email': 'bob@example.com'},
    {'name': None, 'email': 'charlie@example.com'},
    {'name': ' CHARLIE', 'email': 'charlie2@example.com'},
    {'name': 'david', 'email': 'DAVID@example.com'},
]

def clean_user_data(users):
    cleaned = []
    for user in users:
        name = user['name']
        if name is None:
            continue  # skip records with missing names
        clean_name = name.strip().title()
        cleaned.append({
            'name': clean_name,
            'email': user['email'].lower()
        })
    return cleaned

cleaned_users = clean_user_data(raw_users)
for user in cleaned_users:
    print(user)
Explanation:
• .strip() removes unwanted whitespace from names.
• .title() standardizes name casing (e.g., " CHARLIE" → "Charlie").
• .lower() ensures email addresses are case-insensitive.
• None values are filtered out early to avoid downstream errors.
Tip:
In interviews, emphasize defensive programming — always check for None, invalid types, or
unexpected formats when cleaning raw data. You’ll stand out if you mention reusability, like
wrapping it in a function or pipeline step.