Pandasmohali

Pandas is an open-source Python library for data analysis and manipulation, featuring data structures like Series and DataFrame. A DataFrame is a 2D labeled structure that can store various data types, and users can create it from dictionaries, lists, or CSV files. The document also covers advanced functionalities such as multi-indexing, groupby, merging DataFrames, and handling missing data.
Copyright © All Rights Reserved

Beginner Level (Easy)

1. What is Pandas?

Pandas is an open-source data analysis and manipulation library in Python. It provides data
structures like Series (1D) and DataFrame (2D) to handle structured data efficiently.

2. What is a DataFrame in Pandas?

A DataFrame is a 2D labeled data structure with rows and columns, similar to a table in a database or an Excel spreadsheet. Each column can hold a different data type.

3. How do you create a Pandas DataFrame?

You can create a DataFrame from a dictionary, a list, or a NumPy array:

```python
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
```
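The dictionary form is shown above. The other two sources the answer mentions, a list of rows and a 2D NumPy array, might look like the sketch below; the column names and values are made-up sample data.

```python
import numpy as np
import pandas as pd

# From a list of rows: each inner list is one row; column names are passed separately
df_from_list = pd.DataFrame([['Alice', 25], ['Bob', 30]], columns=['Name', 'Age'])

# From a 2D NumPy array: without `columns`, pandas would label the columns 0, 1, ...
arr = np.array([[1, 2], [3, 4]])
df_from_array = pd.DataFrame(arr, columns=['a', 'b'])

print(df_from_list.shape)  # (2, 2)
```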

4. What is the difference between a Series and a DataFrame?

● Series: A one-dimensional array with labeled indices.

● DataFrame: A two-dimensional table with rows and columns, where each column can have a different data type.
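A small sketch of the relationship between the two structures, using hypothetical sample values: a Series pairs values with index labels, and selecting a single column of a DataFrame returns a Series.

```python
import pandas as pd

# A Series: one column of values plus an index of labels
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b'])  # 20

# A DataFrame: several columns sharing one row index; one column on its own is a Series
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
ages = df['Age']
print(type(ages).__name__)  # Series
```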

5. How do you read data from a CSV file into a DataFrame?

You can use the read_csv() function:

```python
df = pd.read_csv('file.csv')
```
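To keep the example runnable without a file on disk, the sketch below assumes `io.StringIO` stands in for the path `'file.csv'`; the CSV contents are invented. It also shows two commonly used options, `usecols` and `index_col`.

```python
import io
import pandas as pd

# io.StringIO stands in for a real file path such as 'file.csv'
csv_text = "Name,Age,City\nAlice,25,Paris\nBob,30,Rome\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 3)

# Read only some columns and use one of them as the row index
df_small = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Age'], index_col='Name')
print(df_small.loc['Bob', 'Age'])  # 30
```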
🟡 Intermediate Level
6. How do you select a column from a DataFrame?

You can select a column by its name:

```python
df['column_name']
```
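A single label returns one column as a Series; a list of labels returns a DataFrame. A minimal sketch with made-up columns:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30], 'City': ['Paris', 'Rome']})

ages = df['Age']               # one label -> Series
subset = df[['Name', 'City']]  # list of labels (note the double brackets) -> DataFrame
ages_attr = df.Age             # attribute access; only for simple names that don't clash with DataFrame methods
```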

7. What is the purpose of iloc and loc?

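● iloc[]: Selects rows and columns by integer position. As with Python lists, the end of a slice is excluded.

● loc[]: Selects rows and columns by label. The end label of a slice is included, and boolean masks are also accepted.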
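The sketch below contrasts the two on a small DataFrame with hypothetical string labels, including the difference in how slice ends are treated.

```python
import pandas as pd

df = pd.DataFrame({'score': [90, 75, 82, 60]}, index=['a', 'b', 'c', 'd'])

# iloc: integer positions; the slice end is excluded
by_position = df.iloc[0:2]          # positions 0 and 1 -> labels 'a', 'b'

# loc: labels; the slice end is included
by_label = df.loc['a':'c']          # labels 'a', 'b', 'c'

# loc with a boolean mask for rows and a label for the column
high = df.loc[df['score'] > 80, 'score']   # Series with labels 'a', 'c'
```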

8. How do you handle missing data in Pandas?

You can handle missing data using:

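● df.isna() to check for NaN values (df.isna().sum() counts them per column).

● df.fillna(value) to fill missing values.

● df.dropna() to remove rows that contain any NaN value (pass how='all' to drop only rows that are entirely NaN, or axis=1 to drop columns instead).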
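A short sketch of the three methods on invented data; note that row 1 is entirely NaN, which is what `how='all'` targets.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, np.nan, 6.0]})

print(df.isna().sum())   # NaN count per column: a -> 1, b -> 2

filled = df.fillna(0)                              # every NaN becomes 0
filled_mean = df.fillna({'a': df['a'].mean()})     # per-column fill via a dict; 'b' keeps its NaNs

dropped_any = df.dropna()            # drops rows with at least one NaN -> only row 2 remains
dropped_all = df.dropna(how='all')   # drops rows where every value is NaN -> row 1 only
```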

9. How do you filter rows in a DataFrame based on a condition?

You can filter rows using boolean indexing:

```python
filtered_df = df[df['column_name'] > 10]
```
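Conditions can be combined, but with `&`, `|` and `~` rather than Python's `and`/`or`/`not` (those raise a ValueError on a Series), and each condition needs its own parentheses. A sketch with hypothetical columns:

```python
import pandas as pd

df = pd.DataFrame({'product': ['a', 'b', 'c', 'd'],
                   'price': [5, 12, 20, 15],
                   'stock': [0, 3, 0, 8]})

# A boolean Series; indexing with it keeps the rows where it is True
expensive = df[df['price'] > 10]

# Combine conditions with & (and), | (or), ~ (not); parentheses are required
expensive_in_stock = df[(df['price'] > 10) & (df['stock'] > 0)]

# isin() tests membership in a list of values
selected = df[df['product'].isin(['a', 'd'])]
```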
10. What is the difference between apply() and map()?

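● apply(): On a DataFrame, applies a function along an axis: to each column (axis=0, the default) or to each row (axis=1). On a Series, it applies the function to each element.

● map(): Maps each element of a Series through a function, a dictionary, or another Series. For element-wise operations on a whole DataFrame, use DataFrame.map() (named applymap() before pandas 2.1).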
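A minimal sketch of the difference, on sample data: `DataFrame.apply` hands the function a whole column or row, while `Series.map` works value by value and leaves unmatched dictionary keys as NaN.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# DataFrame.apply: the function receives a whole column (default) or a whole row (axis=1)
col_ranges = df.apply(lambda col: col.max() - col.min())      # one value per column
row_sums = df.apply(lambda row: row['x'] + row['y'], axis=1)  # one value per row

# Series.map: element-wise, with a dict (unmatched values become NaN) or a function
labels = pd.Series(['a', 'b', 'c']).map({'a': 'apple', 'b': 'banana'})
squares = df['x'].map(lambda v: v ** 2)
```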

11. How do you sort a DataFrame by a column?

You can use the sort_values() method:

```python
df.sort_values(by='column_name', ascending=False)
```
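`sort_values()` returns a new DataFrame rather than sorting in place, and it accepts several keys with a direction for each. A sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({'team': ['B', 'A', 'B', 'A'],
                   'points': [7, 9, 3, 4],
                   'name': ['w', 'x', 'y', 'z']})

# Returns a new DataFrame; df itself is unchanged unless you reassign it
by_points = df.sort_values(by='points', ascending=False)

# Several keys: team ascending, then points descending within each team
ranked = df.sort_values(by=['team', 'points'], ascending=[True, False])
```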

12. How do you add a new column to an existing DataFrame?

You can add a new column by assigning a value to a new column name:

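```python
# A list needs one value per row (its length must equal len(df); 3 rows assumed here)
df['new_column'] = [10, 20, 30]

# A single scalar is broadcast to every row
df['flag'] = True
```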

🔵 Advanced Level
13. What are multi-indexes in Pandas, and why are they used?

Multi-indexes (hierarchical indexes) use several index levels on one axis, which lets a 2D DataFrame represent higher-dimensional, hierarchical data. You can create one with set_index() on several columns or with pd.MultiIndex constructors such as pd.MultiIndex.from_tuples().
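A sketch of building and querying a two-level index, assuming a small made-up sales table with `region` and `year` columns:

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'year': [2023, 2024, 2023, 2024],
    'revenue': [100, 120, 80, 95],
})

# A list of columns in set_index builds a two-level MultiIndex (region, year)
indexed = sales.set_index(['region', 'year'])

north = indexed.loc['North']                        # all years for one region (outer level)
south_2024 = indexed.loc[('South', 2024), 'revenue']  # one value via a tuple key -> 95
by_year = indexed.xs(2023, level='year')            # cross-section on the inner level

# A MultiIndex can also be built directly
idx = pd.MultiIndex.from_tuples([('North', 2023), ('South', 2023)], names=['region', 'year'])
```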

14. What is the purpose of groupby() in Pandas?

The groupby() function splits the data into groups based on one or more columns, applies an aggregation (such as sum() or mean()) or a transformation to each group, and combines the results.
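The sketch below, on invented team scores, contrasts aggregation (one row per group) with `transform` (a result aligned to the original rows).

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['A', 'A', 'B', 'B', 'B'],
    'points': [10, 20, 5, 15, 40],
})

# Aggregation: one result per group
totals = df.groupby('team')['points'].sum()                 # A -> 30, B -> 60
summary = df.groupby('team')['points'].agg(['mean', 'max'])

# Transformation: same length as df, so it can be stored as a new column
df['team_mean'] = df.groupby('team')['points'].transform('mean')
```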
15. How do you merge/join DataFrames in Pandas?

You can merge DataFrames using the merge() function:

```python
merged_df = pd.merge(df1, df2, on='common_column', how='inner')
```

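The how parameter selects the join type: 'inner' (the default; keeps only keys found in both DataFrames), 'left', 'right', or 'outer' (keeps keys found in either).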
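To see how the join types differ, here is a sketch with two hypothetical tables whose keys only partly overlap; `indicator=True` records where each output row came from.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'name': ['Ann', 'Ben', 'Cal']})
right = pd.DataFrame({'key': [2, 3, 4], 'score': [88, 92, 75]})

inner = pd.merge(left, right, on='key', how='inner')   # keys in both: 2, 3
left_j = pd.merge(left, right, on='key', how='left')   # all left keys; score is NaN for key 1
outer = pd.merge(left, right, on='key', how='outer')   # union of keys: 1, 2, 3, 4
# how='right' would keep all keys from `right` instead

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'
check = pd.merge(left, right, on='key', how='outer', indicator=True)
```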

16. What is vectorized computation in Pandas?

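Vectorized computation means operating on entire columns or rows at once instead of looping over elements in Python. Pandas runs these operations in optimized NumPy code, so they are much faster than explicit loops; for example, df['column_name'] + 10 adds 10 to every value in the column.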
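A small sketch comparing an explicit loop with the vectorized form on made-up price and quantity columns; both produce the same values, but the vectorized line avoids per-row Python work.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': np.arange(1, 6), 'qty': [3, 1, 4, 1, 5]})

# Explicit Python loop: one interpreter step per row (slow on large data)
loop_total = [p * q for p, q in zip(df['price'], df['qty'])]

# Vectorized: whole-column expressions evaluated by NumPy
df['total'] = df['price'] * df['qty']
df['price_plus_10'] = df['price'] + 10
```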

17. What is the difference between concat() and append()?

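● concat(): Concatenates DataFrames (or Series) along an axis: rows (axis=0, the default) or columns (axis=1).

● append(): Added rows to a DataFrame, but it was less efficient than concat(). It was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat() instead.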
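A sketch of `pd.concat()` along both axes, including the usual replacement for the removed `df.append(row)`; the frames are invented sample data.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'a': [3, 4]})
extra = pd.DataFrame({'b': ['x', 'y']})

# Stack rows (axis=0); ignore_index=True renumbers the result 0..n-1
rows = pd.concat([df1, df2], ignore_index=True)

# Side by side (axis=1); rows are aligned on the index
cols = pd.concat([df1, extra], axis=1)

# Instead of the removed df.append(): wrap the new row in a DataFrame and concat
new_row = pd.DataFrame({'a': [5]})
appended = pd.concat([rows, new_row], ignore_index=True)
```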

18. How do you pivot a DataFrame?

You can pivot a DataFrame using the pivot() function:

```python
df_pivot = df.pivot(index='col1', columns='col2', values='col3')
```
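`pivot()` only reshapes, so each (index, column) pair must be unique; with duplicates it raises a ValueError, and `pivot_table()` with an `aggfunc` is the usual alternative. A sketch on invented temperature data:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['d1', 'd1', 'd2', 'd2'],
    'city': ['Paris', 'Rome', 'Paris', 'Rome'],
    'temp': [15, 18, 14, 20],
})

# Each (date, city) pair occurs once, so pivot can reshape without aggregating
wide = df.pivot(index='date', columns='city', values='temp')
# wide.loc['d2', 'Rome'] -> 20

# Add a second reading for (d1, Paris); pivot would now fail, pivot_table averages it
dup = pd.concat([df, pd.DataFrame({'date': ['d1'], 'city': ['Paris'], 'temp': [17]})])
avg = dup.pivot_table(index='date', columns='city', values='temp', aggfunc='mean')
# avg.loc['d1', 'Paris'] -> 16.0
```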

19. What is the purpose of crosstab() in Pandas?

crosstab() computes a cross-tabulation (contingency table) of two or more variables:

```python
pd.crosstab(df['column1'], df['column2'])
```
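A sketch on a small invented dataset, showing raw counts plus the `margins` and `normalize` options:

```python
import pandas as pd

df = pd.DataFrame({
    'store': ['A', 'B', 'A', 'B', 'A'],
    'bought': ['yes', 'no', 'yes', 'yes', 'no'],
})

# Count of each (store, bought) combination
counts = pd.crosstab(df['store'], df['bought'])
# counts.loc['A', 'yes'] -> 2

# margins=True adds an 'All' row and column with totals
with_totals = pd.crosstab(df['store'], df['bought'], margins=True)

# normalize='index' turns each row into proportions that sum to 1
shares = pd.crosstab(df['store'], df['bought'], normalize='index')
```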

20. How do you optimize memory usage in Pandas?

● Use category dtype for categorical data.

● Downcast numeric columns using pd.to_numeric() with the downcast argument.

● Load only relevant columns with usecols during file reading.
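A sketch of the first two techniques on synthetic data (the city names and sizes are arbitrary), measuring the effect with `memory_usage(deep=True)`:

```python
import numpy as np
import pandas as pd

n = 10_000
df = pd.DataFrame({
    'city': np.random.default_rng(0).choice(['Paris', 'Rome', 'Oslo'], size=n),
    'count': np.arange(n, dtype='int64'),
})
before = df.memory_usage(deep=True).sum()

# Few distinct strings: category stores each label once plus small integer codes
df['city'] = df['city'].astype('category')
# Values 0..9999 fit in int16, so downcast='integer' picks the smallest integer dtype that fits
df['count'] = pd.to_numeric(df['count'], downcast='integer')

after = df.memory_usage(deep=True).sum()
print(before, '->', after)

# When reading a file, the same ideas apply up front, e.g.:
# pd.read_csv('file.csv', usecols=['city', 'count'], dtype={'city': 'category'})
```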

21. How do you perform time series analysis in Pandas?

You can use pd.to_datetime() to convert a column to datetime type, and use time-based
indexing and resampling:

```python
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.resample('D').sum() # Resample data by day
```
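A self-contained version with invented timestamps: resampling produces one row per calendar day (days with no data get a sum of 0), and a date string selects a whole day from the DatetimeIndex.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01 09:00', '2024-01-01 17:00', '2024-01-02 10:00', '2024-01-04 12:00'],
    'sales': [10, 5, 7, 3],
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

daily = df.resample('D').sum()   # 2024-01-01..04; the empty 2024-01-03 sums to 0
jan_2 = df.loc['2024-01-02']     # partial-string indexing selects a whole day
```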

22. What is query() in Pandas?

The query() function allows you to filter data using a string expression:

```python
df.query('column_name > 10')
```
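Inside a query string, conditions combine with `and`/`or`, and `@name` refers to a Python variable. A sketch with hypothetical columns:

```python
import pandas as pd

df = pd.DataFrame({'price': [5, 12, 20], 'category': ['a', 'b', 'b']})
threshold = 15

# Same rows as df[df['price'] > 10]
over_10 = df.query('price > 10')

# Combine conditions; @threshold reads the local Python variable
b_over_threshold = df.query('category == "b" and price > @threshold')

# Column names containing spaces go in backticks, e.g. df.query('`unit price` > 10')
```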

23. How do you calculate moving averages in Pandas?

You can use the rolling() function to calculate moving averages:

```python
df['moving_avg'] = df['column_name'].rolling(window=3).mean()
```
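With `window=3`, the first two positions have fewer than three values and come out as NaN; `min_periods` relaxes that. A sketch on a simple sequence:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Mean of the current value and the two before it
ma = s.rolling(window=3).mean()                         # NaN, NaN, 2.0, 3.0, 4.0

# min_periods=1 averages whatever is available at the start
ma_partial = s.rolling(window=3, min_periods=1).mean()  # 1.0, 1.5, 2.0, 3.0, 4.0
```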

24. How do you handle duplicate rows in a DataFrame?

You can remove duplicates using drop_duplicates():

```python
df.drop_duplicates(inplace=True)
```
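By default a row counts as a duplicate only if every column matches; `subset` and `keep` narrow that. A sketch with invented contact data:

```python
import pandas as pd

df = pd.DataFrame({
    'email': ['a@x.com', 'b@x.com', 'a@x.com', 'a@x.com'],
    'name': ['Ann', 'Ben', 'Ann', 'Annie'],
})

print(df.duplicated().sum())   # 1: row 2 repeats row 0 exactly

exact = df.drop_duplicates()                                  # keeps rows 0, 1, 3
by_email = df.drop_duplicates(subset=['email'], keep='last')  # one row per email, the last one: rows 1, 3
```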
