
Fundamentals of Data Analytics - GROUP 4

The document analyzes household consumer expenditure data from India. It describes obtaining the data, creating a data dictionary, and performing descriptive statistics. The analysis will focus on comparing consumption trends between male and female consumers and understanding relationships between expenditure variables.

Uploaded by Yagyansh Kapoor

FUNDAMENTALS OF DATA ANALYTICS

Assignment Report
Topic: Analysis of Household Consumer Expenditure

Faculty: Dr. Vinay Kumar

Submitted by – Group 4

HARSH RAGHUWANSHI (MB23002)
ASHIT SINGH (MB23047)
DIKSHANT PANDEY (MB23035)
YAGYANSH KAPOOR (MB23021)
PRATYUSH KISHAN (MB23023)

Table of Contents
Introduction
Objective of the Analysis
Getting the data
Data Dictionary
About the Data - Descriptive Statistics
Analytics
Results
Visualization
Recommendations and Predictions
Conclusion
References

Introduction
Household consumer expenditure (HCE) is expenditure incurred by households on the consumption
of goods and services. HCE during a specified period, called the reference period, is defined as
the total of the following:

(a) expenditure incurred by households on 'consumption goods and services' during the
reference period
(b) imputed value of goods and services produced as outputs of household (proprietary or
partnership) enterprises owned by households and used by their members themselves
during the reference period
(c) imputed value of goods and services received by households as remuneration in kind
during the reference period
(d) imputed value of goods and services received by households through social transfers in
kind received from government units or non-profit institutions serving households
(NPISHs) and used by households during the reference period.

Objective of the consumer expenditure survey (CES): The CES is used, first, to measure the level
of living. Monthly per capita expenditure (MPCE) is a simple and universally applicable indicator
of the level of living: the average MPCE of any sub-population of the country (any region or
population group) is a single number that summarizes the level of living of that population. In
addition, the food (quantity) consumption data are used to study the level of nutrition in
different regions and the disparities between them. Further, economists and market researchers
use the budget shares of a commodity at different MPCE levels to estimate the elasticity
(responsiveness) of its demand to increases in income.
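As a rough illustration of how these measures are computed, the sketch below derives MPCE as monthly household consumer expenditure divided by household size, averages it by sub-population, and computes the food budget share in a lower and an upper MPCE class. The column names (`region`, `hh_size`, `food_exp`, `nonfood_exp`) and all values are hypothetical, and the averages are unweighted, whereas official NSS estimates apply survey multipliers.

```python
import pandas as pd

# Hypothetical household records (monthly expenditure in rupees).
hh = pd.DataFrame({
    "region": ["rural", "rural", "urban", "urban"],
    "hh_size": [4, 5, 3, 2],
    "food_exp": [2400.0, 3000.0, 2700.0, 1800.0],
    "nonfood_exp": [1600.0, 2000.0, 3300.0, 2200.0],
})

# MPCE = total monthly household consumer expenditure / household size.
hh["total_exp"] = hh["food_exp"] + hh["nonfood_exp"]
hh["mpce"] = hh["total_exp"] / hh["hh_size"]

# Average MPCE of each sub-population (unweighted; NSS applies multipliers).
avg_mpce = hh.groupby("region")["mpce"].mean()

# Split households into two MPCE classes at the median.
hh["mpce_class"] = pd.qcut(hh["mpce"], q=2, labels=["lower", "upper"])

# Food budget share per class: food spending / total spending in that class.
sums = hh.groupby("mpce_class", observed=True)[["food_exp", "total_exp"]].sum()
food_share = sums["food_exp"] / sums["total_exp"]

print(avg_mpce)
print(food_share)
```

A food share that falls as MPCE rises (0.60 in the lower class and 0.45 in the upper class here) corresponds to an expenditure elasticity of food below 1.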

Objective of the Analysis


During this project, we focus on two objectives:

• Understanding the consumption trends of male and female consumers in a region for all
household goods, and comparing them to check for a significant difference in consumption.
• Understanding the relationship between consumer expenditure variables such as
entertainment and education expenditure.

This analysis can help in understanding demand patterns for various household products, since
consumer expenditure is closely linked to consumer demand for those products.
Getting the data
To import the dataset of Household Consumer Expenditure from the National Data Archive
available on the NADA website (https://microdata.gov.in/nada43/index.php/home), several steps
are followed to ensure its usability for analysis:

1. Export from National Data Archive in Nesstar Explorer: Initially, the data is retrieved
from the National Data Archive using Nesstar Explorer. Nesstar Explorer is a tool commonly
used for accessing and managing datasets, providing a user-friendly interface for data
exploration.

2. Export to Excel Workbook: Subsequently, the specific data pertaining to Household
Consumer Expenditure from the NSS 68th round (July 2011-June 2012) is extracted from
Nesstar Explorer and exported to an Excel workbook. This process involves selecting the
relevant variables and exporting them in a tabular format.

3. Conversion for Analytical Purposes: Once exported to Excel, the dataset is converted to a
format suited to analytics. This step involves cleaning the data, transforming variables, and
restructuring the dataset as needed to facilitate statistical analysis, visualization, and
modelling.
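The data dictionary lists every field with a Character format, so the conversion step typically includes casting the exported text columns to numbers. Below is a minimal pandas sketch. It assumes the workbook has already been read into a DataFrame (for example with `pd.read_excel`). The sample values, the `to_numeric_frame` helper name and the choice to turn unparseable entries into missing values are illustrative assumptions, not the group's documented procedure.

```python
import pandas as pd

# Stand-in for a sheet exported from Nesstar Explorer: numbers stored as text,
# including a stray blank and a non-numeric entry (hypothetical values).
raw = pd.DataFrame({
    "Age": ["45", "36", " 52 "],
    "total_food": ["551.6", "", "728.25"],
    "Education_expend": ["0", "27.5", "NA"],
})

def to_numeric_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Strip whitespace and cast every column to a numeric dtype.

    Entries that cannot be parsed (blanks, 'NA') become NaN so that
    missing values can be counted and treated explicitly later.
    """
    cleaned = df.apply(lambda col: col.astype(str).str.strip())
    return cleaned.apply(pd.to_numeric, errors="coerce")

clean = to_numeric_frame(raw)
print(clean.dtypes)
print(clean.isna().sum())
```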

The dataset from the NSS 68th round (July 2011-June 2012) is prepared for analysis, offering
insights into household spending and economic behaviour. Key points:

• Originating from the NSS 68th round, it provides a snapshot of household consumer
expenditure during July 2011 to June 2012.
• Accessed via Nesstar Explorer from the National Data Archive, ensuring ease of use.
• Exported to an Excel workbook for manipulation, analysis, and visualization.
• Pre-processed to address data quality issues, making it suitable for statistical analysis or
modelling.
• Vital for informing policy decisions, academic research, and understanding household
welfare.

Data Dictionary:
A data dictionary is a structured repository that provides a comprehensive description of the data
elements used in a database or information system. It typically includes metadata such as data
types, field names, definitions, constraints, and relationships, serving as a reference for
understanding and managing the database schema and its contents.

Index | Name | Label | Type | Format
1 | Age | Age (Years) | Discrete | Character
2 | Total food expenditure | Total food (Rupees) | Discrete | Character
3 | Consumption of energy | Fuel_light (Rupees) | Discrete | Character
4 | Expenditure on medical care where the person does not visit a hospital or another medical institution | medical_noninst (Rupees) | Discrete | Character
5 | Expenditure on entertainment | entertainment (Rupees) | Discrete | Character
6 | Other minor durable goods (small appliances, kitchen utensils etc.) | minor_durables (Rupees) | Discrete | Character
7 | Other household consumables (cleaning supplies, paper products etc.) | hhd_consumable (Rupees) | Discrete | Character
8 | Consumer services (travel, tourism, communication etc.) | consumer_services (Rupees) | Discrete | Character
9 | Expenditure on medical care where the person visits a hospital or another medical institution | medical_insti (Rupees) | Discrete | Character
10 | Durable goods (appliances, furniture etc.) | Durable_goods (Rupees) | Discrete | Character
11 | Total expenditure excluding expenditure on food | Total_nonfood (Rupees) | Discrete | Character
12 | Expenditure on education | Education_expend (Rupees) | Discrete | Character
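A data dictionary can also be kept next to the data in code, which makes it easy to check that the working file still matches its documentation. The sketch below is illustrative only: it records a few of the labels and descriptions from the table and reports documented fields that are missing from a DataFrame, as well as columns that are not documented. The `DATA_DICTIONARY` mapping and the `check_columns` helper are hypothetical names.

```python
import pandas as pd

# A subset of the data dictionary: label -> (description, unit).
DATA_DICTIONARY = {
    "Age": ("Age", "years"),
    "Fuel_light": ("Consumption of energy", "rupees"),
    "entertainment": ("Expenditure on entertainment", "rupees"),
    "Education_expend": ("Expenditure on education", "rupees"),
}

def check_columns(df: pd.DataFrame, dictionary: dict) -> dict:
    """Compare DataFrame columns with the documented labels."""
    documented = set(dictionary)
    present = set(df.columns)
    return {
        "missing": sorted(documented - present),       # documented but absent
        "undocumented": sorted(present - documented),  # present but not described
    }

df = pd.DataFrame(columns=["Age", "Fuel_light", "entertainment", "state"])
print(check_columns(df, DATA_DICTIONARY))
```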

About the Data - Descriptive Statistics

Consumer Expenditure provides information on the subject areas below:

• Expenditure on consumption goods and services.
• Imputed value of self-consumed produce of own farm or other household enterprise.
• Any household expenses reimbursed by employer (medical, electricity, LTC, etc.).
• Cost of minor repairs of assets & durable goods.
• All compulsory payments to schools and colleges including so-called “donations”.
• Goods and services received as payment in kind or received free from employer (incl.
imputed rent of quarters).
• Payments for medical care reimbursed or directly paid by the insurance company.
• Second-hand purchases of clothing, footwear, books, durables.

Descriptive statistics (expenditures in rupees; values rounded to at most two decimals):

Variable | count | mean | std | min | 25% | 50% | 75% | max
Age | 59119 | 46.58 | 13.24 | 2 | 36 | 45 | 55 | 100
total_food | 59119 | 619.05 | 336.48 | 0 | 422.25 | 551.6 | 728.25 | 15116
Fuel_light | 59119 | 108.26 | 72.41 | 0 | 66 | 93.33 | 131.5 | 3148
medical_noninst | 59119 | 49.38 | 163.55 | 0 | 0 | 16 | 50 | 17500
entertainment | 59119 | 13.52 | 37.53 | 0 | 0 | 0 | 20 | 4750
minor_durables | 59119 | 4.41 | 44.81 | 0 | 0 | 0 | 0 | 6377.86
hhd_consumable | 59119 | 25.48 | 24.11 | 0 | 12.5 | 20 | 31.25 | 2285
consumer_services | 59119 | 63.94 | 101.59 | 0 | 18 | 40.5 | 76.67 | 7740
consumer_taxes | 59119 | 3.50 | 11.15 | 0 | 0 | 0 | 2.86 | 575
medical_insti | 59119 | 17.78 | 525.46 | 0 | 0 | 0 | 0 | 90000
Durable_goods | 59119 | 79.37 | 2304.53 | 0 | 0 | 0 | 7.5 | 402500
Total_nonfood | 59119 | 576.17 | 2525.55 | 0 | 226.83 | 361.6 | 605.5 | 404110
Education_expend | 59119 | 36.74 | 224.08 | 0 | 0 | 1.25 | 27.5 | 41194
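The statistics reported here (count, mean, std, min, 25%, 50%, 75%, max) are exactly the ones returned by pandas' `DataFrame.describe()`, which may be how the table was produced; the report does not say. A minimal sketch on a few made-up values:

```python
import pandas as pd

# Made-up monthly expenditures (rupees) for two of the variables.
df = pd.DataFrame({
    "total_food": [420.0, 550.0, 610.0, 730.0, 1500.0],
    "entertainment": [0.0, 0.0, 0.0, 20.0, 300.0],
})

# count, mean, std (sample, ddof=1), min, quartiles (linear interpolation), max.
summary = df.describe()
print(summary.round(2))
```

When the mean sits well above the median, as for `entertainment` in this toy data and for most expenditure columns in the real table, the distribution is right-skewed.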

Key findings from Descriptive Statistics

1. The highest average expenditure was on total food (mean 619.05 rupees), consistent with the
relatively inelastic demand for basic food items such as cereals, milk, pulses, salt and sugar.

2. Non-institutional medical expenditure has a higher average than institutional medical
expenditure (49.38 vs 17.78 rupees), which may reflect a lack of health insurance among much of
the Indian population.

Total value count of whole NSSO Dataset: 3310664

Total number of missing values: 58215
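Both totals above count cells across the whole file, not rows. Because 3,310,664 = 59,119 × 56, the full extract probably has 59,119 records and 56 variables. This is an inference from the arithmetic and is not stated in the report. The sketch below shows how both totals can be obtained in pandas, using a small stand-in frame with hypothetical values.

```python
import numpy as np
import pandas as pd

# Stand-in for the NSSO extract (3 records x 4 variables).
nss = pd.DataFrame({
    "state": [14, 14, 9],
    "Sex": [1, 2, 1],
    "total_food": [551.6, np.nan, 728.25],
    "Education_expend": [0.0, np.nan, np.nan],
})

total_values = nss.size                        # rows x columns
missing_values = int(nss.isna().sum().sum())   # NaN cells across all columns
missing_by_column = nss.isna().sum()           # where the gaps are

print(total_values, missing_values)
print(missing_by_column)
```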

FIG 1

Histogram

• The distribution appears right-skewed, with a longer tail towards higher expenditure
values. This indicates that most people spend less, while a smaller proportion spends
significantly more.
• It's difficult to discern specific spending patterns due to the scale on the x-axis.

Boxplot

• The boxplot confirms the right skew. The median (the line inside the box) lies towards the
lower end of the box, and the upper whisker extends further towards higher values.
• The interquartile range (IQR), depicted by the box, shows a spread in expenditure. There
are outliers, data points beyond the whiskers, indicating high spending in certain
categories.

Normal Q-Q Plot

• The normal Q-Q plot suggests the data do not follow a normal distribution. The points depart
markedly from the diagonal line at higher expenditure values, indicating a right skew, as
seen in the histogram and boxplot.

Overall, the data likely reflects a right-skewed distribution of consumer spending across various
categories. A small portion of the population spends significantly more than the rest on certain
goods and services.
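The three diagnostics can also be checked numerically. The sketch below uses hypothetical right-skewed values. It computes the sample skewness and the boxplot whisker limits (1.5 × IQR beyond the quartiles, the default in Matplotlib and seaborn). It also computes the standard normal quantiles that a Q-Q plot pairs with the sorted data. The plotting positions (i - 0.5)/n and the reference line through the quartiles are common conventions, but packages differ slightly. With the sample on the vertical axis, right skew shows up as upper-tail points lying above the reference line.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd

# Hypothetical monthly expenditures with a long right tail.
x = pd.Series([120, 150, 160, 180, 200, 210, 230, 260, 400, 1500], dtype=float)

# Skewness > 0 indicates a longer right tail (pandas uses the adjusted estimator).
skewness = x.skew()

# Boxplot whiskers: points beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR are drawn as outliers.
q1, q3 = x.quantile([0.25, 0.75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
outliers = x[(x < q1 - 1.5 * iqr) | (x > upper_fence)]

# Normal Q-Q: sorted data against standard normal quantiles at (i - 0.5)/n.
std_normal = NormalDist()
n = len(x)
theoretical = np.array([std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)])
sample = np.sort(x.to_numpy())

# Reference line through the data and theoretical quartiles.
z1, z3 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
slope = iqr / (z3 - z1)
line = q1 + slope * (theoretical - z1)

# Positive residual at the top end: the largest value sits above the line.
tail_residual = sample[-1] - line[-1]
print(round(skewness, 2), list(outliers), round(tail_residual, 1))
```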

Analytics

Here, we took Manipur (State Code 14) for our analysis. Our first objective was to understand the
difference between the consumer expenditure patterns of male and female consumers.

After cleansing the data, we used Python to run the statistical analysis. The following steps were
followed during the analysis:

1. Importing Libraries:

• The code begins by importing the necessary Python libraries:
o pandas (as pd): Utilized for data manipulation and analysis, especially handling
tabular data.
o numpy (as np): Essential for numerical operations and computations.
o matplotlib.pyplot (as plt): Used for creating various types of plots and
visualizations.
o seaborn (as sns): Built on top of Matplotlib, Seaborn provides additional statistical
plotting capabilities.
o os: Provides functions for interacting with the operating system.
o warnings: Used to handle warning messages during the code execution.

2. Reading the Data:

• Two datasets are loaded:
o nss: Contains the NSSO data.
o Manipur_dist: Contains the names of the districts in Manipur.

3. Data Exploration:

• The code performs various operations to explore and understand the loaded data:
o Checking unique values of the 'state' column in the nss dataset.
o Subsetting data related to Manipur by filtering records where the 'state' column
matches the code for Manipur.
o Checking for outliers in the 'mpc_mrp' column using a boxplot visualization.
o Handling outliers by removing records where the 'mpc_mrp' value exceeds a
certain threshold.

4. Statistical Analysis:

• Statistical tests are conducted to analyse relationships and differences within the data:
o T-tests are performed to compare expenditure between different demographic
groups, such as male and female.
o ANOVA (Analysis of Variance) tests are conducted to analyse differences in
expenditure among multiple groups.
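Below is a minimal pandas sketch of the subsetting and outlier steps. It assumes the NSSO extract has `state` and `mpc_mrp` columns, as described above, and the values are hypothetical. The report does not state the threshold used to drop outliers. Here, the upper boxplot fence (Q3 + 1.5 × IQR) is used as an assumed stand-in for that threshold.

```python
import pandas as pd

MANIPUR_STATE_CODE = 14  # state code used in the report

# Hypothetical extract: state code and monthly per capita expenditure (MRP).
nss = pd.DataFrame({
    "state": [14, 14, 14, 14, 14, 14, 9, 14],
    "mpc_mrp": [900.0, 1000.0, 1100.0, 1050.0, 950.0, 6000.0, 1200.0, 1020.0],
})

print(sorted(nss["state"].unique()))  # state codes present in the file

# Keep only Manipur households; copy so later edits do not touch `nss`.
manipur = nss[nss["state"] == MANIPUR_STATE_CODE].copy()

# Assumed rule: drop records above the upper boxplot fence (Q3 + 1.5 * IQR).
q1, q3 = manipur["mpc_mrp"].quantile([0.25, 0.75])
threshold = q3 + 1.5 * (q3 - q1)
manipur = manipur[manipur["mpc_mrp"] <= threshold]
print(len(manipur), threshold)
```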

To understand the relationship between entertainment and education expenditure, we used R
for linear regression analysis.

1. Data Loading and Libraries:
• Several libraries are imported, including `tidyverse`, `dplyr`, `tidyr`, and `zoo`, for data
manipulation and analysis.
• Two datasets (`NSSO68.csv` and `Untitled spreadsheet-2.csv`) are loaded into data frames
(`Df1` and `d0`).
2. Handling Missing Values:
• The script checks for missing values in `Df1` and replaces them with the mean using the
`na.aggregate()` function from the `zoo` package.
3. Subsetting Data for Manipur:
• Data specific to Manipur is extracted from `Df1` based on the state code (14) and stored
in a new DataFrame called `Manipur`.
4. Identifying and Handling Outliers:
• Outliers in expenditure (`mpc_mrp`) data are identified using histogram, boxplot, and
normal Q-Q plot visualizations.
• Identified outliers are treated.
5. Summarizing Expenditure by District:
• Expenditure data is summarized by district, and the top three and bottom three districts in
terms of expenditure are identified.
6. Statistical Analysis:
• The script calculates the difference between the mean expenditure on education and
entertainment.
• A paired t-test is performed to determine if there is a significant difference between
education and entertainment expenditures.
• Simple linear regression (OLS) analysis is conducted with education expenditure as the
dependent variable and entertainment expenditure as the independent variable.
• The results of the regression analysis are summarized, including coefficients, standard
errors, t-values, p-values, and R-squared values.
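Readers working in Python can follow the same steps with pandas and SciPy. The sketch below is a Python rendering for illustration, not the group's R script. `fillna` with column means mirrors `zoo::na.aggregate()` with its default `FUN = mean`, and `scipy.stats.ttest_rel` performs the paired t-test. The column names and values are hypothetical. The report does not say how expenditure was aggregated by district, so averaging `mpc_mrp` per district is an assumption.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical Manipur records: district code, education and entertainment spend, MPCE.
df = pd.DataFrame({
    "District": [1, 1, 2, 3, 4, 5, 6, 7],
    "Education_expend": [40.0, np.nan, 55.0, 10.0, 70.0, 0.0, 35.0, 20.0],
    "entertainment": [10.0, 5.0, np.nan, 0.0, 30.0, 0.0, 15.0, 5.0],
    "mpc_mrp": [900.0, 1300.0, 1300.0, 700.0, 1500.0, 600.0, 1000.0, 800.0],
})

# Replace missing values in the numeric columns with each column's mean.
num_cols = ["Education_expend", "entertainment"]
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Average MPCE by district (assumed aggregation), then top three and bottom three.
by_district = df.groupby("District")["mpc_mrp"].mean().sort_values(ascending=False)
top3, bottom3 = by_district.head(3), by_district.tail(3)

# Difference of means, then a paired t-test (same households, two expenditures).
mean_diff = df["Education_expend"].mean() - df["entertainment"].mean()
t_stat, p_value = stats.ttest_rel(df["Education_expend"], df["entertainment"])
print(top3.index.tolist(), bottom3.index.tolist(), round(mean_diff, 2), p_value)
```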

Results
T-test Results (mpc_mrp, male vs female)

• t-statistic: 1.3451
• p-value: 0.1788

Regression Results

Dependent Variable: mpc_mrp

Independent Variable: Sex

Dep. Variable: mpc_mrp | R-squared: 0.001
Model: OLS | Adj. R-squared: 0.001
Method: Least Squares | F-statistic: 1.809
Date: Sat, 23 Jul 2022 | Prob (F-statistic): 0.179
Time: 12:38:50 | Log-Likelihood: -10351.
No. Observations: 1376 | AIC: 2.071e+04
Df Residuals: 1374 | BIC: 2.072e+04
Df Model: 1
Covariance Type: nonrobust

Term | coef | std err | t | P>|t| | [0.025 | 0.975]
Intercept | 1072.7998 | 12.685 | 84.572 | 0.000 | 1047.915 | 1097.684
C(Sex)[T.2] | -55.7257 | 41.429 | -1.345 | 0.179 | -136.997 | 25.546

Omnibus: 1136.206 | Durbin-Watson: 1.510
Prob (Omnibus): 0.000 | Jarque-Bera (JB): 35866.957
Skew: 3.659 | Prob (JB): 0.00
Kurtosis: 26.917 | Cond. No.: 3.46

ANOVA Results

Term | sum_sq | df | F | PR(>F)
C(Sex) | 3.630348e+05 | 1.0 | 1.809229 | 0.178823
Residual | 2.757029e+08 | 1374.0 | NaN | NaN

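The t-test, the regression and the ANOVA all give a p-value of about 0.179, well above the
conventional 0.05 significance level, and Sex explains almost none of the variance in mpc_mrp
(R-squared = 0.001). The variable Sex is therefore not statistically significant in explaining the
dependent variable mpc_mrp.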
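The three outputs agree because, with a single binary regressor, they are the same test. The pooled (equal-variance) two-sample t-test, the t-test on the OLS coefficient for the Sex dummy, and the one-way ANOVA all give the same p-value, with F = t² (1.345² ≈ 1.809). The sign of t only reflects which group is subtracted from which. The sketch below checks this on simulated data with NumPy and SciPy. It assumes the NSS coding 1 = male and 2 = female. It is not the group's script, which, judging from the `C(Sex)[T.2]` term, used statsmodels formulas.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated mpc_mrp values for the two groups (Sex coded 1 and 2).
male = rng.normal(1070, 450, size=200)
female = rng.normal(1020, 450, size=60)

# 1) Pooled two-sample t-test (equal variances assumed).
t_stat, p_t = stats.ttest_ind(male, female, equal_var=True)

# 2) OLS of mpc_mrp on an intercept and a dummy that is 1 when Sex == 2.
y = np.concatenate([male, female])
dummy = np.r_[np.zeros(len(male)), np.ones(len(female))]
X = np.column_stack([np.ones_like(y), dummy])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = len(y) - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_ols = beta[1] / se[1]
p_ols = 2 * stats.t.sf(abs(t_ols), dof)

# 3) One-way ANOVA on the same two groups.
f_stat, p_f = stats.f_oneway(male, female)

print(t_stat, t_ols, f_stat)
print(p_t, p_ols, p_f)
```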

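Regression Results:

Dependent Variable: Education Expenditure

Independent Variable: Entertainment

Term | Estimate | Std. Error | t value | Pr(>|t|)
(Intercept) | 38.6584 | 2.5292 | 15.285 | < 2e-16 ***
entertainment | 1.0546 | 0.1302 | 8.101 | 1.19e-15 ***

Each additional rupee of entertainment expenditure is associated with about 1.05 rupees of
additional education expenditure on average, and the slope is highly significant (p < 0.001).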
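In Python, `scipy.stats.linregress` fits the same simple OLS model as R's `lm(education ~ entertainment)`. It reports the slope, the intercept and their standard errors. The data below are simulated with a slope of about 1 to echo the estimate above; they are not the NSSO values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated household data: entertainment spend and education spend (rupees).
entertainment = rng.exponential(scale=15.0, size=500)
education = 38.0 + 1.05 * entertainment + rng.normal(0, 40.0, size=500)

# Simple OLS: education = intercept + slope * entertainment + error.
res = stats.linregress(entertainment, education)
t_slope = res.slope / res.stderr  # t value for the slope, as in R's summary()

print(f"intercept={res.intercept:.2f} (se {res.intercept_stderr:.2f})")
print(f"slope={res.slope:.3f} (se {res.stderr:.3f}), t={t_slope:.2f}, p={res.pvalue:.2e}")
print(f"R-squared={res.rvalue ** 2:.3f}")
```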

Visualization:

Here is a histogram showing the total consumption of both food and other household consumables
by district. The district index used in it is:

Index | District Name
0 | Bishnupur
1 | Chandel
2 | Churachandpur
3 | Imphal
4 | Senapati
5 | Tamenglong
6 | Thoubal
7 | Ukhrul

According to this chart, most of the consumption is concentrated in one district, Imphal, which is
the main source of demand in the state. This is in line with the population distribution of Manipur.
The second chart focuses on the top three districts, measuring their MPCE (Monthly Per Capita
Expenditure).

FIG 2

FIG 3
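The two charts rest on simple district-level aggregations. The sketch below maps the district index to names, totals food plus household consumables by district (the first chart), and ranks districts by average MPCE (the second chart). The household records are hypothetical, and averaging `mpc_mrp` within each district is an assumption about how the district MPCE was computed.

```python
import pandas as pd

# District index used in the charts (index -> district name).
DISTRICTS = dict(enumerate([
    "Bishnupur", "Chandel", "Churachandpur", "Imphal",
    "Senapati", "Tamenglong", "Thoubal", "Ukhrul",
]))

# Hypothetical household records for a few districts.
hh = pd.DataFrame({
    "district_idx": [3, 3, 3, 6, 0, 7],
    "total_food": [600.0, 650.0, 700.0, 550.0, 500.0, 450.0],
    "hhd_consumable": [30.0, 25.0, 35.0, 20.0, 20.0, 15.0],
    "mpc_mrp": [1200.0, 1100.0, 1300.0, 900.0, 950.0, 800.0],
})
hh["district"] = hh["district_idx"].map(DISTRICTS)

# Total food + household consumables by district (first chart).
consumption = (hh["total_food"] + hh["hhd_consumable"]).groupby(hh["district"]).sum()

# Top three districts by average MPCE (second chart; mean is an assumption).
top3_mpce = hh.groupby("district")["mpc_mrp"].mean().nlargest(3)

print(consumption.sort_values(ascending=False))
print(top3_mpce)
```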

Recommendations and Predictions:

1. Expenditure Patterns Analysis:

Understanding the expenditure patterns within Manipur is crucial for policymakers and
stakeholders. By identifying districts with high expenditure, we can infer areas of greater
economic activity, potential affluence, and consumer demand. This information could guide
investment decisions, infrastructure development, and resource allocation to capitalize on
economic opportunities in these districts. Conversely, districts with lower expenditure may
signal economic challenges, such as lower income levels or limited access to goods and services.
Addressing the underlying causes of lower expenditure in these areas could involve targeted
interventions aimed at stimulating economic growth, reducing poverty, and improving living
standards.

2. Gender-Based Spending Analysis:

Although the statistical tests did not find significant differences in expenditure between genders,
a deeper investigation into gender-based spending patterns is warranted. Exploring underlying
socio-economic factors that may influence spending behaviors differently between males and
females could provide valuable insights. Factors such as employment status, household roles,
access to resources, and cultural norms might contribute to variations in spending patterns.
Understanding these dynamics can inform the design of gender-sensitive policies and programs
aimed at promoting economic empowerment, reducing gender disparities, and fostering inclusive
growth.

3. Policy Implications:

Policymakers can leverage the findings from this analysis to tailor economic policies and welfare
programs more effectively to the diverse needs of Manipur's population. For instance, districts
exhibiting higher expenditure levels may require policies focused on supporting local businesses,
improving infrastructure, and enhancing access to quality education and healthcare services. In
contrast, districts with lower expenditure may benefit from targeted interventions aimed at
boosting employment opportunities, enhancing social safety nets, and addressing infrastructural
gaps to stimulate economic development and uplift living standards.

4. Data Quality and Further Analysis:

Ensuring the quality and comprehensiveness of data is critical for robust analysis and informed
decision-making. Further exploration of factors beyond gender, such as age, education level,
occupation, and household composition, can provide a more nuanced understanding of
expenditure patterns. Conducting more sophisticated analyses, such as regression models or
machine learning algorithms, can uncover complex relationships and identify key determinants
influencing expenditure behavior. Additionally, ongoing data collection efforts and collaboration
with relevant stakeholders can help enrich the dataset and enable more comprehensive analysis
in the future.

5. Longitudinal Analysis:

A longitudinal analysis spanning multiple time periods can offer valuable insights into the
dynamics of expenditure patterns over time. By tracking changes and trends in expenditure
behavior, policymakers can better anticipate future needs, identify emerging challenges, and
assess the effectiveness of past interventions. Longitudinal data can also facilitate the
identification of causal relationships and the evaluation of policy impacts, enabling evidence-
based decision-making and more efficient resource allocation for long-term planning and
sustainable development initiatives in Manipur.

Conclusion

In conclusion, the analytical methods employed in this study have provided valuable insights into
household consumer expenditure patterns in Manipur. While the statistical tests did not yield
significant differences in expenditure between genders, the application of t-tests and regression
analysis has enhanced our understanding of spending behaviour within the region.

Through exploratory data analysis, outlier detection, and statistical modelling, we have identified
key factors influencing expenditure patterns and highlighted potential areas for further
investigation. The visualization techniques employed, including histograms and boxplots, have
effectively conveyed the distribution and variability of expenditure data across districts.

Despite the limitations encountered, such as the lack of significant gender-based differences, the
analytical approach adopted has facilitated evidence-based decision-making and provided a
foundation for future research and policy development. By leveraging advanced statistical
techniques and robust data analysis methodologies, policymakers can refine interventions, target
resources more effectively, and address socio-economic disparities within Manipur.

Moving forward, continued efforts to improve data quality, expand analytical capabilities, and
incorporate longitudinal analysis will be essential for generating deeper insights and guiding
more informed policy interventions. By harnessing the power of analytics, stakeholders can drive
positive change, promote sustainable development, and enhance the well-being of communities
in Manipur.

References

• https://microdata.gov.in/nada43/index.php/catalog/123
• https://www.qualtrics.com/au/experience-management/research/anova/
• https://www.investopedia.com/terms/t/t-test.asp
• https://www.hackerearth.com/practice/machine-learning/linear-regression/multivariate-linear-regression-1/tutorial/
• https://www.learnpython.org/
• https://www.w3schools.com/r/
• https://pib.gov.in/
