Fundamentals of Data Analytics - GROUP 4
Assignment Report
Topic: Analysis of Household Consumer Expenditure
Submitted by – Group 4
Table of Contents
Introduction
Objective of the Analysis
Getting the data
Data Dictionary
About the Data - Descriptive Statistics
Analytics
Results
Visualization
Recommendations and Predictions
Conclusion
References
Introduction
Household consumer expenditure (HCE) is the expenditure incurred by households on the consumption
of goods and services. More precisely, HCE during a specified period, called the reference period,
may be defined as comprising the following:
(a) expenditure incurred by households on 'consumption goods and services' during the
reference period
(b) imputed value of goods and services produced as outputs of household (proprietary or
partnership) enterprises owned by households and used by their members themselves
during the reference period
(c) imputed value of goods and services received by households as remuneration in kind
during the reference period
(d) imputed value of goods and services received by households through social transfers in
kind received from government units or non-profit institutions serving households
(NPISHs) and used by households during the reference period.
Objectives of the consumer expenditure survey (CES): Firstly, as an indicator of the level of
living, monthly per capita expenditure (MPCE) is both simple and universally applicable. The
average MPCE of any sub-population of the country (any region or population group) is a single
number that summarizes the level of living of that population. In addition, the food (quantity)
consumption data are used to study the level of nutrition of different regions and the disparities
between them. Further, the budget shares of a commodity at different MPCE levels are used by
economists and market researchers to estimate the elasticity (responsiveness) of demand to
increases in income.
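The MPCE and elasticity ideas above can be illustrated with a short Python sketch. Here MPCE is a household's monthly consumer expenditure divided by household size. Spending on a commodity in an MPCE class is its budget share multiplied by that class's MPCE. The helper names and all figures are hypothetical. The midpoint (arc) formula is one common way to approximate an elasticity between two MPCE classes, not necessarily the method used by NSS.

```python
def mpce(monthly_household_expenditure: float, household_size: int) -> float:
    """Monthly per capita expenditure: household spending divided by household size."""
    if household_size <= 0:
        raise ValueError("household_size must be positive")
    return monthly_household_expenditure / household_size


def arc_expenditure_elasticity(mpce_low: float, share_low: float,
                               mpce_high: float, share_high: float) -> float:
    """Midpoint (arc) elasticity of spending on one commodity with respect to MPCE.

    Spending on the commodity in each MPCE class is its budget share times MPCE.
    """
    spend_low = share_low * mpce_low
    spend_high = share_high * mpce_high
    pct_change_spend = (spend_high - spend_low) / ((spend_high + spend_low) / 2)
    pct_change_mpce = (mpce_high - mpce_low) / ((mpce_high + mpce_low) / 2)
    return pct_change_spend / pct_change_mpce


# Hypothetical figures, for illustration only.
print(mpce(5000, 4))  # 1250.0
elasticity = arc_expenditure_elasticity(1000, 0.60, 2000, 0.45)
print(round(elasticity, 2))  # 0.6
```

In this example MPCE doubles while the budget share falls from 0.60 to 0.45. The resulting elasticity is below 1, which is the usual signature of a necessity such as food.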
Objective of the Analysis
• Understanding consumption trends for males and females in a region across all household goods,
and comparing them to check for a significant difference between their consumption.
• Understanding the relationship between consumer expenditure variables such as entertainment
and education expenditure.
These measurements help in understanding demand patterns for household products, since consumer
expenditure closely reflects consumer demand for those products.
Getting the data
To import the dataset of Household Consumer Expenditure from the National Data Archive
available on the NADA website (https://microdata.gov.in/nada43/index.php/home), several steps
are followed to ensure its usability for analysis:
1. Export from National Data Archive in Nesstar Explorer: Initially, the data is retrieved
from the National Data Archive using Nesstar Explorer, a tool commonly used for accessing and
managing datasets that provides a user-friendly interface for data exploration.
2. Export to Excel: The dataset is then exported from Nesstar Explorer to an Excel workbook.
3. Conversion for Analytical Purposes: Once exported to Excel, the dataset is converted to a
format suited to analytics. This step involves cleaning the data, transforming variables, and
restructuring the dataset as needed to support statistical analysis, visualization, and
modelling.
The dataset from the NSS 68th round (July 2011-June 2012) is prepared for analysis, offering
insights into household spending and economic behaviour. Key points:
• Originating from the NSS 68th round, it provides a snapshot of household consumer
expenditure during July 2011 to June 2012.
• Accessed via Nesstar Explorer from the National Data Archive, ensuring ease of use.
• Exported to an Excel workbook for manipulation, analysis, and visualization.
• Pre-processed to address data quality issues, making it suitable for statistical analysis or
modelling.
• Vital for informing policy decisions, academic research, and understanding household
welfare.
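The conversion step can be sketched in Python, assuming the Excel workbook has been saved as CSV. The rows are read with the standard `csv` module and the numeric fields are converted to numbers. Blank cells are kept as missing values. The helper `load_numeric_rows` and the column names `hhid`, `state` and `total_food` are hypothetical.

```python
import csv
import io


def load_numeric_rows(csv_text: str, numeric_columns: list[str]) -> list[dict]:
    """Parse CSV text and convert the named columns to float.

    Blank cells become None so that missing values can be handled later
    (for example by mean imputation).
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for column in numeric_columns:
            value = record[column].strip()
            record[column] = float(value) if value else None
        rows.append(record)
    return rows


# Hypothetical extract with one missing food value.
sample = "hhid,state,total_food\nH1,14,550.5\nH2,14,\n"
data = load_numeric_rows(sample, ["state", "total_food"])
print(data[0]["total_food"], data[1]["total_food"])  # 550.5 None
```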
Data Dictionary:
A data dictionary is a structured repository that provides a comprehensive description of the data
elements used in a database or information system. It typically includes metadata such as data
types, field names, definitions, constraints, and relationships, serving as a reference for
understanding and managing the database schema and its contents.
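A minimal Python sketch makes the idea concrete. The field names come from the descriptive-statistics table in this report. The types, descriptions and lower bounds are illustrative assumptions, as is the hypothetical `validate` helper that checks a record against the dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One data-dictionary entry: field name, data type, definition and constraint."""
    name: str
    dtype: type
    description: str
    min_value: Optional[float] = None


# Field names follow the descriptive-statistics table; types, descriptions
# and bounds are illustrative assumptions.
DATA_DICTIONARY = {
    spec.name: spec
    for spec in [
        FieldSpec("Age", int, "Age in completed years", 0),
        FieldSpec("total_food", float, "Expenditure on food items", 0.0),
        FieldSpec("Education_expend", float, "Expenditure on education", 0.0),
    ]
}


def validate(record: dict) -> list[str]:
    """Return the problems found when checking a record against the dictionary."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, spec.dtype):
            problems.append(f"{name}: expected {spec.dtype.__name__}")
        elif spec.min_value is not None and value < spec.min_value:
            problems.append(f"{name}: below {spec.min_value}")
    return problems


print(validate({"Age": 46, "total_food": 619.05, "Education_expend": 36.74}))  # []
print(validate({"Age": 46, "total_food": -5.0}))
# ['total_food: below 0.0', 'Education_expend: missing']
```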
About the Data - Descriptive Statistics
Variable            count   mean     std      min  25%     50%    75%     max
Age                 59119   46.58    13.24    2    36      45     55      100
total_food          59119   619.05   336.48   0    422.25  551.6  728.25  1511…
Fuel_light          59119   108.26   72.41    0    66      93.33  131.5   3148…
medical_noninst     59119   49.38    163.55   0    0       16     50      17500
entertainment       59119   13.52    37.53    0    0       0      20      4750
minor_durables      59119   4.41     44.81    0    0       0      0       6377.86
hhd_consumable      59119   25.48    24.11    0    12.5    20     31.25   2285
consumer_services   59119   63.94    101.59   0    18      40.5   76.67   7740
consumer_taxes      59119   3.50     11.15    0    0       0      2.86    575
medical_insti       59119   17.78    525.46   0    0       0      0       9000…
Durable_goods       59119   79.37    2304.53  0    0       0      7.5     40250…
Total_nonfood       59119   576.17   2525.55  0    226.83  361.6  605.5   40411…
Education_expend    59119   36.74    224.08   0    0       1.25   27.5    41194…
• Second-hand purchases of clothing, footwear, books, durables.
1. The highest average expenditure was on total_food (mean 619.05), consistent with relatively
inelastic demand for basic food items such as cereals, milk, pulses, salt and sugar.
2. Average non-institutional medical expenditure (49.38) is higher than average institutional
medical expenditure (17.78), which may indicate a lack of health insurance among much of the
Indian population.
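The hypothetical `describe` helper below shows how a summary like the table above can be computed for a single column, using only Python's `statistics` module. The table's layout suggests pandas `DataFrame.describe()` was used. This sketch assumes the same conventions: the sample standard deviation (denominator n - 1) and linearly interpolated quartiles.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """count, mean, std, min, quartiles and max for one numeric column.

    std uses the n - 1 denominator; quartiles use linear interpolation
    between order statistics (the 'inclusive' method).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical column of expenditures.
summary = describe([1, 2, 3, 4, 5])
print(summary["25%"], summary["50%"], summary["75%"])  # 2.0 3.0 4.0
```

With pandas, `df.describe()` produces these statistics for every numeric column at once.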
FIG 1
Histogram
• The distribution appears right-skewed, with a longer tail towards higher expenditure
values. This indicates that most people spend relatively little, while a small proportion
spends significantly more.
• Specific spending patterns are difficult to discern because the long tail stretches the scale of
the x-axis.
Boxplot
• The boxplot confirms the right skew: the median (the line inside the box) lies towards the left
of the box, and the right-hand whisker extends further towards higher values.
• The interquartile range (IQR), depicted by the box, shows the spread of the middle 50% of
expenditure. Outliers, the data points beyond the whiskers, indicate unusually high spending in
certain categories.
• The normal Q-Q plot suggests that the data do not follow a normal distribution: at higher
expenditure values the points bend away from the diagonal reference line (with sample quantiles on
the vertical axis, a right skew makes them rise above it), consistent with the skew seen in the
histogram and boxplot.
Overall, the data likely reflects a right-skewed distribution of consumer spending across various
categories. A small portion of the population spends significantly more than the rest on certain
goods and services.
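The skew and the boxplot outliers described above can be quantified with a small sketch. It computes the moment coefficient of skewness, which is positive for a right-skewed distribution. It also flags points beyond Tukey's fences, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR; this is the rule matplotlib's and seaborn's boxplots use by default when drawing whiskers. The expenditure values are hypothetical.

```python
import statistics


def skewness(values: list[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (positive means right skew)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def tukey_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Points outside the fences Q1 - k*IQR and Q3 + k*IQR (boxplot fliers)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


# Hypothetical per capita expenditures with a long right tail.
spend = [300, 350, 400, 420, 450, 480, 500, 550, 600, 2500]
print(skewness(spend) > 0)    # True: right-skewed
print(tukey_outliers(spend))  # [2500]
```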
Analytics
Here, we chose Manipur (state code 14) for our analysis. Our first objective was to understand the
difference between the consumer expenditure patterns of males and females.
After cleaning the data, we used Python to run the statistical analysis. The following steps were
carried out:
1. Importing Libraries:
o matplotlib.pyplot (as plt): Used for creating various types of plots and
visualizations.
o seaborn (as sns): Built on top of Matplotlib, Seaborn provides additional statistical
plotting capabilities.
o os: Provides functions for interacting with the operating system.
o warnings: Used to handle warning messages during code execution.
2. Data Exploration:
• The code performs various operations to explore and understand the loaded data:
o Checking the unique values of the 'state' column in the nss dataset.
o Subsetting the data for Manipur by filtering records where the 'state' column
matches the code for Manipur (14).
o Checking for outliers in the 'mpc_mrp' column using a boxplot.
o Handling outliers by removing records where the 'mpc_mrp' value exceeds a
chosen threshold.
3. Statistical Analysis:
• Statistical tests are conducted to analyse relationships and differences within the data:
o T-tests are performed to compare expenditure between demographic groups,
such as male and female.
o ANOVA (Analysis of Variance) tests are conducted to analyse differences in
expenditure among multiple groups.
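The Manipur subset, the outlier cut-off and the two tests listed above can be sketched with Python's standard library. The records, the sex codes (1 and 2) and the cut-off of 5000 are hypothetical assumptions; the report does not state the threshold it used. The sketch also shows why the t-test and ANOVA figures in the Results section agree: with two groups, the one-way ANOVA F statistic equals the square of the equal-variance t statistic.

```python
import statistics


def pooled_t(a: list[float], b: list[float]) -> float:
    """Two-sample Student t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def one_way_anova_f(*groups: list[float]) -> float:
    """F = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    means = [statistics.fmean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical records: (state code, sex code, mpc_mrp).
records = [(14, 1, 1200), (14, 1, 1500), (14, 1, 1350), (14, 2, 1100),
           (14, 2, 1400), (14, 2, 1250), (14, 2, 9000), (19, 1, 2000)]
THRESHOLD = 5000  # assumed outlier cut-off for mpc_mrp

# Keep Manipur (state 14) and drop records above the cut-off.
manipur = [r for r in records if r[0] == 14 and r[2] <= THRESHOLD]
male = [r[2] for r in manipur if r[1] == 1]
female = [r[2] for r in manipur if r[1] == 2]

t = pooled_t(male, female)
f = one_way_anova_f(male, female)
print(round(t, 4), round(f, 4), round(t * t, 4))  # 0.8165 0.6667 0.6667
```

For p-values, a library call such as `scipy.stats.ttest_ind(male, female, equal_var=True)` returns both the statistic and its p-value.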
The script then carries out the following steps:
1. Handling Missing Values:
• The script checks for missing values in `Df1` and replaces them with the mean using the
`na.aggregate()` function from the `zoo` package.
2. Subsetting Data for Manipur:
• Data specific to Manipur is extracted from `Df1` based on the state code (14) and stored
in a new DataFrame called `Manipur`.
3. Identifying and Handling Outliers:
• Outliers in expenditure (`mpc_mrp`) are identified using histogram, boxplot, and
normal Q-Q plot visualizations.
• Identified outliers are treated by removing records whose `mpc_mrp` exceeds a chosen
threshold.
4. Summarizing Expenditure by District:
• Expenditure data is summarized by district, and the top three and bottom three districts in
terms of expenditure are identified.
5. Statistical Analysis:
• The script calculates the difference between the mean expenditure on education and on
entertainment.
• A paired t-test is performed to determine whether there is a significant difference between
education and entertainment expenditures.
• Simple linear regression (OLS) is conducted with education expenditure as the
dependent variable and entertainment expenditure as the independent variable.
• The regression results are summarized, including coefficients, standard errors, t-values,
p-values, and R-squared values.
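The paired t-test and the simple OLS regression of education on entertainment expenditure can be sketched as follows. The two expenditure lists are hypothetical. The helpers compute only the test statistic, the fitted coefficients and R-squared; the standard errors and p-values that the script also reports are omitted for brevity.

```python
import statistics


def paired_t(x: list[float], y: list[float]) -> float:
    """Paired t statistic: mean of the differences divided by its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / len(diffs) ** 0.5)


def simple_ols(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares fit y = a + b*x; returns (intercept, slope, R-squared)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical monthly expenditures for six households.
education = [40.0, 0.0, 25.0, 60.0, 10.0, 35.0]
entertainment = [20.0, 5.0, 10.0, 30.0, 0.0, 15.0]

print(paired_t(education, entertainment))
# Education is the dependent variable, entertainment the independent one.
print(simple_ols(entertainment, education))
```

Library routines such as `scipy.stats.ttest_rel` and statsmodels' `OLS` return the same statistics together with standard errors and p-values.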
Results
T-test Results
• T-statistic: 1.3451
• P-value: 0.1788
Regression Results
Dep. Variable: mpc_mrp          R-squared: 0.001
Df Model: 1
Prob (Omnibus): 0.000           Jarque-Bera (JB): 35866.957
Anova Results
            sum_sq          df     F          PR(>F)
C(Sex)      3.630348e+05    1.0    1.809229   0.178823
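The t-test, ANOVA and regression results all suggest that the variable Sex is not statistically
significant in explaining the variance in the dependent variable mpc_mrp. The p-value of 0.1788 is
well above the conventional 0.05 level, and the regression R-squared is only 0.001. The t-test and
ANOVA agree because, for two groups, the one-way ANOVA F statistic equals the square of the
equal-variance two-sample t statistic (1.3451² ≈ 1.809).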
Regression Results:
Visualization:
Here is a histogram showing the total consumption of both food and other household consumables by
district. The district index used in this chart is:
According to this chart, most of the consumption is concentrated in one region, Imphal, which is the
main source of demand in the state. This aligns with the population data of Manipur. The second
chart focuses on the top three districts, measuring their MPCE (Monthly Per Capita Expenditure).
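The district ranking behind the second chart can be sketched by grouping `mpc_mrp` values by district and averaging them. The district labels and values are hypothetical. Averaging rather than summing per district is an assumption about how MPCE was summarized.

```python
from collections import defaultdict
from statistics import fmean


def rank_districts(rows: list[tuple[str, float]], k: int = 3):
    """Average mpc_mrp per district; return the k highest and the k lowest."""
    by_district = defaultdict(list)
    for district, value in rows:
        by_district[district].append(value)
    # Sort (mean, district) pairs from highest to lowest mean.
    averages = sorted(((fmean(v), d) for d, v in by_district.items()), reverse=True)
    top = [(d, m) for m, d in averages[:k]]
    bottom = [(d, m) for m, d in averages[-k:][::-1]]  # lowest first
    return top, bottom


# Hypothetical (district, mpc_mrp) records.
rows = [("A", 1500), ("A", 1700), ("B", 900), ("C", 1200),
        ("D", 800), ("E", 1000), ("F", 1300)]
top, bottom = rank_districts(rows)
print(top)     # [('A', 1600.0), ('F', 1300.0), ('C', 1200.0)]
print(bottom)  # [('D', 800.0), ('B', 900.0), ('E', 1000.0)]
```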
FIG 2
FIG 3
Recommendations and Predictions:
1. District-Level Expenditure Patterns:
Understanding the expenditure patterns within Manipur is crucial for policymakers and
stakeholders. By identifying districts with high expenditure, we can infer areas of greater
economic activity, potential affluence, and consumer demand. This information could guide
investment decisions, infrastructure development, and resource allocation to capitalize on
economic opportunities in these districts. Conversely, districts with lower expenditure may
signal economic challenges, such as lower income levels or limited access to goods and services.
Addressing the underlying causes of lower expenditure in these areas could involve targeted
interventions aimed at stimulating economic growth, reducing poverty, and improving living
standards.
2. Gender-Based Spending Patterns:
Although the statistical tests did not find significant differences in expenditure between genders,
a deeper investigation into gender-based spending patterns is warranted. Exploring underlying
socio-economic factors that may influence spending behaviors differently between males and
females could provide valuable insights. Factors such as employment status, household roles,
access to resources, and cultural norms might contribute to variations in spending patterns.
Understanding these dynamics can inform the design of gender-sensitive policies and programs
aimed at promoting economic empowerment, reducing gender disparities, and fostering inclusive
growth.
3. Policy Implications:
Policymakers can leverage the findings from this analysis to tailor economic policies and welfare
programs more effectively to the diverse needs of Manipur's population. For instance, districts
exhibiting higher expenditure levels may require policies focused on supporting local businesses,
improving infrastructure, and enhancing access to quality education and healthcare services. In
contrast, districts with lower expenditure may benefit from targeted interventions aimed at
boosting employment opportunities, enhancing social safety nets, and addressing infrastructural
gaps to stimulate economic development and uplift living standards.
4. Data Quality and Further Analysis:
Ensuring the quality and comprehensiveness of data is critical for robust analysis and informed
decision-making. Further exploration of factors beyond gender, such as age, education level,
occupation, and household composition, can provide a more nuanced understanding of
expenditure patterns. Conducting more sophisticated analyses, such as regression models or
machine learning algorithms, can uncover complex relationships and identify key determinants
influencing expenditure behavior. Additionally, ongoing data collection efforts and collaboration
with relevant stakeholders can help enrich the dataset and enable more comprehensive analysis
in the future.
5. Longitudinal Analysis:
A longitudinal analysis spanning multiple time periods can offer valuable insights into the
dynamics of expenditure patterns over time. By tracking changes and trends in expenditure
behavior, policymakers can better anticipate future needs, identify emerging challenges, and
assess the effectiveness of past interventions. Longitudinal data can also facilitate the
identification of causal relationships and the evaluation of policy impacts, enabling evidence-
based decision-making and more efficient resource allocation for long-term planning and
sustainable development initiatives in Manipur.
Conclusion
In conclusion, the analytical methods employed in this study have provided valuable insights into
household consumer expenditure patterns in Manipur. While the statistical tests did not yield
significant differences in expenditure between genders, the application of T-tests and regression
analysis has enhanced our understanding of spending behaviour within the region.
Through exploratory data analysis, outlier detection, and statistical modelling, we have described
how expenditure is distributed across districts and highlighted potential areas for further
investigation. The visualization techniques employed, including histograms and boxplots, conveyed
the distribution and variability of expenditure data across districts.
Although no significant gender-based difference in expenditure was found, the analytical approach
adopted provides a foundation for evidence-based decision-making, future research and policy
development. By leveraging advanced statistical techniques and robust data analysis methodologies,
policymakers can refine interventions, target resources more effectively, and address
socio-economic disparities within Manipur.
Moving forward, continued efforts to improve data quality, expand analytical capabilities, and
incorporate longitudinal analysis will be essential for generating deeper insights and guiding
more informed policy interventions. By harnessing the power of analytics, stakeholders can drive
positive change, promote sustainable development, and enhance the well-being of communities
in Manipur.
References
• https://microdata.gov.in/nada43/index.php/catalog/123
• https://www.qualtrics.com/au/experience-management/research/anova/
• https://www.investopedia.com/terms/t/t-test.asp
• https://www.hackerearth.com/practice/machine-learning/linear-regression/multivariate-linear-regression-1/tutorial/
• https://www.learnpython.org/
• https://www.w3schools.com/r/
• https://pib.gov.in/