Creation of A Wealth Index - 1
Prepared by: Lisa Hjelm, Astrid Mathiassen, Darryl Miller, Amit Wadhwa.
This guidance aims to complement the Comprehensive Food Security & Vulnerability Analysis
Guidelines (CFSVA 2009) with practical step-by-step guidance on how to create a wealth index (WI),
while providing VAM officers and other food security analysts with more general background on the
index.
Measurements of wealth
There are several ways in which wealth, economic status of households and living standards can be
measured. Income, expenditure and consumption are three common measurements.
However, there are challenges in collecting and measuring income and expenditure accurately. An
alternative is to use data on asset ownership and housing characteristics and combine this
information into a proxy indicator such as the wealth index, which is created using principal
component analysis (PCA). Asset ownership gives an indication of the longer-term economic status
of a household and is less dependent on short-term economic changes compared with other wealth
or poverty measures.
The wealth index measures relative wealth and, unlike a poverty line, is not an absolute measure of
poverty or wealth. When referring to the wealth of households based on the wealth index, we can
talk about poorer and wealthier households, but we cannot conclude who is absolutely poor or
wealthy. The wealth index quintiles divide the whole population into five equally large groups based
on their wealth rank. For example, in an area where only 10% of households fall below the poverty
line, 40% of households will still fall into the two poorest quintiles and therefore be classified among
the poorest.
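The arithmetic in the example above can be checked with a small sketch. It assumes 100 hypothetical households with made-up scores and a hypothetical poverty line; the helper `rank_quintiles` is illustrative only and ignores survey weights.

```python
# Hypothetical example: quintiles are relative ranks, so the two poorest
# quintiles always contain 40% of households, whatever the poverty rate.

def rank_quintiles(scores):
    """Return a quintile (1 = poorest, 5 = wealthiest) for each score.

    Unweighted sketch: households are sorted by score and split into five
    equally sized groups; tied scores are separated by their input order.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    quintiles = [0] * n
    for rank, i in enumerate(order):
        quintiles[i] = rank * 5 // n + 1
    return quintiles


# 100 hypothetical households with scores 1..100; a hypothetical poverty
# line of 10.5 puts only the 10 lowest-scoring households below it.
scores = list(range(1, 101))
quintiles = rank_quintiles(scores)
share_poor = sum(s < 10.5 for s in scores) / len(scores)
share_bottom_two = sum(q <= 2 for q in quintiles) / len(scores)
print(share_poor, share_bottom_two)  # 0.1 0.4
```

Only 10% of these households are below the line, yet 40% sit in the two poorest quintiles, because the quintile cut-offs depend on rank alone.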
The research questions related to the wealth index vary according to the interests of the different
surveys. In the Demographic and Health Surveys (DHS), the wealth index is used because of the major
impact that wealth status has on household-level health. It allows researchers to identify the impact
of wealth status on health outcomes2. In the Multiple Indicator Cluster Surveys (MICS), the wealth
index serves a similar purpose in understanding health outcomes. It is also used to target poverty
alleviation programmes and projects3.
1 WFP, Comprehensive Food Security & Vulnerability Analysis Guidelines, 2009, page 211.
2 DHS wealth index. See http://dhsprogram.com/topics/Wealth-Index.cfm
3 UNICEF (2008). Measuring child poverty. See
http://www.southampton.ac.uk/ghp3/docs/unicef/presentation2.3.pdf
Urban-rural considerations
One consideration that should be taken into account in relation to the wealth index is that wealth is
characterized by ownership of different types of assets in urban areas compared with rural areas.
Depending on the variables included in the index, the wealth measure can be biased towards urban
or rural households. One solution is to include variables that are valid as proxies of wealth in both
urban and rural areas. For example, if a high percentage of households live in urban areas and few
households practise agriculture, we may consider excluding productive assets and livestock.
If living conditions in urban and rural areas are very different, another approach is to create
separate indices for urban and rural areas. The following chart shows the distribution of rural and
urban households in Uganda across the quintiles of a uniform national wealth index. We observe a
marked urban-rural divergence in relative wealth status. Analysts may need to assess whether this
distribution reflects substantial urban-rural inequality or whether it results from bias in variable
selection. If the latter is likely, we may need to construct a separate wealth index for rural and
urban households that takes into account the differences in assets owned.
[Figure: Wealth Index quintiles distribution across urban and rural areas in Uganda]
The question of when to create a separate …
1. Select variables
2. Explore variables
a. Frequencies
b. Missing values
3. Recode into binary variables
4. Principal components analysis (PCA)
5. Create wealth index quintiles
6. Graph the index
7. Select the final result and report the variables
Note: Uganda LSMS 08/09 dataset is used to demonstrate the WI creation and SPSS (Statistical
Package for the Social Sciences) procedures in this guidance.
A broad range of variables could be included in the analysis: a greater number can reduce the
sampling bias and generate a better distribution of households. The final list should be
country-specific and should capture the differences in ownership among households (see more in
Step 2).
2.a. Explore the variables by running descriptive analysis including a frequency of each variable.
As a first step in exploring the variables to include in the index, some basic data cleaning may be
needed. A household with a missing value for any of the assets will be excluded from the wealth
index construction. If a substantial proportion of missing values is detected, analysts should
check the data quality again and, if possible, go back to the enumerators to ensure accurate data
collection and entry.
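The missing-value check can be sketched as follows, assuming each household is a dict of asset variables in which `None` (or an absent key) marks a missing answer. The variable names and records are hypothetical; the sketch mirrors the listwise exclusion described above.

```python
# Sketch of the missing-value check (step 2.b). Households are dicts;
# None or an absent key counts as a missing answer. Data are hypothetical.

def missing_report(households, variables):
    """Count missing values per variable and return the households that
    survive listwise exclusion (no asset variable missing)."""
    per_variable = {v: sum(h.get(v) is None for h in households) for v in variables}
    complete = [h for h in households if all(h.get(v) is not None for v in variables)]
    return per_variable, complete


households = [
    {"radio": 1, "bicycle": 0, "mobile_phone": 1},
    {"radio": None, "bicycle": 1, "mobile_phone": 0},
    {"radio": 0, "bicycle": None, "mobile_phone": None},
    {"radio": 1, "bicycle": 1, "mobile_phone": 1},
]
variables = ["radio", "bicycle", "mobile_phone"]
per_variable, complete = missing_report(households, variables)
print(per_variable)                     # {'radio': 1, 'bicycle': 1, 'mobile_phone': 1}
print(len(households) - len(complete))  # 2
```

Each variable has only one missing answer, yet half the households drop out, which is why scattered missing values deserve attention before the PCA.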
We need to select the variables that are capable of distinguishing relatively “wealthy” households
from relatively “poor” ones. The rule of thumb is that if a variable/asset is owned by more than 95%
or less than 5% of the sample, it should be excluded from the analysis. For example, knowing that
99.2% of Ugandan households don’t own a generator will not help the analyst to distinguish between
richer and poorer households by this asset ownership (see table below). Thus, this variable will be
excluded from the index.
In the dialogue box, select the variables in the left field for which we want to run frequencies and
click the right arrow to move them into the right field called ‘Variable(s)’. The option
‘Display frequency tables’ at the bottom of the dialogue box is checked by default.
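Outside SPSS, the 5%/95% rule of thumb reduces to a prevalence filter. The sketch below assumes binary (0/1) asset variables with missing values already handled; the variable names and data are hypothetical.

```python
# Sketch of the 5%/95% rule of thumb for binary (0/1) asset variables.
# Variable names and data are hypothetical.

def prevalence(households, var):
    """Share of households with var == 1."""
    return sum(h[var] for h in households) / len(households)


def screen_by_prevalence(households, variables, low=0.05, high=0.95):
    """Keep variables owned by between `low` and `high` of the sample."""
    keep, drop = [], []
    for v in variables:
        if low <= prevalence(households, v) <= high:
            keep.append(v)
        else:
            drop.append(v)
    return keep, drop


households = [
    {"generator": 0, "radio": i % 2, "mobile_phone": int(i < 18)}
    for i in range(20)
]
keep, drop = screen_by_prevalence(households, ["generator", "radio", "mobile_phone"])
print(keep, drop)  # ['radio', 'mobile_phone'] ['generator']
```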
Applying the same rule of thumb discussed in 2.a, we run the frequencies for urban and rural areas
separately to determine which variables to use in a national wealth index. If certain assets are
owned by very few households in either urban or rural areas, we will consider not including them,
because the national index needs to represent both urban and rural households. The inclusion of
productive assets/livestock and land ownership should be reassessed if a high percentage of
households do not practise agriculture or if many households are located in urban areas.
When the dialogue box pops up, select the urban/rural categorical variable (‘urban/rural identifier’
in this database) in the left field and select the option ‘If condition is satisfied’ in the right field.
Once selected, click the button ‘If’ to customize specific conditions for the cases we want to
analyse.
Click ‘Continue’ and repeat the ‘frequency’ process in 2.a. to examine the asset ownership
frequencies of urban households. To analyse rural households, change the ‘If’ condition to
‘urban/rural identifier=0’ and follow the same procedure as above.
**After completing the steps above, remember to turn off the selection by going to Data > Select Cases
and clicking ‘All cases’.
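The urban/rural screening can be sketched by applying the same prevalence rule within each area and flagging any variable that fails in either one. The sketch assumes a hypothetical `urban` field coded 1 = urban and 0 = rural (matching the SPSS condition above), binary asset variables, and at least one household in each area; all data are made up.

```python
# Sketch: run the 5%/95% screen separately for urban and rural households
# and flag variables that fail in either area. Data are hypothetical.

def prevalence(rows, var):
    return sum(r[var] for r in rows) / len(rows)


def screen_by_area(households, variables, low=0.05, high=0.95):
    """Return {variable: [areas where its prevalence is outside the band]}."""
    areas = {
        "urban": [h for h in households if h["urban"] == 1],
        "rural": [h for h in households if h["urban"] == 0],
    }
    flagged = {}
    for v in variables:
        failing = [name for name, rows in areas.items()
                   if not low <= prevalence(rows, v) <= high]
        if failing:
            flagged[v] = failing
    return flagged


households = (
    [{"urban": 1, "plough": 0, "radio": int(i < 6), "electricity": int(i < 9)}
     for i in range(10)]
    + [{"urban": 0, "plough": int(i < 5), "radio": int(i < 4), "electricity": 0}
       for i in range(10)]
)
print(screen_by_area(households, ["plough", "radio", "electricity"]))
# {'plough': ['urban'], 'electricity': ['rural']}
```

A productive asset such as a plough fails in urban areas and electricity fails in rural areas, which are exactly the cases where a national index might be biased.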
Questions regarding housing characteristics and access to services are commonly categorical
variables with several options. When this is the case, a decision has to be made on how to recode
these variables so that there are only two categories. When doing this, identify the alternatives that
are more likely to be found in wealthy households compared with poorer households. How the
variables should be recoded depends on the context of the country and what is more likely to
distinguish poorer households from wealthier households. For instance, if the light source has many
options, such as “none,” “wood fire,” “oil lamp,” “petrol light,” “electricity,” it might be appropriate
to recode this into “none/primitive” for those households that answered “none” or “wood fire” and
“purchased energy source,” for the remaining light sources. In another country, the same options
might be recoded differently: “no electric” versus “electric.”
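The two groupings of the light-source example can be written as explicit mappings. This is a sketch: the labels come from the example above, but the mapping names are hypothetical and real data would use the survey's response codes. Keeping the grouping in a mapping also serves as a written record of the recoding.

```python
# Sketch of recoding a multi-category answer into a binary variable.
# Mapping names are hypothetical; labels follow the light-source example.

LIGHT_SOURCE_PURCHASED = {
    "none": 0, "wood fire": 0,                           # none/primitive
    "oil lamp": 1, "petrol light": 1, "electricity": 1,  # purchased energy source
}

LIGHT_SOURCE_ELECTRIC = {
    "none": 0, "wood fire": 0, "oil lamp": 0, "petrol light": 0,  # no electric
    "electricity": 1,                                              # electric
}


def recode(answer, mapping):
    """Return the 0/1 code; a KeyError flags an answer the mapping does not cover."""
    return mapping[answer]


answers = ["none", "wood fire", "oil lamp", "petrol light", "electricity"]
print([recode(a, LIGHT_SOURCE_PURCHASED) for a in answers])  # [0, 0, 1, 1, 1]
print([recode(a, LIGHT_SOURCE_ELECTRIC) for a in answers])   # [0, 0, 0, 0, 1]
```

Raising an error on unexpected answers, rather than silently sending them to one group, makes gaps in the recoding visible.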
The choice is based both on what is more likely to define wealth and on the prevalence of both
categories: if the prevalence is between 30 and 70%, the indicator will probably help categorize
more households than if the prevalence is only 5%. Two similar variables can be combined if this
results in a combined prevalence of between 30 and 70%.
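Combining two similar variables can be sketched as an "owns either" indicator followed by a prevalence check against the 30 to 70% band. The variable names and data are hypothetical. When few households own both items, the combined prevalence is close to the sum of the two prevalences.

```python
# Sketch: combine two similar binary variables ("owns either") and check
# whether the result falls in the 30-70% band. Data are hypothetical.

def combine_either(households, var_a, var_b):
    """1 if the household owns either item, else 0."""
    return [int(h[var_a] == 1 or h[var_b] == 1) for h in households]


def in_band(values, low=0.30, high=0.70):
    """Return (prevalence, whether it lies within [low, high])."""
    p = sum(values) / len(values)
    return p, low <= p <= high


households = [{"cupboard": int(i < 2), "cabinet": int(2 <= i < 4)} for i in range(10)]
print(in_band([h["cupboard"] for h in households]))                # (0.2, False)
print(in_band(combine_either(households, "cupboard", "cabinet")))  # (0.4, True)
```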
For sanitation facilities and source of water, the UNICEF/WHO standards can be used (see table
below)4. However, recoding into improved/not improved is just one possibility. Other
classifications can be used if this is more likely to separate the poorer households from the richer in
the country context. One example is the alternative classification of source of water: bottled water
is regarded as an unimproved source since quantities are not usually large enough to supply a
household, but in reality, those who can afford to buy bottled water, especially in less developed
countries, are often wealthier. So we may consider including bottled water in the improved group.
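The water-source recoding, including the option of treating bottled water as improved, can be sketched as below. The category labels are an abbreviated, hypothetical subset of the UNICEF/WHO classification; the full table and the survey's own codes should be used in practice.

```python
# Sketch: improved (1) / not improved (0) drinking-water source, with an
# optional switch that moves bottled water into the improved group.
# Labels are an abbreviated, hypothetical subset of the classification.

IMPROVED_WATER = {
    "piped into dwelling", "public tap", "borehole",
    "protected dug well", "protected spring", "rainwater",
}
UNIMPROVED_WATER = {
    "unprotected dug well", "unprotected spring", "tanker truck",
    "surface water", "bottled water",
}


def recode_water(source, bottled_as_improved=False):
    if source == "bottled water" and bottled_as_improved:
        return 1
    if source in IMPROVED_WATER:
        return 1
    if source in UNIMPROVED_WATER:
        return 0
    raise ValueError(f"unclassified water source: {source!r}")


sources = ["borehole", "surface water", "bottled water"]
print([recode_water(s) for s in sources])                            # [1, 0, 0]
print([recode_water(s, bottled_as_improved=True) for s in sources])  # [1, 0, 1]
```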
4 UNICEF & WHO. Progress on sanitation and drinking water: 2013 update. See
http://apps.who.int/iris/bitstream/10665/81245/1/9789241505390_eng.pdf
+ Surface drinking water sources include river, dam, lake, pond, stream, canal, irrigation channels.
* Sanitation facilities of an otherwise acceptable type shared between two or more households
are shared sanitation facilities. Shared facilities include public toilets.
When the variables have been reclassified, we assign the categories the values 0 and 1. It is
important to keep a record of how the variables have been recoded, which the analyst can track
and refer to during the analysis process. The record can also help the analyst adjust the recoding
over time.
5 Ibid.
When the dialogue box pops up, select the variable (‘crowding’ which indicates the number of
household members per room in this example) we want to recode from the left field and click the
According to the binary classification for ‘crowding’ (see chart on P10), click ‘Range’ in the ‘Old
Value’ field and enter ‘0’ through ‘5’. Then click ‘Value’ in the ‘New Value’ field and enter ‘0’.
This recodes ‘number of household members per room ≤ 5’ into ‘0’. Click ‘Add’ to record this
recoding.
Click ‘Continue’ to return to the dialogue box at the beginning of recoding. You can choose to
recode other variables. Click ‘OK’ to leave after all recoding is completed.
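The ‘crowding’ recode can be sketched in a few lines. The guide's chart with the full binary classification is not reproduced here, so coding every ratio above 5 as 1 is an assumption. The function names are hypothetical.

```python
# Sketch of the 'crowding' recode: members per room <= 5 -> 0 (as in the
# SPSS steps above); > 5 -> 1 is an ASSUMED complement, not from the guide.

def crowding_ratio(members, rooms):
    """Household members per room (rooms assumed > 0)."""
    return members / rooms


def recode_crowding(ratio, threshold=5):
    # A strict '>' also captures fractional ratios such as 5.5.
    return 0 if ratio <= threshold else 1


print([recode_crowding(crowding_ratio(m, r)) for m, r in [(4, 2), (10, 2), (11, 2)]])
# [0, 0, 1]
```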
PCA is a ‘data reduction’ procedure. It involves replacing many correlated variables with a smaller
set of uncorrelated ‘principal components’, which can explain much of the variance and
represent unobserved characteristics of the population. The objectives of PCA are: i) to discover
or reduce the dimensionality of the data set and ii) to identify new meaningful underlying
variables. The first principal component explains the largest proportion of the total
variance, and it is used as the wealth index to represent the household’s wealth.
In SPSS, the factor analysis procedure is used to calculate the principal component. This procedure
first standardizes the indicator variables by calculating their Z-scores. It then derives the factor
loadings and, from them, the component score coefficients. The standardized indicator values are
multiplied by these coefficients and summed to give the household wealth index. The resulting
wealth index is a continuous variable that can be used in correlations or regression models. The
higher the score, the wealthier the household.
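The procedure described above can be sketched without SPSS. The code standardizes the variables, builds the correlation matrix, and finds the first principal component by power iteration. Power iteration is a swapped-in technique, not SPSS's own algorithm. For a principal component, the score coefficients equal the loadings divided by the eigenvalue, so the resulting scores have mean 0 and variance 1. The sign convention (higher score = more assets) and the data are assumptions.

```python
import math

# Sketch of the wealth index: z-scores -> correlation matrix -> first
# principal component (power iteration) -> household scores.


def standardize(columns):
    """Z-scores per variable, using the sample standard deviation."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out.append([(x - mean) / sd for x in col])  # sd must be > 0
    return out


def correlation_matrix(z):
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(r, iterations=1000):
    """Largest eigenvalue and unit eigenvector of r by power iteration."""
    v = [1.0] * len(r)
    for _ in range(iterations):
        w = [sum(rij * vj for rij, vj in zip(row, v)) for row in r]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(vi * sum(rij * vj for rij, vj in zip(row, v)) for vi, row in zip(v, r))
    if sum(v) < 0:  # orient so that owning assets raises the score
        v = [-x for x in v]
    return eigenvalue, v


def wealth_scores(columns):
    z = standardize(columns)
    eigenvalue, v = first_component(correlation_matrix(z))
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    coefs = [l / eigenvalue for l in loadings]  # component score coefficients
    n = len(columns[0])
    scores = [sum(coefs[j] * z[j][i] for j in range(len(z))) for i in range(n)]
    return eigenvalue, loadings, scores


# Hypothetical binary assets for 8 households.
radio = [0, 0, 1, 1, 1, 1, 1, 0]
tv = [0, 0, 0, 1, 1, 1, 1, 0]
phone = [0, 1, 0, 1, 1, 1, 1, 0]
eigenvalue, loadings, scores = wealth_scores([radio, tv, phone])
print(round(eigenvalue, 3), [round(l, 3) for l in loadings])
print([round(s, 2) for s in scores])
```

Households 3 to 6, which own all three assets, receive the highest scores. Any variable with no variation must be dropped beforehand, since its standard deviation would be zero.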
The ‘Factor Analysis’ dialogue box will pop up. Click to select all the variables we want to include
in the factor analysis (all asset variables in this example) and press the arrow to move them into
the right ‘Variables’ field.
Click the ‘Descriptives’ button to enter the ‘Factor Analysis: Descriptives’ dialogue box. Check
‘Initial solution’, ‘Coefficients’, ‘KMO and Bartlett’s test of sphericity’ and ‘Anti-image’. Click
‘Continue’ to return to the dialogue box.
Click ‘Options’. Check ‘Exclude Cases Listwise’ and ‘Sorted by Size’. Click ‘Continue’.
In the correlation matrix of this example, we can see that the variables ‘Boat’ and ‘Own_cattle’ may
be considered for removal.
The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) varies between 0 and 1, and values closer
to 1 are better. A value of 0.6 is a suggested minimum. In our example, the value is 0.803, which is
satisfactory.
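For reference, the standard KMO formula compares the simple correlations r_ij between variables i and j with the partial correlations p_ij, which control for all other variables and are obtained from the inverse R^{-1} of the correlation matrix. KMO approaches 1 when partial correlations are small relative to simple correlations, that is, when the variables share common variation that a component can summarize.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}{\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}},
\qquad
p_{ij} = -\frac{a_{ij}}{\sqrt{a_{ii}\,a_{jj}}}, \quad A = (a_{ij}) = R^{-1}
```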
6 See more interpretation of SPSS PCA outputs at
http://statistics.ats.ucla.edu/stat/spss/output/principal_components.htm.
Bartlett's Test of Sphericity
    Approx. Chi-Square    14155065.78
    df                    276
    Sig.                  .000
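Bartlett's test checks the null hypothesis that the correlation matrix R is an identity matrix, meaning that the variables are uncorrelated and PCA would be pointless. Here n is the number of households, p the number of variables, and |R| the determinant of the correlation matrix. A Sig. value of .000 rejects the null hypothesis. The reported degrees of freedom also indicate how many variables entered this PCA run:

```latex
\chi^{2} = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert,
\qquad
df = \frac{p(p-1)}{2}

\frac{p(p-1)}{2} = 276 \;\Rightarrow\; p = 24 \qquad (24 \times 23 / 2 = 276)
```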
Component 1 is used as the wealth index because it accounts for the largest proportion of the
variance. In the example dataset, the PCA generates a variable labelled as ‘REGR factor score 1 for
analysis 1’ which is the wealth index.
Initial Eigenvalues: ‘Eigenvalues are the variances of the principal components. Because we
conducted our principal components analysis on the correlation matrix, the variables are
standardized, which means that each variable has a variance of 1, and the total variance is equal to
the number of variables used in the analysis, in this case, 12’.
Total: ‘This column contains the Eigenvalues. The first component will always account for the most
variance (and hence have the highest Eigenvalue)’.
Extraction sums of squared loadings: ‘The three columns of this half of the table exactly reproduce
the values given on the same row on the left side of the table. The number of rows reproduced on the
right side of the table is determined by the number of principal components whose Eigenvalues are 1
or greater’. (Sources: IDRE, UCLA)
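The eigenvalue columns quoted above can be reproduced for any correlation matrix with a short sketch. It uses the cyclic Jacobi method to get all eigenvalues of a symmetric matrix (a swapped-in technique, not SPSS's routine), converts them to percentage and cumulative percentage of variance, and counts components with an eigenvalue of 1 or greater. The 3 × 3 correlation matrix is hypothetical.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """All eigenvalues of a symmetric matrix (cyclic Jacobi), largest first."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def variance_explained(corr):
    """Rows of (component, eigenvalue, % of variance, cumulative %) and the
    number of components with eigenvalue >= 1."""
    eig = jacobi_eigenvalues(corr)
    total = sum(eig)  # equals the number of variables for a correlation matrix
    rows, cumulative = [], 0.0
    for k, lam in enumerate(eig, start=1):
        pct = 100 * lam / total
        cumulative += pct
        rows.append((k, lam, pct, cumulative))
    retained = sum(lam >= 1 for lam in eig)
    return rows, retained


corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
rows, retained = variance_explained(corr)
for k, lam, pct, cum in rows:
    print(f"{k}  {lam:.3f}  {pct:.1f}%  {cum:.1f}%")
print(retained)  # 1
```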
** The construction of the wealth index is an iterative process. To obtain the best results, we
usually need to conduct a few rounds of PCA including or excluding certain variables based on the
factor coefficient scores we see in the PCA outputs.
In the dialogue box that pops up, select the option ‘Weight cases by’. Click on the weighting
variable in the left field and click on the right arrow to move this variable into the right field called
‘Frequency Variable’.
To better understand the wealth index, which is a continuous variable, it is useful to recode it
into a categorical variable. A common way to do this is to rank the WI (the first variable created
from the PCA) into quintiles or deciles, dividing all households into five or ten equal groups. In
the SPSS demonstration, we rank the WI into quintiles.
Select ‘wealth index’ in the left field and click the arrow to move it to the right ‘Variable(s)’ field.
Then click ‘Rank Types’ to open the dialogue box. Check ‘Rank’ and ‘Ntiles’. Set the number of
Ntiles at 5 as we are ranking in quintiles.
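The quintile assignment can also be sketched outside SPSS. The sketch assumes that each quintile should hold about 20% of the weighted population, using the survey weights applied in the weighting step above. Each household is placed in the group where the midpoint of its cumulative weight falls. This is an illustrative choice, not a description of how SPSS's Rank Cases handles weights or ties. The function name and data are hypothetical.

```python
# Sketch: weighted quintiles of a wealth index. Households are sorted by
# score; each gets the group containing the midpoint of its cumulative
# weight, so groups hold roughly equal shares of the weighted population.


def weighted_quintiles(scores, weights, groups=5):
    total = sum(weights)
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ties: input order
    result = [0] * len(scores)
    cumulative = 0.0
    for i in order:
        midpoint = cumulative + weights[i] / 2
        result[i] = min(groups, int(midpoint * groups / total) + 1)
        cumulative += weights[i]
    return result


scores = [0.3, -1.2, 2.1, 0.9, -0.4, 1.5, -2.0, 0.1, 1.1, -0.8]  # hypothetical WI
weights = [1.0] * len(scores)
print(weighted_quintiles(scores, weights))  # [3, 1, 5, 4, 2, 5, 1, 3, 4, 2]
```

With equal weights this reduces to ordinary quintiles. With unequal weights, a single heavily weighted household can span a cut-off, so the groups are only approximately equal.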
To create the graph, run a cross-tabulation between the new categorical wealth index quintiles (or
deciles) and the variables (assets and housing characteristics) used in the analysis. This will show
the prevalence of households that own each selected asset in each quintile. The prevalence of each
variable should increase as the quintiles move from poorest to wealthiest. Variables for which this
is not the case should be examined further and considered for removal.
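The cross-tabulation and the check for a positive relationship can be sketched as follows. The sketch assumes each household carries a hypothetical `quintile` field (1 to 5) and binary asset variables, and that every quintile has at least one household. As a simple heuristic, it flags any variable whose prevalence drops between two adjacent quintiles. The data are made up.

```python
# Sketch: prevalence of each variable by WI quintile, flagging variables
# whose prevalence falls between adjacent quintiles. Data are hypothetical.


def prevalence_by_quintile(households, variables, groups=5):
    table = {}
    for v in variables:
        row = []
        for q in range(1, groups + 1):
            members = [h for h in households if h["quintile"] == q]
            row.append(sum(h[v] for h in members) / len(members))
        table[v] = row
    return table


def flag_non_increasing(table):
    """Variables whose prevalence drops between any two adjacent quintiles."""
    return [v for v, row in table.items() if any(b < a for a, b in zip(row, row[1:]))]


quintiles = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
tv = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cattle = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
households = [{"quintile": q, "tv": t, "cattle": c} for q, t, c in zip(quintiles, tv, cattle)]
table = prevalence_by_quintile(households, ["tv", "cattle"])
print(table["tv"])                 # [0.0, 0.0, 0.5, 1.0, 1.0]
print(flag_non_increasing(table))  # ['cattle']
```

Running the same function on urban and rural subsets separately gives the double check described in the next paragraph.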
In addition, analysts may find it very useful to run this cross-tabulation for urban and rural areas
separately as a double check on how the national index applies to both places of residence. If the
variables in either rural or urban areas show weak or opposite patterns from what we expect, we
may consider reconstructing the index or creating separate indices for urban and rural areas.
In the example graph below, the prevalence of every variable included in the PCA increases as the
quintiles go from poorest to wealthiest. This indicates that the variables included in the PCA are
appropriate.
[Figure: Prevalence of asset ownership and housing characteristics by WI quintile (Poorest, 2, 3, 4, Wealthiest), y-axis 0 to 70%. Variables: clock, television, radio, furniture (bed, table and chair), cupboard/cabinet, mobile phone, improved sanitation, improved roofing, improved floor, motorized vehicle.]
When a satisfactory wealth index has been created, it can be used for further analysis in relation
to other indicators.