Drazanova - Journal of Open Humanities Data
RESEARCH PAPER
Dramatic changes in the ethnic composition of countries in the last decades have sparked new interest among social scientists in studying and uncovering the role of ethnic diversity on social, political and economic outcomes. Yet, most ethnic fractionalization indices used by scholars to study these effects treat ethnic heterogeneity as time-invariant, thus concealing its long-term effects. However, failing to take into account historical developments in ethnic composition might seriously hinder our understanding of their effects on social, economic, and political outcomes. This paper introduces a new dataset containing an annual ethnic fractionalization index for 162 countries across all continents in the period of 1945–2013. The Historical Index of Ethnic Fractionalization (HIEF) dataset is a natural extension of previous ethnic fractionalization indices. It offers the opportunity to study the effects of ethnic fractionalization across countries and over time. The article concludes by offering some preliminary descriptive analysis of patterns of change in ethnic fractionalization over time.
conflict that can take place between two groups of equal size. The idea behind polarization indices is that ethnic conflicts will take place in countries where a large ethnic minority faces an ethnic majority. The mere existence of a large ethnic group, and/or ethnic dominance by this group, is not a sufficient condition for an ethnic conflict to develop. There also needs to be an ethnic minority that is large and not divided into many different groups. Theoretically, having a large ethnic minority is the worst possible situation, as measures of polarization reach their maximum when two equally sized groups face each other. The two measures represent two different approaches to diversity because ultimate fractionalization occurs when each individual belongs to a different group, whereas ultimate polarization occurs when there are only two types of groups. Thus, the two measures behave quite differently [15]. The HIEF dataset quantifies fractionalization rather than polarization as a first step in providing longitudinal measures of diversity. Nevertheless, since the original data also allow computing a polarization index, this might become a possible future endeavour.

In economics, the majority of studies employ a measure of ethnic fractionalization called Ethno-Linguistic Fractionalization (ELF). The ELF measure was first used in an influential article by Easterly and Levine [14], which argues that, given Africa’s high ethnic diversity and the strong link between ethnic heterogeneity and slow economic growth, these two factors played a rather important part in the explanation for the region’s “growth tragedy”. Easterly and Levine’s ELF measure is based on the work carried out by a team of Soviet ethnographers in the early 1960s and published as Atlas Narodov Mira [6]. Despite ELF’s popularity and usage by several generations of political scientists, sociologists and economists, the measure also received criticism and other fractionalization indices have been developed. Alesina, Devleeschauwer, Easterly, Kurlat, and Wacziarg [1] propose a classification that distinguishes between ethnic, linguistic and religious diversity and creates separate indices for each. Their reasoning is based on the fact that relying largely on linguistic distinctions (as the ELF does) may obscure other aspects of ethnicity like racial origin, skin colour and so forth. For instance, in many countries in South America groups are largely monolingual, yet ethnically divided. Other researchers argued that a distinction must be made between ethnically and culturally diverse groups [17] or between politically relevant ethnic groups [25].

There have also been efforts to overcome simple fractionalization measures by focusing on conjunctures with other heterogeneities, such as the index of ethnic inequality [3] that puts forward the intersection of ethnic diversity and economic inequality, or an index that combines five cleavages, namely race, language, religion, region, and income [27]. Other indices make an effort to account for the distance between groups [16], the historical depth of ethnic cleavages [10] or consider heterogeneity between individuals rather than groups [4].

As explained above, heterogeneity may be defined ethnically, religiously, linguistically, culturally, but also economically as income inequality. It is worth underlining that indices regarding ethnic composition are particularly vulnerable to criticism in their attempts to measure ethnicity. To begin with, empirical efforts to create an ethnic index require that we collect data on ethnic groups in different countries. However, there is no uniform criterion on how to define ethnicity. Group identities are complex and mostly socially constructed, which means that quantifying and measuring them is inherently problematic. There can be multiple ways to specify ethnic groups in a country, all of which may be equally valid concepts of “ethnic groups”. Moreover, even within one country, definitions of ethnicities can change over time. Questions related to the definition of diversity become even harder in comparative research that involves multiple countries, each of which has its own concept of ethnicity. These facts notwithstanding, and being aware of the possible shortcomings in constructing ethnic classifications, the HIEF dataset is largely based on an ethnic, rather than linguistic, distinction between groups.

2.1 Why change over time matters

Definitional issues aside, I argue that a major problem with a large part of the existing social science research on the effects of ethnic diversity is that diversity is often treated as time-invariant. This limits our knowledge about diversity’s long-term effects. An increase or decline in ethnic fractionalization over time might have different consequences. For instance, countries with steadily increasing ethnic diversity might be more willing to introduce institutions that effectively manage problems connected to more heterogeneity than countries with shorter histories of ethnically diverse societies or with lower average rates of change in diversity. These institutions may then mediate the relationship between ethnic diversity and social, economic, and political outcomes. Moreover, in instances such as the dissolution of multi-ethnic states, ethnic fractionalization may decrease rapidly, which poses completely different challenges to the newly homogeneous societies. Failing to consider these historical developments might seriously hinder our understanding of the effects of ethnic diversity. With HIEF, it is now possible to depict longitudinal relationships that might improve our understanding of the causal relationships between ethnic diversity and relevant outcomes. A number of studies consider changes in ethnic diversity longitudinally in several countries. However, these studies either rely on immigration estimates [20], consider only one country at a time [11], or focus on subnational units [29]. Recently, some scholars published articles that use time-varying measures of ethnic fractionalization [5, 7], but all of the indices used are much more limited than HIEF, either with regard to time-variation or countries covered. Moreover, these studies do not make their original dataset publicly available to be used by other researchers.

3. Creating the Historical Index of Ethnic Fractionalization Dataset

The original data on ethnic groups were gathered from CREG initiated by the Cline Center. The project provided information regarding the percentage of principal ethnic groups present in 162 countries annually for the period 1945–2013 [8]. The main sources for the CREG
Drazanova: Introducing the Historical Index of Ethnic Fractionalization (HIEF) Dataset Art. 6, page. 3 of 8
data were the Britannica Book of the Year, the CIA World Factbook, and the World Almanac Book of Facts [24]. In the original dataset, data were recorded from the main sources by a group of data collectors and later assessed by a group of data integrators who performed a number of checks. These checks accounted for consistency of group names and data outliers such as if there is “a group that is reported as constituting 25% of the population in one year and 35% in the next” [24: 4], and data inconsistencies when “different editions of the same source reports (sic) a group as constituting 18% of the population and 26% of the population in 1968” [24: 4]. Nevertheless, as the original dataset still contained some inconsistencies such as repeated information regarding certain ethnic groups in a single year, the original dataset had to be carefully checked and corrected.

In the HIEF dataset, the degree of ethnic fractionalization has been calculated based on the annual percentage of ethnic groups in each country using the most universally applied formula in the empirical literature, which is a decreasing transformation of the Herfindahl concentration index measured by:

$$EF_{ct} = 1 - \sum_{i=1}^{n} S_{i}^{2}$$

where $EF_{ct}$ is the level of ethnic fractionalization in country c at time t, i indexes ethnic groups, and $S_{i}$ is the proportion of the population in unit c belonging to ethnic group i (i = 1, …, n) at time t.

As described above, the ethnic fractionalization index for each country at any given year ranges from 0, where there is no ethnic fractionalization in the country and all individuals are members of the same ethnic group, to 1, where each individual in the country belongs to his or her own ethnic group.

It should be noted that, historically, who was considered as belonging to a certain ethnic group could change, reflecting the politics and science of the times. The relative meaning of being in a certain category may not be the same from one time-point to another [28], both from the societal and the individual point of view. The challenge arises especially with the introduction of categories such as “mixed race”, mestizo, mulatto and similar categories in data collection. Thus, the measures may only have “nominal equivalence” and lack “functional equivalence” [9], which makes collecting ethnicity data, and measuring changes over time, challenging.

4. Descriptive illustration of the new dataset

The HIEF dataset contains three variables, namely Country, Year and EFindex. The variable Country contains the names of countries included in the dataset. Countries that have changed their name and status are included under the official name of the country for the year in question. For example, Bosnia-Herzegovina, Croatia, Slovenia, Macedonia, and Serbia were part of the Socialist Federal Republic of Yugoslavia from 1945 until 1992, while the other Yugoslav successor states of Kosovo and Montenegro are not included in the HIEF dataset. Thus, the variable Country includes the entry “Yugoslavia” for the years 1945–1992 and five separate entries “Bosnia-Herzegovina”, “Croatia”, “Slovenia”, “Macedonia” and “Serbia” for 1993–2013. It follows that countries founded after the year 1945 are included beginning from the year in which they were officially established.

The variable Year contains the corresponding year of observation for each country, usually ranging from 1945 to 2013. As described above, a shorter time span may be included for certain countries which were yet to be founded or ceased to exist.

The variable EFindex contains the actual value of the ethnic fractionalization index in each country for all available years. Every value of the ethnic fractionalization index can be, as described above, interpreted as one minus a weighted sum of population shares $S_{i}$ where the weights are these shares themselves. Table 1 summarizes the countries and years for which the ethnic fractionalization index is available.

The HIEF dataset is made available as a .csv file and can be found along with a document briefly introducing the dataset on the Harvard Dataverse repository [12].

4.1 Comparing changes in diversity among European countries

Explorations of the new dataset illustrate the reasons why it is important to take account of historical changes in ethnic diversity within countries. Figure 1 shows the change of ethnic fractionalization over time in a sample of European countries. We can observe, for example, that Great Britain and the Netherlands had a similar level of ethnic fractionalization in 2013, but since 1949, diversity in the Netherlands has grown at a much faster pace than in Great Britain. In other words, Dutch society had to adapt to diversity more rapidly than the British. In contrast, Finland’s ethnic fractionalization has stayed quite stable over the last 50 years and is generally low.

On the other hand, many Central and East European countries are much more ethnically homogeneous than they used to be. Moreover, they became homogeneous in a short period. For instance, while former Czechoslovakia used to be an ethnically highly heterogeneous country, its successor states, Czechia and Slovakia, are much more homogeneous. Apart from separations of what used to be ethnically heterogeneous countries such as Czechoslovakia, Yugoslavia, or the Soviet Union, there are a number of reasons why one can observe changes towards more homogeneity in Central and Eastern Europe. Firstly, after the collapse of communism, many workers left their respective countries in search of new economic opportunities. Secondly, in many post-Soviet countries, Russian minorities began to feel unwelcome, resulting in return migration [18].

As one can observe in Figure 2, many African countries are highly ethnically heterogeneous with relative fractionalization stability. For instance, highly heterogeneous countries such as South Africa or Uganda have not experienced dramatic changes in fractionalization over the years. On the other hand, although its overall fractionalization is quite low, Swaziland has experienced a steady increase in heterogeneity while diversity in Tanzania and the Democratic Republic of Congo has actually declined. Thus, there might be profound political and societal differences
Table 1: Overview of countries and years covered by the Historical Ethnic Fractionalization Index.
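As a rough illustration of how an overview of this kind relates to the three variables, the sketch below applies the fractionalization formula of Section 3 to a few group shares and then lists the first and last observed year per country. The share values and the helper names `ef_index` and `coverage` are invented for illustration and are not taken from the CREG or HIEF data; only the column names Country, Year and EFindex follow the dataset description.

```python
from collections import defaultdict


def ef_index(shares):
    """Ethnic fractionalization: one minus the sum of squared population shares."""
    if any(s < 0 or s > 1 for s in shares):
        raise ValueError("each share must be a proportion between 0 and 1")
    return 1.0 - sum(s * s for s in shares)


def coverage(rows):
    """Map each country to the (first, last) year present in the rows."""
    years = defaultdict(list)
    for row in rows:
        years[row["Country"]].append(row["Year"])
    return {country: (min(ys), max(ys)) for country, ys in years.items()}


# Hypothetical group shares keyed by (country, year); NOT real CREG/HIEF values.
toy_shares = {
    ("Yugoslavia", 1992): [0.4, 0.3, 0.2, 0.1],
    ("Serbia", 1993): [0.8, 0.2],
    ("Serbia", 1994): [0.8, 0.2],
}

# One row per country-year, mirroring the Country / Year / EFindex layout.
rows = [
    {"Country": country, "Year": year, "EFindex": round(ef_index(shares), 3)}
    for (country, year), shares in toy_shares.items()
]

overview = coverage(rows)
for country, (first, last) in sorted(overview.items()):
    print(f"{country}: {first}-{last}")
```

A single group gives an index of 0 and two equally sized groups give 0.5, which matches the interpretation of the index as running from complete homogeneity towards 1.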
between these countries not only concerning current ethnic fractionalization, but also how fast the levels of their current ethnic diversity were achieved in recent years.

4.2 Dataset robustness check

To test the robustness of the HIEF dataset, three new datasets are created that add some noise to the original data. This procedure is adapted from Kolo [21]. The four datasets should not differ in a significant way. As described in detail in Kolo [21], the noise data is created by employing normal randomization, namely by replacing the original group size with a new size produced by a normally distributed random variable. This way, two alternative datasets have been created. Dataset sigma_1 uses the standard deviation of the group distribution over all observations, which is thus equal for all countries, while
Figure 3: Original HIEF values against newly created random datasets sigma_1 and sigma_2 and against reduced dataset smaller.
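The robustness procedure of Section 4.2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for HIEF or in Kolo [21]: the text does not say where the normal draw is centred, how negative draws are handled, or whether shares are rescaled, so the sketch centres the draw on the original share, clips at zero and rescales to the original total. All share values and function names are hypothetical.

```python
import random
import statistics


def ef_index(shares):
    """One minus the sum of squared population shares."""
    return 1.0 - sum(s * s for s in shares)


def add_noise(shares, sigma, rng):
    """Redraw each share from Normal(share, sigma).

    Assumptions of this sketch: negative draws are clipped to 0 and the
    result is rescaled so the shares keep their original total.
    """
    noisy = [max(0.0, rng.gauss(s, sigma)) for s in shares]
    total = sum(noisy)
    if total == 0:
        return list(shares)  # degenerate draw: keep the original shares
    scale = sum(shares) / total
    return [s * scale for s in noisy]


def drop_smallest(shares):
    """Remove the smallest group only if there are several groups and it is below 1 percent."""
    smallest = min(shares)
    if len(shares) > 1 and smallest < 0.01:
        trimmed = list(shares)
        trimmed.remove(smallest)
        return trimmed
    return list(shares)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical shares keyed by (country, year); NOT real CREG/HIEF values.
data = {
    ("A", 2000): [0.90, 0.095, 0.005],
    ("A", 2001): [0.88, 0.115, 0.005],
    ("B", 2000): [0.50, 0.30, 0.20],
    ("B", 2001): [0.45, 0.35, 0.20],
    ("C", 2000): [0.70, 0.30],
    ("C", 2001): [0.65, 0.35],
}

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# sigma_1: one standard deviation computed over all group sizes.
sigma_all = statistics.pstdev([s for shares in data.values() for s in shares])

# sigma_2: one standard deviation per country.
by_country = {}
for (country, _), shares in data.items():
    by_country.setdefault(country, []).extend(shares)
sigma_country = {c: statistics.pstdev(v) for c, v in by_country.items()}

original = [ef_index(s) for s in data.values()]
sigma_1 = [ef_index(add_noise(s, sigma_all, rng)) for s in data.values()]
sigma_2 = [ef_index(add_noise(s, sigma_country[c], rng)) for (c, _), s in data.items()]
smaller = [ef_index(drop_smallest(s)) for s in data.values()]

r_smaller = pearson(original, smaller)
```

With only six toy observations the correlations for the two noisy datasets are not meaningful; the point is the shape of the procedure, in which each variant is recomputed with the same index and then compared with the original values.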
dataset sigma_2 uses a country-specific standard deviation. Finally, as a last robustness test, a third dataset named smaller is created in which the smallest group for each country and for each year is removed. It should be noted, however, that the group is only removed if the number of groups in a country in a given year is greater than one, and the group size of the smallest group is smaller than 1 percent.

Pearson correlations between the original HIEF dataset and the three noisy datasets are all very high (sigma_1 r = 0.982; sigma_2 r = 0.974; smaller r = 1.000), confirming high congruency. Moreover, there were no statistically significant differences between the original HIEF dataset and the three noisy ones (sigma_1 t(17568) = –0.186, p = 0.852; sigma_2 t(17568) = –1.411, p = 0.158; smaller t(17568) = –0.062, p = 0.949). Figure 3 shows the values of the three noisy datasets plotted against the HIEF original data.

5. Conclusion

The aim of this article has been to describe the new Historical Index of Ethnic Fractionalization (HIEF) dataset, the procedures used for its calculation and, finally, to illustrate the importance of considering historical developments in ethnic fractionalization. Focusing on country-year estimates for the period 1945–2013, the HIEF dataset complements already existing ethnic fractionalization indices which do not take into consideration the variation of ethnic fractionalization over time. This is an
23. Lijphart A. Democracy in Plural Societies: A Comparative Exploration. 1977. New Haven, CT: Yale University Press. URL: https://www.jstor.org/stable/j.ctt1dszvhq
24. Nardulli PF, Wong CJ, Singh A, Peyton B, Bajjalieh J. The Composition of Religious and Ethnic Groups (CREG) Project. Cline Center for Democracy, University of Illinois at Urbana-Champaign. 2012. URL: http://www.nber.org/ens/feldstein/ENSA_Sources/Cline%20Center/Ethnic%20and%20Religious%20Groups/CREG-White_Paper.pdf
25. Posner DN. Measuring Ethnic Fractionalization in Africa. American Journal of Political Science. 2004; 48(4): 849–863. DOI: https://doi.org/10.1111/j.0092-5853.2004.00105.x
26. Putnam RD. E Pluribus Unum: Diversity and Community in the Twenty-First Century. The 2006 Johan Skytte Prize Lecture. Scandinavian Political Studies. 2007; 30(2): 137–174. DOI: https://doi.org/10.1111/j.1467-9477.2007.00176.x
27. Selway JS. The Measurement of Cross-Cutting Cleavages and Other Multidimensional Cleavage Structures. Political Analysis. 2011; 19(1): 48–65. DOI: https://doi.org/10.1093/pan/mpq036
28. Van Deth JW. Using Published Survey Data. In: Harkness JA, van de Vijver FJR, Mohler PP. Cross-Cultural Survey Methods. 2003; 329–346. New York, NY: Wiley.
29. Ziller C. Ethnic Diversity, Economic and Cultural Contexts, and Social Trust: Cross-Sectional and Longitudinal Evidence from European Regions, 2002–2010. Social Forces. 2014; 93(3): 1211–1240. DOI: https://doi.org/10.1093/sf/sou088
How to cite this article: Drazanova L 2020 Introducing the Historical Index of Ethnic Fractionalization (HIEF) Dataset: Accounting for Longitudinal Changes in Ethnic Diversity. Journal of Open Humanities Data, 6: 6. DOI: https://doi.org/10.5334/johd.16
Copyright: © 2020 The Author(s). This is an open-access article distributed under the terms of the Creative Commons
Attribution 4.0 Unported License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium,
provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.
Journal of Open Humanities Data is a peer-reviewed open access journal published by Ubiquity
Press