RESEARCH ARTICLE
Olle Sköld
Department of ALM, Uppsala University
Zanna Friberg
Department of ALM, Uppsala University
Daniel Löwenborg
Department of Archaeology and Ancient History, Uppsala University
Gisli Palsson
Department of Archaeology and Ancient History, Uppsala University
Isto Huvila
Department of ALM, Uppsala University
Although data reusers request information about how research data was created
and curated, this information is often non-existent or only briefly covered in data
descriptions. The need for such contextual information is particularly critical in fields
like archaeology, where old legacy data created during different time periods and
through varying methodological framings and fieldwork documentation practices retains
its value as an important information source. This article explores the presence of
contextual information in archaeological data with a specific focus on data provenance
and processing information, i.e., paradata. The purpose of the article is to identify
and explicate types of paradata in field observation documentation. The method
used is an explorative close reading of field data from an archaeological excavation
enriched with geographical metadata. The analysis covers technical and epistemological
challenges and opportunities in paradata identification, and discusses the possibility
of using identified paradata in data descriptions and for data reliability assessments.
Results show that it is possible to identify both knowledge organisation paradata
(KOP) relating to data structuring and knowledge-making paradata (KMP) relating to
fieldwork methods and interpretative processes. However, while the data contains many
traces of the research process, there is an uneven and, in some categories, low level
of structure and systematicity that complicates automated metadata and paradata
identification and extraction. The results show a need to broaden the understanding
of how structure and systematicity are used and how they impact research data in
archaeology and comparable field sciences. The insight into how a dataset's KOP and
KMP can be read is also a methodological contribution to data literacy research and
practice development. On a repository level, the results underline the need to include paradata about dataset creation, purpose, terminology, dataset-internal and dataset-external relations, and any data colloquialisms that require explanation for reusers.
Introduction
Improving the efficiency and effectiveness of scholarly work through data sharing and reuse is a central theme
in the contemporary research policy discourse (see e.g., Wilkinson et al. 2016 on the FAIR principles; the EU’s
open science policy, “Open Science” n.d.). To this end, collections of aggregated legacy data play a key role by
enabling reuse and allowing new research questions to be asked and addressed. Aggregation of data can, for
example, underpin the creation of new knowledge by facilitating comparisons and cross-analysis of results
from investigations carried out in different locations and contexts during different periods of time. These new
opportunities do, however, come paired with challenges: new research questions often instigate a need to
know more about how the data was created than what was recorded in the original metadata.
Many of the challenges associated with data reuse are common to all research disciplines working with
legacy data. The challenges are especially prominent in field sciences like archaeology that are hallmarked
by large methodological and epistemological variation, a diversity of study contexts, and long-term useful-
ness of legacy data. For example, in archaeology, knowing undocumented details about the excavation tech-
niques and tools that led to particular observations and conclusions can be crucial in new studies that use
aggregated legacy data. Such information might not have been deemed important enough to be included in
the metadata during the original investigation, but may at a later date prove to be crucial for estimating the
usability and reliability of the original findings for, for instance, cross-site comparisons (Ullah 2015).
A key step towards facilitating extended and more purposeful data reuse is to better understand and
assess the process of data creation. In this article, we focus on paradata, a subset of contextual information
that describes data creation and manipulation processes and their underpinnings, which is often left undoc-
umented in structured dataset descriptions but commonly is implicitly present and, at least to a competent
reader, to varying degrees identifiable in the data itself (Huvila 2020a; Huvila, Sköld, and Börjesson 2021).
Paradata, like descriptions of methods and tools used to produce data, is particularly interesting because it
is often central to making and communicating assessments of data reliability and an important facilitator of
productive and efficient data reuse (Faniel et al. 2013).
The aim of this article is to elucidate the challenges and opportunities of identifying paradata in field-
work data. We meet this aim by reporting the findings from an explorative exercise of extracting paradata
from a dataset exported from a fieldwork database created using Intrasis, a Swedish geographic information
system (GIS)-based information management system developed for archaeological field documentation. The
article addresses the following research questions:
RQ1. Which paradata categories are possible to identify in the fieldwork database?
RQ2. What are the technical challenges and opportunities of identifying paradata?
RQ3. What are the epistemological challenges and opportunities in treating identified paradata as evidence of data creation processes?
RQ4. What are the implications of the technical and epistemological challenges and opportunities for
using identified paradata as a basis for data reliability assessments?
The article provides insights into the desiderata and challenges of data reuse in field sciences like archaeology
and explores the possibilities and obstacles inherent in repurposing information originally created as in-field
documentation for data description and evaluation purposes. In this article, we assume a functional perspec-
tive on data reliability assessments as activities guided by the objective of evaluating whether a certain dataset is
suitable for a specific reuse purpose. From this perspective, reliability assessments can evaluate different
types of data reliability such as internal reliability (is data commonly acceptable), relative reliability (is data
acceptable to the user), and absolute reliability (does data resemble reality) (Agmon and Ahituv 1987). Each
type of reliability assessment draws on different variables internal or external to the dataset and sets different
limits for acceptable versus non-acceptable levels of reliability. The paper does not go as far as elaborating on
how each type of paradata serves the different types of data reliability assessments. Still, the paper contrib-
utes towards the advancement of research data description, provides a conceptual foundation for explorative
studies of how paradata could be used in different types of data reliability assessments, and furthers data
literacy by explicating principles and ad-hoc solutions underpinning dataset structuring and content.
Literature Review
of knowing, as a data creator or indeed as repository staff, what high quality is for a potential reuser and how
to go about providing good or even good enough data. Quality is not a universally defined feature of data.
Data quality can be about accuracy or completeness but also relate to suitability, relevance, and availability
and depend on consistency, verifiability, or believability (Koesten et al. 2020; Huvila 2020b).
In order to understand and judge data quality, data reusers need basic or intrinsic metadata (e.g., the
identity of data creator, date of creation, and terms of use), information on the context and methods of data
creation (Kim, Yakel, and Faniel 2019), and processual information (i.e., paradata)—for example, declaration
of selection and exclusion decisions made during the research process (Allison 2008)—about the data
creation. Koesten et al. (2020) identify common features associated with the reuse potential of datasets in
an extensive review of data reuse literature across several disciplines. The difficulties and importance of
understanding the context of data creation are mentioned in multiple studies. There is a consensus on the
critical significance of methodological information about data creation for informed reuse. However, Kim,
Yakel, and Faniel (2019) found in a study of data description requirements that repositories rarely require
detailed contextual information on how data is constructed or manipulated. Only a few of the repositories
surveyed in their study asked for methodology-related information on, for example, data cleaning and edits
(Kim, Yakel, and Faniel 2019). The findings point to a gap between users’ needs and what descriptions are
provided by data creators or mandated by repositories, especially in relation to process information. An
obvious remedy is to reconsider the requirements and ask for more comprehensive descriptions. Another,
ostensibly less labour-intensive and parallel approach, investigated more closely in the present article, is to see to what extent relevant process information (i.e., paradata) about, for example, methodology, versioning, and provenance can be found and extracted from the data itself.
Kim, Yakel, and Faniel (2019) note also that existing data descriptions are often incomplete, inaccurate,
and inconsistent in format and terminology. A partial explanation can be found in lacking standards and
repository requirements, but there are also several additional barriers to sharing data for reuse. As Faniel
et al. (2021) show, there are considerable differences in data practices even within research fields with strong
documentation standards, and there is great variation on an institutional, or even personal, level in what is
deemed important to document (Faniel et al. 2021; cf. Börjesson 2016). When surveying data reuse in the
Earth system sciences, Yan et al. (2020) found that a lack of incentives for investing in facilitating data reuse
was regarded as a major challenge by the study participants. Their responses indicated that institutions and
funding agencies do not encourage the time-consuming work of producing well-documented datasets.
Respondents stated also that the contemporary rapid publishing culture prioritises multiple and novel
results above fewer and more meticulously documented studies (Yan et al. 2020).
While there is some research into the challenges and opportunities of data reuse, relatively few studies
engage in a detailed exploration of the technical or epistemological issues of repurposing legacy data.
Challenges in data reuse are occasionally described in the reporting of projects using legacy data. For
instance, Sobotkova (2018) expands on the difficulties encountered in her aggregation of data for a large-
scale regional study on burial mounds. Sobotkova suggests that archaeology is one of several fields where
data production is slow and labour-intensive, which in turn decreases the pace of archaeological data shar-
ing. Data collection may require years of work in the field and in archives to derive data from multiple
sources. As a result, datasets in archaeology are rarely consistent in form or structure. Even though data
creation in archaeology is labour-intensive, Sobotkova (2018) reminds us that reusing other people’s data
often requires as much or even more work. Considerable efforts must be made to understand, test, and
reconstruct a dataset, and to delve into what Allison (2008) describes as the complex layers of selection and
exclusion present in legacy data. Additionally, finding the right sources and extracting associated data from,
for example, PDF documents can require years of experience, implicit knowledge, and informal contacts
with people with the needed expertise (Sobotkova 2018). Some of the problems of archaeological data reuse
are primarily technical. For instance, Ullah (2015) notes that errors in spatial survey data are hard to discover
and correct without proper contextual information. The findings of Atici et al. (2013) demonstrate in parallel
that technical challenges are not necessarily solvable by technical means but require interpretation. In their
study, three experts examined a zooarchaeological legacy dataset. Even if the experts all chose similar
approaches to study the data and asked similar questions when cleaning and preparing the data, their
non-identical interpretations and decisions on how to fill the lacunae in the contextual information resulted
in three substantially different datasets (Atici et al. 2013).
Sobotkova (2018) laments the perceived unattractiveness of data reuse as compared to archaeological
fieldwork and calls for the archaeological community to put more value on using and citing secondary data.
The sentiment that archaeology is too preoccupied with fieldwork at the expense of enabling future archae-
ological research is echoed in the literature on a curation crisis in archaeology (e.g., Voss 2012; Kersel 2015).
A part of the problem lies in the funding structures that, as both Voss (2012) and Sobotkova (2018) note, are
geared toward financing fieldwork and data generation rather than reuse.
Even a brief overview of data reuse in archaeology illustrates the central message of Kansa and Kansa
(2021): data use, and the accompanying need to understand the data, is as intellectually demanding as
any other research approach and a successful data (re)user needs to be adequately literate for the task.
Literacy in this context does not refer solely to technical skills but rather to the competency to under-
stand the “underlying principles and challenges of data” (Bhargava et al. 2015; cf. Kansa and Kansa
2021)—that is, to analyze and interpret the structuring of data and patterns in data file content as demon-
strated in the forthcoming analysis. A paramount reason that data (re)use takes analytical and interpre-
tative literacy is that data seldom is prêt à utiliser—ready to use as such (e.g., Ullah 2015; Atici et al. 2013;
Sobotkova 2018). A reuser needs to read data in its disciplinary context to bridge gaps in contextual
information (Atici et al. 2013) and might need supplementary information from sources external to the
dataset, knowledge of where to find such sources, and the know-how to extract information therefrom
(Sobotkova 2018).
However, archaeological fieldwork data has most commonly been produced to enable the primary analy-
sis and reporting of a single site rather than to facilitate data aggregation and reuse. Datasets have tradition-
ally not been considered standalone products or genres, and consequently they have not been defined and
prepared as fieldwork output (Huvila 2016). Data provenance and process information may exist in the
documentation produced during, for example, an archaeological excavation, but it is likely to be scattered
across fieldwork diaries, context sheets, notes on maps and drawings, site photos, and the field reports
(Huvila, Sköld, and Börjesson 2021; Huvila 2006). Metadata and process information in attached readme files, table definition files, or in the dataset itself (e.g., supplementary information in column headers)
may be sparse or non-existent.
Considering the spread of information across the documentation, an ideal approach to extract the maxi-
mum amount of paradata would be to harvest all existing documentation for provenance and process infor-
mation. When such an approach is not feasible due to the lack of documentation or access to it, or lack of
funding for manual harvesting, it is relevant to consider—as we do in the present study—if sufficient para-
data can be extracted from the relatively structured fieldwork data file alone. First, data documented during
fieldwork or analysis has the advantage of being intrinsic to the research activity as opposed to being “man-
ufactured” for descriptive purposes (cf. Silverman’s notion of “manufactured” qualitative data [2013, 32]). In
a sense, assigned dataset-level metadata distills dataset content and adds an interpretative layer. Second, a
fully developed paradata extraction approach could enable meta- and paradata extraction ranging from
single data units, through selections of units, to a full dataset. Such scaling would enable, for example, para-
data such as “X percent of the units were excavated by machine, Y percent of the units were excavated man-
ually,” giving a richer insight into the range of methods used to produce the data presented. Although such
approaches should also be treated carefully in terms of the evidentiary quality of the output, metadata
extraction based on file content would provide a basis to test and nuance manually assigned descriptors in,
for instance, metadata or reports.
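To make the scaling idea concrete, the following minimal sketch (Python, standard library only) aggregates a hypothetical per-unit excavation-method attribute into dataset-level paradata. The file path, column name, and value labels are illustrative assumptions; in the analysed export, information of this kind appears, if at all, inside the attribute array (Column L) rather than in a dedicated column.

    from collections import Counter
    import csv

    def excavation_method_summary(path, column="excavation_method"):
        # Count the values of a hypothetical per-unit attribute; the column name
        # is an assumption for illustration, not a field documented in Table 1.
        counts = Counter()
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                value = (row.get(column) or "").strip().lower()
                if value:
                    counts[value] += 1
        total = sum(counts.values())
        # Express the counts as dataset-level paradata, e.g. a result such as
        # {"machine": 62.0, "manual": 38.0} would read as "62 percent of the
        # units were excavated by machine, 38 percent manually".
        return {value: round(100 * n / total, 1) for value, n in counts.items()} if total else {}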
In addition to data proximity achieved by extraction approaches, a third potential gain for fieldwork
data description practices would be the potential to enable adaptable meta- and paradata (i.e., extraction
based on research interests) following situated information needs of data reusers. Suppose a researcher
aims to aggregate data on sifted soil samples from several locations to learn more about past aquatic envi-
ronments. High-precision information about sieving mesh size is crucial for understanding which types of
fishbones have been found and recorded (and which ones potentially slipped through the sieving mesh).
Thus, knowing about sieving methods and equipment used in the field can be critical in deciding how to
combine and compare fishbone datasets to understand aquatic environments (Olson and Walther 2007),
but is likely insignificant for a researcher with an interest in, for example, settlement patterns. Since fixed
metadata standards build on explicit and implicit assumptions about what things and processes the scheme
should describe and for what purpose, a set standard inevitably has its limitations (Börjesson, Sköld, and
Huvila 2021). Therefore, the need to find ways to generate intentionally purposed meta- and paradata
based on topical user needs grows along with the ambition to make use of legacy data and use data across
disciplinary boundaries.
Established approaches for metadata extraction rely on various methods to identify where to find certain
informative content like a volume number in a journal article PDF (Tkaczyk et al. 2015) or types of cell con-
tents in a spreadsheet (Roy et al. 2016). A common denominator is that the extraction approaches use
defined metadata categories to identify areas and segments with potential informative value. Thus, the first
step in performing paradata extraction on fieldwork datasets is to tease out what content could provide
information about research processes—for example, indications of excavation technique—and where to find
it in the dataset.
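As a hedged sketch of that first step, the snippet below screens every column of a flat CSV export for a small list of illustrative process cues (method, tool, and uncertainty expressions) and reports where matches cluster. The cue list and the assumption of a flat CSV export are ours, and any match would still need manual reading before being treated as paradata.

    import csv
    from collections import defaultdict

    # Illustrative cues only: English/Swedish expressions that may indicate
    # research-process information such as methods, tools, or uncertainty.
    PROCESS_CUES = ("sieve", "sålla", "machine", "maskin", "sample", "prov", "?", "osäker", "uncertain")

    def locate_paradata_candidates(path):
        # Count, per column, how many cells contain at least one process cue.
        hits = defaultdict(int)
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                for column, cell in row.items():
                    text = (cell or "").lower()
                    if any(cue in text for cue in PROCESS_CUES):
                        hits[column] += 1
        # Columns with many hits are candidates for closer paradata reading.
        return dict(sorted(hits.items(), key=lambda item: -item[1]))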
The next section of the article outlines our case study by first introducing the research data infrastructure
project and the data migration process necessary for making this type of archaeological fieldwork legacy
data accessible and interoperable. We continue with a descriptive analysis of the structure of, and types of
contents in, the dataset before proceeding to the explorative analysis of how the table structure and the
table content could be identified as knowledge organization paradata (KOP) and knowledge-making paradata (KMP).
2. The Urdar project is funded by the Swedish Riksbankens Jubileumsfond and is a collaboration between Uppsala University, the Swedish National Heritage Board, Umeå University, and Lund University. Read more at: https://www.arkeologi.uu.se/Research/Projects/urdar-en/.
enabling data-driven research on human prehistory and history (Larsson and Löwenborg 2020). Urdar’s
main goal is to ensure that existing data will be secured in open and safe formats (CSV and Geopackage)
so that it can be archived in the Swedish National Heritage Board e-archive and linked to the national
excavations register. The long-term goal of Urdar is that this information should be included in a permanent
research infrastructure so that researchers, heritage management professionals, and other stakeholders can
access and analyse the information in an aggregated form. The project is running from 2020 to 2023 and
is at the time of this writing in a phase of exploring available information, re-creating missing information,
and investigating ways of making the data useful outside of its original context of creation.
The bulk of digital archaeological data produced in Sweden is managed using Intrasis, a Swedish GIS-
based information management system developed for archaeological field documentation. A major chal-
lenge of working with Intrasis data is the complexity of the relational database at the core of the system.
Making the information usable for archaeologists and others with varying computational skills, while at the
same time preserving as much of the data’s inherent potential as possible, is difficult. As data in Intrasis is
only semi-structured and the design of the database varies to some extent from project to project, there is
no straightforward way of exporting information that ensures its aggregability for reuse. Intrasis has also
been designed to not enforce controlled vocabularies but to allow archaeologists flexibility in what terms to
use. Hence, archaeological information in Intrasis is heterogeneous. In Urdar, that diversity will initially be
preserved in an archival copy which is retained as a representation of the original decisions made during the
fieldwork. However, to analyse the material as a whole, a dedicated database (“Intrasis archive”) with all
information, where data can be rearranged and repurposed to suit different research questions, will be
maintained for research purposes.
Similar to the experiences described in the literature (Faniel, Frank, and Yakel 2019; Yan et al. 2020),
making Intrasis data usable beyond its context of creation requires adequate paradata—that is, documenta-
tion of data collection and management procedures. This includes, for instance, information on georefer-
encing methods used (see also Ullah 2015), the granularity of survey (e.g., how many of the identified post
holes were fully excavated), and the perceived level of certainty of interpretation (is the structure “a wall” or
“a wall?”) to evaluate and potentially correct or enrich the data before reuse (see also Sobotkova 2018). This
information is only partially organised into separate fields in the Intrasis data, which means that paradata
must be identified across data categories. In the next section, we introduce a subset of the excavation data-
base used in this paper as a testbed to explore how content with the capability to function as paradata could
be identified and its reliability assessed.
Table 1: The dataset with excavation data from Örja 1:9, compiled from data in the Intrasis excavation database together with spatial metadata produced by the Urdar project

Column | Column title (from first row) | Data type | Column content
A | pk | Integer | Running index from 1 to 24,424
B | site_name | String | Site name: "Örja 1:9 SU" for all posts
C | landskap | String | Province where site is located: "Skåne" for all posts
D | lan_name | String | County where the site is located: "Skåne län" for all posts
E | parish_name | String | Parish of the site: "Örja" for all posts
F | intrasis_archive | String (ID) | ID of the Intrasis archive file: "s2009045" for all posts
G | object_id | Integer (ID) | ID for individual documented features: series (e.g., 4XXX, 1XXXX, 4XXXX)
H | object_name | String (code) | Codes for object types: e.g., "G 16 Filling trench" (data unit 61), "JÄÅ B Filling posthole" (data unit 217), "G 20 Burnt layer" (data unit 615), numerous empty cells
I | class | String (code) | Code for feature class: e.g., "Stratigraphic object," "Find"
J | subclass | String (code) | Code for a subclass: e.g., "Dug hole, posthole" (data unit 621), "Lead" (data unit 6099), "Copper alloy" (data unit 6098), "Flintstone UV Syd" (UV Syd refers to the contractor, data unit 7190), "Stone and brick construct" (data unit 7294)
K | spatial_type | String (code) | GIS object type: Polygon, Point, Multipoint, or Polyline
L | attr_array | Attribute array | Common attributes include: e.g., "Interpretation_temp," "Interpretation," "Find retrieval technique," "Composition" (of feature), "Consistency" (e.g., "wet"), "Contamination"
M | parent_relations | Array | Stratigraphic relations to other features: e.g., "Above 58449," "Contains 56027, Above 4610"
N | child_relations | Array | Stratigraphic child relations of the feature: e.g., "Belongs to 1312, Below 5525"
O | description | String | Contains fields, e.g., "Interpretation" and "Grouping discussion" (e.g., to which building the feature belongs, data unit 13,002), with free-text descriptions
The flat and the relational database formats serve different purposes. The flattened structure is suitable
for statistical analysis and machine learning, while also being more accessible to users who do not have
experience of relational data structures. The relational format closely adheres to the way in which archaeo-
logical knowledge is produced in the field by modeling the knowledge organization of the widely used
single-context recording system (see Roskams 2001).
The structure of the dataset is described in Table 1 with column titles, the data type in the column,
and a description of the column content with examples from the data. All translations of dataset content
from Swedish to English were made for the purpose of this article. Names of individuals are replaced
by “NN.”
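For readers who want to inspect such an export themselves, the following minimal sketch loads the flattened structure described in Table 1 with pandas; the file name is hypothetical, and the emptiness check simply makes visible where the documentation is sparse (such as the partly empty object_name column).

    import pandas as pd

    # Hypothetical file name; the article does not publish the export itself.
    df = pd.read_csv("orja_1_9_flat_export.csv")

    # Columns A-O as described in Table 1 (pk, site_name, ..., description).
    print(df.dtypes)

    # Share of empty cells per column: a quick view of where the in-field
    # documentation is sparse, e.g. the partly empty object_name column.
    print(df.isna().mean().sort_values(ascending=False).round(2))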
were cross-checked and the analysis was elaborated by Börjesson and Huvila together with the Urdar project
staff who compiled the dataset, co-authors Löwenborg and Pálsson.
In future studies, this procedure could be—if possible and feasible—complemented by an additional
feedback round with the original creators of the data, the archaeologists who performed and docu-
mented the fieldwork. Doing this would also address any ethical concerns of working with data which is
legally public but was not originally intended to be published for reuse. In the current analysis,
data ethics were taken into consideration by being mindful of interpreting intentionality in the data
creation, respectful of data creators’ choices and work, and careful of not referring to personal names.
The research procedure followed relevant international and national guidelines for research ethics and
current legislation in Sweden (Swedish Research Council 2017). A formal ethics review was not applica-
ble for the study.
Unsurprisingly, the project file does not contain a specific paradata column. Only a few metadata stand-
ards document paradata explicitly in specific constructs (exceptions include CARARE; see D’Andrea and
Fernie 2013). However, the dataset contains two types of information with the capability to function as
paradata: 1) the structure of the dataset and the columns provide knowledge organization paradata (KOP) of
how the site and its physical features have been transformed into data units (KOP:DU) and into archaeolog-
ical information (KOP:AI); 2) the dataset contains traces of knowledge-making paradata (KMP)—data that
describes fieldwork and analysis processes and transformations (Table 2). Here follows a descriptive account
of the project file structure and content. In the section “Using Paradata for Reliability Assessments,” we
explain how KOP and KMP could be analysed for data evaluation purposes.
KOP:DU contains evidence and traces of how an archaeological site is transformed into information
as documentation, representation, or substitute of the site (cf. categories of documentation in, for
example, Furner 2021) by categorising and classifying the site in information objects (data units in the
dataset). It is created by relating data units to each other either hierarchically through child-parent
relationships or horizontally across the dataset and to a specific Intrasis archive file; by assigning ids,
names, descriptions, or attributes to them; and by declaring the type of a related resource (the type of
the corresponding spatial object—e.g., polygon). It is notable, however, that some categories only inform
about the aggregation of disparate data units into a single file; in this case, the index was introduced at
the compilation of the analysed dataset and does not reflect any order or chronology of in-field
observations.
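As a minimal sketch of how these hierarchical and horizontal relations could be made explicit for analysis, the snippet below parses relation cells of the form shown in Table 1, Columns M and N (e.g., "Contains 56027, Above 4610") into a simple graph keyed by object_id. The cell format is inferred from the table excerpts, not from an Intrasis specification.

    import re
    from collections import defaultdict

    def parse_relations(cell):
        # Split a relation cell such as "Contains 56027, Above 4610" into
        # (relation_type, target_id) pairs; format inferred from Table 1 examples.
        pairs = []
        for part in (cell or "").split(","):
            match = re.match(r"\s*([A-Za-z ]+?)\s+(\d+)\s*$", part)
            if match:
                pairs.append((match.group(1).strip(), int(match.group(2))))
        return pairs

    def build_relation_graph(rows):
        # rows: iterable of dicts with "object_id" and "parent_relations" keys.
        graph = defaultdict(list)
        for row in rows:
            graph[int(row["object_id"])].extend(parse_relations(row.get("parent_relations")))
        return graph

    # parse_relations("Contains 56027, Above 4610")
    # -> [('Contains', 56027), ('Above', 4610)]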
In comparable terms, KOP:AI contains evidence and traces of how a physical archaeological site is trans-
formed into archaeological information. This is done by relating archaeological features to each other. In
addition, archaeological features represented by data units are contextualised (i.e., assigned to a specific
context, that of a named archaeological site), situated in geographical space by relating them to a geograph-
ical point of reference (e.g., parish, region), and assigned a sequential id and a name (literally, naming).
Archaeological features are also documented using information types (an Intrasis archive, spatial shapes such
as polygon or point) and classified, and their aspects are described and interpreted using attributes and tex-
tual descriptions.
In addition, the dataset contains KMP, although with a wide variety of systematicity. Whereas archaeolog-
ical documentation is expected to systematise both observations (i.e., site and archaeological features) as
archaeological information and describe the investigation (transformation) process (Hodder 1989), the lat-
ter tends to be subordinate and instrumental to the aims and understanding of the results of the first one
(Gustafsson and Magnusson Staaf 2001). The immediately legible KMP elucidates interpretations and how they changed over time, documents investigation methods, and provides occasional indications of who did what
(Columns L, O) and to whom certain tasks were assigned (Column J). Moreover, the dataset contains sedi-
mented traces of knowledge-making processes in its KOP:AI that would require explanation to be legible as
sources of paradata. As an example, the assigned “object_id” (Column G) can reflect several different types
of sequences. A range of numbers (e.g., 10–19) out of a chronological sequence (e.g., 1–100) can be reserved
for a certain group of finds or a certain area of the site without the rationale behind this breakout sequence
being explicated.
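One way to surface candidate breakout sequences, sketched below under the assumption that reserved ranges appear as large jumps in the otherwise running object_id sequence, is to list the gaps and their sizes; the threshold is arbitrary, and any candidate gap still has to be interpreted against the excavation documentation.

    def find_id_gaps(object_ids, min_gap=100):
        # Return (previous_id, next_id, gap_size) for every jump in the sorted
        # object_id sequence larger than min_gap; large jumps may indicate ranges
        # reserved for a group of finds or an area of the site.
        ids = sorted(set(object_ids))
        return [
            (low, high, high - low)
            for low, high in zip(ids, ids[1:])
            if high - low > min_gap
        ]

    # find_id_gaps([4001, 4002, 4003, 10001, 10002, 40001])
    # -> [(4003, 10001, 5998), (10002, 40001, 29999)]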
Mostly, the data is not explicit about who wields description and selection power (Warner 2010) to decide
what is a site and how it is delimited (Column B)—whether it is an archaeologist with a research question,
the priorities of the land development that led to the excavation, or the premises of singling out features
(i.e., data units) in the dataset. The principal exceptions are Column J, which contains references to organi-
sations where specific find types were planned to be submitted for analysis, and Column O, which contains
questions directed to and comments signed by individual archaeologists who participated in the investiga-
tion, such as “NN: Several trenches under this trench. In this trench a lead ball F 2150” (data unit 61) or
“Comment during post-analysis by 11.03.18/NN” (data unit 284).
Table 2: Paradata categories identifiable in the analysed dataset. Columns marked with * were added to
the data after it was exported from Intrasis
Table 3: Different configurations of detailedness made up by the number of informative elements and their respective systematicity

Column H, object_name excerpt | KMP cell content | Value for data reliability assessments
"F4141 Jetton? 1700c" (data unit 23,040) | Several (4) informative elements, low degree of systematicity | Detailed information, format that requires modulation of evidence value before processing: object description uncertain ("?"); object dating general (to century)
"F4142 Tinplate" (data unit 23,039) | Few (2) informative elements, high degree of systematicity | Detailed information, ready-to-process format
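The contrast illustrated in Table 3 can also be expressed as a small parsing sketch: the function below extracts reliability-relevant cues (find number, uncertainty marker, century-level dating) from an object_name cell. The rules are assumptions derived from the two excerpts above and do not reflect a documented Intrasis convention.

    import re

    def read_object_name(cell):
        # Extract reliability-relevant cues from an object_name cell such as
        # "F4141 Jetton? 1700c". Rules are illustrative, derived from the two
        # excerpts in Table 3, not from a documented Intrasis convention.
        find_id = re.match(r"F\d+", cell)
        dating = re.search(r"\b(\d{3,4})c\b", cell)  # century-level dating, e.g., "1700c"
        return {
            "find_id": find_id.group(0) if find_id else None,
            "uncertain": "?" in cell,                # e.g., "Jetton?"
            "century_dating": dating.group(1) if dating else None,
        }

    # read_object_name("F4141 Jetton? 1700c")
    # -> {'find_id': 'F4141', 'uncertain': True, 'century_dating': '1700'}
    # read_object_name("F4142 Tinplate")
    # -> {'find_id': 'F4142', 'uncertain': False, 'century_dating': None}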
Knowledge-Making Paradata
In contrast to the KOP, which provides cues about the reliability of data through largely unwanted and
unplanned variation and disconnects, KMP provides data creators’ version of explicitly identified and
disclosed assessments, doubts, and justifications of the data’s trustworthiness. As a whole, the dataset
contains relatively little KMP, and it seems likely that the explicit descriptions follow a logic of selective
black-boxing of obvious and non-essential information and transparency concerning information that is
considered important to communicate either for personal or social reasons (Huvila, Sköld, and Börjesson
2021). Even if the dataset contains a fair amount of KOP that is readable from the data itself, a thorough
understanding of the KMP appears to require a cross-reading of the specific KMP in the dataset and a generic
description of work procedures for the project, if available in, for instance, the fieldwork report. Here the
limitations of the analysed tabular excerpt of the full Intrasis database also become explicit. The omission
of user log and event registries of the full Intrasis database in producing the analysed spreadsheet also means that a lot of potentially valuable KMP is excluded from the data for the sake of making the dataset simpler and protecting personal data. The inclusion of log and event registries would open up other KMP
identification opportunities, but the data processing would then require compliance with the General Data
Protection Regulation (GDPR).
Even if the lack of standardisation in the dataset was identified as a problem in both technical and episte-
mological senses, the rich nuances in the heterogeneous lacework of traces can be advantageous from the KMP
perspective. We suggest that a higher number of traces and expressions of uncertainty points to a more thor-
ough investigation process and pondering of a particular interpretation than the presence of a single settled
interpretation. This is obviously not a direct indication a priori of a better evidential value of such interpreta-
tions but, read in context, a potentially useful reliability indicator. From a technical perspective, the obvious
drawback is the difficulty of automating the analysis of interpretation traces and expressions of uncertainty.
we assume that they are playing a significant role for the data creators. If the goal is a dataset free from
uncertainties, these colloquialisms could be framed as problematic. However, in an epistemological sense
and from a knowledge-making perspective, these data points can be seen as heuristic zones in the dataset,
where systematic observations and interpretative reasoning meet and are negotiated. With further research
attention to data colloquialisms in field science—for example, analyses of the traces of interpretative
processes in fieldwork data—there would be a potential to better understand the role of unstructured
data for knowledge-making processes, the dynamics between unstructured and structured data, and
how process paradata could be harvested from unstructured data.
Based on the analysis, a crucial characteristic of effective paradata is its capability to uncover transforma-
tions and, in a broader sense, change. KOP does this through the crookedness of the data itself: mistakes,
errors, incompleteness, and omissions. KMP does the same through explicitly created descriptions and utter-
ances of different kinds. The major difference between the two is that the KOP is system-bound and essen-
tially an involuntary by-product of documenting in a particular system and according to a specific scheme,
whereas KMP is consciously created to provide an explanation of a transformation. Both are political, but the politics of the metalevel descriptions function primarily on the infrastructural level, through the socio-technical-informational meshwork of archaeological field practice and its infrastructures, while the politics of KMP are more explicit, explicitly social, and determined by the data creators themselves. The explicitness of the politics of KMP and the political nature of the data also mean that KMP entails ethical issues that differ from those of KOP, including the processing of personal data.
A key prerequisite for effectively exploiting KOP is to develop the means to compile profiles of individual
data creators and stages in the documentation process by identifying patterns in terminological and structural
variation in the data. This would allow users to compare the documentation process over time (how interpre-
tations and documentation practice have changed, how and if uncertainties have been solved), to compare
how data created by specific individuals differs from each other, and, subsequently, to understand the evolu-
tion of data and its dependability. Another key task is to identify elements and alphanumerical structures in the
dataset that have potential informational value for specific potential users. However, when interpreting the
terminology used, it is necessary to be mindful of the differences between individuals who have contributed
data to the dataset because even personal ideals may affect externally standardized documentation (Faniel
et al. 2021; Börjesson 2016). Another issue relates to the tracing of named or anonymous individuals. Traces of
individual archaeologists’ interactions with the database exist in a log file in the original Intrasis database but
have been removed from the export file to avoid personal data processing. Even if definite indications of individual agency have arguable advantages, analysing variation in the use of descriptive terms and in patterns of filling in attribute information could still provide enough information on personal preferences, ideals, and biases linked to professional specialisation profiles or stages of professional training.
An advisable next step in developing the paradata extraction approach tested in this study is to opera-
tionalize the iterative close reading approach with computer-aided methods, for example natural language
processing (NLP). The pre-analysis data structuring and cleaning in such processes would depend on the
goals of the analysis. Within the analysed dataset, the individual data units may be too heterogeneous, and as such unreliable, for a local or regional analysis conducted by a secondary user, even if the dataset were cross-read with all available evidence. In contrast, depending on demands regarding
the type and level of reliability, the data can be sufficient for analyses on a broader cross-regional or global
scale without refinement. Such methods development based on analyses of cell content—for example,
expressions indicating survey methods—would target KMP. However, the analysis presented in this paper
proves the value of parallel manual analysis of KOP to identify the multiple places (e.g., multiple columns)
within a data structure where similar or overlapping data can be found, like the traces of interpretative pro-
cesses found in Columns L and O, and consequently what columns an NLP-supported analysis should target.
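As a hedged illustration of such a targeted pass, the sketch below scans the free-text columns singled out by the manual KOP analysis for a few surface patterns; the patterns for retrieval method, uncertainty, and post-analysis expressions are illustrative placeholders that a trained NLP model, or patterns derived from the fieldwork report, could eventually replace.

    import re

    # Free-text columns singled out by the manual KOP analysis (Table 1, Columns L and O).
    TARGET_COLUMNS = ("attr_array", "description")

    # Illustrative English/Swedish surface patterns; assumptions for this sketch only.
    PATTERNS = {
        "retrieval_method": re.compile(r"\b(siev\w*|såll\w*|metal detector|metalldetektor)\b", re.IGNORECASE),
        "uncertainty": re.compile(r"\b(possibly|probably|uncertain|trolig\w*|osäker\w*)\b|\?", re.IGNORECASE),
        "post_analysis": re.compile(r"\b(post-analysis|efterbearbetning)\b", re.IGNORECASE),
    }

    def tag_kmp_expressions(row):
        # Return which kinds of KMP expression occur in the row's free-text columns.
        found = set()
        for column in TARGET_COLUMNS:
            text = row.get(column) or ""
            for label, pattern in PATTERNS.items():
                if pattern.search(text):
                    found.add(label)
        return sorted(found)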
Our study has obvious limitations. Observations made on the basis of a single dataset retrieved from one
database do not allow us to infer how Intrasis users in general use the system, how field scientists document
their field observations, or what additional paradata categories could be present in other databases. In the
analysis, we have looked for paradata only in the dataset itself. Compared to the analysed dataset and archaeo-
logical data in general, investigation reports contain more information on, for instance, methods and, to vary-
ing degrees, work processes (Huvila, Sköld, and Börjesson 2021). There are also other potential sources of
paradata, including administrative documents and retrieved finds. These were consciously left out of the pres-
ent study since the focus was on investigating what can and cannot be seen specifically in the data itself.
Although the above-highlighted technical and epistemological challenges in identifying paradata proba-
bly represent only a part of all conceivable hurdles, we are inclined to believe that the present analysis can
still be helpful in clarifying what kinds of challenges should be expected when dealing with a relatively
well-formed and systematically developed fieldwork dataset. The analysis also contributes to a better
understanding of only partly digested datasets—actually, most of them—that have not been created with
reuse in mind. This is, again, something that often characterises legacy fieldwork data that has not been
collected as part of a concerted effort to accumulate uniform data.
Conclusions
Our findings show that a fieldwork dataset contains information on how it came into being (i.e., paradata)
in the forms of both traces and direct evidence. The analysed dataset included KOP relating to how a
fieldsite was transformed into structured information objects and archaeological information, and KMP as
explicit and implicit descriptions of interpretative processes, how the data was created, and the fieldwork
conducted. Different levels of finality and completeness of the data in the dataset, together with internal inconsistencies in nomenclature, vocabulary, and references between data units, posed challenges from both technical and epistemological perspectives. This "roughness," however, also worked as an opportunity by revealing
traces of the data-in-making. In addition, the structural, temporal, and terminological variations in the data
provide cues to assess the reliability of the data: who did what, how specific the data is, how certain the
interpretations appear to be and how they may have changed. As a whole, even if the lack of systematicity
caused problems, its apparent advantages raised concerns about whether data can be made too clean.
Instead of essentially purifying data (cf. Atici et al. 2013), it can be more useful to describe what the data is
about, list the vocabulary used in the data, and explain how data units and concepts relate to each other.
Moreover, the findings point to the need to better understand different levels of completeness in archae-
ological data. First, there is a colossal difference between systematising and making systematic. We see in the
dataset a “sufficient systematisation” that enabled archaeologists to draw necessary conclusions during the
fieldwork project and report writing. The resulting level of systematicity that was apparently enough for a
human reader did not, however, equal the machine-readable “systematic documentation” required in the
Urdar project. Similarly, considering Hodder’s (1989) remarks on archaeological data and writing, it is con-
ceivable that such systematic documentation is not necessarily best suited for a human reader. We also
suggest that there is a difference between systematising knowledge organization (creating the database
structure) and systematising knowledge-making (populating data structure with content). In the analysed
dataset, there are examples of everything from users employing fairly structured terminology to users pop-
ulating the data structure in the least systematic way that the structure allows.
A key implication of the varied degrees of structure is that there is a dire need to broaden the understand-
ing of systematicity in archaeology and comparable field sciences. Systematicity is a gradient and, as the
analysed dataset shows, not always the ultimate goal of those who create datasets. Systematicity has a func-
tion, but only in a certain phase of the knowledge-making process. Messiness or systematicity does not make
data (un)reusable a priori but rather reusable in two very particular senses. The scalarity of systematicity calls
into question not only whether it is always reasonable to have the goal of creating FAIR data when docu-
menting fieldwork (cf. Huvila 2012) but also whether interoperability and reusability should be perpetual
goals. To quote Hodder’s famous expression about where archaeological interpretations are made, making
data truly interoperable and reusable would require that it is made such at “trowel’s edge” (Hodder 1997,
693)—i.e., when first entered in a database or scribbled on a piece of paper. Considering the apparent prev-
alence and social usefulness of certain messiness, it is apparent that making data FAIR from the field has
both advantages and disadvantages that have to be weighed against each other.
Another key implication of the scalarity of systematicity is that data reuse and aggregation needs to be
seen as a research activity that requires methodological framing (see Kansa and Kansa 2021; Sobotkova
2018). Reading the type and grade of systematicity and extracting data provenance and processing informa-
tion are undertakings that have a place within that methodological framing; they require explicit effort,
skills, and a meticulously developed approach to succeed and are a part of the data literacy (see Kansa and
Kansa 2021) of a competent data reuser.
The findings of this study also have implications for digital preservation and repository practice. Our
analysis shows the importance of supporting data descriptions that spell out what the data is about
(aboutness) and lay out the context of the used vocabulary and terminology, including any data collo-
quialisms. It also points to the importance of the relations of data units and concepts when seeking to
understand the underpinning principles of knowledge organisation and knowledge-making. When
aggregating data, it is important to compare the knowledge organisation (as in KOP) and knowledge-making
(as in KMP) principles used in the different datasets to assess whether the datasets can be aggregated for
specific purposes or not. Finally, if strict structural and terminological standards are used to enable
machine-readability of an aggregated dataset, it is crucial to keep the expressions of uncertainties, stages of
interpretation, and questions inscribed in the original data to allow reusers to understand, exploit, and be
aware of how unFAIR (or MEAN—see Huvila 2017) they potentially are.
Acknowledgements
We would like to thank the archaeologists who originally compiled the excavation database and the Urdar
project for sharing the data export. We are also grateful to Dr. Angus Graham for assisting in translating the
Swedish archaeological terminology to English. The two anonymous reviewers provided valuable feedback.
This article is part of the project CAPTURE that has received funding from the European Research Council
(ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No.
818210) and project Urdar funded by Riksbankens Jubileumsfond (In19-0135:1).
Competing Interests
The authors declare that they have no competing interests. The editors would like to note that Isto Huvila
is an editorial board member of KULA but that this article went through the same submission process,
including anonymous peer review, as all other research articles.
References
Allison, Penelope. 2008. “Dealing with Legacy Data - an Introduction.” Internet Archaeology 24. https://doi.
org/10.11141/ia.24.8.
Atici, Levent, Sarah Kansa, Justin Lev-Tov, and Eric C. Kansa. 2013. “Other People’s Data: A Demonstration of
the Imperative of Publishing Primary Data.” Journal of Archaeological Method and Theory 20 (4): 663–81.
https://doi.org/10.1007/s10816-012-9132-9.
Bhargava, Rahul, Erica Deahl, Emmanuel Letouzé, Amanda Noonan, David Sangokoya, and Natalie Shoup.
2015. “Beyond Data Literacy: Reinventing Community Engagement and Empowerment in the Age of
Data.” Data-Pop Alliance White Paper Series. New York: Internews Center for Innovation and Learning
and the MIT Media Lab Center for Civic Media. https://datapopalliance.org/item/beyond-data-litera-
cy-reinventing-community-engagement-and-empowerment-in-the-age-of-data/. Archived at: https://
perma.cc/7AGR-245E.
Boozer, Anna Lucille. 2014. “The Tyranny of Typologies: Evidential Reasoning in Romano-Egyptian Domestic
Archaeology.” In Material Evidence: Learning from Archaeological Practice, edited by Robert Chapman and
Alison Wylie, 92–109. Abingdon: Routledge.
Börjesson, Lisa. 2016. “Beyond Information Policy: Conflicting Documentation Ideals in Extra-Academic
Knowledge Making Practices.” Journal of Documentation 72 (4): 674–95. https://doi.org/10.1108/
JDOC-10-2015-0134.
Börjesson, Lisa, Olle Sköld, and Isto Huvila. 2021. “Paradata in Documentation Standards and
Recommendations for Digital Archaeological Visualisations.” Digital Culture & Society 6 (2): 1. https://
doi.org/10.14361/dcs-2020-0210.
Brown, Hannah, Helen Goodchild, and Søren M. Sindbæk. 2014. “Making Place for a Viking Fortress. An
Archaeological and Geophysical Reassessment of Aggersborg, Denmark.” Internet Archaeology 36.
https://doi.org/10.11141/ia.36.2.
D’Andrea, Andrea, and Kate Fernie. 2013. “CARARE 2.0: A Metadata Schema for 3D Cultural Objects.” In 2013
Digital Heritage International Congress (DigitalHeritage), 137–43. https://doi.org/10.1109/DigitalHeritage.
2013.6744745.
Ellis, Steven J. R. 2008. “The Use and Misuse of ‘Legacy Data’ in Identifying a Typology of Retail Outlets at
Pompeii.” Internet Archaeology 24. https://doi.org/10.11141/ia.24.4.
European Commission. n.d. “Open Science.” Accessed May 21, 2021. https://ec.europa.eu/info/
research-and-innovation/strategy/goals-research-and-innovation-policy/open-science_en. Archived at:
https://perma.cc/N22U-X2JF.
Faniel, Ixchel, Anne Austin, Sarah Whitcher Kansa, Eric Kansa, Jennifer Jacobs, and Phoebe France. 2021.
“Identifying Opportunities for Collective Curation During Archaeological Excavations.” International
Journal of Digital Curation 16 (1). https://doi.org/10.2218/ijdc.v16i1.742.
Faniel, Ixchel, Eric Kansa, Sarah Whitcher Kansa, Julianna Barrera-Gomez, and Elizabeth Yakel. 2013. “The
Challenges of Digging Data: A Study of Context in Archaeological Data Reuse.” In JCDL ’13: Proceedings
of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, 295–304. https://doi.org/
10.1145/2467696.2467712.
Faniel, Ixchel M., Rebecca D. Frank, and Elizabeth Yakel. 2019. “Context from the Data Reuser’s Point of
View.” Journal of Documentation 75 (6): 1274–97. https://doi.org/10.1108/JD-08-2018-0133.
Federal Geographic Data Committee. n.d. “Content Standard for Digital Geospatial Metadata (CSDGM).”
Accessed May 28, 2021. https://www.fgdc.gov/metadata/csdgm-standard. Archived at: https://perma.
cc/3JMX-Z3AJ.
Furner, Jonathan. 2021. Information Studies and Other Provocations. Sacramento: Litwin Books.
Geiger, R. Stuart, and David Ribes. 2011. "Trace Ethnography: Following Coordination through Documentary
Practices.” In 2011 44th Hawaii International Conference on System Sciences (HICSS), 1–10. https://doi.
org/10.1109/HICSS.2011.455.
Gitelman, Lisa, ed. 2013. “Raw Data” Is an Oxymoron. Cambridge, MA: MIT Press.
Gustafsson, Anders, and Björn Magnusson Staaf. 2001. “Rapport Om Rapporter — En Diskussion Kring
Kvalitetsbedömningar Av Arkeologiska Rapporter.” Report 2001:3. Stockholm: RAÄ.
Heitman, Carrie, Worthy Martin, and Stephen Plog. 2017. “Innovation through Large-Scale Integration of
Legacy Records: Assessing the ‘Value Added’ in Cultural Heritage Resources.” Journal on Computing and
Cultural Heritage 10 (3): 1–10. https://doi.org/10.1145/3012288.
Hodder, Ian. 1989. “Writing Archaeology: Site Reports in Context.” Antiquity 63 (239): 268–74.
Hodder, Ian. 1997. "'Always Momentary, Fluid and Flexible': Towards a Reflexive Excavation Methodology."
Antiquity 71 (273): 691–700.
Huvila, Isto. 2006. The Ecology of Information Work: A Case Study of Bridging Archaeological Work and Virtual
Reality Based Knowledge Organisation. Åbo: Åbo akademis förlag.
Huvila, Isto. 2012. “Being Formal and Flexible: Semantic Wiki as an Archaeological e-Science Infrastructure.”
In Revive the Past: Proceedings of the 39th Conference of Computer Applications and Quantitative
Methods in Archaeology, edited by Mingquan Zhou, Iza Romanowska, Zhongke Wu, Pengfei Xu, and
Philip Verhagen, 186–97. Amsterdam: Amsterdam University Press. https://doi.org/10.
1017/9789048516865.
Huvila, Isto. 2016. “Awkwardness of Becoming a Boundary Object: Mangle and Materialities of Reports,
Documentation Data, and the Archaeological Work.” The Information Society 32 (4): 280–97. https://doi.
org/10.1080/01972243.2016.1177763.
Huvila, Isto. 2017. “Being FAIR When Archaeological Information Is MEAN: Miscellaneous, Exceptional,
Arbitrary, Nonconformist.” Presentation at the Centre for Digital Heritage Conference 2017, Leiden, June
14–16, 2017. http://www.istohuvila.se/node/526.
Huvila, Isto. 2020a. “Information-Making-Related Information Needs and the Credibility of Information.”
Information Research 25 (4): paper isic2002. https://doi.org/10.47989/irisic2002.
Huvila, Isto. 2020b. “Use-Oriented Information and Knowledge Management: Information Production and
Use Practices as an Element of the Value and Impact of Information.” Journal of Information & Knowledge
Management 18 (4): 1950046. https://doi.org/10.1142/s0219649219500461.
Huvila, Isto, Olle Sköld, and Lisa Börjesson. 2021. “Documenting Information Making in Archaeological Field
Reports.” Journal of Documentation 77 (5): 1107–27. https://doi.org/10.1108/JD-11-2020-0188.
ISO. 2014. “ISO 19115-1:2014 Geographic information – Metadata – Part 1: Fundamentals.” https://www.iso.
org/cms/render/live/en/sites/isoorg/contents/data/standard/05/37/53798.html.
Kansa, Eric, and Sarah Whitcher Kansa. 2021. “Digital Data and Data Literacy in Archaeology Now and in the
New Decade.” Advances in Archaeological Practice 9 (1): 81–85. https://doi.org/10.1017/aap.2020.55.
Kersel, Morag M. 2015. “STORAGE WARS: Solving the Archaeological Curation Crisis?” Journal of Eastern
Mediterranean Archaeology and Heritage Studies 3 (1): 42–54. https://doi.org/10.5325/jeasmedarcher-
stu.3.1.0042.
Kim, Jihyun, Elizabeth Yakel, and Ixchel M. Faniel. 2019. “Exposing Standardization and Consistency Issues
in Repository Metadata Requirements for Data Deposition." College & Research Libraries. https://doi.
org/10.5860/crl.80.6.843.
Koesten, Laura, Pavlos Vougiouklis, Elena Simperl, and Paul Groth. 2020. “Dataset Reuse: Toward Translating
Principles to Practice.” Patterns 1 (8). https://doi.org/10.1016/j.patter.2020.100136.
Larsson, Åsa M., and Daniel Löwenborg. 2020. “The Digital Future of the Past - Research Potential with
Increasingly FAIR Archaeological Data.” In Re-Imagining Periphery: Archaeology and Text in Northern
Europe from Iron Age to Viking and Early Modern Periods, edited by Charlotta Hillerdal and Kristin Ilves,
61–70. Oxford: Oxbow.
Löwenborg, Daniel, Maria Jonsson, Åsa Larsson, and Johan Nordinge. 2021. “A Turn Towards the Digital. An
Overview of Swedish Heritage Information Management Today.” Internet Archaeology 58. https://doi.
org/10.11141/ia.58.19.
Montoya, Robert D., and Katherine Morrison. 2019. “Document and Data Continuity at the Glenn A. Black
Laboratory of Archaeology.” Journal of Documentation 75 (5): 1035–55. http://doi.org/10.1108/
JD-12-2018-0216.
Nadim, Tahani. 2021. “The Datafication of Nature: Data Formations and New Scales in Natural History.” Journal
of the Royal Anthropological Institute 27 (S1): 62–75. https://doi.org/10.1111/1467-9655.13480.
Olson, Carina, and Yvonne Walther. 2007. “Neolithic Cod and Herring Fisheries in the Baltic Sea, in the Light
of Fine-Mesh Sieving: A Comparative Study of Subfossil Fishbone from the Late Stone Age Sites at
Ajvide, Gotland, Sweden and Åland, Finland.” Environmental Archaeology 12 (2): 175–85.
Richards, Julian D., Ulf Jakobsson, David Novák, Benjamin Štular, and Holly Wright. 2021. “Digital Archiving
in Archaeology: The State of the Art. Introduction.” Internet Archaeology 58. https://doi.org/10.11141/
ia.58.23.
Roskams, Steve. 2001. Excavation. Cambridge: Cambridge University Press.
Roy, Sohon, Felienne Hermans, Efthimia Aivaloglou, Jos Winter, and Arie van Deursen. 2016. “Evaluating
Automatic Spreadsheet Metadata Extraction on a Large Set of Responses from MOOC Participants.” In
2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER),
135–45. https://doi.org/10.1109/SANER.2016.98.
Sabo, Katalin Schmidt, Magnus Andersson, Mats Anglert, Caroline Arcini, Adam Bolander, Torbjörn Brorsson,
Annica Cardell, Bo Knarrström, Per Lagerås, Linda Rosendahl, Fredrik Strandmark, Marie Svedin, and
Håkan Svensson. 2013. Arkeologisk Undersökning 2010 Örja 1:9 Skåne, Landskrona Kommun, Örja Socken,
Örja 1:9, Fornlämningarna Örja 9, 35, 40, 41 Och 42. Vol. 2013: 68. UV Rapport. Lund: RAÄ.
Secci, Massimiliano, Carlo Beltrame, Stefania Manfio, and Francesco Guerra. 2019. “Virtual Reality in
Maritime Archaeology Legacy Data for a Virtual Diving on the Shipwreck of the Mercurio (1812).” In
“Multidisciplinary Study of the Sarno Baths in Pompeii,” special issue, edited by Lara Maritan, Caterina
Previato, and Filippo Lorenzoni, Journal of Cultural Heritage 40 (November–December), 169–76. https://
doi.org/10.1016/j.culher.2019.05.002.
Silverman, David. 2013. A Very Short, Fairly Interesting and Reasonably Cheap Book about Qualitative Research.
London: SAGE Publications. https://doi.org/10.4135/9781526402264.
Sobotkova, Adela. 2018. “Sociotechnical Obstacles to Archaeological Data Reuse.” Advances in Archaeological
Practice 6 (2): 117–24. https://doi.org/10.1017/aap.2017.37.
Stilborg, Ole. 2021. “A Study of the Representativity of the Swedish Ceramics Analyses Published in The
Strategic Environmental Archaeology Database (SEAD).” Fornvännen 116 (2): 89–100.
Swedish Research Council. 2017. God forskningssed. Stockholm: Swedish Research Council. https://
www.vr.se/analys/rapporter/vara-rapporter/2017-08-29-god-forskningssed.html.
Tkaczyk, Dominika, Paweł Szostek, Mateusz Fedoryszak, Piotr Jan Dendek, and Łukasz Bolikowski. 2015.
“CERMINE: Automatic Extraction of Structured Metadata from Scientific Literature.” International
Journal on Document Analysis and Recognition (IJDAR) 18 (4): 317–35. https://doi.org/10.1007/
s10032-015-0249-8.
Ullah, Isaac I. T. 2015. “Integrating Older Survey Data into Modern Research Paradigms: Identifying and
Correcting Spatial Error in ‘Legacy’ Datasets.” Advances in Archaeological Practice 3 (4): 331–50. https://
doi.org/10.7183/2326-3768.3.4.331.
Voss, Barbara L. 2012. “Curation as Research. A Case Study in Orphaned and Underreported Archaeological
Collections.” Archaeological Dialogues 19 (2): 145–69. https://doi.org/10.1017/S1380203812000219.
Warner, Julian. 2010. Human Information Retrieval. Cambridge, MA: MIT Press.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie
Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and
Stewardship.” Scientific Data 3: 160018. https://doi.org/10.1038/sdata.2016.18.
Wylie, Alison. 2017. “How Archaeological Evidence Bites Back: Strategies for Putting Old Data to Work in New
Ways.” Science, Technology, & Human Values 42 (2): 203–25. https://doi.org/10.1177/0162243916671200.
Yan, An, Caihong Huang, Jian-Sin Lee, and Carole L. Palmer. 2020. “Cross-Disciplinary Data Practices in Earth
System Science: Aligning Services with Reuse and Reproducibility Priorities.” Proceedings of the Association
for Information Science and Technology 57 (1). https://doi.org/10.1002/pra2.218.
How to cite this article: Börjesson, Lisa, Olle Sköld, Zanna Friberg, Daniel Löwenborg, Gisli Palsson, and Isto
Huvila. 2022. Re-purposing Excavation Database Content as Paradata: An Explorative Analysis of Paradata
Identification Challenges and Opportunities. KULA: Knowledge Creation, Dissemination, and Preservation
Studies 6(3). https://doi.org/10.18357/kula.221
Copyright: © 2022 The Author(s). This is an open-access article distributed under the terms of the
Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original author and source are credited.
See http://creativecommons.org/licenses/by/4.0/.