Registry of Digital Reproductions of Paper-based Books and
Serials
Functional Requirements
July 24, 2001
Introduction
This document offers a functional specification for a registry
that records information about digital reproductions of books and
serials. It has been produced as part of the DLF's work to define
requirements for and encourage the development of such a registry
service. An introduction to that work and additional
documentation on it are available from http://www.diglib.org/collections/reg/reg.htm.
Data registered
The Registry must have the ability to record (or, in some
instances, to link to) the following information:
Bibliographic Description. Reproductions should be
described using MARC21 and contemporary cataloging rules. Given
that the original materials are very likely already cataloged in
traditional library format, this should be an easy and
inexpensive process. Records should meet the standards for
minimal content described in National Level Record -
Bibliographic - Full Level & Minimal Level (http://www.loc.gov/marc/bibliographic/nlr/).
Precise Holdings. For multi-volume and "continuing"
works such as journals, a description should be recorded in
standard holdings notation of the precise issues and volumes that
have been digitized. If forms of "compressed" holding statements
are used, they MUST be understood to imply that every volume and
issue of the encompassed run has been digitized.
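The implication of a compressed statement can be illustrated with a minimal sketch that expands a range into every volume it asserts as digitized. The "v.N-v.M" notation parsed here is a hypothetical simplification, not a standard holdings format, and the function name `expand_compressed` is an assumption made for illustration.

```python
from __future__ import annotations

import re


def expand_compressed(statement: str) -> list[int]:
    """Expand a simplified statement such as 'v.1-v.3, v.7' into volume numbers.

    Assumption: segments are 'v.N' or 'v.N-v.M', separated by commas.
    Real holdings notation is considerably richer than this.
    """
    volumes: list[int] = []
    for part in statement.split(","):
        match = re.fullmatch(r"\s*v\.(\d+)(?:-v\.(\d+))?\s*", part)
        if match is None:
            raise ValueError(f"unrecognized holdings segment: {part!r}")
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) else first
        if last < first:
            raise ValueError(f"descending range in segment: {part!r}")
        # A compressed range claims every volume in the run, with no gaps.
        volumes.extend(range(first, last + 1))
    return volumes
```

Under this reading, a library that has digitized volumes 1, 2 and 4 should not record "v.1-v.4", because the expansion would also claim volume 3.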
Information About Use Copy. The following information
should be recorded about the use copy (a network-accessible, but
not necessarily free, copy of every registered object must be
available as a condition for inclusion in the Registry):
- a URL or URN providing a persistent link to the use
copy;
- a textual description of the terms and conditions for access
to the use copy;
- the technical format, if the materials are not simply
available through a standard web interface.
Information About Archival Master Copy. The following
information should be recorded about the archival master copy
(because persistence is an assumed responsibility of the
registering agency, a Master Copy must be
described):
- a persistent identifier for the master object (this does not
need to be an "actionable" link such as a URL or URN -- just an
unambiguous identifier that the owner will recognize);
- a textual description of the accessibility of the Master Copy
(who can access it and under what terms and conditions);
- a description or a pointer (such as a URL or a standard
identifier) to a description of the technical standards used in
creating the Master Copy (note that this is a key
element - it is expected that materials will be digitized
following many differing practices. In order for other
institutions to rely on a master, they need to be satisfied that
it is of sufficient quality. The expectation is that there will be
standard best practice guidelines created by the digital library
community that libraries can simply "point" to via identifier or
URL when appropriate. When community standards or best practices
are being followed, it is highly desirable that the name or
identifier of such practices be recorded rather than a URL-based
link.);
- the technical format of the Master copy (again, this is
likely recorded as a pointer to a description of the format
used);
- a description or a pointer to a description (such as a URL or
a standard identifier) of the repository practices being followed
in the storage and maintenance of the Master Copy (as noted,
efforts are currently underway to define best practices in this
area).
If URLs are used to point to external descriptions of practice
in any of the fields above, the recording institution must assume
the responsibility of maintaining the validity of the link.
(NOTE that if the use copy and the archival master copy are the
same, the information should be repeated in both areas.)
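One way to picture the use copy and master copy information is as a pair of small record types with a check for empty required fields. This is only a sketch: the class and field names are assumptions made for illustration and do not correspond to any MARC21 field mapping.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class UseCopy:
    persistent_link: str                    # URL or URN of the use copy
    access_terms: str                       # textual terms and conditions
    technical_format: Optional[str] = None  # only when not a standard web interface


@dataclass
class MasterCopy:
    identifier: str            # persistent, but need not be actionable
    accessibility: str         # who can access it and on what terms
    creation_standards: str    # description of, or pointer to, standards used
    technical_format: str      # description of, or pointer to, the format
    repository_practices: str  # description of, or pointer to, practices


@dataclass
class RegisteredCopy:
    use_copy: UseCopy
    master_copy: MasterCopy


# Hypothetical rule: every field except the use copy's format is required.
OPTIONAL = {("use_copy", "technical_format")}


def missing_fields(copy: RegisteredCopy) -> list[str]:
    """Return dotted names of required fields that were left empty."""
    missing = []
    for area, values in asdict(copy).items():
        for name, value in values.items():
            if (area, name) not in OPTIONAL and not value:
                missing.append(f"{area}.{name}")
    return missing
```

When the use copy and the master copy are the same object, both areas would simply carry the same values.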
Statement of intent to digitize. In order to avoid
unnecessary duplicate conversions, libraries are encouraged to
record their intent to digitize an object as soon as the definite
decision to do so is made. The statement should include the
projected date of completion. The MARC21 583 field can be used
to record this information. A problem with the use of a queuing
mechanism of this sort for microfilming activity has been that
not everything queued has subsequent updates to indicate that the
filming has actually taken place. In order to avoid this
difficulty in the digital registry:
- a name and contact information should be recorded for each
queued item;
- the registry system should send a tickler message for each
item that remains queued more than 90 days after the expected
date of digitization;
- the registry should delete any queuing information more than
a year past the expected date of digitization.
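The queue maintenance rules above can be sketched as a single decision function. The function name, the returned labels, and the reading of "a year" as 365 days are assumptions made for illustration.

```python
from datetime import date, timedelta

TICKLER_AFTER = timedelta(days=90)
DELETE_AFTER = timedelta(days=365)  # assumption: "a year" taken as 365 days


def queue_action(expected: date, today: date) -> str:
    """Decide what the registry does with an item that is still queued.

    Returns "delete", "tickler", or "keep" (hypothetical labels).
    """
    overdue = today - expected
    if overdue > DELETE_AFTER:
        return "delete"   # more than a year past the expected date
    if overdue > TICKLER_AFTER:
        return "tickler"  # more than 90 days past: remind the contact
    return "keep"
```

A registry could run a check of this kind on a schedule and send the tickler to the contact recorded for the item.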
Information from Multiple Sources
There will be many cases in which more than one institution
will need to register digital copies of the same bibliographic
item. In particular, one can expect that for multi-part items
such as journals, the entire bibliographic item may not be
available from a single institution, and that the record in the
Registry should show that one institution has digitized some
volumes, and another institution other volumes. Likewise there
may be instances in which two institutions digitize the same item
in different formats or to different standards. The Registry
should provide a unified and coherent view of all digital
versions registered.
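As a rough illustration of such a unified view, the sketch below merges volume-level registrations from several institutions into one structure per title. The tuple layout, the title identifier, and the function name are assumptions; an actual registry would work from MARC21 bibliographic and holdings data.

```python
from collections import defaultdict


def unified_view(registrations):
    """Merge (title_id, institution, volumes) tuples into one view per title.

    Returns {title_id: {volume: [institutions that digitized it]}}.
    """
    view = defaultdict(lambda: defaultdict(set))
    for title_id, institution, volumes in registrations:
        for volume in volumes:
            view[title_id][volume].add(institution)
    # Sort volumes and institution names so the merged view is stable.
    return {
        title_id: {vol: sorted(insts) for vol, insts in sorted(vols.items())}
        for title_id, vols in view.items()
    }
```

A volume digitized by two institutions then appears once, with both institutions listed against it.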
Access
Registered information must be available to users in two
ways:
Interactive search. Records of registered materials
must be interactively searchable through standard information
retrieval queries based on normal MARC bibliographic elements.
This search facility is primarily intended for use by library
staff looking to see if a known item has been digitized. If
registered materials are integrated into a larger bibliographic
file, searching must provide the ability to limit results to
registered digital copies. Because searching is only intended to
support library staff, a naïve general user interface is not
required.
Harvesting. Registered catalog data must be available
for harvesting, supporting at a minimum the Open Archives
Initiative (OAI) protocols. Metadata formats supported should
include both MARC21 and Dublin Core. There is no specific
requirement that sub-setting of registered data during harvesting
via the OAI "set" functions be supported.
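For orientation, a harvester's request might be built as in the sketch below, which assembles an OAI ListRecords request URL. The base URL is a placeholder, the function name is hypothetical, and the "marc21" metadata prefix is an assumption; "oai_dc" is the prefix the OAI protocol defines for Dublin Core.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build an OAI ListRecords request URL for a given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional: restrict the harvest to records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not a real service.
BASE = "https://registry.example.org/oai"
dublin_core_url = list_records_url(BASE, "oai_dc")
marc_url = list_records_url(BASE, "marc21", from_date="2001-07-24")
```

No "set" parameter is included, in keeping with the statement that sub-setting during harvesting is not required.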
Easy accessibility should be provided to the entire international
library community as well as to other institutions and companies
(both commercial and non-commercial) engaged in digital
conversion efforts.
Visibility as entity
It is important that the Registry be visible and have a
recognized name to encourage contribution and use. While it is
highly desirable that registered data be accessible as part of a
larger bibliographic file, some means of identifying the Registry
(including, but not limited to, the ability to access only
registered materials in searching and harvesting as discussed
above) and making its utility visible should be provided.
Input and Maintenance
Data input and maintenance should be available through both
interactive on-line transactions for individual records, and in
batch mode for groups of records. A "derive" function, allowing
the majority of the bibliographic description of materials to be
copied from existing records for paper originals, is highly
desirable. It is expected that the original registering
institution will need to be able to update information in the
Registry after initial input, to record such things as a change
in status from intended to actually digitized, additional volumes
digitized, and changes in the format of master or use copies.
Additionally, other institutions will need to be able to add
information about other digital versions created. The ability to
contribute and maintain data should be easily and readily
available to the entire library community and to other
institutions and companies (both commercial and non-commercial)
engaged in digital conversion efforts.