1940 Census on the AWS Registry of Open Data
The National Archives and Records Administration (NARA) published the 1940 Census dataset to the Amazon Web Services (AWS) Registry of Open Data. This documentation explains how to access the data.
Table of Contents
- About the Dataset
- Access Methods
- Accessing the Full Dataset
- Accessing Portions of the Dataset
- Accessing the Dataset By State/Territory
- Accessing the Dataset Based on County, Enumeration District, and Available Geographic Descriptions
About the Dataset
The 1940 Census dataset on the AWS Registry of Open Data, over 15 terabytes in total, includes the metadata index, the population schedules, the enumeration district maps, and the enumeration district descriptions for the 1940 Census records. The metadata index for the dataset is 251 megabytes, and the 3.7 million images from the population schedules, the enumeration district maps, and the enumeration district descriptions total 15.1439 terabytes.
The 1940 Census population schedules were created by the Bureau of the Census in an attempt to enumerate every person living in the United States on April 1, 1940, although some persons were missed. The population schedules were digitized by the National Archives and Records Administration (NARA) and released publicly on April 2, 2012.
The 1940 Census enumeration district maps contain maps of counties, cities, and other minor civil divisions that show enumeration district, census tract, and related boundaries and numbers used for each census. The coverage is nationwide and includes territorial areas.
The 1940 Census enumeration district descriptions contain written descriptions of Census Districts, Subdivisions, and Enumeration Districts.
The AWS Registry of Open Data, started in 2008, is a service provided by AWS to store open, public datasets for free so that they can be accessed and analyzed on AWS.
Access Methods
Users can access the full 1940 Census dataset using its Amazon Resource Name (ARN), an identifier that uniquely names a resource on AWS so that users can locate the dataset.
Additionally, users can access both the full dataset and specific portions of the dataset using the AWS Command Line Interface (CLI), an open source tool that enables users to interact with AWS services from the command line. See the AWS CLI documentation for installation and usage details.
Accessing the Full Dataset
The full dataset can be accessed with the following ARN:
arn:aws:s3:::nara-1940-census
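The ARN and the s3:// address used in the AWS CLI commands below refer to the same bucket. As a small sketch (the variable names are illustrative only), the bucket name is the last colon-separated field of an S3 bucket ARN, so one can be derived from the other:

```shell
# ARN as published above.
ARN="arn:aws:s3:::nara-1940-census"

# An S3 bucket ARN has the form arn:aws:s3:::<bucket>, so removing
# everything up to the last colon leaves the bucket name.
BUCKET="${ARN##*:}"

# The AWS CLI addresses the same bucket with an s3:// URI.
S3_URI="s3://${BUCKET}/"
echo "$S3_URI"
```

This applies to bucket ARNs only; an ARN that also names an object inside a bucket would need different handling.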
To list the full dataset using AWS CLI, use the following command:
aws s3 ls s3://nara-1940-census/ --no-sign-request
To pull the full dataset using AWS CLI, use the following command:
aws s3 sync s3://nara-1940-census/ [destination] --no-sign-request
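Because the full dataset is over 15 terabytes, it may be worth previewing a transfer before starting it. The sketch below uses the AWS CLI --dryrun option, which reports the operations that sync would perform without copying anything. The DEST path and the run helper are hypothetical names introduced for this example; the helper only prints a command unless RUN=1 is set, so the script is safe to execute as written.

```shell
# Hypothetical local destination; replace with your own path.
DEST="${DEST:-./nara-1940-census}"

# Print the command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Preview what a full sync would copy. Remove --dryrun to transfer the data.
run aws s3 sync s3://nara-1940-census/ "$DEST" --no-sign-request --dryrun
```

Running the script with RUN=1 in the environment performs the preview against the bucket, assuming the AWS CLI is installed.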
Accessing Portions of the Dataset
To pull portions of the dataset, i.e. only the metadata index, only the population schedules, only the enumeration district maps, only the enumeration district descriptions, or any combination thereof, the following AWS CLI commands can be used:
Dataset Portion | Description | AWS CLI Command |
---|---|---|
Metadata Index | Full metadata index to the 1940 Census holdings, including the population schedules, the enumeration district maps, and the enumeration district descriptions. Metadata is in JSON format. | aws s3 sync s3://nara-1940-census/metadata/ [destination] --no-sign-request |
Population Schedules | Digitized images of the forms (population schedules) used by Census enumerators to enumerate persons living in the United States. Images are JPG. | aws s3 sync s3://nara-1940-census/population-schedules/ [destination] --no-sign-request |
Enumeration District Maps | Digitized images of annotated maps of counties, cities, and other minor civil divisions that show enumeration districts, census tracts, and related boundaries and numbers used for each census. Images are JPG. | aws s3 sync s3://nara-1940-census/ed-maps/ [destination] --no-sign-request |
Enumeration District Descriptions | Digitized images of written descriptions of geographic areas included within enumeration districts. Images are JPG. | aws s3 sync s3://nara-1940-census/ed-descriptions/ [destination] --no-sign-request |
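When more than one portion is needed, the commands in the table can be generated in a short script. The sketch below is one possible arrangement: it loops over a chosen set of prefixes and prints a sync command for each, placing every portion in its own subdirectory. The DEST directory, the PORTIONS list, and the sync_command helper are illustrative names, not part of the dataset.

```shell
# Hypothetical local destination (assumed to contain no spaces)
# and a hypothetical selection of portions.
DEST="${DEST:-./census-1940}"
PORTIONS="metadata ed-maps ed-descriptions"

# Build the sync command for one portion, using the prefix names
# from the table above.
sync_command() {
    printf 'aws s3 sync s3://nara-1940-census/%s/ %s/%s/ --no-sign-request\n' \
        "$1" "$DEST" "$1"
}

# Print one command per selected portion.
for portion in $PORTIONS; do
    sync_command "$portion"
done
```

The script only prints the commands; piping its output to sh would run them, assuming the AWS CLI is installed.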
Accessing the Dataset By State/Territory
To pull images for a specific state or territory, use the following AWS CLI commands, shown here for Alaska, replacing "ak" with the two-letter abbreviation of the state or territory:
State/Territory | Dataset Portion | AWS CLI Command |
---|---|---|
Alaska | Population Schedules | aws s3 sync s3://nara-1940-census/population-schedules/ak/ [destination] --no-sign-request |
Alaska | Enumeration District Maps | aws s3 sync s3://nara-1940-census/ed-maps/ak/ [destination] --no-sign-request |
Alaska | Enumeration District Descriptions | aws s3 sync s3://nara-1940-census/ed-descriptions/ak/ [destination] --no-sign-request |
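The Alaska commands can be adapted to another state or territory by substituting its two-letter abbreviation. As a sketch, the hypothetical state_commands function below prints the three sync commands for one abbreviation, lowercasing the input because the Alaska example uses "ak". That every abbreviation follows the same lowercase pattern is an assumption drawn from that single example, and DEST is a placeholder path.

```shell
# Hypothetical local destination (assumed to contain no spaces).
DEST="${DEST:-./census-1940}"

state_commands() {
    # Lowercase the abbreviation to match the "ak" form used above.
    state=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    for portion in population-schedules ed-maps ed-descriptions; do
        printf 'aws s3 sync s3://nara-1940-census/%s/%s/ %s/%s/%s/ --no-sign-request\n' \
            "$portion" "$state" "$DEST" "$portion" "$state"
    done
}

# Print the three commands for Alaska.
state_commands AK
```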
Accessing the Dataset Based on County, Enumeration District, and Available Geographic Descriptions
To pull records based on county, enumeration district, and geographic description (from the indexed enumeration district descriptions), the following AWS directories can be used, based on the example provided for Alaska, to build AWS CLI commands. Only the population schedules are stored in directories based on ED, since the maps and descriptions often span multiple EDs and are therefore organized by county.
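State/Territory | County | ED Number | Geographic Description | Dataset Portion | AWS Directory |
---|---|---|---|---|---|
Alaska | First Judicial Division | 1-1 | Hyder Recording District - No. 1 Including Hyder Town, Bandit Mountain Mining Camp, Salmon River Mining Camp, Texas Creek Mining Camp, And W Shore Of Portland Canal N Of S Entrance To Tombstone Bay; Hyder Town. | Population Schedules | .../population-schedules/ak/first-judicial-division/ed/1-1/ |
Alaska | First Judicial Division | 1-1 | Hyder Recording District - No. 1 Including Hyder Town, Bandit Mountain Mining Camp, Salmon River Mining Camp, Texas Creek Mining Camp, And W Shore Of Portland Canal N Of S Entrance To Tombstone Bay; Hyder Town. | Enumeration District Maps | Part of .../ed-maps/ak/first-judicial-division/ |
Alaska | First Judicial Division | 1-1 | Hyder Recording District - No. 1 Including Hyder Town, Bandit Mountain Mining Camp, Salmon River Mining Camp, Texas Creek Mining Camp, And W Shore Of Portland Canal N Of S Entrance To Tombstone Bay; Hyder Town. | Enumeration District Descriptions | Part of .../ed-descriptions/ak/first-judicial-division/ |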
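The directory pattern in the Alaska example can be assembled from its parts. The sketch below rests on two assumptions taken from that single example, which may not hold for every county: that the elided leading portion of each directory is the bucket root s3://nara-1940-census, and that a county directory is the county name in lowercase with spaces replaced by hyphens. The function names county_slug and ed_prefix are hypothetical.

```shell
# Assumed bucket root for the elided leading part of each directory.
BUCKET="s3://nara-1940-census"

# Turn a county name into a directory name: lowercase, spaces to hyphens.
# This rule is inferred from "first-judicial-division" and is an assumption.
county_slug() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '-'
}

# Build the population schedule prefix for a state, county, and ED number.
ed_prefix() {
    printf '%s/population-schedules/%s/%s/ed/%s/\n' \
        "$BUCKET" "$1" "$(county_slug "$2")" "$3"
}

ed_prefix ak "First Judicial Division" 1-1
```

The printed prefix can be checked with aws s3 ls and the --no-sign-request option before it is used in a sync command.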
Enumeration District Summaries by State
Below are links to HTML tables and JSON files for each state and territory that list counties, enumeration districts, geographic descriptions of each enumeration district, and the relevant directory paths to use for AWS CLI commands. Note that not all cities and towns are indexed in the geographic descriptions.