Indexing Workflow

This page provides a step-by-step description of the Search.gov indexing process for your website. When you’re ready to index your domain with our service, please reach out to us by email at search@gsa.gov so we can discuss and coordinate.

A. Define Domains and Subdomains

Who: You, the agency web team, in consultation with the Search.gov team
What: The Admin Center Domains list controls what we pull out of our index for a search on your site. But we also need to know what to put into the index to begin with. We'll work with you to confirm the domains and subdomains you want discoverable through search. For example, after discussing with you, we may plan to index all of your subdomains or just a selection of the major sections.

If you have JavaScript-based content on your domains, please let our team know. We will work with you to ensure content on those pages is successfully indexed.

Example Domains:
www.example.gov
data.example.gov
archive.example.gov
www.subagencydomainexample.gov

B. Sitemap or Feed for Each Subdomain

Who: You, the agency web team, in consultation with the Search.gov team
What: The easiest way for us to discover what URLs exist on your domain is via an XML sitemap. Each domain identified above will need a separate sitemap. Please read our detailed discussion of XML sitemaps, and let us know if you have any questions. We understand it can be difficult for some legacy systems or multi-platform websites to generate comprehensive sitemaps; if this is the case for you, please reach out.
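The following is a minimal sketch of generating one sitemap per domain. It is not a Search.gov tool. It assumes you already have the list of URLs for a single host, and it follows the public sitemaps.org XML format. The function name `build_sitemap` and the example URLs are hypothetical. The host check mirrors the "one sitemap per domain" requirement above.

```python
import xml.etree.ElementTree as ET
from typing import Iterable
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(host: str, urls: Iterable[str]) -> str:
    """Return an XML sitemap listing `urls`, all of which must live on `host`.

    Hypothetical helper for illustration only.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # One sitemap per domain: refuse URLs that belong to a different host.
        if urlsplit(url).hostname != host.lower():
            raise ValueError(f"{url} is not on {host}; give it its own sitemap")
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes &, <, >
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap("data.example.gov", [
        "https://data.example.gov/",
        "https://data.example.gov/datasets/",
    ]))
```

Rejecting URLs from other hosts, rather than silently dropping them, makes it obvious when a subdomain such as archive.example.gov still needs its own sitemap.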

We also support valid RSS 2.0 and Atom 1.0 feeds for URL discovery.
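For feed-based URL discovery, the pages that get found are the item links the feed lists. The sketch below shows which links those are in each format. It is a hypothetical illustration, not the Search.gov implementation. It reads `<item><link>` elements from RSS 2.0 and `<entry><link href>` elements from Atom (namespace `http://www.w3.org/2005/Atom`). It treats an Atom link with no `rel` attribute as `rel="alternate"`, as the Atom format specifies.

```python
import xml.etree.ElementTree as ET
from typing import List

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def discover_urls(feed_xml: str) -> List[str]:
    """Return the item/entry URLs listed in an RSS 2.0 or Atom feed.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":
        # RSS 2.0: <rss><channel><item><link>URL</link></item></channel></rss>
        return [link.text.strip()
                for link in root.findall("./channel/item/link")
                if link.text and link.text.strip()]
    if root.tag == "{http://www.w3.org/2005/Atom}feed":
        # Atom: <feed><entry><link href="URL"/></entry></feed>;
        # keep only alternate links (a missing rel means "alternate").
        return [link.get("href")
                for link in root.findall("atom:entry/atom:link", ATOM_NS)
                if link.get("rel", "alternate") == "alternate" and link.get("href")]
    raise ValueError(f"unrecognized feed root element: {root.tag}")
```

If this returns an empty list for your feed, the feed probably lacks per-item links, and the pages it announces would not be discoverable this way.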

We do not crawl websites by default due to the high resource demand of crawling every page on every website all the time. One of the goals of our service is to contain the costs of search government-wide, and a crawling-first model would increase costs significantly.

If you publish your site on Cloud.gov Pages, read these additional instructions.

C. Index Subdomains

Who: The Search.gov team
What: Once sitemaps and/or feeds are posted to your website, our system will be able to index your content. Alert us when they are posted, and we'll add your domains to the list of domains that we monitor. Then, indexing will begin.

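By default, we make 1 request per second to a domain. If a `Crawl-delay` is declared in your /robots.txt file, we will honor that delay while fetching your content for indexing. The approximate time required to index a site is `(number of items) x (crawl delay in seconds) / 3600 = hours to index`, where the crawl delay is 1 second unless your /robots.txt declares otherwise. For example, 36,000 URLs at the default rate take about 10 hours.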
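To estimate your own indexing time, you can apply the formula above to your robots.txt. The sketch below uses Python's standard `urllib.robotparser` to read `Crawl-delay` and falls back to the stated 1-second default. It rests on several assumptions:

- The declared delay simply replaces the default.
- The rules that apply are those under `User-agent: *`, because the indexer's actual user-agent token is not given here.
- The function names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Default stated above: one request per second to a domain.
DEFAULT_DELAY_SECONDS = 1.0


def effective_crawl_delay(robots_txt: str, user_agent: str = "*") -> float:
    """Seconds between requests: the robots.txt Crawl-delay if declared, else the default.

    Assumption: `user_agent` defaults to "*" because the indexer's token is unknown.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The standard-library parser only recognizes whole-number Crawl-delay values.
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else DEFAULT_DELAY_SECONDS


def hours_to_index(num_items: int, crawl_delay_seconds: float) -> float:
    """(number of items) x (crawl delay) / 3600 = hours to index."""
    return num_items * crawl_delay_seconds / 3600


if __name__ == "__main__":
    robots = "User-agent: *\nCrawl-delay: 5\n"
    delay = effective_crawl_delay(robots)
    print(delay, hours_to_index(36_000, delay))  # prints "5.0 50.0"
```

With `Crawl-delay: 5`, the same 36,000 URLs that take about 10 hours at the default rate would take about 50 hours. A large crawl delay therefore stretches the indexing timeline considerably.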

If you use a firewall service, it's possible our indexer will be blocked. We can provide our IP addresses for you to whitelist in your firewall.

Please note, we can only index domains that are publicly accessible. This means that if you have a password-protected staging environment, we will not be able to index it for you as part of your testing process. Please reach out and we can discuss options if you need to test our service pre-production.

D. Test Index

Who: The Search.gov team
What: For search sites switching from Bing: After your content is indexed, we'll start up a parallel search site using your current site configuration and the new index, and run a number of test queries to ensure the index is performing satisfactorily. Our test will cover your live site's most popular queries.

E. Review Index

Who: You, the agency web team
What: For sites switching from Bing: After we're satisfied with the index, we'll send you a link to the test search site, so you can review and provide feedback.

For brand new sites: You will be able to test the index using your regular search site(s).

F. Ready to Launch

Who: You, the agency web team, in collaboration with Search.gov
What: For brand new sites: Your index is ready to go; you can proceed with the rest of the site launch steps and go live without any further action from our team.

For sites switching from Bing: When you give us the green light to switch to the new index, there is no action needed on your part other than the approval. We will change a setting in our back end, which will point your existing search site's web results module to our index, and the change is effective immediately. All other elements of your search site remain the same: search features, branding, etc.