Last updated: 26 Jul 2024

GOV.UK's sitemap

GOV.UK’s sitemap is available at https://www.gov.uk/sitemap.xml. The sitemaps protocol limits a single sitemap file to 50,000 URLs, and GOV.UK has far too many pages to fit into one file, so this file is actually a ‘sitemap index’. It references around 30 other XML files, such as https://www.gov.uk/sitemaps/sitemap_1.xml.

How the sitemap is generated

Every morning, a search-api-generate-sitemap cronjob runs to generate a fresh sitemap.

The cronjob runs the sitemap:generate_and_upload rake task in search-api. The task iterates over all documents in Search API and generates sitemap files in the format specified at https://www.sitemaps.org/protocol.html. The same task also creates the sitemap index.
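As a rough illustration of the approach described above, the following sketch splits a list of URLs into sitemap files and builds an index that references them. It is not the search-api implementation: the method names (xml_escape, build_sitemap, build_sitemap_index, generate_sitemaps) and the in-memory hash of files are assumptions made for the example. Only the XML element names and the 50,000 URL limit come from the sitemaps.org protocol.

```ruby
# Sketch only: the names here are hypothetical and are not taken from search-api.
MAX_URLS_PER_SITEMAP = 50_000 # per-file limit in the sitemaps.org protocol
SITEMAP_NAMESPACE = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Entity-escape the characters the protocol requires to be escaped in URLs.
def xml_escape(text)
  text.gsub("&", "&amp;")
      .gsub("<", "&lt;")
      .gsub(">", "&gt;")
      .gsub('"', "&quot;")
      .gsub("'", "&apos;")
end

# Build one sitemap file listing the given page URLs.
def build_sitemap(urls)
  entries = urls.map { |url| "  <url><loc>#{xml_escape(url)}</loc></url>" }
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    %(<urlset xmlns="#{SITEMAP_NAMESPACE}">),
    *entries,
    "</urlset>",
  ].join("\n")
end

# Build the sitemap index, which lists the sitemap files rather than pages.
def build_sitemap_index(sitemap_urls)
  entries = sitemap_urls.map { |url| "  <sitemap><loc>#{xml_escape(url)}</loc></sitemap>" }
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    %(<sitemapindex xmlns="#{SITEMAP_NAMESPACE}">),
    *entries,
    "</sitemapindex>",
  ].join("\n")
end

# Split the URLs into numbered files, then index them.
# Returns [hash of filename => XML, index XML].
def generate_sitemaps(urls, base_url:, per_file: MAX_URLS_PER_SITEMAP)
  files = {}
  urls.each_slice(per_file).with_index(1) do |chunk, number|
    files["sitemap_#{number}.xml"] = build_sitemap(chunk)
  end
  index = build_sitemap_index(files.keys.map { |name| "#{base_url}/sitemaps/#{name}" })
  [files, index]
end
```

Calling generate_sitemaps(urls, base_url: "https://www.gov.uk") returns the individual files and the index, which a real task would then upload rather than keep in memory.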

How content gets into Search API

The preferred pattern is for content to be published via Publishing API. After an edition is changed, Publishing API publishes a message to the published_documents topic exchange it configured on startup. Interested parties, such as Search API, can subscribe to this exchange to perform post-publishing actions.

Search API listens to the publishing queue using the govuk_message_queue_consumer gem. Its MessageProcessor handles indexing the content it receives.
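The sketch below shows the general shape of a processor that consumes published-document messages and indexes them. It assumes the processor interface is a process method that receives a message exposing payload, ack and discard. FakeMessage, InMemoryIndex and SketchMessageProcessor are hypothetical stand-ins so the example runs without RabbitMQ or the gem, and the payload fields used (base_path, title) are illustrative. It does not reproduce Search API's actual MessageProcessor.

```ruby
# Hypothetical stand-in for the message object handed to a processor.
FakeMessage = Struct.new(:payload, :status) do
  def ack
    self.status = :acked
  end

  def discard
    self.status = :discarded
  end
end

# Hypothetical in-memory index standing in for the real search index.
class InMemoryIndex
  attr_reader :documents

  def initialize
    @documents = {}
  end

  def upsert(base_path, document)
    @documents[base_path] = document
  end
end

# Sketch of a processor: index the document, then acknowledge the message.
class SketchMessageProcessor
  def initialize(index)
    @index = index
  end

  def process(message)
    payload = message.payload
    base_path = payload["base_path"]

    # A message that cannot be indexed is discarded rather than retried forever.
    if base_path.nil?
      message.discard
      return
    end

    @index.upsert(base_path, { "title" => payload["title"] })
    message.ack
  end
end
```

Acknowledging only after the index has been updated means a failure during indexing leaves the message unacknowledged, so the broker can redeliver it.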

However, message queues aren’t the only way to get content into Search API. Whitehall calls Search API directly, via Whitehall::SearchIndex, which is called by any model that includes the Searchable module.
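The direct-call pattern can be pictured with a mixin along these lines. This is a simplified sketch, not Whitehall's code: SketchSearchIndex, SketchSearchable and SketchPublication are hypothetical names, and the real Searchable module and Whitehall::SearchIndex have their own interfaces that are not reproduced here.

```ruby
# Hypothetical stand-in for a client that sends documents straight to Search API.
class SketchSearchIndex
  attr_reader :added

  def initialize
    @added = []
  end

  def add(document)
    @added << document
  end
end

# Hypothetical mixin: any model that includes it can push itself to the index.
module SketchSearchable
  # One shared index client for every including model.
  def self.search_index
    @search_index ||= SketchSearchIndex.new
  end

  def update_in_search_index
    SketchSearchable.search_index.add(search_document)
  end
end

# Hypothetical model that opts in to direct indexing.
class SketchPublication
  include SketchSearchable

  attr_reader :title, :base_path

  def initialize(title:, base_path:)
    @title = title
    @base_path = base_path
  end

  # The representation of this model that gets sent to the search index.
  def search_document
    { "title" => title, "link" => base_path }
  end

  # Publishing indexes the document directly, with no message queue involved.
  def publish!
    update_in_search_index
  end
end
```

The point of the pattern is that indexing happens as a direct call from the publishing application, so content reaches the search index without passing through the message queue.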