Make a new document type available to search
Any document type that the publishing-api knows about can be added to our internal search. By default, all document types in internal search also get included in the GOV.UK sitemap, which tells external search engines about our content.
The app responsible for search is Rummager. Rummager listens to RabbitMQ messages about published documents to know when to index documents. For the new document type to be indexed, you need to add it to a whitelist.
1. Decide what fields you want to make available to search
Rummager has its own concept of document type, which represents the schema used to store documents in Elasticsearch (the search engine). Normally, you’ll map your document type to an existing Rummager document type. If in doubt, use “edition”, as this is used for most documents. Then modify mapped_document_types.yml to add the mapping from the publishing-api document type.
If you want search to be able to use metadata that isn’t defined in any Rummager document type, you’ll need to add new fields to Rummager.
Rummager knows how to handle most of the core fields from the publishing-api, including public_updated_at. It looks at content fields such as parts to work out what text to make searchable. If your schema uses different fields to render the text of the page, update the IndexableContentPresenter as well.
The part of Rummager that translates between publishing-api fields and search fields is ElasticsearchPresenter. Modify this if you want search to do anything special with your documents (for example, appending additional information to the title).
2. Add the document type to migrated_formats.yaml
Add the document_type name to the migrated list in Rummager.
3. Reindex the govuk index
Reindex the govuk index following the instructions in Reindex an Elasticsearch index.
4. Republish all the documents
Republishing sends the documents to Rummager so that they get indexed. If the documents have already been published, you can republish them with the publishing-api represent_downstream rake task.
You can test that the documents appear in search through the API using a query such as:
- https://www.gov.uk/api/search.json?count=0&filter_content_store_document_type=guide
- https://www-origin.integration.publishing.service.gov.uk/api/search.json?count=0&filter_content_store_document_type=guide
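If you check several document types, it may help to build these query URLs in a small script. The following is a minimal sketch, not part of Rummager: the function names are hypothetical, the hosts are copied from the example queries above, and reading a `total` key from the response is an assumption about the shape of the search API's JSON.

```python
import json
from urllib.parse import urlencode

# Hosts copied from the example queries in this guide.
LIVE_HOST = "https://www.gov.uk"
INTEGRATION_HOST = "https://www-origin.integration.publishing.service.gov.uk"


def search_check_url(document_type, host=LIVE_HOST):
    """Build a search API URL that filters on one content store document type.

    count=0 asks for no result bodies, which keeps the response small.
    (Hypothetical helper, for illustration only.)
    """
    query = urlencode({
        "count": 0,
        "filter_content_store_document_type": document_type,
    })
    return f"{host}/api/search.json?{query}"


def indexed_total(response_body):
    """Return the number of matching documents from a JSON response body.

    Assumption: the response is a JSON object with a numeric "total" key.
    """
    return json.loads(response_body)["total"]
```

For example, `search_check_url("guide", host=INTEGRATION_HOST)` produces the integration URL shown above. Once you have fetched that URL with any HTTP client, a result greater than zero from `indexed_total` would suggest the documents have been indexed.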