Search API field reference

Documents in an Elasticsearch index have a type, and each type can have different fields.

In Search API, the field that holds this type is named `elasticsearch_type`, to avoid confusion with `content_store_document_type`.

All fields

all_searchable_text
all_searchable_text_type
Special field which searchable text is copied into; similar to the standard `_all` field, but more customisable
exact_query
best_bet_exact_match_text
Field used in the best-bet implementation to perform an exact match between a user’s query and a stored best-bet
stemmed_query
best_bet_stemmed_match_text
Field used in the best-bet implementation to perform a stemmed match between a user’s query and a stored best-bet
has_act_paper
boolean

has_command_paper
boolean

has_official_document
boolean

important_to_policy
boolean
A flag set by editors to mark a document as being important to policy
is_historic
boolean
If the content is political and was published by a previous government, it is considered historic and not reflective of the current government
is_political
boolean
If the content is considered political in nature, reflecting views of the government it was published under
is_withdrawn
boolean
If the content has been published but then withdrawn
relevant_to_local_government
boolean

alert_issue_date
date

assessment_date
date
The date when the assessment was held.
build_end_date
date

build_start_date
date

closed_date
date

closing_date
date

date_of_occurrence
date
Date of the event described in the document. Only applies to MAIB, RAIB, AAIB reports.
end_date
date
End date for topical content. For topical event pages, treat a null value as a date in the past.
first_published_at
date
The date the content was first published. Should be the same as `first_published_at` in the publishing-api.
issued_date
date

opened_date
date

public_timestamp
date
Time of the last update. Used for 1) weighting more recent editions (mainly Whitehall content) more highly, 2) the default sort in finders, 3) showing a date in the search results. Should be the same as `public_updated_at` in the publishing-api.
release_timestamp
date

start_date
date
Start date for topical content.
tribunal_decision_decision_date
date

updated_at
date
TODO: find out where this is used.
popularity
float

rank_14
float
Field used in the page-traffic index to hold the rank of this page in the list of pages on the website when ordered by traffic. This is a fairly stable value used in ranking calculations.
content_id
identifier
The content_id of the item. This will not be present for all items, as most applications do not send it.
content_store_document_type
identifier
The document type as stored in the Content Store
detailed_format
identifier
A slugified version of the display_type field
dfid_review_status
identifier
Whether or not this item is peer reviewed (a code)
display_type
identifier

email_document_supertype
identifier

format
identifier

government_document_supertype
identifier

hmrc_manual_section_id
identifier

id
identifier

licence_identifier
identifier
The licence ID associated with a content item of type licence
link
identifier
The link to the document. This is usually the path component of the URL of the document, including a leading slash, but sometimes omits the leading slash, and is sometimes an absolute URL of related content which does not appear on GOV.UK
manual
identifier
Base path of the manual this document belongs to, e.g. `/service-manual` or `/hmrc-internal-manuals/air-passenger-duty`
navigation_document_supertype
identifier

operational_field
identifier

organisation_state
identifier

organisation_type
identifier
The organisation type identifier, like ‘ministerial_department’ or ‘public_corporation’. Only applies to organisations. Expected value is the `organisation_type_key` from Whitehall, enumerated in https://github.com/alphagov/whitehall/blob/master/app/models/organisation_type.rb.
outcome_type
identifier

publishing_app
identifier
Application that published this page
rendering_app
identifier
Application that renders this page
slug
identifier

statistics_announcement_state
identifier

stemmed_query_as_term
identifier
Field used in the best-bet implementation to hold the raw form of a stemmed query
tribunal_decision_category
identifier

tribunal_decision_country
identifier

tribunal_decision_landmark
identifier

tribunal_decision_reference_number
identifier

tribunal_decision_sub_category
identifier

user_journey_document_supertype
identifier

aircraft_category
identifiers

alert_type
identifiers

case_state
identifiers

case_type
identifiers

contact_groups
identifiers

country
identifiers

development_sector
identifiers

document_collections
identifiers

document_series
identifiers

eligible_entities
identifiers

fault_type
identifiers

faulty_item_model
identifiers

faulty_item_type
identifiers

fund_state
identifiers

fund_type
identifiers

funding_amount
identifiers
Funding (per unit per year)
funding_source
identifiers

grant_type
identifiers

land_use
identifiers

location
identifiers

mainstream_browse_page_content_ids
identifiers
As opposed to “mainstream_browse_pages”, this will include the “content_ids” of each mainstream browse page rather than the slug. It will eventually replace “mainstream_browse_pages”.
mainstream_browse_pages
identifiers

manufacturer
identifiers

market_sector
identifiers

medical_specialism
identifiers

organisation_content_ids
identifiers
As opposed to “organisations”, this will include the “content_ids” of each organisation rather than the slug. It will eventually replace “organisations”.
organisations
identifiers
The organisations related to this page. This field is copied from the publishing-api. Note that this means different things for different formats.
part_of_taxonomy_tree
identifiers
Any taxon tagged to the document and any of their ancestor taxons
path_components
identifiers
Field used in the page-traffic index to hold the full path of each component of a URL. For example, a document with a path of ‘/foo/bar/baz’ would have the values ‘/foo’, ‘/foo/bar’ and ‘/foo/bar/baz’ in this field. This allows all the pages under a given URL component to be identified.
people
identifiers

policies
identifiers

policy_areas
identifiers
Policy areas are managed in Whitehall. They’re an old grouping of policies, which we’re expecting to deprecate soon. Formerly known as ‘topics’.
policy_groups
identifiers

primary_publishing_organisation
identifiers
The organisation that published this page. This field is copied from the publishing-api. It is only populated by Whitehall.
railway_type
identifiers

report_type
identifiers

search_format_types
identifiers

serial_number
identifiers

specialist_sectors
identifiers
The navigation “topics” that the document is assigned to. Nothing to do with “policy areas”
taxons
identifiers

therapeutic_area
identifiers

tiers_or_standalone_items
identifiers

topic_content_ids
identifiers
The navigation “topics” that the document is assigned to. Nothing to do with “policy areas”. As opposed to “specialist_sectors”, this will include the “content_ids” of each topic rather than the slug. It will eventually replace “specialist_sectors”.
tribunal_decision_categories
identifiers

tribunal_decision_judges
identifiers

tribunal_decision_sub_categories
identifiers

value_of_funding
identifiers

vessel_type
identifiers

world_locations
identifiers

attachments
objects

metadata
opaque_object
The “metadata” field is intended for the storage of additional non-searchable document data. This allows additional information to be stored and displayed in search results without having to make changes to the schema.
dfid_document_type
searchable_identifier
The document type of the output (e.g. Book Chapter, Conference Paper)
tribunal_decision_category_name
searchable_identifier

tribunal_decision_country_name
searchable_identifier

tribunal_decision_landmark_name
searchable_identifier

tribunal_decision_sub_category_name
searchable_identifier

business_sizes
searchable_identifiers
The sizes of business served by a business finance support scheme
business_stages
searchable_identifiers
The stages of business served by a business finance support scheme
dfid_authors
searchable_identifiers
A set of author names, which aren’t selected from a predefined list and don’t repeat
dfid_theme
searchable_identifiers
The broad theme (or themes) to which the output applies
industries
searchable_identifiers
The industries served by a business finance support scheme
tribunal_decision_categories_name
searchable_identifiers

tribunal_decision_judges_name
searchable_identifiers

tribunal_decision_sub_categories_name
searchable_identifiers

types_of_support
searchable_identifiers
The types of support provided by a business finance support scheme
title
searchable_sortable_text

acronym
searchable_text

aircraft_type
searchable_text

description
searchable_text

indexable_content
searchable_text

licence_short_description
searchable_text

registration
searchable_text

spelling_text
spelling_text
Generated field, populated with the same content as sent to the _all field, but tokenised into words, lowercased and shingled, not stemmed, etc
details
unsearchable_text
Field used in the best-bet implementation to store the modifications to be made to the query when a best-bet is matched
government_name
unsearchable_text
The name of the Government that first published this document, e.g. ‘1970 to 1974 Conservative government’
latest_change_note
unsearchable_text
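
The `path_components` entry above describes expanding a document path into every cumulative prefix. The following is a minimal Python sketch of that expansion, based only on the ‘/foo/bar/baz’ example in the field description. The function name `path_components` is a hypothetical helper chosen for illustration; it is not taken from the Search API codebase, and the real indexing code may treat edge cases such as trailing slashes or absolute URLs differently.

```python
def path_components(path: str) -> list[str]:
    """Return every cumulative prefix of a URL path.

    Illustrative helper only: the name and the handling of empty
    segments are assumptions, not the Search API implementation.
    """
    # Drop empty segments so leading, trailing, or doubled slashes
    # do not produce empty components.
    segments = [segment for segment in path.split("/") if segment]

    prefixes = []
    current = ""
    for segment in segments:
        # Extend the running prefix by one path segment at a time.
        current = f"{current}/{segment}"
        prefixes.append(current)
    return prefixes


if __name__ == "__main__":
    print(path_components("/foo/bar/baz"))
    # ['/foo', '/foo/bar', '/foo/bar/baz']
```

Storing every prefix means that finding all pages under a given URL component becomes an exact match on one value, for example matching ‘/foo/bar’ against this field, rather than a prefix scan over full paths.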