Last updated: 8 Sep 2025

publishing-api: Checking parity of GraphQL and Content Store responses

A couple of scripts are available to check the parity of GraphQL and Content Store responses:

script/live_content/diff_frontend - this will guide you through diffing the responses for one page.
script/live_content/bulk_diff_frontend - this allows you to diff multiple pages in one process.

For the bulk script, you'll need to prepare a file with a list of base paths (e.g. /world) and an empty line at the end. See the "Retrieving base paths" section for two ways to do this.

Diffs will be output to tmp/diffs by default. Run the script with --help for information on all the required and optional arguments.

If diffing in the development environment, you'll need to start all the relevant servers in GOV.UK Docker: Publishing API, Content Store, plus any required frontend apps and their depenedencies (e.g. Collections, Frontend, Government Frontend, Static).

Issue with Bash version

If you get a syntax error when running the diffing scripts, you might be using an old version of Bash. At the time of writing, the version of Bash shipped with macOS is two major versions behind the latest release and missing some features used in the scripts. You can install a modern version via Homebrew.

Retrieving base paths

From a local Publishing API database

If you have a replicated database locally (including in GOV.UK Docker), you can use a script to generate a list of one base path per document type per existing GraphQL query (i.e. per schema name). This approach is useful for a quick diff or to test changes to the diffing scripts

# prepend with govuk-docker-run for GOV.UK Docker
bundle exec rails runner script/live_content/generate_base_paths.rb

From logs using Athena

You can use Athena to retrieve base paths of cache misses over a given time period. Below is an example Trino SQL query. You just need to edit the dates.

Save the output to tmp/base_paths/unfiltered_base_paths and then run the script/filter_base_paths script to filter the base paths by one or more schema names in preparation for running the bulk script. You will need a replicated Publishing API or Content Store database for this script to work properly. If using Content Store, pass the --with-content-store flag to the script.

SELECT DISTINCT
  REPLACE(
    SPLIT_PART("url", '?', 1),
    '//',
    '/'
  ) AS "url_path"
FROM
  "fastly_logs"."govuk_www"
WHERE
  "date" = 6
  AND "month" = 5
  AND "year" = 2025
  AND (
    "request_received"
    BETWEEN TIMESTAMP '2025-05-06 12:00'
    AND TIMESTAMP '2025-05-06 17:00'
  )
  AND "content_type" LIKE 'text/html%'
  AND "method" = 'GET'
  AND "status" = 200
  AND "fastly_backend" = 'origin'
  AND "cache_response" = 'MISS'
  AND LOWER("user_agent") NOT LIKE '%bot%'
  AND LOWER("user_agent") NOT LIKE '%crawler%'
  AND LOWER("user_agent") NOT LIKE '%engine%'
  AND LOWER("user_agent") NOT LIKE '%google%'
  AND LOWER("user_agent") NOT LIKE '%java%'
  AND LOWER("user_agent") NOT LIKE '%lua%'
  AND LOWER("user_agent") NOT LIKE '%python%'
  AND LOWER("user_agent") NOT LIKE '%ruby%'
  AND LOWER("user_agent") NOT LIKE '%spider%';