publishing-api: Checking parity of GraphQL and Content Store responses
A couple of scripts are available to check the parity of GraphQL and Content Store responses:
- script/diff_graphql/run.sh - this will guide you through diffing the responses for one page.
- script/diff_graphql/bulk.sh - this allows you to diff multiple pages in one process.
For the bulk script, you'll need to prepare a file with a list of base paths (e.g. /world) and an empty line at the end. See the "Retrieving base paths from logs using Athena" section for one way to do this.
Diffs will be output to tmp/diff_graphql/diffs by default. Run the script with --help for information on all the required and optional arguments.
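As a rough illustration of preparing the base paths file, the sketch below writes one path per line and makes sure the file ends with a newline. It assumes that "an empty line at the end" means the file is newline-terminated, which the scripts themselves do not confirm here. The helper name ensure_trailing_newline, the temporary file and the second example path are hypothetical.

```shell
#!/usr/bin/env bash
# Append a newline to a file unless its last byte is already a newline.
# (Hypothetical helper, not part of the diffing scripts.)
ensure_trailing_newline() {
  file="$1"
  # Command substitution strips a trailing newline, so a non-empty result
  # means the file currently ends with some other character.
  if [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
}

# Illustrative file and paths only; use your own list of base paths.
paths_file="$(mktemp)"
printf '%s\n' /world /world/france > "$paths_file"
ensure_trailing_newline "$paths_file"
cat "$paths_file"
```

Calling the helper on a file that is already newline-terminated leaves it unchanged, so it is safe to run on a file exported from another tool.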
Issue with Bash version
If you get a syntax error when running the diffing scripts, you might be using an old version of Bash. At the time of writing, the version of Bash shipped with macOS (3.2) is two major versions behind the latest release and is missing some features used in the scripts. You can install a modern version via Homebrew (brew install bash).
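One way to check which Bash you are running before trying the scripts is sketched below. The minimum major version of 4 is an assumption made for illustration; the source does not state which version the scripts need. The function name major_at_least is likewise hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds if the major component of a version string (e.g. "5.2.37") is at
# least the required major version. (Hypothetical helper.)
major_at_least() {
  version="$1"
  required="$2"
  major="${version%%.*}"
  # Treat a missing or non-numeric version as too old.
  case "$major" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$major" -ge "$required" ]
}

# ASSUMPTION: 4 is an illustrative threshold, not taken from the scripts.
REQUIRED_MAJOR=4
if major_at_least "${BASH_VERSION:-}" "$REQUIRED_MAJOR"; then
  echo "Bash ${BASH_VERSION} looks recent enough"
else
  echo "Bash ${BASH_VERSION:-unknown} may be too old; consider installing a newer one"
fi
```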
Retrieving base paths from logs using Athena
You can use Athena to retrieve the base paths of cache misses over a given time period. Below is an example Trino SQL query. You only need to edit the "date", "month" and "year" filters and the "request_received" time range.
Save the output to tmp/diff_graphql/unfiltered_base_paths and then run the script/diff_graphql/filter_base_paths.sh script to filter the base paths by one or more schema names in preparation for running the bulk script. You will need a replicated Publishing API or Content Store database for this script to work properly. If using Content Store, pass the --with-content-store flag to the script.
SELECT DISTINCT
  REPLACE(
    SPLIT_PART("url", '?', 1),
    '//',
    '/'
  ) AS "url_path"
FROM
  "fastly_logs"."govuk_www"
WHERE
  "date" = 6
  AND "month" = 5
  AND "year" = 2025
  AND (
    "request_received"
      BETWEEN TIMESTAMP '2025-05-06 12:00'
      AND TIMESTAMP '2025-05-06 17:00'
  )
  AND "content_type" LIKE 'text/html%'
  AND "method" = 'GET'
  AND "status" = 200
  AND "fastly_backend" = 'origin'
  AND "cache_response" = 'MISS'
  AND LOWER("user_agent") NOT LIKE '%bot%'
  AND LOWER("user_agent") NOT LIKE '%crawler%'
  AND LOWER("user_agent") NOT LIKE '%engine%'
  AND LOWER("user_agent") NOT LIKE '%google%'
  AND LOWER("user_agent") NOT LIKE '%java%'
  AND LOWER("user_agent") NOT LIKE '%lua%'
  AND LOWER("user_agent") NOT LIKE '%python%'
  AND LOWER("user_agent") NOT LIKE '%ruby%'
  AND LOWER("user_agent") NOT LIKE '%spider%';
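The date in the query above appears in five places: the three date columns and both timestamps. The sketch below generates that part of the WHERE clause from a single date, so the values stay consistent when you change it. The function name date_filter and its argument order are hypothetical, and it assumes a single-day window with the date given as YYYY-MM-DD.

```shell
#!/usr/bin/env bash
# Print the date-related conditions of the query for one day and time window.
# Usage: date_filter YYYY-MM-DD START_HH:MM END_HH:MM  (hypothetical helper)
date_filter() {
  day_iso="$1"
  start="$2"
  end="$3"
  year="${day_iso%%-*}"
  rest="${day_iso#*-}"
  month="${rest%%-*}"
  day="${rest#*-}"
  # The date columns are compared as integers, so drop one leading zero.
  month="${month#0}"
  day="${day#0}"
  cat <<EOF
"date" = ${day}
AND "month" = ${month}
AND "year" = ${year}
AND (
  "request_received"
    BETWEEN TIMESTAMP '${day_iso} ${start}'
    AND TIMESTAMP '${day_iso} ${end}'
)
EOF
}

# Reproduces the conditions used in the example query.
date_filter 2025-05-06 12:00 17:00
```

The output can be pasted in place of the corresponding lines of the query; the remaining filters are unchanged.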