Last updated: 21 Apr 2026

content-data-api: Requests to re-import Content Data metrics

Occasionally a member of GOV.UK’s Insights team will ask for data to be manually imported.

The bin/rerun script creates a Kubernetes Job that does the rerun in the background. The script runs from your Mac, so make sure you have AWS credentials for the appropriate environment (most likely production) and that your kubeconfig is set up to access the cluster.

To re-import a single date:

bin/rerun 2024-08-03

Command run in the Job:

rake etl:rerun_main_list[2024-08-03]

To re-import a set of dates:

bin/rerun 2024-08-03,2024-09-05,2025-01-31

Command run in the Job:

rake etl:rerun_main_list[2024-08-03,2024-09-05,2025-01-31]

Rake task code: https://github.com/alphagov/content-data-api/blob/main/lib/tasks/etl.rake#L83-L94
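
The mapping from the script argument to the rake command can be sketched in Ruby. This is a minimal sketch, not the code in bin/rerun: the helper name rerun_command and its validation rules are assumptions made for illustration. It only shows that a comma-separated list of YYYY-MM-DD dates is passed through unchanged inside the rake task's square brackets, and that checking the dates locally first avoids creating a Job that fails on a typo.

```ruby
require "date"

# Hypothetical helper (not part of content-data-api): validates a
# comma-separated list of dates and builds the rake invocation string.
def rerun_command(dates_arg)
  dates = dates_arg.split(",").map(&:strip)
  raise ArgumentError, "no dates given" if dates.empty?

  dates.each do |d|
    # Require the exact YYYY-MM-DD shape used in the examples above.
    unless d.match?(/\A\d{4}-\d{2}-\d{2}\z/)
      raise ArgumentError, "expected YYYY-MM-DD, got #{d.inspect}"
    end
    # Raises Date::Error (a subclass of ArgumentError) for impossible
    # dates such as 2024-02-30.
    Date.strptime(d, "%Y-%m-%d")
  end

  "rake etl:rerun_main_list[#{dates.join(',')}]"
end
```

For example, rerun_command("2024-08-03,2024-09-05") returns the string rake etl:rerun_main_list[2024-08-03,2024-09-05].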

Monitoring task progress

The bin/rerun script prints a kubectl command for tailing the Rake task's logs, for example:

Job adhoc-etl-rerun-48dddc183822142a created.
Run this command to follow logs:
  kubectl -n apps logs -f job.batch/adhoc-etl-rerun-48dddc183822142a

Use the logs to monitor the progress of the import job, which takes around 50 minutes for a single day. The output describes two stages of the import:

  • main process (metrics, ga_views_navigation, ga, ga_search, feedex)
  • monthly and search aggregations

You should see log output including messages similar to:

Running Etl::Main process for 2026-04-17

Process: 'main' started at 2026-04-21 14:52:06
Process: 'metrics' started at 2026-04-21 14:52:06
Process: 'metrics' : about to get the Dimensions::Date
Process: 'metrics' : got the Dimensions::Date
Process: 'metrics' : processing 5000 items in batch 0
...
Process: 'ga_views_navigation' started at 2026-04-21 15:07:05
Process: 'ga' : Processing 10000 events in batch 1
Process: 'ga_search' ended at 2026-04-21 15:16:13, duration: less than a minute
Process: 'feedex' ended at 2026-04-21 15:16:15, duration: less than 5 seconds
Process: 'main' ended at 2026-04-21 15:16:15, duration: 24 minutes

finished running Etl::Main for 2026-04-17

Running monthly and search aggregations for 2026-04-30

Process: 'aggregations_monthly' ended at 2026-04-21 15:19:31, duration: 3 minutes
Process: 'search_last_thirty_days' ended at 2026-04-21 15:27:40, duration: 8 minutes
Process: 'search_last_three_months' ended at 2026-04-21 15:42:26, duration: 9 minutes
Process: 'search_last_six_months' ended at 2026-04-21 15:52:02, duration: 10 minutes
Process: 'search_last_twelve_months' ended at 2026-04-21 16:03:01, duration: 11 minutes
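
If you save the log output to a file, a short Ruby script can list which processes have finished. The sketch below is illustrative only: the helper name finished_processes is hypothetical, and the pattern assumes the "ended at ..., duration: ..." line format shown in the sample output above.

```ruby
# Matches lines such as:
#   Process: 'main' ended at 2026-04-21 15:16:15, duration: 24 minutes
# The format is assumed from the sample log output, not from the ETL code.
ENDED_LINE = /\AProcess: '(?<name>[^']+)' ended at (?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}), duration: (?<duration>.+)\z/

# Hypothetical helper: returns one hash per process that has logged an
# "ended" line, in the order the lines appear in the log.
def finished_processes(log_text)
  log_text.each_line.filter_map do |line|
    m = ENDED_LINE.match(line.strip)
    { name: m[:name], ended_at: m[:time], duration: m[:duration] } if m
  end
end
```

Calling finished_processes(File.read("rerun.log")) on a saved log (rerun.log is a placeholder filename) returns the finished processes in order. In the sample output above, search_last_twelve_months is the last process to end.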

Verifying import in BigQuery

To check that the data you expect has been imported and is available in BigQuery, run a query such as:

SELECT the_date, cleaned_page_location
FROM `govuk-content-data.ga4.GA4 dataform`
WHERE the_date = "2024-10-29"
LIMIT 5
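
When several dates have been re-imported, it can be quicker to check them all in one query. The Ruby sketch below builds such a query as a string. The helper name verification_query is hypothetical, the table ID and the the_date column are copied from the query above, and counting rows per date is an assumed check rather than an established procedure. A date that is missing from the result has no rows in the table.

```ruby
require "date"

# Table ID copied verbatim from the example query above.
GA4_TABLE = "govuk-content-data.ga4.GA4 dataform"

# Hypothetical helper: builds a BigQuery query that counts rows per date.
# Date.iso8601 raises Date::Error (an ArgumentError) on invalid input, so
# only well-formed dates are interpolated into the SQL.
def verification_query(dates)
  quoted = dates.map { |d| %("#{Date.iso8601(d).iso8601}") }
  <<~SQL
    SELECT the_date, COUNT(*) AS row_count
    FROM `#{GA4_TABLE}`
    WHERE the_date IN (#{quoted.join(', ')})
    GROUP BY the_date
    ORDER BY the_date
  SQL
end
```

For example, puts verification_query(["2024-08-03", "2024-09-05"]) prints a query you can paste into the BigQuery console.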