
Architectural overview of

Data gov uk architecture


We have three environments: integration, staging (“test”) and production. See the CI docs for more information about each environment.


Most of the service is hosted on GOV.UK PaaS, which divides components into applications (e.g. Rails apps) and services (databases, messaging services, etc.). All applications and services are controlled through Cloud Foundry, which GOV.UK PaaS uses. Some familiarity with that documentation will help when reading this manual.

Legacy DGU

The old platform, which Publish Data and Find Data are going to replace. Its API is used to import data into the new platform.

Find Data

The new public-facing service that end-users access to find data. It’s a Rails app, hosted on GOV.UK PaaS. You can find out more about this app here.

Publish Data

The new publisher-facing service that publishers access to add or edit datasets. It’s a Rails app, hosted on GOV.UK PaaS. You can find out more about this app here.

Publish Data Worker

This is a Rails worker used to fetch data from Legacy DGU. It uses Redis to queue import tasks. It will be removed once Legacy DGU is no longer used. The source code is in the Publish Data app.

The way data is normally imported from legacy is:

  • Every hour, Pingdom GETs
  • This runs the sync:beta Rake task, which queries the Legacy API for new and updated datasets
  • Changes are reflected in the Publish Data database and pushed to Elasticsearch
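
The import flow above can be sketched roughly as follows. This is a minimal illustration, not the actual sync:beta task: the sync method, the dataset fields, and the integer timestamps are all hypothetical, and plain hashes stand in for the Legacy API response, the Publish Data database and the search index.

```ruby
# Hypothetical sketch of the hourly import; every name here is illustrative.
# Plain hashes stand in for the Publish Data database and the search index.
def sync(legacy_datasets, database, search_index, last_synced_at)
  # Keep only datasets created or updated since the previous run.
  changed = legacy_datasets.select { |dataset| dataset[:updated_at] > last_synced_at }

  changed.each do |dataset|
    database[dataset[:id]] = dataset              # create or update the local record
    search_index[dataset[:id]] = dataset[:title]  # push the change to the search index
  end

  changed.map { |dataset| dataset[:id] }
end

# Made-up data: integers stand in for timestamps.
legacy = [
  { id: "a", title: "Road traffic counts", updated_at: 10 },
  { id: "b", title: "Air quality", updated_at: 30 }
]
database = {}
search_index = {}
synced_ids = sync(legacy, database, search_index, 20)
puts synced_ids.inspect
```

Only the dataset changed after the last run is written to both stores, which is why the task can safely run every hour.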

To run the task manually on staging, use the following commands (substitute the live app name to run it on production):

cf run-task publish-data-beta-staging-worker "bin/rake sync:beta" --name sync
cf logs publish-data-beta-staging-worker


GOV.UK Signon

We use GOV.UK Signon for user authentication in Publish Data, with the app in each environment linked to the corresponding instance of GOV.UK Signon.

The organisations in the Publish Data database have a govuk_content_id field to map them to GOV.UK Signon organisations.

If no organisation can be found for a user (e.g. if no mapping exists), the app will fail.
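
As an illustration of the mapping, the lookup could look something like the sketch below. The Organisation struct, the organisation_for method and the OrganisationNotFound error are hypothetical names, not taken from the Publish Data code, and the real app's failure mode may differ from the explicit error raised here.

```ruby
# Hypothetical model of the organisation lookup; names are illustrative only.
Organisation = Struct.new(:name, :govuk_content_id)

class OrganisationNotFound < StandardError; end

# Finds the organisation whose govuk_content_id matches the content id
# supplied for the user's organisation by GOV.UK Signon.
def organisation_for(organisations, signon_content_id)
  match = organisations.find { |org| org.govuk_content_id == signon_content_id }
  raise OrganisationNotFound, "no organisation with content id #{signon_content_id}" if match.nil?

  match
end

# Made-up data for illustration.
organisations = [Organisation.new("Example Department", "content-id-1")]
puts organisation_for(organisations, "content-id-1").name
```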


The database that Publish Data uses. Publish Data gets the details and credentials through the VCAP_SERVICES environment variable.
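
A minimal sketch of reading service credentials from VCAP_SERVICES follows. It relies on the standard Cloud Foundry layout, where the variable holds a JSON object keyed by service label and each label maps to a list of bound instances. The "postgres" label, the instance name and the URI are assumptions made for illustration; check the real values with cf env.

```ruby
require "json"

# Returns the credentials hash of the first bound service instance with the
# given label, or nil when no service with that label is bound.
def service_credentials(vcap_json, label)
  services = JSON.parse(vcap_json)
  instance = services.fetch(label, []).first
  instance && instance["credentials"]
end

# Hypothetical VCAP_SERVICES value; label, name and URI are made up.
example_vcap = {
  "postgres" => [
    {
      "name" => "example-db",
      "credentials" => { "uri" => "postgres://user:pass@host:5432/db" }
    }
  ]
}.to_json

puts service_credentials(example_vcap, "postgres")["uri"]
```

In a running app the JSON string would come from ENV["VCAP_SERVICES"] rather than a literal.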


The search index that Find Data uses to search datasets. It is populated through the search:reindex rake task on Publish Data (see below) and when publishers make changes when using Publish Data. The VCAP_SERVICES environment variable contains the credentials to connect to it.

proxy (aka beta-dgu-route)

This is a “cdn-route” PaaS service that proxies the host name to the Find Data application.


There are two “user-provided” services (find-production-secrets and publish-production-secrets) that Publish Data and Find Data use to get access to environment variables, some of which contain secrets such as API keys. Each app finds those variables inside its VCAP_SERVICES environment variable. The values are set and encrypted in the datagovuk_infrastructure repository, and Cloud Foundry is used to redeploy the services whenever the values are modified.

The environment variables for each app can be viewed by running cf env <app-name> with the Cloud Foundry CLI.
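
The sketch below shows one way an app could pull its secrets out of VCAP_SERVICES. Cloud Foundry lists user-provided services under the "user-provided" key, each with a name and a credentials hash. The secrets_from method and the SENTRY_DSN value are hypothetical, and this is not necessarily how Publish Data or Find Data read their secrets.

```ruby
require "json"

# Returns the credentials of the named user-provided service instance,
# or an empty hash when that service is not bound to the app.
def secrets_from(vcap_json, service_name)
  user_provided = JSON.parse(vcap_json).fetch("user-provided", [])
  service = user_provided.find { |instance| instance["name"] == service_name }
  service ? service.fetch("credentials", {}) : {}
end

# Hypothetical VCAP_SERVICES value; the DSN is a placeholder.
example_vcap = {
  "user-provided" => [
    {
      "name" => "publish-production-secrets",
      "credentials" => { "SENTRY_DSN" => "https://key@sentry.example/1" }
    }
  ]
}.to_json

secrets = secrets_from(example_vcap, "publish-production-secrets")
puts secrets.keys.inspect
```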


Redis is not currently available on PaaS for production services, so we run two Redis instances on AWS that we set up by hand. Details can be found on’s AWS console.

To navigate to the console:

  • Log in to AWS.
  • Select ‘Ireland’ from the drop-down menu in the nav bar (top right of the page).
  • Select EC2 from the services menu to reach the EC2 Dashboard.

Look for instances called redis-staging and redis-production.


The Publish Data and Find Data applications are monitored by Pingdom.

You can monitor Sidekiq jobs for Publish Data by going to /sidekiq on the website.

We use Sentry to monitor errors. The Sentry URLs can be found on the app pages in this documentation. Both Publish Data and Find Data look for an environment variable called SENTRY_DSN (provided by the Secrets services), which contains the URL that messages should be sent to. Members of the group on Sentry will receive an email in case of errors.


We use Google Analytics with standard settings, plus some specific events for datafile downloads.


We use Logit for logging and take advantage of GOV.UK PaaS’s support for it. A logit-ssl-drain Cloud Foundry service is bound to all apps.

The logit URL is syslog-tls:// and maps to the “DGU Beta” stack on the GDS Logit account.

This page was last reviewed . It needs to be reviewed again by the page owner #datagovuk-tech.