Architectural overview of data.gov.uk
We have three environments: integration, staging (“test”) and production. See the CI docs for more information about each environment.
Most of the service is hosted on GOV.UK PaaS, which divides components into applications (e.g. Rails apps) and services (databases, messaging services, etc.). All applications and services are controlled through Cloud Foundry, as used by GOV.UK PaaS. Some familiarity with its documentation will be useful when reading this manual.
Legacy is the old platform that Publish and Find are going to replace. It has an API that is used to import data into the new platform.
Publish Data Worker
This is a Rails worker used to fetch data from legacy. It uses Redis to queue import tasks. It will be removed once legacy is no longer used. The source code is in the Publish Data app.
The way data is normally imported from legacy is:
- Every hour, Pingdom GETs https://publish-data-beta-production.cloudapps.digital/api/sync-beta
- This runs the sync:beta Rails task, which queries the Legacy API for new and updated datasets
- Changes are reflected in the Publish database and pushed to Elasticsearch
To run the task manually you can do the following on staging (or replace the app name with the live app):
cf run-task publish-data-beta-staging-worker "bin/rake sync:beta" --name sync
cf logs publish-data-beta-staging-worker
If no organisation can be found for a user (e.g. if no mapping exists), the app will fail.
The database that Publish Data uses. Publish Data gets the details and credentials through the VCAP_SERVICES environment variable.
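As a rough illustration of how an app can read those credentials, the sketch below parses a VCAP_SERVICES-style JSON document and pulls out the credentials of the first service with a given label. The `service_credentials` helper, the `postgres` label, and every value in the sample are assumptions made for this example; they are not taken from the Publish Data source code.

```ruby
require "json"

# Return the credentials hash of the first bound service with the given
# label, or nil when no service with that label is bound.
def service_credentials(vcap_services_json, label)
  services = JSON.parse(vcap_services_json || "{}")
  instance = services.fetch(label, []).first
  instance && instance["credentials"]
end

# Hypothetical sample of the JSON Cloud Foundry injects; all values are made up.
sample = JSON.generate(
  "postgres" => [
    {
      "name" => "example-db",
      "credentials" => { "uri" => "postgres://user:secret@db.example.com:5432/publish" }
    }
  ]
)

# Fall back to the sample when the variable is not set (e.g. outside the PaaS).
credentials = service_credentials(ENV.fetch("VCAP_SERVICES", sample), "postgres")
puts credentials["uri"] if credentials
```

Returning nil for an unbound service lets the caller decide whether a missing database is fatal, rather than failing inside the parser.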
The search index that Find Data uses to search datasets. It is populated through the search:reindex rake task on Publish Data (see below) and whenever publishers make changes in Publish Data. The VCAP_SERVICES environment variable contains the credentials to connect to it.
beta.data.gov.uk proxy (aka beta-dgu-route)
There are two “user-provided” services (including publish-production-secrets) that Publish Data and Find Data use to get access to environment variables, some of which contain secrets such as API keys. Those variables are found in the VCAP_SERVICES environment variable for Publish Data and Find Data. Their values are set and encrypted in the datagovuk_infrastructure repository, and Cloud Foundry is used to deploy the services when they are modified.
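To make the lookup concrete, here is a minimal sketch of finding a user-provided service by name and reading its variables. Cloud Foundry lists such services under the `user-provided` key of VCAP_SERVICES. The `user_provided_secrets` helper and the sample DSN value are hypothetical, and the apps themselves may read these values differently.

```ruby
require "json"

# Return the credentials of the named user-provided service, or an empty
# hash when no service with that name is bound.
def user_provided_secrets(vcap_services_json, service_name)
  services = JSON.parse(vcap_services_json).fetch("user-provided", [])
  service = services.find { |s| s["name"] == service_name }
  service ? service.fetch("credentials", {}) : {}
end

# Hypothetical VCAP_SERVICES content; the DSN value is a placeholder.
sample = JSON.generate(
  "user-provided" => [
    {
      "name" => "publish-production-secrets",
      "credentials" => { "SENTRY_DSN" => "https://key@sentry.example/1" }
    }
  ]
)

secrets = user_provided_secrets(sample, "publish-production-secrets")
# Print only the variable names so that secret values never reach the logs.
puts secrets.keys.sort.join(", ")
```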
The environment variables for each app can be accessed using the command cf env <app-name> via the Cloud Foundry CLI.
Redis is not currently available on PaaS for production services, so we run two Redis instances on AWS that we set up by hand. Details can be found on data.gov.uk’s AWS console.
To navigate to the console:
- Log in to AWS.
- Select ‘Ireland’ from the drop-down menu in the nav bar (top right of the page).
- Select EC2 from the services menu to reach the EC2 Dashboard.
Look for instances called
You can monitor Sidekiq jobs for Publish Data by going to /sidekiq on the website.
We use Sentry to monitor errors; the URLs can be found on the app pages in this documentation. Both Publish Data and Find Data look for an environment variable called SENTRY_DSN (provided by the Secrets services), which contains the sentry.io URL that messages should be sent to. Members of the data.gov.uk group on Sentry will receive an email when errors occur.
We use Google Analytics, with standard settings and some specific events on datafile download.
The Logit URL is syslog-tls://225374f1-0bbc-4aa9-8ba0-b87c33995884-ls.logit.io:19753 and maps to the “DGU Beta” stack on the GDS Logit account.