Architectural overview of data.gov.uk

[Diagram: data.gov.uk architecture]

Environments

We have three environments: integration, staging (“test”) and production. See the CI docs for more information about each environment.

Applications

Most of the service is hosted on GOV.UK PaaS, which divides components into applications (e.g. Rails apps) and services (databases, messaging services, etc.). All applications and services are controlled through Cloud Foundry, the platform that GOV.UK PaaS is built on. Some familiarity with the Cloud Foundry documentation will help when reading this manual.

Legacy DGU

The old platform, which Publish Data and Find Data are going to replace. It has an API that the new platform uses to import data.

Find Data

The new public-facing service that end-users access to find data. It’s a Rails app, hosted on GOV.UK PaaS. You can find out more about this app here.

Publish Data

The new publisher-facing service that publishers access to add or edit datasets. It’s a Rails app, hosted on GOV.UK PaaS. You can find out more about this app here.

Publish Data Worker

This is a Rails worker that fetches data from Legacy DGU. It uses Redis to queue import tasks. It will be removed once Legacy DGU is no longer used. The source code is in the Publish Data app.

Data is normally imported from Legacy DGU as follows:

  • Every hour, Pingdom GETs https://publish-data-beta-production.cloudapps.digital/api/sync-beta
  • This runs the sync:beta Rails task, which queries the Legacy API for new and updated datasets
  • Changes are reflected in the Publish Data database and pushed to Elasticsearch

To run the task manually on staging, use the following commands (replace the app name with the live app to run it on production):

cf run-task publish-data-beta-staging-worker "bin/rake sync:beta" --name sync
cf logs publish-data-beta-staging-worker

Services

GOV.UK Signon

We use GOV.UK Signon for user authentication in Publish Data, with the app in each environment linked to the corresponding instance of GOV.UK Signon.

The organisations in the Publish Data database have a govuk_content_id field to map them to GOV.UK Signon organisations.

If no organisation can be found for a user (e.g. if no mapping exists), the app will fail.
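
The lookup described above can be sketched as follows. This is a minimal illustration rather than the app's actual code: the `Organisation` struct, the `OrganisationNotFound` error and the `organisation_for` helper are hypothetical names, and only the `govuk_content_id` field comes from the description above.

```ruby
# Hypothetical stand-in for a Publish Data organisation record.
Organisation = Struct.new(:name, :govuk_content_id)

# Hypothetical error raised when no mapping exists for a user.
class OrganisationNotFound < StandardError; end

# Finds the organisation whose govuk_content_id matches the content ID of
# the user's GOV.UK Signon organisation. Raises if there is no match.
def organisation_for(signon_content_id, organisations)
  match = organisations.find { |org| org.govuk_content_id == signon_content_id }
  raise OrganisationNotFound, "no organisation mapped to #{signon_content_id.inspect}" if match.nil?

  match
end
```

Raising a dedicated error, rather than returning nil, makes a missing mapping easy to tell apart from other failures when it shows up in error monitoring.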

Postgres

The database that Publish Data uses. Publish Data gets the details and credentials through the VCAP_SERVICES environment variable.
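
As a rough sketch of how credentials can be read from VCAP_SERVICES: Cloud Foundry sets it to a JSON object keyed by service label, where each value is a list of bound service instances. The `service_credentials` helper, the sample value, the `postgres` label and the `uri` key below are assumptions for illustration, so check the real output of cf env before relying on them.

```ruby
require "json"

# Returns the credentials hash of the first service instance bound under
# `label`, or nil when no such service is bound.
def service_credentials(vcap_json, label)
  instances = JSON.parse(vcap_json).fetch(label, [])
  instances.empty? ? nil : instances.first["credentials"]
end

# Made-up sample value. The "postgres" label and the "uri" key are assumptions.
SAMPLE_VCAP_SERVICES = <<~JSON
  {
    "postgres": [
      {
        "name": "example-postgres",
        "credentials": { "uri": "postgres://user:secret@db.example.com:5432/publish" }
      }
    ]
  }
JSON
```

Inside a running app the same helper would be called as `service_credentials(ENV.fetch("VCAP_SERVICES"), "postgres")`.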

Elasticsearch

The search index that Find Data uses to search datasets. It is populated by the search:reindex rake task on Publish Data (see below) and updated whenever publishers make changes in Publish Data. The VCAP_SERVICES environment variable contains the credentials to connect to it.

beta.data.gov.uk proxy (aka beta-dgu-route)

This is a “cdn-route” PaaS service that proxies the beta.data.gov.uk host name to the Find Data application.

Secrets

There are two “user-provided” services (find-production-secrets and publish-production-secrets) that Find Data and Publish Data use to get access to environment variables, some of which contain secrets such as API keys. Each app finds those variables inside its VCAP_SERVICES environment variable. The values are set and encrypted in the datagovuk_infrastructure repository, and Cloud Foundry is used to redeploy the services when they are modified.
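
A minimal sketch of reading one of these secrets, assuming the usual Cloud Foundry layout in which user-provided services are listed under the `user-provided` key of VCAP_SERVICES. The `secrets_from` helper and the sample value (including the DSN) are hypothetical and only illustrate the shape of the data.

```ruby
require "json"

# Returns the credentials of the user-provided service called `service_name`,
# or an empty hash when that service is not bound to the app.
def secrets_from(vcap_json, service_name)
  user_provided = JSON.parse(vcap_json).fetch("user-provided", [])
  service = user_provided.find { |instance| instance["name"] == service_name }
  service ? service.fetch("credentials", {}) : {}
end

# Made-up sample value; real secrets come from the bound service.
SAMPLE_SECRETS_VCAP = <<~JSON
  {
    "user-provided": [
      {
        "name": "find-production-secrets",
        "credentials": { "SENTRY_DSN": "https://public-key@sentry.example/1" }
      }
    ]
  }
JSON
```

For example, `secrets_from(SAMPLE_SECRETS_VCAP, "find-production-secrets")["SENTRY_DSN"]` returns the sample DSN, and an unknown service name yields an empty hash.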

The environment variables for each app can be viewed by running cf env <app-name> with the Cloud Foundry CLI.

Redis

Redis is not currently available on PaaS for production services, so we run two Redis instances on AWS that we set up by hand. Details can be found on data.gov.uk’s AWS console.

To navigate to the console:

  • Log in to AWS.
  • Select ‘Ireland’ from the drop-down menu in the nav bar, at the top right of the page.
  • Select EC2 from the services menu to reach the EC2 Dashboard.

Look for instances called redis-staging and redis-production.

Monitoring

The Publish Data and Find Data applications are monitored by Pingdom.

You can monitor Sidekiq jobs for Publish Data by going to /sidekiq on the website.

We use Sentry to monitor errors. The Sentry URLs for each app can be found on the app pages in this documentation. Both Publish Data and Find Data look for an environment variable called SENTRY_DSN (provided by the Secrets services), which contains the sentry.io URL that messages should be sent to. Members of the data.gov.uk group on Sentry receive an email when errors occur.

Analytics

We use Google Analytics with standard settings, plus some specific events that are recorded when a datafile is downloaded.

Logging

We use Logit and take advantage of PaaS’s support for it. We have a logit-ssl-drain Cloud Foundry service that is bound to all apps.

The Logit URL is syslog-tls://225374f1-0bbc-4aa9-8ba0-b87c33995884-ls.logit.io:19753 and maps to the “DGU Beta” stack on the GDS Logit account.

This page needs to be reviewed again by the page owner, #datagovuk-tech.