Last updated: 29 Jul 2021

Reference information

As a new starter in the GOV.UK Data Labs team, you should use the following reference information:

  • the GDS wiki
  • Cabinet Office Intranet (CabWeb)
  • Single Operating Platform (SOP)
  • the GDS Way
  • GOV.UK Confluence
  • the Aqua book for maintaining analytical quality assurance (AQA)
  • the GOV.UK developer documentation
  • pre-existing code to reuse
  • learning and development resources
  • example code

GDS wiki

The GDS Wiki contains GDS-specific information. Some useful pages include:

The GDS Wiki is being phased out in Summer 2021, and will be replaced with an updated version.

Cabinet Office Intranet (CabWeb)

You must sign in to the GDS Virtual Private Network (VPN) before you can access CabWeb. See the guidance on signing in to the GDS VPN using your Google credentials for more information.

Once you’re signed in to the VPN, you can access CabWeb. Some useful pages include:

Single Operating Platform (SOP)

SOP is a Cabinet Office-wide platform for most human resource functions, including editing your personal information, accessing your payslip, logging expenses, and requesting special leave. For more information on how to use SOP, see the SOP guidance on CabWeb.

To access SOP:

The GDS Way

The GDS Way is a public-facing website that documents the specific technology, tools, and processes used at GDS and CDDO.

The GDS Way is aimed at software developers, but some useful pages include:

GOV.UK Confluence

Contact the GOV.UK Data Labs delivery manager and ask for access to the GOV.UK Confluence workspace.

Once you have access to GOV.UK Confluence, go to the Analytics on GOV.UK Confluence page. This page has definitions of the custom dimensions used in Google BigQuery analytics tables.
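As a rough illustration of how a custom dimension might be queried, the sketch below builds a BigQuery Standard SQL string in Python. It assumes the analytics tables follow the Google Analytics 360 export schema, where each hit carries a repeated `customDimensions` record with `index` and `value` fields. The table name and the dimension index in the usage note are hypothetical placeholders; take the real index from the Confluence page.

```python
def custom_dimension_query(table: str, index: int) -> str:
    """Build a query counting hits per page for one custom dimension.

    Assumes a Google Analytics 360 export table (an assumption, not
    something this page confirms). `table` and `index` are supplied by
    the caller; nothing here is a real GOV.UK table or dimension.
    """
    if index < 1:
        raise ValueError("custom dimension indexes start at 1")
    return f"""
SELECT
  hit.page.pagePath AS page_path,
  cd.value AS dimension_value,
  COUNT(*) AS hit_count
FROM `{table}`,
  UNNEST(hits) AS hit,
  UNNEST(hit.customDimensions) AS cd
WHERE cd.index = {index}
GROUP BY page_path, dimension_value
ORDER BY hit_count DESC
LIMIT 100
""".strip()


# Hypothetical table name and dimension index, for illustration only.
query = custom_dimension_query("my-project.my_dataset.ga_sessions_20210729", 2)
```

The resulting string could then be passed to a BigQuery client, for example `google.cloud.bigquery.Client().query(query)`, once you have credentials set up.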

The Aqua book for maintaining analytical quality assurance (AQA)

Our analytical work can have far-reaching implications, including effects on individuals and their livelihoods. The Aqua book provides high-level guidance on producing quality analysis for government. This is termed analytical quality assurance (AQA). This book sets out how departments should ensure their work is fit for purpose through verification and validation.

These checks apply to anything that can be loosely defined as a “model”. If your work takes an input, processes it, and produces an output, this comes under the scope of AQA. This includes but is not limited to visualisations, spreadsheets, machine learning models, and even back-of-napkin-type calculations.

The Aqua book establishes four principles:

Proportionality of response

The extent of the analytical quality assurance effort should be proportionate in response to the risks associated with the intended use of the analysis. These risks include financial, legal, operational and reputational impacts. In addition, analysis that is frequently used to support a decision-making process may require a more comprehensive analytical quality assurance response.

Assurance throughout development

Quality assurance considerations should be taken into account throughout the life cycle of the analysis and not just at the end. Effective communication is crucial when understanding the problem, designing the analytical approach, conducting the analysis and relaying the outputs.

Verification and validation

Analytical quality assurance is more than checking that the analysis is error-free and satisfies its specification (verification). It must also include checks that the analysis is appropriate, i.e. fit for the purpose for which it is being used (validation).
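The difference between the two kinds of check can be sketched with a toy "model". Everything in this example is hypothetical and is not taken from the Aqua book: the function names, the visit counts, and the two validation rules are only there to show the shape of each check.

```python
from statistics import mean
from typing import Callable, List


def average_daily_visits(daily_visits: List[int]) -> float:
    """The 'model': takes an input, processes it, and produces an output."""
    return mean(daily_visits)


def verify(model: Callable[[List[int]], float]) -> bool:
    """Verification: does the code satisfy its specification?"""
    # Hand-calculated case: (10 + 20 + 30) / 3 = 20
    return model([10, 20, 30]) == 20


def validate(daily_visits: List[int], days_required: int) -> List[str]:
    """Validation: is the analysis fit for the purpose it is used for?"""
    concerns = []
    if len(daily_visits) < days_required:
        concerns.append("fewer days of data than the decision period requires")
    if any(visits < 0 for visits in daily_visits):
        concerns.append("negative visit counts suggest a data quality problem")
    return concerns
```

A model can pass `verify` and still raise validation concerns, for example when it is correct code applied to two days of data for a decision about a whole week.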

Analysis with RIGOUR

Quality analysis needs to be the following:

  • repeatable (R)
  • independent (I)
  • grounded in reality (G)
  • objective (O)
  • have understood and managed uncertainty (U)
  • the results should address the initial question robustly (R)

In particular, it is important to accept that uncertainty is inherent within the inputs and outputs of any piece of analysis. It is important to establish how much we can rely upon the analysis for a given problem.

These principles must be considered when undertaking any work involving data/models. When setting up a project, use govcookiecutter to help concurrently set up all AQA documentation. Note this only sets up the documentation. You still need to perform the AQA itself.

Note that AQA is not just about software quality assurance. It can also include dealing with ethical considerations, reasons for choosing the method/technique, and validating analytical assumptions and caveats.

Further information

Additional Aqua book resources are available, and the Government Analytical Function, Government Data Quality Hub, and other departments have also produced:

GOV.UK developer documentation

See the documentation on document types on GOV.UK for information on the various document types present on GOV.UK. This list may be incomplete.

See the documentation on content schemas for more information on the schemas used for different content items on the Content Store.

This page is useful if you are working with the Content Store, as it tells you what fields are available for different content document types. Again, this list may be incomplete.

View the JSON of a GOV.UK page

You can view the JSON on a GOV.UK page by either:

Using either of these methods lets you view the A and B versions of a page.
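The two methods are not listed on this page. As one possible programmatic route, the sketch below assumes the public GOV.UK Content API, which serves a page's content item as JSON at `https://www.gov.uk/api/content/<page path>`. The helper names are hypothetical, and the sketch does not attempt to select the A or B version of a page.

```python
import json
from urllib.request import urlopen

# Assumed root of the public GOV.UK Content API.
CONTENT_API_ROOT = "https://www.gov.uk/api/content"


def content_api_url(base_path: str) -> str:
    """Build the Content API URL for a page path such as '/vat-rates'."""
    return CONTENT_API_ROOT + "/" + base_path.strip("/")


def fetch_page_json(base_path: str, timeout: float = 10.0) -> dict:
    """Download and parse the JSON content item for a GOV.UK page."""
    with urlopen(content_api_url(base_path), timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_page_json("/vat-rates")` should return a dictionary whose keys follow the content schemas described above, such as `document_type` and `title`.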

Pre-existing code to reuse

GOV.UK Data Labs has worked on many projects, and has developed code and features you can reuse.

The following table has links to code that you can reuse.

Location | Notes
2020-01-08 Possible Data Engineering problems to solve | Google Doc listing useful features, and data engineering-related thoughts. Mostly for Google BigQuery.
alphagov/modular_sql/src/tools and alphagov/modular_sql/tests | Scripts and associated tests for logging, setting up Google BigQuery clients, parsing SQL scripts into Python, and loading YAML configuration files.

Code examples

The following content has code examples for different data sources.

Google BigQuery code examples

Google BigQuery code examples are available in the govuk-data-labs-onboarding GitHub repo.

Content Store code examples

Downloading the Content Store may take some time.

If you need to use the Content Store in a project, you can instead use the:

GOV.UK mirror code examples

Downloading the GOV.UK mirror may take some time.

If you need to use the GOV.UK mirror in a project, you can instead use the page term TF-IDF matrix notebooks in the govuk-intent-detector GitHub repo.
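For readers new to the term, a page-term TF-IDF matrix has one row per page and one column per term, with each cell weighting how often a term appears on a page against how many pages contain it. The sketch below uses the textbook definition (term frequency multiplied by the log of inverse document frequency) on made-up page text. It is an illustration only; the notebooks in the repo may tokenise and weight terms differently.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_matrix(pages: Dict[str, str]) -> Tuple[List[str], Dict[str, List[float]]]:
    """Return the sorted vocabulary and one TF-IDF row per page.

    Tokenisation is a naive lowercase whitespace split, which is an
    assumption made to keep the example short.
    """
    tokenised = {path: text.lower().split() for path, text in pages.items()}
    vocabulary = sorted({term for tokens in tokenised.values() for term in tokens})
    n_pages = len(tokenised)
    # Number of pages each term appears on at least once.
    doc_freq = Counter(term for tokens in tokenised.values() for term in set(tokens))

    matrix = {}
    for path, tokens in tokenised.items():
        counts = Counter(tokens)
        total = len(tokens)
        matrix[path] = [
            (counts[term] / total) * math.log(n_pages / doc_freq[term]) if total else 0.0
            for term in vocabulary
        ]
    return vocabulary, matrix


# Made-up pages, for illustration only.
vocabulary, matrix = tfidf_matrix({"/a": "tax return tax", "/b": "passport return"})
```

In this toy input, "return" appears on both pages, so its weight is zero everywhere, while "tax" and "passport" each identify a single page.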

Learning and development resources

Free? | Materials | Notes
No | O’Reilly ebooks through ACM membership | O’Reilly Media publishes technology-oriented books with an associated app for reading their books on the go. Useful books and videos include:
No | Standard individual licence for Pluralsight | Pluralsight provides online courses that lean towards software development and engineering. Some useful courses include:
Yes | Advanced NLP with spaCy | Free online course by the creators of spaCy on natural language processing, including exercises, slides, videos, multiple choice questions, and interactive, browser-based coding practice.
Depends | Coursera | Coursera hosts a number of courses on data science. You can “audit” courses for free, but you cannot complete certain assignments or obtain a completion certificate. It’s generally not worth paying for the courses. Good courses include:
Yes | fast.ai | Online courses on deep learning using fast.ai, practical data ethics, computational linear algebra, and natural language processing.
Yes | Interpretable Machine Learning | Accessible book on interpretable machine learning, including interpretable machine learning models, as well as model-agnostic methods for interpretability.
Yes | The Illustrated Word2vec (Jay Alammar’s GitHub Pages) | An illustrated guide to word2vec. The author, Jay Alammar, also has a whole host of other illustrated guides.
Yes | Causal Inference for The Brave and True | A light-hearted yet rigorous approach to learning impact estimation and sensitivity analysis.
Yes | Datasheets for Datasets | A paper proposing how to document datasets.
Yes | Managing Python Environments | Short blog post by Pluralsight summarising how to manage Python environments.
Yes | Hypermodern Python | A recent review of Python best practice for projects.
Yes | Mathematics for Machine Learning | Book covering the mathematical skills needed to understand more advanced machine learning books.
Yes | huggingface/datasets | The largest hub of ready-to-use natural language processing datasets for machine learning models with fast, easy-to-use and efficient data manipulation tools.
Yes | ONS Best Practice and Impact - Quality Assurance of Code for Analysis and Research | Cross-government guidance on best practice for analysis and research.
Yes | ethen8181/machine-learning | Machine learning tutorials.
Yes | ageron/handson-ml2 | Complementary code for the Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow O’Reilly book.
Yes | awesomedata/awesome-public-datasets | A topic-centric list of high quality open datasets.
Yes | Made with ML | Machine learning operations and engineering courses.
Yes | ikatsov/tensor-house | A collection of reference machine learning and optimization models for enterprise operations, including marketing, pricing, and supply chain.
Yes | jghoman/awesome-apache-airflow | Resources for Apache Airflow.
Yes | aws/amazon-sagemaker-examples | AWS SageMaker examples. These are automatically loaded into SageMaker instances.
Yes | Chris-Engelhardt/data_sci_guide | A community-curated list of data science courses, including direct, free replacement courses for paid options.
No | Introduction to Statistical Learning: With Applications in R | An accessible primer on machine learning. Recommended reading for newcomers to data science, and as a refresher.
Yes | datastacktv/data-engineer-roadmap | Roadmap for those wishing to study data engineering.
Yes | alastairrushworth/free-data-science | Resources and learning materials across a broad range of popular data science topics, arranged thematically.