As a new starter in the GOV.UK Data Labs team, you should use the following reference information:
- the GDS wiki
- Cabinet Office Intranet (CabWeb)
- Single Operating Platform (SOP)
- the GDS Way
- GOV.UK Confluence
- the Aqua book for maintaining analytical quality assurance (AQA)
- the GOV.UK developer documentation
- pre-existing code to reuse
- learning and development resources
- example code
The GDS Wiki contains GDS-specific information. Some useful pages include:
- specific guidance for new starters
- guidance on performance management
- learning and development, including mandatory training, and using the learning and development budget
The GDS Wiki is being phased out in Summer 2021, and will be replaced with an updated version.
Cabinet Office Intranet (CabWeb)
You must sign in to the GDS Virtual Private Network (VPN) before you can access CabWeb. See the guidance on signing in to the GDS VPN using your Google credentials for more information.
Once you’re signed in to the VPN, you can access CabWeb. Some useful pages include:
Single Operating Platform (SOP)
SOP is a Cabinet Office-wide platform for most human resource functions, including editing your personal information, accessing your payslip, logging expenses, and requesting special leave. For more information on how to use SOP, see the SOP guidance on CabWeb.
To access SOP:
The GDS Way
The GDS Way is a public-facing website that documents the specific technology, tools, and processes used at GDS and CDDO.
Although it is aimed at software developers, it has pages that are useful for the team, including:
- style guides for different programming languages, including the GDS Python style guide
- tracking and managing third-party software dependencies
- building accessible services and understanding WCAG 2.1, which public sector websites are legally required to meet
- best practice on using version control
The Aqua book for maintaining analytical quality assurance (AQA)
Our analytical work can have far-reaching implications, including for individuals and their livelihoods. The Aqua book provides high-level guidance on producing quality analysis for government. This is termed analytical quality assurance (AQA). The book sets out how departments should ensure their work is fit for purpose through verification and validation.
These checks apply to anything that can be loosely defined as a “model”. If your work takes an input, processes it, and produces an output, this comes under the scope of AQA. This includes but is not limited to visualisations, spreadsheets, machine learning models, and even back-of-napkin-type calculations.
The Aqua book establishes four principles:
Proportionality of response
The extent of the analytical quality assurance effort should be proportionate in response to the risks associated with the intended use of the analysis. These risks include financial, legal, operational and reputational impacts. In addition, analysis that is frequently used to support a decision-making process may require a more comprehensive analytical quality assurance response.
Assurance throughout development
Quality assurance considerations should be taken into account throughout the life cycle of the analysis and not just at the end. Effective communication is crucial when understanding the problem, designing the analytical approach, conducting the analysis and relaying the outputs.
Verification and validation
Analytical quality assurance is more than checking that the analysis is error-free and satisfies its specification (verification). It must also include checks that the analysis is appropriate, i.e. fit for the purpose for which it is being used (validation).
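The difference between verification and validation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the Aqua book: the toy `forecast_visits` model, the `verify` and `validate` helpers, and the 20% tolerance are all hypothetical.

```python
def forecast_visits(daily_visits, days_ahead):
    """Toy model (hypothetical): assume the mean daily rate continues."""
    return sum(daily_visits) / len(daily_visits) * days_ahead


def verify():
    """Verification: does the code match its specification?

    The specification here is 'mean daily visits multiplied by days ahead',
    so a hand-calculated case should be reproduced exactly.
    """
    return forecast_visits([10, 20, 30], days_ahead=2) == 40.0


def validate(history, observed_total, days_ahead, tolerance=0.2):
    """Validation: is the model fit for its intended use?

    Compare the forecast with an outcome that was actually observed, and
    accept it only if the relative error is within an agreed tolerance
    (20% is an arbitrary illustrative choice).
    """
    predicted = forecast_visits(history, days_ahead)
    relative_error = abs(predicted - observed_total) / observed_total
    return relative_error <= tolerance


print(verify())                          # the code does what was specified
print(validate([100, 110, 90], 310, 3))  # forecast is close to what happened
print(validate([100, 110, 90], 900, 3))  # correct code, but unfit for this use
```

The last call shows why verification alone is not enough: the code is error-free against its specification, yet the model is not appropriate for the situation it is applied to.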
Analysis with RIGOUR
Quality analysis needs to be the following:
- repeatable (R)
- independent (I)
- grounded in reality (G)
- objective (O)
- have understood and managed uncertainty (U)
- the results should address the initial question robustly (R)
In particular, it is important to accept that uncertainty is inherent within the inputs and outputs of any piece of analysis. It is important to establish how much we can rely upon the analysis for a given problem.
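One common way to express how much a result can be relied upon is to report an interval rather than a single number. The sketch below uses a percentile bootstrap for the mean, written with the Python standard library only. It is one possible technique, not one prescribed by the Aqua book, and the function name, parameters, and sample data are hypothetical.

```python
import random
import statistics


def bootstrap_mean_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Resamples the data with replacement, records the mean of each resample,
    and returns the lower and upper percentiles of those means.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Illustrative data only, for example daily counts of some event
sample = [12, 15, 9, 20, 14, 11, 18, 16]
low, high = bootstrap_mean_interval(sample)
print(f"mean = {statistics.fmean(sample):.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

Fixing the random seed keeps the analysis repeatable, which ties the treatment of uncertainty back to the first letter of RIGOUR.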
You must consider these principles when undertaking any work involving data or models. When setting up a project, use govcookiecutter to set up all the AQA documentation at the same time. Note that this only sets up the documentation. You still need to perform the AQA itself.
Note that AQA is not just about software quality assurance. It can also include dealing with ethical considerations, reasons for choosing the method/technique, and validating analytical assumptions and caveats.
Additional Aqua book resources are available, and the Government Analytical Function, Government Data Quality Hub, and other departments have also produced:
- guides to ensure your work is fit for purpose when working to very tight deadlines
- guides to ensure your data is fit for purpose
- a curriculum around quality assurance, validation, and data linkage
GOV.UK developer documentation
See the documentation on document types on GOV.UK for information on the various document types present on GOV.UK. This list may be incomplete.
See the documentation on content schemas for more information on the schemas used for different content items in the Content Store.
This page is useful if you are working with the Content Store, as it tells you which fields are available for different document types. Again, this list may be incomplete.
View the JSON of a GOV.UK page
You can view the JSON of a GOV.UK page by either:
- using the GOV.UK Toolkit browser extension for Chrome and Firefox
- adding /api/content into a page URL
Using either of these methods lets you view the A and B versions of a page.
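The /api/content method above can be sketched in Python. This is a minimal sketch that assumes /api/content is placed immediately after the domain and before the page path. The helper names `content_api_url` and `fetch_page_json` are hypothetical, and the page URL is only an illustration.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def content_api_url(page_url: str) -> str:
    """Insert /api/content between the domain and the page path.

    Assumption: the JSON for a page lives at the same path, prefixed with
    /api/content. Any query string or fragment is dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit(
        (parts.scheme, parts.netloc, "/api/content" + parts.path, "", "")
    )


def fetch_page_json(page_url: str) -> dict:
    """Download and parse the JSON for a page (requires network access)."""
    with urlopen(content_api_url(page_url)) as response:
        return json.load(response)


print(content_api_url("https://www.gov.uk/browse/benefits"))
# → https://www.gov.uk/api/content/browse/benefits
```

`fetch_page_json` is not called here so that the example runs without network access. Call it with a real page URL to inspect the fields returned for that page.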
Pre-existing code to reuse
GOV.UK Data Labs has worked on many projects, and has developed code and features you can reuse.
The following table has links to code that you can reuse.
|2020-01-08 Possible Data Engineering problems to solve||Google Doc listing useful features and data engineering-related thoughts, mostly for Google BigQuery.|
|Scripts and associated tests for logging, setting up Google BigQuery clients, parsing SQL scripts into Python, and loading YAML configuration files|
The following content has code examples for different data sources.
Google BigQuery code examples
Google BigQuery code examples are available in the govuk-data-labs-onboarding GitHub repo.
Content Store code examples
Downloading the Content Store may take some time.
If you need to use the Content Store in a project, you can instead use the:
- PyMongo Jupyter notebook in the define-content-schemas branch of the
GOV.UK mirror code examples
Downloading the GOV.UK mirror may take some time.
If you need to use the GOV.UK mirror in a project, you can instead use the page term TF-IDF matrix notebooks in the govuk-intent-detector GitHub repo.
Learning and development resources
|Free?||Resource||Description|
|No||O’Reilly ebooks through ACM membership||O’Reilly Media publishes technology-oriented books with an associated app for reading their books on the go. Useful books and videos include:|
|No||Standard individual licence for Pluralsight||Pluralsight provides online courses that lean towards software development and engineering. Some useful courses include:|
|Yes||Advanced NLP with spaCy||Free online course by the creators of spaCy on natural language processing, including exercises, slides, videos, multiple choice questions, and interactive, browser-based coding practice.|
|Depends||Coursera||Coursera hosts a number of courses on data science. You can “audit” courses for free, but you cannot complete certain assignments or obtain a completion certificate. It’s generally not worth paying for the courses. Good courses include:|
|Yes||fast.ai||Online courses on deep learning using fast.ai, practical data ethics, computational linear algebra, and natural language processing|
|Yes||Interpretable Machine Learning||Accessible book on interpretable machine learning, including interpretable machine learning models, as well as model-agnostic methods for interpretability.|
|Yes||The Illustrated Word2vec (Jay Alammar’s GitHub Pages)||An illustrated guide to word2vec. The author, Jay Alammar, also has a whole host of other illustrated guides.|
|Yes||Causal Inference for The Brave and True||A light-hearted yet rigorous approach to learning impact estimation and sensitivity analysis.|
|Yes||Datasheets for Datasets||A paper proposing how to document datasets.|
|Yes||Managing Python Environments||Short blog post by Pluralsight summarising how to manage Python environments.|
|Yes||Hypermodern Python||A recent review on Python best practice for projects.|
|Yes||Mathematics for Machine Learning||A book covering the mathematical skills you need to follow more advanced machine learning books.|
|Yes||huggingface/datasets||The largest hub of ready-to-use natural language processing datasets for machine learning models with fast, easy-to-use and efficient data manipulation tools.|
|Yes||ONS Best Practice and Impact - Quality Assurance of Code for Analysis and Research||Cross-government guidance on best practice for analysis and research.|
|Yes||ethen8181/machine-learning||Machine learning tutorials|
|Yes||ageron/handson-ml2||Companion code for the O’Reilly book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.|
|Yes||awesomedata/awesome-public-datasets||A topic-centric list of high quality open datasets.|
|Yes||Made with ML||Machine learning operations and engineering courses.|
|Yes||ikatsov/tensor-house||A collection of reference machine learning and optimization models for enterprise operations, including marketing, pricing, and supply chain.|
|Yes||jghoman/awesome-apache-airflow||Resources for Apache Airflow.|
|Yes||aws/amazon-sagemaker-examples||Amazon SageMaker examples. These are automatically loaded into SageMaker instances.|
|Yes||Chris-Engelhardt/data_sci_guide||A community-curated list of data science courses, including direct, free replacement courses for paid options.|
|No||Introduction to Statistical Learning: With Applications in R||An accessible primer on machine learning. Recommended reading for newcomers to data science, and as a refresher.|
|Yes||datastacktv/data-engineer-roadmap||Roadmap for those wishing to study data engineering.|
|Yes||alastairrushworth/free-data-science||Resources and learning materials across a broad range of popular data science topics and arranged thematically.|