Warning This document has not been updated for a while now. It may be out of date.
Last updated: 9 Sep 2021

govuk-data-science-workshop: Data log

This log contains a list of data sources used in this analysis.

Definitions

Data sources are RAG-rated (red, amber, green) according to the following definitions for quality and suitability[^1]:

[^1]: With thanks to the Home Office Analytical Quality Assurance team for these definitions.

| RAG | Data quality | Data suitability |
| --- | --- | --- |
| Green | Data is well understood and there are no major issues with quality. Minor issues are understood and documented. | Data is best available for the required purpose and has been validated (for example against published statistics). |
| Amber | Data is well understood. There are quality issues (for example missing values, step changes, large number of outliers) that can be explained, documented or shown to have negligible impact. | Not the ideal data set for the analysis, but the best available at the time. Results will reflect the fact that it is not the ideal data set and it will be subject to sensitivity analysis where appropriate. |
| Red | Data is not well understood. There are major quality issues that cannot be fully explained and/or have a significant impact on analysis outputs. | There are concerns about the suitability of the data set for this application, which could negatively affect the quality and accuracy of the analysis. Its derivation / sample size is not known. |
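
If the log is also kept in machine-readable form, the ratings can be checked in code. The following is a minimal sketch only, assuming Python; the names `Rag`, `DataSource` and `flag_red` are hypothetical and not part of this project, and the entries are placeholders rather than real data sources.

```python
from dataclasses import dataclass
from enum import Enum


class Rag(Enum):
    """The three RAG ratings used in this log."""
    GREEN = "Green"
    AMBER = "Amber"
    RED = "Red"


@dataclass(frozen=True)
class DataSource:
    """One entry in the data log (hypothetical structure, for illustration)."""
    title: str
    quality: Rag
    suitability: Rag
    description: str


def flag_red(sources):
    """Return the titles of sources rated Red for quality or suitability."""
    return [s.title for s in sources if Rag.RED in (s.quality, s.suitability)]


# Placeholder entries for illustration only; these are not real data sources.
log = [
    DataSource("Example source A", Rag.GREEN, Rag.AMBER, "Placeholder description."),
    DataSource("Example source B", Rag.RED, Rag.AMBER, "Placeholder description."),
]

print(flag_red(log))
```

Using an enumeration means a mistyped rating such as `Rag("Gren")` raises a `ValueError` instead of passing silently into the log.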

Source 1: Insert plain English title here

  • Quality: Insert RAG rating here
  • Suitability: Insert RAG rating here

Add plain English description here.

Source 2: Insert plain English title here

  • Quality: Insert RAG rating here
  • Suitability: Insert RAG rating here

Add plain English description here.