Repository: govuk-polling-dataform
Dataform configuration for polling data modeling pipeline
- GitHub
- govuk-polling-dataform
- Ownership
- #data-engineering
- Category
- Data engineering
- Links
README
The dataform configuration for modelling GOV.UK polling data. The output tables are made available in BigQuery and Looker.
Nomenclature
TBC
Technical documentation
Data Model
erDiagram
survey_waves {
STRING survey_wave_id PK
STRING name
STRING provider
DATE start_date
DATE end_date
}
survey_responses {
STRING survey_response_id PK
STRING survey_wave_id FK
STRING src_response_id
FLOAT weight
INTEGER imd_quartile_country
STRING gender
STRING age_group
STRING qualification
STRING ethnicity
STRING government_office_region
STRING is_english_main_language
}
question_responses {
STRING question_response_id PK
STRING survey_response_id FK
STRING question_id FK
}
questions {
STRING question_id PK
STRING src_question_id
STRING question_group_text
STRING question_text
STRING question_type
}
question_response_choices {
STRING question_choice_id PK
STRING question_id FK
STRING choice_value
STRING choice_text
}
question_response_selections {
STRING question_response_selection_id PK
STRING question_response_id FK
STRING selection_text
}
survey_wave_questions {
STRING survey_wave_question_id PK
STRING survey_wave_id FK
STRING question_id FK
}
survey_waves ||--o{ survey_responses : ""
survey_responses ||--o{ question_responses : ""
question_responses ||--o{ question_response_selections : ""
questions ||--o{ question_responses : ""
questions ||--o{ survey_wave_questions : ""
survey_waves ||--o{ survey_wave_questions : ""
questions ||--o{ question_response_choices : "has a lookup of"
%% Styling for the question_response_choices table
style question_response_choices fill:#F9F2B2
Development
Adding a new Wave
- Create a new row in
definitions/lookups/lookup_survey_waves.sqlx
. - Upload the source CSV to the
govuk_polling_responses
dataset with the namesrc_{provider}_wave_{number}
. For example,src_bmg_wave_13
. - For each question, add a new row in the
definitions/lookups/lookup_survey_wave_questions.sqlx
file including thewave_name
andsrc_question_id
. If this wave has the same questions as previous waves, just copy and paste being sure to update the values inwave_name
. - Execute the workflow to ensure
survey_wave_questions
is populated as expected. - Execute definitions/config/retrieve_column_names.sqlx and update definitions/config/retrieve_column_names.js variable const allStgColumns.
Adding a new Question
- Create a new row in
definitions/lookups/lookup_questions.sqlx
. - Create a new row in
definitions/lookups/lookup_question_response_choices.sqlx
using the correspondingsrc_question_id
. You’ll need to know the name of the column in the source data which contains the values along with the coded values and the associated selection text. - Execute the workflow and inspect the output of the
question_response_choices
table in BigQuery. - Add the column which contains the coded values to the
UNPIVOT
code indefinitions/staging/stg_unpack_question_responses.sqlx
.
Deployment
Once you PR is reviewed and approved, merge into main
.
The production release configuration is based on main
and will compile once a day. To manually compile, go to Release Configurations.
Then select the production
configuration and select New compliation
which will sync to the latest changes on main
branch. Then, go back to the “Releases & Scheduling” section and choose Start Execution
.