
Application: search-api-v2-dataform

Google Dataform workflow settings and pipeline definitions for GOV.UK Search

Ownership
#govuk-search
Category
Data engineering

README

What’s in this repo

This repo contains:

  • definitions/: Google Dataform SQLX pipeline definitions for processing and transforming GA4 data into Google Vertex AI Search datasets
  • workflow_settings.yaml: Google Dataform workflow settings that apply to all pipelines

What’s not in this repo

The Terraform definitions for the Dataform resources, which provision the corresponding workflow and release configurations, are in alphagov/govuk-infrastructure/blob/main/terraform/deployments/search-api-v2/dataform.tf.

Usage

Install the Dataform CLI for local development.

Currently no Dataform workspace is set up for pipeline development in the Google Cloud console: all development is done locally and then pushed progressively into each environment's branch. Each Search API v2 GCP project (one per environment) has a separate Dataform release configuration, which maps onto the corresponding branch (integration, staging or main).
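A minimal sketch of the local loop, assuming Node.js and npm are installed. `npm install -g @dataform/cli` and `dataform compile` are standard Dataform CLI commands; the wrapper function names are hypothetical and not part of this repo.

```shell
#!/usr/bin/env bash
# Sketch: install the Dataform CLI and compile this repo locally.
# Assumes Node.js and npm are available; function names are illustrative.
set -euo pipefail

# One-off setup: install the Dataform CLI globally via npm.
install_dataform_cli() {
  npm install -g @dataform/cli
}

# Compile the Dataform project in the current directory (run from the repo
# root). A non-zero exit means the definitions failed to compile.
compile_locally() {
  if ! command -v dataform >/dev/null 2>&1; then
    echo "dataform CLI not found; run install_dataform_cli first" >&2
    return 1
  fi
  dataform compile
}
```

Running `compile_locally` before force-pushing to an environment branch surfaces compilation errors locally rather than in the release configuration.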

Development and deployment workflow

This repo follows an unusual workflow because the integration, staging and production environments in Dataform each deploy from a fixed, named branch:

  • The integration environment deploys from the branch integration.
  • The staging environment deploys from the branch staging.
  • The production environment deploys from the branch main.
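The mapping above can be written down as a small lookup, which makes it harder to push to the wrong branch by mistake. A sketch; `branch_for_env` is a hypothetical helper, not something this repo provides.

```shell
#!/usr/bin/env bash
# Sketch of the environment-to-branch mapping.
# branch_for_env is a hypothetical helper, not part of this repo.
set -euo pipefail

branch_for_env() {
  case "$1" in
    integration) echo "integration" ;;
    staging)     echo "staging" ;;
    production)  echo "main" ;;   # production deploys from main
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}

# Example: print the branch each release configuration compiles from.
for env in integration staging production; do
  echo "$env -> $(branch_for_env "$env")"
done
```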

This is part of the Dataform setup. It means we can’t easily run a typical CI/CD workflow, in which any feature branch can be tested on integration while production deploys from main. The workflow below stays as close to a typical workflow as possible, while still allowing changes to be tested on integration.

Step 1: Develop and test a new feature

1.1. Make changes in a feature branch.

1.2. Force-push the feature branch to the integration branch (which will be created if it doesn’t exist).

git push --force origin my-feature:integration
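Steps 1.1 and 1.2 can be sketched as a small shell function. `my-feature` is a placeholder branch name, and both `push_feature_to_integration` and the `DRY_RUN` variable are hypothetical conveniences: with `DRY_RUN=1` the function prints the `git push` command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch of steps 1.1-1.2. Function name and DRY_RUN are illustrative.
set -euo pipefail

push_feature_to_integration() {
  local feature="${1:?usage: push_feature_to_integration FEATURE_BRANCH}"
  # Step 1 is for unmerged feature branches; pushing main is step 3.1.
  case "$feature" in
    integration|staging|main)
      echo "refusing to push '$feature'; pass a feature branch" >&2
      return 1
      ;;
  esac
  # With DRY_RUN set, ${DRY_RUN:+echo} expands to "echo" and the command is
  # printed; otherwise it expands to nothing and git runs for real.
  ${DRY_RUN:+echo} git push --force origin "${feature}:integration"
}

# Usage (create the branch first, e.g. git switch -c my-feature):
# DRY_RUN=1 push_feature_to_integration my-feature
```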

1.3. Deploy the changes to integration: click “New compilation” on the integration Release configuration details page.

1.4. Manually run the workflow(s): click “Start execution” on the integration Releases and scheduling page and choose “Default Dataform service account” from the options. Under “select actions to execute”, select the actions you want to test (usually those defined in the files you have changed).

1.5. Check the workflow execution logs to confirm that the workflow(s) ran successfully, and check that the results in BigQuery are as expected.

Step 2: Code review

2.1. Open a pull request from the feature branch into the main branch.

2.2. Any substantial changes made following code review should be re-tested on integration, using steps 1.2-1.5 above.

2.3. Once approved, merge the pull request from the feature branch into the main branch.
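Step 2.1 can also be done from the command line with the GitHub CLI. A sketch, assuming `gh` is installed and authenticated; `open_pr_to_main` and `DRY_RUN` are hypothetical, while `--base`, `--head` and `--fill` are real `gh pr create` flags.

```shell
#!/usr/bin/env bash
# Sketch of step 2.1 using the GitHub CLI (gh). Assumes gh is installed and
# authenticated; the function name and DRY_RUN are illustrative.
set -euo pipefail

open_pr_to_main() {
  local feature="${1:?usage: open_pr_to_main FEATURE_BRANCH}"
  # --fill takes the PR title and body from the branch's commits.
  # With DRY_RUN set, the command is printed instead of run.
  ${DRY_RUN:+echo} gh pr create --base main --head "$feature" --fill
}

# Usage: open_pr_to_main my-feature
```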

Step 3: Deployment

Integration

3.1. Force-push the main branch into the integration branch.

git push --force origin main:integration

3.2. Deploy the changes to integration: click “New compilation” on the integration Release configuration details page.

Staging

3.3. Force-push the main branch into the staging branch (which will be created if it doesn’t exist).

git push --force origin main:staging

3.4. Deploy the changes to staging: click “New compilation” on the staging Release configuration details page.

Production

3.5. Deploy the changes to production: click “New compilation” on the production Release configuration details page. No push is needed first, because production deploys directly from main.
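The Git side of step 3 can be sketched as one function per environment. This is an illustration under assumptions: `deploy_main_to` and `DRY_RUN` are hypothetical, and the “New compilation” clicks in 3.2, 3.4 and 3.5 still happen in the console.

```shell
#!/usr/bin/env bash
# Sketch of the Git side of step 3. The "New compilation" clicks still
# happen in the console. Function name and DRY_RUN are illustrative.
set -euo pipefail

deploy_main_to() {
  local env="${1:?usage: deploy_main_to integration|staging|production}"
  case "$env" in
    integration|staging)
      # Overwrite the environment branch with main (steps 3.1 and 3.3).
      # With DRY_RUN set, the command is printed instead of run.
      ${DRY_RUN:+echo} git push --force origin "main:${env}"
      ;;
    production)
      # Production compiles from main itself, so there is nothing to push.
      echo "nothing to push for production; main is the production branch"
      ;;
    *)
      echo "unknown environment: $env" >&2
      return 1
      ;;
  esac
  echo "next: click \"New compilation\" in the ${env} release configuration"
}

# Usage, in order: deploy_main_to integration; deploy_main_to staging;
# deploy_main_to production
```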

Documentation

See the Google Dataform documentation.

Team

The GOV.UK Search team looks after this repo. If you’re inside GDS, you can find us in #govuk-search.