Repository: govuk-s3-mirror
GCP mirror of some GOV.UK AWS S3 buckets
- GitHub
- govuk-s3-mirror
- Ownership
- #data-engineering
- Category
- Data science
README
Repository of the Google Cloud Project. GOV.UK saves nightly database backups to an AWS S3 bucket. This repo configures a Google Cloud Platform project to mirror that bucket, so that other services hosted on GCP can easily access it.
Documentation
GOV.UK Data Community Technical Documentation
Links to GCP projects
IAM roles/permissions required in other projects
This project requires IAM roles and permissions in GOV.UK’s AWS infrastructure, created by this pull request.
How to subscribe to notifications that new files are available
When a database backup file is newly available in the bucket in Google Cloud
Platform, the bucket publishes a notification to PubSub topics in other
projects. This is configured by the blocks at the bottom of the file
/terraform/pubsub.tf
.
To receive notifications in a new project:
- Create a PubSub topic in the new project
- Give this project permission to publish to the PubSub topic by giving the
service account
service-384988117066@gs-project-accounts.iam.gserviceaccount.com
the roleroles/pubsub.publisher
in relation to the PubSub topic. - Add a
resource
to the end of the file/terraform/pubsub.tf
in this repository, similar to the existing resources, but changing thetopic
to the new one. - Deploy the changes to this project by running
terraform apply
.
Below is an example terraform configuration in a project that receives notifications from this project, copied from https://github.com/alphagov/govuk-knowledge-graph-gcp/blob/main/terraform/pubsub.tf.
resource "google_pubsub_topic" "govuk_integration_database_backups" {
name = "govuk-integration-database-backups"
message_retention_duration = "604800s" # 604800 seconds is 7 days
message_storage_policy {
allowed_persistence_regions = [
var.region,
]
}
}
// Allow the govuk-s3-mirror project's bucket to publish to this topic
data "google_iam_policy" "pubsub_topic-govuk_integration_database_backups" {
binding {
role = "roles/pubsub.publisher"
members = [
"serviceAccount:service-384988117066@gs-project-accounts.iam.gserviceaccount.com"
]
}
}
resource "google_pubsub_topic_iam_policy" "govuk_integration_database_backups" {
topic = google_pubsub_topic.govuk_integration_database_backups.name
policy_data = data.google_iam_policy.pubsub_topic-govuk_integration_database_backups.policy_data
}
# Subscribe to the topic
resource "google_pubsub_subscription" "govuk_integration_database_backups" {
name = "govuk-integration-database-backups"
topic = google_pubsub_topic.govuk_integration_database_backups.name
message_retention_duration = "604800s" # 604800 seconds is 7 days
retain_acked_messages = true
expiration_policy {
ttl = "" # empty string is 'never'
}
enable_message_ordering = false
}
Below is the resource
in the file /terraform/pubsub.tf
that corresponds to
the example above.
# Notify the topic from the bucket
resource "google_storage_notification" "govuk_integration_database_backups-govuk_knowledge_graph" {
bucket = google_storage_bucket.govuk-integration-database-backups.name
payload_format = "JSON_API_V1"
topic = "/projects/govuk-knowledge-graph/topics/govuk-integration-database-backups"
event_types = ["OBJECT_FINALIZE"]
depends_on = [google_pubsub_topic_iam_policy.govuk_integration_database_backups]
}
How to include a new file
Not every file in the S3 bucket is transferred to Google Cloud Platform. A
filter is defined in /terraform/transfer.tf
, in the resource that is partially
copied out below. To include a new file, add its prefix to the array of
include_prefixes
.
The filter is on the first part of the name of the object, called the ‘prefix’.
Folder names are included in the object name, so in the filters below, every
object in the folder publishing-api-postgres/
is included.
resource "google_storage_transfer_job" "govuk-integration-database-backups" {
description = "Mirror the GOV.UK S3 bucket govuk-integration-database-backups"
transfer_spec {
object_conditions {
include_prefixes = [
"content-store-postgres/",
"publishing-api-postgres/",
"support-api-postgres/",
"mongo-api/",
]
}
Who to contact
This project is not currently owned by any team in the GOV.UK programme. The
people who are most likely to be able to help are in the Slack channel
#data-engineering
.
Licence
Unless stated otherwise, the codebase is released under [the MIT License][mit]. This covers both the codebase and any sample code in the documentation.
The documentation is [© Crown copyright][copyright] and available under the terms of the [Open Government 3.0][ogl] licence.