Warning This document has not been updated for a while now. It may be out of date.
Last updated: 18 Mar 2021

content-data-api: ADR 007: ETL to populate Content Items' dimension.

26-01-2018

Context

As per this Trello card, we want to populate the Content Items' dimension with the latest changes that result from editing content.

Decision

We will address the ETL process for Content Items as follows:

  1. On the first run, we will load all the content items via publishing-api, retrieving the content-ids and paths of every content item that is live.
  2. For each Content Item, we will get the JSON from the Content Store and extract the attributes that we need to fulfil the immediate needs. We will also persist the full JSON so that we can extract other attributes in the future.
  3. On a daily basis, we will listen to publishing-api events, in the same way that Rummager or Email Alert Service do. Once we receive a change for a content item, we will automatically update the Content Items dimension using this new approach.
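
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual implementation. Every name in it is an assumption made for the example: the attribute list in `WANTED_ATTRIBUTES`, the `fetch` callable standing in for a Content Store lookup, the event shape with `content_id` and `base_path` keys, and the in-memory `dimension` dictionary standing in for the real dimension table.

```python
import json
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical: the ADR does not list the attributes needed for "immediate needs".
WANTED_ATTRIBUTES = ("title", "document_type", "base_path")

# Hypothetical stand-in for a Content Store lookup: base path -> content item JSON.
Fetcher = Callable[[str], dict]


def transform(content_id: str, item: dict) -> dict:
    """Extract the wanted attributes and keep the full JSON for future use."""
    row = {"content_id": content_id}
    for name in WANTED_ATTRIBUTES:
        row[name] = item.get(name)
    # Persist the whole document so other attributes can be extracted later.
    row["raw_json"] = json.dumps(item, sort_keys=True)
    return row


def initial_load(
    live_items: Iterable[Tuple[str, str]],
    fetch: Fetcher,
    dimension: Dict[str, dict],
) -> None:
    """Steps 1 and 2: load every live (content_id, base_path) pair once."""
    for content_id, base_path in live_items:
        dimension[content_id] = transform(content_id, fetch(base_path))


def handle_event(event: dict, fetch: Fetcher, dimension: Dict[str, dict]) -> None:
    """Step 3: on a change event, re-fetch the item and overwrite its row."""
    content_id = event["content_id"]
    dimension[content_id] = transform(content_id, fetch(event["base_path"]))


# Usage with a fake in-memory store in place of the real services.
fake_store = {
    "/vat-rates": {
        "title": "VAT rates",
        "document_type": "answer",
        "base_path": "/vat-rates",
    }
}
dimension: Dict[str, dict] = {}
initial_load([("id-1", "/vat-rates")], fake_store.__getitem__, dimension)

# Simulate an edit followed by a change event for the same item.
fake_store["/vat-rates"]["title"] = "VAT rates (updated)"
handle_event(
    {"content_id": "id-1", "base_path": "/vat-rates"},
    fake_store.__getitem__,
    dimension,
)
```

The initial load and the event handler share the same `transform` step, which is what keeps the process small: an incoming event only triggers a re-fetch and an overwrite of one row.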

Benefits:

  1. This is more aligned with GOV.UK architecture.
  2. This is very light and efficient. It also keeps the code simple, as the ETL process for Content Items is almost trivial.

Status

Accepted.