Skip to main content
Warning This document has not been updated for a while now. It may be out of date.
Last updated: 12 Mar 2020

content-publisher: 16. Publishing times and change history

Date: 2020-03-20

Context

Content that is published on GOV.UK has metadata to indicate the history of the content. This is comprised of a time to indicate when the first iteration of that content was published (first_published_at), a time for when the content was last changed in a meaningful way (public_updated_at) and a change history which records a change note and the publishing time of each of these meaningful changes. By convention, the first item in a change history of GOV.UK content is provided by the publishing application without publisher input. This item has a change note of "First published." and has the time the content was initially published on GOV.UK.

To set these metadata fields Content Publisher has relied on the Publishing API automatically setting the time fields and using it's ChangeNote model. This model stores a record for each major change published on GOV.UK and can collate them to build a change history. This meant that Content Publisher has not needed to store first_published_at, public_updated_at or a full change history and has only needed to store a change note for an edition.

We had concerns that continued usage of the Publishing API time and change history system would lead to discrepancies when migrating content from Whitehall to Content Publisher. This led to us doing a review of the system and problems, where some relate to migration and others are broader.

Problems identified

1. Having accurate times for Whitehall migrated content

When content is migrated from Whitehall to Content Publisher we want all the times associated with that content to remain the same. Whitehall complicates this by passing two time values to represent the first publishing time: the aforementioned first_published_at field and a deprecated time value, first_public_at, which is used preferentially to first_published_at when rendering the content. Content Publisher does not populate the first_public_at field.

Some content items, notably older ones that pre-date Whitehall sending a first_published_at value, can have different values for each of these times - A consequence of these not matching is that migrated content would show a different first publishing time.

2. Migrating change notes from Whitehall

Whitehall doesn't make use of the Publishing API ChangeNote model to build a change history, instead Whitehall creates it's own change history for content and includes it when updating content with the Publishing API. This has a consequence that the data stored in ChangeNote models in Publishing API for Whitehall content cannot be relied upon as there are no guarantees of its accuracy. There is also not an expectation that this data is accurate as many pieces of Whitehall content were created before the ChangeNote model was introduced to Publishing API.

For Content Publisher to migrate change histories for Whitehall content it would need to ensure the entire change history is in sync with the Publishing API.

3. Removing/editing change notes

It is a relatively common task that GOV.UK 2nd line support engineers are requested to remove or modify a change note for a published piece of content. To resolve this in Whitehall a developer would modify a past edition of a piece of content to either remove or amend the change note.

In Content Publisher it is not ideal to modify a past edition. An edition represents a snapshot of the content as it was published on GOV.UK and any edits invalidate the accuracy of this snapshot.

4. Backdating content

The backdating feature in Content Publisher supports only backdating the first edition of a piece of content and not subsequent ones as is allowed by Whitehall. This is because the Publishing API does not have the means to change the initial item in the change history for content and thus would cause a change history that is inconsistent with the first_published_at time. This has a consequence for publishers that if they make a mistake with backdating or forget to backdate in Content Publisher then they cannot rectify this without creating a new document or requesting support from developers.

Approaches considered

1. Correct Whitehall data in the Publishing API

We considered doing an audit and syncing exercise for times and change histories from Whitehall to the Publishing API. This would involve either doing a large data modification exercise or creating new endpoints in the Publishing API that allow importing this data.

The positive of this approach was that it could mean Content Publisher could continue to rely on the Publishing API for time and change history management without needing this data stored in Content Publisher.

The negatives to this were that there would be no guarantees that this data would remain accurate once the work was complete, as Whitehall continues to operate; nor would it be easy to check whether data is still accurate during migration as Content Publisher does not store it. This also didn't allow any improvements in removing/editing change history or allowing backdating past the first edition.

2. Store times and change history in Content Publisher and sync them to Publishing API

We considered changing Content Publisher to be similar in approach to Whitehall and be the canonical source of publishing times and change history. Each time Content Publisher content is updated the Publishing API would be sent the times and change history.

This approach offers a greater degree of control over data, which allows Content Publisher to enhance the backdating feature and provide more support for change history removal/editing. This also offers a higher degree of compatibility with Whitehall for migrating content due to it being a similar model.

This negative is that it increases Content Publisher's responsibilities to storing time and change history. It also means that prior to publishing content Content Publisher would need to update timestamps in the Publishing API which adds an additional API call.

3. Adapt Publishing API to support an altering change history

We considered modifying the Publishing API to allow the altering of change histories. This would involve change notes having a particular id value that could be used to identify them in requests.

As a positive, this allows each change note to be treated as a separate concept to content and, compared to other approaches, allows a reduction in duplicate data shared between different editions of content.

A negative of this is that it requires additional development and complexity for the Publishing API with a potential migration needed for existing content. It also presented a number of complications over whether change history would be part of a draft/live workflow, since content already has this concern.

Decision

We decided to take the approach of storing times and change history in Content Publisher.

We felt that this was the most pragmatic approach as it didn't require additional development to the Publishing API and was the most compatible with Whitehall's approach to change history. The increased control this offered was also consistent with a potential future feature of allowing change history to be modified via a user interface.

We concluded that the recommended approach for change history in the Publishing API was ultimately flawed and should be reconsidered as a recommendation. Our concerns were:

  • all publishing applications, with the exception of Specialist Publisher, use the Publishing API as a syncing target rather than a store of publishing history, which is a reversal from the unrealised intention that Publishing API is the canonical store for content and history;
  • change history is the lone outlier as user visible data on GOV.UK that a publishing application cannot fix in case of any issues;
  • the Publishing API ChangeNote model was designed to be append-only and is therefore incompatible with the ability to change the time that content is marked as first published (as the change history will disagree with this value) or to resolve any mistakes.

Status

Accepted

Consequences

Content Publisher will store all the information needed to present first_published_at, public_updated_at and change history to the Publishing API. In doing so Content Publisher will be considered the canonical source of this data. This data will be reflected by non user editable data of first_published_at on a Document model and published_at on an Edition model, which reflects the time that Content Publisher published the content. In addition to the existing change_note field on a MetadataRevision there will also be a change_history field that stores a JSON representation of all the previous change history entries for the content.

This change_history field will be used to store a collection of timestamps with notes. These will not include the fixed "First published." change note as this will automatically be appended to the start of a change history with a time based on either the first_published_at or backdating value. It also does not include the current change note of an edition because this does not become an aspect of change history until the content is published and the update type and publishing time are fixed. It was noted that having both change_note and change_history fields risks confusion for developers, however this seemed preferable to complicating the change_history field with data that is not yet historical.

Initially the only means to edit items in a change history will be via a rake task that developers can perform. This will provide a means to add, edit and delete items. Eventually we may choose to build a Web UI to edit this.

We will expand the IntegrityChecker system used by Content Publisher to verify that the times and change history sent to the Publishing API match what is already stored for Whitehall content. This will ensure that these remain consistent through a migration.

The Publishing API had previously deprecated the change_history field in favour of using the change_note field. We will remove this deprecation to reflect that using the change_history field is still a valid approach that is in active use.