Last updated: 24 Sep 2018

collections-publisher: Discarding drafts of previously published step by step pages

Date: 2018-09-24

Context

The current workflow for a step by step page is:

  • When a brand new step by step is saved, it is saved in collections publisher with a status of draft. It is also sent to Publishing API. Publishing API creates a new Document for it, and saves the first Edition.

    Number of versions: Collections Publisher - 1, Publishing API - 1

  • When the step by step is published, the status in collections publisher is set to published. When the "publish" action is received by Publishing API, Publishing API sets the status of the edition to published.

    Number of versions: Collections Publisher - 1, Publishing API - 1

  • When the published step by step is updated in collections publisher and saved, the status in collections publisher is set to draft. When the changes are sent to Publishing API, Publishing API creates a new edition for it and sets the status of that edition to draft.

    Number of versions: Collections Publisher - 1, Publishing API - 2

  • When the existing draft is updated in collections publisher and saved again, the status in collections publisher remains as draft. When the changes are pushed to Publishing API, the existing edition is updated, but its status remains as draft.

    Number of versions: Collections Publisher - 1, Publishing API - 2

  • When the new draft is published, the status in collections publisher is set to published. When the "publish" action is received by Publishing API, the status of the first edition is set to "superseded" and the status of the current (second) edition is set to "published".

    Number of versions: Collections Publisher - 1, Publishing API - 2

  • When the step by step is updated again, collections publisher updates its copy of the step by step and sets the status to "draft". Publishing API creates a new edition and sets its status to "draft".

    Number of versions: Collections Publisher - 1, Publishing API - 3

  • When the new draft is published, the status in collections publisher is set to published. When the "publish" action is received by Publishing API, the status of the second edition is set to "superseded" and the status of the current (third) edition is set to "published".

    Number of versions: Collections Publisher - 1, Publishing API - 3

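And so on for every later update: collections publisher only ever holds one version, while Publishing API gains a new edition each time a published step by step is changed.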
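
The workflow above can be sketched as a small in-memory model. This is an illustration only: the class and method names (Edition, PublishingApiDocument, StepByStepPage, save_draft, run_workflow) are hypothetical and are not the real collections-publisher or Publishing API code.

```python
from dataclasses import dataclass, field


@dataclass
class Edition:
    number: int
    state: str = "draft"  # "draft", "published" or "superseded"


@dataclass
class PublishingApiDocument:
    """Hypothetical stand-in for the Document that Publishing API keeps."""
    editions: list = field(default_factory=list)

    def save_draft(self):
        # An existing draft edition is updated in place; otherwise a new one is created.
        if self.editions and self.editions[-1].state == "draft":
            return self.editions[-1]
        edition = Edition(number=len(self.editions) + 1)
        self.editions.append(edition)
        return edition

    def publish(self):
        # The previously published edition becomes superseded, the latest becomes published.
        for edition in self.editions:
            if edition.state == "published":
                edition.state = "superseded"
        self.editions[-1].state = "published"


@dataclass
class StepByStepPage:
    """Collections publisher keeps a single record and only changes its status."""
    status: str = "draft"
    document: PublishingApiDocument = field(default_factory=PublishingApiDocument)

    def save(self):
        self.status = "draft"
        self.document.save_draft()

    def publish(self):
        self.status = "published"
        self.document.publish()


def run_workflow():
    """Replay the seven bullet points and record the edition count after each."""
    page = StepByStepPage()
    history = []
    for action in ("save", "publish", "save", "save", "publish", "save", "publish"):
        getattr(page, action)()
        history.append((action, page.status, len(page.document.editions)))
    return page, history
```

Replaying the seven actions leaves collections publisher with one record, while the stand-in document accumulates one edition per published change.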

At the moment, step by step pages are created and updated by a specialist team of content designers. This is about to change: step by step is due to be moved to a business-as-usual (BAU) team.

The main blocker to this move is how draft changes are stored. If a step by step has previously been published, there is no way to discard its draft changes without manually unpicking them.

However, all of the different editions of a step by step are being stored in Publishing API.

Decision

The approach we decided on was to get the latest published version of the step by step held in the Publishing API and use that to repopulate the step by step data in collections-publisher. This approach keeps the Publishing API as the "record" or "source of truth", and uses collections publisher only to update the latest version.
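
A minimal sketch of that repopulation, assuming the editions have already been fetched and are represented as plain dictionaries with a "state" and a "content" key. The function names and the dictionary shape are assumptions made for illustration; the real Publishing API response and the collections-publisher models will differ.

```python
from copy import deepcopy


def latest_published(editions):
    """Return the most recent edition whose state is "published", or None."""
    for edition in reversed(editions):
        if edition["state"] == "published":
            return edition
    return None


def discard_draft(local_page, editions):
    """Return a copy of the local step by step overwritten with the published content."""
    published = latest_published(editions)
    if published is None:
        raise ValueError("nothing has been published, so there is nothing to revert to")
    reverted = dict(local_page)
    # Local draft values are replaced by whatever the published edition holds.
    reverted.update(deepcopy(published["content"]))
    reverted["status"] = "published"
    return reverted
```

The sketch only covers the collections-publisher side. Removing the draft edition from Publishing API itself is outside what it shows.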

Pros

  • All editions are already being saved
  • No duplicating data
  • Don't need to restructure the database to store editions
  • Enables comparison of all versions of the step by step

Cons

  • How do we reverse engineer done pages so that they are not added to the Navigation Rules?
  • We're overwriting local versions of step by steps
  • Reliance on Publishing API. There could be a delay in getting the last published version if Publishing API times out.
  • How do we link to change notes?

Reverse engineering done pages

This is not a problem. Navigation Rules are generated by scraping the steps for internal links. The steps themselves do not link to done pages. All we need to do is compare the links in the steps to those in pages_related_to_step_nav to determine if we need to hide the step nav on those pages.
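
The comparison could look something like the sketch below. It assumes step content uses Markdown-style links, that internal links start with "/", and that pages are identified by path for readability (the real links are likely to use content ids). The function names and the returned shape (path mapped to whether the step nav is shown) are hypothetical.

```python
import re

# Assumption: internal links look like [text](/path); external links start with a scheme.
INTERNAL_LINK = re.compile(r"\[[^\]]*\]\((/[^)\s]*)\)")


def internal_links(step_contents):
    """Collect internal paths from the step text, in first-seen order, without duplicates."""
    seen = []
    for text in step_contents:
        for path in INTERNAL_LINK.findall(text):
            if path not in seen:
                seen.append(path)
    return seen


def rebuild_navigation_rules(step_contents, pages_related_to_step_nav):
    """Map each page linked from a step to True (show step nav) or False (hide it)."""
    related = set(pages_related_to_step_nav)
    # Done pages are never linked from a step, so they never get a rule here.
    return {path: path not in related for path in internal_links(step_contents)}
```

Because rules are only built from links found in the steps, a done page that appears in pages_related_to_step_nav but not in any step is simply left out.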

But what about change notes?

Publishing API only keeps change notes associated with major updates. These change notes are eventually stored in content-store and are publicly accessible. Currently all step by step pages are published as minor updates, so any change notes we send to Publishing API would be lost. We also have no way of associating a change note with a specific edition in the Publishing API, which puts any future work to compare versions at risk.

We decided to overcome this problem as follows:

  1. Add a column to the change notes table to record the version number or edition id held by Publishing API.
  2. When a change note is recorded, leave the edition id blank.
  3. Add a post-publish task that gets the id of the latest edition from Publishing API and uses it to populate every change note for the step by step where the edition id is blank.

This approach to change notes means that if there is a time in the future when we want to compare all previous versions of the step by step, we will be able to load them from Publishing API, and then match them to their corresponding change note using the edition id. No extra work will be needed to record and view a change note in the UI.
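
The three steps can be sketched with an in-memory SQLite database. The table layout, column names and function names are assumptions for illustration and do not reflect the actual collections-publisher schema. The latest edition id is passed in rather than fetched, so no Publishing API call is shown.

```python
import sqlite3


def create_schema(db):
    db.execute(
        """
        CREATE TABLE change_notes (
            id INTEGER PRIMARY KEY,
            step_by_step_page_id INTEGER NOT NULL,
            note TEXT NOT NULL,
            edition_id TEXT  -- step 1: new nullable column
        )
        """
    )


def record_change_note(db, page_id, note):
    # Step 2: the edition id is not known yet, so it is left NULL.
    db.execute(
        "INSERT INTO change_notes (step_by_step_page_id, note) VALUES (?, ?)",
        (page_id, note),
    )


def backfill_edition_id(db, page_id, latest_edition_id):
    # Step 3: post-publish task; only blank rows for this step by step are updated.
    cursor = db.execute(
        "UPDATE change_notes SET edition_id = ? "
        "WHERE step_by_step_page_id = ? AND edition_id IS NULL",
        (latest_edition_id, page_id),
    )
    return cursor.rowcount  # number of change notes that were populated
```

Because the update only touches rows where the edition id is still blank, change notes from earlier publishes keep the edition id they were given at the time.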

Other Approaches considered: Restructure the collections-publisher database

The other option we considered is restructuring the database by treating the StepByStepPage model as an "Edition", and adding a new model called "Document".

There would be only one Document for each step by step, containing only the data that doesn't change between versions (e.g. content_id). The StepByStepPage model could be modified to add a document_id as a foreign key referencing Document, so a Document would have many StepByStepPages.
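
For comparison, the rejected structure might have looked roughly like this. It is a sketch using plain Python dataclasses rather than database models, and everything beyond Document, StepByStepPage, content_id and document_id is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StepByStepPage:
    """One edition of a step by step."""
    document_id: int  # foreign key referencing Document
    title: str
    status: str = "draft"


@dataclass
class Document:
    """Holds only the data that never changes between editions."""
    id: int
    content_id: str
    editions: list = field(default_factory=list)

    def new_edition(self, title):
        # Every saved version is kept as its own StepByStepPage record.
        page = StepByStepPage(document_id=self.id, title=title)
        self.editions.append(page)
        return page

    def latest_edition(self):
        # The index page would need this to list one row per Document.
        return self.editions[-1] if self.editions else None
```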

Pros

  • Not overwriting local data
  • Not reliant on the Publishing API
  • Already links to change notes

Cons

  • Need to restructure the step by step index page to get the list of step by steps from Document and only load the latest edition
  • Building more functionality that needs to be maintained
  • The database will grow with every edition that is saved
  • Storing duplicate data that's already being stored in Publishing API

Consequences

Publishing API does a reverse look-up for links. What this means for step by step pages is that, if we tell Publishing API that a list of content items is part of a step by step, Publishing API will find those content items, and do the reverse, i.e. tell the content items which step by step they are a part of.

This relationship is defined in the "links" section of content_items.

The "links" section also stores which content items should not display the step by step in the sidebar. These content items are considered "related" to the step by step rather than "part of" it.
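
The reverse look-up can be pictured with the sketch below. Only pages_related_to_step_nav is named in this document; pages_part_of_step_nav and the two reverse link names are assumptions used for illustration, as is the function itself. Publishing API performs this work internally, so this is not its implementation.

```python
from collections import defaultdict

# Assumption: forward link type on the step by step -> reverse link type on the content item.
REVERSE_NAMES = {
    "pages_part_of_step_nav": "part_of_step_navs",
    "pages_related_to_step_nav": "related_to_step_navs",
}


def reverse_links(step_by_step_id, links):
    """Given a step by step's links, work out what each linked content item points back to."""
    reversed_links = defaultdict(lambda: defaultdict(list))
    for link_type, content_ids in links.items():
        reverse_name = REVERSE_NAMES.get(link_type)
        if reverse_name is None:
            continue  # link types without a reverse are ignored in this sketch
        for content_id in content_ids:
            reversed_links[content_id][reverse_name].append(step_by_step_id)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {cid: dict(types) for cid, types in reversed_links.items()}
```

Content items that are "part of" the step by step and those that are only "related" to it end up with different reverse links, which is what lets the sidebar be shown on one group and hidden on the other.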

Publishing API does not store the changes to "links" that can occur between versions. This means that content designers will need to double-check the navigation rules (i.e. which pages should not show the step by step sidebar) before re-publishing the step by step.