Repository: govuk-topic-taxonomy-dynamic-suggestions-experiment

Feasibility of suggesting new topic tags for new & existing documents on GOV.UK dynamically using a large, open-weight embedding model and a vector database

README

A simple Rails app demonstrating how document similarity could be used to come up with Topic Taxonomy tag suggestions. This builds on the work we did in the govuk-topic-taxonomy-static-suggestions-experiment repo by providing a dynamic user interface which allows users to see a list of suggested tags for a new or existing document on GOV.UK.
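To illustrate the idea of using document similarity for tag suggestions, here is a minimal in-memory sketch. It is not the app's actual implementation: the method names (cosine_similarity, suggest_tags), the three-dimensional toy embeddings and the tag names are all assumptions made for illustration, and a real embedding model produces far larger vectors that would normally be queried through the vector database rather than a Ruby array. The approach shown is a simple nearest-neighbour vote: find the reference documents most similar to the new document, then rank their tags by accumulated similarity.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Rank tags by letting the k most similar reference documents "vote" for
# their own tags, weighted by similarity to the query embedding.
# The document hash shape ({ embedding:, tags: }) is hypothetical.
def suggest_tags(query_embedding, reference_documents, k: 3)
  nearest = reference_documents
    .map { |doc| [doc, cosine_similarity(query_embedding, doc[:embedding])] }
    .max_by(k) { |_doc, score| score }

  scores = Hash.new(0.0)
  nearest.each do |doc, score|
    doc[:tags].each { |tag| scores[tag] += score }
  end

  scores.sort_by { |_tag, score| -score }.map(&:first)
end

# Toy reference data; real embeddings would come from the embedding model.
REFERENCE_DOCUMENTS = [
  { embedding: [1.0, 0.0, 0.0], tags: ["education", "schools"] },
  { embedding: [0.8, 0.6, 0.0], tags: ["education"] },
  { embedding: [0.0, 0.0, 1.0], tags: ["transport"] }
].freeze

suggestions = suggest_tags([1.0, 0.0, 0.0], REFERENCE_DOCUMENTS, k: 2)
puts suggestions.inspect
```

With this toy data the two nearest documents both carry the "education" tag, so it ranks ahead of "schools", and "transport" is not suggested at all because its document falls outside the top two neighbours.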

Requirements

Development

  • Copy example environment file: cp .env.example .env.
  • Set the value of OPENROUTER_API_KEY in .env to a valid OpenRouter API key.
  • Run bin/setup to set up and run the app in the development environment. This also runs db:seed, which takes a few minutes.
  • Visit the app in a browser at http://localhost:3000.

Reference data

Reference data is included in the project and is loaded into the database by bin/rails db:seed, which runs as part of bin/setup. Because the data is checked into the repo, you do not need to generate it to run the app. The following describes the provenance of the reference data so that it can be regenerated if required.

Topic taxonomy

govuk-docker up -d content-store-lite
docker exec -i govuk-docker-content-store-lite-1 rails db < db/seeds/export-topic-taxonomy.sql > db/seeds/topic_taxonomy.csv
govuk-docker down content-store-lite
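
As a rough sketch of how an exported taxonomy CSV could be read back into a hierarchy, the following Ruby parses a small inline sample and builds a breadcrumb for a taxon. The column names (content_id, title, parent_content_id), the sample rows and the helper names are hypothetical; the real columns are determined by db/seeds/export-topic-taxonomy.sql and may differ.

```ruby
require "csv"

# Hypothetical sample in an assumed column layout; the real export may differ.
SAMPLE_CSV = <<~CSV
  content_id,title,parent_content_id
  a1,Education,
  b2,Schools,a1
  c3,School admissions,b2
CSV

# Index taxons by content_id so parents can be looked up directly.
def load_taxons(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, taxons|
    taxons[row["content_id"]] = {
      title: row["title"],
      parent_id: row["parent_content_id"] # nil for a root taxon
    }
  end
end

# Walk up the parent links to build a "Root > Child > Grandchild" path.
def breadcrumb(taxons, content_id)
  titles = []
  while content_id && (taxon = taxons[content_id])
    titles.unshift(taxon[:title])
    content_id = taxon[:parent_id]
  end
  titles.join(" > ")
end

taxons = load_taxons(SAMPLE_CSV)
puts breadcrumb(taxons, "c3")
```

Indexing by identifier keeps parent lookups cheap, which matters when a breadcrumb is built for every suggested tag.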

Document embeddings

This data is obtained from the govuk-topic-taxonomy-static-suggestions-experiment repo:

Hosting

As of 26 Mar 2026, this app is hosted at https://govuk-topic-taxonomy-suggestions.onrender.com on Go Free Range’s render.com account. It is only a demo/prototype, so it will be kept running only until 30 Apr 2026. However, a developer could easily spin the app up again on any PaaS that supports Rails & PostgreSQL (e.g. Heroku, render.com, fly.io) or on the GOV.UK infrastructure if necessary. The current hosting uses a standard bin/render-build.sh.