Feasibility of suggesting new topic tags for new & existing documents on GOV.UK dynamically using a large, open-weight embedding model and a vector database
A simple Rails app demonstrating how document similarity could be used to come up with Topic Taxonomy tag suggestions. This builds on the work we did in the govuk-topic-taxonomy-static-suggestions-experiment repo by providing a dynamic user interface which allows users to see a list of suggested tags for a new or existing document on GOV.UK.
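As a rough illustration of the idea, the sketch below ranks candidate tags for a document by looking at the tags on its most similar neighbours. It is not the app's implementation: the app relies on PostgreSQL with pgvector, whereas this is plain in-memory Ruby. The method names, the hash layout (`:embedding`, `:tags`) and the scoring rule (summing the similarities of the `k` nearest documents) are all assumptions made for the example.

```ruby
# Cosine similarity between two equal-length numeric vectors.
# Returns 0.0 if either vector has zero length (norm).
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Hypothetical helper: suggests tags for a document embedding.
# Each tag is scored by the summed similarity of the k nearest
# already-tagged documents that carry it; ties are broken by tag name.
def suggest_tags(query_embedding, tagged_documents, k: 3)
  nearest = tagged_documents
    .map { |doc| [doc, cosine_similarity(query_embedding, doc[:embedding])] }
    .max_by(k) { |_doc, similarity| similarity }

  scores = Hash.new(0.0)
  nearest.each do |doc, similarity|
    doc[:tags].each { |tag| scores[tag] += similarity }
  end

  scores.sort_by { |tag, score| [-score, tag] }.map(&:first)
end

# Made-up example data: tiny 2-dimensional "embeddings".
EXAMPLE_DOCUMENTS = [
  { embedding: [1.0, 0.0], tags: ["Health"] },
  { embedding: [0.9, 0.1], tags: ["Health", "Social care"] },
  { embedding: [0.0, 1.0], tags: ["Transport"] }
].freeze

p suggest_tags([1.0, 0.0], EXAMPLE_DOCUMENTS, k: 2)
```

With real data the embeddings would have hundreds or thousands of dimensions, and the nearest-neighbour search would normally be pushed down to the database rather than done in Ruby.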
Requirements
Ruby (version specified in .ruby-version).
PostgreSQL (version that supports the pgvector extension, e.g. v18).
Reference data is included in the project and is loaded into the database by bin/rails db:seed, which runs as part of bin/setup. Because the data is already in the repo, it is not necessary to generate it in order to run the app. However, the following steps describe the provenance of the reference data so that it can be regenerated if required:
Copy all the JSON files from the transform/embeddings directory in the govuk-topic-taxonomy-static-suggestions-experiment repo to db/seeds/embeddings.
Copy all the JSON files from the transform/clean directory in the govuk-topic-taxonomy-static-suggestions-experiment repo to db/seeds/clean.
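The two copy steps above could be scripted along these lines. This is a minimal sketch, not something shipped in the repo: `copy_seed_json` is a hypothetical helper, and the location of your local checkout of the experiment repo is an assumption you would need to adjust.

```ruby
require "fileutils"

# Hypothetical helper: copies every *.json file from the experiment repo's
# transform/<name> directory into this app's db/seeds/<name> directory.
# Returns the list of destination paths that were written.
def copy_seed_json(experiment_repo, app_root, names: %w[embeddings clean])
  names.flat_map do |name|
    source_dir = File.join(experiment_repo, "transform", name)
    target_dir = File.join(app_root, "db", "seeds", name)
    FileUtils.mkdir_p(target_dir)

    Dir.glob(File.join(source_dir, "*.json")).sort.map do |path|
      FileUtils.cp(path, target_dir)
      File.join(target_dir, File.basename(path))
    end
  end
end

# Example call, assuming the experiment repo is checked out next to this app:
#   copy_seed_json("../govuk-topic-taxonomy-static-suggestions-experiment", Dir.pwd)
```

After copying, running bin/rails db:seed again should load the regenerated data.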
Hosting
As of 26 Mar 2026, this app is hosted at https://govuk-topic-taxonomy-suggestions.onrender.com on Go Free Range’s render.com account. It is only a demo/prototype, so it will be kept running only until 30 Apr 2026. However, it would be easy for a developer to spin this app up again on any PaaS that supports Rails & PostgreSQL (e.g. Heroku, render.com, fly.io) or on the GOV.UK infrastructure if necessary. The current hosting uses a standard bin/render-build.sh.