Feasibility of suggesting new topic tags for new & existing documents on GOV.UK dynamically using a large, open-weight embedding model and a vector database
A simple Rails app demonstrating how document similarity could be used to generate Topic Taxonomy tag suggestions. It builds on the work we did in the govuk-topic-taxonomy-static-suggestions-experiment repo by providing a dynamic user interface that shows a list of suggested tags for a new or existing document on GOV.UK.
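To illustrate the general idea of similarity-based tag suggestion, here is a minimal in-memory sketch in plain Ruby. It is not the app's actual implementation: the method names (cosine_similarity, suggest_tags), the DOCUMENTS data, and the scoring rule (summing the similarity of the k nearest tagged documents per tag) are all assumptions made for illustration. In the app itself, embeddings would presumably come from the embedding model and be queried through pgvector rather than compared in Ruby.

```ruby
# Hypothetical sketch: names, data and scoring rule are illustrative assumptions,
# not taken from this repo.

# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Rank tags by summing, per tag, the similarity scores of the k tagged
# documents closest to the query embedding.
def suggest_tags(query_embedding, tagged_documents, k: 3)
  nearest = tagged_documents
    .map { |doc| [doc, cosine_similarity(query_embedding, doc[:embedding])] }
    .max_by(k) { |_, score| score }

  scores = Hash.new(0.0)
  nearest.each do |doc, score|
    doc[:tags].each { |tag| scores[tag] += score }
  end

  # Highest-scoring tags first.
  scores.sort_by { |_, score| -score }.map(&:first)
end

# Toy reference data with three-dimensional "embeddings" (real ones are much larger).
DOCUMENTS = [
  { title: "A", embedding: [1.0, 0.0, 0.0], tags: ["Tax"] },
  { title: "B", embedding: [0.9, 0.1, 0.0], tags: ["Tax", "Business"] },
  { title: "C", embedding: [0.0, 1.0, 0.0], tags: ["Education"] },
].freeze

suggestions = suggest_tags([1.0, 0.05, 0.0], DOCUMENTS, k: 2)
puts suggestions.inspect
```

Summing scores per tag favours tags shared by several near neighbours; other aggregation rules (for example, taking the maximum similarity per tag) would be equally reasonable for a sketch like this.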
Requirements
Ruby (version specified in .ruby-version).
PostgreSQL (a version that supports the pgvector extension, e.g. v18).
Reference data is included in the project and is loaded into the database via bin/rails db:seed, which runs as part of bin/setup. Since the data is already in the repo, it’s not necessary to generate it in order to run the app. However, the following describes the provenance of the reference data so that it can be regenerated if required.