
Repository: govuk-topic-taxonomy-dynamic-suggestions-experiment

Feasibility of suggesting new topic tags for new & existing documents on GOV.UK dynamically using a large, open-weight embedding model and a vector database

README

A simple Rails app demonstrating how document similarity could be used to generate Topic Taxonomy tag suggestions. It builds on the work we did in the govuk-topic-taxonomy-static-suggestions-experiment repo by providing a dynamic user interface that shows users a list of suggested tags for a new or existing document on GOV.UK.
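To make the idea of similarity-based suggestions concrete, here is a minimal sketch in plain Ruby. It is not the app's implementation: the method names cosine_similarity and suggest_tags, the hash layout, the parameter k, and the sample tags are all assumptions made for illustration. In the app itself, the embeddings would come from the embedding model and the nearest-neighbour lookup would presumably be delegated to the vector database rather than done in memory.

```ruby
# Cosine similarity between two equal-length numeric vectors.
# Returns 0.0 if either vector has zero length.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Hypothetical suggestion strategy: take the k already-tagged documents most
# similar to the query, then rank their tags by summed similarity.
def suggest_tags(query_embedding, tagged_documents, k: 3)
  neighbours = tagged_documents
    .map { |doc| [doc, cosine_similarity(query_embedding, doc[:embedding])] }
    .max_by(k) { |_doc, score| score }

  tag_scores = Hash.new(0.0)
  neighbours.each do |doc, score|
    doc[:tags].each { |tag| tag_scores[tag] += score }
  end

  tag_scores.sort_by { |_tag, score| -score }.map(&:first)
end

# Made-up two-dimensional embeddings and tags, purely for illustration.
documents = [
  { embedding: [1.0, 0.0], tags: ["Education"] },
  { embedding: [0.9, 0.1], tags: ["Education", "Schools"] },
  { embedding: [0.0, 1.0], tags: ["Transport"] }
]

p suggest_tags([1.0, 0.0], documents, k: 2)
```

Summing similarity across neighbours favours tags that appear on several close documents; other weightings (for example a simple vote count) would be equally reasonable choices.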

Requirements

Development

  • Copy the example environment file: cp .env.example .env
  • Set the value of OPENROUTER_API_KEY in .env to a suitable OpenRouter API key.
  • Run bin/setup to set up and run the app in the development environment. Note that this runs db:seed, which takes a few minutes.
  • Visit the app in a browser at http://localhost:3000.
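
As a quick sanity check on the first two steps, a short Ruby sketch can confirm that .env defines a non-empty OPENROUTER_API_KEY before bin/setup is run. The helper name missing_env_keys and its simple KEY=value parsing are assumptions for illustration and are not part of the app; the parsing deliberately ignores quoting and export prefixes.

```ruby
# Hypothetical helper (not part of the app): returns the required keys that
# are absent, or set to an empty value, in the text of a .env file.
def missing_env_keys(env_text, required_keys)
  defined_keys = env_text.each_line.filter_map do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    key.strip unless value.nil? || value.strip.empty?
  end

  required_keys - defined_keys
end

# Only inspect the file if it exists in the current directory.
if File.exist?(".env")
  missing = missing_env_keys(File.read(".env"), ["OPENROUTER_API_KEY"])
  puts missing.empty? ? "OK" : "Missing in .env: #{missing.join(', ')}"
end
```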

Reference data

Reference data is included in the project and is loaded into the database via bin/rails db:seed, which runs as part of bin/setup. Since the data exists in the repo, it is not necessary to generate it in order to run the app. However, the following describes the provenance of the reference data so that it can be regenerated if required.

Topic taxonomy

govuk-docker up -d content-store-lite
docker exec -i govuk-docker-content-store-lite-1 rails db < db/seeds/export-topic-taxonomy.sql > db/seeds/topic_taxonomy.csv
govuk-docker down content-store-lite

Document embeddings

This data is obtained from the govuk-topic-taxonomy-static-suggestions-experiment repo: