Replicate the production data for Content Tagger locally.
Install the version of Python specified in .python-version, for example by running mise install python (this requires python to be listed in mise's idiomatic_version_file_enable_tools setting).
Install pipenv by running pip install --user pipenv.
Install Python libraries by running pipenv install.
Sign up at Hugging Face and create a token of type “Read”.
Copy the example environment file: cp .env.example .env.
Set the value of HF_TOKEN in .env to your Hugging Face token.
Set the value of OPENROUTER_API_KEY in .env to your OpenRouter API key.
Generate the suggested topics: pipenv run ./suggest_topics <taxon-base-path>
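Before running the final step, it can help to confirm that .env actually contains both keys. The following is a minimal sketch, not part of this repo: the function names (parse_env_text, missing_env_keys, check_env_file) are hypothetical, and it assumes .env uses plain KEY=VALUE lines with optional quotes and # comments. It does not handle export prefixes or multi-line values.

```python
from pathlib import Path

# Keys named in the setup steps above.
REQUIRED_KEYS = ("HF_TOKEN", "OPENROUTER_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_env_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the given text."""
    values = parse_env_text(text)
    return [key for key in required if not values.get(key)]


def check_env_file(path=".env"):
    """Return a human-readable status message for the given .env file."""
    env_path = Path(path)
    if not env_path.exists():
        return f"{path} not found; copy .env.example to .env first"
    missing = missing_env_keys(env_path.read_text())
    if missing:
        return f"Missing or empty in {path}: " + ", ".join(missing)
    return "All required keys are set"
```

To use it, save the block to a file in the repository root and call print(check_env_file()) from there. The check only confirms that the values are non-empty; it does not verify that the tokens are valid.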
Documentation
This repo is configured to generate a GitHub Pages website, currently hosted at https://alphagov.github.io/govuk-document-clustering-experiment/. The site describes how we ran experiments using this code and the results of those experiments.