Repository: govuk-ai-accelerator

GitHub: govuk-ai-accelerator
Ownership: #publishing-classification-systems-metadata owns the repo. #publishing-csm-alerts receives automated alerts for this repo.
Category: AI apps

README

A Python Flask application for asynchronous ontology generation using the taxonomy-ontology-accelerator library.

Local Setup

Prerequisites

Python 3.13 - managed via uv
uv - Python package manager
PostgreSQL - local instance or Docker
AWS credentials - available in the environment (for S3/Bedrock access)
GitHub token - with read access to alphagov/govuk-ai-accelerator-tw-accelerator (private package)

Install uv if not already installed:

brew install uv
# or
pip install uv

1. Install dependencies

uv init --python 3.13
uv python pin 3.13
uv add -r requirements.txt
uv add "git+https://x-access-token:<GITHUB_TOKEN>@github.com/alphagov/govuk-ai-accelerator-tw-accelerator.git"

The default dependency set now includes faiss-cpu, so semantic deduplication can use FAISS when the configured threshold is reached. If you have an existing virtualenv, run uv sync after pulling these changes.

2. Set up PostgreSQL

Option A — Homebrew (if Postgres is already installed locally):

# Create the user (no password) and the database it owns
psql postgres -c "CREATE USER govuk_ai_accelerator_user;"
psql postgres -c "CREATE DATABASE govuk_ai_accelerator OWNER govuk_ai_accelerator_user;"
# Run the grant inside the new database so it applies to that database's public schema
psql govuk_ai_accelerator -c "GRANT ALL ON SCHEMA public TO govuk_ai_accelerator_user;"

Option B — Docker:

docker run -d \
  --name govuk-postgres \
  -e POSTGRES_USER=govuk_ai_accelerator_user \
  -e POSTGRES_DB=govuk_ai_accelerator \
  -e POSTGRES_HOST_AUTH_METHOD=trust \
  -p 5432:5432 \
  postgres:15

POSTGRES_HOST_AUTH_METHOD=trust disables password authentication. This is fine for local development but must never be used in production.

3. Configure environment

source environment.sh

This sets DATABASE_URL=postgresql://govuk_ai_accelerator_user@localhost:5432/govuk_ai_accelerator (no password). Also export your AWS credentials:

export AWS_REGION=eu-west-2
export AWS_ACCESS_KEY_ID=<your key>
export AWS_SECRET_ACCESS_KEY=<your secret>
export AWS_SESSION_TOKEN=<your token>  # if using temporary credentials
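
Before starting the app, it can help to confirm that the variables from this step are actually set in the current shell. The following is a minimal sketch, assuming the required names are exactly the ones listed above; the helper missing_env_vars is hypothetical and not part of the repository.

```python
import os

# Names taken from the setup step above. AWS_SESSION_TOKEN is only needed for
# temporary credentials, so this sketch treats it as optional (an assumption).
REQUIRED_VARS = (
    "DATABASE_URL",
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dict instead of os.environ makes the check easy to exercise in tests.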

4. Run the app

Debug mode (Flask dev server):

uv run govuk_ai_accelerator_app.py

Production mode (Waitress WSGI server):

uv run waitress-serve --port 3000 --call 'govuk_ai_accelerator_app:create_app'

The app runs on http://localhost:3000.

Note: If the database is unavailable, the app starts anyway but job status tracking is disabled. A warning will appear in the logs.
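
Once the server has been started, it can be useful to wait until it accepts requests before submitting jobs. The following is a minimal sketch, assuming the GET /healthcheck/ready endpoint and response body described in the API reference of this README. The function names is_ready and wait_until_ready and the retry settings are illustrative and not part of the app.

```python
import json
import time
import urllib.error
import urllib.request


def is_ready(body):
    """Return True if a /healthcheck/ready response body reports a healthy app."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_until_ready(base_url="http://localhost:3000", attempts=10, delay=1.0):
    """Poll the readiness endpoint until it reports healthy or attempts run out."""
    url = base_url.rstrip("/") + "/healthcheck/ready"
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_ready(response.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server is not accepting connections yet; retry after the delay.
        time.sleep(delay)
    return False
```

Note that a healthy response only shows the web process is up. As the note above says, the app can start without a database, so this probe does not confirm that job status tracking is available.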

Docker

Build and run using Docker (requires a GitHub token for the private package):

docker build --build-arg GITHUB_TOKEN=<your gh token> -t ontology-app .

docker run -p 3000:3000 \
  --add-host=host.docker.internal:host-gateway \
  -e DATABASE_URL="postgresql://govuk_ai_accelerator_user@host.docker.internal:5432/govuk_ai_accelerator" \
  -e AWS_REGION \
  -e AWS_DEFAULT_REGION \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  -e AWS_SESSION_TOKEN \
  ontology-app

Rebuild the image after pulling dependency changes so the container picks up faiss-cpu from requirements.txt.

Note for Mac users: Inside Docker, use host.docker.internal (not localhost) in DATABASE_URL to reach a Postgres instance running on your Mac.
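
If the same DATABASE_URL is shared between host and container runs, the host swap described in this note can be scripted. The following is a minimal sketch using only the Python standard library; the helper database_url_for_docker is hypothetical and not part of the repository.

```python
from urllib.parse import urlsplit, urlunsplit


def database_url_for_docker(url, docker_host="host.docker.internal"):
    """Swap a localhost host in a database URL for the Docker host alias."""
    parts = urlsplit(url)
    if parts.hostname not in ("localhost", "127.0.0.1"):
        return url  # Not a local host; leave the URL unchanged.
    # Keep any "user@" prefix and ":port" suffix, replacing only the host.
    userinfo, _, hostport = parts.netloc.rpartition("@")
    _, sep, port = hostport.partition(":")
    new_netloc = (userinfo + "@" if userinfo else "") + docker_host + sep + port
    return urlunsplit(parts._replace(netloc=new_netloc))
```

For example, passing the local URL from the "Configure environment" step produces the URL used in the docker run command above.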

Using Makefile setup

For a quick local development setup that runs the application in Docker, run:

```bash
make up GITHUB_TOKEN=<your-github-token>
```

---

API Reference

All endpoints are available at http://localhost:3000.

Health Check

GET /healthcheck/ready

Returns {"status": "healthy", "message": "Application is ready"} when the app is running.

Ontology UI

GET /ontology/

Web interface for submitting ontology processing jobs via file upload.

Submit Ontology Job

POST /ontology/submit


<p>Accepts a YAML config file and an optional domain prompt. Returns a job ID immediately so that clients can poll for the job status asynchronously.</p>
<pre lang="bash"><code>curl -X POST http://localhost:3000/ontology/submit \
  -F "file=@config.yaml" \
  -F "text_file=@domain_prompt.txt"
</code></pre>
<p>Response (<code>202 Accepted</code>):</p>
<pre lang="json"><code>{"job_id": "&lt;uuid&gt;", "status": "pending"}
</code></pre>
<hr>
<h3 id="check-job-status">Check Job Status</h3>
<pre><code>GET /ontology/status/&lt;job_id&gt;
</code></pre>
<pre lang="bash"><code>curl http://localhost:3000/ontology/status/&lt;job_id&gt;
</code></pre>
<p>Response:</p>
<pre lang="json"><code>{"job_id": "&lt;uuid&gt;", "status": "pending|completed|failed"}
</code></pre>
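<p>Because submission returns before processing finishes, a client typically polls this endpoint until the job completes or fails. The following is a minimal Python sketch, assuming the response shape shown above and treating <code>completed</code> and <code>failed</code> as final. The names <code>fetch_status</code> and <code>wait_for_job</code>, the polling interval, and the timeout are illustrative and not part of the app.</p>

```python
import json
import time
import urllib.request

# Statuses after which the job is assumed not to change again.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_status(job_id, base_url="http://localhost:3000"):
    """Fetch the current status string for a job from the status endpoint."""
    url = f"{base_url.rstrip('/')}/ontology/status/{job_id}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read())["status"]


def wait_for_job(job_id, fetch=fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status!r} after {timeout} seconds")
        sleep(interval)
```

<p>The <code>fetch</code> and <code>sleep</code> parameters are injectable so the loop can be tested without a running server.</p>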
<hr>
<h2 id="tests">Tests</h2>
<pre lang="bash"><code>uv run pytest
</code></pre>
<hr>
<h2 id="licence">Licence</h2>
<p><a href="https://github.com/alphagov/govuk-ai-accelerator/blob/main/LICENCE">MIT LICENCE</a></p>
  </div>