There are currently three environments for CKAN:
- Live — co-prod3.dh.bytemark.co.uk
- Test — co-prod2.dh.bytemark.co.uk
- Development — co-dev1.dh.bytemark.co.uk
You can ssh on to these machines with ssh co@<machine-name>. For example, to access the Test machine, you would run ssh co@co-prod2.dh.bytemark.co.uk.
If you cannot ssh as above, it’s worth asking someone in the #platform-health Slack channel to make sure you are in the
We are in the process of migrating CKAN to standard GOV.UK infrastructure.
ckanext-dgu is the primary CKAN extension for the current environments. This is being replaced with ckanext-datagovuk as part of the migration process. Although other extensions are used in the deployment, ckanext-dgu and ckanext-datagovuk are the ones that contain our changes to functionality and styling.
For commands that are not available through the user interface, you must connect to the server and run them there. All of the commands that interact with CKAN use a tool called paster.
Many of these commands take the path to the config file with the -c option, although you can instead use -c $CKAN_INI which should resolve to
On Bytemark servers paster should be run with:
cd /vagrant/src/ckan
. /home/co/ckan/bin/activate
paster
On GOV.UK servers paster should be run with:
cd /var/apps/ckan
sudo -u deploy govuk_setenv ckan venv/bin/paster
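As a rough illustration of the command pattern used throughout this page, the following Python sketch assembles a paster invocation as an argument list. The helper name paster_command is hypothetical and not part of CKAN, and the config path in the example is a placeholder. It assumes CKAN_INI is set in the environment when no config path is passed.

```python
import os
import shlex


def paster_command(plugin, *args, config=None):
    """Build a paster argument list (hypothetical helper, not part of CKAN)."""
    # Fall back to $CKAN_INI, mirroring the `-c $CKAN_INI` form used on this page.
    config = config or os.environ.get("CKAN_INI")
    if not config:
        raise ValueError("no config path given and CKAN_INI is not set")
    return ["paster", f"--plugin={plugin}", *args, "-c", config]


if __name__ == "__main__":
    # Print the command for review rather than executing it.
    cmd = paster_command("ckan", "user", "list", config="/path/to/ckan.ini")
    print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run from the working directory described above, although running the commands by hand remains the documented approach.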
A full guide to administering CKAN and Bytemark can be found in the CKAN sysops document.
Further, less commonly used, commands can be found in the CKAN documentation.
There is also a separate historical document of previous admin tasks that you may wish to consult.
Switching between legacy CKAN and Find open data
To access legacy CKAN, append ?legacy=1 to the URL.
If viewing a dataset, the final part of the path must be removed, leaving only the GUID (e.g.
https://data.gov.uk/dataset/f760008b-86d3-4bbb-89da-1dfe56101554/gh-wine-cellar-data on Find open data can be viewed in legacy CKAN at
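The rule described above (drop the trailing slug from a dataset URL, then append ?legacy=1) can be sketched in Python as follows. The function name to_legacy_url is hypothetical, and the sketch assumes dataset URLs always take the form /dataset/GUID/slug.

```python
from urllib.parse import urlsplit, urlunsplit


def to_legacy_url(find_url):
    """Return the legacy CKAN URL for a Find open data URL (illustrative only)."""
    parts = urlsplit(find_url)
    segments = [s for s in parts.path.split("/") if s]
    # For dataset pages, keep only /dataset/<GUID> and drop the trailing slug.
    if len(segments) >= 3 and segments[0] == "dataset":
        segments = segments[:2]
    path = "/" + "/".join(segments)
    # Append legacy=1, preserving any query string that is already present.
    query = f"{parts.query}&legacy=1" if parts.query else "legacy=1"
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    print(to_legacy_url(
        "https://data.gov.uk/dataset/"
        "f760008b-86d3-4bbb-89da-1dfe56101554/gh-wine-cellar-data"
    ))
```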
Accessing the CKAN API
There are times when it can be useful to access the CKAN API when debugging or resolving issues.
Note that the responses will be different depending on your access permissions. The ID can be specified as either the GUID or the URL slug (referred to as a URL name in CKAN).
Listing all datasets
Viewing a dataset
Searching for a dataset
Find all packages created during a specific timeframe
Find all packages modified during a specific timeframe
List all publishers
View a publisher record
View a user (e.g. to get CKAN API key for a Drupal user)
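The URLs for the requests listed above are not reproduced here. As a hedged sketch, the following Python builds equivalent URLs using the standard CKAN action API (/api/3/action/...), on the assumption that this deployment exposes it. BASE_URL, the IDs and the date ranges are all placeholders, and the helper action_url is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder host: substitute the CKAN instance you are debugging.
BASE_URL = "https://ckan.example.org"


def action_url(action, **params):
    """Build a CKAN action API URL (hypothetical helper for illustration)."""
    url = f"{BASE_URL}/api/3/action/{action}"
    return f"{url}?{urlencode(params)}" if params else url


# Placeholder date range in the Solr range syntax accepted by package_search.
TIMEFRAME = "[2017-06-01T00:00:00Z TO 2017-06-30T00:00:00Z]"

EXAMPLES = {
    "list all datasets": action_url("package_list"),
    "view a dataset": action_url("package_show", id="DATASET_ID"),
    "search for a dataset": action_url("package_search", q="SEARCH_TERM"),
    "created in a timeframe": action_url(
        "package_search", q=f"metadata_created:{TIMEFRAME}"
    ),
    "modified in a timeframe": action_url(
        "package_search", q=f"metadata_modified:{TIMEFRAME}"
    ),
    "list all publishers": action_url("organization_list"),
    "view a publisher": action_url("organization_show", id="PUBLISHER_ID"),
    "view a user": action_url("user_show", id="USERNAME"),
}

if __name__ == "__main__":
    for label, url in EXAMPLES.items():
        print(f"{label}: {url}")
```

Each URL can be fetched with any HTTP client. Requests that return restricted fields would additionally need an API key supplied with the request.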
Creating a system administrator account
paster --plugin=ckan sysadmin add USERNAME email=EMAIL_ADDRESS -c $CKAN_INI
You will be prompted twice for a password.
Removing a system administrator account
paster --plugin=ckan sysadmin remove USERNAME -c $CKAN_INI
Listing users
paster --plugin=ckan user list -c $CKAN_INI
Viewing a user
paster --plugin=ckan user USERNAME -c $CKAN_INI
Adding a user
paster --plugin=ckan user add USERNAME email=EMAIL_ADDRESS -c $CKAN_INI
Removing a user
paster --plugin=ckan user remove USERNAME -c $CKAN_INI
Changing a user’s password
paster --plugin=ckan user setpass USERNAME -c $CKAN_INI
Deleting a dataset
CKAN has two types of deletion: the default soft delete, and a purge. A soft-deleted dataset can be undeleted, whereas a purge removes all trace of the dataset from the system.
Where the following commands mention DATASET_NAME, this should be either the dataset's slug or its UUID.
Deleting a dataset:
paster --plugin=ckan dataset delete DATASET_NAME -c $CKAN_INI
Purging a dataset:
paster --plugin=ckan dataset purge DATASET_NAME -c $CKAN_INI
Rebuilding the search index
CKAN uses Solr for its search index, and occasionally it may be necessary to interact with it to refresh the index, or rebuild it from scratch.
Refresh the entire search index:
paster --plugin=ckan search-index rebuild -r -c $CKAN_INI
Rebuild the entire search index:
paster --plugin=ckan search-index rebuild -c $CKAN_INI
Only reindex those packages that are not currently indexed:
paster --plugin=ckan search-index -o rebuild -c $CKAN_INI
Managing the harvest workers
Although harvesters can mostly be managed from the user interface, it is sometimes easier to perform these tasks from the command line. If you use a system administrator account, you will see more than 400 harvest configs, with no clear way of telling which are currently running.
Listing current jobs
Returns a list of currently running jobs. This will contain the JOB_ID necessary to cancel jobs.
paster --plugin=ckanext-harvest harvester jobs -c $CKAN_INI
Cancelling a current job
To cancel a currently running job, you will require a JOB_ID from the Listing current jobs section.
paster --plugin=ckanext-harvest harvester job_abort JOB_ID -c $CKAN_INI
Purging all currently queued tasks
If there is a schedule clash and the system is too busy, it may be necessary to purge the queues used in the various stages of harvesting.
Warning: This command will empty the Redis queues.
paster --plugin=ckanext-harvest harvester purge_queues -c $CKAN_INI
Restarting the harvest queues
If the queues stall, it may be necessary to restart one or both of the harvest queues.
The gather jobs retrieve the identifiers of the updated datasets and create jobs in the fetch queue.
sudo supervisorctl restart ckan_gather_queue
The fetch jobs retrieve the datasets from the remote source and perform the relevant updates in CKAN.
sudo supervisorctl restart ckan_fetch_queue
Adding a new Schema
Each new schema for the schema dropdown in CKAN needs a title and a URL …
paster --plugin=pylons shell $CKAN_INI
Then in the REPL that loads:
>>> from ckanext.dgu.model.schema_codelist import Schema
>>> model.Session.add(Schema(url="[URL]", title="[TITLE]"))
>>> model.repo.commit_and_remove()
Find all packages where a resource has a partial URL
SELECT DISTINCT (p.name)
FROM package p
INNER JOIN resource_group rg ON rg.package_id = p.id
INNER JOIN resource r ON r.resource_group_id = rg.id
WHERE r.url LIKE '%neighbourhood.statistics.gov.uk%'
AND p.state = 'active';
Stopping a harvester
Find the UUID of the harvester:
psql ckan -c "SELECT id FROM harvest_source WHERE name = '[NAME]'"
Set all jobs belonging to that harvester to finished:
psql ckan -c "UPDATE harvest_job SET finished = NOW(), status = 'Finished' WHERE source_id = '[UUID]' AND NOT status = 'Finished';"
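To see what the two statements above do before running them against the real database, the following sketch replays them on an in-memory SQLite database. The table layout is a cut-down assumption that keeps only the columns the queries mention, the sample rows are invented, and SQLite's CURRENT_TIMESTAMP stands in for PostgreSQL's NOW().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns referenced by the queries above.
conn.executescript("""
CREATE TABLE harvest_source (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE harvest_job (
    id TEXT PRIMARY KEY, source_id TEXT, status TEXT, finished TEXT
);
INSERT INTO harvest_source VALUES
    ('src-1', 'example-harvester'),
    ('src-2', 'other-harvester');
INSERT INTO harvest_job VALUES
    ('job-1', 'src-1', 'Running', NULL),
    ('job-2', 'src-1', 'Finished', '2017-01-01 00:00:00'),
    ('job-3', 'src-2', 'Running', NULL);
""")


def stop_harvester(conn, name):
    """Mark every unfinished job of the named harvester as finished."""
    row = conn.execute(
        "SELECT id FROM harvest_source WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no harvest source named {name!r}")
    cur = conn.execute(
        "UPDATE harvest_job SET finished = CURRENT_TIMESTAMP, status = 'Finished' "
        "WHERE source_id = ? AND NOT status = 'Finished'",
        (row[0],),
    )
    conn.commit()
    return cur.rowcount  # number of jobs that were changed


updated = stop_harvester(conn, "example-harvester")
for job in conn.execute("SELECT id, source_id, status FROM harvest_job ORDER BY id"):
    print(job)
```

Only the unfinished job belonging to the named source is changed. Jobs that had already finished keep their original timestamp, and jobs belonging to other harvesters are left alone.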
Change a publisher’s name
Change the name on the publisher page, then reindex that publisher:
paster --plugin=ckan search-index rebuild-publisher [PUBLISHER] -c $CKAN_INI
Register a brownfield dataset
See the supporting manual.