Determine the number of datasets in CKAN using the API.
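One way to get this number is a minimal sketch like the one below, which uses CKAN's `package_search` action with `rows=0` so that only the total is returned. The base URL is a placeholder and an assumption, as are the function names. By default this action counts only public, active datasets.

```shell
#!/bin/sh
# Sketch: count datasets through the CKAN action API.
# CKAN_URL is a placeholder (assumption); point it at your CKAN instance.
CKAN_URL="${CKAN_URL:-https://ckan.example.org}"

# Read a package_search JSON response on stdin and print result.count.
# grep/sed keeps dependencies low; jq '.result.count' is more robust if available.
ckan_count_from_json() {
  grep -o '"count"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | sed 's/[^0-9]//g'
}

# Request zero rows so the response carries the total without any dataset bodies.
ckan_dataset_count() {
  curl -fsS "${CKAN_URL}/api/3/action/package_search?rows=0" | ckan_count_from_json
}
```

Running `ckan_dataset_count` then prints a single number to compare against the counts from the Rails console.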
Determine the number of datasets in the Publish Postgres database using the Rails console.
cf ssh publish-data-beta-production
/tmp/lifecycle/launcher /home/vcap/app 'rails console' ''
>>> Dataset.count
If these numbers match but the number of datasets served on Find is still different, determine the number of published datasets in the Publish Postgres database.
cf ssh publish-data-beta-production
/tmp/lifecycle/launcher /home/vcap/app 'rails console' ''
>>> Dataset.published.count
With the current setup, every dataset that is available through the CKAN API should be marked as published in the Publish Postgres database. If the published count differs from the total count, mark all datasets as published in the Publish Postgres database.
cf ssh publish-data-beta-production
/tmp/lifecycle/launcher /home/vcap/app 'rails console' ''
>>> Dataset.update(status: 'published')
Then run a reindex so that the Elasticsearch instance that serves Find picks up the updated status.
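The comparison described above can be sketched as a small helper that takes the three counts and prints the next step. The function name and messages are illustrative assumptions. The case where CKAN and Publish disagree is not covered by this section, so the helper only flags it.

```shell
#!/bin/sh
# next_step CKAN_COUNT TOTAL_COUNT PUBLISHED_COUNT
# CKAN_COUNT      - datasets reported by the CKAN API
# TOTAL_COUNT     - Dataset.count from the Rails console
# PUBLISHED_COUNT - Dataset.published.count from the Rails console
next_step() {
  ckan=$1; total=$2; published=$3
  if [ "$ckan" -ne "$total" ]; then
    # Not handled by the steps in this section (assumption: needs separate investigation).
    echo "investigate-sync"
  elif [ "$published" -ne "$total" ]; then
    # Counts match but some datasets are not published: update status, then reindex.
    echo "mark-published-and-reindex"
  else
    echo "counts-match"
  fi
}
```

For example, `next_step 100 100 90` prints `mark-published-and-reindex`.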
Datasets published in CKAN are not appearing on Find
Check the length of the Sidekiq queue (see the monitoring section) to make sure it is not too long. You should not see more jobs than there are datasets in CKAN.
If the queue is too long, you should clear the queue. The next sync process will repopulate the queue with any relevant datasets that require updating.
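As a rough sketch of that rule, the helper below compares a queue length with the CKAN dataset count. The function name is an assumption. In a Rails console, Sidekiq's API exposes `Sidekiq::Queue.new.size` and `Sidekiq::Queue.new.clear` for the default queue; whether the sync jobs use the default queue is also an assumption.

```shell
#!/bin/sh
# queue_too_long QUEUE_LENGTH CKAN_DATASET_COUNT
# Exits 0 when the queue holds more jobs than there are datasets in CKAN.
#
# The queue length can be read in a Rails console with, for example:
#   >>> Sidekiq::Queue.new.size      (default queue; queue name is an assumption)
# and the queue emptied with:
#   >>> Sidekiq::Queue.new.clear
queue_too_long() {
  [ "$1" -gt "$2" ]
}
```

For example, `queue_too_long 52000 48000 && echo "clear the queue"` prints the message, while a queue of 300 jobs against 48000 datasets prints nothing.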
Harvesters not processing or seem stuck
The harvesting process runs as a single-threaded program. If any harvesting job crashes by raising an exception, it takes down the entire process. We have configured Upstart to restart the process automatically, but if the service keeps crashing, Upstart will decide it’s unhealthy and stop restarting it after a while.
You can check whether the process is still running by checking whether entries are still being written to the log file on the CKAN machine:
$ govukcli set-context production-aws
$ govukcli ssh ckan
$ sudo tail -f /var/log/ckan/procfile_harvester_fetch_consumer.err.log
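Instead of watching the log by eye, a minimal sketch like the following checks how recently the file was modified. The function name and any age threshold you pass are assumptions. A quiet error log does not prove a crash on its own, so treat a stale result as a prompt to look further.

```shell
#!/bin/sh
# log_is_fresh FILE MAX_AGE_SECONDS
# Exits 0 if FILE was modified within the last MAX_AGE_SECONDS, 1 if it is older,
# and 2 if the modification time cannot be read.
log_is_fresh() {
  file=$1; max_age=$2
  # GNU stat first, BSD stat as a fallback.
  mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file" 2>/dev/null) || return 2
  now=$(date +%s)
  [ $((now - mtime)) -le "$max_age" ]
}
```

For example, `log_is_fresh /var/log/ckan/procfile_harvester_fetch_consumer.err.log 600 || echo "no log entries in the last 10 minutes"`. Reading the file's metadata may need `sudo`, depending on permissions.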
Or you could check that the services are all showing as started on the CKAN machine:
$ sudo initctl list | grep harvester
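To pick out only the services that are not running, a small filter over the `initctl list` output can help. This is a sketch: the function name is an assumption, and it relies on Upstart's usual `job-name goal/state` output format, with running jobs shown as `start/running`.

```shell
#!/bin/sh
# Read `initctl list` output on stdin and print the names of harvester jobs
# whose status is anything other than start/running.
stopped_harvesters() {
  awk '/harvester/ && $2 !~ /^start\/running/ { print $1 }'
}
```

For example, `sudo initctl list | stopped_harvesters` prints nothing when every harvester job is running.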
If the harvester has stopped, there is a Fabric script that will restart it for you. The script first checks whether the harvesting process is running, so if you suspect the process has crashed, you can run this script to try to restart it.
$ fab aws_production class:ckan ckan.restart_harvester