This project uses Godep to manage its dependencies. If you have a
working Go development setup, you should be able to install
Godep by running:
go get github.com/tools/godep
To run the worker, first build it with go build to generate a
binary. You can then run the built binary directly with
./govuk_crawler_worker. All configuration is injected through
environment variables; see the main.go file for details.
How it works
This is a message queue worker that consumes URLs from a queue and
crawls them, saving the output to disk. Although that is the worker's
main purpose, it performs a few other activities before a page is
written to disk.
The workflow for the worker can be defined as the following set of