Repository: govuk_seed_crawler

README

This gem retrieves a list of seed URLs from the GOV.UK sitemap and adds them to RabbitMQ so that the crawler can consume them.
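
As a rough illustration of the first half of that flow, the sketch below pulls page URLs out of a sitemap document using Ruby's REXML library. It is not the gem's actual implementation: the method name `extract_seed_urls` and the sample XML are hypothetical, and the step that publishes each URL to RabbitMQ is left out.

```ruby
require 'rexml/document'

# Hypothetical helper for illustration only; not part of this gem's API.
# Returns the text of every <loc> element found in a sitemap XML string.
def extract_seed_urls(sitemap_xml)
  doc = REXML::Document.new(sitemap_xml)
  urls = []
  # Match on the local name so the sitemap's default namespace is ignored.
  doc.elements.each("//*[local-name()='loc']") do |loc|
    text = loc.text
    urls << text.strip unless text.nil? || text.strip.empty?
  end
  urls
end

# Made-up sample input in the standard sitemap format.
sample = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://www.gov.uk/</loc></url>
    <url><loc>https://www.gov.uk/browse</loc></url>
  </urlset>
XML

puts extract_seed_urls(sample)
```

Each URL returned this way would then be handed to a message queue for the crawler to pick up; how the gem does that is not shown here.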

Installation

Add this line to your application’s Gemfile:

gem 'govuk_seed_crawler'

And then execute:

$ bundle

Or install it yourself as:

$ gem install govuk_seed_crawler

Usage

To run with the RabbitMQ connection defaults:

bundle exec seed-crawler https://www.gov.uk/

Run with --help to see a list of options:

bundle exec seed-crawler --help

Deployment

The gem is automatically deployed to RubyGems when the gem version is updated on main. (Don’t forget to update the CHANGELOG!)

For the new gem version to be used on GOV.UK, you’ll need to update the reference in govuk-puppet.

Contributing

  1. Fork it ( http://github.com/{my-github-username}/govuk_seed_crawler/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Licence

MIT License