asset-manager: Local virus scanning with ClamAV
Per the main Asset Manager README, we use ClamAV to scan uploaded assets for viruses before they are made available to the public.
When might you run a local scan?
One reason you might want to run a scan locally is if an error like
Heuristics.Limits.Exceeded.MaxFiles FOUND (VirusScanner::InfectedFile)
is raised. This means that an uploaded file exceeds our scan limits (here, the limit on the number of files ClamAV will extract from a single upload, which the --max-files setting controls) and cannot be fully scanned. To fix this, we'll likely need to increase our limits, and running a scan locally allows us to experiment with the relevant setting. Various GOV.UK Helm Charts commits have done this: 229e16e, 5c33832, and a01862b.
Besides appearing in Sentry error reports, this problem often surfaces via Zendesk support tickets, raised when a user tries to access an uploaded document and instead sees a JSON response like this:
{
  "_response_info": {
    "status": "not found"
  }
}
Setup
You'll need to install the ClamAV CLI tool and set up its virus database before you can run a scan. Below are example steps for doing this using Homebrew on an arm64 macOS system (e.g. M1 or later).
- Install the CLI tool:
brew install clamav
- Start the service:
brew services start clamav
- Create a config file for setting up the virus database:
cd /opt/homebrew/etc/clamav
cp freshclam.conf.sample freshclam.conf
- Edit the file created in the last step (freshclam.conf) and comment out the Example line with a #
- (Optional) Edit the config to more closely resemble the config we use in production
- Set up the virus database:
freshclam
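The config steps above can also be scripted. The following is a minimal sketch, not part of Asset Manager or ClamAV: the function name enable_freshclam_conf is hypothetical, and it assumes the sample file contains a bare Example line, as described in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: copy the sample freshclam config and comment out
# its "Example" line. Assumes "$1" is a directory containing
# freshclam.conf.sample.
enable_freshclam_conf() {
  conf_dir="$1"
  cp "$conf_dir/freshclam.conf.sample" "$conf_dir/freshclam.conf" || return 1
  # Prefix the bare "Example" line with "#". A temp file is used instead of
  # `sed -i`, whose syntax differs between macOS (BSD) sed and GNU sed.
  sed 's/^Example[[:space:]]*$/#Example/' "$conf_dir/freshclam.conf" \
    > "$conf_dir/freshclam.conf.tmp" &&
    mv "$conf_dir/freshclam.conf.tmp" "$conf_dir/freshclam.conf"
}

# Usage on an arm64 Homebrew install (path taken from the steps above):
#   enable_freshclam_conf /opt/homebrew/etc/clamav && freshclam
```

Note that this overwrites any existing freshclam.conf in the directory, so it is only suitable for a fresh setup.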
Usage
Run the following command, adjusting arguments as appropriate (see the clamscan docs or run man clamscan to learn more). You might want to replicate relevant parts of our production clamd config via these arguments for accurate testing.
clamscan --alert-exceeds-max=yes --max-files=35000 --max-scansize=2000M --max-filesize=500M filename.pdf
[!TIP] For the example in When might you run a local scan?, we adjusted the --max-files argument until the scan stopped reporting the Heuristics.Limits.Exceeded.MaxFiles error.
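The trial-and-error approach in the tip could be automated along these lines. This is a rough sketch under stated assumptions: find_max_files and the CLAMSCAN override variable are hypothetical names introduced here, the other arguments are copied from the command above, and the candidate values you pass in are up to you.

```shell
#!/bin/sh
# Hypothetical helper: try each candidate --max-files value in turn and print
# the first one for which the scan no longer reports
# Heuristics.Limits.Exceeded.MaxFiles.
# CLAMSCAN may be set to an alternative command (e.g. a stub for testing);
# it defaults to clamscan.
find_max_files() {
  file="$1"
  shift
  for limit in "$@"; do
    # clamscan exits non-zero when it raises an alert, so ignore the exit
    # status here and inspect the output instead.
    output=$("${CLAMSCAN:-clamscan}" --alert-exceeds-max=yes \
      --max-files="$limit" --max-scansize=2000M --max-filesize=500M \
      "$file" 2>&1) || true
    if ! printf '%s\n' "$output" | grep -qF 'Heuristics.Limits.Exceeded.MaxFiles'; then
      echo "$limit"
      return 0
    fi
  done
  return 1  # every candidate value still triggered the limit
}

# Usage (candidate values are illustrative, not recommendations):
#   find_max_files filename.pdf 10000 20000 35000 50000
```

The sketch only looks for the MaxFiles alert, so a scan that fails for a different reason would also be treated as a pass. It is worth re-running the final command by hand and reading the full report.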
The scan might take a little while to complete, after which it reports the results. A report for a clean file should include lines like those below (along with others).
filename.pdf OK
Scanned files: 1
Infected files: 0
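If you want to check the result in a script rather than by eye, something like the following could work. It is a minimal sketch: report_is_clean is a hypothetical helper name, and it assumes the summary line has the form shown above. The clamscan man page also documents exit codes (0 for no virus found, 1 for virus found, 2 for an error), which may be a simpler signal.

```shell
#!/bin/sh
# Hypothetical helper: read a clamscan report on stdin and succeed only if
# the summary line says that no files were infected.
report_is_clean() {
  grep -Eq '^Infected files:[[:space:]]*0[[:space:]]*$'
}

# Usage:
#   clamscan filename.pdf | report_is_clean && echo "clean"
```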