SpamAssassin 4.0.x Setup and Hardening on Debian/Ubuntu
Future Foundation — Public Documentation
Author: Jeff Brown | March 2026
This document covers a clean installation of SpamAssassin 4.0.x from CPAN on Debian-based mail servers running Exim4 with SA-Exim integration. It assumes a working Exim4 MTA and basic familiarity with Perl, systemd, and git.
The Debian apt package for SpamAssassin typically lags well behind upstream. At the time of writing, apt on Debian 12 ships 3.4.6 while the current stable release is 4.0.2 (August 2025). The CPAN install gives you access to newer plugins, improved Bayes classification, and the DMARC/FromNameSpoof/Phishing plugins that are absent or disabled in the packaged version. This guide assumes CPAN as the installation method for that reason.
1. Prerequisites
Ensure the following packages are installed. These provide the build toolchain, the Perl module installer, and the runtime dependencies SA needs:
apt-get update
apt-get install build-essential libssl-dev libexpat1-dev \
libhtml-parser-perl libnet-dns-perl libnetaddr-ip-perl \
libio-socket-inet6-perl libmail-dkim-perl libgeoip2-perl \
cpanminus razor pyzor re2c
The razor and pyzor packages install the collaborative filtering clients. We will configure them in section 5.
Note: the re2c package is needed for sa-compile, which compiles SA rules into optimised C code for faster scanning.
2. Installing SpamAssassin from CPAN
cpanm Mail::SpamAssassin
This installs the SA binaries under /usr/local/bin/ and the Perl modules under /usr/local/share/perl/. Once complete, verify:
/usr/local/bin/spamassassin --version
You should see something like:
SpamAssassin version 4.0.2
running on Perl version 5.36.0
The apt-installed binaries remain at /usr/sbin/spamd and /usr/bin/spamassassin. These are now superseded but are not removed automatically. Keep them in place as a fallback, but ensure all systemd units and cron jobs point to /usr/local/bin/.
3. Switching spamd to the CPAN Binary
The default systemd unit for spamassassin still points to the apt binary at /usr/sbin/spamd. Override it without editing the packaged unit file:
systemctl edit spamassassin
This opens an override file. Add:
[Service]
ExecStart=
ExecStart=/usr/local/bin/spamd -d --pidfile=/run/spamd.pid \
--syslog=/var/log/spamd.log --create-prefs --max-children=5 \
--min-children=2 --min-spare=2 --max-spare=4 \
--max-conn-per-child=50 --timeout-child=240 \
--helper-home-dir -D learn
Nice=15
The blank ExecStart= line is required to clear the inherited value before setting the new one. Adjust --max-children to suit available RAM (each child consumes roughly 80-120 MB).
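The RAM rule of thumb can be turned into a quick calculation. The following is a minimal sketch, not part of SpamAssassin: the function name max_children_for and the idea of passing a fixed memory budget are assumptions made for illustration, and it defaults to the pessimistic 120 MB per child from the range quoted above.

```shell
#!/bin/sh
# Hypothetical helper: estimate a --max-children value from a memory budget.
# Usage: max_children_for BUDGET_MB [PER_CHILD_MB]
# PER_CHILD_MB defaults to 120, the upper end of the 80-120 MB range.
max_children_for() {
    budget_mb=$1
    per_child_mb=${2:-120}
    # Integer division: round down so the budget is never exceeded.
    children=$(( budget_mb / per_child_mb ))
    # spamd needs at least one child, so never suggest fewer than one.
    if [ "$children" -lt 1 ]; then
        children=1
    fi
    echo "$children"
}

# Example: reserve 1024 MB of RAM for spamd children.
max_children_for 1024
```

If the result is lower than the --min-children or --max-spare values in the override, reduce those as well so the settings stay consistent.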
Then reload and restart:
systemctl daemon-reload
systemctl restart spamassassin
Send a test message and confirm the X-Spam-Checker-Version header now shows version 4.0.2 rather than 3.4.6.
4. Plugin Management: the local.pre Convention
SpamAssassin reads all .pre files before local.cf. The .pre files are intended exclusively for plugin-loading directives (loadplugin and tryplugin). Everything else (scores, whitelist entries, dns_query_restriction, Bayes settings, trusted_networks) belongs in local.cf.
The stock installation ships with version-specific .pre files (init.pre, v310.pre, v320.pre ... v402.pre) that load the bundled plugins. Do not edit these. Your local customisations belong in a separate file called local.pre, which SA reads automatically because it ends in .pre.
Create /etc/spamassassin/local.pre containing only your additional plugin loads. A representative example:
loadplugin Mail::SpamAssassin::Plugin::DMARC
loadplugin Mail::SpamAssassin::Plugin::AttachmentPresent
loadplugin Mail::SpamAssassin::Plugin::FromNameSpoof
loadplugin Mail::SpamAssassin::Plugin::Phishing
loadplugin Mail::SpamAssassin::Plugin::Razor2
loadplugin Mail::SpamAssassin::Plugin::Pyzor
That is it. No scores, no configuration, no conditionals. Just loadplugin lines. The configuration for these plugins (scores, thresholds, dns settings) goes into local.cf.
Why this matters: loadplugin directives must be processed before the rules and scores that reference them. Placing them in local.cf can cause ordering issues where SA tries to apply a score to a test that has not yet been defined because its plugin was loaded too late in the parsing sequence. The .pre files are parsed first by design, so plugins loaded there are guaranteed to be available when local.cf is read.
To verify which plugins are loaded:
spamassassin --lint -D 2>&1 | grep -i "plugin.*loaded"
To check that a specific plugin's tests are available:
spamassassin --lint -D 2>&1 | grep -i "dmarc\|fromnamespoof\|phishing"
5. Configuring Razor2 and Pyzor
Razor2 and Pyzor are collaborative spam signature databases. When someone reports a spam message to the Razor or Pyzor network, your server can query that network and benefit immediately without any local Bayes training. They are particularly effective against phishing campaigns where the same message body hits many recipients simultaneously.
5a. Pyzor
If installed via apt (section 1), test connectivity:
pyzor ping
Expected output:
public.pyzor.org:24441 (200, 'OK')
That is all the setup Pyzor needs. The SA plugin queries it automatically once loaded via local.pre.
Note: older guides may tell you to run pyzor discover, but this command no longer exists in current versions. pyzor ping is the correct connectivity test.
5b. Razor2
Initialise the Razor2 client and register with the network:
razor-admin -create
razor-admin -register
This creates configuration files under /root/.razor/ (or the home directory of whichever user runs spamd). Verify it works:
echo "test" | razor-check
The exit code is what matters here, not the output. A working installation returns silently.
5c. Verifying SA Integration
Run a debug scan and check that both backends are found:
spamassassin --test-mode -D razor,pyzor < /dev/null 2>&1 | \
grep -E "razor|pyzor"
You should see lines indicating both are available, something like:
dbg: pyzor: pyzor is available: /usr/bin/pyzor
dbg: pyzor: got response: public.pyzor.org:24441 (200, 'OK')
The "exceeded hardcoded limits" message that appears when testing with empty input is expected and harmless. SA sensibly ignores trivial matches. Real mail will score normally.
5d. A Note on DCC
DCC (Distributed Checksum Clearinghouse) is the third major collaborative filtering system. It is not packaged in Debian due to its non-standard licence and must be compiled from source from https://www.dcc-servers.net/dcc/. Razor2 and Pyzor together cover most of the collaborative filtering benefit; add DCC only if you have a specific need and the appetite for maintaining a source build.
6. The KAM Ruleset Channel
The default sa-update channel (updates.spamassassin.org) provides the core ruleset. Kevin McGrail's KAM channel is an actively maintained supplementary ruleset with aggressive phishing URL rules and patterns targeting current spam campaigns. It is probably the single highest-value addition to any SA installation.
6a. Import the GPG Signing Key
Download and import the key, running as the debian-spamd user to match the ownership of the GPG keyring:
wget https://mcgrail.com/downloads/kam.sa-channels.mcgrail.com.key \
-O /tmp/kam.key
chmod 644 /tmp/kam.key
sudo -u debian-spamd sa-update \
--import /tmp/kam.key \
--gpghomedir /var/lib/spamassassin/sa-update-keys
Note: the key file must be readable by debian-spamd. Downloading it to a restricted directory (like /root or a mail spool) and then running the import as debian-spamd will fail with a permission error.
6b. Pull the Channel
sudo -u debian-spamd sa-update \
--gpgkey 24C063D8 \
--channel kam.sa-channels.mcgrail.com \
--gpghomedir /var/lib/spamassassin/sa-update-keys \
--verbose
A successful first run downloads the ruleset. Subsequent runs that report "no fresh updates" with exit code 1 mean the channel is working and simply has nothing new since the last pull.
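Because a non-zero exit code from sa-update is not always a failure, a wrapper script has to tell the cases apart. The sketch below is one way to do that. The function name describe_sa_update_status is hypothetical, and the meanings of codes 0, 1 and 2 are taken from the sa-update manual page as I understand it; any other value is simply reported as an error.

```shell
#!/bin/sh
# Hypothetical helper: translate an sa-update exit status into a message.
# 0 = rules updated, 1 = no fresh updates, 2 = update available but the
# lint check failed. Any other status is reported as an error here.
describe_sa_update_status() {
    case "$1" in
        0) echo "updated: new rules installed, reload spamd" ;;
        1) echo "current: no fresh updates" ;;
        2) echo "failed: update available but lint check failed" ;;
        *) echo "error: sa-update exited with status $1" ;;
    esac
}

# Typical use after a channel pull (commented out so the sketch runs anywhere):
#   sudo -u debian-spamd sa-update --gpgkey 24C063D8 \
#       --channel kam.sa-channels.mcgrail.com \
#       --gpghomedir /var/lib/spamassassin/sa-update-keys
#   describe_sa_update_status "$?"
describe_sa_update_status 1
```

Only status 0 calls for a spamd reload, so a cron wrapper can skip the reload on status 1 and alert on anything higher.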
The rules land under:
/var/lib/spamassassin/4.000002/kam_sa-channels_mcgrail_com/
Note the directory name uses underscores, not dots.
6c. Adding KAM to the Daily Cron
The stock Debian cron job at /etc/cron.daily/spamassassin handles sa-update and spamd reload. Two things to watch for:
First, the stock cron script may hardcode /usr/bin/sa-update (the apt version). If you installed SA from CPAN, the correct binary is /usr/local/bin/sa-update. Either update the path in the cron script or ensure /usr/local/bin precedes /usr/bin in the cron PATH.
Second, add the KAM channel pull after the existing sa-update block and before the # Local variables: comment at the bottom:
# KAM ruleset channel update
env -i LANG="$LANG" PATH="$PATH" http_proxy="$http_proxy" \
start-stop-daemon --chuid debian-spamd:debian-spamd --start \
--exec /usr/local/bin/sa-update -- \
--gpgkey 24C063D8 \
--channel kam.sa-channels.mcgrail.com \
--gpghomedir /var/lib/spamassassin/sa-update-keys 2>&1
This mirrors the existing sa-update block in style, running as the same user with the same GPG home directory.
Also confirm that CRON=1 is set in /etc/default/spamassassin, otherwise the entire cron script exits immediately without doing anything.
6d. Defunct Channels
Older guides reference kam.sa.net.au and sought.rules.yerp.org. Both are defunct as of 2026. The correct current channel is kam.sa-channels.mcgrail.com as documented above. If you find stale directories from previous attempts under /var/lib/spamassassin/4.000002/ (kam_sa_net_au, sought_rules_yerp_org), they can be safely removed.
7. Bayesian Classifier Health and Training
SA's Bayesian classifier is one of its most powerful components but it needs a minimum of 200 spam and 200 ham messages before it activates. Below that threshold, BAYES_* scores in headers are meaningless.
7a. Checking Corpus Health
sa-learn --dump magic
A healthy output looks like:
0.000 0 3 0 non-token data: bayes db version
0.000 0 100498 0 non-token data: nspam
0.000 0 912746 0 non-token data: nham
0.000 0 259630 0 non-token data: ntokens
...
The critical numbers are nspam and nham. Both should be well above 200 for Bayes to function. The newest atime timestamp confirms the database is actively learning from live traffic.
Watch the ham/spam ratio. A corpus heavily skewed toward ham (say 9:1 or worse) makes Bayes conservative about flagging spam. You want something closer to 2:1 or 3:1 for optimal sensitivity.
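The ratio is easy to compute from the dump output. The following is a small sketch rather than an official tool: the function name bayes_ham_spam_ratio is hypothetical, and it assumes the column layout shown in 7a, where the count is the third whitespace-separated field and the label is the last.

```shell
#!/bin/sh
# Hypothetical helper: print the ham:spam ratio from `sa-learn --dump magic`
# output supplied on stdin. Assumes the count is column 3 and the label
# (nspam / nham) is the last field on the line.
bayes_ham_spam_ratio() {
    awk '
        $NF == "nspam" { nspam = $3 }
        $NF == "nham"  { nham  = $3 }
        END {
            if (nspam > 0) printf "%.1f:1\n", nham / nspam
            else           print "no spam learned yet"
        }
    '
}

# Sample figures taken from the output in 7a.
bayes_ham_spam_ratio <<'EOF'
0.000 0 3 0 non-token data: bayes db version
0.000 0 100498 0 non-token data: nspam
0.000 0 912746 0 non-token data: nham
0.000 0 259630 0 non-token data: ntokens
EOF
```

With the sample figures this prints 9.1:1, which sits at the skewed end of the range described above. On a live server the same function can read the real data with sa-learn --dump magic | bayes_ham_spam_ratio.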
7b. Training from SA-Exim Reject Spools
If you run SA-Exim, rejected messages accumulate in /var/spool/sa-exim/SApermreject/new/. These are high-confidence spam that SA scored above the reject threshold. They are excellent training material but Bayes does not learn from them automatically, because SA-Exim rejects them at SMTP time before the full auto-learn pipeline completes.
Feed them explicitly:
sa-learn --progress --spam /var/spool/sa-exim/SApermreject/new/
The --progress flag shows a running count. On a corpus of several thousand messages this takes a few minutes.
Warning: do not train from the SAspamaccept directories without careful review. These contain messages that SA flagged as spam but delivered anyway, and if your scoring has had any period of misconfiguration they will contain false positives that will contaminate the corpus.
7c. Auto-learning Configuration
The following settings in local.cf control auto-learning thresholds:
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -0.001
bayes_auto_learn_threshold_spam 8.0
This means SA automatically learns messages scoring below -0.001 as ham and above 8.0 as spam. The spam threshold is deliberately conservative to avoid learning from borderline messages that might be misclassified.
Confirm auto-learning is active by checking for a regularly updated journal file:
ls -lh /etc/spamassassin/bayes/
A growing bayes_journal file confirms the auto-learn loop is firing.
8. DNS Query Restrictions
Some RBL providers (notably Validity/SenderScore) block queries from certain resolver IPs and return misleading positive results rather than useful signal. These false hits add phantom score to legitimate mail.
Suppress queries to known-broken lists in local.cf:
dns_query_restriction deny bl.score.senderscore.com
dns_query_restriction deny sa-accredit.habeas.com
dns_query_restriction deny sa-trusted.bondedsender.org
Similarly, any rule that fires on a URIBL_BLOCKED or equivalent "you are not authorised" response should be scored to zero:
score URIBL_BLOCKED 0.0
This prevents blocked query results from contributing to scores.
9. Trusted Networks
The trusted_networks directive tells SA which relay IPs are under your control. Mail received from these IPs is not subjected to relay-based checks (RDNS, PBL, etc). This must match your SPF record. If they diverge, SA will penalise mail relayed through your own infrastructure.
trusted_networks 129.232.230.120/29 197.189.206.80/29 41.203.26.232/29 41.72.147.64/27
trusted_networks 2c0f:fce8:4000:801::/64 2c0f:fce8:0:40c::/64
IPv4 and IPv6 ranges must be on separate trusted_networks lines. Review and update these whenever relay IPs change.
10. Shortcircuiting and Priority Hints
Shortcircuiting allows SA to skip expensive downstream checks (RBL lookups, network tests) when a definitive early result is already available. This improves throughput without sacrificing accuracy.
In local.cf, within an ifplugin block:
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
shortcircuit USER_IN_WHITELIST on
shortcircuit USER_IN_DEF_WHITELIST on
shortcircuit USER_IN_ALL_SPAM_TO on
shortcircuit USER_IN_BLACKLIST on
shortcircuit USER_IN_BLACKLIST_TO on
shortcircuit SUBJECT_IN_BLACKLIST on
shortcircuit ALL_TRUSTED on
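endif
Priority hints tell SA to run certain tests early. Setting a negative priority causes a test to run before the default batch. If Bayes returns a high-confidence spam result early, the shortcircuit plugin can skip the remaining network tests entirely. This only takes effect for rules that have their own shortcircuit entry (for example, shortcircuit BAYES_99 spam), which the block above does not include. The priority lines are: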
priority BAYES_99 -850
priority BAYES_999 -850
11. Phishing Plugin Feeds
Loading the Phishing plugin (section 4) is necessary but not sufficient. The plugin needs feed data from OpenPhish or PhishTank to be useful. Without the feed files, the plugin loads but fires blind.
Check whether feed data exists:
find /var/lib/spamassassin -name "*phish*" 2>/dev/null
If nothing is returned, the feeds are not being downloaded. Consult the SA documentation for the Phishing plugin on how to configure the feed download cron. This is a separate mechanism from sa-update.
12. Lint Testing and Validation
After any configuration change, always lint before restarting:
spamassassin --lint && echo "Config OK"
This catches syntax errors, unknown test names, and plugin loading failures. Only restart spamd after a clean lint:
systemctl restart spamassassin
For a reload without dropping active connections (spamd supports SIGHUP):
systemctl reload spamassassin
13. Git-Based Configuration Management
Managing local.cf and local.pre in a git repository makes configuration changes auditable, reversible, and deployable across multiple servers.
The recommended repository structure:
/etc/spamassassin/
.git/
.gitignore
local.cf -- all scoring, settings, custom rules
local.pre -- loadplugin directives only
The .gitignore should exclude everything except the files you manage:
65_debian.cf
bayes
local.cf.bak
local.cf.dpkg-dist
local.cf.ispa-original
sa-update-hooks.d
sa-update-keys
*.pre
!local.pre
The *.pre wildcard excludes all the version-specific .pre files (init.pre, v310.pre, v342.pre, v400.pre, etc) which are managed by the SA package and should not be committed. The negation !local.pre must come after the wildcard. Gitignore processes rules top to bottom, so if !local.pre appears before *.pre, the wildcard simply re-ignores it.
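The ordering rule can be checked safely in a throwaway repository before touching /etc/spamassassin. This is an illustrative sketch only: the function name list_unignored_pre_files is hypothetical, it assumes git and mktemp are installed, and it works entirely inside a temporary directory.

```shell
#!/bin/sh
# Hypothetical helper: write the given patterns to .gitignore (in the order
# supplied) in a temporary repository, then list the .pre files git would
# still offer to commit.
list_unignored_pre_files() {
    tmp=$(mktemp -d) || return 1
    (
        cd "$tmp" || exit 1
        git init -q .
        printf '%s\n' "$@" > .gitignore
        touch init.pre v310.pre v400.pre local.pre
        # --others lists untracked files; --exclude-standard applies .gitignore.
        # "|| true" keeps the exit status clean when nothing matches.
        git ls-files --others --exclude-standard | grep '\.pre$' || true
    )
    rm -rf "$tmp"
}

echo "wildcard first:"
list_unignored_pre_files '*.pre' '!local.pre'
echo "negation first:"
list_unignored_pre_files '!local.pre' '*.pre'
```

With the wildcard first, only local.pre is listed. With the negation first, nothing is listed, because the later *.pre rule ignores local.pre again.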
On secondary servers, use git sparse-checkout to pull only the files that should be deployed:
git init
git remote add origin https://git.example.com/org/spamassassin.git
git config core.sparseCheckout true
echo "local.cf" >> .git/info/sparse-checkout
echo "local.pre" >> .git/info/sparse-checkout
echo ".gitignore" >> .git/info/sparse-checkout
git pull origin master
Historical snapshots (like a pre-migration copy of another server's config) can live in the repo without being deployed, provided they are excluded from the sparse-checkout on secondary servers and added to .gitignore so they are not deployed by accident.
When pulling on a server for the first time, if git reports "no tracking information for the current branch", set it up with:
git checkout -b master origin/master
Subsequent pulls then work normally:
git pull --ff-only
14. New Server Checklist
On a fresh Debian 12 server with Exim4 and SA-Exim already working:
- Install prerequisites (section 1).
- Install SA from CPAN (section 2).
- Override the systemd unit to use /usr/local/bin/spamd (section 3).
- Clone the git repository to /etc/spamassassin (section 13). Set up sparse-checkout for local.cf, local.pre, .gitignore.
- Create the Bayes database directory:
  mkdir -p /etc/spamassassin/bayes
  chown Debian-exim:Debian-exim /etc/spamassassin/bayes
  chmod 2770 /etc/spamassassin/bayes
- Initialise Razor2 and test Pyzor (section 5):
  razor-admin -create
  razor-admin -register
  pyzor ping
- Import the KAM GPG key and pull the channel (section 6).
- Update the daily cron job with the KAM channel block and ensure it references /usr/local/bin/sa-update.
- Lint and restart:
  spamassassin --lint && systemctl restart spamassassin
- Send a test message and verify the X-Spam-Checker-Version header shows 4.0.2, and DMARC_PASS or other plugin-specific tests appear in X-Spam-Status.
- After a few days of live traffic, check Bayes health:
  sa-learn --dump magic
  Confirm nspam and nham are both growing.
Appendix A: Recommended spamd Flags
| Flag | Purpose |
|---|---|
| --max-children=5 | Maximum concurrent scanning processes |
| --min-children=2 | Keep at least 2 children warm |
| --min-spare=2 | Minimum idle children |
| --max-spare=4 | Maximum idle children |
| --max-conn-per-child=50 | Recycle children after 50 connections (limits the impact of memory leaks) |
| --timeout-child=240 | Abort a child that takes longer than 240 seconds (4 minutes) to process a message |
| --helper-home-dir | Use the helper user's home for .razor etc |
| -D learn | Debug logging for Bayes auto-learn events |
| Nice=15 | Run at reduced priority (a systemd unit setting, not a spamd flag) |
Adjust --max-children based on available RAM. Each child uses roughly 80-120MB depending on ruleset size.
Appendix B: Useful Diagnostic Commands
| Task | Command |
|---|---|
| Check SA version | spamassassin --version |
| Lint the configuration | spamassassin --lint |
| List loaded plugins | spamassassin --lint -D 2>&1 | grep "plugin.*loaded" |
| Test a message interactively | spamassassin --test-mode < /path/to/message.eml |
| Check Bayes database health | sa-learn --dump magic |
| Test Pyzor connectivity | pyzor ping |
| Test Razor2/Pyzor integration | spamassassin --test-mode -D razor,pyzor < /dev/null 2>&1 | grep -E "razor|pyzor" |
| Check KAM ruleset | ls -la /var/lib/spamassassin/4.000002/kam_sa-channels_mcgrail_com/ |
| Check which sa-update is in PATH | which sa-update |
Appendix C: File Separation Reference
| File | Contents |
|---|---|
| local.pre | loadplugin directives |
| local.cf | required_score, trusted_networks, dns_query_restriction, score overrides, whitelist_from_rcvd, Bayes settings, shortcircuit configuration, priority hints, custom header rules, custom keyword blocking rules |