Wazuh ships with over 3,000 built-in rules. That sounds like plenty, until you need to detect something that isn't in the default ruleset. An obscure web shell signature in an IIS log. A WordPress scan hitting xmlrpc.php 40 times in 30 seconds from a single IP. An HAProxy log that's JSON-wrapped inside a Docker container and doesn't match any built-in decoder. When those moments arrive, you either write custom decoders and rules or you miss the alert.
This tutorial is built on Bayu Sangkaya's open-source repository, wazuh-custom-rules-and-decoders: production-grade decoders, rules, integrations, and active-response scripts from real SOC deployments. Bayu also maintains materi_wazuh, the best Wazuh training curriculum I've found in the Indonesian infosec community.
The decoder and rule examples below are lifted directly from his repository; I'm explaining them, not inventing them. If this article helps you, drop a star on his repo. Open-source security tools live and die on community support.
Why Built-In Rules Aren't Enough
Wazuh's default ruleset is broad: SSH brute force, Windows Event Log anomalies, file integrity changes, vulnerability scans. But "broad" is not the same as "complete." The defaults are designed to cover common attack patterns across generic environments. Your environment is not generic.
Here's what breaks: a web shell is dropped into /var/www/html/ on a Linux server. Wazuh's FIM rules fire: rule 554, "File added to the system." Level 5. Low severity. No escalation, no correlation, no mention of "web shell." A human has to see rule 554, check the filename, recognize .php in the web root, and manually escalate. At 3 AM, that human doesn't exist.
A custom rule solves this. It watches for FIM events on files ending in scripting extensions such as .php, .aspx, and .jsp, and fires at level 12. It watches for file content changes containing eval, passthru, or shell_exec and fires at level 15. It tags the alert with MITRE T1505.003 (Server Software Component: Web Shell). Now you don't need a human to connect the dots; the rule does it at machine speed.
Custom rules turn "something happened" into "this specific threat happened, at this severity, on this host, right now." That's the difference between a log aggregator and a SIEM.
Decoders vs. Rules: The Two-Layer Model
Before writing anything, understand the data flow inside Wazuh's analysis engine:
Raw Log → [ Pre-decoding ] → [ Decoders ] → [ Rules ] → Alert
Pre-decoding: extract timestamp, hostname, and program name first
Decoders: extract fields from raw text
Rules: match conditions, assign severity, generate alert
Decoders parse raw log lines into structured fields. A raw HAProxy log like 192.168.1.10:54321 [15/Jun/2026:14:22:10] frontend_http backend_servers/server01 is meaningless to a rule engine. A decoder extracts the source IP, timestamp, frontend name, and backend name into named fields: srcip, accept_date, frontend_name, backend_name. After decoding, rules can reference these fields by name instead of regex-matching the same raw string over and over.
Rules evaluate decoded fields against conditions and fire alerts. They check if a field matches a pattern, if the same source IP triggered a certain rule multiple times in a time window, or if a decoded program name equals a suspicious value. Rules also assign severity (the rule syntax accepts levels 0 to 16; Wazuh's classification guide describes 0 through 15), map to MITRE ATT&CK, and route alerts to specific groups for reporting.
The two are tightly coupled: a rule's <decoded_as> tag tells Wazuh which decoder must match first. If your decoder doesn't fire, your rule, no matter how perfectly written, will never trigger.
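To make the coupling concrete, here is a minimal Python sketch of the two layers. It is not how Wazuh is implemented; `decode`, `evaluate`, and the level-3 rule are hypothetical names invented for illustration. The point it shows is that the rule function never sees raw text, only the fields the decoder produced.

```python
import re

# Hypothetical decoder: turns a raw HAProxy-style line into named fields.
HAPROXY_PATTERN = re.compile(
    r"(?P<srcip>\d+\.\d+\.\d+\.\d+):(?P<srcport>\d+) "
    r"\[(?P<accept_date>\S+)\] (?P<frontend_name>\S+) "
    r"(?P<backend_name>\S+)/(?P<server_name>\S+)"
)


def decode(raw_line):
    """Return a dict of decoded fields, or None if the decoder does not match."""
    match = HAPROXY_PATTERN.search(raw_line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["decoder"] = "haproxy"  # plays the role of <decoded_as>
    return fields


def evaluate(fields):
    """Hypothetical rule: only considers events decoded as haproxy."""
    if fields is None or fields.get("decoder") != "haproxy":
        return None  # decoder never fired, so the rule can never fire
    if fields["frontend_name"] == "frontend_http":
        return {"level": 3, "description": f"HTTP request from {fields['srcip']}"}
    return None


raw = "192.168.1.10:54321 [15/Jun/2026:14:22:10] frontend_http backend_servers/server01"
print(evaluate(decode(raw)))
print(evaluate(decode("unrelated log line")))
```

The second call returns None for the same reason a Wazuh rule stays silent when its decoder does not match: there are no fields to evaluate.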
Bayu Sangkaya's Repository Structure
Here's what's in the repo and where each piece goes on your Wazuh manager.
wazuh-custom-rules-and-decoders/
├── decoders/ # Custom XML decoders
│ ├── haproxy_decoder.xml # HAProxy log parsing (Docker + plain)
│ └── webshell_command_decoder.xml # Web shell network connection parser
├── rules/ # Custom XML rules
│ ├── 500500-webshell-rules.xml # Web shell detection (FIM + auditd + Sysmon)
│ ├── 841101-wpscan_rules.xml # WordPress scan detection (frequency-based)
│ ├── 500554-judol_rules.xml # Judol phishing kit detection
│ ├── 800001-haproxy_rules.xml # HAProxy attack detection (14 KB, extensive)
│ ├── 100620-misp_rules.xml # MISP threat feed integration
│ ├── 100625-opencti_rules.xml # OpenCTI threat feed integration
│ ├── Openbao-rules.xml # OpenBao secret manager monitoring
│ └── active-response.xml # Active response trigger rules
├── ESET-integration/ # ESET antivirus log forwarder
│ ├── 420010-eset_rules.xml # Rules for ESET detection events
│ ├── eset_logcollector.py # Python log collector daemon
│ └── eset_daemon.service # Systemd service unit
├── integration/ # Custom integration scripts
│ ├── custom-misp.py # MISP threat intelligence integration
│ ├── custom-thehive.py # TheHive case management integration
│ ├── custom-dfir_iris.py # DFIR-IRIS case management integration
│ └── custom-telegram.py # Telegram alert notification
├── active-response/ # Automated response scripts
│ ├── quarantine-malware.sh # Linux malware quarantine
│ ├── quarantine-webshell.sh # Linux web shell quarantine
│ ├── remove-malware.py # Cross-platform malware removal
│ └── remove-malware.exe # Windows malware removal binary
├── sysmon/ # Sysmon configuration & rules
│ ├── sysmonconfig.xml # Full Sysmon config (300 KB)
│ └── 255000-sysmon_rules.xml # Sysmon event correlation rules
├── audit/ # Auditd rules
│ └── 10-webshell.rules # Auditd syscall rules for web shells
├── browser-monitoring/ # Browser history monitoring
│ ├── browser-history-monitor.py # Python history collector
│ ├── installer-script.sh # Linux installer
│ └── windows-installer.ps1 # Windows installer
├── install-agent.sh # Interactive Linux agent installer
├── install-agent.ps1 # Interactive Windows agent installer
└── README.md
The naming convention tells you where things go: files in decoders/ land in /var/ossec/etc/decoders/ on the manager. Files in rules/ land in /var/ossec/etc/rules/. Integration scripts go to /var/ossec/integrations/. Active-response scripts go to /var/ossec/active-response/bin/ on the agent side.
The numeric prefixes on rule files (500500, 841101, 800001) mark the rule ID each file starts at. Wazuh's documentation reserves IDs below 100000 for the built-in ruleset and recommends 100000–120000 for custom rules. Bayu's IDs sit above that recommended range, but the numbering is deliberate: 500500 for web shells sits in a distinct namespace, 841101 for WPScan maps to his WAF rule range, and 800001 starts the HAProxy range. Don't reuse these IDs in your own custom rules unless you merge carefully.
Installation: Clone, Copy, Configure
Getting Bayu's rules and decoders onto your Wazuh manager takes five steps.
# 1. Clone the repository
cd /tmp
git clone https://github.com/bayusky/wazuh-custom-rules-and-decoders.git
cd wazuh-custom-rules-and-decoders
# 2. Copy decoders and rules to Wazuh directories
sudo cp decoders/*.xml /var/ossec/etc/decoders/
sudo cp rules/*.xml /var/ossec/etc/rules/
# 3. Copy integrations and active-response scripts
sudo cp -r integration/* /var/ossec/integrations/
sudo cp active-response/quarantine-malware.sh /var/ossec/active-response/bin/
sudo cp active-response/quarantine-webshell.sh /var/ossec/active-response/bin/
sudo cp active-response/remove-malware.py /var/ossec/active-response/bin/
# 4. Fix permissions — Wazuh runs as user 'wazuh'
sudo chown -R wazuh:wazuh /var/ossec/etc/decoders/
sudo chown -R wazuh:wazuh /var/ossec/etc/rules/
sudo chown -R wazuh:wazuh /var/ossec/integrations/
sudo chmod 750 /var/ossec/integrations/*.py
sudo chmod 750 /var/ossec/active-response/bin/*.sh
sudo chmod 750 /var/ossec/active-response/bin/*.py
# 5. Restart the Wazuh manager to load new decoders and rules
sudo systemctl restart wazuh-manager
Step 2 is critical: Wazuh's analysis engine reads every XML file in /var/ossec/etc/rules/ and /var/ossec/etc/decoders/ at startup. If any file has a syntax error (a missing closing tag, an unescaped &, a duplicate rule ID), the entire analysis engine fails to load. Your SIEM goes silent. No warning in the dashboard, no error visible in the UI. You only catch it by checking /var/ossec/logs/ossec.log.
Verify the restart succeeded:
# Check that analysisd is running
sudo /var/ossec/bin/wazuh-control status | grep analysisd
# Look for rule loading errors
sudo grep -i "error\|fail\|syntax" /var/ossec/logs/ossec.log | tail -20
# Check how many rules were loaded: analysisd logs the total at startup (should be >3000)
sudo grep "Total rules enabled" /var/ossec/logs/ossec.log | tail -1
If the rule count is lower than expected, or if you see "ERROR: Rule 'xxxxx' has duplicate id", you have an ID collision. Remove the conflicting file, fix the duplicate, and restart.
Registering Custom Rules in ossec.conf
On a default Wazuh 4.x install, copying files into the directories is usually enough, because the <ruleset> block in /var/ossec/etc/ossec.conf already loads every XML file under etc/rules and etc/decoders. Confirm that the user-defined directories are present in your configuration:
<ossec_config>
<ruleset>
<!-- Default ruleset shipped with Wazuh (already present) -->
<decoder_dir>ruleset/decoders</decoder_dir>
<rule_dir>ruleset/rules</rule_dir>
<list>etc/lists/audit-keys</list>
<list>etc/lists/security-eventchannel</list>
<!-- User-defined ruleset: every XML file in these directories is loaded -->
<decoder_dir>etc/decoders</decoder_dir>
<rule_dir>etc/rules</rule_dir>
<!-- Only load the rules you actually need -->
<!-- Exclude unused files to reduce processing overhead, for example: -->
<!-- <rule_exclude>500554-judol_rules.xml</rule_exclude> -->
</ruleset>
</ossec_config>
If the etc/rules and etc/decoders directives are missing (for example on a heavily customized or upgraded install), Wazuh will ignore the files you copied. Add the <rule_dir> and <decoder_dir> lines shown above, or reference individual files with <rule_include> and <decoder_include>. The old OSSEC-style rules_config.xml master include list is not used by Wazuh 4.x.
Decoder Syntax: The Parsing Layer
Here's a simplified decoder for failed SSH logins. Read it first, then I'll explain what each part does.
<decoder name="my_decoder">
<program_name>sshd</program_name> <!-- Match on syslog program name -->
<parent>ossec</parent> <!-- Inherit from parent decoder -->
<prematch offset="after_parent">^Failed</prematch> <!-- Coarse pattern to trigger -->
<regex offset="after_prematch" type="pcre2">(\S+) from (\S+)</regex> <!-- Field extraction -->
<order>user, srcip</order> <!-- Map regex groups to field names -->
</decoder>
Seven lines, two fields extracted. Here's what's happening:
program_name is the fastest filter. During pre-decoding, Wazuh pulls the program name out of the syslog header. If it says sshd, this decoder is a candidate. If it says haproxy, Wazuh skips it and tries the next decoder. This is your first routing layer. Get it right and your decoder never wastes CPU on irrelevant logs.
parent chains decoders together. A child decoder is only evaluated after its parent has matched, and offset="after_parent" lets it work on the text the parent left over. The ossec parent shown here is the same built-in decoder that Bayu's web shell decoder extends later in this tutorial; for real sshd logs you would normally chain from the built-in sshd decoder instead. Timestamp and hostname are not the parent's job: pre-decoding strips them before any decoder runs.
prematch is the coarse gate. It checks if the log even contains the word "Failed" before running the expensive regex. The offset="after_parent" tells it to search only the leftover portion, not the parts the parent already handled. A well-written prematch saves CPU. A bad one misses logs. A missing one means every log that matches program_name gets regex'd, even the ones you don't care about.
regex is where the real parsing happens. Two capture groups, each (\S+) meaning "one or more non-whitespace characters", grab the username and source IP from "Failed password for root from 192.168.1.100". Use type="pcre2" when you need full regex syntax. Wazuh's default OS_Regex is lighter, but it has a much smaller feature set and its own escape conventions, so patterns ported from other tools often fail under it.
order maps capture groups to field names. Group 1 is the username, group 2 is the source IP. After this decoder fires, your rules can match on user and srcip directly. The order list must line up with the groups exactly or fields end up missing or mislabeled, and you'll spend an hour wondering why your rule's srcip condition never matches.
A few more elements you'll see in Bayu's decoders: type (set to web-log for Apache/Nginx/HAProxy, it handles URL normalization for you) and use_own_name (rare but critical for Docker JSON logs where the parent is json but rules need the decoder's actual name).
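The decoder's flow (program name filter, then prematch gate, then regex, then order mapping) can be imitated in a few lines of Python. This is only a sketch of the logic, using Python's re module rather than Wazuh's engine; `decode_sshd` is a hypothetical helper, and the sample messages are assumed to be what remains after pre-decoding.

```python
import re

PREMATCH = re.compile(r"^Failed")          # coarse gate, like <prematch>
REGEX = re.compile(r"(\S+) from (\S+)")    # field extraction, like <regex>
ORDER = ["user", "srcip"]                  # like <order>


def decode_sshd(program_name, message):
    """Mimic the decoder: filter, gate, extract, then name the groups."""
    if program_name != "sshd":
        return None                        # <program_name> did not match
    gate = PREMATCH.search(message)
    if gate is None:
        return None                        # <prematch> did not match
    remainder = message[gate.end():]       # offset="after_prematch"
    match = REGEX.search(remainder)
    if match is None:
        return None
    return dict(zip(ORDER, match.groups()))


print(decode_sshd("sshd", "Failed password for root from 192.168.1.100 port 22 ssh2"))
print(decode_sshd("sshd", "Accepted password for root from 192.168.1.100 port 22 ssh2"))
```

Swapping the two names in ORDER would still produce a result, just with the username stored as srcip, which is why a mismatched order list is so hard to spot.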
Parent-Child Chain Example (HAProxy)
Bayu's HAProxy decoder demonstrates a real multi-stage chain. Here's a simplified version:
<!-- Stage 1: Identify this as an HAProxy log -->
<decoder name="haproxy">
<program_name>haproxy</program_name>
<prematch>\d+.\d+.\d+.\d+:\d+ \S+ \S+</prematch>
</decoder>
<!-- Stage 2: Extract client info -->
<decoder name="haproxy1">
<parent>haproxy</parent>
<regex type="pcre2">(\d+.\d+.\d+.\d+):(\d+) \[(\S+)\] (\S+) (\S+)/(\S+) (\S+)</regex>
<order>srcip, srcport, accept_date, frontend_name, backend_name, server_name, timer</order>
</decoder>
<!-- Stage 3: Extract termination and connection stats -->
<decoder name="haproxy1">
<parent>haproxy</parent>
<regex type="pcre2">- - (\S+) (\d+/\d+/\d+/\d+/\d+) (\d+)/(\d+)</regex>
<order>termination_state, connections, server_queue, backend_queue</order>
</decoder>
<!-- Stage 4: Extract headers and URL -->
<decoder name="haproxy1">
<parent>haproxy</parent>
<type>web-log</type>
<regex type="pcre2">\{(.*)\} "(.*)"</regex>
<order>headers, url</order>
</decoder>
Notice all three children share the name haproxy1: they all extend the same parent haproxy, and their fields accumulate. After all four stages run, a single HAProxy log has been parsed into srcip, srcport, frontend_name, backend_name, termination_state, url, and more, all available to rules as named fields.
HAProxy logs can be 200+ characters. A single regex matching everything would be unreadable, unmaintainable, and brittle: one format change breaks the entire parser. By breaking extraction into stages, you can add new fields without touching existing regexes. If HAProxy adds a new log format field, you add one child decoder and the rest keep working.
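A rough Python sketch of the staged idea, assuming the three child regexes from the simplified decoder above and a made-up sample log line: each stage runs independently against the same message and contributes its own fields. `decode_haproxy` is a hypothetical helper, and Python's re stands in for PCRE2.

```python
import re

# (pattern, order) pairs copied from the three child decoders above.
STAGES = [
    (r"(\d+.\d+.\d+.\d+):(\d+) \[(\S+)\] (\S+) (\S+)/(\S+) (\S+)",
     ["srcip", "srcport", "accept_date", "frontend_name",
      "backend_name", "server_name", "timer"]),
    (r"- - (\S+) (\d+/\d+/\d+/\d+/\d+) (\d+)/(\d+)",
     ["termination_state", "connections", "server_queue", "backend_queue"]),
    (r'\{(.*)\} "(.*)"',
     ["headers", "url"]),
]


def decode_haproxy(message):
    """Run every stage; fields from stages that match accumulate in one dict."""
    fields = {}
    for pattern, order in STAGES:
        match = re.search(pattern, message)
        if match:
            fields.update(zip(order, match.groups()))
    return fields


# Assumed sample line, modeled on the Docker example later in this article.
sample = ('192.168.1.10:54321 [15/Jun/2026:14:22:10.123] frontend_http '
          'backend_servers/server01 0/0/1/45/46 200 1234 - - ---- 1/1/0/0/0 0/0 '
          '{} "GET /api/health HTTP/1.1"')

for name, value in decode_haproxy(sample).items():
    print(f"{name}: {value!r}")
```

If one stage stops matching after a format change, the other stages still populate their fields, which is the maintainability argument for splitting the regex.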
Rule Syntax: The Detection Layer
Rules are where detection logic lives. They evaluate decoded fields, apply thresholds, and generate alerts.
<group name="attack_category, platform,">
<rule id="100001" level="10">
<decoded_as>json</decoded_as> <!-- Which decoder must match -->
<if_sid>530</if_sid> <!-- Parent rule that must fire first -->
<field name="data.virus">yes</field> <!-- Field match (OS_Regex by default) -->
<field name="data.action" type="pcre2">alert</field> <!-- PCRE2 regex field match -->
<description>Antivirus alert: $(data.virus)</description>
<mitre>
<id>T1204</id> <!-- MITRE ATT&CK technique -->
</mitre>
<group>pci_dss_11.4,nist_800_53_SI.4,</group> <!-- Compliance mapping -->
</rule>
</group>
Key Rule Elements
Look at the example above: a single rule checks if a decoder matched (decoded_as), if a parent rule fired first (if_sid), if a decoded field contains a specific value (field), and assigns a severity level. Here's what each part controls:
level is the most consequential decision in any rule. 0-4 is noise. 5-9 is worth logging. 10-11 wakes up the SOC. 12-15 means someone's getting paged. The rule of thumb: if you'd want to know about it at 3 AM, level 12+. If it can wait until morning, level 7-10.
decoded_as links rules to decoders. If the log wasn't decoded as json, a rule with <decoded_as>json</decoded_as> never fires. This is your primary routing. Get it wrong and your rule is dead code.
if_sid creates parent-child chains. Rule 500500 (web shell file creation) only evaluates if rule 554 (any file creation) already fired. This saves CPU: instead of regex-matching every log for web shell patterns, Wazuh first checks if a file was even created. If not, skip the expensive check. Multi-level chains like 550 → 500501 → 500502 are how you build detection depth without killing performance.
if_matched_sid is the correlation version. It is used with frequency and timeframe for counting: "If rule 841101 fired 14 times from the same IP in 30 seconds, escalate." That is different from if_sid, which only checks if a parent fired once. The two are easy to confuse, and Wazuh won't warn you if you use the wrong one.
field matches decoded fields. Without type, the pattern is evaluated as OS_Regex, Wazuh's lightweight regex flavor, so a plain word behaves like a substring match rather than an exact match. With type="pcre2", it's a full PCRE2 regex. Prefer field over match whenever possible: match searches the entire raw log line and is slower. Only use it when the data you need wasn't extracted by any decoder.
group and mitre are free context. The outer <group> tag categorizes rules for dashboard filtering. The inner <group> maps to compliance frameworks (PCI DSS, NIST 800-53). <mitre> tags map to ATT&CK techniques. Add at least one MITRE ID per rule. Your SOC will thank you when they're hunting by technique instead of grepping rule descriptions.
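One way to audit these elements across a rule file is to parse it and list each rule's ID, level, parent, and MITRE IDs. The sketch below uses Python's standard xml.etree module on a small inline sample; `summarize_rules` and the two sample rules are hypothetical, and wrapping the text in a synthetic root element is an assumption made so that files with several top-level <group> blocks still parse.

```python
import xml.etree.ElementTree as ET

# Made-up sample; in practice this would be the text of a rule file.
RULE_XML = """
<group name="attack_category,platform,">
  <rule id="100001" level="10">
    <if_sid>530</if_sid>
    <description>Antivirus alert</description>
    <mitre><id>T1204</id></mitre>
  </rule>
  <rule id="100002" level="3">
    <if_sid>100001</if_sid>
    <description>Follow-up event</description>
  </rule>
</group>
"""


def summarize_rules(xml_text):
    """Return id, level, parent, and MITRE IDs for every <rule> element."""
    root = ET.fromstring(f"<root>{xml_text}</root>")  # synthetic wrapper
    summary = []
    for rule in root.iter("rule"):
        summary.append({
            "id": rule.get("id"),
            "level": int(rule.get("level")),
            "if_sid": rule.findtext("if_sid"),
            "mitre": [node.text for node in rule.findall("mitre/id")],
        })
    return summary


rules = summarize_rules(RULE_XML)
missing_mitre = [r["id"] for r in rules if not r["mitre"]]
print("Rules without a MITRE ID:", missing_mitre)
```

The same loop could flag rules at level 12 or higher that have no if_sid parent, which are the ones most likely to be expensive or noisy.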
Example 1: Web Shell Detection (The Full Stack)
This is Bayu's flagship detection, a multi-layered web shell detector covering file creation, file modification, file content analysis, command execution, and network connections. It spans a decoder and rules across Linux, Windows, auditd, and Sysmon.
The Decoder
<decoder name="network-traffic-child">
<parent>ossec</parent>
<prematch offset="after_parent">^output: 'webshell connections':</prematch>
<regex offset="after_prematch" type="pcre2">(\d+.\d+.\d+.\d+):(\d+)\|(\d+.\d+.\d+.\d+):(\d+)</regex>
<order>local_ip, local_port, foreign_ip, foreign_port</order>
</decoder>
Line-by-line:
- Name: network-traffic-child, a child decoder for network connection logs. The parent is ossec, the built-in decoder for Wazuh's own ossec-prefixed messages, which is where command-monitoring output ("ossec: output: ...") arrives. Timestamp and hostname were already handled in pre-decoding.
- Pre-match: Matches the literal string output: 'webshell connections': in the text left over after the parent. The caret ^ anchors the match to the start of that remainder, so the string can't match somewhere in the middle of a line.
- Regex: Four capture groups extracting local_ip:local_port|foreign_ip:foreign_port. The pipe is escaped (\|) so it is treated as the literal delimiter in the log format. The dots are not escaped, and under PCRE2 an unescaped . matches any character, not only the dots in an IP address; escape them as \. if you need strict IP validation.
- Order: Maps the four groups to the named fields local_ip, local_port, foreign_ip, and foreign_port, directly usable in rules.
This decoder is designed to parse output from a custom script that runs on web servers and logs when a web process (like w3wp.exe or php-fpm) opens a network connection, which is a strong indicator of a web shell phoning home.
The Rules
Bayu's web shell rules are the most thorough in the repo. Here are three key ones:
<rule id="500500" level="12">
<if_sid>554</if_sid>
<field name="file" type="pcre2">(?i).php$|.phtml$|.php3$|.php4$|.php5$|.phps$|.phar$|.asp$|.aspx$|.jsp$|.cshtml$|.vbhtml$</field>
<description>[File creation]: Possible web shell scripting file ($(file)) created</description>
<mitre>
<id>T1105</id>
<id>T1505</id>
</mitre>
</rule>
How it works: Rule 554 ("File added to the system") fires for every new file. Rule 500500 inherits from 554, then checks if the filename matches a web scripting extension. The (?i) flag makes the match case-insensitive: .PHP, .Php, and .php all match. Level 12 means this goes straight to the SOC queue.
But file creation alone isn't enough. An attacker might modify an existing legitimate PHP file to add a one-liner web shell:
<!-- Detects modification of scripting files -->
<rule id="500501" level="12">
<if_sid>550</if_sid>
<field name="file" type="pcre2">(?i).php$|.phtml$|.asp$|.aspx$|.jsp$</field>
<description>[File modification]: Possible web shell content added in $(file)</description>
<mitre><id>T1105</id><id>T1505</id></mitre>
</rule>
<!-- Escalates if the modification contains web shell functions -->
<rule id="500502" level="15">
<if_sid>500501</if_sid>
<field name="changed_content" type="pcre2">(?i)passthru|exec|eval|shell_exec|assert|str_rot13|system|phpinfo|base64_decode|chmod|mkdir|fopen|fclose|readfile|show_source|proc_open|pcntl_exec|execute|WScript.Shell|WScript.Network|FileSystemObject|Adodb.stream</field>
<description>[File Modification]: File $(file) contains a web shell</description>
<mitre><id>T1105</id><id>T1505.003</id></mitre>
</rule>
Rule 500501 inherits from 550 ("File integrity checksum changed"). Rule 500502 inherits from 500501, giving a three-level chain: 550 → 500501 → 500502. If a scripting file was modified AND the changed content contains eval, passthru, base64_decode, or any of the other 19 indicators in the list, it fires at level 15, wake-someone-up territory. The MITRE mapping narrows from general T1505 to specific T1505.003 (Web Shell).
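The two-step logic (extension check first, content check second) is easy to try out before touching the manager. The sketch below copies the extension pattern from rule 500501 and, to stay short, only a subset of the content indicators from rule 500502; `classify` is a hypothetical helper, and Python's re is used as a stand-in for PCRE2.

```python
import re

# Extension pattern copied from rule 500501.
EXTENSIONS = re.compile(r"(?i).php$|.phtml$|.asp$|.aspx$|.jsp$")
# Subset of the indicator list in rule 500502, shortened for the example.
INDICATORS = re.compile(r"(?i)passthru|exec|eval|shell_exec|base64_decode")


def classify(path, changed_content):
    """Return the level the chain would assign, or None if nothing fires."""
    if not EXTENSIONS.search(path):
        return None        # 500501 does not fire, so 500502 is never evaluated
    if INDICATORS.search(changed_content):
        return 15          # 500502: scripting file with web shell functions
    return 12              # 500501 only: scripting file modified


print(classify("/var/www/html/shell.php", "<?php eval($_POST['c']); ?>"))
print(classify("/var/www/html/index.PHP", "<h1>hello</h1>"))
print(classify("/var/www/html/logo.png", "eval("))
print(classify("/tmp/notphp", ""))
```

The last call is worth noticing: because the dot in .php$ is unescaped, a file literally named notphp also matches. Short indicators such as exec behave similarly and match inside longer words, so expect to tune for false positives.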
But what about web shells that don't drop files? What if the attacker exploits a command injection in an existing application? That's where the Sysmon and auditd rules come in:
<rule id="500530" level="12">
<if_sid>61603</if_sid>
<field name="win.eventdata.parentImage" type="pcre2">(?i)w3wp\.exe</field>
<field name="win.eventdata.parentUser" type="pcre2">(?i)IIS\sAPPPOOL\\\\DefaultAppPool</field>
<description>[Command execution ($(win.eventdata.commandLine))]: Possible web shell attack detected</description>
<mitre><id>T1505.003</id><id>T1059.004</id></mitre>
</rule>
This fires when Wazuh's built-in rule 61603 (Sysmon Event ID 1, process creation) shows a parent process of w3wp.exe (the IIS worker process) running as IIS APPPOOL\DefaultAppPool, the exact conditions of a web server spawning a child process. That's not normal behavior unless you've explicitly configured IIS to execute CGI scripts. Because the alert description includes $(win.eventdata.commandLine), an analyst sees exactly what command the attacker ran.
This layered approach (FIM for file drops, content analysis for injections, Sysmon/auditd for command execution, and network connection monitoring for C2) means a web shell has to evade four independent detection layers to go unnoticed. Most don't.
Example 2: WPScan Detection with Frequency Correlation
WordPress scanning is noisy by nature: a single WPScan run generates dozens of 404s and 403s. A naive rule that fires on every wp-admin hit creates alert fatigue. Bayu's approach uses frequency-based correlation: fire a moderate level-7 alert on each scan signature, then escalate only when the pattern repeats from the same source IP.
<group name="wpscan,web,accesslog,">
<!-- Base rule: fires on any WP-specific URL with a 4xx response -->
<rule id="841101" level="7">
<if_sid>800001,800002,31100</if_sid>
<id>^4</id>
<url>wp-includes|wp-login|wp-admin|wp-|wordpress|xmlrpc.php</url>
<description>WP scanning detected</description>
<group>attack,pci_dss_6.5,pci_dss_11.4,</group>
</rule>
<!-- Escalation: fires when base rule hits 14x from same IP in 30s -->
<rule id="841151" level="10" frequency="14" timeframe="30">
<if_matched_sid>841101</if_matched_sid>
<same_source_ip />
<description>Multiple WP scan detected from same source ip.</description>
<mitre><id>T1595.002</id></mitre>
<group>web_scan,recon,attack,</group>
</rule>
</group>
How the base rule (841101) works:
- <if_sid>800001,800002,31100</if_sid>: The rule inherits from three parent rules. Rules 800001 and 800002 are Bayu's own HAProxy access log rules (covering HTTP and HTTPS). Rule 31100 is Wazuh's built-in web access log rule. This means the rule works whether your web logs come through HAProxy or directly from Apache/Nginx: one rule, multiple log sources.
- <id>^4</id>: Matches HTTP 4xx response codes. WPScan probes paths that mostly return 403 Forbidden or 404 Not Found. A 200 OK on xmlrpc.php is normal; a 404 on wp-content/plugins/revslider/ isn't.
- <url>: Matches URLs containing WordPress-specific keywords. The pipe-separated list is implicitly an OR match. wp- catches all REST API and plugin paths.
How the escalation rule (841151) works:
- frequency="14" timeframe="30": Rule 841101 must fire at least 14 times within 30 seconds.
- <if_matched_sid>841101</if_matched_sid>: Counts occurrences of rule 841101. This is different from <if_sid>, which requires the parent to fire once. if_matched_sid with frequency means "count how many times the parent fired."
- <same_source_ip />: All 14 matches must come from the same IP. Without this, 14 different IPs hitting WordPress paths (normal internet noise) would trigger a false positive.
The escalation fires at level 10: high enough to warrant investigation, not high enough to wake anyone up. It maps to MITRE T1595.002 (Active Scanning: Vulnerability Scanning), giving your SOC the exact technique to start their investigation.
14 matches in 30 seconds works for Bayu's environment. In yours, a single WPScan run might generate 30 requests in 20 seconds, or 5 requests in 60 seconds. Watch your logs during a test scan and set frequency to roughly half the scan volume, enough to catch real scans without false positives from bots casually probing xmlrpc.php.
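To reason about threshold choices offline, a sliding-window counter keyed by source IP is a reasonable approximation of frequency, timeframe, and same_source_ip. The class below is a sketch with hypothetical names; Wazuh's exact counting semantics may differ in edge cases, so treat it as a tuning aid rather than a model of the engine.

```python
from collections import defaultdict, deque


class FrequencyRule:
    """Approximates frequency/timeframe correlation with same_source_ip."""

    def __init__(self, frequency, timeframe):
        self.frequency = frequency
        self.timeframe = timeframe
        self.hits = defaultdict(deque)  # srcip -> timestamps of base-rule hits

    def observe(self, srcip, timestamp):
        """Record one base-rule hit; return True when the escalation fires."""
        window = self.hits[srcip]
        window.append(timestamp)
        # Drop hits that fell out of the time window.
        while window and timestamp - window[0] > self.timeframe:
            window.popleft()
        if len(window) >= self.frequency:
            window.clear()  # start counting again after escalating
            return True
        return False


rule = FrequencyRule(frequency=14, timeframe=30)
fast_scan = [rule.observe("203.0.113.5", second) for second in range(14)]
print("fast scan escalated:", any(fast_scan))

slow_rule = FrequencyRule(frequency=14, timeframe=30)
slow_scan = [slow_rule.observe("203.0.113.5", second * 10) for second in range(14)]
print("slow scan escalated:", any(slow_scan))
```

Replaying timestamps from a real test scan through a counter like this shows quickly whether a given frequency and timeframe pair would have fired.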
Example 3: HAProxy Docker JSON Decoder
HAProxy inside Docker outputs JSON-formatted logs. Wazuh's built-in HAProxy decoder expects plain-text syslog format and fails silently on JSON. Bayu wrote a decoder that handles both formats:
<decoder name="haproxy-docker">
<parent>json</parent>
<use_own_name>true</use_own_name>
<prematch offset="after_parent">^log":"(\d+.\d+.\d+.\d+):(\d+) \[(\S+)\] (\S+) (\S+)/(\S+)</prematch>
<regex offset="after_parent" type="pcre2">^log":"(\d+.\d+.\d+.\d+):(\d+) \[(\S+)\] (\S+) (\S+)/(\S+) (\d+/\d+/\d+/\d+/\d+) (\S+) (\S+) - - (\S+) (\d+/\d+/\d+/\d+/\d+) (\d+)/(\d+) \{(.*)\} \\"(\w+ \S+)</regex>
<order>srcip, srcport, accept_date, frontend_name, backend_name, server_name, timer, id, response_length, termination_state, connections, server_queue, backend_queue, headers, url</order>
</decoder>
Why this decoder exists: In Docker, HAProxy logs go to stdout and get wrapped in JSON by the logging driver. The raw log looks something like:
{"log":"192.168.1.10:54321 [15/Jun/2026:14:22:10.123] frontend_http backend_servers/server01 0/0/1/45/46 200 1234 - - ---- 1/1/0/0/0 0/0 { } \"GET /api/health HTTP/1.1\"","stream":"stdout","time":"2026-06-15T14:22:10Z"}
The json parent decoder extracts the top-level JSON structure. Then haproxy-docker inherits from it and searches for the log field, which contains the actual HAProxy log line. The use_own_name flag is critical: without it, rules that reference <decoded_as>haproxy-docker</decoded_as> would fail because Wazuh would use the parent's name (json) instead.
The regex extracts 15 fields in a single pass, from source IP and port through to the HTTP method and URL. This is only feasible because the parent json decoder already handled the JSON parsing, so the child only deals with the raw HAProxy portion.
If you're running any application in Docker with JSON logging, your logs arrive double-encoded. The outer layer is JSON (container runtime), and the inner layer is whatever format the application produces (syslog, CEF, custom). A common mistake is writing a decoder that tries to handle both layers in one regex. Instead, always use json as the parent decoder, then write a child that handles just the inner format. This is exactly what Bayu did.
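The two-layer approach can be checked outside Wazuh: parse the JSON envelope first, then apply the HAProxy pattern to the inner log value only. The sketch below rebuilds the sample line from this section with json.dumps and uses a Python re version of the inner part of Bayu's regex (with the dots escaped); `decode_docker_haproxy` is a hypothetical helper, not part of the repository.

```python
import json
import re

INNER = re.compile(
    r'^(\d+\.\d+\.\d+\.\d+):(\d+) \[(\S+)\] (\S+) (\S+)/(\S+) '
    r'(\d+/\d+/\d+/\d+/\d+) (\S+) (\S+) - - (\S+) '
    r'(\d+/\d+/\d+/\d+/\d+) (\d+)/(\d+) \{(.*)\} "(\w+ \S+)'
)
ORDER = ["srcip", "srcport", "accept_date", "frontend_name", "backend_name",
         "server_name", "timer", "id", "response_length", "termination_state",
         "connections", "server_queue", "backend_queue", "headers", "url"]


def decode_docker_haproxy(raw_line):
    """Layer 1: JSON envelope. Layer 2: HAProxy format inside the log field."""
    envelope = json.loads(raw_line)
    match = INNER.search(envelope.get("log", ""))
    if match is None:
        return None
    return dict(zip(ORDER, match.groups()))


# Rebuild the sample from this section so the escaping is done by json.dumps.
inner = ('192.168.1.10:54321 [15/Jun/2026:14:22:10.123] frontend_http '
         'backend_servers/server01 0/0/1/45/46 200 1234 - - ---- 1/1/0/0/0 0/0 '
         '{ } "GET /api/health HTTP/1.1"')
raw = json.dumps({"log": inner, "stream": "stdout", "time": "2026-06-15T14:22:10Z"})

fields = decode_docker_haproxy(raw)
print(fields["srcip"], fields["id"], fields["url"])
```

Because json.loads has already removed the envelope's escaping, the inner pattern matches a plain quote, whereas the Wazuh decoder has to match the escaped \" that is still present in the raw text.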
Testing with wazuh-logtest
Never restart the Wazuh manager to test a new rule. A syntax error kills the analysis engine, and all agents go silent until you fix it. Use wazuh-logtest instead: it loads your rules and decoders in a separate test session without affecting production alerting.
# Start an interactive logtest session (no separate daemon is needed;
# the tool talks to the running manager's analysis engine)
sudo /var/ossec/bin/wazuh-logtest
Once connected, paste a raw log line and press Enter. The tool outputs the decoded fields, matched rules, and final alert, all without touching the production engine. The output below is abbreviated for illustration; the real tool prints more detail.
** Pasting a FIM event where shell.php was added to /var/www/html
** Phase 1: Completed pre-decoding
** Phase 2: Completed decoding
name: 'syscheck'
file: '/var/www/html/shell.php'
** Phase 3: Completed filtering (rules)
Rule id: '554' fired → Level 5
Rule id: '500500' fired → Level 12
Description: [File creation]: Possible web shell scripting file
(/var/www/html/shell.php) created
MITRE: T1105, T1505
** Alert to be generated
What to look for in logtest output:
- Phase 1 completed: Pre-decoding worked. Timestamp, hostname, and program name were extracted.
- Phase 2 completed: Your decoder matched and extracted fields. If Phase 2 shows no decoder match, your <program_name> or <prematch> regex is wrong.
- Phase 3 completed: Your rule(s) fired. Check that the fired rule ID matches what you expected. If a parent rule fired but your child didn't, the child's <field> or <match> condition isn't satisfied.
Testing frequency-based rules: Paste the same log line repeatedly, incrementing the count. After reaching the frequency threshold within the timeframe, the escalation rule should fire. If it doesn't, check that you're not exceeding the timeframe between pastes: logtest respects real time, not simulated time.
logtest only sees rules and decoders that the manager is configured to load, which by default means the standard directories (/var/ossec/etc/rules/, /var/ossec/etc/decoders/). Files sitting in some other test directory won't be found. Copy test rules into the standard directories, run logtest, and fix or remove anything that fails before the next manager restart. Leaving a broken file in place is not safe: a malformed rule file stops the analysis engine from loading.
Common Mistakes and How to Fix Them
These are the issues I've hit repeatedly, along with the exact fix for each.
1. XML Syntax Errors: The Silent Killer
Symptom: Manager restarts, appears healthy, but analysisd doesn't load. No alerts appear. /var/ossec/logs/ossec.log contains "ERROR: Could not load rules."
Root cause: Unescaped XML characters, a missing closing tag, or a duplicate rule ID. & must become &amp;. < inside a regex must become &lt;.
Fix: Validate XML before deployment:
sudo apt install libxml2-utils -y
xmllint --noout /var/ossec/etc/rules/your-rule-file.xml
If xmllint complains, fix the error before deploying. A single unescaped ampersand in a regex like &search= will break the entire ruleset.
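If xmllint isn't available, Python's standard library can do a similar well-formedness check. The sketch below is an assumption-laden alternative, not part of Bayu's repo: `check_rule_xml` is a hypothetical helper, and it wraps the text in a synthetic root element so that content with several top-level <group> blocks is still accepted.

```python
import xml.etree.ElementTree as ET


def check_rule_xml(xml_text):
    """Return None if the text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(f"<root>{xml_text}</root>")  # synthetic wrapper element
    except ET.ParseError as err:
        return str(err)
    return None


broken = '<group name="web,"><rule id="100010" level="5"><match>&search=</match></rule></group>'
fixed = broken.replace("&search=", "&amp;search=")

print("broken:", check_rule_xml(broken))
print("fixed: ", check_rule_xml(fixed))
```

To check a real file, read it with pathlib.Path(...).read_text() and pass the contents in. A clean result only proves the XML parses; it says nothing about whether Wazuh accepts the rule options inside it.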
2. Rule ID Collisions
Symptom: "ERROR: Rule '500500' has duplicate id."
Root cause: You copied Bayu's rules but also have your own rules that start at 500000.
Fix: Choose a unique ID range. Use 600000+ for your own custom rules, or rename Bayu's rules to a range that doesn't conflict. If you rename, update every <if_sid> and <if_matched_sid> that references the old IDs.
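Collisions are easier to catch before a restart than after. The following sketch scans rule text for id attributes and reports any ID that appears more than once, along with the files involved. `find_duplicate_ids` and the sample file names are hypothetical; the regex is deliberately naive, so it will also count rules inside XML comments and rules that intentionally reuse an ID with overwrite="yes".

```python
import re
from collections import defaultdict

RULE_ID = re.compile(r'<rule\s+id="(\d+)"')


def find_duplicate_ids(files):
    """files maps a file name to its XML text; returns {rule_id: [file names]}."""
    owners = defaultdict(list)
    for name, text in files.items():
        for rule_id in RULE_ID.findall(text):
            owners[rule_id].append(name)
    return {rule_id: names for rule_id, names in owners.items() if len(names) > 1}


# Made-up contents standing in for two files in /var/ossec/etc/rules/.
sample_files = {
    "500500-webshell-rules.xml": '<rule id="500500" level="12"></rule>'
                                 '<rule id="500501" level="12"></rule>',
    "local_rules.xml": '<rule id="500500" level="5"></rule>'
                       '<rule id="600001" level="3"></rule>',
}

print(find_duplicate_ids(sample_files))
```

On a manager, the mapping could be built from pathlib.Path("/var/ossec/etc/rules").glob("*.xml") with read_text() on each path.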
3. Decoder Never Fires
Symptom: logtest shows "Phase 2 completed" but your decoder's fields aren't extracted. Rules with <decoded_as>my_decoder</decoded_as> never fire.
Root cause: The <program_name> doesn't match, or the <prematch> regex doesn't match the log format, or the child decoder references a parent that doesn't exist.
Fix: In logtest, look at what decoder did match in Phase 2. If the log was decoded as syslog instead of haproxy, your program_name or prematch is wrong. Check the raw log carefully, is HAProxy running as a different process name? Is the log format slightly different from what the regex expects?
4. PCRE2 Regex Doesn't Match
Symptom: A <field type="pcre2"> never matches, even though the field clearly contains the expected value.
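Root cause: Backslashes are the usual culprit. An unescaped dot . matches any character, so it tends to over-match rather than miss, but backslashes have to be counted carefully. XML does no backslash processing, so the pattern reaches PCRE2 exactly as written: \\ matches one literal backslash and \\\\ matches two. Windows fields such as user names may carry doubled backslashes by the time rules see them, which is presumably why Bayu's rule uses \\\\. Check the field's actual value in logtest or in the archived event before deciding how many you need.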
Fix: Test your regex separately before putting it in a rule. A quick Python one-liner:
python3 -c "import re; print(bool(re.search(r'(?i)w3wp\\.exe', 'C:\\\\Windows\\\\System32\\\\inetsrv\\\\w3wp.exe')))"
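If this prints False, your regex is broken. Tweak it until it prints True, then paste the corrected pattern into your rule XML. Python's re module behaves like PCRE2 for simple patterns such as this one, but the two are not identical, so confirm the final rule in logtest.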
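Counting backslashes is the part most worth testing in isolation. In the sketch below, the field values are assumptions (one with a single backslash, one with a doubled backslash) and Python's re stands in for PCRE2; raw strings are used for the patterns so that what you see is what the regex engine receives.

```python
import re

# Assumed field values: the same account name with one and with two backslashes.
one_backslash = "IIS APPPOOL\\DefaultAppPool"      # IIS APPPOOL\DefaultAppPool
two_backslashes = "IIS APPPOOL\\\\DefaultAppPool"  # IIS APPPOOL\\DefaultAppPool

# In the pattern, every literal backslash needs two characters.
pattern_one = re.compile(r"(?i)IIS\sAPPPOOL\\DefaultAppPool")
pattern_two = re.compile(r"(?i)IIS\sAPPPOOL\\\\DefaultAppPool")

for label, value in [("one", one_backslash), ("two", two_backslashes)]:
    print(label,
          "| \\\\ pattern:", bool(pattern_one.search(value)),
          "| \\\\\\\\ pattern:", bool(pattern_two.search(value)))
```

Each pattern matches only the value with the corresponding number of backslashes, so a rule written with the wrong count fails silently.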
5. Frequency Rules Never Escalate
Symptom: The base rule fires repeatedly, but the escalation rule never triggers.
Root cause: (a) The timeframe is too short for your log volume. (b) You used <if_sid> instead of <if_matched_sid>. (c) Missing <same_source_ip /> when logs come from different IPs.
Fix: (a) Increase timeframe incrementally and test. (b) For correlation rules, always use <if_matched_sid> with frequency; <if_sid> is for simple parent-child chains. (c) Add <same_source_ip /> if you're correlating by attacker IP.
Beyond Rules: The Full Repo
Bayu's repository goes deeper than just decoders and rules. Here's what else you should explore:
- ESET Integration: A Python daemon that collects ESET antivirus logs and forwards them to Wazuh, paired with 400 KB of XML detection rules, one of the largest single rule files in any open-source Wazuh repo.
- Active Response Scripts: quarantine-webshell.sh and remove-malware.py can automatically quarantine or remove the offending file when a web shell or malware rule fires. Wire these to rules 500500–500502 for automated web shell containment.
- Integration Scripts: Custom connectors for MISP (threat intelligence), TheHive (case management), DFIR-IRIS (incident response), and Telegram (alerting). These offer more flexible Python alternatives to Wazuh's built-in integrations.
- Browser Monitoring: A Python script that collects browser history from Chrome, Firefox, and Edge, useful for detecting phishing victims who visited known-malicious URLs.
- Sysmon Config: A 300 KB Sysmon configuration with granular event logging. Deploy it alongside the 255000-sysmon rules for deep Windows process monitoring.
Further Learning
Bayu Sangkaya's materi_wazuh repository is a full Wazuh training curriculum covering architecture, agent deployment, rule and decoder writing, integration development, and SOAR pipeline construction. If you're building a SOC on Wazuh, whether in Indonesia or anywhere else, start there.
For deep dives into decoder and rule syntax, Wazuh's official documentation is the reference: Custom Rules and Decoders and the Ruleset XML Syntax reference. The official docs are thorough but dry; pair them with Bayu's practical examples for the fastest learning curve.
If you find Bayu's work valuable, consider supporting him on Ko-fi or Trakteer. Open-source security maintainers rarely get compensated for the detection logic that protects thousands of organizations, so a coffee goes a long way.
The best way to learn custom rules is to write them. Pick a log source you currently monitor manually, a login failure pattern, a port scan, an unusual DNS query, and write a decoder for it, then a rule, then test it with logtest. Start with level 5, tune it until it catches what you want without false positives, then escalate to level 10. The gap between "I read about custom rules" and "I wrote a rule that caught a real attack" is about three hours of focused work. Close it.
References
Bayu Sangkaya, Wazuh Custom Rules and Decoders. https://github.com/bayusky/wazuh-custom-rules-and-decoders
Bayu Sangkaya, Materi Wazuh (Training). https://github.com/bayusky/materi_wazuh
Bayu Sangkaya, LinkedIn. https://www.linkedin.com/in/bayu-sangkaya/
Wazuh Documentation, Rules & Decoders. https://documentation.wazuh.com/current/user-manual/ruleset/
Wazuh, Ruleset XML Syntax. https://documentation.wazuh.com/current/user-manual/ruleset/ruleset-xml-syntax/
Wazuh, Custom Rules & Decoders. https://documentation.wazuh.com/current/user-manual/ruleset/custom.html