Wazuh: Digest Any Source!
How I Built a Custom Wazuh Log Ingest Pipeline (And Ditched the Wodle)
If you’ve ever tried to push custom logs into Wazuh, you’ve probably stumbled across something called a Wodle. Wazuh uses these built-in modules to collect and parse data, and they’re especially useful for integrations like AWS.
So… Wodle for AWS?
Sure, Wodle can collect AWS logs. But when I tried using it for my AWS environment, things didn’t exactly go as planned. Here’s what went wrong:
- My GitHub logs were in gzip format—Wodle didn’t like that.
- Paths were hardcoded—super annoying.
- Everything felt like the sun and the stars had to align for it to work.
It just wasn’t cutting it for me.
Flipping the Script: API + Socket = Freedom
After trying every possible way to feed data into Wazuh, I decided to flip the entire idea:
Why build a separate integration for every log source?
Instead, why not push everything to Wazuh through one generic method?
So, I built a tiny Python API (literally ~20 lines of code) that accepts JSON logs and dumps them directly into the Wazuh socket.
import json
import socket

from flask import Flask, request

app = Flask(__name__)

# Datagram socket pointed at the local Wazuh queue
WAZUH_QUEUE = "/var/ossec/queue/ossec/queue"
sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)

@app.route("/ingest", methods=["POST"])
def ingest():
    log = json.dumps(request.json)
    # analysisd expects "<queue>:<location>:<payload>"; queue 1 means
    # syslog-like events, and the location tag is an arbitrary label
    sock.sendto(f"1:ingest-api:{log}".encode(), WAZUH_QUEUE)
    return "ok", 200
Yes, the endpoint is wide open with no auth. Is this a security risk? Maybe. But it works.
Why This Rocks: Any Source, Any Tool
With this API in place, I can now use any log processing tool I love—like:
- Vector
- Fluentd
- Filebeat
These tools can read data from S3, SQS, or local files, format it however I want, and push it straight to my Wazuh agents through this simple API.
Now I can ingest logs from:
- GitHub
- AWS (CloudTrail, GuardDuty, etc.)
- Custom SSO Systems
- Security Scanners
- CI/CD Pipelines
Logs land in S3, S3 event notifications go to SQS, and the pipeline reads the objects from there. Everything gets parsed, enriched, and fed into Wazuh agents.
Custom Sources & CI/CD Integration
It’s an API, so you can use curl or any HTTP client to POST data. That means you can:
- Pipe alerts from your CI/CD pipeline
- Integrate custom alert systems
- Push logs from custom tools
It’s simple, flexible, and works with whatever you’ve got.
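For instance, a CI job that wants to report a failed security scan only needs a few lines of Python. This is just a sketch: the host and port come from my Vector sink below, the /ingest route comes from the Flask app above, and the event fields are made up for illustration.

import requests

# A hypothetical CI/CD alert; shape it however your rules expect
event = {
    "integration": "cicd",
    "cicd": {
        "pipeline": "deploy-prod",
        "stage": "security-scan",
        "result": "failed",
    },
}

# POST it to the ingest API running next to a Wazuh agent
requests.post("http://wazuh-agent.integsrv:9001/ingest", json=event, timeout=5)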
Vector Example: SQS to Wazuh
I use Vector.dev to read logs from SQS and send them to Wazuh.
Here’s a simplified snippet from my Vector config. The full config lives in my wazuh-agent repo, linked at the end.
sources:
  aws_sqs:
    type: aws_sqs
    region: eu-west-1
    queue_url: https://sqs.eu-west-1.amazonaws.com/...

transforms:
  s3sso_parser:
    type: remap
    inputs:
      - s3sso_s3
    source: |-
      .aws = parse_json!(.message)
      .parsed_type = "json"
      .integration = "aws"
      .aws.source = "sso"
      del(.message)

  s3access_parser:
    type: remap
    inputs:
      - s3access_s3
    source: |-
      .integration = "aws"
      .structured, err = parse_json(.message) ?? parse_common_log(.message) ?? parse_apache_log(.message, "common") ?? parse_aws_alb_log(.message) ?? parse_regex(.message, r'^(?P<env>\S+) (?P<tls>\S+) (?P<time_action>\S+) (?P<url>\S+) (?P<account>\S+) (?P<source>\S+) (?P<destip>\S+) (?P<size>\S+) (?P<time>\S+) (?P<response>\S+) (?P<value1>\S+) (?P<value2>\S+) (?P<value3>\S+) (?P<value4>\S+) (?P<value5>\S+) (?P<value6>\S+) (?P<value12>\S+) (?P<value7>\S+) (?P<value8>\S+) (?P<value9>\S+) (?P<value10>\S+) (?P<value11>)') ?? parse_regex(.message,r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<timestamp>[^\]]+)\] (?P<ip_address>\S+) (?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) (?P<request_uri>\S+) (?P<status>\S+) (?P<error_code>\S+) (?P<response_bytes>\S+) (?P<object_size>\S+) (?P<request_ms>\S+) (?P<processing_ms>\S+) (?P<referrer>\S+) (?P<user_agent>[^"]+) (?P<version_id>\S+)')
      if err != null {
        .parsed_type = "error parsing s3_access"
      }
      .aws = .structured
      del(.structured)

sinks:
  wazuh-ingest:
    inputs:
      - dest_ready
    type: http
    uri: http://wazuh-agent.integsrv:9001/batch
    method: put
    encoding:
      codec: json
    batch:
      max_events: 70
      timeout_secs: 2
    request:
      rate_limit_duration_secs: 10
      rate_limit_num: 10
      retry_initial_backoff_secs: 5
It’s clean and lets me process logs before they ever hit Wazuh.
Custom Values & Enrichment
Once everything lands in one place, you can do smart things—like merge user data across sources (e.g., GitHub + AWS).
This makes:
- Alerting more precise
- Indexing faster
- Searches more unified
You can define custom objects and values in your logs. Super helpful for SIEM correlation.
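I do this kind of merging in Vector, but to make the idea concrete, here’s a rough Python sketch. The lookup table and output field are hypothetical; CloudTrail really does carry userIdentity.arn and GitHub audit events carry actor, but how you map them to one user is up to you.

# Hypothetical mapping from per-source identities to one canonical user
USER_MAP = {
    "jdoe": "john.doe@example.com",                                 # GitHub actor
    "arn:aws:iam::123456789012:user/jdoe": "john.doe@example.com",  # AWS ARN
}

def enrich(event: dict) -> dict:
    # Pull whichever identity field this source provides
    actor = (
        event.get("github", {}).get("actor")
        or event.get("aws", {}).get("userIdentity", {}).get("arn")
    )
    # Attach a canonical user object so correlation keys match across sources
    if actor in USER_MAP:
        event["user"] = {"email": USER_MAP[actor]}
    return event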
Wazuh Rules: Triggering the Right Logic
To activate Wazuh’s built-in rules, you just need to format your JSON correctly. Example:
{
  "integration": "aws",
  "aws": {
    "source": "cloudtrail",
    "eventName": "ConsoleLogin",
    ...
  }
}
Wazuh will pick up that it’s AWS CloudTrail data and apply its native detection logic.
GitHub and other integrations follow the same idea—just use the right "integration" key and structure.
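For example, a GitHub audit-log event might look like this. I’m assuming the same shape Wazuh’s own GitHub module emits, with the audit payload nested under a github key; the field values are made up.

{
  "integration": "github",
  "github": {
    "action": "org.update_member",
    "actor": "jdoe",
    "org": "my-org"
  }
}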
Need help writing rules? ChatGPT plus the Wazuh rules repo on GitHub is your best friend. Just paste in sample logs and go.
Final Thoughts
I’m no longer waiting for Wodle updates or hoping the integration will work out of the box. My new setup means:
- Logs from any source can go into Wazuh
- I can customize, enrich, and standardize logs
- I’ve built a flexible SIEM pipeline that works on my terms
All the code, config, and examples are available on my GitHub: https://github.com/samma-io/wazuh-agent