Rules
YAML-based normalization rules provide an intuitive way to define pattern matching and transformation logic. Rules leverage syslog-ng's high-performance patterndb engine while offering a more organized and readable format than XML, plus multi-line sequence capabilities.
What It Does
Rules enable pattern-based log normalization with a user-friendly interface:
- Intuitive YAML syntax: Define patterns in readable, structured format
- High-performance matching: Powered by syslog-ng's proven patterndb engine
- Field extraction: Capture variable data from log lines
- Output formatting: Transform matched lines to consistent format
- Multi-line sequences: Group related lines together (unique to patterndb-yaml)
- Use case: Normalize heterogeneous logs for analysis, diff comparison, or monitoring
Key insight: Write rules once in YAML, get syslog-ng's performance plus enhanced capabilities.
Pattern Matching Engine
Under the hood, patterndb-yaml uses syslog-ng's patterndb as its matching engine. When you provide YAML rules, patterndb-yaml:
- Converts your YAML to syslog-ng's XML pattern database format
- Loads the patterns into syslog-ng's high-performance matching engine
- Applies patterns to normalize your logs
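To make the conversion step concrete, here is a minimal sketch of how a list of text/field components could map onto patterndb's @ESTRING@ and @ANYSTRING@ parsers. The function name to_patterndb and this particular mapping are assumptions for illustration; the XML that patterndb-yaml actually generates may differ.

```python
# Hypothetical sketch: map YAML-style pattern components onto a syslog-ng
# patterndb pattern string. Not patterndb-yaml's actual converter.


def to_patterndb(components: list[dict]) -> str:
    parts = []
    i = 0
    while i < len(components):
        comp = components[i]
        if "text" in comp:
            # '@' introduces a parser in patterndb, so a literal '@' is doubled.
            parts.append(comp["text"].replace("@", "@@"))
            i += 1
            continue
        name = comp["field"]
        nxt = components[i + 1] if i + 1 < len(components) else None
        if nxt is None:
            # A trailing field captures everything up to the end of the message.
            parts.append(f"@ANYSTRING:{name}@")
            i += 1
        elif "text" in nxt:
            # ESTRING captures up to its stop string and consumes it, so the
            # following text component becomes the parser's delimiter.
            # (Delimiters containing '@' are not handled in this sketch.)
            parts.append(f"@ESTRING:{name}:{nxt['text']}@")
            i += 2
        else:
            raise ValueError("adjacent fields need a text delimiter between them")
    return "".join(parts)
```

For the successful_login rule used later on this page, this sketch produces User @ESTRING:username: logged in from @@ANYSTRING:ip_address@.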
Benefits:
- High-performance matching: Leverages syslog-ng's optimized engine
- Proven reliability: patterndb is a mature part of syslog-ng and is widely used in production
- Enhanced capabilities: Multi-line sequences and simplified YAML syntax
Writing Normalization Rules
Basic Rule Structure
A normalization rule has three required components:
- name: Unique identifier for the rule
- pattern: Match criteria (text literals and field captures)
- output: Normalized output format template
rules:
  - name: rule_identifier          # 1. Unique rule name
    pattern:                       # 2. Match criteria
      - text: "fixed string"       # Literal text to match
      - field: variable_data       # Variable data to capture
    output: "[tag:{variable_data}]"  # 3. Normalized output format
Example: Simple Text Matching
Match log lines by fixed text patterns:
Input: Application logs with severity levels
[INFO] Application started
[ERROR] Connection timeout
[WARN] Deprecated API used
[DEBUG] Cache hit for key: user_123
Log lines with different severity levels in square brackets.
Rules: Match severity levels
rules:
  - name: log_info
    pattern:
      - text: "[INFO] "
      - field: message
    output: "[info:{message}]"
  - name: log_error
    pattern:
      - text: "[ERROR] "
      - field: message
    output: "[error:{message}]"
  - name: log_warn
    pattern:
      - text: "[WARN] "
      - field: message
    output: "[warn:{message}]"
  - name: log_debug
    pattern:
      - text: "[DEBUG] "
      - field: message
    output: "[debug:{message}]"
Four rules matching INFO, ERROR, WARN, and DEBUG severity levels.
Output: Normalized severity levels
[info:Application started]
[error:Connection timeout]
[warn:Deprecated API used]
[debug:Cache hit for key: user_123]
Each log line is normalized to a consistent format with severity tag.
How it works:
- Each rule's pattern is tested against the input line
- When a pattern matches, the rule's output format is used
- The {message} placeholder is replaced with the captured field value
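The three steps above can be sketched as a small loop. This is a simplified illustration limited to [text, field] patterns like the severity rules; the rule dicts mirror the YAML, and normalize_line is a hypothetical name, not part of patterndb-yaml.

```python
# Minimal sketch of the matching loop, limited to [text, field] patterns like
# the severity rules above. Rule dicts mirror the YAML structure.

RULES = [
    {
        "name": f"log_{level.lower()}",
        "pattern": [{"text": f"[{level}] "}, {"field": "message"}],
        "output": f"[{level.lower()}:{{message}}]",
    }
    for level in ("INFO", "ERROR", "WARN", "DEBUG")
]


def normalize_line(line: str, rules: list[dict]) -> str:
    for rule in rules:
        prefix = rule["pattern"][0]["text"]
        field = rule["pattern"][1]["field"]
        if line.startswith(prefix):
            # Substitute the captured value into the {field} placeholder.
            value = line[len(prefix):]
            return rule["output"].replace("{" + field + "}", value)
    return line  # no rule matched: pass the line through unchanged
```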
Example: Field Extraction
Extract multiple fields from structured log lines:
Input: Login events
User alice logged in from 192.168.1.100
User bob logged in from 10.0.0.50
Failed login attempt for charlie from 192.168.1.200
User admin logged in from 127.0.0.1
User login events with usernames and IP addresses.
Rules: Extract username and IP
rules:
  - name: successful_login
    pattern:
      - text: "User "
      - field: username
      - text: " logged in from "
      - field: ip_address
    output: "[login-success:user={username},ip={ip_address}]"
  - name: failed_login
    pattern:
      - text: "Failed login attempt for "
      - field: username
      - text: " from "
      - field: ip_address
    output: "[login-failed:user={username},ip={ip_address}]"
Rules that extract username and ip_address fields from login events.
Output: Normalized with extracted fields
[login-success:user=alice,ip=192.168.1.100]
[login-success:user=bob,ip=10.0.0.50]
[login-failed:user=charlie,ip=192.168.1.200]
[login-success:user=admin,ip=127.0.0.1]
Username and IP address extracted and formatted consistently.
How field extraction works:
- Pattern components are matched in order
- text components match literal strings
- field components capture variable data between text delimiters
- Captured fields are available in the output format string
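A sketch of this left-to-right walk, under the assumption that every field except the last is followed by a text component acting as its delimiter. Each field stops at the first occurrence of its delimiter. The helper match_components is hypothetical and only illustrates the behavior described above.

```python
from typing import Optional

# Hypothetical helper: match pattern components left to right and return the
# captured fields, or None if the line does not match. Assumes every field
# that is not last is followed by a text component (its delimiter).


def match_components(components: list[dict], line: str) -> Optional[dict]:
    fields = {}
    pos = 0
    for i, comp in enumerate(components):
        if "text" in comp:
            # Literal text must appear exactly at the current position.
            if not line.startswith(comp["text"], pos):
                return None
            pos += len(comp["text"])
            continue
        nxt = components[i + 1] if i + 1 < len(components) else None
        if nxt is None:
            end = len(line)  # last field captures to end of line
        else:
            end = line.find(nxt["text"], pos)  # first occurrence of delimiter
            if end == -1:
                return None
        fields[comp["field"]] = line[pos:end]
        pos = end
    # Require the whole line to be consumed.
    return fields if pos == len(line) else None
```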
Pattern Components
Text Matching
A text component matches literal text exactly.
Characteristics:
- Case-sensitive matching
- ANSI escape codes are automatically stripped before matching
- Whitespace matters (must match exactly)
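These characteristics can be sketched as follows: strip escape sequences first, then compare case-sensitively with whitespace intact. The regex covers only CSI sequences such as \x1b[31m. That is an assumption, and the tool's own stripping may handle more escape forms.

```python
import re

# Assumption: strip CSI sequences such as "\x1b[31m"; the tool's own
# ANSI regex may cover additional escape forms.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(line: str) -> str:
    return ANSI_CSI.sub("", line)


def text_matches(line: str, literal: str, pos: int = 0) -> bool:
    # Case-sensitive, whitespace-exact comparison on the ANSI-stripped line.
    return strip_ansi(line).startswith(literal, pos)
```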
Field Extraction
Capture variable data:
pattern:
- field: username # Captures until next delimiter
- text: " from " # Delimiter
- field: ip_address # Captures until end of line
Field behavior:
- Fields capture text between delimiters
- Last field in pattern captures until end of line
- Field names must be valid YAML identifiers
Numbered Fields
Extract numeric values:
Number parser:
- Matches one or more digits (0-9)
- Useful for ports, IDs, counts, etc.
- Fails to match if non-digit characters are encountered
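A sketch of a digits-only number parser that follows the description above. The helper names are hypothetical, and the YAML key for declaring a number field is not shown here. Note that the underlying syslog-ng NUMBER parser may accept more forms than plain digits.

```python
import re
from typing import Optional

# Sketch of a digits-only number parser, following the description above
# (one or more 0-9 characters). Helper names are hypothetical.
DIGITS = re.compile(r"[0-9]+")


def parse_number(line: str, pos: int) -> Optional[tuple[str, int]]:
    """Return (digits, position after them), or None if no digit at pos."""
    m = DIGITS.match(line, pos)
    return (m.group(), m.end()) if m else None


def match_prefix_number(prefix: str, line: str) -> Optional[str]:
    """Match a literal prefix followed by a number that ends the line."""
    if not line.startswith(prefix):
        return None
    parsed = parse_number(line, len(prefix))
    if parsed is None or parsed[1] != len(line):
        return None  # no digits, or a non-digit character follows them
    return parsed[0]
```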
Output Formatting
The output field defines the normalized format:
Placeholders:
- {fieldname}: Replaced with the extracted field value
- Literal text: Appears as-is in output
- Format is completely customizable
Common patterns:
# Tagged format
output: "[tag:data={data}]"
# Key-value format
output: "event=login user={user} ip={ip}"
# JSON-like format
output: '{"event":"login","user":"{user}"}'
# Simplified format
output: "{user}@{host}"
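The JSON-like template shows why substitution should only touch {name} tokens. A naive str.format call would fail on the literal braces. This sketch uses a regular expression that matches only brace-wrapped word characters. Leaving unknown placeholders untouched is an assumption; the tool may report an error instead.

```python
import re

# Replace only {name} placeholders made of word characters; other braces
# (for example in the JSON-like template) stay as literal text.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render(template: str, fields: dict[str, str]) -> str:
    # Unknown placeholders are left as-is (an assumption for this sketch).
    return PLACEHOLDER.sub(lambda m: fields.get(m.group(1), m.group(0)), template)
```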
Rule Matching Behavior
Match Order
Rules are tested in the order they appear in the YAML file:
rules:
  - name: specific_rule      # Tested first
    pattern:
      - text: "WARN: deprecated"
    output: "[deprecated-warning]"
  - name: general_rule       # Tested second
    pattern:
      - text: "WARN: "
    output: "[warning]"
Best practice: Put more specific rules before general rules.
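The following sketch of first-match-wins selection shows why order matters. For brevity it treats each text pattern as a line prefix. That is an assumption, and the engine's exact matching rules may differ.

```python
from typing import Optional

# Sketch of first-match-wins rule selection. Each rule is (name, text), and
# the text is treated as a line prefix for brevity (an assumption).


def first_match(line: str, rules: list[tuple[str, str]]) -> Optional[str]:
    for name, prefix in rules:
        if line.startswith(prefix):
            return name  # later rules are never consulted
    return None


SPECIFIC_FIRST = [("specific_rule", "WARN: deprecated"), ("general_rule", "WARN: ")]
GENERAL_FIRST = list(reversed(SPECIFIC_FIRST))
```

With GENERAL_FIRST, specific_rule can never match, because general_rule claims every line it would accept first.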
Unmatched Lines
Lines that don't match any rule are passed through unchanged:
# Input
Matched log line
Random unstructured text
Another matched line
# Output (with one rule matching "Matched")
[matched]
Random unstructured text ← Passed through
[matched]
Match Statistics
Use statistics to see match effectiveness:
Normalization Statistics
┌─────────────────┬────────┐
│ Lines Processed │ 1,000 │
│ Lines Matched │ 847 │
│ Match Rate │ 84.7% │
└─────────────────┴────────┘
Low match rates suggest missing rules for common patterns.
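The match rate in the table is matched lines divided by processed lines (847 / 1,000 = 84.7%). Here is a sketch of that computation. The function name match_stats is hypothetical.

```python
from typing import Iterable


def match_stats(matched_flags: Iterable[bool]) -> tuple[int, int, float]:
    """Count processed and matched lines and compute the match rate (%)."""
    processed = matched = 0
    for hit in matched_flags:
        processed += 1
        matched += int(hit)
    rate = 100.0 * matched / processed if processed else 0.0
    return processed, matched, rate
```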
Advanced Features
Alternatives
Match any of several options:
pattern:
  - text: "Status: "
  - alternatives:
      - - text: "OK"
      - - text: "SUCCESS"
      - - text: "PASSED"
  - field: details
Matches "Status: OK", "Status: SUCCESS", or "Status: PASSED".
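In the YAML, each alternative is itself a list of components, which is why the entries use the nested "- -" form. The sketch below handles only text-only branches and tries them in listed order. The ordering and the helper name match_alternatives are assumptions.

```python
from typing import Optional

# Sketch of alternatives matching: each branch is a list of text components
# (hence the "- -" nesting in the YAML); branches are tried in order.


def match_alternatives(line: str, pos: int, alternatives: list[list[dict]]) -> Optional[int]:
    """Return the position after the first matching branch, or None."""
    for branch in alternatives:
        end = pos
        for comp in branch:
            if not line.startswith(comp["text"], end):
                break  # this branch fails; try the next one
            end += len(comp["text"])
        else:
            return end  # every component in the branch matched
    return None
```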
Transformations
Transform field values before output:
rules:
  - name: clean_ansi
    pattern:
      - text: "Output: "
      - field: message
    output: "[clean:{message}]"
    transformations:
      message: strip_ansi   # Remove ANSI escape codes
Available transformations:
- strip_ansi: Remove ANSI color/formatting codes
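The transformations mapping pairs a field name with a transformation name. The sketch below applies it using a hypothetical name-to-function registry, and it runs after capture and before output rendering. It covers only CSI escape sequences, which is an assumption.

```python
import re

ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

# Hypothetical registry from transformation name to function. Only
# strip_ansi is listed because it is the only transformation documented here.
TRANSFORMS = {"strip_ansi": lambda value: ANSI_CSI.sub("", value)}


def apply_transformations(fields: dict[str, str], transformations: dict[str, str]) -> dict[str, str]:
    """Return a copy of fields with each named transformation applied."""
    result = dict(fields)
    for field, name in transformations.items():
        result[field] = TRANSFORMS[name](result[field])
    return result
```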
Sequences: Multi-Line Pattern Matching
Sequences allow you to group related lines together for atomic processing. This is particularly useful for multi-line log entries like dialogs, stack traces, or multi-part messages.
A sequence consists of:
- Leader pattern: The first line that starts the sequence
- Follower patterns: Subsequent lines that belong to the sequence
- Buffering behavior: All sequence lines are buffered and output together
Example: Dialog Question-Answer Pairs
This example demonstrates how follower patterns can match lines based on their
position relative to a leader line. Notice that both answer lines (following
questions) and non-answer lines start with -, but only the answers are
matched because they follow a question leader.
Input: Interactive dialog logs
[Q] What is your name?
- My name is Alice
[Q] What is your favorite color?
- Blue
- Actually, green
[Q] Where do you live?
- New York
Regular log line without question/answer format
- This line starts with a dash but is not an answer
[Q] How old are you?
- 25
Lines 2, 4-5, 7, and 11 start with - and follow [Q] questions.
Line 9 also starts with - but does NOT follow a question.
Rules: Match question-answer sequences
rules:
  - name: dialog_question
    pattern:
      - text: "[Q] "
      - field: question
    output: "[dialog-question:{question}]"
    sequence:
      followers:
        - pattern:
            - text: "- "
            - field: answer
          output: "[dialog-answer:{answer}]"
The follower pattern matches lines starting with -, but only when they
follow a [Q] leader line. This positional matching is key: the same
pattern (- ...) is treated differently based on context.
Output: Normalized sequences
[dialog-question:What is your name?]
[dialog-answer:My name is Alice]
[dialog-question:What is your favorite color?]
[dialog-answer:Blue]
[dialog-answer:Actually, green]
[dialog-question:Where do you live?]
[dialog-answer:New York]
Regular log line without question/answer format
- This line starts with a dash but is not an answer
[dialog-question:How old are you?]
[dialog-answer:25]
Questions are normalized to [dialog-question:...] format and follower
answers are normalized to [dialog-answer:...] format. All lines in a
sequence are buffered and output together atomically. Note that line 9
starts with - but is NOT normalized because it doesn't follow a question.
How sequences work:
- Leader line matches and starts buffering
- Follower lines are buffered (added to the sequence)
- Non-follower line ends the sequence
- All buffered lines are output together atomically
This ensures related lines stay together even during streaming processing.
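The four steps above amount to a small state machine. The sketch below hard-codes the dialog rules from the example, with LEADER and FOLLOWER as illustrative constants. It yields each sequence as a single group, which is what keeps the output atomic when streaming. It is not patterndb-yaml's implementation, which handles arbitrary leader and follower patterns.

```python
from typing import Iterable, Iterator

# Minimal sketch of sequence buffering, hard-coding the dialog rules above:
# leader "[Q] <question>", follower "- <answer>". Not the tool's internals.
LEADER = "[Q] "
FOLLOWER = "- "


def normalize_dialog(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield output in groups; a whole sequence is emitted as one group."""
    buffer: list[str] = []
    for line in lines:
        if buffer and line.startswith(FOLLOWER):
            # Follower inside an open sequence: buffer it, emit nothing yet.
            buffer.append(f"[dialog-answer:{line[len(FOLLOWER):]}]")
            continue
        if buffer:
            # Any non-follower line closes the open sequence.
            yield buffer
            buffer = []
        if line.startswith(LEADER):
            # Leader starts a new sequence and begins buffering.
            buffer = [f"[dialog-question:{line[len(LEADER):]}]"]
        else:
            yield [line]  # outside a sequence: pass through unchanged
    if buffer:
        yield buffer  # flush a sequence still open at end of input
```

On the dialog input, this yields six groups: four question-answer sequences and the two pass-through lines. Its flattened output matches the normalized output shown earlier.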
When to use sequences:
- Multi-line error messages or stack traces
- Question-answer pairs or dialogs
- Header-detail log entries
- Any related group of lines that should be processed atomically
Common Patterns
HTTP Logs
rules:
  - name: http_request
    pattern:
      - field: method
      - text: " "
      - field: path
      - text: " HTTP/"
      - field: version
    output: "[http:{method},{path}]"
Timestamps
rules:
  - name: iso_timestamp_log
    pattern:
      - field: timestamp   # ISO 8601 timestamp
      - text: " "
      - field: message
    output: "[log:{message}]"   # Discard timestamp
Key-Value Pairs
rules:
  - name: kv_pair
    pattern:
      - field: key
      - text: "="
      - field: value
    output: "{key}={value}"   # Preserve format
Error Messages
rules:
  - name: exception
    pattern:
      - field: exception_type
      - text: ": "
      - field: error_message
    output: "[error:type={exception_type},msg={error_message}]"
Rule Development Tips
Start Simple
Begin with basic patterns and iterate:
- Identify common log formats in your input
- Write simple rules matching key patterns
- Test with real data and check match rates
- Refine patterns based on unmatched lines
Use Explain Mode
Debug pattern matching with --explain:
EXPLAIN: [Line 42] Matched rule 'http_request'
EXPLAIN: [Line 42] Extracted fields: method='GET', path='/api/users'
EXPLAIN: [Line 42] Output: [http:GET,/api/users]
Validate Output
Check that output format is consistent:
# All output should start with same tag format
patterndb-yaml --rules rules.yaml logs.txt | grep -v '^\['
# Should return no lines (all lines start with [tag])
Performance Considerations
Rules are processed efficiently using syslog-ng's optimized engine:
- Cached normalization: Identical lines normalized once
- Sequential matching: First matching rule wins (no backtracking)
- ANSI stripping: Pre-compiled regex, minimal overhead
- Mature engine: Built on syslog-ng's established codebase
Best practice: Order rules from most specific to most general for optimal performance.
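Cached normalization can be illustrated with functools.lru_cache: identical lines hit the cache and skip matching. This sketch covers only the idea. It assumes that a line's result depends on that line alone. Follower lines inside a sequence break that assumption, so patterndb-yaml's actual cache may be scoped differently.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def normalize_cached(line: str) -> str:
    # Stand-in for the real rule matching; only runs on a cache miss.
    prefix = "[INFO] "
    if line.startswith(prefix):
        return f"[info:{line[len(prefix):]}]"
    return line
```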
Rule of Thumb
Write rules that are:
- Specific enough to match intended patterns accurately
- General enough to handle slight variations
- Ordered from specific to general
- Tested with real log data to verify match rates
Avoid:
- Overly broad patterns that match unintended lines
- Duplicate rules with overlapping patterns
- Complex patterns when simple ones suffice
See Also
- Explain Mode - Debug pattern matching
- Statistics - Measure normalization effectiveness
- Generate XML - Export rules to syslog-ng format
- Quick Start - Quick introduction to rules
- syslog-ng Pattern Database Documentation - Underlying pattern matching engine