
Core Concepts

All applications in this repository are configured using a shared rule syntax. A rule is a YAML object composed of a trigger, optional conditions, and an action.

# A rule is defined by an optional name, a trigger, optional conditions, and an action.
- name: my-rule        # optional — used as the "rule_name" label in Prometheus metrics
  trigger:
    # ... defines what starts the rule evaluation (e.g., a NATS message or an HTTP request)
  conditions:
    # ... defines the logic to determine if the action should run (e.g., field value checks)
  action:
    # ... defines what to do if the conditions pass (e.g., publish a NATS message or make an HTTP call)

When name is omitted, the trigger subject (NATS) or path (HTTP) is used as the metric label instead.
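
As a small illustration of the naming rule above, the sketch below puts a named and an unnamed rule side by side. The rule name, subjects, and payload fields are placeholders chosen for this example, and the label values in the comments only restate the behavior described above.

```yaml
# Named rule: metrics carry rule_name="high-temp-alert" (placeholder name)
- name: high-temp-alert
  trigger:
    nats:
      subject: "sensors.temperature.>"
  action:
    nats:
      subject: "alerts.high_temp"
      payload: '{"temp": {temperature}}'

# Unnamed NATS rule: the trigger subject "sensors.humidity.>" is used as the label
- trigger:
    nats:
      subject: "sensors.humidity.>"
  action:
    nats:
      subject: "alerts.humidity"
      payload: '{"humidity": {humidity}}'
```

Naming rules explicitly keeps metric labels stable if a trigger subject or path is later changed.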

1. Triggers (The "If")

A trigger defines the event that initiates a rule evaluation.

NATS Trigger: Evaluates a message from a NATS subject.

trigger:
  nats:
    subject: "sensors.temperature.>" # Supports wildcards

HTTP Trigger: Evaluates an incoming HTTP request.

trigger:
  http:
    path: "/webhooks/github"   # Exact path match
    method: "POST"             # Optional, defaults to all methods
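
To show how a trigger fits into a whole rule, here is a minimal sketch of an HTTP-triggered rule that republishes a webhook onto NATS. It assumes the application handling HTTP triggers accepts a NATS action. The rule name, the output subject, and the payload fields (repository.full_name and ref, as found in a GitHub push payload) are illustrative choices, and the conditions and action blocks are described in the sections that follow.

```yaml
- name: github-push                   # placeholder name
  trigger:
    http:
      path: "/webhooks/github"
      method: "POST"
  conditions:
    operator: and
    items:
      # Only react to push events, identified by the webhook header
      - field: "{@header.X-GitHub-Event}"
        operator: eq
        value: "push"
  action:
    nats:
      subject: "events.github.push"   # placeholder subject
      payload: '{"repo": "{repository.full_name}", "ref": "{ref}"}'
```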

2. Conditions (The "When")

Conditions are an optional block of logic that must evaluate to true for the action to be executed.

Template Syntax

All field references must use explicit template syntax: {variable}

This consistent syntax applies to both the field (left side) and value (right side) of conditions.

Supported Variable Types:

  • Message fields: {temperature}, {user.name}
  • System variables: {@time.hour}, {@subject.1}, {@header.X-Device-ID}
  • KV lookups: {@kv.device_status.{device_id}:status}

Values can be:

  • Templates (for variable comparisons): {threshold}, {@kv.config:max_temp}
  • Literals: 30, "active", true

Basic Condition Examples

conditions:
  operator: and # either "and" or "or"
  items:
    # Compare message field to literal
    - field: "{temperature}"
      operator: gte
      value: 30

    # Check an HTTP header
    - field: "{@header.X-Device-Auth}"
      operator: "exists"

    # Check the time of day
    - field: "{@time.hour}"
      operator: gte
      value: 9

    # Check a value from a NATS KV store
    - field: "{@kv.device_status.{device_id}:status}"
      operator: "eq"
      value: "active"
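
The example above combines its items with and. For contrast, a minimal sketch of an or group follows. It assumes that or passes when at least one item passes, and the field names and the "fault" value are placeholders.

```yaml
conditions:
  operator: or
  items:
    # Passes if the reading is high ...
    - field: "{temperature}"
      operator: gte
      value: 30

    # ... or if the device reports a fault, whatever the reading
    - field: "{status}"
      operator: eq
      value: "fault"
```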

Variable-to-Variable Comparisons

The rule engine supports comparing variables to other variables, enabling dynamic thresholds and cross-field validation:

conditions:
  operator: and
  items:
    # Compare two message fields
    - field: "{end_timestamp}"
      operator: gt
      value: "{start_timestamp}"

    # Compare message field to KV value
    - field: "{temperature}"
      operator: gt
      value: "{@kv.sensor_config.{sensor_id}:max_temp}"

    # Compare to system variable
    - field: "{last_update_hour}"
      operator: eq
      value: "{@time.hour}"

    # Permission level check
    - field: "{user_level}"
      operator: gte
      value: "{@kv.permissions.{resource_id}:required_level}"

Type Handling

Variables are resolved to their native types for accurate comparison:

  • Numbers remain numbers: {temperature} → 25.5 (float)
  • Strings remain strings: {status} → "active"
  • Booleans remain booleans: {enabled} → true
  • Automatic type coercion: when the two sides differ in type (e.g., string "30" vs number 30), the string is coerced to a number before comparison

This ensures numeric comparisons work correctly:

# These both work correctly
- field: "{count}"   # count = 42 (number)
  operator: gt
  value: "{limit}"   # limit = 50 (number)

- field: "{count}"   # count = 42 (number)
  operator: gt
  value: "40"        # String "40" coerced to number

Common Use Cases

Dynamic Thresholds:

# Threshold stored in KV, different per sensor
- field: "{temperature}"
  operator: gt
  value: "{@kv.sensor_config.{sensor_id}:max_temp}"

Cross-Field Validation:

# Ensure end time is after start time
- field: "{end_timestamp}"
  operator: gt
  value: "{start_timestamp}"

Range Checks:

# Value must be within KV-defined range
conditions:
  operator: and
  items:
    - field: "{value}"
      operator: gte
      value: "{@kv.ranges.{sensor_type}:min}"
    - field: "{value}"
      operator: lte
      value: "{@kv.ranges.{sensor_type}:max}"

Access Control:

# User level must meet or exceed required level
- field: "{user_level}"
  operator: gte
  value: "{@kv.permissions.{resource_id}:required_level}"

Rate Limiting:

# Current usage must be below limit
- field: "{current_requests}"
  operator: lt
  value: "{@kv.rate_limits.{user_id}:requests_per_hour}"

Available Operators

Comparison:

  • eq - Equals
  • neq - Not equals
  • gt - Greater than
  • lt - Less than
  • gte - Greater than or equal
  • lte - Less than or equal
  • exists - Field exists (not null)

String/Array:

  • contains - String contains substring or array contains element
  • not_contains - Inverse of contains
  • in - Value is in array
  • not_in - Value is not in array

Array Operators:

  • any - At least one array element matches nested conditions
  • all - All array elements match nested conditions
  • none - No array elements match nested conditions

Time-Based:

  • recent - Timestamp is within time window (e.g., "5s", "1m", "1h")
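
A few of the operators listed above have no example elsewhere on this page, so here is a hedged sketch of how they might be written. The field names (message, status, last_seen) are hypothetical. The array literal for in follows the form used in the environment variable section, and passing the time window as the value of recent is an assumption based on the examples in the list above. The nested-condition syntax for any, all, and none is not shown because this page does not define it.

```yaml
conditions:
  operator: and
  items:
    # String contains a substring
    - field: "{message}"
      operator: contains
      value: "timeout"

    # Value is one of a fixed set
    - field: "{status}"
      operator: in
      value: ["active", "degraded"]

    # Timestamp falls within the last five minutes
    # (assumes the window is given as the value)
    - field: "{last_seen}"
      operator: recent
      value: "5m"
```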

3. Actions (The "Then")

An action defines the work to be done when a rule's conditions are met.

NATS Action: Publishes a new message to a NATS subject.

action:
  nats:
    subject: "alerts.high_temp.{device_id}"
    mode: core      # optional: "core" or "jetstream" (overrides global nats.publish.mode)
    payload: |
      {
        "alert": "High temperature detected!",
        "temp": {temperature},
        "device": "{device_id}",
        "timestamp": "{@timestamp()}"
      }

The optional mode field overrides the global nats.publish.mode for this action only. Use core for fire-and-forget (notifications, dashboards) or jetstream for durable delivery (audit logs, safety alerts). When omitted, the global setting applies. See Per-Rule Publish Mode Override for examples.
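
As a brief sketch of the override described above, the action below requests durable delivery for a single rule while the global setting is left unchanged. The subject and payload fields are placeholders.

```yaml
action:
  nats:
    subject: "audit.door_access.{device_id}"   # placeholder subject
    mode: jetstream                            # durable delivery for this action only
    payload: '{"device": "{device_id}", "at": "{@timestamp()}"}'
```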

HTTP Action: Makes an outbound HTTP request to an external service.

action:
  http:
    url: "https://api.pagerduty.com/incidents"
    method: "POST"
    headers:
      Authorization: "Token ${PAGERDUTY_TOKEN}" # Env vars supported
    payload: '{"service": "app-alerts", "message": "{alert_message}"}'
    retry:
      maxAttempts: 3
      initialDelay: "1s"

4. Debounce

Debounce suppresses rapid re-fires of a rule within a configurable time window. When a rule has a debounce field, only the first matching message fires the action — subsequent matches within the window are silently dropped.

- trigger:
    nats:
      subject: "sensors.temperature.>"
  debounce: "30s"
  conditions:
    operator: and
    items:
      - field: "{temperature}"
        operator: gt
        value: 45
  action:
    nats:
      subject: "alerts.high_temp"
      payload: '{"temp": {temperature}, "device": "{device_id}"}'

The debounce field is set at the rule level (alongside trigger, conditions, and action) and accepts a Go duration string.

How It Works

  1. A message matches the rule's trigger and passes the rule's conditions, if any are defined.
  2. Shunt builds a debounce key from the trigger and action subjects: trigger_subject::action_subject.
  3. If this key has not fired within the debounce window, the action executes and the timestamp is recorded.
  4. If the key has fired within the window, the message is dropped and the messages_debounced_total metric increments.

Debounce state is in-memory only — it resets on process restart. The first message after a restart always fires.
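
The steps above can be traced on a timeline. The sketch below uses fixed placeholder subjects, with no wildcards or templates, so that the debounce key is unambiguous. The timings are invented for illustration and the outcomes in the comments restate the steps above.

```yaml
- trigger:
    nats:
      subject: "sensors.boiler.temperature"    # placeholder subject
  debounce: "30s"
  action:
    nats:
      subject: "alerts.boiler.high_temp"       # placeholder subject
      payload: '{"temp": {temperature}}'

# Debounce key: sensors.boiler.temperature::alerts.boiler.high_temp
#
# t = 0s    message arrives   key not seen yet     action fires, timestamp recorded
# t = 10s   message arrives   key fired 10s ago    dropped, messages_debounced_total +1
# t = 40s   message arrives   key fired 40s ago    action fires, timestamp updated
# restart   state cleared     the next message fires
```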

Valid Duration Formats

Any Go time.Duration string: "5s", "1m", "1m30s", "2h", "500ms".

Examples

Sensor flood protection — only alert once per minute regardless of how many readings arrive:

- trigger:
    nats:
      subject: "sensors.temperature.>"
  debounce: "1m"
  conditions:
    operator: and
    items:
      - field: "{temperature}"
        operator: gt
        value: 45
  action:
    nats:
      subject: "alerts.high_temp.{device_id}"
      payload: '{"temp": {temperature}, "device": "{device_id}"}'

Notification deduplication — suppress duplicate webhook deliveries for 5 minutes:

- trigger:
    nats:
      subject: "events.deploy.>"
  debounce: "5m"
  action:
    http:
      url: "https://hooks.slack.com/services/T00/B00/xxx"
      method: "POST"
      payload: '{"text": "Deploy event on {service}"}'

Environment Variables

The rule engine supports environment variable expansion for static configuration values using ${VAR_NAME} syntax. This enables secure secret management and environment-specific configuration without hardcoding values in rule files.

How It Works

Environment variables are expanded at load time (when rules are loaded from the NATS KV bucket), not at runtime. This means:

  • Performance: Zero runtime overhead - values are substituted once, when the rule is loaded
  • Security: Secrets can stay out of rule files
  • Simplicity: Standard environment variable management (Docker, K8s, systemd, etc.)
  • Validation: Expanded values are validated along with the rest of the rule

Important: Environment variable expansion is completely separate from template variable substitution:

  • ${ENV_VAR} → Expanded at load time (static configuration)
  • {field} or {@system} → Resolved at runtime (per-message templating)
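
To make the two phases concrete, the sketch below uses both syntaxes in a single subject. It assumes ENVIRONMENT=prod is set in the process environment, and the device_id value is made up for the example.

```yaml
# Rule as written (assume ENVIRONMENT=prod in the process environment)
action:
  nats:
    subject: "alerts.${ENVIRONMENT}.{device_id}"
    payload: '{"device": "{device_id}"}'

# After load-time expansion, the rule reads:
#   subject: "alerts.prod.{device_id}"
#
# At runtime, a message with device_id "pump-7" is published to:
#   alerts.prod.pump-7
```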

Syntax

# Use ${VARIABLE_NAME} anywhere in your rules
action:
  http:
    url: "https://api.example.com"
    headers:
      Authorization: "Bearer ${API_TOKEN}"

Where Can I Use Them?

Environment variables can be used in both conditions and actions:

In Actions

NATS Actions:

action:
  nats:
    subject: "alerts.${ENVIRONMENT}.critical"
    payload: |
      {
        "apiKey": "${SERVICE_API_KEY}",
        "region": "${AWS_REGION}"
      }
    headers:
      X-Service-Token: "${INTERNAL_TOKEN}"

HTTP Actions:

action:
  http:
    url: "${API_BASE_URL}/incidents"
    method: "POST"
    headers:
      Authorization: "Token ${PAGERDUTY_TOKEN}"
      X-Environment: "${DEPLOY_ENV}"
    payload: '{"service": "${SERVICE_NAME}"}'

In Conditions

Environment variables can also be used in condition values:

conditions:
  operator: and
  items:
    # Check if status matches expected value from env
    - field: "{status}"
      operator: eq
      value: "${EXPECTED_STATUS}"

    # Check if environment matches
    - field: "{environment}"
      operator: in
      value: ["${PRIMARY_ENV}", "${SECONDARY_ENV}"]

Missing Variables

If an environment variable is not set, the system will:

  1. Log a warning with details about the missing variable
  2. Substitute with an empty string
  3. Continue loading the rule (non-fatal)

Best Practice: Always set required environment variables before starting the application, or the rule may not work as intended.
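
The effect of a missing variable can be sketched as follows, assuming API_TOKEN is not set when the rule is loaded. The URL is a placeholder and the resulting header value follows from the empty-string substitution described above.

```yaml
# API_TOKEN is not set when this rule is loaded
action:
  http:
    url: "https://api.example.com"
    method: "POST"
    headers:
      Authorization: "Bearer ${API_TOKEN}"

# Effective header after expansion (a warning is logged and the rule still loads):
#   Authorization: "Bearer "
```

The rule loads and fires normally, but the remote service would most likely reject the request, so the load-time warning in the logs is the main signal that something is wrong.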

Complete Example: Variable Comparisons with KV

# Dynamic threshold management with KV store
# KV: sensor_config["temp-001"] = {"max_temp": 35, "critical_temp": 45}
# Message: {"sensor_id": "temp-001", "temperature": 38, "location": "server_room"}

- trigger:
    nats:
      subject: "sensors.temperature"

  conditions:
    operator: and
    items:
      # Temperature exceeds configured max (warning level)
      - field: "{temperature}"
        operator: gt
        value: "{@kv.sensor_config.{sensor_id}:max_temp}"

      # But below critical level
      - field: "{temperature}"
        operator: lt
        value: "{@kv.sensor_config.{sensor_id}:critical_temp}"

      # Only during business hours
      - field: "{@time.hour}"
        operator: gte
        value: 9
      - field: "{@time.hour}"
        operator: lt
        value: 17

  action:
    nats:
      subject: "alerts.temperature.warning.{sensor_id}"
      payload: |
        {
          "alert": "Temperature warning - exceeds threshold",
          "sensor_id": "{sensor_id}",
          "location": "{location}",
          "current_temperature": {temperature},
          "thresholds": {
            "max": "{@kv.sensor_config.{sensor_id}:max_temp}",
            "critical": "{@kv.sensor_config.{sensor_id}:critical_temp}"
          },
          "triggered_at": "{@timestamp.iso}",
          "alert_id": "{@uuid7()}"
        }