Custom Analysis Rules

Define organization-specific policies in YAML, JSON, or TOML. Custom rules produce structured signals with line numbers and evidence, same as built-ins.

Rules are pattern-matching only: no code execution, no external calls. The rule engine validates rule files against an embedded JSON Schema (IDE autocomplete works offline).

Basic Example

# company_rules.yaml
metadata:
  name: "Acme Corp Security Rules"
  version: "1.0.0"

rules:
  - id: "ACME-001"
    name: "require-stage-encryption"
    risk_level: Critical
    message: "Stage created without encryption"
    enabled: true
    triggers:
      - statement_type: "CreateStageStatement"
        categorical_signal:
          category: GOVERNANCE
          surface: encryption
          condition: disabled

  - id: "ACME-020"
    name: "audit-policy-removal"
    risk_level: High
    message: "Data protection policy removed"
    enabled: true
    triggers:
      - statement_type: "AlterTableStatement"
        categorical_signal:
          category: GOVERNANCE
          surface: masking_policy
          condition: removed

Running Custom Rules

# Analyze with custom rules
lexega-sql analyze query.sql --custom-rules company_rules.yaml

# Combine with catalog
lexega-sql analyze query.sql --custom-rules rules.yaml --catalog prod.json

# Multiple rule files
lexega-sql analyze query.sql --custom-rules rules1.yaml --custom-rules rules2.yaml

Cloud Storage URIs

For larger or shared deployments, rules, policies, exceptions, and output artifacts can be read from or written to cloud storage. This enables separation of concerns: DevOps maintains rules, Security maintains policies, artifacts go to shared storage.

| Scheme | Provider | CLI Required |
| --- | --- | --- |
| s3://bucket/key | AWS S3 | aws |
| gs://bucket/key | Google Cloud Storage | gsutil |
| az://container/blob | Azure Blob Storage | az |
| https://... | Any HTTPS endpoint (read-only) | curl |

Reading from cloud storage:

# AWS S3
lexega-sql analyze query.sql --custom-rules s3://mybucket/governance/rules.yaml

# Google Cloud Storage  
lexega-sql analyze query.sql --policy gs://mybucket/security/policy.yaml

# Azure Blob Storage
lexega-sql analyze query.sql --exceptions az://governance/exceptions.yaml

# Pre-signed URLs
lexega-sql analyze query.sql --custom-rules "https://mybucket.s3.amazonaws.com/rules.yaml?sig=..."

Writing to cloud storage:

# Write decision artifact to S3
lexega-sql review main..HEAD models/ \
  --policy policy.yaml \
  --env prod \
  --decision-out s3://mybucket/ci/decisions/$GITHUB_RUN_ID/

# Write risk report to GCS
lexega-sql analyze query.sql --report-out gs://mybucket/reports/risk.json

Severity vs Enforcement

  • Signal severity (risk_level): Each signal carries a risk_level (Critical/High/Medium/Low). Built-in signals set this in code; custom rules set it via the rule's risk_level field.
  • Policy enforcement (--policy): The policy layer is the only way to block. A policy bundle can allow/warn/block by referencing rule IDs (e.g., block rule C050 in prod). Policies are an enforcement layer on top of signals: they reference rules rather than define detection logic.

Rule Structure

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier (avoid C### to prevent conflict with built-in rules) |
| name | string | Human-readable rule name (kebab-case recommended) |
| risk_level | enum | Critical, High, Medium, or Low |
| message | string | Signal description shown to users |
| enabled | boolean | Whether rule is active (default: true) |
| triggers | array | Conditions that fire the rule (ANY match triggers signal) |
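
As a minimal sketch of how these fields fit together, the rule below lists two triggers, so a match on either one is enough to fire it. The ID ACME-030, the rule name, and the message are hypothetical; both triggers reuse the statement types and signals from the basic example.

```yaml
rules:
  - id: "ACME-030"                  # hypothetical ID, outside the built-in C### range
    name: "data-protection-weakened"
    risk_level: High
    message: "Encryption disabled or masking policy removed"
    enabled: true
    triggers:
      # Trigger 1: stage created without encryption
      - statement_type: "CreateStageStatement"
        categorical_signal:
          category: GOVERNANCE
          surface: encryption
          condition: disabled
      # Trigger 2: masking policy removed from a table
      - statement_type: "AlterTableStatement"
        categorical_signal:
          category: GOVERNANCE
          surface: masking_policy
          condition: removed
```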

Trigger Patterns

Each trigger matches on:

  • statement_type: AST node type (e.g., CreateStageStatement, AlterTableStatement)
  • categorical_signal: Governance/security signals extracted during semantic analysis
    • category: GOVERNANCE, SECURITY, DATA_ACCESS, DATA_INTEGRITY, PERFORMANCE, OPERATIONS, QUERY, SEMANTICS
    • surface: What's being changed (e.g., encryption, masking_policy, network_policy)
    • condition: Optional qualifier (disabled, enabled, changed, removed, dropped)
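
Since condition is described as optional, a trigger can presumably leave it out to match a surface regardless of how it changed. The sketch below assumes that omitting condition matches any condition on masking_policy; the ID, name, and message are hypothetical, and the matching behavior should be confirmed against your own queries.

```yaml
rules:
  - id: "ACME-040"                  # hypothetical ID
    name: "masking-policy-touched"
    risk_level: Medium
    message: "Masking policy changed on a table"
    enabled: true
    triggers:
      - statement_type: "AlterTableStatement"
        categorical_signal:
          category: GOVERNANCE
          surface: masking_policy
          # condition omitted (assumption: matches removed, changed, and so on)
```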

Statement Type Aliases

Use aliases to match groups of related statements without listing each one:

| Alias | Expands To | Use Case |
| --- | --- | --- |
| QueryStatement | SelectStatement, SetSelectStatement | All queries (SELECT, UNION, INTERSECT, EXCEPT) |
| DmlWriteStatement | InsertStatement, UpdateStatement, DeleteStatement, MergeStatement, MultiInsertStatement | All data-modifying statements |
| DmlStatement | All query + write statements | Any DML |
| TableStatement | CreateTableStatement, AlterTableStatement, DropTableStatement, TruncateStatement | Table DDL |
| PolicyStatement | All masking, row access, network, session, password, auth, aggregation policy statements | Any policy change |
| StageStatement | CreateStageStatement, AlterStageStatement, DropStageStatement | Stage DDL |
| IntegrationStatement | All API, storage, notification, external access integration statements | Any integration change |
| CopyStatement | CopyIntoTableStatement, CopyIntoLocationStatement | COPY INTO |

Example: Match any query (not just simple SELECT):

triggers:
  - statement_type: QueryStatement  # Matches SELECT, UNION, INTERSECT, EXCEPT
    categorical_signal:
      category: DATA_ACCESS

⚠️ Common mistake: Using SelectStatement when you mean QueryStatement. A UNION query is a SetSelectStatement, not a SelectStatement.
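
The same idea applies to stage DDL. The sketch below uses the StageStatement alias so one trigger covers CREATE, ALTER, and DROP STAGE. The ID, name, and message are hypothetical, and it is an assumption that the encryption:disabled signal is emitted for stage statements other than CreateStageStatement.

```yaml
rules:
  - id: "ACME-050"                  # hypothetical ID
    name: "stage-encryption-disabled"
    risk_level: Critical
    message: "Stage encryption disabled"
    enabled: true
    triggers:
      - statement_type: StageStatement   # alias: Create/Alter/DropStageStatement
        categorical_signal:
          category: GOVERNANCE
          surface: encryption
          condition: disabled            # assumed to apply beyond CREATE STAGE
```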

Common Signals

| Signal Pattern | Use Case |
| --- | --- |
| encryption:disabled | Stage created with TYPE='NONE' |
| masking_policy:removed | Masking policy unset from table column |
| network_policy:dropped | DROP NETWORK POLICY statement |
| session_policy:removed | Session timeout settings cleared |
| tag:unset | Governance tag removed from object |

Advanced Features

  • Table/Column-Level Rules: Match on specific table/column names referenced in queries
  • Multi-Condition Triggers: Require multiple signals present simultaneously
  • YAML/TOML/JSON Support: Use your preferred format (YAML recommended for readability)
  • Debug Mode: --verbose shows why rules almost matched (missing signals)

Execution Order: Built-in analyzers run first. Custom rules execute as part of the policy analyzer (after built-in policy/governance checks). After all analyzers run, signals are deduplicated by topic: when a custom signal exists for a topic, the corresponding built-in signal(s) for that topic are suppressed (custom takes precedence). In addition, built-in blast-radius signals may be suppressed for statements covered by a custom rule.
