Custom Analysis Rules

Define organization-specific rules in YAML, JSON, or TOML. Custom rules produce structured signals with line numbers and evidence, just as the built-in rules do.

Rules are pattern-matching only: no code execution, no external calls. The rule engine validates rule files against an embedded JSON Schema (IDE autocomplete works offline).

Basic Example

# company_rules.yaml
metadata:
  name: "Acme Corp Security Rules"
  version: "1.0.0"

rules:
  - id: "ACME-001"
    name: "require-stage-encryption"
    risk_level: Critical
    message: "Stage created without encryption"
    enabled: true
    triggers:
      - statement_type: "CreateStageStatement"
        categorical_signal:
          category: GOVERNANCE
          surface: encryption
          condition: disabled

  - id: "ACME-020"
    name: "audit-policy-removal"
    risk_level: High
    message: "Data protection policy removed"
    enabled: true
    triggers:
      - statement_type: "AlterTableStatement"
        categorical_signal:
          category: GOVERNANCE
          surface: masking_policy
          condition: removed

Running Custom Rules

# Analyze with custom rules
lexega-sql analyze query.sql --custom-rules company_rules.yaml

# Combine with catalog
lexega-sql analyze query.sql --custom-rules rules.yaml --catalog prod.json

# Multiple rule files
lexega-sql analyze query.sql --custom-rules rules1.yaml --custom-rules rules2.yaml

Cloud Storage URIs

For larger or shared deployments, rules, policies, exceptions, and output artifacts can be read from or written to cloud storage. This enables separation of concerns: DevOps maintains rules, Security maintains policies, and artifacts go to shared storage.

Scheme	Provider	CLI Required
`s3://bucket/key`	AWS S3	`aws`
`gs://bucket/key`	Google Cloud Storage	`gsutil`
`az://container/blob`	Azure Blob Storage	`az`
`https://...`	Any HTTPS endpoint (read-only)	`curl`
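Because each scheme depends on an external CLI, a pre-flight check in CI can catch a missing tool before the analysis runs. The following is a minimal sketch, not part of lexega-sql: the helper names `required_cli` and `missing_clis` are hypothetical, and the scheme-to-CLI mapping is copied from the table above.

```python
import shutil
from typing import Optional
from urllib.parse import urlparse

# Scheme-to-CLI mapping copied from the table above.
SCHEME_CLI = {"s3": "aws", "gs": "gsutil", "az": "az", "https": "curl"}

def required_cli(uri: str) -> Optional[str]:
    """Return the CLI the table lists for this URI, or None for a local path."""
    return SCHEME_CLI.get(urlparse(uri).scheme.lower())

def missing_clis(uris: list) -> list:
    """List the required CLIs that are not on PATH, in sorted order."""
    needed = {cli for cli in map(required_cli, uris) if cli}
    return sorted(cli for cli in needed if shutil.which(cli) is None)
```

Calling `missing_clis` with every URI passed to `--custom-rules`, `--policy`, and `--exceptions` gives the list of tools to install; local file paths are ignored.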

Reading from cloud storage:

# AWS S3
lexega-sql analyze query.sql --custom-rules s3://mybucket/governance/rules.yaml

# Google Cloud Storage
lexega-sql analyze query.sql --policy gs://mybucket/security/policy.yaml

# Azure Blob Storage
lexega-sql analyze query.sql --exceptions az://governance/exceptions.yaml

# Pre-signed URLs
lexega-sql analyze query.sql --custom-rules "https://mybucket.s3.amazonaws.com/rules.yaml?sig=..."

Writing to cloud storage:

# Write decision artifact to S3
lexega-sql review main..HEAD models/ \
  --policy policy.yaml \
  --env prod \
  --decision-out s3://mybucket/ci/decisions/$GITHUB_RUN_ID/

# Write risk report to GCS
lexega-sql analyze query.sql --report-out gs://mybucket/reports/risk.json

Severity vs Enforcement

Signal severity (risk_level): Each signal carries a risk_level (Critical/High/Medium/Low). Built-in signals set this in code; custom rules set it via the rule's risk_level field.
Policy enforcement (--policy): The policy layer is the only way to block. A policy bundle can allow, warn, or block by referencing rule IDs (e.g., block rule C050 in prod). Policies are an enforcement layer on top of signals: they reference rules; they do not define detection logic.
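The split between severity and enforcement can be modeled in a few lines. This is a conceptual sketch only: the policy shape used here (environment, then rule ID, then action) and the default of "allow" for unreferenced rules are assumptions made for illustration and do not describe the real policy file format.

```python
# Hypothetical ordering of actions, strictest last.
ORDER = {"allow": 0, "warn": 1, "block": 2}

def decide(signals, policy, env):
    """Return the strictest action the policy assigns to any fired rule in env.

    risk_level is deliberately ignored: in this model only the policy
    can turn a signal into a block.
    """
    actions = [policy.get(env, {}).get(s["rule_id"], "allow") for s in signals]
    return max(actions, key=ORDER.__getitem__, default="allow")

# Hypothetical policy: block C050 in prod, only warn in dev.
policy = {"prod": {"C050": "block"}, "dev": {"C050": "warn"}}
signals = [
    {"rule_id": "C050", "risk_level": "Low"},
    {"rule_id": "ACME-001", "risk_level": "Critical"},
]
```

In this model a Critical signal from ACME-001 does not block on its own, because no policy entry references it, while the Low signal from C050 blocks in prod.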

Rule Structure

Field	Type	Description
`id`	string	Unique identifier (avoid C### to prevent conflict with built-in rules)
`name`	string	Human-readable rule name (kebab-case recommended)
`risk_level`	enum	Critical, High, Medium, or Low
`message`	string	Signal description shown to users
`enabled`	boolean	Whether rule is active (default: true)
`triggers`	array	Conditions that fire the rule (ANY match triggers signal)
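As a quick lint before committing a rule file, the field table can be turned into a small check over an already-parsed rule. This sketch mirrors the table only and is not the tool's embedded JSON Schema; the function name `check_rule` is hypothetical, and treating every field except `enabled` as required is an assumption.

```python
RISK_LEVELS = {"Critical", "High", "Medium", "Low"}

# Assumption: every field except `enabled` is required.
REQUIRED = {"id": str, "name": str, "risk_level": str, "message": str, "triggers": list}

def check_rule(rule: dict) -> list:
    """Return a list of problems found in one parsed rule (empty if none)."""
    problems = []
    for field, expected in REQUIRED.items():
        if field not in rule:
            problems.append(f"missing field: {field}")
        elif not isinstance(rule[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    level = rule.get("risk_level")
    if isinstance(level, str) and level not in RISK_LEVELS:
        problems.append(f"unknown risk_level: {level}")
    # `enabled` defaults to true when omitted.
    if not isinstance(rule.get("enabled", True), bool):
        problems.append("enabled should be a boolean")
    triggers = rule.get("triggers")
    if isinstance(triggers, list) and not triggers:
        problems.append("triggers is empty, so the rule can never fire")
    return problems
```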

Trigger Patterns

Each trigger matches on:

statement_type: AST node type (e.g., CreateStageStatement, AlterTableStatement)
categorical_signal: Governance/security signals extracted during semantic analysis
- category: GOVERNANCE, SECURITY, DATA_ACCESS, DATA_INTEGRITY, PERFORMANCE, OPERATIONS, QUERY, SEMANTICS
- surface: What's being changed (e.g., encryption, masking_policy, network_policy)
- condition: Optional qualifier (e.g., disabled, enabled, changed, removed, dropped)
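The matching semantics described above (a rule fires when ANY of its triggers matches) can be sketched as follows. This is an illustration, not the engine's code: the function names are hypothetical, statement-type aliases are not handled, and treating an omitted `categorical_signal` field as a wildcard is an assumption.

```python
def trigger_matches(trigger, statement_type, signals):
    """True if one trigger matches a statement and its extracted signals."""
    wanted_type = trigger.get("statement_type")
    if wanted_type is not None and wanted_type != statement_type:
        return False
    wanted = trigger.get("categorical_signal")
    if not wanted:
        return True
    # Assumption: fields omitted from the trigger act as wildcards.
    return any(all(sig.get(k) == v for k, v in wanted.items()) for sig in signals)

def rule_fires(rule, statement_type, signals):
    """A rule fires if it is enabled and ANY of its triggers matches."""
    if not rule.get("enabled", True):
        return False
    return any(trigger_matches(t, statement_type, signals) for t in rule["triggers"])

# The ACME-020 rule from the basic example, as a parsed mapping.
acme_020 = {
    "id": "ACME-020",
    "enabled": True,
    "triggers": [{
        "statement_type": "AlterTableStatement",
        "categorical_signal": {
            "category": "GOVERNANCE",
            "surface": "masking_policy",
            "condition": "removed",
        },
    }],
}
```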

Statement Type Aliases

Use aliases to match groups of related statements without listing each one:

Alias	Expands To	Use Case
`QueryStatement`	`SelectStatement`, `SetSelectStatement`	All queries (SELECT, UNION, INTERSECT, EXCEPT)
`DmlWriteStatement`	`InsertStatement`, `UpdateStatement`, `DeleteStatement`, `MergeStatement`, `MultiInsertStatement`	All data-modifying statements
`DmlStatement`	All query + write statements	Any DML
`TableStatement`	`CreateTableStatement`, `AlterTableStatement`, `DropTableStatement`, `TruncateStatement`	Table DDL
`PolicyStatement`	All masking, row access, network, session, password, auth, aggregation policy statements	Any policy change
`StageStatement`	`CreateStageStatement`, `AlterStageStatement`, `DropStageStatement`	Stage DDL
`IntegrationStatement`	All API, storage, notification, external access integration statements	Any integration change
`CopyStatement`	`CopyIntoTableStatement`, `CopyIntoLocationStatement`	COPY INTO

Example: Match any query (not just simple SELECT):

triggers:
  - statement_type: QueryStatement  # Matches SELECT, UNION, INTERSECT, EXCEPT
    categorical_signal:
      category: DATA_ACCESS

⚠️ Common mistake: Using SelectStatement when you mean QueryStatement. A UNION query is a SetSelectStatement, not a SelectStatement.
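To make the alias behavior concrete, here is a minimal sketch of alias expansion built from the table above. The helper names are hypothetical, `PolicyStatement` and `IntegrationStatement` are left out because the table does not list their concrete members, and `DmlStatement` is read as the union of `QueryStatement` and `DmlWriteStatement`.

```python
ALIASES = {
    "QueryStatement": {"SelectStatement", "SetSelectStatement"},
    "DmlWriteStatement": {
        "InsertStatement", "UpdateStatement", "DeleteStatement",
        "MergeStatement", "MultiInsertStatement",
    },
    "TableStatement": {
        "CreateTableStatement", "AlterTableStatement",
        "DropTableStatement", "TruncateStatement",
    },
    "StageStatement": {"CreateStageStatement", "AlterStageStatement", "DropStageStatement"},
    "CopyStatement": {"CopyIntoTableStatement", "CopyIntoLocationStatement"},
}
# Reading of the table: DmlStatement covers all query and write statements.
ALIASES["DmlStatement"] = ALIASES["QueryStatement"] | ALIASES["DmlWriteStatement"]

def expand(statement_type):
    """Return the concrete statement types a trigger's statement_type covers."""
    return ALIASES.get(statement_type, {statement_type})

def type_matches(trigger_type, actual_type):
    """True if a statement of actual_type satisfies the trigger's type."""
    return actual_type in expand(trigger_type)
```

With this model, `type_matches("SelectStatement", "SetSelectStatement")` is false while `type_matches("QueryStatement", "SetSelectStatement")` is true, which is the UNION pitfall described above.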

Common Signals

Signal Pattern	Use Case
`encryption:disabled`	Stage created with encryption TYPE='NONE'
`masking_policy:removed`	Masking policy unset from table column
`network_policy:dropped`	DROP NETWORK POLICY statement
`session_policy:removed`	Session timeout settings cleared
`tag:unset`	Governance tag removed from object

Advanced Features

Table/Column-Level Rules: Match on specific table/column names referenced in queries
Multi-Condition Triggers: Require multiple signals present simultaneously
YAML/TOML/JSON Support: Use your preferred format (YAML recommended for readability)
Debug Mode: --verbose shows why a rule nearly matched, including which signals were missing

Execution Order: Built-in analyzers run first. Custom rules execute as part of the policy analyzer (after built-in policy/governance checks). After all analyzers run, signals are deduplicated by topic: when a custom signal exists for a topic, the corresponding built-in signal(s) for that topic are suppressed (custom takes precedence). In addition, built-in blast-radius signals may be suppressed for statements covered by a custom rule.
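The deduplication step can be pictured with a short sketch. It is illustrative only: the `topic` and `source` keys and the rule ID "builtin-1" are hypothetical names chosen for the example, and the separate blast-radius suppression is not modeled.

```python
def dedupe_by_topic(signals):
    """Drop built-in signals whose topic is also covered by a custom signal."""
    custom_topics = {s["topic"] for s in signals if s["source"] == "custom"}
    return [
        s for s in signals
        if s["source"] == "custom" or s["topic"] not in custom_topics
    ]

signals = [
    {"topic": "stage_encryption", "source": "builtin", "rule_id": "builtin-1"},
    {"topic": "stage_encryption", "source": "custom", "rule_id": "ACME-001"},
    {"topic": "masking_policy", "source": "builtin", "rule_id": "builtin-2"},
]
kept = dedupe_by_topic(signals)
```

Here `kept` holds the custom ACME-001 signal and the built-in masking_policy signal; the built-in stage_encryption signal is suppressed because a custom signal covers the same topic.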