Custom Analysis Rules
Define organization-specific policies in YAML, JSON, or TOML. Custom rules produce structured signals with line numbers and evidence, same as built-ins.
Rules are pattern-matching only: no code execution, no external calls. The rule engine validates rule files against an embedded JSON Schema, so IDE autocomplete works offline.
Basic Example
# company_rules.yaml
metadata:
  name: "Acme Corp Security Rules"
  version: "1.0.0"
rules:
  - id: "ACME-001"
    name: "require-stage-encryption"
    risk_level: Critical
    message: "Stage created without encryption"
    enabled: true
    triggers:
      - statement_type: "CreateStageStatement"
        categorical_signal:
          category: GOVERNANCE
          surface: encryption
          condition: disabled
  - id: "ACME-020"
    name: "audit-policy-removal"
    risk_level: High
    message: "Data protection policy removed"
    enabled: true
    triggers:
      - statement_type: "AlterTableStatement"
        categorical_signal:
          category: GOVERNANCE
          surface: masking_policy
          condition: removed
Running Custom Rules
# Analyze with custom rules
lexega-sql analyze query.sql --custom-rules company_rules.yaml
# Combine with catalog
lexega-sql analyze query.sql --custom-rules rules.yaml --catalog prod.json
# Multiple rule files
lexega-sql analyze query.sql --custom-rules rules1.yaml --custom-rules rules2.yaml
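For CI scripts that assemble the invocation programmatically, the commands above could be built along these lines. This is a minimal sketch: `build_analyze_command` is a hypothetical helper, and it uses only the `--custom-rules` and `--catalog` flags shown in this section.

```python
import shlex
from typing import List, Optional


def build_analyze_command(
    sql_file: str,
    rule_files: List[str],
    catalog: Optional[str] = None,
) -> List[str]:
    """Assemble argv for `lexega-sql analyze` (hypothetical helper)."""
    argv = ["lexega-sql", "analyze", sql_file]
    for path in rule_files:
        # The flag is repeated once per rule file, as in the example above.
        argv += ["--custom-rules", path]
    if catalog is not None:
        argv += ["--catalog", catalog]
    return argv


command = build_analyze_command("query.sql", ["rules1.yaml", "rules2.yaml"])
print(shlex.join(command))
```

The returned list can be passed to `subprocess.run` directly, assuming `lexega-sql` is on `PATH`; keeping it as a list avoids shell-quoting problems with paths or pre-signed URLs.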
Cloud Storage URIs
For larger or shared deployments, rules, policies, exceptions, and output artifacts can be read from or written to cloud storage. This enables separation of concerns: DevOps maintains rules, Security maintains policies, artifacts go to shared storage.
| Scheme | Provider | CLI Required |
|---|---|---|
| s3://bucket/key | AWS S3 | aws |
| gs://bucket/key | Google Cloud Storage | gsutil |
| az://container/blob | Azure Blob Storage | az |
| https://... | Any HTTPS endpoint (read-only) | curl |
Reading from cloud storage:
# AWS S3
lexega-sql analyze query.sql --custom-rules s3://mybucket/governance/rules.yaml
# Google Cloud Storage
lexega-sql analyze query.sql --policy gs://mybucket/security/policy.yaml
# Azure Blob Storage
lexega-sql analyze query.sql --exceptions az://governance/exceptions.yaml
# Pre-signed URLs
lexega-sql analyze query.sql --custom-rules "https://mybucket.s3.amazonaws.com/rules.yaml?sig=..."
Writing to cloud storage:
# Write decision artifact to S3
lexega-sql review main..HEAD models/ \
--policy policy.yaml \
--env prod \
--decision-out s3://mybucket/ci/decisions/$GITHUB_RUN_ID/
# Write risk report to GCS
lexega-sql analyze query.sql --report-out gs://mybucket/reports/risk.json
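Because each scheme depends on an external CLI, a pipeline may want to check that the right tools are installed before invoking the analyzer. The sketch below encodes the scheme table from this section; the helper names (`required_cli`, `is_writable`, `missing_clis`) are hypothetical and not part of lexega-sql.

```python
import shutil
from typing import List, Optional
from urllib.parse import urlparse

# Scheme-to-CLI mapping copied from the table in this section.
REQUIRED_CLI = {
    "s3": "aws",
    "gs": "gsutil",
    "az": "az",
    "https": "curl",
}

# The table marks HTTPS endpoints as read-only.
READ_ONLY_SCHEMES = {"https"}


def required_cli(location: str) -> Optional[str]:
    """Return the helper CLI a location needs, or None for a local path."""
    scheme = urlparse(location).scheme.lower()
    return REQUIRED_CLI.get(scheme)


def is_writable(location: str) -> bool:
    """False for schemes the table lists as read-only."""
    return urlparse(location).scheme.lower() not in READ_ONLY_SCHEMES


def missing_clis(locations: List[str]) -> List[str]:
    """List the helper CLIs that are needed but not found on PATH."""
    needed = {required_cli(loc) for loc in locations} - {None}
    return sorted(cli for cli in needed if shutil.which(cli) is None)
```

Running `missing_clis` over every `--custom-rules`, `--policy`, and output location at the start of a CI job turns a mid-run failure into an early, readable error.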
Severity vs Enforcement
- Signal severity (risk_level): Each signal carries a risk_level (Critical/High/Medium/Low). Built-in signals set this in code; custom rules set it via the rule's risk_level field.
- Policy enforcement (--policy): The policy layer is the only way to block. A policy bundle can allow/warn/block by referencing rule IDs (e.g., block rule C050 in prod). Policies are an enforcement layer on top of signals: they reference rules rather than define detection logic.
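The split between severity and enforcement can be illustrated with a small model. This is a conceptual sketch only: the `Signal` class, the in-memory `POLICY` mapping, and the default action of `allow` are assumptions made for illustration, not the real policy file format or engine behavior.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Signal:
    rule_id: str
    risk_level: str  # Critical, High, Medium, or Low


# Hypothetical in-memory policy: rule ID -> action.
# This is NOT the on-disk policy format, which this section does not show.
POLICY = {"C050": "block", "ACME-020": "warn"}


def enforce(signals: List[Signal], policy: Dict[str, str],
            default: str = "allow") -> Dict[str, str]:
    """Look up the action for each signal by rule ID; risk_level is ignored."""
    return {s.rule_id: policy.get(s.rule_id, default) for s in signals}


def is_blocked(signals: List[Signal], policy: Dict[str, str]) -> bool:
    """A change is blocked only if some signal's rule ID maps to 'block'."""
    return "block" in enforce(signals, policy).values()
```

In this model a `Critical` signal from `ACME-001` does not block on its own, because no policy entry references it, while a `Low` signal from `C050` does block. Severity describes the finding; only the policy decides the outcome.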
Rule Structure
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier (avoid C### to prevent conflict with built-in rules) |
| name | string | Human-readable rule name (kebab-case recommended) |
| risk_level | enum | Critical, High, Medium, or Low |
| message | string | Signal description shown to users |
| enabled | boolean | Whether rule is active (default: true) |
| triggers | array | Conditions that fire the rule (ANY match triggers signal) |
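The field table can be turned into a quick pre-commit check on a parsed rule. The sketch below is not the embedded JSON Schema; it assumes every field except `enabled` is required, that `triggers` must be non-empty, and that "C###" means `C` followed by exactly three digits. `validate_rule` is a hypothetical helper.

```python
import re
from typing import List

RISK_LEVELS = {"Critical", "High", "Medium", "Low"}

# Assumption: all fields except `enabled` (which defaults to true) are required.
REQUIRED_FIELDS = ("id", "name", "risk_level", "message", "triggers")

# Assumption: "C###" means the letter C followed by exactly three digits.
BUILTIN_ID_SHAPE = re.compile(r"C\d{3}")


def validate_rule(rule: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in rule]
    if "risk_level" in rule and rule["risk_level"] not in RISK_LEVELS:
        problems.append(f"risk_level must be one of {sorted(RISK_LEVELS)}")
    if BUILTIN_ID_SHAPE.fullmatch(str(rule.get("id", ""))):
        problems.append("id collides with the built-in C### namespace")
    if not isinstance(rule.get("enabled", True), bool):
        problems.append("enabled must be a boolean")
    triggers = rule.get("triggers")
    if triggers is not None and (not isinstance(triggers, list) or not triggers):
        problems.append("triggers must be a non-empty array")
    return problems
```

Calling `validate_rule` on each entry of a parsed rules file gives a fast local check before the file reaches the analyzer, which remains the authority on what is valid.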
Trigger Patterns
Each trigger matches on:
- statement_type: AST node type (e.g., CreateStageStatement, AlterTableStatement)
- categorical_signal: Governance/security signals extracted during semantic analysis
  - category: GOVERNANCE, SECURITY, DATA_ACCESS, DATA_INTEGRITY, PERFORMANCE, OPERATIONS, QUERY, SEMANTICS
  - surface: What's being changed (e.g., encryption, masking_policy, network_policy)
  - condition: Optional qualifier (disabled, enabled, changed, removed, dropped)
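One way to picture trigger evaluation is as plain dictionary matching. The sketch below is an illustrative model, not the engine's implementation. It assumes that `statement_type` and `categorical_signal` must both match within a single trigger, that any field omitted from a pattern acts as a wildcard, and that statement types are compared exactly. All function names are hypothetical.

```python
from typing import List

SIGNAL_KEYS = ("category", "surface", "condition")


def signal_matches(pattern: dict, signal: dict) -> bool:
    """Every key present in the pattern must equal the signal's value.
    Assumption: keys omitted from the pattern act as wildcards."""
    return all(pattern[k] == signal.get(k) for k in SIGNAL_KEYS if k in pattern)


def trigger_matches(trigger: dict, statement_type: str,
                    signals: List[dict]) -> bool:
    """Assumption: statement_type AND categorical_signal must both match."""
    expected = trigger.get("statement_type")
    if expected is not None and expected != statement_type:
        return False
    pattern = trigger.get("categorical_signal")
    if pattern is None:
        return True
    return any(signal_matches(pattern, s) for s in signals)


def rule_fires(rule: dict, statement_type: str, signals: List[dict]) -> bool:
    """A rule fires when it is enabled and ANY of its triggers matches."""
    if not rule.get("enabled", True):
        return False
    return any(trigger_matches(t, statement_type, signals)
               for t in rule["triggers"])
```

Under this model, the `ACME-001` rule from the basic example fires for a `CreateStageStatement` that carries a `GOVERNANCE` / `encryption` / `disabled` signal, and stays silent if either the statement type or the condition differs.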
Statement Type Aliases
Use aliases to match groups of related statements without listing each one:
| Alias | Expands To | Use Case |
|---|---|---|
| QueryStatement | SelectStatement, SetSelectStatement | All queries (SELECT, UNION, INTERSECT, EXCEPT) |
| DmlWriteStatement | InsertStatement, UpdateStatement, DeleteStatement, MergeStatement, MultiInsertStatement | All data-modifying statements |
| DmlStatement | All query + write statements | Any DML |
| TableStatement | CreateTableStatement, AlterTableStatement, DropTableStatement, TruncateStatement | Table DDL |
| PolicyStatement | All masking, row access, network, session, password, auth, aggregation policy statements | Any policy change |
| StageStatement | CreateStageStatement, AlterStageStatement, DropStageStatement | Stage DDL |
| IntegrationStatement | All API, storage, notification, external access integration statements | Any integration change |
| CopyStatement | CopyIntoTableStatement, CopyIntoLocationStatement | COPY INTO |
Example: Match any query (not just simple SELECT):
triggers:
  - statement_type: QueryStatement  # Matches SELECT, UNION, INTERSECT, EXCEPT
    categorical_signal:
      category: DATA_ACCESS
⚠️ Common mistake: Using SelectStatement when you mean QueryStatement. A UNION query is a SetSelectStatement, not a SelectStatement.
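Alias expansion can be modeled as a set lookup, which makes the `SelectStatement` versus `QueryStatement` pitfall easy to see. The sketch below includes only the aliases whose members the table spells out; `PolicyStatement` and `IntegrationStatement` are left out because their members are not enumerated here. The helpers `expand` and `type_matches` are hypothetical.

```python
from typing import Dict, FrozenSet

# Members copied from the alias table above.
ALIASES: Dict[str, FrozenSet[str]] = {
    "QueryStatement": frozenset({"SelectStatement", "SetSelectStatement"}),
    "DmlWriteStatement": frozenset({
        "InsertStatement", "UpdateStatement", "DeleteStatement",
        "MergeStatement", "MultiInsertStatement",
    }),
    "TableStatement": frozenset({
        "CreateTableStatement", "AlterTableStatement",
        "DropTableStatement", "TruncateStatement",
    }),
    "StageStatement": frozenset({
        "CreateStageStatement", "AlterStageStatement", "DropStageStatement",
    }),
    "CopyStatement": frozenset({
        "CopyIntoTableStatement", "CopyIntoLocationStatement",
    }),
}
# DmlStatement is described as all query + write statements.
ALIASES["DmlStatement"] = ALIASES["QueryStatement"] | ALIASES["DmlWriteStatement"]


def expand(statement_type: str) -> FrozenSet[str]:
    """Return the concrete node types an alias covers, or the type itself."""
    return ALIASES.get(statement_type, frozenset({statement_type}))


def type_matches(trigger_type: str, actual_type: str) -> bool:
    """True if a trigger's statement_type covers the actual AST node type."""
    return actual_type in expand(trigger_type)
```

With this model, `type_matches("SelectStatement", "SetSelectStatement")` is false while `type_matches("QueryStatement", "SetSelectStatement")` is true, which is exactly why a rule written against `SelectStatement` misses UNION queries.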
Common Signals
| Signal Pattern | Use Case |
|---|---|
| encryption:disabled | Stage created with TYPE='NONE' |
| masking_policy:removed | Masking policy unset from table column |
| network_policy:dropped | DROP NETWORK POLICY statement |
| session_policy:removed | Session timeout settings cleared |
| tag:unset | Governance tag removed from object |
Advanced Features
- Table/Column-Level Rules: Match on specific table/column names referenced in queries
- Multi-Condition Triggers: Require multiple signals present simultaneously
- YAML/TOML/JSON Support: Use your preferred format (YAML recommended for readability)
- Debug Mode: --verbose shows why rules almost matched (missing signals)
Execution Order: Built-in analyzers run first. Custom rules execute as part of the policy analyzer (after built-in policy/governance checks). After all analyzers run, signals are deduplicated by topic: when a custom signal exists for a topic, the corresponding built-in signal(s) for that topic are suppressed (custom takes precedence). In addition, built-in blast-radius signals may be suppressed for statements covered by a custom rule.
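The topic-based deduplication described above can be pictured as a two-pass filter. This is a conceptual sketch: the `source` and `topic` keys are hypothetical field names chosen for illustration, and the optional blast-radius suppression is not modeled.

```python
from typing import List


def deduplicate(signals: List[dict]) -> List[dict]:
    """Drop built-in signals whose topic is also covered by a custom signal.

    Each signal is a dict with hypothetical keys:
      source: "builtin" or "custom"
      topic:  the deduplication key
    """
    # Pass 1: collect every topic that has at least one custom signal.
    custom_topics = {s["topic"] for s in signals if s["source"] == "custom"}
    # Pass 2: keep all custom signals, and built-ins only for other topics.
    return [
        s for s in signals
        if s["source"] == "custom" or s["topic"] not in custom_topics
    ]
```

Original ordering is preserved, so built-in signals for topics without a custom rule still appear where the analyzers emitted them.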