Cloud Storage Pipelines
CI runners are ephemeral: anything written to the runner's disk disappears when the job ends. If your pipeline writes decision and report artifacts to a local .lexega/ directory, there is nothing left for the dashboard to read once the job completes.
The fix is to write artifacts to cloud storage instead. The artifact flags — --decision-out and --report-out on analyze/diff/review, and --data-dir on dashboard — accept an s3://, gs://, or az:// URI in place of a local path. So do the config inputs (--policy, --exceptions, --custom-rules, catalog snapshots), which additionally accept https:// URLs for pre-signed or public endpoints. Only the SQL paths themselves are local.
CI run 1 ──▶ s3://my-bucket/lexega-data/decisions/<run-1>/
CI run 2 ──▶ s3://my-bucket/lexega-data/decisions/<run-2>/ ──▶ lexega-sql dashboard
CI run 3 ──▶ s3://my-bucket/lexega-data/decisions/<run-3>/       --data-dir s3://my-bucket/lexega-data
Each run appends to the same bucket prefix; the dashboard reads the whole history from your machine. This page walks through the full setup for each provider — bucket creation, CI authentication, the workflow itself, and reading it back.
Two Streams, One Bucket
A complete history takes two kinds of runs, and they answer different questions:
- PR runs (review) gate each change before it merges. Their decision artifacts are the record of what was caught, including merges that were blocked and therefore never reach main. If artifacts are only published from main, every catch is lost with the runner that found it.
- Mainline runs (analyze) snapshot the repo's full SQL after each merge: what's actually live, tracked over time.
Both streams write to the same bucket prefix. Artifacts record which kind of run produced them, and the dashboard keeps the two series separate — blocked/allowed rates come from PR runs, posture trends from mainline snapshots. The workflows below run both from a single file with two triggers.
One caveat for public repos: pull requests from forks don't receive OIDC tokens or secrets, so the PR stream assumes branches in the same repo — the norm for internal data platform repos.
The identity that powers this grouping — repository, CI run, and PR number — is detected automatically from the CI environment on GitHub Actions, GitLab CI, Azure DevOps, and Bitbucket; outside CI, pass --repo, --run-id, and --change-id (the repository also falls back to the checkout's git remote). Artifacts written by older Lexega versions carry no run scope and count in both views until those pipelines upgrade.
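For runs outside CI, the three identity flags can be assembled by a small wrapper. The sketch below is only an illustration: the helper name identity_flags, the owner/name form passed to --repo, and the idea that any unique string works as a run ID are assumptions, not documented Lexega formats.

```shell
# Hypothetical helper: print the identity flags for a run outside CI.
# Assumptions (not from the Lexega docs): --repo takes an "owner/name"
# string, and any unique string is acceptable as a run ID.
identity_flags() {
  repo="$1"        # e.g. my-org/my-repo
  run_id="$2"      # e.g. local-20250101-120000
  change_id="$3"   # PR/MR number; pass "" when there is no change to attribute
  printf '%s %s %s %s' --repo "$repo" --run-id "$run_id"
  if [ -n "$change_id" ]; then
    printf ' %s %s' --change-id "$change_id"
  fi
  printf '\n'
}

# Example: a timestamped run ID for a local review of change 42.
identity_flags my-org/my-repo "local-$(date +%Y%m%d-%H%M%S)" 42
```

The printed flags can then be appended to a review or analyze invocation. Whether mainline runs should omit --change-id entirely is a guess made for this sketch.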
How It Works
Lexega delegates cloud I/O to the provider's own CLI:
| URI scheme | Tool invoked | Example |
|---|---|---|
| s3://bucket/path | aws s3 cp | s3://my-bucket/lexega-data/decisions/123/ |
| gs://bucket/path | gsutil cp | gs://my-bucket/lexega-data/decisions/123/ |
| az://container/path | az storage blob | az://my-container/lexega-data/decisions/123/ |
If you can run aws s3 cp (or gsutil cp, or az storage blob upload) in a shell on the runner, Lexega can write artifacts there — no separate credential configuration. GitHub-hosted Ubuntu runners preinstall the AWS and Azure CLIs; the Google Cloud CLI was dropped from the ubuntu-24.04 image (today's ubuntu-latest), so the GCS workflow below installs it with setup-gcloud. On self-hosted or container-based runners, install the CLI you need.
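A preflight step can catch a missing CLI before Lexega runs. The following is a minimal sketch built on the scheme-to-tool mapping in the table above; the function names tool_for_uri and check_cli_for_uri are hypothetical, and the check only confirms that the binary is on PATH, not that it is authenticated.

```shell
# Map an artifact URI to the provider CLI listed for its scheme.
tool_for_uri() {
  case "$1" in
    s3://*) echo aws ;;
    gs://*) echo gsutil ;;
    az://*) echo az ;;
    *)      echo "unsupported URI: $1" >&2; return 1 ;;
  esac
}

# Preflight: succeed only if the CLI for this URI is on PATH.
check_cli_for_uri() {
  tool=$(tool_for_uri "$1") || return 1
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool found for $1"
  else
    echo "missing: install $tool before writing to $1" >&2
    return 1
  fi
}

# Example: report on all three schemes without aborting on a missing CLI.
for uri in s3://my-bucket/lexega-data gs://my-bucket/lexega-data az://my-container/lexega-data; do
  check_cli_for_uri "$uri" || true
done
```

Running this as the first step of a job turns a confusing upload failure into an explicit "install the CLI" message.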
For az:// URIs the storage account is not part of the URI. The Azure CLI resolves it from environment variables: set AZURE_STORAGE_ACCOUNT (account name) and either AZURE_STORAGE_AUTH_MODE=login (use the signed-in identity) or AZURE_STORAGE_KEY / AZURE_STORAGE_CONNECTION_STRING.
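The environment rules above can be checked before any az:// upload is attempted. This is a sketch under the assumptions stated in that paragraph (an account name plus one of the three auth settings); the function name check_azure_storage_env is hypothetical.

```shell
# Check the environment the Azure CLI uses to resolve an az:// URI.
# Returns 0 when an account name plus one auth method is present.
check_azure_storage_env() {
  if [ -z "${AZURE_STORAGE_ACCOUNT:-}" ]; then
    echo "AZURE_STORAGE_ACCOUNT is not set" >&2
    return 1
  fi
  if [ "${AZURE_STORAGE_AUTH_MODE:-}" = "login" ]; then
    echo "auth: signed-in identity"
  elif [ -n "${AZURE_STORAGE_KEY:-}" ]; then
    echo "auth: account key"
  elif [ -n "${AZURE_STORAGE_CONNECTION_STRING:-}" ]; then
    echo "auth: connection string"
  else
    echo "no auth method set for account $AZURE_STORAGE_ACCOUNT" >&2
    return 1
  fi
}

# Example: warn instead of failing when the variables are missing.
check_azure_storage_env || echo "set the variables above before using az:// URIs"
```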
All three providers use the same directory contract the dashboard expects:
<prefix>/
  decisions/<run-id>/decision.json
  reports/<run-id>/risk_report.json
Use a unique per-run directory (e.g. $GITHUB_RUN_ID) so each pipeline run produces distinct artifacts. A policy block exits with code 2, which fails the CI job — no extra gating logic needed. Artifacts are uploaded before the exit code is decided, so blocked runs appear in the dashboard too (those are usually the ones you most want to see).
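The per-run layout and the exit-code convention can be captured in two small helpers. This is a sketch, not part of Lexega: artifact_uri and describe_exit are hypothetical names, and only exit code 2 is taken from the text above. Treating 0 as "allowed" and every other code as a tool error is an assumption.

```shell
# Build the per-run artifact directory for a stream ("decisions" or "reports").
artifact_uri() {
  prefix="${1%/}"   # strip one trailing slash so the join stays clean
  stream="$2"
  run_id="$3"
  echo "$prefix/$stream/$run_id/"
}

# Interpret the exit status of a lexega-sql run.
# Only code 2 (policy block) comes from the documentation; the other
# two branches are assumptions made for this sketch.
describe_exit() {
  case "$1" in
    0) echo "allowed" ;;
    2) echo "blocked by policy" ;;
    *) echo "error (exit $1)" ;;
  esac
}

artifact_uri s3://my-bucket/lexega-data/ decisions 123
describe_exit 2
```

In a workflow step, the first helper would feed the flags directly, for example --decision-out "$(artifact_uri s3://my-bucket/lexega-data decisions "$GITHUB_RUN_ID")".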
The workflows below assume your repo has SQL under models/ and a committed .lexega/policy.yml (created by lexega-sql init — see Quick Start). Adjust both paths to your layout.
Centralized Policy (optional)
Because --policy resolves URIs the same way, the policy doesn't have to live in the repo at all. Host it in the bucket and every pipeline pulls the same gate at run time — useful when a security team owns the policy and application repos shouldn't be able to edit their own enforcement:
lexega-sql analyze models/ -r \
--policy s3://my-bucket/lexega-data/policy.yml --env prod \
--decision-out s3://my-bucket/lexega-data/decisions/$GITHUB_RUN_ID/ \
--report-out s3://my-bucket/lexega-data/reports/$GITHUB_RUN_ID/
If you do this, the CI identity needs read access to the policy object. The S3 role below grants only s3:PutObject, so add s3:GetObject on my-bucket/lexega-data/policy.yml; the roles used in the GCS and Azure setups (roles/storage.objectAdmin and Storage Blob Data Contributor) already include read. An https:// pre-signed URL works as well.
AWS S3
One-time setup
Create the bucket and a role that GitHub Actions can assume via OIDC (no long-lived keys stored in CI):
aws s3 mb s3://my-bucket
# Register GitHub as an OIDC provider (skip if your account already has it)
aws iam create-open-id-connect-provider \
--url https://token.actions.githubusercontent.com \
--client-id-list sts.amazonaws.com
Create a role lexega-ci-writer that your repo's workflows can assume (replace account ID, org, and repo):
cat > trust.json <<'EOF'
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": { "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com" },
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": { "token.actions.githubusercontent.com:aud": "sts.amazonaws.com" },
"StringLike": { "token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:*" }
}
}]
}
EOF
aws iam create-role --role-name lexega-ci-writer \
--assume-role-policy-document file://trust.json
# CI only needs to write
cat > write.json <<'EOF'
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::my-bucket/lexega-data/*"
}]
}
EOF
aws iam put-role-policy --role-name lexega-ci-writer \
--policy-name lexega-artifact-write --policy-document file://write.json
Workflow
name: SQL Governance
on:
  pull_request:
  push:
    branches: [main]
permissions:
  id-token: write # required for OIDC
  contents: read
jobs:
  lexega:
    runs-on: ubuntu-latest
    env:
      LEXEGA_LICENSE_KEY: ${{ secrets.LEXEGA_LICENSE_KEY }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # review diffs against the PR base
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/lexega-ci-writer
          aws-region: us-east-1
      - name: Install Lexega
        run: |
          curl -sSL -O https://github.com/Lexega/releases/releases/download/v1.11.0/lexega-sql-linux-x64
          curl -sSL https://github.com/Lexega/releases/releases/download/v1.11.0/CHECKSUMS.sha256 | grep ' lexega-sql-linux-x64$' | sha256sum -c -
          sudo install -m 755 lexega-sql-linux-x64 /usr/local/bin/lexega-sql
      - name: Review the change (PR gate)
        if: github.event_name == 'pull_request'
        run: |
          lexega-sql review ${{ github.event.pull_request.base.sha }}..${{ github.sha }} . -r \
            --policy .lexega/policy.yml --env prod \
            --decision-out s3://my-bucket/lexega-data/decisions/$GITHUB_RUN_ID/ \
            --report-out s3://my-bucket/lexega-data/reports/$GITHUB_RUN_ID/
      - name: Snapshot the repo (merges to main)
        if: github.event_name == 'push'
        run: |
          lexega-sql analyze models/ -r \
            --policy .lexega/policy.yml --env prod \
            --decision-out s3://my-bucket/lexega-data/decisions/$GITHUB_RUN_ID/ \
            --report-out s3://my-bucket/lexega-data/reports/$GITHUB_RUN_ID/
View the dashboard
From your machine, with read access to the bucket (s3:GetObject + s3:ListBucket):
aws sso login # or any other aws auth method
lexega-sql dashboard --data-dir s3://my-bucket/lexega-data
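The two read permissions apply to different resources: s3:GetObject is granted on object ARNs, while s3:ListBucket is granted on the bucket ARN itself. A minimal policy sketch, written the same way as write.json above, might look like this; the file name read.json is an arbitrary choice, and the policy should be attached to whichever IAM identity runs the dashboard.

```shell
# Write a read-only policy for the dashboard side.
# GetObject targets objects under the prefix; ListBucket targets the bucket.
cat > read.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/lexega-data/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-bucket"
    }
  ]
}
EOF
echo "wrote read.json"
```

Putting both actions on the object ARN is a common mistake: the listing call then fails even though individual downloads succeed.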
Google Cloud Storage
One-time setup
Create the bucket, a service account, and a Workload Identity Federation pool so GitHub Actions can authenticate without exported keys (replace PROJECT_ID, PROJECT_NUMBER, my-org/my-repo):
gsutil mb gs://my-bucket
gcloud iam service-accounts create lexega-ci
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
--member="serviceAccount:lexega-ci@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/storage.objectAdmin"
gcloud iam workload-identity-pools create github --location=global
gcloud iam workload-identity-pools providers create-oidc github-actions \
--location=global --workload-identity-pool=github \
--issuer-uri="https://token.actions.githubusercontent.com" \
--attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
--attribute-condition="assertion.repository=='my-org/my-repo'"
gcloud iam service-accounts add-iam-policy-binding \
lexega-ci@PROJECT_ID.iam.gserviceaccount.com \
--role="roles/iam.workloadIdentityUser" \
--member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github/attribute.repository/my-org/my-repo"
Workflow
name: SQL Governance
on:
  pull_request:
  push:
    branches: [main]
permissions:
  id-token: write # required for Workload Identity Federation
  contents: read
jobs:
  lexega:
    runs-on: ubuntu-latest
    env:
      LEXEGA_LICENSE_KEY: ${{ secrets.LEXEGA_LICENSE_KEY }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # review diffs against the PR base
      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github/providers/github-actions
          service_account: lexega-ci@PROJECT_ID.iam.gserviceaccount.com
      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@v2
      - name: Install Lexega
        run: |
          curl -sSL -O https://github.com/Lexega/releases/releases/download/v1.11.0/lexega-sql-linux-x64
          curl -sSL https://github.com/Lexega/releases/releases/download/v1.11.0/CHECKSUMS.sha256 | grep ' lexega-sql-linux-x64$' | sha256sum -c -
          sudo install -m 755 lexega-sql-linux-x64 /usr/local/bin/lexega-sql
      - name: Review the change (PR gate)
        if: github.event_name == 'pull_request'
        run: |
          lexega-sql review ${{ github.event.pull_request.base.sha }}..${{ github.sha }} . -r \
            --policy .lexega/policy.yml --env prod \
            --decision-out gs://my-bucket/lexega-data/decisions/$GITHUB_RUN_ID/ \
            --report-out gs://my-bucket/lexega-data/reports/$GITHUB_RUN_ID/
      - name: Snapshot the repo (merges to main)
        if: github.event_name == 'push'
        run: |
          lexega-sql analyze models/ -r \
            --policy .lexega/policy.yml --env prod \
            --decision-out gs://my-bucket/lexega-data/decisions/$GITHUB_RUN_ID/ \
            --report-out gs://my-bucket/lexega-data/reports/$GITHUB_RUN_ID/
View the dashboard
From your machine, with roles/storage.objectViewer on the bucket:
gcloud auth login
lexega-sql dashboard --data-dir gs://my-bucket/lexega-data
Azure Blob Storage
One-time setup
Create a storage account and container, then an app registration with a federated credential so GitHub Actions can sign in via OIDC:
az group create -n lexega-rg -l eastus
az storage account create -n mylexegastore -g lexega-rg # account names are globally unique
# --auth-mode key for the one-time setup: subscription Owner/Contributor can list
# account keys, but does NOT implicitly hold data-plane roles, so --auth-mode login
# fails here unless you've granted yourself a Storage Blob Data role
az storage container create -n my-container \
--account-name mylexegastore --auth-mode key
# App registration + service principal for CI
APP_ID=$(az ad app create --display-name lexega-ci --query appId -o tsv)
az ad sp create --id $APP_ID
# Allow it to write blobs (Contributor includes read; use it for CI)
az role assignment create --assignee $APP_ID \
--role "Storage Blob Data Contributor" \
--scope $(az storage account show -n mylexegastore -g lexega-rg --query id -o tsv)
# Trust GitHub Actions OIDC tokens from your repo. Federated credentials
# match the OIDC subject exactly, so the two triggers need one credential each:
az ad app federated-credential create --id $APP_ID --parameters '{
"name": "github-main",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:my-org/my-repo:ref:refs/heads/main",
"audiences": ["api://AzureADTokenExchange"]
}'
az ad app federated-credential create --id $APP_ID --parameters '{
"name": "github-pr",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:my-org/my-repo:pull_request",
"audiences": ["api://AzureADTokenExchange"]
}'
(The AWS and GCS setups need no equivalent second step — their trust conditions match on the whole repo, which covers both triggers.)
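Because the match is exact, it helps to see which subject each trigger produces. The helper below is a sketch (oidc_subject is a hypothetical name) that prints the subject claim for the two triggers used on this page. Jobs that reference a GitHub environment get a different subject and are not covered here.

```shell
# Print the GitHub Actions OIDC subject claim for a workflow trigger.
# Covers only push-to-branch and pull_request, the two triggers used here.
oidc_subject() {
  repo="$1"    # owner/name
  event="$2"   # push | pull_request
  branch="$3"  # used for push only
  case "$event" in
    pull_request) echo "repo:$repo:pull_request" ;;
    push)         echo "repo:$repo:ref:refs/heads/$branch" ;;
    *)            echo "unhandled event: $event" >&2; return 1 ;;
  esac
}

oidc_subject my-org/my-repo push main
oidc_subject my-org/my-repo pull_request
```

The two printed values are the "subject" fields of the github-main and github-pr credentials created above. A workflow that also runs on another branch would need a third credential for that branch's subject.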
Workflow
az:// URIs name only the container and blob path; the storage account and auth mode come from environment variables:
name: SQL Governance
on:
  pull_request:
  push:
    branches: [main]
permissions:
  id-token: write # required for OIDC
  contents: read
jobs:
  lexega:
    runs-on: ubuntu-latest
    env:
      LEXEGA_LICENSE_KEY: ${{ secrets.LEXEGA_LICENSE_KEY }}
      AZURE_STORAGE_ACCOUNT: mylexegastore
      AZURE_STORAGE_AUTH_MODE: login
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # review diffs against the PR base
      - name: Azure login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Install Lexega
        run: |
          curl -sSL -O https://github.com/Lexega/releases/releases/download/v1.11.0/lexega-sql-linux-x64
          curl -sSL https://github.com/Lexega/releases/releases/download/v1.11.0/CHECKSUMS.sha256 | grep ' lexega-sql-linux-x64$' | sha256sum -c -
          sudo install -m 755 lexega-sql-linux-x64 /usr/local/bin/lexega-sql
      - name: Review the change (PR gate)
        if: github.event_name == 'pull_request'
        run: |
          lexega-sql review ${{ github.event.pull_request.base.sha }}..${{ github.sha }} . -r \
            --policy .lexega/policy.yml --env prod \
            --decision-out az://my-container/lexega-data/decisions/$GITHUB_RUN_ID/ \
            --report-out az://my-container/lexega-data/reports/$GITHUB_RUN_ID/
      - name: Snapshot the repo (merges to main)
        if: github.event_name == 'push'
        run: |
          lexega-sql analyze models/ -r \
            --policy .lexega/policy.yml --env prod \
            --decision-out az://my-container/lexega-data/decisions/$GITHUB_RUN_ID/ \
            --report-out az://my-container/lexega-data/reports/$GITHUB_RUN_ID/
AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_SUBSCRIPTION_ID are identifiers, not secrets, but repository secrets are a convenient place to keep them out of the workflow file.
View the dashboard
From your machine, with the "Storage Blob Data Reader" role on the storage account:
az login
export AZURE_STORAGE_ACCOUNT=mylexegastore
export AZURE_STORAGE_AUTH_MODE=login
lexega-sql dashboard --data-dir az://my-container/lexega-data
Permissions Summary
CI only writes; the dashboard only reads. Grant each side the minimum:
| Provider | CI (writer) | Dashboard (reader) |
|---|---|---|
| S3 | s3:PutObject on the prefix | s3:GetObject + s3:ListBucket |
| GCS | roles/storage.objectAdmin on the bucket | roles/storage.objectViewer |
| Azure | Storage Blob Data Contributor | Storage Blob Data Reader |
Other CI Systems
Nothing above is GitHub-specific except the authentication step. The invariant is: any runner where the provider CLI is installed and authenticated can write Lexega artifacts. Swap $GITHUB_RUN_ID for your platform's run identifier ($CI_PIPELINE_ID on GitLab, $(Build.BuildId) on Azure DevOps).
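A script shared across CI systems can resolve the run identifier once and reuse it. The sketch below checks the variables named above; ci_run_id is a hypothetical helper, BUILD_BUILDID is the environment-variable form in which Azure DevOps exposes $(Build.BuildId) to scripts, and the local-<timestamp> fallback is an assumption for runs outside CI.

```shell
# Pick the run identifier from whichever CI variable is set.
# Falls back to a timestamp so local runs still get a distinct directory.
ci_run_id() {
  if [ -n "${GITHUB_RUN_ID:-}" ]; then
    echo "$GITHUB_RUN_ID"
  elif [ -n "${CI_PIPELINE_ID:-}" ]; then
    echo "$CI_PIPELINE_ID"
  elif [ -n "${BUILD_BUILDID:-}" ]; then
    echo "$BUILD_BUILDID"
  else
    echo "local-$(date +%Y%m%d%H%M%S)"
  fi
}

RUN_ID=$(ci_run_id)
echo "decisions go to s3://my-bucket/lexega-data/decisions/$RUN_ID/"
```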
GitLab CI example writing to S3. Store AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION as protected CI/CD variables — GitLab injects them as environment variables, which is exactly where the AWS CLI looks (prefer your platform's OIDC federation where you have it configured):
.lexega:
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""] # the image's entrypoint is `aws` itself; reset it so script lines run
  variables:
    LEXEGA_LICENSE_KEY: $LEXEGA_LICENSE_KEY
  before_script:
    - curl -sSL -O https://github.com/Lexega/releases/releases/download/v1.11.0/lexega-sql-linux-x64
    - curl -sSL https://github.com/Lexega/releases/releases/download/v1.11.0/CHECKSUMS.sha256 | grep ' lexega-sql-linux-x64$' | sha256sum -c -
    - install -m 755 lexega-sql-linux-x64 /usr/local/bin/lexega-sql
sql-review:
  extends: .lexega
  rules:
    - if: $CI_MERGE_REQUEST_IID
  variables:
    GIT_DEPTH: "0" # review needs history back to the merge base
  before_script:
    - dnf install -y -q git # review shells out to git; the aws-cli image doesn't bundle it
    - !reference [.lexega, before_script]
  script:
    - lexega-sql review $CI_MERGE_REQUEST_DIFF_BASE_SHA..HEAD . -r
        --policy .lexega/policy.yml --env prod
        --decision-out s3://my-bucket/lexega-data/decisions/$CI_PIPELINE_ID/
        --report-out s3://my-bucket/lexega-data/reports/$CI_PIPELINE_ID/
sql-snapshot:
  extends: .lexega
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - lexega-sql analyze models/ -r
        --policy .lexega/policy.yml --env prod
        --decision-out s3://my-bucket/lexega-data/decisions/$CI_PIPELINE_ID/
        --report-out s3://my-bucket/lexega-data/reports/$CI_PIPELINE_ID/
To also post the review to the PR or MR itself (--pr-comment), or feed code scanning (SARIF, GitLab SAST), see Integration Options — both compose with cloud artifact publishing.