Cloud Storage Pipelines

Name: Lexega
Author: Lexega

CI runners are ephemeral: anything written to the runner's disk disappears when the job ends. If your pipeline writes decision and report artifacts to a local .lexega/ directory, there is nothing left for the dashboard to read once the job completes.

The fix is to write artifacts to cloud storage instead. The artifact flags — --decision-out and --report-out on ci/analyze/diff/review, and --data-dir on dashboard — accept an s3://, gs://, or az:// URI in place of a local path. So do the config inputs (--policy, --exceptions, --custom-rules, catalog snapshots), which additionally accept https:// URLs for pre-signed or public endpoints. Only the SQL paths themselves are local.

CI run 1 ──▶ s3://my-bucket/lexega-data/decisions/<run-1>/
CI run 2 ──▶ s3://my-bucket/lexega-data/decisions/<run-2>/   ──▶  lexega-sql dashboard
CI run 3 ──▶ s3://my-bucket/lexega-data/decisions/<run-3>/        --data-dir s3://my-bucket/lexega-data

Each run appends to the same bucket prefix; the dashboard reads the whole history from your machine. This page walks through the full setup for each provider — bucket creation, CI authentication, the workflow itself, and reading it back.

Two Streams, One Bucket

A complete history takes two kinds of runs, and they answer different questions:

PR runs (review) gate each change before it merges. Their decision artifacts are the record of what was caught, including changes that were blocked and therefore never reached main. If artifacts are published only from main, every catch is lost with the runner that found it.
Mainline runs (analyze) snapshot the repo's full SQL after each merge — what's actually live, tracked over time.

Both streams write to the same bucket prefix. Artifacts record which kind of run produced them, and the dashboard keeps the two series separate — blocked/allowed rates come from PR runs, posture trends from mainline snapshots. You don't pick the mode yourself: the workflows below run one lexega-sql ci step under two triggers, and ci selects it from the pipeline event — pull requests get a change review against the detected base, pushes to main get a repository snapshot. (It prints which mode it chose; --snapshot / --range override.)

One caveat for public repos: pull requests from forks don't receive OIDC tokens or secrets, so the PR stream assumes branches in the same repo — the norm for internal data platform repos.

The identity that powers this grouping — repository, CI run, PR number, and commit — is detected automatically from the CI environment on GitHub Actions, GitLab CI, Azure DevOps, and Bitbucket; outside CI, pass --repo, --run-id, --change-id, and --commit (the repository also falls back to the checkout's git remote). Artifacts written by older Lexega versions carry no run scope and count in both views until those pipelines upgrade.

How It Works

Lexega delegates cloud I/O to the provider's own CLI:

URI scheme	Tool invoked	Example
`s3://bucket/path`	`aws s3 cp`	`s3://my-bucket/lexega-data/decisions/123/`
`gs://bucket/path`	`gsutil cp`	`gs://my-bucket/lexega-data/decisions/123/`
`az://container/path`	`az storage blob`	`az://my-container/lexega-data/decisions/123/`

If you can run aws s3 cp (or gsutil cp, or az storage blob upload) in a shell on the runner, Lexega can write artifacts there — no separate credential configuration. GitHub-hosted Ubuntu runners preinstall the AWS and Azure CLIs; the Google Cloud CLI was dropped from the ubuntu-24.04 image (today's ubuntu-latest), so the GCS workflow below installs it with setup-gcloud. On self-hosted or container-based runners, install the CLI you need.
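The scheme-to-CLI mapping in the table above can be turned into a quick preflight step. The sketch below is illustrative only: the function names tool_for_uri and preflight are hypothetical helpers, not part of Lexega, and the check confirms only that the CLI is on PATH, not that it is authenticated.

```shell
# Map an artifact URI to the provider CLI listed in the table above.
# Anything without a cloud scheme needs no CLI and maps to "none".
tool_for_uri() {
  case "$1" in
    s3://*) echo "aws" ;;
    gs://*) echo "gsutil" ;;
    az://*) echo "az" ;;
    *)      echo "none" ;;
  esac
}

# Succeed if the CLI required for the given URI is on PATH.
preflight() {
  tool=$(tool_for_uri "$1")
  if [ "$tool" = "none" ] || command -v "$tool" >/dev/null 2>&1; then
    return 0
  fi
  echo "no '$tool' CLI on PATH; needed for $1" >&2
  return 1
}
```

Calling preflight with the URI you plan to pass to --decision-out, in a step before the gate, surfaces a missing CLI as a clear message rather than a failed upload.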

For az:// URIs the storage account is not part of the URI. The Azure CLI resolves it from environment variables: set AZURE_STORAGE_ACCOUNT (account name) and either AZURE_STORAGE_AUTH_MODE=login (use the signed-in identity) or AZURE_STORAGE_KEY / AZURE_STORAGE_CONNECTION_STRING.
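A missing variable is an easy mistake to make with az:// URIs, so a small guard can help. This is a minimal sketch that follows the rule stated above (account name plus one of the three auth settings); check_azure_storage_env is a hypothetical helper name, and the check only inspects the environment without contacting Azure.

```shell
# Verify the environment variables the Azure CLI uses to resolve az:// targets.
check_azure_storage_env() {
  if [ -z "${AZURE_STORAGE_ACCOUNT:-}" ]; then
    echo "AZURE_STORAGE_ACCOUNT is not set" >&2
    return 1
  fi
  # One of: signed-in identity, account key, or connection string.
  if [ "${AZURE_STORAGE_AUTH_MODE:-}" = "login" ] \
     || [ -n "${AZURE_STORAGE_KEY:-}" ] \
     || [ -n "${AZURE_STORAGE_CONNECTION_STRING:-}" ]; then
    return 0
  fi
  echo "set AZURE_STORAGE_AUTH_MODE=login, AZURE_STORAGE_KEY, or AZURE_STORAGE_CONNECTION_STRING" >&2
  return 1
}
```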

All three providers use the same directory contract the dashboard expects:

<prefix>/
  decisions/<run-id>/decision.json
  reports/<run-id>/risk_report.json
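The layout above can be derived from a single prefix and a run identifier, which keeps the two output flags consistent with each other. The sketch below assumes a hypothetical helper named artifact_uri; the "local-test" fallback is purely illustrative for runs outside GitHub Actions.

```shell
# Build one artifact directory URI: <prefix>/<kind>/<run-id>/
# $1 = prefix, $2 = kind (decisions or reports), $3 = run ID
artifact_uri() {
  # ${1%/} drops a single trailing slash so the result never contains "//"
  printf '%s/%s/%s/\n' "${1%/}" "$2" "$3"
}

PREFIX="s3://my-bucket/lexega-data"
RUN_ID="${GITHUB_RUN_ID:-local-test}"   # fallback value is illustrative

DECISION_OUT=$(artifact_uri "$PREFIX" decisions "$RUN_ID")
REPORT_OUT=$(artifact_uri "$PREFIX" reports "$RUN_ID")
echo "$DECISION_OUT"
echo "$REPORT_OUT"
```

The two variables can then be passed to --decision-out and --report-out, and the same PREFIX value is what the dashboard takes as --data-dir.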

Use a unique per-run directory (e.g. $GITHUB_RUN_ID) so each pipeline run produces distinct artifacts. A policy block exits with code 2, which fails the CI job — no extra gating logic needed. Artifacts are uploaded before the exit code is decided, so blocked runs appear in the dashboard too (those are usually the ones you most want to see).
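The exit code alone is enough to gate the job. If you also want the job log to distinguish a policy block from other failures, an optional wrapper along these lines is one way to do it. This is a sketch under assumptions: only the meaning of code 2 comes from this page, treating every other non-zero code as a tool or infrastructure error is a guess, and classify_exit and run_gate are hypothetical names.

```shell
# Translate an exit status into a label for the job log.
classify_exit() {
  case "$1" in
    0) echo "allowed" ;;
    2) echo "blocked" ;;   # policy block, as described above
    *) echo "error" ;;     # assumption: any other code is not a policy decision
  esac
}

# Run a command, log the outcome, and return the command's own exit status.
run_gate() {
  status=0
  "$@" || status=$?        # "||" keeps this safe under "set -e"
  echo "gate outcome: $(classify_exit "$status") (exit $status)" >&2
  return "$status"
}
```

Usage would be to prefix the existing command, for example run_gate lexega-sql ci --policy .lexega/policy.yml --env prod followed by the output flags. The job still fails on a block because the wrapper returns the original status.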

One thing to set before turning a policy from warn to block under a dual-trigger workflow: scope the blocking entries with run_scopes: [change, runtime], so pull requests are gated but the post-merge snapshot records standing findings without turning every push to main red. See Gate Changes, Record Snapshots.

The workflows below assume a committed .lexega/policy.yml (created by lexega-sql init — see Quick Start). ci scans the whole tree for SQL by default; pass paths before the options to narrow it (lexega-sql ci models/ --policy …).

Centralized Policy (optional)

Because --policy resolves URIs the same way, the policy doesn't have to live in the repo at all. Host it in the bucket and every pipeline pulls the same gate at run time — useful when a security team owns the policy and application repos shouldn't be able to edit their own enforcement:

lexega-sql ci \
  --policy s3://my-bucket/lexega-data/policy.yml --env prod \
  --decision-out s3://my-bucket/lexega-data/decisions/$GITHUB_RUN_ID/ \
  --report-out s3://my-bucket/lexega-data/reports/$GITHUB_RUN_ID/

If you do this, grant the CI identity read access to the policy object (e.g. s3:GetObject on my-bucket/lexega-data/policy.yml), because the S3 and GCS setups below are deliberately write-only (the Azure role used below already includes read). An https:// pre-signed URL works as well.
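For S3, the read grant can be a second inline policy on the CI role. The sketch below reuses the bucket, key, and lexega-ci-writer role name from this page's examples; the file name policy-read.json, the policy name lexega-policy-read, and the function attach_policy_read are illustrative choices, not requirements.

```shell
# Read-only statement scoped to the single policy object.
cat > policy-read.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-bucket/lexega-data/policy.yml"
  }]
}
EOF

# Attach the statement to the CI role. Defined but not called here,
# because it needs credentials that are allowed to edit IAM.
attach_policy_read() {
  aws iam put-role-policy --role-name lexega-ci-writer \
    --policy-name lexega-policy-read --policy-document file://policy-read.json
}
```

Scoping the resource to the one object keeps the CI identity unable to read existing decision or report artifacts.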

AWS S3

One-time setup

Create the bucket and a role that GitHub Actions can assume via OIDC (no long-lived keys stored in CI):

aws s3 mb s3://my-bucket

# Register GitHub as an OIDC provider (skip if your account already has it)
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com

Create a role lexega-ci-writer that your repo's workflows can assume (replace account ID, org, and repo):

cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com" },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": { "token.actions.githubusercontent.com:aud": "sts.amazonaws.com" },
      "StringLike": { "token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:*" }
    }
  }]
}
EOF
aws iam create-role --role-name lexega-ci-writer \
  --assume-role-policy-document file://trust.json

# CI only needs to write
cat > write.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::my-bucket/lexega-data/*"
  }]
}
EOF
aws iam put-role-policy --role-name lexega-ci-writer \
  --policy-name lexega-artifact-write --policy-document file://write.json

Workflow

name: SQL Governance
on:
  pull_request:
  push:
    branches: [main]

permissions:
  id-token: write   # required for OIDC
  contents: read

jobs:
  lexega:
    runs-on: ubuntu-latest
    env:
      LEXEGA_LICENSE_KEY: ${{ secrets.LEXEGA_LICENSE_KEY }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # review diffs against the PR base

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/lexega-ci-writer
          aws-region: us-east-1

      - name: Install Lexega
        run: |
          curl -sSL -O https://github.com/Lexega/releases/releases/download/v1.23.0/lexega-sql-linux-x64
          curl -sSL https://github.com/Lexega/releases/releases/download/v1.23.0/CHECKSUMS.sha256 | grep ' lexega-sql-linux-x64$' | sha256sum -c -
          sudo install -m 755 lexega-sql-linux-x64 /usr/local/bin/lexega-sql

      - name: SQL governance gate
        run: |
          lexega-sql ci \
            --policy .lexega/policy.yml --env prod \
            --decision-out s3://my-bucket/lexega-data/decisions/$GITHUB_RUN_ID/ \
            --report-out s3://my-bucket/lexega-data/reports/$GITHUB_RUN_ID/

View the dashboard

From your machine, with read access to the bucket (s3:GetObject + s3:ListBucket):

aws sso login   # or any other aws auth method
lexega-sql dashboard --data-dir s3://my-bucket/lexega-data
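The two read permissions apply to different resources: s3:ListBucket is granted on the bucket itself, while s3:GetObject is granted on the objects under the prefix. A minimal policy document along those lines is sketched below. The file name read.json is illustrative, and how you attach it (IAM user, role, or SSO permission set) depends on your account setup, so that step is left out.

```shell
# Reader policy for the dashboard: list the bucket, read objects under the prefix.
cat > read.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/lexega-data/*"
    }
  ]
}
EOF
```

If the bucket holds other data, an s3:prefix condition on the ListBucket statement can narrow listing to the lexega-data prefix. It is omitted here because the exact listing requests the dashboard makes are not documented on this page.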

Google Cloud Storage

One-time setup

Create the bucket, a service account, and a Workload Identity Federation pool so GitHub Actions can authenticate without exported keys (replace PROJECT_ID, PROJECT_NUMBER, my-org/my-repo):

gsutil mb gs://my-bucket

gcloud iam service-accounts create lexega-ci
# objectCreator is write-only — runs use unique per-run paths, so CI never
# needs to read or overwrite existing artifacts
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member="serviceAccount:lexega-ci@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"

gcloud iam workload-identity-pools create github --location=global
gcloud iam workload-identity-pools providers create-oidc github-actions \
  --location=global --workload-identity-pool=github \
  --issuer-uri="https://token.actions.githubusercontent.com" \
  --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
  --attribute-condition="assertion.repository=='my-org/my-repo'"

gcloud iam service-accounts add-iam-policy-binding \
  lexega-ci@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github/attribute.repository/my-org/my-repo"

Workflow

name: SQL Governance
on:
  pull_request:
  push:
    branches: [main]

permissions:
  id-token: write   # required for Workload Identity Federation
  contents: read

jobs:
  lexega:
    runs-on: ubuntu-latest
    env:
      LEXEGA_LICENSE_KEY: ${{ secrets.LEXEGA_LICENSE_KEY }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # review diffs against the PR base

      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github/providers/github-actions
          service_account: lexega-ci@PROJECT_ID.iam.gserviceaccount.com

      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@v2

      - name: Install Lexega
        run: |
          curl -sSL -O https://github.com/Lexega/releases/releases/download/v1.23.0/lexega-sql-linux-x64
          curl -sSL https://github.com/Lexega/releases/releases/download/v1.23.0/CHECKSUMS.sha256 | grep ' lexega-sql-linux-x64$' | sha256sum -c -
          sudo install -m 755 lexega-sql-linux-x64 /usr/local/bin/lexega-sql

      - name: SQL governance gate
        run: |
          lexega-sql ci \
            --policy .lexega/policy.yml --env prod \
            --decision-out gs://my-bucket/lexega-data/decisions/$GITHUB_RUN_ID/ \
            --report-out gs://my-bucket/lexega-data/reports/$GITHUB_RUN_ID/

View the dashboard

From your machine, with roles/storage.objectViewer on the bucket:

gcloud auth login
lexega-sql dashboard --data-dir gs://my-bucket/lexega-data

Azure Blob Storage

One-time setup

Create a storage account and container, then an app registration with a federated credential so GitHub Actions can sign in via OIDC:

az group create -n lexega-rg -l eastus
az storage account create -n mylexegastore -g lexega-rg   # account names are globally unique
# --auth-mode key for the one-time setup: subscription Owner/Contributor can list
# account keys, but does NOT implicitly hold data-plane roles, so --auth-mode login
# fails here unless you've granted yourself a Storage Blob Data role
az storage container create -n my-container \
  --account-name mylexegastore --auth-mode key

# App registration + service principal for CI
APP_ID=$(az ad app create --display-name lexega-ci --query appId -o tsv)
az ad sp create --id $APP_ID

# Allow it to write blobs (Contributor includes read; use it for CI)
az role assignment create --assignee $APP_ID \
  --role "Storage Blob Data Contributor" \
  --scope $(az storage account show -n mylexegastore -g lexega-rg --query id -o tsv)

# Trust GitHub Actions OIDC tokens from your repo. Federated credentials
# match the OIDC subject exactly, so the two triggers need one credential each:
az ad app federated-credential create --id $APP_ID --parameters '{
  "name": "github-main",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:my-org/my-repo:ref:refs/heads/main",
  "audiences": ["api://AzureADTokenExchange"]
}'
az ad app federated-credential create --id $APP_ID --parameters '{
  "name": "github-pr",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:my-org/my-repo:pull_request",
  "audiences": ["api://AzureADTokenExchange"]
}'

(The AWS and GCS setups need no equivalent second step — their trust conditions match on the whole repo, which covers both triggers.)

Workflow

az:// URIs name only the container and blob path; the storage account and auth mode come from environment variables:

name: SQL Governance
on:
  pull_request:
  push:
    branches: [main]

permissions:
  id-token: write   # required for OIDC
  contents: read

jobs:
  lexega:
    runs-on: ubuntu-latest
    env:
      LEXEGA_LICENSE_KEY: ${{ secrets.LEXEGA_LICENSE_KEY }}
      AZURE_STORAGE_ACCOUNT: mylexegastore
      AZURE_STORAGE_AUTH_MODE: login
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # review diffs against the PR base

      - name: Azure login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Install Lexega
        run: |
          curl -sSL -O https://github.com/Lexega/releases/releases/download/v1.23.0/lexega-sql-linux-x64
          curl -sSL https://github.com/Lexega/releases/releases/download/v1.23.0/CHECKSUMS.sha256 | grep ' lexega-sql-linux-x64$' | sha256sum -c -
          sudo install -m 755 lexega-sql-linux-x64 /usr/local/bin/lexega-sql

      - name: SQL governance gate
        run: |
          lexega-sql ci \
            --policy .lexega/policy.yml --env prod \
            --decision-out az://my-container/lexega-data/decisions/$GITHUB_RUN_ID/ \
            --report-out az://my-container/lexega-data/reports/$GITHUB_RUN_ID/

AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_SUBSCRIPTION_ID are identifiers, not secrets, but repository secrets are a convenient place to keep them out of the workflow file.

View the dashboard

From your machine, with the "Storage Blob Data Reader" role on the storage account:

az login
export AZURE_STORAGE_ACCOUNT=mylexegastore
export AZURE_STORAGE_AUTH_MODE=login
lexega-sql dashboard --data-dir az://my-container/lexega-data

Permissions Summary

CI only writes; the dashboard only reads. Grant each side the minimum:

Provider	CI (writer)	Dashboard (reader)
S3	`s3:PutObject` on the prefix	`s3:GetObject` + `s3:ListBucket`
GCS	`roles/storage.objectCreator` on the bucket	`roles/storage.objectViewer`
Azure	Storage Blob Data Contributor	Storage Blob Data Reader

Other CI Systems

Nothing above is GitHub-specific except the authentication step. The invariant is: any runner where the provider CLI is installed and authenticated can write Lexega artifacts. Swap $GITHUB_RUN_ID for your platform's run identifier ($CI_PIPELINE_ID on GitLab, $(Build.BuildId) on Azure DevOps).
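If one script is shared across platforms, the run identifier can be resolved from whichever variable is present. The sketch below covers the three platforms named above; ci_run_id is a hypothetical helper name. Azure DevOps exposes $(Build.BuildId) to scripts as the environment variable BUILD_BUILDID.

```shell
# Print the current platform's run identifier, or fail if none is set.
ci_run_id() {
  if   [ -n "${GITHUB_RUN_ID:-}" ];  then echo "$GITHUB_RUN_ID"    # GitHub Actions
  elif [ -n "${CI_PIPELINE_ID:-}" ]; then echo "$CI_PIPELINE_ID"   # GitLab CI
  elif [ -n "${BUILD_BUILDID:-}" ];  then echo "$BUILD_BUILDID"    # Azure DevOps
  else
    echo "no known CI run identifier in the environment" >&2
    return 1
  fi
}
```

Failing when nothing is set is deliberate: silently falling back to a fixed value would make separate runs write to the same directory.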

GitLab CI example writing to S3. Store AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION as protected CI/CD variables — GitLab injects them as environment variables, which is exactly where the AWS CLI looks (prefer your platform's OIDC federation where you have it configured):

sql-governance:
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]   # the image's entrypoint is `aws` itself; reset it so script lines run
  rules:
    - if: $CI_MERGE_REQUEST_IID                        # change review
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH      # snapshot
  variables:
    LEXEGA_LICENSE_KEY: $LEXEGA_LICENSE_KEY
    GIT_DEPTH: "0"   # MR reviews need history back to the merge base
  before_script:
    - dnf install -y -q git   # reviews shell out to git; the aws-cli image doesn't bundle it
    - curl -sSL -O https://github.com/Lexega/releases/releases/download/v1.23.0/lexega-sql-linux-x64
    - curl -sSL https://github.com/Lexega/releases/releases/download/v1.23.0/CHECKSUMS.sha256 | grep ' lexega-sql-linux-x64$' | sha256sum -c -
    - install -m 755 lexega-sql-linux-x64 /usr/local/bin/lexega-sql
  script:
    - lexega-sql ci
        --policy .lexega/policy.yml --env prod
        --decision-out s3://my-bucket/lexega-data/decisions/$CI_PIPELINE_ID/
        --report-out s3://my-bucket/lexega-data/reports/$CI_PIPELINE_ID/

To also post the review to the PR or MR itself (--pr-comment), or feed code scanning (SARIF, GitLab SAST), see Integration Options — both compose with cloud artifact publishing.
