RunReveal

Federated Search

Federated Search lets you query data that lives in an external S3-compatible bucket directly, without first ingesting it into RunReveal. It is implemented as a federated view: a kind of Custom View that reads from your bucket at query time through ClickHouse's s3() table function, instead of reading from the runreveal.logs table.

Use it to search cold archives, query large datasets you already store in object storage, or explore public datasets, all from the same Search Explorer and SQL detections you already use.

Getting Started: Navigate to Settings → Custom Views, click Create Custom View, and select External S3-compatible storage as the source.

Internal vs. Federated Views

| | Internal Custom View | Federated View |
|---|---|---|
| Reads from | runreveal.logs (ingested data) | Your external S3-compatible bucket |
| Ingestion | Requires a source + ingestion pipeline | None; data stays in your bucket |
| Storage cost | Stored in the workspace database | Stays in your object storage |
| Best for | Hot, frequently queried data | Cold archives, large/occasional datasets, external data |
| Query latency | Fast (local table) | Depends on bucket size, format, and partitioning |

How It Works

External bucket → Federated view → Search and detections

  1. Files stay in your bucket — NDJSON, CSV, TSV, Parquet, or ORC, with optional compression and path partitioning.
  2. RunReveal builds a federated view — ClickHouse's s3() reads matching objects at query time. NDJSON views use JSON path columns; structured formats use the file's native schema.
  3. You query the view — Use Search Explorer or SQL detections against the workspace-prefixed table name. No data is copied into RunReveal.

| Component | Role |
|---|---|
| S3-compatible bucket | Source of truth for your data |
| Federated view | Virtual table over s3() with your column definitions |
| Search Explorer and SQL detections | Query surfaces, same as internal custom views |

When you query a federated view, ClickHouse reads the matching objects from your bucket on demand, applies your column definitions, and returns rows — nothing is copied into RunReveal. Credentials are encrypted at rest and are never returned in API responses.

Availability

Federated Search is available on Pro and Enterprise plans. Like all custom views, it is not available for workspaces using Bring Your Own Database (BYODB). Contact RunReveal if you don't see the option.

Supported Providers

| Provider | Notes |
|---|---|
| AWS S3 | Virtual-hosted-style addressing; region required |
| Cloudflare R2 | Requires an endpoint URL (e.g. https://{account}.r2.cloudflarestorage.com) |
| Google Cloud Storage (HMAC) | Requires an endpoint URL and HMAC access keys |
| MinIO | Requires an endpoint URL; http is allowed for local development |
| Other S3-compatible | Any provider exposing an S3-compatible API; requires an endpoint URL |

Supported Formats

| Format | How columns are defined |
|---|---|
| NDJSON (one JSON object per line) | You define columns with JSON path expressions. The view exposes rawLog plus your extracted columns. |
| CSV with header row (CSVWithNames) | Schema is read from the file. Columns are auto-populated on save. |
| TSV with header row (TSVWithNames) | Schema is read from the file. Columns are auto-populated on save. |
| Parquet | Native schema is read from the file. Columns are auto-populated on save. |
| ORC | Native schema is read from the file. Columns are auto-populated on save. |

JSON vs. structured formats: For NDJSON, you declare each column with a JSON path and a type (just like an internal custom view). For structured formats (CSV/TSV/Parquet/ORC), RunReveal runs a DESCRIBE against a sample of your files and fills the column list in for you using the file's own schema and native types.
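To make the NDJSON case concrete, here is a minimal Python sketch of what "a JSON path and a type per column" amounts to for a single line. It is an illustration only, not RunReveal's implementation: the dotted-path syntax, the `COLUMNS` mapping, and the helper names are assumptions made for this example (see Column Configuration for the path syntax RunReveal actually accepts).

```python
import json

# Hypothetical column definitions: column name -> (dotted JSON path, type caster).
# The dotted-path syntax is an assumption made for this sketch.
COLUMNS = {
    "actor_email": ("actor.email", str),
    "src_ip": ("src.ip", str),
    "status_code": ("response.status", int),
}

def extract_path(obj, path):
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    current = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def project_line(line, columns=COLUMNS):
    """Turn one NDJSON line into a row: rawLog plus the extracted, typed columns."""
    record = json.loads(line)
    row = {"rawLog": line}
    for name, (path, caster) in columns.items():
        value = extract_path(record, path)
        row[name] = caster(value) if value is not None else None
    return row

sample = '{"actor": {"email": "a@example.com"}, "src": {"ip": "10.0.0.1"}, "response": {"status": "401"}}'
row = project_line(sample)
print(row["actor_email"], row["src_ip"], row["status_code"])  # a@example.com 10.0.0.1 401
```

Structured formats skip this step because the file already carries column names and types.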

Supported compression options: auto-detect (default), none, gzip, zstd, lz4, brotli, and xz.
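ClickHouse's s3() function, by default, infers compression from the file extension. Whether RunReveal's auto-detect behaves identically is an assumption here, but the following sketch shows the general idea. The extension spellings are the conventional ones and the function name is hypothetical.

```python
# Conventional file extensions for the compression options listed above.
# This mapping is an assumption for illustration, not a RunReveal API.
EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".zst": "zstd",
    ".lz4": "lz4",
    ".br": "brotli",
    ".xz": "xz",
}

def guess_compression(object_key):
    """Guess the compression of an object from its key; 'none' if no known suffix."""
    lowered = object_key.lower()
    for extension, name in EXTENSION_TO_COMPRESSION.items():
        if lowered.endswith(extension):
            return name
    return "none"

print(guess_compression("logs/2024/events.json.gz"))
print(guess_compression("logs/2024/events.parquet"))
```

If your files use non-standard extensions, setting the compression explicitly is the safer choice.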

Creating a Federated View

Start a new Custom View

Go to Settings → Custom Views and click Create Custom View.

Select external storage

In the Source dropdown, choose External S3-compatible storage. This reveals the bucket configuration section.

[Screenshot: Custom View form with External S3-compatible storage selected as the source]

Configure the connection

Fill in the provider, bucket, and path settings (see Connection Settings below), then choose your authentication mode.

[Screenshot: External S3-compatible storage configuration form]

Verify and preview

Use the preview/sample action to pull a handful of rows from your bucket. This confirms your connection works and the path matches files.

  • NDJSON: sample rows are used to suggest column mappings.
  • Structured formats: the file schema is read and your columns are populated automatically.

Add or adjust columns

For NDJSON, add columns using JSON paths and pick a type for each. For structured formats, review the auto-detected columns. See Column Configuration for supported types and JSON path syntax.

Save

Save the view. It becomes queryable in the Search Explorer under its workspace-prefixed name.

Connection Settings

| Setting | Description |
|---|---|
| Provider | The S3-compatible provider hosting your data. |
| Bucket | The bucket name (required). |
| Endpoint URL | Required for R2, GCS, MinIO, and Other. Leave blank for AWS S3. Must use https (http is only permitted for MinIO). |
| Region | Required for AWS S3 (e.g. us-east-1). |
| Prefix | Optional key prefix to scope the search (e.g. logs/2024/). |
| Path glob | Optional glob to match files within the prefix (e.g. **/*.json). |
| Format | One of NDJSON, CSV, TSV, Parquet, or ORC. |
| Compression | Auto-detect by default, or set explicitly. |
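The rules in the settings above can be sketched as a small pre-flight check you might run before filling in the form. Everything here is illustrative: the `Connection` class, the provider identifiers, and the path-style URL used for non-AWS providers are assumptions, not RunReveal's API. Only the AWS virtual-hosted-style host name follows the documented AWS convention.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass
class Connection:
    provider: str            # hypothetical identifiers: "aws_s3", "r2", "gcs", "minio", "other"
    bucket: str
    endpoint_url: str = ""
    region: str = ""
    prefix: str = ""
    path_glob: str = ""

def validate(conn):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if not conn.bucket:
        problems.append("bucket is required")
    if conn.provider == "aws_s3":
        if not conn.region:
            problems.append("region is required for AWS S3")
        if conn.endpoint_url:
            problems.append("leave endpoint URL blank for AWS S3")
    elif not conn.endpoint_url:
        problems.append("endpoint URL is required for this provider")
    else:
        scheme = urlparse(conn.endpoint_url).scheme
        if scheme == "http" and conn.provider != "minio":
            problems.append("http endpoints are only permitted for MinIO")
        elif scheme not in ("http", "https"):
            problems.append("endpoint URL must use https")
    return problems

def object_url(conn):
    """Compose the URL pattern for matching objects: base + prefix + glob."""
    if conn.provider == "aws_s3":
        # Virtual-hosted-style addressing: the bucket is part of the host name.
        base = f"https://{conn.bucket}.s3.{conn.region}.amazonaws.com"
    else:
        # Path-style addressing against a custom endpoint (an assumption for this sketch).
        base = f"{conn.endpoint_url.rstrip('/')}/{conn.bucket}"
    return f"{base}/{conn.prefix}{conn.path_glob}"

aws = Connection("aws_s3", "my-bucket", region="us-east-1",
                 prefix="logs/2024/", path_glob="**/*.json")
print(validate(aws), object_url(aws))
```

Returning a list of problems rather than raising on the first one makes it easier to report every misconfiguration at once.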

Authentication

| Mode | Providers | Required credentials |
|---|---|---|
| Access keys | All providers | Access key ID + secret access key (optional session token). For GCS, use HMAC keys. |
| IAM Role ARN | AWS S3 only | A role ARN that ClickHouse Cloud assumes (recommended). Requires a trust relationship; see IAM Role Assumption. Requires ClickHouse 25.8+. |
| Anonymous / public bucket (NOSIGN) | All providers | None. For public buckets that allow unauthenticated reads (e.g. open data registries). |

Credentials are encrypted with your workspace key before they are stored, and they are never included in API responses.

Prefer IAM Role ARN for AWS S3. Role assumption avoids storing long-lived access keys: ClickHouse Cloud assumes a role you control to obtain temporary credentials, and you can revoke access at any time by editing the role. See IAM Role Assumption below.
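As a quick reference, the mode and provider combinations described in this section can be expressed as a short check. This is a sketch under stated assumptions: the mode and provider identifiers and the `check_auth` function are hypothetical names, not part of any RunReveal API.

```python
# Hypothetical identifiers for the three authentication modes.
ACCESS_KEYS, IAM_ROLE, NOSIGN = "access_keys", "iam_role_arn", "nosign"

def check_auth(provider, mode, access_key_id="", secret_access_key="", role_arn=""):
    """Return None if the combination is allowed, otherwise a short reason string."""
    if mode == ACCESS_KEYS:
        # Works with every provider; the session token is optional and not checked here.
        if not (access_key_id and secret_access_key):
            return "access key ID and secret access key are both required"
        return None
    if mode == IAM_ROLE:
        if provider != "aws_s3":
            return "IAM Role ARN is supported for AWS S3 only"
        if not role_arn:
            return "a role ARN is required"
        return None
    if mode == NOSIGN:
        # Public buckets need no credentials at all.
        return None
    return f"unknown authentication mode: {mode}"

print(check_auth("r2", IAM_ROLE, role_arn="arn:aws:iam::123456789012:role/example"))
```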

IAM Role Assumption (AWS S3)

For AWS S3 buckets we recommend authenticating with an IAM Role ARN instead of access keys. RunReveal runs Federated Search queries on ClickHouse Cloud, which uses role chaining to assume a role you create in your AWS account. You grant that role read access to your bucket and add a trust relationship that allows ClickHouse Cloud's service principal to assume it. No long-lived credentials ever leave your account.

Create an IAM role with a trust policy

In your AWS account, create a new IAM role with the following trust policy. This authorizes RunReveal's ClickHouse Cloud service principal to assume the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::426924874929:role/CH-S3-indigo-xo-42-ue2-29-Role"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

The principal above is the service role used by RunReveal's production ClickHouse Cloud instance (account 426924874929 is ClickHouse Cloud's AWS account). Use it for workspaces on app.runreveal.com. If you run RunReveal in a dedicated or self-hosted environment, contact RunReveal for the principal that applies to your deployment.

Attach a permissions policy for your bucket

Attach a policy to the role that grants read access to the bucket (and prefix) you want to query. Replace YOUR_BUCKET with your bucket name:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::YOUR_BUCKET/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::YOUR_BUCKET"
    }
  ]
}

To scope access to a single prefix, add a Condition with s3:prefix on the ListBucket statement and narrow the GetObject resource to arn:aws:s3:::YOUR_BUCKET/logs/*.
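The prefix-scoped variant can be generated rather than hand-edited. The sketch below builds such a policy for a given bucket and prefix; the function name is hypothetical. It splits s3:GetBucketLocation into its own statement, because a request for that action carries no s3:prefix value and a condition on that key would therefore not match it.

```python
import json

def scoped_read_policy(bucket, prefix):
    """Build a read-only policy limited to one key prefix, e.g. prefix='logs/'."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object reads are limited by narrowing the resource ARN.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                # Listing is limited with a condition on the requested prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Kept separate so the prefix condition does not apply to it.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }

policy = scoped_read_policy("YOUR_BUCKET", "logs/")
print(json.dumps(policy, indent=2))
```

Paste the printed JSON into the role's permissions policy after replacing YOUR_BUCKET and the prefix with your own values.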

Use the role ARN in your federated view

Choose IAM Role ARN as the authentication mode when creating or editing the federated view, and paste the ARN of the role you just created (for example arn:aws:iam::123456789012:role/runreveal-federated-read). RunReveal stores only this ARN — no secret keys — and ClickHouse Cloud assumes the role each time it reads your bucket.
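Before pasting, it can help to sanity-check that the value is a role ARN rather than, say, a bucket or user ARN. The following is a rough sketch: the pattern covers only the standard "aws" partition, and the function name is hypothetical.

```python
import re

# Shape of an IAM role ARN in the standard "aws" partition. Other partitions
# (for example aws-us-gov) are deliberately not handled in this sketch.
ROLE_ARN = re.compile(r"^arn:aws:iam::(\d{12}):role/([\w+=,.@/-]+)$")

def parse_role_arn(arn):
    """Return (account_id, role_name_with_path), or None if the ARN is malformed."""
    match = ROLE_ARN.match(arn.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_role_arn("arn:aws:iam::123456789012:role/runreveal-federated-read"))
```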

IAM Role ARN authentication requires ClickHouse 25.8+, which is always available on ClickHouse Cloud. It is supported for AWS S3 only — R2, GCS, and other providers must use access keys.

Partitioning

If your bucket is organized into key=value/ path segments (for example date=2024-01-01/hour=12/), you can prune the files ClickHouse scans so queries only touch the objects they need. There are two mutually exclusive approaches: Hive partitioning and manually defined partition columns.

Hive partitioning

Enable Use Hive partitioning when your paths follow a key=value/ layout. ClickHouse auto-discovers the partition columns from the file path, exposes them with native types, and prunes unmatched files when you filter on them.

Example layout:

s3://my-bucket/logs/date=2024-01-01/hour=12/events.parquet

A filter such as WHERE date = '2024-01-01' lets ClickHouse skip files in other partitions instead of downloading them.

Hive partitioning and manual partition columns are mutually exclusive. Enabling Hive partitioning auto-discovers the columns from the path; setting them manually as well produces duplicate columns and is rejected at save time.
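Partition pruning is easy to picture with a toy model. The sketch below is not how ClickHouse implements it; it only illustrates the idea, using hypothetical helper names: partition values are read from the key=value/ path segments, and any object whose values do not match the filter is never opened.

```python
def parse_partitions(object_key):
    """Extract key=value path segments (everything except the file name)."""
    partitions = {}
    for segment in object_key.split("/")[:-1]:
        if "=" in segment:
            key, value = segment.split("=", 1)
            partitions[key] = value
    return partitions

def prune(object_keys, **filters):
    """Keep only the keys whose partition values match every filter."""
    kept = []
    for key in object_keys:
        parts = parse_partitions(key)
        if all(parts.get(name) == value for name, value in filters.items()):
            kept.append(key)
    return kept

keys = [
    "logs/date=2024-01-01/hour=12/events.parquet",
    "logs/date=2024-01-01/hour=13/events.parquet",
    "logs/date=2024-01-02/hour=00/events.parquet",
]
# Equivalent in spirit to WHERE date = '2024-01-01': one of three files is skipped.
print(prune(keys, date="2024-01-01"))
```

Filtering on more partition columns (for example both date and hour) narrows the set further, which is why finer-grained layouts tend to pay off for large archives.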

Querying a Federated View

Federated views follow the same naming convention as internal views: {workspace_name}_{your_view_name}. Query them like any other table.

-- Search archived events in your bucket
SELECT
  receivedAt,
  actor_email,
  src_ip,
  event_name
FROM workspace_archived_auth_logs
WHERE event_name = 'login_failed'
ORDER BY receivedAt DESC
LIMIT 100;

For partitioned data, filter on a partition column to reduce the number of files scanned:

SELECT *
FROM workspace_archived_auth_logs
WHERE date = '2024-01-01'
LIMIT 100;

You can also use federated views in SQL detections, just like internal custom views. See Using Custom Views in Detections.

Use Cases

  • Cold archive search: keep older logs in cheap object storage and query them on demand without re-ingesting.
  • Bring your own data lake: query data you already export to S3/R2/GCS from other systems.
  • Public datasets: point an anonymous (NOSIGN) view at a public bucket to explore open data.
  • Cost control: avoid storing rarely queried data in the workspace database while keeping it searchable.

Limitations

  • Available on Pro and Enterprise plans only. Like all custom views, it is not available for BYODB workspaces.
  • Query performance depends on bucket size, file format, compression, and partitioning. Large unpartitioned scans can be slow and costly — prefer columnar formats (Parquet/ORC) and partition pruning for big datasets.
  • Endpoints must use https (except MinIO, which may use http for local development).
  • Federated views cannot be used in Sigma streaming detections (the same limitation as internal custom views).

Next Steps

  • Custom Views: column configuration, JSON paths, and querying basics.
  • Search: explore your views in the Search Explorer.
  • Detections: alert on federated view data with SQL detections.
  • Destinations: route ingested events to object storage you can later search with federated views.