Data Model
RunReveal’s data model is simple, and bring-your-own-database customers can extend the tables that RunReveal provides to create things like:
- Custom views for internal or custom log sources.
- Custom schemas to perform enrichment or normalization across
RunReveal normalizes every log it receives into a consistent schema. This schema is built to be flexible and extensible.
What’s in a log?
The complete log schema is shown below. You can also view it in the product by running describe logs as a query.
$ describe logs
workspaceID String
sourceID String
sourceType LowCardinality(String)
sourceTTL UInt32
receivedAt DateTime
id String
eventTime DateTime
eventName String
eventID String
srcIP String
srcASCountryCode String
srcASNumber UInt32
srcASOrganization String
srcCity String
srcConnectionType String
srcISP String
srcLatitude Float64
srcLongitude Float64
srcUserType String
dstIP String
dstASCountryCode String
dstASNumber UInt32
dstASOrganization String
dstCity String
dstConnectionType String
dstISP String
dstLatitude Float64
dstLongitude Float64
dstUserType String
actor Map(String, String)
tags Map(String, String)
resources Array(String)
serviceName String
enrichments Array(Tuple(data Map(String, String), name String, provider String, type String, value String))
readOnly Bool
rawLog String
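As a rough illustration of how these columns can be used together, the query below lists recent events along with their source-IP enrichment and one actor attribute. It is a sketch under assumptions: the table is addressed as runreveal.logs (the name used in the view example on this page), and the email key in the actor map is hypothetical, since the keys present depend on the source.

```sql
-- Recent events with source-IP enrichment and one actor attribute.
-- Assumptions: the table is reachable as runreveal.logs, and 'email' is a
-- hypothetical key in the actor map (actual keys vary by source).
SELECT
    eventTime,
    sourceType,
    eventName,
    srcIP,
    srcASCountryCode,
    actor['email'] AS actorEmail  -- a missing map key yields an empty string
FROM runreveal.logs
WHERE receivedAt >= now() - INTERVAL 1 DAY  -- filter on receivedAt to limit the scan
ORDER BY receivedAt DESC
LIMIT 20;
```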
This schema has many important fields and nuances worth covering. Some of the most important are described below.
rawLog
The rawLog column contains the original log, JSON or otherwise, as a string. This column is especially useful because it lets you build views, materialized views, and other objects on top of the logs table for custom use cases.
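For instance, a field that has no normalized column can be read straight out of rawLog at query time. The following is a minimal sketch, assuming the source emits JSON and that the payload has a top-level key named user_agent (a hypothetical name chosen for illustration).

```sql
-- Pull an ad hoc field out of the raw payload at query time.
-- Assumption: 'user_agent' is a hypothetical top-level key in the JSON payload.
SELECT
    receivedAt,
    eventName,
    JSONExtractString(rawLog, 'user_agent') AS user_agent
FROM runreveal.logs
WHERE receivedAt >= now() - INTERVAL 1 HOUR
  AND isValidJSON(rawLog)            -- rawLog is not guaranteed to be JSON
  AND JSONHas(rawLog, 'user_agent')  -- keep only rows that carry the key
LIMIT 50;
```

The isValidJSON guard matters because rawLog may hold non-JSON text for some sources, in which case the JSON functions would simply return empty values.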
The other columns are populated by each of our sources as soon as logs are ingested. When a log arrives at a source, the following steps occur:
- The log is parsed into a structured object.
- The relevant fields are pulled out of the log and filled into a normalized log.
- The normalized log contains a rawLog field, which is set to the original log line that was parsed.
- Automatic enrichments (srcASCountryCode, dstISP, etc.) are applied to the normalized log.
By default, all of our log sources do this automatically, except for generic log sources. In addition, our transforms feature can be used to move data around further within the normalized schema or the rawLog field.
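The shape of that mapping can be mimicked in a single query, purely as an illustration. This is not RunReveal's ingestion code: the sample payload and its keys (action, client.ip, user) are invented for the example, and the enrichment step is left out because it happens inside the pipeline.

```sql
-- Illustration only: mimic parse -> extract -> keep rawLog for one made-up payload.
-- The payload and its keys are hypothetical; enrichment is not shown.
WITH '{"action":"login","client":{"ip":"203.0.113.7"},"user":"alice"}' AS raw
SELECT
    JSONExtractString(raw, 'action')                AS eventName,  -- extracted field
    JSONExtractString(raw, 'client', 'ip')          AS srcIP,      -- nested key path
    map('username', JSONExtractString(raw, 'user')) AS actor,      -- Map(String, String)
    raw                                             AS rawLog;     -- original line kept as-is
```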
Building new views
Within your database, you can build custom views that extend the logs table in RunReveal. Here’s the simplest example, which creates a new view called co_example_logs with a new column called extra_field:
CREATE OR REPLACE VIEW runreveal.co_example_logs AS
SELECT
*,
JSONExtractString(rawLog, 'extra_field') as `extra_field`
FROM runreveal.logs
WHERE sourceID = '2z3ykgYgk5fats6m2S9iKrVqbFk'
;
One thing to note, which is implicit in select * from logs, is that it’s critical to include the receivedAt column in tables and views. We highly recommend that all your tables include this column because it’s one of the most important indexes on the table, which keeps your queries quick. It’s also how our UI partitions and loads data by time, so a table or view must have the receivedAt column to be compatible with our UI.
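When a view lists its columns explicitly instead of using select *, receivedAt has to be carried through by hand. A minimal sketch follows, assuming a hypothetical view name co_example_slim_logs and a placeholder source ID that you would replace with your own.

```sql
-- A narrower view that still carries receivedAt.
-- Assumptions: co_example_slim_logs is a hypothetical name, and
-- '<your-source-id>' is a placeholder for a real sourceID.
CREATE OR REPLACE VIEW runreveal.co_example_slim_logs AS
SELECT
    receivedAt,  -- kept so time filtering and the UI continue to work
    eventTime,
    eventName,
    srcIP,
    JSONExtractString(rawLog, 'extra_field') AS extra_field
FROM runreveal.logs
WHERE sourceID = '<your-source-id>';

-- Queries against the view can then filter on receivedAt as usual.
SELECT eventName, count() AS events
FROM runreveal.co_example_slim_logs
WHERE receivedAt >= now() - INTERVAL 1 DAY
GROUP BY eventName
ORDER BY events DESC;
```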
Other tables
There are other important tables to make note of in RunReveal.
- detections: Holds all findings from Sigma and query-based detections.
- scheduled_query_runs: Information on scheduled queries, including runs and results.
- runreveal_source_volumes: Counts of how much data was received.
Each of these tables can also be extended, but many of them have different primary keys and indexes. If you’re curious about the details, you can describe each of these tables and examine how it works!
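One way to inspect these tables is sketched below. It assumes they live in the runreveal database alongside logs and that your database user is allowed to read system.tables; neither is confirmed by this page.

```sql
-- Column names and types for one table.
DESCRIBE TABLE runreveal.detections;

-- Full definition, including the engine and ORDER BY / PRIMARY KEY clauses.
SHOW CREATE TABLE runreveal.detections;

-- Side-by-side view of the keys for several tables.
-- Assumption: all four tables are in the 'runreveal' database.
SELECT name, primary_key, sorting_key
FROM system.tables
WHERE database = 'runreveal'
  AND name IN ('logs', 'detections', 'scheduled_query_runs', 'runreveal_source_volumes');
```

Knowing the sorting key of each table helps when writing filters for a custom view, since conditions on leading key columns tend to be the cheapest to evaluate.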