Sampling
Sampling reduces log volume by keeping only a percentage of matching events. This is useful for cost optimization on high-volume, low-value log sources while maintaining statistical visibility into patterns and trends.
Sampling is configured as a pipeline step. Navigate to Pipelines → Edit your pipeline → Add a Sample step.
When to Use Sampling
Sampling is ideal when you need to:
- Reduce storage costs for high-volume sources without losing visibility
- Maintain statistical patterns while reducing event count
- Optimize performance for sources generating millions of events daily
- Keep representative data for trend analysis and capacity planning
Not for Security-Critical Events: Don’t sample events that require full fidelity for security investigations (authentication failures, privilege escalation, data access). Use sampling only for operational or informational logs.
How Sampling Works
When you add a Sample step to a pipeline:
- Events matching the step’s preconditions are evaluated
- A percentage of matching events (e.g., 10%) are kept
- The remaining events are discarded before reaching storage
- Non-matching events pass through unaffected (a sketch of this logic follows below)
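As a rough illustration of this flow (not the product's actual implementation), the per-event decision can be sketched in Python. The field name, regex precondition, and 10% rate are assumptions borrowed from the configuration example in the next section.

```python
import random
import re

def keep_event(event: dict, rate: float, field: str, pattern: str) -> bool:
    """Return True to keep the event, False to discard it before storage."""
    value = str(event.get(field, ""))
    # Non-matching events pass through unaffected.
    if not re.match(pattern, value):
        return True
    # Matching events are kept with probability `rate` (e.g., 0.10 for 10%).
    return random.random() < rate

# Example: keep roughly 10% of events whose eventName starts with "Get".
keep_event({"eventName": "GetObject"}, rate=0.10, field="eventName", pattern=r"^Get.*")
```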
Configuration
| Setting | Description | Example |
|---|---|---|
| Sample Rate | Percentage of events to keep (1-99%) | 10% |
| Preconditions | Which events the sampling applies to | eventName regex: ^Get.* |
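For concreteness, the example column above could be represented as a small configuration structure like the one below. The shape and key names are purely illustrative; in practice these values are entered in the pipeline editor rather than written by hand.

```python
# Hypothetical representation of the example settings above (illustrative only).
sample_step = {
    "sample_rate": 0.10,  # Sample Rate: keep 10% of matching events
    "preconditions": [
        {"field": "eventName", "regex": r"^Get.*"},  # which events sampling applies to
    ],
}
```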
Use Cases
Sample AWS S3 Data Events
S3 data events (GetObject, PutObject, ListObjects) generate extremely high volumes in active environments. Sample 10% to reduce costs while maintaining visibility into access patterns.
```json
{
  "eventName": "GetObject",
  "eventSource": "s3.amazonaws.com",
  "awsRegion": "us-east-1",
  "userIdentity": {
    "arn": "arn:aws:iam::123456789012:role/data-pipeline"
  },
  "requestParameters": {
    "bucketName": "company-data-lake",
    "key": "logs/2024/01/15/events.json"
  }
}
```
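To make the cost effect concrete, here is some back-of-the-envelope arithmetic. The daily volume and average event size are invented for illustration; only the 10% rate comes from the example above.

```python
# Illustrative arithmetic only; the volume and size figures are assumptions.
daily_get_events = 50_000_000   # hypothetical GetObject events per day
avg_event_bytes = 1_500         # hypothetical average size per event
sample_rate = 0.10              # keep 10% of matching events

stored_gb = daily_get_events * sample_rate * avg_event_bytes / 1e9
print(f"Stored per day: {stored_gb:.1f} GB (a {1 - sample_rate:.0%} reduction)")
# Stored per day: 7.5 GB (a 90% reduction)
```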
Sample Cloudflare HTTP Requests
CDN and web traffic logs can generate millions of events daily. Sample successful requests while keeping all errors at full fidelity for debugging.
```json
{
  "ClientRequestHost": "api.company.com",
  "ClientRequestMethod": "GET",
  "ClientRequestURI": "/v1/users",
  "EdgeResponseStatus": 200,
  "ClientIP": "203.0.113.50",
  "RayID": "8a1b2c3d4e5f6a7b"
}
```
By using a precondition that only matches successful (200) responses, all error responses (4xx, 5xx) are kept at full fidelity for debugging and security analysis.
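A sketch of that behavior, assuming a hypothetical 10% rate on successful requests (the rate is an assumption, and the real precondition is configured on the Sample step rather than in code):

```python
import random

def keep_cloudflare_request(event: dict, rate: float = 0.10) -> bool:
    # Error responses do not match the precondition, so they pass through unaffected.
    if event.get("EdgeResponseStatus") != 200:
        return True
    # Successful requests match the precondition and are sampled.
    return random.random() < rate
```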
Sample VPC Flow Logs
VPC Flow Logs capture all network traffic metadata, generating massive volumes. Sample accepted traffic while keeping all rejected connections for security analysis.
```json
{
  "version": 2,
  "account-id": "123456789012",
  "interface-id": "eni-0123456789abcdef0",
  "srcaddr": "10.0.1.100",
  "dstaddr": "10.0.2.50",
  "srcport": 443,
  "dstport": 49152,
  "protocol": 6,
  "packets": 10,
  "bytes": 840,
  "action": "ACCEPT"
}
```
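The same pattern applies here, sketched below with an assumed 10% rate on accepted traffic. Rejected connections never match the precondition, so they are always kept at full fidelity.

```python
import random

def keep_flow_record(record: dict, rate: float = 0.10) -> bool:
    # REJECT records do not match the precondition and pass through unaffected.
    if record.get("action") != "ACCEPT":
        return True
    # Accepted traffic is sampled; the 10% rate is illustrative.
    return random.random() < rate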
Best Practices
- Start with higher sample rates (e.g., 25%) and reduce gradually based on your analysis needs
- Use preconditions to sample only specific event types, keeping security-relevant events at full fidelity
- Combine with enrichments to tag events before sampling decisions
- Monitor your dashboards after enabling sampling to ensure statistical patterns remain visible (see the sketch after this list)
- Document your sampling strategy so analysts know which data is sampled
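When checking dashboards built over sampled data, observed counts can be scaled back up by dividing by the sample rate, since sampling is random. The snippet below is a generic illustration with a made-up observed count, not a product feature.

```python
# Estimate the pre-sampling event count from a count observed over sampled data.
sample_rate = 0.10        # the configured Sample Rate (10%)
observed_count = 12_480   # hypothetical count from a dashboard over sampled events

estimated_original = observed_count / sample_rate
print(f"Estimated pre-sampling count: {estimated_original:,.0f}")  # 124,800
```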
Sample vs Filter vs Drop
| Action | What Happens | When to Use |
|---|---|---|
| Sample | Randomly keeps a configured percentage of matching events | High-volume operational logs where statistical patterns are sufficient |
| Filter | Archives events (still searchable) | Events you rarely need but must retain for compliance |
| Drop | Permanently removes events | Events with zero value that should never be stored |