Benchmarks

Synthetic Data

Methodology

We prepared a set of benchmarks designed to be as reproducible and deterministic as possible.
The benchmark dataset is the same for every run and consists of 40GiB (219 million records) of structured JSON logs.
Example log:

{
  "@timestamp": 897484581,
  "clientip": "162.146.3.0",
  "request": "GET /images/logo_cfo.gif HTTP/1.1",
  "status": 200,
  "size": 1504
}
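
To make the record shape concrete, the sketch below parses one such log line into a typed record and estimates the average record size from the dataset figures (40GiB over 219 million logs). `LogRecord` and `parse_log` are hypothetical names used only for illustration; they are not part of the benchmark suite, and treating `@timestamp` as epoch seconds is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical typed view of one benchmark log line."""
    timestamp: int  # "@timestamp"; assumed to be seconds since the Unix epoch
    clientip: str
    method: str     # first token of "request"
    path: str       # second token of "request"
    protocol: str   # third token of "request"
    status: int
    size: int


def parse_log(line: str) -> LogRecord:
    """Parse one JSON log line of the shape shown in the example."""
    doc = json.loads(line)
    method, path, protocol = doc["request"].split(" ", 2)
    return LogRecord(
        timestamp=doc["@timestamp"],
        clientip=doc["clientip"],
        method=method,
        path=path,
        protocol=protocol,
        status=doc["status"],
        size=doc["size"],
    )


EXAMPLE = (
    '{"@timestamp": 897484581, "clientip": "162.146.3.0", '
    '"request": "GET /images/logo_cfo.gif HTTP/1.1", "status": 200, "size": 1504}'
)
record = parse_log(EXAMPLE)

# Average size per log, derived from the dataset figures: 40 GiB / 219 million.
avg_bytes_per_log = 40 * 1024**3 / 219_000_000
print(record.method, record.path, record.status)
print(f"about {avg_bytes_per_log:.0f} bytes per log")
```

By this estimate each log occupies a little under 200 bytes in the raw dataset.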

Tests were run on an AWS host c6a.4xlarge, with the following configuration:

| CPU | RAM | Disk |
|---|---|---|
| AMD EPYC 7R13 Processor 3.6GHz, 16 vCPU | 32GiB | GP3 |

For more details on component setup and how to run the suite, see the link.

Local cluster configuration:

| Container | Replicas | CPU Limit | RAM Limit |
|---|---|---|---|
| seq‑db (--mode single) | 1 | 4 | 8GiB |
| elasticsearch | 1 | 4 | 8GiB |
| file.d | 1 | 1 | 2GiB |

Results (write‑path)

In the synthetic tests we obtained the following results:

| Container | Avg Logs/sec | Avg Throughput | Avg CPU Usage | Avg RAM Usage |
|---|---|---|---|---|
| seq‑db | 370,000 | 48MiB/s | 3.3 vCPU | 1.8GiB |
| elasticsearch | 110,000 | 14MiB/s | 1.9 vCPU | 2.4GiB |

Thus, with comparable resource usage, seq‑db demonstrated on average 3.4× higher throughput than Elasticsearch.
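
The 3.4× figure can be checked directly against the write-path results. The following sketch only recomputes ratios from the published averages; the dictionary layout and the `ratio` helper are illustrative assumptions.

```python
# Averages copied from the write-path results table.
results = {
    "seq-db": {"logs_per_sec": 370_000, "mib_per_sec": 48},
    "elasticsearch": {"logs_per_sec": 110_000, "mib_per_sec": 14},
}


def ratio(metric: str) -> float:
    """How many times higher the seq-db average is for the given metric."""
    return results["seq-db"][metric] / results["elasticsearch"][metric]


logs_ratio = ratio("logs_per_sec")  # about 3.36
mib_ratio = ratio("mib_per_sec")    # about 3.43
print(f"logs/sec: {logs_ratio:.2f}x, MiB/s: {mib_ratio:.2f}x")
```

Both ratios round to roughly 3.4, whichever of the two throughput metrics is used.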

Results (read‑path)

Both stores were pre-loaded with the same dataset. Read-path tests were run without any write load.

Elasticsearch settings:

  • Request cache disabled (request_cache=false)
  • Total hits counting disabled (track_total_hits=false)
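
As a rough illustration of these two settings, the sketch below builds (but does not send) an Elasticsearch search request with the request cache disabled through the `request_cache` query-string parameter and hit counting disabled through `track_total_hits` in the body. How the benchmark itself passes these options is not stated here; the `logs` index name, the host URL, and the `build_search_request` helper are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: local Elasticsearch node
INDEX = "logs"                    # hypothetical index name


def build_search_request(query: dict) -> Request:
    """Build a _search request with caching and total-hit counting disabled."""
    params = urlencode({"request_cache": "false"})
    body = {"track_total_hits": False, "query": query}
    return Request(
        url=f"{ES_URL}/{INDEX}/_search?{params}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request({"match_all": {}})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Disabling both options keeps Elasticsearch from answering repeated queries from cache or skipping work that the other store has to do, which makes the latency comparison fairer.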

Tests were executed using Grafana k6, with query parameters available in the benchmarks/k6 folder.

Scenario: fetch all logs using offsets

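By default, Elasticsearch rejects searches where from + size exceeds 10,000 (index.max_result_window), so page_size × page number must stay at or below 10,000.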

Parameters: 20 looping virtual users for 10s, fetching a random page [1–50].
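
A minimal sketch of how a page number maps onto Elasticsearch's `from`/`size` parameters, and why the random page range stays within the default result window. The page size of 100 is an assumption for this scenario, and `page_window` is a hypothetical helper.

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def page_window(page: int, page_size: int) -> dict:
    """Translate a 1-based page number into Elasticsearch `from`/`size`."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError(
            f"from + size = {offset + page_size} exceeds {MAX_RESULT_WINDOW}"
        )
    return {"from": offset, "size": page_size}


# The deepest page used in the scenario (page 50), assuming 100 logs per page.
print(page_window(50, 100))
```

With these assumptions the deepest request reads up to document 5,000, well inside the 10,000 limit; page 101 would be the first to be rejected.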

| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 5.56ms | 5.05ms | 9.56ms |
| elasticsearch | 6.06ms | 5.11ms | 11.8ms |
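
For readers unfamiliar with the Avg/P50/P95 columns used in the read-path tables, here is a small sketch of how such summary statistics can be derived from raw request durations. It uses the nearest-rank percentile definition; k6's own calculation may differ (for example by interpolating), and the sample latencies are made up.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up request durations in milliseconds, for illustration only.
latencies_ms = [4.8, 5.0, 5.1, 5.3, 9.6, 4.9, 5.2, 5.0, 6.1, 5.4]

avg = statistics.mean(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"avg={avg:.2f}ms p50={p50}ms p95={p95}ms")
```

A single slow request pulls the average and P95 up while leaving P50 almost unchanged, which is why the tables report all three.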

Scenario: status: in(500,400,403)

Parameters: 20 VUs for 10s.

| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 364.68ms | 356.96ms | 472.26ms |
| elasticsearch | 21.68ms | 18.91ms | 29.84ms |

Scenario: request: GET /english/images/top_stories.gif HTTP/1.0

Parameters: 20 looping VUs for 10s.

| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 269.98ms | 213.43ms | 704.19ms |
| elasticsearch | 46.65ms | 43.27ms | 80.53ms |

Scenario: aggregation counting logs by status

SQL analogue: SELECT status, COUNT(*) GROUP BY status.
Parameters: 10 parallel queries, 2 VUs.
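
To pin down what this aggregation computes, the sketch below shows an in-memory equivalent of the SQL analogue together with one possible Elasticsearch request body (a `terms` aggregation on `status`). The actual query used by the suite lives in the benchmarks/k6 folder and may differ; the sample logs and names here are illustrative.

```python
from collections import Counter


def count_by_status(logs: list[dict]) -> dict[int, int]:
    """In-memory equivalent of SELECT status, COUNT(*) GROUP BY status."""
    return dict(Counter(log["status"] for log in logs))


# One possible Elasticsearch body with the same meaning: no hits returned,
# only one bucket per distinct status value with its document count.
ES_COUNT_BY_STATUS = {
    "size": 0,
    "aggs": {"by_status": {"terms": {"field": "status"}}},
}

# Made-up sample logs, for illustration only.
sample_logs = [
    {"status": 200, "size": 1504},
    {"status": 200, "size": 320},
    {"status": 404, "size": 0},
    {"status": 500, "size": 87},
]
print(count_by_status(sample_logs))
```

Because every document in the store contributes to a bucket, this scenario is a full scan of the `status` field rather than a selective lookup.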

| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 16.81s | 16.88s | 16.10s |
| elasticsearch | 6.46s | 6.44s | 6.57s |

Scenario: minimum log size for each status

SQL analogue: SELECT status, MIN(size) GROUP BY status.
Parameters: 5 iterations with 1 thread.
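
The difference from the previous scenario is that each bucket now needs a value computed from a second field. A sketch of the semantics, again with an in-memory version and one possible Elasticsearch body (a `min` sub-aggregation nested under a `terms` aggregation); names and sample data are assumptions, not the suite's actual query.

```python
def min_size_by_status(logs: list[dict]) -> dict[int, int]:
    """In-memory equivalent of SELECT status, MIN(size) GROUP BY status."""
    result: dict[int, int] = {}
    for log in logs:
        status, size = log["status"], log["size"]
        # Keep the smallest size seen so far for this status.
        result[status] = min(result.get(status, size), size)
    return result


# One possible Elasticsearch body: bucket by status, then take the minimum
# of the size field inside each bucket.
ES_MIN_SIZE_BY_STATUS = {
    "size": 0,
    "aggs": {
        "by_status": {
            "terms": {"field": "status"},
            "aggs": {"min_size": {"min": {"field": "size"}}},
        }
    },
}

# Made-up sample logs, for illustration only.
sample = [
    {"status": 200, "size": 1504},
    {"status": 200, "size": 320},
    {"status": 404, "size": 0},
]
print(min_size_by_status(sample))
```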

| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 33.34s | 33.41s | 33.93s |
| elasticsearch | 16.88s | 16.82s | 17.5s |

Scenario: range queries — fetch 5,000 documents

Parameters: 20 threads, 10s, random page [1–50], 100 documents per page.
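
The 5,000 documents in the scenario title follow from the parameters: 50 pages of 100 documents each. The sketch below shows that arithmetic and one possible Elasticsearch body for a paged range query. Which field the benchmark ranges over is not stated; using `@timestamp`, the bounds shown, and the `range_page_body` helper are all assumptions.

```python
PAGES = 50       # random page is drawn from [1, 50]
PAGE_SIZE = 100  # documents per page
TOTAL_DOCS = PAGES * PAGE_SIZE  # documents reachable across all pages


def range_page_body(gte: int, lte: int, page: int) -> dict:
    """Build a search body for one page of a range query (1-based page)."""
    if not 1 <= page <= PAGES:
        raise ValueError(f"page must be in [1, {PAGES}]")
    return {
        # Assumption: the range filter is applied to the @timestamp field.
        "query": {"range": {"@timestamp": {"gte": gte, "lte": lte}}},
        "from": (page - 1) * PAGE_SIZE,
        "size": PAGE_SIZE,
    }


# Hypothetical bounds around the timestamp of the example log.
body = range_page_body(gte=897484000, lte=897485000, page=50)
print(TOTAL_DOCS, body["from"], body["size"])
```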

| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 406.09ms | 385.13ms | 509.05ms |
| elasticsearch | 22.75ms | 18.06ms | 64.61ms |

Real (production) Data

Methodology

In addition to synthetic tests, we also benchmarked seq‑db and Elasticsearch on real logs from our production services. We prepared several benchmark scenarios showing performance for a single instance and for a medium‑sized cluster:

  • Throughput test of a single instance of seq‑db and Elasticsearch (1x1 configuration)
  • Throughput test of 6 instances of seq‑db and Elasticsearch with RF=2 (6x6 configuration)

Real production log datasets (~280GiB total) were pre-processed to minimize CPU use during ingestion via file.d to seq‑db or Elasticsearch. This ensured determinism and independence from delivery systems (e.g., Apache Kafka).

The high-level write pipeline:

┌──────────────┐        ┌───────────┐
│ ┌──────┐     │        │           │
│ │ file ├──┐  │    ┌──►│  elastic  │
│ └──────┘  │  │    │   │           │
│ ┌─────────▼┐ │    │   └───────────┘
│ │          ├─┼────┘   ┌───────────┐
│ │  file.d  │ │        │           │
│ │          ├─┼───────►│  seq-db   │
│ └──────────┘ │        │           │
└──────────────┘        └───────────┘

Cluster configuration:

| Container | CPU | RAM | Disk |
|---|---|---|---|
| seq‑db (--mode store) | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz | RAID10, 4×SSD |
| seq‑db (--mode ingestor) | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz | - |
| elasticsearch (master/data) | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz | RAID10, 4×SSD |
| file.d | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz | - |

We selected a baseline set of fields to index. Elasticsearch was set up with index k8s-logs-index to index only those fields.

Configuration 1x1

Index settings (the same field mapping was applied to seq‑db):

curl -X PUT "http://localhost:9200/k8s-logs-index/" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": { "codec": "best_compression" },
    "number_of_replicas": 0,
    "number_of_shards": 6
  },
  "mappings": {
    "dynamic": "false",
    "properties": {
      "k8s_cluster": { "type": "keyword" },
      "k8s_container": { "type": "keyword" },
      "k8s_group": { "type": "keyword" },
      "k8s_label_jobid": { "type": "keyword" },
      "k8s_namespace": { "type": "keyword" },
      "k8s_node": { "type": "keyword" },
      "k8s_pod": { "type": "keyword" },
      "k8s_pod_label_cron": { "type": "keyword" },
      "client_ip": { "type": "keyword" },
      "http_code": { "type": "integer" },
      "http_method": { "type": "keyword" },
      "message": { "type": "text" }
    }
  }
}'
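
Since the same body is reused with a different replica count in the cluster configuration, it can be convenient to generate it from a field list instead of editing JSON by hand. The sketch below is one way to do that; `build_index_body` and the constant names are hypothetical, and the field names and types are taken from the command above.

```python
import json

# Fields indexed as exact-match keywords in the mapping above.
KEYWORD_FIELDS = [
    "k8s_cluster", "k8s_container", "k8s_group", "k8s_label_jobid",
    "k8s_namespace", "k8s_node", "k8s_pod", "k8s_pod_label_cron",
    "client_ip", "http_method",
]


def build_index_body(replicas: int, shards: int = 6) -> dict:
    """Build the index-creation body for the given replica count."""
    properties = {name: {"type": "keyword"} for name in KEYWORD_FIELDS}
    properties["http_code"] = {"type": "integer"}
    properties["message"] = {"type": "text"}
    return {
        "settings": {
            "index": {"codec": "best_compression"},
            "number_of_replicas": replicas,
            "number_of_shards": shards,
        },
        # "dynamic": "false" means unlisted fields are stored but not indexed.
        "mappings": {"dynamic": "false", "properties": properties},
    }


body = build_index_body(replicas=0)
payload = json.dumps(body, indent=2)  # always valid JSON, ready for curl -d
print(payload)
```

Serializing with `json.dumps` also avoids hand-editing mistakes such as trailing commas, which Elasticsearch would reject.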
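
Results

| Container | CPU Limit | RAM Limit | Avg CPU | Avg RAM |
|---|---|---|---|---|
| seq‑db (--mode store) | 8 | 16GiB | 6.5 | 7GiB |
| seq‑db (--mode proxy) | 8 | 8GiB | 7 | 3GiB |
| elasticsearch (master) | 2 | 4GiB | 0 | 0GiB |
| elasticsearch (data) | 16 | 32GiB | 15.8 | 30GiB |

| Container | Avg Throughput | Logs/sec |
|---|---|---|
| seq‑db | 520MiB/s | 162,000 |
| elasticsearch | 195MiB/s | 62,000 |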

Here, seq‑db achieved roughly 2.6× higher throughput under similar resource limits.

Configuration 6x6

This configuration ran six seq‑db nodes in --mode proxy and six in --mode store. The Elasticsearch index settings stayed the same except for number_of_replicas=1:

curl -X PUT "http://localhost:9200/k8s-logs-index/" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": { "codec": "best_compression" },
    "number_of_replicas": 1,
    "number_of_shards": 6
  },
  "mappings": {
    "dynamic": "false",
    "properties": {
      "k8s_cluster": { "type": "keyword" },
      "k8s_container": { "type": "keyword" },
      "k8s_group": { "type": "keyword" },
      "k8s_label_jobid": { "type": "keyword" },
      "k8s_namespace": { "type": "keyword" },
      "k8s_node": { "type": "keyword" },
      "k8s_pod": { "type": "keyword" },
      "k8s_pod_label_cron": { "type": "keyword" },
      "client_ip": { "type": "keyword" },
      "http_code": { "type": "integer" },
      "http_method": { "type": "keyword" },
      "message": { "type": "text" }
    }
  }
}'

We also tuned index.merge.scheduler.max_thread_count to increase bulk indexing throughput.
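
The value chosen for this setting is not given here. As a sketch of how such a change can be applied, the code below builds (without sending) an update-settings request for the index; the thread count of 1 is a purely hypothetical placeholder, and `build_merge_threads_request` is an illustrative helper, not part of the suite.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # host used in the curl commands above
INDEX = "k8s-logs-index"


def build_merge_threads_request(max_thread_count: int) -> Request:
    """Build a PUT /<index>/_settings request for the merge scheduler."""
    if max_thread_count < 1:
        raise ValueError("max_thread_count must be at least 1")
    body = {"index.merge.scheduler.max_thread_count": max_thread_count}
    return Request(
        url=f"{ES_URL}/{INDEX}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical value; the benchmark's actual setting is not stated.
req = build_merge_threads_request(max_thread_count=1)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```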

Results

| Container | CPU Limit | RAM Limit | Replicas | Avg CPU (per instance) | Avg RAM (per instance) |
|---|---|---|---|---|---|
| seq‑db (--mode proxy) | 5 | 8GiB | 6 | 3.6 | 1.5GiB |
| seq‑db (--mode store) | 8 | 16GiB | 6 | 6.1 | 6.3GiB |
| elasticsearch (data) | 13 | 32GiB | 6 | 4.5 | 13GiB |

| Container | Avg Throughput | Logs/sec |
|---|---|---|
| seq‑db | 1.3GiB/s | 585,139 docs/s |
| elasticsearch | 113.58MiB/s | 37,658 docs/s |
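
The production results can be summarized as throughput ratios for both configurations. The sketch below only recomputes ratios from the reported averages, converting GiB/s to MiB/s first; the data layout and the `speedup` helper are illustrative assumptions.

```python
MIB_PER_GIB = 1024

# (throughput in MiB/s, logs per second), copied from the results tables.
RESULTS = {
    "1x1": {
        "seq-db": (520.0, 162_000),
        "elasticsearch": (195.0, 62_000),
    },
    "6x6": {
        "seq-db": (1.3 * MIB_PER_GIB, 585_139),
        "elasticsearch": (113.58, 37_658),
    },
}


def speedup(config: str) -> tuple[float, float]:
    """Return seq-db / Elasticsearch ratios by bytes and by log count."""
    seq_mib, seq_logs = RESULTS[config]["seq-db"]
    es_mib, es_logs = RESULTS[config]["elasticsearch"]
    return seq_mib / es_mib, seq_logs / es_logs


for config in RESULTS:
    by_bytes, by_logs = speedup(config)
    print(f"{config}: {by_bytes:.1f}x by bytes, {by_logs:.1f}x by log count")
```

By this arithmetic the 1x1 ratio comes out between 2.6 and 2.7, and the 6x6 ratio is roughly 11.7 by bytes and 15.5 by log count.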