# Benchmarks
## Methodology
We prepared a set of benchmarks designed to be as reproducible and deterministic as possible.
The dataset used in the benchmarks is deterministic and consists of 40GiB of structured JSON logs (219 million entries).
Example log:

```json
{
  "@timestamp": 897484581,
  "clientip": "162.146.3.0",
  "request": "GET /images/logo_cfo.gif HTTP/1.1",
  "status": 200,
  "size": 1504
}
```
All benchmarks were run against this dataset; the only variables were the computational resources and the cluster size.
## Local Deploy
Tests were run on an AWS c6a.4xlarge host with the following configuration:
| CPU | RAM | Disk |
|---|---|---|
| AMD EPYC 7R13 Processor 3.6GHz, 16vCPU | 32GiB | GP3 |
For more details on component setup and how to run the suite, see the link.
Local cluster configuration:
| Container | Replicas | CPU Limit | RAM Limit |
|---|---|---|---|
| seq‑db (`--mode single`) | 1 | 4 | 8GiB |
| elasticsearch | 1 | 4 | 8GiB |
| file.d | 1 | – | 12GiB |
### Results (write‑path)
In the synthetic tests we obtained the following results:
| Container | Avg. Logs/sec | Avg. Throughput | Avg. CPU Usage | Avg. RAM Usage |
|---|---|---|---|---|
| seq‑db | 370,000 | 48MiB/s | 3.3vCPU | 1.8GiB |
| elasticsearch | 110,000 | 14MiB/s | 1.9vCPU | 2.4GiB |
Thus, with comparable resource usage, seq‑db demonstrated on average 3.4× higher throughput than Elasticsearch.
### Results (read‑path)
Both stores were pre-loaded with the same dataset. Read-path tests were run without any write load.
Elasticsearch settings:

- Request cache disabled (`request_cache=false`)
- Total hits counting disabled (`track_total_hits=false`)
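
Both settings are applied per search request; a minimal sketch of such a request (the index name `k8s-logs-index` is an assumption borrowed from the K8S section below, since the local index name is not specified):

```bash
# request_cache is a query-string parameter; track_total_hits is set in the body.
curl -s "http://localhost:9200/k8s-logs-index/_search?request_cache=false" \
  -H 'Content-Type: application/json' -d'
{
  "track_total_hits": false,
  "query": { "match_all": {} }
}'
```
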
Tests were executed using Grafana k6, with query parameters available in the `benchmarks/k6` folder.
#### Scenario: fetch all logs using offsets
Elasticsearch enforces a default result-window limit: `page_size * offset` must not exceed 10,000 (the `index.max_result_window` setting).
Parameters: 20 looping virtual users for 10s, fetching a random page [1–50].
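
For Elasticsearch, each page fetch is an offset query along these lines (a sketch; the page size of 100 and the index name are assumptions, as above):

```bash
# Pick a random page in [1, 50] and translate it into from/size pagination.
PAGE=$((RANDOM % 50 + 1))
curl -s "http://localhost:9200/k8s-logs-index/_search?request_cache=false" \
  -H 'Content-Type: application/json' -d"
{
  \"track_total_hits\": false,
  \"from\": $(( (PAGE - 1) * 100 )),
  \"size\": 100,
  \"query\": { \"match_all\": {} }
}"
```
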
| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 5.56ms | 5.05ms | 9.56ms |
| elasticsearch | 6.06ms | 5.11ms | 11.8ms |
#### Scenario: `status: in(500,400,403)`
Parameters: 20 VUs for 10s.
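
On the Elasticsearch side this corresponds to a `terms` query on the `status` field (a sketch, with the same index-name assumption as above):

```bash
curl -s "http://localhost:9200/k8s-logs-index/_search?request_cache=false" \
  -H 'Content-Type: application/json' -d'
{
  "track_total_hits": false,
  "query": { "terms": { "status": ["500", "400", "403"] } }
}'
```
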
| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 364.68ms | 356.96ms | 472.26ms |
| elasticsearch | 21.68ms | 18.91ms | 29.84ms |
#### Scenario: `request: GET /english/images/top_stories.gif HTTP/1.0`
Parameters: 20 looping VUs for 10s.
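
With `request` mapped as `text`, the Elasticsearch analogue is a full-text phrase match on that field (a sketch; using `match_phrase` rather than `match` is an assumption):

```bash
curl -s "http://localhost:9200/k8s-logs-index/_search?request_cache=false" \
  -H 'Content-Type: application/json' -d'
{
  "track_total_hits": false,
  "query": {
    "match_phrase": { "request": "GET /english/images/top_stories.gif HTTP/1.0" }
  }
}'
```
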
| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 269.98ms | 213.43ms | 704.19ms |
| elasticsearch | 46.65ms | 43.27ms | 80.53ms |
#### Scenario: aggregation counting logs by status
SQL analogue: `SELECT status, COUNT(*) FROM logs GROUP BY status`.
Parameters: 10 parallel queries, 2 VUs.
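
In Elasticsearch this maps to a `terms` aggregation, where each bucket's `doc_count` is the `COUNT(*)` for that status (a sketch; the bucket size is an assumption):

```bash
curl -s "http://localhost:9200/k8s-logs-index/_search?request_cache=false" \
  -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "by_status": { "terms": { "field": "status", "size": 50 } }
  }
}'
```
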
| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 16.81s | 16.88s | 16.10s |
| elasticsearch | 6.46s | 6.44s | 6.57s |
#### Scenario: minimum log size for each status
SQL analogue: `SELECT status, MIN(size) FROM logs GROUP BY status`.
Parameters: 5 iterations with 1 thread.
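
The Elasticsearch analogue nests a `min` aggregation inside the `terms` buckets (a sketch that assumes `size` is indexed as a numeric field; note that the K8S mapping below declares it as `keyword`):

```bash
curl -s "http://localhost:9200/k8s-logs-index/_search?request_cache=false" \
  -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "by_status": {
      "terms": { "field": "status", "size": 50 },
      "aggs": { "min_size": { "min": { "field": "size" } } }
    }
  }
}'
```
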
| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 33.34s | 33.41s | 33.93s |
| elasticsearch | 16.88s | 16.82s | 17.5s |
#### Scenario: range queries (fetch 5,000 documents)
Parameters: 20 threads, 10s, random page [1–50], 100 documents per page.
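
Each page fetch here adds a range filter, e.g. on `@timestamp` (a sketch; the field choice and the bounds are assumptions):

```bash
# 50 random pages of 100 documents each cover the 5,000-document target.
PAGE=$((RANDOM % 50 + 1))
curl -s "http://localhost:9200/k8s-logs-index/_search?request_cache=false" \
  -H 'Content-Type: application/json' -d"
{
  \"track_total_hits\": false,
  \"from\": $(( (PAGE - 1) * 100 )),
  \"size\": 100,
  \"query\": {
    \"range\": { \"@timestamp\": { \"gte\": 897000000, \"lte\": 898000000 } }
  }
}"
```
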
| DB | Avg | P50 | P95 |
|---|---|---|---|
| seq‑db | 406.09ms | 385.13ms | 509.05ms |
| elasticsearch | 22.75ms | 18.06ms | 64.61ms |
## K8S Deploy
Cluster compute resources:
| Container | CPU | RAM | Disk |
|---|---|---|---|
| seq‑db (`--mode store`) | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz | RAID10, 4×SSD |
| seq‑db (`--mode ingestor`) | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz | – |
| elasticsearch (master/data) | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz | RAID10, 4×SSD |
| file.d | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz | – |
We selected a baseline set of fields to index. Elasticsearch was set up with an index, `k8s-logs-index`, configured to index only those fields.
### Configuration 1x1
Index settings (equivalent settings, including the durability guarantees, were applied to seq‑db):
```bash
curl -X PUT "http://localhost:9200/k8s-logs-index/" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": "6",
      "refresh_interval": "1s",
      "number_of_replicas": "0",
      "codec": "best_compression",
      "merge.scheduler.max_thread_count": "2",
      "translog": { "durability": "request" }
    }
  },
  "mappings": {
    "dynamic": "false",
    "properties": {
      "request": { "type": "text" },
      "size": { "type": "keyword" },
      "status": { "type": "keyword" },
      "clientip": { "type": "keyword" }
    }
  }
}'
```
#### Results
| Container | CPU Limit | RAM Limit | Avg. CPU | Avg. RAM |
|---|---|---|---|---|
| seq‑db (`--mode store`) | 10 | 16GiB | 8.81 | 3.2GB |
| seq‑db (`--mode proxy`) | 6 | 8GiB | 4.92 | 4.9GiB |
| elasticsearch (data) | 16 | 32GiB | 15.18 | 13GB |

| Container | Avg. Throughput | Avg. Logs/sec |
|---|---|---|
| seq‑db | 181MiB/s | 1,403,514 |
| elasticsearch | 61MiB/s | 442,924 |
Here, seq‑db achieved ~2.9× higher throughput while using fewer resources.
### Configuration 6x6
Six seq‑db instances with `--mode proxy` and six with `--mode store`.
Elasticsearch index settings stayed the same, except `number_of_replicas=1`:
```bash
curl -X PUT "http://localhost:9200/k8s-logs-index/" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": "6",
      "refresh_interval": "1s",
      "number_of_replicas": "1",
      "codec": "best_compression",
      "merge.scheduler.max_thread_count": "2",
      "translog": { "durability": "request" }
    }
  },
  "mappings": {
    "dynamic": "false",
    "properties": {
      "request": { "type": "text" },
      "size": { "type": "keyword" },
      "status": { "type": "keyword" },
      "clientip": { "type": "keyword" }
    }
  }
}'
```
#### Results
| Container | CPU Limit | RAM Limit | Replicas | Avg. CPU (per instance) | Avg. RAM (per instance) |
|---|---|---|---|---|---|
| seq‑db (`--mode proxy`) | 3 | 8GiB | 6 | 1.87 | 2.2GiB |
| seq‑db (`--mode store`) | 10 | 16GiB | 6 | 7.40 | 2.5GiB |
| elasticsearch (data) | 13 | 32GiB | 6 | 7.34 | 8.8GiB |

| Container | Avg. Throughput | Avg. Logs/sec |
|---|---|---|
| seq‑db | 436MiB/s | 3,383,724 |
| elasticsearch | 62MiB/s | 482,596 |

Thus, in the 6x6 configuration, seq‑db demonstrated roughly 7× higher throughput than Elasticsearch.