
Benchmarks

Methodology

We prepared a set of benchmarks designed to be as reproducible and deterministic as possible.
The dataset used in the benchmarks is deterministic and consists of 40GiB of structured JSON logs (219 million entries).

Example log:

{
  "@timestamp": 897484581,
  "clientip": "162.146.3.0",
  "request": "GET /images/logo_cfo.gif HTTP/1.1",
  "status": 200,
  "size": 1504
}

All benchmarks were run against this dataset; only the computational resources and cluster size varied between runs.

Local Deploy

Tests were run on an AWS c6a.4xlarge host with the following configuration:

CPU                                      | RAM   | Disk
AMD EPYC 7R13 Processor, 3.6GHz, 16 vCPU | 32GiB | GP3

For more details on component setup and how to run the suite, see the link.

Local cluster configuration:

Container              | Replicas | CPU Limit (cores) | RAM Limit
seq-db (--mode single) | 1        | 4                 | 8GiB
elasticsearch          | 1        | 4                 | 8GiB
file.d                 | 1        | 1                 | 2GiB

Results (write‑path)

In the synthetic tests we obtained the following results:

Container     | Avg. Logs/sec | Avg. Throughput | Avg. CPU Usage | Avg. RAM Usage
seq-db        | 370,000       | 48MiB/s         | 3.3 vCPU       | 1.8GiB
elasticsearch | 110,000       | 14MiB/s         | 1.9 vCPU       | 2.4GiB

Thus, with comparable resource usage, seq-db demonstrated on average 3.4× higher throughput than Elasticsearch.

Results (read‑path)

Both stores were pre-loaded with the same dataset. Read-path tests were run without any write load.

Elasticsearch settings:

  • Request cache disabled (request_cache=false)
  • Total hits counting disabled (track_total_hits=false)

Tests were executed using Grafana k6, with query parameters available in the benchmarks/k6 folder.
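
For illustration, a read request with these settings might look like the sketch below (<index> is a placeholder for the index name, and the match_all query is an assumption; the actual queries used by the suite are in benchmarks/k6):

curl -X GET "http://localhost:9200/<index>/_search?request_cache=false" -H 'Content-Type: application/json' -d'
{
  "track_total_hits": false,
  "query": { "match_all": { } }
}'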

Scenario: fetch all logs using offsets

Elasticsearch enforces a default pagination limit: from + size must not exceed 10,000 (index.max_result_window), which caps how deep offset-based paging can go.

Parameters: 20 looping virtual users for 10s, fetching a random page [1–50].
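
A sketch of one such offset request against Elasticsearch (the page size of 100 and the match_all query are assumptions; from is derived as (page - 1) * page_size):

# e.g. random page = 23 with a page size of 100 gives from = 2200
curl -X GET "http://localhost:9200/<index>/_search?request_cache=false" -H 'Content-Type: application/json' -d'
{
  "track_total_hits": false,
  "from": 2200,
  "size": 100,
  "query": { "match_all": { } }
}'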

DB            | Avg    | P50    | P95
seq-db        | 5.56ms | 5.05ms | 9.56ms
elasticsearch | 6.06ms | 5.11ms | 11.8ms

Scenario: filter by status in(500,400,403)

Parameters: 20 VUs for 10s.
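
On the Elasticsearch side this filter can be expressed as a terms query; the sketch below is an assumption about the query shape, not the exact request from the suite:

curl -X GET "http://localhost:9200/<index>/_search?request_cache=false" -H 'Content-Type: application/json' -d'
{
  "track_total_hits": false,
  "query": { "terms": { "status": [500, 400, 403] } }
}'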

DB            | Avg      | P50      | P95
seq-db        | 364.68ms | 356.96ms | 472.26ms
elasticsearch | 21.68ms  | 18.91ms  | 29.84ms

Scenario: search by request "GET /english/images/top_stories.gif HTTP/1.0"

Parameters: 20 looping VUs for 10s.
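
A plausible Elasticsearch form of this query is a phrase match on the request field (whether the suite uses match or match_phrase is an assumption):

curl -X GET "http://localhost:9200/<index>/_search?request_cache=false" -H 'Content-Type: application/json' -d'
{
  "track_total_hits": false,
  "query": { "match_phrase": { "request": "GET /english/images/top_stories.gif HTTP/1.0" } }
}'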

DB            | Avg      | P50      | P95
seq-db        | 269.98ms | 213.43ms | 704.19ms
elasticsearch | 46.65ms  | 43.27ms  | 80.53ms

Scenario: aggregation counting logs by status

SQL analogue: SELECT status, COUNT(*) FROM logs GROUP BY status.
Parameters: 10 parallel queries, 2 VUs.
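
In Elasticsearch this corresponds to a terms aggregation on status; the sketch below is illustrative and may differ from the exact request in benchmarks/k6:

curl -X GET "http://localhost:9200/<index>/_search?request_cache=false" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "logs_by_status": { "terms": { "field": "status" } }
  }
}'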

DB            | Avg    | P50    | P95
seq-db        | 16.81s | 16.88s | 16.10s
elasticsearch | 6.46s  | 6.44s  | 6.57s

Scenario: minimum log size for each status

SQL analogue: SELECT status, MIN(size) FROM logs GROUP BY status.
Parameters: 5 iterations with 1 thread.
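
An Elasticsearch analogue is a terms aggregation on status with a min sub-aggregation on size (a sketch; it assumes size is indexed as a numeric field, whereas the k8s-logs-index mapping below maps size as keyword, which a min aggregation would not accept):

curl -X GET "http://localhost:9200/<index>/_search?request_cache=false" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "by_status": {
      "terms": { "field": "status" },
      "aggs": { "min_size": { "min": { "field": "size" } } }
    }
  }
}'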

DB            | Avg    | P50    | P95
seq-db        | 33.34s | 33.41s | 33.93s
elasticsearch | 16.88s | 16.82s | 17.5s

Scenario: range queries — fetch 5,000 documents

Parameters: 20 threads, 10s, random page [1–50], 100 documents per page.

DB            | Avg      | P50      | P95
seq-db        | 406.09ms | 385.13ms | 509.05ms
elasticsearch | 22.75ms  | 18.06ms  | 64.61ms

K8S Deploy

Cluster compute resources:

Container                   | CPU                       | RAM          | Disk
seq-db (--mode store)       | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz | RAID10, 4×SSD
seq-db (--mode ingestor)    | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz |
elasticsearch (master/data) | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz | RAID10, 4×SSD
file.d                      | Xeon Gold 6240R @ 2.40GHz | DDR4 3200MHz |

We selected a baseline set of fields to index. Elasticsearch was set up with the k8s-logs-index index so that only those fields are indexed.

Configuration 1x1

Index settings (the same settings, including durability guarantees, were applied to seq-db):

curl -X PUT "http://localhost:9200/k8s-logs-index/" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": "6",
      "refresh_interval": "1s",
      "number_of_replicas": "0",
      "codec": "best_compression",
      "merge.scheduler.max_thread_count": "2",
      "translog": { "durability": "request" }
    }
  },
  "mappings": {
    "dynamic": "false",
    "properties": {
      "request": { "type": "text" },
      "size": { "type": "keyword" },
      "status": { "type": "keyword" },
      "clientip": { "type": "keyword" }
    }
  }
}'

Results

Container             | CPU Limit (cores) | RAM Limit | Avg. CPU (cores) | Avg. RAM
seq-db (--mode store) | 10                | 16GiB     | 8.8              | 13.2GB
seq-db (--mode proxy) | 6                 | 8GiB      | 4.92             | 4.9GiB
elasticsearch (data)  | 16                | 32GiB     | 15.18            | 13GB

Container     | Avg. Throughput | Avg. Logs/sec
seq-db        | 181MiB/s        | 1,403,514
elasticsearch | 61MiB/s         | 442,924

Here, seq-db achieved roughly 2.9× higher throughput while using fewer resources.

Configuration 6x6

Six seq-db instances were run with --mode proxy and six with --mode store. The Elasticsearch index settings stayed the same, except number_of_replicas was set to 1:

curl -X PUT "http://localhost:9200/k8s-logs-index/" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": "6",
      "refresh_interval": "1s",
      "number_of_replicas": "1",
      "codec": "best_compression",
      "merge.scheduler.max_thread_count": "2",
      "translog": { "durability": "request" }
    }
  },
  "mappings": {
    "dynamic": "false",
    "properties": {
      "request": { "type": "text" },
      "size": { "type": "keyword" },
      "status": { "type": "keyword" },
      "clientip": { "type": "keyword" }
    }
  }
}'

Results

Container             | CPU Limit (cores) | RAM Limit | Replicas | Avg. CPU per instance (cores) | Avg. RAM per instance
seq-db (--mode proxy) | 3                 | 8GiB      | 6        | 1.87                          | 2.2GiB
seq-db (--mode store) | 10                | 16GiB     | 6        | 7.40                          | 2.5GiB
elasticsearch (data)  | 13                | 32GiB     | 6        | 7.34                          | 8.8GiB

Container     | Avg. Throughput | Avg. Logs/sec
seq-db        | 436MiB/s        | 3,383,724
elasticsearch | 62MiB/s         | 482,596

Thus, in the 6x6 configuration seq-db sustained roughly 7× the throughput of Elasticsearch.