Prometheus Settings
pg_doorman exposes Prometheus metrics on the [web] listener. Enable /metrics through [web]; the tables below map metric names to the pooler state they report.
Enabling the Web Listener
Both the Prometheus metrics endpoint (/metrics) and the optional operator console (the SPA on /, /api/*) are served by the same [web] listener. The legacy prometheus.* config keys are accepted as aliases for web.*.
web:
  enabled: true # Bind the HTTP listener for /metrics
  host: "0.0.0.0"
  port: 9127
  # Operator console is off by default; see the Web UI guide
  ui: false
  ui_anonymous: false
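
For configuration files that predate the [web] section, the alias rule stated above implies the listener can also be declared under the legacy prefix. The fragment below is a sketch that assumes the legacy section mirrors web.* key for key (enabled, host, port); that mapping is inferred from the alias statement, so check it against your pg_doorman version before relying on it.

```yaml
# Legacy spelling; assumed to be equivalent to the web.* keys shown above.
prometheus:
  enabled: true     # assumed alias of web.enabled
  host: "0.0.0.0"   # assumed alias of web.host
  port: 9127        # assumed alias of web.port
```

New configurations are probably better served by the web.* spelling, since the operator console settings (ui, ui_anonymous) are documented only under [web].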
Configuration Options
For UI settings, see Web UI. The minimum to expose /metrics is:
| Option | Description | Default |
|---|---|---|
| enabled | Enable the [web] HTTP listener. /metrics is available when this is true; the operator console also requires ui = true. | false |
| host | Bind address for the [web] HTTP listener. | "0.0.0.0" |
| port | Port for the [web] HTTP listener. | 9127 |
Configuring Prometheus
Add the following job to your Prometheus configuration to scrape metrics from pg_doorman:
scrape_configs:
  - job_name: 'pg_doorman'
    static_configs:
      - targets: ['<pg_doorman_host>:9127']
Replace <pg_doorman_host> with the hostname or IP address of your pg_doorman instance.
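
When several pg_doorman instances are scraped, it can help to spell out the path and interval explicitly. The sketch below uses the standard Prometheus scrape_config fields metrics_path and scrape_interval; the two hostnames are hypothetical, and the 15-second interval is chosen to line up with the socket metrics refresh described under Socket Metrics.

```yaml
scrape_configs:
  - job_name: 'pg_doorman'
    metrics_path: /metrics   # Prometheus default, written out for clarity
    scrape_interval: 15s     # matches the 15-second socket metrics refresh
    static_configs:
      - targets:
          - 'doorman-1.example.internal:9127'   # hypothetical host
          - 'doorman-2.example.internal:9127'   # hypothetical host
```

Prometheus attaches an instance label to every scraped series, which lets queries separate the two poolers.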
Available Metrics
pg_doorman exposes the following metrics:
System Metrics
| Metric | Description |
|---|---|
| pg_doorman_total_memory | Total memory allocated to the pg_doorman process in bytes. Monitors the memory footprint of the application. |
Connection Metrics
| Metric | Description |
|---|---|
| pg_doorman_connections_total | Cumulative count of accepted client connections by type. Types include: 'plain' (unencrypted), 'tls' (encrypted), 'cancel' (cancel-query startup), and 'total' (sum of all). Counter form; use rate(pg_doorman_connections_total[5m]) for connection rate. |
| pg_doorman_connection_count | DEPRECATED, removed in 3.10. Gauge mirror of pg_doorman_connections_total kept for one minor release. New rules and dashboards must consume the counter form. |
Socket Metrics (Linux only)
| Metric | Description |
|---|---|
| pg_doorman_sockets | Number of sockets in use by pg_doorman, by socket type. Types include: 'tcp' (IPv4 TCP sockets), 'tcp6' (IPv6 TCP sockets), 'unix' (Unix domain sockets), and 'unknown' (sockets of unrecognized type). Only available on Linux systems. Collected by a background task every 15 seconds; scrapes serve whatever the last tick produced, so reported counts can lag reality by up to one refresh interval. Use a Prometheus scrape_interval of at least 15 s to avoid scraping the same snapshot twice. |
Pool Metrics
| Metric | Description |
|---|---|
| pg_doorman_pools_clients | Number of clients in connection pools by status, user, and database. Status values include: 'idle' (connected but not executing queries), 'waiting' (waiting for a server connection), and 'active' (currently executing queries). Helps monitor connection pool utilization and client distribution. |
| pg_doorman_pools_servers | Number of servers in connection pools by status, user, and database. Status values include: 'active' (actively serving clients) and 'idle' (available for new connections). Helps monitor server availability and load distribution. |
| pg_doorman_pools_bytes_total | Cumulative bytes transferred per pool and direction. Direction values include: 'received' (data from client) and 'sent' (data to client). Counter form; use rate(pg_doorman_pools_bytes_total[5m]) for throughput. |
| pg_doorman_pools_bytes | DEPRECATED, removed in 3.10. Gauge mirror of pg_doorman_pools_bytes_total. |
| pg_doorman_pool_size | Configured maximum pool size per user and database. Useful for calculating remaining pool capacity together with pg_doorman_pools_servers. |
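
The pool gauges above lend themselves to two simple alerts: clients stuck in the 'waiting' status, and active servers approaching pg_doorman_pool_size. The rule file below is a sketch; the group name, alert names, thresholds, and for durations are illustrative choices, not values recommended by pg_doorman.

```yaml
groups:
  - name: pg_doorman_pool_alerts            # hypothetical group name
    rules:
      # Clients have been queued for a server connection for a sustained period.
      - alert: PgDoormanClientsWaiting      # hypothetical alert name
        expr: sum by (user, database) (pg_doorman_pools_clients{status="waiting"}) > 0
        for: 5m                             # illustrative hold time
        labels:
          severity: warning
        annotations:
          summary: "Clients waiting on pool {{ $labels.user }}@{{ $labels.database }}"
      # Active servers are using more than 90% of the configured pool size.
      - alert: PgDoormanPoolNearCapacity    # hypothetical alert name
        expr: |
          sum by (user, database) (pg_doorman_pools_servers{status="active"})
            / sum by (user, database) (pg_doorman_pool_size)
            > 0.9
        for: 10m                            # illustrative hold time
        labels:
          severity: warning
        annotations:
          summary: "Pool {{ $labels.user }}@{{ $labels.database }} is above 90% of pool size"
```

Both sides of the division are aggregated by the same labels so the series match one-to-one. With several pg_doorman instances, adding instance to each by clause keeps the ratio per pooler.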
Query and Transaction Metrics
| Metric | Description |
|---|---|
| pg_doorman_pools_query_duration_seconds | Server-side query latency histogram per pool, in seconds. Use histogram_quantile(q, sum by (le, user, database) (rate(pg_doorman_pools_query_duration_seconds_bucket[5m]))) for quantiles; rate(_count[5m]) for QPS. |
| pg_doorman_pools_transaction_duration_seconds | End-to-end transaction latency histogram per pool, in seconds. Same composition contract as pg_doorman_pools_query_duration_seconds. |
| pg_doorman_pools_wait_duration_seconds | Client checkout wait latency histogram per pool, in seconds. Use histogram_quantile(0.99, ...) for tail wait. |
| pg_doorman_pools_transactions_total | Cumulative transaction count per pool. Counter form; use rate(pg_doorman_pools_transactions_total[5m]) for TPS. |
| pg_doorman_pools_queries_percentile | DEPRECATED, removed in 3.10. Pre-aggregated percentile gauge that cannot be summed across replicas. Use pg_doorman_pools_query_duration_seconds_bucket with histogram_quantile(). |
| pg_doorman_pools_transactions_percentile | DEPRECATED, removed in 3.10. See pg_doorman_pools_transaction_duration_seconds. |
| pg_doorman_pools_transactions_count | DEPRECATED, removed in 3.10. Gauge mirror of pg_doorman_pools_transactions_total. |
| pg_doorman_pools_transactions_total_time | Total time spent executing transactions in connection pools by user and database. Values are in milliseconds. Helps monitor overall transaction performance and identify users or databases with high transaction execution times. |
| pg_doorman_pools_queries_total | Cumulative query count per pool. Counter form; use rate(pg_doorman_pools_queries_total[5m]) for QPS. |
| pg_doorman_pools_queries_count | DEPRECATED, removed in 3.10. Gauge mirror of pg_doorman_pools_queries_total. |
| pg_doorman_pools_queries_total_time | Total time spent executing queries in connection pools by user and database. Values are in milliseconds. Helps monitor overall query performance and identify users or databases with high query execution times. |
| pg_doorman_pools_avg_wait_time | DEPRECATED, removed in 3.10. Running mean that drowns tail wait spikes. Use pg_doorman_pools_wait_duration_seconds_bucket with histogram_quantile(). |
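
Dashboards that previously read the deprecated percentile gauges could instead read precomputed series derived from the histograms. The recording rules below are one possible sketch: the group name and record names are hypothetical, and the mean rule assumes the histograms expose the standard Prometheus _sum and _count series alongside _bucket.

```yaml
groups:
  - name: pg_doorman_latency_rules          # hypothetical group name
    rules:
      # p99 server-side query latency per pool, in seconds.
      - record: pg_doorman:query_duration_seconds:p99        # hypothetical name
        expr: |
          histogram_quantile(0.99,
            sum by (le, user, database) (
              rate(pg_doorman_pools_query_duration_seconds_bucket[5m])))
      # p99 end-to-end transaction latency per pool, in seconds.
      - record: pg_doorman:transaction_duration_seconds:p99  # hypothetical name
        expr: |
          histogram_quantile(0.99,
            sum by (le, user, database) (
              rate(pg_doorman_pools_transaction_duration_seconds_bucket[5m])))
      # Mean query latency per pool; assumes standard _sum and _count series.
      - record: pg_doorman:query_duration_seconds:mean       # hypothetical name
        expr: |
          sum by (user, database) (rate(pg_doorman_pools_query_duration_seconds_sum[5m]))
            / sum by (user, database) (rate(pg_doorman_pools_query_duration_seconds_count[5m]))
```

Because the quantile is computed after summing bucket rates, the same expressions stay valid when the by clause is widened or narrowed, which is the property the pre-aggregated percentile gauges lacked.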
Auth Query Metrics
These metrics are only available when auth_query is configured for one or more pools.
| Metric | Description |
|---|---|
| pg_doorman_auth_query_cache_total | Cumulative auth query cache events by type (hits/misses/refetches/rate_limited) and database. Counter form; the entries snapshot stays on pg_doorman_auth_query_cache. |
| pg_doorman_auth_query_auth_total | Cumulative auth query authentication outcomes by result (success/failure) and database. Counter form. |
| pg_doorman_auth_query_executor_total | Cumulative auth query executor events by type (queries/errors) and database. Counter form. |
| pg_doorman_auth_query_dynamic_pools_total | Cumulative auth query dynamic pool lifecycle events by type (created/destroyed) and database. Counter form; the current snapshot stays on pg_doorman_auth_query_dynamic_pools. |
| pg_doorman_auth_query_cache | Snapshot gauge for entries (current cached credentials). Cumulative members are deprecated in this metric; use pg_doorman_auth_query_cache_total. |
| pg_doorman_auth_query_auth | DEPRECATED, removed in 3.10. Gauge mirror of pg_doorman_auth_query_auth_total. |
| pg_doorman_auth_query_executor | DEPRECATED, removed in 3.10. Gauge mirror of pg_doorman_auth_query_executor_total. |
| pg_doorman_auth_query_dynamic_pools | Auth query dynamic pool lifecycle metrics by type and database. Types include: current (currently active dynamic pools), created (total pools created since startup), destroyed (total pools garbage-collected or removed on RELOAD). Only relevant in passthrough mode. |
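
The counters above can drive alerts on the auth query path. The sketch below watches executor errors and rate-limited cache events, using the type label values listed in the table; the group name, alert names, and for durations are illustrative assumptions.

```yaml
groups:
  - name: pg_doorman_auth_query_alerts          # hypothetical group name
    rules:
      # The auth query executor is returning errors.
      - alert: PgDoormanAuthQueryExecutorErrors # hypothetical alert name
        expr: sum by (database) (rate(pg_doorman_auth_query_executor_total{type="errors"}[5m])) > 0
        for: 5m                                 # illustrative hold time
        labels:
          severity: warning
        annotations:
          summary: "auth_query executor errors on {{ $labels.database }}"
      # Cache lookups are being rate limited for a sustained period.
      - alert: PgDoormanAuthQueryRateLimited    # hypothetical alert name
        expr: sum by (database) (rate(pg_doorman_auth_query_cache_total{type="rate_limited"}[5m])) > 0
        for: 10m                                # illustrative hold time
        labels:
          severity: info
        annotations:
          summary: "auth_query cache is rate limiting lookups on {{ $labels.database }}"
```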
Configured startup_parameters
These metrics cover two failure points for configured startup parameters. pg_doorman_backend_startup_parameter_errors_total counts backend startups PostgreSQL rejected after pg_doorman sent the StartupMessage. pg_doorman_startup_parameters_dropped_total counts drop events before StartupMessage, either because the resolved parameter set was too large or because an auth_query JSON value was invalid.
| Metric | Description |
|---|---|
| pg_doorman_backend_startup_parameter_errors_total | Counter by (pool, sqlstate). Increments when PostgreSQL rejects a backend startup and the ErrorResponse names a startup parameter sent by pg_doorman. SQLSTATEs with the 57P prefix are excluded because Patroni-assisted fallback handles those errors. The failing parameter name and username are written to the warning log line, not to labels. pg_doorman first parses the common parameter "<name>" phrase, then scans the message for any sent key in double quotes. If neither lookup finds a key, the counter is not incremented. |
| pg_doorman_startup_parameters_dropped_total | Counter by (pool, reason). Increments when pg_doorman drops startup parameters before sending StartupMessage. Reasons: cascade_budget_exceeded, packet_cap_exceeded, auth_query_oversize, auth_query_overlay_oversize, auth_query_bad_type, auth_query_invalid_json, auth_query_invalid_shape, auth_query_invalid_entry, dedicated_mode. |
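
Since both counters should normally stay flat, any increase is worth surfacing. The following sketch alerts on each failure point separately and keeps the sqlstate and reason labels in the alert so the cause is visible without opening the logs; the group name, alert names, and the 15-minute window are illustrative.

```yaml
groups:
  - name: pg_doorman_startup_parameter_alerts    # hypothetical group name
    rules:
      # PostgreSQL rejected a backend startup because of a configured parameter.
      - alert: PgDoormanStartupParameterRejected # hypothetical alert name
        expr: sum by (pool, sqlstate) (increase(pg_doorman_backend_startup_parameter_errors_total[15m])) > 0
        labels:
          severity: warning
        annotations:
          summary: "Pool {{ $labels.pool }}: backend startup rejected (SQLSTATE {{ $labels.sqlstate }})"
      # pg_doorman dropped startup parameters before sending StartupMessage.
      - alert: PgDoormanStartupParametersDropped # hypothetical alert name
        expr: sum by (pool, reason) (increase(pg_doorman_startup_parameters_dropped_total[15m])) > 0
        labels:
          severity: warning
        annotations:
          summary: "Pool {{ $labels.pool }}: startup parameters dropped ({{ $labels.reason }})"
```

The parameter name itself is not a label, so the warning log line remains the place to find which parameter failed.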
Server Metrics
| Metric | Description |
|---|---|
| pg_doorman_servers_prepared_hits | Live aggregate of prepared-statement cache hits across currently active backends of each pool, by user and database. This gauge can decrease when backends rotate; use pg_doorman_servers_prepared_hits_total for rates. |
| pg_doorman_servers_prepared_misses | Live aggregate of prepared-statement cache misses across currently active backends of each pool, by user and database. This gauge can decrease when backends rotate; use pg_doorman_servers_prepared_misses_total for rates. |
| pg_doorman_servers_prepared_hits_total | Counter form of prepared-statement cache hits across all backends of each pool, by user and database. Use rate() over this metric for hit throughput. |
| pg_doorman_servers_prepared_misses_total | Counter form of prepared-statement cache misses across all backends of each pool, by user and database. A sustained non-zero rate signals queries that could benefit from being prepared, or from a larger server_prepared_statements_cache_size. |
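
A hit ratio is often easier to read than two separate rates. The recording rule below is a sketch built from the two counter-form metrics; the group and record names are hypothetical.

```yaml
groups:
  - name: pg_doorman_prepared_rules         # hypothetical group name
    rules:
      # Share of prepared-statement lookups served from the server-side cache.
      - record: pg_doorman:servers_prepared:hit_ratio   # hypothetical name
        expr: |
          sum by (user, database) (rate(pg_doorman_servers_prepared_hits_total[5m]))
            / (
              sum by (user, database) (rate(pg_doorman_servers_prepared_hits_total[5m]))
                + sum by (user, database) (rate(pg_doorman_servers_prepared_misses_total[5m]))
            )
```

The rule reads the _total counters rather than the live gauges because the gauges can decrease when backends rotate. For a pool with no prepared-statement traffic in the window, the division yields NaN rather than 0.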
Per-Client Prepared Statement Cache Metrics
The per-client prepared statement cache is split into a Named map (unbounded) and an Anonymous LRU bounded by client_anonymous_prepared_cache_size (defaults to the resolved prepared_statements_cache_size when unset). The three metrics below expose the size of each part and the eviction rate on the bounded part.
| Metric | Description |
|---|---|
| pg_doorman_clients_prepared_named_entries | Gauge by user and database. Sum of Named entries across every connected client's cache. Named statements have no upper bound and are kept until the client disconnects or sends DEALLOCATE. Sustained growth here indicates drivers that mint per-query named statements (some pgjdbc / Hibernate flows, some .NET Npgsql configurations) and may justify capping per-client memory at the application layer. |
| pg_doorman_clients_prepared_anonymous_entries | Gauge by user and database. Sum of Anonymous entries across every connected client's cache. Each client's Anonymous part is capped at client_anonymous_prepared_cache_size, so this gauge approaches at most connected_clients * cache_size. |
| pg_doorman_clients_prepared_anonymous_evictions_total | Counter by user and database. Cumulative count of Anonymous LRU evictions across all clients of the pool. A sustained non-zero rate signals that client_anonymous_prepared_cache_size is too small for the workload and the LRU is recycling entries faster than the application reuses them. The counter is monotonic per pool; an upgrade restarts it from zero. |
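
The "sustained non-zero rate" condition on the eviction counter can be expressed as an alert with a long hold time. The threshold of one eviction per second and the 30-minute duration below are placeholders to tune per workload, and the group and alert names are hypothetical.

```yaml
groups:
  - name: pg_doorman_client_cache_alerts          # hypothetical group name
    rules:
      # The Anonymous LRU keeps evicting entries, suggesting the cache is undersized.
      - alert: PgDoormanAnonymousCacheThrashing   # hypothetical alert name
        expr: sum by (user, database) (rate(pg_doorman_clients_prepared_anonymous_evictions_total[10m])) > 1
        for: 30m                                  # illustrative hold time
        labels:
          severity: info
        annotations:
          summary: "Anonymous prepared cache evicting on {{ $labels.user }}@{{ $labels.database }}; consider raising client_anonymous_prepared_cache_size"
```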
Query Interner Metrics
The query interner is process-global. These metrics have no pool, user, or database labels; use the prepared-statement metrics above to locate the affected pool.
| Metric | Description |
|---|---|
| pg_doorman_query_interner_entries | Gauge by kind (named or anonymous). Number of interned query texts. Refreshed once per GC sweep. |
| pg_doorman_query_interner_bytes | Gauge by kind (named or anonymous). Total bytes of interned query text. Refreshed once per GC sweep. |
| pg_doorman_query_interner_evictions_total | Counter by kind and reason (gc_passive or ttl_expired). Named entries are removed when no cache outside the interner still holds them; anonymous entries are removed after the idle TTL. |
| pg_doorman_query_interner_synthetic_misses_total | Counter of synthetic SQLSTATE 26000 responses for anonymous prepared statements whose state was no longer available when a later Bind or Describe referenced it. Check client Anonymous LRU evictions, WARN logs, RESET INTERNER, and TTL evictions before increasing query_interner_anon_idle_ttl_seconds. |
| pg_doorman_query_interner_gc_duration_seconds | Histogram of one interner GC sweep (named and anonymous combined), in seconds. Use this to detect large interners that make sweep time visible. |
| pg_doorman_pooler_check_query_backend_total | Counter of pooler_check_query probes forwarded to PostgreSQL (cache miss or RELOAD-induced re-probe). Steady-state value should be flat after warmup; a continuously rising rate means the per-pool cache is not retaining its entry. |
| pg_doorman_pooler_check_query_cache_total | Counter of pooler_check_query probes answered from the per-pool response cache without touching the backend. Hit rate = cache_total / (cache_total + backend_total). |
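
The hit-rate formula in the last row translates directly into a recording rule, and the synthetic-miss counter into an alert. The sketch below aggregates by the instance label that Prometheus attaches to each scrape target, on the assumption that both features are tracked per pg_doorman process; all rule names and durations are hypothetical.

```yaml
groups:
  - name: pg_doorman_interner_rules               # hypothetical group name
    rules:
      # Hit rate = cache_total / (cache_total + backend_total), per pg_doorman instance.
      - record: pg_doorman:pooler_check_query:cache_hit_ratio   # hypothetical name
        expr: |
          sum by (instance) (rate(pg_doorman_pooler_check_query_cache_total[5m]))
            / (
              sum by (instance) (rate(pg_doorman_pooler_check_query_cache_total[5m]))
                + sum by (instance) (rate(pg_doorman_pooler_check_query_backend_total[5m]))
            )
      # Clients are receiving synthetic SQLSTATE 26000 responses.
      - alert: PgDoormanInternerSyntheticMisses   # hypothetical alert name
        expr: sum by (instance) (rate(pg_doorman_query_interner_synthetic_misses_total[10m])) > 0
        for: 15m                                  # illustrative hold time
        labels:
          severity: warning
        annotations:
          summary: "Synthetic prepared-statement misses on {{ $labels.instance }}"
```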
Grafana Dashboard
A Grafana dashboard can visualize these metrics. Useful panels include:
- Connection counts by type
- Memory usage over time
- Client and server counts by pool
- Query and transaction performance percentiles
- Network traffic by pool
Example Queries
Here are some example Prometheus queries that you might find useful:
Connection Rate
rate(pg_doorman_connections_total{type="total"}[5m])
Pool Utilization
sum by (database) (pg_doorman_pools_clients{status="active"}) / sum by (database) (pg_doorman_pools_servers{status=~"active|idle"})
Slow Queries (p99)
histogram_quantile(0.99, sum by (le, user, database) (rate(pg_doorman_pools_query_duration_seconds_bucket[5m])))
Client Wait Time (p99)
histogram_quantile(0.99, sum by (le, user, database) (rate(pg_doorman_pools_wait_duration_seconds_bucket[5m])))
Auth Query Cache Hit Rate
sum by (database) (rate(pg_doorman_auth_query_cache_total{type="hits"}[5m])) / clamp_min(sum by (database) (rate(pg_doorman_auth_query_cache_total{type=~"hits|misses"}[5m])), 0.001)
Auth Query Failure Rate
rate(pg_doorman_auth_query_auth_total{result="failure"}[5m])