Prometheus Settings

pg_doorman exposes Prometheus metrics on the [web] listener. Enable /metrics through [web]; the tables below map metric names to the pooler state they report.

Enabling the Web Listener

Both the Prometheus metrics endpoint (/metrics) and the optional operator console (the SPA on /, /api/*) are served by the same [web] listener. The legacy prometheus.* config keys are accepted as aliases for web.*.

web:
  enabled: true     # Bind the HTTP listener for /metrics
  host: "0.0.0.0"
  port: 9127
  # Operator console is off by default; see the Web UI guide
  ui: false
  ui_anonymous: false

Configuration Options

For UI settings, see Web UI. Exposing /metrics only requires the following options:

| Option | Description | Default |
|---|---|---|
| enabled | Enable the [web] HTTP listener. /metrics is available when this is true; the operator console also requires ui = true. | false |
| host | Bind address for the [web] HTTP listener. | "0.0.0.0" |
| port | Port for the [web] HTTP listener. | 9127 |

Configuring Prometheus

Add the following job to your Prometheus configuration to scrape metrics from pg_doorman:

scrape_configs:
  - job_name: 'pg_doorman'
    static_configs:
      - targets: ['<pg_doorman_host>:9127']

Replace <pg_doorman_host> with the hostname or IP address of your pg_doorman instance.
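
To check the endpoint by hand before wiring up Prometheus, you can fetch it (for example with curl http://<pg_doorman_host>:9127/metrics) and filter the output. The sketch below is a minimal, stdlib-only parser for simple lines of the Prometheus text exposition format. It is an illustrative assumption, not part of pg_doorman. It ignores escaped quotes, commas inside label values, and timestamps, and the sample values are made up.

```python
import re

# Matches `name{labels} value` or `name value`: a simplified subset of the
# Prometheus text exposition format (no escaped quotes, no timestamps).
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape output; real values and label sets will differ.
example = """\
# TYPE pg_doorman_connections_total counter
pg_doorman_connections_total{type="tls"} 42
pg_doorman_connections_total{type="total"} 57
pg_doorman_total_memory 1048576
"""

for name, labels, value in parse_metrics(example):
    print(name, labels, value)
```

For anything beyond quick inspection, a full exposition-format parser is the safer choice.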

Available Metrics

pg_doorman exposes the following metrics:

System Metrics

| Metric | Description |
|---|---|
| pg_doorman_total_memory | Total memory allocated to the pg_doorman process, in bytes. Use it to monitor the application's memory footprint. |

Connection Metrics

| Metric | Description |
|---|---|
| pg_doorman_connections_total | Cumulative count of accepted client connections by type. Types: 'plain' (unencrypted), 'tls' (encrypted), 'cancel' (cancel-query startup), and 'total' (sum of all). Because 'total' is itself a type, select type="total" instead of summing across types. Counter form; use rate(pg_doorman_connections_total[5m]) for the connection rate. |
| pg_doorman_connection_count | DEPRECATED; will be removed in 3.10. Gauge mirror of pg_doorman_connections_total, kept for one minor release. New rules and dashboards must use the counter form. |
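
Counters such as pg_doorman_connections_total only go up until the process restarts, and rate() compensates for the reset to zero. The sketch below is a simplified Python model of that idea under stated assumptions. It computes a per-second rate from (timestamp, value) samples and treats any decrease as a counter reset. It skips the window-boundary extrapolation that Prometheus's real rate() performs, so its numbers can differ slightly from PromQL's.

```python
def simple_rate(samples):
    """Per-second increase over (unix_seconds, counter_value) samples.

    Simplified model of PromQL rate(): a drop in value is treated as a
    counter reset (the process restarted and counted up from zero), and
    no extrapolation to the window boundaries is done.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # Reset: everything counted since the restart is new increase.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical samples of pg_doorman_connections_total{type="total"},
# scraped every 15 s, with a restart between t=30 and t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 20), (60, 50)]
print(simple_rate(samples))
```

A deprecated gauge mirror gives rate() no such reset semantics to rely on, which is one reason the counter form is preferred.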

Socket Metrics (Linux only)

| Metric | Description |
|---|---|
| pg_doorman_sockets | Number of sockets used by pg_doorman, by socket type. Types: 'tcp' (IPv4 TCP sockets), 'tcp6' (IPv6 TCP sockets), 'unix' (Unix domain sockets), and 'unknown' (sockets of unrecognized type). Only available on Linux. A background task collects the counts every 15 seconds, and scrapes return whatever the last collection produced, so reported counts can lag reality by up to one refresh interval. Use a Prometheus scrape_interval of at least 15 s to avoid scraping the same snapshot twice. |
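
The 15-second guidance follows from simple arithmetic. If the collector refreshes every 15 s, any shorter scrape interval must sometimes return a snapshot the previous scrape already saw. The sketch below models this under two assumptions: the collector ticks exactly at multiples of the refresh period, and scrapes start at a fixed offset. Real tick and scrape timing will jitter.

```python
def repeated_snapshots(scrape_interval, refresh=15.0, offset=1.0, scrapes=100):
    """Count scrapes that return the same socket snapshot as the scrape before.

    Assumes the background collector refreshes at t = 0, refresh, 2*refresh, ...
    and that each scrape reads the most recent completed tick.
    """
    repeats = 0
    last_tick = None
    for i in range(scrapes):
        t = offset + i * scrape_interval
        tick = int(t // refresh)  # index of the snapshot this scrape reads
        if tick == last_tick:
            repeats += 1
        last_tick = tick
    return repeats


for interval in (5, 10, 15, 30):
    print(interval, repeated_snapshots(interval))
```

With jitter, an interval of exactly 15 s can still occasionally repeat a snapshot near a tick boundary.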

Pool Metrics

| Metric | Description |
|---|---|
| pg_doorman_pools_clients | Number of clients in connection pools by status, user, and database. Status values: 'idle' (connected but not executing queries), 'waiting' (waiting for a server connection), and 'active' (currently executing queries). Use it to monitor pool utilization and client distribution. |
| pg_doorman_pools_servers | Number of server connections in pools by status, user, and database. Status values: 'active' (currently serving a client) and 'idle' (available to serve clients). Use it to monitor server availability and load distribution. |
| pg_doorman_pools_bytes_total | Cumulative bytes transferred per pool and direction. Direction values: 'received' (data from clients) and 'sent' (data to clients). Counter form; use rate(pg_doorman_pools_bytes_total[5m]) for throughput. |
| pg_doorman_pools_bytes | DEPRECATED; will be removed in 3.10. Gauge mirror of pg_doorman_pools_bytes_total. |
| pg_doorman_pool_size | Configured maximum pool size per user and database. Combine it with pg_doorman_pools_servers to calculate remaining pool capacity. |
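
pg_doorman_pool_size and pg_doorman_pools_servers combine into a remaining-capacity figure: the configured size minus the server connections currently open (active plus idle). The Python sketch below shows the arithmetic on hypothetical samples keyed by (user, database). That label layout is an assumption, so check the labels on your /metrics output.

```python
from collections import defaultdict


def remaining_capacity(pool_sizes, server_samples):
    """Remaining server slots per (user, database).

    pool_sizes:     {(user, database): configured pool size}
    server_samples: [((user, database), status, count), ...] taken from
                    pg_doorman_pools_servers; every status counts toward
                    the open connections.
    """
    open_servers = defaultdict(float)
    for key, _status, count in server_samples:
        open_servers[key] += count
    return {key: size - open_servers[key] for key, size in pool_sizes.items()}


# Hypothetical values for two pools.
pool_sizes = {("app", "orders"): 40, ("report", "orders"): 10}
servers = [
    (("app", "orders"), "active", 25),
    (("app", "orders"), "idle", 5),
    (("report", "orders"), "active", 10),
]
print(remaining_capacity(pool_sizes, servers))
```

Assuming both metrics carry user and database labels, the PromQL counterpart is sum by (user, database) (pg_doorman_pool_size) - sum by (user, database) (pg_doorman_pools_servers).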

Query and Transaction Metrics

| Metric | Description |
|---|---|
| pg_doorman_pools_query_duration_seconds | Server-side query latency histogram per pool, in seconds. For quantiles, use histogram_quantile(q, sum by (le, user, database) (rate(pg_doorman_pools_query_duration_seconds_bucket[5m]))); for QPS, use rate(pg_doorman_pools_query_duration_seconds_count[5m]). |
| pg_doorman_pools_transaction_duration_seconds | End-to-end transaction latency histogram per pool, in seconds. Query it the same way as pg_doorman_pools_query_duration_seconds. |
| pg_doorman_pools_wait_duration_seconds | Client checkout wait latency histogram per pool, in seconds. Use histogram_quantile(0.99, ...) over its _bucket series for tail wait. |
| pg_doorman_pools_transactions_total | Cumulative transaction count per pool. Counter form; use rate(pg_doorman_pools_transactions_total[5m]) for TPS. |
| pg_doorman_pools_queries_percentile | DEPRECATED; will be removed in 3.10. Pre-aggregated percentile gauge that cannot be summed or averaged across replicas. Use histogram_quantile() over pg_doorman_pools_query_duration_seconds_bucket instead. |
| pg_doorman_pools_transactions_percentile | DEPRECATED; will be removed in 3.10. Use pg_doorman_pools_transaction_duration_seconds instead. |
| pg_doorman_pools_transactions_count | DEPRECATED; will be removed in 3.10. Gauge mirror of pg_doorman_pools_transactions_total. |
| pg_doorman_pools_transactions_total_time | Total time spent executing transactions, by user and database, in milliseconds. Use it to find users or databases with high cumulative transaction time. |
| pg_doorman_pools_queries_total | Cumulative query count per pool. Counter form; use rate(pg_doorman_pools_queries_total[5m]) for QPS. |
| pg_doorman_pools_queries_count | DEPRECATED; will be removed in 3.10. Gauge mirror of pg_doorman_pools_queries_total. |
| pg_doorman_pools_queries_total_time | Total time spent executing queries, by user and database, in milliseconds. Use it to find users or databases with high cumulative query time. |
| pg_doorman_pools_avg_wait_time | DEPRECATED; will be removed in 3.10. Running mean that hides tail wait spikes. Use histogram_quantile() over pg_doorman_pools_wait_duration_seconds_bucket instead. |
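
The deprecated *_percentile gauges cannot be combined across replicas, but histogram buckets can. Cumulative bucket counts from several pg_doorman instances add up per le boundary, which is what sum by (le, ...) does in the queries above, and the quantile is then estimated from the merged buckets. The sketch below is a simplified Python model of that estimate. It uses linear interpolation inside the bucket that contains the target rank, the approach histogram_quantile() takes, but it omits PromQL's handling of edge cases such as non-monotonic buckets. The bucket boundaries and counts are made up.

```python
import math
from collections import defaultdict


def merge_buckets(*instances):
    """Add cumulative bucket counts from several instances, per le bound."""
    merged = defaultdict(float)
    for buckets in instances:
        for le, count in buckets.items():
            merged[le] += count
    return dict(merged)


def quantile(q, buckets):
    """Estimate the q-quantile (0 < q < 1) from cumulative buckets {le: count}.

    Uses linear interpolation inside the bucket holding the target rank;
    the math.inf bucket must be present and holds the total count.
    Returns NaN when there are fewer than two buckets or no observations.
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    bounds = sorted(buckets)  # math.inf sorts last
    total = buckets[math.inf]
    if len(bounds) < 2 or total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for le in bounds:
        count = buckets[le]
        if count >= rank:
            if le == math.inf:
                # Rank lands in the open-ended bucket: report the highest
                # finite bound, as histogram_quantile() does.
                return bounds[-2]
            in_bucket = count - prev_count
            return prev_bound + (le - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = le, count
    return math.nan


# Hypothetical per-window bucket increases of
# pg_doorman_pools_query_duration_seconds_bucket from two replicas.
replica_a = {0.005: 50, 0.01: 80, 0.1: 95, math.inf: 100}
replica_b = {0.005: 10, 0.01: 30, 0.1: 90, math.inf: 100}

merged = merge_buckets(replica_a, replica_b)
print("p90 from merged buckets:", quantile(0.9, merged))
print("mean of per-replica p90:", (quantile(0.9, replica_a) + quantile(0.9, replica_b)) / 2)
```

The two printed values differ. Averaging per-replica percentiles, which is all a pre-aggregated gauge allows, does not give the percentile of the combined traffic.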

Auth Query Metrics

These metrics are only available when auth_query is configured for one or more pools.

| Metric | Description |
|---|---|
| pg_doorman_auth_query_cache_total | Cumulative auth query cache events by type (hits, misses, refetches, rate_limited) and database. Counter form; the entries snapshot remains on pg_doorman_auth_query_cache. |
| pg_doorman_auth_query_auth_total | Cumulative auth query authentication outcomes by result (success, failure) and database. Counter form. |
| pg_doorman_auth_query_executor_total | Cumulative auth query executor events by type (queries, errors) and database. Counter form. |
| pg_doorman_auth_query_dynamic_pools_total | Cumulative auth query dynamic pool lifecycle events by type (created, destroyed) and database. Counter form; the current snapshot remains on pg_doorman_auth_query_dynamic_pools. |
| pg_doorman_auth_query_cache | Snapshot gauge for entries (credentials currently cached). Its cumulative members are deprecated; use pg_doorman_auth_query_cache_total instead. |
| pg_doorman_auth_query_auth | DEPRECATED; will be removed in 3.10. Gauge mirror of pg_doorman_auth_query_auth_total. |
| pg_doorman_auth_query_executor | DEPRECATED; will be removed in 3.10. Gauge mirror of pg_doorman_auth_query_executor_total. |
| pg_doorman_auth_query_dynamic_pools | Auth query dynamic pool lifecycle metrics by type and database. Types: current (dynamic pools currently active), created (pools created since startup), and destroyed (pools garbage-collected or removed on RELOAD). Only relevant in passthrough mode. For created and destroyed, prefer the counter form pg_doorman_auth_query_dynamic_pools_total. |

Configured startup_parameters

These metrics cover two failure points for configured startup parameters. pg_doorman_backend_startup_parameter_errors_total counts backend startups that PostgreSQL rejected after pg_doorman sent the StartupMessage. pg_doorman_startup_parameters_dropped_total counts drop events that happen before the StartupMessage is sent, for example because the resolved parameter set was too large or because an auth_query JSON value was invalid.

| Metric | Description |
|---|---|
| pg_doorman_backend_startup_parameter_errors_total | Counter by (pool, sqlstate). Increments when PostgreSQL rejects a backend startup and the ErrorResponse names a startup parameter sent by pg_doorman. SQLSTATEs with the 57P prefix are excluded because Patroni-assisted fallback handles those errors. The failing parameter name and username are written to the warning log line, not to labels. To identify the parameter, pg_doorman first parses the common parameter "<name>" phrase, then scans the message for any sent key in double quotes; if neither lookup finds a key, the counter is not incremented. |
| pg_doorman_startup_parameters_dropped_total | Counter by (pool, reason). Increments when pg_doorman drops startup parameters before sending the StartupMessage. Reasons: cascade_budget_exceeded, packet_cap_exceeded, auth_query_oversize, auth_query_overlay_oversize, auth_query_bad_type, auth_query_invalid_json, auth_query_invalid_shape, auth_query_invalid_entry, dedicated_mode. |
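
The parameter lookup described for pg_doorman_backend_startup_parameter_errors_total can be illustrated with a short Python model. It is a hypothetical re-creation written from the description above, not pg_doorman's source. It skips 57P* SQLSTATEs, then looks for the phrase parameter "<name>", then falls back to any double-quoted token that matches a key pg_doorman sent. It returns None, meaning no counter increment, when neither lookup finds a sent key. The exact phrase matching pg_doorman performs may differ, and the last test message below is invented wording.

```python
import re

PARAMETER_PHRASE = re.compile(r'parameter "([^"]+)"')
QUOTED = re.compile(r'"([^"]+)"')


def failing_parameter(sqlstate, message, sent_keys):
    """Return the startup parameter an ErrorResponse blames, or None.

    Hypothetical model of the documented lookup; None means the
    counter would not be incremented.
    """
    if sqlstate.startswith("57P"):
        # Left to the Patroni-assisted fallback, never counted here.
        return None
    # First lookup: the common `parameter "<name>"` phrase.
    match = PARAMETER_PHRASE.search(message)
    if match and match.group(1) in sent_keys:
        return match.group(1)
    # Fallback: any double-quoted token that is a key we sent.
    for candidate in QUOTED.findall(message):
        if candidate in sent_keys:
            return candidate
    return None


sent = {"application_name", "statement_timeout", "work_mem"}
print(failing_parameter(
    "22023", 'invalid value for parameter "statement_timeout": "abc"', sent))
print(failing_parameter("57P03", "the database system is starting up", sent))
```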

Server Metrics

| Metric | Description |
|---|---|
| pg_doorman_servers_prepared_hits | Live aggregate of prepared-statement cache hits across currently active backends of each pool, by user and database. This gauge can decrease when backends rotate; use pg_doorman_servers_prepared_hits_total for rates. |
| pg_doorman_servers_prepared_misses | Live aggregate of prepared-statement cache misses across currently active backends of each pool, by user and database. This gauge can decrease when backends rotate; use pg_doorman_servers_prepared_misses_total for rates. |
| pg_doorman_servers_prepared_hits_total | Counter form of prepared-statement cache hits across all backends of each pool, by user and database. Use rate() over this metric for hit throughput. |
| pg_doorman_servers_prepared_misses_total | Counter form of prepared-statement cache misses across all backends of each pool, by user and database. A sustained non-zero rate signals queries that could benefit from being prepared, or from a larger server_prepared_statements_cache_size. |
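
The warning about the live gauges is easiest to see with a toy model. In the hypothetical Python sketch below, each backend keeps its own hit count, the gauge sums only backends that are still connected, and the counter accumulates hits from every backend ever seen. When a backend rotates out, the gauge drops while the counter keeps rising, so only the counter yields a correct increase. The class and method names are invented for illustration and do not mirror pg_doorman internals.

```python
class PoolHits:
    """Toy model of per-backend prepared-statement hit accounting."""

    def __init__(self):
        self.live = {}    # backend id -> hits on that live backend
        self.retired = 0  # hits from backends that have disconnected

    def hit(self, backend, n=1):
        self.live[backend] = self.live.get(backend, 0) + n

    def rotate(self, backend):
        # Backend closes; its hits leave the live aggregate.
        self.retired += self.live.pop(backend, 0)

    def gauge(self):
        # Like pg_doorman_servers_prepared_hits: live backends only.
        return sum(self.live.values())

    def counter(self):
        # Like pg_doorman_servers_prepared_hits_total: never decreases.
        return self.retired + self.gauge()


pool = PoolHits()
pool.hit("b1", 100)
pool.hit("b2", 50)
before = (pool.gauge(), pool.counter())
pool.rotate("b1")
pool.hit("b2", 10)
after = (pool.gauge(), pool.counter())
print("gauge delta:", after[0] - before[0])
print("counter delta:", after[1] - before[1])
```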

Per-Client Prepared Statement Cache Metrics

The per-client prepared statement cache is split into a Named map (unbounded) and an Anonymous LRU bounded by client_anonymous_prepared_cache_size (defaults to the resolved prepared_statements_cache_size when unset). The three metrics below expose the size of each part and the eviction rate on the bounded part.

| Metric | Description |
|---|---|
| pg_doorman_clients_prepared_named_entries | Gauge by user and database. Sum of Named entries across every connected client's cache. Named statements have no upper bound and are kept until the client disconnects or sends DEALLOCATE. Sustained growth here indicates drivers that mint per-query named statements (some pgjdbc / Hibernate flows, some .NET Npgsql configurations) and may justify capping per-client memory at the application layer. |
| pg_doorman_clients_prepared_anonymous_entries | Gauge by user and database. Sum of Anonymous entries across every connected client's cache. Each client's Anonymous part is capped at client_anonymous_prepared_cache_size, so this gauge never exceeds the number of connected clients in the pool multiplied by client_anonymous_prepared_cache_size. |
| pg_doorman_clients_prepared_anonymous_evictions_total | Counter by user and database. Cumulative count of Anonymous LRU evictions across all clients of the pool. A sustained non-zero rate signals that client_anonymous_prepared_cache_size is too small for the workload and the LRU is recycling entries faster than the application reuses them. The counter is monotonic per pool; an upgrade restarts it from zero. |
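
The split between the unbounded Named map and the bounded Anonymous LRU can be modeled in a few lines of Python. This is a hypothetical sketch of the behavior described above, not pg_doorman's data structure, and keying Anonymous entries by query text is an assumption of the sketch. Named entries stay until DEALLOCATE, and Anonymous entries are kept in least-recently-used order. Inserting past the capacity, which stands in for client_anonymous_prepared_cache_size, evicts the oldest entry and bumps an eviction count. That count is the per-client analogue of pg_doorman_clients_prepared_anonymous_evictions_total.

```python
from collections import OrderedDict


class ClientPreparedCache:
    """Hypothetical model of one client's prepared-statement cache."""

    def __init__(self, anonymous_capacity):
        self.named = {}                 # unbounded: name -> query text
        self.anonymous = OrderedDict()  # LRU: query text -> state
        self.capacity = anonymous_capacity
        self.evictions = 0

    def parse_named(self, name, query):
        self.named[name] = query

    def deallocate(self, name):
        self.named.pop(name, None)

    def parse_anonymous(self, query):
        if query in self.anonymous:
            # Reuse refreshes recency instead of adding an entry.
            self.anonymous.move_to_end(query)
            return
        self.anonymous[query] = None  # statement state omitted in this sketch
        if len(self.anonymous) > self.capacity:
            # Drop the least recently used entry.
            self.anonymous.popitem(last=False)
            self.evictions += 1


cache = ClientPreparedCache(anonymous_capacity=2)
for q in ["SELECT 1", "SELECT 2", "SELECT 1", "SELECT 3", "SELECT 2"]:
    cache.parse_anonymous(q)
for i in range(5):
    cache.parse_named(f"stmt_{i}", f"SELECT {i}")
print(len(cache.named), len(cache.anonymous), cache.evictions)
```

The Named map grows with every new name, while the Anonymous part never holds more than its capacity. Summed over clients, that cap is what bounds pg_doorman_clients_prepared_anonymous_entries.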

Query Interner Metrics

The query interner is process-global. These metrics have no pool, user, or database labels; use the prepared-statement metrics above to locate the affected pool.

| Metric | Description |
|---|---|
| pg_doorman_query_interner_entries | Gauge by kind (named or anonymous). Number of interned query texts. Refreshed once per GC sweep. |
| pg_doorman_query_interner_bytes | Gauge by kind (named or anonymous). Total bytes of interned query text. Refreshed once per GC sweep. |
| pg_doorman_query_interner_evictions_total | Counter by kind and reason (gc_passive or ttl_expired). Named entries are removed once no cache outside the interner still holds them; anonymous entries are removed after the idle TTL. |
| pg_doorman_query_interner_synthetic_misses_total | Counter of synthetic SQLSTATE 26000 responses for anonymous prepared statements whose state was no longer available when a later Bind or Describe referenced it. Before increasing query_interner_anon_idle_ttl_seconds, check client Anonymous LRU evictions, WARN logs, RESET INTERNER runs, and ttl_expired evictions to find the cause. |
| pg_doorman_query_interner_gc_duration_seconds | Histogram of the duration of one interner GC sweep (named and anonymous combined), in seconds. Use it to detect interners large enough to make sweep time noticeable. |

Pooler Check Query Metrics

| Metric | Description |
|---|---|
| pg_doorman_pooler_check_query_backend_total | Counter of pooler_check_query probes forwarded to PostgreSQL (cache miss or RELOAD-induced re-probe). It should stay flat after warmup; a continuously rising rate means the per-pool cache is not retaining its entry. |
| pg_doorman_pooler_check_query_cache_total | Counter of pooler_check_query probes answered from the per-pool response cache without touching the backend. Hit rate = cache_total / (cache_total + backend_total). |
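
The hit-rate formula for the pooler_check_query cache is evaluated over a window from two readings of each counter. The Python sketch below is an illustration under assumptions. It takes the increase of each counter over the same window, which PromQL would typically get from increase() or rate(). It assumes no restart happened between the readings, and it returns None instead of dividing by zero when no probes arrived.

```python
def check_query_hit_rate(cache_start, cache_end, backend_start, backend_end):
    """Hit rate = cache / (cache + backend) over one window of counter readings.

    Assumes the process did not restart between the two readings.
    """
    cache_hits = cache_end - cache_start
    backend_probes = backend_end - backend_start
    probes = cache_hits + backend_probes
    if probes <= 0:
        return None  # no probes arrived in the window
    return cache_hits / probes


# Hypothetical readings of pg_doorman_pooler_check_query_cache_total and
# pg_doorman_pooler_check_query_backend_total five minutes apart.
print(check_query_hit_rate(1000, 1570, 12, 15))
```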

Grafana Dashboard

You can build a Grafana dashboard on these metrics. Panels worth including:

  1. Connection counts by type
  2. Memory usage over time
  3. Client and server counts by pool
  4. Query and transaction performance percentiles
  5. Network traffic by pool

Example Queries

Here are some example Prometheus queries that you might find useful:

Connection Rate

rate(pg_doorman_connections_total{type="total"}[5m])

Pool Utilization

sum by (database) (pg_doorman_pools_clients{status="active"}) / sum by (database) (pg_doorman_pools_servers{status=~"active|idle"})
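
The Pool Utilization query divides active clients by open server connections (active plus idle) per database. The Python sketch below performs the same grouping on hypothetical samples, which can help when sanity-checking a dashboard panel. The (database, status, value) tuple layout is an assumption, and databases with no open server connection are reported as None rather than dividing by zero.

```python
from collections import defaultdict


def utilization_by_database(client_samples, server_samples):
    """Active clients / open servers (active + idle), grouped by database.

    Each sample is (database, status, value). Databases without any
    open server connection are reported as None.
    """
    active_clients = defaultdict(float)
    open_servers = defaultdict(float)
    for database, status, value in client_samples:
        if status == "active":
            active_clients[database] += value
    for database, status, value in server_samples:
        if status in ("active", "idle"):
            open_servers[database] += value
    result = {}
    for database in set(active_clients) | set(open_servers):
        servers = open_servers[database]
        result[database] = active_clients[database] / servers if servers else None
    return result


clients = [("orders", "active", 12), ("orders", "idle", 30), ("billing", "active", 3)]
servers = [("orders", "active", 12), ("orders", "idle", 8), ("billing", "active", 3)]
print(utilization_by_database(clients, servers))
```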

Slow Queries (p99)

histogram_quantile(0.99, sum by (le, user, database) (rate(pg_doorman_pools_query_duration_seconds_bucket[5m])))

Client Wait Time (p99)

histogram_quantile(0.99, sum by (le, user, database) (rate(pg_doorman_pools_wait_duration_seconds_bucket[5m])))

Auth Query Cache Hit Rate

sum by (database) (rate(pg_doorman_auth_query_cache_total{type="hits"}[5m])) / clamp_min(sum by (database) (rate(pg_doorman_auth_query_cache_total{type=~"hits|misses"}[5m])), 0.001)

Auth Query Failure Rate

rate(pg_doorman_auth_query_auth_total{result="failure"}[5m])