pg_flight_recorder: Continuously sample PostgreSQL system state via pg_cron
A pure-SQL flight recorder for PostgreSQL 15–18 that captures wait events, locks, WAL, and query performance every 60s via pg_cron, with no sidecars, agents, or external polling.
What Matters
- Two extensions: `pgfr_record` (core collection, ring buffers, scheduling) and optional `pgfr_analyze` (reporting, anomaly detection, time travel queries).
- Ring buffers retain sampled activity (wait events, sessions, locks) for 2h hot; archives persist 7 days. Snapshots (WAL, I/O, tables) are kept 30 days.
- Default retention is ~2.5GB uncompressed, ~150MB compressed, and is exportable via `pg_dump` on a single schema.
- Circuit breaker skips collection if recent runs averaged >1s; load shedding kicks in above 70% active connections; per-query section timeout is 250ms.
- `pgfr_analyze.what_happened_at('timestamp')` and `incident_timeline()` enable point-in-time incident reconstruction from archived samples.
- XID and MultiXID wraparound monitoring is included, with configurable warning ratios (defaults tunable via the `xid_warning_ratio` config key, per postgres-howto #0044).
- Switching to the `troubleshooting` profile drops to 60s sampling with expanded capture; the `production_safe` profile uses 300s intervals for minimal overhead.
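
The circuit breaker and load shedding thresholds listed above can be read as a guard evaluated before each sample. The following is a minimal Python sketch of that decision logic, not the extension's actual SQL. The function name `should_collect`, the `Decision` type, the size of the "recent runs" window, and the choice to treat load shedding as skipping the sample entirely are all assumptions made for illustration; only the 1s and 70% thresholds come from the summary.

```python
from dataclasses import dataclass
from statistics import mean

# Thresholds taken from the summary above.
MAX_AVG_RUNTIME_S = 1.0   # circuit breaker: recent runs averaged > 1s
MAX_ACTIVE_RATIO = 0.70   # load shedding: above 70% active connections


@dataclass
class Decision:
    collect: bool
    reason: str


def should_collect(recent_runtimes_s, active_connections, max_connections):
    """Hypothetical pre-sample guard (names and skip behaviour are assumptions)."""
    # Circuit breaker: skip when the recent runs were slow on average.
    if recent_runtimes_s and mean(recent_runtimes_s) > MAX_AVG_RUNTIME_S:
        return Decision(False, "circuit_breaker")
    # Load shedding: skip when too many connections are active.
    if max_connections > 0 and active_connections / max_connections > MAX_ACTIVE_RATIO:
        return Decision(False, "load_shedding")
    return Decision(True, "ok")
```

Checking the breaker first means a server that is already slow to sample is never asked for its connection counts, which keeps the guard itself cheap.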
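
To make the ring buffer retention figures concrete: a 2h hot window sampled every 60s holds 120 samples. The Python sketch below works through that arithmetic and shows one common way to map a sample time onto a fixed slot. How the extension actually lays out its ring buffers is not described here, so the modular indexing and the name `slot_for` are assumptions.

```python
HOT_WINDOW_S = 2 * 60 * 60    # 2h hot retention (from the summary)
SAMPLE_INTERVAL_S = 60        # 60s sampling interval (from the summary)

# Number of samples that fit in the hot window: 7200 / 60 = 120.
SLOTS = HOT_WINDOW_S // SAMPLE_INTERVAL_S


def slot_for(epoch_seconds: int) -> int:
    """Assumed slot mapping: the sample index wraps around after SLOTS samples."""
    return (epoch_seconds // SAMPLE_INTERVAL_S) % SLOTS
```

Under this scheme a new sample overwrites the one taken exactly 2h earlier, so the hot storage stays at a fixed size without deletes.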