People having experience with commercial RDBMS are used to have the ability to answer the question “What a particular session is doing right now?” Or even “What was that session waiting 5 minutes ago?” For a long time PostgreSQL did not have such diagnostic tools and DBAs used to get out with different ways of sophistication. I gave a talk on pgday.ru (in Russian) about how we do it. This talk was collaborative with Ildus Kurbangaliev from PostgrePro. And Ildus was just speaking about tool that allows to answer questions above.

Strictly speaking it is not the first try to implement what people used to call wait [events] interface, but all previous attempts were not brought to some reasonable state and died as proof of concept patches. But pg_stat_wait is currently available as a set of patches to current stable 9.4 branch and currently developing 9.6 (actual versions should be looked at pgsql-hackers@).

After quite long testing and fixing bugs we even deployed them to production.

Installation

Before it all becomes part of core PostgreSQL you need to recompile postgres. I think description of rebuilding as ./configure && make && sudo make install is meaningless — much better to look into documentation.

After it you should add pg_stat_wait to shared_preload_libraries. Additionally, you can add following options to postgresql.conf:

  • waits_monitoring = on - enabling functionality on,
  • pg_stat_wait.history = on - storing history of wait events,
  • pg_stat_wait.history_size = 1000000 - number of last events to keep in history,
  • pg_stat_wait.history_period = 1000 - how often should wait events be stored in history (ms).

After that you should restart PostgreSQL and make CREATE EXTENSION pg_stat_wait. After that everything will start working.

Capabilities

What exactly will start to work? First you may look at what is inside the extension:

rpopdb01g/postgres M # \dxS+ pg_stat_wait
           Objects in extension "pg_stat_wait"
                   Object Description
---------------------------------------------------------
 function pg_is_in_trace(integer)
 function pg_start_trace(integer,cstring)
 function pg_stat_wait_get_current(integer)
 function pg_stat_wait_get_history()
 function pg_stat_wait_get_profile(integer,boolean)
 function pg_stat_wait_make_test_lwlock(integer,integer)
 function pg_stat_wait_reset_profile()
 function pg_stop_trace(integer)
 function pg_wait_class_list()
 function pg_wait_event_list()
 view pg_stat_wait_current
 view pg_stat_wait_history
 view pg_stat_wait_profile
 view pg_wait_class
 view pg_wait_event
 view pg_wait_events
(16 rows)

rpopdb01g/postgres M #

Let’s see what wait events pg_stat_wait is able to monitor:

rpopdb01g/postgres M # SELECT version();
                                                    version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit
(1 row)

rpopdb01g/postgres M # SELECT class_name, count(event_name)
FROM pg_wait_events GROUP BY 1 ORDER BY 2 DESC;
 class_name | count
------------+-------
 LWLocks    |    52
 Storage    |     9
 Locks      |     9
 Network    |     3
 Latch      |     1
 CPU        |     1
(6 rows)

rpopdb01g/postgres M #

You can see that waits monitoring for 9.4 knows about 52 LWLocks and for disk, for example, it can track next things:

rpopdb01g/postgres M # SELECT * FROM pg_wait_events WHERE class_id = 3;
 class_id | class_name | event_id | event_name
----------+------------+----------+------------
        3 | Storage    |        0 | SMGR_READ
        3 | Storage    |        1 | SMGR_WRITE
        3 | Storage    |        2 | SMGR_FSYNC
        3 | Storage    |        3 | XLOG_READ
        3 | Storage    |        4 | XLOG_WRITE
        3 | Storage    |        5 | XLOG_FSYNC
        3 | Storage    |        6 | SLRU_READ
        3 | Storage    |        7 | SLRU_WRITE
        3 | Storage    |        8 | SLRU_FSYNC
(9 rows)

rpopdb01g/postgres M #

Under “can track” the following is meant:

  • What and how long a particular process is waiting right now?
  • How many times a particular process hung in waiting of every event type and how much time did it spend waiting?
  • What was a particular process waiting some time ago?

For answering these questions there are pg_stat_wait_current, pg_stat_wait_profile, pg_stat_wait_history respectively. Best seen on the examples.

pg_stat_wait_current

rpopdb01g/postgres M # SELECT pid, class_name, event_name, wait_time
FROM pg_stat_wait_current WHERE class_id NOT IN (4, 5)
ORDER BY wait_time DESC;
  pid  | class_name |  event_name   | wait_time
-------+------------+---------------+-----------
 23510 | LWLocks    | BufferLWLocks |     17184
 23537 | LWLocks    | BufferLWLocks |      9367
 23628 | LWLocks    | BufferLWLocks |      9366
 23502 | LWLocks    | BufferLWLocks |      3215
 23504 | LWLocks    | BufferLWLocks |      2846
 23533 | LWLocks    | BufferLWLocks |      2788
 23514 | LWLocks    | BufferLWLocks |      2658
 23517 | LWLocks    | BufferLWLocks |      2658
 23532 | LWLocks    | BufferLWLocks |      2641
 23527 | LWLocks    | BufferLWLocks |      2507
 23952 | Storage    | SMGR_READ     |      2502
 23518 | Storage    | XLOG_FSYNC    |      1576
 23524 | LWLocks    | WALWriteLock  |      1027
(13 rows)

rpopdb01g/postgres M #

We remove waits of classes ‘Network’ and ‘Latch’ because their waiting time is usually several orders of magnitude longer than waits of other classes. And listed above columns are not all columns that exist in the view:

smcdb01d/postgres M # SELECT * FROM pg_stat_wait_current
WHERE class_id IN (2, 3) LIMIT 2;
-[ RECORD 1 ]-----------------------------
pid        | 12107
sample_ts  | 2015-11-16 10:36:59.598562+03
class_id   | 2
class_name | Locks
event_id   | 4
event_name | Transaction
wait_time  | 24334
p1         | 5
p2         | 255593733
p3         | 0
p4         | 0
p5         | 0
-[ RECORD 2 ]-----------------------------
pid        | 1266
sample_ts  | 2015-11-16 10:36:59.598562+03
class_id   | 3
class_name | Storage
event_id   | 0
event_name | SMGR_READ
wait_time  | 1710
p1         | 1663
p2         | 16400
p3         | 20508
p4         | 0
p5         | 220036

smcdb01d/postgres M #

Parameters p1-p5 are text fields. For example, for heavy-weight locks they give approximately same information that you can see in pg_locks view and for disk I/O waits you can understand from which DB, relation and block we were waiting while reading.

pg_stat_wait_profile

For example, you can see how much time DB spent in each class of waits:

rpopdb01g/postgres M # SELECT class_name, sum(wait_time) AS wait_time,
sum(wait_count) AS wait_count FROM pg_stat_wait_profile
GROUP BY class_name ORDER BY wait_time DESC;
 class_name |  wait_time   | wait_count
------------+--------------+------------
 Network    | 144196945815 |   11877848
 Latch      |  90164921148 |    3521073
 LWLocks    |   2648490737 |   10501900
 Storage    |    977430136 |   36444251
 CPU        |     68890774 |  365699457
 Locks      |           74 |          1
(6 rows)

rpopdb01g/postgres M #

Or which LWLocks are the hottest in the system:

rpopdb01g/postgres M # SELECT event_name, sum(wait_time) AS wait_time,
sum(wait_count) AS wait_count FROM pg_stat_wait_profile
WHERE class_id = 1 AND wait_time != 0 AND wait_count != 0
GROUP BY event_name ORDER BY wait_time DESC;
      event_name      | wait_time  | wait_count
----------------------+------------+------------
 LockMgrLWLocks       | 1873294341 |    3870685
 WALWriteLock         | 1039279117 |     859101
 BufferLWLocks        |  299153931 |    7356555
 BufFreelistLock      |    7466923 |      75484
 ProcArrayLock        |    2321769 |      34355
 CLogControlLock      |     778148 |      21286
 WALInsertLocks       |     456224 |       7451
 BufferMgrLocks       |     107374 |       8447
 XidGenLock           |      84914 |       2506
 UserDefinedLocks     |       1875 |          7
 CLogBufferLocks      |        868 |         80
 SInvalWriteLock      |         11 |          3
 CheckpointerCommLock |          1 |          1
(13 rows)

Time: 29.388 ms
rpopdb01g/postgres M #

These two examples show that waiting time does not always correlate with wait events count. That’s why sampling without accounting waiting time can give not right the whole picture.

pg_stat_wait_history

This view allows to see what a particular process was waiting for in the past. Storage depth and sampling interval can be configured as shown above.

xivadb01e/postgres M # SELECT sample_ts, class_name, event_name, wait_time
FROM pg_stat_wait_history WHERE pid = 29585 ORDER BY sample_ts DESC LIMIT 10;
           sample_ts           | class_name |   event_name    | wait_time
-------------------------------+------------+-----------------+-----------
 2015-11-16 10:56:28.544052+03 | LWLocks    | BufferMgrLocks  |    983997
 2015-11-16 10:56:27.542938+03 | LWLocks    | CLogControlLock |    655975
 2015-11-16 10:56:26.850302+03 | LWLocks    | WALInsertLocks  |    979516
 2015-11-16 10:56:25.849207+03 | LWLocks    | WALInsertLocks  |    207418
 2015-11-16 10:56:24.848059+03 | LWLocks    | WALInsertLocks  |    923916
 2015-11-16 10:56:23.846909+03 | LWLocks    | WALInsertLocks  |    753185
 2015-11-16 10:56:22.845808+03 | LWLocks    | WALInsertLocks  |    877707
 2015-11-16 10:56:21.844718+03 | LWLocks    | WALInsertLocks  |    778897
 2015-11-16 10:56:20.843562+03 | LWLocks    | CLogControlLock |    991267
 2015-11-16 10:56:19.842464+03 | LWLocks    | CLogControlLock |   1001059
(10 rows)

xivadb01e/postgres M #

Session tracing

All described above views are designed to be always turned on, their performance overhead is minimal. But there are cases when sampling once in pg_stat_wait.history_period is not enough and you need to see all waits of a particular process. In that case you should use functions for tracing, for example:

rpopdb01g/postgres M # SELECT pg_backend_pid();
 pg_backend_pid
----------------
           5399
(1 row)

rpopdb01g/postgres M # SELECT pg_start_trace(5399, '/tmp/5399.trace');
INFO:  00000: Trace was started to: /tmp/5399.trace
LOCATION:  StartWait, wait.c:259
 pg_start_trace
----------------

(1 row)

rpopdb01g/postgres M # SELECT pg_is_in_trace(5399);
 pg_is_in_trace
----------------
 t
(1 row)

-- some activity

rpopdb01g/postgres M # SELECT pg_stop_trace(5399);
INFO:  00000: Trace was stopped
LOCATION:  StartWait, wait.c:265
 pg_stop_trace
---------------

(1 row)

rpopdb01g/postgres M #

A simple text file would be created where there would be two lines for each wait event, for example:

start 2015-11-16 11:17:26.831686+03 CPU MemAllocation 0 0 0 0 0
stop 2015-11-16 11:17:26.831695+03 CPU
start 2015-11-16 11:17:26.831705+03 LWLocks BufferLWLocks 122 1 0 0 0
stop 2015-11-16 11:17:26.831715+03 LWLocks
start 2015-11-16 11:17:26.831738+03 Network WRITE 0 0 0 0 0
stop 2015-11-16 11:17:26.831749+03 Network
start 2015-11-16 11:17:26.831795+03 Network READ 0 0 0 0 0
stop 2015-11-16 11:17:26.831808+03 Network
start 2015-11-16 11:17:26.831825+03 Storage SMGR_READ 1663 13003 12763 0 13
stop 2015-11-16 11:17:26.831844+03 Storage

Instead of conclusion

Wait interface is the long-awaited feature in PostgreSQL which allows significantly improve the understanding of what is happening inside the database. Right now this functionality is kicked into core PostgreSQL so that starting from 9.6 you would not need to recompile postgres.

Just in case, shortly before Ildus submitted his implementation on pgsql-hackers@ Robert Haas proposed the same idea and lots of people supported this idea. To become it true a couple of preparatory patches have already been commited, for example Refactoring of LWLock tranches.

I really hope that it will become part of PostgreSQL in 9.6.


Comments

comments powered by Disqus