<h2>One more time about collation in PostgreSQL</h2>
<p>Yet another blog (https://simply.name/), 2016-09-18, by d0uble</p>
<p>It’s been a long time since my last post. It’s time to write something useful :)</p>
<p>When people start working with PostgreSQL they sometimes make mistakes that are
really difficult to fix later. For example, during <code>initdb</code> of your first <span class="caps">DB</span> you
don’t really understand whether you need data checksums or not, especially
since they are turned off by default and the documentation says that they “may incur
a noticeable performance penalty”.</p>
<p>And when you already have several hundred databases with a few hundred terabytes
of data on different hardware or (even worse) in different virtualization
systems, you realize that you are ready to pay some performance for
detecting silent data corruption. But the problem is that you can’t
easily turn checksums on. It is one of the things that can be set only once,
when invoking the <code>initdb</code> command. In the bright future we hope for logical
replication, but until then the only way is <code>pg_dump</code>, <code>initdb</code>,
<code>pg_restore</code>, that is, with downtime.</p>
<p>And while checksums may not be useful for you (e.g. you have perfect hardware and
an <span class="caps">OS</span> without bugs), <code>lc_collate</code> matters for everyone. And now I will prove it.</p>
<h3>Sort order</h3>
<p>Suppose you have installed PostgreSQL from packages or built it from source and
initialized the <span class="caps">DB</span> yourself. Most probably, in the modern world of victorious
<span class="caps">UTF</span>-8, you would see something like this:</p>
<div class="highlight"><pre><span></span><span class="nv">d0uble</span><span class="w"> </span>~<span class="w"> </span><span class="p">$</span><span class="w"> </span><span class="nv">psql</span><span class="w"> </span><span class="o">-</span><span class="nv">l</span><span class="w"></span>
<span class="w"> </span><span class="nv">List</span><span class="w"> </span><span class="nv">of</span><span class="w"> </span><span class="nv">databases</span><span class="w"></span>
<span class="w"> </span><span class="nv">Name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">Owner</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">Encoding</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">Collate</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">Ctype</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">Access</span><span class="w"> </span><span class="nv">privileges</span><span class="w"></span>
<span class="o">-----------+--------+----------+-------------+-------------+-------------------</span><span class="w"></span>
<span class="w"> </span><span class="nv">postgres</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">d0uble</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">UTF8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"></span>
<span class="w"> </span><span class="nv">template0</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">d0uble</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">UTF8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">=</span><span class="nv">c</span><span class="o">/</span><span class="nv">d0uble</span><span class="w"> </span><span class="o">+</span><span class="w"></span>
<span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">d0uble</span><span class="o">=</span><span class="nv">CTc</span><span class="o">/</span><span class="nv">d0uble</span><span class="w"></span>
<span class="w"> </span><span class="nv">template1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">d0uble</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">UTF8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">=</span><span class="nv">c</span><span class="o">/</span><span class="nv">d0uble</span><span class="w"> </span><span class="o">+</span><span class="w"></span>
<span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">d0uble</span><span class="o">=</span><span class="nv">CTc</span><span class="o">/</span><span class="nf">d0uble</span><span class="w"></span>
<span class="p">(</span><span class="mi">3</span><span class="w"> </span><span class="nv">rows</span><span class="p">)</span><span class="w"></span>
<span class="nv">d0uble</span><span class="w"> </span>~<span class="w"> </span><span class="p">$</span><span class="w"></span>
</pre></div>
<p>If you don’t specify them explicitly, <code>initdb</code> will take the settings for columns 3-5
from the operating system. And most likely you would think that everything is fine
if you see <code>UTF-8</code> there. However, in some cases you may be surprised. Look at
the following query result on a Linux box:</p>
<div class="highlight"><pre><span></span>linux> SELECT name FROM unnest(ARRAY[
'MYNAME', ' my_name', 'my-image.jpg', 'my-third-image.jpg'
]) name ORDER BY name;
name
--------------------
my-image.jpg
my_name
MYNAME
my-third-image.jpg
(4 rows)
linux>
</pre></div>
<p>This sort order seems really weird, even though the client
connected to the <span class="caps">DB</span> with quite reasonable settings:</p>
<div class="highlight"><pre><span></span>linux> SELECT name, setting FROM pg_settings WHERE category ~ 'Locale';
name | setting
----------------------------+--------------------
client_encoding | UTF8
DateStyle | ISO, MDY
default_text_search_config | pg_catalog.english
extra_float_digits | 0
IntervalStyle | postgres
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric | en_US.UTF-8
lc_time | en_US.UTF-8
server_encoding | UTF8
TimeZone | Europe/Moscow
timezone_abbreviations | Default
(14 rows)
linux>
</pre></div>
<p>The result doesn’t depend on the distro: at least it is the same on <span class="caps">RHEL</span> 6 and
Ubuntu 14.04. Even stranger is the fact that the same query with the same
server and client settings on Mac <span class="caps">OS</span> X gives a different result:</p>
<div class="highlight"><pre><span></span>macos> SELECT name FROM unnest(ARRAY[
'MYNAME', ' my_name', 'my-image.jpg', 'my-third-image.jpg'
]) name ORDER BY name;
name
--------------------
my_name
MYNAME
my-image.jpg
my-third-image.jpg
(4 rows)
macos>
</pre></div>
<p>At first glance, Linux is seriously broken here. But the real problem is
that a result which depends on the <span class="caps">OS</span> is a very bad result. Fortunately, we discovered
it during testing: tests on a developer’s MacBook were fine, but on the Linux
test server they were not.</p>
<p>The reason is that PostgreSQL takes collation from the <span class="caps">OS</span> and, surprisingly, the same <span class="caps">UTF</span>-8
locale may sort differently on different systems ¯\_(ツ)_/¯ If you search, you will find a lot of threads about
different sort order in Linux and Mac <span class="caps">OS</span> X (
<a href="http://stackoverflow.com/questions/16328592">1</a>,
<a href="https://www.postgresql.org/message-id/flat/23053.1337036410%40sss.pgh.pa.us#23053.1337036410@sss.pgh.pa.us">2</a>,
<a href="http://stackoverflow.com/questions/27395317">3</a>,
<a href="https://www.postgresql.org/message-id/4B4E845F.80906@postnewspapers.com.au">4</a>,
<a href="http://dba.stackexchange.com/questions/106964">5</a>,
<a href="http://dba.stackexchange.com/questions/94887">6</a>).</p>
<p>Opinions differ on the question “who is to blame?” but we can
confidently say that Mac <span class="caps">OS</span> X definitely doesn’t account for all regional specifics. It
can be seen from the links above or, e.g., from the following example for the Russian language:</p>
<div class="highlight"><pre><span></span>macos> SELECT name FROM unnest(ARRAY[
'а', 'д', 'е', 'ё', 'ж', 'я'
]) name ORDER BY name;
name
------
а
д
е
ж
я
ё
(6 rows)
macos>
</pre></div>
<p>Meanwhile, Linux handles this query reasonably, from my point of view. And even
the previous query result can be explained: Linux ignores whitespace and the symbols
<code>-</code>, <code>_</code> while sorting. So, on reflection, the broken <span class="caps">OS</span> is Mac <span class="caps">OS</span> X.</p>
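<p>To make that explanation concrete, here is a rough Python sketch of such a rule. It is only a toy model and not glibc’s actual collation algorithm: the function <code>simplified_linux_key</code> is a name made up for this illustration, and it assumes that spaces, <code>-</code> and <code>_</code> are skipped, that case is ignored on the first pass, and that lowercase wins ties.</p>

```python
def simplified_linux_key(s: str):
    """Toy two-level sort key (an assumption, not glibc's real algorithm)."""
    # Characters that the toy model ignores while sorting.
    significant = [c for c in s if c not in " -_"]
    # Level 1: compare the remaining characters case-insensitively.
    primary = "".join(c.lower() for c in significant)
    # Level 2: break ties so that lowercase sorts before uppercase
    # (False sorts before True).
    tiebreak = tuple(c.isupper() for c in significant)
    return (primary, tiebreak)


names = ["MYNAME", " my_name", "my-image.jpg", "my-third-image.jpg"]
linux_like = sorted(names, key=simplified_linux_key)
print(linux_like)
```

<p>With these four strings the sketch gives the same order as the Linux query above, which at least shows that the explanation is consistent with the observed result.</p>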
<p>In the end we moved our tests to Docker to be independent of <span class="caps">OS</span> characteristics,
but there are other ways to get the same results on different operating systems.
The easiest one is to use <code>LC_COLLATE = C</code> because it is the only collation
which is distributed with PostgreSQL and doesn’t depend on the <span class="caps">OS</span> (see
<a href="https://www.postgresql.org/docs/current/static/charset.html">documentation</a>).</p>
<div class="highlight"><pre><span></span>linux> SELECT name FROM unnest(ARRAY[
'MYNAME', ' my_name', 'my-image.jpg', 'my-third-image.jpg'
]) name ORDER BY name COLLATE "C";
name
--------------------
my_name
MYNAME
my-image.jpg
my-third-image.jpg
(4 rows)
linux>
</pre></div>
<p>You can see that in this case the results are the same on both <span class="caps">OS</span>es. But it is also
easy to see that they are the same as on Mac <span class="caps">OS</span> X, and so have the same problems with
multibyte encodings, e.g.:</p>
<div class="highlight"><pre><span></span>linux> SELECT name FROM unnest(ARRAY[
'а', 'д', 'е', 'ё', 'ж', 'я'
]) name ORDER BY name COLLATE "C";
name
------
а
д
е
ж
я
ё
(6 rows)
linux>
</pre></div>
<p>Don’t assume that the sort result with <code>LC_COLLATE=en_US.UTF-8</code> on Mac
<span class="caps">OS</span> X will always be the same as with <code>LC_COLLATE=C</code> on any <span class="caps">OS</span>. The only thing you can
be sure of is that collation <code>C</code> guarantees the same result
everywhere because it is provided with PostgreSQL and doesn’t depend on the <span class="caps">OS</span>.</p>
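<p>The reason the <code>C</code> collation behaves the same everywhere is that it sorts strictly by byte values. As a small sketch outside the database, sorting by the <span class="caps">UTF</span>-8 bytes in Python reproduces both <code>COLLATE "C"</code> results from above (the helper name <code>c_collation_key</code> is made up for this illustration):</p>

```python
def c_collation_key(s: str) -> bytes:
    """Sort key that mimics byte-wise comparison of UTF-8 strings."""
    return s.encode("utf-8")


names = ["MYNAME", " my_name", "my-image.jpg", "my-third-image.jpg"]
letters = ["а", "д", "е", "ё", "ж", "я"]

names_c = sorted(names, key=c_collation_key)
letters_c = sorted(letters, key=c_collation_key)

print(names_c)
print(letters_c)
# 'ё' is U+0451 while 'я' is U+044F, so byte order puts 'ё' last.
print([hex(ord(ch)) for ch in "яё"])
```

<p>The space (0x20) sorts before any letter, uppercase <span class="caps">ASCII</span> sorts before lowercase, and <code>ё</code> ends up after <code>я</code> simply because of its code point, not because of any language rule.</p>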
<p>Meanwhile, from the narrow-minded point of view of an ordinary user, it seems odd
to ignore whitespace and other non-alphanumeric characters while sorting,
but these rules have been invented and standardized, and it is not for me to change them.
However, in our original problem these rules were unsuitable, so we moved to the <code>C</code>
collation.</p>
<h3>Prefix queries</h3>
<p>The fact that postgres relies on glibc for sorting has some more nuances that are
worth mentioning. For example, let’s create the following table with two text
fields and insert a million random-looking rows into it:</p>
<div class="highlight"><pre><span></span><span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">CREATE</span><span class="w"> </span><span class="nt">TABLE</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="o">(</span><span class="w"></span>
<span class="w"> </span><span class="nt">a</span><span class="w"> </span><span class="nt">text</span><span class="o">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">b</span><span class="w"> </span><span class="nt">text</span><span class="w"> </span><span class="nt">COLLATE</span><span class="w"> </span><span class="s2">"C"</span><span class="o">);</span><span class="w"></span>
<span class="nt">CREATE</span><span class="w"> </span><span class="nt">TABLE</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">INSERT</span><span class="w"> </span><span class="nt">INTO</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">md5</span><span class="o">(</span><span class="nt">n</span><span class="p">::</span><span class="nd">text</span><span class="o">),</span><span class="w"> </span><span class="nt">md5</span><span class="o">(</span><span class="nt">n</span><span class="p">::</span><span class="nd">text</span><span class="o">)</span><span class="w"></span>
<span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">generate_series</span><span class="o">(</span><span class="nt">1</span><span class="o">,</span><span class="w"> </span><span class="nt">1000000</span><span class="o">)</span><span class="w"> </span><span class="nt">n</span><span class="o">;</span><span class="w"></span>
<span class="nt">INSERT</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">1000000</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">CREATE</span><span class="w"> </span><span class="nt">INDEX</span><span class="w"> </span><span class="nt">ON</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">USING</span><span class="w"> </span><span class="nt">btree</span><span class="w"> </span><span class="o">(</span><span class="nt">a</span><span class="o">);</span><span class="w"></span>
<span class="nt">CREATE</span><span class="w"> </span><span class="nt">INDEX</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">CREATE</span><span class="w"> </span><span class="nt">INDEX</span><span class="w"> </span><span class="nt">ON</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">USING</span><span class="w"> </span><span class="nt">btree</span><span class="w"> </span><span class="o">(</span><span class="nt">b</span><span class="o">);</span><span class="w"></span>
<span class="nt">CREATE</span><span class="w"> </span><span class="nt">INDEX</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">ANALYZE</span><span class="w"> </span><span class="nt">sort_test</span><span class="o">;</span><span class="w"></span>
<span class="nt">ANALYZE</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">LIMIT</span><span class="w"> </span><span class="nt">2</span><span class="o">;</span><span class="w"></span>
<span class="w"> </span><span class="nt">a</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nt">b</span><span class="w"></span>
<span class="nt">----------------------------------</span><span class="o">+</span><span class="nt">----------------------------------</span><span class="w"></span>
<span class="w"> </span><span class="nt">c4ca4238a0b923820dcc509a6f75849b</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nt">c4ca4238a0b923820dcc509a6f75849b</span><span class="w"></span>
<span class="w"> </span><span class="nt">c81e728d9d4c2f636f067f89cc14862c</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nt">c81e728d9d4c2f636f067f89cc14862c</span><span class="w"></span>
<span class="o">(</span><span class="nt">2</span><span class="w"> </span><span class="nt">rows</span><span class="o">)</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"></span>
</pre></div>
<p>The first field is created with the default collation (<code>en_US.UTF-8</code> in my example)
while the second one uses collation <code>C</code>; the values are the same in both
columns. Let’s look at the plans for prefix queries on each field:</p>
<div class="highlight"><pre><span></span><span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">explain</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">WHERE</span><span class="w"> </span><span class="nt">a</span><span class="w"> </span><span class="nt">LIKE</span><span class="w"> </span><span class="s1">'c4ca4238a0%'</span><span class="o">;</span><span class="w"></span>
<span class="w"> </span><span class="nt">QUERY</span><span class="w"> </span><span class="nt">PLAN</span><span class="w"></span>
<span class="nt">----------------------------------------------------------------</span><span class="w"></span>
<span class="w"> </span><span class="nt">Seq</span><span class="w"> </span><span class="nt">Scan</span><span class="w"> </span><span class="nt">on</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="o">(</span><span class="nt">cost</span><span class="o">=</span><span class="nt">0</span><span class="p">.</span><span class="nc">00</span><span class="o">.</span><span class="p">.</span><span class="nc">24846</span><span class="p">.</span><span class="nc">00</span><span class="w"> </span><span class="nt">rows</span><span class="o">=</span><span class="nt">100</span><span class="w"> </span><span class="nt">width</span><span class="o">=</span><span class="nt">66</span><span class="o">)</span><span class="w"></span>
<span class="w"> </span><span class="nt">Filter</span><span class="o">:</span><span class="w"> </span><span class="o">(</span><span class="nt">a</span><span class="w"> </span><span class="o">~~</span><span class="w"> </span><span class="s1">'c4ca4238a0%'</span><span class="p">::</span><span class="nd">text</span><span class="o">)</span><span class="w"></span>
<span class="o">(</span><span class="nt">2</span><span class="w"> </span><span class="nt">rows</span><span class="o">)</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">explain</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">WHERE</span><span class="w"> </span><span class="nt">b</span><span class="w"> </span><span class="nt">LIKE</span><span class="w"> </span><span class="s1">'c4ca4238a0%'</span><span class="o">;</span><span class="w"></span>
<span class="w"> </span><span class="nt">QUERY</span><span class="w"> </span><span class="nt">PLAN</span><span class="w"></span>
<span class="nt">------------------------------------------------------------------------------------</span><span class="w"></span>
<span class="w"> </span><span class="nt">Index</span><span class="w"> </span><span class="nt">Scan</span><span class="w"> </span><span class="nt">using</span><span class="w"> </span><span class="nt">sort_test_b_idx</span><span class="w"> </span><span class="nt">on</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="o">(</span><span class="nt">cost</span><span class="o">=</span><span class="nt">0</span><span class="p">.</span><span class="nc">42</span><span class="o">.</span><span class="p">.</span><span class="nc">8</span><span class="p">.</span><span class="nc">45</span><span class="w"> </span><span class="nt">rows</span><span class="o">=</span><span class="nt">100</span><span class="w"> </span><span class="nt">width</span><span class="o">=</span><span class="nt">66</span><span class="o">)</span><span class="w"></span>
<span class="w"> </span><span class="nt">Index</span><span class="w"> </span><span class="nt">Cond</span><span class="o">:</span><span class="w"> </span><span class="o">((</span><span class="nt">b</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'c4ca4238a0'</span><span class="p">::</span><span class="nd">text</span><span class="o">)</span><span class="w"> </span><span class="nt">AND</span><span class="w"> </span><span class="o">(</span><span class="nt">b</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="s1">'c4ca4238a1'</span><span class="p">::</span><span class="nd">text</span><span class="o">))</span><span class="w"></span>
<span class="w"> </span><span class="nt">Filter</span><span class="o">:</span><span class="w"> </span><span class="o">(</span><span class="nt">b</span><span class="w"> </span><span class="o">~~</span><span class="w"> </span><span class="s1">'c4ca4238a0%'</span><span class="p">::</span><span class="nd">text</span><span class="o">)</span><span class="w"></span>
<span class="o">(</span><span class="nt">3</span><span class="w"> </span><span class="nt">rows</span><span class="o">)</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"></span>
</pre></div>
<p>It’s easy to see that PostgreSQL uses the index only for the second query. The reason
can be seen in the <span class="caps">EXPLAIN</span> output (see <code>Index Cond</code>): in the second case
PostgreSQL knows the order of characters and converts the index search condition
from <code>b LIKE 'c4ca4238a0%'</code> to <code>b >= 'c4ca4238a0' AND b < 'c4ca4238a1'</code> (and
only then filters the received results by the original condition), and
these two comparisons are well covered by a B-Tree.</p>
<p>You can see that the estimated cost of such a query with collation <code>C</code> is roughly 3000 times lower (8.45 versus 24846.00).</p>
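<p>The rewrite of a prefix condition into a range can be sketched in Python as well. This is only an illustration of the idea under simple assumptions (an <span class="caps">ASCII</span> prefix whose last character is not the largest possible one, and a sorted list standing in for the B-Tree); the function <code>prefix_upper_bound</code> is a made-up name, not something from PostgreSQL:</p>

```python
import bisect
import hashlib


def prefix_upper_bound(prefix: str) -> str:
    """Smallest string that sorts after every string starting with prefix.

    Assumes byte-wise ordering and an ASCII prefix whose last character
    can simply be incremented, e.g. 'c4ca4238a0' -> 'c4ca4238a1'.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


# A sorted list plays the role of the B-Tree index on column b.
values = sorted(hashlib.md5(str(n).encode()).hexdigest() for n in range(1, 10001))

prefix = "c4ca4238a0"
lo = bisect.bisect_left(values, prefix)                      # b >= 'c4ca4238a0'
hi = bisect.bisect_left(values, prefix_upper_bound(prefix))  # b <  'c4ca4238a1'
matches = values[lo:hi]

print(prefix_upper_bound(prefix))
print(matches)
```

<p>Two binary searches find the whole range without looking at the other rows. That only works when the ordering of the index is plain byte order, which is what collation <code>C</code> gives you.</p>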
<h3>Abbreviated keys</h3>
<p>One of the really good optimizations that appeared in PostgreSQL 9.5 was so-called
abbreviated keys. The best thing to read about it is <a href="http://pgeoghegan.blogspot.ru/2015/01/abbreviated-keys-exploiting-locality-to.html">the
post</a>
by the optimization’s author, Peter Geoghegan. In short, it greatly accelerated
sorting of text fields and building indexes on them. Some examples can be seen
<a href="https://www.depesz.com/2015/01/27/waiting-for-9-5-use-abbreviated-keys-for-faster-sorting-of-text-datums/">here</a>.</p>
<p>Unfortunately, in 9.5.2 this optimization <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3df9c374e279db37b00cd9c86219471d0cdaa97c">was turned
off</a> for all collations except <code>C</code>. The reason was a glibc bug
(as we remember, PostgreSQL relies on glibc for all collations except <code>C</code>)
because of which the resulting indexes could be inconsistent.</p>
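<p>For readers who have not met the idea before, here is a toy Python model of abbreviated keys as I understand it from the linked post. It is only a sketch for the byte-order (<code>C</code>) case: the names <code>abbreviated_key</code> and <code>sort_with_abbreviated_keys</code> and the 8-byte width are assumptions of this illustration, and the real implementation lives in PostgreSQL’s C code and differs in detail.</p>

```python
def abbreviated_key(s: str, width: int = 8) -> int:
    """Pack the first `width` bytes of the string into one integer."""
    raw = s.encode("utf-8")[:width].ljust(width, b"\x00")
    return int.from_bytes(raw, "big")


def sort_with_abbreviated_keys(values):
    """Compare the small integer first, fall back to the full bytes on ties."""
    return sorted(values, key=lambda s: (abbreviated_key(s), s.encode("utf-8")))


values = ["my-third-image.jpg", "MYNAME", "my-image.jpg", " my_name", "my-image.png"]

fast = sort_with_abbreviated_keys(values)
plain = sorted(values, key=lambda s: s.encode("utf-8"))

# The abbreviated sort must agree with the full comparison.
print(fast == plain)
```

<p>Most comparisons are decided by the integer alone, and only strings with the same first bytes (like the two <code>my-image</code> files here) need the full comparison. The trick is safe only if the abbreviated key and the full comparison never disagree, which is the kind of guarantee the glibc bug broke for other collations.</p>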
<h3>Instead of a conclusion</h3>
<p>For the original issue we ended up using <code>lc_collate = C</code>, because the
data may be in different languages and this collation seems to be the best
choice for that. Yes, it won’t handle some corner cases in each language, but
it is good enough for everything else.</p>
<p>Meanwhile, it is really sad that there is no silver bullet, and when all your data
is, e.g., in Russian, you have to choose between performance and a sort
order that correctly accounts for Russian language specifics.</p>
<h2>One more time about collation in PostgreSQL (Russian version)</h2>
<p>2016-09-18, by d0uble</p>
<p>It’s been a while since I wrote anything. Time to blow the dust off the blog and write something
useful :)</p>
<p>When people start working with PostgreSQL, they sometimes make
mistakes that are very hard to fix later. For example, when initializing
your first database you have little idea why you would need to enable
data checksums, especially since they are off by default and the documentation says
that they can noticeably hurt performance.</p>
<p>And when you already have more than a hundred databases with hundreds of terabytes of data on all sorts of
hardware or (even worse) in different virtualization systems, you realize that you are ready
to pay a little performance to detect silent data corruption.
But the problem is that you cannot cheaply enable checksums. It is
one of those things that is set once, when running the <code>initdb</code> command. In the
bright future we hope for logical replication, but for now the only way
to change it is <code>pg_dump</code>, <code>initdb</code>, <code>pg_restore</code>, i.e. with downtime.</p>
<p>And while checksums may turn out to be unnecessary for you (what if you have flawless
hardware and an <span class="caps">OS</span> without bugs), <code>lc_collate</code>, which
this post is about, concerns everyone. And now I will prove it.</p>
<h3>Sort order</h3>
<p>Suppose you installed PostgreSQL from packages or built it from source and
initialized the database yourself. Most likely, in the modern world of
victorious <span class="caps">UTF</span>-8, you will see something like this:</p>
<div class="highlight"><pre><span></span><span class="nv">d0uble</span><span class="w"> </span>~<span class="w"> </span><span class="p">$</span><span class="w"> </span><span class="nv">psql</span><span class="w"> </span><span class="o">-</span><span class="nv">l</span><span class="w"></span>
<span class="w"> </span><span class="nv">List</span><span class="w"> </span><span class="nv">of</span><span class="w"> </span><span class="nv">databases</span><span class="w"></span>
<span class="w"> </span><span class="nv">Name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">Owner</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">Encoding</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">Collate</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">Ctype</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">Access</span><span class="w"> </span><span class="nv">privileges</span><span class="w"></span>
<span class="o">-----------+--------+----------+-------------+-------------+-------------------</span><span class="w"></span>
<span class="w"> </span><span class="nv">postgres</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">d0uble</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">UTF8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"></span>
<span class="w"> </span><span class="nv">template0</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">d0uble</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">UTF8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">=</span><span class="nv">c</span><span class="o">/</span><span class="nv">d0uble</span><span class="w"> </span><span class="o">+</span><span class="w"></span>
<span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">d0uble</span><span class="o">=</span><span class="nv">CTc</span><span class="o">/</span><span class="nv">d0uble</span><span class="w"></span>
<span class="w"> </span><span class="nv">template1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">d0uble</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">UTF8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">en_US</span><span class="o">.</span><span class="nv">UTF</span><span class="o">-</span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">=</span><span class="nv">c</span><span class="o">/</span><span class="nv">d0uble</span><span class="w"> </span><span class="o">+</span><span class="w"></span>
<span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nv">d0uble</span><span class="o">=</span><span class="nv">CTc</span><span class="o">/</span><span class="nf">d0uble</span><span class="w"></span>
<span class="p">(</span><span class="mi">3</span><span class="w"> </span><span class="nv">rows</span><span class="p">)</span><span class="w"></span>
<span class="nv">d0uble</span><span class="w"> </span>~<span class="w"> </span><span class="p">$</span><span class="w"></span>
</pre></div>
<p>Unless told otherwise, <code>initdb</code> takes the settings for columns 3-5
from the operating system. And most likely you will assume that if
<code>UTF-8</code> is there, everything will be fine. In some cases, however, you may well
start to doubt that. Look at the following query, executed on a Linux machine:</p>
<div class="highlight"><pre><span></span>linux> SELECT name FROM unnest(ARRAY[
'MYNAME', ' my_name', 'my-image.jpg', 'my-third-image.jpg'
]) name ORDER BY name;
name
--------------------
my-image.jpg
my_name
MYNAME
my-third-image.jpg
(4 rows)
linux>
</pre></div>
<p>This sort order looks very strange. And that is despite the client
connecting to the database with perfectly sane settings:</p>
<div class="highlight"><pre><span></span>linux> SELECT name, setting FROM pg_settings WHERE category ~ 'Locale';
name | setting
----------------------------+--------------------
client_encoding | UTF8
DateStyle | ISO, MDY
default_text_search_config | pg_catalog.english
extra_float_digits | 0
IntervalStyle | postgres
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric | en_US.UTF-8
lc_time | en_US.UTF-8
server_encoding | UTF8
TimeZone | Europe/Moscow
timezone_abbreviations | Default
(14 rows)
linux>
</pre></div>
<p>The result does not depend on the distribution; at least on <span class="caps">RHEL</span> 6 and Ubuntu 14.04
it is the same. Even stranger is the fact that the same query with the same
server and client settings on Mac <span class="caps">OS</span> X gives a different result:</p>
<div class="highlight"><pre><span></span>macos> SELECT name FROM unnest(ARRAY[
'MYNAME', ' my_name', 'my-image.jpg', 'my-third-image.jpg'
]) name ORDER BY name;
name
--------------------
my_name
MYNAME
my-image.jpg
my-third-image.jpg
(4 rows)
macos>
</pre></div>
<p>At first glance it seems that Linux is seriously broken here. But the problem
is not that; the problem is that a result which depends on the operating system is a very
bad result. Fortunately, we discovered the strange behavior at the
testing stage: the tests passed fine on a developer's laptop but failed on the
Linux test server.</p>
<p>The reason is that PostgreSQL takes its sorting rules from the <span class="caps">OS</span>,
and (surprise!) <span class="caps">UTF</span>-8 is not the same everywhere ¯\_(ツ)_/¯ If you search, you can find
plenty of threads about the differing behavior on Linux and Mac <span class="caps">OS</span> X (
<a href="http://stackoverflow.com/questions/16328592">1</a>,
<a href="https://www.postgresql.org/message-id/flat/23053.1337036410%40sss.pgh.pa.us#23053.1337036410@sss.pgh.pa.us">2</a>,
<a href="http://stackoverflow.com/questions/27395317">3</a>,
<a href="https://www.postgresql.org/message-id/4B4E845F.80906@postnewspapers.com.au">4</a>,
<a href="http://dba.stackexchange.com/questions/106964">5</a>,
<a href="http://dba.stackexchange.com/questions/94887">6</a>).</p>
<p>Opinions differ on “who is to blame?”, but it is safe to say that
Mac <span class="caps">OS</span> X definitely does not take all regional specifics into account. You can see this from the links
above, or demonstrate it with, for example, the following query for the Russian language:</p>
<div class="highlight"><pre><span></span>macos> SELECT name FROM unnest(ARRAY[
'а', 'д', 'е', 'ё', 'ж', 'я'
]) name ORDER BY name;
name
------
а
д
е
ж
я
ё
(6 rows)
macos>
</pre></div>
<p>Linux, in my opinion, handles this query logically. And the result it showed
for the first query can be explained well enough too: Linux
simply ignores spaces, <code>-</code> and <code>_</code> when sorting. In other words, once you
look into it a little, it is Mac <span class="caps">OS</span> X that looks broken.</p>
<p>In the end we moved the tests into Docker so as not to depend on <span class="caps">OS</span> specifics and
to get deterministic results, but there are other ways to achieve that.
The simplest of them is to use <code>LC_COLLATE = C</code>, because it is
the only collation that ships with PostgreSQL and does not depend
on the <span class="caps">OS</span> (see the
<a href="https://www.postgresql.org/docs/current/static/charset.html">documentation</a>).</p>
<div class="highlight"><pre><span></span>linux> SELECT name FROM unnest(ARRAY[
'MYNAME', ' my_name', 'my-image.jpg', 'my-third-image.jpg'
]) name ORDER BY name COLLATE "C";
name
--------------------
my_name
MYNAME
my-image.jpg
my-third-image.jpg
(4 rows)
linux>
</pre></div>
<p>As you can see, in this case the results are the same on both operating systems. But it is easy
to notice that they are the same as on Mac <span class="caps">OS</span> X, which means the same pitfalls for
multibyte encodings, for example:</p>
<div class="highlight"><pre><span></span>linux> SELECT name FROM unnest(ARRAY[
'а', 'д', 'е', 'ё', 'ж', 'я'
]) name ORDER BY name COLLATE "C";
name
------
а
д
е
ж
я
ё
(6 rows)
linux>
</pre></div>
<p>Do not assume, though, that sorting with <code>LC_COLLATE=en_US.UTF-8</code> on
Mac <span class="caps">OS</span> X will always give the same result as <code>LC_COLLATE=C</code> on any <span class="caps">OS</span>. The only thing you
can be sure of is that collation <code>C</code> guarantees an identical result,
because it ships with PostgreSQL and does not depend on the <span class="caps">OS</span>.</p>
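<p>The position of <code>ё</code> after <code>я</code> in the output above follows from how collation <code>C</code>
compares strings: byte by byte, which for UTF-8 data is the same as Unicode code point order.
A minimal sketch that shows the values being compared, assuming a database with
<code>UTF8</code> server encoding (the alias <code>ch</code> and the column aliases are only illustrative):</p>
<div class="highlight"><pre><span></span>-- ascii() returns the Unicode code point of the first character in a UTF8 database;
-- convert_to() shows the bytes that collation "C" actually compares.
SELECT ch,
       ascii(ch) AS code_point,
       convert_to(ch, 'UTF8') AS utf8_bytes
  FROM unnest(ARRAY['а', 'д', 'е', 'ё', 'ж', 'я']) ch
 ORDER BY ch COLLATE "C";
</pre></div>
<p>In Unicode <code>ё</code> is U+0451 while <code>я</code> is U+044F, so <code>ё</code> ends up last
even though the Russian alphabet places it right after <code>е</code>.</p>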
<p>From a purely layman's point of view, it seems strange to me
not to take whitespace, hyphens and other non-alphabetic characters into account
when sorting, but someone once came up with these rules and standardized them, and it is not
for me to change them. In our original task, however, these rules turned out to be unacceptable,
so we started using collation <code>C</code>.</p>
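<p>Collation <code>C</code> can be requested at several levels. The sketch below shows three of them.
The database name <code>app_db</code> and the table <code>files</code> are hypothetical and not part of
the setup described here, and the example assumes that the <code>en_US.UTF-8</code> locale exists on the server:</p>
<div class="highlight"><pre><span></span>-- 1. Whole database: a collation that differs from the template's requires template0.
CREATE DATABASE app_db
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_COLLATE 'C'
    LC_CTYPE 'en_US.UTF-8';

-- 2. Single column: only this column departs from the database default.
CREATE TABLE files (
    name text COLLATE "C"
);

-- 3. Single expression: override the collation for one ORDER BY.
SELECT name FROM files ORDER BY name COLLATE "C";
</pre></div>
<p>The database-level setting is fixed when the database is created, so for an existing
database the column and expression forms are the ones that remain available.</p>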
<h3>Prefix queries</h3>
<p>The fact that postgres relies on glibc for sorting has a few more
nuances worth mentioning. As an example, let's create the following table with
two text fields and insert one million random rows into it:</p>
<div class="highlight"><pre><span></span><span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">CREATE</span><span class="w"> </span><span class="nt">TABLE</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="o">(</span><span class="w"></span>
<span class="w"> </span><span class="nt">a</span><span class="w"> </span><span class="nt">text</span><span class="o">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">b</span><span class="w"> </span><span class="nt">text</span><span class="w"> </span><span class="nt">COLLATE</span><span class="w"> </span><span class="s2">"C"</span><span class="o">);</span><span class="w"></span>
<span class="nt">CREATE</span><span class="w"> </span><span class="nt">TABLE</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">INSERT</span><span class="w"> </span><span class="nt">INTO</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">md5</span><span class="o">(</span><span class="nt">n</span><span class="p">::</span><span class="nd">text</span><span class="o">),</span><span class="w"> </span><span class="nt">md5</span><span class="o">(</span><span class="nt">n</span><span class="p">::</span><span class="nd">text</span><span class="o">)</span><span class="w"></span>
<span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">generate_series</span><span class="o">(</span><span class="nt">1</span><span class="o">,</span><span class="w"> </span><span class="nt">1000000</span><span class="o">)</span><span class="w"> </span><span class="nt">n</span><span class="o">;</span><span class="w"></span>
<span class="nt">INSERT</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">1000000</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">CREATE</span><span class="w"> </span><span class="nt">INDEX</span><span class="w"> </span><span class="nt">ON</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">USING</span><span class="w"> </span><span class="nt">btree</span><span class="w"> </span><span class="o">(</span><span class="nt">a</span><span class="o">);</span><span class="w"></span>
<span class="nt">CREATE</span><span class="w"> </span><span class="nt">INDEX</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">CREATE</span><span class="w"> </span><span class="nt">INDEX</span><span class="w"> </span><span class="nt">ON</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">USING</span><span class="w"> </span><span class="nt">btree</span><span class="w"> </span><span class="o">(</span><span class="nt">b</span><span class="o">);</span><span class="w"></span>
<span class="nt">CREATE</span><span class="w"> </span><span class="nt">INDEX</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">ANALYZE</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="o">;</span><span class="w"></span>
<span class="nt">ANALYZE</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">LIMIT</span><span class="w"> </span><span class="nt">2</span><span class="o">;</span><span class="w"></span>
<span class="w"> </span><span class="nt">a</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nt">b</span><span class="w"></span>
<span class="nt">----------------------------------</span><span class="o">+</span><span class="nt">----------------------------------</span><span class="w"></span>
<span class="w"> </span><span class="nt">c4ca4238a0b923820dcc509a6f75849b</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nt">c4ca4238a0b923820dcc509a6f75849b</span><span class="w"></span>
<span class="w"> </span><span class="nt">c81e728d9d4c2f636f067f89cc14862c</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nt">c81e728d9d4c2f636f067f89cc14862c</span><span class="w"></span>
<span class="o">(</span><span class="nt">2</span><span class="w"> </span><span class="nt">rows</span><span class="o">)</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"></span>
</pre></div>
<p>One field is created with the default collation (<code>en_US.UTF-8</code> in my example)
and the other with collation <code>C</code>; the values in them are identical. Let's look at the plans
of prefix queries on each of the fields:</p>
<div class="highlight"><pre><span></span><span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">explain</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">WHERE</span><span class="w"> </span><span class="nt">a</span><span class="w"> </span><span class="nt">LIKE</span><span class="w"> </span><span class="s1">'c4ca4238a0%'</span><span class="o">;</span><span class="w"></span>
<span class="w"> </span><span class="nt">QUERY</span><span class="w"> </span><span class="nt">PLAN</span><span class="w"></span>
<span class="nt">----------------------------------------------------------------</span><span class="w"></span>
<span class="w"> </span><span class="nt">Seq</span><span class="w"> </span><span class="nt">Scan</span><span class="w"> </span><span class="nt">on</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="o">(</span><span class="nt">cost</span><span class="o">=</span><span class="nt">0</span><span class="p">.</span><span class="nc">00</span><span class="o">.</span><span class="p">.</span><span class="nc">24846</span><span class="p">.</span><span class="nc">00</span><span class="w"> </span><span class="nt">rows</span><span class="o">=</span><span class="nt">100</span><span class="w"> </span><span class="nt">width</span><span class="o">=</span><span class="nt">66</span><span class="o">)</span><span class="w"></span>
<span class="w"> </span><span class="nt">Filter</span><span class="o">:</span><span class="w"> </span><span class="o">(</span><span class="nt">a</span><span class="w"> </span><span class="o">~~</span><span class="w"> </span><span class="s1">'c4ca4238a0%'</span><span class="p">::</span><span class="nd">text</span><span class="o">)</span><span class="w"></span>
<span class="o">(</span><span class="nt">2</span><span class="w"> </span><span class="nt">rows</span><span class="o">)</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"> </span><span class="nt">explain</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="nt">WHERE</span><span class="w"> </span><span class="nt">b</span><span class="w"> </span><span class="nt">LIKE</span><span class="w"> </span><span class="s1">'c4ca4238a0%'</span><span class="o">;</span><span class="w"></span>
<span class="w"> </span><span class="nt">QUERY</span><span class="w"> </span><span class="nt">PLAN</span><span class="w"></span>
<span class="nt">------------------------------------------------------------------------------------</span><span class="w"></span>
<span class="w"> </span><span class="nt">Index</span><span class="w"> </span><span class="nt">Scan</span><span class="w"> </span><span class="nt">using</span><span class="w"> </span><span class="nt">sort_test_b_idx</span><span class="w"> </span><span class="nt">on</span><span class="w"> </span><span class="nt">sort_test</span><span class="w"> </span><span class="o">(</span><span class="nt">cost</span><span class="o">=</span><span class="nt">0</span><span class="p">.</span><span class="nc">42</span><span class="o">.</span><span class="p">.</span><span class="nc">8</span><span class="p">.</span><span class="nc">45</span><span class="w"> </span><span class="nt">rows</span><span class="o">=</span><span class="nt">100</span><span class="w"> </span><span class="nt">width</span><span class="o">=</span><span class="nt">66</span><span class="o">)</span><span class="w"></span>
<span class="w"> </span><span class="nt">Index</span><span class="w"> </span><span class="nt">Cond</span><span class="o">:</span><span class="w"> </span><span class="o">((</span><span class="nt">b</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'c4ca4238a0'</span><span class="p">::</span><span class="nd">text</span><span class="o">)</span><span class="w"> </span><span class="nt">AND</span><span class="w"> </span><span class="o">(</span><span class="nt">b</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="s1">'c4ca4238a1'</span><span class="p">::</span><span class="nd">text</span><span class="o">))</span><span class="w"></span>
<span class="w"> </span><span class="nt">Filter</span><span class="o">:</span><span class="w"> </span><span class="o">(</span><span class="nt">b</span><span class="w"> </span><span class="o">~~</span><span class="w"> </span><span class="s1">'c4ca4238a0%'</span><span class="p">::</span><span class="nd">text</span><span class="o">)</span><span class="w"></span>
<span class="o">(</span><span class="nt">3</span><span class="w"> </span><span class="nt">rows</span><span class="o">)</span><span class="w"></span>
<span class="nt">linux</span><span class="o">></span><span class="w"></span>
</pre></div>
<p>As you can see, PostgreSQL does not use the index for the first query but
does for the second. The reason can be seen in the <span class="caps">EXPLAIN</span> output (see
<code>Index Cond</code>): in the second case PostgreSQL knows the character order and
rewrites the index condition from <code>b LIKE 'c4ca4238a0%'</code> into
<code>b >= 'c4ca4238a0' AND b < 'c4ca4238a1'</code>, and these two operations are
handled well by a B-Tree (only afterwards does postgres filter the results
by the original condition).</p>
<p>As you can see, the estimated cost of such a query with collation <code>C</code> is almost 3000 times lower (8.45 versus 24846.00).</p>
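<p>The rewrite can be reproduced by hand, which makes it easier to see what the planner does.
This is a sketch against the <code>sort_test</code> table above. The <code>text_pattern_ops</code> index is
not used in this post; it is shown only as the option the PostgreSQL documentation describes for
pattern matching on a column that has to keep a non-<code>C</code> collation:</p>
<div class="highlight"><pre><span></span>-- The range form of the prefix search: the condition PostgreSQL derives from LIKE on column b.
EXPLAIN SELECT * FROM sort_test
 WHERE b >= 'c4ca4238a0' AND b < 'c4ca4238a1';

-- Alternative for column a: an index whose operator class compares values
-- character by character instead of using the locale rules.
CREATE INDEX ON sort_test USING btree (a text_pattern_ops);
EXPLAIN SELECT * FROM sort_test WHERE a LIKE 'c4ca4238a0%';
</pre></div>
<p>Such an index serves pattern matching but not ordinary comparison operators or <code>ORDER BY</code>
under the column's own collation, so the regular index on <code>a</code> may still be needed.</p>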
<h3>Abbreviated keys</h3>
<p>One of the nice optimizations that arrived with PostgreSQL 9.5 was
the so-called abbreviated keys.
The best place to read about them is the
<a href="http://pgeoghegan.blogspot.ru/2015/01/abbreviated-keys-exploiting-locality-to.html">post by the author</a>
of this optimization, Peter Geoghegan. In short, this optimization significantly
sped up sorting of text fields and building indexes on them; examples can be
found, for instance,
<a href="https://www.depesz.com/2015/01/27/waiting-for-9-5-use-abbreviated-keys-for-faster-sorting-of-text-datums/">here</a>.</p>
<p>Unfortunately, in 9.5.2 this optimization was
<a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3df9c374e279db37b00cd9c86219471d0cdaa97c">turned off</a>
for all collations except <code>C</code>. The reason was a bug in glibc (and as we remember,
for all collations except <code>C</code> PostgreSQL relies on glibc), because of
which indexes could end up inconsistent.</p>
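<p>A rough way to see how much the collation matters for sorting on a particular build is to time the
same sort on both columns of <code>sort_test</code>. This is only a sketch: index scans are switched off for the
session so that the planner has to sort, and the timings depend on the PostgreSQL version,
glibc and hardware, so no numbers are shown here:</p>
<div class="highlight"><pre><span></span>-- Make the planner sort instead of walking the existing indexes (session-local).
SET enable_indexscan = off;
SET enable_indexonlyscan = off;

-- Column a uses the default collation, column b uses "C".
EXPLAIN (ANALYZE) SELECT a FROM sort_test ORDER BY a;
EXPLAIN (ANALYZE) SELECT b FROM sort_test ORDER BY b;

-- Restore the defaults.
RESET enable_indexscan;
RESET enable_indexonlyscan;
</pre></div>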
<h3>Instead of a conclusion</h3>
<p>In the task that started all this, we eventually settled on
<code>lc_collate = C</code>, because the data is expected to contain all sorts of
languages of the world and this collation seems the most appropriate for such cases. Yes,
it will not handle some edge cases in each of the languages, but it
will work reasonably well for all of them.</p>
<p>It is sad, though, that there is no silver bullet, and when all your data is,
say, in Russian, you have to choose between performance and
sorting that is correct for the Russian language.</p>Video and slides of my talk at PGCon 20162016-09-15T14:00:00+03:002016-09-15T14:00:00+03:00d0ubletag:simply.name,2016-09-15:/video-pgcon2016.htmlThis talk gives a historical overview of several years during which we were migrating a 24/7 web-service with 300 <span class="caps">TB</span> of metadata and 250k <span class="caps">TPS</span> from Oracle databases to PostgreSQL.<br/>
<p>Everything about the talk is <a href="http://www.pgcon.org/2016/schedule/events/923.en.html">here</a>. The presentation is available as a <span class="caps">PDF</span> <a href="https://yadi.sk/i/WPBvI8Jhretwy">here</a>. Everything is in English.</p>
<p>Video:<br /><br /><iframe width="560" height="315" src="https://www.youtube.com/embed/-SS4R1sFH3c" frameborder="0" allowfullscreen></iframe></p>
<p>Slides:<br /><br /><iframe src="//www.slideshare.net/slideshow/embed_code/key/if7i51PLzfihkj" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe></p>
Wait interface in PostgreSQL2015-11-16T16:00:00+03:002015-11-16T16:00:00+03:00d0ubletag:simply.name,2015-11-16:/pg-stat-wait.html<p>People with experience of commercial <span class="caps">RDBMS</span> are used to being able
to answer the question “What is a particular session doing right now?” or
even “What was that session waiting for 5 minutes ago?” For a long time PostgreSQL
did not have such diagnostic tools, and DBAs had to get by with workarounds of
varying sophistication. I <a href="https://simply.name/ru/slides-pgday2015.html">gave a talk</a> at
pgday.ru (in Russian) about how we do it. The talk was given jointly with
Ildus Kurbangaliev from PostgresPro, and Ildus spoke about a tool
that can answer the questions above.</p>
<p>Strictly speaking, this is not the first attempt to implement what people call a
wait [events] interface, but none of the previous attempts reached a
reasonable state; they all died as proof-of-concept patches. <code>pg_stat_wait</code>, however, is
currently available <a href="https://github.com/postgrespro/postgres/tree/waits_monitoring_94">as a set of patches to the current stable 9.4
branch</a> and
to 9.6, which is still in development (check pgsql-hackers@ for the current versions).</p>
<p>After fairly long testing and bug fixing we even deployed them to production.</p>
<h3>Installation</h3>
<p>Until all of this becomes part of core PostgreSQL, you need to recompile postgres.
I think describing the rebuild as <code>./configure && make && sudo make install</code>
is pointless; it is much better to look into the
<a href="http://www.postgresql.org/docs/9.4/static/install-procedure.html">documentation</a>.</p>
<p>After that, add <code>pg_stat_wait</code> to <code>shared_preload_libraries</code>.
Additionally, you can add the following options to <code>postgresql.conf</code>:</p>
<ul>
<li><code>waits_monitoring = on</code> - turns the functionality on,</li>
<li><code>pg_stat_wait.history = on</code> - stores the history of wait events,</li>
<li><code>pg_stat_wait.history_size = 1000000</code> - number of most recent events to keep
in history,</li>
<li><code>pg_stat_wait.history_period = 1000</code> - how often wait events should be
stored in history (ms).</li>
</ul>
<p>Then restart PostgreSQL and run <code>CREATE EXTENSION
pg_stat_wait</code>. After that everything will start working.</p>
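<p>Once the server is back up, the setup can be checked from <code>psql</code>. This is a sketch that
assumes the patched build and the parameter names listed above; the output is omitted because it depends on
the installation:</p>
<div class="highlight"><pre><span></span>-- Both settings should reflect what was written to postgresql.conf.
SHOW shared_preload_libraries;
SHOW waits_monitoring;

-- Create the extension in the current database and confirm it is registered.
CREATE EXTENSION pg_stat_wait;
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_stat_wait';
</pre></div>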
<h3>Capabilities</h3>
<p>What exactly will start to work? First you may look at what is inside the extension:</p>
<div class="highlight"><pre><span></span>rpopdb01g/postgres M # \dxS+ pg_stat_wait
Objects in extension "pg_stat_wait"
Object Description
---------------------------------------------------------
function pg_is_in_trace(integer)
function pg_start_trace(integer,cstring)
function pg_stat_wait_get_current(integer)
function pg_stat_wait_get_history()
function pg_stat_wait_get_profile(integer,boolean)
function pg_stat_wait_make_test_lwlock(integer,integer)
function pg_stat_wait_reset_profile()
function pg_stop_trace(integer)
function pg_wait_class_list()
function pg_wait_event_list()
view pg_stat_wait_current
view pg_stat_wait_history
view pg_stat_wait_profile
view pg_wait_class
view pg_wait_event
view pg_wait_events
(16 rows)
rpopdb01g/postgres M #
</pre></div>
<p>Let’s see what wait events <code>pg_stat_wait</code> is able to monitor:</p>
<div class="highlight"><pre><span></span><span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT version();</span><span class="w"></span>
<span class="w"> </span><span class="n">version</span><span class="w"></span>
<span class="o">---------------------------------------------------------------------------------------------------------------</span><span class="w"></span>
<span class="w"> </span><span class="n">PostgreSQL</span><span class="w"> </span><span class="mf">9.4</span><span class="o">.</span><span class="mi">5</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">x86_64</span><span class="o">-</span><span class="n">unknown</span><span class="o">-</span><span class="n">linux</span><span class="o">-</span><span class="n">gnu</span><span class="p">,</span><span class="w"> </span><span class="n">compiled</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="n">gcc</span><span class="w"> </span><span class="p">(</span><span class="n">GCC</span><span class="p">)</span><span class="w"> </span><span class="mf">4.4</span><span class="o">.</span><span class="mi">7</span><span class="w"> </span><span class="mi">20120313</span><span class="w"> </span><span class="p">(</span><span class="n">Red</span><span class="w"> </span><span class="n">Hat</span><span class="w"> </span><span class="mf">4.4</span><span class="o">.</span><span class="mi">7</span><span class="o">-</span><span class="mi">11</span><span class="p">),</span><span class="w"> </span><span class="mi">64</span><span class="o">-</span><span class="n">bit</span><span class="w"></span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="n">row</span><span class="p">)</span><span class="w"></span>
<span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT class_name, count(event_name)</span><span class="w"></span>
<span class="n">FROM</span><span class="w"> </span><span class="n">pg_wait_events</span><span class="w"> </span><span class="n">GROUP</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">ORDER</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="n">DESC</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">count</span><span class="w"></span>
<span class="o">------------+-------</span><span class="w"></span>
<span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">52</span><span class="w"></span>
<span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">9</span><span class="w"></span>
<span class="w"> </span><span class="n">Locks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">9</span><span class="w"></span>
<span class="w"> </span><span class="n">Network</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3</span><span class="w"></span>
<span class="w"> </span><span class="n">Latch</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="n">CPU</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="p">(</span><span class="mi">6</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
<span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<p>You can see that waits monitoring for 9.4 knows about 52 LWLocks, and
for disk, for example, it can track the following:</p>
<div class="highlight"><pre><span></span><span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT * FROM pg_wait_events WHERE class_id = 3;</span><span class="w"></span>
<span class="w"> </span><span class="n">class_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">event_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">event_name</span><span class="w"></span>
<span class="o">----------+------------+----------+------------</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SMGR_READ</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SMGR_WRITE</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SMGR_FSYNC</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">XLOG_READ</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">XLOG_WRITE</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">5</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">XLOG_FSYNC</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">6</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SLRU_READ</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">7</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SLRU_WRITE</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SLRU_FSYNC</span><span class="w"></span>
<span class="p">(</span><span class="mi">9</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
<span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<p>By “can track” I mean being able to answer the following questions:</p>
<ul>
<li>What is a particular process waiting for right now, and for <em>how long</em>?</li>
<li>How many times has a particular process waited on each event type,
and <em>how much time</em> did it spend waiting?</li>
<li>What was a particular process waiting for some time ago?</li>
</ul>
<p>These questions are answered by the views <code>pg_stat_wait_current</code>,
<code>pg_stat_wait_profile</code> and <code>pg_stat_wait_history</code>, respectively. They are best explained by example.</p>
<h4>pg_stat_wait_current</h4>
<div class="highlight"><pre><span></span><span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT pid, class_name, event_name, wait_time</span><span class="w"></span>
<span class="n">FROM</span><span class="w"> </span><span class="n">pg_stat_wait_current</span><span class="w"> </span><span class="n">WHERE</span><span class="w"> </span><span class="n">class_id</span><span class="w"> </span><span class="n">NOT</span><span class="w"> </span><span class="n">IN</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">)</span><span class="w"></span>
<span class="n">ORDER</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="n">wait_time</span><span class="w"> </span><span class="n">DESC</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">event_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">wait_time</span><span class="w"></span>
<span class="o">-------+------------+---------------+-----------</span><span class="w"></span>
<span class="w"> </span><span class="mi">23510</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">17184</span><span class="w"></span>
<span class="w"> </span><span class="mi">23537</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">9367</span><span class="w"></span>
<span class="w"> </span><span class="mi">23628</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">9366</span><span class="w"></span>
<span class="w"> </span><span class="mi">23502</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3215</span><span class="w"></span>
<span class="w"> </span><span class="mi">23504</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2846</span><span class="w"></span>
<span class="w"> </span><span class="mi">23533</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2788</span><span class="w"></span>
<span class="w"> </span><span class="mi">23514</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2658</span><span class="w"></span>
<span class="w"> </span><span class="mi">23517</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2658</span><span class="w"></span>
<span class="w"> </span><span class="mi">23532</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2641</span><span class="w"></span>
<span class="w"> </span><span class="mi">23527</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2507</span><span class="w"></span>
<span class="w"> </span><span class="mi">23952</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SMGR_READ</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2502</span><span class="w"></span>
<span class="w"> </span><span class="mi">23518</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">XLOG_FSYNC</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1576</span><span class="w"></span>
<span class="w"> </span><span class="mi">23524</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALWriteLock</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1027</span><span class="w"></span>
<span class="p">(</span><span class="mi">13</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
<span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<p>We exclude waits of the ‘Network’ and ‘Latch’ classes because their waiting
time is usually several orders of magnitude longer than that of other classes.
The columns listed above are not the only columns in the view:</p>
<div class="highlight"><pre><span></span><span class="n">smcdb01d</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT * FROM pg_stat_wait_current</span><span class="w"></span>
<span class="n">WHERE</span><span class="w"> </span><span class="n">class_id</span><span class="w"> </span><span class="n">IN</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">)</span><span class="w"> </span><span class="n">LIMIT</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span><span class="w"></span>
<span class="o">-</span><span class="p">[</span><span class="w"> </span><span class="n">RECORD</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">]</span><span class="o">-----------------------------</span><span class="w"></span>
<span class="n">pid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">12107</span><span class="w"></span>
<span class="n">sample_ts</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">36</span><span class="p">:</span><span class="mf">59.598562</span><span class="o">+</span><span class="mi">03</span><span class="w"></span>
<span class="n">class_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"></span>
<span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Locks</span><span class="w"></span>
<span class="n">event_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">4</span><span class="w"></span>
<span class="n">event_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Transaction</span><span class="w"></span>
<span class="n">wait_time</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">24334</span><span class="w"></span>
<span class="n">p1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">5</span><span class="w"></span>
<span class="n">p2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">255593733</span><span class="w"></span>
<span class="n">p3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="n">p4</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="n">p5</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="o">-</span><span class="p">[</span><span class="w"> </span><span class="n">RECORD</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">]</span><span class="o">-----------------------------</span><span class="w"></span>
<span class="n">pid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1266</span><span class="w"></span>
<span class="n">sample_ts</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">36</span><span class="p">:</span><span class="mf">59.598562</span><span class="o">+</span><span class="mi">03</span><span class="w"></span>
<span class="n">class_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3</span><span class="w"></span>
<span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"></span>
<span class="n">event_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="n">event_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SMGR_READ</span><span class="w"></span>
<span class="n">wait_time</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1710</span><span class="w"></span>
<span class="n">p1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1663</span><span class="w"></span>
<span class="n">p2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">16400</span><span class="w"></span>
<span class="n">p3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">20508</span><span class="w"></span>
<span class="n">p4</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="n">p5</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">220036</span><span class="w"></span>
<span class="n">smcdb01d</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<p>Parameters <code>p1</code>-<code>p5</code> are text fields. For example, for heavy-weight locks they
give approximately the same information that you can see in the <code>pg_locks</code> view, and for
disk I/O waits they tell you which <span class="caps">DB</span>, relation and block the
process was reading.</p>
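<p>As a rough illustration, the sketch below labels the five values from the <code>SMGR_READ</code> record above. The field names and their order (tablespace, database, relation, fork, block) are an assumption based on how PostgreSQL usually identifies a block on disk; the view itself only exposes <code>p1</code>-<code>p5</code>. The class and function names are hypothetical, and Python is used purely for illustration.</p>

```python
from typing import NamedTuple


class StorageWaitParams(NamedTuple):
    """Hypothetical labels for p1..p5 of a Storage wait (assumed order)."""
    tablespace_oid: int
    database_oid: int
    relfilenode: int
    fork_number: int
    block_number: int


def label_storage_params(p1, p2, p3, p4, p5):
    # Assumption: p1..p5 mean tablespace, database, relation, fork, block.
    # The values arrive as text, so convert them to integers first.
    return StorageWaitParams(int(p1), int(p2), int(p3), int(p4), int(p5))


# Values copied from RECORD 2 of the pg_stat_wait_current output above.
params = label_storage_params("1663", "16400", "20508", "0", "220036")
print(params)
```

<p>If the assumed order holds, the database and relation could then be looked up by matching these numbers against <code>pg_database.oid</code> and <code>pg_class.relfilenode</code>.</p>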
<h4>pg_stat_wait_profile</h4>
<p>For example, you can see how much time the <span class="caps">DB</span> spent in each class of waits:</p>
<div class="highlight"><pre><span></span><span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT class_name, sum(wait_time) AS wait_time,</span><span class="w"></span>
<span class="n">sum</span><span class="p">(</span><span class="n">wait_count</span><span class="p">)</span><span class="w"> </span><span class="n">AS</span><span class="w"> </span><span class="n">wait_count</span><span class="w"> </span><span class="n">FROM</span><span class="w"> </span><span class="n">pg_stat_wait_profile</span><span class="w"></span>
<span class="n">GROUP</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="n">ORDER</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="n">wait_time</span><span class="w"> </span><span class="n">DESC</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">wait_time</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">wait_count</span><span class="w"></span>
<span class="o">------------+--------------+------------</span><span class="w"></span>
<span class="w"> </span><span class="n">Network</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">144196945815</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">11877848</span><span class="w"></span>
<span class="w"> </span><span class="n">Latch</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">90164921148</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3521073</span><span class="w"></span>
<span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2648490737</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">10501900</span><span class="w"></span>
<span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">977430136</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">36444251</span><span class="w"></span>
<span class="w"> </span><span class="n">CPU</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">68890774</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">365699457</span><span class="w"></span>
<span class="w"> </span><span class="n">Locks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">74</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="p">(</span><span class="mi">6</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
<span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<p>Or which LWLocks are the hottest in the system:</p>
<div class="highlight"><pre><span></span>rpopdb01g/postgres M # SELECT event_name, sum(wait_time) AS wait_time,
sum(wait_count) AS wait_count FROM pg_stat_wait_profile
WHERE class_id = 1 AND wait_time != 0 AND wait_count != 0
GROUP BY event_name ORDER BY wait_time DESC;
event_name | wait_time | wait_count
----------------------+------------+------------
LockMgrLWLocks | 1873294341 | 3870685
WALWriteLock | 1039279117 | 859101
BufferLWLocks | 299153931 | 7356555
BufFreelistLock | 7466923 | 75484
ProcArrayLock | 2321769 | 34355
CLogControlLock | 778148 | 21286
WALInsertLocks | 456224 | 7451
BufferMgrLocks | 107374 | 8447
XidGenLock | 84914 | 2506
UserDefinedLocks | 1875 | 7
CLogBufferLocks | 868 | 80
SInvalWriteLock | 11 | 3
CheckpointerCommLock | 1 | 1
(13 rows)
Time: 29.388 ms
rpopdb01g/postgres M #
</pre></div>
<p>These two examples show that waiting time does not always correlate with the
number of wait events. That’s why sampling that does not account for waiting
time can give a misleading picture.</p>
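<p>To make the point concrete, here is a small Python sketch that takes the top three rows of the LWLocks output above and ranks them by total wait time, by wait count and by mean time per wait. The numbers are copied from that output; the variable names are only illustrative, and the means are in whatever unit <code>wait_time</code> uses.</p>

```python
# (event_name, total wait_time, wait_count) copied from the LWLocks output above.
rows = [
    ("LockMgrLWLocks", 1873294341, 3870685),
    ("WALWriteLock", 1039279117, 859101),
    ("BufferLWLocks", 299153931, 7356555),
]

# Rank the same events by two different metrics.
by_time = sorted(rows, key=lambda r: r[1], reverse=True)
by_count = sorted(rows, key=lambda r: r[2], reverse=True)

# Mean time per single wait, in the same unit as wait_time.
mean_wait = {name: total / count for name, total, count in rows}

print("most total wait time:", by_time[0][0])
print("most wait events:", by_count[0][0])
for name, mean in sorted(mean_wait.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {mean:.1f} per wait")
```

<p>Among these three, <code>BufferLWLocks</code> has the most wait events but the shortest mean wait, while <code>WALWriteLock</code> has the fewest events and the longest mean wait. Counting samples alone would rank them differently from total time.</p>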
<h4>pg_stat_wait_history</h4>
<p>This view lets you see what a particular process was waiting for in the past.
The storage depth and sampling interval can be configured as shown above.</p>
<div class="highlight"><pre><span></span><span class="n">xivadb01e</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT sample_ts, class_name, event_name, wait_time</span><span class="w"></span>
<span class="n">FROM</span><span class="w"> </span><span class="n">pg_stat_wait_history</span><span class="w"> </span><span class="n">WHERE</span><span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">29585</span><span class="w"> </span><span class="n">ORDER</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="n">sample_ts</span><span class="w"> </span><span class="n">DESC</span><span class="w"> </span><span class="n">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">sample_ts</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">event_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">wait_time</span><span class="w"></span>
<span class="o">-------------------------------+------------+-----------------+-----------</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">28.544052</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferMgrLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">983997</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">27.542938</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">CLogControlLock</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">655975</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">26.850302</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">979516</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">25.849207</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">207418</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">24.848059</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">923916</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">23.846909</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">753185</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">22.845808</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">877707</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">21.844718</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">778897</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">20.843562</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">CLogControlLock</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">991267</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">19.842464</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">CLogControlLock</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1001059</span><span class="w"></span>
<span class="p">(</span><span class="mi">10</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
<span class="n">xivadb01e</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<h4>Session tracing</h4>
<p>All the views described above are designed to be always turned on; their
performance overhead is minimal. But there are cases when one sample per
<code>pg_stat_wait.history_period</code> is not enough and you need to see all waits of
a particular process. In that case you should use the tracing functions,
for example:</p>
<div class="highlight"><pre><span></span><span class="nt">rpopdb01g</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">M</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">pg_backend_pid</span><span class="o">();</span><span class="w"></span>
<span class="w"> </span><span class="nt">pg_backend_pid</span><span class="w"></span>
<span class="nt">----------------</span><span class="w"></span>
<span class="w"> </span><span class="nt">5399</span><span class="w"></span>
<span class="o">(</span><span class="nt">1</span><span class="w"> </span><span class="nt">row</span><span class="o">)</span><span class="w"></span>
<span class="nt">rpopdb01g</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">M</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">pg_start_trace</span><span class="o">(</span><span class="nt">5399</span><span class="o">,</span><span class="w"> </span><span class="s1">'/tmp/5399.trace'</span><span class="o">);</span><span class="w"></span>
<span class="nt">INFO</span><span class="o">:</span><span class="w"> </span><span class="nt">00000</span><span class="o">:</span><span class="w"> </span><span class="nt">Trace</span><span class="w"> </span><span class="nt">was</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">to</span><span class="o">:</span><span class="w"> </span><span class="o">/</span><span class="nt">tmp</span><span class="o">/</span><span class="nt">5399</span><span class="p">.</span><span class="nc">trace</span><span class="w"></span>
<span class="nt">LOCATION</span><span class="o">:</span><span class="w"> </span><span class="nt">StartWait</span><span class="o">,</span><span class="w"> </span><span class="nt">wait</span><span class="p">.</span><span class="nc">c</span><span class="p">:</span><span class="nd">259</span><span class="w"></span>
<span class="w"> </span><span class="nt">pg_start_trace</span><span class="w"></span>
<span class="nt">----------------</span><span class="w"></span>
<span class="o">(</span><span class="nt">1</span><span class="w"> </span><span class="nt">row</span><span class="o">)</span><span class="w"></span>
<span class="nt">rpopdb01g</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">M</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">pg_is_in_trace</span><span class="o">(</span><span class="nt">5399</span><span class="o">);</span><span class="w"></span>
<span class="w"> </span><span class="nt">pg_is_in_trace</span><span class="w"></span>
<span class="nt">----------------</span><span class="w"></span>
<span class="w"> </span><span class="nt">t</span><span class="w"></span>
<span class="o">(</span><span class="nt">1</span><span class="w"> </span><span class="nt">row</span><span class="o">)</span><span class="w"></span>
<span class="nt">--</span><span class="w"> </span><span class="nt">some</span><span class="w"> </span><span class="nt">activity</span><span class="w"></span>
<span class="nt">rpopdb01g</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">M</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">pg_stop_trace</span><span class="o">(</span><span class="nt">5399</span><span class="o">);</span><span class="w"></span>
<span class="nt">INFO</span><span class="o">:</span><span class="w"> </span><span class="nt">00000</span><span class="o">:</span><span class="w"> </span><span class="nt">Trace</span><span class="w"> </span><span class="nt">was</span><span class="w"> </span><span class="nt">stopped</span><span class="w"></span>
<span class="nt">LOCATION</span><span class="o">:</span><span class="w"> </span><span class="nt">StartWait</span><span class="o">,</span><span class="w"> </span><span class="nt">wait</span><span class="p">.</span><span class="nc">c</span><span class="p">:</span><span class="nd">265</span><span class="w"></span>
<span class="w"> </span><span class="nt">pg_stop_trace</span><span class="w"></span>
<span class="nt">---------------</span><span class="w"></span>
<span class="o">(</span><span class="nt">1</span><span class="w"> </span><span class="nt">row</span><span class="o">)</span><span class="w"></span>
<span class="nt">rpopdb01g</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">M</span><span class="w"> </span><span class="err">#</span><span class="w"></span>
</pre></div>
<p>A simple text file is created with two lines for each wait event,
for example:</p>
<div class="highlight"><pre><span></span>start 2015-11-16 11:17:26.831686+03 CPU MemAllocation 0 0 0 0 0
stop 2015-11-16 11:17:26.831695+03 CPU
start 2015-11-16 11:17:26.831705+03 LWLocks BufferLWLocks 122 1 0 0 0
stop 2015-11-16 11:17:26.831715+03 LWLocks
start 2015-11-16 11:17:26.831738+03 Network WRITE 0 0 0 0 0
stop 2015-11-16 11:17:26.831749+03 Network
start 2015-11-16 11:17:26.831795+03 Network READ 0 0 0 0 0
stop 2015-11-16 11:17:26.831808+03 Network
start 2015-11-16 11:17:26.831825+03 Storage SMGR_READ 1663 13003 12763 0 13
stop 2015-11-16 11:17:26.831844+03 Storage
</pre></div>
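<p>Since every wait event produces a <code>start</code> line and a <code>stop</code> line, the
duration of each wait can be recovered by pairing them. Below is a minimal
Python sketch of that idea. The function names are hypothetical, and it assumes
the line layout shown above and a timezone offset written in whole hours
(like <code>+03</code>).</p>

```python
from datetime import datetime, timedelta

TRACE = """\
start 2015-11-16 11:17:26.831686+03 CPU MemAllocation 0 0 0 0 0
stop 2015-11-16 11:17:26.831695+03 CPU
start 2015-11-16 11:17:26.831705+03 LWLocks BufferLWLocks 122 1 0 0 0
stop 2015-11-16 11:17:26.831715+03 LWLocks
start 2015-11-16 11:17:26.831795+03 Network READ 0 0 0 0 0
stop 2015-11-16 11:17:26.831808+03 Network
"""


def parse_ts(date, time):
    # Assumption: the offset is in whole hours ("+03"), so it is padded
    # to "+0300" to make it acceptable for the %z directive.
    return datetime.strptime(f"{date} {time}00", "%Y-%m-%d %H:%M:%S.%f%z")


def wait_durations(lines):
    """Pair each 'start' line with the next 'stop' line of the same class.

    Returns a list of (class_name, event_name, microseconds) tuples.
    """
    result = []
    pending = None  # (start timestamp, class name, event name)
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        kind = fields[0]
        ts = parse_ts(fields[1], fields[2])
        wait_class = fields[3]
        if kind == "start":
            pending = (ts, wait_class, fields[4])
        elif kind == "stop" and pending is not None and pending[1] == wait_class:
            started, _, event = pending
            micros = (ts - started) // timedelta(microseconds=1)
            result.append((wait_class, event, micros))
            pending = None
    return result


for wait_class, event, micros in wait_durations(TRACE.splitlines()):
    print(f"{wait_class:8} {event:14} {micros} us")
```

<p>Unpaired lines are simply skipped in this sketch; a real trace may need
stricter handling.</p>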
<h3>Instead of conclusion</h3>
<p>The wait interface is a long-awaited feature for PostgreSQL that
significantly improves the understanding of what is happening inside the
database. Right now this functionality is being pushed into core PostgreSQL
so that, starting from 9.6, you would not need to recompile postgres.</p>
<p>For the record, shortly before Ildus
<a href="http://www.postgresql.org/message-id/559D4729.9080704@postgrespro.ru">submitted</a>
his implementation to pgsql-hackers@, Robert Haas
<a href="http://www.postgresql.org/message-id/CA+TgmoYd3GTz2_mJfUHF+RPe-bCy75ytJeKVv9x-o+SonCGApw@mail.gmail.com">proposed</a>
the same idea, and lots of people supported it. To make it happen,
a couple of preparatory patches have already been committed, for example
<a href="http://www.postgresql.org/message-id/3F71DA37-A17B-4961-9908-016E6323E612@postgrespro.ru">Refactoring of LWLock tranches</a>.</p>
<p>I really hope that it will become part of PostgreSQL in 9.6.</p>Wait interface in PostgreSQL2015-11-16T16:00:00+03:002015-11-16T16:00:00+03:00d0ubletag:simply.name,2015-11-16:/ru/pg-stat-wait.html<p>People who have worked with commercial DBMSs are used to being able to
answer the question “What is a particular session doing right now?”
Or, even better, “What was each session waiting for 5 minutes ago?” For a long time
PostgreSQL had no such diagnostic tools, and <span class="caps">DBA</span>s had to get by
with workarounds of varying sophistication. About how …</p><p>People who have worked with commercial DBMSs are used to being able to
answer the question “What is a particular session doing right now?”
Or, even better, “What was each session waiting for 5 minutes ago?” For a long time
PostgreSQL had no such diagnostic tools, and <span class="caps">DBA</span>s had to get by
with workarounds of varying sophistication. I
<a href="https://simply.name/ru/slides-pgday2015.html">talked</a> about how we do it at pgday.ru. I did not
give that talk alone but together with Ildus Kurbangaliev from Postgres Pro. And Ildus
talked precisely about a tool that answers the questions above.</p>
<p>Strictly speaking, this is far from the first attempt to implement what people
usually call a wait [event] interface, but none of the previous ones reached
any reasonable state; they all remained proof-of-concept patches.
<code>pg_stat_wait</code>, however, is available <a href="https://github.com/postgrespro/postgres/tree/waits_monitoring_94">as a set of patches for the current stable
9.4 branch</a> and for
9.6, which is currently in development (look for the latest versions on pgsql-hackers@).</p>
<p>After fairly long testing and fixing a number of bugs, we found these patches
not just useful but fit for production use.
We have been rolling these changes out to production for quite a while, and nothing bad
has happened :)</p>
<h3>Installation</h3>
<p>Until all of this gets into PostgreSQL core, you need to rebuild postgres.
I see no point in describing the rebuild itself, which boils down to
<code>./configure && make && sudo make install</code>; it is better to look at the
<a href="http://www.postgresql.org/docs/9.4/static/install-procedure.html">documentation</a>.</p>
<p>After that, add <code>pg_stat_wait</code> to <code>shared_preload_libraries</code>.
In addition, you can add the following options to <code>postgresql.conf</code>:</p>
<ul>
<li><code>waits_monitoring = on</code> - enables the functionality as such,</li>
<li><code>pg_stat_wait.history = on</code> - keeps a history of waits,</li>
<li><code>pg_stat_wait.history_size = 1000000</code> - the number of events kept in the history,</li>
<li><code>pg_stat_wait.history_period = 1000</code> - how often wait events are saved
to the history (ms).</li>
</ul>
<p>Then start PostgreSQL and run <code>CREATE EXTENSION
pg_stat_wait</code>. From that point on everything will work.</p>
<h3>Features</h3>
<p>And what exactly will work? First of all, look at what the extension
contains:</p>
<div class="highlight"><pre><span></span>rpopdb01g/postgres M # \dxS+ pg_stat_wait
Objects in extension "pg_stat_wait"
Object Description
---------------------------------------------------------
function pg_is_in_trace(integer)
function pg_start_trace(integer,cstring)
function pg_stat_wait_get_current(integer)
function pg_stat_wait_get_history()
function pg_stat_wait_get_profile(integer,boolean)
function pg_stat_wait_make_test_lwlock(integer,integer)
function pg_stat_wait_reset_profile()
function pg_stop_trace(integer)
function pg_wait_class_list()
function pg_wait_event_list()
view pg_stat_wait_current
view pg_stat_wait_history
view pg_stat_wait_profile
view pg_wait_class
view pg_wait_event
view pg_wait_events
(16 rows)
rpopdb01g/postgres M #
</pre></div>
<p>Let us see which wait events <code>pg_stat_wait</code> can monitor:</p>
<div class="highlight"><pre><span></span><span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT version();</span><span class="w"></span>
<span class="w"> </span><span class="n">version</span><span class="w"></span>
<span class="o">---------------------------------------------------------------------------------------------------------------</span><span class="w"></span>
<span class="w"> </span><span class="n">PostgreSQL</span><span class="w"> </span><span class="mf">9.4</span><span class="o">.</span><span class="mi">5</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">x86_64</span><span class="o">-</span><span class="n">unknown</span><span class="o">-</span><span class="n">linux</span><span class="o">-</span><span class="n">gnu</span><span class="p">,</span><span class="w"> </span><span class="n">compiled</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="n">gcc</span><span class="w"> </span><span class="p">(</span><span class="n">GCC</span><span class="p">)</span><span class="w"> </span><span class="mf">4.4</span><span class="o">.</span><span class="mi">7</span><span class="w"> </span><span class="mi">20120313</span><span class="w"> </span><span class="p">(</span><span class="n">Red</span><span class="w"> </span><span class="n">Hat</span><span class="w"> </span><span class="mf">4.4</span><span class="o">.</span><span class="mi">7</span><span class="o">-</span><span class="mi">11</span><span class="p">),</span><span class="w"> </span><span class="mi">64</span><span class="o">-</span><span class="n">bit</span><span class="w"></span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="n">row</span><span class="p">)</span><span class="w"></span>
<span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT class_name, count(event_name)</span><span class="w"></span>
<span class="n">FROM</span><span class="w"> </span><span class="n">pg_wait_events</span><span class="w"> </span><span class="n">GROUP</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">ORDER</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="n">DESC</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">count</span><span class="w"></span>
<span class="o">------------+-------</span><span class="w"></span>
<span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">52</span><span class="w"></span>
<span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">9</span><span class="w"></span>
<span class="w"> </span><span class="n">Locks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">9</span><span class="w"></span>
<span class="w"> </span><span class="n">Network</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3</span><span class="w"></span>
<span class="w"> </span><span class="n">Latch</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="n">CPU</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="p">(</span><span class="mi">6</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
<span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<p>You can see that wait monitoring for 9.4 knows 52 types of lightweight
locks, and for disk, for example, it can track the following things:</p>
<div class="highlight"><pre><span></span><span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT * FROM pg_wait_events WHERE class_id = 3;</span><span class="w"></span>
<span class="w"> </span><span class="n">class_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">event_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">event_name</span><span class="w"></span>
<span class="o">----------+------------+----------+------------</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SMGR_READ</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SMGR_WRITE</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SMGR_FSYNC</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">XLOG_READ</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">XLOG_WRITE</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">5</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">XLOG_FSYNC</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">6</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SLRU_READ</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">7</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SLRU_WRITE</span><span class="w"></span>
<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SLRU_FSYNC</span><span class="w"></span>
<span class="p">(</span><span class="mi">9</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
<span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<p>“Can track” means that you can see:</p>
<ul>
<li>What is a particular process waiting for right now, and <em>for how long</em>?</li>
<li>How many times has a particular process waited for each event, and <em>how
much</em> time in total has it spent waiting?</li>
<li>What was a particular process waiting for some time ago?</li>
</ul>
<p>These questions are answered by the views <code>pg_stat_wait_current</code>,
<code>pg_stat_wait_profile</code>, and <code>pg_stat_wait_history</code>, respectively. They are best
explained with examples.</p>
<h4>pg_stat_wait_current</h4>
<div class="highlight"><pre><span></span><span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT pid, class_name, event_name, wait_time</span><span class="w"></span>
<span class="n">FROM</span><span class="w"> </span><span class="n">pg_stat_wait_current</span><span class="w"> </span><span class="n">WHERE</span><span class="w"> </span><span class="n">class_id</span><span class="w"> </span><span class="n">NOT</span><span class="w"> </span><span class="n">IN</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">)</span><span class="w"></span>
<span class="n">ORDER</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="n">wait_time</span><span class="w"> </span><span class="n">DESC</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">event_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">wait_time</span><span class="w"></span>
<span class="o">-------+------------+---------------+-----------</span><span class="w"></span>
<span class="w"> </span><span class="mi">23510</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">17184</span><span class="w"></span>
<span class="w"> </span><span class="mi">23537</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">9367</span><span class="w"></span>
<span class="w"> </span><span class="mi">23628</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">9366</span><span class="w"></span>
<span class="w"> </span><span class="mi">23502</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3215</span><span class="w"></span>
<span class="w"> </span><span class="mi">23504</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2846</span><span class="w"></span>
<span class="w"> </span><span class="mi">23533</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2788</span><span class="w"></span>
<span class="w"> </span><span class="mi">23514</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2658</span><span class="w"></span>
<span class="w"> </span><span class="mi">23517</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2658</span><span class="w"></span>
<span class="w"> </span><span class="mi">23532</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2641</span><span class="w"></span>
<span class="w"> </span><span class="mi">23527</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferLWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2507</span><span class="w"></span>
<span class="w"> </span><span class="mi">23952</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SMGR_READ</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2502</span><span class="w"></span>
<span class="w"> </span><span class="mi">23518</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">XLOG_FSYNC</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1576</span><span class="w"></span>
<span class="w"> </span><span class="mi">23524</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALWriteLock</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1027</span><span class="w"></span>
<span class="p">(</span><span class="mi">13</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
<span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<p>We exclude network and latch waits because their wait times are usually
several orders of magnitude larger than those of the other classes. And these are far
from all the columns the view has:</p>
<div class="highlight"><pre><span></span><span class="n">smcdb01d</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT * FROM pg_stat_wait_current</span><span class="w"></span>
<span class="n">WHERE</span><span class="w"> </span><span class="n">class_id</span><span class="w"> </span><span class="n">IN</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">)</span><span class="w"> </span><span class="n">LIMIT</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span><span class="w"></span>
<span class="o">-</span><span class="p">[</span><span class="w"> </span><span class="n">RECORD</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">]</span><span class="o">-----------------------------</span><span class="w"></span>
<span class="n">pid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">12107</span><span class="w"></span>
<span class="n">sample_ts</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">36</span><span class="p">:</span><span class="mf">59.598562</span><span class="o">+</span><span class="mi">03</span><span class="w"></span>
<span class="n">class_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"></span>
<span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Locks</span><span class="w"></span>
<span class="n">event_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">4</span><span class="w"></span>
<span class="n">event_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Transaction</span><span class="w"></span>
<span class="n">wait_time</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">24334</span><span class="w"></span>
<span class="n">p1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">5</span><span class="w"></span>
<span class="n">p2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">255593733</span><span class="w"></span>
<span class="n">p3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="n">p4</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="n">p5</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="o">-</span><span class="p">[</span><span class="w"> </span><span class="n">RECORD</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">]</span><span class="o">-----------------------------</span><span class="w"></span>
<span class="n">pid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1266</span><span class="w"></span>
<span class="n">sample_ts</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">36</span><span class="p">:</span><span class="mf">59.598562</span><span class="o">+</span><span class="mi">03</span><span class="w"></span>
<span class="n">class_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3</span><span class="w"></span>
<span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Storage</span><span class="w"></span>
<span class="n">event_id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="n">event_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">SMGR_READ</span><span class="w"></span>
<span class="n">wait_time</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1710</span><span class="w"></span>
<span class="n">p1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1663</span><span class="w"></span>
<span class="n">p2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">16400</span><span class="w"></span>
<span class="n">p3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">20508</span><span class="w"></span>
<span class="n">p4</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="n">p5</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">220036</span><span class="w"></span>
<span class="n">smcdb01d</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<p>The <code>p1</code>-<code>p5</code> parameters are text fields. For example, for heavyweight locks
they give roughly the same information that can be found in <code>pg_locks</code>, and for
disk I/O events they show which database, relation, and block we were waiting to read.</p>
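<p>As an illustration, here is a minimal Python sketch that decodes the
<code>p1</code>-<code>p5</code> values of the <code>SMGR_READ</code> record shown above. The field order
(tablespace, database, relation, fork, block) is an assumption inferred from
the sample values, not something the extension documents here; the names
<code>ReadTarget</code>, <code>decode_smgr_read</code> and <code>segment_and_offset</code> are hypothetical.
The block size and segment size are PostgreSQL defaults and may differ in a
custom build.</p>

```python
from typing import NamedTuple

BLOCK_SIZE = 8192            # PostgreSQL default block size, in bytes
BLOCKS_PER_SEGMENT = 131072  # 1 GB segment files at the default block size


class ReadTarget(NamedTuple):
    # Assumed meaning of p1..p5 for SMGR_READ (not confirmed by the source)
    tablespace: int
    database: int
    relation: int
    fork: int
    block: int


def decode_smgr_read(p1, p2, p3, p4, p5):
    # The view exposes the parameters as text, so convert them to integers.
    return ReadTarget(*(int(p) for p in (p1, p2, p3, p4, p5)))


def segment_and_offset(target):
    """Return (segment number, byte offset inside that segment file)."""
    segment, block_in_segment = divmod(target.block, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment * BLOCK_SIZE


# Values taken from RECORD 2 of the pg_stat_wait_current output above
target = decode_smgr_read("1663", "16400", "20508", "0", "220036")
print(target)
print(segment_and_offset(target))
```

<p>Under these assumptions the read lands in the second segment file of the
relation, which can help to relate a slow read to a particular file on disk.</p>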
<h4>pg_stat_wait_profile</h4>
<p>For example, you can see how much time the database has spent in each class of waits:</p>
<div class="highlight"><pre><span></span><span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT class_name, sum(wait_time) AS wait_time,</span><span class="w"></span>
<span class="n">sum</span><span class="p">(</span><span class="n">wait_count</span><span class="p">)</span><span class="w"> </span><span class="n">AS</span><span class="w"> </span><span class="n">wait_count</span><span class="w"> </span><span class="n">FROM</span><span class="w"> </span><span class="n">pg_stat_wait_profile</span><span class="w"></span>
<span class="n">GROUP</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="n">ORDER</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="n">wait_time</span><span class="w"> </span><span class="n">DESC</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">wait_time</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">wait_count</span><span class="w"></span>
<span class="o">------------+--------------+------------</span><span class="w"></span>
<span class="w"> </span><span class="n">Network</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">144196945815</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">11877848</span><span class="w"></span>
<span class="w"> </span><span class="n">Latch</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">90164921148</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">3521073</span><span class="w"></span>
<span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2648490737</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">10501900</span><span class="w"></span>
<span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">977430136</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">36444251</span><span class="w"></span>
<span class="w"> </span><span class="n">CPU</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">68890774</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">365699457</span><span class="w"></span>
<span class="w"> </span><span class="n">Locks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">74</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="p">(</span><span class="mi">6</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
<span class="n">rpopdb01g</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<p>Or, for example, which lightweight locks are the hottest:</p>
<div class="highlight"><pre><span></span>rpopdb01g/postgres M # SELECT event_name, sum(wait_time) AS wait_time,
sum(wait_count) AS wait_count FROM pg_stat_wait_profile
WHERE class_id = 1 AND wait_time != 0 AND wait_count != 0
GROUP BY event_name ORDER BY wait_time DESC;
event_name | wait_time | wait_count
----------------------+------------+------------
LockMgrLWLocks | 1873294341 | 3870685
WALWriteLock | 1039279117 | 859101
BufferLWLocks | 299153931 | 7356555
BufFreelistLock | 7466923 | 75484
ProcArrayLock | 2321769 | 34355
CLogControlLock | 778148 | 21286
WALInsertLocks | 456224 | 7451
BufferMgrLocks | 107374 | 8447
XidGenLock | 84914 | 2506
UserDefinedLocks | 1875 | 7
CLogBufferLocks | 868 | 80
SInvalWriteLock | 11 | 3
CheckpointerCommLock | 1 | 1
(13 rows)
Time: 29.388 ms
rpopdb01g/postgres M #
</pre></div>
<p>These two examples show clearly that wait time does not always correlate
with the number of waits, so sampling that ignores wait time
can paint a misleading picture.</p>
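<p>As a rough illustration, the per-class numbers from the first profile output above can be ranked both ways. This is only a sketch in Python over values copied from that output; the helper name <code>ranked</code> is made up for the example, and the unit of <code>wait_time</code> is left as it appears in the view.</p>

```python
# Rows copied from the pg_stat_wait_profile output above:
# (class_name, total wait_time, wait_count).
PROFILE = [
    ("Network", 144196945815, 11877848),
    ("Latch", 90164921148, 3521073),
    ("LWLocks", 2648490737, 10501900),
    ("Storage", 977430136, 36444251),
    ("CPU", 68890774, 365699457),
    ("Locks", 74, 1),
]


def ranked(rows, column):
    """Return class names ordered by the given column, largest first."""
    return [row[0] for row in sorted(rows, key=lambda r: r[column], reverse=True)]


by_time = ranked(PROFILE, 1)
by_count = ranked(PROFILE, 2)

# Average cost of a single wait, in the same unit as wait_time.
avg_wait = {name: total / count for name, total, count in PROFILE}

print("by total time:", by_time)
print("by wait count:", by_count)
```

<p>Ranked by count, CPU comes first and Latch is next to last; ranked by total time, Network and Latch lead and CPU is next to last. A sampler that only counts events would see the first ordering.</p>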
<h4>pg_stat_wait_history</h4>
<p>This view shows what a particular process was waiting for in the past.
The history depth and the sampling interval are configurable, as described above.</p>
<div class="highlight"><pre><span></span><span class="n">xivadb01e</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1"># SELECT sample_ts, class_name, event_name, wait_time</span><span class="w"></span>
<span class="n">FROM</span><span class="w"> </span><span class="n">pg_stat_wait_history</span><span class="w"> </span><span class="n">WHERE</span><span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">29585</span><span class="w"> </span><span class="n">ORDER</span><span class="w"> </span><span class="n">BY</span><span class="w"> </span><span class="n">sample_ts</span><span class="w"> </span><span class="n">DESC</span><span class="w"> </span><span class="n">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">sample_ts</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">class_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">event_name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">wait_time</span><span class="w"></span>
<span class="o">-------------------------------+------------+-----------------+-----------</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">28.544052</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">BufferMgrLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">983997</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">27.542938</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">CLogControlLock</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">655975</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">26.850302</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">979516</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">25.849207</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">207418</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">24.848059</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">923916</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">23.846909</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">753185</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">22.845808</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">877707</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">21.844718</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">WALInsertLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">778897</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">20.843562</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">CLogControlLock</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">991267</span><span class="w"></span>
<span class="w"> </span><span class="mi">2015</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">16</span><span class="w"> </span><span class="mi">10</span><span class="p">:</span><span class="mi">56</span><span class="p">:</span><span class="mf">19.842464</span><span class="o">+</span><span class="mi">03</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">LWLocks</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">CLogControlLock</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1001059</span><span class="w"></span>
<span class="p">(</span><span class="mi">10</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w"></span>
<span class="n">xivadb01e</span><span class="o">/</span><span class="n">postgres</span><span class="w"> </span><span class="n">M</span><span class="w"> </span><span class="c1">#</span><span class="w"></span>
</pre></div>
<h4>Session tracing</h4>
<p>All the views described above are designed to stay enabled all the time,
i.e. they are built to have minimal performance overhead. But
there are cases when sampling once per <code>pg_stat_wait.history_period</code>
is not enough and you need to see every wait event of a process. In that case you should
use the tracing functions, for example like this:</p>
<div class="highlight"><pre><span></span><span class="nt">rpopdb01g</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">M</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">pg_backend_pid</span><span class="o">();</span><span class="w"></span>
<span class="w"> </span><span class="nt">pg_backend_pid</span><span class="w"></span>
<span class="nt">----------------</span><span class="w"></span>
<span class="w"> </span><span class="nt">5399</span><span class="w"></span>
<span class="o">(</span><span class="nt">1</span><span class="w"> </span><span class="nt">row</span><span class="o">)</span><span class="w"></span>
<span class="nt">rpopdb01g</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">M</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">pg_start_trace</span><span class="o">(</span><span class="nt">5399</span><span class="o">,</span><span class="w"> </span><span class="s1">'/tmp/5399.trace'</span><span class="o">);</span><span class="w"></span>
<span class="nt">INFO</span><span class="o">:</span><span class="w"> </span><span class="nt">00000</span><span class="o">:</span><span class="w"> </span><span class="nt">Trace</span><span class="w"> </span><span class="nt">was</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">to</span><span class="o">:</span><span class="w"> </span><span class="o">/</span><span class="nt">tmp</span><span class="o">/</span><span class="nt">5399</span><span class="p">.</span><span class="nc">trace</span><span class="w"></span>
<span class="nt">LOCATION</span><span class="o">:</span><span class="w"> </span><span class="nt">StartWait</span><span class="o">,</span><span class="w"> </span><span class="nt">wait</span><span class="p">.</span><span class="nc">c</span><span class="p">:</span><span class="nd">259</span><span class="w"></span>
<span class="w"> </span><span class="nt">pg_start_trace</span><span class="w"></span>
<span class="nt">----------------</span><span class="w"></span>
<span class="o">(</span><span class="nt">1</span><span class="w"> </span><span class="nt">row</span><span class="o">)</span><span class="w"></span>
<span class="nt">rpopdb01g</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">M</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">pg_is_in_trace</span><span class="o">(</span><span class="nt">5399</span><span class="o">);</span><span class="w"></span>
<span class="w"> </span><span class="nt">pg_is_in_trace</span><span class="w"></span>
<span class="nt">----------------</span><span class="w"></span>
<span class="w"> </span><span class="nt">t</span><span class="w"></span>
<span class="o">(</span><span class="nt">1</span><span class="w"> </span><span class="nt">row</span><span class="o">)</span><span class="w"></span>
<span class="nt">--</span><span class="w"> </span><span class="nt">some</span><span class="w"> </span><span class="nt">activity</span><span class="w"></span>
<span class="nt">rpopdb01g</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">M</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">pg_stop_trace</span><span class="o">(</span><span class="nt">5399</span><span class="o">);</span><span class="w"></span>
<span class="nt">INFO</span><span class="o">:</span><span class="w"> </span><span class="nt">00000</span><span class="o">:</span><span class="w"> </span><span class="nt">Trace</span><span class="w"> </span><span class="nt">was</span><span class="w"> </span><span class="nt">stopped</span><span class="w"></span>
<span class="nt">LOCATION</span><span class="o">:</span><span class="w"> </span><span class="nt">StartWait</span><span class="o">,</span><span class="w"> </span><span class="nt">wait</span><span class="p">.</span><span class="nc">c</span><span class="p">:</span><span class="nd">265</span><span class="w"></span>
<span class="w"> </span><span class="nt">pg_stop_trace</span><span class="w"></span>
<span class="nt">---------------</span><span class="w"></span>
<span class="o">(</span><span class="nt">1</span><span class="w"> </span><span class="nt">row</span><span class="o">)</span><span class="w"></span>
<span class="nt">rpopdb01g</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">M</span><span class="w"> </span><span class="err">#</span><span class="w"></span>
</pre></div>
<p>A plain text file will be created, with two lines written
for each wait event, for example:</p>
<div class="highlight"><pre><span></span>start 2015-11-16 11:17:26.831686+03 CPU MemAllocation 0 0 0 0 0
stop 2015-11-16 11:17:26.831695+03 CPU
start 2015-11-16 11:17:26.831705+03 LWLocks BufferLWLocks 122 1 0 0 0
stop 2015-11-16 11:17:26.831715+03 LWLocks
start 2015-11-16 11:17:26.831738+03 Network WRITE 0 0 0 0 0
stop 2015-11-16 11:17:26.831749+03 Network
start 2015-11-16 11:17:26.831795+03 Network READ 0 0 0 0 0
stop 2015-11-16 11:17:26.831808+03 Network
start 2015-11-16 11:17:26.831825+03 Storage SMGR_READ 1663 13003 12763 0 13
stop 2015-11-16 11:17:26.831844+03 Storage
</pre></div>
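<p>Such a file is easy to post-process. Below is a minimal Python sketch that pairs each <code>start</code> line with the following <code>stop</code> line and derives the duration from the two timestamps. The field layout is inferred only from the sample above, the function names are hypothetical, and the parser assumes a two-digit UTC offset such as <code>+03</code>.</p>

```python
from datetime import datetime, timedelta

# Sample lines copied from the trace file shown above.
TRACE = """\
start 2015-11-16 11:17:26.831686+03 CPU MemAllocation 0 0 0 0 0
stop 2015-11-16 11:17:26.831695+03 CPU
start 2015-11-16 11:17:26.831705+03 LWLocks BufferLWLocks 122 1 0 0 0
stop 2015-11-16 11:17:26.831715+03 LWLocks
start 2015-11-16 11:17:26.831738+03 Network WRITE 0 0 0 0 0
stop 2015-11-16 11:17:26.831749+03 Network
start 2015-11-16 11:17:26.831795+03 Network READ 0 0 0 0 0
stop 2015-11-16 11:17:26.831808+03 Network
start 2015-11-16 11:17:26.831825+03 Storage SMGR_READ 1663 13003 12763 0 13
stop 2015-11-16 11:17:26.831844+03 Storage
"""


def parse_ts(date, time):
    """Parse '2015-11-16' and '11:17:26.831686+03' into an aware datetime."""
    # %z expects a +HHMM offset, so pad the two-digit offset with minutes.
    return datetime.strptime(f"{date} {time}00", "%Y-%m-%d %H:%M:%S.%f%z")


def wait_durations(text):
    """Return a list of (class, event, microseconds), one per start/stop pair."""
    result = []
    pending = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "start":
            # Remember when the wait began and what it was for.
            pending = (parse_ts(fields[1], fields[2]), fields[3], fields[4])
        elif fields[0] == "stop" and pending is not None:
            started, cls, event = pending
            delta = parse_ts(fields[1], fields[2]) - started
            result.append((cls, event, delta // timedelta(microseconds=1)))
            pending = None
    return result


for cls, event, micros in wait_durations(TRACE):
    print(cls, event, micros)
```

<p>The trailing numeric fields of the <code>start</code> lines are ignored here because their meaning depends on the event type.</p>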
<h3>Instead of a conclusion</h3>
<p>The wait interface is a long-awaited feature in PostgreSQL that
makes it much easier to understand what exactly is going on inside the database. Right
now this functionality is being pushed into the PostgreSQL core so that, starting with 9.6,
you would not need to rebuild postgres to use it.</p>
<p>Just in case, I will mention that shortly before Ildus
<a href="http://www.postgresql.org/message-id/559D4729.9080704@postgrespro.ru">presented</a>
his implementation on pgsql-hackers@, the idea of building a wait interface was
<a href="http://www.postgresql.org/message-id/CA+TgmoYd3GTz2_mJfUHF+RPe-bCy75ytJeKVv9x-o+SonCGApw@mail.gmail.com">voiced</a>
by Robert Haas. And evidently many people supported the idea. To make it
happen, a couple of preparatory patches have already been committed, for example
<a href="http://www.postgresql.org/message-id/3F71DA37-A17B-4961-9908-016E6323E612@postgrespro.ru">Refactoring of LWLock tranches</a>.</p>
<p>We very much hope to see this in 9.6.</p>PostgreSQL replication lag in seconds2015-06-14T16:00:00+03:002015-06-14T16:00:00+03:00d0ubletag:simply.name,2015-06-14:/postgresql-replication-monitoring.html<p>Our typical PostgreSQL shard consists of a master and two replicas. We monitor that
the master has the required number of replicas (we fire a <span class="caps">WARN</span> event in monitoring
if only one replica is alive and <span class="caps">CRIT</span> if none are alive). And
we monitor replication lag, <code>replay_location …</code></p><p>Our typical PostgreSQL shard consists of a master and two replicas. We monitor that
the master has the required number of replicas (we fire a <span class="caps">WARN</span> event in monitoring
if only one replica is alive and <span class="caps">CRIT</span> if none are alive). And
we monitor replication lag, namely the <code>replay_location</code> of the replica. All this is done
with a couple of easy queries to <code>pg_stat_replication</code>.</p>
<p>This method has two major disadvantages:</p>
<ul>
<li>Most of the data in <code>pg_stat_replication</code> is visible only to users
with the <code>SUPERUSER</code> attribute. Granting that attribute to a monitoring user is not
a good idea.</li>
<li>We need different thresholds for replication lag on different clusters, because 10 <span class="caps">MB</span> of
replication lag on a cluster with 1 <span class="caps">MB</span>/s of write load and on a cluster with
100 <span class="caps">MB</span>/s are not the same thing (roughly 10 seconds of writes versus 0.1 seconds).</li>
</ul>
<p>To solve both problems we wrote a
<a href="http://www.postgresql.org/docs/current/static/bgworker.html">bgworker</a>,
whose sources are available <a href="https://github.com/man-brain/repl_mon">here</a>.</p>
<p>The principle of operation is really simple: once per interval (configurable
with 1 ms precision) the bgworker writes the following into a table (<code>repl_mon</code>
by default, but the name can be configured):</p>
<div class="highlight"><pre><span></span>pgtest02g/postgres M # \dS+ repl_mon
Table "public.repl_mon"
Column | Type | Modifiers | Storage | Stats target | Description
----------+--------------------------+-----------+----------+--------------+-------------
ts | timestamp with time zone | | plain | |
location | text | | extended | |
replics | integer | | plain | |
pgtest02g/postgres M # select * from repl_mon ;
ts | location | replics
-------------------------------+------------+---------
2015-06-14 15:35:51.632041+03 | 0/1E04E568 | 2
(1 row)
Time: 0.664 ms
pgtest02g/postgres M #
</pre></div>
<p>The query that collects this data can be seen
<a href="https://github.com/man-brain/repl_mon/blob/8e14fb52/repl_mon.c#L127-L131">here</a>.</p>
<p>The number of alive replicas can be read directly from this table on the master. And
on the replicas, the values of the <code>ts</code> and <code>location</code> fields can be compared with the current
time and <code>pg_last_xlog_replay_location()</code>:</p>
<div class="highlight"><pre><span></span><span class="nt">pgtest02d</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">R</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="o">(</span><span class="nt">current_timestamp</span><span class="w"> </span><span class="nt">-</span><span class="w"> </span><span class="nt">ts</span><span class="o">)</span><span class="w"> </span><span class="nt">AS</span><span class="w"> </span><span class="nt">lag_time</span><span class="o">,</span><span class="w"> </span><span class="nt">greatest</span><span class="o">(</span><span class="nt">0</span><span class="o">,</span><span class="w"></span>
<span class="nt">pg_xlog_location_diff</span><span class="o">(</span><span class="nt">location</span><span class="p">::</span><span class="nd">pg_lsn</span><span class="o">,</span><span class="w"> </span><span class="nt">pg_last_xlog_replay_location</span><span class="o">()))</span><span class="w"></span>
<span class="nt">AS</span><span class="w"> </span><span class="nt">lag_bytes</span><span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">repl_mon</span><span class="w"> </span><span class="o">;</span><span class="w"></span>
<span class="w"> </span><span class="nt">lag_time</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nt">lag_bytes</span><span class="w"></span>
<span class="nt">-----------------</span><span class="o">+</span><span class="nt">-----------</span><span class="w"></span>
<span class="w"> </span><span class="nt">00</span><span class="p">:</span><span class="nd">00</span><span class="p">:</span><span class="nd">00</span><span class="p">.</span><span class="nc">516017</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nt">0</span><span class="w"></span>
<span class="o">(</span><span class="nt">1</span><span class="w"> </span><span class="nt">row</span><span class="o">)</span><span class="w"></span>
<span class="nt">Time</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="p">.</span><span class="nc">724</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="nt">pgtest02d</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">R</span><span class="w"> </span><span class="err">#</span><span class="w"></span>
</pre></div>
<p>The important thing here is that this does not require superuser rights.</p>
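<p>For readers who want to see what the query computes, here is a small Python sketch of the same arithmetic done on the client side. It assumes the usual textual LSN form of two hexadecimal numbers separated by a slash (upper and lower 32 bits); the function names are made up for the example and are not part of <code>repl_mon</code>.</p>

```python
from datetime import datetime, timedelta, timezone


def lsn_to_int(lsn):
    """Convert a textual LSN such as '0/1E04E568' to a byte position."""
    high, low = lsn.split("/")
    # The part before the slash holds the upper 32 bits, the part after it the lower 32.
    return int(high, 16) * 2**32 + int(low, 16)


def replication_lag(row_ts, row_location, now, replay_location):
    """Return (lag_time, lag_bytes) following the query above."""
    lag_time = now - row_ts
    # Mirrors greatest(0, ...): a negative difference is reported as zero.
    lag_bytes = max(0, lsn_to_int(row_location) - lsn_to_int(replay_location))
    return lag_time, lag_bytes


# Values taken from the sample outputs above; `now` is chosen to reproduce lag_time.
tz = timezone(timedelta(hours=3))
row_ts = datetime(2015, 6, 14, 15, 35, 51, 632041, tzinfo=tz)
now = row_ts + timedelta(microseconds=516017)
lag_time, lag_bytes = replication_lag(row_ts, "0/1E04E568", now, "0/1E04E568")
print(lag_time.total_seconds(), lag_bytes)
```

<p>Because <code>lag_time</code> is a plain interval, the same threshold in seconds can be applied to clusters with very different write rates.</p>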
<p>To make this work, run <code>make</code> and <code>sudo make install</code> in the
source directory. Then add <code>repl_mon</code> to <code>shared_preload_libraries</code> and
restart PostgreSQL.</p>
<p>I hope someone finds it useful.</p>
<p><span class="caps">P.S.</span> Special thanks go to Michael Paquier, who maintains
<a href="https://github.com/michaelpq/pg_plugins">pg_plugins</a>, a set of simple
templates for PostgreSQL extensions. I copied most of the code from there.</p>PostgreSQL replication lag in seconds2015-06-14T16:00:00+03:002015-06-14T16:00:00+03:00d0ubletag:simply.name,2015-06-14:/ru/postgresql-replication-monitoring.html<p>Our typical PostgreSQL shard consists of a master and two replicas. We monitor
that there are exactly as many replicas as there should be (we raise <span class="caps">WARN</span> if
only one is left and <span class="caps">CRIT</span> if none are left). We also monitor replica lag,
namely <code>replay_location</code>. All of this is done with a couple of simple queries to
<code>pg_stat_replication …</code></p><p>Our typical PostgreSQL shard consists of a master and two replicas. We monitor
that there are exactly as many replicas as there should be (we raise <span class="caps">WARN</span> if
only one is left and <span class="caps">CRIT</span> if none are left). We also monitor replica lag,
namely <code>replay_location</code>. All of this is done with a couple of simple queries to
<code>pg_stat_replication</code>.</p>
<p>This approach has two significant drawbacks:</p>
<ul>
<li>Most of the data in <code>pg_stat_replication</code> can be read only by
users with the <code>SUPERUSER</code> attribute. Granting it to a monitoring user is not
a good idea.</li>
<li>We set different replication lag thresholds on different clusters, because
10 MB of lag on a cluster that takes 1 MB/s of writes and on a cluster with 100 MB/s of writes
are very different things.</li>
</ul>
<p>To solve both problems we wrote a
<a href="http://www.postgresql.org/docs/current/static/bgworker.html">bgworker</a>,
whose sources live <a href="https://github.com/man-brain/repl_mon">here</a>.</p>
<p>The principle of operation is very simple: once per interval (configurable with
millisecond precision) the bgworker writes the following into a table (<code>repl_mon</code> by default, but
the name is configurable):</p>
<div class="highlight"><pre><span></span>pgtest02g/postgres M # \dS+ repl_mon
Table "public.repl_mon"
Column | Type | Modifiers | Storage | Stats target | Description
----------+--------------------------+-----------+----------+--------------+-------------
ts | timestamp with time zone | | plain | |
location | text | | extended | |
replics | integer | | plain | |
pgtest02g/postgres M # select * from repl_mon ;
ts | location | replics
-------------------------------+------------+---------
2015-06-14 15:35:51.632041+03 | 0/1E04E568 | 2
(1 row)
Time: 0.664 ms
pgtest02g/postgres M #
</pre></div>
<p>The query that collects the data can be seen
<a href="https://github.com/man-brain/repl_mon/blob/8e14fb52/repl_mon.c#L127-L131">here</a>.</p>
<p>The number of alive replicas can be read directly from this table on the master, and
on the replicas the values of the <code>ts</code> and <code>location</code> fields can be compared with the current
time and <code>pg_last_xlog_replay_location()</code>:</p>
<div class="highlight"><pre><span></span><span class="nt">pgtest02d</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">R</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="nt">SELECT</span><span class="w"> </span><span class="o">(</span><span class="nt">current_timestamp</span><span class="w"> </span><span class="nt">-</span><span class="w"> </span><span class="nt">ts</span><span class="o">)</span><span class="w"> </span><span class="nt">AS</span><span class="w"> </span><span class="nt">lag_time</span><span class="o">,</span><span class="w"> </span><span class="nt">greatest</span><span class="o">(</span><span class="nt">0</span><span class="o">,</span><span class="w"></span>
<span class="nt">pg_xlog_location_diff</span><span class="o">(</span><span class="nt">location</span><span class="p">::</span><span class="nd">pg_lsn</span><span class="o">,</span><span class="w"> </span><span class="nt">pg_last_xlog_replay_location</span><span class="o">()))</span><span class="w"></span>
<span class="nt">AS</span><span class="w"> </span><span class="nt">lag_bytes</span><span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">repl_mon</span><span class="w"> </span><span class="o">;</span><span class="w"></span>
<span class="w"> </span><span class="nt">lag_time</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nt">lag_bytes</span><span class="w"></span>
<span class="nt">-----------------</span><span class="o">+</span><span class="nt">-----------</span><span class="w"></span>
<span class="w"> </span><span class="nt">00</span><span class="p">:</span><span class="nd">00</span><span class="p">:</span><span class="nd">00</span><span class="p">.</span><span class="nc">516017</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nt">0</span><span class="w"></span>
<span class="o">(</span><span class="nt">1</span><span class="w"> </span><span class="nt">row</span><span class="o">)</span><span class="w"></span>
<span class="nt">Time</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="p">.</span><span class="nc">724</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="nt">pgtest02d</span><span class="o">/</span><span class="nt">postgres</span><span class="w"> </span><span class="nt">R</span><span class="w"> </span><span class="err">#</span><span class="w"></span>
</pre></div>
<p>The important point is that none of this requires superuser rights.</p>
<p>To make this work, run <code>make</code> and <code>sudo make install</code> in the source
directory. Then add <code>repl_mon</code> to <code>shared_preload_libraries</code> and restart PostgreSQL.</p>
<p>I hope someone finds it useful.</p>
<p><span class="caps">P.S.</span> Special thanks go to Michael Paquier, who maintains
<a href="https://github.com/michaelpq/pg_plugins">pg_plugins</a>, a set of templates for simple
PostgreSQL extensions. I copied most of the code from there.</p>Checking backups consistency2015-06-06T20:00:00+03:002015-06-06T20:00:00+03:00d0ubletag:simply.name,2015-06-06:/barman-backups-check.html<p>Once upon a time, as a result of two human errors while deploying new code on
our databases, we ran <code>DROP SCHEMA data CASCADE;</code> on all shards of one of our
clusters with more than 3 <span class="caps">TB</span> of data. It gave us gray hair, let us
test our <span class="caps">PITR …</span></p><p>Once upon a time, as a result of two human errors while deploying new code on
our databases, we ran <code>DROP SCHEMA data CASCADE;</code> on all shards of one of our
clusters with more than 3 <span class="caps">TB</span> of data. It gave us gray hair, let us
test our <span class="caps">PITR</span> skills in production and made us treat backups differently.</p>
<p>That story had a happy ending. The incident occurred at the end of the working day,
when the workload was already decreasing, and by morning we had restored everything
from backups to the required point in time. We have always made backups
and have always monitored that they are made. But we dropped the check
that we can actually restore from them when we migrated to
<a href="http://www.pgbarman.org">barman</a>, because of its high cost.</p>
<p>Recovery of one shard took longer than the others because we could not restore
from the last backup and had to restore from the one before it (we make backups every
night). For that reason, after the fuckup we decided to bring back the check of backup
consistency. The result is a couple of scripts which can be seen
<a href="https://github.com/man-brain/misc/tree/master/backups_checking">here</a>. One of
them (<code>check_backup_consistency.py</code>) sequentially deploys the last backup of each
cluster, starts PostgreSQL with <code>recovery_target = 'immediate'</code> and waits for it
to reach a consistent state.</p>
<p>The second one (<code>check_xlogs.sh</code>) checks that the backup server contains all needed
WALs (from the first <span class="caps">WAL</span> of the first backup to the last archived <span class="caps">WAL</span>). Generally,
the archiver guarantees that WALs are archived in sequence, and if you configure
<code>archive_command</code> the right way you should not have problems with that. But we
had situations when the partition with <code>pg_xlog</code> ran out of free space and we changed
<code>archive_command</code> to move WALs aside locally. The next deploy would put
<code>archive_command</code> back, but the locally moved WALs could be forgotten.</p>
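<p>The idea of the gap check is easy to show in a few lines. This is only a sketch, not a port of <code>check_xlogs.sh</code>: it assumes a single timeline, the default 16 MB segment size and PostgreSQL 9.3 or newer (where every log file has 256 segments), and that the input contains only 24-character segment names, without <code>.history</code> or <code>.backup</code> files. The function names are mine.</p>

```python
# 4 GB per logical log file / 16 MB per segment (assumption: default segment size).
SEGMENTS_PER_LOG = 0x100


def next_wal(name):
    """Return the name of the segment that follows `name` on the same timeline."""
    timeline = name[0:8]
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    seg += 1
    if seg == SEGMENTS_PER_LOG:
        log, seg = log + 1, 0
    return "%s%08X%08X" % (timeline, log, seg)


def find_missing(names):
    """List segments that are absent between the first and the last given name."""
    present = set(names)
    if not present:
        return []
    ordered = sorted(present)
    missing = []
    current = ordered[0]
    while current != ordered[-1]:
        current = next_wal(current)
        if current not in present:
            missing.append(current)
    return missing


if __name__ == "__main__":
    archived = [
        "000000010000000000000001",
        "000000010000000000000002",
        "000000010000000000000004",
    ]
    print(find_missing(archived))
```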
<p>We run these checks from cron, and our monitoring scripts look at the status files
they create in <code>/tmp</code>. We start doing backups at 2 a.m. and the last one ends
around 6 a.m. (thanks to incremental backups in barman 1.4). And in the middle
of the day (around 2-3 p.m.) we already know if our backups are consistent and
if we can do <code>DROP SCHEMA</code> again :)</p>
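<p>The status-file part can be sketched like this. The JSON layout, the file name and the function names are assumptions made for the example, our scripts may store the result differently. The important bits are that the file is replaced atomically and that a missing or stale file is treated as a failure, otherwise a check that silently stopped running would look green.</p>

```python
import json
import os
import time


def write_status(path, ok, message):
    """Atomically write the result of a check so a reader never sees a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"ok": ok, "message": message, "ts": time.time()}, f)
    os.rename(tmp_path, path)


def read_status(path, max_age):
    """Return (ok, message); a missing, broken or stale file counts as a failure."""
    try:
        with open(path) as f:
            status = json.load(f)
    except (IOError, ValueError):
        return False, "status file is missing or unreadable"
    if time.time() - status["ts"] > max_age:
        return False, "status file is stale"
    return status["ok"], status["message"]


if __name__ == "__main__":
    # Hypothetical file name; a day and a half of slack for a nightly check.
    write_status("/tmp/backup_check.status", True, "all backups are consistent")
    print(read_status("/tmp/backup_check.status", 36 * 3600))
```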
<p>Perhaps someone will find these scripts useful. Feel free to ask questions.</p>Checking backups consistency2015-06-06T20:00:00+03:002015-06-06T20:00:00+03:00d0ubletag:simply.name,2015-06-06:/ru/barman-backups-check.html<p>Once, as a result of two overlapping human errors while rolling out code
to our databases, we did <code>DROP SCHEMA data CASCADE;</code> on all shards of one of our
clusters, which held about 3 <span class="caps">TB</span> of data. It gave us a lot of gray
hair, let us test our <span class="caps">PITR</span> skills right in production and made us
treat backups differently.</p>
<p>That story ended well. The incident happened toward the end of the working day,
when the load was already going down, and by morning we had restored everything from backups to the needed
point in time. We had always made backups and had always monitored the fact
that they were being made. But the check of whether we could actually restore from them
was thrown out as too costly when we moved to <a href="http://www.pgbarman.org">barman</a>.</p>
<p>Recovery of one of the shards took longer than the rest because we failed to
come up from the last backup and had to restore from the
one before it (we make backups every day). So after the fuckup we decided
to bring the consistency check back, and the result is a pair of scripts that
can be found
<a href="https://github.com/man-brain/misc/tree/master/backups_checking">here</a>. One of
them (<code>check_backup_consistency.py</code>) sequentially deploys the last
backup of each cluster, starts PostgreSQL with <code>recovery_target = 'immediate'</code>
and waits for the database to reach a consistent state.</p>
<p>The second one (<code>check_xlogs.sh</code>) checks
that the backup machine has all the needed WALs (from the first <span class="caps">WAL</span>
of the first backup to the last archived one). In general the archiver
guarantees that WALs are sent to the archive in sequence, and if your
<code>archive_command</code> is set correctly there should be no problems with that. But we
have had cases when the partition with <code>pg_xlog</code> ran out of space and we changed
<code>archive_command</code> to move WALs aside locally. The next state rollout
to the databases put <code>archive_command</code> back in place, but the WALs moved aside
locally could be forgotten and never delivered to the archive.</p>
<p>We run these checks from cron, and monitoring looks at the status files
that the scripts write to <code>/tmp</code>. We start making backups at 2:00 and the last
of them finish around 6:00 (praise incremental backups in barman 1.4). And by
the middle of the day (around 15:00-16:00) we already know whether we can do
<code>DROP SCHEMA</code> again :)</p>
<p>Perhaps these scripts will be useful to someone. If you have questions, feel free to ask.</p>Upgrading PostgreSQL to 9.42015-03-31T18:00:00+03:002015-03-31T18:00:00+03:00d0ubletag:simply.name,2015-03-31:/upgrading-postgres-to-9.4.html<h4>Preface</h4>
<p>In 9.4 there is logical decoding, which should allow upgrading from 9.4 to 9.5
quite cheaply (at least it
<a href="https://wiki.postgresql.org/wiki/UDR_Online_Upgrade">should</a> be so). But right
now upgrading the major version of PostgreSQL is painful. In the most common case it
looks like this:</p>
<ol>
<li>Stop master and call <code>pg_upgrade</code>.</li>
<li>Start master with new version.</li>
<li>Make a full backup from upgraded master.</li>
<li>Refill all the replicas from new backup.</li>
</ol>
<p>We have databases with several terabytes of data, each shard of which consists
of three hosts: a master and two replicas. Most of the read-only queries are
served from replicas. And with some of these DBs we can survive the death of only
one replica. So if both replicas died, the master alone would not handle all the
write and read-only load. And making a full backup of several terabytes and
refilling at least one replica from it is quite a challenge.</p>
<h4>Ray of hope</h4>
<p>When we were about to upgrade on the night from Saturday to Sunday and suffer,
Bruce Momjian sent a
<a href="http://www.postgresql.org/message-id/20150219165755.GA18714@momjian.us">patch</a>
to the documentation which allows upgrading all replicas without refilling them
from backup. The patch has been applied to the master branch, so in the documentation
for 9.5 the required steps are already
<a href="http://www.postgresql.org/docs/devel/static/pgupgrade.html">described</a>.</p>
<p>The only disadvantage of this solution is that you need to stop all hosts of
the cluster while upgrading (so the <span class="caps">DB</span> is not accessible even in read-only mode).
Such a restriction is not very good for us so we decided to do it a bit differently.</p>
<h4>Implementation</h4>
<p>Because we have several dozen DBs we decided to do the upgrade with a
script. It is very stupid and panics on any error, so everything after that
has to be done manually.</p>
<p>The script closely depends on our infrastructure so I publish just some
pieces of it. The common sequence is quite simple (install 9.4 packages on all
hosts, upgrade master, rsync replicas, start master):</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">options</span><span class="p">,</span> <span class="n">hosts</span><span class="p">,</span> <span class="n">master</span><span class="p">):</span>
    <span class="n">prefix</span> <span class="o">=</span> <span class="n">options</span><span class="o">.</span><span class="n">prefix</span>
    <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Installing packages on all hosts."</span><span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">apply_state_on_host</span><span class="p">(</span><span class="s1">'</span><span class="si">%s</span><span class="s1">*'</span> <span class="o">%</span> <span class="n">prefix</span><span class="p">,</span> <span class="s1">'components.pg94.db.packages'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">res</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">res</span>
    <span class="n">upgrade_master</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">prefix</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
    <span class="n">rsync_replicas</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">hosts</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
    <span class="n">start_master</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
    <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Seems that everything succeeded. Unbelievable!"</span><span class="p">)</span>
</pre></div>
<p>This sequence extends the time spent in read-only mode (the master could be started
right after upgrading the first replica), but it completely removes the need to refill
any host from backup.</p>
<h5>Upgrading master</h5>
<p>Master upgrade seems to be the most intense stage:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">upgrade_master</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">prefix</span><span class="p">,</span> <span class="n">options</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">need_checksums</span><span class="p">:</span>
        <span class="n">cmd</span> <span class="o">=</span> <span class="s2">"sed -i /etc/init.d/postgresql-9.4 -e 's/initdb --pgdata/initdb -k --pgdata/'"</span>
        <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">cmd</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.4 initdb'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/pgbouncer stop'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.3 stop'</span><span class="p">)</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'/usr/pgsql-9.4/bin/pg_upgrade -b /usr/pgsql-9.3/bin/ -B /usr/pgsql-9.4/bin/ -d /var/lib/pgsql/9.3/data/ -D /var/lib/pgsql/9.4/data/ --check'</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">cmd_run_on_host</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">res</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Running 'pg_upgrade --check' on </span><span class="si">%s</span><span class="s2"> failed. Turning everything back on."</span> <span class="o">%</span> <span class="n">master</span><span class="p">)</span>
        <span class="n">cmd_run_on_host</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.3 start'</span><span class="p">)</span>
        <span class="n">cmd_run_on_host</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/pgbouncer start'</span><span class="p">)</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'/usr/pgsql-9.4/bin/pg_upgrade -b /usr/pgsql-9.3/bin/ -B /usr/pgsql-9.4/bin/ -d /var/lib/pgsql/9.3/data/ -D /var/lib/pgsql/9.4/data/ --link'</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">preserve_history</span><span class="p">:</span>
        <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'rsync -av /var/lib/pgsql/9.3/data/pg_xlog/*.history /var/lib/pgsql/9.4/data/pg_xlog/'</span>
        <span class="n">cmd_run_on_host</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">cmd</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'mkdir -p /var/lib/pgsql/9.4/data/conf.d/'</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">apply_state_on_host</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'components.pg94.db.configs'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">res</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Could not install configs on </span><span class="si">%s</span><span class="s2">. Exiting."</span> <span class="o">%</span> <span class="n">master</span><span class="p">)</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">70</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'iptables -A INPUT -p tcp -m tcp --dport 5432 -j REJECT && ip6tables -A INPUT -p tcp -m tcp --dport 5432 -j REJECT'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.4 start'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">need_stat</span><span class="p">:</span>
        <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/usr/pgsql-9.4/bin/vacuumdb --all --analyze-only'</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.4 stop'</span><span class="p">)</span>
    <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Seems that master has been upgraded successfully. Unbelievable!"</span><span class="p">)</span>
</pre></div>
<p>At first we do <code>pg_upgrade --check</code> and if it fails everything is put back in
place. It is the only case where this happens. In case of any other error the
script fails.</p>
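<p>The helpers <code>run_or_exit</code> and <code>cmd_run_on_host</code> are not shown here because they are tied to our infrastructure. Just to illustrate the "fail on any error" behaviour, a stand-in could look like the sketch below. Everything in it is an assumption: it goes over plain ssh, the exit code is arbitrary, and the <code>run</code> parameter exists only to make the function easy to try without a real host.</p>

```python
import subprocess
import sys


def cmd_run_on_host(host, cmd, runas="root"):
    """Run `cmd` on `host` over ssh and return its exit code (hypothetical helper)."""
    return subprocess.call(["ssh", "%s@%s" % (runas, host), cmd])


def run_or_exit(host, cmd, runas="root", run=cmd_run_on_host):
    """Run `cmd` on `host` and terminate the whole script on the first failure."""
    res = run(host, cmd, runas=runas)
    if res != 0:
        sys.stderr.write("Command %r failed on %s with code %d. Exiting.\n" % (cmd, host, res))
        sys.exit(1)
    return res
```

<p>The point of such a wrapper is that the script never continues after a step that did not work, so a human can pick up exactly where it stopped.</p>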
<p>Then our configuration files are installed and the master is closed off from the replicas with
a firewall because (surprisingly!) replicas with 9.3 can apply changes from a master
with 9.4. It would not have a happy ending though.</p>
<p>It is important that from the moment pgbouncer is stopped on the master all queries are routed
to the replicas, so the cluster is in a read-only state.</p>
<h5>Upgrading replicas</h5>
<p>Replicas are upgraded sequentially:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">rsync_replicas</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">hosts</span><span class="p">,</span> <span class="n">options</span><span class="p">):</span>
    <span class="n">hosts</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">master</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">replica</span> <span class="ow">in</span> <span class="n">hosts</span><span class="p">:</span>
        <span class="n">rsync_one_replica</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">replica</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
</pre></div>
<p>The function itself does not do anything except rsync:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">rsync_one_replica</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">replica</span><span class="p">,</span> <span class="n">options</span><span class="p">):</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="s1">'/etc/init.d/pgbouncer stop'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.3 stop'</span><span class="p">)</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'ssh -A root@</span><span class="si">%s</span><span class="s1"> "cd /var/lib/pgsql && rsync --relative --archive --hard-links --size-only 9.3/data 9.4/data root@</span><span class="si">%s</span><span class="s1">:/var/lib/pgsql/"'</span> <span class="o">%</span> <span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">replica</span><span class="p">)</span>
    <span class="n">bydlog</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">cmd</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">stdout</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="p">,</span> <span class="n">stderr</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">res</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Could not rsync changes to </span><span class="si">%s</span><span class="s2">. Exiting."</span> <span class="o">%</span> <span class="n">replica</span><span class="p">)</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">110</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">tablespace</span><span class="p">:</span>
        <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'ssh -A root@</span><span class="si">%s</span><span class="s1"> "cd /var/lib/pgsql/9.3/slow && rsync --relative --archive --hard-links --size-only PG_9.3_201306121 PG_9.4_201409291 root@</span><span class="si">%s</span><span class="s1">:/var/lib/pgsql/9.3/slow/"'</span> <span class="o">%</span> <span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">replica</span><span class="p">)</span>
        <span class="n">bydlog</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span>
        <span class="n">res</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">cmd</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">stdout</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="p">,</span> <span class="n">stderr</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">res</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Could not rsync tablespace to </span><span class="si">%s</span><span class="s2">. Exiting."</span> <span class="o">%</span> <span class="n">replica</span><span class="p">)</span>
            <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">120</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="s1">'/usr/local/yandex/pgswitch/convert_master.sh 9.4 </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="n">master</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">need_remount</span><span class="p">:</span>
        <span class="n">remount_catalogs</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.4 start'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="s1">'/etc/init.d/pgbouncer start'</span><span class="p">)</span>
    <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Seems that </span><span class="si">%s</span><span class="s2"> has been upgraded successfully. Unbelievable!"</span> <span class="o">%</span> <span class="n">replica</span><span class="p">)</span>
</pre></div>
<p>In the last step the right <code>recovery.conf</code> is created with our custom script.</p>
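<p>Our <code>convert_master.sh</code> is not published, but for a 9.4 streaming replica the resulting file is roughly what the sketch below produces. The function name, the replication user and the port are assumptions for the example. The three settings themselves are standard <code>recovery.conf</code> parameters in 9.4.</p>

```python
def standby_recovery_conf(master, user="replicator"):
    """Build a minimal recovery.conf for a 9.4 streaming replica of `master`.

    `user` is a hypothetical replication role; use whatever your cluster has.
    """
    settings = [
        ("standby_mode", "on"),
        ("primary_conninfo", "host=%s port=5432 user=%s" % (master, user)),
        ("recovery_target_timeline", "latest"),
    ]
    return "".join("%s = '%s'\n" % (key, value) for key, value in settings)


if __name__ == "__main__":
    print(standby_recovery_conf("pgmaster01"))
```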
<p>After the first replica is upgraded it is opened for load, and at this moment the
second replica is closed for load and upgraded. This is the hardest
stage of the upgrade because the master is closed, the replica with 9.3 is closed and the
only host serving load is the replica with 9.4, which has no statistics
(unfortunately, pg_upgrade does not transfer optimizer statistics).</p>
<h5>Starting up</h5>
<p>After all replicas are upgraded the master is opened for load.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">start_master</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">options</span><span class="p">):</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'iptables -D INPUT -p tcp -m tcp --dport 5432 -j REJECT && ip6tables -D INPUT -p tcp -m tcp --dport 5432 -j REJECT'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.4 start'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/pgbouncer start'</span><span class="p">)</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'/var/lib/pgsql/analyze_new_cluster.sh'</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
</pre></div>
<p>At this moment all three hosts are up and ready for serving load. Dances, happiness.</p>
<h4>Summary</h4>
<p>We have upgraded several dozen shards from 9.3.6 to 9.4.1 with read-only
degradation for less than three minutes on each shard. On a couple of shards
we caught some special effects and the script failed, so we had to upgrade them
manually. Luckily, the sequence of steps is clear and the manual work came down
to invoking commands from the script. It, however, took more time, about 7 minutes
per shard.</p>
<p>And for dessert… we have already caught a rare
<a href="http://www.postgresql.org/message-id/20150330162247.2492.923@wrigleys.postgresql.org">bug</a>
in 9.4.1, and Tom Lane made a patch to fix the problem 38 minutes (!)
after the bug report was created. It is very cool.</p>
<h3><span class="caps">UPD</span>: Attention, potential data loss</h3>
<p>The script above has the following:</p>
<div class="highlight"><pre><span></span><span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">need_stat</span><span class="p">:</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/usr/pgsql-9.4/bin/vacuumdb --all --analyze-only'</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
</pre></div>
<p>It is run on the primary with the new version while it is closed off from the replicas.
If <code>autovacuum = on</code> on the primary, this can lead to data loss. Details can
be found in
<a href="https://www.postgresql.org/message-id/DA18C5E1-A115-4C1C-9F7C-E7B9A5F3EBC5%40yandex.ru">the thread on pgsql-hackers@</a>.</p>Upgrading PostgreSQL to 9.42015-03-31T18:00:00+03:002015-03-31T18:00:00+03:00d0ubletag:simply.name,2015-03-31:/ru/upgrading-postgres-to-9.4.html<h4>Preface</h4>
<p>Logical decoding appeared in 9.4, so upgrading from 9.4 to 9.5 should be
quite cheap (at least that is how it
<a href="https://wiki.postgresql.org/wiki/UDR_Online_Upgrade">should</a> be). But right
now upgrading the major version of PostgreSQL is pain. In the most common
case it looks like this:</p>
<ol>
<li>Shut the master down completely and run <code>pg_upgrade</code>.</li>
<li>Start up with the new version as master only.</li>
<li>Make a full backup from the upgraded master.</li>
<li>Refill all the replicas from the new backup.</li>
</ol>
<p>We have databases of several terabytes, each shard of which consists of three
machines: a master and two replicas. A significant part of the read queries goes to
the replicas. And there are databases where we can survive the death of only one replica.
That is, if both replicas die, the master alone will not carry all the write and
read load. And making a full backup of a database of a couple of terabytes and deploying
it to at least one replica overnight is not the easiest task.</p>
<h4>Ray of hope</h4>
<p>When we were already about to upgrade on the night from Saturday to Sunday and suffer,
Bruce Momjian sent a
<a href="http://www.postgresql.org/message-id/20150219165755.GA18714@momjian.us">patch</a>
to the documentation that makes it possible to upgrade all the replicas as well without refilling them from
backup. The patch was eventually applied to master, so the documentation for 9.5 already
<a href="http://www.postgresql.org/docs/devel/static/pgupgrade.html">has</a> the required steps.</p>
<p>The only downside of this solution is that all machines of the cluster
must be shut down during the upgrade (i.e. the database is unavailable even for reads).
We did not like that restriction much either, so we decided to do it
a bit differently.</p>
<h4>Implementation</h4>
<p>Since we have several dozen databases, we really did not want to do the upgrade by hand
on each of them, so I wrote a simple script for it. The
script is very dumb: on any problem it immediately fails and the rest has to be
finished by hand.</p>
<p>Since the script is tightly tied to our infrastructure, I publish only
a few pieces of it that show the essence. The overall sequence is fairly
simple (install 9.4 packages on all machines, upgrade the master, rsync to
each of the replicas, start up the master):</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">options</span><span class="p">,</span> <span class="n">hosts</span><span class="p">,</span> <span class="n">master</span><span class="p">):</span>
    <span class="n">prefix</span> <span class="o">=</span> <span class="n">options</span><span class="o">.</span><span class="n">prefix</span>
    <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Installing packages on all hosts."</span><span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">apply_state_on_host</span><span class="p">(</span><span class="s1">'</span><span class="si">%s</span><span class="s1">*'</span> <span class="o">%</span> <span class="n">prefix</span><span class="p">,</span> <span class="s1">'components.pg94.db.packages'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">res</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">res</span>
    <span class="n">upgrade_master</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">prefix</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
    <span class="n">rsync_replicas</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">hosts</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
    <span class="n">start_master</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
    <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Seems that everything succeeded. Unbelievable!"</span><span class="p">)</span>
</pre></div>
<p>This sequence prolongs the time spent in read-only mode (the master could be
started after the first replica is upgraded), but it completely removes the need
to refill replicas from backups.</p>
<h5>Upgrading master</h5>
<p>Upgrading the master is probably the most eventful stage:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">upgrade_master</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">prefix</span><span class="p">,</span> <span class="n">options</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">need_checksums</span><span class="p">:</span>
        <span class="n">cmd</span> <span class="o">=</span> <span class="s2">"sed -i /etc/init.d/postgresql-9.4 -e 's/initdb --pgdata/initdb -k --pgdata/'"</span>
        <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">cmd</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.4 initdb'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/pgbouncer stop'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.3 stop'</span><span class="p">)</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'/usr/pgsql-9.4/bin/pg_upgrade -b /usr/pgsql-9.3/bin/ -B /usr/pgsql-9.4/bin/ -d /var/lib/pgsql/9.3/data/ -D /var/lib/pgsql/9.4/data/ --check'</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">cmd_run_on_host</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">res</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Running 'pg_upgrade --check' on </span><span class="si">%s</span><span class="s2"> failed. Turning everything on back."</span> <span class="o">%</span> <span class="n">master</span><span class="p">)</span>
        <span class="n">cmd_run_on_host</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.3 start'</span><span class="p">)</span>
        <span class="n">cmd_run_on_host</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/pgbouncer start'</span><span class="p">)</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'/usr/pgsql-9.4/bin/pg_upgrade -b /usr/pgsql-9.3/bin/ -B /usr/pgsql-9.4/bin/ -d /var/lib/pgsql/9.3/data/ -D /var/lib/pgsql/9.4/data/ --link'</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">preserve_history</span><span class="p">:</span>
        <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'rsync -av /var/lib/pgsql/9.3/data/pg_xlog/*.history /var/lib/pgsql/9.4/data/pg_xlog/'</span>
        <span class="n">cmd_run_on_host</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">cmd</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'mkdir -p /var/lib/pgsql/9.4/data/conf.d/'</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">apply_state_on_host</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'components.pg94.db.configs'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">res</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Could not install configs on </span><span class="si">%s</span><span class="s2">. Exiting."</span> <span class="o">%</span> <span class="n">master</span><span class="p">)</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">70</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'iptables -A INPUT -p tcp -m tcp --dport 5432 -j REJECT && ip6tables -A INPUT -p tcp -m tcp --dport 5432 -j REJECT'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.4 start'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">need_stat</span><span class="p">:</span>
        <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/usr/pgsql-9.4/bin/vacuumdb --all --analyze-only'</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.4 stop'</span><span class="p">)</span>
    <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Seems that master has been upgraded successfully. Unbelievable!"</span><span class="p">)</span>
</pre></div>
<p>First, <code>pg_upgrade --check</code> is run, and if it reports any problems,
everything is put back the way it was. This is the only place where the script
rolls back. In every other case it simply fails.</p>
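<p>The "check first, roll back only at this point" pattern can be sketched in isolation. This is a minimal illustration, not the script's actual code: <code>check_then_upgrade</code> and its parameters are hypothetical names, and <code>run</code> stands in for a helper such as <code>cmd_run_on_host</code> that returns an exit code.</p>

```python
import sys


def check_then_upgrade(run, check_cmd, rollback_cmds, upgrade_cmd):
    """Run a dry-run check; undo the shutdown if it fails, otherwise upgrade.

    `run` is any callable that executes a command and returns its exit code
    (an assumption made for this sketch).
    """
    if run(check_cmd) != 0:
        # The check failed: restart the old services in order and stop here.
        for cmd in rollback_cmds:
            run(cmd)
        return False
    # The check passed: from here on there is no rollback, a failure is fatal.
    if run(upgrade_cmd) != 0:
        sys.exit(1)
    return True
```

<p>Passing the runner in as an argument keeps the sketch testable without touching a real host.</p>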
<p>Next, our configuration files are deployed and the master is cut off from the
replicas with a firewall, because (surprise!) replicas running 9.3 are able to pull
changes from a master running 9.4. Nothing good comes of that, though.</p>
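<p>The rule that closes port 5432 has to be removed later with exactly the same match, so it may help to build both commands from one template. A small sketch under that assumption; <code>firewall_cmd</code> is a hypothetical helper, not part of the script:</p>

```python
def firewall_cmd(action, port=5432):
    """Build the paired IPv4/IPv6 command for a REJECT rule.

    action is 'A' to append the rule (close the port) or 'D' to delete it
    (reopen the port). The helper name is hypothetical.
    """
    if action not in ('A', 'D'):
        raise ValueError("action must be 'A' (append) or 'D' (delete)")
    rule = '-%s INPUT -p tcp -m tcp --dport %d -j REJECT' % (action, port)
    return 'iptables %s && ip6tables %s' % (rule, rule)


close_cmd = firewall_cmd('A')  # same string as used in upgrade_master
open_cmd = firewall_cmd('D')   # same string as used in start_master
```

<p>Generating both strings from one place reduces the chance that the delete rule drifts from the append rule and leaves the port closed.</p>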
<p>An important point: from the moment pgbouncer is stopped on the master, all
load goes to the replicas, i.e. the cluster degrades to read-only.</p>
<h5>Upgrading the replicas</h5>
<p>The replicas are upgraded one after another:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">rsync_replicas</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">hosts</span><span class="p">,</span> <span class="n">options</span><span class="p">):</span>
    <span class="n">hosts</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">master</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">replica</span> <span class="ow">in</span> <span class="n">hosts</span><span class="p">:</span>
        <span class="n">rsync_one_replica</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">replica</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
</pre></div>
<p>The function itself does little more than run rsync:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">rsync_one_replica</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">replica</span><span class="p">,</span> <span class="n">options</span><span class="p">):</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="s1">'/etc/init.d/pgbouncer stop'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.3 stop'</span><span class="p">)</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'ssh -A root@</span><span class="si">%s</span><span class="s1"> "cd /var/lib/pgsql && rsync --relative --archive --hard-links --size-only 9.3/data 9.4/data root@</span><span class="si">%s</span><span class="s1">:/var/lib/pgsql/"'</span> <span class="o">%</span> <span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">replica</span><span class="p">)</span>
    <span class="n">bydlog</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">cmd</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">stdout</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="p">,</span> <span class="n">stderr</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">res</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Could not rsync changes to </span><span class="si">%s</span><span class="s2">. Exiting."</span> <span class="o">%</span> <span class="n">replica</span><span class="p">)</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">110</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">tablespace</span><span class="p">:</span>
        <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'ssh -A root@</span><span class="si">%s</span><span class="s1"> "cd /var/lib/pgsql/9.3/slow && rsync --relative --archive --hard-links --size-only PG_9.3_201306121 PG_9.4_201409291 root@</span><span class="si">%s</span><span class="s1">:/var/lib/pgsql/9.3/slow/"'</span> <span class="o">%</span> <span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">replica</span><span class="p">)</span>
        <span class="n">bydlog</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span>
        <span class="n">res</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">cmd</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">stdout</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="p">,</span> <span class="n">stderr</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">res</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Could not rsync tablespace to </span><span class="si">%s</span><span class="s2">. Exiting."</span> <span class="o">%</span> <span class="n">replica</span><span class="p">)</span>
            <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">120</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="s1">'/usr/local/yandex/pgswitch/convert_master.sh 9.4 </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="n">master</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">need_remount</span><span class="p">:</span>
        <span class="n">remount_catalogs</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.4 start'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">replica</span><span class="p">,</span> <span class="s1">'/etc/init.d/pgbouncer start'</span><span class="p">)</span>
    <span class="n">bydlog</span><span class="p">(</span><span class="s2">"Seems that </span><span class="si">%s</span><span class="s2"> has been upgraded successfully. Unbelievable!"</span> <span class="o">%</span> <span class="n">replica</span><span class="p">)</span>
</pre></div>
<p>The last step creates a correct <code>recovery.conf</code> so that the replica points at
the right master.</p>
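<p>The contents of that step are not shown in the post, so the following is only a guess at what a minimal <code>recovery.conf</code> for a 9.4 streaming replica might look like. The function name, the <code>replicator</code> user and the example host are all assumptions; only the three setting names are standard <code>recovery.conf</code> parameters.</p>

```python
def render_recovery_conf(master_host, port=5432, user='replicator'):
    """Render a minimal recovery.conf for a 9.4 streaming replica.

    The user name and the choice of settings are assumptions for this sketch;
    the real convert_master.sh may write something different.
    """
    conninfo = 'host=%s port=%d user=%s' % (master_host, port, user)
    lines = [
        "standby_mode = 'on'",
        "primary_conninfo = '%s'" % conninfo,
        "recovery_target_timeline = 'latest'",
    ]
    return '\n'.join(lines) + '\n'


# Hypothetical host name, used only to show the output shape.
example = render_recovery_conf('pgmaster01')
```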
<p>Once the first replica has been upgraded it is opened for load, and at that
moment the second one is closed and upgraded. This is the hardest stage of the
upgrade, because the master is closed, the 9.3 replica is closed, and the only
machine serving load is the 9.4 replica, which has no statistics at all
(unfortunately, pg_upgrade does not transfer statistics).</p>
<h5>Takeoff</h5>
<p>After all replicas have been upgraded, the master is opened for load.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">start_master</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">options</span><span class="p">):</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'iptables -D INPUT -p tcp -m tcp --dport 5432 -j REJECT && ip6tables -D INPUT -p tcp -m tcp --dport 5432 -j REJECT'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/postgresql-9.4 start'</span><span class="p">)</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/etc/init.d/pgbouncer start'</span><span class="p">)</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="s1">'/var/lib/pgsql/analyze_new_cluster.sh'</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">cmd</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
</pre></div>
<p>At this point all three machines are available to serve load. Dancing, joy.</p>
<h4>Results</h4>
<p>We upgraded several dozen shards from 9.3.6 to 9.4.1, with each of them spending
less than three minutes in read-only. On a couple of shards some special effects
showed up, the script failed, and so we had to upgrade them by hand; but the
sequence of steps is well defined, and the manual work boiled down to performing
the same steps as in the script. It did take longer, though: about 7 minutes per shard.</p>
<p>And for dessert: we have already hit a rare
<a href="http://www.postgresql.org/message-id/20150330162247.2492.923@wrigleys.postgresql.org">bug</a>,
and Tom Lane wrote a patch fixing it within 38 minutes (!) of the bug report
being filed. That is very cool.</p>
<h3><span class="caps">UPD</span>: Warning, potential data loss</h3>
<p>The script shown above contains roughly the following:</p>
<div class="highlight"><pre><span></span><span class="k">if</span> <span class="n">options</span><span class="o">.</span><span class="n">need_stat</span><span class="p">:</span>
    <span class="n">run_or_exit</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="s1">'/usr/pgsql-9.4/bin/vacuumdb --all --analyze-only'</span><span class="p">,</span> <span class="n">runas</span><span class="o">=</span><span class="s1">'postgres'</span><span class="p">)</span>
</pre></div>
<p>This is done when the master is already running the new version but is still
closed off from the replicas. If <code>autovacuum = on</code> is set on the master at that
point, it will very likely lead to data loss. For details, see
<a href="https://pgconf.ru/2018/110829">the talk by my colleague Dmitry Sarafannikov</a> or
<a href="https://www.postgresql.org/message-id/DA18C5E1-A115-4C1C-9F7C-E7B9A5F3EBC5%40yandex.ru">the mailing list discussion</a>.</p>HTTPS2015-02-08T12:20:00+03:002015-02-08T12:20:00+03:00d0ubletag:simply.name,2015-02-08:/https.html<p>It is 2015, and everyone is moving their services from <span class="caps">HTTP</span> to <span class="caps">HTTPS</span>. And because the cool guys from <a href="http://www.startssl.com/">StartSSL</a> give out valid certificates for free, I have decided to make the site accessible over <span class="caps">HTTPS</span>. Since I would otherwise need to generate static content with pelican twice (for <span class="caps">HTTP</span> and <span class="caps">HTTPS</span>), I have …</p><p>It is 2015, and everyone is moving their services from <span class="caps">HTTP</span> to <span class="caps">HTTPS</span>. And because the cool guys from <a href="http://www.startssl.com/">StartSSL</a> give out valid certificates for free, I have decided to make the site accessible over <span class="caps">HTTPS</span>. Since I would otherwise need to generate static content with pelican twice (for <span class="caps">HTTP</span> and <span class="caps">HTTPS</span>), I have decided to make the site available only over <span class="caps">HTTPS</span>. If this causes trouble for anyone, please let me know.</p>HTTPS2015-02-08T12:20:00+03:002015-02-08T12:20:00+03:00d0ubletag:simply.name,2015-02-08:/ru/https.html<p>It is 2015, and everyone and everything is moving their services from <span class="caps">HTTP</span> to <span class="caps">HTTPS</span>. And since the wonderful folks at <a href="http://www.startssl.com/">StartSSL</a> hand out valid certificates for free, I decided to make the site available over https. Since in that case I would have to generate the static content with pelican twice (for http and https), I decided to make …</p><p>It is 2015, and everyone and everything is moving their services from <span class="caps">HTTP</span> to <span class="caps">HTTPS</span>. And since the wonderful folks at <a href="http://www.startssl.com/">StartSSL</a> hand out valid certificates for free, I decided to make the site available over https.
Since in that case I would have to generate the static content with pelican twice (for http and https), I decided to make the site available over <span class="caps">HTTPS</span> only. If this causes problems for anyone, or anyone objects, let me know.</p>PostgreSQL 9.4 and pg_repack2015-01-31T20:00:00+03:002015-01-31T20:00:00+03:00d0ubletag:simply.name,2015-01-31:/pg_repack94.html<p>We have workflows that store cooling <span class="caps">UGC</span> data in the <span class="caps">DB</span>. The older the data is,
the less likely it is to be requested. We partition tables with such data by date and eventually
move the data from <span class="caps">SSD</span> disks to <span class="caps">SATA</span>. This gives us very good hardware savings.
PostgreSQL has built-in support for …</p><p>We have workflows that store cooling <span class="caps">UGC</span> data in the <span class="caps">DB</span>. The older the data is,
the less likely it is to be requested. We partition tables with such data by date and eventually
move the data from <span class="caps">SSD</span> disks to <span class="caps">SATA</span>. This gives us very good hardware savings.
PostgreSQL has built-in support for tablespaces, which can be stored on different
devices, and a built-in command <code>ALTER TABLE foo SET TABLESPACE bar</code> which can
solve our initial problem. But this command has one big disadvantage: while the
table is being moved to another tablespace it is exclusively locked, so you cannot
write to it or even read from it.</p>
<p>Fortunately, there is a great tool called <a href="http://reorg.github.io/pg_repack/">pg_repack</a>
which was created to solve problems like the one above. We use it successfully
with PostgreSQL 9.3, but 9.4 was released in December and we have started
thinking about an upgrade.</p>
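<p>For illustration, here is a small helper that builds a pg_repack command line for moving one table to another tablespace. It is a sketch under assumptions: the helper name and the database/table/tablespace names are made up, and it relies on pg_repack's <code>--tablespace</code> and <code>--moveidx</code> options rather than on anything shown in this post.</p>

```python
import shlex


def move_table_cmd(dbname, table, tablespace, move_indexes=True):
    """Build a pg_repack command that moves `table` into `tablespace`.

    Hypothetical helper; unlike ALTER TABLE ... SET TABLESPACE, pg_repack
    keeps the table readable and writable for most of the operation.
    """
    args = [
        'pg_repack',
        '--dbname=%s' % dbname,
        '--table=%s' % table,
        '--tablespace=%s' % tablespace,
    ]
    if move_indexes:
        # Also move the table's indexes to the new tablespace.
        args.append('--moveidx')
    return ' '.join(shlex.quote(a) for a in args)


cmd = move_table_cmd('mydb', 'foo', 'bar')
```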
<p>There is <a href="https://github.com/reorg/pg_repack/issues/16">an open issue</a> on GitHub
about 9.4 support, but nobody had spent time on it. Since I was the one most
interested in it, I did it myself. The result was this <a href="https://github.com/reorg/pg_repack/pull/34">pull request</a>.</p>
<p>It successfully passes the regression tests on 9.3 and 9.4 and definitely moves
a table with all its indexes from one tablespace to another. Strictly speaking,
though, this patch is not entirely correct: it does not support multiple <span class="caps">TOAST</span> indexes, which may
appear in the future. I hope the maintainers will find time to fix the problem the right way.</p>
<p>But the main achievement is not the patch itself but the valuable experience gained:</p>
<ul>
<li>exploring the <code>pg_catalog</code> structure a bit,</li>
<li>debugging changes with <span class="caps">GDB</span> (by the way,
<a href="http://www.pgcon.org/2014/schedule/attachments/321_pgcon2014-coredump.pdf">a good presentation</a>
on this topic),</li>
<li>understanding the PostgreSQL regression tests.</li>
</ul>PostgreSQL 9.4 and pg_repack2015-01-31T20:00:00+03:002015-01-31T20:00:00+03:00d0ubletag:simply.name,2015-01-31:/ru/pg_repack94.html<p>We have scenarios where cooling <span class="caps">UGC</span> data is stored in the DB. The older the data,
the less often anyone comes for it. We partition tables with such data by
time and, as the data cools down, move them from <span class="caps">SSD</span> disks to <span class="caps">SATA</span>. This
gives a very substantial saving on hardware costs. PostgreSQL has a native …</p><p>We have scenarios where cooling <span class="caps">UGC</span> data is stored in the DB. The older the data,
the less often anyone comes for it. We partition tables with such data by
time and, as the data cools down, move them from <span class="caps">SSD</span> disks to <span class="caps">SATA</span>. This
gives a very substantial saving on hardware costs. PostgreSQL has a native
tablespace mechanism, with tablespaces that can be placed on
different devices, and a native command <code>ALTER TABLE foo SET TABLESPACE bar</code>
that can be used to solve our problem. But this command has a significant
drawback: while a partition is being moved, an exclusive lock is held on it,
so it can be neither written to nor read from, and that is really not good.</p>
<p>Fortunately, there is a wonderful thing called <a href="http://reorg.github.io/pg_repack/">pg_repack</a>,
built to solve problems like the one described above. We use it successfully with
PostgreSQL 9.3, but 9.4 came out in December and we started thinking about upgrading.</p>
<p>An <a href="https://github.com/reorg/pg_repack/issues/16">issue</a> about 9.4 support
has long been open on GitHub, but nobody was working on it. Since I needed it and nobody
wanted to take it on, I had to do it myself. The result was this
<a href="https://github.com/reorg/pg_repack/pull/34">pull request</a>.</p>
<p>It passes the tests on 9.3 and 9.4 and definitely moves a table with all its indexes
from one tablespace to another. Strictly speaking, though, this patch is not
quite correct: it does not cover the case of multiple <span class="caps">TOAST</span> indexes,
which may occur in the future. I hope the developers will find the energy and time
to fix the problem properly, even if that requires rewriting a lot.</p>
<p>Overall, though, the main achievement is not the patch but the experience I gained from:</p>
<ul>
<li>digging through the structure of <code>pg_catalog</code>,</li>
<li>debugging my changes with <span class="caps">GDB</span> (by the way,
<a href="http://www.pgcon.org/2014/schedule/attachments/321_pgcon2014-coredump.pdf">a good presentation</a>
on the topic),</li>
<li>figuring out the PostgreSQL regression tests.</li>
</ul>Redesign2015-01-03T12:00:00+03:002015-01-03T12:00:00+03:00d0ubletag:simply.name,2015-01-03:/redesign.html<p>Finally I have found time to try out static content generators. And after some
testing I have chosen <a href="https://github.com/getpelican/pelican">pelican</a>. So far
I’m very satisfied. Let’s see what happens.</p>
<p>By the way, the sources of the current version of the site can be found on
<a href="https://github.com/man-brain/simply.name">github</a>.</p>Site update2015-01-03T12:00:00+03:002015-01-03T12:00:00+03:00d0ubletag:simply.name,2015-01-03:/ru/redesign.html<p>At last I got around to static content generators. The choice fell
on <a href="https://github.com/getpelican/pelican">pelican</a>, and so far I am very happy with it.</p>
<p>I have moved all the materials here in one form or another, but just in case I will keep
the old version at <a href="http://old.simply.name">http://old.simply.name</a> for a while longer.</p>
<p>By the way, the sources of the current version of the site can be found …</p><p>At last I got around to static content generators. The choice fell
on <a href="https://github.com/getpelican/pelican">pelican</a>, and so far I am very happy with it.</p>
<p>I have moved all the materials here in one form or another, but just in case I will keep
the old version at <a href="http://old.simply.name">http://old.simply.name</a> for a while longer.</p>
<p>By the way, the sources of the current version of the site can be found on
<a href="https://github.com/man-brain/simply.name">github</a>.</p>Pgcheck and delayed replicas2014-12-23T21:00:00+03:002014-12-23T21:00:00+03:00d0ubletag:simply.name,2014-12-23:/pgcheck-and-delayed-replics.html<p>Two months ago we <a href="https://simply.name/pgcheck.html">announced</a> pgcheck - a tool for automatic load control on PostgreSQL databases using <span class="caps">PL</span>/Proxy. Today we have fixed all the issues found in one new feature: pgcheck can now take replication delay into account and avoid routing queries to delayed replicas.</p>
<p>Sources and some more info can be …</p><p>Two months ago we <a href="https://simply.name/pgcheck.html">announced</a> pgcheck - a tool for automatic load control on PostgreSQL databases using <span class="caps">PL</span>/Proxy. Today we have fixed all the issues found in one new feature: pgcheck can now take replication delay into account and avoid routing queries to delayed replicas.</p>
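<p>The idea of skipping delayed replicas can be shown with a toy filter. This is not pgcheck's implementation: the function name, the 30-second threshold and the input format are assumptions made purely for illustration.</p>

```python
def routable_replicas(replicas, max_delay_seconds=30):
    """Return the hosts whose replication delay is known and within the limit.

    `replicas` maps a host name to its delay in seconds, or to None when the
    delay could not be measured. Names and threshold are hypothetical.
    """
    return sorted(
        host for host, delay in replicas.items()
        if delay is not None and delay <= max_delay_seconds
    )


# Example: one healthy replica, one lagging, one with unknown delay.
hosts = routable_replicas({'replica1': 2, 'replica2': 600, 'replica3': None})
```

<p>Treating an unknown delay as "do not route" is the conservative choice here: a replica that cannot report its lag is assumed to be unsafe for reads.</p>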
<p>Sources and some more info can be found on <a href="https://github.com/yandex/pgcheck">github</a>. Enjoy.</p>Pgcheck and lagging replicas2014-12-23T21:00:00+03:002014-12-23T21:00:00+03:00d0ubletag:simply.name,2014-12-23:/ru/pgcheck-and-delayed-replics.html<p>Two months ago we <a href="https://simply.name/ru/pgcheck.html">announced</a> pgcheck, a tool for automatic load balancing across PostgreSQL databases using <span class="caps">PL</span>/Proxy. Today we fixed all the problems found with a new piece of functionality: pgcheck now takes replica lag into account and does not send read queries to lagging replicas.</p>
<p>Sources and documentation are on <a href="https://github.com/yandex/pgcheck">github</a>. Enjoy.</p>PostgreSQL and SystemTap2014-12-08T21:00:00+03:002014-12-08T21:00:00+03:00d0ubletag:simply.name,2014-12-08:/postgresql-and-systemtap.html<h4>Preface</h4>
<p>Once upon a time we started having strange performance issues with a write-only
load on PostgreSQL 9.4 with huge shared_buffers. The problem itself is well
described <a href="http://www.postgresql.org/message-id/0DDFB621-7282-4A2B-8879-A47F7CECBCE4@simply.name">here</a>,
but it is not the topic of this post. And since PostgreSQL does not yet have
anything like the Oracle wait events interface …</p><h4>Preface</h4>
<p>Once upon a time we started having strange performance issues with a write-only
load on PostgreSQL 9.4 with huge shared_buffers. The problem itself is well
described <a href="http://www.postgresql.org/message-id/0DDFB621-7282-4A2B-8879-A47F7CECBCE4@simply.name">here</a>,
but it is not the topic of this post. And since PostgreSQL does not yet have
anything like the Oracle wait events interface, we wrote a couple of
simple SystemTap scripts to pinpoint the problem. Below are some details.</p>
<h4>Preparing</h4>
<p>First of all, you need to install the required packages. Not everything
listed below is strictly necessary, but this set is sufficient:</p>
<div class="highlight"><pre><span></span><span class="n">root</span><span class="nv">@xdb01d</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">rpm</span><span class="w"> </span><span class="o">-</span><span class="n">qa</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">grep</span><span class="w"> </span><span class="n">systemtap</span><span class="w"></span>
<span class="n">systemtap</span><span class="o">-</span><span class="n">client</span><span class="o">-</span><span class="mf">2.3</span><span class="o">-</span><span class="mf">4.</span><span class="n">el6_5</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">systemtap</span><span class="o">-</span><span class="n">devel</span><span class="o">-</span><span class="mf">2.3</span><span class="o">-</span><span class="mf">4.</span><span class="n">el6_5</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">systemtap</span><span class="o">-</span><span class="n">server</span><span class="o">-</span><span class="mf">2.3</span><span class="o">-</span><span class="mf">4.</span><span class="n">el6_5</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">systemtap</span><span class="o">-</span><span class="n">runtime</span><span class="o">-</span><span class="mf">2.3</span><span class="o">-</span><span class="mf">4.</span><span class="n">el6_5</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">root</span><span class="nv">@xdb01d</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">rpm</span><span class="w"> </span><span class="o">-</span><span class="n">qa</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">grep</span><span class="w"> </span><span class="o">-</span><span class="n">E</span><span class="w"> </span><span class="s1">'kernel.*2.6.32-504'</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="n">debuginfo</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="n">firmware</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="n">debuginfo</span><span class="o">-</span><span class="n">common</span><span class="o">-</span><span class="n">x86_64</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="n">headers</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="n">devel</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">root</span><span class="nv">@xdb01d</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="err">#</span><span class="w"></span>
</pre></div>
<p>The next thing to do is to compile PostgreSQL with the <code>--enable-dtrace</code> option
passed to the <code>configure</code> script. Since I am using <span class="caps">RHEL</span>, I patched the spec file for that:</p>
<div class="highlight"><pre><span></span>$ diff postgresql-9.4.spec.orig postgresql-9.4.spec
323a324,325
> --enable-dtrace <span class="se">\</span>
> --enable-debug <span class="se">\</span>
</pre></div>
<p>Actually, compiling with <code>--enable-dtrace</code> is necessary to use the markers predefined in
the PostgreSQL source code. All of them are listed in the
<a href="http://www.postgresql.org/docs/current/static/dynamic-trace.html#DTRACE-PROBE-POINT-TABLE">documentation</a>.
One more thing to say is that recompiling PostgreSQL for deeper debugging is
not really something you want to do in a production environment :( That’s why, <span class="caps">IMHO</span>,
an analogue of the Oracle wait interface should eventually be implemented in PostgreSQL.</p>
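<p>A quick way to check that the markers really made it into the binary is a one-probe script.
This is only a minimal sketch: it assumes the binary path used throughout this post and the
<code>transaction__start</code> marker from the probe table in the documentation (hyphens in
marker names are written as double underscores in SystemTap). If the binary was built without
<code>--enable-dtrace</code>, I would expect <code>stap</code> to fail to resolve the probe
instead of starting the run.</p>
<div class="highlight"><pre><span></span># Smoke test: fires on the first transaction started by any backend, then stops.
probe process("/usr/pgsql-9.4/bin/postgres").mark("transaction__start")
{
    printf("[%s] transaction__start fired in pid %d\n", ctime(gettimeofday_s()), pid())
    exit()
}
</pre></div>
<p>Run it and execute any query in another session; a single line of output means the markers are usable.</p>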
<h4>First stap</h4>
<p>Right, let’s assume that we have everything needed to start. What next? The
first step is really the hardest one. I can recommend a few things to look at:</p>
<ul>
<li><a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/SystemTap_Beginners_Guide/">SystemTap Beginners Guide</a> from Red Hat.
First of all, read the <a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/SystemTap_Beginners_Guide/using-systemtap.html#testing">section about testing SystemTap</a>.</li>
<li><a href="http://blog.endpoint.com/2009/05/postgresql-with-systemtap.html">PostgreSQL with SystemTap</a> by Joshua Tolley.</li>
<li><a href="https://sourceware.org/systemtap/wiki/PostgresqlMarkers">Some examples and even video</a> on SystemTap wiki.</li>
</ul>
<p>At first you can simply copy and paste the examples above and then make small
changes to them to get what you need. My first stap is below; it simply prints
all checkpoint events (time and pid). I needed it to correlate <span class="caps">CPU</span> spikes during
checkpointing with the events happening at that time.</p>
<div class="highlight"><pre><span></span><span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{
<span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Checkpoint done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span>
<span class="k">exit</span><span class="ss">()</span>
}
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">clog__checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Clog checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">clog__checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Clog checkpoint done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">subtrans__checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Subtrans checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">subtrans__checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Subtrans checkpoint done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">multixact__checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Multixact checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">multixact__checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Multixact checkpoint done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__sync__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer sync started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
#<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__sync__written</span><span class="s2">"</span><span class="ss">)</span>
#{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer %d sync written by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="mh">$a</span><span class="nv">rg1</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__sync__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer sync done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__checkpoint__sync__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer checkpoint sync started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer checkpoint sync done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">twophase__checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Twophase checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">twophase__checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Twophase checkpoint done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
</pre></div>
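<p>The same markers can also be paired to measure how long a whole checkpoint takes, rather than
only printing when each phase starts and ends. The script below is a minimal sketch under the
same assumptions as above (same binary path, same <code>checkpoint__start</code> and
<code>checkpoint__done</code> markers); the global name <code>checkpoint_started</code> is made
up for this illustration.</p>
<div class="highlight"><pre><span></span># Start time of the checkpoint in progress, keyed by the pid that runs it.
global checkpoint_started

probe process("/usr/pgsql-9.4/bin/postgres").mark("checkpoint__start")
{
    checkpoint_started[pid()] = gettimeofday_ms()
}

probe process("/usr/pgsql-9.4/bin/postgres").mark("checkpoint__done")
{
    p = pid()
    # Ignore a checkpoint that was already running when the script was started.
    if (p in checkpoint_started) {
        elapsed = gettimeofday_ms() - checkpoint_started[p]
        printf("[%s] Checkpoint by pid %d took %d ms\n", ctime(gettimeofday_s()), p, elapsed)
        delete checkpoint_started[p]
    }
}
</pre></div>
<p>Unlike the script above, this one does not call <code>exit()</code>, so it keeps reporting
every checkpoint until it is interrupted with Ctrl-C.</p>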
<p>The next thing was to track the time spent between locking a relation for extension and unlocking it.
I did it this way:</p>
<div class="highlight"><pre><span></span><span class="k">global</span><span class="w"> </span><span class="nf">count</span><span class="p">,</span><span class="w"> </span><span class="n">abyrvalg</span><span class="p">,</span><span class="w"> </span><span class="n">timings</span><span class="p">,</span><span class="w"> </span><span class="n">sec_timings</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="n">process</span><span class="p">(</span><span class="ss">"/usr/pgsql-9.4/bin/postgres"</span><span class="p">).</span><span class="k">function</span><span class="p">(</span><span class="ss">"LockRelationForExtension"</span><span class="p">)</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">abyrvalg</span><span class="o">[</span><span class="n">$relation, tid()</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gettimeofday_ms</span><span class="p">()</span><span class="w"></span>
<span class="w"> </span><span class="nf">count</span><span class="o">++</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="n">process</span><span class="p">(</span><span class="ss">"/usr/pgsql-9.4/bin/postgres"</span><span class="p">).</span><span class="k">function</span><span class="p">(</span><span class="ss">"UnlockRelationForExtension"</span><span class="p">)</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tid</span><span class="p">();</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gettimeofday_ms</span><span class="p">()</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="o">[</span><span class="n">$relation,p</span><span class="o">]</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">abyrvalg</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">abyrvalg</span><span class="o">[</span><span class="n">$relation,p</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="n">timings</span><span class="w"> </span><span class="o"><<<</span><span class="w"> </span><span class="n">tmp</span><span class="w"></span>
<span class="w"> </span><span class="n">sec_timings</span><span class="o">[</span><span class="n">gettimeofday_s()</span><span class="o">]</span><span class="w"> </span><span class="o"><<<</span><span class="w"> </span><span class="n">tmp</span><span class="w"></span>
<span class="w"> </span><span class="k">delete</span><span class="w"> </span><span class="n">abyrvalg</span><span class="o">[</span><span class="n">$relation,p</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="err">#</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tmp</span><span class="w"> </span><span class="o">></span><span class="mi">1000</span><span class="p">)</span><span class="w"></span>
<span class="err">#</span><span class="w"> </span><span class="n">printf</span><span class="w"> </span><span class="p">(</span><span class="ss">"[%s] Relation %d for extension has been locked by pid %d for %d ms\n"</span><span class="p">,</span><span class="w"> </span><span class="n">ctime</span><span class="p">(</span><span class="n">gettimeofday_s</span><span class="p">()),</span><span class="w"> </span><span class="err">$</span><span class="n">relation</span><span class="o">-></span><span class="n">rd_id</span><span class="p">,</span><span class="w"> </span><span class="n">pid</span><span class="p">(),</span><span class="w"> </span><span class="n">tmp</span><span class="p">)</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="n">timer</span><span class="p">.</span><span class="n">s</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gettimeofday_s</span><span class="p">()</span><span class="o">-</span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tmp</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">sec_timings</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"[%s] Min: %d ms; Max: %d ms; Avg: %d ms\n"</span><span class="p">,</span><span class="w"> </span><span class="n">ctime</span><span class="p">(</span><span class="n">tmp</span><span class="p">),</span><span class="w"> </span><span class="nv">@min</span><span class="p">(</span><span class="n">sec_timings</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">),</span><span class="w"> </span><span class="nv">@max</span><span class="p">(</span><span class="n">sec_timings</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">),</span><span class="w"> </span><span class="nv">@avg</span><span class="p">(</span><span class="n">sec_timings</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="k">delete</span><span class="w"> </span><span class="n">sec_timings</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="k">end</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"\nLockRelationForExtension has been called %d times\n"</span><span class="p">,</span><span class="w"> </span><span class="nf">count</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"Min: %d ms\nMax: %d ms\nAvg: %d ms\n"</span><span class="p">,</span><span class="w"> </span><span class="nv">@min</span><span class="p">(</span><span class="n">timings</span><span class="p">),</span><span class="w"> </span><span class="nv">@max</span><span class="p">(</span><span class="n">timings</span><span class="p">),</span><span class="w"> </span><span class="nv">@avg</span><span class="p">(</span><span class="n">timings</span><span class="p">))</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
</pre></div>
<p>Actually, I added this part to the first stap, and the output of running it
with the <code>stap -v</code> command was as follows (lots of lines are skipped with <code><...></code>):</p>
<div class="highlight"><pre><span></span><span class="nt">Pass</span><span class="w"> </span><span class="nt">1</span><span class="o">:</span><span class="w"> </span><span class="nt">parsed</span><span class="w"> </span><span class="nt">user</span><span class="w"> </span><span class="nt">script</span><span class="w"> </span><span class="nt">and</span><span class="w"> </span><span class="nt">96</span><span class="w"> </span><span class="nt">library</span><span class="w"> </span><span class="nt">script</span><span class="o">(</span><span class="nt">s</span><span class="o">)</span><span class="w"> </span><span class="nt">using</span><span class="w"> </span><span class="nt">198148virt</span><span class="o">/</span><span class="nt">26440res</span><span class="o">/</span><span class="nt">3120shr</span><span class="o">/</span><span class="nt">23744data</span><span class="w"> </span><span class="nt">kb</span><span class="o">,</span><span class="w"> </span><span class="nt">in</span><span class="w"> </span><span class="nt">160usr</span><span class="o">/</span><span class="nt">10sys</span><span class="o">/</span><span class="nt">165real</span><span class="w"> </span><span class="nt">ms</span><span class="o">.</span><span class="w"></span>
<span class="nt">Pass</span><span class="w"> </span><span class="nt">2</span><span class="o">:</span><span class="w"> </span><span class="nt">analyzed</span><span class="w"> </span><span class="nt">script</span><span class="o">:</span><span class="w"> </span><span class="nt">25</span><span class="w"> </span><span class="nt">probe</span><span class="o">(</span><span class="nt">s</span><span class="o">),</span><span class="w"> </span><span class="nt">8</span><span class="w"> </span><span class="nt">function</span><span class="o">(</span><span class="nt">s</span><span class="o">),</span><span class="w"> </span><span class="nt">3</span><span class="w"> </span><span class="nt">embed</span><span class="o">(</span><span class="nt">s</span><span class="o">),</span><span class="w"> </span><span class="nt">4</span><span class="w"> </span><span class="nt">global</span><span class="o">(</span><span class="nt">s</span><span class="o">)</span><span class="w"> </span><span class="nt">using</span><span class="w"> </span><span class="nt">231456virt</span><span class="o">/</span><span class="nt">44380res</span><span class="o">/</span><span class="nt">12740shr</span><span class="o">/</span><span class="nt">31976data</span><span class="w"> </span><span class="nt">kb</span><span class="o">,</span><span class="w"> </span><span class="nt">in</span><span class="w"> </span><span class="nt">90usr</span><span class="o">/</span><span class="nt">60sys</span><span class="o">/</span><span class="nt">163real</span><span class="w"> </span><span class="nt">ms</span><span class="o">.</span><span class="w"></span>
<span class="nt">Pass</span><span class="w"> </span><span class="nt">3</span><span class="o">:</span><span class="w"> </span><span class="nt">using</span><span class="w"> </span><span class="nt">cached</span><span class="w"> </span><span class="o">/</span><span class="nt">root</span><span class="o">/</span><span class="p">.</span><span class="nc">systemtap</span><span class="o">/</span><span class="nt">cache</span><span class="o">/</span><span class="nt">54</span><span class="o">/</span><span class="nt">stap_5407ff18f4496fac55552cb675f64223_13156</span><span class="p">.</span><span class="nc">c</span><span class="w"></span>
<span class="nt">Pass</span><span class="w"> </span><span class="nt">4</span><span class="o">:</span><span class="w"> </span><span class="nt">using</span><span class="w"> </span><span class="nt">cached</span><span class="w"> </span><span class="o">/</span><span class="nt">root</span><span class="o">/</span><span class="p">.</span><span class="nc">systemtap</span><span class="o">/</span><span class="nt">cache</span><span class="o">/</span><span class="nt">54</span><span class="o">/</span><span class="nt">stap_5407ff18f4496fac55552cb675f64223_13156</span><span class="p">.</span><span class="nc">ko</span><span class="w"></span>
<span class="nt">Pass</span><span class="w"> </span><span class="nt">5</span><span class="o">:</span><span class="w"> </span><span class="nt">starting</span><span class="w"> </span><span class="nt">run</span><span class="o">.</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">03</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">04</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Checkpoint</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Clog</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Clog</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Subtrans</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Subtrans</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Multixact</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Multixact</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Buffer</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Buffer</span><span class="w"> </span><span class="nt">sync</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">37</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">42</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">438</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">82</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">43</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">657</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">293</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">44</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">647</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">681</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">665</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">52</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">1738</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1844</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">1772</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">25</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">3239</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">3239</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3239</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">26</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">4518</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3034</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">28</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">2078</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">428</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">29</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">136</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">68</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">33</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">4880</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">5007</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">4943</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">34</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">5206</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3654</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mi">20</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">3346</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">2497</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mi">24</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">7342</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">7342</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">7342</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mi">25</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">8382</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3949</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">02</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">1499</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1646</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">1587</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">03</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">4665</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3021</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">04</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">1201</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">6449</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3876</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">05</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">7268</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3291</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">07</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">1899</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">6311</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">4735</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">08</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">2791</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">7107</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">6017</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">09</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">8343</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">2551</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">22</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">9543</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">12365</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">10954</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">23</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">22017</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">8741</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">25</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">23489</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">11113</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">05</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Buffer</span><span class="w"> </span><span class="nt">sync</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">05</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Buffer</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">sync</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">04</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">128</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">8</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">05</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">98</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">5</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">06</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">07</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">48</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">49</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">50</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Buffer</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">sync</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">51</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Checkpoint</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="nt">LockRelationForExtension</span><span class="w"> </span><span class="nt">has</span><span class="w"> </span><span class="nt">been</span><span class="w"> </span><span class="nt">called</span><span class="w"> </span><span class="nt">16075</span><span class="w"> </span><span class="nt">times</span><span class="w"></span>
<span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">23489</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">287</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="nt">WARNING</span><span class="o">:</span><span class="w"> </span><span class="nt">Number</span><span class="w"> </span><span class="nt">of</span><span class="w"> </span><span class="nt">errors</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="o">,</span><span class="w"> </span><span class="nt">skipped</span><span class="w"> </span><span class="nt">probes</span><span class="o">:</span><span class="w"> </span><span class="nt">13046</span><span class="w"></span>
<span class="nt">Pass</span><span class="w"> </span><span class="nt">5</span><span class="o">:</span><span class="w"> </span><span class="nt">run</span><span class="w"> </span><span class="nt">completed</span><span class="w"> </span><span class="nt">in</span><span class="w"> </span><span class="nt">20usr</span><span class="o">/</span><span class="nt">100sys</span><span class="o">/</span><span class="nt">527749real</span><span class="w"> </span><span class="nt">ms</span><span class="o">.</span><span class="w"></span>
</pre></div>
<p>The output of this stap shows that the problems occur during buffer sync, and
that a lot of time is sometimes spent holding the ExclusiveLock on relation
extension between
<a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/hio.c;h=631af759d78fef6c9e909b50fc48ef37b32cbae9;hb=refs/heads/REL9_4_STABLE#l431">this</a>
and <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/hio.c;h=631af759d78fef6c9e909b50fc48ef37b32cbae9;hb=refs/heads/REL9_4_STABLE#l460">this</a>
line of code in the
<a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/hio.c;h=631af759d78fef6c9e909b50fc48ef37b32cbae9;hb=refs/heads/REL9_4_STABLE#l158">RelationGetBufferForTuple function</a>.</p>
<h4>Second stap</h4>
<p>The next stap I wrote was the following:</p>
<div class="highlight"><pre><span></span><span class="k">global</span><span class="w"> </span><span class="nf">count</span><span class="p">,</span><span class="w"> </span><span class="n">count_with_clock</span><span class="p">,</span><span class="w"> </span><span class="n">passes</span><span class="p">,</span><span class="w"> </span><span class="n">sec_passes</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="n">process</span><span class="p">(</span><span class="ss">"/usr/pgsql-9.4/bin/postgres"</span><span class="p">).</span><span class="k">function</span><span class="p">(</span><span class="ss">"StrategyGetBuffer"</span><span class="p">).</span><span class="k">return</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">$</span><span class="n">StrategyControl</span><span class="o">-></span><span class="n">completePasses</span><span class="w"></span>
<span class="w"> </span><span class="nf">count</span><span class="o">++</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tmp</span><span class="o">></span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">count_with_clock</span><span class="o">++</span><span class="w"></span>
<span class="w"> </span><span class="n">passes</span><span class="w"> </span><span class="o"><<<</span><span class="w"> </span><span class="n">tmp</span><span class="w"></span>
<span class="w"> </span><span class="n">sec_passes</span><span class="o">[</span><span class="n">gettimeofday_s()</span><span class="o">]</span><span class="w"> </span><span class="o"><<<</span><span class="w"> </span><span class="n">tmp</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"[%s] %d made %d iterations to find least used buffer\n"</span><span class="p">,</span><span class="w"> </span><span class="n">ctime</span><span class="p">(</span><span class="n">gettimeofday_s</span><span class="p">()),</span><span class="w"> </span><span class="n">pid</span><span class="p">(),</span><span class="w"> </span><span class="n">tmp</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="n">timer</span><span class="p">.</span><span class="n">s</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gettimeofday_s</span><span class="p">()</span><span class="o">-</span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tmp</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">sec_passes</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"[%s] Min: %d passes; Max: %d passes; Avg: %d passes\n"</span><span class="p">,</span><span class="w"> </span><span class="n">ctime</span><span class="p">(</span><span class="n">tmp</span><span class="p">),</span><span class="w"> </span><span class="nv">@min</span><span class="p">(</span><span class="n">sec_passes</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">),</span><span class="w"> </span><span class="nv">@max</span><span class="p">(</span><span class="n">sec_passes</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">),</span><span class="w"> </span><span class="nv">@avg</span><span class="p">(</span><span class="n">sec_passes</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="k">delete</span><span class="w"> </span><span class="n">sec_passes</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="k">end</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"\nStrategyGetBuffer has been called %d times, %d times with clock sweep\n"</span><span class="p">,</span><span class="w"> </span><span class="nf">count</span><span class="p">,</span><span class="w"> </span><span class="n">count_with_clock</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"Min: %d passes\nMax: %d passes\nAvg: %d passes\n"</span><span class="p">,</span><span class="w"> </span><span class="nv">@min</span><span class="p">(</span><span class="n">passes</span><span class="p">),</span><span class="w"> </span><span class="nv">@max</span><span class="p">(</span><span class="n">passes</span><span class="p">),</span><span class="w"> </span><span class="nv">@avg</span><span class="p">(</span><span class="n">passes</span><span class="p">))</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
</pre></div>
<p>Note that PostgreSQL does not have to be built with the <code>--enable-dtrace</code> option for this
stap to work (but <code>--enable-debug</code> is mandatory). The stap reads data from the
<a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/freelist.c;h=4befab0e1ad05f05e950d3dea6f0951d94b4ef4d;hb=refs/heads/REL9_4_STABLE#l22">StrategyControl structure</a> on return from the
<a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/freelist.c;h=4befab0e1ad05f05e950d3dea6f0951d94b4ef4d;hb=refs/heads/REL9_4_STABLE#l94">StrategyGetBuffer function</a>
to see how many ClockSweep passes through shared buffers were made to find
a buffer to replace.</p>
<p>This stap showed me that ClockSweep inside StrategyGetBuffer could walk
through a huge part of the shared_buffers pages while holding both the
ExclusiveLock on relation extension and the
<a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/freelist.c;h=4befab0e1ad05f05e950d3dea6f0951d94b4ef4d;hb=refs/heads/REL9_4_STABLE#l134">BufFreelistLock LWLock</a>.</p>
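<p>To make it clearer what a ClockSweep pass is, here is a toy Python model of
the idea. It is only a sketch under my own assumptions: the names <code>Buffer</code>,
<code>ClockSweep</code> and <code>get_victim</code> are made up for illustration, and all
locking is left out. The real logic lives in StrategyGetBuffer in freelist.c.</p>
<div class="highlight"><pre><span></span># Toy model of clock-sweep buffer replacement, NOT PostgreSQL's actual code.
# Buffer, ClockSweep and get_victim are hypothetical names used for illustration.


class Buffer:
    def __init__(self, usage_count=0, pinned=False):
        self.usage_count = usage_count
        self.pinned = pinned


class ClockSweep:
    def __init__(self, buffers):
        self.buffers = buffers
        self.hand = 0             # next buffer the clock hand will look at
        self.complete_passes = 0  # full trips around the pool so far

    def get_victim(self):
        """Return the index of the buffer to replace.

        Each visited unpinned buffer with a non-zero usage count has that
        count decremented; the first unpinned buffer with a zero count wins.
        """
        tries = len(self.buffers)  # guard against a pool of pinned buffers
        while True:
            index = self.hand
            self.hand += 1
            if self.hand == len(self.buffers):
                self.hand = 0
                self.complete_passes += 1

            buf = self.buffers[index]
            if buf.pinned:
                tries -= 1
                if tries == 0:
                    raise RuntimeError("no unpinned buffers available")
                continue
            if buf.usage_count == 0:
                return index
            buf.usage_count -= 1
            tries = len(self.buffers)  # progress was made, reset the guard


pool = ClockSweep([Buffer(usage_count=5) for _ in range(1000)])
victim = pool.get_victim()
print(victim, pool.complete_passes)  # prints: 0 5
</pre></div>
<p>In this model a pool where every buffer starts with a usage count of 5 needs
five full trips around all buffers before the first victim is found. The larger
the pool, the longer each of those trips takes.</p>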
<p>So the problem is in the core functionality of PostgreSQL and it can’t be
fixed without patching the code :( Fortunately, there are two patches
that have already been committed for 9.5:</p>
<ol>
<li><a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5d7962c6">Change locking regimen around buffer replacement</a> by Robert Haas,</li>
<li><a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d72731a7">Lockless StrategyGetBuffer clock sweep hot path</a> by Andres Freund.</li>
</ol>
<p>I will definitely try PostgreSQL built from master on the same workload to see
if that helped. Stay tuned.</p>PostgreSQL and SystemTap2014-12-08T21:00:00+03:002014-12-08T21:00:00+03:00d0ubletag:simply.name,2014-12-08:/ru/postgresql-and-systemtap.html<h4>Prologue</h4>
<p>Once we started seeing strange performance problems with
PostgreSQL 9.4 under a write load with a large shared_buffers. The problem itself
is well described <a href="http://www.postgresql.org/message-id/0DDFB621-7282-4A2B-8879-A47F7CECBCE4@simply.name">here</a>,
but it is not the topic of this post. Since PostgreSQL has no analogue of the
Oracle wait interface, we wrote a couple of simple SystemTap scripts to
localize the problem. Below are a few …</p><h4>Prologue</h4>
<p>Once we started seeing strange performance problems with
PostgreSQL 9.4 under a write load with a large shared_buffers. The problem itself
is well described <a href="http://www.postgresql.org/message-id/0DDFB621-7282-4A2B-8879-A47F7CECBCE4@simply.name">here</a>,
but it is not the topic of this post. Since PostgreSQL has no analogue of the
Oracle wait interface, we wrote a couple of simple SystemTap scripts to
localize the problem. Below are a few details.</p>
<h4>Preparation</h4>
<p>First you need to install the necessary packages. Strictly speaking, not all
of the packages listed below are required, but they are certainly sufficient:</p>
<div class="highlight"><pre><span></span><span class="n">root</span><span class="nv">@xdb01d</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">rpm</span><span class="w"> </span><span class="o">-</span><span class="n">qa</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">grep</span><span class="w"> </span><span class="n">systemtap</span><span class="w"></span>
<span class="n">systemtap</span><span class="o">-</span><span class="n">client</span><span class="o">-</span><span class="mf">2.3</span><span class="o">-</span><span class="mf">4.</span><span class="n">el6_5</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">systemtap</span><span class="o">-</span><span class="n">devel</span><span class="o">-</span><span class="mf">2.3</span><span class="o">-</span><span class="mf">4.</span><span class="n">el6_5</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">systemtap</span><span class="o">-</span><span class="n">server</span><span class="o">-</span><span class="mf">2.3</span><span class="o">-</span><span class="mf">4.</span><span class="n">el6_5</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">systemtap</span><span class="o">-</span><span class="n">runtime</span><span class="o">-</span><span class="mf">2.3</span><span class="o">-</span><span class="mf">4.</span><span class="n">el6_5</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">root</span><span class="nv">@xdb01d</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">rpm</span><span class="w"> </span><span class="o">-</span><span class="n">qa</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">grep</span><span class="w"> </span><span class="o">-</span><span class="n">E</span><span class="w"> </span><span class="s1">'kernel.*2.6.32-504'</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="n">debuginfo</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="n">firmware</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="n">debuginfo</span><span class="o">-</span><span class="n">common</span><span class="o">-</span><span class="n">x86_64</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="n">headers</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">kernel</span><span class="o">-</span><span class="n">devel</span><span class="o">-</span><span class="mf">2.6.32</span><span class="o">-</span><span class="mf">504.</span><span class="n">el6</span><span class="p">.</span><span class="n">x86_64</span><span class="w"></span>
<span class="n">root</span><span class="nv">@xdb01d</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="err">#</span><span class="w"></span>
</pre></div>
<p>The next thing to do is to rebuild PostgreSQL with the
<code>--enable-dtrace</code> option passed to the <code>configure</code> script. Since we use
<span class="caps">RHEL</span>, I changed the spec file as follows:</p>
<div class="highlight"><pre><span></span>$ diff postgresql-9.4.spec.orig postgresql-9.4.spec
323a324,325
> --enable-dtrace <span class="se">\</span>
> --enable-debug <span class="se">\</span>
</pre></div>
<p>Actually, compiling with <code>--enable-dtrace</code> is needed only for using the
markers predefined in the PostgreSQL code. All of them are described in the
<a href="http://www.postgresql.org/docs/current/static/dynamic-trace.html#DTRACE-PROBE-POINT-TABLE">documentation</a>.
That is exactly why an analogue of the Oracle wait interface should be
implemented in PostgreSQL some day.</p>
<h4>First stap</h4>
<p>Suppose we have everything we need to start. What next? The first step is
actually far from easy. I would recommend starting with the following:</p>
<ul>
<li><a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/SystemTap_Beginners_Guide/">SystemTap Beginners Guide</a>
by Red Hat, and first of all the
<a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/SystemTap_Beginners_Guide/using-systemtap.html#testing">section on testing that SystemTap works</a>.</li>
<li><a href="http://blog.endpoint.com/2009/05/postgresql-with-systemtap.html">PostgreSQL with SystemTap</a> by Joshua Tolley.</li>
<li><a href="https://sourceware.org/systemtap/wiki/PostgresqlMarkers">A few examples and even a video</a> on the SystemTap wiki.</li>
</ul>
<p>To begin with, you can simply copy and run the examples above, and then make
the small changes you need. My first stap is below. It simply prints
the start and end of every checkpoint stage (time and pid). I needed it
to correlate the spikes in <span class="caps">CPU</span> usage during
checkpoints with the events taking place.</p>
<div class="highlight"><pre><span></span><span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{
<span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Checkpoint done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span>
<span class="k">exit</span><span class="ss">()</span>
}
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">clog__checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Clog checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">clog__checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Clog checkpoint done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">subtrans__checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Subtrans checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">subtrans__checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Subtrans checkpoint done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">multixact__checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Multixact checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">multixact__checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Multixact checkpoint done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__sync__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer sync started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
#<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__sync__written</span><span class="s2">"</span><span class="ss">)</span>
#{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer %d sync written by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="mh">$a</span><span class="nv">rg1</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__sync__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer sync done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__checkpoint__sync__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer checkpoint sync started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">buffer__checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Buffer checkpoint sync done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">twophase__checkpoint__start</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Twophase checkpoint started by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
<span class="nv">probe</span> <span class="nv">process</span><span class="ss">(</span><span class="s2">"</span><span class="s">/usr/pgsql-9.4/bin/postgres</span><span class="s2">"</span><span class="ss">)</span>.<span class="nv">mark</span><span class="ss">(</span><span class="s2">"</span><span class="s">twophase__checkpoint__done</span><span class="s2">"</span><span class="ss">)</span>
{ <span class="nv">printf</span> <span class="ss">(</span><span class="s2">"</span><span class="s">[%s] Twophase checkpoint done by pid %d</span><span class="se">\n</span><span class="s2">"</span>, <span class="nv">ctime</span><span class="ss">(</span><span class="nv">gettimeofday_s</span><span class="ss">())</span>, <span class="nv">pid</span><span class="ss">())</span> }
</pre></div>
<p>Next, I needed to track how long the relation extension lock was being held.
I did it like this:</p>
<div class="highlight"><pre><span></span><span class="k">global</span><span class="w"> </span><span class="nf">count</span><span class="p">,</span><span class="w"> </span><span class="n">abyrvalg</span><span class="p">,</span><span class="w"> </span><span class="n">timings</span><span class="p">,</span><span class="w"> </span><span class="n">sec_timings</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="n">process</span><span class="p">(</span><span class="ss">"/usr/pgsql-9.4/bin/postgres"</span><span class="p">).</span><span class="k">function</span><span class="p">(</span><span class="ss">"LockRelationForExtension"</span><span class="p">)</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">abyrvalg</span><span class="o">[</span><span class="n">$relation, tid()</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gettimeofday_ms</span><span class="p">()</span><span class="w"></span>
<span class="w"> </span><span class="nf">count</span><span class="o">++</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="n">process</span><span class="p">(</span><span class="ss">"/usr/pgsql-9.4/bin/postgres"</span><span class="p">).</span><span class="k">function</span><span class="p">(</span><span class="ss">"UnlockRelationForExtension"</span><span class="p">)</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tid</span><span class="p">();</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gettimeofday_ms</span><span class="p">()</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="o">[</span><span class="n">$relation,p</span><span class="o">]</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">abyrvalg</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">abyrvalg</span><span class="o">[</span><span class="n">$relation,p</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="n">timings</span><span class="w"> </span><span class="o"><<<</span><span class="w"> </span><span class="n">tmp</span><span class="w"></span>
<span class="w"> </span><span class="n">sec_timings</span><span class="o">[</span><span class="n">gettimeofday_s()</span><span class="o">]</span><span class="w"> </span><span class="o"><<<</span><span class="w"> </span><span class="n">tmp</span><span class="w"></span>
<span class="w"> </span><span class="k">delete</span><span class="w"> </span><span class="n">abyrvalg</span><span class="o">[</span><span class="n">$relation,p</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="err">#</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tmp</span><span class="w"> </span><span class="o">></span><span class="mi">1000</span><span class="p">)</span><span class="w"></span>
<span class="err">#</span><span class="w"> </span><span class="n">printf</span><span class="w"> </span><span class="p">(</span><span class="ss">"[%s] Relation %d for extension has been locked by pid %d for %d ms\n"</span><span class="p">,</span><span class="w"> </span><span class="n">ctime</span><span class="p">(</span><span class="n">gettimeofday_s</span><span class="p">()),</span><span class="w"> </span><span class="err">$</span><span class="n">relation</span><span class="o">-></span><span class="n">rd_id</span><span class="p">,</span><span class="w"> </span><span class="n">pid</span><span class="p">(),</span><span class="w"> </span><span class="n">tmp</span><span class="p">)</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="n">timer</span><span class="p">.</span><span class="n">s</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gettimeofday_s</span><span class="p">()</span><span class="o">-</span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tmp</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">sec_timings</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"[%s] Min: %d ms; Max: %d ms; Avg: %d ms\n"</span><span class="p">,</span><span class="w"> </span><span class="n">ctime</span><span class="p">(</span><span class="n">tmp</span><span class="p">),</span><span class="w"> </span><span class="nv">@min</span><span class="p">(</span><span class="n">sec_timings</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">),</span><span class="w"> </span><span class="nv">@max</span><span class="p">(</span><span class="n">sec_timings</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">),</span><span class="w"> </span><span class="nv">@avg</span><span class="p">(</span><span class="n">sec_timings</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="k">delete</span><span class="w"> </span><span class="n">sec_timings</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="k">end</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"\nLockRelationForExtension has been called %d times\n"</span><span class="p">,</span><span class="w"> </span><span class="nf">count</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"Min: %d ms\nMax: %d ms\nAvg: %d ms\n"</span><span class="p">,</span><span class="w"> </span><span class="nv">@min</span><span class="p">(</span><span class="n">timings</span><span class="p">),</span><span class="w"> </span><span class="nv">@max</span><span class="p">(</span><span class="n">timings</span><span class="p">),</span><span class="w"> </span><span class="nv">@avg</span><span class="p">(</span><span class="n">timings</span><span class="p">))</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
</pre></div>
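<p>As a possible extension (not part of the original run), the <code>timings</code>
aggregate collected above can also be printed as a log-scale histogram, which shows the
shape of the distribution rather than only its extremes. This is a minimal sketch that
assumes it is appended to the same script, so it reuses the existing <code>timings</code>
global instead of declaring it again; the message text is made up for illustration.
SystemTap allows several <code>end</code> probes in one script.</p>
<div class="highlight"><pre><span></span>probe end
{
  # @min/@max/@avg fail at run time on an empty aggregate,
  # while @count returns 0, so check it first.
  if (@count(timings) > 0) {
    printf("\nLock hold time distribution, ms:\n")
    # Power-of-two buckets: 0, 1, 2, 4, 8, ... ms
    print(@hist_log(timings))
  }
}
</pre></div>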
<p>Strictly speaking, I added this part to the first stap script and ran everything
together. The output of <code>stap -v</code> was as follows (many lines are omitted and replaced with <code><...></code>):</p>
<div class="highlight"><pre><span></span><span class="nt">Pass</span><span class="w"> </span><span class="nt">1</span><span class="o">:</span><span class="w"> </span><span class="nt">parsed</span><span class="w"> </span><span class="nt">user</span><span class="w"> </span><span class="nt">script</span><span class="w"> </span><span class="nt">and</span><span class="w"> </span><span class="nt">96</span><span class="w"> </span><span class="nt">library</span><span class="w"> </span><span class="nt">script</span><span class="o">(</span><span class="nt">s</span><span class="o">)</span><span class="w"> </span><span class="nt">using</span><span class="w"> </span><span class="nt">198148virt</span><span class="o">/</span><span class="nt">26440res</span><span class="o">/</span><span class="nt">3120shr</span><span class="o">/</span><span class="nt">23744data</span><span class="w"> </span><span class="nt">kb</span><span class="o">,</span><span class="w"> </span><span class="nt">in</span><span class="w"> </span><span class="nt">160usr</span><span class="o">/</span><span class="nt">10sys</span><span class="o">/</span><span class="nt">165real</span><span class="w"> </span><span class="nt">ms</span><span class="o">.</span><span class="w"></span>
<span class="nt">Pass</span><span class="w"> </span><span class="nt">2</span><span class="o">:</span><span class="w"> </span><span class="nt">analyzed</span><span class="w"> </span><span class="nt">script</span><span class="o">:</span><span class="w"> </span><span class="nt">25</span><span class="w"> </span><span class="nt">probe</span><span class="o">(</span><span class="nt">s</span><span class="o">),</span><span class="w"> </span><span class="nt">8</span><span class="w"> </span><span class="nt">function</span><span class="o">(</span><span class="nt">s</span><span class="o">),</span><span class="w"> </span><span class="nt">3</span><span class="w"> </span><span class="nt">embed</span><span class="o">(</span><span class="nt">s</span><span class="o">),</span><span class="w"> </span><span class="nt">4</span><span class="w"> </span><span class="nt">global</span><span class="o">(</span><span class="nt">s</span><span class="o">)</span><span class="w"> </span><span class="nt">using</span><span class="w"> </span><span class="nt">231456virt</span><span class="o">/</span><span class="nt">44380res</span><span class="o">/</span><span class="nt">12740shr</span><span class="o">/</span><span class="nt">31976data</span><span class="w"> </span><span class="nt">kb</span><span class="o">,</span><span class="w"> </span><span class="nt">in</span><span class="w"> </span><span class="nt">90usr</span><span class="o">/</span><span class="nt">60sys</span><span class="o">/</span><span class="nt">163real</span><span class="w"> </span><span class="nt">ms</span><span class="o">.</span><span class="w"></span>
<span class="nt">Pass</span><span class="w"> </span><span class="nt">3</span><span class="o">:</span><span class="w"> </span><span class="nt">using</span><span class="w"> </span><span class="nt">cached</span><span class="w"> </span><span class="o">/</span><span class="nt">root</span><span class="o">/</span><span class="p">.</span><span class="nc">systemtap</span><span class="o">/</span><span class="nt">cache</span><span class="o">/</span><span class="nt">54</span><span class="o">/</span><span class="nt">stap_5407ff18f4496fac55552cb675f64223_13156</span><span class="p">.</span><span class="nc">c</span><span class="w"></span>
<span class="nt">Pass</span><span class="w"> </span><span class="nt">4</span><span class="o">:</span><span class="w"> </span><span class="nt">using</span><span class="w"> </span><span class="nt">cached</span><span class="w"> </span><span class="o">/</span><span class="nt">root</span><span class="o">/</span><span class="p">.</span><span class="nc">systemtap</span><span class="o">/</span><span class="nt">cache</span><span class="o">/</span><span class="nt">54</span><span class="o">/</span><span class="nt">stap_5407ff18f4496fac55552cb675f64223_13156</span><span class="p">.</span><span class="nc">ko</span><span class="w"></span>
<span class="nt">Pass</span><span class="w"> </span><span class="nt">5</span><span class="o">:</span><span class="w"> </span><span class="nt">starting</span><span class="w"> </span><span class="nt">run</span><span class="o">.</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">03</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">04</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Checkpoint</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Clog</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Clog</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Subtrans</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Subtrans</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Multixact</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Multixact</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Buffer</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Buffer</span><span class="w"> </span><span class="nt">sync</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">36</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">37</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">42</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">438</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">82</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">43</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">657</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">293</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">44</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">647</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">681</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">665</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">58</span><span class="p">:</span><span class="mi">52</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">1738</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1844</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">1772</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">25</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">3239</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">3239</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3239</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">26</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">4518</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3034</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">28</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">2078</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">428</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">29</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">136</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">68</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">33</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">4880</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">5007</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">4943</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">13</span><span class="p">:</span><span class="mi">59</span><span class="p">:</span><span class="mi">34</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">5206</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3654</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mi">20</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">3346</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">2497</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mi">24</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">7342</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">7342</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">7342</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mi">25</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">8382</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3949</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">02</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">1499</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1646</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">1587</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">03</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">4665</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3021</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">04</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">1201</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">6449</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3876</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">05</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">7268</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">3291</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">07</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">1899</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">6311</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">4735</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">08</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">2791</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">7107</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">6017</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">09</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">8343</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">2551</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">22</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">9543</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">12365</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">10954</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">23</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">22017</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">8741</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">02</span><span class="p">:</span><span class="mi">25</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">23489</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">11113</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">05</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Buffer</span><span class="w"> </span><span class="nt">sync</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">05</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Buffer</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">sync</span><span class="w"> </span><span class="nt">started</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">04</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">128</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">8</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">05</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">98</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">5</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">06</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">07</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="o"><...></span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">48</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">1</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">49</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="o">;</span><span class="w"> </span><span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">50</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Buffer</span><span class="w"> </span><span class="nt">checkpoint</span><span class="w"> </span><span class="nt">sync</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="cp">[</span><span class="nx">Thu</span><span class="w"> </span><span class="nx">Oct</span><span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="mi">14</span><span class="p">:</span><span class="mi">06</span><span class="p">:</span><span class="mi">51</span><span class="w"> </span><span class="mi">2014</span><span class="cp">]</span><span class="w"> </span><span class="nt">Checkpoint</span><span class="w"> </span><span class="nt">done</span><span class="w"> </span><span class="nt">by</span><span class="w"> </span><span class="nt">pid</span><span class="w"> </span><span class="nt">8463</span><span class="w"></span>
<span class="nt">LockRelationForExtension</span><span class="w"> </span><span class="nt">has</span><span class="w"> </span><span class="nt">been</span><span class="w"> </span><span class="nt">called</span><span class="w"> </span><span class="nt">16075</span><span class="w"> </span><span class="nt">times</span><span class="w"></span>
<span class="nt">Min</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="nt">Max</span><span class="o">:</span><span class="w"> </span><span class="nt">23489</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="nt">Avg</span><span class="o">:</span><span class="w"> </span><span class="nt">287</span><span class="w"> </span><span class="nt">ms</span><span class="w"></span>
<span class="nt">WARNING</span><span class="o">:</span><span class="w"> </span><span class="nt">Number</span><span class="w"> </span><span class="nt">of</span><span class="w"> </span><span class="nt">errors</span><span class="o">:</span><span class="w"> </span><span class="nt">0</span><span class="o">,</span><span class="w"> </span><span class="nt">skipped</span><span class="w"> </span><span class="nt">probes</span><span class="o">:</span><span class="w"> </span><span class="nt">13046</span><span class="w"></span>
<span class="nt">Pass</span><span class="w"> </span><span class="nt">5</span><span class="o">:</span><span class="w"> </span><span class="nt">run</span><span class="w"> </span><span class="nt">completed</span><span class="w"> </span><span class="nt">in</span><span class="w"> </span><span class="nt">20usr</span><span class="o">/</span><span class="nt">100sys</span><span class="o">/</span><span class="nt">527749real</span><span class="w"> </span><span class="nt">ms</span><span class="o">.</span><span class="w"></span>
</pre></div>
<p>The output of this script shows that the problems occur while buffers are
being flushed from shared_buffers to disk. There are also moments when a lot of time
is spent holding an ExclusiveLock for relation extension, between <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/hio.c;h=631af759d78fef6c9e909b50fc48ef37b32cbae9;hb=refs/heads/REL9_4_STABLE#l431">this</a>
and <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/hio.c;h=631af759d78fef6c9e909b50fc48ef37b32cbae9;hb=refs/heads/REL9_4_STABLE#l460">this</a>
line of code in the <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/hio.c;h=631af759d78fef6c9e909b50fc48ef37b32cbae9;hb=refs/heads/REL9_4_STABLE#l158">RelationGetBufferForTuple function</a>.</p>
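<p>With several minutes of per-second lines, the worst moments are easier to spot if the output is parsed. Below is a minimal sketch in Python, assuming the line format shown above; the names <code>parse_stap_output</code> and <code>worst_seconds</code> are hypothetical helpers, not part of SystemTap or PostgreSQL.</p>

```python
import re

# Matches per-second summary lines such as
# "[Thu Oct 23 14:02:23 2014] Min: 0 ms; Max: 22017 ms; Avg: 8741 ms"
LINE_RE = re.compile(r"^\[(.+?)\] Min: (\d+) ms; Max: (\d+) ms; Avg: (\d+) ms$")


def parse_stap_output(text):
    """Return a list of (timestamp, min_ms, max_ms, avg_ms) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip "Buffer sync done ..." and other non-summary lines
        stamp, lo, hi, avg = match.groups()
        rows.append((stamp, int(lo), int(hi), int(avg)))
    return rows


def worst_seconds(rows, top=3):
    """Return the `top` rows with the largest maximum wait, worst first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]


# A few lines copied from the output above.
SAMPLE = """\
[Thu Oct 23 14:02:22 2014] Min: 9543 ms; Max: 12365 ms; Avg: 10954 ms
[Thu Oct 23 14:02:23 2014] Min: 0 ms; Max: 22017 ms; Avg: 8741 ms
[Thu Oct 23 14:02:25 2014] Min: 0 ms; Max: 23489 ms; Avg: 11113 ms
[Thu Oct 23 14:06:05 2014] Buffer sync done by pid 8463
[Thu Oct 23 14:06:04 2014] Min: 0 ms; Max: 128 ms; Avg: 8 ms
"""

if __name__ == "__main__":
    for stamp, lo, hi, avg in worst_seconds(parse_stap_output(SAMPLE), top=2):
        print(f"{stamp}: max {hi} ms, avg {avg} ms")
```

<p>Sorting by the maximum rather than the average is a deliberate choice here: a single backend stuck for 20 seconds matters even when most other calls in the same second return immediately.</p>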
<h4>Second stap</h4>
<p>The second stap I wrote was the following:</p>
<div class="highlight"><pre><span></span><span class="k">global</span><span class="w"> </span><span class="nf">count</span><span class="p">,</span><span class="w"> </span><span class="n">count_with_clock</span><span class="p">,</span><span class="w"> </span><span class="n">passes</span><span class="p">,</span><span class="w"> </span><span class="n">sec_passes</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="n">process</span><span class="p">(</span><span class="ss">"/usr/pgsql-9.4/bin/postgres"</span><span class="p">).</span><span class="k">function</span><span class="p">(</span><span class="ss">"StrategyGetBuffer"</span><span class="p">).</span><span class="k">return</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">$</span><span class="n">StrategyControl</span><span class="o">-></span><span class="n">completePasses</span><span class="w"></span>
<span class="w"> </span><span class="nf">count</span><span class="o">++</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tmp</span><span class="o">></span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">count_with_clock</span><span class="o">++</span><span class="w"></span>
<span class="w"> </span><span class="n">passes</span><span class="w"> </span><span class="o"><<<</span><span class="w"> </span><span class="n">tmp</span><span class="w"></span>
<span class="w"> </span><span class="n">sec_passes</span><span class="o">[</span><span class="n">gettimeofday_s()</span><span class="o">]</span><span class="w"> </span><span class="o"><<<</span><span class="w"> </span><span class="n">tmp</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"[%s] %d made %d iterations to find least used buffer\n"</span><span class="p">,</span><span class="w"> </span><span class="n">ctime</span><span class="p">(</span><span class="n">gettimeofday_s</span><span class="p">()),</span><span class="w"> </span><span class="n">pid</span><span class="p">(),</span><span class="w"> </span><span class="n">tmp</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="n">timer</span><span class="p">.</span><span class="n">s</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gettimeofday_s</span><span class="p">()</span><span class="o">-</span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tmp</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">sec_passes</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"[%s] Min: %d ms; Max: %d ms; Avg: %d ms\n"</span><span class="p">,</span><span class="w"> </span><span class="n">ctime</span><span class="p">(</span><span class="n">tmp</span><span class="p">),</span><span class="w"> </span><span class="nv">@min</span><span class="p">(</span><span class="n">sec_passes</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">),</span><span class="w"> </span><span class="nv">@max</span><span class="p">(</span><span class="n">sec_passes</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">),</span><span class="w"> </span><span class="nv">@avg</span><span class="p">(</span><span class="n">sec_passes</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="k">delete</span><span class="w"> </span><span class="n">sec_passes</span><span class="o">[</span><span class="n">tmp</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
<span class="n">probe</span><span class="w"> </span><span class="k">end</span><span class="w"></span>
<span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"\nStrategyGetBuffer has been called %d times, %d times with clock sweep\n"</span><span class="p">,</span><span class="w"> </span><span class="nf">count</span><span class="p">,</span><span class="w"> </span><span class="n">count_with_clock</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="ss">"Min: %d ms\nMax: %d ms\nAvg: %d ms\n"</span><span class="p">,</span><span class="w"> </span><span class="nv">@min</span><span class="p">(</span><span class="n">passes</span><span class="p">),</span><span class="w"> </span><span class="nv">@max</span><span class="p">(</span><span class="n">passes</span><span class="p">),</span><span class="w"> </span><span class="nv">@avg</span><span class="p">(</span><span class="n">passes</span><span class="p">))</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
</pre></div>
<p>Note that PostgreSQL does not need to be built with the <code>--enable-dtrace</code> option
for this script to work (but <code>--enable-debug</code> is required). The stap reads data from the
<a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/freelist.c;h=4befab0e1ad05f05e950d3dea6f0951d94b4ef4d;hb=refs/heads/REL9_4_STABLE#l22">StrategyControl structure</a>
on return from the <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/freelist.c;h=4befab0e1ad05f05e950d3dea6f0951d94b4ef4d;hb=refs/heads/REL9_4_STABLE#l94">StrategyGetBuffer function</a>
to see how many pages in shared_buffers were passed before
a buffer was found for eviction.</p>
<p>This stap showed that ClockSweep can pass through a huge part of shared_buffers
inside the <code>StrategyGetBuffer</code> function while holding the exclusive relation extension
lock and the <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/freelist.c;h=4befab0e1ad05f05e950d3dea6f0951d94b4ef4d;hb=refs/heads/REL9_4_STABLE#l134">BufFreelistLock LWLock</a>.</p>
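<p>To illustrate why a single call can walk so many buffers, here is a toy model of a clock sweep in bash. It is only a sketch under simplifying assumptions: the names <code>usage_count</code>, <code>hand</code>, <code>victim</code> and <code>passed</code> are hypothetical, and pinned buffers and locking are ignored entirely. It is not PostgreSQL code.</p>

```shell
#!/bin/bash
# Toy clock sweep: usage_count[i] is a hypothetical usage counter of buffer i,
# hand is the current position of the clock hand.
# On return, victim holds the chosen buffer and passed the number of
# buffers inspected during this call.
clock_sweep() {
    passed=0
    while :; do
        local i=$hand
        # Advance the hand, wrapping around at the end of the pool.
        hand=$(( (hand + 1) % ${#usage_count[@]} ))
        passed=$(( passed + 1 ))
        if [ "${usage_count[$i]}" -eq 0 ]; then
            victim=$i
            return 0
        fi
        # A recently used buffer gets another chance, at the cost of one count.
        usage_count[$i]=$(( usage_count[$i] - 1 ))
    done
}

# Four "hot" buffers: the hand must go around the pool twice before any
# counter reaches zero, so 9 buffers are inspected to find one victim.
usage_count=(2 2 2 2)
hand=0
clock_sweep
echo "victim=$victim passed=$passed"
```

<p>With a large shared_buffers full of frequently used pages, the same effect means many pages inspected per call, all while the locks mentioned above are held.</p>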
<p>So the problem lies in core PostgreSQL functionality and cannot be
solved without changes to the code :( Fortunately, there are a couple of patches
that have already been committed to 9.5:</p>
<ol>
<li><a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5d7962c6">Change locking regimen around buffer replacement</a> by Robert Haas,</li>
<li><a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d72731a7">Lockless StrategyGetBuffer clock sweep hot path</a> by Andres Freund.</li>
</ol>
<p>I will definitely try PostgreSQL built from master on the same workload
profile to check whether things have improved. Stay tuned.</p>Yet another psql color prompt2014-11-07T21:00:00+03:002015-05-10T20:00:00+03:00d0ubletag:simply.name,2014-11-07:/yet-another-psql-color-prompt.html<p>Below are screenshots and configs for yet another colored psql prompt. The goal was a color scheme that works well on terminals with both light and dark backgrounds.</p>
<p>Here is an example <code>.bashrc</code> file:</p>
<div class="highlight"><pre><span></span><span class="ch">#!/bin/bash</span>
<span class="nb">export</span> <span class="nv">YELLOW</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[1;33m'</span><span class="sb">`</span>
<span class="nb">export</span> <span class="nv">LIGHT_CYAN …</span></pre></div><p>Below are screenshots and configs for yet another colored psql prompt. The goal was a color scheme that works well on terminals with both light and dark backgrounds.</p>
<p>Here is an example <code>.bashrc</code> file:</p>
<div class="highlight"><pre><span></span><span class="ch">#!/bin/bash</span>
<span class="nb">export</span> <span class="nv">YELLOW</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[1;33m'</span><span class="sb">`</span>
<span class="nb">export</span> <span class="nv">LIGHT_CYAN</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[1;36m'</span><span class="sb">`</span>
<span class="nb">export</span> <span class="nv">GREEN</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[0;32m'</span><span class="sb">`</span>
<span class="nb">export</span> <span class="nv">NOCOLOR</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[0m'</span><span class="sb">`</span>
<span class="nb">export</span> <span class="nv">LESS</span><span class="o">=</span><span class="s2">"-iMSx4 -FXR"</span>
<span class="nb">export</span> <span class="nv">PAGER</span><span class="o">=</span><span class="s2">"sed \"s/^\(([0-9]\+ [rows]\+)\)/</span><span class="nv">$GREEN</span><span class="s2">\1</span><span class="nv">$NOCOLOR</span><span class="s2">/;s/^\(-\[\ RECORD\ [0-9]\+\ \][-+]\+\)/</span><span class="nv">$GREEN</span><span class="s2">\1</span><span class="nv">$NOCOLOR</span><span class="s2">/;s/|/</span><span class="nv">$GREEN</span><span class="s2">|</span><span class="nv">$NOCOLOR</span><span class="s2">/g;s/^\([-+]\+\)/</span><span class="nv">$GREEN</span><span class="s2">\1</span><span class="nv">$NOCOLOR</span><span class="s2">/\" 2>/dev/null | less"</span>
</pre></div>
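<p>To see what the <code>PAGER</code> pipeline does without starting psql, two of its sed expressions can be tried on sample output. This is just a sketch: the <code>colorize</code> function name is made up for illustration, and <code>\+</code> in the patterns assumes <span class="caps">GNU</span> sed.</p>

```shell
#!/bin/bash
GREEN=$(printf '\033[0;32m')
NOCOLOR=$(printf '\033[0m')

# Hypothetical helper: applies two of the expressions from PAGER.
# The first wraps the "(N rows)" footer in green, the second paints
# every column separator "|".
colorize() {
    sed "s/^\(([0-9]\+ [rows]\+)\)/$GREEN\1$NOCOLOR/;s/|/$GREEN|$NOCOLOR/g"
}

# Feed it something that looks like psql output.
printf ' id | name\n(2 rows)\n' | colorize
```

<p>In a terminal the separator and the footer should come out green, while the rest of the text keeps the default color.</p>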
<p>And here is a <code>.psqlrc</code> example:</p>
<div class="highlight"><pre><span></span><span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">QUIET</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">ON_ERROR_ROLLBACK</span><span class="w"> </span><span class="n">interactive</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">VERBOSITY</span><span class="w"> </span><span class="n">verbose</span><span class="w"></span>
<span class="err">\</span><span class="n">x</span><span class="w"> </span><span class="n">auto</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">PROMPT1</span><span class="w"> </span><span class="s1">'%[%033[38;5;27m%]%`hostname -s`%[%033[38;5;102m%]/%/ %[%033[31;5;27m%]%`/var/lib/pgsql/.role.sh`%[%033[0m%] %# '</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">PROMPT2</span><span class="w"> </span><span class="s1">''</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">HISTFILE</span><span class="w"> </span><span class="o">~/</span><span class="p">.</span><span class="n">psql_history</span><span class="o">-</span><span class="w"> </span><span class="err">:</span><span class="n">DBNAME</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">HISTCONTROL</span><span class="w"> </span><span class="n">ignoredups</span><span class="w"></span>
<span class="err">\</span><span class="n">pset</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="o">[</span><span class="n">null</span><span class="o">]</span><span class="w"></span>
<span class="err">\</span><span class="n">pset</span><span class="w"> </span><span class="n">pager</span><span class="w"> </span><span class="n">always</span><span class="w"></span>
<span class="err">\</span><span class="n">timing</span><span class="w"></span>
<span class="err">\</span><span class="n">unset</span><span class="w"> </span><span class="n">QUIET</span><span class="w"></span>
</pre></div>
<p>The script for determining the host role (<code>/var/lib/pgsql/.role.sh</code>) is really simple:</p>
<div class="highlight"><pre><span></span><span class="ch">#!/bin/bash</span>
<span class="nv">res</span><span class="o">=</span><span class="sb">`</span>psql postgres -t -A -c <span class="s1">'show transaction_read_only;'</span><span class="sb">`</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$res</span><span class="s2">"</span> <span class="o">==</span> <span class="s1">'off'</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
<span class="nb">echo</span> <span class="s1">'M'</span>
<span class="k">else</span>
<span class="nb">echo</span> <span class="s1">'R'</span>
<span class="k">fi</span>
</pre></div>
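<p>If psql cannot connect, the script above gets an empty result and prints <code>R</code>. A slightly more defensive variant might map the setting to a letter in a separate function. This is only a sketch: the <code>role_letter</code> name and the <code>?</code> marker for an unknown state are my assumptions, not part of the original setup.</p>

```shell
#!/bin/bash
# Hypothetical helper: maps the value of transaction_read_only
# to a one-letter role marker for the prompt.
role_letter() {
    case "$1" in
        off) echo 'M' ;;   # read-write, so treat the host as a master
        on)  echo 'R' ;;   # read-only, so treat the host as a replica
        *)   echo '?' ;;   # empty or unexpected output, e.g. psql failed
    esac
}

# Intended use inside .role.sh (left commented out so that the sketch
# runs without a database):
# role_letter "$(psql postgres -t -A -c 'show transaction_read_only;' 2>/dev/null)"
```

<p>The mapping stays the same as in the original script; only the failure case gets its own marker.</p>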
<p>And here are screenshots of the resulting psql prompt:
<a href="https://simply.name/images/psql1.png"><img alt="Colorized psql for dark backgrounds" src="https://simply.name/images/psql1.png"></a></p>
<p><a href="https://simply.name/images/psql2.png"><img alt="Colorized psql for light backgrounds" src="https://simply.name/images/psql2.png"></a></p>
<p>Enjoy!</p>Yet another way to colorize psql2014-11-07T21:00:00+03:002015-05-10T20:00:00+03:00d0ubletag:simply.name,2014-11-07:/ru/yet-another-psql-color-prompt.html<p>In this post you will find screenshots and configs for yet another way of colorizing psql. The goal was a color scheme that looks equally good on terminals with light and dark backgrounds.</p>
<p>An example <code>.bashrc</code> file:</p>
<div class="highlight"><pre><span></span><span class="ch">#!/bin/bash</span>
<span class="nb">export</span> <span class="nv">YELLOW</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[1;33m'</span><span class="sb">`</span>
<span class="nb">export</span> <span class="nv">LIGHT_CYAN</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[1 …</span></pre></div><p>In this post you will find screenshots and configs for yet another way of colorizing psql. The goal was a color scheme that looks equally good on terminals with light and dark backgrounds.</p>
<p>An example <code>.bashrc</code> file:</p>
<div class="highlight"><pre><span></span><span class="ch">#!/bin/bash</span>
<span class="nb">export</span> <span class="nv">YELLOW</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[1;33m'</span><span class="sb">`</span>
<span class="nb">export</span> <span class="nv">LIGHT_CYAN</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[1;36m'</span><span class="sb">`</span>
<span class="nb">export</span> <span class="nv">GREEN</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[0;32m'</span><span class="sb">`</span>
<span class="nb">export</span> <span class="nv">NOCOLOR</span><span class="o">=</span><span class="sb">`</span><span class="nb">echo</span> -e <span class="s1">'\033[0m'</span><span class="sb">`</span>
<span class="nb">export</span> <span class="nv">LESS</span><span class="o">=</span><span class="s2">"-iMSx4 -FXR"</span>
<span class="nb">export</span> <span class="nv">PAGER</span><span class="o">=</span><span class="s2">"sed \"s/^\(([0-9]\+ [rows]\+)\)/</span><span class="nv">$GREEN</span><span class="s2">\1</span><span class="nv">$NOCOLOR</span><span class="s2">/;s/^\(-\[\ RECORD\ [0-9]\+\ \][-+]\+\)/</span><span class="nv">$GREEN</span><span class="s2">\1</span><span class="nv">$NOCOLOR</span><span class="s2">/;s/|/</span><span class="nv">$GREEN</span><span class="s2">|</span><span class="nv">$NOCOLOR</span><span class="s2">/g;s/^\([-+]\+\)/</span><span class="nv">$GREEN</span><span class="s2">\1</span><span class="nv">$NOCOLOR</span><span class="s2">/\" 2>/dev/null | less"</span>
</pre></div>
<p>And the matching <code>.psqlrc</code> example:</p>
<div class="highlight"><pre><span></span><span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">QUIET</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">ON_ERROR_ROLLBACK</span><span class="w"> </span><span class="n">interactive</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">VERBOSITY</span><span class="w"> </span><span class="n">verbose</span><span class="w"></span>
<span class="err">\</span><span class="n">x</span><span class="w"> </span><span class="n">auto</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">PROMPT1</span><span class="w"> </span><span class="s1">'%[%033[38;5;27m%]%`hostname -s`%[%033[38;5;102m%]/%/ %[%033[31;5;27m%]%`/var/lib/pgsql/.role.sh`%[%033[0m%] %# '</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">PROMPT2</span><span class="w"> </span><span class="s1">''</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">HISTFILE</span><span class="w"> </span><span class="o">~/</span><span class="p">.</span><span class="n">psql_history</span><span class="o">-</span><span class="w"> </span><span class="err">:</span><span class="n">DBNAME</span><span class="w"></span>
<span class="err">\</span><span class="k">set</span><span class="w"> </span><span class="n">HISTCONTROL</span><span class="w"> </span><span class="n">ignoredups</span><span class="w"></span>
<span class="err">\</span><span class="n">pset</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="o">[</span><span class="n">null</span><span class="o">]</span><span class="w"></span>
<span class="err">\</span><span class="n">pset</span><span class="w"> </span><span class="n">pager</span><span class="w"> </span><span class="n">always</span><span class="w"></span>
<span class="err">\</span><span class="n">timing</span><span class="w"></span>
<span class="err">\</span><span class="n">unset</span><span class="w"> </span><span class="n">QUIET</span><span class="w"></span>
</pre></div>
<p>The script for determining the host role (<code>/var/lib/pgsql/.role.sh</code>) is quite simple:</p>
<div class="highlight"><pre><span></span><span class="ch">#!/bin/bash</span>
<span class="nv">res</span><span class="o">=</span><span class="sb">`</span>psql postgres -t -A -c <span class="s1">'show transaction_read_only;'</span><span class="sb">`</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$res</span><span class="s2">"</span> <span class="o">==</span> <span class="s1">'off'</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
<span class="nb">echo</span> <span class="s1">'M'</span>
<span class="k">else</span>
<span class="nb">echo</span> <span class="s1">'R'</span>
<span class="k">fi</span>
</pre></div>
<p>And of course, screenshots of psql with these settings:
<a href="https://simply.name/images/psql1.png"><img alt="Colorized psql for dark backgrounds" src="https://simply.name/images/psql1.png"></a></p>
<p><a href="https://simply.name/images/psql2.png"><img alt="Colorized psql for light backgrounds" src="https://simply.name/images/psql2.png"></a></p>
<p>Enjoy!</p>Pgcheck2014-10-21T15:20:00+04:002014-10-21T15:20:00+04:00d0ubletag:simply.name,2014-10-21:/pgcheck.html<p>A month ago I <a href="https://simply.name/ru/video-pg-meetup-yandex2015.html">spoke</a> about the first steps of Yandex.Mail with PostgreSQL and, in particular, about our tools for providing fault tolerance. One of them is pgcheck, a tool for monitoring backend databases from <a href="http://plproxy.projects.pgfoundry.org/doc/tutorial.html"><span class="caps">PL</span>/Proxy</a> hosts and changing the output of the <code>plproxy.get_cluster_partitions</code> function to control the load on the databases …</p><p>A month ago I <a href="https://simply.name/ru/video-pg-meetup-yandex2015.html">spoke</a> about the first steps of Yandex.Mail with PostgreSQL and, in particular, about our tools for providing fault tolerance. One of them is pgcheck, a tool for monitoring backend databases from <a href="http://plproxy.projects.pgfoundry.org/doc/tutorial.html"><span class="caps">PL</span>/Proxy</a> hosts and changing the output of the <code>plproxy.get_cluster_partitions</code> function to control the load on the databases.</p>
<p>More info can be found on <a href="https://github.com/yandex/pgcheck">github</a>, as pgcheck is now open source. Enjoy.</p>Pgcheck2014-10-21T15:20:00+04:002014-10-21T15:20:00+04:00d0ubletag:simply.name,2014-10-21:/ru/pgcheck.html<p>A month ago I <a href="https://simply.name/ru/video-pg-meetup-yandex2015.html">spoke</a> about the first steps of Yandex.Mail with PostgreSQL and, in particular, about our fault tolerance tools. One of them is pgcheck, a tool for monitoring backend databases from <a href="http://plproxy.projects.pgfoundry.org/doc/tutorial.html"><span class="caps">PL</span>/Proxy</a> hosts and changing the output of the <code>plproxy.get_cluster_partitions</code> function to distribute load across the databases.</p>
<p>More information can be found on <a href="https://github.com/yandex/pgcheck">github …</a></p><p>A month ago I <a href="https://simply.name/ru/video-pg-meetup-yandex2015.html">spoke</a> about the first steps of Yandex.Mail with PostgreSQL and, in particular, about our fault tolerance tools. One of them is pgcheck, a tool for monitoring backend databases from <a href="http://plproxy.projects.pgfoundry.org/doc/tutorial.html"><span class="caps">PL</span>/Proxy</a> hosts and changing the output of the <code>plproxy.get_cluster_partitions</code> function to distribute load across the databases.</p>
<p>More information can be found on <a href="https://github.com/yandex/pgcheck">github</a>. Hooray!</p>