The cluster in action - part 1

The cluster in action

PostgreSQL delivers his services ensuring the ACID rules are enforced at any time. This chapter will give an outlook of a ``day in the life’' of a PostgreSQL’s cluster. The chapter approach is purposely generic. At this stage is very important to understand the global picture rather the technical details.

After the startup

When the cluster completes the startup procedure it starts accepting the connections. When a connection is successful then the postgres main process forks into a new backend process which is assigned to the connection for the connection’s lifetime. The fork is quite expensive and does not work very well for a high rate of connection’s requests. The maximum number of connections is set at startup and cannot be changed dynamically. Whether the connection is used or not for each connection slot are consumed 400 bytes of shared memory.
Alongside the client’s request the cluster have several subprocesses working in the background.

The write ahead log

The data pages are stored into the shared buffer either for read and write. A mechanism called pinning ensures that only one backend at time is accessing the requested page. If the backend modifies the page then this becomes dirty. A dirty page is not yet written on its data file. However the page’s change is first saved on the write ahead log as WAL record and the commit status for the transactions is then in the directory clog or the directory pg_serial, depending on the transaction isolation level. The wal records are stored into a shared buffer’s area sized by the parameter wal_buffers before the flush on disk into the pg_xlog directory on fixed length segments. When a WAL segment is full then a a new one is created or recycled. When this happens there is a xlog switch. The writes on the WAL are managed by a background process called WAL writer. This process were first introduced with PostgreSQL 8.3.

The checkpoint

The cluster, on a regular basis, executes an important activity called checkpoint. The frequency of this action is governed by the time and space, measured respectively in seconds and log switches between two checkpoints. The checkpoint scans the shared buffer and writes down to the data files all the dirty pages. When the checkpoint is complete the process determines the checkpoint location and writes this information on the control file stored into the cluster’s pg_global tablespace. In the case of unclean shutdown this value is used to determine the WAL segment from where to start the crash recovery.
Before the version 8.3 the checkpoint represented a potential bottleneck because the unavoidable IO spike generated during the writes. That’s the reason why the version 8.3 introduced the concept of spread checkpoints. The cluster aims to a particular completion target time measured in percent of the checkpoint timeout. The default values are respectively 0.5 and 5 minutes. This way the checkpoint will spread over a target time of 2.5 minutes. From PostgreSQL 9.2 a new checkpointer process has been created to manage efficiently the checkpoint.