nixos/postgresql: implement auto-restart & rework dependencies of postgresql.target
On my employer's NixOS-based platform, PostgreSQL is configured with
`Restart=always`, which unfortunately was never upstreamed.
This, however, revealed an interesting problem when using bi-directional
BindsTo: when killing `postgresql.service`, sometimes both the service &
target start back up and sometimes they don't. According to an upstream
bug report[1] this is a known problem because two conflicting
operations are scheduled in a single transaction, namely
* When (auto-)restarting, restart jobs for all units bound to the
restarting unit are scheduled immediately[2].
* Due to the `BindsTo` relationship, a stop-job for `postgresql.target`
is scheduled immediately by the manager loop[3]. This is caused by the
`UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT` "atom" which is ONLY set for a
BindsTo relationship[4].
When the stop-job is processed first, the restart is inhibited:
Jul 12 13:25:51 nixos systemd[1]: postgresql.service: Main process exited, code=killed, status=9/KILL
Jul 12 13:25:51 nixos systemd[1]: postgresql.service: Changed running -> stop-sigterm
Jul 12 13:25:51 nixos systemd[1]: postgresql.target: Trying to enqueue job postgresql.target/stop/replace
Jul 12 13:25:51 nixos systemd[1]: postgresql.service: Installed new job postgresql.service/stop as 80053
Jul 12 13:25:51 nixos systemd[1]: postgresql.target: Installed new job postgresql.target/stop as 80052
Jul 12 13:25:51 nixos systemd[1]: postgresql.target: Enqueued job postgresql.target/stop as 80052
[...]
Jul 12 13:25:51 nixos systemd[1]: postgresql.service: Service restart not allowed.
It's subtle and not obvious from the man page, but units are stopped
differently when using `PartOf=` or `Requires=`. These don't carry the
`UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT` atom; instead, the stop/start of the
target is scheduled AFTER the stop-job of postgresql.service, which
is turned into a start-job because of Restart=always:
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Main process exited, code=killed, status=9/KILL
[...]
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Failed with result 'signal'.
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Service will restart (restart setting)
[...]
Jul 12 13:33:00 nixos systemd[1]: postgresql.target: Installed new job postgresql.target/restart as 80996
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Installed new job postgresql.service/restart as 80907
[...]
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Scheduled restart job, restart counter is at 1.
[...]
Jul 12 13:33:00 nixos systemd[1]: Stopped target postgresql.target.
Jul 12 13:33:00 nixos systemd[1]: postgresql.target: Converting job postgresql.target/restart -> postgresql.target/start
Jul 12 13:33:00 nixos systemd[1]: Stopping postgresql.target...
[...]
Jul 12 13:33:00 nixos systemd[1]: Stopped postgresql.service.
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Converting job postgresql.service/restart -> postgresql.service/start
[...]
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Changed dead -> running
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Job 80907 postgresql.service/start finished, result=done
Jul 12 13:33:00 nixos systemd[1]: Started postgresql.service.
Jul 12 13:33:00 nixos systemd[1]: postgresql.target: Changed dead -> active
[...]
Jul 12 13:33:00 nixos systemd[1]: Reached target postgresql.target.
Do note that the stop job (including the restart) of postgresql.service
is fully processed here before dealing with PartOf/ConsistsOf
relationships.
I tested this against the following cases:
| Unit | Action | Propagates to |
| ------------------ | ------------ | ------------------ |
| postgresql.target | restart | postgresql.service |
| postgresql.target | start | postgresql.service |
| postgresql.target | stop | postgresql.service |
| postgresql.service | start | postgresql.target |
| postgresql.service | restart | postgresql.target |
| postgresql.service | stop | postgresql.target |
| postgresql.service | auto-restart | postgresql.target |
| postgresql.service | failure | postgresql.target |
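
The auto-restart case from the table could be exercised with a NixOS VM
test roughly like the sketch below. This is not the test used for this
commit; the test name and node name are hypothetical, and it assumes the
module enables `Restart=always` for `postgresql.service` and that the file
is evaluated via something like `pkgs.testers.runNixOSTest`.

```nix
# Hypothetical sketch of a NixOS VM test, not part of this commit.
{ ... }:
{
  # Illustrative name only.
  name = "postgresql-auto-restart";

  nodes.machine = {
    # Assumption: the postgresql module sets Restart=always on its own.
    services.postgresql.enable = true;
  };

  testScript = ''
    machine.wait_for_unit("postgresql.target")

    # Kill the service with SIGKILL, as in the logs above.
    machine.succeed("systemctl kill --signal=KILL postgresql.service")

    # With requires/partOf instead of bi-directional bindsTo, both units
    # are expected to come back up after the automatic restart.
    machine.wait_for_unit("postgresql.service")
    machine.wait_for_unit("postgresql.target")
  '';
}
```

Since the original race only showed up sometimes, a single passing run of
such a test would not prove much; repeating the kill step several times
would be a reasonable extension.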
[1] e.g. systemd issue 8374
[2] https://github.com/systemd/systemd/blob/v256-stable/src/core/service.c#L2535-L2542
[3] https://github.com/systemd/systemd/blob/v256-stable/src/core/manager.c#L1611-L1626
[4] https://github.com/systemd/systemd/blob/v256-stable/src/core/unit-dependency-atom.c#L30-L35
@@ -769,7 +769,7 @@ in
     systemd.targets.postgresql = {
       description = "PostgreSQL";
       wantedBy = [ "multi-user.target" ];
-      bindsTo = [
+      requires = [
         "postgresql.service"
         "postgresql-setup.service"
       ];
@@ -780,8 +780,13 @@ in
 
       after = [ "network.target" ];
 
-      # To trigger the .target also on "systemctl start postgresql".
-      bindsTo = [ "postgresql.target" ];
+      # To trigger the .target also on "systemctl start postgresql" as well as on
+      # restarts & stops.
+      # Please note that postgresql.service & postgresql.target binding to
+      # each other makes the Restart=always rule racy and results
+      # in sometimes the service not being restarted.
+      wants = [ "postgresql.target" ];
+      partOf = [ "postgresql.target" ];
 
       environment.PGDATA = cfg.dataDir;
 