core: let user define start-/stop-timeout behaviour

The usual behaviour when a timeout expires is to terminate/kill the
service. This is what user usually want in production systems. To debug
services that fail to start/stop (especially sporadic failures) it
might be necessary to trigger the watchdog machinery and write core
dumps, though. Likewise, it is usually just a waste of time to
gracefully stop a stuck service. Instead it might save time to go
directly into kill mode.

This commit adds two new options to services: TimeoutStartFailureMode=
and TimeoutStopFailureMode=. Both take the same values and tweak the
behavior of systemd when a start/stop timeout expires:

 * 'terminate': is the default behaviour as it has always been,
 * 'abort': triggers the watchdog machinery and will send SIGABRT
   (unless WatchdogSignal was changed) and
 * 'kill' will directly send SIGKILL.

To handle the stop failure mode in stop-post state too a new
final-watchdog state needs to be introduced.
This commit is contained in:
Jan Klötzke
2019-04-16 16:45:20 +02:00
committed by Lennart Poettering
parent 8b5616fa91
commit bf76080180
10 changed files with 200 additions and 42 deletions

View File

@@ -185,6 +185,7 @@ static const char* const service_state_table[_SERVICE_STATE_MAX] = {
[SERVICE_STOP_SIGTERM] = "stop-sigterm",
[SERVICE_STOP_SIGKILL] = "stop-sigkill",
[SERVICE_STOP_POST] = "stop-post",
[SERVICE_FINAL_WATCHDOG] = "final-watchdog",
[SERVICE_FINAL_SIGTERM] = "final-sigterm",
[SERVICE_FINAL_SIGKILL] = "final-sigkill",
[SERVICE_FAILED] = "failed",

View File

@@ -127,6 +127,7 @@ typedef enum ServiceState {
SERVICE_STOP_SIGTERM,
SERVICE_STOP_SIGKILL,
SERVICE_STOP_POST,
SERVICE_FINAL_WATCHDOG, /* In case the STOP_POST executable needs to be aborted. */
SERVICE_FINAL_SIGTERM, /* In case the STOP_POST executable hangs, we shoot that down, too */
SERVICE_FINAL_SIGKILL,
SERVICE_FAILED,