systemd

mirror of https://github.com/morgan9e/systemd synced 2026-04-14 00:14:32 +09:00

Author	SHA1	Message	Date
Daan De Meyer	9e26ced980	core: Add RootDirectoryFileDescriptor= RootDirectory= but via an open_tree() file descriptor. This allows setting up the execution environment for a service by the client in a mount namespace and then starting a transient unit in that execution environment using the new property. We also add --root-directory= and --same-root-dir= to systemd-run to have it run services within the given root directory. As systemd-run might be invoked from a different mount namespace than what systemd is running in, systemd-run opens the given path with open_tree() and then sends it to systemd using the new RootDirectoryFileDescriptor= property.	2025-10-31 13:09:51 +01:00
Daan De Meyer	836e4e7ea8	core: Clean up includes Split out of #37344.	2025-05-22 09:37:20 +02:00
Daan De Meyer	4ea4abb651	core: Remove circular dependencies between headers Currently there are various circular dependencies between headers in core/. Let's get rid of these by making judicious use of forward declarations and moving includes into implementation files instead of having them in header files. Getting rid of circular header includes simplifies the code and makes various clang based tooling such as iwyu work much better on our code. The most important change is getting rid of the manager.h include in unit.h which is possible thanks to the previous commits. We also move the OOMPolicy and StatusType enums to unit.h to remove the need for other unit headers to include manager.h to get access to these enums.	2025-04-23 10:33:35 +02:00
Mike Yuan	a53e92a17c	core/service: place occurrences of SERVICE_MOUNTING closer to reload states	2024-10-22 19:19:47 +02:00
Mike Yuan	32af4dd80f	core/service: use array rather than list for extra fds, limit max number Follow-up for `3543456f84` I don't think list is particularly useful here. The passed fds are constant for the lifetime of service, and with this commit we track the number of extra fds in a dedicated var anyway.	2024-10-11 18:22:19 +02:00
Mike Yuan	6286f213f5	core/service: use LIST_HEAD where appropriate	2024-10-11 18:21:09 +02:00
Ryan Wilson	3543456f84	Add ExtraFileDescriptor property to StartTransientUnit dbus API This adds the ExtraFileDescriptor property to StartTransient dbus API with format "a(hs)" - array of (file descriptor, name) pairs. The FD will be passed to the unit via sd_notify like Socket and OpenFile. systemctl show also shows ExtraFileDescriptorName for these transient units. We only show the name passed to dbus as the FD numbers will change once passed over the unix socket and are duplicated, so it's confusing to display the numbers. We do not add this functionality for systemd-run or general systemd service units as it is not useful for general systemd services. Arguably, it could be useful for systemd-run in bash scripts but we prefer to be cautious and not expose the API yet. Fixes: #34396	2024-10-07 09:01:48 -07:00
Luca Boccassi	5162829ec8	core: do BindMount/MountImage operations in async control process These operations might require slow I/O, and thus might block PID1's main loop for an indeterminate amount of time. Instead of performing them inline, fork a worker process and stash away the D-Bus message, and reply once we get a SIGCHLD indicating they have completed. That way we don't break compatibility and callers can continue to rely on the fact that when they get the method reply the operation either succeeded or failed. To keep backward compatibility, unlike reload control processes, these are run inside init.scope and not the target cgroup. Unlike ExecReload, this is under our control and is not defined by the unit. This is necessary because previously the operation also wasn't run from the target cgroup, so suddenly forking a copy-on-write copy of pid1 into the target cgroup will make memory usage spike, and if there is a MemoryMax= or MemoryHigh= set and the cgroup is already close to the limit, it will cause an OOM kill, where previously it would have worked fine.	2024-08-29 12:48:55 +01:00
Luca Boccassi	7d8bbfbe08	service: add 'debug' option to RestartMode= One of the major pain points of managing fleets of headless nodes is that when something fails at startup, unless debug level was already enabled (which it usually isn't, as it's a firehose), one needs to manually enable it and pray the issue can be reproduced, which often is really hard and time consuming, just to get extra info. Usually the extra log messages are enough to triage an issue. This new option makes it so that when a service fails and is restarted due to Restart=, log level for that unit is set to debug, so that all setup code in pid1 and sd-executor logs at debug level, and also a new DEBUG_INVOCATION=1 env var is passed to the service itself, so that it knows it should start with a higher log level. Once the unit succeeds or reaches the rate limit the original level is restored.	2024-08-27 12:24:45 +01:00
Mike Yuan	ce31dbf445	core/service: drop redundant flush_n_restarts indicator Now that we track auto-restarts with a dedicated state, there's no need for a separate variable for this. I also took the chance to reorder some struct members.	2024-08-04 09:37:59 +09:00
Mike Yuan	9c025022d9	core/service: store BUSERROR= & VARLINKERROR= received through notification Closes #6073	2024-06-20 19:03:44 +02:00
Mike Yuan	029df9ed7a	core/service: drop unused bus_name_owner Follow-up for `fc67a943d9` After the mentioned commit, we no longer need to record the owner to restore the previous bus owner state. Therefore, bus_name_owner is effectively unused. Kill it.	2024-06-16 19:00:39 +02:00
Lennart Poettering	9cc545447e	core: split out cgroup specific state fields from Unit → CGroupRuntime This refactors the Unit structure a bit: all cgroup-related state fields are moved to a new structure CGroupRuntime, which is only allocated as we realize a cgroup. This is both a nice cleanup and should make unit structures considerably smaller that have no cgroup associated, because never realized or because they belong to a unit type that doesn't have cgroups anyway. This makes things nicely symmetric: ExecContext → static user configuration about execution; ExecRuntime → dynamic user state of execution; CGroupContext → static user configuration about cgroups; CGroupRuntime → dynamic user state of cgroups. And each time the XyzContext is part of the unit type structures such as Service or Slice that need it, but the runtime object is only allocated when a unit is started.	2024-02-16 10:17:40 +01:00
Lennart Poettering	8017ed7e0e	service: don't try to determine selinux label for socket activation if RootImage= is used We cannot determine the SELinux label ahead of time if RootImage= is used, since we'd have to mount the image then, hence don't, and handle this cleanly and gracefully. While we are at it, stop "reaching over" so much from the socket code to the service code, and instead provide a function in service.c that does most of the hard work and that socket.c just calls. While we are at it, add debug logging and stuff. I noticed the issue when also noticing #30560, but that one is harder to fix, hence I avoided it for now.	2023-12-22 11:51:51 +09:00
Lennart Poettering	c79ab77cd3	core: reference main/control pid of .service units via PidRef The first conversion to PidRef. It's mostly an exercise of search/replace, but with some special care taken for life-cycle (i.e. we need to destroy the structure properly once done, to release the pidfd). It also uses pidfd based killing for some of the killing but leaves most as it is to make the conversion minimal.	2023-09-09 14:03:31 +02:00
Richard Phibel	e568fea9fc	service: add new RestartMode option When this option is set to direct, the service restarts without entering a failed state. Dependent units are not notified of transitory failure. This is useful for the following use case: We have a target with Requires=my-service, After=my-service. my-service.service is a oneshot service and has Restart=on-failure in its definition. my-service.service can get stuck for various reasons and time out, in which case it is restarted. Currently, when it fails the first time, the target fails, even though my-service is restarted. The behavior we're looking for is that until my-service is not restarted anymore, the target stays pending waiting for my-service.service to start successfully or fail without being restarted anymore.	2023-07-06 14:33:52 +02:00
Mike Yuan	49b34f75e7	core: get rid of unused Service.will_auto_restart logic The announced new behavior for OnFailure= never worked properly, and we've fixed the document instead in #27675. Therefore, let's get rid of the unused logic completely. More at #27594. The to-be-added RestartMode= option should cover the use case hopefully. Closes #27594	2023-05-24 21:37:02 +08:00
Mike Yuan	e9f17fa8dd	core: rename RestartSecMax to RestartMaxDelaySec	2023-05-18 00:23:49 +08:00
Lennart Poettering	81a1d6d679	service: rename service_close_socket_fd() → service_release_socket_fd() Just to match service_release_stdio_fd() and service_release_fd_store() in the name, since they do similar things. This follows the concept that we "release" resources, and this is all generically wrapped in "service_release_resources()".	2023-04-13 06:44:27 +02:00
Lennart Poettering	b9c1883a9c	service: add ability to pin fd store Oftentimes it is useful to allow the per-service fd store to survive longer than for a restart. This is useful in various scenarios: 1. An fd to some security relevant object needs to be stashed somewhere, that should not be cleaned automatically, because the security enforcement would be dropped then. 2. A user namespace fd should be allocated on first invocation and be kept around until the user logs out (i.e. systemd --user ends), à la #16328 (This does not implement what #16318 asks for, but should solve the use-case discussed there.) 3. There's interest in allowing a concept of "userspace reboots" where the kernel stays running, and userspace is swapped out (i.e. all services exit, and the rootfs transitioned into a new version of it) while keeping some select resources pinned, very similar to how we implement a switch root. Thus it is useful to allow services to exit, while leaving their fds around till the very end. This is exposed through a new FileDescriptorStorePreserve= setting that is closely modelled after RuntimeDirectoryPreserve= (in fact it reused the same internal type), since we want similar behaviour in the end, and quite often they probably want to be used together.	2023-04-13 06:44:27 +02:00
Mike Yuan	5171356eee	core: always calculate the next restart interval Follow-up for #26902 and #26971 Let's always calculate the next restart interval since that's more useful. For that, we add 1 to s->n_restarts unconditionally, and change RestartUSecCurrent property to RestartUSecNext.	2023-03-31 01:22:58 +01:00
Lennart Poettering	2e34aed32b	Merge pull request #26971 from poettering/autostart-dead-failed pid1: introduce new SERVICE_{DEAD|FAILED}_BEFORE_AUTO_RESTART service…	2023-03-29 21:41:31 +02:00
Lennart Poettering	a7b6eee4ac	Merge pull request #26968 from DaanDeMeyer/exec-runtime core: Introduce unit private exec runtime	2023-03-29 21:40:48 +02:00
Lennart Poettering	a1d315730f	pid1: introduce new SERVICE_{DEAD|FAILED}_BEFORE_AUTO_RESTART service substates When a service deactivates and is then automatically restarted via Restart= we currently quickly transition through SERVICE_DEAD/SERVICE_FAILED. Which is weird given it's not the normal ("permanent") dead/failed state, but a transitory one we immediately leave from again. We do this so that software that looks for failures/successes can take notice, even if we restart as a consequence of the deactivation. Let's clean this up a bit: let's introduce two new states: SERVICE_DEAD_BEFORE_AUTO_RESTART and SERVICE_FAILED_BEFORE_AUTO_RESTART that are used for the transitory states. Both the SERVICE_DEAD and SERVICE_DEAD_BEFORE_AUTO_RESTART will map to the high-level UNIT_INACTIVE state though (and similar for the respective failed states). This means the high-level state machine won't change by this, only the low-level one. This clearly separates the substates, which makes the state engine cleaner, and allows clients to follow precisely whether we are in a transitory dead/failed state, or a permanent one, by looking at the service substate. Moreover it allows us to remove the 'n_keep_fd_store' which so far we used to ensure the fdstore was not released during this transitory dead/failed state but only during the permanent one. Since we can now distinguish these states properly we can just use that. This has been bugging me for a while. Let's clean this up. Note that the unit restart logic is already nicely covered in the test suite, hence this adds no new tests for that. And yes, this could be considered a compat break, but so far we took the liberty to make changes to the low-level state machine (i.e. SERVICE_xyz states, sometimes called "substates") without considering this a bad breakage – the high-level state machine (i.e. UNIT_xyz states) should be considered API that cannot be changed.	2023-03-29 17:22:07 +02:00
Daan De Meyer	1522077269	core: Move DynamicCreds into ExecRuntime This is just another piece of runtime data so let's store it in ExecRuntime alongside the other runtime data.	2023-03-27 14:47:30 +02:00
Daan De Meyer	28135da3cd	core: Introduce unit private exec runtime Currently, exec runtimes can be shared between units (using JoinsNamespaceOf=). Let's introduce a concept of a private exec runtime that isn't shared with JoinsNamespaceOf=. The existing ExecRuntime struct is renamed to ExecSharedRuntime and becomes a private member of the new private ExecRuntime.	2023-03-27 14:46:57 +02:00
Daan De Meyer	e76506b748	execute: Rename ExecRuntime to ExecSharedRuntime Preparation for next commit	2023-03-27 14:05:30 +02:00
Mike Yuan	be1adc27fc	core: add RestartSteps= and RestartSecMax= for exponentially increasing interval between restarts RestartSteps= accepts a positive integer as the number of steps to take to increase the interval between auto-restarts from RestartSec= to RestartSecMax=, or 0 to disable it. Closes #6129	2023-03-27 19:31:12 +08:00
Mike Yuan	19dff6914d	core: support overriding NOTIFYACCESS= through sd-notify during runtime Closes #25963	2023-03-22 06:33:12 +08:00
Lennart Poettering	3bd28bf721	pid1: add new Type=notify-reload service type Fixes: #6162	2023-01-10 18:28:38 +01:00
Richard Phibel	cd48e23f6a	core: add OpenFile setting	2023-01-10 15:16:26 +01:00
Nishal Kulkarni	38c41427c7	core/oomd: Use oom-kill ServiceResult for oomd To notify user of kill events from systemd-oomd we now use `SERVICE_FAILURE_OOM_KILL` as the failure result. `unit_check_oomd_kill` now calls `notify_cgroup_oom` to update the service result to `oom-kill`. We add a new xattr `user.oomd_ooms` to keep track of the OOM kills initiated by systemd-oomd, this helps us resolve a race between sending SIGKILL to processes and checking for OOM kill status from the xattr. Related to: #20649	2022-03-22 17:57:59 +05:30
Lennart Poettering	e39eb045a5	pid1: lookup owning PID of BusName= name of services asynchronously A first step of removing blocking calls to the D-Bus broker from PID 1. There's a lot more to go (i.e. grep src/core/ for sd_bus_creds basically), but it's a start. Removing blocking calls to D-Bus broker deals systematically with deadlocks caused by dbus-daemon blocking on synchronous IPC calls back to PID1 (e.g. Varlink calls through nss-systemd). Bugs such as #15316. Also-see: https://github.com/systemd/systemd/pull/22038#issuecomment-1042958390	2022-02-18 10:49:31 +01:00
Lennart Poettering	3fabebf45e	socket: always pass socket, fd and SocketPeer ownership to service together For per-connection socket instances we currently maintain three fields related to the socket: a reference to the Socket unit, the connection fd, and a reference to the SocketPeer object that counts socket peers. Let's synchronize their lifetime, i.e. always set all three together or unset them together, so that their reference counters stay synchronous. This will in particular ensure that we'll drop the SocketPeer reference whenever we leave an active state of the service unit, i.e. at the same time we close the fd for it. Fixes: #20685	2021-11-25 00:05:03 +01:00
Henri Chain	596e447076	Reintroduce ExitType This introduces `ExitType=main|cgroup` for services. Similar to how `Type` specifies the launch of a service, `ExitType` is concerned with how systemd determines that a service exited. - If set to `main` (the current behavior), the service manager will consider the unit stopped when the main process exits. - The `cgroup` exit type is meant for applications whose forking model is not known ahead of time and which might not have a specific main process. The service will stay running as long as at least one process in the cgroup is running. This is intended for transient or automatically generated services, such as graphical applications inside of a desktop environment. Motivation for this is #16805. The original PR (#18782) was reverted (#20073) after realizing that the exit status of "the last process in the cgroup" can't reliably be known (#19385) This version instead uses the main process exit status if there is one and just listens to the cgroup empty event otherwise. The advantages of a service with `ExitType=cgroup` over scopes are: - Integrated logging / stdout redirection - Avoids the race / synchronisation issue between launch and scope creation - More extensive use of drop-ins and thus distro-level configuration: by moving from scopes to services we can have drop ins that will affect properties that can only be set during service creation, like `OOMPolicy` and security-related properties - It makes systemd-xdg-autostart-generator usable by fixing [1], as obviously only services can be used in the generator, not scopes. [1] https://bugs.kde.org/show_bug.cgi?id=433299	2021-11-08 10:15:23 +01:00
Albert Brox	5918a93355	core: implement RuntimeMaxDeltaSec directive	2021-09-28 16:46:20 +02:00
Zbigniew Jędrzejewski-Szmek	abaf5edd08	Revert "Introduce ExitType" This reverts commit `cb0e818f7c`. After this was merged, some design and implementation issues were discovered, see the discussion in #18782 and #19385. They certainly can be fixed, but so far nobody has stepped up, and we're nearing a release. Hopefully, this feature can be merged again after a rework. Fixes #19345.	2021-06-30 21:56:47 +02:00
Zbigniew Jędrzejewski-Szmek	35243b7736	test-unit-serialize: add a very basic test that command deserialization works We should test both serialization and deserialization works properly. But the serialization/deserialization code is deeply entwined with the manager state, and I think quite a bit of refactoring will be required before this is possible. But let's at least add this simple test for now.	2021-04-26 16:15:26 +02:00
Henri Chain	cb0e818f7c	Introduce ExitType	2021-03-31 10:26:07 +02:00
Zbigniew Jędrzejewski-Szmek	2d93c20e5f	tree-wide: use -EINVAL for enum invalid values As suggested in https://github.com/systemd/systemd/pull/11484#issuecomment-775288617. This does not touch anything exposed in src/systemd. Changing the defines there would be a compatibility break. Note that tests are broken after this commit. They will be fixed in the next one.	2021-02-10 14:46:59 +01:00
Yu Watanabe	db9ecf0501	license: LGPL-2.1+ -> LGPL-2.1-or-later	2020-11-09 13:23:58 +09:00
Jan Klötzke	bf76080180	core: let user define start-/stop-timeout behaviour The usual behaviour when a timeout expires is to terminate/kill the service. This is what users usually want in production systems. To debug services that fail to start/stop (especially sporadic failures) it might be necessary to trigger the watchdog machinery and write core dumps, though. Likewise, it is usually just a waste of time to gracefully stop a stuck service. Instead it might save time to go directly into kill mode. This commit adds two new options to services: TimeoutStartFailureMode= and TimeoutStopFailureMode=. Both take the same values and tweak the behavior of systemd when a start/stop timeout expires: * 'terminate': is the default behaviour as it has always been, * 'abort': triggers the watchdog machinery and will send SIGABRT (unless WatchdogSignal was changed) and * 'kill' will directly send SIGKILL. To handle the stop failure mode in stop-post state too a new final-watchdog state needs to be introduced.	2020-06-09 10:04:57 +02:00
Chris Down	4793c31083	service: Display updated WatchdogUSec from sd_notify Suppose a service has WatchdogSec set to 2 seconds in its unit file. I then start the service and WatchdogUSec is set correctly: % systemctl --user show psi-notify -p WatchdogUSec WatchdogUSec=2s Now I call `sd_notify(0, "WATCHDOG_USEC=10000000")`. The new timer seems to have taken effect, since I only send `WATCHDOG=1` every 4 seconds, and systemd isn't triggering the watchdog handler. However, `systemctl show` still shows WatchdogUSec as 2s: % systemctl --user show psi-notify -p WatchdogUSec WatchdogUSec=2s This seems surprising, since this "original" watchdog timer isn't the one taking effect any more. This patch makes it so that we instead display the new watchdog timer after sd_notify(WATCHDOG_USEC): % systemctl --user show psi-notify -p WatchdogUSec WatchdogUSec=10s Fixes #15726.	2020-05-27 09:09:40 +02:00
Kenny Levinsen	3052049260	core: (De-)Serialize poll flag for fds in fdstore This replaces manual string splitting and unescaping with extract_first_word.	2020-04-30 19:42:53 +02:00
Yu Watanabe	12213aed12	core: move timeout_clean_usec from Service to ExecContext	2019-08-28 23:09:54 +09:00
Anita Zhang	31cd5f63ce	core: ExecCondition= for services Closes #10596	2019-07-17 11:35:02 +02:00
Lennart Poettering	4c2f584230	core: hook up service unit type with the new clean operation The implementation is pretty straightforward: when we get a request to clean some type of resources we fork off a process doing that, and while it is running we are in the "cleaning" state.	2019-07-11 12:18:51 +02:00
Anita Zhang	b3d593673c	core: add ExecStartXYZEx= with dbus support for executable prefixes Closes #11654	2019-05-30 20:41:42 -07:00
Yu Watanabe	9c79f0e0a0	core: add assertion in two inline functions	2019-04-14 20:46:24 +09:00
Yu Watanabe	54c1a6ab8c	core: change type of Service::timeout_abort_set to bool Follow-up for `dc653bf487` (#11211).	2019-04-14 20:13:47 +09:00


125 Commits