systemd

mirror of https://github.com/morgan9e/systemd synced 2026-04-14 00:14:32 +09:00

Author	SHA1	Message	Date
Mike Yuan	58686034eb	core: expose transactions with ordering cycle Closes #3829 Alternative to #35417 I don't think the individual "WasOnDependencyCycle" attrs on units are particularly helpful and comprehensible, as it's really about the dep relationship between them. And as discussed, the dependency cycle is not something persistent, rather local to the currently loaded set of units and shall be reset with daemon-reload (see also https://github.com/systemd/systemd/issues/35642#issuecomment-2591296586). Hence, let's report system state as degraded and point users to the involved transactions when ordering cycles are encountered instead. Combined with log messages added in `6912eb315f` it should achieve the goal of making ordering cycles more observable, while avoiding all sorts of subtle bookkeeping in the service manager. The degraded state can be reset via the existing ResetFailed() manager-wide method.	2025-11-12 23:47:39 +01:00
Mike Yuan	b03e1b09af	core/service: rework ExecReload= + Type=notify-reload interaction, add ExecReloadPost= When Type=notify-reload got introduced, it wasn't intended to be mutually exclusive with ExecReload=. However, currently ExecReload= is immediately forked off after the service main process is signaled, leaving states in between essentially undefined. Given how broken it is, I doubt any sane user is using this setup, hence I took a stab at reworking everything: 1. Extensions are refreshed (unchanged) 2. ExecReload= is forked off without signaling the process 3a. If RELOADING=1 is sent during the ExecReload= invocation, we'd refrain from signaling the process again, instead just transition to SERVICE_RELOAD_NOTIFY directly and wait for READY=1 3b. If not, signal the process after ExecReload= finishes (from now on the same as Type=notify-reload w/o ExecReload=) 4. To accommodate the use case of performing post-reload tasks, ExecReloadPost= is introduced which executes after READY=1 The new model greatly simplifies things, as no control processes will be around in SERVICE_RELOAD_SIGNAL and SERVICE_RELOAD_NOTIFY states. See also: https://github.com/systemd/systemd/issues/37515#issuecomment-2891229652	2025-11-04 12:18:33 +01:00
Mike Yuan	48632305c7	man/org.freedesktop.systemd1: fix typo (ExecStop -> -Post)	2025-11-04 12:17:33 +01:00
Quentin Deslandes	79dd24cf14	core: Add UserNamespacePath= This allows a service to reuse the user namespace created for an existing service, similarly to NetworkNamespacePath=. The configuration of the initial user namespace (e.g. ID mapping) is preserved.	2025-11-04 10:55:04 +01:00
Yu Watanabe	63b27d1765	man: mention about UINT64_MAX value for OOMKills and ManagedOOMKills DBus properties Follow-up for `9cf6ad16dd`. Addresses https://github.com/systemd/systemd/pull/38906#discussion_r2363291116.	2025-10-05 23:09:31 +09:00
Daan De Meyer	9cf6ad16dd	core: Expose oom kills and managed oom kills as properties It can be useful for users to know this information so let's expose it as properties so it can be queried.	2025-09-19 13:54:54 +02:00
Brett Holman	04abe03189	man: correct the number of active unit states	2025-07-28 20:32:48 +01:00
Lennart Poettering	03b4a607f6	core: followups for the recent subgroup killing commits This is a follow-up for `0f23564ad4` and `6b02854f50`, as suggested here: https://github.com/systemd/systemd/pull/37855#pullrequestreview-2997596953	2025-07-10 13:32:51 +09:00
Yu Watanabe	1cf5b39d64	core: add 'DefaultRestrictSUIDSGID' config option (#38126) closes #37602, see there for extra motivation and considered alternatives. On typical systems, only few services need to create SUID/SGID files. This often is limited to the user explicitly setting suid/sgid, the `systemd-tmpfiles*` services, and the package manager. Allowing a default to globally restrict creation of suid/sgid files makes it easier to apply this restriction precisely. ## testing done - built on aarch64-linux and x86_64-linux - ran a VM test on x86_64-linux, checking for: - VM system boots successfully - defaults apply (both `yes`, `no`, and undefined) - systemd tmpfiles can set suid/sgid on journal log path - Other services explicitly defining `RestrictSUIDSGID=no` can create suid files	2025-07-10 13:30:07 +09:00
Grimmauld	97998d1cbe	core/dbus-manager: Support 'DefaultRestrictSUIDSGID' option	2025-07-09 21:45:38 +02:00
Matteo Croce	ea9826eb94	core: add options to delegate BPFFS token creation Add four new options BPFDelegate{Commands,Maps,Programs,Attachments}= in order to delegate to a BPFFS instance the permission to create tokens. The value is a list of options taken from: https://github.com/torvalds/linux/blob/v6.14/include/uapi/linux/bpf.h#L922-L1121 The special value "any" means to allow every possible value. More information about BPF tokens here: https://lwn.net/Articles/947173/	2025-07-08 22:35:29 +02:00
Matteo Croce	3a47437fc9	core: Introduce PrivateBPF= to mount a private BPFFS Add a new option PrivateBPF= to mount a new instance of bpffs within a namespace. PrivateBPF= can be set to "no" to use the host bpffs in readonly mode and "yes" to do a new mount. The mount is done with the new fsopen()/fsmount() API because in future we'll hook some commands between the two calls.	2025-07-08 22:33:28 +02:00
Lennart Poettering	0f23564ad4	pid1: add ability to kill processes in a subgroup of a unit This is useful for things like machined, where the system machined wants to manage a machine owned by the user somewhere down the tree.	2025-07-08 03:14:53 +02:00
Andres Beltran	26c6f3271a	core: add quota support for State, Cache, and Log exec directories	2025-07-07 17:28:47 +00:00
Lennart Poettering	d03714e4e4	tree-wide: "human readable" → "human-readable" Apparently, the spelling with a hyphen is better style in the English language. Suggested by: #36165	2025-07-07 11:21:25 +02:00
Mike Yuan	1b4ab5a209	core/socket: introduce DeferTrigger= and DeferTriggerMaxSec= Alternative to `b50f6dbe57` The commit naively returned early from socket_enter_running(), which however is quite problematic, as the socket will be woken up over and over again without doing a thing, until we eventually hit Poll/TriggerLimit*=. On top of that it requires hacks to hold the start job for initrd-switch-root.service up. Overall I doubt that is the right approach. Let's instead hook this into our job engine, and try to activate the service again when some other units are stopped. If all installed jobs have been run yet we're still seeing the conflict or the manually selected timeout is reached, fail the socket as before.	2025-06-30 13:10:43 +02:00
Lennart Poettering	99fd08224f	man: add proper version info for RandomizedOffsetUSec Follow-up for: #36437 Fixes: #37870	2025-06-26 17:28:49 +02:00
Lennart Poettering	279962a9e8	core/timer: Introduce RandomOffsetSec= knob (#36437) This is like RandomDelaySec, but it doesn't reset whenever the manager restarts. Fixes https://github.com/systemd/systemd/issues/21166	2025-06-17 16:05:12 +02:00
Mike Yuan	5c12797fc3	core/socket: introduce AcceptFileDescriptors= This controls the new SO_PASSRIGHTS socket option in kernel v6.16. Note that I intentionally choose a different naming scheme than Pass=, since all other Pass= options control whether some extra bits are attached to the message, while this one's about denying file descriptor transfer and it feels more explicit this way. And diverging from underlying socket option name is precedented by Timestamping=. But happy to change it to just say PassRights= if people disagree.	2025-06-17 13:16:42 +02:00
Mike Yuan	35462aa14a	core/socket: add PassPIDFD=	2025-06-17 13:16:41 +02:00
Mike Yuan	b36ab0d4ce	core/socket: don't suggest PassFileDescriptorsToExec= is a socket option by not interleaving it among socket options.	2025-06-17 13:16:07 +02:00
Mike Yuan	29da53dde3	core: always enable CPU accounting Our baseline is v5.4 and cgroup v2 is enforced now, which means CPU accounting is cheap everywhere without requiring any controller, hence just remove the directive.	2025-05-15 02:19:16 +02:00
Lennart Poettering	9ca16f6f18	pid1: add a concurrency limit to slice units Fixes: #35862	2025-04-22 18:53:51 +02:00
Yu Watanabe	8c35e8a9d2	core: remove deprecated StartAuxiliaryScope() DBus method The method is deprecated since `64f173324e` (v257) and announced that it will be removed in v258. Let's remove it now. This effectively reverts `84c01612de`.	2025-04-22 09:02:45 +09:00
Yu Watanabe	db6986e02c	core: deprecate CGroup v1 DBus properties	2025-04-15 22:34:22 +09:00
Luca Boccassi	b065ff03b1	man: fix typo in org.freedesktop.systemd1.xml	2025-03-24 18:25:29 +00:00
Daan De Meyer	8234cd9989	core: Add DelegateNamespaces= This delegates one or more namespaces to the service. Concretely, this setting influences in which order we unshare namespaces. Delegated namespaces are unshared after the user namespace is unshared. Other namespaces are unshared before the user namespace is unshared. Fixes #35369	2025-03-01 13:54:58 +01:00
Adrian Vovk	9a0749c82b	core/timer: Introduce RandomOffsetSec= knob This is like RandomDelaySec, but it doesn't reset whenever the manager restarts. Fixes https://github.com/systemd/systemd/issues/21166	2025-02-18 19:16:57 -05:00
Marco Trevisan (Treviño)	bd887a75d4	man/org.freedesktop.systemd1.xml: Clarify the behavior of Subscribe() It was unclear that it was applied to standard signals too, and this led to unexpected behavior. See: https://github.com/systemd/systemd/pull/36366	2025-02-18 09:56:11 +00:00
Mike Yuan	0d76f1c423	core/mount: rework GracefulOptions= to be just x-systemd.graceful-option= `09fbff57fc` introduced new knob for such functionality. However, that seems unnecessary. The mount option string is ubiquitous in that all of fstab, kernel cmdline, credentials, systemd-mount, ... speak it. And we already have x-systemd.device-bound= that's parsed by pid1 instead of fstab-generator. It feels hence more natural for graceful options to be an extension of that, rather than its own property. There's also one nice side effect that the setting itself is now more graceful for systemd versions not supporting such feature.	2025-02-12 18:16:44 +01:00
Mike Yuan	0fa062f983	core/dbus-mount: add missing ReloadResult and CleanResult properties	2025-02-12 15:34:54 +01:00
Lennart Poettering	09fbff57fc	pid1: add GracefulOptions= setting to .mount units This new setting can be used to specify mount options that shall only be added to the mount option string if the kernel supports them. This shall be used for adding "usrquota" to tmp.mount without breaking compat, but is generally useful.	2025-01-15 21:05:06 +01:00
Lennart Poettering	390dffb862	man: also fix documentation of start-limit-hit	2025-01-15 10:42:10 +01:00
Lennart Poettering	94634b4b03	pid1: add D-Bus API for removing delegated subcgroups When running unprivileged containers, we run into a scenario where an unpriv owned cgroup has a subcgroup delegated to another user (i.e. the container's own UIDs). When the owner of that cgroup dies without cleaning it up then the unpriv service manager might encounter a cgroup it cannot delete anymore. Let's address that: let's expose a method call on the service manager (primarily in PID1) that can be used to delete a subcgroup of a unit one owns. This would then allow the unpriv service manager to ask the priv service manager to get rid of such a cgroup. This commit only adds the method call, the next commit then adds the code that makes use of this.	2025-01-08 15:27:25 +01:00
Jan Engelhardt	44855c77a1	man: expand word contractions For written text, contractions are not normally used.	2024-12-25 17:00:31 +01:00
Jan Engelhardt	91dc2a52f5	man: grammar fixes: replace "respectively" Unlike the German "bzw.", "respectively" cannot be used as an infix, and is not abbreviated either.	2024-12-25 17:00:26 +01:00
Yu Watanabe	e76fcd0e40	core: make ProtectHostname= optionally take a hostname Closes #35623.	2024-12-16 23:55:44 +09:00
Ryan Wilson	6746f28854	core: Migrate ProtectHostname to use enum vs boolean Migrating ProtectHostname to enum will set the stage for adding more properties like ProtectHostname=private in future commits. In addition, we add PrivateHostnameEx property to dbus API which uses string instead of boolean.	2024-12-06 13:33:49 -08:00
Štěpán Němec	597c6cc119	man: fix incorrect volume numbers in internal man page references Some ambiguity (e.g., same-named man pages in multiple volumes) makes it impossible to fully automate this, but the following Python snippet (run inside the man/ directory of the systemd repo) helped to generate the sed command lines (which were subsequently manually reviewed, run and the false positives reverted):

    from pathlib import Path

    import lxml
    from lxml import etree as ET

    man2vol: dict[str, str] = {}
    man2citerefs: dict[str, list] = {}

    for file in Path(".").glob("*.xml"):
        tree = ET.parse(file, lxml.etree.XMLParser(recover=True))
        meta = tree.find("refmeta")
        if meta is not None:
            title = meta.findtext("refentrytitle")
            if title is not None:
                vol = meta.findtext("manvolnum")
                if vol is not None:
                    man2vol[title] = vol
                citerefs = list(tree.iter("citerefentry"))
                if citerefs:
                    man2citerefs[title] = citerefs

    for man, refs in man2citerefs.items():
        for ref in refs:
            title = ref.findtext("refentrytitle")
            if title is not None:
                has = ref.findtext("manvolnum")
                try:
                    should_have = man2vol[title]
                except KeyError:
                    # Non-systemd man page reference? Ignore.
                    continue
                if has != should_have:
                    print(
                        f"sed -i '\\\|<citerefentry><refentrytitle>{title}"
                        f"</refentrytitle><manvolnum>{has}</manvolnum>"
                        f"</citerefentry>\|s\|<manvolnum>{has}</manvolnum>\|"
                        f"<manvolnum>{should_have}</manvolnum>\|' {man}.xml"
                    )

	2024-11-11 20:31:08 +01:00
Zbigniew Jędrzejewski-Szmek	fe45f8dc9b	man: drop whitespace from final <programlisting> lines In the troff output, this doesn't seem to make any difference. But in the html output, the whitespace is sometimes preserved, creating an additional gap before the following content. Drop it everywhere to avoid this.	2024-11-08 14:14:36 +01:00
Lennart Poettering	607d297487	man: link up D-Bus API docs from daemon man pages Let's systematically make sure that we link up the D-Bus interfaces from the daemon man pages once in prose and once in short form at the bottom ("See Also"), for all daemons. Also, add reverse links at the bottom of the D-Bus API docs. Fixes: #34996	2024-11-05 22:57:51 +01:00
Daan De Meyer	406f177501	core: Introduce PrivatePIDs= This new setting allows unsharing the pid namespace in a unit. Because you have to fork to get a process into a pid namespace, we fork in systemd-executor to get into the new pid namespace. The parent then sends the pid of the child process back to the manager and exits while the child process continues on with the rest of exec_invoke() and then executes the actual payload. Communicating the child pid is done via a new pidref socket pair that is set up on manager startup. We unshare the PID namespace right before the mount namespace so we mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes to mount procfs. When running unprivileged in a user session, user namespace is set up first to allow for PID namespace to be unshared. However, when running in privileged mode, we unshare the user namespace last to ensure the user namespace does not own the PID namespace and cannot break out of the sandbox. Note we disallow Type=forking services from using PrivatePIDs=yes since the init process inside the PID namespace must not exit for other processes in the namespace to exist. Note Daan De Meyer did the original work for this commit with Ryan Wilson addressing follow-ups. Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>	2024-11-05 05:32:02 -08:00
Luca Boccassi	890bdd1d77	core: add read-only flag for exec directories When an exec directory is shared between services, this allows one of the service to be the producer of files, and the other the consumer, without letting the consumer modify the shared files. This will be especially useful in conjunction with id-mapped exec directories so that fully sandboxed services can share directories in one direction, safely.	2024-11-01 10:46:55 +00:00
Ryan Wilson	cd58b5a135	cgroup: Add support for ProtectControlGroups= private and strict This commit adds two settings private and strict to the ProtectControlGroups= property. Private will unshare the cgroup namespace and mount a read-write private cgroup2 filesystem at /sys/fs/cgroup. Strict does the same except the mount is read-only. Since the unit is running in a cgroup namespace, the new root of /sys/fs/cgroup is the unit's own cgroup. We also add a new dbus property ProtectControlGroupsEx which accepts strings instead of boolean. This will allow users to use private/strict via dbus and systemd-run in addition to service files. Note private and strict fall back to no and yes respectively if the kernel doesn't support cgroup2 or system is not using unified hierarchy. Fixes: #34634	2024-10-28 08:37:36 -07:00
Mike Yuan	7e40b51a2e	man/org.freedesktop.systemd1: complete version info for ManagedOOMMemoryPressureDurationUSec Follow-up for `63d4c4271c` Some unit types were left out.	2024-10-22 19:12:27 +02:00
Ryan Wilson	63d4c4271c	cgroup: Add ManagedOOMMemoryPressureDurationSec= override setting for units This will allow units (scopes/slices/services) to override the default systemd-oomd setting DefaultMemoryPressureDurationSec=. The semantics of ManagedOOMMemoryPressureDurationSec= are: - If >= 1 second, overrides DefaultMemoryPressureDurationSec= from oomd.conf - If is empty, uses DefaultMemoryPressureDurationSec= from oomd.conf - Ignored if ManagedOOMMemoryPressure= is not "kill" - Disallowed if < 1 second Note the corresponding dbus property is DefaultMemoryPressureDurationUSec which is in microseconds. This is consistent with other time-based dbus properties.	2024-10-16 20:12:38 -07:00
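The four rules listed in the commit message above can be read as a small resolution function. The following is only an illustrative sketch in Python, not the systemd or systemd-oomd implementation: the function name, the use of `None` for both "empty" and "ignored", and the order in which the checks run are all assumptions made for the example.

```python
from typing import Optional


def effective_pressure_duration(
    unit_value_sec: Optional[float],
    managed_oom_memory_pressure: str,
    oomd_default_sec: float,
) -> Optional[float]:
    """Hypothetical helper mirroring the semantics described in the commit message.

    unit_value_sec: value of ManagedOOMMemoryPressureDurationSec=, None if empty.
    managed_oom_memory_pressure: value of ManagedOOMMemoryPressure= for the unit.
    oomd_default_sec: DefaultMemoryPressureDurationSec= from oomd.conf.
    """
    # Disallowed: an explicit value below one second (assumed to be checked first).
    if unit_value_sec is not None and unit_value_sec < 1:
        raise ValueError("ManagedOOMMemoryPressureDurationSec= must be >= 1 second")
    # Ignored unless ManagedOOMMemoryPressure= is "kill" (None is our placeholder).
    if managed_oom_memory_pressure != "kill":
        return None
    # Empty: fall back to the oomd.conf default.
    if unit_value_sec is None:
        return oomd_default_sec
    # Otherwise the per-unit value overrides the default.
    return unit_value_sec
```

Under these assumptions, a unit with ManagedOOMMemoryPressure=kill and a 5 second setting resolves to 5 seconds, while the same unit with the setting left empty resolves to whatever oomd.conf specifies.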
Arthur Shau	cc0ab8c810	timer: introduce DeferReactivation setting By default, in instances where timers are running on a realtime schedule, if a service takes longer to run than the interval of a timer, the service will immediately start again when the previous invocation finishes. This is caused by the fact that the next elapse is calculated based on the last trigger time, which, combined with the fact that the interval is shorter than the runtime of the service, causes that elapse to be in the past, which in turn means the timer will trigger as soon as the service finishes running. This behavior can be changed by enabling the new DeferReactivation setting, which will cause the next calendar elapse to be calculated based on when the trigger unit enters inactivity, rather than the last trigger time. Thus, if a timer is on a realtime interval, the trigger will always adhere to that specified interval. E.g. if you have a timer that runs on a minutely interval, the setting guarantees that triggers will happen at ::00 times, whereas by default this may skew depending on how long the service runs. Co-authored-by: Matteo Croce <teknoraver@meta.com>	2024-10-11 22:54:16 +02:00
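The minutely-timer example from the commit message above can be worked through numerically. This is a rough Python sketch of the arithmetic only, assuming a timer that elapses at every full minute; the helper names are made up for illustration and do not correspond to functions in systemd.

```python
from datetime import datetime, timedelta


def next_minute_boundary(after: datetime) -> datetime:
    """Return the first hh:mm:00 instant strictly after `after`."""
    return after.replace(second=0, microsecond=0) + timedelta(minutes=1)


def next_activation(
    last_trigger: datetime,
    became_inactive: datetime,
    defer_reactivation: bool,
) -> datetime:
    """Hypothetical model of when the triggered service starts again."""
    # Default: the elapse is based on the last trigger time.
    # DeferReactivation: it is based on when the unit became inactive.
    base = became_inactive if defer_reactivation else last_trigger
    elapse = next_minute_boundary(base)
    # An elapse that already lies in the past fires as soon as the service finishes.
    return max(elapse, became_inactive)


# The service is triggered at 12:00:00 and runs for 90 seconds.
last_trigger = datetime(2024, 10, 11, 12, 0, 0)
became_inactive = last_trigger + timedelta(seconds=90)

default_next = next_activation(last_trigger, became_inactive, False)
deferred_next = next_activation(last_trigger, became_inactive, True)
print(default_next.time(), deferred_next.time())
```

In this model the default behavior restarts the service at 12:01:30, right when the previous run ends, whereas the deferred variant waits for the next full minute at 12:02:00.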
Lennart Poettering	0aaacc3a10	Merge pull request #34593 from Werkov/deprecate-aux-scopes core/manager: Deprecate StartAuxiliaryScope() method	2024-10-09 10:25:30 +02:00
Michal Koutný	64f173324e	core/manager: Deprecate StartAuxiliaryScope() method The method was added with migration of resources in mind (e.g. process's allocated memory will follow it to the new scope), however, such a resource migration is not in cgroup semantics. The method may thus have the intended users and others could be guided to StartTransientUnit(). Since this API was advertised in a regular release, start the removal with a deprecation message to callers. Eventually, the goal is to remove the method to clean up DBus API and simplify code (removal of cgroup_context_copy()). Part of DBus docs is retained to satisfy build checks.	2024-10-08 17:49:13 +02:00
Ryan Wilson	3543456f84	Add ExtraFileDescriptor property to StartTransientUnit dbus API This adds the ExtraFileDescriptor property to StartTransient dbus API with format "a(hs)" - array of (file descriptor, name) pairs. The FD will be passed to the unit via sd_notify like Socket and OpenFile. systemctl show also shows ExtraFileDescriptorName for these transient units. We only show the name passed to dbus as the FD numbers will change once passed over the unix socket and are duplicated, so it's confusing to display the numbers. We do not add this functionality for systemd-run or general systemd service units as it is not useful for general systemd services. Arguably, it could be useful for systemd-run in bash scripts but we prefer to be cautious and not expose the API yet. Fixes: #34396	2024-10-07 09:01:48 -07:00


248 Commits