systemd

mirror of https://github.com/morgan9e/systemd synced 2026-04-14 00:14:32 +09:00

Author	SHA1	Message	Date
Lennart Poettering	fc3adbbbcb	man: always prefix links to uapi specs with their UAPI.XY spec number. Let's try to establish the spec numbers by mentioning them in most doc links. Follow-up for: https://github.com/uapi-group/specifications/pull/187	2025-11-23 18:09:11 +01:00
Christoph Anton Mitterer	07f4718242	man: clarify what “failed” means. systemd.service(5)’s documentation of `ExecCondition=` uses “failed” with respect to the unit active state. In particular, the unit won’t be considered failed when `ExecCondition=`’s command exits with a status of 1 through 254 (inclusive). It will, however, be considered failed when the command exits with 255 or abnormally (e.g. timeout, killed by a signal, etc.). The table “Defined $SERVICE_RESULT values” in systemd.exec(5), however, uses “failed” with respect to the condition. Tests seem to have shown that if the exit status of the `ExecCondition=` command is one of 1 through 254 (inclusive), `$SERVICE_RESULT` will be `exec-condition`; if it is 255, `$SERVICE_RESULT` will be `exit-code` (but `$EXIT_CODE` and `$EXIT_STATUS` will be empty or unset); if it’s killed because of `SIGKILL`, `$SERVICE_RESULT` will be `signal`; and if it times out, `$SERVICE_RESULT` will be `timeout`. This commit clarifies the table at least for the case of an exit status of 1 through 254 (inclusive). The others (signal, timeout and 255) are probably also still ambiguous (e.g. `signal` uses “A service process”, which could be understood as referring only to the actual service process). Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>	2025-11-06 10:47:06 +01:00
Quentin Deslandes	79dd24cf14	core: Add UserNamespacePath=. This allows a service to reuse the user namespace created for an existing service, similarly to NetworkNamespacePath=. The configuration of the initial user namespace (e.g. ID mapping) is preserved.	2025-11-04 10:55:04 +01:00
Luca Boccassi	e84aa21af8	man: RootImageOptions= is only supported for system services right now. Support via mountfsd is being worked on but will take more time, so fix the documentation to be correct in the meantime. Follow-up for `fad01f798d`	2025-10-22 17:22:03 +01:00
Daniel Foster	c7a444a9c1	tree-wide: extend $LISTEN_FDS protocol with $LISTEN_PIDFDID. Although extremely unlikely, there is a race in solely checking the $LISTEN_PID environment variable, due to PID recycling. Fix that by introducing $LISTEN_PIDFDID, which contains the 64-bit ID of a pidfd for the child process; unlike a PID, this ID is not subject to recycling.	2025-10-22 09:34:14 +02:00
Luca Boccassi	fad01f798d	dissect: add support for verity-protected bare filesystems via mountfsd. Needed to implement support for RootHashSignature=/RootVerity=/RootHash= and friends when going through mountfsd, for example with user units, so that system and user units provide the same features at the same level.	2025-10-16 16:22:33 +01:00
Luca Boccassi	68b476a298	core: also enable PrivateUsers= for user services when using images via mountfsd. RootDirectory= and other options already implicitly enable PrivateUsers= if they are set in user units (since `6ef721cbc7`), so that they can work out of the box. Now, with mountfsd support, we can do the same for the image settings, so enable them and document them.	2025-10-16 12:58:59 +01:00
Lennart Poettering	4be269563d	core: if we cannot decode a TPM credential, skip over it for ImportCredential=. Let's skip over credentials we cannot decode when they are found with ImportCredential=. When installing an OS on some disk and using that disk on a different machine than assumed, we would otherwise end up with a broken boot, because the credentials cannot be decoded when starting systemd-firstboot. Let's handle this somewhat gracefully. This leaves handling for LoadCredential=/SetCredential= as it is (i.e. failure to decrypt results in service failure), because it is a lot more explicit and focussed, as opposed to ImportCredential=, which looks everywhere, uses globs and so on, and is hence very vague and unfocussed. Fixes: #34740	2025-09-18 22:11:57 +02:00
Yu Watanabe	369f311686	man: fix typo. Follow-up for `7aefb194e7`.	2025-07-11 14:11:04 +09:00
Matteo Croce	7aefb194e7	man/systemd.exec: explain how BPF token works. Add a small paragraph explaining how BPF tokens work, how they are created, and their relationship with the BPF filesystem. Move all the relevant documentation into the PrivateBPF= section and let all the BPFDelegate* options point to that one.	2025-07-10 21:40:07 +02:00
Yu Watanabe	f436c64e61	man: fix typo. Follow-up for `7baf403430`.	2025-07-10 14:02:00 +09:00
Yu Watanabe	1cf5b39d64	core: add 'DefaultRestrictSUIDSGID' config option (#38126). Closes #37602; see there for extra motivation and considered alternatives. On typical systems, only a few services need to create SUID/SGID files. This is often limited to the user explicitly setting suid/sgid, the `systemd-tmpfiles*` services, and the package manager. Allowing a default that globally restricts creation of suid/sgid files makes it easier to apply this restriction precisely. Testing done: built on aarch64-linux and x86_64-linux; ran a VM test on x86_64-linux, checking that the VM system boots successfully, that defaults apply (`yes`, `no`, and undefined), that systemd-tmpfiles can set suid/sgid on the journal log path, and that other services explicitly defining `RestrictSUIDSGID=no` can create suid files.	2025-07-10 13:30:07 +09:00
Matteo Croce	7baf403430	man/systemd.exec: update documentation for PrivateBPF=. Add a short description of what PrivateBPF=yes does and how it can be useful.	2025-07-10 01:57:14 +02:00
Grimmauld	0316fb8219	core: document 'DefaultRestrictSUIDSGID'	2025-07-09 21:45:46 +02:00
Matteo Croce	ea9826eb94	core: add options to delegate BPFFS token creation. Add four new options, BPFDelegate{Commands,Maps,Programs,Attachments}=, in order to delegate to a BPFFS instance the permission to create tokens. The value is a list of options taken from https://github.com/torvalds/linux/blob/v6.14/include/uapi/linux/bpf.h#L922-L1121 and the special value "any" allows every possible value. More information about BPF tokens: https://lwn.net/Articles/947173/	2025-07-08 22:35:29 +02:00
Matteo Croce	3a47437fc9	core: Introduce PrivateBPF= to mount a private BPFFS. Add a new option PrivateBPF= to mount a new instance of bpffs within a namespace. PrivateBPF= can be set to "no" to use the host bpffs in read-only mode, or to "yes" to do a new mount. The mount is done with the new fsopen()/fsmount() API because in the future we'll hook some commands between the two calls.	2025-07-08 22:33:28 +02:00
Andres Beltran	26c6f3271a	core: add quota support for State, Cache, and Log exec directories	2025-07-07 17:28:47 +00:00
Lennart Poettering	2be3a06bb2	core: when PrivateDevices= is enabled and we need to decrypt TPM2 credentials, go via IPC. Also, if a device ACL list is defined, go via IPC (instead of trying to patch it, as before). The outcome is that the tighter rules continue to apply when configured. Fixes: #35959	2025-06-24 22:16:01 +02:00
Anton Ryzhov	bd02e15710	man/systemd-creds: fix documentation typo in systemd.exec.xml	2025-06-03 07:42:44 +09:00
Zbigniew Jędrzejewski-Szmek	b082968d19	man: better tags, more links, minor grammar and formatting improvements. Closes https://github.com/systemd/systemd/issues/35751.	2025-05-28 15:35:53 +02:00
Luca Boccassi	6946eed3fa	core: Also refresh confext extensions when reloading notify-reload service (#33995). `ExtensionImages=` and `ExtensionDirectories=` now let you specify vpick-named extensions; however, since they are only set up once when the service is started, you can't see newer versions without restarting the service entirely. Here, also reload confext extensions when you reload a service. This allows you to deploy a new version of some configuration and have it picked up at reload time without interrupting your workload. Right now, we only reload confext extensions and leave the sysext ones behind, since it didn't seem prudent to swap out what is likely program code at reload. This is made possible by only going for the `SYSTEMD_CONFEXT_HIERARCHIES` overlays (which only contain `/etc`). This PR adjusts `service.c` to also refresh extensions when needed, adds integration tests to check that a confext reload actually occurred, and adds to the `systemd.exec` man pages to document this behavior. This is a follow-up to #24864 and #31364. Thank you to @bluca and @goenkam for help in getting this up.	2025-05-20 11:27:34 +01:00
maia x.	67ecc2c7fe	man: document confext reload behavior for ExtensionDirectories/Images	2025-05-19 13:36:21 +01:00
Lennart Poettering	bfb1f9e2c9	core: pass the socket cookie to invoked per-connection service instances as $SO_COOKIE env var. The socket cookie is just too useful for identifying connections; let's emphasize this a bit and pass it as an environment variable.	2025-05-15 09:45:32 +02:00
Lennart Poettering	3bdcd994cd	man: correct version information for when $REMOTE_ADDR/$REMOTE_PORT were added. This was in commit `3b1c524154`, i.e. in the v220 cycle.	2025-05-15 09:45:19 +02:00
Yu Watanabe	8ac5b047fc	man/systemd.exec: update documents for PrivateTmp=	2025-05-11 03:33:02 +09:00
Zbigniew Jędrzejewski-Szmek	2dc4e87849	man/systemd.exec: reword description of RestrictAddressFamilies=. The text is reordered and broken into more paragraphs. A recommendation to combine RestrictAddressFamilies= with SystemCallFilter=@service is added.	2025-05-06 21:14:03 +02:00
Zbigniew Jędrzejewski-Szmek	802d23fcfb	man/systemd.exec: reword description of SystemCallFilter=. The existing text grew organically as features were added and was not very organized. Reorder it and break it into paragraphs grouped by topic. The description of the :errno syntax is replaced by a short reference to the SystemCallErrorNumber= setting. This makes the text shorter and makes it easier to explain how the two settings combine.	2025-05-06 21:14:03 +02:00
Yu Watanabe	4db8663b81	tree-wide: fix typo	2025-04-27 10:36:12 +09:00
Daan De Meyer	ba77798bba	unit: Make sure individual unit maximum log level always takes priority. Currently, LogLevelMax= can only be used to decrease the maximum log level for a unit but not to increase it. Let's make sure the latter works as well, so LogLevelMax=debug can be used to enable debug logging for specific units without enabling debug logging globally.	2025-04-23 14:46:12 +02:00
Mike Yuan	32b69b190b	core: delegate mountns implicitly when any of pidns/cgns/netns is in use	2025-03-30 18:57:18 +02:00
NetSysFire	1f0e4af329	systemd.exec(5): RestrictAddressFamilies: mention address_families(7)	2025-03-11 00:00:55 +09:00
Daan De Meyer	8234cd9989	core: Add DelegateNamespaces=. This delegates one or more namespaces to the service. Concretely, this setting influences in which order we unshare namespaces. Delegated namespaces are unshared after the user namespace is unshared. Other namespaces are unshared before the user namespace is unshared. Fixes #35369	2025-03-01 13:54:58 +01:00
Lennart Poettering	7933e971ce	pid1: pass pidfdids to invoked services in $MAINPIDFDID and $MANAGERPIDFDID	2025-01-20 21:51:40 +01:00
Lennart Poettering	8af1b296cb	pid1: when a password is requested during PAMName= processing, query it via the ask-password logic	2025-01-18 11:45:44 +00:00
Michal Sekletar	f1a0f311e6	man: adjust description of PrivateUsers= so it is in line with reality. When the option is not available, the unit will not even start, so there is no security risk. Fixes #34983	2024-12-29 14:38:00 +09:00
Jan Engelhardt	c592ebdf4f	man: grammar fixes for introductory adverbs/phrases	2024-12-25 17:24:38 +01:00
Jan Engelhardt	44855c77a1	man: expand word contractions. For written text, contractions are not normally used.	2024-12-25 17:00:31 +01:00
Jan Engelhardt	82ea392a99	man: grammar fixes for "regardless"	2024-12-25 17:00:31 +01:00
Lennart Poettering	4103bf9f2f	man: document the new per-use credstore paths (And some other minor tweaks)	2024-12-20 17:52:07 +01:00
Lennart Poettering	00a415fc8f	tree-wide: remove support for kernels lacking ambient caps. Let's bump the kernel baseline a bit to 4.3 and thus require ambient caps. This allows us to remove a variety of special cases, most importantly the ExecStart=!! hack.	2024-12-17 17:34:46 +01:00
Yu Watanabe	e76fcd0e40	core: make ProtectHostname= optionally take a hostname Closes #35623.	2024-12-16 23:55:44 +09:00
Luca Boccassi	6dfd290031	core: Add PrivateUsers=full (#35183). Recently, PrivateUsers=identity was added to support mapping the first 65536 UIDs/GIDs from the parent to the child namespace and mapping the other UIDs/GIDs to the nobody user. However, there are use cases where users have UIDs/GIDs > 65536 and need a similar identity mapping. Moreover, in some of those cases, users want a full identity mapping from 0 -> UID_MAX. To support this, we add PrivateUsers=full, which does identity mapping for all available UIDs/GIDs. Note that to differentiate ourselves from the init user namespace, we need to set up the uid_map/gid_map as the two lines `0 0 1` and `1 1 UINT32_MAX - 1`, as the init user namespace uses `0 0 UINT32_MAX` and some applications, like systemd itself, determine whether they are in a non-init user namespace based on the uid_map/gid_map files. Note that systemd will remove this heuristic in running_in_userns() in version 258 (https://github.com/systemd/systemd/pull/35382) and use the namespace inode instead. But some users may be running a container image with older systemd < 258, so we keep this hack until version 259 for version N-1 compatibility. In addition to mapping the whole UID/GID space, we also set /proc/pid/setgroups to "allow". While we usually set "deny" to avoid security issues with dropping supplementary groups (https://lwn.net/Articles/626665/), this ends up breaking dbus-broker when running /sbin/init in full OS containers. Fixes: #35168 Fixes: #35425	2024-12-13 12:25:13 +00:00
Ryan Wilson	2665425176	core: Set /proc/pid/setgroups to allow for PrivateUsers=full. When trying to run dbus-broker in a systemd unit with PrivateUsers=full, we see that dbus-broker fails with EPERM at `util_audit_drop_permissions`. The root cause is that dbus-broker calls the setgroups() system call, and this is disallowed by systemd's implementation of PrivateUsers=, which sets /proc/pid/setgroups to deny. This is done to remediate potential privilege escalation vulnerabilities in user namespaces where an attacker can remove supplementary groups and gain access to resources where those groups are restricted. However, for OS-like containers, setgroups() is a pretty common API and disabling it is not feasible. So we allow setgroups() by setting /proc/pid/setgroups to allow in PrivateUsers=full. Note that security-conscious users can still use SystemCallFilter= to disable setgroups() if they want to specifically prevent this system call. Fixes: #35425	2024-12-12 11:36:10 +00:00
Yu Watanabe	627d1a9ac1	core: Add ProtectHostname=private (#35447). This PR adds an option for systemd exec units to enable UTS namespaces without restricting hostname changes via seccomp. Thus, units can change their hostname without affecting the host. This is useful for OS-like containers running as units, which should be free to change their container hostname if they want, but not the host's hostname. Fixes: #30348	2024-12-11 10:17:25 +09:00
Ryan Wilson	219a6dbbf3	core: Fix time namespace in RestrictNamespaces=. RestrictNamespaces= would accept "time" but would not actually apply seccomp filters; e.g. `systemd-run -p RestrictNamespaces=time unshare -T true` should fail but succeeded. This commit actually enables time namespace seccomp filtering.	2024-12-10 20:55:26 +01:00
Ryan Wilson	cf48bde7ae	core: Add ProtectHostname=private. This adds an option for systemd exec units to enable UTS namespaces without restricting hostname changes via seccomp. Thus, units can change their hostname without affecting the host. Fixes: #30348	2024-12-06 13:34:04 -08:00
Ryan Wilson	705cc82938	core: Add PrivateUsers=full. Recently, PrivateUsers=identity was added to support mapping the first 65536 UIDs/GIDs from the parent to the child namespace and mapping the other UIDs/GIDs to the nobody user. However, there are use cases where users have UIDs/GIDs > 65536 and need a similar identity mapping. Moreover, in some of those cases, users want a full identity mapping from 0 -> UID_MAX. Note that to differentiate ourselves from the init user namespace, we need to set up the uid_map/gid_map as the two lines `0 0 1` and `1 1 UINT32_MAX - 1`, as the init user namespace uses `0 0 UINT32_MAX` and some applications, like systemd itself, determine whether they are in a non-init user namespace based on the uid_map/gid_map files. Note that systemd will remove this heuristic in running_in_userns() in version 258 and use the namespace inode instead. But some users may be running a container image with older systemd < 258, so we keep this hack until version 259. To support this, we add PrivateUsers=full, which does identity mapping for all available UIDs/GIDs. Fixes: #35168	2024-12-05 10:34:32 -08:00
Septatrix	5857f31c2c	man: clarify wording regarding MONITOR_* envs	2024-12-06 03:01:19 +09:00
Zbigniew Jędrzejewski-Szmek	fe45f8dc9b	man: drop whitespace from final <programlisting> lines. In the troff output, this doesn't seem to make any difference. But in the html output, the whitespace is sometimes preserved, creating an additional gap before the following content. Drop it everywhere to avoid this.	2024-11-08 14:14:36 +01:00
Lennart Poettering	b711737096	man: document that PrivateTmp= is unaffected by ProtectSystem=strict. Fixes: #33130	2024-11-05 22:57:51 +01:00
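
The ExecCondition= commit 07f4718242 above reports, from tests, how the way an ExecCondition= command ends maps to the unit state and to $SERVICE_RESULT. The C sketch below only encodes those reported observations, plus the documented rule that exit status 0 lets startup continue; the enum and the function exec_condition_result() are hypothetical illustrations, not systemd code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of how an ExecCondition= command ended. */
typedef enum {
        CONDITION_EXITED,   /* exited normally with an exit status */
        CONDITION_SIGNALED, /* killed by a signal, e.g. SIGKILL */
        CONDITION_TIMEOUT,  /* did not finish within the timeout */
} ConditionEnd;

/* Encodes the behaviour reported in commit 07f4718242. Returns the observed
 * $SERVICE_RESULT value, or NULL when the condition passed (exit status 0)
 * and startup continues. *unit_failed tells whether the unit ends up in the
 * "failed" active state. */
static const char *exec_condition_result(ConditionEnd end, int status, bool *unit_failed) {
        switch (end) {
        case CONDITION_EXITED:
                if (status == 0) {
                        *unit_failed = false;   /* condition met */
                        return NULL;
                }
                if (status >= 1 && status <= 254) {
                        *unit_failed = false;   /* condition not met: rest skipped, unit not failed */
                        return "exec-condition";
                }
                *unit_failed = true;            /* 255 (or any other status) counts as an error */
                return "exit-code";
        case CONDITION_SIGNALED:
                *unit_failed = true;
                return "signal";
        case CONDITION_TIMEOUT:
        default:
                *unit_failed = true;
                return "timeout";
        }
}
```

The two outputs are kept separate on purpose: the commit's point is that "failed" for the unit and "failed" for the condition are different questions, so a status of 1 through 254 yields a non-failed unit together with the `exec-condition` result.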

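Commit c7a444a9c1 above extends the check that a socket-activated process performs before trusting the passed file descriptors. A minimal C sketch of that check, assuming the conventional fd numbering starting at 3 (SD_LISTEN_FDS_START in sd-daemon.h); parse_listen_fds() and its self_pidfdid parameter are hypothetical, and how a process obtains the 64-bit ID of its own pidfd is deliberately left to the caller. Real services would normally call sd_listen_fds(3) rather than parse the variables by hand.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Passed fds start at 3, matching SD_LISTEN_FDS_START. */
#define LISTEN_FDS_START 3

/* Hypothetical helper: returns how many fds were passed to `self` (>= 0), or
 * -EINVAL on malformed input. listen_pidfdid may be NULL (older managers), and
 * self_pidfdid == 0 means "own pidfd ID unknown"; in either case the check
 * falls back to comparing PIDs only. */
static int parse_listen_fds(const char *listen_pid, const char *listen_fds,
                            const char *listen_pidfdid, pid_t self, uint64_t self_pidfdid) {
        char *end;
        long pid, n;

        if (!listen_pid || !listen_fds)
                return 0;                       /* nothing was passed */

        errno = 0;
        pid = strtol(listen_pid, &end, 10);
        if (errno != 0 || end == listen_pid || *end != '\0' || pid <= 0)
                return -EINVAL;
        if ((pid_t) pid != self)
                return 0;                       /* meant for another process */

        if (listen_pidfdid && self_pidfdid != 0) {
                errno = 0;
                unsigned long long id = strtoull(listen_pidfdid, &end, 10);
                if (errno != 0 || end == listen_pidfdid || *end != '\0')
                        return -EINVAL;
                if ((uint64_t) id != self_pidfdid)
                        return 0;               /* same PID number, different (recycled) process */
        }

        errno = 0;
        n = strtol(listen_fds, &end, 10);
        if (errno != 0 || end == listen_fds || *end != '\0' || n < 0 || n > INT_MAX - LISTEN_FDS_START)
                return -EINVAL;

        return (int) n;
}
```

In a service, the first three arguments would be getenv("LISTEN_PID"), getenv("LISTEN_FDS") and getenv("LISTEN_PIDFDID"), and self would be getpid(). A return value of n means fds 3 through 3 + n - 1 were passed. Comparing the pidfd ID after the PID closes the recycling race the commit describes, while still accepting managers that set only $LISTEN_PID.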

680 Commits
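
The PrivateUsers=full commits 6dfd290031 and 705cc82938 above explain why the full identity mapping is written as two ranges instead of one. The C sketch below formats that uid_map/gid_map content and applies a simplified form of the "single range 0 0 UINT32_MAX means init user namespace" heuristic those commits mention; format_full_identity_map() and looks_like_init_userns() are hypothetical names, and the latter is an illustration, not systemd's running_in_userns().

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the PrivateUsers=full mapping from the commits.
 * Each line is "<first ID inside> <first ID outside> <count>"; together the
 * two ranges map every ID in 0..UINT32_MAX-1 to itself. */
static int format_full_identity_map(char *buf, size_t size) {
        return snprintf(buf, size, "0 0 1\n1 1 %" PRIu32 "\n", (uint32_t) (UINT32_MAX - 1));
}

/* Simplified version of the heuristic: a map consisting of exactly one range
 * "0 0 UINT32_MAX" is taken to mean the init user namespace. */
static bool looks_like_init_userns(const char *map) {
        unsigned long inside, outside, count;
        int consumed = -1;

        if (sscanf(map, " %lu %lu %lu %n", &inside, &outside, &count, &consumed) != 3 || consumed < 0)
                return false;
        if (map[consumed] != '\0')
                return false;                   /* a second range follows */
        return inside == 0 && outside == 0 && count == UINT32_MAX;
}
```

Because the written map has two lines, the heuristic no longer mistakes the service's namespace for the init user namespace, even though every ID in the range still maps to itself.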