systemd

mirror of https://github.com/morgan9e/systemd synced 2026-04-15 17:06:39 +09:00

Author	SHA1	Message	Date
Lennart Poettering	de90700f36	pid1: set SYSTEMD_NSS_DYNAMIC_BYPASS=1 env var for dbus-daemon There's currently a deadlock between PID 1 and dbus-daemon: in some cases dbus-daemon will do NSS lookups (which are blocking) at the same time PID 1 synchronously blocks on some call to dbus-daemon. Let's break that by setting SYSTEMD_NSS_DYNAMIC_BYPASS=1 env var for dbus-daemon, which will disable synchronously blocking varlink calls from nss-systemd to PID 1. In the long run we should fix this differently: remove all synchronous calls to dbus-daemon from PID 1. This is not trivial however: so far we had the rule that synchronous calls from PID 1 to the dbus broker are OK as long as they only go to interfaces implemented by the broke itself rather than services reachable through it. Given that the relationship between PID 1 and dbus is kinda special anyway, this was considered acceptable for the sake of simplicity, since we quite often need metadata about bus peers from the broker, and the asynchronous logic would substantially complicate even the simplest method handlers. This mostly reworks the existing code that sets SYSTEMD_NSS_BYPASS_BUS= (which is a similar hack to deal with deadlocks between nss-systemd and dbus-daemon itself) to set SYSTEMD_NSS_DYNAMIC_BYPASS=1 instead. No code was checking SYSTEMD_NSS_BYPASS_BUS= anymore anyway, and it used to solve a similar problem, hence it's an obvious piece of code to rework like this. Issue originally tracked down by Lukas Märdian. This patch is inspired and closely based on his patch: https://github.com/systemd/systemd/pull/22038 Fixes: #15316 Co-authored-by: Lukas Märdian <slyon@ubuntu.com>	2022-02-18 10:49:36 +01:00
Luca Boccassi	a07b992606	core: add ExtensionDirectories= setting Add a new setting that follows the same principle and implementation as ExtensionImages, but using directories as sources. It will be used to implement support for extending portable images with directories, since portable services can already use a directory as root.	2022-01-21 22:53:12 +09:00
Peter Morrow	cdebedb4d4	service: pass service exit status to spawned On{Failure,Success}= dependency When a service exits and triggers either an OnFailure= or OnSuccess= dependency we now set a new environment variable for the ExecStart= and ExecStartPre= process. This variable $MONITOR_METADATA exposes the metadata relating to the service which triggered the dependency. MONITOR_METADATA takes the following form: MONITOR_METADATA="SERVICE_RESULT=<result-string0>,EXIT_CODE=<exit-code0>,EXIT_STATUS=<exit-status0>,INVOCATION_ID=<id>,UNIT=<triggering-unit0.service>;SERVICE_RESULT=<result-stringN>,EXIT_CODE=<exit-codeN>,=EXIT_STATUS=<exit-statusN>,INVOCATION_ID=<id>,UNIT=<triggering-unitN.service>" MONITOR_METADATA is space separated set of metadata relating to the service(s) which triggered the dependency. This is a list since if we have 2 services which trigger the same dependency then the dependency start job may be merged. In this case we need to pass both service metadata to the triggered service. If there is no job merging then MONITOR_METADATA will be a single entry. For example, in the case we had a service "failer.service" which triggers "failer-handler.service", the following variable is exported to the ExecStart= and ExecStartPre= processes in failer-handler.service: MONITOR_METADATA="SERVICE_RESULT=exit-code,EXIT_CODE=exited,EXIT_STATUS=1,INVOCATION_ID=67c657ed7b34466ea369abdf994c6393,UNIT=failer.service" In another example where we have failer.service and failer2.service which both also trigger failer-handler.service then the start job for failer-handler.service may be merged and we might get the following: MONITOR_METADATA="SERVICE_RESULT=exit-code,EXIT_CODE=exited,EXIT_STATUS=1,INVOCATION_ID=16a93ad196c94109990fb8b9aa5eef5f,UNIT=failer.service;SERVICE_RESULT=exit-code,EXIT_CODE=exited,EXIT_STATUS=1,INVOCATION_ID=ff70131e4cc145e994fb621de25a3e8f,UNIT=failer2.service"	2021-12-13 11:25:57 +00:00
Daan De Meyer	51462135fb	exec: Add TTYRows and TTYColumns properties to set TTY dimensions	2021-11-05 21:32:14 +00:00
Luca Boccassi	211a3d87fb	core: add [State\|Runtime\|Cache\|Logs]Directory symlink as second parameter When combined with a tmpfs on /run or /var/lib, allows to create arbitrary and ephemeral symlinks for StateDirectory or RuntimeDirectory. This is especially useful when sharing these directories between different services, to make the same state/runtime directory 'backend' appear as different names to each service, so that they can be added/removed to a sharing agreement transparently, without code changes. An example (simplified, but real) use case: foo.service: StateDirectory=foo bar.service: StateDirectory=bar foo.service.d/shared.conf: StateDirectory= StateDirectory=shared:foo bar.service.d/shared.conf: StateDirectory= StateDirectory=shared:bar foo and bar use respectively /var/lib/foo and /var/lib/bar. Then the orchestration layer decides to stop this sharing, the drop-in can be removed. The services won't need any update and will keep working and being able to store state, transparently. To keep backward compatibility, new DBUS messages are added.	2021-10-28 10:47:46 +01:00
Iago Lopez Galeiras	b1994387d3	core: use LSM BPF functions to implement RestrictFileSystems= It attaches the LSM BPF program when the system manager starts up. It populates the hash of maps BPF map when services that have RestrictFileSystems= set start. It cleans up the hash of maps when the unit cgroup is pruned. To pass the file descriptor of the BPF map we add it to the keep_fds array.	2021-10-06 10:52:14 +02:00
alexlzhu	8c35c10d20	core: Add ExecSearchPath parameter to specify the directory relative to which binaries executed by Exec= should be found Currently there does not exist a way to specify a path relative to which all binaries executed by Exec should be found. The only way is to specify the absolute path. This change implements the functionality to specify a path relative to which binaries executed by Exec= can be found. Closes #6308	2021-09-28 14:52:27 +01:00
Lennart Poettering	43144be4a1	pid1: add support for encrypted credentials	2021-07-08 09:30:56 +02:00
Xℹ Ruoyao	a70581ffb5	New directives PrivateIPC and IPCNamespacePath	2021-03-04 00:04:36 +08:00
Luca Boccassi	93f597013a	Add ExtensionImages directive to form overlays Add support for overlaying images for services on top of their root fs, using a read-only overlay.	2021-02-23 15:34:46 +00:00
Zbigniew Jędrzejewski-Szmek	2d93c20e5f	tree-wide: use -EINVAL for enum invalid values As suggested in https://github.com/systemd/systemd/pull/11484#issuecomment-775288617. This does not touch anything exposed in src/systemd. Changing the defines there would be a compatibility break. Note that tests are broken after this commit. They will be fixed in the next one.	2021-02-10 14:46:59 +01:00
Topi Miettinen	ddc155b2fd	New directives NoExecPaths= ExecPaths= Implement directives `NoExecPaths=` and `ExecPaths=` to control `MS_NOEXEC` mount flag for the file system tree. This can be used to implement file system W^X policies, and for example with allow-listing mode (NoExecPaths=/) a compromised service would not be able to execute a shell, if that was not explicitly allowed. Example: [Service] NoExecPaths=/ ExecPaths=/usr/bin/daemon /usr/lib64 /usr/lib Closes: #17942.	2021-01-29 12:40:52 +00:00
Lennart Poettering	3bdc25a4cf	core: make NotifyAccess= in combination with RootDirectory=/RootImage= work Previously if people enabled RootDirectory=/RootImage= and NotifyAccess= together, things wouldn't work, they'd have to explicitly add BindReadOnlyPaths=/run/systemd/notify too. Let's make this implicit. Since both options are opt-in, if people use them together it would be pointless not also defining the BindReadOnlyPaths= entry, in which case we can just do it automatically. See: #18051	2021-01-20 22:39:07 +01:00
Luca Boccassi	5e8deb94c6	core: add DBUS method to bind mount new nodes without service restart Allow to setup new bind mounts for a service at runtime (via either DBUS or a new 'systemctl bind' verb) with a new helper that forks into the unit's mount namespace. Add a new integration test to cover this. Useful for zero-downtime addition to services that are running inside mount namespaces, especially when using RootImage/RootDirectory. If a service runs with a read-only root, a tmpfs is added on /run to ensure we can create the airlock directory for incoming mounts under /run/host/incoming.	2021-01-18 17:24:05 +00:00
Lucas Werkmeister	8d7dab1fda	Add truncate: to StandardOutput= etc. This adds the ability to specify truncate:PATH for StandardOutput= and StandardError=, similar to the existing append:PATH. The code is mostly copied from the related append: code. Fixes #8983.	2021-01-15 09:54:50 +01:00
Yu Watanabe	db9ecf0501	license: LGPL-2.1+ -> LGPL-2.1-or-later	2020-11-09 13:23:58 +09:00
Zbigniew Jędrzejewski-Szmek	fcb7138ca7	test-path: do not fail the test if we fail to start a service because of cgroup setup The test was failing because it couldn't start the service: path-modified.service: state = failed; result = exit-code path-modified.path: state = waiting; result = success path-modified.service: state = failed; result = exit-code path-modified.path: state = waiting; result = success path-modified.service: state = failed; result = exit-code path-modified.path: state = waiting; result = success path-modified.service: state = failed; result = exit-code path-modified.path: state = waiting; result = success path-modified.service: state = failed; result = exit-code path-modified.path: state = waiting; result = success path-modified.service: state = failed; result = exit-code Failed to connect to system bus: No such file or directory -.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied path-modified.service: Failed to create cgroup /system.slice/kojid.service/path-modified.service: Permission denied path-modified.service: Failed to attach to cgroup /system.slice/kojid.service/path-modified.service: No such file or directory path-modified.service: Failed at step CGROUP spawning /bin/true: No such file or directory path-modified.service: Main process exited, code=exited, status=219/CGROUP path-modified.service: Failed with result 'exit-code'. Test timeout when testing path-modified.path In fact any of the services that we try to start may fail, especially considering that we're doing some rogue cgroup operations. See https://github.com/systemd/systemd/pull/16603#issuecomment-679133641.	2020-10-22 11:05:17 +02:00
Lennart Poettering	74e1252072	execute: add helper for checking if root_directory/root_image are set in ExecContext	2020-10-01 11:02:11 +02:00
Zbigniew Jędrzejewski-Szmek	5e98086d16	core: remember when we set ExecContext.mount_apivfs No functional change intended so far.	2020-09-24 10:03:18 +02:00
Topi Miettinen	9df2cdd8ec	exec: SystemCallLog= directive With new directive SystemCallLog= it's possible to list system calls to be logged. This can be used for auditing or temporarily when constructing system call filters. --- v5: drop intermediary, update HASHMAP_FOREACH_KEY() use v4: skip useless debug messages, actually parse directive v3: don't declare unused variables with old libseccomp v2: fix build without seccomp or old libseccomp	2020-09-15 12:54:17 +03:00
Lennart Poettering	bb0c0d6f29	core: add credentials logic Fixes: #15778 #16060	2020-08-25 19:45:35 +02:00
Lennart Poettering	4e39995371	core: introduce ProtectProc= and ProcSubset= to expose hidepid= and subset= procfs mount options Kernel 5.8 gained a hidepid= implementation that is truly per procfs, which allows us to mount a distinct once into every unit, with individual hidepid= settings. Let's expose this via two new settings: ProtectProc= (wrapping hidpid=) and ProcSubset= (wrapping subset=). Replaces: #11670	2020-08-24 20:11:02 +02:00
Luca Boccassi	b3d133148e	core: new feature MountImages Follows the same pattern and features as RootImage, but allows an arbitrary mount point under / to be specified by the user, and multiple values - like BindPaths. Original implementation by @topimiettinen at: https://github.com/systemd/systemd/pull/14451 Reworked to use dissect's logic instead of bare libmount() calls and other review comments. Thanks Topi for the initial work to come up with and implement this useful feature.	2020-08-05 21:34:55 +01:00
Luca Boccassi	18d7370587	service: add new RootImageOptions feature Allows to specify mount options for RootImage. In case of multi-partition images, the partition number can be prefixed followed by colon. Eg: RootImageOptions=1:ro,dev 2:nosuid nodev In absence of a partition number, 0 is assumed.	2020-07-29 17:17:32 +01:00
Zbigniew Jędrzejewski-Szmek	56a13a495c	pid1: create ro private tmp dirs when /tmp or /var/tmp is read-only Read-only /var/tmp is more likely, because it's backed by a real device. /tmp is (by default) backed by tmpfs, but it doesn't have to be. In both cases the same consideration applies. If we boot with read-only /var/tmp, any unit with PrivateTmp=yes would fail because we cannot create the subdir under /var/tmp to mount the private directory. But many services actually don't require /var/tmp (either because they only use it occasionally, or because they only use /tmp, or even because they don't use the temporary directories at all, and PrivateTmp=yes is used to isolate them from the rest of the system). To handle both cases let's create a read-only directory under /run/systemd and mount it as the private /tmp or /var/tmp. (Read-only to not fool the service into dumping too much data in /run.) $ sudo systemd-run -t -p PrivateTmp=yes bash Running as unit: run-u14.service Press ^] three times within 1s to disconnect TTY. [root@workstation /]# ls -l /tmp/ total 0 [root@workstation /]# ls -l /var/tmp/ total 0 [root@workstation /]# touch /tmp/f [root@workstation /]# touch /var/tmp/f touch: cannot touch '/var/tmp/f': Read-only file system This commit has more changes than I like to put in one commit, but it's touching all the same paths so it's hard to split. exec_runtime_make() was using the wrong cleanup function, so the directory would be left behind on error.	2020-07-14 19:47:15 +02:00
Luca Boccassi	d4d55b0d13	core: add RootHashSignature service parameter Allow to explicitly pass root hash signature as a unit option. Takes precedence over implicit checks.	2020-06-25 08:45:21 +01:00
Lennart Poettering	6b000af4f2	tree-wide: avoid some loaded terms https://tools.ietf.org/html/draft-knodel-terminology-02 https://lwn.net/Articles/823224/ This gets rid of most but not occasions of these loaded terms: 1. scsi_id and friends are something that is supposed to be removed from our tree (see #7594) 2. The test suite defines an API used by the ubuntu CI. We can remove this too later, but this needs to be done in sync with the ubuntu CI. 3. In some cases the terms are part of APIs we call or where we expose concepts the kernel names the way it names them. (In particular all remaining uses of the word "slave" in our codebase are like this, it's used by the POSIX PTY layer, by the network subsystem, the mount API and the block device subsystem). Getting rid of the term in these contexts would mean doing some major fixes of the kernel ABI first. Regarding the replacements: when whitelist/blacklist is used as noun we replace with with allow list/deny list, and when used as verb with allow-list/deny-list.	2020-06-25 09:00:19 +02:00
Luca Boccassi	0389f4fa81	core: add RootHash and RootVerity service parameters Allow to explicitly pass root hash (explicitly or as a file) and verity device/file as unit options. Take precedence over implicit checks.	2020-06-23 10:50:09 +02:00
Lennart Poettering	f3dc6af20f	core: automatically update StandardOuput=syslog to =journal (and similar for StandardError=) Let's go one step further and upgrade implicitly. Usually =syslog assignments are historic artifacts only. Let's upgrade the lines automatically, and politely suggest people update their unit files/configuration (and drop the lines altogether, without replacement). Fixes: #15807	2020-05-15 00:05:46 +02:00
Zbigniew Jędrzejewski-Szmek	ad21e542b2	manager: add CoredumpFilter= setting Fixes #6685.	2020-04-09 14:08:48 +02:00
Michal Sekletár	e2b2fb7f56	core: add support for setting CPUAffinity= to special "numa" value systemd will automatically derive CPU affinity mask from NUMA node mask. Fixes #13248	2020-03-16 08:57:28 +01:00
Michal Sekletár	1808f76870	shared: split out NUMA code from cpu-set-util.c to numa-util.c	2020-03-16 08:23:18 +01:00
Lennart Poettering	91dd5f7cbe	core: add new LogNamespace= execution setting	2020-01-31 15:01:43 +01:00
Kevin Kuehler	fc64760dda	core: shared: Add ProtectClock= to systemd.exec	2020-01-26 12:23:33 -08:00
Kevin Kuehler	8470304018	core: Add ProtectKernelLogs If seccomp is enabled, load the SYSCALL_FILTER_SET_SYSLOG into the seccomp filter set. Drop the CAP_SYSLOG capability.	2019-11-11 12:12:02 -08:00
Zbigniew Jędrzejewski-Szmek	5ac1530eca	tree-wide: say "ratelimit" not "rate_limit" "ratelimit" is a real word, so we don't need to use the other form anywhere. We had both forms in various places, let's standarize on the shorter and more correct one.	2019-09-20 16:05:53 +02:00
Yu Watanabe	12213aed12	core: move timeout_clean_usec from Service to ExecContext	2019-08-28 23:09:54 +09:00
Lennart Poettering	6b7b2ed96b	core: add type of resource string table	2019-07-11 12:18:51 +02:00
Lennart Poettering	4c2f584230	core: hook up service unit type with the new clean operation The implementation is pretty straight-foward: when we get a request to clean some type of resources we fork off a process doing that, and while it is running we are in the "cleaning" state.	2019-07-11 12:18:51 +02:00
Lennart Poettering	380dc8b0a2	core: add generic "clean" operation to units This adds basic infrastructure to implement a "clean" operation for unit types. This "clean" operation is supposed to remove on-disk resources of units, and is supposed to be used in a later commit to clean our RuntimeDirectory=, StateDirectory= and so on of service units. Later commits will open this up to the bus, and hook up service units with this. This also adds a new generic ActiveState called UNIT_MAINTENANCE. It's supposed to cover all kinds of "maintainance" state of units. Specifically, this is supposed to cover the "cleaning" operations later added for service units which might take a bit of time. This high-level, generic, abstract state is called UNIT_MAINTENANCE instead of the more specific "UNIT_CLEANING", since I think this should be kept open for different operations possibly later on that could be nicely subsumed under this (for example, maybe a recursive chown()ing operation could be covered by this, and similar).	2019-07-11 12:18:51 +02:00
Michal Sekletar	b070c7c0e1	core: introduce NUMAPolicy and NUMAMask options Make possible to set NUMA allocation policy for manager. Manager's policy is by default inherited to all forked off processes. However, it is possible to override the policy on per-service basis. Currently we support, these policies: default, prefer, bind, interleave, local. See man 2 set_mempolicy for details on each policy. Overall NUMA policy actually consists of two parts. Policy itself and bitmask representing NUMA nodes where is policy effective. Node mask can be specified using related option, NUMAMask. Default mask can be overwritten on per-service level.	2019-06-24 16:58:54 +02:00
Anita Zhang	b3d593673c	core: add ExecStartXYZEx= with dbus support for executable prefixes Closes #11654	2019-05-30 20:41:42 -07:00
Zbigniew Jędrzejewski-Szmek	0985c7c4e2	Rework cpu affinity parsing The CPU_SET_S api is pretty bad. In particular, it has a parameter for the size of the array, but operations which take two (CPU_EQUAL_S) or even three arrays (CPU_{AND,OR,XOR}_S) still take just one size. This means that all arrays must be of the same size, or buffer overruns will occur. This is exactly what our code would do, if it received an array of unexpected size over the network. ("Unexpected" here means anything different from what cpu_set_malloc() detects as the "right" size.) Let's rework this, and store the size in bytes of the allocated storage area. The code will now parse any number up to 8191, independently of what the current kernel supports. This matches the kernel maximum setting for any architecture, to make things more portable. Fixes #12605.	2019-05-29 10:20:42 +02:00
Lennart Poettering	f69567cbe2	core: expose SUID/SGID restriction as new unit setting RestrictSUIDSGID=	2019-04-02 16:56:48 +02:00
Lennart Poettering	0a6991e0bb	tree-wide: reorder various structures to make them smaller and use fewer cache lines Some "pahole" spelunking.	2019-03-27 18:11:11 +01:00
Zbigniew Jędrzejewski-Szmek	ca78ad1de9	headers: remove unneeded includes from util.h This means we need to include many more headers in various files that simply included util.h before, but it seems cleaner to do it this way.	2019-03-27 11:53:12 +01:00
Lennart Poettering	6f765baf23	core: rework how we reset the TTY after use by a service This makes two changes: 1. Instead of resetting the configured service TTY each time after a process exited, let's do so only when the service goes back to "dead" state. This should be preferable in case the started processes leave background child processes around that still reference the TTY. 2. chmod() and chown() the TTY at the same time. This should make it safe to run "systemd-run -p DynamicUser=1 -p StandardInput=tty -p TTYPath=/dev/tty8 /bin/bash" without leaving a TTY owned by a dynamic user around.	2019-03-20 21:28:02 +01:00
Lennart Poettering	a8d08f39d1	core: add new setting NetworkNamespacePath= for configuring a netns by path for a service Fixes: #2741	2019-03-07 16:55:23 +01:00
Anita Zhang	7ca69792e5	core: add ':' prefix to ExecXYZ= skip env var substitution	2019-02-20 17:58:14 +01:00
Topi Miettinen	aecd5ac621	core: ProtectHostname= feature Let services use a private UTS namespace. In addition, a seccomp filter is installed on set{host,domain}name and a ro bind mounts on /proc/sys/kernel/{host,domain}name.	2019-02-20 10:50:44 +02:00

1 2 3 4

186 Commits