Commit Graph

8644 Commits

Author SHA1 Message Date
Michal Koutný
cf62e00295 path: Close inotify FD asynchronously
inotify FD may take several milliseconds to close.  We measured
daemon-reload

        default: (0.427 ± 0.05) s
        async:   (0.323 ± 0.02) s

with 5 path units out of 422 units. I.e. ~1% of units cause ~25% of the
delay, hence this fix seems like low-hanging fruit on the daemon-reload
critical path.

Particular inotify slowness pointed out by @fbuihuu.
2025-03-04 21:37:58 +01:00
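The percentages quoted in the message above can be checked from the two measured means. A small arithmetic sketch in Python, using only the numbers from the commit message (the variable names are illustrative, and the error bars are ignored):

```python
# Means reported in the commit message, in seconds.
default_s = 0.427
async_s = 0.323

# Unit counts reported in the commit message.
path_units = 5
total_units = 422

# Share of all units that are path units.
unit_share = path_units / total_units

# Share of the default daemon-reload time saved by closing asynchronously.
delay_share = (default_s - async_s) / default_s

print(f"path units: {unit_share:.1%} of units")
print(f"time saved: {delay_share:.1%} of daemon-reload")
```

This gives roughly 1.2% of units and 24.4% of the reload time, in line with the "~1%" and "~25%" figures in the message.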
Daan De Meyer
5abf819a5f basic: remove unnecessary definition in missing_xyz.h (#36565) 2025-03-04 08:41:14 +01:00
Yu Watanabe
f342c2420a chattr-util: two trivial cleanups (#36593) 2025-03-04 13:13:25 +09:00
Yu Watanabe
059d23c966 exec-invoke: add missing assertions and drop unnecessary conditions
Fixes CID#1534358.
2025-03-04 05:18:15 +09:00
Yu Watanabe
34b58da114 exec-invoke: modernize get_supplementary_groups()
- drop unused argument 'group',
- rename output arguments,
- add missing assertions for output arguments,
- always initialize output arguments on success.
2025-03-04 05:18:15 +09:00
Yu Watanabe
ec32732043 basic: introduce our own sys/mount.h implementation
To resolve conflict with sys/mount.h and linux/mount.h or linux/fs.h.

The conflict between sys/mount.h and linux/mount.h is resolved in
glibc-2.37 (774058d72942249f71d74e7f2b639f77184160a6), but our baseline
is still glibc-2.31. Also, even with that version or newer,
sys/mount.h still conflicts with linux/fs.h, which is included by
linux/btrfs.h.

This introduces our own implementation of sys/mount.h, which can be
included simultaneously with linux/mount.h and linux/fs.h. This also
imports linux/fs.h, linux/mount.h, and several other dependent headers.
The introduced sys/mount.h header itself may not be simple enough, but
by using the header, we can drop most of the workarounds in other source files.
2025-03-04 02:24:49 +09:00
Yu Watanabe
2c2d832eb0 missing_securebits: remove unnecessary header
Our kernel baseline is 5.4, hence all entries in the headers are defined
in linux/securebits.h.
2025-03-04 02:24:49 +09:00
Yu Watanabe
a7cb43d8d1 missing_resource.h: RLIMIT_RTTIME is defined since glibc-2.14
Now our baseline is glibc-2.31.
2025-03-04 02:24:49 +09:00
Yu Watanabe
8e091ec420 basic/linux: import prctl.h from linux 6.14-rc4 2025-03-04 02:24:49 +09:00
Yu Watanabe
e7e91769e8 basic/linux: import ioprio.h from kernel 6.14-rc4
This also fixes the maximum allowed ioprio class: 8 -> 7
2025-03-04 02:24:49 +09:00
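One way to see why 7 would be the upper bound is to look at how an I/O priority value is packed. The sketch below assumes the layout used by the kernel's linux/ioprio.h (a class field stored above a 13-bit data field, with 8 possible class values, i.e. 0 through 7). The Python function names are hypothetical, and this is not systemd's code:

```python
# Constants mirroring the assumed layout of linux/ioprio.h.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_NR_CLASSES = 8
IOPRIO_CLASS_MASK = IOPRIO_NR_CLASSES - 1  # 7, the largest class value


def ioprio_value(ioclass, data):
    """Pack a class and a per-class data value into one integer."""
    if not 0 <= ioclass <= IOPRIO_CLASS_MASK:
        raise ValueError(f"ioprio class out of range: {ioclass}")
    if not 0 <= data < (1 << IOPRIO_CLASS_SHIFT):
        raise ValueError(f"ioprio data out of range: {data}")
    return (ioclass << IOPRIO_CLASS_SHIFT) | data


def ioprio_class(value):
    """Extract the class field from a packed value."""
    return (value >> IOPRIO_CLASS_SHIFT) & IOPRIO_CLASS_MASK
```

Under this layout a class of 8 does not fit in the class field, so a range check that accepts 8 would be off by one.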
Yu Watanabe
a997f3387f chattr-util: drop mostly unused 'previous' argument from chattr_path() and friends 2025-03-04 00:47:12 +09:00
Steve Ramage
241a0f6e0a core: DelegateNamespaces= does not depend on seccomp (#36580) 2025-03-03 14:34:31 +09:00
Mike Yuan
14a40a6d1c core/main: don't write shutdown OSC context outside of pid1
Follow-up for 98c283131c
2025-03-02 16:22:40 +01:00
Daan De Meyer
38701809a8 core: Add DelegateNamespaces= (#36532) 2025-03-01 15:18:45 +01:00
Daan De Meyer
8234cd9989 core: Add DelegateNamespaces=
This delegates one or more namespaces to the service. Concretely,
this setting influences in which order we unshare namespaces. Delegated
namespaces are unshared *after* the user namespace is unshared. Other
namespaces are unshared *before* the user namespace is unshared.

Fixes #35369
2025-03-01 13:54:58 +01:00
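The ordering rule described above can be sketched as follows. This is only an illustration of the rule as stated in the commit message; the function name and the namespace strings are hypothetical, and the actual implementation is in C:

```python
def unshare_order(requested, delegated):
    """Return the order in which namespaces would be unshared.

    requested: namespaces to unshare besides the user namespace.
    delegated: subset of namespaces listed as delegated.
    """
    # Non-delegated namespaces are unshared before the user namespace.
    before = [ns for ns in requested if ns != "user" and ns not in delegated]
    # Delegated namespaces are unshared after the user namespace.
    after = [ns for ns in requested if ns != "user" and ns in delegated]
    return before + ["user"] + after


print(unshare_order(["mnt", "net", "ipc"], {"net"}))
```

With "net" delegated, the sketch yields mnt, ipc, user, net: the delegated namespace ends up being created after the user namespace.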
Lennart Poettering
19ade24464 notify-recv: add notify_recv() flavour that returns a split up strv instead of the message text as string
This is useful at various places, since we split up the message as the first
thing there anyway.
2025-02-28 14:17:52 +01:00
Lennart Poettering
bbdad5c025 core: also issue OSC 3008 from service context
(Note: we also change TEST-13-NSPAWN.machined.sh minimally here, because
it checks for byte precise output of a pty allocated for a service
invocation - which it's not going to get if it claims that the pty is an
all-powerful one. After all this PR ensures that we'll generate the new
OSC sequence on non-dumb terminals associated with services. Hence, set
TERM=dumb explicitly to ensure no ANSI sequences are generated, ever.
Which is a nice test btw that TERM=dumb really does its thing here.)
2025-02-27 15:17:34 +01:00
Lennart Poettering
5b3eaf9e68 terminal-util: change conditioning in terminal_reset_defensive()
So far we conditioned the logic that issues ANSI sequences for resetting
the TTY based on whether something is a pty or not (under the assumption
we need no reset on ptys, since they are short-lived).

This is simply wrong though. The pty that a container getty is invoked
on is generally long-lived: as long as the container is up, and it will
be reused between getty instances/sessions all the time. In such a case
we really should reset properly.

Let's instead make the logic dependent on whether TERM is set to
anything other than "dumb". The previous commit made sure we always set
TERM in a sensible way in systemd-run, hence this
*explicit* logic sounds like a much better choice now, as it mea
2025-02-27 15:17:34 +01:00
Lennart Poettering
9ab703d8e1 terminal-util: change 2nd parameter of terminal_reset_defensive() to flags
Let's convert the 2nd argument from a boolean to a proper flags
parameter. Doesn't change behaviour in any way, but is more readable, and
prepares ground for adding more flags soon.
2025-02-27 15:13:15 +01:00
Lennart Poettering
98c283131c pid1: issue boot context issue at boot 2025-02-27 15:09:25 +01:00
Daan De Meyer
7904c1dbe6 exec-invoke: Introduce setup_delegated_namespaces()
No functional change, just refactoring.
2025-02-27 10:26:52 +01:00
Daan De Meyer
4fea4f8295 exec-invoke: Simplify logic 2025-02-27 09:30:06 +01:00
Daan De Meyer
0ea00396fd exec-invoke: Move KSM logic up
Let's move it up to be located together with other resource logic
instead of having it stuffed in between the sandboxing logic.
2025-02-27 09:16:32 +01:00
Daan De Meyer
d9e41bfe02 exec-invoke: Fix unshare() error handling (#36537) 2025-02-27 09:16:07 +01:00
Daan De Meyer
f215835cb8 exec-invoke: Fix invalid use of error variable
Follow up for 406f177501
2025-02-27 09:15:22 +01:00
Daan De Meyer
c78b06b1d2 exec-invoke: Fix unshare() error handling
Follow up for cd58b5a135
2025-02-27 09:15:03 +01:00
Mike Yuan
c337a1301f core/service: do not propagate reload for combined RELOADING=1 + READY=1 when notify-reload
Follow-up for 3bd28bf721

SERVICE_RELOAD_SIGNAL state can only be reached via explicit reload jobs,
and we have a clear distinction between that and plain RELOADING=1
notifications, the latter of which is issued by clients doing reload
outside of our job engine. I.e. upon SERVICE_RELOAD_SIGNAL + RELOADING=1
we don't propagate reload jobs again, since that's done during transaction
construction stage already. The handling of combined RELOADING=1 + READY=1
so far is bogus however, as it tries to propagate duplicate reload jobs.
Amend this by following the logic for standalone RELOADING=1.
2025-02-26 23:41:33 +00:00
Lennart Poettering
14871a6529 efivars: kill SystemdOptions efi var support
This has been deprecated since v254 (2023). Let's kill it for
good now, it has been long enough with 2y. No one has shown up who wants
to keep it. And given it doesn't work in SB world anyway, and is not
measured, it is quite problematic security-wise.
2025-02-26 17:28:43 +01:00
Mike Yuan
5d09689b5c core/manager: port to notify_recv_with_fds() 2025-02-26 13:27:39 +01:00
Mike Yuan
37149e692a process-util: introduce SIGINFO_CODE_IS_DEAD helper 2025-02-21 18:08:02 +01:00
Mike Yuan
384949f7de core: dlopen()'ify libapparmor
In Arch Linux we currently have half-baked apparmor support,
in particular we cannot link systemd to libapparmor for service
context integration, since that will pull apparmor into the base system.
Hence, let's turn this into a dlopen dep.

Ref: https://gitlab.archlinux.org/archlinux/packaging/packages/systemd/-/issues/22
2025-02-21 14:22:51 +01:00
Daan De Meyer
698ac172aa exec-invoke: Use FORK_DETACH when forking off pid namespace child
This ensures the child process is immediately re-parented to the
manager process which avoids a "Supervising process xxx which is
not our child. We'll most likely not notice when it exits." warning
which can currently happen if the systemd-executor parent
process sends the pid namespace child process pidref to the manager
process and the manager process dispatches the child process pidref
before the systemd-executor parent process exits, since at that point
the pid namespace child process's parent will still be the
systemd-executor parent process and not the manager process.
2025-02-20 21:00:55 +01:00
Lennart Poettering
66b5e7dfaa catalog: assign a proper message ID for mounts on symlinked paths
For some reason we reused the non-empty catalog entry so far, which is
plain wrong. Correct that.
2025-02-18 13:49:24 +01:00
Lennart Poettering
38c35970b1 core: port mount unit inode creation to make_mount_point_inode_from_mode() too
This also ports over things to use chase() to create/pin the underlying
inode to mount on, and in particular checks that the path does not contain any
symlinks. That's crucial since we cannot allow mounts to be established
with that, since it would mean we couldn't recognize the entries in
/proc/self/mountinfo anymore.
2025-02-18 13:49:24 +01:00
Yu Watanabe
4053af87bb core/mount: rework GracefulOptions= as x-systemd.graceful-option= (#36356)
Prompted by #36337
2025-02-14 13:01:14 +09:00
Mike Yuan
f565e5a94a core/mount: log only once about fs not supporting new mount API 2025-02-12 18:16:44 +01:00
Mike Yuan
0d76f1c423 core/mount: rework GracefulOptions= to be just x-systemd.graceful-option=
09fbff57fc introduced a new knob
for such functionality. However, that seems unnecessary.

The mount option string is ubiquitous in that all of fstab,
kernel cmdline, credentials, systemd-mount, ... speak it.
And we already have x-systemd.device-bound= that's parsed
by pid1 instead of fstab-generator. It feels hence more natural
for graceful options to be an extension of that, rather than
its own property.

There's also one nice side effect: the setting itself
is now more graceful for systemd versions not supporting
such a feature.
2025-02-12 18:16:44 +01:00
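To illustrate the idea of carrying graceful options inside the ordinary mount option string, here is a minimal sketch that separates them from the regular options. The helper name is hypothetical, the option names in the usage line are placeholders, and the sketch ignores quoting and escaping; it does not claim to mirror how pid1 parses the string:

```python
GRACEFUL_PREFIX = "x-systemd.graceful-option="


def split_graceful_options(options):
    """Split a comma-separated mount option string.

    Returns (regular, graceful), where graceful holds the values that
    were given as x-systemd.graceful-option=VALUE.
    """
    regular, graceful = [], []
    for opt in options.split(","):
        if not opt:
            continue  # skip empty fields such as a trailing comma
        if opt.startswith(GRACEFUL_PREFIX):
            graceful.append(opt[len(GRACEFUL_PREFIX):])
        else:
            regular.append(opt)
    return regular, graceful


print(split_graceful_options("rw,x-systemd.graceful-option=foo,noatime"))
```

Because the graceful entries travel in the same string as every other option, anything that already passes mount options along (fstab, kernel command line, credentials, systemd-mount) can carry them without a separate property.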
Mike Yuan
818315ae61 core/service: drop unneeded unit_add_to_gc_queue()
Follow-up for a1d315730f
and 6ac62d61db

With the aforementioned commits, unit_release_resources()
is dispatched in a dedicated queue, and Service.n_keep_fd_store
has been dropped, hence the comment is outdated. Moreover,
the unit is added to GC queue in unit_notify() already.
No other unit types do this in corresponding _enter_dead()
functions, nor does Service need it anymore.
2025-02-12 17:54:34 +01:00
Mike Yuan
468e87267f core/unit: use UNIT_IS_INACTIVE_OR_FAILED at one more place 2025-02-12 17:49:22 +01:00
Mike Yuan
0fa062f983 core/dbus-mount: add missing ReloadResult and CleanResult properties 2025-02-12 15:34:54 +01:00
Mike Yuan
c7c6cf2031 core/mount: trivial coding style cleanups 2025-02-12 15:34:53 +01:00
Mike Yuan
74c0d9726c core/mount: report accurate can_start and can_reload 2025-02-12 15:33:11 +01:00
Mike Yuan
65bc0c03b9 core/mount: check parameters_fragment first in mount_enter_(re)mounting()
I.e. don't perform any action if we can't spawn mount task anyway.
Later the same check would be added to mount_can_start/reload(),
so this makes things more coherent too.
2025-02-12 15:32:30 +01:00
Mike Yuan
7e9a78d6be core/mount: filter out "fail" option as well 2025-02-12 14:43:06 +01:00
Mike Yuan
5fe4c30ca7 core/dbus-service: fix alignment 2025-02-12 14:43:04 +01:00
Paul Fertser
a3aad16c6e socket: resolve unit specifiers in BindToDevice
There are cases where templated socket unit files are used for network services
with the interface name used as the instance. This patch allows using %i in the
BindToDevice= setting to limit the scope automatically.
2025-02-12 12:03:42 +01:00
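The effect of resolving %i for a templated socket unit can be sketched as below. The helpers and the unit name in the usage line are hypothetical, only the %i specifier is handled, and instance-name unescaping is left out:

```python
def instance_of(unit_name):
    """Return the instance part of a templated unit name, or None."""
    _prefix, sep, rest = unit_name.partition("@")
    if not sep:
        return None  # not an instantiated template
    instance, _dot, _suffix = rest.rpartition(".")
    return instance or None


def resolve_bind_to_device(value, unit_name):
    """Expand %i in a BindToDevice= value using the unit's instance name."""
    instance = instance_of(unit_name)
    if instance is None:
        return value
    return value.replace("%i", instance)


print(resolve_bind_to_device("%i", "example@eth0.socket"))
```

For an instance such as example@eth0.socket, a setting of %i would resolve to eth0, so each instance binds to the interface it is named after.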
Yu Watanabe
869b0dfe6e core: remove path to transient unit file from unit name maps on stop (#36186)
Fixes #35190.
2025-02-10 00:48:01 +09:00
Michal Sekletar
a128273f7b core/namespace: relabel bind mount source based on the target path
Some bind mounts, e.g. the /tmp bind mount when PrivateTmp=disconnected,
must be explicitly relabeled because they would otherwise have an incorrect
SELinux label. /tmp is expected to have the well-known SELinux label tmp_t,
but currently it has the label inherited from the source directory of the bind mount.
2025-02-07 12:24:31 +01:00
Yu Watanabe
9eb348c9c5 core/exec-invoke: drop unnecessary casts
Follow-up for c554acd11d.
2025-02-07 09:18:49 +01:00
Lennart Poettering
c554acd11d exec-invoke: respect $HOME set via PAM
This follows the same recent change in util-linux:

https://github.com/util-linux/util-linux/pull/3354

i.e. we generally want that PAM modules can override $HOME and it is
honoured for the CWD after login.

(This renames the 'home' variable we maintained so far to 'pwent_home',
to clarify that it's the home directory listed in the struct passwd
entry, and thus not necessarily the one actually used.)
2025-02-06 09:23:49 +01:00
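The precedence described above could be sketched as follows. The function and the 'pam_env' parameter are hypothetical (only 'pwent_home' comes from the commit message), and treating an empty $HOME as unset is an assumption of this sketch:

```python
def effective_home(pwent_home, pam_env):
    """Pick the home directory to use as the working directory after login.

    pwent_home: home directory from the struct passwd entry.
    pam_env: environment variables set by PAM modules, as a dict.
    """
    pam_home = pam_env.get("HOME")
    # A $HOME set via PAM wins; otherwise fall back to the passwd entry.
    return pam_home if pam_home else pwent_home


print(effective_home("/home/alice", {"HOME": "/srv/alice"}))
print(effective_home("/home/alice", {}))
```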