Commit Graph

6586 Commits

Author SHA1 Message Date
Lennart Poettering
6eeeef9f66 process-util: introduce new FORK_FREEZE flag for safe_fork()
Often we want to fork off a process that just hangs until we kill it.
Let's add a simple flag to create one of this type, and use it at
various places.
2025-01-16 11:55:21 +01:00
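For illustration, here is a minimal sketch of what such a "frozen" child boils down to, using plain fork() rather than safe_fork(). The helpers fork_frozen_child() and kill_and_reap() are hypothetical and not part of systemd; the real flag is handled inside safe_fork().

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not systemd's API): fork a child that does nothing
 * but wait to be killed. Returns the child's PID in the parent, -1 on error. */
static pid_t fork_frozen_child(void) {
        pid_t pid = fork();
        if (pid != 0)
                return pid;      /* parent, or -1 if fork() failed */

        /* Child: pause() only returns after a signal handler ran, so loop
         * to stay frozen until a fatal signal arrives. */
        for (;;)
                pause();
}

/* Kill the frozen child with SIGKILL and reap it. Returns 0 if it died
 * from that signal, -1 otherwise. */
static int kill_and_reap(pid_t pid) {
        int status;

        assert(pid > 0);

        if (kill(pid, SIGKILL) < 0)
                return -1;
        if (waitpid(pid, &status, 0) != pid)
                return -1;
        return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL ? 0 : -1;
}
```

The parent owns the child's lifetime: nothing happens in the child, so killing and reaping it is the only way it ever goes away.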
Lennart Poettering
8110b34b64 pidref: various shortcuts to pidref_equal()
This adds some shortcuts to pidref_equal(), so that we don't have to
query the pidfs id if there's no need.
2025-01-16 11:55:21 +01:00
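A sketch of the kind of shortcuts meant here, under assumptions: PidRefSketch, its kernel_id field and the query counter are hypothetical stand-ins (the real PidRef and the pidfs ID lookup differ). The point is the ordering: cheap checks (same object, different PID, missing pidfd, identical fd) come first, and the ID is only queried when they cannot decide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for PidRef: a PID, an optional pidfd and a lazily
 * queried pidfs ID (0 = not queried yet). */
typedef struct {
        pid_t pid;
        int fd;              /* -1 if there is no pidfd */
        uint64_t fd_id;      /* cached ID, 0 until queried */
        uint64_t kernel_id;  /* what a real query would return (simulation only) */
} PidRefSketch;

static unsigned n_id_queries = 0; /* counts the "expensive" lookups */

/* Simulates the costly query (in reality done on the pidfd). */
static uint64_t pidref_sketch_get_id(PidRefSketch *p) {
        if (p->fd_id == 0) {
                n_id_queries++;
                p->fd_id = p->kernel_id;
        }
        return p->fd_id;
}

static bool pidref_sketch_equal(PidRefSketch *a, PidRefSketch *b) {
        assert(a);
        assert(b);

        if (a == b)                   /* same object */
                return true;
        if (a->pid != b->pid)         /* different PIDs: no query needed */
                return false;
        if (a->fd < 0 || b->fd < 0)   /* nothing more to compare */
                return true;
        if (a->fd == b->fd)           /* same fd refers to the same process */
                return true;

        /* Only now pay for the ID lookup, which detects PID reuse. */
        return pidref_sketch_get_id(a) == pidref_sketch_get_id(b);
}
```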
Lennart Poettering
9ef559a036 tree-wide: drop support for kernels without pidfd_open() and pidfd_send_signal() (#35971) 2025-01-16 11:37:17 +01:00
Lennart Poettering
39706728e1 namespace-util: don't reset UID/GIDs in namespace_enter() unless we enter a userns
The reset of UID/GID only really makes sense if we enter a userns, hence
let's restrict it to that.
2025-01-16 11:26:57 +01:00
Mike Yuan
70923ed358 meson: enable -Wzero-as-null-pointer-constant
Support for C was added in gcc 15:
236c0829ee
2025-01-16 02:26:56 +01:00
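To illustrate what the warning catches, a small sketch (first_argument() is a hypothetical function): using the literal 0 where a null pointer is meant, such as `return 0;` from a pointer-returning function, is flagged by -Wzero-as-null-pointer-constant, whereas NULL is not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the first command line argument, or NULL if there is none. */
static const char *first_argument(int argc, char **argv) {
        assert(argv);

        if (argc < 2)
                return NULL;   /* "return 0;" here is what the warning flags */
        return argv[1];
}
```

Changing `return NULL;` to `return 0;` and compiling with `-Wzero-as-null-pointer-constant` on gcc 15 or newer should trigger the warning; older gcc releases only support this flag for C++.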
Mike Yuan
347eb8fbe3 tree-wide: remove unnecessary gcc >= 7 version check
Our baseline is gcc 8.4.
2025-01-16 02:26:56 +01:00
hanjinpeng
7e91a68b2f log: check isempty for object_field and extra_field 2025-01-15 22:36:58 +00:00
Lennart Poettering
2ca0f3ed2e pty_open_peer() follow-up (#36027) 2025-01-15 21:05:59 +01:00
Yu Watanabe
e722fe74ca random-util: fix compilation error
Fixes the following error:
```
../src/basic/random-util.c: In function "fallback_random_bytes":
../src/basic/random-util.c:45:26: error: initializer-string for array of "char" is too long [-Werror=unterminated-string-initialization]
   45 |                 .label = "systemd fallback random bytes v1",
      |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
```
2025-01-15 20:24:30 +01:00
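The error arises because the literal is exactly 32 characters long, and the warning implies the label field is a 32-byte char array, so initializing it silently drops the terminating NUL; gcc 15 reports that under -Wunterminated-string-initialization. One possible workaround, sketched below with hypothetical names and not necessarily the fix this commit applied, is to copy exactly the array's size with memcpy() and pin the length with a static assertion.

```c
#include <assert.h>
#include <string.h>

#define FALLBACK_LABEL "systemd fallback random bytes v1"
#define LABEL_SIZE 32

/* Hypothetical struct: the label is a fixed-size byte array that is
 * deliberately not NUL-terminated. */
struct fallback_seed {
        char label[LABEL_SIZE];
};

/* The literal has exactly 32 characters, so its NUL would not fit. */
_Static_assert(sizeof(FALLBACK_LABEL) - 1 == LABEL_SIZE,
               "label must fill the array exactly");

static void fallback_seed_init(struct fallback_seed *s) {
        assert(s);
        /* Copy exactly LABEL_SIZE bytes instead of ".label = FALLBACK_LABEL",
         * so nothing relies on the implicitly dropped NUL. */
        memcpy(s->label, FALLBACK_LABEL, LABEL_SIZE);
}
```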
Mike Yuan
d693ba5f8e terminal-util: drop unused open_terminal_in_namespace()
With our baseline including TIOCGPTPEER we now systematically
open the pty peer through ioctl(), i.e. this has sat unused
since 1d522f1a86. Kill it!
2025-01-15 17:46:10 +01:00
wrvsrx
6013dee98d efivars: deal with uncommitted efi variables
Unfortunately, the kernel reports EOF if there's an inconsistency between the efivarfs variable list
and what's actually stored in firmware, cf. #34304. A zero-sized variable is not allowed in
EFI, hence the variable doesn't really exist in the backing store as long as it is zero-sized,
and the kernel calls this "uncommitted". Hence we translate EOF back to ENOENT here,
matching the kernel behavior before
3fab70c165

If the kernel changes behaviour (to flush dentries on resume), we can drop
this at some point in the future. But note that the commit is 11
years old at this point so we'll need to deal with the current behaviour for
a long time.

Fix #34304.
2025-01-15 16:53:21 +01:00
Lennart Poettering
b5a6f4c05b string-util: make strjoin() just a special case of strextend() (#36011)
This is split out of #36010, but makes a ton of sense on its own.
2025-01-15 13:25:08 +01:00
Lennart Poettering
7adafb0832 missing: add quotactl_fd() wrapper 2025-01-15 13:24:04 +01:00
Lennart Poettering
fd3b7cf772 string-util: add a mechanism for strextend_with_separator() for specifying "ignore" arguments
In strv_new() we have STRV_IGNORE for skipping over an argument in the
argument list. Let's add the same to strextend_with_separator():

strextend_with_separator(&x, "foo", POINTER_MAX, "bar");

will result in "foobar" being appended to "x". (POINTER_MAX is
different from NULL, which terminates the argument list.)

This is useful for ternary op situations.

(We should probably get rid of STRV_IGNORE and just use POINTER_MAX
everywhere directly, but that's for another time.)
2025-01-15 10:52:38 +01:00
Lennart Poettering
34467ffa3c string-util: make strjoin() just a special case of strextend()
The functions are very similar, let's make them the same. If the first
argument to strextend() is NULL instead of extending a string we'll
allocate a fresh one and return that.
2025-01-15 10:51:53 +01:00
Lennart Poettering
1d522f1a86 terminal-util: drop support for pre-TIOCGPTPEER kernels
Our minimum baseline is now far beyond 4.13, hence let's drop these
fallback paths.
2025-01-15 10:39:04 +01:00
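For reference, a minimal sketch of opening the peer via TIOCGPTPEER (supported since kernel 4.13): the peer is opened directly from the master fd instead of resolving /dev/pts/N by name. open_pty_pair() is a hypothetical name; glibc's posix_openpt() and unlockpt() allocate and unlock the pair.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Allocate a pty pair and open its peer side through the master fd.
 * Returns the peer fd and stores the master fd in *ret_master, or -errno. */
static int open_pty_pair(int *ret_master) {
        int master, peer, r;

        assert(ret_master);

        master = posix_openpt(O_RDWR | O_NOCTTY);
        if (master < 0)
                return -errno;

        if (unlockpt(master) < 0) {
                r = -errno;
                close(master);
                return r;
        }

        /* No path lookup: the kernel opens the peer of this very master. */
        peer = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC);
        if (peer < 0) {
                r = -errno;
                close(master);
                return r;
        }

        *ret_master = master;
        return peer;
}
```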
Zbigniew Jędrzejewski-Szmek
69c9629da7 sysusers: emit audit events for user and group creation (#35957)
Background: Fedora/RHEL are switching to sysusers.d metadata for
creation of users and groups for system users defined by packages
(https://fedoraproject.org/wiki/Changes/RPMSuportForSystemdSysusers).
Packages carry sysusers files. During package installation, rpm calls a
program to act on this config. This program may either be
/usr/lib/rpm/sysusers.sh which calls useradd/groupadd, or
/usr/bin/systemd-sysusers. To match the functionality provided by
useradd/groupadd from the shadow-utils project, systemd-sysusers must
emit audit events so that it provides a drop-in replacement.

systemd-sysusers will emit audit events AUDIT_ADD_USER/AUDIT_ADD_GROUP
when adding users and groups. The operation "names" are copied from
shadow-utils, so the format of the events that is generated on success
should be identical. On failure, things are more complicated. We write
each file in one go, so we first generate "success" messages
for each entry, then we try to write the files, and if things fail, we
generate failure messages for all entries that we failed to write.
2025-01-15 10:36:07 +01:00
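The success-then-failure reporting order can be sketched as follows. Entry, emit_audit() and commit_entries() are hypothetical; the real code emits AUDIT_ADD_USER/AUDIT_ADD_GROUP records through the audit library, whereas this sketch only prints and counts, and the two write stubs exist only to exercise both paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one user or group that is about to be added. */
typedef struct {
        const char *name;
        bool is_group;
} Entry;

static unsigned n_success, n_failure;

/* Stand-in for the real audit call: just print and count. */
static void emit_audit(const Entry *e, bool success) {
        assert(e);

        printf("%s %s: %s\n", e->is_group ? "group" : "user", e->name,
               success ? "success" : "failure");
        if (success)
                n_success++;
        else
                n_failure++;
}

/* The files are written in one go, so report success for every entry up
 * front and, if writing fails, follow up with a failure for each entry. */
static int commit_entries(const Entry *entries, size_t n, int (*write_files)(void)) {
        int r;

        assert(entries || n == 0);
        assert(write_files);

        for (size_t i = 0; i < n; i++)
                emit_audit(&entries[i], true);

        r = write_files();
        if (r < 0)
                for (size_t i = 0; i < n; i++)
                        emit_audit(&entries[i], false);
        return r;
}

/* Demo stubs for the file-writing step. */
static int write_ok(void) { return 0; }
static int write_fail(void) { return -EIO; }
```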
Zbigniew Jędrzejewski-Szmek
9c6afab6b6 sysusers: emit audit events for user and group creation
Background: Fedora/RHEL are switching to sysusers.d metadata for creation of
users and groups for system users defined by packages
(https://fedoraproject.org/wiki/Changes/RPMSuportForSystemdSysusers).
Packages carry sysusers files. During package installation, rpm calls a
program to act on this config. This program may either be
/usr/lib/rpm/sysusers.sh which calls useradd/groupadd, or
/usr/bin/systemd-sysusers. To match the functionality provided by
useradd/groupadd from the shadow-utils project, systemd-sysusers must emit
audit events so that it provides a drop-in replacement.

systemd-sysusers will emit audit events AUDIT_ADD_USER/AUDIT_ADD_GROUP when
adding users and groups. The operation "names" are copied from shadow-utils in
Fedora (which has a patch to change them from the upstream version), so the
format of the events that is generated on success should be identical.

The helper code is shared between sysusers and utmp-wtmp. I changed the
audit_fd variable to be unconditional. This way we can avoid ugly ifdefery
every time the variable would be used. The cost is that 4 bytes of unused
storage might be present. This is negligible, and the compiler might even be
able to optimize that away if it inlines things.
2025-01-15 10:35:28 +01:00
Yu Watanabe
132a164d97 Follow-ups for recent namespace PRs (#35923) 2025-01-15 14:10:21 +09:00
Jeremy Linton
2572bf6a39 confidential-virt: add detection for aarch64 CCA
The arm confidential compute architecture (CCA) provides a platform design for
confidential VMs running in a new realm context.

This can be detected by the existence of a platform device exported for the
arm-cca-guest driver, which provides attestation services via the realm
services interface (RSI) to the Realm Management Monitor (RMM).

Like the other methods systemd uses to detect confidential VMs, checking
the sysfs entry suggests that this is a confidential VM and should only be
used for informative purposes, or to trigger further attestation.

Like the s390 detection logic, the sysfs path being checked is not labeled
as ABI, and may change in the future. It was chosen because it's
directly tied to the kernel's detection of the realm service interface rather
than to the Trusted Security Module (TSM), which is what is being triggered by the
device entry. The TSM module has a provider string of 'arm-cca-guest' which
could also be used, but that (IMHO) doesn't currently provide any additional
benefit except that it can fail if the module isn't loaded.

More information can be found here:
https://developer.arm.com/documentation/den0125/0300

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
2025-01-15 13:51:12 +09:00
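A sketch of such an existence check. The path below is an assumption for illustration (the commit only states that a platform device of the arm-cca-guest driver is checked, and that the path is not ABI); path_exists() and detect_cca_guest() are hypothetical names. A positive result is a hint, not proof, in line with the caveat above.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* ASSUMPTION: illustrative path, not taken from the commit. */
#define CCA_DEVICE_PATH "/sys/devices/platform/arm-cca-dev"

/* Existence check only; says nothing about attestation. */
static bool path_exists(const char *path) {
        assert(path);
        return access(path, F_OK) == 0;
}

static bool detect_cca_guest(void) {
        return path_exists(CCA_DEVICE_PATH);
}
```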
Mike Yuan
e755cde735 process-util: depend on CLONE_PIDFD 2025-01-12 00:17:20 +01:00
Mike Yuan
6e14c46bac tree-wide: drop support for kernels without pidfd_open() and pidfd_send_signal()
Our baseline is v5.4 now.
2025-01-12 00:01:07 +01:00
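With pidfd_open() (Linux 5.3) and pidfd_send_signal() (Linux 5.1) both below the 5.4 baseline, signalling a process by pidfd needs no kill() fallback. A minimal sketch using raw syscall(), so it does not depend on libc wrappers; pidfd_kill_sketch() is hypothetical, and the fallback syscall numbers are an assumption that holds for the common architectures.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_send_signal
#  define SYS_pidfd_send_signal 424   /* ASSUMPTION: generic syscall number */
#endif
#ifndef SYS_pidfd_open
#  define SYS_pidfd_open 434          /* ASSUMPTION: generic syscall number */
#endif

/* Send 'sig' to 'pid' through a pidfd, so a recycled PID cannot be hit
 * once the fd is open. Returns 0 or -errno. */
static int pidfd_kill_sketch(pid_t pid, int sig) {
        int fd, r = 0;

        assert(pid > 0);

        fd = (int) syscall(SYS_pidfd_open, pid, 0);
        if (fd < 0)
                return -errno;

        if (syscall(SYS_pidfd_send_signal, fd, sig, NULL, 0) < 0)
                r = -errno;

        close(fd);
        return r;
}
```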
Mike Yuan
1ca04eaf96 missing_syscall: require a bunch of syscalls below baseline
pidfd-related ones are left out and will be dealt with later.
2025-01-11 23:47:51 +01:00
Lennart Poettering
361327e929 convert more code to PidRef (#35895) 2025-01-11 23:14:33 +01:00
Mike Yuan
95a5c658a3 namespace-util: use pidref_namespace_open_by_type() where appropriate
Several logs are dropped, since all callers log loudly already.
2025-01-11 16:06:38 +01:00
Mike Yuan
a0e3dbdef5 uid-range: make uid_map_search_root() take UIDRangeUsernsMode 2025-01-11 15:53:15 +01:00
Mike Yuan
8c2c8235a6 namespace-util: introduce userns_enter_and_pin() helper
which generalizes forking a process into a userns and calling freeze() in it.

Addresses https://github.com/systemd/systemd/pull/35833/files#r1905508153
2025-01-11 15:53:14 +01:00
Mike Yuan
c7704ecd04 namespace-util: group userns functions together 2025-01-11 15:53:14 +01:00
Mike Yuan
407ebf29cc basic: move nsfs ioctls from missing_fs to missing_namespace
Addresses https://github.com/systemd/systemd/pull/35833#discussion_r1905333757
2025-01-11 15:53:14 +01:00
Mike Yuan
dfef02c675 process-util: drop duplicate assertions 2025-01-11 15:53:13 +01:00
Lennart Poettering
97ac59f18d machine: follow ups for varlink PRs recently merged (#35940)
Follow ups for:
- https://github.com/systemd/systemd/pull/35880
- https://github.com/systemd/systemd/pull/35066
2025-01-10 22:12:22 +01:00
Lennart Poettering
0b8b13324e namespace-util: port namespace_get_leader() to PidRef 2025-01-10 14:22:49 +01:00
Lennart Poettering
47e45ea738 process-util: make pidref_safe_fork_full() work with FORK_WAIT
(This is useful for the test case added in the next commit, where it's
kinda nice to be able to use pidref_safe_fork_full() and acquire a
pidref of the child in the child in one go. There's no other value in
this than a bit of syntactic sugar for that test. But OTOH there's no
good reason to prohibit FORK_WAIT use like this, hence either way this
commit should be a good thing.)
2025-01-10 14:14:17 +01:00
Lennart Poettering
9237a63a80 process-util: add new helper pidref_get_ppid_as_pidref() 2025-01-10 14:14:17 +01:00
Lennart Poettering
b25430deab terminal-util: pidref'ify two terminal related calls 2025-01-10 14:09:48 +01:00
Lennart Poettering
fa8b70f2c8 userdb: define new 64K "foreign UID" range (#35932)
This establishes the basic concepts for #35685, in the hope of getting this
merged first.

This defines a special, fixed 64K UID range that is supposed to be used
by directory container images on disk, that is mapped to a dynamic UID
range at runtime (via idmapped mounts).

This enables a world where each container can run with a dynamic UID
range, but this never leaks onto the disk, which would otherwise make
supposedly dynamic, transient UID range assignments persistent.

This is infrastructure later used for the primary part of #35685: unpriv
container execution with directory images inside user's home dirs, that
are assigned to this special "foreign UID range".

This PR only defines the ranges, synthesizes NSS records for them via
userdb, and then exposes them in a new "systemd-dissect --shift" command
that can re-chown a container directory tree into this range (and in
fact any range).

This comes with docs. But no tests. There are tests in #35685 that cover
all this, but they are more comprehensive and also test nspawn's hook-up
with this, hence are excluded from this PR.
2025-01-10 13:49:11 +01:00
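A sketch of the UID arithmetic behind such a shift, under stated assumptions: each range is 64K UIDs wide, and FOREIGN_UID_BASE below is a placeholder, not necessarily the configured default (the base is a build-time option). uid_shift() is a hypothetical helper that maps a UID from one range to the same offset in another, which is what re-chowning each inode of a tree amounts to.

```c
#include <assert.h>
#include <sys/types.h>

#define UID_INVALID ((uid_t) -1)
#define UID_RANGE_64K ((uid_t) 0x10000U)
/* ASSUMPTION: placeholder base for illustration only. */
#define FOREIGN_UID_BASE ((uid_t) 0x7FFE0000U)

/* Map 'uid' from the 64K range starting at from_base to the same offset in
 * the range starting at to_base. Returns UID_INVALID if 'uid' lies outside
 * the source range. */
static uid_t uid_shift(uid_t uid, uid_t from_base, uid_t to_base) {
        if (uid < from_base || uid - from_base >= UID_RANGE_64K)
                return UID_INVALID;
        return to_base + (uid - from_base);
}
```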
Ivan Kruglov
03b89cf213 basic: fixes in read_errno()
follow ups for https://github.com/systemd/systemd/pull/35880
2025-01-10 11:49:49 +01:00
Lennart Poettering
7893362508 process-util: do not unblock unrelated signals while forking
This makes sure that when we block signals in preparation for fork()
we won't temporarily unblock any previously blocked signals by mistake.

It's safe for us to block more, but not to unblock signals already
blocked. Fix that.

Fixes: #35470
2025-01-10 16:10:31 +09:00
Ivan Kruglov
64db44f7fb process-util: read_errno() 2025-01-09 10:47:24 +01:00
Lennart Poettering
ec0c10fc9d user-classification: add new "foreign" UID range
This makes the UID range configurable via build time options, but of
course it really shouldn't be changed. The default range I picked is
outside even IPA's current (ridiculously large) allocation ranges,
hence hopefully minimizes conflicts.
2025-01-08 21:41:03 +01:00
Mike Yuan
684e4e5bfb two pidref tweaks (#35918) 2025-01-08 18:58:20 +01:00
nl6720
96963e5615 dissect-image: mount the ESP with fmask=0177 (#35871)
Avoid showing the files on the ESP (i.e. a FAT formatted volume) as
executable by removing the execute permission from them.

IMO this makes the colored output of `ls` more sensible since the file
system will be mounted with `noexec` anyway.

Add a `fstype_can_fmask_dmask` function that checks if a file system
type can use the `fmask` and `dmask` mount options.

This replaces `fstype_can_umask` since it was only used in
`partition_pick_mount_options` which only cares about the file system
support for fmask & dmask now.

It somewhat reduces the coverage of the feature since there are more file
systems that support umask than ones supporting fmask & dmask,
but it should not be much of an issue since fmask & dmask are supported
by vfat, exfat and ntfs3.
2025-01-08 15:19:33 +01:00
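A sketch of the check and the resulting options. The file system list is the one named above (the real `fstype_can_fmask_dmask` may cover more), the function names are hypothetical, and dmask=0077 is an assumption for illustration. fmask=0177 clears the owner's execute bit and all group and other permissions, leaving files at mode 0600.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* File systems named above as supporting the fmask=/dmask= options. */
static bool fstype_can_fmask_dmask_sketch(const char *fstype) {
        static const char *const table[] = { "vfat", "exfat", "ntfs3" };

        assert(fstype);

        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
                if (strcmp(fstype, table[i]) == 0)
                        return true;
        return false;
}

/* Build mount options for an ESP. Returns 0, or -1 if 'buf' is too small. */
static int esp_mount_options(const char *fstype, char *buf, size_t size) {
        int n;

        assert(fstype);
        assert(buf);

        if (fstype_can_fmask_dmask_sketch(fstype))
                n = snprintf(buf, size, "noexec,fmask=0177,dmask=0077");
        else
                n = snprintf(buf, size, "noexec");
        return n < 0 || (size_t) n >= size ? -1 : 0;
}
```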
Lennart Poettering
0df46f8908 pidref: drop support for kernels lacking waitid(P_PIDFD, …)
Our baseline is now 5.4, which is where P_PIDFD was introduced.
2025-01-08 14:51:19 +01:00
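A minimal sketch of waiting through a pidfd with waitid(P_PIDFD, ...), which Linux supports since 5.4, so no PID-based fallback is needed. wait_via_pidfd() is hypothetical; the P_PIDFD and syscall number fallbacks only cover older headers and are assumptions for the common architectures.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef P_PIDFD
#  define P_PIDFD 3             /* value from <linux/wait.h> */
#endif
#ifndef SYS_pidfd_open
#  define SYS_pidfd_open 434    /* ASSUMPTION: generic syscall number */
#endif

/* Reap child 'pid' via a pidfd. Returns 0 and fills *ret_si, or -errno. */
static int wait_via_pidfd(pid_t pid, siginfo_t *ret_si) {
        int fd, r = 0;

        assert(pid > 0);
        assert(ret_si);

        fd = (int) syscall(SYS_pidfd_open, pid, 0);
        if (fd < 0)
                return -errno;

        memset(ret_si, 0, sizeof(*ret_si));
        if (waitid((idtype_t) P_PIDFD, (id_t) fd, ret_si, WEXITED) < 0)
                r = -errno;

        close(fd);
        return r;
}
```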
Lennart Poettering
06744cbb52 pidref: copy fd id in pidref_copy() too 2025-01-08 14:51:19 +01:00
Lennart Poettering
2ade821859 namespace-util: add process_is_owned_by_uid() helper 2025-01-07 23:55:34 +01:00
Lennart Poettering
3d57b6692f namespace-util: add helper to get base UID from userns 2025-01-07 23:55:34 +01:00
Lennart Poettering
5b14690a39 namespace-util: slightly tweak proc_mounted() handling in namespace_is_init()
Let's not sloppily eat up errors here.
2025-01-07 23:51:28 +01:00
Lennart Poettering
adcc805929 namespace-util: return recognizable error if namespace_open_by_type() fails because ns type not supported
This makes sure the codepath that derives an nsfd from a pid works
the same for the pidfd case and the non-pidfd case: if we can verify
that /proc/ is mounted but the /proc/$PID/ns/ files are missing, we can
assume the ns type is not supported by the kernel. Hence return the same
ENOPKG error in this case as we already do in the pidfd ioctl based
codepath.
2025-01-07 23:51:28 +01:00
Mike Yuan
16ac586e5a Bump minimum kernel baseline to 5.4, recommended version to 5.7
As requested, a mapping from kernel versions to features
for kernels older than the minimum baseline is also included,
in order to ease potential backport work.
2025-01-07 22:43:45 +01:00
Lennart Poettering
56a07d10a5 xopenat(): introduce new XO_REGULAR flag (#35834)
This is something I think we should have added a long time ago: a
flavour of open() that safely ensures the inode we are opening is a
regular file, before we open it. It does this by means of pinning the
inode via O_PATH first, and after verification actually opening it.

This ports some code over to this, but sooner or later we should
probably use this a lot more, so that we don't accidentally open weird
stuff such as device nodes or pipes, where we should not.
2025-01-07 08:55:56 +01:00
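The pin-then-verify approach can be sketched as follows, assuming /proc is mounted so the pinned inode can be reopened via /proc/self/fd/. open_regular_sketch() is hypothetical, O_CREAT is not supported, and returning -EISDIR or -EBADFD for non-regular inodes is this sketch's choice, not necessarily what XO_REGULAR returns. Because an O_PATH open does not actually open the file, device nodes and FIFOs are not touched before the type check.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open 'path' with 'flags' only if it refers to a regular file.
 * Returns the fd or -errno. */
static int open_regular_sketch(const char *path, int flags) {
        char proc_path[64];
        struct stat st;
        int pinned, fd;

        assert(path);

        /* Pin the inode without opening it for I/O. */
        pinned = open(path, O_PATH | O_CLOEXEC);
        if (pinned < 0)
                return -errno;

        if (fstat(pinned, &st) < 0) {
                fd = -errno;
                goto finish;
        }
        if (!S_ISREG(st.st_mode)) {
                fd = S_ISDIR(st.st_mode) ? -EISDIR : -EBADFD;
                goto finish;
        }

        /* Reopen the very inode we verified, now with the real flags. */
        snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", pinned);
        fd = open(proc_path, flags | O_CLOEXEC);
        if (fd < 0)
                fd = -errno;

finish:
        close(pinned);
        return fd;
}
```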