Commit Graph

6670 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek
69c9629da7 sysusers: emit audit events for user and group creation (#35957)
Background: Fedora/RHEL are switching to sysusers.d metadata for
creation of users and groups for system users defined by packages
(https://fedoraproject.org/wiki/Changes/RPMSuportForSystemdSysusers).
Packages carry sysusers files. During package installation, rpm calls a
program to execute on this config. This program may either be
/usr/lib/rpm/sysusers.sh which calls useradd/groupadd, or
/usr/bin/systemd-sysusers. To match the functionality provided by
useradd/groupadd from the shadow-utils project, systemd-sysusers must
emit audit events so that it provides a drop-in replacement.

systemd-sysusers will emit audit events AUDIT_ADD_USER/AUDIT_ADD_GROUP
when adding users and groups. The operation "names" are copied from
shadow-utils, so the format of the events that are generated on success
should be identical. On failure, things are more complicated. We write
the whole file in one go, so we first generate "success" messages
for each entry, then we try to write the files, and if things fail, we
generate failure messages for all entries that we failed to write.
2025-01-15 10:36:07 +01:00
Zbigniew Jędrzejewski-Szmek
9c6afab6b6 sysusers: emit audit events for user and group creation
Background: Fedora/RHEL are switching to sysusers.d metadata for creation of
users and groups for system users defined by packages
(https://fedoraproject.org/wiki/Changes/RPMSuportForSystemdSysusers).
Packages carry sysusers files. During package installation, rpm calls a
program to execute on this config. This program may either be
/usr/lib/rpm/sysusers.sh which calls useradd/groupadd, or
/usr/bin/systemd-sysusers. To match the functionality provided by
useradd/groupadd from the shadow-utils project, systemd-sysusers must emit
audit events so that it provides a drop-in replacement.

systemd-sysusers will emit audit events AUDIT_ADD_USER/AUDIT_ADD_GROUP when
adding users and groups. The operation "names" are copied from shadow-utils in
Fedora (which has a patch to change them from the upstream version), so the
format of the events that are generated on success should be identical.

The helper code is shared between sysusers and utmp-wtmp. I changed the
audit_fd variable to be unconditional. This way we can avoid ugly ifdeffery
every time the variable would be used. The cost is that 4 bytes of unused
storage might be present. This is negligible, and the compiler might even be
able to optimize that away if it inlines things.
2025-01-15 10:35:28 +01:00
Yu Watanabe
132a164d97 Follow-ups for recent namespace PRs (#35923) 2025-01-15 14:10:21 +09:00
Jeremy Linton
2572bf6a39 confidential-virt: add detection for aarch64 CCA
The arm confidential compute architecture (CCA) provides a platform design for
confidential VMs running in a new realm context.

This can be detected by the existence of a platform device exported for the
arm-cca-guest driver, which provides attestation services via the realm
services interface (RSI) to the Realm Management Monitor (RMM).

Like the other methods systemd uses to detect Confidential VMs, checking
the sysfs entry suggests that this is a confidential VM and should only be
used for informative purposes, or to trigger further attestation.

Like the s390 detection logic, the sysfs path being checked is not labeled
as ABI, and may change in the future. It was chosen because it's
directly tied to the kernel's detection of the realm service interface rather
than to the Trusted Security Module (TSM) which is what is being triggered by the
device entry. The TSM module has a provider string of 'arm-cca-guest' which
could also be used, but that (IMHO) doesn't currently provide any additional
benefit except that it can fail if the module isn't loaded.

More information can be found here:
https://developer.arm.com/documentation/den0125/0300

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
2025-01-15 13:51:12 +09:00
Mike Yuan
e755cde735 process-util: depend on CLONE_PIDFD 2025-01-12 00:17:20 +01:00
Mike Yuan
6e14c46bac tree-wide: drop support for kernels without pidfd_open() and pidfd_send_signal()
Our baseline is v5.4 now.
2025-01-12 00:01:07 +01:00
Mike Yuan
1ca04eaf96 missing_syscall: require a bunch of syscalls below baseline
pidfd-related ones are left out and will be dealt with later.
2025-01-11 23:47:51 +01:00
Lennart Poettering
361327e929 convert more code to PidRef (#35895) 2025-01-11 23:14:33 +01:00
Mike Yuan
95a5c658a3 namespace-util: use pidref_namespace_open_by_type() where appropriate
Several logs are dropped, since all callers log loudly already.
2025-01-11 16:06:38 +01:00
Mike Yuan
a0e3dbdef5 uid-range: make uid_map_search_root() take UIDRangeUsernsMode 2025-01-11 15:53:15 +01:00
Mike Yuan
8c2c8235a6 namespace-util: introduce userns_enter_and_pin() helper
which generalizes forking a process into userns and freeze()

Addresses https://github.com/systemd/systemd/pull/35833/files#r1905508153
2025-01-11 15:53:14 +01:00
Mike Yuan
c7704ecd04 namespace-util: group userns functions together 2025-01-11 15:53:14 +01:00
Mike Yuan
407ebf29cc basic: move nsfs ioctls from missing_fs to missing_namespace
Addresses https://github.com/systemd/systemd/pull/35833#discussion_r1905333757
2025-01-11 15:53:14 +01:00
Mike Yuan
dfef02c675 process-util: drop duplicate assertions 2025-01-11 15:53:13 +01:00
Lennart Poettering
97ac59f18d machine: follow ups for varlink PRs recently merged (#35940)
Follow ups for:
- https://github.com/systemd/systemd/pull/35880
- https://github.com/systemd/systemd/pull/35066
2025-01-10 22:12:22 +01:00
Lennart Poettering
0b8b13324e namespace-util: port namespace_get_leader() to PidRef 2025-01-10 14:22:49 +01:00
Lennart Poettering
47e45ea738 process-util: make pidref_safe_fork_full() work with FORK_WAIT
(This is useful for the test case added in the next commit, where it's
kinda nice being able to use pidref_safe_fork_full() and acquiring a
pidref of the child in the child in one go. There's no other value in
this than a bit of syntactic sugar for that test. But otoh there's no
good reason to prohibit FORK_WAIT use like this, hence either way, this
commit should be a good thing.)
2025-01-10 14:14:17 +01:00
Lennart Poettering
9237a63a80 process-util: add new helper pidref_get_ppid_as_pidref() 2025-01-10 14:14:17 +01:00
Lennart Poettering
b25430deab terminal-util: pidref'ify two terminal related calls 2025-01-10 14:09:48 +01:00
Lennart Poettering
fa8b70f2c8 userdb: define new 64K "foreign UID" range (#35932)
This establishes the basic concepts for #35685, in the hope of getting this
merged first.

This defines a special, fixed 64K UID range that is supposed to be used
by directory container images on disk, that is mapped to a dynamic UID
range at runtime (via idmapped mounts).

This enables a world where each container can run with a dynamic UID
range without that range leaking onto the disk, which would make
supposedly dynamic, transient UID range assignments persistent.

This is infrastructure later used for the primary part of #35685: unpriv
container execution with directory images inside user's home dirs, that
are assigned to this special "foreign UID range".

This PR only defines the ranges, synthesizes NSS records for them via
userdb, and then exposes them in a new "systemd-dissect --shift" command
that can re-chown a container directory tree into this range (and in
fact any range).

This comes with docs. But no tests. There are tests in #35685 that cover
all this, but they are more comprehensive and also test nspawn's hook-up
with this, hence are excluded from this PR.
2025-01-10 13:49:11 +01:00
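The re-chown step described in fa8b70f2c8 amounts to moving every UID from one 64K range into another while keeping its offset within the range. The following is a minimal Python sketch of that arithmetic only, not the systemd-dissect implementation; `shift_uid`, `RANGE_SIZE`, and both base values are hypothetical names and placeholders, since the commit text does not state the actual range base.

```python
RANGE_SIZE = 0x10000  # 64K UIDs per range, as described in the commit message

def shift_uid(uid: int, source_base: int, target_base: int) -> int:
    """Map `uid` from the range starting at source_base into the range at target_base."""
    offset = uid - source_base
    if not 0 <= offset < RANGE_SIZE:
        raise ValueError(f"UID {uid} is outside the source range")
    # The offset inside the range is preserved; only the base changes.
    return target_base + offset

# Placeholder bases for illustration only (not taken from the commit).
EXAMPLE_SOURCE_BASE = 0
EXAMPLE_TARGET_BASE = 0x7FFE0000

shifted_root = shift_uid(0, EXAMPLE_SOURCE_BASE, EXAMPLE_TARGET_BASE)
shifted_user = shift_uid(1000, EXAMPLE_SOURCE_BASE, EXAMPLE_TARGET_BASE)
```

Because the mapping is a pure offset shift, applying it in the opposite direction (swapping the two bases) restores the original ownership.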
Ivan Kruglov
03b89cf213 basic: fixes in read_errno()
follow ups for https://github.com/systemd/systemd/pull/35880
2025-01-10 11:49:49 +01:00
Lennart Poettering
7893362508 process-util: do not unblock unrelated signals while forking
This makes sure when we are blocking signals in preparation for fork()
we'll not temporarily unblock any signals previously set, by mistake.

It's safe for us to block more, but not to unblock signals already
blocked. Fix that.

Fixes: #35470
2025-01-10 16:10:31 +09:00
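The rule stated in 7893362508 (adding to the blocked set is safe, removing from it is not) can be illustrated with a small Python sketch. This is not the systemd C code; `with_signals_blocked` is a hypothetical helper name, and it relies only on the standard `signal.pthread_sigmask()` call.

```python
import signal

def with_signals_blocked(extra, fn):
    """Run fn() with the signals in `extra` blocked, then restore the previous mask."""
    # SIG_BLOCK only adds to the current mask; the call returns the mask
    # that was in effect before, so nothing already blocked gets unblocked.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, extra)
    try:
        return fn()
    finally:
        # Restore exactly the saved mask: signals the caller had blocked
        # stay blocked, and only our additions are dropped again.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

The point of saving and restoring the old mask, instead of unblocking a fixed list afterwards, is that the helper never has to know which signals the caller had already blocked.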
Ivan Kruglov
64db44f7fb process-util: read_errno() 2025-01-09 10:47:24 +01:00
Lennart Poettering
ec0c10fc9d user-classification: add new "foreign" UID range
This makes the UID range configurable via build time options, but of
course it really shouldn't be changed. The default range I picked is
outside even of IPA's current (ridiculously large) allocation ranges,
hence hopefully minimizes conflicts.
2025-01-08 21:41:03 +01:00
Mike Yuan
684e4e5bfb two pidref tweaks (#35918) 2025-01-08 18:58:20 +01:00
nl6720
96963e5615 dissect-image: mount the ESP with fmask=0177 (#35871)
Avoid showing the files on the ESP (i.e. a FAT formatted volume) as
executable by removing the execute permission from them.

IMO this makes the colored output of `ls` more sensible since the file
system will be mounted with `noexec` anyway.

Add a `fstype_can_fmask_dmask` function that checks if a file system
type can use the `fmask` and `dmask` mount options.

This replaces `fstype_can_umask` since it was only used in
`partition_pick_mount_options` which only cares about the file system
support for fmask & dmask now.

It somewhat reduces the coverage of the feature since there are more file
systems that support umask as opposed to those supporting fmask & dmask,
but it should not be much of an issue since fmask & dmask are supported
by vfat, exfat and ntfs3.
2025-01-08 15:19:33 +01:00
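As a worked example of the `fmask=0177` value from 96963e5615: on FAT-style file systems the reported file permissions are the full permission bits with the mask bits cleared. The Python sketch below only shows that arithmetic; `effective_file_mode` is a hypothetical helper, not a systemd function.

```python
import stat

def effective_file_mode(fmask: int) -> int:
    """Permission bits reported for regular files when mounted with the given fmask."""
    return 0o777 & ~fmask

mode = effective_file_mode(0o177)
# Render the result the way `ls -l` would show it for a regular file.
print(oct(mode), stat.filemode(stat.S_IFREG | mode))  # prints "0o600 -rw-------"
```

With `fmask=0177` no execute bit survives, which is why files on the ESP stop showing up as executable.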
Lennart Poettering
0df46f8908 pidref: drop support for kernels lacking waitid(P_PIDFD, …)
Our baseline is now 5.4, which is where P_PIDFD was introduced.
2025-01-08 14:51:19 +01:00
Lennart Poettering
06744cbb52 pidref: copy fd id in pidref_copy() too 2025-01-08 14:51:19 +01:00
Lennart Poettering
2ade821859 namespace-util: add process_is_owned_by_uid() helper 2025-01-07 23:55:34 +01:00
Lennart Poettering
3d57b6692f namespace-util: add helper to get base UID from userns 2025-01-07 23:55:34 +01:00
Lennart Poettering
5b14690a39 namespace-util: slightly tweak proc_mounted() handling in namespace_is_init()
Let's not sloppily eat up errors here.
2025-01-07 23:51:28 +01:00
Lennart Poettering
adcc805929 namespace-util: return recognizable error if namespace_open_by_type() fails because ns type not supported
This makes sure the codepath that derives an nsfd from a pid works
the same for the pidfd case and the non-pidfd case: if we can verify
that /proc/ is mounted but the /proc/$PID/ns/ files are missing, we can
assume the ns type is not supported by the kernel. Hence return the same
ENOPKG error in this case as we already do in the pidfd ioctl based
codepath.
2025-01-07 23:51:28 +01:00
Mike Yuan
16ac586e5a Bump minimum kernel baseline to 5.4, recommended version to 5.7
As requested, a list mapping kernel versions to features
for kernels older than the minimum baseline is also included,
in order to ease potential backport work.
2025-01-07 22:43:45 +01:00
Lennart Poettering
56a07d10a5 xopenat(): introduce new XO_REGULAR flag (#35834)
This is something I think we should have added a long time ago: a
flavour of open() that safely ensures the inode we are opening is a
regular file, before we open it. It does this by means of pinning the
inode via O_PATH first, and after verification actually opening it.

This ports some code over to this, but sooner or later we should
probably use this a lot more, so that we don't accidentally open weird
stuff such as device nodes or pipes, where we should not.
2025-01-07 08:55:56 +01:00
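The pin-then-verify-then-open sequence described in 56a07d10a5 can be sketched in Python for Linux as follows. This is an illustration of the idea, not the xopenat() implementation: `open_regular` and `NotRegularFile` are hypothetical names, and reopening through `/proc/self/fd/` is an assumption about how the pinned inode gets turned into a real file descriptor (it requires /proc to be mounted).

```python
import os
import stat

class NotRegularFile(OSError):
    """Hypothetical error type for this sketch."""

def open_regular(path: str, flags: int = os.O_RDONLY) -> int:
    """Open `path` only if it is a regular file; returns a file descriptor."""
    # Step 1: pin the inode with O_PATH. This does not actually open the
    # file, so it cannot block on a FIFO or trigger a device driver.
    pin_fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
    try:
        # Step 2: verify the type of the pinned inode.
        if not stat.S_ISREG(os.fstat(pin_fd).st_mode):
            raise NotRegularFile(f"{path!r} is not a regular file")
        # Step 3: open the very same inode for real via its /proc entry,
        # so the path cannot be swapped between the check and the open.
        return os.open(f"/proc/self/fd/{pin_fd}", flags | os.O_CLOEXEC)
    finally:
        os.close(pin_fd)
```

Checking the type on the pinned descriptor rather than on the path is what closes the race between verification and opening.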
Lennart Poettering
9ed2725867 process-util: a process from a foreign pidns is definitely not our child
Addresses: https://github.com/systemd/systemd/pull/35242#pullrequestreview-2531712318
2025-01-07 08:55:21 +01:00
Yu Watanabe
62e9cd6b09 basic/linux: update kernel headers from v6.13-rc6
This also removes README and moves the explanation about the header
modification to the script.
2025-01-06 23:35:14 +00:00
Lennart Poettering
dffafa47ae fs-util: add XO_REGULAR flag for xopenat()
If this flag is set we guarantee that the fd returned refers to a
regular file. If the file exists but is not a regular file, the call fails.
2025-01-06 23:20:09 +01:00
Lennart Poettering
336acebc77 basic: port various pidfd/pidref helpers to PIDFD_GET_INFO and PIDFD_GET_*_NAMESPACE (#35242)
Supersedes #35308 (cherry-picked one commit and replaced the rest)

(I left a few comments that are folded by GitHub. Please make sure to
check them too.)
2025-01-06 11:23:08 +01:00
Lennart Poettering
7f72184f12 more pidref'ification (#35839)
This is split out of #35264, but makes a ton of sense on its own.
2025-01-06 11:21:43 +01:00
Lennart Poettering
dd445d6e99 cgroup-util: add remoteness checks to all cg_pidref_get_xyz() calls 2025-01-06 09:54:41 +01:00
Lennart Poettering
92d78966fd cgroup-util: add pidref counterparts for cg_pid_get_session() + cg_pid_get_owner_uid() 2025-01-06 09:54:41 +01:00
Lennart Poettering
b2206fe514 socket-util: introduce getpeerpidref()
This combines getpeercred() and getpeerpidfd() and returns a PidRef
2025-01-06 09:45:57 +01:00
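For readers unfamiliar with the credential half of what b2206fe514 combines, the following Python sketch reads the peer's PID, UID and GID from a connected AF_UNIX socket through the Linux `SO_PEERCRED` socket option. It covers only that half: the pidfd part and the PidRef type are not shown, and `get_peer_cred` is a hypothetical helper name.

```python
import os
import socket
import struct

# Layout of Linux's struct ucred: pid_t pid; uid_t uid; gid_t gid.
UCRED_FORMAT = "iII"

def get_peer_cred(sock: socket.socket) -> tuple:
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket."""
    data = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                           struct.calcsize(UCRED_FORMAT))
    return struct.unpack(UCRED_FORMAT, data)

# For a socketpair() the recorded peer is the process that created it.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
peer_pid, peer_uid, peer_gid = get_peer_cred(left)
left.close()
right.close()
```

A bare PID obtained this way can be recycled by the kernel after the peer exits, which is the usual motivation for pairing it with a pidfd.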
Mike Yuan
9598708a12 cgroup-util: explain why cg_pidref_get_path() cannot be ported over to pidfd helpers (yet)
See also: https://github.com/systemd/systemd/pull/35242#issuecomment-2506686806
2025-01-04 17:48:23 +01:00
Mike Yuan
f1ba5c900b cgroup-util: introduce generic cg_path_from_cgroupid() helper
Taken from nsresourced/userns-registry.c userns_destroy_cgroup()
2025-01-04 17:48:22 +01:00
Mike Yuan
223d455670 process-util: make pid_is_unwaited() wrapper around pidref version 2025-01-04 17:48:22 +01:00
Mike Yuan
47f64104d1 process-util: port pidref_get_uid() and pidref_is_my_child() to pidfd helpers 2025-01-04 17:48:22 +01:00
Mike Yuan
85e7bbfaa4 pidfd-util: introduce pidfd_get_{ppid,uid,cgroupid} which goes via PIDFD_GET_INFO too 2025-01-04 17:08:01 +01:00
Mike Yuan
dcf0ef3f42 pidfd-util: try to translate pidfd -> pid through ioctl(PIDFD_GET_INFO) 2025-01-04 17:08:01 +01:00
Mike Yuan
92b8e5e72f namespace-util: introduce pidref_in_same_namespace() 2025-01-04 17:08:01 +01:00
Mike Yuan
a33f691374 process-util: move namespace_get_leader() to namespace-util
This allows us to drop the hack for recursive includes.
2025-01-04 17:08:00 +01:00