Commit Graph

6452 Commits

Author SHA1 Message Date
Mike Yuan
5dfccccce9 basic/time-util: modernize parse_time() a bit 2024-12-10 20:50:36 +01:00
Yu Watanabe
896b53ef4e basic: update syscall tables 2024-12-10 11:15:48 +09:00
Zbigniew Jędrzejewski-Szmek
22996a3393 basic/namespace-util: fix double logging after fork failure
[   10.056930] (journald)[104]: Failed to fork off '(sd-mkuserns)': Invalid argument
[   10.063727] systemd[1]: systemd-modules-load.service: About to execute: /usr/lib/systemd/systemd-modules-load
[   10.071148] (journald)[104]: Failed to fork process (sd-mkuserns): Invalid argument

safe_fork_full() already logs at debug level, so the caller shouldn't.
2024-12-02 11:51:23 +01:00
Zbigniew Jędrzejewski-Szmek
afb368951c pid1: assume user namespaces are unavailable if we get -EINVAL from clone()
As reported in https://github.com/systemd/systemd/issues/35400,
on riscv64, with Linux version 6.6.51-linux4microchip+fpga-2024.09, we get:

[   10.063727] systemd[1]: systemd-modules-load.service: About to execute: /usr/lib/systemd/systemd-modules-load
[   10.071148] (journald)[104]: Failed to fork process (sd-mkuserns): Invalid argument

Fixes https://github.com/systemd/systemd/issues/35400.

'r' is used to make the repeated checks shorter. Without that, the long variable
name is distracting.
2024-12-02 11:30:06 +01:00
Mike Yuan
4da9f38de1 cgroup-util: use RET_NERRNO where appropriate 2024-11-27 18:38:00 +01:00
Mike Yuan
7a719510c8 basic/fileio: minor coding style cleanup
Follow-up for bbec1c87d3
2024-11-27 14:33:23 +01:00
gerblesh
bbec1c87d3 sysext: set SELinux context for hierarchies and workdir 2024-11-26 17:47:32 +00:00
Zbigniew Jędrzejewski-Szmek
d293fade24 Check inode number to see if we are in init namespace (#35306)
This is a more comprehensive fix compared to #35273. Also adds a minimal
test only.

Based on Luca's #35273 but generalizes the code a bit.

In v258 we really should get rid of the old heuristics around userns and
cgroupns detection, but given we are late in the v257 cycle this keeps
them in.
2024-11-25 14:13:36 +01:00
Yu Watanabe
56c761f8c6 namespace-util: handle -ENOSPC by userns_acquire() gracefully in is_idmapping_supported() (#35313)
Follow-up for edae62120f.
Fixes #35311.
2024-11-23 17:32:23 +09:00
Yu Watanabe
3dda236c5c basic/linux: update kernel headers from v6.12 2024-11-23 17:31:12 +09:00
Lennart Poettering
a2429f507c virt: make use of ns inode check in running_in_userns() and running_in_cgroupns() too 2024-11-23 00:14:20 +01:00
Luca Boccassi
193bf42ab0 detect-virt: check the inode number of the pid namespace
The indoe number of root pid namespace is hardcoded in the kernel to
0xEFFFFFFC since 3.8, so check the inode number of our pid namespace
if all else fails. If it's not 0xEFFFFFFC then we are in a pid
namespace, hence a container environment.

Fixes https://github.com/systemd/systemd/issues/35249

[Reworked by Lennart, to make use of namespace_is_init()]
2024-11-23 00:14:20 +01:00
Lennart Poettering
18ead2b03d namespace-util: add generic namespace_is_init() call 2024-11-23 00:14:20 +01:00
Yu Watanabe
2994ca354b namespace-util: update log messages 2024-11-23 06:52:48 +09:00
Yu Watanabe
eb14b993bb namespace-util: handle -ENOSPC by userns_acquire() gracefully in is_idmapping_supported()
Follow-up for edae62120f.
Fixes #35311.
2024-11-23 06:52:38 +09:00
Luca Boccassi
b7eefa1996 cgroup-util: fix memory leak on error
CID#1565824

Follow-up for f6793bbcf0
2024-11-21 14:02:34 +09:00
Lennart Poettering
f6793bbcf0 killall: gracefully handle processes inserted into containers via nsenter -a
"nsenter -a" doesn't migrate the specified process into the target
cgroup (it really should). Thus the cgroup will remain in a cgroup
that is (due to cgroup ns) outside our visibility. The kernel will
report the cgroup path of such cgroups as starting with "/../". Detect
that and print a reasonably error message instead of trying to resolve
that.
2024-11-20 18:11:38 +00:00
Mike Yuan
f87863a8ff process-util: refuse to operate on remote PidRef
Follow-up for 7e3e540b88
2024-11-20 18:10:26 +00:00
Mike Yuan
eea9d3eb10 basic/user-util: split out placeholder suppression from USER_CREDS_CLEAN into its own flag
No functional change, preparation for later commits.
2024-11-19 00:38:18 +01:00
Mike Yuan
579ce77ead basic/user-util: introduce shell_is_placeholder() helper 2024-11-19 00:38:18 +01:00
Mike Yuan
c8590ad60d process-util: refuse FORK_DETACH + FORK_DEATHSIG_*
There's no synchoronization between the intermediate process
and the double-forked child, and the semantics are not useful.
Refuse such combination.
2024-11-14 12:22:15 +00:00
Lennart Poettering
9466fe014f namespace-util: pin pid via pidfd during namespace_open() 2024-11-13 14:18:05 +00:00
Yu Watanabe
d762b14e38 audit-util: return -ENODATA from audit_{session|loginuid}_from_pid() if invoked in a container (#35072)
The auditing subsystem is still not virtualized for containers, hence
the two values don't really make sense inside them, they will just leak
information from outside into the container. Hence don't make use of the
data if we detect we are run inside of a container.

This has visible effects: logind will no longer try to reuse the
auditing session ids as its own session ids when run inside a container.

While are at it, modernize the calls in more ways:

1. switch to pidref behaviour, all but one of our uses are using pidref
anyway already.
2. use read_virtual_file() + proc_mounted()
3. reasonably distinguish ENOENT errors when reading the process proc
files: distinguish the case where /proc is not mounted, from the case
where the process is already gone, from where auditing is not enabled in
the kernel build.
2024-11-13 10:08:29 +09:00
Lennart Poettering
c892816ceb run0: when changing privileges to non-root, do not show superhero emoji
Let's show an idcard logo instead, to indicate that we changed ids.
2024-11-12 23:09:21 +01:00
Lennart Poettering
7bf0149e9b process-util: more gracefully handle oom adjust parsing/setting
Who knows what kind of mount shenanigans people employ, let's gracefully
handle parse failures of proc files, like we alway do otherwsie.
2024-11-12 23:03:40 +01:00
Lennart Poettering
68c554f23a audit-util: modernize use_audit() a bit
Use ERRNO_IS_xyz() macros where appropriate.

Also, reduce indentation a bit by inverted early check.

And log in more error codepaths.
2024-11-12 23:03:40 +01:00
Lennart Poettering
7e02ee98d8 audit-util: return -ENODATA from audit_{session|loginuid}_from_pid() if invoked in a container
The auditing subsystem is still not virtualized for containers, hence the two
values don't really make sense inside them, they will just leak
information from outside into the container. Hence don't make use of the
data if we detect we are run inside of a container.

This has visible effects: logind will no longer try to reuse the
auditing session ids as its own session ids when run inside a container.

While are at it, modernize the calls in more ways:

1. switch to pidref behaviour, all but one of our uses are using pidref
   anyway already.
2. use read_virtual_file() + proc_mounted()
3. reasonable distinguish ENOENT errors when reading the process proc
   files: distinguish the case where /proc is not mounted, from the case
   where the process is already gone, from where auditing is not enabled
   in the kernel build.
2024-11-12 23:03:03 +01:00
Lennart Poettering
56933f2073 uid-classification: properly classify *all* container UIDs
A bit confusingly CONTAINER_UID_BASE_MAX is just the maximum *base* UID
for a container. Thus, with the usual 64K UID assignments, the last
actual container UID is CONTAINER_UID_BASE_MAX+0xFFFF.

To make this less confusing define CONTAINER_UID_MIN/MAX that add the
missing extra space.

Also adjust two uses where this was mishandled so far, due to this
confusion.

With this change the UID ranges we default to should properly match what
is documented on https://systemd.io/UIDS-GIDS/.
2024-11-08 23:18:39 +00:00
Lennart Poettering
af3baf174a fs-util: add comment about XO_NOCOW 2024-11-08 09:21:25 +01:00
Ivan Kruglov
a567de392d process-util: introduce report_errno_and_exit() as part of src/basic/process-util.{h,c} 2024-11-06 11:18:38 +01:00
Andres Beltran
f348831d27 namespace-util: make idmapping not supported if syscalls return EPERM 2024-11-06 09:27:33 +01:00
Zbigniew Jędrzejewski-Szmek
2257be13fe tree-wide: time-out → timeout
For justification, see 3f9a0a522f.
2024-11-05 19:32:19 +00:00
Luca Boccassi
7af37f3a90 Add PrivatePIDs= (continued) (#34940) 2024-11-05 18:42:28 +00:00
Daan De Meyer
406f177501 core: Introduce PrivatePIDs=
This new setting allows unsharing the pid namespace in a unit. Because
you have to fork to get a process into a pid namespace, we fork in
systemd-executor to get into the new pid namespace. The parent then
sends the pid of the child process back to the manager and exits while
the child process continues on with the rest of exec_invoke() and then
executes the actual payload.

Communicating the child pid is done via a new pidref socket pair that is
set up on manager startup.

We unshare the PID namespace right before the mount namespace so we
mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes
to mount procfs.

When running unprivileged in a user session, user namespace is set up first
to allow for PID namespace to be unshared. However, when running in
privileged mode, we unshare the user namespace last to ensure the user
namespace does not own the PID namespace and cannot break out of the sandbox.

Note we disallow Type=forking services from using PrivatePIDs=yes since the
init proess inside the PID namespace must not exit for other processes in
the namespace to exist.

Note Daan De Meyer did the original work for this commit with Ryan Wilson
addressing follow-ups.

Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>
2024-11-05 05:32:02 -08:00
Lennart Poettering
cb42df5310 sd-daemon: add fd array size safety check to sd_notify_with_fds()
The previous commit removed the UINT_MAX check for the fd array. Let's
now re-add one, but at a better place, and with a more useful limit. As
it turns out the kernel does not allow passing more than 253 fds at the
same time, hence use that as limit. And do so immediately before
calculating the control buffer size, so that we catch multiplication
overflows.
2024-11-04 12:10:09 +01:00
Daan De Meyer
a07864a4fe bootctl: Add --secure-boot-auto-enroll
When specified, bootctl install will also set up secure boot
auto-enrollment. For now, We sign all variables using the same
certificate and key pair.
2024-11-03 10:46:17 +01:00
Daan De Meyer
d5c12da904 efivars: Remove STRINGIFY() helper macros
The names of these conflict with macros from efi.h that we'll move
to efi-fundamental.h in a later commit. Let's avoid the conflict by
getting rid of these helpers. Arguably this also improves readability
by clearly indicating we're passing arbitrary strings and not constants
to the macros when we invoke them.
2024-11-02 23:20:57 +01:00
Andres Beltran
edae62120f namespace-util: add util function to check if id-mapped mounts are supported for a given path 2024-11-01 18:41:27 +00:00
Luca Boccassi
fdccba15be util-lib/systemd-run: implement race-free PTY peer opening (#34953)
This makes use of the new TIOCGPTPEER pty ioctl() for directly opening a
PTY peer, without going via path names. This is nice because it closes a
race around allocating and opening the peer. And also has the nice
benefit that if we acquired an fd originating from some other
namespace/container, we can directly derive the peer fd from it, without
having to reenter the namespace again.
2024-11-01 11:29:19 +00:00
Luca Boccassi
d86e9b64e4 tweaks to ANSI sequence (OSC) handling (#34964)
Fixes: #34604

Prompted by that I realized we do not correctly recognize both "ST"
sequences we want to recognize, fix that.
2024-11-01 11:18:57 +00:00
Lennart Poettering
0e3e075b56 iovw: normalize destructors
instead of passing a boolean picking the destruction method just have
different functions. That's much nicer in context of _cleanup_, and how
we usually do things.
2024-10-31 23:08:11 +01:00
Lennart Poettering
811aa36ab6 iovw: add simpler iovw_done() destructor 2024-10-31 23:08:11 +01:00
Lennart Poettering
2865561eaa coredump: move to _cleanup_ for destroying iovw object 2024-10-31 23:08:11 +01:00
Lennart Poettering
960b045875 coredump: parse signal number at the same time as parsing other fields 2024-10-31 23:08:11 +01:00
Lennart Poettering
5ca96e2717 machine: several follow-ups for recent change (#34882)
Follow-ups for #34761.
2024-10-31 21:43:18 +01:00
Mike Gilbert
ff94426f8a posix_spawn_wrapper: do not set POSIX_SPAWN_SETSIGDEF flag
Setting this flag is a noop without a corresponding call to
posix_spawnattr_setsigdefault.

If we call posix_spawnattr_setsigdefault with a full signal set,
it causes glibc's posix_spawn implementation to call sigaction 63 times,
once for each signal. That seems wasteful.

This feature is really only useful for signals which have their
disposition set to SIG_IGN. Otherwise the dispostion gets set to
SIG_DFL automatically, either by clone(CLONE_CLEAR_SIGHAND) or the
subsequent execve.

As far as I can tell, systemd does not have any signals set to SIG_IGN
under normal operating conditions.
2024-10-31 18:16:58 +01:00
Lennart Poettering
a39c51799b string-util: also check for 0x1b 0x5c ST when stripping ANSI from strings 2024-10-31 11:38:18 +01:00
Lennart Poettering
0367424786 terminal-util: define ANSI_OSC as macro for the OSC terminal sequence prefix 2024-10-31 11:38:18 +01:00
Lennart Poettering
b8311af810 tree-wide: prefer generating 0x1B 0x5C as ANSI sequence "ST"
OSC sequences can be closed with one of three terminators:

1. ASCII code 7, aka BEL, aka ^G, aka \x07, aka \a
2. ASCII code 156, aka \x9c
2. Pair of ASCII code 27 followed by ASCII code 92, aka \x1b\x5c

Of these, in some corner case scenarios BEL makes problem (see #34604).
Hence switch away from that wherever we use it, and prefer the \x1b\x5c
instead. That's preferable over \x9c, since the latter is also a valid
UTF-8 codepoint. See discussion here for example:

https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#the-escape-sequence

Fixes: #34604
2024-10-31 11:38:08 +01:00
Lennart Poettering
e65b0904a0 string-util: it's called OSC sequence, not CSO sequence 2024-10-31 11:28:57 +01:00