6437 Commits

Author SHA1 Message Date
Luca Boccassi
b7eefa1996 cgroup-util: fix memory leak on error
CID#1565824

Follow-up for f6793bbcf0
2024-11-21 14:02:34 +09:00
Lennart Poettering
f6793bbcf0 killall: gracefully handle processes inserted into containers via nsenter -a
"nsenter -a" doesn't migrate the specified process into the target
cgroup (it really should). Thus the process will remain in a cgroup
that is (due to cgroup ns) outside our visibility. The kernel will
report the cgroup path of such cgroups as starting with "/../". Detect
that and print a reasonable error message instead of trying to resolve
that.
2024-11-20 18:11:38 +00:00
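The detection described in the commit above can be illustrated with a minimal Python sketch. It is not the code from the commit (which is C); the function names are hypothetical, and the sketch assumes the unified-hierarchy line format `0::<path>` found in /proc/<pid>/cgroup.

```python
def parse_cgroup_v2_line(line: str) -> str:
    """Extract the cgroup path from a unified-hierarchy line '0::<path>'.

    Hypothetical helper for illustration only.
    """
    _hierarchy_id, _controllers, path = line.rstrip("\n").split(":", 2)
    return path


def cgroup_path_is_outside_namespace(path: str) -> bool:
    """Return True if the kernel reported a path outside our cgroup namespace.

    Per the commit message, such paths start with "/../".
    """
    return path.startswith("/../")


def describe_cgroup(line: str) -> str:
    """Produce a readable message instead of trying to resolve an invisible path."""
    path = parse_cgroup_v2_line(line)
    if cgroup_path_is_outside_namespace(path):
        return "process is in a cgroup outside our cgroup namespace, skipping"
    return f"process is in cgroup {path}"
```

A caller would check the path before attempting any lookup, for example `describe_cgroup("0::/../payload.scope")`.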
Mike Yuan
f87863a8ff process-util: refuse to operate on remote PidRef
Follow-up for 7e3e540b88
2024-11-20 18:10:26 +00:00
Mike Yuan
eea9d3eb10 basic/user-util: split out placeholder suppression from USER_CREDS_CLEAN into its own flag
No functional change, preparation for later commits.
2024-11-19 00:38:18 +01:00
Mike Yuan
579ce77ead basic/user-util: introduce shell_is_placeholder() helper 2024-11-19 00:38:18 +01:00
Mike Yuan
c8590ad60d process-util: refuse FORK_DETACH + FORK_DEATHSIG_*
There's no synchronization between the intermediate process
and the double-forked child, and the semantics are not useful.
Refuse such a combination.
2024-11-14 12:22:15 +00:00
Lennart Poettering
9466fe014f namespace-util: pin pid via pidfd during namespace_open() 2024-11-13 14:18:05 +00:00
Yu Watanabe
d762b14e38 audit-util: return -ENODATA from audit_{session|loginuid}_from_pid() if invoked in a container (#35072)
The auditing subsystem is still not virtualized for containers, hence
the two values don't really make sense inside them, they will just leak
information from outside into the container. Hence don't make use of the
data if we detect we are run inside of a container.

This has visible effects: logind will no longer try to reuse the
auditing session ids as its own session ids when run inside a container.

While we are at it, modernize the calls in more ways:

1. switch to pidref behaviour, all but one of our uses are using pidref
anyway already.
2. use read_virtual_file() + proc_mounted()
3. reasonably distinguish ENOENT errors when reading the process proc
files: distinguish the case where /proc is not mounted, from the case
where the process is already gone, from where auditing is not enabled in
the kernel build.
2024-11-13 10:08:29 +09:00
Lennart Poettering
c892816ceb run0: when changing privileges to non-root, do not show superhero emoji
Let's show an idcard logo instead, to indicate that we changed ids.
2024-11-12 23:09:21 +01:00
Lennart Poettering
7bf0149e9b process-util: more gracefully handle oom adjust parsing/setting
Who knows what kind of mount shenanigans people employ, let's gracefully
handle parse failures of proc files, like we always do otherwise.
2024-11-12 23:03:40 +01:00
Lennart Poettering
68c554f23a audit-util: modernize use_audit() a bit
Use ERRNO_IS_xyz() macros where appropriate.

Also, reduce indentation a bit by inverting an early check.

And log in more error codepaths.
2024-11-12 23:03:40 +01:00
Lennart Poettering
7e02ee98d8 audit-util: return -ENODATA from audit_{session|loginuid}_from_pid() if invoked in a container
The auditing subsystem is still not virtualized for containers, hence the two
values don't really make sense inside them, they will just leak
information from outside into the container. Hence don't make use of the
data if we detect we are run inside of a container.

This has visible effects: logind will no longer try to reuse the
auditing session ids as its own session ids when run inside a container.

While we are at it, modernize the calls in more ways:

1. switch to pidref behaviour, all but one of our uses are using pidref
   anyway already.
2. use read_virtual_file() + proc_mounted()
3. reasonably distinguish ENOENT errors when reading the process proc
   files: distinguish the case where /proc is not mounted, from the case
   where the process is already gone, from where auditing is not enabled
   in the kernel build.
2024-11-12 23:03:03 +01:00
Lennart Poettering
56933f2073 uid-classification: properly classify *all* container UIDs
A bit confusingly CONTAINER_UID_BASE_MAX is just the maximum *base* UID
for a container. Thus, with the usual 64K UID assignments, the last
actual container UID is CONTAINER_UID_BASE_MAX+0xFFFF.

To make this less confusing define CONTAINER_UID_MIN/MAX that add the
missing extra space.

Also adjust two uses where this was mishandled so far, due to this
confusion.

With this change the UID ranges we default to should properly match what
is documented on https://systemd.io/UIDS-GIDS/.
2024-11-08 23:18:39 +00:00
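The arithmetic behind the commit above can be worked through in a short Python sketch. The two base values are assumptions taken from systemd's default build configuration and are not stated in the commit message; `uid_is_container()` here is an illustrative stand-in, not the C function.

```python
# Assumed defaults (not stated in the commit message above).
CONTAINER_UID_BASE_MIN = 0x00080000
CONTAINER_UID_BASE_MAX = 0x6FFF0000

# Each container gets a 64K UID range starting at its base UID,
# so the last base UID still owns the 0xFFFF UIDs that follow it.
CONTAINER_UID_MIN = CONTAINER_UID_BASE_MIN
CONTAINER_UID_MAX = CONTAINER_UID_BASE_MAX + 0xFFFF


def uid_is_container(uid: int) -> bool:
    """Classify a UID as belonging to some container range."""
    return CONTAINER_UID_MIN <= uid <= CONTAINER_UID_MAX


def uid_is_container_buggy(uid: int) -> bool:
    """The kind of mishandling described: comparing against the *base* maximum."""
    return CONTAINER_UID_BASE_MIN <= uid <= CONTAINER_UID_BASE_MAX
```

Under these assumed values, a UID such as `0x6FFF0001` belongs to the last container range, yet the comparison against the base maximum rejects it.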
Lennart Poettering
af3baf174a fs-util: add comment about XO_NOCOW 2024-11-08 09:21:25 +01:00
Ivan Kruglov
a567de392d process-util: introduce report_errno_and_exit() as part of src/basic/process-util.{h,c} 2024-11-06 11:18:38 +01:00
Andres Beltran
f348831d27 namespace-util: make idmapping not supported if syscalls return EPERM 2024-11-06 09:27:33 +01:00
Zbigniew Jędrzejewski-Szmek
2257be13fe tree-wide: time-out → timeout
For justification, see 3f9a0a522f.
2024-11-05 19:32:19 +00:00
Luca Boccassi
7af37f3a90 Add PrivatePIDs= (continued) (#34940) 2024-11-05 18:42:28 +00:00
Daan De Meyer
406f177501 core: Introduce PrivatePIDs=
This new setting allows unsharing the pid namespace in a unit. Because
you have to fork to get a process into a pid namespace, we fork in
systemd-executor to get into the new pid namespace. The parent then
sends the pid of the child process back to the manager and exits while
the child process continues on with the rest of exec_invoke() and then
executes the actual payload.

Communicating the child pid is done via a new pidref socket pair that is
set up on manager startup.

We unshare the PID namespace right before the mount namespace so we
mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes
to mount procfs.

When running unprivileged in a user session, user namespace is set up first
to allow for PID namespace to be unshared. However, when running in
privileged mode, we unshare the user namespace last to ensure the user
namespace does not own the PID namespace and cannot break out of the sandbox.

Note we disallow Type=forking services from using PrivatePIDs=yes since the
init process inside the PID namespace must not exit for other processes in
the namespace to exist.

Note Daan De Meyer did the original work for this commit with Ryan Wilson
addressing follow-ups.

Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>
2024-11-05 05:32:02 -08:00
Lennart Poettering
cb42df5310 sd-daemon: add fd array size safety check to sd_notify_with_fds()
The previous commit removed the UINT_MAX check for the fd array. Let's
now re-add one, but at a better place, and with a more useful limit. As
it turns out the kernel does not allow passing more than 253 fds at the
same time, hence use that as limit. And do so immediately before
calculating the control buffer size, so that we catch multiplication
overflows.
2024-11-04 12:10:09 +01:00
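A rough Python sketch of the ordering described above: validate the number of file descriptors first, then size the control buffer. The 253 limit comes from the commit message; the constant name, the function name, and the choice of `ValueError` are hypothetical. In C the early check also guards the multiplication against overflow, which Python's arbitrary-precision integers do not need.

```python
import socket
import struct

# Limit named in the commit message: the kernel refuses more fds per message.
MAX_FDS_PER_MESSAGE = 253


def fd_control_buffer_size(n_fds: int) -> int:
    """Return the ancillary-data buffer size needed to pass n_fds descriptors.

    The bound is checked immediately before the size calculation, mirroring
    the placement described in the commit message.
    """
    if n_fds < 0 or n_fds > MAX_FDS_PER_MESSAGE:
        raise ValueError(f"cannot pass {n_fds} fds in one message")
    # One C int per descriptor, wrapped in a single SCM_RIGHTS control message.
    return socket.CMSG_SPACE(n_fds * struct.calcsize("i"))
```

The returned size is what a caller would pass as the ancillary buffer size when receiving such a message.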
Daan De Meyer
a07864a4fe bootctl: Add --secure-boot-auto-enroll
When specified, bootctl install will also set up secure boot
auto-enrollment. For now, we sign all variables using the same
certificate and key pair.
2024-11-03 10:46:17 +01:00
Daan De Meyer
d5c12da904 efivars: Remove STRINGIFY() helper macros
The names of these conflict with macros from efi.h that we'll move
to efi-fundamental.h in a later commit. Let's avoid the conflict by
getting rid of these helpers. Arguably this also improves readability
by clearly indicating we're passing arbitrary strings and not constants
to the macros when we invoke them.
2024-11-02 23:20:57 +01:00
Andres Beltran
edae62120f namespace-util: add util function to check if id-mapped mounts are supported for a given path 2024-11-01 18:41:27 +00:00
Luca Boccassi
fdccba15be util-lib/systemd-run: implement race-free PTY peer opening (#34953)
This makes use of the new TIOCGPTPEER pty ioctl() for directly opening a
PTY peer, without going via path names. This is nice because it closes a
race around allocating and opening the peer. And also has the nice
benefit that if we acquired an fd originating from some other
namespace/container, we can directly derive the peer fd from it, without
having to reenter the namespace again.
2024-11-01 11:29:19 +00:00
Luca Boccassi
d86e9b64e4 tweaks to ANSI sequence (OSC) handling (#34964)
Fixes: #34604

Prompted by that I realized we do not correctly recognize both "ST"
sequences we want to recognize, fix that.
2024-11-01 11:18:57 +00:00
Lennart Poettering
0e3e075b56 iovw: normalize destructors
Instead of passing a boolean picking the destruction method, just have
different functions. That's much nicer in context of _cleanup_, and how
we usually do things.
2024-10-31 23:08:11 +01:00
Lennart Poettering
811aa36ab6 iovw: add simpler iovw_done() destructor 2024-10-31 23:08:11 +01:00
Lennart Poettering
2865561eaa coredump: move to _cleanup_ for destroying iovw object 2024-10-31 23:08:11 +01:00
Lennart Poettering
960b045875 coredump: parse signal number at the same time as parsing other fields 2024-10-31 23:08:11 +01:00
Lennart Poettering
5ca96e2717 machine: several follow-ups for recent change (#34882)
Follow-ups for #34761.
2024-10-31 21:43:18 +01:00
Mike Gilbert
ff94426f8a posix_spawn_wrapper: do not set POSIX_SPAWN_SETSIGDEF flag
Setting this flag is a noop without a corresponding call to
posix_spawnattr_setsigdefault.

If we call posix_spawnattr_setsigdefault with a full signal set,
it causes glibc's posix_spawn implementation to call sigaction 63 times,
once for each signal. That seems wasteful.

This feature is really only useful for signals which have their
disposition set to SIG_IGN. Otherwise the disposition gets set to
SIG_DFL automatically, either by clone(CLONE_CLEAR_SIGHAND) or the
subsequent execve.

As far as I can tell, systemd does not have any signals set to SIG_IGN
under normal operating conditions.
2024-10-31 18:16:58 +01:00
Lennart Poettering
a39c51799b string-util: also check for 0x1b 0x5c ST when stripping ANSI from strings 2024-10-31 11:38:18 +01:00
Lennart Poettering
0367424786 terminal-util: define ANSI_OSC as macro for the OSC terminal sequence prefix 2024-10-31 11:38:18 +01:00
Lennart Poettering
b8311af810 tree-wide: prefer generating 0x1B 0x5C as ANSI sequence "ST"
OSC sequences can be closed with one of three terminators:

1. ASCII code 7, aka BEL, aka ^G, aka \x07, aka \a
2. ASCII code 156, aka \x9c
3. Pair of ASCII code 27 followed by ASCII code 92, aka \x1b\x5c

Of these, in some corner case scenarios BEL causes problems (see #34604).
Hence switch away from that wherever we use it, and prefer the \x1b\x5c
instead. That's preferable over \x9c, since the latter is also a valid
UTF-8 codepoint. See discussion here for example:

https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#the-escape-sequence

Fixes: #34604
2024-10-31 11:38:08 +01:00
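To make the three terminators listed in the commit above concrete, here is a small Python sketch that emits an OSC sequence closed with the preferred \x1b\x5c terminator and strips OSC sequences closed with any of the three. The names `osc()` and `strip_osc()` are hypothetical; this is not the systemd implementation.

```python
import re

OSC_PREFIX = "\x1b]"  # ESC ] starts an OSC sequence
ST = "\x1b\\"         # ESC \ (0x1B 0x5C), the preferred terminator

# Match an OSC sequence up to the first of the three possible terminators:
# BEL (\x07), the single code \x9c, or the two-character ST (\x1b\x5c).
_OSC_RE = re.compile(r"\x1b\].*?(?:\x07|\x9c|\x1b\\)", re.DOTALL)


def osc(payload: str) -> str:
    """Build an OSC sequence, always terminated with ESC \\ rather than BEL."""
    return f"{OSC_PREFIX}{payload}{ST}"


def strip_osc(text: str) -> str:
    """Remove OSC sequences regardless of which terminator closed them."""
    return _OSC_RE.sub("", text)
```

For instance, `strip_osc(osc("0;title") + "hello")` leaves only the visible text.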
Lennart Poettering
e65b0904a0 string-util: it's called OSC sequence, not CSO sequence 2024-10-31 11:28:57 +01:00
Yu Watanabe
7633001cdd env-util: introduce strv_env_get_merged() 2024-10-31 11:02:35 +09:00
Yu Watanabe
e4d477efc6 env-util: replace 'char **' with 'char**' 2024-10-31 11:02:35 +09:00
Lennart Poettering
fc9dc71a3f terminal-util: add pty_open_peer() helper
This opens a pty peer in one go, and uses the new race-free TIOCGPTPEER
ioctl() to do so – if it is available.
2024-10-30 22:37:44 +01:00
Lennart Poettering
fbd2679f66 terminal-util: various minor modernizations
Various fixes:

1. Add O_CLOEXEC for two socketpair()s where we forgot it.

2. Use FORK_WAIT instead of manual wait_for_terminate_and_check()
   invocations.

3. Prefix opaque NULL/0 arguments with comments explaining what they are.

4. Add a bunch of assert()s, and change flag validation in
   open_terminal() to be assert() (since flags mistakes are programming
   errors, not runtime errors).
2024-10-30 22:15:56 +01:00
Yu Watanabe
f7804c1aa2 basic/missing: add short comment about when CLONE_NEWCGROUP is added 2024-10-26 13:59:19 +09:00
Integral
ddb8a639d5 tree-wide: replace for loop with FOREACH_ELEMENT or FOREACH_ARRAY macros (#34893) 2024-10-26 07:10:22 +09:00
Lennart Poettering
115fac3c29 run0: optionally show superhero emoji on each shell prompt
This makes use of the infra introduced in 229d4a9806 to indicate visually on each prompt that we are in superuser mode temporarily.
pick ad5de3222f userdbctl: add some basic client-side filtering
2024-10-25 17:31:06 +02:00
Lennart Poettering
4167e9e210 user-util: tighten shell validation a tiny bit 2024-10-24 22:28:17 +02:00
Mike Yuan
4e69da071d Merge pull request #34799 from YHNdnzj/service-followups
core: follow-ups for live mount
2024-10-24 19:44:10 +02:00
Integral
b6b8527cd1 refactor: replace sizeof in loop with ELEMENTSOF & FOREACH_ELEMENT (#34863) 2024-10-23 10:32:02 +02:00
Mike Yuan
d845254b7f basic/fs-util: move unlink_tempfilep() to tmpfile-util 2024-10-22 19:19:39 +02:00
Lennart Poettering
b9633ebb2a fs-util: move attempts counter in openat_report_new() into loop 2024-10-22 17:51:26 +02:00
Lennart Poettering
4ffecbbbee label: move label_ops_reset() up a bit
Let's move it close to label_ops_set(), since it is somewhat symmetric
to it.
2024-10-22 17:51:26 +02:00
Lennart Poettering
4e4ed4b64d label: add missing assert() to label_ops_set() 2024-10-22 17:51:26 +02:00
Lennart Poettering
aec1262a2e fileio: port write_string_file_full() to openat_report_new()
This brings two benefits: we will label the created file only if it is
actually created, and we can correctly delete any file we create again
on failure.
2024-10-22 17:51:26 +02:00
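The two benefits named in the commit above both depend on knowing whether the open call created the file. A hedged Python sketch of that idea follows; the function names echo the C ones but the bodies are illustrative assumptions, including the retry count, the truncation of existing files, and the error raised when the race never settles.

```python
import errno
import os


def open_report_new(path: str, mode: int = 0o644) -> tuple[int, bool]:
    """Open path for writing and report whether we created it.

    Tries a plain open first, then an exclusive create, and retries if
    another process creates or removes the file in between.
    """
    for _ in range(7):  # retry bound chosen arbitrarily for this sketch
        try:
            return os.open(path, os.O_WRONLY | os.O_TRUNC), False
        except FileNotFoundError:
            pass
        try:
            return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode), True
        except FileExistsError:
            continue
    raise OSError(errno.EEXIST, "file kept appearing and disappearing", path)


def write_string_file(path: str, text: str) -> bool:
    """Write text to path; on failure remove the file only if we created it."""
    fd, created = open_report_new(path)
    # A labeling step (hypothetical here) would run only when created is True.
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
    except BaseException:
        if created:
            os.unlink(path)
        raise
    return created
```

The return value tells the caller whether a new file came into existence, which is the information a labeling or cleanup step needs.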