Commit Graph

6670 Commits

Author SHA1 Message Date
Luca Boccassi
7af37f3a90 Add PrivatePIDs= (continued) (#34940) 2024-11-05 18:42:28 +00:00
Daan De Meyer
406f177501 core: Introduce PrivatePIDs=
This new setting allows unsharing the pid namespace in a unit. Because
you have to fork to get a process into a pid namespace, we fork in
systemd-executor to get into the new pid namespace. The parent then
sends the pid of the child process back to the manager and exits while
the child process continues on with the rest of exec_invoke() and then
executes the actual payload.

Communicating the child pid is done via a new pidref socket pair that is
set up on manager startup.

We unshare the PID namespace right before the mount namespace so we
mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes
to mount procfs.

When running unprivileged in a user session, user namespace is set up first
to allow for PID namespace to be unshared. However, when running in
privileged mode, we unshare the user namespace last to ensure the user
namespace does not own the PID namespace and cannot break out of the sandbox.

Note we disallow Type=forking services from using PrivatePIDs=yes since the
init process inside the PID namespace must not exit for other processes in
the namespace to exist.

Note Daan De Meyer did the original work for this commit with Ryan Wilson
addressing follow-ups.

Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>
2024-11-05 05:32:02 -08:00
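
The fork-and-report flow described in the commit above can be sketched roughly as follows. This is a simplified illustration, not the systemd-executor code: `fork_into_pid_namespace()` is a hypothetical name, a plain socketpair carrying a raw `pid_t` stands in for the pidref socket pair, and the payload exits at once instead of executing anything. Unsharing a PID namespace normally needs privileges, so the helper is expected to fail with `-EPERM` for ordinary users.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 and the payload's PID (as seen by the caller), or -errno. */
static int fork_into_pid_namespace(pid_t *ret_pid) {
        int pair[2], status, r;
        pid_t helper, payload = 0;
        ssize_t n;

        assert(ret_pid);

        if (socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, pair) < 0)
                return -errno;

        helper = fork();
        if (helper < 0) {
                r = -errno;
                close(pair[0]);
                close(pair[1]);
                return r;
        }

        if (helper == 0) {
                /* unshare() only affects children created afterwards, hence the second fork(). */
                if (unshare(CLONE_NEWPID) < 0)
                        _exit(errno == EPERM ? 77 : 1);

                payload = fork();
                if (payload < 0)
                        _exit(1);
                if (payload == 0)
                        _exit(0); /* PID 1 of the new namespace; a real payload would exec here */

                /* Report the payload's PID to the caller, then get out of the way. */
                n = send(pair[1], &payload, sizeof(payload), 0);
                _exit(n == (ssize_t) sizeof(payload) ? 0 : 1);
        }

        close(pair[1]);
        n = recv(pair[0], &payload, sizeof(payload), 0);
        close(pair[0]);

        if (waitpid(helper, &status, 0) < 0)
                return -errno;

        if (n != (ssize_t) sizeof(payload))
                return WIFEXITED(status) && WEXITSTATUS(status) == 77 ? -EPERM : -EIO;

        *ret_pid = payload;
        return 0;
}
```

The second fork() is the point of the exercise: the process calling unshare(CLONE_NEWPID) stays in its old PID namespace, and only its next child becomes the init process of the new one.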
Lennart Poettering
cb42df5310 sd-daemon: add fd array size safety check to sd_notify_with_fds()
The previous commit removed the UINT_MAX check for the fd array. Let's
now re-add one, but at a better place, and with a more useful limit. As
it turns out the kernel does not allow passing more than 253 fds at the
same time, hence use that as limit. And do so immediately before
calculating the control buffer size, so that we catch multiplication
overflows.
2024-11-04 12:10:09 +01:00
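
A minimal sketch of the check described above, assuming a hypothetical helper `fd_control_buffer_size()` and a hypothetical constant `FD_PASS_MAX`; only the limit of 253 comes from the commit message. The count is validated first, so the size calculation that follows cannot overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Limit taken from the commit message: the kernel refuses more than 253 fds per message. */
#define FD_PASS_MAX 253U

/* Hypothetical helper: returns 0 and the SCM_RIGHTS control buffer size, or -E2BIG. */
static int fd_control_buffer_size(size_t n_fds, size_t *ret) {
        assert(ret);

        /* Check the count before multiplying, so the multiplication stays small. */
        if (n_fds > FD_PASS_MAX)
                return -E2BIG;

        *ret = n_fds > 0 ? CMSG_SPACE(sizeof(int) * n_fds) : 0;
        return 0;
}
```

A caller would size its control buffer from the returned value and refuse to send when it gets `-E2BIG`.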
Daan De Meyer
a07864a4fe bootctl: Add --secure-boot-auto-enroll
When specified, bootctl install will also set up secure boot
auto-enrollment. For now, we sign all variables using the same
certificate and key pair.
2024-11-03 10:46:17 +01:00
Daan De Meyer
d5c12da904 efivars: Remove STRINGIFY() helper macros
The names of these conflict with macros from efi.h that we'll move
to efi-fundamental.h in a later commit. Let's avoid the conflict by
getting rid of these helpers. Arguably this also improves readability
by clearly indicating we're passing arbitrary strings and not constants
to the macros when we invoke them.
2024-11-02 23:20:57 +01:00
Andres Beltran
edae62120f namespace-util: add util function to check if id-mapped mounts are supported for a given path 2024-11-01 18:41:27 +00:00
Luca Boccassi
fdccba15be util-lib/systemd-run: implement race-free PTY peer opening (#34953)
This makes use of the new TIOCGPTPEER pty ioctl() for directly opening a
PTY peer, without going via path names. This is nice because it closes a
race around allocating and opening the peer. And also has the nice
benefit that if we acquired an fd originating from some other
namespace/container, we can directly derive the peer fd from it, without
having to reenter the namespace again.
2024-11-01 11:29:19 +00:00
Luca Boccassi
d86e9b64e4 tweaks to ANSI sequence (OSC) handling (#34964)
Fixes: #34604

Prompted by that I realized we do not correctly recognize both "ST"
sequences we want to recognize, fix that.
2024-11-01 11:18:57 +00:00
Lennart Poettering
0e3e075b56 iovw: normalize destructors
instead of passing a boolean picking the destruction method just have
different functions. That's much nicer in context of _cleanup_, and how
we usually do things.
2024-10-31 23:08:11 +01:00
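
The idea of one destructor per destruction method, instead of a boolean parameter, can be illustrated like this. `struct vec_wrapper` and its functions are simplified, hypothetical stand-ins and not the iovw code itself; the cleanup attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Simplified, hypothetical wrapper around an iovec array. */
struct vec_wrapper {
        struct iovec *iovec;
        size_t count;
};

/* Destructor 1: release only the array; the payloads belong to someone else. */
static void vec_wrapper_done(struct vec_wrapper *w) {
        assert(w);
        free(w->iovec);
        w->iovec = NULL;
        w->count = 0;
}

/* Destructor 2: release the payloads as well. */
static void vec_wrapper_done_free(struct vec_wrapper *w) {
        assert(w);
        for (size_t i = 0; i < w->count; i++)
                free(w->iovec[i].iov_base);
        vec_wrapper_done(w);
}

/* Appends a heap copy of s; returns 0 on success, -1 on allocation failure. */
static int vec_wrapper_put_copy(struct vec_wrapper *w, const char *s) {
        size_t len = strlen(s);
        struct iovec *grown;
        char *copy;

        copy = malloc(len + 1);
        if (!copy)
                return -1;
        memcpy(copy, s, len + 1);

        grown = realloc(w->iovec, (w->count + 1) * sizeof(struct iovec));
        if (!grown) {
                free(copy);
                return -1;
        }

        w->iovec = grown;
        w->iovec[w->count].iov_base = copy;
        w->iovec[w->count].iov_len = len;
        w->count++;
        return 0;
}

/* Both destructors take a single pointer argument, so either fits the cleanup attribute. */
static size_t count_owned_copies(void) {
        __attribute__((cleanup(vec_wrapper_done_free))) struct vec_wrapper w = { NULL, 0 };

        if (vec_wrapper_put_copy(&w, "PRIORITY=2") < 0 ||
            vec_wrapper_put_copy(&w, "MESSAGE=hello") < 0)
                return 0;

        return w.count; /* w is destroyed automatically on every return path */
}
```

A destructor that needed an extra boolean could not be named in the cleanup attribute directly, which is the practical gain of splitting it in two.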
Lennart Poettering
811aa36ab6 iovw: add simpler iovw_done() destructor 2024-10-31 23:08:11 +01:00
Lennart Poettering
2865561eaa coredump: move to _cleanup_ for destroying iovw object 2024-10-31 23:08:11 +01:00
Lennart Poettering
960b045875 coredump: parse signal number at the same time as parsing other fields 2024-10-31 23:08:11 +01:00
Lennart Poettering
5ca96e2717 machine: several follow-ups for recent change (#34882)
Follow-ups for #34761.
2024-10-31 21:43:18 +01:00
Mike Gilbert
ff94426f8a posix_spawn_wrapper: do not set POSIX_SPAWN_SETSIGDEF flag
Setting this flag is a noop without a corresponding call to
posix_spawnattr_setsigdefault.

If we call posix_spawnattr_setsigdefault with a full signal set,
it causes glibc's posix_spawn implementation to call sigaction 63 times,
once for each signal. That seems wasteful.

This feature is really only useful for signals which have their
disposition set to SIG_IGN. Otherwise the disposition gets set to
SIG_DFL automatically, either by clone(CLONE_CLEAR_SIGHAND) or the
subsequent execve.

As far as I can tell, systemd does not have any signals set to SIG_IGN
under normal operating conditions.
2024-10-31 18:16:58 +01:00
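
For the case the message calls genuinely useful, a signal that may be set to SIG_IGN, the flag and the signal set have to be configured together. The sketch below resets only SIGPIPE; `run_with_default_sigpipe()` is a hypothetical helper and not part of posix_spawn_wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: runs argv with only SIGPIPE reset to SIG_DFL in the child.
 * Returns the child's exit status, or a negative errno-style value on failure. */
static int run_with_default_sigpipe(char *const argv[]) {
        posix_spawnattr_t attr;
        sigset_t reset;
        pid_t pid;
        int r, status;

        assert(argv && argv[0]);

        r = posix_spawnattr_init(&attr);
        if (r != 0)
                return -r;

        /* Name just the signals that might be ignored, not the full set. */
        sigemptyset(&reset);
        sigaddset(&reset, SIGPIPE);

        /* The flag only has an effect together with a configured signal set. */
        r = posix_spawnattr_setsigdefault(&attr, &reset);
        if (r == 0)
                r = posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETSIGDEF);
        if (r == 0)
                r = posix_spawnp(&pid, argv[0], NULL, &attr, argv, environ);

        posix_spawnattr_destroy(&attr);
        if (r != 0)
                return -r;

        if (waitpid(pid, &status, 0) < 0)
                return -errno;

        return WIFEXITED(status) ? WEXITSTATUS(status) : -ECHILD;
}
```

Calling it with an argv such as `{ "/bin/sh", "-c", "exit 0", NULL }` should return the shell's exit status.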
Lennart Poettering
a39c51799b string-util: also check for 0x1b 0x5c ST when stripping ANSI from strings 2024-10-31 11:38:18 +01:00
Lennart Poettering
0367424786 terminal-util: define ANSI_OSC as macro for the OSC terminal sequence prefix 2024-10-31 11:38:18 +01:00
Lennart Poettering
b8311af810 tree-wide: prefer generating 0x1B 0x5C as ANSI sequence "ST"
OSC sequences can be closed with one of three terminators:

1. ASCII code 7, aka BEL, aka ^G, aka \x07, aka \a
2. ASCII code 156, aka \x9c
3. Pair of ASCII code 27 followed by ASCII code 92, aka \x1b\x5c

Of these, in some corner case scenarios BEL causes problems (see #34604).
Hence switch away from that wherever we use it, and prefer the \x1b\x5c
instead. That's preferable over \x9c, since the latter is also a valid
UTF-8 codepoint. See discussion here for example:

https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#the-escape-sequence

Fixes: #34604
2024-10-31 11:38:08 +01:00
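
The three terminators listed above, and the preference for ESC \ when generating sequences, can be sketched as follows. `OSC_PREFIX`, `OSC_ST`, `osc_terminator_length()` and `format_hyperlink()` are illustrative names chosen for this example, not necessarily those used in the tree; OSC 8 is the hyperlink sequence discussed in the linked gist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define OSC_PREFIX "\x1B]"  /* ESC ] opens an OSC sequence */
#define OSC_ST     "\x1B\\" /* ESC \ (0x1B 0x5C), the preferred terminator */

/* Returns the length of the OSC terminator at p, or 0 if there is none. */
static size_t osc_terminator_length(const char *p) {
        assert(p);

        if ((unsigned char) p[0] == 0x07) /* BEL */
                return 1;
        if ((unsigned char) p[0] == 0x9C) /* single-byte ST */
                return 1;
        if (p[0] == 0x1B && p[1] == 0x5C) /* two-byte ST */
                return 2;

        return 0;
}

/* Formats an OSC 8 hyperlink into buf, always closing with ESC \.
 * Returns what snprintf() returns. */
static int format_hyperlink(char *buf, size_t size, const char *url, const char *text) {
        assert(buf && url && text);

        return snprintf(buf, size,
                        OSC_PREFIX "8;;%s" OSC_ST "%s" OSC_PREFIX "8;;" OSC_ST,
                        url, text);
}
```

A parser has to accept all three terminators, while a generator only ever needs to emit one of them.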
Lennart Poettering
e65b0904a0 string-util: it's called OSC sequence, not CSO sequence 2024-10-31 11:28:57 +01:00
Yu Watanabe
7633001cdd env-util: introduce strv_env_get_merged() 2024-10-31 11:02:35 +09:00
Yu Watanabe
e4d477efc6 env-util: replace 'char **' with 'char**' 2024-10-31 11:02:35 +09:00
Lennart Poettering
fc9dc71a3f terminal-util: add pty_open_peer() helper
This opens a pty peer in one go, and uses the new race-free TIOCGPTPEER
ioctl() to do so – if it is available.
2024-10-30 22:37:44 +01:00
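
A rough sketch of such a helper, assuming the hypothetical names `open_pty_peer()` and `open_pty_pair()`; it is not the pty_open_peer() implementation. It tries TIOCGPTPEER first and falls back to opening the peer by path name when the ioctl is missing or rejected as unsupported.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: opens the peer of an unlocked PTY master. Returns an fd or -errno. */
static int open_pty_peer(int master, int flags) {
        char name[128];
        int fd, r;

        assert(master >= 0);

#ifdef TIOCGPTPEER
        /* Race-free path: derive the peer directly from the master fd. */
        fd = ioctl(master, TIOCGPTPEER, flags);
        if (fd >= 0)
                return fd;
        if (errno != EINVAL && errno != ENOTTY)
                return -errno;
#endif

        /* Fallback: go via the path name, which is racy and depends on the mount namespace. */
        r = ptsname_r(master, name, sizeof(name));
        if (r != 0)
                return -r;

        fd = open(name, flags);
        return fd < 0 ? -errno : fd;
}

/* Allocates a PTY and opens both ends. Returns 0 or -errno. */
static int open_pty_pair(int *ret_master, int *ret_peer) {
        int master, peer, r;

        assert(ret_master && ret_peer);

        master = posix_openpt(O_RDWR | O_NOCTTY | O_CLOEXEC);
        if (master < 0)
                return -errno;

        if (grantpt(master) < 0 || unlockpt(master) < 0) {
                r = -errno;
                close(master);
                return r;
        }

        peer = open_pty_peer(master, O_RDWR | O_NOCTTY | O_CLOEXEC);
        if (peer < 0) {
                close(master);
                return peer;
        }

        *ret_master = master;
        *ret_peer = peer;
        return 0;
}
```

With the ioctl, the peer is derived from the master fd alone, so it also works when the master was received from another namespace where the path name would not resolve.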
Lennart Poettering
fbd2679f66 terminal-util: various minor modernizations
Various fixes:

1. Adds O_CLOEXEC for two socketpair()s where we forgot it.

2. Uses FORK_WAIT instead of manual wait_for_terminate_and_check()
   invocations.

3. Prefix opaque NULL/0 arguments with comments what they are.

4. Add a bunch of assert()s, and change flag validation in
   open_terminal() to be assert() (since flags mistakes are programming
   errors, not runtime errors).
2024-10-30 22:15:56 +01:00
Yu Watanabe
f7804c1aa2 basic/missing: add short comment about when CLONE_NEWCGROUP is added 2024-10-26 13:59:19 +09:00
Integral
ddb8a639d5 tree-wide: replace for loop with FOREACH_ELEMENT or FOREACH_ARRAY macros (#34893) 2024-10-26 07:10:22 +09:00
Lennart Poettering
115fac3c29 run0: optionally show superhero emoji on each shell prompt
This makes use of the infra introduced in 229d4a9806 to indicate visually on each prompt that we are in superuser mode temporarily.
2024-10-25 17:31:06 +02:00
Lennart Poettering
4167e9e210 user-util: tighten shell validation a tiny bit 2024-10-24 22:28:17 +02:00
Mike Yuan
4e69da071d Merge pull request #34799 from YHNdnzj/service-followups
core: follow-ups for live mount
2024-10-24 19:44:10 +02:00
Integral
b6b8527cd1 refactor: replace sizeof in loop with ELEMENTSOF & FOREACH_ELEMENT (#34863) 2024-10-23 10:32:02 +02:00
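
The kind of rewrite these two refactoring commits perform can be shown with simplified stand-ins for the macros. The definitions below are written for this example only and may differ from the ones in the tree; they rely on the GNU C `__typeof__` extension.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the macros named in the commits. */
#define ELEMENTSOF(x) (sizeof(x) / sizeof((x)[0]))
#define FOREACH_ELEMENT(i, array)                                         \
        for (__typeof__((array)[0]) *i = (array),                         \
                                    *_end_##i = (array) + ELEMENTSOF(array); \
             i < _end_##i; i++)

/* Before: index loop with a hand-written sizeof expression. */
static int sum_before(void) {
        static const int values[] = { 1, 2, 3, 4 };
        int sum = 0;

        for (size_t i = 0; i < sizeof(values) / sizeof(values[0]); i++)
                sum += values[i];

        return sum;
}

/* After: the macro derives the bounds from the array itself. */
static int sum_after(void) {
        static const int values[] = { 1, 2, 3, 4 };
        int sum = 0;

        assert(ELEMENTSOF(values) == 4);

        FOREACH_ELEMENT(v, values)
                sum += *v;

        return sum;
}
```

Both functions compute the same sum; the second cannot get the element count out of sync with the array.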
Mike Yuan
d845254b7f basic/fs-util: move unlink_tempfilep() to tmpfile-util 2024-10-22 19:19:39 +02:00
Lennart Poettering
b9633ebb2a fs-util: move attempts counter in openat_report_new() into loop 2024-10-22 17:51:26 +02:00
Lennart Poettering
4ffecbbbee label: move label_ops_reset() up a bit
Let's move it close to label_ops_set(), since it is somewhat symmetric
to it.
2024-10-22 17:51:26 +02:00
Lennart Poettering
4e4ed4b64d label: add missing assert() to label_ops_set() 2024-10-22 17:51:26 +02:00
Lennart Poettering
aec1262a2e fileio: port write_string_file_full() to openat_report_new()
This brings two benefits: we will label the created file only if it is
actually created, and we can correctly delete any file we create again
on failure.
2024-10-22 17:51:26 +02:00
Lennart Poettering
8eeb870971 fileio: port write_string_file() to LabelOps, and thus add WRITE_STRING_FILE_LABEL flag
Given that we have the LabelOps abstraction these days, we can teach
write_string_file() to use it, which means we can get rid of
fileio-label.[ch] as a separate concept.

(The only reason that fileio-label.[ch] exists independently of
fileio.[ch] was that the former linked to libselinux potentially, and
thus had to be in src/shared/ while the other always was in src/basic/.
But the LabelOps vtable provides us with a nice work-around)
2024-10-22 17:51:26 +02:00
Lennart Poettering
4946dd4197 fs-util: tweak how openat_report_new() operates when O_CREAT is used on a dangling symlink
One of the big mistakes of Linux is that when you create a file with
open() and O_CREAT and the file already exists as dangling symlink that
the symlink will be followed and the file created that it points to.
This has resulted in many vulnerabilities, and triggered the creation of
the O_NOFOLLOW flag, addressing the problem.

O_NOFOLLOW is less than ideal in many ways, but in particular one: when
actually creating a file it makes sense to set, because it is a problem
to follow final symlinks in that case. But if the file is already
existing, it actually does make sense to follow the symlinks. With
openat_report_new() we distinguish these two cases anyway (the whole
function exists only to distinguish the create and the exists-already
case after all), hence let's do something about this: let's simply never
create files "through symlinks".

This can be implemented very easily: just pass O_NOFOLLOW to the 2nd
openat() call, where we actually create files.

And then basically remove 0dd82dab91
again, because we don't need to care anymore, we already will see ELOOP
when we touch a symlink.

Note that this change means that openat_report_new() will thus start to
deviate from plain openat() behaviour in this one small detail: when
actually creating files we will *never* follow the symlink. That should
be a systematic improvement of security.

Fixes: #34088
2024-10-22 17:51:26 +02:00
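
The two-step approach described above can be sketched as follows. `open_or_create()` and `try_dangling_symlink()` are hypothetical names for a simplified illustration, not the openat_report_new() source; the retry limit of 7 is an arbitrary choice for this example, and the exact errno reported for a dangling symlink is deliberately left open.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch: open or create a file and report which of the two happened. */
static int open_or_create(int dir_fd, const char *path, int flags, mode_t mode, bool *ret_created) {
        assert(path && ret_created);

        for (unsigned attempts = 7;;) {
                /* Step 1: open an existing file; following symlinks is fine here. */
                int fd = openat(dir_fd, path, flags & ~(O_CREAT | O_EXCL), mode);
                if (fd >= 0) {
                        *ret_created = false;
                        return fd;
                }
                if (errno != ENOENT)
                        return -errno;

                /* Step 2: create the file, but never through a symlink. */
                fd = openat(dir_fd, path, flags | O_CREAT | O_EXCL | O_NOFOLLOW, mode);
                if (fd >= 0) {
                        *ret_created = true;
                        return fd;
                }
                if (errno != EEXIST)
                        return -errno;

                /* Something appeared between the two calls: retry a bounded number of times. */
                if (--attempts == 0)
                        return -EEXIST;
        }
}

/* Usage sketch: places a dangling symlink in dir and tries to create a file through it.
 * Returns 0 if that unexpectedly worked, a negative errno otherwise. */
static int try_dangling_symlink(const char *dir) {
        bool created = false;
        int dir_fd, fd, r;

        dir_fd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (dir_fd < 0)
                return -errno;

        (void) unlinkat(dir_fd, "dangling", 0);
        if (symlinkat("missing-target", dir_fd, "dangling") < 0) {
                r = -errno;
                close(dir_fd);
                return r;
        }

        fd = open_or_create(dir_fd, "dangling", O_WRONLY | O_CLOEXEC, 0600, &created);
        r = fd < 0 ? fd : 0;
        if (fd >= 0)
                close(fd);

        close(dir_fd);
        return r;
}
```

Whatever error the caller sees, the file the symlink points to is never created, which is the property the commit is after.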
Lennart Poettering
64053bed08 fs-util: always call label post ops in xopenat_full(), in both success and error path
For SELinux it is essential that we reset the file creation label both
in the success and in the error path, hence do so.

Moreover, when calling the label post ops do it if possible with the
opened fd of the inode itself, rather than always going via its path,
simply to reduce the attack surface.
2024-10-22 17:51:26 +02:00
Lennart Poettering
da3d81cccd fs-util: don't second guess openat_report_new() return values
If openat_report_new() fails, then 'made_file' will be false, as no file
was created, hence there's no need to skip the unlinkat() explicitly
early, given that we check for 'made_file' anyway in the error path. The
extra error code checks are hence entirely redundant.
2024-10-22 17:51:26 +02:00
Lennart Poettering
d49449c89b label: tweak LabelOps post() hook to take "created" boolean
We have two distinct implementations of the post hook.

1. For SELinux we just reset the selinux label we told the kernel
   earlier to use for new inodes.

2. For SMACK we might apply an xattr to the specified file.

The two calls are quite different: the first call we want to call in all
cases (failure or success), the latter only if we actually managed to
create an inode, in which case it is called on the inode.
2024-10-22 17:51:26 +02:00
Lennart Poettering
652371a3c1 fs-util: always go through the unlink cleanup paths in xopenat_full()
We didn't go through it at all if label_ops_post() failed.
2024-10-22 17:45:41 +02:00
Lennart Poettering
12620ca1fb fs-util: remove misplaced RET_NERRNO() 2024-10-22 17:45:41 +02:00
Lennart Poettering
9312b3dc28 Merge pull request #34403 from poettering/askpw-per-user
modernize the ask-password logic, and add unpriv askpw agents to the concept
2024-10-21 16:37:28 +02:00
Lennart Poettering
e8139b15e1 varlinkctl: respect $COLUMNS when rebreaking lines and we are not connected to a TTY
Let's provide a mechanism to select the number of screen columns for
rebreaking comments in Varlink IDL when not connected to a TTY, by honouring
the $COLUMNS env var then too. Previously we'd only honour it when connected
to a TTY, but it's also useful otherwise for rebreaking ridiculously long
comments, hence honour it in this case too.
2024-10-21 15:47:25 +02:00
Lennart Poettering
2ee6fa552e ask-password-api: don't accidentally create a dir, when we don't want one
Previously, we were using touch(), which usually works fine, because the
path should always refer to an existing directory, in which case it just
updates the timestamp. However, if the dir does not exist yet (which
shouldn't happen), it would be created as regular file, which is just
wrong.

Hence, let's instead create the dir as dir if it is missing, and then
update its timestamp.
2024-10-21 14:14:16 +02:00
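
A minimal sketch of "create it as a directory if missing, then update its timestamp", assuming a hypothetical helper `touch_directory()`; the code in ask-password-api may be structured differently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates path as a directory if needed and bumps its timestamps.
 * Returns 0 on success or -errno. */
static int touch_directory(const char *path, mode_t mode) {
        int fd, r = 0;

        assert(path);

        /* Create the directory as a directory; an already existing entry is fine for now. */
        if (mkdir(path, mode) < 0 && errno != EEXIST)
                return -errno;

        /* O_DIRECTORY makes this fail with ENOTDIR if something else occupies the path. */
        fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (fd < 0)
                return -errno;

        /* A NULL times argument sets both timestamps to the current time. */
        if (futimens(fd, NULL) < 0)
                r = -errno;

        close(fd);
        return r;
}
```

Unlike a plain touch, this never leaves a regular file behind where a directory was expected.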
Adrian Vovk
3e18762123 fs-util: Introduce symlinkat_idempotent 2024-10-18 17:58:45 -04:00
Adrian Vovk
fafc3c2d5c GREEDY_REALLOC_APPEND: Make more type safe
Previously, GREEDY_REALLOC_APPEND would compile perfectly fine and cause
subtle memory corruption if the caller messes up the type they're passing
in (i.e. by forgetting to pass-by-reference when appending a Type* to an
array of Type*). Now this will lead to compilation failure.
2024-10-18 14:22:58 +09:00
Mike Yuan
102efcd312 Bump kernel recommended baseline to v5.4 2024-10-16 18:06:11 +02:00
Yu Watanabe
6a6c0dab30 pidref: fix typo
Follow-up for de34ec188c.
2024-10-17 00:46:45 +09:00
Yu Watanabe
770980bc13 Merge pull request #34781 from poettering/write-string-rename-full
fileio: write_string_file() naming clean-ups
2024-10-16 06:18:57 +09:00
Lennart Poettering
7e3e540b88 pidref: add explicit concept of "remote" PidRef
This PidRef just tracks some data, but cannot be used for any active
operation.

Background: for https://github.com/systemd/systemd/pull/34703 it makes
sense to track explicitly if some PidRef is not a local one, so that we
never attempt to for example "kill a remote process" and thus
accidentally hit the wrong process (i.e. a local one by the same PID).
2024-10-15 18:26:05 +02:00
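
The guard this commit describes might look roughly like the sketch below. `struct proc_ref` and `proc_ref_kill()` are hypothetical and much simpler than the real PidRef; in particular, how the "remote" state is stored and which error code is returned are assumptions made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, simplified process reference; not the actual PidRef layout. */
struct proc_ref {
        pid_t pid;
        bool remote; /* true: the PID was reported by some other system and is only tracked */
};

/* Sends sig to the referenced process. Returns 0 or -errno. */
static int proc_ref_kill(const struct proc_ref *ref, int sig) {
        assert(ref);

        if (ref->pid <= 0)
                return -ESRCH;

        /* Refuse active operations on remote references, so that a local
         * process that happens to have the same PID is never hit. */
        if (ref->remote)
                return -EREMOTE;

        return kill(ref->pid, sig) < 0 ? -errno : 0;
}
```

The remote check sits in front of every active operation, while passive uses such as logging the PID remain possible.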
Lennart Poettering
8a0adc973a fileio: clean up write_string_file() naming
let's rename the "_ts" flavour of these calls "_full" instead, exposing
the full functionality. And then keep two more minimal versions around:
one "_at" (which has the ts parameter suppressed, but keeps the dir_fd
one). And one without suffix (which suppresses both).

Do the same for the label versions of these calls.
2024-10-15 18:20:27 +02:00