If we cannot find the callout we need in the build dir let's look for it
in $PATH as last resort.
This makes invoke_callout_binary() usable for all binaries we install
into $PATH (as opposed to /usr/lib/systemd), but has no effect
on callout binaries specified with full path.
This is useful, since we soon want to invoke journalctl as a callout.
Follow-up for ed024abac6
and 9fbe26cfa8
Also, let's not get too tangled up in the style of defining variables
in between. The functions are short enough, and vars involved are still
effectively at the beginning... Put differently, the separation from
'int r' is too deliberate and brings no actual value in my eyes.
If we have multiple trusted fs (i.e. luks or dm-verity) we generate via
repart at boot, we must make sure they cannot be "misappropriated", i.e.
used for a different mount they were intended for.
Hence, let's introduce "mount constraint" data (encoded in xattrs on the
root inode of the fs) that tells us where a file system has to be
mounted, and what the gpt partition metadata has to be for the fs to be
valid.
Inspired by this thread:
https://lists.freedesktop.org/archives/systemd-devel/2025-March/051244.html
- SO_PEERGROUPS is since kernel v4.13
(28b5ba2aa0f55d80adb2624564ed2b170c19519e),
- SO_BINDTOIFINDEX is since kernel v5.1
(f5dd3d0c9638a9d9a02b5964c4ad636f06cf7e2c).
Let's turn on validatefs automatically for all auto-discovered
partitions.
Let's add an x-systemd.validatefs option to optionally turn this on for
fstab listed file systems.
strv_skip was written to carefully return the original array, but this turns
out to be an unnecessary complication. After the previous patch, no caller
cares about the distinction between NULL and { NULL }, but various callers need
to wrap the process the returned value with strv_isempty(), sometimes more than
once. Let's always return NULL for an empty result to allow callers to be
simplified.
This introduce `LOG_ITEM()` macro that checks format strings in
log_struct() and friends.
Hopefully, this silences false-positive warnings by Coverity.
Currently DelegateNamespaces= only works for services spawned by the
system manager. User managers will always unshare the user namespace
first even if they're running with CAP_SYS_ADMIN.
Let's add support for DelegateNamespaces= for user managers if they're
running with CAP_SYS_ADMIN. By default, we'll still delegate all
namespaces
for user managers, but this can now be overridden by explicitly passing
DelegateNamespaces=.
If a user manager is running without CAP_SYS_ADMIN, the user manager is
still always unshared first just like before.
capability_ambient_set_apply() can be called with capability sets
containing unknown capabilities. Let's not crash when this is the
case but instead ignore the unknown capabilities.
This fixes a crash when running the following command:
"systemd-run -p "AmbientCapabilities=~" --wait --pipe id"
Fixes d5e12dc75e
This introduce LOG_ITEM() macro that checks arbitrary formats in
log_struct().
Then, drop _printf_ attribute from log_struct_internal(), as it does not
help so much, and compiler checked only the first format string.
Hopefully, this silences false-positive warnings by Coverity.
Let's return the size in a return parameter instead of the return value.
And if NULL is specified this tells us the caller doesn't care about the
size and expects a NUL terminated string. In that case look for an
embedded NUL byte, and refuse in that case.
This should lock things down a bit, as we'll systematically refuse
embedded NUL strings now when we expect strings.
This is a simple helper for creating a userns that just maps the
callers user to UID 0 in the namespace. This can be acquired unpriv,
which makes it useful for various purposes, for example for the logic in
is_idmapping_supported(), hence port it over.
(is_idmapping_supported() used a different mapping before, with the
nobody users, but there's no real reason for that, and we'll use
userns_acquire_self_root() elsewhere soon, where the root mapping is
important).
Unprivileged namespaces are only allowed if the "setgroups" file is set
to "deny" for processes. And we need to write it before writing the
gidmap. Hence add a parameter for that.
Then, also patch all current users to actually enable this. The usecase
generally don't need it (because they don't care about unprivileged
userns), but it doesn't hurt to enable the concept anyway in all current
users (none of them actually runs complex userspace in them, but they
mostly use userns_acquire() for idmapped mounts and similar).
Let's anyway make this option explicit in the function call, to indicate
that the concept exists and is applied.
To support C23, this introduces UTF8() macro to define UTF-8 literals,
as C23 changed char8_t from char to unsigned char.
This also makes pointer signedness warning critical, and updates C
standards table for tests.
C23 changed char8_t from char to unsigned char, hence assigning a u8 literal
to const char* emits pointer sign warning, e.g.
========
../src/shared/qrcode-util.c: In function ‘print_border’:
../src/shared/qrcode-util.c:16:34: warning: pointer targets in passing argument 1 of ‘fputs’ differ in signedness [-Wpointer-sign]
16 | #define UNICODE_FULL_BLOCK u8"█"
| ^~~~~
| |
| const unsigned char *
../src/shared/qrcode-util.c:65:39: note: in expansion of macro ‘UNICODE_FULL_BLOCK’
65 | fputs(UNICODE_FULL_BLOCK, output);
| ^~~~~~~~~~~~~~~~~~
========
This introduces UTF8() macro, which define u8 literal and casts to consth char*,
then rewrites all u8 literal definitions with the macro.
With this change, we can build systemd with C23.
Admittedly, some of our glyphs _are_ special, e.g. "O=" for SPECIAL_GLYPH_TOUCH ;)
But we don't need this in the name. The very long names make some invocations
very wordy, e.g. special_glyph(SPECIAL_GLYPH_SLIGHTLY_UNHAPPY_SMILEY).
Also, I want to add GLYPH_SPACE, which is not special at all.
Kernel API file systems typically use either "raw" or "seq_file" to
implement their various interface files. The former are really simple
(to point I'd call them broken), in that they have no understanding of
file offsets, and return their contents again and again on every read(),
and thus EOF is indicated by a short read, not by a zero read. The
latter otoh works like a typical file: you read until you get a
zero-sized read back.
We have read_virtual_file() to read the "raw" files, and can use regular
read_full_file() to read the "seq_file" ones.
Apparently all files in the top-level /proc/ directory use 'seq_file'.
but we accidentally used read_virtual_file() for them. Fix that.
Also clarify in a comment what the rules are.
Fixes: #36131
In one of the next commits we'd like to introduce a concept of
optionally hashing the hostname from the machine ID. For that we we need
to optionally back gethostname_full() by code involving sd-id128, hence
let's move it from src/basic/ to src/shared/, since only there we are
allowed to use our public APIs.
The capsule instances are related to user instances, so treat them
equally to user@.service when handling cgroup paths. This also saves us
from polluting public libsystemd API with variant for capsules too.
Fix: https://github.com/systemd/systemd/issues/36098
The capsule instances are related to user instances, so treat them
equally to user@.service when handling cgroup paths. This also saves us
from polluting public libsystemd API with variant for capsules too.
Fix: #36098
So apparently "linux,dummy-virt" is a devicetree in popular use by
various hypervisors, including crosvm:
e5d7a64d37/aarch64/src/fdt.rs (L692)
and qemu:
98c7362b1e/hw/arm/virt.c (L283)
and that's because the kernel ships support for that natively:
https://www.kernel.org/doc/Documentation/devicetree/bindings/arm/linux%2Cdummy-virt.yaml
It's explicitly for using in virtualization. Hence it's suitable for
detecting it as generic fallback.
This hence adds the check, similar to how we already look for one other
qemu-specific devicetree.
I ran into this while playing around with the new Pixel "Linux Terminal"
app from google which runs a Debian in a crosvm apparently. So far
systemd didn't recognize execution in it at all. Let's at least
recognize it as VM at all, even if this doesn't recognize it as
crosvm.
tianocore does some weird shit with its terminal emulation and regular
fills half the terminal with grey background and then invokes us with
this not cleared up. Hence let us clear this up for it: as part of the
ansi sequence based reset let's position the cursor explicitly at the
beginning of the current line, and erase everything till the end of the
screen. This makes boot output in tianocore vms much much cleaner.
Note that this does *not* erase any terminal output *before* the cursor
position where we take over, because that typically contains valuable
information still we should not erase.
This used to be relevant since in old versions of glibc an internal
cache is maintained, while we might sidestep their invalidation
with raw_clone(). After glibc 2.25 getpid() is a trivial wrapper
for the syscall, and hence there's no need to have a separate
raw_getpid().