6293 Commits

Author SHA1 Message Date
Yu Watanabe
382886fe11 log: protect errno from log_syntax_invalid_utf8_internal()
Potentially, utf8_escape_invalid() called by
log_syntax_invalid_utf8_internal() may update errno.
2024-09-02 05:45:09 +09:00
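The errno protection described in this entry can be illustrated with a minimal sketch. Both function names below are hypothetical stand-ins, not the actual systemd helpers; the point is only the save-on-entry, restore-on-exit pattern around a callee that may modify errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an escaping helper that clobbers errno as a side effect. */
static size_t escape_sketch(const char *s) {
        size_t n = 0;

        assert(s);

        for (; *s; s++)
                n++;

        errno = EILSEQ; /* simulate a callee that modifies errno */
        return n;
}

/* Hypothetical logging wrapper: errno is saved on entry and restored before returning,
 * so the caller still sees the value it had before the call. */
static size_t log_invalid_sketch(const char *s) {
        int saved_errno = errno;
        size_t n;

        assert(s);

        n = escape_sketch(s);

        errno = saved_errno;
        return n;
}
```

A caller that sets errno to ENOENT before calling log_invalid_sketch() would still find ENOENT afterwards, even though the inner helper overwrote it.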
Yu Watanabe
1e04eb00f7 log: introduce log_syntax_parse_error()
This provides generic error message for failures in conf parsers.
Currently this is not used, but will be used later.
2024-09-02 05:45:04 +09:00
Mike Yuan
9517c81747 basic/raw-clone: refuse CLONE_PIDFD too 2024-09-01 10:44:39 +09:00
Luca Boccassi
5162829ec8 core: do BindMount/MountImage operations in async control process
These operations might require slow I/O, and thus might block PID1's main
loop for an undetermined amount of time. Instead of performing them
inline, fork a worker process and stash away the D-Bus message, and reply
once we get a SIGCHLD indicating they have completed. That way we don't
break compatibility and callers can continue to rely on the fact that when
they get the method reply the operation either succeeded or failed.

To keep backward compatibility, unlike reload control processes, these
are run inside init.scope and not the target cgroup. Unlike ExecReload,
this is under our control and is not defined by the unit. This is necessary
because previously the operation also wasn't run from the target cgroup,
so suddenly forking a copy-on-write copy of pid1 into the target cgroup
will make memory usage spike, and if there is a MemoryMax= or MemoryHigh=
set and the cgroup is already close to the limit, it will cause an OOM
kill, where previously it would have worked fine.
2024-08-29 12:48:55 +01:00
Yu Watanabe
83c187f585 parse-util: drop unused parse_ip_prefix_length() 2024-08-25 06:18:30 +09:00
Mike Yuan
d71f138156 basic/sigbus: use FOREACH_ELEMENT where appropriate, assert >= 0 for success 2024-08-22 20:14:25 +02:00
Mike Yuan
e06c5be29a process-util: always retry with pidfd_spawn() w/o cgroup first
Follow-up for 7ac58157ca

With the mentioned commit, iff E2BIG we'd retry pidfd_spawn()
with POSIX_SPAWN_SETCGROUP disabled. However, the same strategy
should actually apply to EOPNOTSUPP/ENOSYS/EPERM too -
they can mean two things here: no clone3() or no CLONE_PIDFD.
Therefore, let's first try clone() + CLONE_PIDFD, and fall further back
to plain clone() (posix_spawn()) only as last resort. Plus, record
the fact so that we don't unnecessarily retry every single time
if CLONE_PIDFD is the one that's unavailable.
2024-08-21 15:27:57 +02:00
Mike Yuan
df99a8ef3d process-util: check the flag instead of 'cgroup' param
We might skip CLONE_INTO_CGROUP wholly if not supported.
2024-08-21 15:17:05 +02:00
Daan De Meyer
1ce69e0661 Revert "cgroup-util: Don't try to open pidfd for kernel threads"
The kernel patch was reverted so let's try again to open pidfds
for kernel threads.

This reverts commit ead48ec35c.
2024-08-21 14:32:54 +02:00
Kornilios Kourtis
7ac58157ca process-util: handle pidfd_spawn() returning E2BIG
In some kernels (specifically, 5.4) even though the clone3 syscall is
supported, setting CLONE_INTO_CGROUP is not. The error message returned
in this case is E2BIG.

If posix_spawn_wrapper encounters this error, it does not retry, and
cannot spawn any programs in said kernels.

This commit adds a check for the E2BIG error and retries pidfd_spawn()
without the POSIX_SPAWN_SETCGROUP flag.

If we encounter an E2BIG error, and the pidfd_spawn() succeeds after
removing the POSIX_SPAWN_SETCGROUP flag, then we cache the result so
that we do not retry every time.

Originally, this issue was reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1077204.

Signed-off-by: Kornilios Kourtis <kornilios@gmail.com>
2024-08-21 02:04:57 +09:00
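The retry-and-cache behaviour described in this entry might look roughly like the following sketch. The callback type, the cache variable, and the fake spawner are all hypothetical; the real code calls pidfd_spawn() and toggles POSIX_SPAWN_SETCGROUP, which is replaced here by a plain boolean so the control flow can be followed in isolation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callback: spawns with or without the cgroup flag, returns 0 or -errno. */
typedef int (*spawn_fn_sketch)(bool with_cgroup, void *userdata);

/* Remembers that spawning into a cgroup is unsupported, so later calls skip the first attempt. */
static bool cgroup_spawn_unsupported_sketch = false;

static int spawn_with_fallback_sketch(spawn_fn_sketch spawn, void *userdata) {
        int r;

        assert(spawn);

        if (!cgroup_spawn_unsupported_sketch) {
                r = spawn(true, userdata);
                if (r != -E2BIG)
                        return r; /* success, or an unrelated failure */
        }

        /* Retry (or go straight to) spawning without the cgroup flag. */
        r = spawn(false, userdata);
        if (r >= 0)
                cgroup_spawn_unsupported_sketch = true; /* cache only once the fallback worked */

        return r;
}

/* Fake spawner simulating a kernel that rejects the cgroup flag; counts calls via userdata. */
static int fake_spawn_sketch(bool with_cgroup, void *userdata) {
        unsigned *n_calls = userdata;

        assert(n_calls);

        (*n_calls)++;
        return with_cgroup ? -E2BIG : 0;
}
```

With the fake spawner, the first call makes two attempts and the second call makes only one, since the cached result lets it skip the attempt that is known to fail.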
Yu Watanabe
41f5e66cf2 Merge pull request #34044 from poettering/isatty-fixes
fixes around isatty() handling
2024-08-20 20:36:07 +09:00
Lennart Poettering
300b7e7620 tree-wide: use isatty_safe() more 2024-08-20 11:11:53 +02:00
Lennart Poettering
aae47bf7a3 terminal-util: don't assume errno is correctly set when using isatty_safe()
let's instead generate ENOTTY on our own. This is more correct with our
coding style (since we generally do not propagate errors via errno), and
also addresses #34039 as side effect. (#34039 really needs to be fixed
in musl though, too, this is just a work-around as side-effect).

Fixes: #34039
2024-08-20 10:59:47 +02:00
Lennart Poettering
1b24357c41 terminal-util: fix isatty_safe() on hung-up TTYs
glibc returns EIO on ttys that are hung up. That's not really correct,
POSIX seems to disagree.

Work around this in our code, and turn this into a clean "1", since a
hung up tty doesn't stop being a tty just because it is hung up.

Background: https://github.com/systemd/systemd/pull/34039
2024-08-20 10:57:49 +02:00
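The hung-up TTY workaround can be sketched as below. The function name is hypothetical and this is not the actual isatty_safe() body; it only shows the idea of treating EIO from isatty() as "still a TTY" and returning a plain boolean so callers do not depend on errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical sketch: returns true if fd refers to a TTY, including a hung-up one. */
static bool isatty_sketch(int fd) {
        assert(fd >= 0);

        if (isatty(fd))
                return true;

        /* A hung-up TTY is reported via EIO, but it does not stop being a TTY. */
        if (errno == EIO)
                return true;

        /* Anything else is reported as "not a TTY"; a caller that needs an error
         * code would generate -ENOTTY itself rather than rely on errno. */
        return false;
}
```

For a pipe or regular file descriptor the sketch returns false.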
Yu Watanabe
933448defe network/routing-policy-rule: use int32_t for suppress_prefixlen
The kernel parses FRA_SUPPRESS_PREFIXLEN as uint32_t, but internally
handles it as a signed integer and treats negative values as unset. Let's explicitly
specify the size of the variable.

No functional change, just refactoring.
2024-08-20 02:21:21 +09:00
Yu Watanabe
dff27ce65a Merge pull request #34025 from YHNdnzj/edit-util-wrong-place
edit-util: catch and warn about edits outside of markers
2024-08-19 04:33:56 +09:00
Yu Watanabe
dc64f66756 Merge pull request #34022 from YHNdnzj/unit-is-filtered
core/unit: two trivial cleanups
2024-08-19 04:29:54 +09:00
Mike Yuan
f0f044a456 string-util: update ptr declaration to match our coding style 2024-08-18 16:41:44 +02:00
Mike Yuan
f32538e1cc basic/process-util: modernize setpriority_closest()
Before this commit, the "Cannot raise nice level" branch
is rather confusing, as we're actually lowering the nice.
Also, it's better to log about the final nice value
for both cases, no matter whether we need to set to limit
or not.
2024-08-18 15:16:03 +02:00
Mike Yuan
6e0f959360 core/unit: unit_is_filtered() -> unit_passes_filter() and invert logic
Follow-up for 6d2984d21b

The current semantics of "filtered" in unit_is_filtered()
are actually the contrary of ListUnitsFiltered(). Let's
make things consistent, i.e. return true when the unit
shall be included.
2024-08-17 20:09:51 +02:00
Daan De Meyer
2701c2f67d Add $SYSTEMD_IN_CHROOT to override chroot detection
When running unprivileged, checking /proc/1/root doesn't work because
it requires privileges. Instead, let's add an environment variable so
the process that chroots can tell (systemd) subprocesses whether
they're running in a chroot or not.
2024-08-16 10:11:29 +02:00
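An environment override of this kind could be consulted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the accepted values ("1" and "0") are chosen for illustration only, as the entry does not say which boolean spellings are recognized.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: 1 = in chroot, 0 = not in chroot, -1 = no usable override.
 * The accepted spellings "1" and "0" are an assumption made for this sketch. */
static int parse_chroot_override_sketch(const char *value) {
        if (!value)
                return -1;
        if (strcmp(value, "1") == 0)
                return 1;
        if (strcmp(value, "0") == 0)
                return 0;
        return -1;
}

/* Hypothetical lookup: consults the named variable before any /proc based check. */
static int chroot_override_sketch(const char *name) {
        assert(name);

        return parse_chroot_override_sketch(getenv(name));
}
```

A caller would use the result directly when it is 0 or 1 and fall back to its regular detection when it is -1.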
Daan De Meyer
2031fe7461 basic: Various cleanups for ratelimit functions 2024-08-14 14:18:40 +02:00
Mike Yuan
7036dd8b27 terminal-util: do not query kernel cmdline for pty size
This is pointless and noisy even for debug level.
2024-08-10 13:01:56 +02:00
Daan De Meyer
bc3477fdc5 crash-handler: Call vhangup on /dev/console before spawning crash shell
When pid 1 crashes, the getty unit for the console will happily keep
running which means we end up with two shells competing for the same
tty. Let's call vhangup on /dev/console to kill every other process
attached to the console before we spawn the crash shell. The getty
units have Restart=always but lucky for us, pid 1 just crashed in fire
and flames so it isn't actually able to restart the getty unit.
2024-08-07 21:24:57 +02:00
Cristian Rodríguez
af1a6db58f basic|boot: silence Wunterminated-string-initialization gcc15 warnings
gcc15 has -Wunterminated-string-initialization in -Wextra and
warns about string constants that are not null terminated even though
the functions do not perform any out-of-bounds access.
Silence the warnings by simply not providing an explicit size.
2024-08-07 00:14:53 +02:00
Yu Watanabe
2e308032f4 basic/linux: update kernel headers from v6.11-rc1 2024-08-04 14:55:32 +09:00
Yu Watanabe
da24dacf34 syscall-list: update syscall tables
This adds fstatat (and its friends), llseek, and uretprobe.
2024-08-04 14:47:30 +09:00
Yu Watanabe
564547d295 Merge pull request #33911 from YHNdnzj/cgroup-setup-cleanup
cgroup-setup/util: several cleanups; make use of cgroup.kill on client request
2024-08-03 06:20:02 +09:00
Yu Watanabe
ec4964692a cgroup-util: fix typo
Follow-up for 0fbb569de1.
2024-08-03 05:48:54 +09:00
Daniel P. Berrangé
6c35e0a51c confidential-virt: add detection for s390x target
The s390x platform provides confidential VMs using the "Secure Execution"
technology, which is also referred to as "Protected Virtualization" or
just "prot virt" in Linux / QEMU.

This can be detected through a simple sysfs attribute.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-08-02 16:53:20 +01:00
Daniel P. Berrangé
1c4bd7adcc confidential-virt: split caching of CVM detection into separate method
We have different impls of detect_confidential_virtualization per
architecture. The detection is cached in the x86_64 impl, and as we
add support for more targets, we want to use caching for all. It thus
makes sense to split caching out into an architecture independent
method.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-08-02 16:26:00 +01:00
Mike Yuan
2176841b9e cgroup-util: clean up cg_kill() and friends, completely split out cg_kill_kernel_sigkill()
cg_kill_kernel_sigkill() has a narrow use case, and currently
no code really reaches that branch. Let's detach it from
cg_kill_recursive() hence, and call it explicitly later
where appropriate.
2024-08-02 16:36:09 +02:00
Mike Yuan
031860d6cb cgroup-util: drop unused cg_rmdir()
When removing a cgroup, we always want to eliminate subcgroups
first, i.e. use cg_trim(). And cg_rmdir() (along with
CGROUP_REMOVE flag) is simply unused. Kill it.
2024-08-02 16:36:08 +02:00
Mike Yuan
1daf575990 cgroup-util: refactor cg_{ns,freezer,kill}_supported 2024-08-02 16:36:08 +02:00
Mike Yuan
ea25672de5 cgroup-setup: move cg_{,un}install_release_agent from cgroup-util
They're pid1-specific, so move them out of basic/.
2024-08-02 16:36:07 +02:00
Mike Yuan
3386f66200 cgroup-setup: drop unused cg_migrate_callback for cg_attach_everywhere()
While at it, move the typedef from cgroup-util to -setup.
2024-08-02 14:47:39 +02:00
Yu Watanabe
029709f932 socket-util: introduce netlink_socket_get_multicast_groups()
No functional change. Preparation for later commits.
2024-08-02 11:16:33 +09:00
Daan De Meyer
ff5662129a Merge pull request #33885 from DaanDeMeyer/pidref-kthread
Two pidfd fixes
2024-07-31 19:07:35 +02:00
Daan De Meyer
5551426785 Merge pull request #33884 from DaanDeMeyer/log-context
log: Fix size calculation for number of iovecs
2024-07-31 14:23:08 +02:00
Daan De Meyer
ead48ec35c cgroup-util: Don't try to open pidfd for kernel threads
The kernel might start returning -EINVAL when trying to open pidfds
for kernel threads so let's not try to open pidfds for kernel threads.
2024-07-31 13:50:16 +02:00
Daan De Meyer
fc83ff3f55 log: Fix size calculation for number of iovecs
Each log context field can expand to up to three iovecs (key, value
and newline) so let's fix the size calculation to take this into
account.
2024-07-31 13:12:55 +02:00
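The size calculation in this entry comes down to reserving three iovec slots per field. The sketch below uses a hypothetical field struct and function name and is not the actual log context code; it shows why an array sized by the number of fields alone would be too small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Each field may expand to a key, a value and a trailing newline. */
#define IOVECS_PER_FIELD_SKETCH 3

/* Hypothetical field representation; the key is assumed to include the '='. */
struct field_sketch {
        const char *key;
        const char *value;
};

/* Fills iov with key/value/newline triples and returns the number of slots used. */
static size_t fields_to_iovecs_sketch(const struct field_sketch *fields, size_t n_fields,
                                      struct iovec *iov, size_t n_iov) {
        size_t k = 0;

        assert(fields || n_fields == 0);
        assert(iov);
        assert(n_iov >= n_fields * IOVECS_PER_FIELD_SKETCH); /* the corrected size requirement */

        for (size_t i = 0; i < n_fields; i++) {
                iov[k++] = (struct iovec) { .iov_base = (void *) fields[i].key, .iov_len = strlen(fields[i].key) };
                iov[k++] = (struct iovec) { .iov_base = (void *) fields[i].value, .iov_len = strlen(fields[i].value) };
                iov[k++] = (struct iovec) { .iov_base = (void *) "\n", .iov_len = 1 };
        }

        return k;
}
```

Two fields therefore need an array of six iovecs, not two.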
Daan De Meyer
7881f485c9 execute: Drop log level to unit log level in exec_spawn()
All messages logged from exec_spawn() are attributed to the unit
and as such we should set the log level to the unit's max log level
for the duration of the function.
2024-07-31 13:12:55 +02:00
Daniel P. Berrangé
9d7be044ca Fix detection of TDX confidential VM on Azure platform
The original CVM detection logic for TDX assumes that the guest can see
the standard TDX CPUID leaf. This was true in Azure when this code was
originally written, however, current Azure now blocks that leaf in the
paravisor. Instead it is required to use the same Azure specific CPUID
leaf that is used for SEV-SNP detection, which reports the VM isolation
type.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-07-30 22:39:20 +02:00
Daan De Meyer
0fbb569de1 cgroup-util: Ignore kernel threads in cg_kill_items()
Similar to the implementation of cgroup.kill in the kernel, let's
skip kernel threads in cg_kill_items() as trying to kill kernel
threads as an unprivileged process will fail with EPERM and doesn't
do anything when running privileged.
2024-07-30 11:53:32 +02:00
Zbigniew Jędrzejewski-Szmek
e520b1258c Merge pull request #30307 from bluca/enforce_inhibitors
logind: always check for inhibitor locks
2024-07-26 13:52:34 +02:00
Luca Boccassi
7020fa8feb Merge pull request #33825 from DaanDeMeyer/chattr
repart: Create disk image file with copy-on-write disabled on btrfs
2024-07-25 14:11:11 +01:00
Luca Boccassi
804874d26a logind: always check for inhibitor locks
Currently inhibitors are bypassed unless an explicit request is made to
check for them, or even in that case when the requestor is root or the
same uid as the holder of the lock.

But in many cases this makes it impractical to rely on inhibitor locks.
For example, in Debian there are several convoluted and archaic
workarounds that divert systemctl/reboot to some hacky custom scripts
to try and enforce blocking accidental reboots, when it's not expected
that the requestor will remember to specify the command line option
to enable checking for active inhibitor locks.

Also in many cases one wants to ensure that locks taken by a user are
respected by actions initiated by that same user.

Change logind so that inhibitor checks are not skipped in these
cases, and systemctl so that locks are checked in order to show a
friendly error message rather than "permission denied".

Add new block-weak and delay-weak modes that keep the previous
behaviour unchanged.
2024-07-25 12:22:36 +01:00
Mike Yuan
268f58076f basic/log: do not treat all negative errnos as synthetic
Currently, IS_SYNTHETIC_ERRNO() evaluates to true for all negative errnos,
because of the two's-complement negative value representation.
Subsequently, ERRNO= is not logged for most of our own code.
Let's fix this, by formatting all synthetic errnos as positive.
Then, treat all negative values as non-synthetic.

While at it, mark the evaluation order explicitly, and remove
unneeded comment.

Fixes #33800
2024-07-25 12:03:59 +02:00
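The two's-complement problem described in this entry can be reproduced with a small sketch. The names are hypothetical and the marker bit position (bit 30) is an assumption of this sketch, not something stated in the entry; the old-style check looks only at the marker bit, while the new-style check additionally requires the value to be positive.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed marker bit for synthetic errnos (an assumption of this sketch). */
#define SYNTHETIC_BIT_SKETCH (1 << 30)

/* Marks a positive errno value as synthetic; the result stays positive. */
static int synthetic_errno_sketch(int num) {
        assert(num > 0 && num < SYNTHETIC_BIT_SKETCH);

        return SYNTHETIC_BIT_SKETCH | num;
}

/* Old-style check: tests only the marker bit. Small negative values have all
 * high bits set in two's complement, so they are misdetected as synthetic. */
static bool is_synthetic_old_sketch(int val) {
        return (((unsigned) val) >> 30) & 1;
}

/* New-style check: negative values are never synthetic. */
static bool is_synthetic_new_sketch(int val) {
        return val > 0 && (val & SYNTHETIC_BIT_SKETCH);
}
```

With these definitions, -EINVAL passes the old check but not the new one, while a value produced by synthetic_errno_sketch(EINVAL) is still recognized by the new check.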
Daan De Meyer
5e49684521 Make read_attr_path() more generic
Let's make this an openat() like function so it can be used in more
scenarios.
2024-07-24 18:58:41 +02:00
Daan De Meyer
1b05ac946a fs-util: Add XO_NOCOW flag
Let's add a flag for xopenat() that immediately makes a file NOCOW
after opening it if it's supported.
2024-07-24 18:58:41 +02:00