Commit Graph

5646 Commits

Author SHA1 Message Date
Lennart Poettering
d7d748548b process-util: add pidref_get_comm() and rename get_process_comm() to pid_get_comm() 2023-10-18 14:39:33 +02:00
Lennart Poettering
fc87713bed process-util: add pidref_is_kernel_thread() 2023-10-18 14:39:33 +02:00
Lennart Poettering
a034620f1a process-util: add pidref_get_cmdline() 2023-10-18 14:39:33 +02:00
Lennart Poettering
3d7ba61a7b pidref: we never have to verify PID 1
The process exists as long as the kernel/userns exists at all, hence we
don't have to verify a pidfd to it.
2023-10-18 14:39:33 +02:00
Lennart Poettering
f2a2e60be6 cgroup-util: make cg_pidref_get_path() PidRef parameter const 2023-10-18 14:39:33 +02:00
Lennart Poettering
44c55e5a3f pidref: make signal sending calls take const PidRef 2023-10-18 14:38:07 +02:00
Lennart Poettering
bd389293f0 pidref: make pidref_verify() parameter const 2023-10-18 10:32:03 +02:00
Lennart Poettering
820fe745c7 cgroup-util: rename all return parameters in cgroup-util to ret_xyz 2023-10-18 10:02:43 +02:00
Lennart Poettering
0ff6ff2b29 tree-wide: port various parsers over to read_stripped_line() 2023-10-17 14:36:54 +02:00
Lennart Poettering
c56cb33f09 fileio: add read_stripped_line() as trivial read_line() + strstrip() combo 2023-10-17 14:36:54 +02:00
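As a rough illustration of the "read_line() + strstrip() combo" described above, here is a minimal sketch built on POSIX getline(). The name read_stripped_line_sketch(), its simplified signature, and its return convention (1 for a line, 0 for EOF, negative errno on error) are assumptions made for this example, not the actual systemd API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the helper named in the commit subject.
 * Returns 1 if a line was read, 0 on EOF, negative errno on failure.
 * On success *ret is a heap string the caller must free(). */
static int read_stripped_line_sketch(FILE *f, char **ret) {
        char *line = NULL, *start, *end;
        size_t cap = 0;
        ssize_t n;

        assert(f);
        assert(ret);

        errno = 0;
        n = getline(&line, &cap, f);
        if (n < 0) {
                int e = errno;

                free(line);
                *ret = NULL;
                /* getline() returns -1 both on EOF and on error */
                return ferror(f) ? -(e ? e : EIO) : 0;
        }

        /* Skip leading whitespace */
        start = line;
        while (*start && isspace((unsigned char) *start))
                start++;

        /* Cut trailing whitespace, including the newline */
        end = start + strlen(start);
        while (end > start && isspace((unsigned char) end[-1]))
                end--;
        *end = 0;

        /* Shift the stripped text to the start of the buffer so that
         * the caller can free() the returned pointer directly */
        memmove(line, start, (size_t) (end - start) + 1);
        *ret = line;
        return 1;
}
```

A caller would typically loop until the function returns 0, freeing each returned string after use.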
Lennart Poettering
cde8cc946b Merge pull request #29272 from enr0n/coredump-container
coredump: support forwarding coredumps to containers
2023-10-16 16:13:16 +02:00
Luca Boccassi
7c83d42ef8 mount-util: use mount beneath to replace previous namespace mount
Instead of mounting over, do an atomic swap using mount beneath, if
available. This way assets can be mounted again and again (e.g.:
updates) without leaking mounts.
2023-10-16 14:33:47 +01:00
Luca Boccassi
0e3986bc1c Merge pull request #29525 from poettering/confext-sysext-multimodal
dissect: make sure we can dissect and inspect DDIs that are both confext *and* sysext
2023-10-14 00:28:47 +01:00
Luca Boccassi
ccba67f494 Merge pull request #27890 from bluca/executor
core: add systemd-executor binary
2023-10-13 22:01:16 +01:00
Nick Rosbrook
ade39d9ab8 process-util: introduce namespace_get_leader helper
For a given PID and namespace type, this helper function gives the PID
of the leader of the namespace containing the given PID. Use this in
systemd-coredump instead of using the existing get_mount_namespace_leader.

This helper will be used again in a later commit.
2023-10-13 15:13:11 -04:00
Nick Rosbrook
6cf96ab456 core: add CoredumpReceive= setting
This setting indicates that the given unit wants to receive coredumps
for processes that crash within the cgroup of this unit. This setting
requires that Delegate= is also true, and therefore is only available
where Delegate= is available.

This will be used by systemd-coredump to support forwarding coredumps to
containers.
2023-10-13 15:13:11 -04:00
Nick Rosbrook
b426b4eed8 cgroup-util: add cg_is_delegated helper
Take is_delegated from cgroup-show.c, and make it a generic helper
function. This new helper will be used again in a later commit.
2023-10-13 15:13:11 -04:00
Luca Boccassi
5986e3f4db Merge pull request #29502 from keszybz/sd-boot-config-tweaks
Tweaks to sd-boot UX
2023-10-12 23:08:56 +01:00
Luca Boccassi
bb5232b6a3 core: add systemd-executor binary
Currently we spawn services by forking a child process, doing a bunch
of work, and then exec'ing the service executable.

There are some advantages to this approach:

- quick: we immediately have access to all the enormous amount of
  state simply by virtue of sharing the memory with the parent
- easy to refactor and add features
- part of the same binary, will never be out of sync

There are however significant drawbacks:

- doing work after fork and before exec is against glibc's supported
  case for several APIs we call
- copy-on-write trap: anytime any memory is touched in either parent
  or child, a copy of that page will be triggered
- memory footprint of the child process will be memory footprint of
  PID1, but using the cgroup memory limits of the unit

The last issue is especially problematic on resource constrained
systems where hard memory caps are enforced and swap is not allowed.
As soon as PID1 is under load, with no page out due to no swap, and a
service with a low MemoryMax= tries to start, hilarity ensues.

Add a new systemd-executor binary, that is able to receive all the
required state via memfd, deserialize it, prepare the appropriate
data structures and call exec_child.

Use posix_spawn which uses CLONE_VM + CLONE_VFORK, to ensure there is
no copy-on-write (same address space will be used, and parent process
will be frozen, until exec).
The sd-executor binary is pinned by FD on startup, so that we can
guarantee there will be no incompatibilities during upgrades.
2023-10-12 15:01:51 +01:00
Luca Boccassi
6ecdfe7d10 process-util: add posix_spawn helper
This provides CLONE_VM + CLONE_VFORK semantics, so it is useful to
avoid CoW traps and other issues around doing work between fork()
and exec().
2023-10-12 13:37:22 +01:00
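The idea of spawning without doing work between fork() and exec() can be sketched with the standard posix_spawnp() call. The helper name spawn_and_wait_sketch() and its return convention are assumptions for this example; the helper actually added to process-util may look quite different.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawn a program via posix_spawnp() and wait
 * for it. Returns the exit status (0..255) on normal exit, or a
 * negative errno-style value on failure. */
static int spawn_and_wait_sketch(const char *file, char *const argv[]) {
        pid_t pid;
        int r, status;

        assert(file);
        assert(argv);

        /* No user code runs in the child between creation and exec,
         * which is the property the commit message is after */
        r = posix_spawnp(&pid, file, NULL, NULL, argv, environ);
        if (r != 0)
                return -r; /* posix_spawnp() returns the error number directly */

        for (;;) {
                if (waitpid(pid, &status, 0) >= 0)
                        break;
                if (errno != EINTR)
                        return -errno;
        }

        if (!WIFEXITED(status))
                return -EPROTO; /* killed by a signal; reported this way only in this sketch */

        return WEXITSTATUS(status);
}
```

For instance, calling it with the argument vector { "sh", "-c", "exit 7", NULL } should report the shell's exit status.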
Luca Boccassi
58cb36e56b env-util: add helper to replace env block 2023-10-12 13:37:22 +01:00
Lennart Poettering
a81fe93e95 dissect: allow confext/sysext to be in the same image
This reworks the image discovery logic, and conceptually allows DDIs
that are both confext and sysext to exist. Previously we'd only extract
one type of extension data from a DDI, with this we allow to extract both
if both exist.

This doesn't add support for true "multi-modal" DDIs, that qualify as
various things at once, it just lays some ground work that ensures we at
least can dissect such images.

This reworks 484d26dac1 quite a bit.

This changes systemd-dissect's JSON output, but given the
version with the fields it changes/drops has never been released (as the
above patch was merged post-v254) this shouldn't be an issue.
2023-10-11 15:56:08 +02:00
Lennart Poettering
bde7e12255 limits-util: suppress noisy debug message when reading tasks in top-level cgroup
We have the "tasks.max" cgroup attribute only if we run in a cgroup
namespace, but not on the host. Hence let's handle ENODATA silently
simply to reduce the debug noise generated.
2023-10-11 11:30:53 +02:00
Luca Boccassi
3b66a6764e Move CLEANUP_ARRAY to src/fundamental 2023-10-09 22:22:09 +01:00
Zbigniew Jędrzejewski-Szmek
3be6ab5c11 basic/macro.h: move a bunch of stuff to macro-fundamental.h
We should start using this functionality in src/boot/efi/ too.
2023-10-07 12:09:54 +02:00
Zbigniew Jędrzejewski-Szmek
3c4c109de1 basic/macro: add comment explaining DEFINE_TRIVIAL_DESTRUCTOR() 2023-10-06 16:48:22 +02:00
Lennart Poettering
5e71f86dff alloc-util: add realloc0() helper that is like realloc() but zero-initializes appended space 2023-10-06 07:44:47 +02:00
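A zero-initializing realloc() can be sketched as follows. This version takes the old size as an explicit parameter, which is an assumption made to keep the example portable; the function name realloc0_sketch() and its signature are hypothetical and not taken from alloc-util.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: like realloc(), but any bytes appended beyond
 * old_size are set to zero. On failure NULL is returned and the
 * original allocation remains valid, as with realloc(). */
static void *realloc0_sketch(void *p, size_t old_size, size_t new_size) {
        void *q;

        /* A NULL pointer cannot have old contents */
        assert(p || old_size == 0);

        /* Avoid the implementation-defined realloc(p, 0) case */
        q = realloc(p, new_size ? new_size : 1);
        if (!q)
                return NULL;

        /* Only the appended tail needs clearing */
        if (new_size > old_size)
                memset((char *) q + old_size, 0, new_size - old_size);

        return q;
}
```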
Lennart Poettering
2c07d314b2 fileio: revamp search_and_fopen()
Let's modernize and clean up search_and_fopen a bit: let's add support
for regular open() (instead of fopen()), as well as access() (if caller
just wants to check if a file exists without opening it).

This unifies much of the code involved, which previously was duplicated
in search_and_fopen() and search_and_fopen_nulstr().
2023-10-05 19:01:28 +02:00
Lennart Poettering
b0ae589b3e pidref: add trivial helper pidref_set_self() to set pidref to our handle to our own process 2023-10-05 17:08:35 +02:00
Daan De Meyer
d852352b9c mountpoint-util: Check hardcoded list before asking kernel if option is supported
mount_option_supported() will call fsopen() which will probe the
kernel filesystem module. This means that we'll suddenly start
probing filesystem modules when running generators as those determine
which mount options to use. To prevent generators from loading kernel
filesystem modules as much as possible, let's always first check the
hardcoded list of filesystem which we know support a feature before
falling back to asking the kernel.
2023-10-05 16:50:30 +02:00
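The "check the hardcoded list first, ask the kernel only as a fallback" ordering might look roughly like this. Everything here is an assumption for illustration: the table contents, the function option_supported_sketch(), and fake_kernel_probe(), which merely stands in for the expensive fsopen()-based query and counts how often it is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*probe_fn)(const char *fstype, const char *option);

/* Illustrative table only, not the list used by mountpoint-util */
static const char *const known_discard_fs[] = {
        "btrfs", "ext4", "f2fs", "vfat", "xfs", NULL
};

/* Stand-in for the kernel query; counts invocations so the ordering
 * can be observed */
static int probe_calls;
static int fake_kernel_probe(const char *fstype, const char *option) {
        (void) fstype;
        (void) option;
        probe_calls++;
        return 0;
}

/* Hypothetical sketch: consult the static table first and only fall
 * back to the (potentially module-loading) probe if it has no answer */
static int option_supported_sketch(const char *fstype, const char *option, probe_fn probe) {
        assert(fstype);
        assert(option);

        if (strcmp(option, "discard") == 0)
                for (size_t i = 0; known_discard_fs[i]; i++)
                        if (strcmp(fstype, known_discard_fs[i]) == 0)
                                return 1; /* answered without touching the kernel */

        return probe ? probe(fstype, option) : 0;
}
```

With this ordering, a lookup for a file system in the table never reaches the probe, which is the behaviour the commit wants for generators.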
Lennart Poettering
c6711da087 Merge pull request #29454 from poettering/cg-pidref-get-path
cgroup-util: add cg_pidref_get_path() helper and use it
2023-10-05 15:44:25 +02:00
Lennart Poettering
a906224288 cgroup-util: add cg_pidref_get_path() helper and use it 2023-10-05 13:26:25 +02:00
Lennart Poettering
b30da1c632 cgroup-util: make sure cg_get_owner() only works for cgroups, not cgroup attribute files 2023-10-05 11:12:38 +02:00
Lennart Poettering
bd1791b597 cgroup-util: drop "controller" argument from various cgroup helper calls
systemd's own cgroup hierarchy is special to us, we use it to actually
manage processes. Because of that many calls that apply to cgroups are
only ever called with the SYSTEMD_CGROUP_CONTROLLER as controller
argument. Let's hence remove the argument altogether.

This in particular touches the kill and xattr routines.

This changes no behaviour, we just drop an argument that is always set
to the same value anyway.

This is preparation to eventually getting rid of cgroup v1, because
on cgroup v2 the cgroup paths do not change for different controllers,
there's only a single hierarchy there.
2023-10-05 11:11:04 +02:00
Yu Watanabe
fcdd21ec6a tree-wide: fix typo 2023-10-04 08:58:10 +09:00
NRK
be1666886b macro: use __builtin_unreachable on NDEBUG
note that this slightly changes the semantic of assert when NDEBUG is
defined. if there's an extern function call (without attribute pure or
similar) then the compiler has to assume it has side effects and still
emit the function call.

whereas the old assert guaranteed that nothing will be evaluated on
NDEBUG.

Closes: https://github.com/systemd/systemd/issues/29408
2023-10-03 21:34:38 +02:00
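The semantic difference described above can be made visible with two small macros. Both macro names and the bump() function are hypothetical and exist only for this sketch; __builtin_unreachable() is the GCC/Clang builtin named in the commit subject.

```c
#include <assert.h>

/* Old-style behaviour under NDEBUG: the expression is dropped
 * entirely and never evaluated */
#define ASSERT_COMPILED_OUT(expr) ((void) 0)

/* New-style behaviour under NDEBUG: the expression becomes an
 * optimizer hint, but calls with side effects in it must still be
 * emitted */
#define ASSERT_AS_ASSUMPTION(expr)                      \
        do {                                            \
                if (!(expr))                            \
                        __builtin_unreachable();        \
        } while (0)

/* A function with an observable side effect */
static int calls;
static int bump(void) {
        return ++calls;
}
```

Using ASSERT_COMPILED_OUT(bump() > 0) leaves the counter untouched, while ASSERT_AS_ASSUMPTION(bump() > 0) increments it, so expressions passed to the new form should be free of side effects.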
Luca Boccassi
df3e378a5d Merge pull request #29339 from bluca/mount_namespace_new_api
Use new mount API for bind/image mount tunnel
2023-10-02 16:04:26 +01:00
Lennart Poettering
015d19e3ac Merge pull request #29405 from poettering/boot-xmalloc0
boot: add xmalloc0() + memzero() helpers
2023-10-02 16:45:40 +02:00
Luca Boccassi
f273c09c51 mountpoint-util: add bool mount_new_api_supported() helper 2023-10-02 14:02:32 +01:00
Lennart Poettering
4ac79c2b77 memory-util: move memzero() to src/fundamental/ to share with UEFI
(and while we are at it, make sure it returns the input pointer as
output)
2023-10-02 15:00:13 +02:00
Yu Watanabe
7e2a5fbd85 fileio: make read_full_file_full() usable with size and READ_FULL_FILE_UNBASE64
When READ_FULL_FILE_UNBASE64 (or READ_FULL_FILE_UNHEX) is specified,
setting size argument by caller is difficult, as it is hard to estimate
the encoded length.

With this change, when size is specified together with a decoding option,
we read more of the file and check the decoded size against the specified
size afterwards.
2023-10-02 10:36:43 +09:00
Daan De Meyer
4444564a95 Merge pull request #29193 from keszybz/path-util-adjustment
Make unit mangling follow paths
2023-09-29 11:33:12 +02:00
Lennart Poettering
ec8dc83530 pidref: add pidref_verify() helper
This new helper can be used after reading process info from procfs, to
verify that the data that was just read actually matches the pidfd, and
does not belong to some new process that just reused the numeric PID of
the process we originally pinned.
2023-09-28 23:22:58 +02:00
Lennart Poettering
9cb7e49f11 pidref: add pidref_hash_ops
This adds a "hash_ops" structure, which allows using PidRef structures
as keys in Hashmap and Set objects.
2023-09-28 23:22:58 +02:00
Lennart Poettering
837659825f pidref: add helpers for managing PidRef on the heap
Usually we want to embed PidRef in other structures, but sometimes it
makes sense to allocate it on the heap in case it should be used
standalone. Add helpers for that.

Primary usecase: use as key in Hashmap objects, that for example map
process to unit objects in PID 1.

This adds pidref_free()/pidref_freep() for freeing such an allocated
struct, as well as pidref_dup() (for duplicating an existing PidRef
on the heap 1:1), and pidref_new_pid() (for allocating a new PidRef from a
PID).
2023-09-28 23:22:58 +02:00
Lennart Poettering
dcfcea6d02 pidref: add PIDREF_MAKE_FROM_PID()
This helper turns a pid_t into a PidRef. It's different from
pidref_set_pid() in being "passive", i.e. it does not attempt to acquire
a pidfd for the pid.

This is useful when using the PidRef as a lookup key that shall also
work after a process is already dead, and hence no conversion to a pidfd
is possible anymore.
2023-09-28 23:22:58 +02:00
Lennart Poettering
12c7d27b65 cgroup-util: add cg_read_pidref() helper
Just like cg_read_pid() but returns a PidRef
2023-09-28 23:22:58 +02:00
Zbigniew Jędrzejewski-Szmek
5342eb4633 Rework unit_name_mangle_with_suffix() to (very slightly) simplify the path
'systemctl status /../dev' now looks for 'dev.mount', not '-..-dev.service',
and 'systemctl status /../foo' looks for 'foo.mount', not '-..-foo.service'. I
think this is much more useful. I think the escaping is not very useful, so I plan
to submit a later series which changes that behaviour. But I think this first
step here is already useful on its own.

Note that the patch is smaller than it seems: before, is_device_path() would
return true only for absolute paths, so moving of is_device_path() under the
path_is_absolute() conditional doesn't influence the logic.
2023-09-28 13:09:25 +02:00
Lennart Poettering
4ed9e2619c bootctl: highlight SecureBoot enabled state in green 2023-09-28 12:07:15 +02:00
Lennart Poettering
0869e1326a oomd: correct listening sockets
So, unfortunately oomd uses "io.system." rather than "io.systemd." as
prefix for its sockets. This is a mistake, and doesn't match the
Varlink interface naming or anything else in oomd.

Hence, let's fix that.

Given that this is an internal protocol between PID1 and oomd let's
simply change this without retaining compat.
2023-09-25 23:27:18 +02:00