
21 Commits

Author SHA1 Message Date
Quentin Deslandes
79dd24cf14 core: Add UserNamespacePath=
This allows a service to reuse the user namespace created for an
existing service, similarly to NetworkNamespacePath=. The configuration
of the initial user namespace (e.g. ID mapping) is preserved.
2025-11-04 10:55:04 +01:00
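Joining an existing user namespace comes down to opening its namespace file and calling setns(2), in the same way a network namespace is joined. Below is a minimal sketch that assumes the path points at an nsfs file such as /proc/<pid>/ns/user. join_user_namespace() is a hypothetical helper, not systemd's implementation. setns(2) on a user namespace requires CAP_SYS_ADMIN in the target namespace and a single-threaded caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: join the user namespace referenced by `path`
 * (e.g. /proc/<pid>/ns/user of another service).
 * Returns 0 on success or a negative errno value. */
static int join_user_namespace(const char *path) {
        assert(path);

        int fd = open(path, O_RDONLY | O_CLOEXEC);
        if (fd < 0)
                return -errno;

        /* Passing CLONE_NEWUSER makes the kernel reject fds that do not
         * refer to a user namespace. The namespace's existing UID/GID
         * mapping is kept as-is; we only enter it. */
        int r = setns(fd, CLONE_NEWUSER) < 0 ? -errno : 0;

        close(fd);
        return r;
}
```

Computing the result before close() keeps errno from being clobbered by the cleanup call.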
Daan De Meyer
836e4e7ea8 core: Clean up includes
Split out of #37344.
2025-05-22 09:37:20 +02:00
Daan De Meyer
1cf40697e3 tree-wide: Sort includes
This was done by running a locally built clang-format with
https://github.com/llvm/llvm-project/pull/137617 and
https://github.com/llvm/llvm-project/pull/137840 applied on all .c
and .h files.
2025-04-30 09:30:51 +02:00
Daan De Meyer
44d50ba88e execute: Get rid of custom logging macros
We already have LOG_CONTEXT_PUSH_EXEC() which with two additions
does exactly the same as the custom logging macros, so let's get rid
of the custom logging macros and use LOG_CONTEXT_PUSH_EXEC() instead.
2025-04-23 14:48:45 +02:00
Daan De Meyer
4ea4abb651 core: Remove circular dependencies between headers
Currently there are various circular dependencies between headers
in core/. Let's get rid of these by making judicious use of forward
declarations and moving includes into implementation files instead of
having them in header files.

Getting rid of circular header includes simplifies the code and makes
various clang based tooling such as iwyu work much better on our code.

The most important change is getting rid of the manager.h include in
unit.h which is possible thanks to the previous commits. We also move
the OOMPolicy and StatusType enums to unit.h to remove the need for
other unit headers to include manager.h to get access to these enums.
2025-04-23 10:33:35 +02:00
Yu Watanabe
3cf6a3a3d4 tree-wide: check more log message format in log_struct() and friends
This introduces a LOG_ITEM() macro that checks arbitrary formats in
log_struct().
Then, drop the _printf_ attribute from log_struct_internal(), as it does
not help much: the compiler checked only the first format string.

Hopefully, this silences false-positive warnings by Coverity.
2025-03-19 01:56:48 +09:00
Mike Yuan
c4c416b109 core: clean up ambient capability logging
Follow-up for e0ebc81b2d
2024-07-31 21:40:28 +02:00
Łukasz Stelmach
e0ebc81b2d core: drop ambient capabilities in systemd-executor
Since the commit 963b6b906e ("core: drop ambient capabilities in
user manager") systemd running as the session manager has dropped ambient
capabilities while retaining the other sets, allowing user services to be
started with elevated capabilities. This worked fine until the introduction
of sd-executor. For a non-root process to be started with elevated
capabilities by a non-root parent it either needs file capabilities or
ambient capabilities in the parent process. Thus, systemd needs to allow
sd-executor to inherit its ambient capabilities and sd-executor should
drop them as systemd did before.

The ambient set is managed for both system and session managers, but
with the default set for PID#1 being empty, this code does not affect
operation of PID#1.

Fixes: bb5232b6a3 ("core: add systemd-executor binary")
2024-07-31 11:09:58 +02:00
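Clearing the ambient set in the executor, as the manager used to do itself, can be done with a single prctl(2) call. This is a minimal sketch. drop_ambient_capabilities() and ambient_capability_is_set() are hypothetical helper names. The fallback constants are the Linux UAPI values, in case older headers lack them. Kernels before 4.3 have no ambient set and return EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <linux/capability.h>

#ifndef PR_CAP_AMBIENT
#define PR_CAP_AMBIENT 47
#define PR_CAP_AMBIENT_IS_SET 1
#define PR_CAP_AMBIENT_CLEAR_ALL 4
#endif

/* Hypothetical helper: drop every ambient capability of the calling
 * process. Permitted/effective/inheritable sets are left untouched.
 * Returns 0 or a negative errno value. */
static int drop_ambient_capabilities(void) {
        /* Unused prctl() arguments must be zero; pass them as unsigned long. */
        if (prctl(PR_CAP_AMBIENT, (unsigned long) PR_CAP_AMBIENT_CLEAR_ALL, 0UL, 0UL, 0UL) < 0)
                return -errno;
        return 0;
}

/* Hypothetical helper: 1 if `cap` is in the ambient set, 0 if not,
 * negative errno on failure. */
static int ambient_capability_is_set(int cap) {
        assert(cap >= 0);
        int r = prctl(PR_CAP_AMBIENT, (unsigned long) PR_CAP_AMBIENT_IS_SET,
                      (unsigned long) cap, 0UL, 0UL);
        return r < 0 ? -errno : r;
}
```

Clearing the ambient set needs no privileges, so this works for an unprivileged user manager as well as for PID 1.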
Mike Yuan
210ca71cb5 core/execute: clean up log_exec_full_errno and friends
Also drop unused log_exec_struct_iovec().
2024-02-19 23:12:59 +08:00
Mike Yuan
80b18d217a core/exec-invoke: record correct exit status when failed to locate executable
Follow-up for 4d8b0f0f7a

After the mentioned commit, when the ExecCommand executable is missing,
and the failure will be ignored by the manager, we exit with EXIT_SUCCESS on
the executor side too. The behavior however contradicts systemd.service(5), which states:

> If the executable path is prefixed with "-", an exit code of the command
> normally considered a failure (i.e. non-zero exit status or abnormal exit
> due to signal) is _recorded_, but has no further effect and is considered
> equivalent to success.

and thus makes debugging unexpected failures harder. Therefore, let's still
exit with EXIT_EXEC, but just skip LOG_ERR level log.
2024-02-19 23:12:59 +08:00
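The distinction the commit restores is between the exit status that gets *recorded* and whether it *counts* as a failure. The sketch below is illustrative only. ExecResult and evaluate_exit() are hypothetical names, not the manager's code. EXIT_EXEC is 203, the value systemd documents for a failed execve().

```c
#include <assert.h>
#include <stdbool.h>

#define EXIT_EXEC 203 /* systemd's "execve() failed" exit code */

typedef struct {
        int recorded_status;    /* what the unit's status will report */
        bool counts_as_failure; /* whether the unit is considered failed */
} ExecResult;

/* Hypothetical: a "-" prefix (ignore_failure) still records the real
 * status, but makes it equivalent to success. */
static ExecResult evaluate_exit(int status, bool ignore_failure) {
        assert(status >= 0 && status <= 255);

        ExecResult r = { .recorded_status = status };
        r.counts_as_failure = status != 0 && !ignore_failure;
        return r;
}
```

Exiting with EXIT_SUCCESS on the executor side would have made recorded_status 0, hiding the missing executable from anyone debugging the unit.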
Mike Yuan
ba8245a77a core/executor: do destruct static variables and selinux before exiting
I was wondering why I couldn't trigger the assertion in safe_fclose()
when submitting #30251. It turned out that the static destructor was
not run at all :/

Replace main() with a minimized version of main-func.h. This also
prevents emitting negative exit codes.
2023-12-10 14:13:35 +09:00
Mike Yuan
b041175e08 core/executor: save argv for later use by rename_process()
Partially fixes #30352
2023-12-08 21:49:27 +08:00
Mike Yuan
f38cbaff63 core/exec-invoke: remove redundant fd_cloexec() call
2023-12-01 00:14:37 +08:00
Mike Yuan
79bad078bb core/executor: avoid double closing serialization fd
Before this commit, between fdopen() (in parse_argv()) and fdset_remove(),
the serialization fd is owned by both arg_serialization FILE stream and fdset.
Therefore, if something wrong happens between the two calls, or if --deserialize=
is specified more than once, we end up closing the serialization fd twice.
Normally this doesn't matter much, but I still think it's better to fix this.

Hence, let's call fdset_new_fill() after parsing the serialization fd.
We set the fd to CLOEXEC in parse_argv(), so it will be filtered out
when the fdset is created.

While at it, also move fdset_new_fill() under the second log_open(), so
that we always log to the log target specified in arguments.
2023-11-30 09:56:59 +00:00
Daan De Meyer
5c314412f0 core: Always call log_open() in systemd-executor
log_setup() will open the console in systemd-executor because it's
not pid 1 and it's not connected to the journal. So if the log target
is later changed to kmsg, we have to reopen the log.

But since log_open() won't open the same log twice, let's just call it
unconditionally: it will be a no-op if we try to reopen the same log.

This makes sure that systemd-executor will log to the log target passed
via --log-target= after parsing arguments.
2023-11-29 22:56:50 +00:00
Luca Boccassi
894288340f executor: lazily load SELinux
Loading the SELinux DB on every invocation can be slow and
takes 2ms-10ms, so do not initialize it unconditionally, but
wait for the first use. On a mkosi Fedora rawhide image, this
cuts the number of loads in half.
2023-11-11 12:33:19 +00:00
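Lazy loading means paying the 2ms-10ms cost only on first use, and only in invocations that actually need SELinux. The pattern is sketched below. ensure_selinux_loaded() and load_selinux_database() are hypothetical stand-ins; the real load would go through libselinux.

```c
#include <assert.h>
#include <stdbool.h>

static bool selinux_loaded = false;
static int selinux_load_count = 0; /* for illustration: number of real loads */

/* Stand-in for the expensive database load. */
static int load_selinux_database(void) {
        assert(!selinux_loaded); /* must never load twice */
        selinux_load_count++;
        return 0;
}

/* Call this right before any SELinux-dependent operation. */
static int ensure_selinux_loaded(void) {
        if (selinux_loaded)
                return 0;

        int r = load_selinux_database();
        if (r < 0)
                return r;       /* stay unloaded so a later call can retry */

        selinux_loaded = true;
        return 0;
}
```

An invocation that never touches SELinux never calls ensure_selinux_loaded(), which is where the savings come from.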
Luca Boccassi
e34435857e core: call mac_init from sd-executor
Before the split the SELinux database was inherited via CoW. Since
the split we need to reopen it.

Follow-up for bb5232b6a3
2023-11-08 17:44:36 +00:00
Luca Boccassi
fba173ff6a core: rename and add comment to ExecParameters cleanup functions
2023-11-01 12:43:22 +09:00
Lennart Poettering
7113640493 fd-util: rename PIPE_EBADF → EBADF_PAIR, and add EBADF_TRIPLET
We use it for more than just pipe() arrays. For example also for
socketpair(). Hence let's give it a generic name.

Also add EBADF_TRIPLET to mirror this for things like
stdin/stdout/stderr arrays, which we use a bunch of times.
2023-10-26 22:30:42 +02:00
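The initializers fill an fd array with -EBADF, so "not opened yet" can never be mistaken for a valid descriptor. This holds whether the array is used for pipe(), socketpair() or stdio triplets. The definitions below are a sketch modelled on the names in the commit, not copied from fd-util.h. close_many_and_reset() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Sketch: arrays initialized to "no fd" */
#define EBADF_PAIR { -EBADF, -EBADF }
#define EBADF_TRIPLET { -EBADF, -EBADF, -EBADF }

/* Hypothetical helper: close every valid fd and reset it to -EBADF, so
 * calling it twice is harmless. */
static void close_many_and_reset(int *fds, size_t n) {
        assert(fds || n == 0);

        for (size_t i = 0; i < n; i++)
                if (fds[i] >= 0) {
                        close(fds[i]);
                        fds[i] = -EBADF;
                }
}
```

Typical use: declare `int pair[2] = EBADF_PAIR;`, fill it with pipe() or socketpair(), and clean up unconditionally on every exit path.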
Luca Boccassi
75689fb2d4 core: move code from execute.c to exec-invoke.c
No functional changes, only moving code that is only needed in
exec_invoke, and adding new dependencies for seccomp/selinux/apparmor/pam
in meson for the sd-executor binary.
2023-10-12 15:01:51 +01:00
Luca Boccassi
bb5232b6a3 core: add systemd-executor binary
Currently we spawn services by forking a child process, doing a bunch
of work, and then exec'ing the service executable.

There are some advantages to this approach:

- quick: we immediately have access to the enormous amount of
  state simply by virtue of sharing the memory with the parent
- easy to refactor and add features
- part of the same binary, will never be out of sync

There are however significant drawbacks:

- doing work after fork and before exec is against glibc's supported
  case for several APIs we call
- copy-on-write trap: any time any memory is touched in either parent
  or child, a copy of that page is triggered
- memory footprint of the child process will be memory footprint of
  PID1, but using the cgroup memory limits of the unit

The last issue is especially problematic on resource constrained
systems where hard memory caps are enforced and swap is not allowed.
As soon as PID1 is under load, with no page out due to no swap, and a
service with a low MemoryMax= tries to start, hilarity ensues.

Add a new systemd-executor binary, that is able to receive all the
required state via memfd, deserialize it, prepare the appropriate
data structures and call exec_child.

Use posix_spawn(), which uses CLONE_VM + CLONE_VFORK, to ensure there is
no copy-on-write (the same address space will be used, and the parent
process will be frozen until exec).
The sd-executor binary is pinned by FD on startup, so that we can
guarantee there will be no incompatibilities during upgrades.
2023-10-12 15:01:51 +01:00
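A minimal sketch of spawning a helper binary with posix_spawn(3) and waiting for it follows. spawn_and_wait() is a hypothetical helper. It does not pass state via memfd or pin the binary by fd as sd-executor does. With glibc 2.24 and later, posix_spawn() is built on clone() with CLONE_VM | CLONE_VFORK, and exec failures such as a missing binary are reported back to the caller as an error number.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawn `path` with `argv`, wait for it, and return
 * its exit status (128 + signal number if it was killed, shell-style),
 * or a negative errno value if spawning/waiting failed. */
static int spawn_and_wait(const char *path, char *const argv[]) {
        pid_t pid;
        int status;

        assert(path && argv);

        /* posix_spawn() returns an error number directly; it does not set errno. */
        int r = posix_spawn(&pid, path, NULL, NULL, argv, environ);
        if (r != 0)
                return -r;

        while (waitpid(pid, &status, 0) < 0)
                if (errno != EINTR)
                        return -errno;

        if (WIFEXITED(status))
                return WEXITSTATUS(status);
        return 128 + WTERMSIG(status);
}
```

Because the parent is suspended until the child execs, no pages are duplicated. The child's memory footprint is that of the spawned binary, not of PID 1.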