Follow-up for 4d8b0f0f7a
After the mentioned commit, when the ExecCommand executable is missing,
and failure will be ignored by manager, we exit with EXIT_SUCCESS at executor
side too. The behavior however contradicts systemd.service(5), which states:
> If the executable path is prefixed with "-", an exit code of the command
> normally considered a failure (i.e. non-zero exit status or abnormal exit
> due to signal is _recorded_, but has no further effect and is considered
> equivalent to success.
and thus makes debugging unexpected failures harder. Therefore, let's still
exit with EXIT_EXEC, but just skip LOG_ERR level log.
This is useful for situations where an array of FDs is to be passed into
a child process (i.e. by passing it through safe_fork). This function
can be called in the child (before calling exec) to pack the FDs to all
be next to each-other starting from SD_LISTEN_FDS_START (i.e. 3)
Inspired by CVE-2024-21626, let's add a longer comment explaining why
the code really shouldn#t be moved any earlier.
Just in the hope that anyone who feels tempted to move this around maybe
actually reads the comment and reconsiders.
The man page pam_setcred(3) states:
> The credentials should be deleted after the session has been closed
> (with pam_close_session(3)).
Follow-up for 3bb39ea936.
This file is a bit misnamed. What it actually implements is one specific
BPF LSM module, that restricts file systems. As such it really should be
named after that, and not primarily by the mechanism it uses for that.
With this our glue code is now named the same way as the actual bpf code
files in src/core/bpf/, thus things become a bit more symmetric.
This is particular relevant as we'll soon have another BPF LSM in our
tree, see #26826, and we should be able to distinguish them by name.
This commit just renames the files and does some dumb search/replace of
the string. A follow-up commit will name some functions more expressively
inside the files.
If PAMName= is used we'll spawn a PAM session for the service, and leave
a process around that closes the PAM session eventually. That process
must close the "exec_fd" that we use to implement Type=exec. After all
the logic relies on the fact that execve() will implicitly close the
exec_fd, and the EOF seen on it is hence indication for the service
manager that execve() has worked. But if we keep an fd open in the PAM
service process, then this is not going to work.
Hence close the fd explicitly so that it definitely doesn't stay pinned
in the child.
This geneally makes sense as setting up a PAM session pretty much
defines what a login session is.
In context of #30547 this has the benefit that we can take benefit of
the SetLoginEnvironment= effect without having to set it explicitly,
thus retaining some compat of the uid0 client towards older systemd
service managers.
Just use ExecParam directly, as these are all internal to sd-exec now
anyway. Avoids double close when execution fails after FDs are set up
for inheritance and were already re-arranged.
Fixes https://github.com/systemd/systemd/issues/30412
Follow-up for 5b6319dcee (gosh this is
ancient), and effectively reverts 3dead8d925.
sigwait() is documented to "suspend execution of the calling thread
until one of the signals specified in the signal set becomes pending".
And the only error it returns is EINVAL, when "set contains an invalid
signal number". Therefore, there's no need to run it in a loop or
to check for runtime error.
SELinux logs before we have a chance to apply it, move it up as it
breaks TEST-04-JOURNAL:
[ 408.578624] testsuite-04.sh[11463]: ++ journalctl -b -q -u silent-success.service
[ 408.578743] testsuite-04.sh[11098]: + [[ -z Dec 03 13:38:41 H systemd-executor[11459]: SELinux enabled state cached to: disabled ]]
Follow-up for: bb5232b6a3
If exec_fd is closed in add_shifted_fd() by close_and_replace(),
but something goes wrong later, we may close exec_fd twice
in exec_params_shallow_clear().
This combines exec_context_determine_tty_size() and
terminal_set_size_fd() since we always use one after the other.
Also make exec_context_determine_tty_size() return void, since it cannot
fail.
Until now, using any form of seccomp while being unprivileged (User=)
resulted in systemd enabling no_new_privs.
There's no need for doing this because:
* We trust the filters we apply
* If User= is set and a process wants to apply a new seccomp filter, it
will need to set no_new_privs itself
An example of application that might want seccomp + !no_new_privs is a
program that wants to run as an unprivileged user but uses file
capabilities to start a web server on a privileged port while
benefitting from a restrictive seccomp profile.
We now keep the privileges needed to do seccomp before calling
enforce_user() and drop them after the seccomp filters are applied.
If the syscall filter doesn't allow the needed syscalls to drop the
privileges, we keep the previous behavior by enabling no_new_privs.
Sometimes it makes sense to hard kill a client if we die. Let's hence
add a third FORK_DEATHSIG flag for this purpose: FORK_DEATHSIG_SIGKILL.
To make things less confusing this also renames FORK_DEATHSIG to
FORK_DEATHSIG_SIGTERM to make clear it sends SIGTERM. We already had
FORK_DEATHSIG_SIGINT, hence this makes things nicely symmetric.
A bunch of users are switched over for FORK_DEATHSIG_SIGKILL where we
know it's safe to abort things abruptly. This should make some kernel
cases more robust, since we cannot get confused by signal masks or such.
While we are at it, also fix a bunch of bugs where we didn't take
FORK_DEATHSIG_SIGINT into account in safe_fork()
When a late error occurs in sd-executor, the cleanup-on-close of the
context structs happen, but at that time all FDs might have already
been closed via close_all_fds(), so a double-close happens. This
can be seen when DynamicUser is enabled, with a non-existing
WorkingDirectory.
Invalidate the FDs in the context structs if close_all_fds succeeds.
We use it for more than just pipe() arrays. For example also for
socketpair(). Hence let's give it a generic name.
Also add EBADF_TRIPLET to mirror this for things like
stdin/stdout/stderr arrays, which we use a bunch of times.
This is preparation for #28891, which adds a bunch more helpers around
"struct iovec", at which point this really deserves its own .c/.h file.
The idea is that we sooner or later can consider "struct iovec" as an
entirely generic mechanism to reference some binary blob, and is the
go-to type for this purpose whenever we need one.
Before the split, it made sense to assert, as checks were on setup.
But now these come from deserialization, and the fuzzer hits the
asserts, so simply return an error instead.
No functional changes, only moving code that is only needed in
exec_invoke, and adding new dependencies for seccomp/selinux/apparmor/pam
in meson for the sd-executor binary.