This tries to get rid of most manual sigprocmask() changes, in favour
of:
1. The SD_EVENT_SIGNAL_PROCMASK flag to sd_event_add_signal()
2. The sd_event_set_signal_exit() call for handling SIGTERM/SIGINT
3. Move masking of SIGWINCH into ptyfwd, out of nspawn/vmspawn/run
And while we are at it get rid of a bunch of event source fields whose
lifetime is bound to the sd_event object they belong to anyway, and make
use of the "floating" event source feature of sd-event instead.
We generally use utmpx instead of utmp (both are actually identical on
Linux, but utmpx is POSIX, while utmp is not). Let's fix one left-over
case.
UT_NAMESIZE does not exist in utmpx world, it has no direct counterpart,
hence let's just sizeof_field() to determine the size of the actual
field. (which comes to the same result of course: 32).
We typically want to deal in usec_t, hence let's change the prototype
accordingly, and do proper range checks. Also, make sure are not
confused by negative times.
Do something similar for mktime_or_timegm().
This is a more comprehensive alternative to #34065
Replaces: #34065
These operations might require slow I/O, and thus might block PID1's main
loop for an undeterminated amount of time. Instead of performing them
inline, fork a worker process and stash away the D-Bus message, and reply
once we get a SIGCHILD indicating they have completed. That way we don't
break compatibility and callers can continue to rely on the fact that when
they get the method reply the operation either succeeded or failed.
To keep backward compatibility, unlike reload control processes, these
are ran inside init.scope and not the target cgroup. Unlike ExecReload,
this is under our control and is not defined by the unit. This is necessary
because previously the operation also wasn't ran from the target cgroup,
so suddenly forking a copy-on-write copy of pid1 into the target cgroup
will make memory usage spike, and if there is a MemoryMax= or MemoryHigh=
set and the cgroup is already close to the limit, it will cause an OOM
kill, where previously it would have worked fine.
Follow-up for 7ac58157ca
With the mentioned commit, iff E2BIG we'd retry pidfd_spawn()
with POSIX_SPAWN_SETCGROUP disabled. However, the same strategy
should actually apply to EOPNOTSUPP/ENOSYS/EPERM too -
they can mean two things here: no clone3() or no CLONE_PIDFD.
Therefore, let's first try clone() + CLONE_PIDFD, and fall further back
to plain clone() (posix_spawn()) only as last resort. Plus, record
the fact so that we don't unnecessarily retry every single time
if CLONE_PIDFD is the one that's unavailable.
In some kernels (specifically, 5.4) even though the clone3 syscall is
supported, setting CLONE_INTO_CGROUP is not. The error message returned
in this case is E2BIG.
If posix_spawn_wrapper encounters this error, it does not retry, and
cannot spawn any programs in said kernels.
This commit adds a check for the E2BIG error and retries pidfd_spawn()
without the POSIX_SPAWN_SETCGROUP flag.
If we encounter an E2BIG error, and the pidfd_spawn() succeeds after
removing the POSIX_SPAWN_SETCGROUP flag, then we cache the result so
that we do not retry every time.
Originally, this issue was reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1077204.
Signed-off-by: Kornilios Kourtis <kornilios@gmail.com>
let's instead generate ENOTTY on our own. This is more correct with out
coding style (since we generally do not propagate errors via errno), and
also addresses #34039 as side effect. (#34039 really needs to be fixed
in musl though, too, this is just a work-around as side-effect).
Fixes: #34039
glibc returs EIO on ttys that are hung up. That's not really correct,
POSIX seems to disagree.
Work around this in our code, and turn this into a clean "1", since a
hung up tty doesn't stop being a tty just because it is hung up.
Background: https://github.com/systemd/systemd/pull/34039
The kernel parses FRA_SUPPRESS_PREFIXLEN as uint32_t, but internally
handled as signed integer and negative values as unset. Let's explicitly
specify the size of the variable.
No functional change, just refactoring.
Before this commit, the "Cannot raise nice level" branch
is rather confusing, as we're actually lowering the nice.
Also, it's better to log about the final nice value
for both cases, no matter whether we need to set to limit
or not.
Follow-up for 6d2984d21b
The current semantics of "filtered" in unit_is_filtered()
are actually the contrary of ListUnitsFiltered(). Let's
make things consistent, i.e. return true when the unit
shall be included.
When running unprivileged, checking /proc/1/root doesn't work because
it requires privileges. Instead, let's add an environment variable so
the process that chroot's can tell (systemd) subprocesses whether
they're running in a chroot or not.