discover-image.[ch] largely already supports per-scope operations, let's
extend this however to also cover finding .nspawn settings files and
managing the pool dir.
We intend to make self-registering machines an unprivileged operation,
but currently that would allow an unprivileged user to register a
process they own in the root namespace, and then login as any
user they like, including root, which is not ideal.
Forbid non-root from shelling into a machine that is running in
the root user namespace.
Current methods take a numeric PID, but we know that is unreliable for
the usual reasons. Add variants that take a PIDFD instead, or a
PID + PIDFDID combination for remote users.
UID entry in the machine state file is introduced in v258,
hence when a host is upgraded to v258, the field does not exist in the
file, thus the variable 'uid' is NULL.
Follow-up for 276d200186.
Fixes#39061.
Registering a process as a machine means a caller can get machined
to send sigterm to it, and more. If an unpriv user is registering,
ensure the registered process is actually owned by the user.
Follow-up for adaff8eb35
Currently, pty_forward_set_ignore_vhangup() is only used for disabling
the flag. To make the function also disable PTY_FORWARD_IGNORE_INITIAL_VHANGUP
flag, this renames it to pty_forward_honor_vhangup().
Also, for consistency, pty_forward_get_ignore_vhangup() and
ignore_vhangup() are replaced with pty_forward_vhangup_honored().
So far, machined strictly tracked the "leader" process of a machine,
i.e. the topmost process that is actually the payload of the machine.
Its runtime also defines the runtime of the machine, and we can directly
interact with it if we need to, for example for containers to join the
namespaces, or kill it.
Let's optionally also track the "supervisor" process of a machine, i.e.
the host process that manages the payload if there is one. This is
generally useful info, but in particular is useful because we might need
to communicate with it to shutdown a machine without cooperation of the
payload. Traditionally we did this by simply stopping the unit of the
machine, but this is not doable now that the host machined can be used
to track per-user machines.
In the long run we probably want a more bespoke protocol between
machined and supervisors (so that we can execute other commands too,
such as request cooperative reboots/shutdowns), but that's for later.
Some environments call the concept "monitor" rather than "supervisor" or
use some other term. I stuck to "supervisor" because nspawn uses this,
and ultimately one name is as good as another.
And of course, in other implementations of VM managers of containers
there might not be a single process tracking each VM/container. Because
of this, the concept of a supervisor is optional.
The difference between these two operations are large: one is relatively
superficial: for "registration" all resources remain associated with the
invoking user, only the cgroup is reported to machined which then keeps
track of the machine, too. OTOH "creation" a scope is allocated in
system context, hence the invoked code will be owned by the system, and
its resource usage charged against the system.
Hence, use two distinct polkit actions for this, so that we can relax
access to registration, but keep access to creation tough.
Now that unpriv clients can register machines, let's register their UID
too. This allows us to do two things:
1. make sure the scope delegation is assigned to the right UID (so that
the unpriv user can actually create cgroups below the delegated
scope)
2. permit certain types of access (i.e. killing, or pty access) to the
client without auth if it owns the machine.
All other status output lines use tabs, use that for the ID shift line
too. otherwise output will appear unaligned if log viewers have fixed
tab stop positions.
We use symbols provided by unistd.h without including it. E.g.
open(), close(), read(), write(), access(), symlink(), unlink(), rmdir(),
fsync(), syncfs(), lseek(), ftruncate(), fchown(), dup2(), pipe2(),
getuid(), getgid(), gettid(), getppid(), pipe2(), execv(), _exit(),
environ, STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO, F_OK, and their
friends and variants, so on.
Currently, unistd.h is indirectly included mainly in the following two paths:
- through missing_syscall.h, which is planned to covert to .c file.
- through signal.h -> bits/sigstksz.h, which is new since glibc-2.34.
Note, signal.h is included by sd-eevent.h. So, many source files
indirectly include unistd.h if newer glibc is used.
Currently, our baseline on glibc is 2.31. We need to support glibc older
than 2.34, but unfortunately, we do not have any CI environments with
such old glibc. CIFuzz uses glibc-2.31, but it builds only fuzzers, and
many files are even not compiled.
These method calls all already have polkit hookup, hence actually allow
them to go through on all levels.
This is mostly playing catchup with a variety of calls added over the
years.
The method call already does a PK check, it was just forgotten to
allowlist this in the dbus policy. And in the dbus vtable for
OpenMachinePTY() call. (It was allowlisted in the per-machine
vtable…)
Anyway, clean this up.
git grep -l 'Failed to open /'|xargs sed -r -i 's|"Failed to open (/[^ ]+): %m"|"Failed to open %s: %m", "\1"|g'
git grep -l $'Failed to open \'/'|xargs sed -r -i $'s|"Failed to open \'(/[^ ]+)\': %m"|"Failed to open %s: %m", "\\1"|g'
git grep -l "Failed to open /"|xargs sed -r -i $'s|"Failed to open (/[^ ]+), ignoring: %m"|"Failed to open %s, ignoring: %m", "\\1"|g'
+ some manual fixups.
Let's check the leader alive state, and let's log about dbus errors.
This mimics (but is not quite identical to) what we do these days in
logind for GC'ing user sessions.
This effectively renames cg_is_empty_recursive() to cg_is_empty().
Note that all existing code calls the former and not the latter,
hence with cgv1 support being dropped it's trivial to consult
cgroup.events directly for populated state everywhere.
Additionally, use more generic cg_get_keyed_attribute() helper
rather than cg_read_event().