It's time. sd-json was already done earlier in this cycle, let's now
make sd-varlink public too.
This is mostly just a search/replace job of epical proportions.
I left some functions internal (mostly IDL handling), and I turned some
static inline calls into regular calls.
This is preparation for making our Varlink API a public API. Since our
Varlink API is built on top of our JSON API we need to make that public
first (it's a nice API, but JSON APIs there are already enough, this is
purely about the Varlink angle).
I made most of the json.h APIs public, and just placed them in
sd-json.h. Sometimes I wasn't so sure however, since the underlying data
structures would have to be made public too. If in doubt I didn#t risk
it, and moved the relevant API to src/libsystemd/sd-json/json-util.h
instead (without any sd_* symbol prefixes).
This is mostly a giant search/replace patch.
libbpf returns error codes from the kernel unmodified, and we don't understand
them so non-fatal ones are handled as hard errors.
Add a translation helper, and start by translating 524 to EOPNOTSUPP, which is
returned when nsresourced tries to use LSM BPF hooks that are not
implemented on a given arch (in this case, arm64 is misssing trampolines).
Fixes https://github.com/systemd/systemd/issues/32170
bpf_rdonly_cast() was introduced in libbpf commit 688879f together with
the definition of a bpf_core_cast macro. So use that one to avoid
defining a prototype for bpf_rdonly_cast;
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.
The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.
There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.
Since the UID assignments are supposed to be transitive, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistancy cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowelisted.
BPF LSM program with contributions from Alexei Starovoitov.