This moves around the UserDBMatch handling, moves it out of userdbctl
and into generic userdb code, so that it can be passed to the server
side, to allow server side filtering.
This is preparation for one day allowing complex software to do such
filtering server side, and thus reducing the necessary traffic.
Right now no server side actually knows this, hence care is taken to
downgrade to the userdb varlink API as it was in v257 in case the new
options are not understood. This retains compatibility with any
implementation hence.
In containers securityfs is typically not mounted. Our lsm-bpf code
so far detected this situation and claimed the kernel was lacking
lsm-bpf support. Which isn't quite true though, it might very well
support it. This made boots of systemd in systemd-nspawn a bit ugly,
because of the misleading log message at boot.
Let's improve things, and make clearer what is going on.
Let's optionally mangle any passed name on the server side so that it is
useful for identifying a userns, if it isn't suitable for that
right-away. This mostly means truncating it if too long.
It's just too nasty to leave this to the client side, since they'd have
to understand the precise rules for naming userns then.
While we are at it, add full Varlink IDL comments.
…ith fd passing enabled
Let's add a simple flag that enables fd passing for all connections of a
server. It's much easier to use this than to install a connect handler
which manually enables this for each connection.
- Make fd_is_namespace() take NamespaceType
- Drop support for kernel without NS_GET_NSTYPE (< 4.11)
- Port is_our_namespace() to namespace_open_by_type()
(preparation for later commits, where the latter
would go by pidfd if available, avoiding procfs)
The values that were previously hardcoded in sd-varlink.c are now defined
in new varlink_set_info_systemd() and that function is called everywhere
where we create a server.
This is the same as json_dispatch_user_group_name() but fills in the
string as "const char*" to the JSON field. Or in other words, it's what
sd_json_dispatch_const_string() is to sd_json_dispatch_string().
Note this drops the SD_JSON_STRICT flags from various dispatch tables
for these fields, and replaces this by SD_JSON_RELAX, i.e. the opposite
behaviour. As #34558 correctly suggests we should validate user names
in lookup functions using the lax rules, rather than the strict ones,
since clients not knowing the rules might ask us for arbitrary
resolution.
(SD_JSON_RELAX internally translates to valid_user_group_name() with the
VALID_USER_RELAX flag).
See: #34558
```
In file included from ../src/nsresourced/nsresourced-manager.c:9:
../src/shared/bpf-link.h:5:10: fatal error: bpf/libbpf.h: No such file or directory
5 | #include <bpf/libbpf.h>
| ^~~~~~~~~~~~~~
```
Follow-up for 46718d344f
It's time. sd-json was already done earlier in this cycle, let's now
make sd-varlink public too.
This is mostly just a search/replace job of epical proportions.
I left some functions internal (mostly IDL handling), and I turned some
static inline calls into regular calls.
This is preparation for making our Varlink API a public API. Since our
Varlink API is built on top of our JSON API we need to make that public
first (it's a nice API, but JSON APIs there are already enough, this is
purely about the Varlink angle).
I made most of the json.h APIs public, and just placed them in
sd-json.h. Sometimes I wasn't so sure however, since the underlying data
structures would have to be made public too. If in doubt I didn#t risk
it, and moved the relevant API to src/libsystemd/sd-json/json-util.h
instead (without any sd_* symbol prefixes).
This is mostly a giant search/replace patch.
libbpf returns error codes from the kernel unmodified, and we don't understand
them so non-fatal ones are handled as hard errors.
Add a translation helper, and start by translating 524 to EOPNOTSUPP, which is
returned when nsresourced tries to use LSM BPF hooks that are not
implemented on a given arch (in this case, arm64 is misssing trampolines).
Fixes https://github.com/systemd/systemd/issues/32170
bpf_rdonly_cast() was introduced in libbpf commit 688879f together with
the definition of a bpf_core_cast macro. So use that one to avoid
defining a prototype for bpf_rdonly_cast;
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.
The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.
There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.
Since the UID assignments are supposed to be transitive, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistancy cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowelisted.
BPF LSM program with contributions from Alexei Starovoitov.