core: Add PrivateUsers=full

Recently, PrivateUsers=identity was added to support mapping the first
65536 UIDs/GIDs from parent to the child namespace and mapping the other
UIDs/GIDs to the nobody user.

However, there are use cases where users have UIDs/GIDs above 65535 and
need a similar identity mapping. Moreover, in some of those cases, users
want a full identity mapping from 0 -> UID_MAX.

Note that, to differentiate ourselves from the init user namespace, we
need to set up the uid_map/gid_map like:
```
0 0 1
1 1 UINT32_MAX - 1
```

as the init user namespace uses `0 0 UINT32_MAX` and some applications -
like systemd itself - determine whether they are in a non-init user
namespace based on the uid_map/gid_map files. Note that systemd will remove
this heuristic from running_in_userns() in version 258 and use the namespace
inode instead. However, some users may be running a container image with an
older systemd (< 258), so we keep this hack until version 259.
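
For illustration only, the heuristic described above can be sketched roughly
as follows. This is a simplified Python model, not systemd's actual
running_in_userns() code; the helper names are hypothetical and the map
contents are passed in as strings rather than read from /proc/self/uid_map.

```python
UINT32_MAX = 2**32 - 1


def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        inside, outside, count = (int(field) for field in fields)
        entries.append((inside, outside, count))
    return entries


def looks_like_init_userns(text):
    """Hypothetical heuristic: only the single entry `0 0 UINT32_MAX` counts as init."""
    return parse_id_map(text) == [(0, 0, UINT32_MAX)]


# Map as used by the init user namespace (one entry covering everything).
INIT_MAP = f"0 0 {UINT32_MAX}\n"

# Two-entry map from this commit: same coverage, but distinguishable.
FULL_MAP = f"0 0 1\n1 1 {UINT32_MAX - 1}\n"
```

Under this model, both maps cover the same number of IDs, but only the
single-entry form is mistaken for the init user namespace.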

To support this, we add PrivateUsers=full that does identity mapping for
all available UID/GIDs.
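
As a rough sketch of the difference between the two modes, the following
Python model translates a host ID into the ID seen inside the namespace.
The function name is hypothetical, and 65534 is assumed as the overflow
("nobody") ID, which is the kernel default but is configurable.

```python
UINT32_MAX = 2**32 - 1
OVERFLOW_ID = 65534  # assumed overflow ("nobody") ID; kernel default

# (inside, outside, count) entries, in uid_map/gid_map order.
IDENTITY_MAP = [(0, 0, 65536)]                   # PrivateUsers=identity
FULL_MAP = [(0, 0, 1), (1, 1, UINT32_MAX - 1)]   # PrivateUsers=full


def id_seen_inside(host_id, id_map):
    """Return the ID that a process inside the namespace sees for host_id."""
    for inside, outside, count in id_map:
        if outside <= host_id < outside + count:
            return inside + (host_id - outside)
    return OVERFLOW_ID  # unmapped IDs show up as the overflow ID


print(id_seen_inside(100000, IDENTITY_MAP))  # unmapped under identity
print(id_seen_inside(100000, FULL_MAP))      # mapped to itself under full
```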

Fixes: #35168
Ryan Wilson
2024-11-15 06:56:05 -08:00
parent 3cf362f6f5
commit 705cc82938
5 changed files with 38 additions and 2 deletions


@@ -2009,8 +2009,8 @@ BindReadOnlyPaths=/var/lib/systemd</programlisting>
<varlistentry>
<term><varname>PrivateUsers=</varname></term>
-<listitem><para>Takes a boolean argument or one of <literal>self</literal> or
-<literal>identity</literal>. Defaults to false. If enabled, sets up a new user namespace for the
+<listitem><para>Takes a boolean argument or one of <literal>self</literal>, <literal>identity</literal>,
+or <literal>full</literal>. Defaults to false. If enabled, sets up a new user namespace for the
executed processes and configures a user and group mapping. If set to a true value or
<literal>self</literal>, a minimal user and group mapping is configured that maps the
<literal>root</literal> user and group as well as the unit's own user and group to themselves and
@@ -2026,6 +2026,10 @@ BindReadOnlyPaths=/var/lib/systemd</programlisting>
since all UIDs/GIDs are chosen identically it does provide process capability isolation, and hence is
often a good choice if proper user namespacing with distinct UID maps is not appropriate.</para>
+<para>If the parameter is <literal>full</literal>, user namespacing is set up with an identity
+mapping for all UIDs/GIDs. Similar to <literal>identity</literal>, this does not provide UID/GID
+isolation, but it does provide process capability isolation.</para>
<para>If this mode is enabled, all unit processes are run without privileges in the host user
namespace (regardless if the unit's own user/group is <literal>root</literal> or not). Specifically
this means that the process will have zero process capabilities on the host's user namespace, but