79054 Commits

Author SHA1 Message Date
Lennart Poettering
65664bba40 man: document new nspawn functionality around unpriv support 2025-01-23 21:48:02 +01:00
Lennart Poettering
46b7e96783 nspawn: add support for 'managed' userns mode even when we run privileged
So far, we supported two modes:

1. when running unpriv we'd get the mounts from mountfsd, and the userns
   from nsresourced
2. when running priv we'd do the mounts/userns ourselves

This untangles this a bit, so that we can also use mountfsd/nsresourced
when running privileged.

I think this is generally a bit nicer, and probably something we should
switch to entirely one day, as it reduces the variety of codepaths.

With this patch the default behaviour remains unchanged, but by
selecting the new "managed" option for --private-users= the codepaths
via mountfsd/nsresourced can be explicitly requested even when running
with privs.

This mostly just reworks a number of codepaths to check for
arg_userns_mode != USER_NAMESPACE_MANAGED rather than arg_privileged,
but requires more fixes, too. The devil is in the details.
2025-01-23 21:48:02 +01:00
Lennart Poettering
ca23deae09 nspawn: support foreign mappings also when nspawn is doing the mapping itself
This adds a new "foreign" value to --private-users-ownership= which is a
lot like "map", but maps from the host's foreign UID range rather than from the
host's 0.

(This has nothing much to do with making unprivileged directory-based
containers work, it's just very handy that we can run privileged
containers with such a mapping too, with an easy switch)
2025-01-23 21:48:02 +01:00
Lennart Poettering
88252ca889 nspawn: allow to run unpriv from dir
This simply calls into mountfsd to acquire the root mount and uses it as
root for the container.

Note that this also makes one more change: previously we ran containers
directly off their backing directory. Except when we didn't, and there
were a variety of exceptions: if we had no privs, if we ran off a disk
image, if the directory was the host's root dir, and some others.

This simplifies the logic a bit: we now simply always create a temporary
directory in /tmp/ and bind mount everything there, in all code paths.
After all, in order to control propagation we need to turn the root into
a mount point anyway, hence we might just as well do it in one place for
all cases.
2025-01-23 21:48:02 +01:00
Lennart Poettering
e57f99305e dissect-image: add client side API wrapper for MountDirectory() varlink call
This is simply a Varlink API client that takes a directory path and
userns fd and returns a mount fd.
2025-01-23 21:48:02 +01:00
Lennart Poettering
d6f8e1ae87 mntfsd: add api to mount dirs for containers
systemd-mountfsd so far provided a MountImage() API call for mounting a
disk image and returning a set of mount fds. This complements the API
with a new MountDirectory() API call, that operates on a directory
instead of an image file. Now, what makes this interesting is that it
applies an idmapping from the foreign UID range to the provided target
userns – in which case unprivileged operation is allowed (well,
under some conditions: in particular the client must own a parent dir of
the provided path).

This allows container managers to run fully unprivileged from
directories – as long as those directories are owned by the foreign UID
range. Basic operation is like this:

1. acquire a transient userns from systemd-nsresourced with 64K users
2. ask systemd-mountfsd for an idmapped mount of the container dir
   matching that userns
3. join the userns and bind the mount fd as root.

Note that we have to drop various sandboxing knobs from the mountfsd
service file for this to work, since the kernel's security checks that
try to ensure that an obstructed /proc/ cannot be circumvented via
mounting a new procfs will otherwise prohibit mountfsd from duplicating
the mounts properly.
2025-01-23 21:48:02 +01:00
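The idmapping mentioned in step 2 can be pictured as plain range arithmetic. The following Python sketch is illustrative only and is not mountfsd code: the function name, the example userns base, and the FOREIGN_UID_BASE value (assumed here to be 2147352576, the base of the foreign UID range as I understand the systemd UID/GID documentation) are assumptions, and the real translation is performed by the kernel on the idmapped mount.

```python
# Illustrative model only; the kernel performs the real translation.
FOREIGN_UID_BASE = 2147352576  # assumption: base of the foreign UID range
RANGE_SIZE = 65536             # "64K users", as in step 1 above


def map_foreign_to_userns(disk_uid: int, userns_base: int) -> int:
    """Translate an on-disk owner in the foreign UID range into the host UID
    that backs the same container UID in the transient userns.

    userns_base is a hypothetical host UID at which the transient range starts.
    """
    offset = disk_uid - FOREIGN_UID_BASE
    if not 0 <= offset < RANGE_SIZE:
        raise ValueError(f"UID {disk_uid} is outside the foreign UID range")
    return userns_base + offset
```

In this model a file owned by FOREIGN_UID_BASE + 1000 on disk shows up as container UID 1000, whatever transient range the userns was assigned.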
Lennart Poettering
16ea491528 docs: mention the two other userdb services we ship these days 2025-01-23 21:13:41 +01:00
Yu Watanabe
544a67c8f7 udev-rules: check OWNER/GROUP= setting more strictly (#36123)
- refuses lines with unknown or invalid user/group,
- refuses non-system user/group in the setting.
2025-01-24 05:09:39 +09:00
Mike Yuan
0dc1716854 creds: permit interactive polkit auth when encrypting/decrypting through IPC 2025-01-24 05:08:12 +09:00
Mike Yuan
f3ba767d6c core/job: fix typo 2025-01-24 05:08:12 +09:00
Yu Watanabe
7e6786b7fb NEWS: mention OWNER=/GROUP= in udev rules now refuses non-system user/group 2025-01-24 02:33:18 +09:00
Yu Watanabe
02ec3dd4ef test: add test cases for OWNER=/GROUP= with non-system user/group 2025-01-24 02:33:18 +09:00
Yu Watanabe
f5cdf9515a udev-rules: ignore non-system user/group in OWNER=/GROUP=
Recently, we introduced the 'clock' system group, and set it for rtc/ptp
devices. See af96ccfc24.

However, if a non-system group with the same name already exists,
the devices were previously owned by that non-system group. That may
happen when updating systemd.

Let's avoid devices accidentally being owned by a non-system user/group.
2025-01-24 02:33:18 +09:00
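A minimal Python sketch of the check described in this commit, not the udev implementation: the function name is hypothetical, the group database is a plain dict, and SYSTEM_GID_MAX = 999 is an assumption (a common default that distributions may change).

```python
from typing import Mapping, Optional

SYSTEM_GID_MAX = 999  # assumption: upper bound of the system GID range


def pick_group_owner(name: str, group_db: Mapping[str, int]) -> Optional[int]:
    """Return the GID to apply for GROUP=name, or None if it must be ignored.

    A group is ignored when it does not exist or when its GID lies outside
    the system range, so a regular group that happens to share the name
    never ends up owning the device.
    """
    gid = group_db.get(name)
    if gid is None or gid > SYSTEM_GID_MAX:
        return None
    return gid
```

With a pre-existing regular group such as {"clock": 1005}, the function returns None and the device keeps its previous group.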
Yu Watanabe
a1ee55e3c9 udev-rules: ignore OWNER=/GROUP= with unknown user/group
Previously, when an unknown or invalid user/group was specified,
a token was installed with UID_INVALID/GID_INVALID. That's not only
meaningless in most cases, but also clears the previous assignment
if multiple OWNER=/GROUP= tokens exist for the same device, e.g.

KERNEL=="sda", GROUP="disk"
KERNEL=="sda", GROUP="nonexistentuser"

With this change, when an unknown user/group is specified, the line is
ignored. Hence, in the above example, the device will be owned by the
group "disk".
2025-01-24 02:33:18 +09:00
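The effect on the two-rule example can be modelled in a few lines of Python. This is a simplified sketch of the described behaviour rather than the udev rule engine; the function name and the GID used in the usage note are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def resolve_group(rule_groups: Iterable[str],
                  known_groups: Mapping[str, int]) -> Optional[int]:
    """Apply GROUP= assignments in rule order and return the final GID.

    rule_groups holds the GROUP= values of the matching rules, in order.
    Lines naming an unknown group are skipped, so they no longer clear an
    earlier valid assignment.
    """
    gid = None
    for name in rule_groups:
        if name not in known_groups:
            continue  # ignore the whole line, keep the previous assignment
        gid = known_groups[name]
    return gid
```

For the rules above, resolve_group(["disk", "nonexistentuser"], {"disk": 6}) yields the GID of "disk" instead of an invalid one.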
Yu Watanabe
e89eaeb027 udev-rules: get_user_creds()/get_group_creds() return -ESRCH when user/group does not exist
This drops -ENOENT error check for get_user_creds()/get_group_creds(),
as nowadays they always return -ESRCH when the specified user/groups
cannot be found.

This also adds short comments for NULL arguments.
2025-01-24 02:33:18 +09:00
Lennart Poettering
3e7910829e units: modprobe@.service tweaks (#36132) 2025-01-23 18:18:10 +01:00
Yu Watanabe
b7622cbab6 sd-device: chase sysattr and refuse to read/write outside of sysfs (#36004) 2025-01-24 01:58:19 +09:00
Yu Watanabe
e7fdc7644f udevadm: introduce cat command to show udev rules (#35893)
Closes #35818.
2025-01-24 01:49:42 +09:00
Lennart Poettering
71b6f718e2 units: don't load squashfs/erofs kmods explicitly
File system modules should be something the kernel can autoload, and
according to my testing that works fine, hence let's drop the explicit
deps, in particular as systems usually stick to one fs for these
things, not both.

I asked bluca about the reason for adding it; bluca didn't remember
anymore and was fine with me removing it. So let's remove this for
now; should issues arise we can revert this.
2025-01-23 16:29:28 +01:00
Lennart Poettering
6f69568cff units: mountfsd needs to pull DM and loop kmods
mountfsd is supposed to be available during early boot already, before
systemd-tmpfiles-setup-dev-early.service completes, hence make sure
loopback devices and DM already work before that.

As suggested by yuwata here:

https://github.com/systemd/systemd/pull/35685#issuecomment-2608157569
2025-01-23 16:29:22 +01:00
Lennart Poettering
9fc2126386 units: add a longer comment to modprobe@.service explaining when to use it 2025-01-23 16:29:20 +01:00
Yu Watanabe
1fe5b06363 sd-device: use device_in_subsystem() at more places 2025-01-23 22:54:11 +09:00
Yu Watanabe
640f8e9c4d sd-device: use specific setters for entries read from uevent file
Previously, if e.g. DRIVER=foo was specified in the uevent file, the
value was only saved as a property, but was not set to sd_device.driver.
That was inconsistent with the case where a device is created through
a netlink uevent.

Let's always set e.g. sd_device.driver when we get DRIVER=foo, both
from the uevent file and from a netlink uevent.
2025-01-23 22:54:11 +09:00
Yu Watanabe
17dc9ec4b6 sd-device: use sd_device_get_sysattr_value() to read uevent file
This also replaces the custom parser with strv_split_newlines_full().
No functional change, just refactoring.
2025-01-23 22:54:11 +09:00
Yu Watanabe
6ebbdcc0dd sd-device: use sd_device_get_sysattr_value() to read special symlinks
This way, a cached result may be used. No functional change, just refactoring.
2025-01-23 22:54:11 +09:00
Yu Watanabe
8d89667aba sd-device: chase sysattr and refuse to read/write files outside of sysfs
This makes sd_device_get_sysattr_value()/sd_device_set_sysattr_value()
refuse to read/write files outside of sysfs for safety.

This also makes them
- use chase() to resolve and open the symlink in the path to the sysfs attribute,
- use delete_trailing_chars(),
- include the error code in the cache entry, so we can cache more error cases,
- refuse to cache a value written to the uevent file of any device, i.e.
  sd_device_set_sysattr_value(dev, "../uevent", "add") will also not
  cache the value "add".
2025-01-23 22:54:11 +09:00
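The containment check can be sketched in Python as follows. This only illustrates the idea of resolving symlinks first and then refusing anything that ends up outside sysfs; it is not how chase() works internally, and the function name and the sysfs_root parameter are hypothetical.

```python
import os


def resolve_sysattr(syspath: str, sysattr: str, sysfs_root: str = "/sys") -> str:
    """Resolve an attribute path and refuse results outside sysfs_root.

    Symlinks and ".." components are resolved before the check, so an
    attribute that is a symlink pointing out of sysfs is rejected.
    """
    root = os.path.realpath(sysfs_root)
    target = os.path.realpath(os.path.join(syspath, sysattr))
    if os.path.commonpath([root, target]) != root:
        raise PermissionError(f"{sysattr!r} resolves outside of {sysfs_root}")
    return target
```

Note that "../uevent" relative to a device directory still resolves inside sysfs and therefore passes this check; the commit handles that case separately by not caching values written to any uevent file.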
Yu Watanabe
06503dd0df fileio: make read_virtual_file_at() accept O_PATH file descriptor
Then, merge read_virtual_file_at() and read_virtual_file_fd(), and make
the latter inline.
2025-01-23 22:54:07 +09:00
Yu Watanabe
f3c5c2b001 fileio: make write_string_file_at() accept O_PATH fd and an empty filename
Then, introduce an inline wrapper write_string_file_fd().
2025-01-23 22:53:05 +09:00
Yu Watanabe
9e096259ce fileio: fix verification on failure in write_string_file_full()
Fixes a bug introduced by 0ab5e2a4b4.
2025-01-23 22:24:19 +09:00
Yu Watanabe
7f2175eabb udevadm: introduce cat command
This introduces 'udevadm cat' command, that shows udev rules files or
udev.conf, which may be useful for debugging.

Closes #35818.
2025-01-23 22:23:45 +09:00
Yu Watanabe
bbe1ba5e87 bash-completion/udevadm-verify: suggest found udev rules files
This also fixes the issue that no suggestion is provided after a standalone
option is specified.
2025-01-23 22:23:45 +09:00
Yu Watanabe
7cb4508c5a udevadm-verify: chase specified paths
Also, when a filename is specified, search for the udev rules file in
the udev/rules.d directories.

This also refuses non-existent files, and files that are neither a
regular file nor a directory, e.g. /dev/null.
2025-01-23 22:23:45 +09:00
Yu Watanabe
8e0f023548 udev-rules: log the first line number when continued 2025-01-23 22:23:45 +09:00
Yu Watanabe
86a08e70a8 udev: sort builtins
Then, 'udevadm test-builtin --help' lists builtins alphabetically.
2025-01-23 22:23:45 +09:00
Yu Watanabe
c3d526d765 shell-completion/udevadm: add net_driver
Follow-up for 2b5b25f123.
2025-01-23 22:23:45 +09:00
Yu Watanabe
eb86b4e63b tree-wide: use hash ops with destructor (#36107) 2025-01-23 22:20:42 +09:00
Daan De Meyer
6733b07d43 mkosi: Add back --preserve-env when running integration tests
The test wrapper script depends on various github actions environment
variables so let's make sure those are propagated.
2025-01-23 12:18:21 +01:00
Yu Watanabe
38f7edd9d3 hashmap: drop hashmap_free_free() and friends 2025-01-23 18:22:53 +09:00
Yu Watanabe
58f0cd14a0 test: use hash ops with destructor 2025-01-23 18:22:53 +09:00
Yu Watanabe
06835cb397 remount-fs: use hash ops with destructor 2025-01-23 18:22:53 +09:00
Yu Watanabe
60cc858e9d exec-util: use hash ops with destructor 2025-01-23 18:22:52 +09:00
Yu Watanabe
04b7949ecf network: use hash ops with destructor 2025-01-23 18:22:47 +09:00
Yu Watanabe
938a6b49bd sd-journal: use hash ops with destructor 2025-01-23 18:19:28 +09:00
Yu Watanabe
2d23cadd19 journal-file: use hash ops with destructor
This also makes JournalFile.chain_cache allocated when necessary.
2025-01-23 18:19:28 +09:00
Yu Watanabe
b87501ea3c sd-bus: use hash ops with destructor
This also makes vtable_methods and vtable_properties managed by Set,
as the key and value of each entry are equivalent.
2025-01-23 18:19:28 +09:00
Yu Watanabe
4516022833 delta: use hash ops with destructor
This also makes it use RET_GATHER().
2025-01-23 18:19:28 +09:00
Yu Watanabe
c1bfee0bdb bootctl: use hash ops with destructor
This also makes the hashmap allocated when necessary.
2025-01-23 18:19:28 +09:00
Yu Watanabe
852c05c94f catalog: modernize code
- set destructors to catalog_hash_ops,
- acquire OrderedHashmap when necessary,
- gracefully handle NULL catalog directories and output stream,
- rename function output arguments,
- add many many assertions,
- use RET_GATHER().
2025-01-23 18:19:28 +09:00
Yu Watanabe
12006a7233 wait-online: use hash ops with destructor 2025-01-23 18:19:28 +09:00
Yu Watanabe
a22620e39f udev: use hash ops with destructor 2025-01-23 18:19:28 +09:00