Commit Graph

1716 Commits

Author SHA1 Message Date
Mike Yuan
85352c095e various: turn off SO_PASSRIGHTS where fds are not expected 2025-06-17 13:16:44 +02:00
Mike Yuan
9453a92ad7 units/systemd-journald@.socket: enable SO_TIMESTAMP
Follow-up for 02229dff2b

This applies the change to journal namespace instances too.
2025-06-17 13:16:07 +02:00
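As a rough illustration of what enabling SO_TIMESTAMP on a socket unit can look like, here is a minimal sketch. The socket path is hypothetical, and the use of `Timestamping=` is an assumption about how the change is expressed; the commit itself is the authoritative reference.

```ini
# Hypothetical excerpt of a socket unit, not the actual systemd-journald@.socket
[Socket]
ListenDatagram=/run/example/socket
# Ask the kernel to attach receive timestamps to incoming datagrams
# (SO_TIMESTAMP, microsecond granularity)
Timestamping=us
```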
Lennart Poettering
008818ec96 units: make sure the network tap driver is actually loaded
We have the After= line, but not the Wants= line. Fix that.
2025-06-14 13:29:14 +09:00
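The After=/Wants= distinction mentioned above can be sketched as follows. The choice of `modprobe@tun.service` as the dependency is an assumption for illustration; the commit does not name the unit in its message.

```ini
# Hypothetical [Unit] excerpt
[Unit]
# After= only orders this unit behind the other one if both are started;
# on its own it does not cause the other unit to be started at all
After=modprobe@tun.service
# Wants= is what actually pulls the other unit into the transaction
Wants=modprobe@tun.service
```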
Lennart Poettering
273d14f5dd nsresourced: make sure "tun" driver is properly loaded and accessible
We need access to /dev/net/tun, hence make sure we can actually see
/dev/. Also make sure the module is properly loaded before we operate,
given that we run with limited caps. But then again grant the CAP_NET_ADMIN
cap, since we need to configure the network tap/tun devices.

Follow-up for: 1365034727
2025-06-14 00:59:37 +02:00
Lennart Poettering
0cca16a836 units: enable watchdog notifications for vmspawn
nspawn supports it and enables it. Let's do this for vmspawn too. It
already supports it in code. Let's make it also work in the unit file.
2025-05-26 13:23:45 +02:00
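A minimal sketch of enabling watchdog notifications in a service unit, assuming the service already sends keep-alive messages as the commit states. The timeout value and the binary path are illustrative assumptions, not taken from the actual vmspawn unit.

```ini
# Hypothetical [Service] excerpt
[Service]
# Placeholder path for a daemon that speaks the sd_notify() protocol
ExecStart=/usr/bin/example-daemon
# The service reports readiness and keep-alives via sd_notify()
Type=notify
# The manager treats the service as failed if no WATCHDOG=1
# keep-alive arrives within this interval
WatchdogSec=3min
```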
Yu Watanabe
d766c75acd units: kill only udev services and keep udev sockets on switching root
This also makes initrd-cleanup.service explicitly start
initrd-switch-root.service with replace-irreversibly mode, to avoid
systemd-udevd.service being triggered by kernel events and the start
job of initrd-switch-root.service being cancelled.

Follow-ups for 676fb42aae.
Addresses https://github.com/systemd/systemd/pull/37374#issuecomment-2875990471.
2025-05-17 12:47:52 +01:00
Igor Opaniuk
8130af42e2 units: fix systemd-boot-clear-sysfail description
Fix 's/systemd-boot-random-seed/systemd-boot-clear-sysfail/g'
copypaste.

Fixes: https://github.com/systemd/systemd/issues/37415
Signed-off-by: Igor Opaniuk <igor.opaniuk@foundries.io>
2025-05-14 09:34:07 +02:00
Lennart Poettering
a388f007e0 journald: make journal Varlink IPC accessible to unpriv clients
The Synchronize() function is just too useful for clients, so that we
can make "systemd-run -v --user" actually useful. Hence let's make the
socket accessible without privs. Deny most method calls however, except
for the Synchronize() call.
2025-05-13 15:39:57 +02:00
Igor Opaniuk
2857a83975 bootctl: configure a sysfail entry
You can configure the sysfail boot entry using the bootctl command:
$ bootctl set-sysfail sysfail.conf

The value will be stored in the `LoaderEntrySysFail` EFI variable.

The `LoaderEntrySysFail` EFI variable would be unset automatically
during next boot by `systemd-boot-clear-sysfail.service` if no
system failure occurred, otherwise it would be kept as it is and a system
failure reason will be saved to `LoaderSysFailReason` EFI variable.

Signed-off-by: Igor Opaniuk <igor.opaniuk@foundries.io>
2025-05-12 15:37:47 +02:00
Yu Watanabe
676fb42aae units: enable IgnoreOnIsolate=yes on systemd-udevd-kernel.socket
Otherwise, initrd-cleanup.service requests isolation thus the socket
is stopped before switching root, and several early events after
switching root may be lost.
2025-05-08 01:29:53 +09:00
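For illustration, the setting named in the title is a [Unit] option; a sketch of how it might appear (the rest of the socket unit is omitted here):

```ini
# Sketch of the relevant [Unit] excerpt
[Unit]
# Keep this unit running when another unit is isolated
# (e.g. via "systemctl isolate"), instead of stopping it
IgnoreOnIsolate=yes
```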
Mike Yuan
fd66dc60a0 units: enable RemoveOnStop= for oomd and userdbd sockets
We usually don't care, but here the existence of socket
is public API to a certain degree and signals availability
of the service (userdbd in particular, oomd is checked in
core-varlink.c). Hence let's be more careful and remove them
if stopped.
2025-04-30 21:30:53 +02:00
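A minimal sketch of the option in a socket unit; the socket path is a hypothetical placeholder, not the real oomd or userdbd path.

```ini
# Hypothetical socket unit excerpt
[Socket]
ListenStream=/run/example/io.example.Service
# Remove the file system socket node when the socket unit is stopped,
# so its existence reliably signals that the service is available
RemoveOnStop=yes
```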
Mike Yuan
e803ec1e25 units: unify deps between service and socket units
The current arrangement of service and socket units is
sort of all over the place. Let's clean it up a little,
roughly following the principles below:

- socket units have implicit ordering deps (not to be confused
  with default ones which are subject to DefaultDependencies=)
  before associated service, so drop any explicit After=

- If socket can be enabled, remember to link to it in service
  via Also= and Sockets= (the latter replaces Wants=).
  If the service Requires= socket however, Sockets= is omitted.

- If socket is statically enabled, no need for service
  to pull it in - machined
2025-04-30 21:27:37 +02:00
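The principles listed above could look roughly like this for a service with an enableable socket. All unit names and paths are hypothetical; this is a sketch of the pattern, not a copy of any unit touched by the commit.

```ini
# example.service (hypothetical)
[Unit]
Description=Example Daemon
# No explicit After=example.socket: socket units are implicitly
# ordered before the service they activate

[Service]
ExecStart=/usr/bin/example-daemon
# Pull in the socket and pass it to the service (replaces Wants=)
Sockets=example.socket

[Install]
# Enabling or disabling the service also enables or disables the socket
Also=example.socket
```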
Nick Rosbrook
0fa188307b resolved: support socket activation via varlink sockets
Add two new socket units, one for each of systemd-resolved's varlink
servers:

 systemd-resolved-varlink.socket
 systemd-resolved-monitor.socket

Add logic to grab socket fds via sd_varlink_server_listen_name(), but
fallback to the existing sd_varlink_server_listen_address() calls if no
fds were given.

This will be used to make systemd-networkd-wait-online --dns more robust
against systemd-resolved restarts etc.
2025-04-30 11:12:15 -04:00
Daan De Meyer
29257d927d udev: Enable delegation without delegating any controllers
Delegation is enabled for udev so that it can mess around with the
cgroup hierarchy to avoid killing control processes when it calls
cg_kill in on_post() when it goes idle. We don't actually care about
any specific cgroup controllers in udev, so set Delegate= to enable
delegation without delegating any controllers

Follow up for https://github.com/systemd/systemd/pull/22752
2025-04-29 20:03:34 +02:00
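As a sketch of what the message describes, assigning the empty string to `Delegate=` turns delegation on while leaving the controller list empty:

```ini
# Sketch of the relevant [Service] excerpt
[Service]
# Empty assignment: delegation of the cgroup subtree is enabled,
# but no controllers are delegated
Delegate=
```

By contrast, `Delegate=yes` would delegate all supported controllers, which the commit says udev does not need.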
Yu Watanabe
0d1819e791 units: stop systemd-udevd before soft-reboot
Otherwise, queued uevents may be lost on soft-reboot.

Similar to f89985ca49, but for
systemd-udevd.
2025-04-23 10:48:51 +09:00
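One way to make a service stop before soft-reboot is to conflict with and order before `soft-reboot.target`. The sketch below assumes that this is the mechanism used; the commit message does not spell it out.

```ini
# Hypothetical [Unit] excerpt
[Unit]
# Starting soft-reboot.target stops this unit...
Conflicts=soft-reboot.target
# ...and the stop job completes before the target is reached
Before=soft-reboot.target
```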
Mike Yuan
a04da2db6b oomd: it's safe to assume cgv2 now 2025-04-13 18:09:40 +02:00
Yu Watanabe
b5d68c6ded units: update comment
Follow-up for f89985ca49.
2025-04-07 17:34:08 +09:00
Yu Watanabe
beaf7e04eb udev: push inotify fd to file descriptor store
Then, if we get inotify fd on start, it is not necessary to re-enable
inotify watch.
2025-04-05 17:33:14 +09:00
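For a service to push file descriptors to the file descriptor store, the store has to be enabled in its unit. A minimal sketch, with an illustrative limit that is not taken from the actual udev unit:

```ini
# Hypothetical [Service] excerpt
[Service]
# Allow the service to hand up to this many fds to the service manager
# (via sd_pid_notify_with_fds() with FDSTORE=1); they are passed back
# to the service when it is restarted
FileDescriptorStoreMax=16
```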
Yu Watanabe
011360eed3 meson: rename RC_LOCAL_PATH -> SYSTEM_SYSVRCLOCAL_PATH
No functional change, but just for emphasizing that this is for
SysV compatibility.
2025-04-03 00:19:49 +09:00
Mike Yuan
0a7d86d53d units/systemd-validatefs@.service: FailureAction= is a [Unit] knob 2025-03-31 19:23:51 +02:00
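To illustrate the point of the title, `FailureAction=` belongs in the [Unit] section, not in [Service]. The action value and command path below are assumptions for illustration.

```ini
# Hypothetical unit excerpt
[Unit]
# Correct placement: FailureAction= is parsed in [Unit]
FailureAction=reboot-immediate

[Service]
Type=oneshot
# Placeholder command; the setting above applies if this unit fails
ExecStart=/usr/bin/example-check
```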
Lennart Poettering
0bdd5ccc81 validatefs: add new tool that enforces mount constraints
This new tool looks for three xattrs on the root inode of a file system
that encode mount constraints of the file system. The tool is supposed
to be hooked into the mount logic and is supposed to protect against
misappropriating trusted file systems in unintended ways.

Consider the following scenario: we boot up on first boot and create a
tpm-locked pair of /var/ and /srv/ partitions via systemd-repart. An
attacker then offline modifies the partition table, exchanging the
metadata of the /var/ and /srv/ partition. So far we'd happily accept
that, honour the modified metadata and boot up. This could be used to
revert changes to /var/ or similar. And all that even though both
partitions are encrypted and locked to TPM!

With this new mechanism we can encode in the protected contents of the
file systems the ways it can be used: the partition type uuid, the
partition label and the intended mount point can be stored in xattrs,
and we can check them automatically on mount, and take action on
mismatch. (action would typically be immediate reboot).
2025-03-31 15:14:13 +02:00
Yu Watanabe
5578f8e974 homed: move things over to quotactl_fd() (#36902)
Let's use quotactl_fd() wherever we can, it's 2025. quotactl() is such a
mess after all.
2025-03-31 21:15:03 +09:00
Lennart Poettering
8b21bbd6f0 pcrextend: whenever we fail to extend PCRs, reboot immediately
PCR extensions are supposed to be useful for "destroying" the ability to
access TPM bound secrets. Hence, if for some reason we fail to extend a
PCR, it's safer to just reboot, instead of going on without the
extension, leaving secrets potentially accessible which should not be
accessible.

Note that the services exit gracefully if no TPM is found, hence this
should not be triggered on TPM-less systems. However, this enforces that
if there is a TPM that is accessible to Linux and that works properly,
the PCR measurement must complete too.

Inspired by this thread:

https://lists.freedesktop.org/archives/systemd-devel/2025-March/051244.html
2025-03-31 21:13:33 +09:00
Lennart Poettering
5daca30b0f homed: always use quotactl_fd() if it's available
Let's always prefer quotactl_fd() when it's available and use quotactl()
only as a fallback on old kernels.

This way we can operate on the fds we typically already have open, or if
needed we can open a new one, and use for multiple fs operation.

In the long run we should really focus on operating exclusively by fd
instead of by path, by device node or otherwise. This gets us a step
closer to that.
2025-03-31 11:51:15 +02:00
Luca Boccassi
d95818f522 meson: add feature flag for nspawn build
Other tools have it, nspawn doesn't, add one
2025-03-28 10:34:02 +00:00
Daan De Meyer
511bf79b4e userdb: Add userdb.user.* and userdb.group.* credentials (#36740)
Let's allow providing extra userdb users and groups via credentials.
Similarly to systemd-udev-load-credentials.service, we ship
systemd-userdb-load-credentials.service which transforms the JSON
user/group records provided via the corresponding credentials to static
userdb dropins in /run/userdb.
2025-03-19 10:30:52 +01:00
Daan De Meyer
04a44e25b9 units: Add systemd-machined.socket 2025-03-19 09:28:12 +01:00
Daan De Meyer
fe0342edf4 userdb: Add userdb.user.* and userdb.group.* credentials
Let's allow providing extra userdb users and groups via credentials.
Similarly to systemd-udev-load-credentials.service, we ship
systemd-userdb-load-credentials.service which transforms the JSON
user/group records provided via the corresponding credentials to static
userdb dropins in /etc/userdb.

Replaces #33811
2025-03-18 22:46:10 +01:00
Yu Watanabe
c06a630f0c nspawn: introduce --cleanup option to clear propagation and unix-export directories
This is useful when the previous invocation is unexpectedly killed.

Otherwise, if systemd-nspawn is killed forcibly, then unix-export
directory is not cleared and unmounted, and the subsequent invocation
will fail. E.g.
===
[   18.895515] TEST-13-NSPAWN.sh[645]: + machinectl start long-running
[   18.945703] systemd-nspawn[1387]: Mount point '/run/systemd/nspawn/unix-export/long-running' exists already, refusing.
[   18.949236] systemd[1]: systemd-nspawn@long-running.service: Failed with result 'exit-code'.
[   18.949743] systemd[1]: Failed to start systemd-nspawn@long-running.service.
===
2025-03-16 11:02:09 +09:00
Lennart Poettering
5dbf476b11 units: order oomd after swap.target
oomd only works well if we have swap, hence we should not start it
before swaps are up, in particular as we will print an annoying message
otherwise.

Fixes: #36704
2025-03-13 05:24:11 +09:00
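The ordering described above amounts to a single dependency line; a sketch:

```ini
# Sketch of the relevant [Unit] excerpt
[Unit]
# Ordering only: start after swap devices have been activated.
# This does not pull swap.target in by itself.
After=swap.target
```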
Yu Watanabe
c0cc01de8a meson: use install_symlink() where applicable
Now our baseline of meson is 0.62, hence install_symlink() can be used.

Note, install_symlink() implies install_emptydir() for specified
install_dir. Hence, this also drops several unnecessary
install_emptydir() calls.

Note, the function currently does not support 'relative' and 'force' flags,
so several 'ln -frsT' inline calls cannot be replaced.
2025-03-10 02:41:40 +09:00
Mike Yuan
28ac3309d7 units/meson: remove unneeded linebreak 2025-03-05 17:03:59 +01:00
Mike Yuan
651b44bdda units: refuse manual operations on factory-reset-now.target and friends
It is strictly mandatory that this is done during initial
transaction, and not later when the system is already running.
Hence let's refuse manual start for all of the involved units.
Additionally, refuse manual stop for systemd-factory-reset-complete.service,
as it flags the factory reset completion through
/run/systemd/factory-reset-complete, which never gets removed
for the whole boot.
2025-03-05 17:03:59 +01:00
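Refusing manual operations is done with two [Unit] settings. A sketch of how they might be combined for a unit that must neither be started nor stopped by hand:

```ini
# Hypothetical [Unit] excerpt
[Unit]
# "systemctl start" on this unit is refused; it can still be
# activated as a dependency of another unit
RefuseManualStart=yes
# "systemctl stop" on this unit is refused likewise
RefuseManualStop=yes
```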
Lennart Poettering
73e53d2ee4 tpm2-clear: optionally reset TPM during a factory reset 2025-03-05 12:37:51 +01:00
Lennart Poettering
daae8f858d units: also require /dev/tpm0 to be around before tpm2.target can be reached
While we typically just use /dev/tpmrm0 for accessing the TPM chip (i.e.
via the kernel's own resource manager), some sysfs properties that
matter are on /dev/tpm0 only (i.e. the version without the kernel TPM
resource manager). Hence, wait for both to show up in tpm2.target, so
that we can be sure the full API is available.

This matters because we want to access /sys/class/tpm/tpm0/ppi/request
in the next commit.
2025-03-05 12:37:48 +01:00
Lennart Poettering
41d9ed93d9 factory-reset: revamp infrastructure
This introduces a bunch of facilities:

1. The factory-reset.target unit that requests a factory reset is now
   complemented by factory-reset-now.target that executes it at next
   boot.

2. This latter is added to the initial transaction via the new trivial
   systemd-factory-reset-generator.

3. A tool systemd-factory-reset has been added to query, request,
   cancel, complete factory reset operations (via EFI variables). Two of
   these are wrapped into units that are plugged into
   factory-reset.target and factory-reset-now.target respectively. The
   tool also provides a simple Varlink API.

This should make things a lot cleaner, and both be useful as explicit
implementation on UEFI, and as template + hookpoints for alternative
implementations on non-UEFI.
2025-03-05 12:37:26 +01:00
Lennart Poettering
99e6d1b924 units: don't block on terminating agents
Terminating the plymouth/console agents when the wall agent takes over
can happen asynchronously, after all the pw queries are async anyway and
hence can be seen by both the plymouth/console agents and the wall
agent.

By stopping the two agents with "--no-block" we add a bit of robustness,
since trouble with them exiting won't block the wall agent from starting.

This addresses the issue the previous commit fixes in a different way.
2025-03-03 10:47:09 +01:00
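A sketch of the "--no-block" approach in a service unit. The unit names and the agent binary path are hypothetical placeholders rather than the actual ask-password units.

```ini
# Hypothetical [Service] excerpt
[Service]
# Enqueue the stop jobs and return immediately instead of waiting for
# the other agents to exit; the leading "-" ignores a failure of this
# command
ExecStartPre=-systemctl stop --no-block example-agent-a.service example-agent-b.service
ExecStart=/usr/bin/example-wall-agent
```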
Lennart Poettering
6ee3bc046b units: measure "factory-reset" into PCR 11 when we request factory reset
Let's make sure that the moment where factory reset is requested is
visible in the TPM PCR state, so that access to secrets is terminated.

This is particularly interesting when the system is booted with
systemd.unit=factory-reset.target on the kernel command line, requesting
a factory reset on the following boot. The preparations done in
userspace should already lose access to the TPM in that case.
2025-02-27 13:20:23 +01:00
Lennart Poettering
b493502475 units: measure the fact we enter storage target mode into TPM
storagetm mode means we are network accessible. Let's lock down
access to TPM secrets in this case: let's measure a pcr "phase" string
into PCR 11.

This is good as it means that if we are exploited in this state FDE
secrets protected by TPM are likely to remain protected, since the PCR
values wouldn't allow access.
2025-02-27 13:20:23 +01:00
Lennart Poettering
810708f4b8 integritysetup: add remote-integritysetup.target to match remote-{crypt|verity}setup.target
Let's make the three subsystems more alike, and add remote-*setup.target
for all three, enable them all three in the presets, and make them
behave in a similar fashion.
2025-02-25 21:40:05 +01:00
Lennart Poettering
c88fdb1e56 import-generator: optionally create loopback devices after download
This is useful for booting from a freshly downloaded disk image: just
specify

    rd.systemd.pull=verify=no,machine,blockdev,raw:image:https://192.168.100.1:8081/image.raw
    root=/dev/disk/by-loop-ref/image.raw-part2

on the kernel command line, and we'll download that in the initrd and boot from it.

(note the above disables download-time verification, putting trust in
verity and image policy that this won't do harm)

Here's a more complete example. From a git checkout do:

    ninja -C build && mkosi -f -T serve

and then from another terminal do within the same checkout:

    ./build/systemd-vmspawn \
            --ram=16G \
            --register=no \
            -n \
            -i ./build/mkosi.output/image.raw \
            rd.systemd.pull=verify=no,machine,blockdev,raw:image:http://192.168.100.1:8081/image.raw \
            root=/dev/disk/by-loop-ref/image.raw-part2 \
            rootflags=x-systemd.device-timeout=infinity \
            ip=any

This will then boot via the ESP of the specified image, then download
the image via HTTP from the mkosi instance running in the first
terminal, attach it to a loopback block device, and then use its second
partition as root fs, and boot into it.

(this assumes your host is 192.168.100.1, of course)

Note that downloading the full image takes a bit of time (this downloads
it uncompressed after all), hence we turn off the timeout to wait for
the device.

This also introduces a new "imports.target" unit (and associated
"imports-pre.target") between which imports are grouped, and which ensure the
imports actually are ordered correctly both on the host and in the
initrd.
2025-02-21 10:03:32 +01:00
Lennart Poettering
50063d496d units: add generic service for attaching a file to a loopback device
This is mostly just a friendly unit wrapper around "systemd-dissect
--attach".

This is useful so that we can automatically attach disk images as
block device at boot.
2025-02-21 09:57:02 +01:00
Mike Yuan
0d76f1c423 core/mount: rework GracefulOptions= to be just x-systemd.graceful-option=
09fbff57fc introduced new knob
for such functionality. However, that seems unnecessary.

The mount option string is ubiquitous in that all of fstab,
kernel cmdline, credentials, systemd-mount, ... speak it.
And we already have x-systemd.device-bound= that's parsed
by pid1 instead of fstab-generator. It feels hence more natural
for graceful options to be an extension of that, rather than
its own property.

There's also one nice side effect that the setting itself
is now more graceful for systemd versions not supporting
such feature.
2025-02-12 18:16:44 +01:00
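As an illustration of the reworked syntax, a graceful option is written as part of the ordinary mount option string. The mount unit below is a hypothetical sketch (a mount unit's file name has to match its mount point, e.g. tmp.mount for /tmp); the option list is an assumption for the example.

```ini
# Hypothetical tmp.mount excerpt
[Mount]
What=tmpfs
Where=/tmp
Type=tmpfs
# "usrquota" is only applied if the kernel supports it for this file
# system; otherwise it is dropped instead of making the mount fail
Options=mode=1777,nosuid,nodev,x-systemd.graceful-option=usrquota
```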
Lennart Poettering
d6f8e1ae87 mntfsd: add api to mount dirs for containers
systemd-mountfsd so far provided a MountImage() API call for mounting a
disk image and returning a set of mount fds. This complements the API
with a new MountDirectory() API call, that operates on a directory
instead of an image file. Now, what makes this interesting is that it
applies an idmapping from the foreign UID range to the provided target
userns – and in which case unprivileged operation is allowed (well,
under some conditions: in particular the client must own a parent dir of
the provided path).

This allows container managers to run fully unprivileged from
directories – as long as those directories are owned by the foreign UID
range. Basic operation is like this:

1. acquire a transient userns from systemd-nsresourced with 64K users
2. ask systemd-mountfsd for an idmapped mount of the container dir
   matching that userns
3. join the userns and bind the mount fd as root.

Note that we have to drop various sandboxing knobs from the mountfsd
service file for this to work, since the kernel's security checks that
try to ensure that an obstructed /proc/ cannot be circumvented via
mounting a new procfs will otherwise prohibit mountfsd to duplicate the
mounts properly.
2025-01-23 21:48:02 +01:00
Lennart Poettering
71b6f718e2 units: don't load squashfs/erofs kmods explicitly
File system modules should be something the kernel can autoload
automatically, and according to my testing that works fine, hence let's
drop the explicit deps, in particular as systems usually stick to one fs
for these things, not both.

I asked bluca about the reason for adding it; he didn't remember
anymore, and was fine with me removing this. So let's remove this for
now; should issues arise we can revert this.
2025-01-23 16:29:28 +01:00
Lennart Poettering
6f69568cff units: mountfsd needs to pull DM and loop kmods
mountfsd is supposed to be available during early boot already, before
systemd-tmpfiles-setup-dev-early.service completes, hence make sure
loopback devices and DM already work before that.

As suggested by yuwata here:

https://github.com/systemd/systemd/pull/35685#issuecomment-2608157569
2025-01-23 16:29:22 +01:00
Lennart Poettering
9fc2126386 units: add a longer comment to modprobe@.service explaining when to use it 2025-01-23 16:29:20 +01:00
Lennart Poettering
c9f805674a units: enable usrquota support on /tmp/ 2025-01-18 23:13:06 +01:00
Luca Boccassi
af0a28854d meson: add udev/hwdb build aliases
This allows running:

meson compile libudev udev hwdb
meson install --no-rebuild --tags libudev,udev,hwdb
2025-01-15 09:48:27 +00:00
Sea-Eun Lee
015a3b8cb1 oomd: support reloading configuration at runtime 2025-01-14 14:42:23 +01:00