We only support a subset of filesystems, and no RAID, for DDIs. blkid spends a lot
of time trying to probe for the filesystem type, so cut it short by using
the filtering options to restrict it to the filesystems we support, and to
exclude raid probing.
Coverity gets confused because the names were swapped. The parameters
are all passed in the right position, so there's no functional issue,
but the naming is confusing and trips static analyzers, so fix it.
CID#1621624
Follow-up for 8a9ab3dbbc
After the update to systemd 257.7 in Fedora, there are reports that we fail to
create a symlink:
systemd-gpt-auto-generator[585]: Failed to create symlink /run/systemd/generator/local-fs.target.wants/systemd-fsck-root.service: File exists
(sd-exec-[574]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.
I guess that some other generator created the symlink. We silently ignore
EEXIST in similar codepaths, so add that in one more place. (The target of the
symlink doesn't really matter. The name of the link matters. So something like
symlink_idempotent would not be better. For example, a different generator
might use a slightly different target path, and symlink_idempotent would be too
strict.)
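
The idea can be sketched roughly as follows. This is only an illustration
using plain POSIX symlink(), not the actual generator code, and
symlink_ignore_eexist() is a hypothetical name:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, not the systemd implementation: create a symlink and
 * treat an already existing link name as success. The target of an existing
 * link is deliberately not compared, since only the link name matters. */
int symlink_ignore_eexist(const char *target, const char *linkpath) {
        assert(target);
        assert(linkpath);

        if (symlink(target, linkpath) < 0 && errno != EEXIST)
                return -errno;

        return 0;
}
```

With such a helper, a second generator that creates the same link name with a
slightly different target path would not cause a failure.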
Fixes a regression caused by d307410327.
The link_mode_masks flex array in struct ethtool_link_settings contains
three packed arrays, and the length of each array is given by the
link_mode_masks_nwords field:
```
__u32 link_mode_masks[];
/* layout of link_mode_masks fields:
* __u32 map_supported[link_mode_masks_nwords];
* __u32 map_advertising[link_mode_masks_nwords];
* __u32 map_lp_advertising[link_mode_masks_nwords];
*/
```
Hence, we cannot use the received data as is through the union, but need
to shift the array to make each map accessible through the union.
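
A minimal sketch of the unpacking, under stated assumptions: struct
link_mode_maps, MAP_WORDS_MAX and link_mode_maps_unpack() are hypothetical
names, and the sketch copies into a separate fixed-size struct instead of
shifting the received buffer in place:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: room for 4 words (128 link mode bits) per map. */
#define MAP_WORDS_MAX 4u

/* Hypothetical fixed-size view, standing in for the union mentioned above. */
struct link_mode_maps {
        uint32_t supported[MAP_WORDS_MAX];
        uint32_t advertising[MAP_WORDS_MAX];
        uint32_t lp_advertising[MAP_WORDS_MAX];
};

/* Spread the three packed maps (nwords words each) into fixed-size slots.
 * Returns 0 on success, -1 if nwords does not fit into the slots. */
int link_mode_maps_unpack(const uint32_t *packed, uint32_t nwords, struct link_mode_maps *ret) {
        assert(packed);
        assert(ret);

        if (nwords > MAP_WORDS_MAX)
                return -1;

        memset(ret, 0, sizeof *ret);
        memcpy(ret->supported, packed, nwords * sizeof(uint32_t));
        memcpy(ret->advertising, packed + nwords, nwords * sizeof(uint32_t));
        memcpy(ret->lp_advertising, packed + 2 * (size_t) nwords, nwords * sizeof(uint32_t));
        return 0;
}
```

The point is that map_advertising starts at word index nwords in the received
data, not at a fixed offset, so a fixed-size union member would otherwise read
from the wrong position.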
We might be operating with a newer systemctl on an image with an older
systemd and thus without an initrd-preset directory. Before
4a8c395167, we would use the system presets. Let's make sure we keep
doing that when operating on an image without initrd presets.
Follow up for 4a8c395167.
Function execute_directories logged in a way that was meaningless
without additional context:
systemd[1]: No executables found.
In execute_strv this was partially rectified by extracting the directory
name from one of the directories and using this as the identifier. But
the directory name is not always meaningful, and can also be set from
an environment variable. Let's simplify things by providing a fixed name
that can be used consistently in all log messages. In particular this will
make error messages easier to understand if users report just the error
without additional context.
Early boot log:
Sep 01 12:06:03 fedora systemd[1]: Mounting tmpfs to /dev/shm of type tmpfs with options mode=01777,usrquota.
Sep 01 12:06:03 fedora systemd[1]: Mounting tmpfs (tmpfs) on /dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777,usrquota")...
Sep 01 12:06:03 fedora systemd[1]: Mounting devpts to /dev/pts of type devpts with options mode=0600,gid=5.
Sep 01 12:06:03 fedora systemd[1]: Mounting devpts (devpts) on /dev/pts (MS_NOSUID|MS_NOEXEC "mode=0600,gid=5")...
Sep 01 12:06:03 fedora systemd[1]: Mounting tmpfs to /run of type tmpfs with options mode=0755,size=20%,nr_inodes=800k.
Sep 01 12:06:03 fedora systemd[1]: Mounting tmpfs (tmpfs) on /run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=0755,size=20%,nr_inodes=800k")...
Sep 01 12:06:03 fedora systemd[1]: Mounting cgroup2 to /sys/fs/cgroup of type cgroup2 with options nsdelegate,memory_recursiveprot.
Sep 01 12:06:03 fedora systemd[1]: Mounting cgroup2 (cgroup2) on /sys/fs/cgroup (MS_NOSUID|MS_NODEV|MS_NOEXEC "nsdelegate,memory_recursiveprot")...
Sep 01 12:06:03 fedora systemd[1]: Mounting pstore to /sys/fs/pstore of type pstore with options ''.
Sep 01 12:06:03 fedora systemd[1]: Mounting pstore (pstore) on /sys/fs/pstore (MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
Sep 01 12:06:03 fedora systemd[1]: Mounting efivarfs to /sys/firmware/efi/efivars of type efivarfs with options ''.
Sep 01 12:06:03 fedora systemd[1]: Mounting efivarfs (efivarfs) on /sys/firmware/efi/efivars (MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
Sep 01 12:06:03 fedora systemd[1]: Mounting bpf to /sys/fs/bpf of type bpf with options mode=0700.
Sep 01 12:06:03 fedora systemd[1]: Mounting bpf (bpf) on /sys/fs/bpf (MS_NOSUID|MS_NODEV|MS_NOEXEC "mode=0700")...
We logged and then called mount_verbose_full() immediately after, resulting in
duplicate logging. The second line is more informative than the first one, so
kill the first one.
Since c5de7b14ae
file searching implies a new mount api syscall by default,
to trigger automounts.
But this is not necessary in most cases, e.g. when chasing a
syspath in sd-device (this actually causes a regression in umockdev,
see https://github.com/martinpitt/umockdev/issues/271).
Another example is reading unit files, especially .network files,
as automount may trigger mounting network filesystems...
Also, when this is used in NSS plugins, programs that load the
plugins may fail because of spuriously configured seccomp. See #38565.
Let's not trigger automounts by default, and do so only when explicitly
requested.
This introduces CHASE_TRIGGER_AUTOFS, and uses it in:
- service manager,
- bootctl and finding ESP/xbootldr,
- sysupdate,
- mountfsd,
- systemd-mount.
There may be several more places where we should trigger automounts, but
let's do that later.
Follow-up for c5de7b14ae.
Fixes #38565.
Replaces #38569.
Co-authored-by: Luca Boccassi <luca.boccassi@gmail.com>
This reverts commit 490aa05ca1.
As commented in https://github.com/systemd/systemd/pull/38569#discussion_r2284978273,
the commit causes the autofs check to be bypassed. Before the commit, when
CHASE_NO_AUTOFS was set, we did not shortcut chasing paths, and refused
any autofs mount points in the path. With the commit, however, the flag
was swapped, but even when CHASE_AUTOFS is unset, the autofs check may be
skipped.
To fix the issue, rather than swapping the flag, we should introduce
another flag, say CHASE_TRIGGER_AUTOFS. This reverts the commit, and the
new flag will be introduced in a later commit.
Since c5de7b14ae
file searching implies a new mount api syscall by default,
to trigger automounts.
This is problematic in NSS plugins, as they are dlopen'ed inside
processes by glibc, for two reasons.
First of all, potentially searching on a networked filesystem
automount could lead to nasty surprises, such as the process
responsible for setting up the network filesystem trying to
search on that same filesystem.
More importantly, the new mount api syscall was never part of
the filesystem seccomp filter that we provide by default, and
given that mounting/remounting/bind mounting is one of the possible
ways to bypass sandboxing, it is very likely not allowed when
custom filters are used in sandboxed processes, if they don't
need to do these operations otherwise.
The filesystem seccomp mask we provide has been updated, however
this only takes effect on the next restart of a service. When
systemd is upgraded via a package upgrade, the new nss plugin is
installed and will be immediately dlopen'ed by glibc when needed,
without waiting for the process to restart, which means the existing
seccomp filter applies, causing the filter to trigger.
Given it's not really possible for any arbitrary program to
predict which NSS modules glibc will load, as programs do not
configure that and nsswitch is instead set up by the sysadmin,
this is impossible to handle at the level of each process. It's also
not possible to know when it will be triggered. As the plugin
is not linked into each binary, tools like need-restart cannot
even pre-emptively restart services that may be affected.
This means that in practice, upgrading from systemd < v258 to >= v258
requires a reboot to avoid either subtle or catastrophic system
failures.
By not triggering automounts in nss-systemd we can avoid
both issues.
userdb drop-ins are searched for in:
/etc/userdb/
/run/userdb/
/run/host/userdb/
/usr/local/lib/userdb/
/usr/lib/userdb/
none of which are supported as automounts anyway.
Note that this happens only when the userdbd service is not running,
as otherwise nss-systemd will go through the varlink IPC, rather than
doing the searches in-process.
So invert CHASE_NO_AUTOFS to CHASE_AUTOFS and set it in the places where
we do want to trigger automounts, like looking for the ESP.
Follow-up for c5de7b14ae
Fixes https://github.com/systemd/systemd/issues/38565
Fixes the following error message (the last line):
```
[FAILED] Failed to start TEST-60-MOUNT-RATELIMIT.service.
Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
All filesystems, swaps, loop devices, MD devices and DM devices detached.
Exiting container.
Failed to read from pty input fd: Bad file descriptor
```
Follow-up for b823809bca and
cf89e48028.
Then, also make the nss modules parse the $SYSTEMD_ASSERT_RETURN_IS_CRITICAL
environment variable.
This also moves nss-util.c and nss-util.h from src/basic/ to src/shared/,
as they are not used by libsystemd.
gethostname_full() is used in nss-myhostname, and hence arbitrary
applications may indirectly call it. When an application with a too strict
seccomp filter loads the nss module, the application may trigger the
assertion.
Partially fixes #38582.
Esys_Free() is typically just a simple wrapper around free(), but for
safety let's do this unconditionally. See the comment in the code.
While at it, this makes it store the result in a struct iovec.
../src/shared/seccomp-util.c: In function ‘seccomp_restrict_sxid’:
../src/shared/seccomp-util.c:2228:25: error: ‘__NR_fchmodat2’ undeclared (first use in this function); did you mean ‘fchmodat2’?
2228 | __NR_fchmodat2,
| ^~~~~~~~~~~~~~
| fchmodat2
The override/sys/syscalls.h header needs to be included before the seccomp
headers. Otherwise the internal seccomp preprocessor machinery will
not see the local definitions, so the local ifdef will be true but
seccomp's own definitions will be empty.
It is not easy to understand what happens to a journal file
even with debug logs enabled. Add more debug messages around operations
started by users to make it possible to follow the flow of operations.
drained() checks the PTYForward.master_readable flag, but it may be
temporarily unset due to a transient error like EAGAIN in the previous
IO event. Let's try to call shovel() one more time, which re-reads the
master and calls drained() at the end. Otherwise, we may lose some data.
When PTYForward.done is set, the PTYForward.master is already
disconnected. Let's not try to read the already closed file descriptor.
Also, if we previously received vhangup, then it is not necessary to
re-read the device to check vhangup, as we already know.
This also slightly delays the check by using a defer event source,
so that the function can be called safely from another event source.
Currently, pty_forward_set_ignore_vhangup() is only used for disabling
the flag. To make the function also disable the
PTY_FORWARD_IGNORE_INITIAL_VHANGUP flag, this renames it to
pty_forward_honor_vhangup().
Also, for consistency, pty_forward_get_ignore_vhangup() and
ignore_vhangup() are replaced with pty_forward_vhangup_honored().
Previously, do_shovel() sometimes called pty_forward_done(), and
its caller shovel() also called pty_forward_done(). Let's move all
pty_forward_done() calls to shovel(), so that do_shovel() no longer calls it.
No functional change, just refactoring.
We had errno_to_name(), which works for "known" errnos and returns NULL for
unknown ones, and ERRNO_NAME, which always returns an answer, possibly just
a number as a string, but requires a helper buffer.
It is possible for the kernel to add a new errno. We recently learned that some
architectures define custom errno names. Or some function may unexpectedly
return a bogus errno value. In almost all cases it's better to print that value
rather than "n/a" or "(null)". So let's use ERRNO_NAME in most error handling
code. Notably, our code wasn't very good at handling the potential NULL, so
in various places we could print "(null)". Since this is supposed to be used
most of the time, let's shorten the names to ERRNO_NAME/errno_name.
There are a few places where we don't want to use the fallback path, in
particular for D-Bus error names or when saving the error name. Let's rename
errno_to_name() to errno_name_no_fallback() to make the distinction clearer.
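
The distinction between the two lookups can be sketched like this. The
functions below are simplified stand-ins with hypothetical names and a tiny
illustrative table, not the actual implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Illustrative subset only; a real table would cover every known errno.
 * Returns NULL for unknown values, so callers must handle that case. */
const char *errno_name_no_fallback_sketch(int id) {
        switch (id) {
        case ENOENT: return "ENOENT";
        case EEXIST: return "EEXIST";
        case EAGAIN: return "EAGAIN";
        default:     return NULL;
        }
}

/* Always returns a printable string: the symbolic name if known, otherwise
 * the number formatted into the caller-provided helper buffer. */
const char *errno_name_sketch(int id, char *buf, size_t size) {
        const char *name;

        assert(buf);
        assert(size > 0);

        name = errno_name_no_fallback_sketch(id);
        if (name)
                return name;

        snprintf(buf, size, "%d", id);
        return buf;
}
```

With the fallback variant, an unknown value such as 99999 is printed as
"99999" instead of ending up as "(null)" in a log message.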
The usual pattern of using colors to distinguish the mount path (/efi/)
and the rest is used. If the file cannot be read for reasons other than
-ENOENT, the error message is highlighted.
I considered a few places where to add this, but this section seems the
most reasonable. We already print the 'token' there, which is also part of
the configuration.
Boot Loader Entry Locations:
ESP: /efi (/dev/disk/by-partuuid/31659406-5a17-46ec-8195-0dea1667db58)
config: /efi//loader/loader.conf
XBOOTLDR: /boot (/dev/disk/by-partuuid/4f8a8fe9-4b45-4070-9e9b-a681be51c902, $BOOT)
token: fedora
Fixes the following error when running with sanitizers:
```
TEST-87-AUX-UTILS-VM.sh[670]: + bootctl install --make-entry-directory=yes
TEST-87-AUX-UTILS-VM.sh[695]: Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi.signed" to "/boot/EFI/systemd/systemd-bootx64.efi".
TEST-87-AUX-UTILS-VM.sh[695]: Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi.signed" to "/boot/EFI/BOOT/BOOTX64.EFI".
TEST-87-AUX-UTILS-VM.sh[695]: Created "/boot/fedora".
TEST-87-AUX-UTILS-VM.sh[695]: Random seed file /boot/loader/random-seed successfully refreshed (32 bytes).
TEST-87-AUX-UTILS-VM.sh[695]: ../src/shared/efi-api.c:618:38: runtime error: left shift of 243 by 24 places cannot be represented in type 'int'
```
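The class of bug can be illustrated as follows. The exact expression in
efi-api.c is not reproduced here; read_le32_sketch() is a hypothetical helper
that only shows why the operand has to be widened before shifting:

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value. Each byte is converted to uint32_t
 * before shifting: a plain uint8_t operand would be promoted to int, and
 * shifting e.g. 243 left by 24 places cannot be represented in a 32-bit
 * signed int, which is undefined behavior. */
uint32_t read_le32_sketch(const uint8_t p[4]) {
        assert(p);

        return  (uint32_t) p[0]        |
               ((uint32_t) p[1] <<  8) |
               ((uint32_t) p[2] << 16) |
               ((uint32_t) p[3] << 24);
}
```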
cpu_set_add_range() is used in parse_cpu_set(), hence it is already tested.
But it is better to test these functions explicitly.
For CID#1611787 and CID#1611788, which should be false positives.