Commit Graph

42880 Commits

Author SHA1 Message Date
Daan De Meyer
750d9859c1 sulogin-shell: Start initrd.target on exit in the initrd
sulogin is documented to continue booting up on exit. To do that
in the initrd, we need to start initrd.target and not default.target.
2023-04-21 16:46:06 +02:00
Lennart Poettering
4560d99e5e tre-wide: use FORMAT_DEVNUM() a bit more 2023-04-21 12:45:49 +02:00
Lennart Poettering
67458536af tree-wide: convert more cases do DEVNUM_FORMAT_STR()/DEVNUM_FORMAT_VAL()
Let's use our nice macros a bit more.

(Not comprehensive)
2023-04-21 12:41:15 +02:00
Luca Boccassi
21453b8b4b Merge pull request #27349 from mrc0mmand/codespell
tree-wide: code spelling fixes
2023-04-20 22:02:17 +01:00
Frantisek Sumsal
94d82b5980 tree-wide: code spelling fixes
As reported by Fossies.
2023-04-20 21:54:59 +02:00
Zbigniew Jędrzejewski-Szmek
08c2f9c626 detect-virt: add message at debug level
Normal users do not have permissions to access /proc/1/root, so
'systemd-detect-virt -r' fails, but the output, even at debug level
is cryptic:

$ SYSTEMD_LOG_LEVEL=debug build/systemd-detect-virt -r
Failed to check for chroot() environment: Permission denied

Let's make this a bit easier to figure out:

$ SYSTEMD_LOG_LEVEL=debug build/systemd-detect-virt -r
Cannot stat /proc/1/root: Permission denied
Failed to check for chroot() environment: Permission denied

I looked over other users of files_same(), and I think in general the message
at debug level is OK for them too.
2023-04-21 03:20:24 +08:00
Gustavo Noronha Silva
6b8e90545e Apply known iocost solutions to block devices
Meta's resource control demo project[0] includes a benchmark tool that can
be used to calculate the best iocost solutions for a given SSD.

  [0]: https://github.com/facebookexperimental/resctl-demo

A project[1] has now been started to create a publicly available database
of results that can be used to apply them automatically.

  [1]: https://github.com/iocost-benchmark/iocost-benchmarks

This change adds a new tool that gets triggered by a udev rule for any
block device and queries the hwdb for known solutions. The format for
the hwdb file that is currently generated by the github action looks like
this:

  # This file was auto-generated on Tue, 23 Aug 2022 13:03:57 +0000.
  # From the following commit:
  # ca82acfe93
  #
  # Match key format:
  # block:<devpath>:name:<model name>:

  # 12 points, MOF=[1.346,1.346], aMOF=[1.249,1.249]
  block:*:name:HFS256GD9TNG-62A0A:fwver:*:
    IOCOST_SOLUTIONS=isolation isolated-bandwidth bandwidth naive
    IOCOST_MODEL_ISOLATION=rbps=1091439492 rseqiops=52286 rrandiops=63784 wbps=192329466 wseqiops=12309 wrandiops=16119
    IOCOST_QOS_ISOLATION=rpct=0.00 rlat=8807 wpct=0.00 wlat=59023 min=100.00 max=100.00
    IOCOST_MODEL_ISOLATED_BANDWIDTH=rbps=1091439492 rseqiops=52286 rrandiops=63784 wbps=192329466 wseqiops=12309 wrandiops=16119
    IOCOST_QOS_ISOLATED_BANDWIDTH=rpct=0.00 rlat=8807 wpct=0.00 wlat=59023 min=100.00 max=100.00
    IOCOST_MODEL_BANDWIDTH=rbps=1091439492 rseqiops=52286 rrandiops=63784 wbps=192329466 wseqiops=12309 wrandiops=16119
    IOCOST_QOS_BANDWIDTH=rpct=0.00 rlat=8807 wpct=0.00 wlat=59023 min=100.00 max=100.00
    IOCOST_MODEL_NAIVE=rbps=1091439492 rseqiops=52286 rrandiops=63784 wbps=192329466 wseqiops=12309 wrandiops=16119
    IOCOST_QOS_NAIVE=rpct=99.00 rlat=8807 wpct=99.00 wlat=59023 min=75.00 max=100.00

The IOCOST_SOLUTIONS key lists the solutions available for that device
in the preferred order for higher isolation, which is a reasonable
default for most client systems. This can be overriden to choose better
defaults for custom use cases, like the various data center workloads.

The tool can also be used to query the known solutions for a specific
device or to apply a non-default solution (say, isolation or bandwidth).

Co-authored-by: Santosh Mahto <santosh.mahto@collabora.com>
2023-04-20 16:45:57 +02:00
Lennart Poettering
18010d394b Merge pull request #27327 from DaanDeMeyer/hotplug
kmod-setup: Add early loading for virtio_console
2023-04-20 16:34:12 +02:00
Daan De Meyer
a93aaede29 kmod-setup: Add early loading for virtio_console
getty-generator enables serial-getty@.service for virtualizer consoles
that it can find in /sys/class/tty. To make sure this works for
virtio consoles, let's make sure we load the module is loaded early
so that the /sys/class/tty/hvc0 exists before we run getty-generator.
2023-04-20 13:43:37 +02:00
Daan De Meyer
d2f57745d5 core: Parse logging environment earlier
Let's make sure we parse the logging environment ASAP so that the
options apply to more code. e.g. to allow debugging kmod-setup.c
for example.
2023-04-20 13:43:37 +02:00
Daan De Meyer
e1d8f702a2 kmod-setup: Introduce match_modalias_recurse_dir_cb()
Let's make the logic around matching a modalias a bit more generic.
2023-04-20 13:43:37 +02:00
Daan De Meyer
70cc7ed97e string-util: Add startswith_strv()
This is the function version of STARTSWITH_SET(). We also move
STARTSWITH_SET() to string-util.h as it fits more there than in
strv.h and reimplement it using startswith_strv().
2023-04-20 13:43:37 +02:00
Daan De Meyer
3fe07e9525 log: Log when kmsg is being ratelimited
Let's avoid confusing developers and users when log messages suddenly
stop getting logged to kmsg because of ratelimiting by logging an
additional message if we start ratelimiting log messages to kmsg.
2023-04-20 13:43:36 +02:00
Daan De Meyer
8750a06b6c log: Add knob to disable kmsg ratelimiting
This allows us to disable kmsg ratelimiting in the integration tests
and mkosi for easier debugging.
2023-04-20 13:43:34 +02:00
Lennart Poettering
14ce246771 dissect: let's check for crypto_LUKS before fstype allowlist check
When trying to mount a partition that is encrypted without the
encryption first having been set up we want to return a
recognizable error (EUNATCH). This was broken by
80ce8580f5 which added an allowlist check
for permissible file systems first. Let's reverse the check order, so
that we get EUNATCH again, as before. (And leave EIDRM as error for the
failed allowlist check).
2023-04-20 13:39:28 +02:00
Lennart Poettering
ed6a6bac45 ratelimit: handle counter overflows somewhat sanely
An overflow here (i.e. the counter reaching 2^32 within a ratelimit time
window) is not so unlikely. Let's handle this somewhat sanely
and simply stop counting, while remaining in the "limit is hit" state until
the time window has passed.
2023-04-20 13:39:06 +02:00
Lennart Poettering
4d49f44f0f dissect-image: issue BLKFLSBUF before probing an fs at block device offset != 0
See added code comment for a longer explanation. TLDR: Linux maintains
distinct block device caches for partition and "whole" block devices,
and a simply BLKFLSBUF should make the worst confusions this causes go
away.
2023-04-20 13:38:32 +02:00
Robert Meijers
4646cdaa37 networkd: fallback to chaddr for static lease lookup when not found
DHCP static leases are looked up by the client identifier as send by
the client, while configured based on MAC. As RFC 2131 states the client
identifier is an opaque key and must not be interpreted by the server
this means that DHCP clients can (/will) also use a client identifier
which is not a MAC address. One of these clients actually is
systemd-networkd which uses an RFC 4361 by default to generate the
client identifier. For these kind of DHCP clients static leases thus
don't work because of this mismatch between configuring a MAC address
but the server matching based on client identifier. This adds a fallback
to try to look up a configured static lease based on the "chaddr" of the
DHCP message as this will always contain the MAC address of the client.

Fixes #21368
2023-04-20 19:18:50 +09:00
Yu Watanabe
114e85d28e core/device: rewrite how device unit is removed from Manager.devices_by_sysfs
If the device unit is not the head of the list saved in
Manager.devices_by_sysfs, then it is not necessary to replace the
existing hashmap entry. This should not change any behavior, just
refactoring.
2023-04-20 09:22:25 +02:00
Yu Watanabe
24a5370bbc list: fix double evaluation 2023-04-20 09:20:08 +02:00
Daan De Meyer
59e4eeed78 Merge pull request #27299 from yuwata/chase-absolute
chase: return absolute path when dir_fd points to the root directory
2023-04-20 09:19:22 +02:00
Yu Watanabe
cb3c6aec3a core: add one missing assertion for release_resource_queue
Follow-up for 6ac62d61db.
2023-04-19 21:12:08 +01:00
Quintin Hill
0214ead6ee dissect-image: fix log level in dissect_log_error
Actually use the log_level argument in this function!

Fixes 4953e39
2023-04-20 02:04:15 +08:00
Yu Watanabe
60e761d8f3 chase: replace path_prefix_root_cwd() with chaseat_prefix_root()
The function path_prefix_root_cwd() was introduced for prefixing the
result from chaseat() with root, but
- it is named slightly generic,
- the logic is different from what chase() does.

This makes the name more explanative and specific for the result of the
chaseat(), and make the logic consistent with chase().

Fixes https://github.com/systemd/systemd/pull/27199#issuecomment-1511387731.

Follow-up for #27199.
2023-04-19 03:38:59 +09:00
Yu Watanabe
8d3c49b168 fd-util: skip to check mount ID if kernel is too old and /proc is not mounted
Now, dir_fd_is_root() is heavily used in chaseat(), which is used at
various places. If the kernel is too old and /proc is not mounted, then
there is no way to get the mount ID of a directory. In that case, let's
silently skip the mount ID check.

Fixes https://github.com/systemd/systemd/pull/27299#issuecomment-1511403680.
2023-04-19 03:38:47 +09:00
Yu Watanabe
4b1e461c49 mountpoint-util: check /proc is mounted on failure 2023-04-19 03:28:34 +09:00
Yu Watanabe
9a0dcf03fa chase: prefix with the root directory only when it is not "/" 2023-04-19 03:28:34 +09:00
Yu Watanabe
237bf933de chase: drop repeated call of empty_to_root() 2023-04-19 03:28:34 +09:00
Yu Watanabe
b3ef56bc8e chase: update outdated comment about result path 2023-04-19 03:28:34 +09:00
Yu Watanabe
24be89ebd8 chase: make the result absolute when a symlink is absolute
As the path may be outside of the specified dir_fd.
2023-04-19 03:28:34 +09:00
Yu Watanabe
c0552b359c chase: make chaseat() provides absolute path also when dir_fd points to the root directory
Usually, we pass the file descriptor of the root directory to chaseat()
when `--root=` is not specified. Previously, even in such case, the
result was relative, and we need to prefix the path with "/" when we
want to pass the path to other functions that do not support dir_fd, or
log or show the path. That's inconvenient.
2023-04-19 03:28:34 +09:00
Mike Yuan
d81fc15254 Merge pull request #27323 from keszybz/gpt-auto-generator-warning-cleanup
gpt-auto-generator: do not error out when no partitions are found
2023-04-19 02:06:06 +08:00
Zbigniew Jędrzejewski-Szmek
4953e39c70 gpt-auto-generator: "translate" errno codes into proper messages
E.g. in logs on jammy-ppc64el in https://github.com/systemd/systemd/pull/27294:
Apr 16 17:42:50 H systemd-gpt-auto-generator[300]: Failed to dissect partition table of block device /dev/sda: No message of desired type
Apr 16 17:42:50 H (sd-execu[295]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.

ee0e6e476e made this particular condition not an
error. But for other errnos we want to print a better message too.
dissect_loop_device_and_warn() already does this, but it always prints the
error at error level. We want to suppress some of the errors, so let's make the
print helper public and do the error suppression in the caller.
2023-04-18 11:58:33 +02:00
Zbigniew Jędrzejewski-Szmek
de47cd0610 fstab-generator: add missing phrase in comment 2023-04-18 11:55:03 +02:00
Lennart Poettering
0a5d3c0b5b kmod-setup: bypass heavy virtio-rng check if we are not running in a VM anyway
detect_vm() is cheap, because cached, let's hence do that early before
we get out the big guns and sweep through sysfs.
2023-04-18 10:52:04 +02:00
Lennart Poettering
fa505db314 kmod-setup: use STARTSWITH_SET() where appropriate 2023-04-18 10:51:00 +02:00
Lennart Poettering
ff707dd1b1 Revert "getty-generator: Use device hotplug to instantiate virtualizer consoles"
This reverts commit e7e6ce5f8d.
2023-04-18 10:38:38 +02:00
Lennart Poettering
766c30a3b5 Merge pull request #27256 from medhefgo/boot-rdtsc
boot: Improve timer frequency detection
2023-04-18 10:38:15 +02:00
Yu Watanabe
ee0e6e476e gpt-auto: do not fail when no suitable partitions found
Follow-up for 598fd4da1c.
2023-04-18 17:37:56 +09:00
Daan De Meyer
e7e6ce5f8d getty-generator: Use device hotplug to instantiate virtualizer consoles
If getty-generator runs in the initrd, the corresponding tty might not
have been instantiated yet in /dev, which means a serial getty is not
spawned on it. Instead, let's instantiate the serial-getty when the
device appears so that it always gets instantiated.
2023-04-18 09:35:14 +02:00
Lennart Poettering
b3a062cb80 lsm-util: move detection of support of LSMs into a new lsm-util.[ch] helper
This makes the bpf LSM check generic, so that we can use it elsewhere.
it also drops the caching inside it, given that bpf-lsm code in PID1
will cache it a second time a stack frame further up when it checks for
various other bpf functionality.
2023-04-18 08:22:21 +02:00
Dominique Martinet
25d9c6cdaf bpf-firewall: give a name to maps used
Running systemd with IP accounting enabled generates many bpf maps (two
per unit for accounting, another two if IPAddressAllow/Deny are used).

Systemd itself knows which maps belong to what unit and commands like
`systemctl status <unit>` can be used to query what service has which
map, but monitoring these values all the time costs 4 dbus requests
(calling the .IP{E,I}gress{Bytes,Packets} method for each unit) and
makes services like the prometheus systemd_exporter[1] somewhat slow
when doing that for every units, while less precise information could
quickly be obtained by looking directly at the maps.

Unfortunately, bpf map names are rather limited:
- only 15 characters in length (16, but last byte must be 0)
- only allows isalnum(), _ and . characters

If it wasn't for the length limit we could use the normal unit escape
functions but I've opted to just make any forbidden character into
underscores for maximum brievty -- the map prefix is also rather short:
This isn't meant as a precise mapping, but as a hint for admins who want
to look at these.

(Note there is no problem if multiple maps have the same name)

Link: https://github.com/povilasv/systemd_exporter [1]
2023-04-18 08:23:55 +09:00
Lennart Poettering
38cdd08b22 process-util: be more careful with pidfd_get_pid() special cases
Let's be more careful with generating error codes for (expected) error
causes.

This does not introduce new error conditions, it just changes what we
return under specific cases, to make things nicely recognizable in each
case. Most importantly this detects if fdinfo reports a pid of "-1" for
pidfds with processes that are already reaped (and thus have no PID
anymore)

None of our current users care about these error codes, but let's get
this right for the future.
2023-04-17 21:38:41 +01:00
Florian Klink
360c9cdc65 fsck: use execv_p_ and execl_p_
Instead of invoking find_executable on our own, use the variants of exec
provided by glibc which does this for us.
2023-04-17 19:56:06 +01:00
Luca Boccassi
c9210b7470 creds: make available to all ExecStartPre= and ExecStart= processes
Fixes https://github.com/systemd/systemd/issues/27275
2023-04-17 17:47:28 +01:00
jcg
1034dfd0d8 user-util:remove duplicate includes 2023-04-17 23:58:04 +08:00
Benjamin Herrenschmidt
aab896e213 virt: Further improve detection of EC2 metal instances
Commit f90eea7d18
virt: Improve detection of EC2 metal instances

Added support for detecting EC2 metal instances via the product
name in DMI by testing for the ".metal" suffix.

Unfortunately this doesn't cover all cases, as there are going to be
instance types where ".metal" is not a suffix (ie, .metal-16xl,
.metal-32xl, ...)

This modifies the logic to also allow those new forms.

Signed-off-by: Benjamin Herrenschmidt <benh@amazon.com>
2023-04-17 13:21:11 +01:00
Luca Boccassi
ad7793b59c Merge pull request #27298 from mrc0mmand/test-async-tweaks
test: modernize test-async a bit
2023-04-16 23:32:33 +01:00
Yu Watanabe
2cd04086ee process-util: make safe_fork() unset $NOTIFY_SOCKET
Propagating $NOTIFY_SOCKET is typically dangerous. Let's unset it unless
explicitly requested to keep it.

Fixes #27288.
Replaces #27291.
2023-04-17 05:46:32 +08:00
Frantisek Sumsal
3d9c3b7e89 test: modernize test-async a bit
Mainly to give it some debug output to, hopefully, see why it sometimes
gets stuck in CI when run with sanitizers.
2023-04-16 20:30:58 +02:00