Detection of VirtualBox is accomplished in the existing code by *either* `innotek GmbH`
or `Oracle Corporation` existing in any of:
- /sys/class/dmi/id/product_name
- /sys/class/dmi/id/sys_vendor
- /sys/class/dmi/id/board_vendor
- /sys/class/dmi/id/bios_vendor
With Oracle's physical servers, both `/sys/class/dmi/id/sys_vendor` and
`/sys/class/dmi/id/board_vendor` contain `Oracle Corporation`, so those
servers are detected as `oracle` (VirtualBox).
VirtualBox has the following values in the latest versions:
- /sys/class/dmi/id/product_name: `VirtualBox`
- /sys/class/dmi/id/sys_vendor: `innotek GmbH`
- /sys/class/dmi/id/board_vendor: `Oracle Corporation`
- /sys/class/dmi/id/bios_vendor: `innotek GmbH`
Presumably the existing check for `innotek GmbH` is meant to detect
older versions of VirtualBox, while changing the second checked value
from `Oracle Corporation` to `VirtualBox` will reliably detect later and future
versions.
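A minimal sketch of the proposed check, assuming the DMI strings have already been read from sysfs and that a prefix match is what is wanted. The helper names are hypothetical and not taken from the actual code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if a DMI string starts with a VirtualBox marker.
 * "innotek GmbH" covers older releases, "VirtualBox" covers later ones.
 * Prefix matching is an assumption of this sketch. */
static bool dmi_value_is_virtualbox(const char *value) {
        static const char *const markers[] = { "innotek GmbH", "VirtualBox" };

        for (size_t i = 0; i < sizeof(markers) / sizeof(markers[0]); i++)
                if (strncmp(value, markers[i], strlen(markers[i])) == 0)
                        return true;

        return false;
}

/* Hypothetical helper: check the four DMI fields listed above. */
static bool dmi_looks_like_virtualbox(const char *product_name, const char *sys_vendor,
                                      const char *board_vendor, const char *bios_vendor) {
        const char *fields[] = { product_name, sys_vendor, board_vendor, bios_vendor };

        for (size_t i = 0; i < sizeof(fields) / sizeof(fields[0]); i++)
                if (fields[i] && dmi_value_is_virtualbox(fields[i]))
                        return true;

        return false;
}
```

With this, a machine that only reports `Oracle Corporation` in its vendor fields is no longer matched, while a guest reporting `VirtualBox` as product name still is.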
Apparently memory sanitizer doesn't grok getdents64() properly. Let's
address that by explicitly marking memory initialized by getdents64() as
unpoisoned.
That way we have a single syscall only for it, instead of the multiple
syscalls readdir() and friends need. And we can operate entirely on the
stack, with no implicit malloc().
We already have a similar loop twice, let's make it easier to read via
an iteration macro.
(The new macro is a bit more careful even, as it verifies the full
dirent fits into the remaining buffer when returning it)
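For illustration, a rough sketch of such a loop: a stack buffer, one getdents64() call per batch, and an iteration macro that only hands out records fitting into the remaining bytes. The macro and function names are made up for this example, and the raw syscall is used so that it does not depend on the glibc wrapper:

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the record at p, or NULL if a full record does not fit into the
 * remaining bytes of the buffer. */
static struct dirent64 *dirent_in_buffer(uint8_t *p, size_t remaining) {
        const size_t min_size = offsetof(struct dirent64, d_name) + 1;
        struct dirent64 *de = (struct dirent64 *) p;

        if (remaining < min_size)
                return NULL;
        if (de->d_reclen < min_size || de->d_reclen > remaining)
                return NULL;

        return de;
}

/* Hypothetical iteration macro; note that sz is evaluated more than once. */
#define FOREACH_DIRENT_IN_BUFFER(de, buf, sz)                                              \
        for (size_t _left = (sz);                                                          \
             ((de) = dirent_in_buffer((uint8_t *) (buf) + ((sz) - _left), _left)) != NULL; \
             _left -= (de)->d_reclen)

/* Counts the entries of a directory (including "." and ".." if reported).
 * Returns a negative errno-style value on failure. */
static int count_dir_entries(const char *path) {
        union {
                struct dirent64 de;   /* only here to force suitable alignment */
                uint8_t raw[4096];
        } buf;
        int fd, n = 0;

        /* O_DIRECTORY makes this fail right away if path is not a directory */
        fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (fd < 0)
                return -errno;

        for (;;) {
                struct dirent64 *de;
                long r = syscall(SYS_getdents64, fd, &buf, sizeof(buf));

                if (r < 0) {
                        int e = errno;
                        close(fd);
                        return -e;
                }
                if (r == 0)   /* end of directory */
                        break;

                FOREACH_DIRENT_IN_BUFFER(de, buf.raw, (size_t) r)
                        n++;
        }

        close(fd);
        return n;
}
```

No heap allocation is involved; the only state is the fd and the stack buffer.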
That way we can fail earlier if the specified fd is not actually a
directory.
(Also, it's not exactly according to standards to open things without
either O_RDONLY/O_RDWR...)
In order to minimize EFI variable NVRAM wear, do not rewrite variables
if they are already in the wanted state (i.e. same data and attributes).
This allows e.g. performing repeat calls of "bootctl install" (which
always rewrites the EFI boot entry) without consuming EFI NVRAM write
cycles.
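The idea, sketched against a toy in-memory variable rather than the real EFI variable interface (the struct and function below are stand-ins invented for this example):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy in-memory stand-in for one EFI variable; not the real efivarfs API. */
struct fake_efi_var {
        bool exists;
        uint32_t attr;
        uint8_t data[64];
        size_t size;
        unsigned n_writes;   /* counts simulated NVRAM writes */
};

/* Returns 1 if the variable was written, 0 if the write was skipped because
 * data and attributes already match, -1 if the payload does not fit. */
static int fake_efi_set_variable(struct fake_efi_var *v, uint32_t attr,
                                 const void *data, size_t size) {
        if (size > sizeof(v->data))
                return -1;

        /* Already in the wanted state? Then do not touch the "NVRAM". */
        if (v->exists && v->attr == attr && v->size == size &&
            (size == 0 || memcmp(v->data, data, size) == 0))
                return 0;

        if (size > 0)
                memcpy(v->data, data, size);
        v->size = size;
        v->attr = attr;
        v->exists = true;
        v->n_writes++;
        return 1;
}
```

Calling the setter twice with identical arguments costs one write, not two.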
We are using this for creating user namespaces, and we really
shouldn't try to sync there. Moreover the use of freeze() in shutdown code
doesn't need it anyway, since it just sync()ed right before. Only
the third user of freeze() we have actually needs the sync(), hence do it
there and nowhere else.
This returns a namespace fd, and takes a uidmap/gidmap as string. This
is split out of mount-util.c's remount_idmap() logic, so that we can
allocate a userns independently.
This adds a tiny shortcut to fd_reopen(): if we are about to reopen the
fd via O_DIRECTORY then we know it's a directory and we might as well
reopen it via opening "." using the fd as "at fd" in openat().
This has the benefit that we don't need /proc/self/fd/ around for this
special case: fewer sources of errors.
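Roughly, the shortcut looks like this. This is a simplified sketch, not the actual fd_reopen() implementation; the function name and the fixed-size path buffer are choices made for the example:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch only: reopen fd with new flags; returns a new fd or -errno. */
static int fd_reopen_sketch(int fd, int flags) {
        char path[64];
        int new_fd;

        if ((flags & O_DIRECTORY) == O_DIRECTORY) {
                /* The caller insists on a directory, hence "." relative to
                 * fd refers to the same inode. No /proc/ needed. */
                new_fd = openat(fd, ".", flags);
                return new_fd < 0 ? -errno : new_fd;
        }

        /* General case: go through the /proc/self/fd/ symlink. */
        snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
        new_fd = open(path, flags);
        return new_fd < 0 ? -errno : new_fd;
}
```

If the fd turns out not to be a directory, openat() fails with ENOTDIR, which is the same outcome O_DIRECTORY would give on the /proc/ path.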
Let's define two helpers strdupa_safe() + strndupa_safe() which do the
same as their non-safe counterparts, except that they abort if called
with allocations larger than ALLOCA_MAX.
This should ensure that all our alloca() based allocations are subject
to this limit.
afaics glibc offers three alloca() based APIs: alloca() itself,
strndupa() + strdupa(). With this we have now replacements for all of
them, that take the limit into account.
This is like alloca(), but does two things:
1. Verifies the allocation is smaller than ALLOCA_MAX
2. Ensures we allocate at least one byte
This was previously done manually in all invocations. This adds a handy
helper that does that implicitly.
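A sketch of how such helpers can be put together. The ALLOCA_MAX value is an assumption made for this example, plain assert() stands in for whatever abort mechanism the real code uses, and GNU statement expressions (GCC or Clang) are required:

```c
#include <alloca.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed limit for this sketch; the text does not state the actual value. */
#define ALLOCA_MAX (4U * 1024U * 1024U)

/* Like alloca(), but aborts on oversized requests and never asks for 0 bytes.
 * Note: assert() is compiled out under NDEBUG, real code would not want that. */
#define alloca_safe(n)                                  \
        ({                                              \
                size_t _nn_ = (n);                      \
                assert(_nn_ <= ALLOCA_MAX);             \
                alloca(_nn_ > 0 ? _nn_ : 1);            \
        })

/* strdupa() counterpart that goes through alloca_safe(). */
#define strdupa_safe(s)                                 \
        ({                                              \
                const char *_t_ = (s);                  \
                size_t _l_ = strlen(_t_) + 1;           \
                char *_q_ = alloca_safe(_l_);           \
                (char *) memcpy(_q_, _t_, _l_);         \
        })

/* Demo: the copy lives in this function's stack frame and must not escape it. */
static bool stack_copy_matches(const char *s) {
        char *copy = strdupa_safe(s);

        return copy != s && strcmp(copy, s) == 0;
}
```

A strndupa_safe() variant would follow the same pattern, with the length clamped to the given maximum before allocating.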
This is a debugging feature. It's sometimes incredibly useful to be able
to run a second instance of homed that operates on another dir than
/home/.
Specifically, if you build homed from the source tree you can now run an
instance of it pretty reasonably directly from the build tree via:
sudo SYSTEMD_HOME_DEBUG_SUFFIX=foo SYSTEMD_HOMEWORK_PATH=$(pwd)/build/systemd-homework SYSTEMD_HOME_ROOT=/home/foo ./build/systemd-homed
And then talk to it via
sudo SYSTEMD_HOME_DEBUG_SUFFIX=foo homectl …
(you might need to tweak your dbus policy for this to work fully though)
Currently, when Xen PV domains are nested within a hypervisor which is
detected through CPUID (such as VMware), the detected hypervisor might
not be Xen, because we don't check for Xen until after the CPUID check.
This change moves the Xen check before CPUID checks to fix the issue,
and moves Dom0 checking to detect_vm_xen so that we keep ignoring Xen
when we are in Dom0.
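The resulting order of checks, reduced to a toy model. The probe struct and enum are invented for this sketch and replace the real sysfs/procfs/CPUID probing:

```c
#include <stdbool.h>

enum vm_kind { VM_NONE, VM_XEN, VM_VMWARE, VM_OTHER };

/* Hypothetical, pre-collected probe results. */
struct vm_probe {
        bool xen_present;           /* Xen indicators found */
        bool xen_dom0;              /* we are the privileged Dom0 */
        enum vm_kind cpuid_result;  /* what the CPUID check would report */
};

/* The Dom0 check lives inside the Xen check, so that Dom0 keeps being ignored. */
static enum vm_kind detect_vm_xen_sketch(const struct vm_probe *p) {
        if (!p->xen_present || p->xen_dom0)
                return VM_NONE;

        return VM_XEN;
}

static enum vm_kind detect_vm_sketch(const struct vm_probe *p) {
        /* Xen first, so that an outer hypervisor seen via CPUID cannot mask it. */
        enum vm_kind v = detect_vm_xen_sketch(p);
        if (v != VM_NONE)
                return v;

        return p->cpuid_result;
}
```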
Let's use the underlying Linux API directly, instead of
opendir()/readdir(). This makes it possible for us to do a single memory
allocation for all directory entries in common cases, instead of one for
each entry.
glibc 2.30 (Aug 2019) added a wrapper for getdents64(). For older
versions let's define our own.
(This syscall exists since Linux 2.4, hence should be safe to use for
us)
ANSI C reserves identifiers beginning with an underscore for compiler
internal stuff. We already invade that namespace plenty and probably
should not. But even going for the doubly underscore prefixed namespace
is a bit too much. Let's just rename the offending table as
"static_signal_table[]", since it lists the statically defined signals
rather than the "dynamic" SIGRTMIN/SIGRTMAX signals.
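To illustrate the static/dynamic split, here is a cut-down sketch. The table holds only a few entries and the lookup function is hypothetical, it is not the actual conversion code:

```c
#include <signal.h>
#include <stddef.h>
#include <stdio.h>

/* A few statically known signals; the real table would list many more. */
static const char *const static_signal_table[] = {
        [SIGHUP]  = "HUP",
        [SIGINT]  = "INT",
        [SIGKILL] = "KILL",
        [SIGTERM] = "TERM",
};

/* Hypothetical lookup: static names come from the table, realtime signals
 * are formatted relative to SIGRTMIN, which is only known at runtime. */
static const char *signal_to_string_sketch(int signo, char *buf, size_t size) {
        const size_t n = sizeof(static_signal_table) / sizeof(static_signal_table[0]);

        if (signo > 0 && (size_t) signo < n && static_signal_table[signo])
                return static_signal_table[signo];

        if (signo >= SIGRTMIN && signo <= SIGRTMAX) {
                snprintf(buf, size, "RTMIN+%d", signo - SIGRTMIN);
                return buf;
        }

        snprintf(buf, size, "%d", signo);
        return buf;
}
```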
So far we assumed every power source was a battery except for the ones
which definitely are not. I think this logic makes little sense, as
"battery" is kinda the exceptional case here, not the other way round.
Hence let's invert the type check, and denylist "Battery" devices rather
than allowlist "Mains" devices.
This should increase compatibility with alternative types of power
sources, in particular USB ones.
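In sketch form, with the sysfs enumeration replaced by a plain array. The struct, the function name and the "no battery means AC" fallback are assumptions of this example:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot of one /sys/class/power_supply/ device. */
struct power_supply {
        const char *type;   /* e.g. "Battery", "Mains", "USB" */
        bool online;
};

/* Inverted check: anything that is not a "Battery" counts as an external
 * power source, and we are on AC if any of those is online. */
static bool on_ac_power_sketch(const struct power_supply *supplies, size_t n) {
        bool found_battery = false, found_online = false;

        for (size_t i = 0; i < n; i++) {
                if (strcmp(supplies[i].type, "Battery") == 0) {
                        found_battery = true;
                        continue;
                }

                if (supplies[i].online)
                        found_online = true;
        }

        /* Assumption: a system without any battery is treated as AC powered. */
        return found_online || !found_battery;
}
```

A laptop charging over USB would thus be reported as being on AC power, which an allowlist of "Mains" alone would miss.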
This takes into account that additional power types have been added
since we wrote the original code, and in particular should handle the
situation discussed here:
https://sources.debian.org/src/powermgmt-base/1.36/power_supply.txt/#L31
https://sources.debian.org/src/powermgmt-base/1.36/on_ac_power/#L25
Also, modernizes the code in various ways.
Inspired by and fixes: #20964
udev: use netlink more aggressively
I'm pasting the comment from https://github.com/systemd/systemd/pull/20744#issuecomment-934485287
which is quite informative. The code wasn't changed significantly since then:
atenart commented 6 days ago:
> I ran tests without (93caec7) and with this PR (06735f2) on Fedora, having a few udev rules
> using attributes eligible to be cached and creating 50 veth on 4 CPUs. Although the time spent
> running the test is variable between runs, I generally saw an improvement when using this PR, e.g:
>
> 249-910-g93caec7:
> real 0m3.691s
> user 0m0.022s
> sys 0m1.338s
>
> 249-920-g06735f2:
> real 0m2.950s
> user 0m0.005s
> sys 0m0.399s
>
> On a different system than the one used above, I even saw a 40% improvement; results depend
> on many parameters (distro, udev rules, concurrent daemons accessing sysfs, etc.).
>
> Because it's quite hard to measure the improvement here (as the kernel behave differently between
> the two test cases), I also ran tests using a modified kernel not hitting the trylock logic. There was
> an improvement with this PR as well. (Take this with a grain of salt though, as the kernel was
> modified not using patches approved upstream).
<limits.h> calls this ULLONG_MAX. It's not clear to me where ULONGLONG_MAX
can be found. This seems to be just a mistake.
Fixes: c7ed718720 ('macro: handle overflow in ALIGN_TO() somewhat reasonably')
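For reference, the kind of overflow handling in question, sketched as a plain function for unsigned long long. The function name is made up, and the maximum value is used as overflow marker, following the idea described in the referenced commit's subject:

```c
#include <limits.h>

/* Round l up to the next multiple of ali, which must be a power of two.
 * Returns ULLONG_MAX if the result would not fit. */
static unsigned long long align_to_ull(unsigned long long l, unsigned long long ali) {
        if (l > ULLONG_MAX - (ali - 1))
                return ULLONG_MAX;   /* indicates overflow */

        return (l + ali - 1) & ~(ali - 1);
}
```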
Based on the FIPS 198 specification. Not optimized and probably
completely unsafe, to be used only for non-strong-cryptographic
purposes when OpenSSL cannot be used.
So far we ignored if readdir_ensure_type() failed, the .d_type would
then still possibly report DT_UNKNOWN, possibly confusing the caller.
Let's make this safer: if we get an error on readdir_ensure_type() then
report it — except if it is ENOENT which indicates the dirent vanished
by now, which is not a problem and we should just skip to the next
entry.
Let's ask exactly for the one field we actually want to know, i.e.
STATX_TYPE.
(While we are at it, also copy over the inode number, if we have it,
simply to report the most recent info we have)
(Also, set AT_NO_AUTOMOUNT, so that we don't trigger automounts here.
After all, if we want to know the inode type of a dirent here, then
there's no need to trigger the automount, the inode type is not going
to change by that.)
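Put together, the type lookup and the skip-on-ENOENT handling might look roughly like this. It is a sketch with made-up function names; statx() needs glibc 2.28 or newer, IFTODT() is the glibc macro that converts a mode to a d_type value, and the errno check after readdir() is left out for brevity:

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Fills in de->d_type if it is DT_UNKNOWN. Returns 0 or a negative errno. */
static int dirent_ensure_type_sketch(int dir_fd, struct dirent *de) {
        struct statx sx;

        if (de->d_type != DT_UNKNOWN)
                return 0;

        /* Ask only for the type, and do not trigger automounts. */
        if (statx(dir_fd, de->d_name, AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT,
                  STATX_TYPE, &sx) < 0)
                return -errno;

        if (!(sx.stx_mask & STATX_TYPE))
                return -ENODATA;

        de->d_type = IFTODT(sx.stx_mode);

        /* If the kernel handed us the inode number anyway, keep the fresh value. */
        if (sx.stx_mask & STATX_INO)
                de->d_ino = sx.stx_ino;

        return 0;
}

/* Counts entries with a known type; entries that vanished meanwhile are skipped. */
static int count_typed_entries(const char *path) {
        DIR *d = opendir(path);
        struct dirent *de;
        int n = 0;

        if (!d)
                return -errno;

        while ((de = readdir(d))) {
                int r = dirent_ensure_type_sketch(dirfd(d), de);

                if (r == -ENOENT)   /* dirent vanished by now, not a problem */
                        continue;
                if (r < 0) {
                        closedir(d);
                        return r;
                }
                n++;
        }

        closedir(d);
        return n;
}
```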
Apparently glibc already has a helper for this. (Not in the man pages
for Linux, but FreeBSD does document these cryptic helpers, and it's
exported by glibc. That should be good enough for us.)