This partially reverts d6267b9b18
So, arch-chroot currently uses a rather cursed setup:
it sets up a PID namespace, but mounts /proc/ from the outside
into the chroot tree, and then call chroot(2), essentially
making it somewhere between chroot(8) and a full-blown
container. Hence, the PID dirs in /proc/ reveal the outer world.
The offending commit switched chroot detection to compare
/proc/1/root and /proc/OUR_PID/root, exhibiting the faulty behavior
where the mentioned environment now gets deemed to be non-chroot.
Now, this is very much an issue in arch-chroot. However,
if /proc/ is to be properly associated with the pidns,
then we'd treat it as a container and no longer a chroot.
Also, the previous logic feels more readable and more
honestly reported errors in proc_mounted(). Hence I opted
for reverting the change here. Still note that the culprit
(once again :/) lies in the arch-chroot's pidns impl, not
systemd.
Fixes https://gitlab.archlinux.org/archlinux/packaging/packages/systemd/-/issues/54
From the previous changes, bare-metal support has been added by using
the `detect_vm_cpuid()` which works only for x86_64 and i386 architecture.
Do not use this change for other architectures to avoid wrong result of
the detect-virt tool.
Follow-up for fb71571d3a.
Fixes#38125.
When booting Linux with ACPI in QEMU, the device tree is not used and
the DT based detection will not work. DMI values are accurate though
and indicate QEMU.
While detect_vm_dmi_vendor() was enabled for RISC-V in a previous commit,
it missed detect_vm_dmi(), so it was never actually used. Fix that.
Signed-off-by: Fabian Vogt <fvogt@suse.de>
Google Compute Engine are not only virtual but can be also physical
machines. Therefore checking only the dmi is not enough to detect if it
is a virtual machine. Therefore systemd-detect-virt return "google"
instead of "none" in c3-highcpu-metal machine.
SMBIOS will not help us to make the difference as for EC2 machines.
However, GCE use KVM hypervisor for these VM, we can use this
information to detect virtualization. [0]
Issue and changes has been tested on SUSE SLE-15-SP7 images with
systemd-254 for both GCE, bare-metal and VM.
[0] -
https://cloud.google.com/blog/products/gcp/7-ways-we-harden-our-kvm-hypervisor-at-google-cloud-security-in-plaintext
Let's move some more implementation logic into functions. We keep
the logic that requires the macro in the macro and move the rest into
functions.
While we're at it, let's also make the parameter declarations of
all the string table macros less clausthrophobic.
- Drop effectively unused "terminator" param, imply whitespace
- Make ret param optional
- Return ENODATA if the requested key is not found, rather than
ENOENT
- Turn ENOENT -> ENOSYS if /proc/ is not mounted
- Don't skip whitespaces before ':', nothing needs this handling
anyways
- Remove the special treatment for all "0"s. We don't actually
use this for capabilities given pidref_get_capability() exists
- Switch away from read_full_virtual_file() - files using "field"
scheme under /proc/ seem all to be "seq_file"s (refer to
da65941c3e for details on file types)
Now that the necessary functions from log.h have been moved to macro.h,
we can stop including log.h in macro.h. This requires modifying source
files all over the tree to include log.h instead.
So apparently "linux,dummy-virt" is a devicetree in popular use by
various hypervisors, including crosvm:
e5d7a64d37/aarch64/src/fdt.rs (L692)
and qemu:
98c7362b1e/hw/arm/virt.c (L283)
and that's because the kernel ships support for that natively:
https://www.kernel.org/doc/Documentation/devicetree/bindings/arm/linux%2Cdummy-virt.yaml
It's explicitly for using in virtualization. Hence it's suitable for
detecting it as generic fallback.
This hence adds the check, similar to how we already look for one other
qemu-specific devicetree.
I ran into this while playing around with the new Pixel "Linux Terminal"
app from google which runs a Debian in a crosvm apparently. So far
systemd didn't recognize execution in it at all. Let's at least
recognize it as VM at all, even if this doesn't recognize it as
crosvm.
Now that we have an explicit userns check we can drop the heuristic for
it, given that it's kinda wrong (because mapping the full host UID range
into a userns is actually a thing people do).
Hence, just delete the code and only keep the userns inode check in
place.
Now that we have a reliable pidns check I don't think we really should
look for cgroupns anymore, it's too weak a check. I mean, if I myself
would implement a desktop app sandbox (like flatpak) I'd always enable
cgroupns, simply to hide the host cgroup hierarchy.
Hence drop the check.
I suggested adding this 4 years ago here:
https://github.com/systemd/systemd/pull/17902#issuecomment-745548306
The indoe number of root pid namespace is hardcoded in the kernel to
0xEFFFFFFC since 3.8, so check the inode number of our pid namespace
if all else fails. If it's not 0xEFFFFFFC then we are in a pid
namespace, hence a container environment.
Fixes https://github.com/systemd/systemd/issues/35249
[Reworked by Lennart, to make use of namespace_is_init()]
When running unprivileged, checking /proc/1/root doesn't work because
it requires privileges. Instead, let's add an environment variable so
the process that chroot's can tell (systemd) subprocesses whether
they're running in a chroot or not.
Identify an virtualbox instance even if product_name, sys_vendor and bios_vendor reflect the
information of the real hardware, by checking if board_vendor == "Oracle Corporation"
This fixes#13429 again
The previous fix was removed in #21127
This is a supplement to #24419. On macOS Intel machines, detection needs to be done through cpuid.
In macOS, `dmi_vendors` detection is only applicable to M series.
Signed-off-by: Black-Hole1 <bh@bugs.cc>
To get the CPUID with EAX=7, we need explicitly set 0 to ECX.
From Intel® Architecture Instruction Set Extensions Programming
Reference and Related Specifications,
===
Leaf 07H output depends on the initial value in ECX.
If ECX contains an invalid sub leaf index, EAX/EBX/ECX/EDX return 0
===
Fixes#30822.
Let's be more accurate about what this function does: it checks whether
the underlying reported inode is the same. Internally, this already uses
a better named stat_inode_same() call, hence let's similarly name the
wrapping function following the same logic.
Similar for files_same_at() and path_equal_or_same_files().
No code changes, just some renaming.
Commit f90eea7d18
virt: Improve detection of EC2 metal instances
Added support for detecting EC2 metal instances via the product
name in DMI by testing for the ".metal" suffix.
Unfortunately this doesn't cover all cases, as there are going to be
instance types where ".metal" is not a suffix (ie, .metal-16xl,
.metal-32xl, ...)
This modifies the logic to also allow those new forms.
Signed-off-by: Benjamin Herrenschmidt <benh@amazon.com>
IN C23, thread_local is a reserved keyword and we shall therefore
do nothing to redefine it. glibc has it defined for older standard
version with the right conditions.
v2 by Yu Watanabe:
Move the definition to missing_threads.h like the way we define e.g.
missing syscalls or missing definitions, and include it by the users.
Co-authored-by: Yu Watanabe <watanabe.yu+github@gmail.com>
Commit 1b86c7c59e ("virt: make virtualization enum a named type")
made the conversion from `if (!r)` to `if (v != VIRTUALIZATION_NONE)`.
However, the initial test was meaning "if r is null", IOW "if r IS
`VIRTUALIZATION_NONE`).
The test is wrong and this can lead to false detection of the container
environment (when calling `systemctl exit`).
For example, https://gitlab.freedesktop.org/whot/libevdev/-/jobs/34207974
is calling `systemctl exit 0`, and systemd terminates with the exit code
`130`.
Fixing that typo makes `systemctl exit 0` returns `0`.
Fixes: 1b86c7c59e.
The logic of running_in_chroot() has been the same since the introduction of
this function in b4f10a5e89: if /proc is not
mounted, the function returns -ENOENT and all callers treat this as false. But
that might be the most common case of chrooted calls, esp. in all the naïve
chroots that were done with the chroot binary without additional setup.
(In particular rpm executes all scriptlets in a chroot without bothering to set
up /proc or /sys, and we have codepaths in sysusers and tmpfiles to support
running in such an environment.)
This change effectively shortcircuits various calls to udevadm, downgrades
logging in tmpfiles, and disables all verbs marked with VERB_ONLINE_ONLY in
systemctl. detect-virt -r is also affected:
$ sudo chroot /var/lib/machines/rawhide
before> systemd-detect-virt -r && echo OK
Failed to check for chroot() environment: No such file or directory
after> systemd-detect-virt -r && echo OK
OK
Kubevirt is currently technically based on KVM (but not xen yet[1]).
The systemd-detect-virt command, used to differentiate the current
virtualization environment, works fine on x86 relying on CPUID, while
fails to get the correct value (none instead of kvm) on aarch64.
Let's fix this by adding a new 'vendor[KubeVirt] = kvm' classification
considering the sys_vendor is always KubeVirt.
[1] https://groups.google.com/g/kubevirt-dev/c/C6cUgzTOsVg
Signed-off-by: Fei Li <lifei.shirley@bytedance.com>