PR_SET_MM_ARG_START allows us to relatively cleanly implement process renaming.
However, it's only available with privileges. Hence, let's try to make use of
it, and if we can't fall back to the traditional way of overriding argv[0].
This removes size restrictions on the process name shown in argv[] at least for
privileged processes.
Previously, systemd-detect-virt was unable to detect "systemd-nspawn -a"
container environments, i.e. where PID 1 is a stub process running in host
context, as in that case /proc/1/environ was inherited from the host. Let's
improve that, and add an additional check for container environments where
/proc/1/environ is not cleaned up and does not contain the $container
environment variable:
The /proc/1/sched file shows the host PID in the first line. if this is not
1, we know we are running in a PID namespace (but not which implementation).
With these changes we should be able to detect container environments that
don't set $container at all.
This adds a concept of "extrinsic" mounts. If mounts are extrinsic we consider
them managed by something else and do not add automatic ordering against
umount.target, local-fs.target, remote-fs.target.
Extrinsic mounts are considered:
- All mounts if we are running in --user mode
- API mounts such as everything below /proc, /sys, /dev, which exist from
earliest boot to latest shutdown.
- All mounts marked as initrd mounts, if we run on the host
- The initrd's private directory /run/initrams that should survive until last
reboot.
This primarily merges a couple of different exclusion lists into a single
concept.
Let's be a bit more careful when detecting chroot() environments, so that we
can discern them from namespaced environments.
Previously this would simply check if the root directory of PID 1 matches our
own root directory. With this commit, we also check whether the namespaces of
PID 1 and ourselves are the same. If not we assume we are running inside of a
namespaced environment instead of a chroot() environment.
This has the benefit that systemctl (which uses running_in_chroot()) will work
as usual when invoked in a namespaced service.
This patch ensures that each system service gets its own session kernel keyring
automatically, and implicitly. Without this a keyring is allocated for it
on-demand, but is then linked with the user's kernel keyring, which is OK
behaviour for logged in users, but not so much for system services.
With this change each service gets a session keyring that is specific to the
service and ceases to exist when the service is shut down. The session keyring
is not linked up with the user keyring and keys hence only search within the
session boundaries by default.
(This is useful in a later commit to store per-service material in the keyring,
for example the invocation ID)
(With input from David Howells)
systemd.journal-fields(7) documents CODE_FUNC=. Internally, we were
inconsistent: sd_journal_print uses CODE_FUNC=, log.h has CODE_FUNCTION=,
python-systemd and bootchart also used CODE_FUNC=, when they were internal.
Most external projects use sd_journal_* functions, so CODE_FUNC=,
python-systemd still uses CODE_FUNC=, as does systemd-bootchart, and
independent reimplementations in golang-github-coreos-go-systemd, qtbase,
network manager, glib, pulseaudio. Hence, I don't think there's much
choice.
Those square brackets don't fit how our other messages look like; we use colons
everywhere else. The "[a:b]" format was originally added in
ed5bcfbe3c, and remained unchanged for 7 years,
but in the meantime other conventions evolved.
The new version is also one character shorter.
[/etc/systemd/system/systemd-networkd.service.d/override.conf:2] Failed to parse sec value, ignoring: ...
↓
/etc/systemd/system/systemd-networkd.service.d/override.conf:2: Failed to parse sec value, ignoring: ...
Our warning message was misleading, because we wouldn't "correct" anything,
we'd just ignore unkown escapes. Update the message.
Also, print just the extracted word (which contains the offending sequences) in
the message, instead of the whole line.
Fixes#4697.
This adds mkdtemp_malloc() that is a combination of mkdtemp() plus strdup(). It
initializes its return paremeter only if the temporary directory could be
created successfully, so that the parameter is exactly non-NULL when the
directory exists.
rmdir_and_free() and rmdir_and_freep() are also added, and the latter may be
used inside of _cleanup_ for such a directory string variable, to automatically
rmdir() the directory if it is non-NULL when the scope exits.
rmdir_and_free() is similar to the existing rm_rf_and_free() however, is only
removes a single directory and does not operate recursively.
Let's accept "µs" as alternative time unit for microseconds. We already accept
"us" and "usec" for them, lets extend on this and accept the proper scientific
unit specification too.
We will never output this as time unit, but it's fine to accept it, after all
we are pretty permissive with time units already.
This new flag controls whether to consider a problem if the referenced path
doesn't actually exist. If specified it's OK if the final file doesn't exist.
Note that this permits one or more final components of the path not to exist,
but these must not contain "../" for safety reasons (or, to be extra safe,
neither "./" and a couple of others, i.e. what path_is_safe() permits).
This new flag is useful when resolving paths before issuing an mkdir() or
open(O_CREAT) on a path, as it permits that the file or directory is created
later.
The return code of chase_symlinks() is changed to return 1 if the file exists,
and 0 if it doesn't. The latter is only returned in case CHASE_NON_EXISTING is
set.
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
Previously, we'd generate an EINVAL error if it is attempted to escape a root
directory with relative ".." symlinks. With this commit this is changed so that
".." from the root directory is a NOP, following the kernel's own behaviour
where /.. is equivalent to /.
As suggested by @keszybz.
chase_symlinks() currently expects a fully qualified, absolute path, relative
to the host's root as first argument. Which is useful in many ways, and similar
to the paths unlink(), rename(), open(), … expect. Sometimes it's however
useful to first prefix the specified path with the specified root directory.
Add a new call chase_symlinks_prefix() for this, that is a simple wrapper.
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
Let's take inspiration from bluez's ELL library, and let's move our
cryptographic primitives away from libgcrypt and towards the kernel's AF_ALG
cryptographic userspace API.
In the long run we should try to remove the dependency on libgcrypt, in favour
of using only the kernel's own primitives, however this is unlikely to happen
anytime soon, as the kernel does not provide Elliptic Curve APIs to userspace
at this time, and we need them for the DNSSEC cryptographic.
This commit only covers hashing for now, symmetric encryption/decryption or
even asymetric encryption/decryption is not available for now.
"khash" is little more than a lightweight wrapper around the kernel's AF_ALG
socket API.
strtoul() parses leading whitespace and an optional sign;
check that the first character is a digit to prevent odd
specifications like "00: 00: 00" and "-00:+00/-1".
"*-*~1" => The last day of every month
"*-02~3..5" => The third, fourth, and fifth last days in February
"Mon 05~07/1" => The last Monday in May
Resolves#3861
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plan copies or
reflinks when full btrfs subvolume snapshots are not available.
This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools, they have to figure out what's good
for them.
Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.
As side effect this fixes that --ephemeral was accepted but ignored when using
-M on a container that turned out to be an image.
Fixes: #4664
This adds in4_addr_is_localhost() and in4_addr_is_link_local() that only take
an IPv4 "struct in_addr", to match in_addr_is_localhost() and
in_addr_is_link_local() that that a "union in_addr_union".
This matches the existing in4_addr_is_null() call that already exists.
For IPv6 glibc already exports a set of macros, hence we don't add similar
functions in6_addr_is_localhost(). We also drop in6_addr_is_null() as
IN6_IS_ADDR_UNSPECIFIED() already provides that.