There is little point in #defining and #undefining CAP_LAST_CAP multiple times.
The check is only done in developer mode. After all, it's not an error to
compile on a newer kernel, and we shouldn't even warn in that case.
Previously:
1. last_error wouldn't be updated with errors from is_dir;
2. We'd always issue a stat(), even for binaries without execute;
3. We used stat() instead of access(), which is cheaper.
This change avoids all of those, by only checking inside X_OK-positive
case whether access() works on the path with an extra slash appended.
Thanks to Lennart for the suggestion.
It's pretty useful to be able to access the bytes generically, without
acknowledging a specific family, hence let's a third way to access an
in_addr_union.
Imagine $PATH /a:/b. There is an echo command at /b/echo. Under this
configuration, this works fine:
% systemd-run --user --scope echo .
Running scope as unit: run-rfe98e0574b424d63a641644af511ff30.scope
.
However, if I do `mkdir /a/echo`, this happens:
% systemd-run --user --scope echo .
Running scope as unit: run-rcbe9369537ed47f282ee12ce9f692046.scope
Failed to execute: Permission denied
We check whether the resulting file is executable for the performing
user, but of course, most directories are anyway, since that's needed to
list within it. As such, another is_dir() check is needed prior to
considering the search result final.
Another approach might be to check S_ISREG, but there may be more gnarly
edge cases there than just eliminating this obviously pathological
example, so let's just do this for now.
When removing a directory tree as unprivileged user we might encounter
files owned by us but not deletable since the containing directory might
have the "r" bit missing in its access mode. Let's try to deal with
this: optionally if we get EACCES try to set the bit and see if it works
then.
Instead of defining the numbers only as fallback, always define our fallback
number, and if we have the real __NR_foo define, assert that our number matches
the real one.
This will result in warnings when our fallback number is not defined, even if
the kernel headers are new enough to define __NR_foo. This will probably annoy
people compiling for seldom-used architectures, but hopefully it'll provide
motivation to add the missing fallback defines.
The upside is that we have a higher chance of catching the cases where we got
the number wrong. Calling the wrong syscall is quite problematic, and with some
back luck, it might take us a long time to notice that we got the number wrong
on some rarely used architecture.
Also, rework some of the fallback wrappers to not call the syscall with a
negative number (that'd fail, but we'd got to the kernel and back). It seems
nicer to let the compiler know that this can never succeed.
Similarly to "setup" vs. "set up", "fallback" is a noun, and "fall back"
is the verb. (This is pretty clear when we construct a sentence in the
present continous: "we are falling back" not "we are fallbacking").
This has the major benefit that the entire payload of the container can
access these files there. Previously, we'd set them only as env vars,
but that meant only PID 1 could read them directly or other privileged
payload code with access to /run/1/environ.
The kernel finally has a proper API to determine the mnt_id of a file.
Let's use it.
This adds support for the STATX_MNT_ID field of statx(), added in
kernel 5.8.
This was needed before commit 16e4fd87c5 added a
mode that opens the log fds for every single log message. This mode is used in
execute.c since then making the explicit call to log_open unnecessary.
This basically reverts ea89a119cd.
LOOP_CONFIGURE allows us to configure a loopback device in one ioctl
instead of two, which is not just faster but also removes the race that
udev might start probing the device before we adjusted things properly.
Unfortunately LOOP_CONFIGURE is broken in regards to LO_FLAGS_PARTSCAN
as of kernel 5.8.0. This patch contains a work-around for that, to
fallback to old behaviour if partition scanning is requested but does
not work. Sucks a bit.
Proposed upstream fix for that issue:
https://lkml.org/lkml/2020/8/6/97
Given a string in the format 'one:two three four:five', returns a string
vector with each word. If the second element of the tuple is not
present, an empty string is returned in its place, so that the vector
can be processed in pairs.
[zjs: use EXTRACT_UNESCAPE_SEPARATORS instead of EXTRACT_CUNESCAPE_RELAX.
This way we do escaping exactly once and in normal strict mode.]
We would add and remove definitions based on which colors were used by other
code. Let's just define all of them to simplify tests and allow easy comparisons
which colors look good.