We use symbols provided by unistd.h without including it. E.g.
open(), close(), read(), write(), access(), symlink(), unlink(), rmdir(),
fsync(), syncfs(), lseek(), ftruncate(), fchown(), dup2(), pipe2(),
getuid(), getgid(), gettid(), getppid(), pipe2(), execv(), _exit(),
environ, STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO, F_OK, and their
friends and variants, so on.
Currently, unistd.h is indirectly included mainly in the following two paths:
- through missing_syscall.h, which is planned to covert to .c file.
- through signal.h -> bits/sigstksz.h, which is new since glibc-2.34.
Note, signal.h is included by sd-eevent.h. So, many source files
indirectly include unistd.h if newer glibc is used.
Currently, our baseline on glibc is 2.31. We need to support glibc older
than 2.34, but unfortunately, we do not have any CI environments with
such old glibc. CIFuzz uses glibc-2.31, but it builds only fuzzers, and
many files are even not compiled.
Now that the necessary functions from log.h have been moved to macro.h,
we can stop including log.h in macro.h. This requires modifying source
files all over the tree to include log.h instead.
As per the documentation, EACCES is only returned when F_SETLK is
used, and only on some platforms, which doesn't seem to include
Linux:
https://github.com/torvalds/linux/blob/master/fs/locks.c
F_OFD_SETLK is documented to only return EAGAIN, and F_SETLKW/F_OFD_SETLKW
are blocking operations so this logic doesn't apply to them in the
first place.
Hence, only automatically convert EACCES into EAGAIN for F_SETLK
operations, and propagate the original error in the other cases.
This is important because in some cases we catch permission errors
and gracefully fallback, which is not possible if the original error
is lost.
This is an issue in practice because, due to a kernel bug present
before v6.2, AppArmor denies locking on file descriptors to LXC
containers. We support all currently maintained LTS kernels,
including v6.1, where despite a lot of effort and attempts over almost
a year, the bugfix still hasn't been backported, as it is complex and
requires large changes to AppArmor.
On affected kernels, all services running with PrivateNetwork=yes
fail and do not recover, instead of the normal behaviour of gracefully
downgrading to PrivateNetwork=no.
The integration tests in the Debian CI fail due to this issue:
https://ci.debian.net/packages/s/systemd/testing/arm64/46828037/
This is just like lock_generic(), but applies the lock with a timeout.
This requires jumping through some hoops by executing things in a child
process, so that we can abort if necessary via a timer. Linux after all
has no native way to take file locks with a timeout.