This used to be relevant since in old versions of glibc an internal
cache is maintained, while we might sidestep their invalidation
with raw_clone(). After glibc 2.25 getpid() is a trivial wrapper
for the syscall, and hence there's no need to have a separate
raw_getpid().
glibc exports getdents64 syscall as is, but musl exports it as
posix_getdents(). Let's introduce a simple wrapper of posix_getdents().
Note, our baseline for glibc is 2.31. Hence, we can assume getdents64()
always defined when building with glibc.
To resolve conflict with sys/mount.h and linux/mount.h or linux/fs.h.
The conflict between sys/mount.h and linux/mount.h is resolved in
glibc-2.37 (774058d72942249f71d74e7f2b639f77184160a6), but our baseline
is still glibc-2.31. Also, even with the version or newer, still
sys/mount.h conflicts with linux/fs.h, which is included by
linux/btrfs.h.
This introduces our own implementation of sys/mount.h, that can be
simultaneously included with linux/mount.h and linux/fs.h. This also
imports linux/fs.h, linux/mount.h, and several other dependent headers.
The introduced sys/mount.h header itself may not be enough simple, but
by using the header, we can drop most of workarounds in other source files.
Nowadays, we define syscall numbers for newer syscalls.
Hence the conditions are not necessary.
This also adds several comments about when syscalls are introduced.
struct statx in glibc header was introduced in glibc-2.28
(fd70af45528d59a00eb3190ef6706cb299488fcd), but at that time,
sys/stat.h conflicts with linux/stat.h. Since glibc-2.30
(5dad6ffbb2b76215cfcd38c3001778536ada8e8a), sys/stat.h includes
linux/stat.h if exists.
Since now our baseline of glibc is 2.31. Hence, we can drop workarounds
for struct statx by importing linux/stat.h from newer kernel (v6.14-rc4).
Plus, linux/random.h never defined getrandom(), hence remove
the custom machinery for sys/random.h vs linux/random.h
in favor of single HAVE_GETRANDOM.
The kernel's sched_setattr interface allows for more control over a processes
scheduling attributes as the previously used sched_setscheduler interface.
Using sched_setattr is also the prerequisite for support of utilization
clamping (UCLAMP [1], see #26705) and allows to set sched_runtime. The latter,
sched_runtime, will probably become a relevant scheduling parameter of the
EEVDF scheduler [2, 3], and therefore will not only apply to processes
scheduled via SCHED_DEADLINE, but also for processes scheduled via
SCHED_OTHER/SCHED_BATCH (i.e., most processes).
1: https://docs.kernel.org/next/scheduler/sched-util-clamp.html
2: https://lwn.net/Articles/969062/
3: https://lwn.net/ml/linux-kernel/20240405110010.934104715@infradead.org/
So glibc exposes a close_range() syscall wrapper now, but they decided
to use "unsigned" as type for the fds. Which is a bit weird, because fds
are universally understood to be "int". The kernel internally uses
"unsigned", both for close() and for close_range(), but weirdly,
userspace didn't fix that for close_range() unlike what they did for
close()... Weird.
But anyway, let's follow suit, and make our wrapper match glibc's.
Fixes#31270
Follow-up for 8b45281daa
and preparation for later commits.
Since libcs are more interested in the POSIX `fchmodat(3)`, they are
unlikely to provide a direct wrapper for this syscall. Thus, the headers
we examine to set `HAVE_*` are picked somewhat arbitrarily.
Also, hook up `try_fchmodat2()` in `test-seccomp.c`. (Also, correct that
function's prototype, despite the fact that mistake would not matter in
practice)
Co-authored-by: Mike Yuan <me@yhndnzj.com>
Instead of mounting over, do an atomic swap using mount beneath, if
available. This way assets can be mounted again and again (e.g.:
updates) without leaking mounts.
glibc does not provide clone() on ia64, only clone2. But only as a
symbol in the shared library, there's no prototype in the gblic
headers, so we have to define it, copied from the manpage.
This reenables epoll_pwait2() use, i.e. undoes the effect of
39f756d3ae.
Instead of just reverting that, this PR will change things so that we
strictly rely on glibc's new epoll_pwait2() wrapper (which was added
earlier this year), and drop our own manual fallback syscall wrapper.
That should nicely side-step any issues with correct syscall wrapping
definitions (which on some arch seem not to be easy, given the sigset_t
size final argument), by making this a glibc problem, not ours.
Given that the only benefit this delivers are time-outs more granular
than msec, it shouldn't really matter that we'll miss out on support
for this on systems with older glibcs.
Using fsopen()/fsconfig(), we can check if hidepid/subset are supported to
avoid the noisy logs from the kernel if they aren't supported. This works
on centos/redhat 8 as well since they've backported fsopen()/fsconfig().
glibc 2.30 (Aug 2019) added a wrapper for getdents64(). For older
versions let's define our own.
(This syscall exists since Linux 2.4, hence should be safe to use for
us)
Getting the numbers right for all architectures has proven to be a
constant chore. Let's autogenerate the header from the tables that
were imported in one of the previous commits.
Fixes#18074. (Hopefully. I cannot verify this on all architectures.)
To update the lists, or to update the header after template changes:
ninja -C build update-syscall-tables update-syscall-header
Note: the generated file is saved in git. Initially I wanted to only
store the tables in git, and generate the header during each build.
Generation is quick enough, but the header is used in many many
places (wherever missing_syscall.h is included, directly or indirectly),
which means that we would need to declare the dependency in meson, so
the header would be generated early enough. This turned out to be very
noisy. Storing the generated header in version control avoids the hassle.