access_nofollow() is a simple wrapper of faccessat(), which is declared as
```
int faccessat(int dirfd, const char *pathname, int mode, int flags);
```
fs-util.h provides access_nofollow(), but it included neither
fcntl.h nor unistd.h, which define F_OK and friends. Hence, we could not use
the function without including one of those headers. Let's include fcntl.h
in fs-util.h, so that the function can be used by simply including fs-util.h.
The additional definitions provided by the header are:
- EXT4_IOC_RESIZE_FS, used in resize-fs.c,
- FILEID_KERNFS, used in cgroup-util.c and pidfd-util.c.
Let's drop the inclusion at other places.
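For illustration, a minimal sketch of such a wrapper, assuming it returns 0 or a negative errno as is usual in systemd. The name `access_nofollow_sketch()` is hypothetical and the real definition in fs-util.h may differ. A caller passing `F_OK` needs the constant from fcntl.h or unistd.h, which is what including fcntl.h in the header provides.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>   /* AT_FDCWD, AT_SYMLINK_NOFOLLOW; glibc also defines F_OK and friends here */
#include <unistd.h>  /* faccessat() */

/* Hypothetical stand-in for access_nofollow(): like access(), but does not
 * follow a trailing symlink. Returns 0 on success or a negative errno. */
static inline int access_nofollow_sketch(const char *path, int mode) {
        assert(path);

        if (faccessat(AT_FDCWD, path, mode, AT_SYMLINK_NOFOLLOW) < 0)
                return -errno;

        return 0;
}
```

Because of `AT_SYMLINK_NOFOLLOW`, a dangling symlink is reported as existing with `F_OK`, whereas plain access() would follow it and fail with ENOENT.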
Currently, callers of safe_glob() pass either an empty glob_t or a glob_t
with only the opendir function set; all other fields are always zero.
So, let's introduce safe_glob_full(), which optionally takes an opendir
function rather than a glob_t, and returns the results as a strv rather
than storing them in a glob_t.
Also, introduce safe_glob(), a trivial wrapper of safe_glob_full()
without an opendir function.
No functional change, just refactoring.
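A rough sketch of the new interface shape, assuming a `void *(*)(const char *)` opendir hook and `-ENOENT` for no matches. All `*_sketch` names, `opendir_plain()`, and the error mapping are hypothetical, not the actual systemd code. When a hook is given, glob()'s `GLOB_ALTDIRFUNC` uses the other directory hooks as well, so the sketch fills them with thin adapters around readdir(), closedir(), lstat(), and stat().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical hook type, matching glob_t.gl_opendir. */
typedef void *(*opendir_func_t)(const char *path);

/* Adapters so readdir()/closedir() match the void * based glob_t hooks. */
static struct dirent *readdir_hook(void *d) { return readdir(d); }
static void closedir_hook(void *d) { closedir(d); }

/* Example hook for illustration only: plain opendir() behind a void * signature. */
static void *opendir_plain(const char *path) { return opendir(path); }

static void strv_free_sketch(char **l) {
        if (!l)
                return;
        for (char **i = l; *i; i++)
                free(*i);
        free(l);
}

/* Glob @pattern, optionally through @opendir_func, and return the matches as a
 * newly allocated NULL-terminated string array. Returns 0 or a negative errno. */
static int safe_glob_full_sketch(const char *pattern, int flags, opendir_func_t opendir_func, char ***ret) {
        glob_t g = { 0 };
        char **l;
        int r;

        assert(pattern);
        assert(ret);

        if (opendir_func) {
                /* With GLOB_ALTDIRFUNC, glob() uses all of these hooks, so set them all. */
                g.gl_opendir = opendir_func;
                g.gl_readdir = readdir_hook;
                g.gl_closedir = closedir_hook;
                g.gl_lstat = lstat;
                g.gl_stat = stat;
                flags |= GLOB_ALTDIRFUNC;
        }

        errno = 0;
        r = glob(pattern, flags, NULL, &g);
        if (r != 0) {
                int e = errno;

                globfree(&g);
                if (r == GLOB_NOMATCH)
                        return -ENOENT;
                if (r == GLOB_NOSPACE)
                        return -ENOMEM;
                return e > 0 ? -e : -EIO;
        }

        /* Copy the matches into a strv owned by the caller, then drop the glob_t. */
        l = calloc(g.gl_pathc + 1, sizeof(char *));
        if (!l) {
                globfree(&g);
                return -ENOMEM;
        }
        for (size_t i = 0; i < g.gl_pathc; i++) {
                l[i] = strdup(g.gl_pathv[i]);
                if (!l[i]) {
                        strv_free_sketch(l);
                        globfree(&g);
                        return -ENOMEM;
                }
        }

        globfree(&g);
        *ret = l;
        return 0;
}

/* Trivial wrapper without an opendir hook. */
static int safe_glob_sketch(const char *pattern, int flags, char ***ret) {
        return safe_glob_full_sketch(pattern, flags, NULL, ret);
}
```

Returning a strv means callers no longer have to manage, or globfree(), a glob_t themselves.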
Some configuration files that need updates are located directly in /etc. To
update them atomically, we need write access to /etc. On Ubuntu Core this is
an issue, as /etc is not writable; only a selection of its subdirectories can
be. The general solution is symlinks or bind mounts to writable places, but
that does not work for atomic writes in /etc. So Ubuntu has carried a patch
for this, which did not age well.
Instead, we would like to introduce environment variables that override the
following paths:
* SYSTEMD_ETC_HOSTNAME: /etc/hostname
* SYSTEMD_ETC_MACHINE_INFO: /etc/machine-info
* SYSTEMD_ETC_LOCALTIME: /etc/localtime
* SYSTEMD_ETC_LOCALE_CONF: /etc/locale.conf
* SYSTEMD_ETC_VCONSOLE_CONF: /etc/vconsole.conf
* SYSTEMD_ETC_ADJTIME: /etc/adjtime
For now, a symlink from the standard path to the alternate path is still
expected, but we read the files from the alternate path directly. This is
important for `/etc/localtime`, which is itself a symlink, so it cannot be
redirected through another symlink or a bind mount.
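A minimal sketch of how such an override could be resolved, assuming the variable holds an absolute path and falling back to the standard location otherwise. `etc_path_sketch()` and `etc_hostname_sketch()` are hypothetical names, and both the use of secure_getenv() and the rejection of relative paths are assumptions of this sketch rather than something stated above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the path named by environment variable @env if it
 * is set to an absolute path, otherwise the standard location @fallback.
 * secure_getenv() ignores the variable in privileged (AT_SECURE) processes. */
static const char *etc_path_sketch(const char *env, const char *fallback) {
        const char *e;

        assert(env);
        assert(fallback);

        e = secure_getenv(env);
        if (e && e[0] == '/')
                return e;

        return fallback;
}

static const char *etc_hostname_sketch(void) {
        return etc_path_sketch("SYSTEMD_ETC_HOSTNAME", "/etc/hostname");
}
```

An atomic update would then write a temporary file next to the returned path and rename() it into place, which only needs write access to that directory rather than to /etc.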
Since machine-id is typically written only once and never updated, this
commit does not cover it. An initrd can properly create it and bind mount it.
This partially reverts e86a492ff0.
getdents64() was introduced in glibc 2.30, and our glibc baseline is 2.31.
Hence, we can assume the function always exists.
The posix_getdents() wrapper was introduced for compatibility with musl.
However, even the latest release of musl does not provide posix_getdents()
yet. Moreover, musl also provides getdents64() and struct dirent64 when
_LARGEFILE64_SOURCE is defined. Hence, the wrapper is not necessary anyway.
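For illustration, a sketch that reads a directory with getdents64() directly, assuming glibc 2.30 or newer with `_GNU_SOURCE` defined (on musl, `_LARGEFILE64_SOURCE` serves the same purpose). The helper name `count_dir_entries_sketch()` is hypothetical. Each record's `d_reclen` gives the offset of the next record within the buffer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>   /* getdents64() and struct dirent64 (glibc >= 2.30, _GNU_SOURCE) */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: count the entries of the directory at @path, including
 * "." and "..". Returns the count or a negative errno. */
static long count_dir_entries_sketch(const char *path) {
        union {
                struct dirent64 align;   /* keeps the buffer suitably aligned */
                uint8_t raw[4096];
        } buf;
        long n = 0;
        int fd;

        assert(path);

        fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (fd < 0)
                return -errno;

        for (;;) {
                ssize_t sz = getdents64(fd, buf.raw, sizeof(buf.raw));
                if (sz < 0) {
                        int r = -errno;
                        close(fd);
                        return r;
                }
                if (sz == 0)
                        break;   /* end of directory */

                /* Walk the variable-length records returned in this batch. */
                for (size_t off = 0; off < (size_t) sz; ) {
                        struct dirent64 *de = (struct dirent64 *) (buf.raw + off);
                        n++;
                        off += de->d_reclen;
                }
        }

        close(fd);
        return n;
}
```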
The check existed for musl. Let's remove it, as we explicitly require glibc.
While removing the check, this also drops generic_mallinfo, introduces
a tiny converter from struct mallinfo to struct mallinfo2 if mallinfo2()
does not exist, and renames mallinfo-util.h to malloc.h.
With this change, we can drop many ifdefs and casts in .c files.
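A sketch of what such a converter could look like, assuming glibc (where `__GLIBC_PREREQ` is available) and that mallinfo2() first appeared in glibc 2.33. The name `mallinfo_to_mallinfo2()` is hypothetical. The fields of struct mallinfo are int, so going through unsigned int recovers values between 2 GiB and 4 GiB that would otherwise be sign-extended into huge size_t values.

```c
#include <assert.h>
#include <malloc.h>
#include <stddef.h>

#if !__GLIBC_PREREQ(2, 33)
/* Fallback for glibc < 2.33, which lacks struct mallinfo2 and mallinfo2().
 * Same field names as glibc's struct mallinfo2, all size_t. */
struct mallinfo2 {
        size_t arena, ordblks, smblks, hblks, hblkhd;
        size_t usmblks, fsmblks, uordblks, fordblks, keepcost;
};
#endif

/* Hypothetical converter from the legacy int-based struct. */
static inline struct mallinfo2 mallinfo_to_mallinfo2(struct mallinfo mi) {
        return (struct mallinfo2) {
                .arena    = (unsigned) mi.arena,
                .ordblks  = (unsigned) mi.ordblks,
                .smblks   = (unsigned) mi.smblks,
                .hblks    = (unsigned) mi.hblks,
                .hblkhd   = (unsigned) mi.hblkhd,
                .usmblks  = (unsigned) mi.usmblks,
                .fsmblks  = (unsigned) mi.fsmblks,
                .uordblks = (unsigned) mi.uordblks,
                .fordblks = (unsigned) mi.fordblks,
                .keepcost = (unsigned) mi.keepcost,
        };
}

#if !__GLIBC_PREREQ(2, 33)
/* Only on old glibc: provide mallinfo2() on top of mallinfo(). */
static inline struct mallinfo2 mallinfo2(void) {
        return mallinfo_to_mallinfo2(mallinfo());
}
#endif
```

Callers can then use mallinfo2() unconditionally, without ifdefs or casts at each call site.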
Let's make more use of label_ops_pre()/label_ops_post(), and replace
write_env_file_label() with a flag to write_env_file().
This simplifies and normalizes the code.
This also makes one relevant change: it sets the new
WRITE_ENV_FILE_LABEL flag in firstboot.c when we write locale.conf,
where we previously did not (but should have). This should address one
detail of #37857.
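To illustrate the design choice of a flag instead of a separate `*_label()` variant, a hypothetical sketch: `write_file_sketch()`, `WRITE_FILE_SKETCH_LABEL`, and the two counting stubs standing in for label_ops_pre()/label_ops_post() are all made up, and the real functions' signatures are not reproduced here. Labelling becomes a property of the call rather than a second code path.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef enum WriteFileFlagsSketch {
        WRITE_FILE_SKETCH_LABEL = 1 << 0,   /* hypothetical: label the created file */
} WriteFileFlagsSketch;

/* Stubs standing in for label_ops_pre()/label_ops_post(); they only count calls. */
static int label_pre_calls, label_post_calls;
static int label_pre_sketch(const char *path) { (void) path; label_pre_calls++; return 0; }
static int label_post_sketch(const char *path, int created) { (void) path; (void) created; label_post_calls++; return 0; }

static int write_file_sketch(const char *path, const char *contents, WriteFileFlagsSketch flags) {
        FILE *f;
        int r = 0;

        assert(path);
        assert(contents);

        /* Set up the label before the file is created... */
        if (flags & WRITE_FILE_SKETCH_LABEL) {
                r = label_pre_sketch(path);
                if (r < 0)
                        return r;
        }

        f = fopen(path, "we");   /* "e": glibc extension for O_CLOEXEC */
        if (!f)
                r = -errno;
        else {
                if (fputs(contents, f) == EOF)
                        r = -EIO;
                if (fclose(f) == EOF && r == 0)
                        r = -errno;
        }

        /* ...and always tear it down afterwards, whether or not writing worked. */
        if (flags & WRITE_FILE_SKETCH_LABEL) {
                int q = label_post_sketch(path, r >= 0);
                if (r >= 0)
                        r = q;
        }

        return r;
}
```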
More porting work to label_ops_pre()/label_ops_post()
This also enables labelling of the /etc/localtime symlink in
systemd-firstboot, which should address one small facet of #37857.
As explained in https://lore.kernel.org/all/20250419183545.1982187-1-shakeel.butt@linux.dev/,
writing to memory.max or memory.high triggers synchronous memory reclaim
if the limit is lowered. This can end up taking nonnegligible amounts
of time, completely blocking pid1 from doing any other work while the
reclaim is ongoing.
To address this problem, the kernel is going to add O_NONBLOCK semantics
to memory.max and memory.high. If the file is opened with O_NONBLOCK,
the synchronous memory reclaim is skipped and only triggered later
without blocking the process writing the file. Let's make sure we make
use of this by opening cgroupv2 attribute files with O_NONBLOCK.
We opt to do this for all cgroupv2 attribute files, to make sure that
if the same problem happens elsewhere in the future and is fixed in the
same way, we immediately take advantage of that fix without having to
make changes in systemd as well. We probably never want to block when
writing cgroupv2 attributes, and any cases where we do want to block should
indicate so explicitly instead of blocking by default.
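A minimal sketch of a cgroupv2 attribute write that opens the file with O_NONBLOCK, assuming the attribute path is already known. `write_cgroup_attribute_sketch()` is a hypothetical name, not the systemd helper. On files that do not implement non-blocking semantics (including regular files) the flag is expected to be ignored, which is why setting it unconditionally is considered safe above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write @value to the cgroupv2 attribute file @path.
 * O_NONBLOCK asks supporting kernels (e.g. for memory.max/memory.high) to
 * skip synchronous reclaim instead of blocking the writer. */
static int write_cgroup_attribute_sketch(const char *path, const char *value) {
        size_t len;
        ssize_t n;
        int fd, r = 0;

        assert(path);
        assert(value);

        fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NONBLOCK);
        if (fd < 0)
                return -errno;

        len = strlen(value);
        n = write(fd, value, len);   /* cgroupfs expects the whole value in one write() */
        if (n < 0)
                r = -errno;
        else if ((size_t) n != len)
                r = -EIO;

        close(fd);
        return r;
}
```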
- config.h is not necessary when generating lists, hence drop it.
- linux/audit.h and libaudit.h are included by missing_audit.h,
hence there is no need to include them explicitly.
The header uses __THROW, which is made available by features.h (via
sys/cdefs.h), so let's include features.h to make the header self-contained.
Note, src/basic/include/sys/mount.h also uses __THROW, and includes
features.h.
paths.h provides several important constants, especially _PATH_BSHELL, which
is used in PID1, executor, and run. So far, the header has only been included
indirectly, e.g. through libmount.h, mntent.h, utmpx.h, and so on.
Let's include it explicitly in forward.h: libmount.h and the other headers
that include paths.h are unrelated to _PATH_BSHELL, so the build may easily
break when code is touched.
The header is lightweight, hence including it should not hurt anything.
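As an example of a `_PATH_BSHELL` user, a sketch that runs a command through the system shell, assuming posix_spawn() and waitpid() are acceptable here. `run_shell_sketch()` is hypothetical; PID1, executor, and run each use the constant in their own way. Including paths.h directly is what makes `_PATH_BSHELL` available without relying on libmount.h or similar.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <paths.h>      /* _PATH_BSHELL, typically "/bin/sh" */
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run @command via the system shell and return its exit
 * status, or a negative errno on failure. */
static int run_shell_sketch(const char *command) {
        char *argv[] = { (char *) "sh", (char *) "-c", (char *) command, NULL };
        pid_t pid;
        int status, r;

        assert(command);

        r = posix_spawn(&pid, _PATH_BSHELL, NULL, NULL, argv, environ);
        if (r != 0)
                return -r;   /* posix_spawn() returns an error number, not -1 */

        while (waitpid(pid, &status, 0) < 0)
                if (errno != EINTR)
                        return -errno;

        if (!WIFEXITED(status))
                return -EPROTO;   /* killed by a signal */

        return WEXITSTATUS(status);
}
```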
I'd like to introduce a libsystemd helper for acquiring the inode ID of a
pidfd; however, this means the fd passed to pidfd_check_pidfs() can no
longer be trusted. Let's add back the logic of allocating a genuine pidfd
internally, which was removed in 5dc9d5b4ea.
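A sketch of the idea of checking for pidfs on an internally allocated pidfd, so that no caller-supplied fd has to be trusted. It assumes the raw pidfd_open() syscall (falling back to number 434 if the headers lack it) and `PID_FS_MAGIC` (0x50494446) from linux/magic.h. `pidfs_check_sketch()` is a hypothetical name and not the actual pidfd_check_pidfs() implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434   /* assumption: the common syscall number (not alpha) */
#endif

#ifndef PID_FS_MAGIC
#define PID_FS_MAGIC 0x50494446   /* value from linux/magic.h */
#endif

/* Allocate our own pidfd (which we can trust) and check whether it lives on
 * pidfs. Returns 1 if so, 0 if not, or a negative errno. */
static int pidfs_check_sketch(void) {
        struct statfs sfs;
        int fd, r;

        fd = (int) syscall(SYS_pidfd_open, getpid(), 0);
        if (fd < 0)
                return -errno;

        r = fstatfs(fd, &sfs) < 0 ? -errno : (sfs.f_type == PID_FS_MAGIC);
        close(fd);
        return r;
}
```

On kernels without pidfs, pidfds are anonymous inodes, so the check returns 0 rather than an error.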
Let's rename the return parameters as "ret_xyz" systematically in
sd-login.
Also, let's make the return parameters systematically optional, like we
typically do these days. So far, some were optional, others weren't.
Let's clean this up.
If the first call to `loop_read()` returns 0 (no input), `total_in`
remains 0, so computing `total_out/total_in` may divide by zero.
We add a check before logging the compression ratio and skip the
percentage calculation when `total_in` is zero.
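A minimal sketch of the fix, assuming the ratio is only used for a log message; `format_compression_stats_sketch()` and its message text are hypothetical. When `total_in` is zero, the division is never evaluated.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the post-compression log message, omitting the
 * percentage when no input was read. */
static void format_compression_stats_sketch(char *buf, size_t size, uint64_t total_in, uint64_t total_out) {
        assert(buf);

        if (total_in == 0)
                snprintf(buf, size, "Compressed 0 B into %" PRIu64 " B", total_out);
        else
                snprintf(buf, size, "Compressed %" PRIu64 " B into %" PRIu64 " B (%.1f%%)",
                         total_in, total_out, (double) total_out / (double) total_in * 100.0);
}
```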
Co-authored-by: jinyaoguo <guo846@purdue.edu>