Let's modernize and clean up search_and_fopen a bit: let's add support
for regular open() (instead of fopen()), as well as access() (if caller
just wants to check if a file exists without opening it.
This unifies much of the code involved, which previously was duplicated
in search_and_fopen() and search_and_fopen_nulstr()
mount_option_supported() will call fsopen() which will probe the
kernel filesystem module. This means that we'll suddenly start
probing filesystem modules when running generators as those determine
which mount options to use. To prevent generators from loading kernel
filesystem modules as much as possible, let's always first check the
hardcoded list of filesystem which we know support a feature before
falling back to asking the kernel.
systemd's own cgroup hierarchy is special to us, we use it to actually
manage processes. Because of that many calls tha apply to cgroups are
only ever called with the SYSTEMD_CGROUP_CONTROLLER as controller
argument. Let's hence remove the argument altogether.
This in particular touches the kill and xattr routines.
This changes no behaviour, we just drop an argument that is always set
to the same value anyway.
This is preparation to eventually getting rid of the cgroupvs1, because
on cgroupvs2 the cgroup paths do not change for different controllers,
there's only a single hierarchy there.
note that this slightly changes the semantic of assert when NDEBUG is
defined. if there's an extern function call (without attribute pure or
similar) then the compiler has to assume it has side effects and still
emit the function call.
whereas the old assert guaranteed that nothing will be evaluated on
NDEBUG.
Closes: https://github.com/systemd/systemd/issues/29408
When READ_FULL_FILE_UNBASE64 (or READ_FULL_FILE_UNHEX) is specified,
setting size argument by caller is difficult, as it is hard to estimate
the encoded length.
This makes when size is specified with decoding option, let's read file
more, and check decoded size later with the specified size.
This new helper can be used after reading process info from procfs, to
verify that the data that was just read actually matches the pidfd, and
does not belong to some new process that just reused the numeric PID of
the process we originally pinned.
Usually we want to embed PidRef in other structures, but sometimes it
makes sense to allocate it on the heap in case it should be used
standalone. Add helpers for that.
Primary usecase: use as key in Hashmap objects, that for example map
process to unit objects in PID 1.
This adds pidref_free()/pidref_freep() for freeing such an allocated
struct, as well as pidref_dup() (for duplicating an existing PidRef
on the heap 1:1), and pidref_new_pid() (for allocating a new PidRef from a
PID).
This helper truns a pid_t into a PidRef. It's different from
pidref_set_pid() in being "passive", i.e. it does not attempt to acquire
a pidfd for the pid.
This is useful when using the PidRef as a lookup key that shall also
work after a process is already dead, and hence no conversion to a pidfd
is possible anymore.
'systemctl status /../dev' now looks for 'dev.mount', not '-..-dev.service',
and 'systemctl status /../foo' looks for 'foo.mount', not '-..-foo.service'. I
think this much more useful. I think the escaping is not very useful, so I plan
to submit a later series which changes that behaviour. But I think this first
step here is already useful on its own.
Note that the patch is smaller than it seems: before, is_device_path() would
return true only for absolute paths, so moving of is_device_path() under the
path_is_absolute() conditional doesn't influence the logic.
So, unfortunately oomd uses "io.system." rather than "io.systemd." as
prefix for its sockets. This is a mistake, and doesn't match the
Varlink interface naming or anything else in oomd.
hence, let's fix that.
Given that this is an internal protocol between PID1 and oomd let's
simply change this without retaining compat.
path_simplify_full()/path_simplify() are changed to allow a NULL path, for
which a NULL is returned. Generally, callers have already asserted before that
the argument is nonnull. This way path_simplify_full()/path_simplify() and
path_simplify_alloc() behave consistently.
In sd-device.c, logging in device_set_syspath() is intentionally dropped: other
branches don't log.
In mount-tool.c, logging in parse_argv() is changed to log the user-specified
value, not the simplified string. In an error message, we should show the
actual argument we got, not some transformed version.
I.e., /.. becomes /, /../foo becomes /foo, /../../bar becomes /bar, etc. We can
do this unconditionally, without access to the file system, because the parent
of the root directory always resolves to. /.. in other places is handled as
before, because resolving it properly would require access to the file system
which we don't want to do in path_simplify().
nanosleep() is kinda broken since it sleeps in the CLOCK_REALTIME clock,
i.e. is subject to time changes.
Let's use clock_nanosleep() instead with CLOCK_MONOTONIC, which is
really the only thing that makes sense.
This compares two PidRef structures via the pid_t field. Ideally we'd do
a stricter comparison here, that is safe towards PID reuse, but so far
the pidfd API lacks suitable mechanisms for that, hence do the best we
can do.
"/dev" or "/dev/" is the mount point, not a device path. In particular,
'systemctl status /dev' clearly does not refer to a device, so let's tweak
the code a bit to say that those are not device paths.
(Treating "/../dev" same as "/dev" would be also be reasonable, but that
requires chase(), which requires disk access, which we don't want to do from
this lightweight function.)
Let's start with the conversion of PID 1 to pidfds. Let's add a simple
structure with just two fields that can be used to maintain a reference
to arbitrary processes via both pid_t and pidfd.
This is an embeddable struct, to keep it in line with where we
previously used a pid_t directly to track a process.
Of course, since this might contain an fd on systems where we have pidfd
this structure has a proper lifecycle.
(Note that this is quite different from sd_event_add_child() event
source objects as that one is only for child processes and collects
process results, while this infra is much simpler and more generic and
can be used to reference any process, anywhere in the tree.)
We basically parsed the RFC3339 format already, except with a space:
NOTE: ISO 8601 defines date and time separated by "T".
Applications using this syntax may choose, for the sake of
readability, to specify a full-date and full-time separated by
(say) a space character.
so now we handle both
2012-11-23 11:12:13.456
2012-11-23T11:12:13.456
as equivalent.
Parse directly-suffixed Z and +05:30 timezones as well:
2012-11-23T11:12:13.456Z
2012-11-23T11:12:13.456+02:00
as they're both defined by RFC3339.
We do /not/ allow z or t; the RFC says
NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this
syntax may alternatively be lower case "t" or "z" respectively.
This date/time format may be used in some environments or contexts
that distinguish between the upper- and lower-case letters 'A'-'Z'
and 'a'-'z' (e.g. XML). Specifications that use this format in
such environments MAY further limit the date/time syntax so that
the letters 'T' and 'Z' used in the date/time syntax must always
be upper case. Applications that generate this format SHOULD use
upper case letters.
We /are/ in a case-sensitive environment, neither are in wide-spread
use, and "z" poses an issue of whether "todayz" should be the same
as "todayZ" ("today UTC") or an error (it should be an error).
Fractional seconds are limited to six digits (they're nominally
time-secfrac = "." 1*DIGIT
), since we only support 1µs-resolution timestamps, and limit to six
digits in our other sub-second formats.
Parsing
2012-11-23T11:12
is an extension two ways (no seconds, no timezone),
mirroring our "canonical" format.
Fixes#5194