Let's start with the conversion of PID 1 to pidfds. Let's add a simple
structure with just two fields that can be used to maintain a reference
to arbitrary processes via both pid_t and pidfd.
This is an embeddable struct, to keep it in line with where we
previously used a pid_t directly to track a process.
Of course, since this might contain an fd on systems where we have pidfd
this structure has a proper lifecycle.
(Note that this is quite different from sd_event_add_child() event
source objects as that one is only for child processes and collects
process results, while this infra is much simpler and more generic and
can be used to reference any process, anywhere in the tree.)
We basically parsed the RFC3339 format already, except with a space:
NOTE: ISO 8601 defines date and time separated by "T".
Applications using this syntax may choose, for the sake of
readability, to specify a full-date and full-time separated by
(say) a space character.
so now we handle both
2012-11-23 11:12:13.456
2012-11-23T11:12:13.456
as equivalent.
Parse directly-suffixed Z and +05:30 timezones as well:
2012-11-23T11:12:13.456Z
2012-11-23T11:12:13.456+02:00
as they're both defined by RFC3339.
We do /not/ allow z or t; the RFC says
NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this
syntax may alternatively be lower case "t" or "z" respectively.
This date/time format may be used in some environments or contexts
that distinguish between the upper- and lower-case letters 'A'-'Z'
and 'a'-'z' (e.g. XML). Specifications that use this format in
such environments MAY further limit the date/time syntax so that
the letters 'T' and 'Z' used in the date/time syntax must always
be upper case. Applications that generate this format SHOULD use
upper case letters.
We /are/ in a case-sensitive environment, neither are in wide-spread
use, and "z" poses an issue of whether "todayz" should be the same
as "todayZ" ("today UTC") or an error (it should be an error).
Fractional seconds are limited to six digits (they're nominally
time-secfrac = "." 1*DIGIT
), since we only support 1µs-resolution timestamps, and limit to six
digits in our other sub-second formats.
Parsing
2012-11-23T11:12
is an extension two ways (no seconds, no timezone),
mirroring our "canonical" format.
Fixes#5194
The enum definition, the two string tables and the test all were using
different orders (and in case of the test even missed entries).
Let's unify this, and make sure we always use the same order. This
settles the confusion, and makes the order used for the unicode string
table the canonical one, adjusting the other lists to match it. And adds
the missing entries to the tets.
New directive `NFTSet=` provides a method for integrating network configuration
into firewall rules with NFT sets. The benefit of using this setting is that
static network configuration or dynamically obtained network addresses can be
used in firewall rules with the indirection of NFT set types. For example,
access could be granted for hosts in the local subnetwork only. Firewall rules
using IP address of an interface are also instantly updated when the network
configuration changes, for example via DHCP.
This option expects a whitespace separated list of NFT set definitions. Each
definition consists of a colon-separated tuple of source type (one of
"address", "prefix", or "ifindex"), NFT address family (one of "arp", "bridge",
"inet", "ip", "ip6", or "netdev"), table name and set name. The names of tables
and sets must conform to lexical restrictions of NFT table names. The type of
the element used in the NFT filter must match the type implied by the
directive ("address", "prefix" or "ifindex") and address type (IPv4 or IPv6)
as shown type implied by the directive ("address", "prefix" or "ifindex") and
address type (IPv4 or IPv6) must also match the set definition.
When an interface is configured with IP addresses, the addresses, subnetwork
masks or interface index will be appended to the NFT sets. The information will
be removed when the interface is deconfigured. systemd-networkd only inserts
elements to (or removes from) the sets, so the related NFT rules, tables and
sets must be prepared elsewhere in advance. Failures to manage the sets will be
ignored.
/etc/systemd/network/eth.network
```
[DHCPv4]
...
NFTSet=prefix:netdev:filter:eth_ipv4_prefix
```
Example NFT rules:
```
table netdev filter {
set eth_ipv4_prefix {
type ipv4_addr
flags interval
}
chain eth_ingress {
type filter hook ingress device "eth0" priority filter; policy drop;
ip saddr != @eth_ipv4_prefix drop
accept
}
}
```
```
$ sudo nft list set netdev filter eth_ipv4_prefix
table netdev filter {
set eth_ipv4_prefix {
type ipv4_addr
flags interval
elements = { 10.0.0.0/24 }
}
}
```
We might inherit a max rlim value that's larger than the kernel's
maximum (nr_open). This will cause setrlimit() to fail as the given
maximum is larger than the kernel's maximum. To get around this,
let's limit the max rlim we pass to rlimit() to the value of nr_open.
Should fix#28965
Let's make utf8_to_utf16() and utf16_to_utf8() a bit nicer to use by
adding shortcuts for common cases.
This is particularly relevant for utf16_to_utf8() since the
multiplication with 2 is easy to forget.
"static inline" makes sense in .h files. But in .c files it's useless
decoration, the compiler should just make its own decisions there, and
it can do that.
hence, replace all remaining uses of "static line" by a simple" static"
in all .c files (but keep them in .h files, where they make sense)
Let's make sure btrfs_subvol_make() can operate on O_PATH fds, just like
mkdirat().
Fixes a bunch of tmpfiles errors at boot if we try to create btrfs
subvols, introduced by e54c79ccc2
Fixes: e54c79ccc2
This is useful if the variable is ssize_t and we don't want to trigger a
warning or truncation.
With gcc (gcc-13.2.1-1.fc38.x86_64), the resulting systemd binary is identical,
so I assume that the compiler is able to completely optimize away the type.
We do 'IN_SET(r, -CONST1, -CONST2)', instead of 'IN_SET(-r, CONST1, CONST2)'
because -r is undefined if r is the minimum value (i.e. INT_MIN). But we know
that the constants are small, so their negative values are fine.
Currently, we mount via file descriptors using /proc/self/fd. This
works, but it means that in /proc/mounts and various other files,
the source of the mount will be listed as /proc/self/fd/xxx. For other
software that parses these files, /proc/self/fd/xxx doesn't mean anything,
or worse, it means the completely wrong thing, as it will refer to one of
their own file descriptors instead.
Let's improve the situation by using /proc/pid/fd instead. This allows
processes parsing /proc/mounts to do the right thing more often than not.
One scenario where even this doesn't work if when containers are involved,
as with the pid namespace unshared, even /proc/pid/fd will mean the wrong
thing, but it's no worse than /proc/self/fd which will always means the wrong
thing.
This also doesn't work if we mount via file descriptor and then exit, as the pid will
be gone, but it does work as long as the process that did the mount is alive, which
makes it useful for systemd-dissect --with for example if the program we run in the
image wants to parse /proc/mounts.
This reverts commit 1483892a42.
As the commit says, it does not solve the race. Moreover, it introduces
an regression #28410.
Also, checking by `path_is_mount_point()` may trigger automount. From
statx(2),
> AT_NO_AUTOMOUNT
> Don't automount the terminal ("basename") component of pathname
> if it is a directory that is an automount point.
Similar statements can be found in fstatat(2), which is used in the
fallback call for statx() in glibc, and name_to_handle_at(2), which is
used as the fallback when statx() failed.
So, `path_is_mount_point()` may _do_ trigger automount for parent paths.
That should be avoided especially on shutdown.
The original issue #25527 that is 'fixed' by the commit is not serious,
and should be fixed by making umount command handle path gracefully:
https://github.com/util-linux/util-linux/issues/2132Fixes#28410.
This is supposed to be a help for compilers to apply optimizations on
functions where they can't determine whether they are const/pure on
their own. For static, local functions the compiler can do this on its
own easily however, hence the decoration with pure/const is just noise.
Let's drop it, and let the compiler to its thing better.
(Use it for exported functions, since compilers can't 'reach-over' into
other modules to determine if they are pure, except if LTO is used)