Added in 6140be90ec
However, when O_PATH fds are encountered we'd have to go by
/proc/self/fd/ still, since the kernel people are reluctant
to make the new syscalls work with them
(https://lore.kernel.org/linux-fsdevel/20250206-steril-raumplanung-733224062432@brauner/)
Hence getxattrat() and listxattrat() are not employed.
While at it, remove the discrepancy between path being NULL
and empty - I don't grok the "security issue" claimed earlier,
but nowadays even the kernel treats the two as identical:
e896474fe4
This allows reusing the logic in systemd-firstboot.c.
To avoid having to link libxkbcommon into libsystemd-shared, we add
a level of indirection to vconsole_convert_to_x11() so that the verify
function is passed in by the caller.
Commit 1be9b30a3b ("dhcp6: use dns_name_from_wire_format") introduced a
stricter validation of domains received via DHCPv6, by using function
dns_name_from_wire_format() which rejects the domain when it is missing the
terminating zero label. According to RFC 4704 § 4.2, DHCPv6 servers should
always add the zero label:
To send a fully qualified domain name, the Domain Name field is set
to the DNS-encoded domain name including the terminating zero-length
label. To send a partial name, the Domain Name field is set to the
DNS-encoded domain name without the terminating zero-length label.
[...]
Servers SHOULD send the complete fully qualified domain name in
Client FQDN options.
In practice, there is at least on common DHCPv6 server implementation (dnsmasq)
that sends the FQDN option without the ending zero-length label; after
upgrading to the new systemd, the client cannot parse the option and therefore
the machine doesn't get the hostname provided by dnsmasq.
This commit restores the old behavior that considers a domain valid even when
it's missing the terminating zero label.
Here's a quick reproducer:
--8<--
ip link add veth0 type veth peer name veth1
ip netns add ns1
ip link set veth1 netns ns1
ip link set veth0 address 00:11:22:33:44:55
ip link set veth0 up
ip -n ns1 link set veth1 up
ip -n ns1 address add dev veth1 fd01::1/64
ip netns exec ns1 dnsmasq \
--pid-file=/tmp/dnsmasq.pid --no-hosts \
--bind-interfaces --interface veth1 --except-interface lo \
--dhcp-range=fd01::100,fd01::200 --enable-ra \
--dhcp-host 00:11:22:33:44:55,foobar &
cat <<EOF > /etc/systemd/network/veth0.network
[Match]
Name=veth0
[Network]
DHCP=ipv6
EOF
networkctl reload
networkctl up veth0
sleep 5
hostname
--8<--
Without this change, systemd-networkd prints the following message and doesn't
set the hostname from DHCP:
veth0: DHCPv6 client: Failed to parse FQDN option, ignoring: Bad message
PCR 7 covers the SecureBoot policy, in particular "dbx", i.e. the
denylist of bad actors. That list is pretty much as frequently updated
as firmware these days (as fwupd took over automatic updating). This
means literal PCR 7 policies are problematic: they likely break soon,
and are as brittle as any other literal PCR policies.
hence, pick safer defaults, i.e. exclude PCR 7 from the default mask.
This means the mask is now empty.
Generally, people should really switch to signed PCR policies covering
PCR 11, in combination with systemd-pcrlock for the other PCRs.
PCR 7 covers the SecureBoot policy, in particular "dbx", i.e. the
denylist of bad actors. That list is pretty much as frequently updated
as firmware these days (as fwupd took over automatic updating). This
means literal PCR 7 policies are problematic: they likely break soon,
and are as brittle as any other literal PCR policies.
hence, pick safer defaults, i.e. exclude PCR 7 from the default mask.
This means the mask is now empty.
Generally, people should really switch to signed PCR policies covering
PCR 11, in combination with systemd-pcrlock for the other PCRs.
In v257 userdbctl gained support for filtering user records with fuzzy
matching and some other parameters. It was done on the client side only.
This PR adds server-side matching, by exendting the generic userdb
varlink api.
The api is generic any may have many other implementors, hence care is
taken to fallback to exclusively client side filtering in case the
service does not support the new parameters.
In fact I even opted to not actually implement server-side filtering in
any services but systemd-userdbd.service, because it's probably not too
much an optimization in relevant services (we might want to revisit this
later). By implementing it in userdbd the primary entrypoint for userdb
is however covered: the multiplexer interface which provides a single
interface for the multitude of backends. Or in other words: the
multiplexer itself supports server-side filtering even if its own
backends don't, and will hide this neatly away.
One nice side effect from not implementing server side filtering for all
our backends is that the fallback codepaths are comprehensively tested.
Note that this adds some unit tests but not new integration test for all
this, as the filtering tests for userdbctl already existed before, we
just move their implementation from the client to the server side.
Add a new flag, `--dns`, to systemd-networkd-wait-online to allow
waiting for DNS to be configured. The `--dns` flag respects the `--ipv4`
and `--ipv6` flags, as well as `--interface=` and `--any`.
Add a new method to io.systemd.Resolve.Monitor that allows subscribing
to changes in the systemd-resolved DNS configuration. The new method
emits the full DNS configuration (one entry for global configuration,
and one entry for each interface), any time the configuration is
updated.
This moves around the UserDBMatch handling, moves it out of userdbctl
and into generic userdb code, so that it can be passed to the server
side, to allow server side filtering.
This is preparation for one day allowing complex software to do such
filtering server side, and thus reducing the necessary traffic.
Right now no server side actually knows this, hence care is taken to
downgrade to the userdb varlink API as it was in v257 in case the new
options are not understood. This retains compatibility with any
implementation hence.
This is preparation for adding server side filtering to the userdb
logic: it adds some fields for this to the userdb varlink API. This only
adds the IDL for it, no client will use it for now, no server implement
it. That's added in later commits.
The varlink method io.systemd.Machine.Register() is in v256, hence type
of "leader" cannot be changed.
Let's revert the change by 755cb018c9, and
introduce another field "leaderProcessId", which takes detailed information
of the process.
Fixes a regression caused by 755cb018c9.
Fixes#36155.
This introduces libmount_parse_mountinfo() and libmount_parse_with_utab().
The former one parses only mountinfo, but the latter one also parse
utab. Hopefully this avoids pitfalls like issue #35949.
There's finally quota on tmpfs, hence let's use it to make it harder for
users to DoS the system by consuming all disk space in /tmp/ and
/dev/shm/.
This enforces a default limit of 80% quota of the backing fs for these
two dirs for users, but this can be overriden in the user record, if
desired.
This also adds two other interesting features:
1. mount units gain GracefulOptions= which takes optional mount options
that are added only if supported by the kernel. (this is used to enable
usrquota on /tmp/, if available.)
2. The PAM logic in service management now supports reading passwords
from service credentials and via the askpw logic. This used for make
testing easy (so that we can run0 into a homed user which strictly
requires a password).
So far nspawn supported unpriv containers only if backed by a DDI. This
adds dir-based unpriv containers too.
To make this work this introduces a new UID concept to systemd: the
"foreign UID range". This is a high UID range of size 64K. The idea is
that disk images that are "foreign" to the local system can use that,
and when a container or similar is invoked from it, a transiently
allocated dynamic UID range is mapped from that foreign UID range via id
mapped mounts.
This means the fully dynamic, transient UID ranges never hit the disk,
which should vastly simplify management, and does not require that uid
"subranges" are persistently delegated to any users.
The mountfsd daemon gained a new method call for acquiring an idmapped
mount fd for an mount tree owned by the foreign UID range. Access is
permitted to unpriv clients – as long as the referenced inode is located
within a dir owned by client's own uid range.
This adds a new "foreign" value to --private-users-ownership= which is a
lot like "map", but maps from the host's foreign UID range rather than from the
host's 0.
(This has nothing much to do with making unprivileged directory-based
containers work, it's just very handy that we can run privileged
contains with such a mapping too, with an easy switch)
systemd-mountfsd so far provided a MountImage() API call for mounting a
disk image and returning a set of mount fds. This complements the API
with a new MountDirectory() API call, that operates on a directory
instead of an image file. Now, what makes this interesting is that it
applies an idmapping from the foreign UID range to the provided target
userns – and in which case unpriveleged operation is allowed (well,
under some conditions: in particular the client must own a parent dir of
the provided path).
This allows container managers to run fully unprivileged from
directories – as long as those directories are owned by the foreign UID
range. Basic operation is like this:
1. acquire a transient userns from systemd-nsresourced with 64K users
2. ask systemd-mountfsd for an idmapped mount of the container dir
matching that userns
3. join the userns and bind the mount fd as root.
Note that we have to drop various sandboxing knobs from the mountfsd
service file for this to work, since the kernel's security checks that
try to ensure than an obstructed /proc/ cannot be circumvented via
mounting a new procfs will otherwise prohibit mountfsd to duplicate the
mounts properly.
We currently set this at two distinct places right before calling
userdb_connect(). let's do this inside of userdb_connect() instead, and
derive it directly from the socket path.
This doesn't change behaviour but simplifies things a bit.
- add one missing assertion,
- always logs on error,
- simplify the logic to make it easy to understand,
- add several more comments.
Preparation for later commits. No functional change.
errno handling for NSS is always a bit weird since NSS modules generally
are not particularly careful with it. Hence let's initialize errno
explicitly before we invoke getpwent() so that we know it's in a
reasonable state afterwards on failure, or zero if not.
We do this in most places we use NSS, including in userdb when it comes
to getgrent(), just for getpwent() we don't so far. Address that.
The documentation and code agree on the same name, since always, but
when I put together the IDL I made a mistake and insert a "Not" that
wasn't supposed to be there.
Let's correct that.
This ensures that user names can be specified either in the regular
short syntax or with a realm appended, and both are accepted. (The
latter of course only if the record actually defines a realm)