In all our daemons the primary entrypoint object is called "Manager".
But so far there was one exception: in journald it was called "Server".
Let's normalize that, and stick to the same nomenclature everywhere, to
make journald less special.
No real code change, just some search&replace.
Previously, both udevd and logind modified the ACLs of a device node. Hence,
there was a race like the following:
1. udevd reads an old state file,
2. logind updates the state file and applies new ACLs,
3. udevd applies ACLs based on the old state file.
With this change, logind no longer updates ACLs itself, but triggers uevents
for the relevant devices so that udevd updates the ACLs.
When udevd broadcasts an event for e.g. a graphics device with the
master-of-seat tag, manager_process_seat_device() was previously called twice
for the event.
With this commit, the function is called only once, even for an event for
such a device.
This returns to the original approach proposed in
https://github.com/systemd/systemd/pull/17270. After review, the approach was
changed to use sd_pid_get_owner_uid() instead. Back then, when running in a
typical graphical session, sd_pid_get_owner_uid() would usually return the user
UID, and when running under sudo, geteuid() would return 0, so we'd trigger the
secure path.
sudo may allocate a new session if it is invoked outside of a session (depending
on the PAM config). Since nowadays desktop environments usually start the user
shell through user units, the typical shell in a terminal emulator is not part
of a session, and when sudo is invoked, a new session is allocated, and
sd_pid_get_owner_uid() returns 0 too. Technically, the code still works as
documented in the man page, but in the common case, it doesn't do the expected
thing.
$ build/test-sd-login |& rg 'get_(owner_uid|cgroup|session)'
sd_pid_get_session(0) → No data available
sd_pid_get_owner_uid(0) → 1000
sd_pid_get_cgroup(0) → /user.slice/user-1000.slice/user@1000.service/app.slice/app-ghostty-transient-5088.scope/surfaces/556FAF50BA40.scope
$ sudo build/test-sd-login |& rg 'get_(owner_uid|cgroup|session)'
sd_pid_get_session(0) → c289
sd_pid_get_owner_uid(0) → 0
sd_pid_get_cgroup(0) → /user.slice/user-0.slice/session-c289.scope
I think it's worth checking for sudo because it is a common case used by users.
There obviously are other mechanisms, so the man page is extended to say that
only some common mechanisms are supported, and to (again) recommend setting
SYSTEMD_LESSSECURE explicitly. The other option would be to set "secure mode"
by default. But this would create an inconvenience for users doing the right
thing, running systemctl and other tools directly, because then they can't run
privileged commands from the pager, e.g. to save the output to a file. (Or the
user would need to explicitly set SYSTEMD_LESSSECURE. One option would be to
set it always in the environment and to rely on sudo and other tools stripping
it from the environment before running privileged code. But that is also fairly
fragile and it obviously relies on the user doing a complicated setup to
support a fairly common use case. I think this decreases usability of the
system quite a bit. I don't think we should build solutions that work in
principle, but are painfully inconvenient in common cases.)
Fixes https://yeswehack.com/vulnerability-center/reports/346802.
Also see https://github.com/polkit-org/polkit/pull/562, which adds support for
$SUDO_UID/$SUDO_GID to pkexec.
If zstd frames are corrupted, the initial size returned for the current
frame might be wrong. Don't assert() on that, but handle it gracefully,
as EBADMSG.
Let's be strict here: this data is conceptually not NUL terminated,
hence use memory_startswith() rather than startswith() (which implies
NUL termination). All other similar cases in logs-show.c got this right.
Fix the remaining three, too.
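To illustrate the difference, here is a minimal sketch of a length-bounded prefix check. The helper name mem_prefix() is hypothetical and this is not systemd's memory_startswith() implementation; it only shows why the size has to be passed explicitly when the data is not NUL terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not systemd's memory_startswith(): returns a pointer
 * just past the prefix if the first 'size' bytes of 'data' begin with
 * 'prefix', or NULL otherwise. It never reads beyond data[size - 1], so
 * 'data' does not need to be NUL terminated. */
static const char *mem_prefix(const void *data, size_t size, const char *prefix) {
        size_t n;

        assert(data || size == 0);
        assert(prefix);

        n = strlen(prefix);
        if (size < n)
                return NULL; /* too short to contain the prefix */
        if (n > 0 && memcmp(data, prefix, n) != 0)
                return NULL;

        return (const char *) data + n;
}
```

A plain strncmp()- or strlen()-based check on the data side could run past the end of the buffer looking for a terminating NUL; the explicit size comparison above avoids that.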
With the more efficient sync semantics it's more likely that
journal-upload-journal will try to read a partially written message.
Previously we'd fail then. Let's instead treat this gracefully,
expecting that this is either the end or will be fixed shortly (and
we'll get notified via inotify about it and recheck).
The MHD context owns the fd we watch via our event source, hence when we
destroy the context before the event source the event source might still
reference the fd that is now invalid. Hence swap the order.
The Synchronize() function is just too useful for clients, so that we
can make "systemd-run -v --user" actually useful. Hence let's make the
socket accessible without privs. Deny most method calls however, except
for the Synchronize() call.
Previously, when the Synchronize() varlink call was issued, we'd wait for
journald to become idle before returning success. That is problematic
however: on a busy system journald might never become idle. Hence, let's
beef up the logic to ensure that we do not wait longer than necessary:
i.e. we make sure we process any data enqueued before the sync request
was submitted, but not more.
Implementing this isn't trivial unfortunately. To deal with this
reasonably, we need to determine somehow for incoming log messages
whether they are from before or after the point in time where the sync
request was received.
For AF_UNIX/SOCK_DGRAM we can use SO_TIMESTAMP to directly compare
timestamps of incoming messages with the timestamp of the sync request
(unfortunately only CLOCK_REALTIME).
For AF_UNIX/SOCK_STREAM we can issue the SIOCINQ ioctl at the moment we
initiate the sync, and then continue processing incoming traffic, counting
down the bytes until the number of bytes SIOCINQ returned has been processed.
Hence, all further data must have been enqueued later.
With those two mechanisms in place we can relatively reliably
synchronize the journal.
This also adds a boolean argument "offline" to the Synchronize() call,
which controls whether to offline the journal after processing the
pending messages. It defaults to true, for compat with the status quo
ante. But in most cases the offlining is probably not necessary, and
skipping it is cheaper, hence allow callers to opt out.
So far we scheduled kmsg events at higher priority than native/syslog
ones. But that's quite problematic, since it means that kmsg events can
drown out native/syslog log events. And this actually shows up in some
CI tests.
Address that, and schedule all three sources at the same priority, so
that the earlier event is always processed first, regardless of which
protocol is used.