It is strictly mandatory that this is done during initial
transaction, and not later when the system is already running.
Hence let's refuse manual start for all of the involved units.
Additionally, refuse manual stop for systemd-factory-reset-complete.service,
as it flags the factory reset completion through
/run/systemd/factory-reset-complete, which never gets removed
for the whole boot.
If all transfer definitions are features and disabled, a wrong error
is reported that there are no transfer definitions.
This breaks the features and vaccum verb, as they work on disabled
features, too.
Otherwise, if the system is busy, TEST-02-UNITTESTS will fail as
systemd will time out trying to kill the transient unit that we're
running test-async in.
When services start up they might query for passwords, or issue polkit
requests. Hence it makese sense to run the password query agent and
polkit agent from systemd-run. We already ran the polkit agent, this
also ensures we run the password query agent.
There's one tweak to the story though: running the agents and the pty
forwarder concurrently is messy, since they both try to read from stdin
(one potentially, the other definitely). Hence, let's time the agents
properly: invoke them when we initialize, but stop them once the start
job for the unit we are supposed to run is complete, and only then run
the pty forwarder.
With this in place, the following series of commands starts to work
really nicely (which previously deadlocked):
# homectl create foobar
# run0 -u foobar
What happens in the background in run0 is this: a new session is invoked
for "foobar", which pulls in the user@.service instance for the user.
That user@.service instance will need to unlock the homedir first. Since
8af1b296cb this will happen via the askpw
logic. With this commit here this prompt will now be shown by run0. Once
the password is entered the directory is unlocked and the real session
begins. Nice!
This new behaviour is conditioned behind --pty-late (distinct from the
existing --pty switches). For systemd-run we will never enable this mode
by default, for compat with command lines that use ExecStartPre=
(because we won't process the pty anymore during that command) For
run0 however this changes the default to --pty-late (unless
--no-ask-password is specified). This reflects the fact that run0 is
more of an interctive tool and unlikely to be used in more complex
service start-up situations with ExecStartPre= and suchlike.
This also merges JobDoneContext into RunContext, since it doesn't really
make sense to have two contexts around to communicate between outer
stack frame and event handlers. Let's just have one, and pass it around
to all handlers the same way. In particular as we should delay exit only
until both the unit's job is complete *and* in case of --wait the unit
is exited, one of the two should not suffice.
In relevant factory reset situation the root disk itself is subject to
removal. This somewhat conflicts with automatic root disk discovery,
since the system first comes up with one candidate for the root disk,
which is then replaced by another.
Let's address this by determining at the moment of probing for the
gpt-root logic what the factory reset state currently is. This is then
used to maintain two distinct symlinks to the gpt auto root device: one
which is always available and one that is only available if factory
reset is off or complete.
The new symlinks is not used by anything yet. This will be added in a
later commit.
While we typically just use /dev/tpmrm0 for accessing the TPM chip (i.e
via the kernel's own resource manager), some sysfs properties that
matter are on /dev/tpm0 only (i.e. the version without the kernel TPM
resource manager). Hence, wait for both to show up in tpm2.target, so
that we can be sure the full API is available.
This matters because we want to access /sys/class/tpm/tpm0/ppi/request
in the next commit.
This introduces a bunch of facilities:
1. The factory-reset.target unit that requests a factory reset is now
complemented by factory-reset-now.target that executes it at next
boot.
2. This latter is added to the initial transaction via the new trivial
systemd-factory-reset-generator.
3. A tool systemd-factory-reset has been added to query, request,
cancel, complete factory reset operations (via EFI variables). Two of
these are wrapped into units that are plugged into
factory-reset.target and factory-reset-now.target respectively. The
tool also provides a simple Varlink API.
This should make things a lot cleaner, and both be useful as explicit
implementation on UEFI, and as template + hookpoints for alternative
implementations on non-UEFI.
Let's provide a generic implementation of the systemd.factory_reset
kernel cmdline checking repart implements. Moreover add support for
leaving the factory reset state again.
This only establishes the basic APIs, it does not hook them up with
anything.
For some reason, argparse treats undefined options as positional args in
certain scenarios:
$ src/ukify/ukify.py --badopt='11'
ukify.py: error: unrecognized arguments: --badopt=11
$ src/ukify/ukify.py --badopt '11'
ukify.py: error: unrecognized arguments: --badopt
$ src/ukify/ukify.py --badopt '11 12'
Assuming obsolete command line syntax with no verb. Please use 'build'.
Traceback (most recent call last):
File "/home/zbyszek/src/systemd/src/ukify/ukify.py", line 2497, in <module>
main()
~~~~^^
File "/home/zbyszek/src/systemd/src/ukify/ukify.py", line 2485, in main
check_inputs(opts)
~~~~~~~~~~~~^^^^^^
File "/home/zbyszek/src/systemd/src/ukify/ukify.py", line 671, in check_inputs
value.open().close()
~~~~~~~~~~^^
File "/usr/lib64/python3.13/pathlib/_local.py", line 537, in open
return io.open(self, mode, buffering, encoding, errors, newline)
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '--badopt=11 12'
I suspect that this is some crap compat for Windows, where option parsing is
an even bigger mess than here.
Being told about positional args, when no positional args were specified is
confusing, so add a check for this.
Otherwise, if the system is busy, TEST-02-UNITTESTS will fail as
systemd will time out trying to kill the transient unit that we're
running test-async in.
This used to be relevant since in old versions of glibc an internal
cache is maintained, while we might sidestep their invalidation
with raw_clone(). After glibc 2.25 getpid() is a trivial wrapper
for the syscall, and hence there's no need to have a separate
raw_getpid().
It's sometimes very useful to be able to terminate a container quickly
but cleanly while talking to it. Introduce a hotkey for that: ^]^]p for
powering it off. In similar style add ^]^]r for rebooting it.
Let's add the ability that ptyfwd tools can register additional hotkeys
that they then can handle.
So far the only hotkey we support is ^]^]^] to exit the ptyfwd session
abruptly. Staying close to this let's add ^]^]<char> for additional
commands.
We'll add another type of handler callback in the next commit, hence
rename the existing handler to be more precise what it is about:
handling hangups (either inline via tty, or explicit via user request)
inotify FD may take several milliseconds to close. We measured
daemon-reload
default: (0.427 ± 0.05) s
async: (0.323 ± 0.02) s
with 5 path units out of 422 units. I.e. ~1% of units cause ~25% of
delay, hence this fix seems like low-hanging fruit on the daemon-reload
critical path.
Particular inotify slowness pointed out by @fbuihuu.
We currently don't support invoking a per-area service manager instance,
hence don't try to pull in one if we log into an area.
Once we add support for per-area service managers we can relax this
again.
We always validate that the target value is below _LOG_TARGET_SINGLE_MAX
before acessing it, but we don't actually size the array like that.
let's fix that.
This doesn#t effectively change anything, but it makes things more
explicit what the limit here is.
If we use TCP fastopen to connect to a DNS server via TCP, and it
responds really quickly between our connection attempt and our immediate
check back, then we have not identified the peer yet, and will not be
able to use the peer metadata to fill in our packet info.
Let's fix that, and simply not read from the socket until identification
is complete.
Fixes: #34956
In Python, exception messages are often embedded in surrounding text, so in
general they should not contain punctuation.
Also, no need to instantiate the exception object if no arguments are used.