When conditions fail on a service unit, a path unit can cause
PID 1 to busy loop as it keeps trying to activate the service unit.
To avoid this from happening, add a trigger limit to the path unit,
identical to the trigger limit we have for socket units.
Initially, let's start with a high limit and not make it configurable.
If needed, we can add properties to configure the rate limit similar
to the ones we have for socket units.
Previously the mkdir_label() family of calls was implemented in
src/shared/mkdir-label.c but its functions partly declared ins
src/shared/label.h and partly in src/basic/mkdir.h (!!). That's weird
(and wrong).
Let's clean this up, and add a proper mkdir-label.h matching the .c
file.
Up until now the main reason why we didn't proceed with starting the
unit was exceed start limit burst. However, for unit types like mounts
the other reason could be effective ratelimit on /proc/self/mountinfo
event source. That means our mount unit state may not reflect current
kernel state. Hence, we need to attempt to re-run the start job again
after ratelimit on event source expires.
As we will be introducing another reason than start limit let's rename
the virtual function that implements the check.
Alternative to https://github.com/systemd/systemd/pull/20531.
Whenever a service triggered by another unit fails condition checks,
stop the triggering unit to prevent systemd busy looping trying to
start the triggered unit.
Fixes#17433. Currently, if any of the validations we do before we
check start rate limiting fail, we can still enter a busy loop as
no rate limiting gets applied. A common occurence of this scenario
is path units triggering a service that fails a condition check.
To fix the issue, we simply move up start rate limiting checks to
be the first thing we do when starting a unit. To achieve this,
we add a new method to the unit vtable and implement it for the
relevant unit types so that we can do the start rate limit checks
earlier on.
When watching paths that contain symlinks in some element we so far
always only watched the inode they are pointing to, not the symlink
inode itself. Let's fix that and always watch both. We do this by simply
installing the inotify watch once with and once without IN_DONT_FOLLOW.
For non-symlink inodes this just overrides the same watch twice (where
the second one replaces the first), which is has no effect effectively.
For symlinks it means we'll watch both source and destination.
Fixes: #17727
This mirrors the change done for systemd-resolved in
9793530228. Quoting that patch:
> We generally operate on the assumption that a source is "gone" as soon as we
> unref it. This is generally true because we have the only reference. But if
> something else holds the reference, our unref doesn't really stop the source
> and it could fire again.
In particular, we take temporary references from sd-event code, and when called
from an sd-event callback, we could temporarily see this elevated reference
count. This patch doesn't seem to change anything, but I think it's nicer to do
the same change as in other places and not rely on _unref() immediately
disabling the source.
We already do this for socket and automount units, do it for path units
too: if the triggered service keeps hitting the start limit, then fail
the triggering unit too, so that we don#t busy loop forever.
(Note that this leaves only timer units out in the cold for this kind of
protection, but it shouldn't matter there, as they are naturally
protected against busy loops: they are scheduled by time anyway).
Fixes: #16669
In 4c2ef32767 we enabled propagating
triggered unit state to the triggering unit for service units in more
load states, so that we don't accidentally stop tracking state
correctly.
Do the same for our other triggering unit states: automounts, paths, and
timers.
Also, make this an assertion rather than a simple test. After all it
should never happen that we get called for half-loaded units or units of
the wrong type. The load routines should already have made this
impossible.
In all the other cases, I think the code was clearer with the static table.
Here, not so much. And because of the existing dump code, the vtables cannot
be made static and need to remain exported. I still think it's worth to do the
change to have the cmdline introspection, but I'm disappointed with how this
came out.
As documented in systemd.path(5):
When a service unit triggered by a path unit terminates (regardless
whether it exited successfully or failed), monitored paths are
checked immediately again, and the service accordingly restarted
instantly.
This commit implements this behaviour for PathExists=, PathExistsGlob=,
and DirectoryNotEmpty=. These predicates are essentially
"level-triggered": the service should be activated whenever the
predicate is true. PathChanged= and PathModified=, on the other hand,
are "edge-triggered": the service should only be activated when the
predicate *becomes* true.
The behaviour has been broken since at least as far back as commit
8fca6944c2 ("path: stop watching path specs once we triggered the target
unit"). This commit had systemd stop monitoring inotify whenever the
triggered unit was activated. Unfortunately this meant it never updated
the ->inotify_triggered flag, so it never rechecked the path specs when
the triggered unit deactivated.
With this commit, systemd rechecks all paths specs whenever the
triggered unit deactivates. If any PathExists=, PathExistsGlob= or
DirectoryNotEmpty= predicate passes, the triggered unit is reactivated.
If the target unit is activated by something outside of the path unit,
the path unit immediately transitions to a running state. This ensures
the path unit stops monitoring inotify in this situation.
With this change in place, commit d7cf8c24d4 ("core/path: fix spurious
triggering of PathExists= on restart/reload") is no longer necessary.
The path unit (and its triggered unit) is now always active whenever
the PathExists= predicate passes, so there is no spurious restart when
systemd is reloaded or restarted.
Similar, refuse triggering deps on units that cannot trigger.
And rework how we ignore After= dependencies on device units, to work
the same way.
See: #14142
Our handling of the condition was inconsistent. Normally, we'd only fire when
the file was created (or removed and subsequently created again). But on restarts,
we'd do a "recheck" from path_coldplug(), and if the file existed, we'd
always trigger. Daemon restarts and reloads should not be observeable, in
the sense that they should not trigger units which were already triggered and
would not be started again under normal circumstances.
Note that the mechanism for checks is racy: we get a notification from inotify,
and by the time we check, the file could have been created and removed again,
or removed and created again. It would be better if we inotify would give as
an unambiguous signal that the file was created, but it doesn't: IN_DELETE_SELF
triggers on inode removal, not directory entry, so we need to include IN_ATTRIB,
which obviously triggers on other conditions.
Fixes#12801.
No functional change, just adjusting code to follow the same pattern
everywhere. In particular, never call _verify() on an already loaded unit,
but return early from the caller instead. This makes the code a bit easier
to follow.
unit_load_fragment_and_dropin() and unit_load_fragment_and_dropin_optional()
are really the same, with one minor difference in behaviour. Let's drop
the second function.
"_optional" in the name suggests that it's the "dropin" part that is optional.
(Which it is, but in this case, we mean the fragment to be optional.)
I think the new version with a flag is easier to understand.
The default message for ENOSPC is very misleading: it says that the disk is
filled, but in fact the inotify watch limit is the problem.
So let's introduce and use a wrapper that simply calls inotify_add_watch(2) and
which fixes the error message up in case ENOSPC is returned.
Just some renaming, no change in behaviour.
Background: I'd like to add more functions unit_test_xyz() that test
various things, hence let's streamline the naming a bit.
This allows clients to follow our internal state changes safely.
Previously, quick state changes (for example, when we restart a unit due
to Restart= after it quickly transitioned through DEAD/FAILED states)
would be coalesced into one bus signal event, with this change there's
the guarantee that all state changes after the unit was announced ones
are reflected on th bus.
Note we only do this kind of guaranteed flushing only for unit state
changes, not for other unit property changes, where clients still have
to expect coalescing. This is because the unit state is a very
important, high-level concept.
Fixes: #10185
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.
I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
Let's be more careful with what we serialize: let's ensure we never
serialize strings that are longer than LONG_LINE_MAX, so that we know we
can read them back with read_line(…, LONG_LINE_MAX, …) safely.
In order to implement this all serialization functions are move to
serialize.[ch], and internally will do line size checks. We'd rather
skip a serialization line (with a loud warning) than write an overly
long line out. Of course, this is just a second level protection, after
all the data we serialize shouldn't be this long in the first place.
While we are at it also clean up logging: while serializing make sure to
always log about errors immediately. Also, (void)ify all calls we don't
expect errors in (or catch errors as part of the general
fflush_and_check() at the end.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.