tree-wide: open block device locks in writable mode

udev's block device locking protocol has one pitfall that not even the
example in the documentation got right so far (even though the text
above that example explains it in detail): udev's rescanning is only
triggered when an fd that was opened for writing is closed. This means
that if a separate locking fd is opened on a block device (one that is
maintained independently of the fd actually used for writing), it must be
opened for writing too, so that closing the lock definitely triggers a
rescan. This matters in cases where the lock fd is kept for longer than
the fd used for writing to disk. (Otherwise udev might get the
IN_CLOSE_WRITE event when the writing fd is closed, find the device
still locked when it tries to rescan, and then never retry because no
further IN_CLOSE_WRITE is triggered.)

Let's fix that across the codebase, in four places:

1. in makefs (a lock fd is kept, mkfs is then invoked as a child, which
   uses a different fd, and the lock fd is closed only once the child
   has died)

2. in udevadm lock (embarrassing!), which is intended to be used to wrap
   tools that modify disk contents, very similar to the makefs case. The
   lock is also kept until after the tool has exited.

3. in storagetm: the kernel nvme-tcp layer writes to the device
   directly; we just keep a lock fd.

4. the example in BLOCK_DEVICE_LOCKING.md
Lennart Poettering
2025-10-22 22:47:53 +02:00
parent 46da450f13
commit e582484789
6 changed files with 17 additions and 11 deletions


@@ -223,11 +223,12 @@ int main(int argc, char **argv) {
                 return EXIT_FAILURE;
         }
-        // try to take an exclusive and nonblocking BSD lock
+        // try to take an exclusive and nonblocking BSD lock (use O_WRONLY mode to ensure udev
+        // rescans the device once the lock is closed)
         __attribute__((cleanup(closep))) int fd =
                 lock_whole_disk_from_devname(
                         argv[1],
-                        O_RDONLY|O_CLOEXEC|O_NONBLOCK|O_NOCTTY,
+                        O_WRONLY|O_CLOEXEC|O_NONBLOCK|O_NOCTTY,
                         LOCK_EX|LOCK_NB);
         if (fd < 0)