[File] brief impossible bug: magic_file(MAGIC_COMPRESS) for test.lz -> vfork() + waitpid(pid) + waitpid(0) = sleep forever
raf
file at raf.org
Sun Apr 5 13:59:14 UTC 2026
Hi,
debian-12, libmagic-5.44
Yesterday, I encountered a wierd bug using magic_file()
with MAGIC_COMPRESS on a compressed file. It would go
to sleep until I interrupted it. It only happened when
my code had its own child process, but it happened
almost every time that that was the case. For weeks, it
was fine, but for 1-2 days, it would sleep forever.
At first I thought I must be corrupting memory somehow
and damaging whatever is behind the magic_t pointer,
but ubsan, asan, msan, and valgrind found no problems.
I also checked that the magic_t pointer itself and the
first 30 bytes of whatever it pointed to weren't being
overwritten, and they weren't. It looked like something
inside libmagic was just going to sleep.
Today it's back to normal and working fine. I looked at
the code for 5.46 and it looks completely impossible
for yesterday's behaviour to happen (assuming no relevant
differences between 5.44 and 5.46).
For compressed files, uncompressbuf() does vfork() and
does waitpid() for it.
clone3({flags=CLONE_VM|CLONE_VFORK, exit_signal=SIGCHLD, stack=0x7fe771fa1000, stack_size=0x9000}, 88) = 480299
wait4(480299, NULL, 0, NULL) = 480299
If fd == -1, it calls writechild() which does fork()
and sets writepid. Later, if writepid > 0, then a second
waitpid() happens. Strangely, the second waitpid() does
happen, even though writepid should still be -1 rather
than > 0, because the fork() in writechild() never
happens.
When broken, writepid was 0 which makes no sense. It
can't be 0. It's either -1 or > 0, and the waitpid()
only happens when writepid > 0.
wait4(0, 0x7fff19477658, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
Anyway, when it was 0, and my code had its own child
process, the waitpid() in uncompressbuf() would go to
sleep, presumably/mistakenly waiting for my code's
child process to terminate, which it wasn't going to
do.
Today, now that it's not broken, the value of writepid
is > 0 even though there was no writechild() process
created. It just returns -1 because there's no such
child process. So my program doesn't go to sleep and it
works fine...
wait4(32764, 0x7ffc45c09118, 0, NULL) = -1 ECHILD (No child processes)
BUT the value of writepid is suspiciously low (ignoring
the fact that it should actually be -1). The pid just
before this was 519823, and pids tend to increment by 1
each time there's a fork(), so it's odd to drop to such
a low value as 32764. If the corresponding fork() had
happened, writepid would probably = 519824.
It's still wrong though. childpid should be -1 because
there's no second fork(), and the second waitpid()
shouldn't be happening at all.
Maybe it's cosmic rays, or a hardware error, but maybe
there is something wierd happening in uncompressbuf()
that I can't make sense of.
It looks like writepid is being overwritten with junk
(even in between "if (writepid > 0)" and
"w = waitpid(writepid, &status, 0)" on the next line.
but none of the analyzers noticed anything wrong.
I was thinking I'd have to call magic_file() from a
separate child process so it can't make the mistake of
waiting for my code's child process. Or maybe just stop
using MAGIC_COMPRESS. Maybe I'll have to, just to be
on the safe side.
Can anyone make sense of the the following strace
output files? There's one extract for non-sleepforever
but still broken writepid value (good) and another
extract for sleepforever behaviour because writepid != 0
and then writepid == 0 immediately after (bad):
good:
newfstatat(AT_FDCWD, "/home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/test/support/fixtures/test.lz", {st_mode=S_IFREG|0644, st_size=36, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "/home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/test/support/fixtures/test.lz", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 4
newfstatat(4, "", {st_mode=S_IFREG|0644, st_size=36, ...}, AT_EMPTY_PATH) = 0
read(4, "LZIP\1\f\0\203\377\373\377\377\300\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0$\0\0\0"..., 7340032) = 36
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f896af95050}, {sa_handler=0x56128af72180, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f896af95050}, 8) = 0
brk(0x56129dab4000) = 0x56129dab4000
write(1, "{\"path\":\"/home/linuxbrew/.linuxb"..., 3986) = 3986
pipe2([7, 9], O_CLOEXEC) = 0
pipe2([10, 11], O_CLOEXEC) = 0
lseek(4, 0, SEEK_SET) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
fcntl(0, F_SETFD, 0) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
fcntl(1, F_SETFD, 0) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
fcntl(2, F_SETFD, 0) = 0
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f896b21c000
rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
clone3({flags=CLONE_VM|CLONE_VFORK, exit_signal=SIGCHLD, stack=0x7f896b21c000, stack_size=0x9000}, 88) = 519823
wait4(519823, NULL, 0, NULL) = 519823
munmap(0x7f896b21c000, 36864) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=519823, si_uid=1000, si_status=127, si_utime=0, si_stime=0} ---
close(9) = 0
close(11) = 0
read(7, "", 7340032) = 0
read(10, "", 7340032) = 0
close(7) = 0
close(10) = 0
wait4(32764, 0x7ffc45c09118, 0, NULL) = -1 ECHILD (No child processes)
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 7
newfstatat(7, "", {st_mode=S_IFREG|0644, st_size=2996, ...}, AT_EMPTY_PATH) = 0
read(7, "# Locale name alias data base.\n#"..., 4096) = 2996
read(7, "", 4096) = 0
close(7) = 0
openat(AT_FDCWD, "/usr/share/locale/en_AU/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
rt_sigaction(SIGPIPE, {sa_handler=0x56128af72180, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f896af95050}, NULL, 8) = 0
brk(0x56129ccb3000) = 0x56129ccb3000
close(4) = 0
bad:
newfstatat(AT_FDCWD, "/home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/test/support/fixtures/test.lz", {st_mode=S_IFREG|0644, st_size=36, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "/home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/test/support/fixtures/test.lz", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 4
newfstatat(4, "", {st_mode=S_IFREG|0644, st_size=36, ...}, AT_EMPTY_PATH) = 0
read(4, "LZIP\1\f\0\203\377\373\377\377\300\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0$\0\0\0"..., 7340032) = 36
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe771d1a050}, {sa_handler=0x55dd31ba7180, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe771d1a050}, 8) = 0
pipe2([7, 9], O_CLOEXEC) = 0
pipe2([10, 11], O_CLOEXEC) = 0
lseek(4, 0, SEEK_SET) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
fcntl(0, F_SETFD, 0) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
fcntl(1, F_SETFD, 0) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
fcntl(2, F_SETFD, 0) = 0
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fe771fa1000
rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
clone3({flags=CLONE_VM|CLONE_VFORK, exit_signal=SIGCHLD, stack=0x7fe771fa1000, stack_size=0x9000}, 88) = 480299
wait4(480299, NULL, 0, NULL) = 480299
munmap(0x7fe771fa1000, 36864) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=480299, si_uid=1000, si_status=127, si_utime=0, si_stime=0} ---
close(9) = 0
close(11) = 0
read(7, "", 7340032) = 0
read(10, "", 7340032) = 0
close(7) = 0
close(10) = 0
wait4(0, 0x7fff19477658, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
rt_sigprocmask(SIG_BLOCK, NULL, [INT], 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT PIPE], NULL, 8) = 0
write(5, "-stay_open\n0\n", 13) = 13
rt_sigprocmask(SIG_SETMASK, [INT], NULL, 8) = 0
close(5) = 0
close(6) = 0
close(8) = 0
wait4(480297, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 480297
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=480297, si_uid=1000, si_status=1, si_utime=54 /* 0.54 s */, si_stime=4 /* 0.04 s */} ---
exit_group(2) = ?
+++ exited with 2 +++
cheers,
raf
More information about the File
mailing list