[Tcsh] Deadlock with SIGHUP

Brett Frankenberger rbf at rbfnet.com
Mon Aug 19 16:10:26 UTC 2019


tcsh can deadlock with itself if savehist is configured with "merge" and
"lock", and two SIGHUPs are received in rapid succession.  The
mechanism of the deadlock is that the first SIGHUP triggers a rechist(),
and while that rechist() is executing (and after it has created the lock
file), another SIGHUP triggers another rechist(), which then waits
forever for the lock that the first rechist() created to be released
(which will never happen, because the first rechist() cannot resume
until the second one returns).
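
The re-entrancy described above can be modeled in a few lines of C.  This
is only a sketch, not tcsh code: model_lock(), model_write() and
model_rechist() are hypothetical stand-ins for dot_lock(), xwrite() and
rechist(), the lock is an in-memory flag rather than a file, and the
wait is bounded so the model terminates instead of hanging.

```c
#include <stdbool.h>

/* Hypothetical model of the re-entrancy; none of this is tcsh source. */
static bool lock_held;         /* stands in for the history lock file  */
static int  pending_hup;       /* SIGHUPs queued for deferred dispatch */
static int  arrivals;          /* SIGHUPs that arrive during the save  */
static int  stuck;             /* nested saves that could not lock     */

enum { MAX_POLLS = 3 };        /* the real wait has no such limit */

static void model_rechist(void);

/* Polling lock: retry while the lock is held, give up after MAX_POLLS. */
static bool model_lock(void)
{
    for (int i = 0; i < MAX_POLLS; i++) {
        if (!lock_held) {
            lock_held = true;
            return true;
        }
        /* a real lock would sleep here before polling again */
    }
    return false;
}

/* I/O wrapper that dispatches queued signals before returning. */
static void model_write(void)
{
    while (pending_hup > 0) {
        pending_hup--;
        model_rechist();       /* nested save on the same stack */
    }
}

static void model_rechist(void)
{
    if (!model_lock()) {
        stuck++;               /* the unbounded version never gets past here */
        return;
    }
    pending_hup += arrivals;   /* a signal lands while the lock is held */
    arrivals = 0;
    model_write();             /* writing history dispatches that signal */
    lock_held = false;
}

/* Returns how many nested saves got stuck behind the outer save's lock. */
int simulate_hup_during_save(int hups_during_save)
{
    lock_held = false;
    pending_hup = 0;
    stuck = 0;
    arrivals = hups_during_save;
    model_rechist();           /* dispatch of the first SIGHUP */
    return stuck;
}
```

In this model, simulate_hup_during_save(1) reports one stuck nested save
and simulate_hup_during_save(0) reports none.  The outer save only
releases the lock after the nested call returns, which is why an
unbounded wait in the nested call can never succeed.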

A backtrace from when it's deadlocked:

#1  0x00007fe3a48f7877 in usleep (useconds=useconds@entry=100000)
    at ../sysdeps/posix/usleep.c:32
#2  0x000055c7b9368974 in dot_lock (
    fname=fname@entry=0x55c7ba174540 "/home/rbf/.history",
    pollinterval=pollinterval@entry=100) at dotlock.c:166
#3  0x000055c7b935950f in rechist (fname=0x55c7ba1e5960 L"/home/rbf/.history",
    ref=<optimized out>) at sh.hist.c:1293
#4  0x000055c7b9344cc0 in record () at sh.c:2512
#5  0x000055c7b9346b29 in phup () at sh.c:1842
#6  0x000055c7b93895a6 in handle_pending_signals () at tc.sig.c:72
#7  0x000055c7b935ec53 in xwrite (fildes=3,
    buf=buf@entry=0x55c7b95b6e00 <linbuf>, nbyte=12) at sh.misc.c:690
#8  0x000055c7b9360104 in flush () at sh.print.c:260
#9  0x000055c7b9387219 in doprnt (addchar=0x55c7b9360390 <xputchar>,
    sfmt=sfmt@entry=0x55c7b938d0ad "%S", ap=ap@entry=0x7ffc4fb9cd60)
    at tc.printf.c:294
#10 0x000055c7b9387823 in xprintf (fmt=fmt@entry=0x55c7b938d0ad "%S")
    at tc.printf.c:392
#11 0x000055c7b935b392 in prlex (sp0=sp0@entry=0x55c7ba17efc0)
    at sh.lex.c:228
#12 0x000055c7b9358510 in phist (hp=0x55c7ba17efc0, hflg=<optimized out>)
    at sh.hist.c:1071
#13 0x000055c7b93596d3 in dophist (hflg=65, n=200) at sh.hist.c:1114
#14 dohist (vp=<optimized out>, c=<optimized out>) at sh.hist.c:1177
#15 0x000055c7b93593f7 in rechist (fname=0x55c7ba1dade0 L"/home/rbf/.history",
    ref=<optimized out>) at sh.hist.c:1322
#16 0x000055c7b9344cc0 in record () at sh.c:2512
#17 0x000055c7b9346b29 in phup () at sh.c:1842
#18 0x000055c7b93895a6 in handle_pending_signals () at tc.sig.c:72
#19 0x000055c7b935ebb3 in xread (fildes=16, buf=buf@entry=0x7ffc4fb9e030,
    nbyte=nbyte@entry=1) at sh.misc.c:662

This patch (which is against 6.20.00 ... but there doesn't appear to be
anything in 6.21.00 which would address this, so I'm reasonably
confident the problem exists there as well) fixes the problem for me.
It disables processing of pending SIGHUPs at the start of rechist() (and
then restores the previous setting on completion).

(It does this even if savehist isn't configured with lock; so it avoids
starting a second write while the first one is in progress even in
cases where it won't deadlock.)
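
The save/disable/restore idea can be sketched in isolation as follows.
This is a simplified model under stated assumptions, not the patch
itself: hup_disabled plays the role of phup_disabled, while
guarded_save(), dispatch_pending() and the counters are hypothetical
names made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the guard; none of this is tcsh source. */
static bool hup_disabled;      /* plays the role of phup_disabled      */
static int  pending_hup;       /* SIGHUPs queued for deferred dispatch */
static int  depth;             /* current nesting of guarded_save()    */
static int  max_depth;         /* deepest nesting seen so far          */

static void guarded_save(int arrivals_during_save);

/* Deferred dispatch: does nothing while the guard is set. */
static void dispatch_pending(void)
{
    if (hup_disabled)
        return;                /* leave the signal queued */
    while (pending_hup > 0) {
        pending_hup--;
        guarded_save(0);
    }
}

static void guarded_save(int arrivals_during_save)
{
    bool saved = hup_disabled; /* remember the caller's setting */

    hup_disabled = true;
    depth++;
    if (depth > max_depth)
        max_depth = depth;

    pending_hup += arrivals_during_save;  /* signal lands mid-save */
    dispatch_pending();        /* I/O during the save: nothing is dispatched */

    depth--;
    hup_disabled = saved;      /* restore rather than unconditionally clear */
}

/* Runs one guarded save and returns the deepest nesting reached. */
int max_nesting_with_guard(int arrivals_during_save)
{
    hup_disabled = false;
    pending_hup = 0;
    depth = 0;
    max_depth = 0;
    guarded_save(arrivals_during_save);
    return max_depth;
}

/* Number of signals still queued after the save finished. */
int still_queued(void)
{
    return pending_hup;
}
```

With the guard in place the nesting depth stays at 1 and the late
signal is left queued rather than dispatched in the middle of the save.
Restoring the saved value, instead of always resetting the flag to 0,
keeps the guard correct if the save is entered while processing was
already disabled by a caller.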

(I did consider just having handle_pending_signals not redispatch
phup() if one was already running, but it looks like the same deadlock
could occur if a single SIGHUP arrived while the shell was saving the
history for other reasons, although I haven't produced (or tried to
produce) that behavior.)

--- tcsh-6.20.00.orig/sh.hist.c
+++ tcsh-6.20.00/sh.hist.c
@@ -1223,7 +1223,7 @@ void
 rechist(Char *fname, int ref)
 {
     Char    *snum, *rs;
-    int     fp, ftmp, oldidfds;
+    int     fp, ftmp, oldidfds, phup_disabled_tmp;
     struct varent *shist;
     char path[MAXPATHLEN];
     struct stat st;
@@ -1231,6 +1231,10 @@ rechist(Char *fname, int ref)
 
     if (fname == NULL && !ref) 
        return;
+
+       phup_disabled_tmp = phup_disabled;
+       phup_disabled = 1;
+
     /*
      * If $savehist is just set, we use the value of $history
      * else we use the value in $savehist
@@ -1305,6 +1309,7 @@ rechist(Char *fname, int ref)
     if (fp == -1) {
        didfds = oldidfds;
        cleanup_until(fname);
+       phup_disabled = phup_disabled_tmp;
        return;
     }
     /* Try to preserve ownership and permissions of the original history file */
@@ -1325,6 +1330,7 @@ rechist(Char *fname, int ref)
     didfds = oldidfds;
     (void)rename(path, short2str(fname));
     cleanup_until(fname);
+    phup_disabled = phup_disabled_tmp;
 }

(As background, below is where and how I found this.)

For me, this is occurring on Linux, and on systemd systems it's easy to
recreate -- when systemd attempts to terminate a session, the shell
often ends up getting two SIGHUPs in rapid succession (my assumption is
that one is directly from systemd and another is a result of the parent
sshd terminating).  I can get it to happen about 50% of the time with
"systemctl stop session-XX.scope" when that session is an ssh
connection that has a tcsh shell configured with "savehist = ( XXX
merge lock )".  (I have a history of about 200 lines to write out.)

Obviously, it's racy ... sometimes the second SIGHUP is early enough,
or late enough, to avoid the problem.
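
For experimenting with the timing without systemd, a small helper along
these lines might be useful.  It is an assumption that two signals sent
this way hit the same window as the systemd case; send_two_hups() and
its gap parameter are hypothetical, and this has not been verified
against tcsh.  The gap matters because two SIGHUPs that are both
pending before the first is handled would be merged into one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for kill() and nanosleep() */
#endif
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper, not part of tcsh: send SIGHUP to one process
 * twice, gap_usec microseconds apart.  Returns 0 if both signals were
 * sent, -1 otherwise.
 */
int send_two_hups(pid_t pid, long gap_usec)
{
    struct timespec gap;

    /* Refuse pid <= 0 so a whole process group is never signalled. */
    if (pid <= 0 || gap_usec < 0)
        return -1;

    gap.tv_sec  = gap_usec / 1000000L;
    gap.tv_nsec = (gap_usec % 1000000L) * 1000L;

    if (kill(pid, SIGHUP) == -1)
        return -1;

    (void)nanosleep(&gap, NULL);  /* give the first save time to start */

    return kill(pid, SIGHUP);     /* fails if the target already exited */
}
```

The gap would need tuning by trial: too short and the signals may
coalesce, too long and the first save has already finished.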

     -- Brett

