[Tcsh] tcsh Deadlock with SIGHUP

Mon Jan 20 16:40:49 UTC 2020

Thanks, it should be fixed now.

christos

> On Jan 20, 2020, at 9:08 AM, Brett Frankenberger <rbf at rbfnet.com> wrote:
> 
> (Resending what I posted last August ... let me know if there's
> anything I can do to get this (or a differnet fix for the same issue)
> into the tree.)
> 
> tcsh can deadlock with itself if savehist is confgured with "merge" and
> "lock", and two SIGHUPs are received in rapid succession.  The
> mechanism of the deadlock is the first SIGHUP triggers a rechist() and
> while that rechist() is executing (and after it has created the lock
> file), the second SIGHUP triggers a another rechist() which then waits
> forever for the lock the the first rechist() created to be released
> (which will never happen).
> 
> A backtrace from when it's deadlocked:
> 
> #1  0x00007fe3a48f7877 in usleep (useconds=useconds at entry=100000)
>    at ../sysdeps/posix/usleep.c:32
> #2  0x000055c7b9368974 in dot_lock (
>    fname=fname at entry=0x55c7ba174540 "/home/rbf/.history",
>    pollinterval=pollinterval at entry=100) at dotlock.c:166
> #3  0x000055c7b935950f in rechist (fname=0x55c7ba1e5960L"/home/rbf/.history",
>    ref=<optimized out>) at sh.hist.c:1293
> #4  0x000055c7b9344cc0 in record () at sh.c:2512
> #5  0x000055c7b9346b29 in phup () at sh.c:1842
> #6  0x000055c7b93895a6 in handle_pending_signals () at tc.sig.c:72
> #7  0x000055c7b935ec53 in xwrite (fildes=3,
>    buf=buf at entry=0x55c7b95b6e00 <linbuf>, nbyte=12) at sh.misc.c:690
> #8  0x000055c7b9360104 in flush () at sh.print.c:260
> #9  0x000055c7b9387219 in doprnt (addchar=0x55c7b9360390 <xputchar>,
>    sfmt=sfmt at entry=0x55c7b938d0ad "%S", ap=ap at entry=0x7ffc4fb9cd60)
>    at tc.printf.c:294
> #10 0x000055c7b9387823 in xprintf (fmt=fmt at entry=0x55c7b938d0ad "%S")
>    at tc.printf.c:392
> #11 0x000055c7b935b392 in prlex (sp0=sp0 at entry=0x55c7ba17efc0) at
> sh.lex.c:228
> #12 0x000055c7b9358510 in phist (hp=0x55c7ba17efc0, hflg=<optimized out>)
>    at sh.hist.c:1071
> #13 0x000055c7b93596d3 in dophist (hflg=65, n=200) at sh.hist.c:1114
> #14 dohist (vp=<optimized out>, c=<optimized out>) at sh.hist.c:1177
> #15 0x000055c7b93593f7 in rechist (fname=0x55c7ba1dade0
> #L"/home/rbf/.history",
>    ref=<optimized out>) at sh.hist.c:1322
> #16 0x000055c7b9344cc0 in record () at sh.c:2512
> #17 0x000055c7b9346b29 in phup () at sh.c:1842
> #18 0x000055c7b93895a6 in handle_pending_signals () at tc.sig.c:72
> #19 0x000055c7b935ebb3 in xread (fildes=16, buf=buf at entry=0x7ffc4fb9e030,
>    nbyte=nbyte at entry=1) at sh.misc.c:662
> 
> This patch (which is to 6.20.00 ... but there's doesn't appear to be
> anything in 6.21.00 or 6.22.00 which would address this, so I'm
> reasonably confident the problem exists there as well) fixes the
> problem for me.  It disables processing of pending SIGHUPs at the start
> of rechist (and then restores on completion).
> 
> (It does this even if savehist isn't configured with lock; so it avoids
> starting a second write while the first one is in progress even in
> cases where it won't deadlock.)
> 
> (I did consider just having handle_pending_signals not redispatch
> phup() if one was already running, but it looks like the same deadlock
> could occur if a single SIGHUP arrived while the shell was saving the
> history for other reasons, although I haven't produced (or tried to
> produce) that behavior.)
> 
> --- tcsh-6.20.00.orig/sh.hist.c
> +++ tcsh-6.20.00/sh.hist.c
> @@ -1223,7 +1223,7 @@ void
> rechist(Char *fname, int ref)
> {
>     Char    *snum, *rs;
> -    int     fp, ftmp, oldidfds;
> +    int     fp, ftmp, oldidfds, phup_disabled_tmp;
>     struct varent *shist;
>     char path[MAXPATHLEN];
>     struct stat st;
> @@ -1231,6 +1231,10 @@ rechist(Char *fname, int ref)
> 
>     if (fname == NULL && !ref)
>        return;
> +
> +       phup_disabled_tmp = phup_disabled;
> +       phup_disabled = 1;
> +
>     /*
>      * If $savehist is just set, we use the value of $history
>      * else we use the value in $savehist
> @@ -1305,6 +1309,7 @@ rechist(Char *fname, int ref)
>     if (fp == -1) {
>        didfds = oldidfds;
>        cleanup_until(fname);
> +       phup_disabled = phup_disabled_tmp;
>        return;
>     }
>     /* Try to preserve ownership and permissions of the original history file */
> @@ -1325,6 +1330,7 @@ rechist(Char *fname, int ref)
>     didfds = oldidfds;
>     (void)rename(path, short2str(fname));
>     cleanup_until(fname);
> +    phup_disabled = phup_disabled_tmp;
> }
> 
> (As background, below is where/how I found this)
> 
> For me, this is occurring on Linux; and on systemd systems it's easy to
> recreate -- when systemd attempts to terminate a session, the shell
> often ends up getting two SIGHUPs in rapid succession (my assumption is
> that one is directly from systemd and another is a result of the parent
> sshd terminating).  I get get it to happen about 50% of the time with
> "systemctl stop session-XX.scope" when that session is an ssh
> connection that has a tcsh shell configured with "savehist = ( XXX
> merge lock )".  (I have a history of about 200 lines to write out.)
> 
> Obviously, it's racy ... sometimes the second SIGHUP is early enough,
> or late enough, to avoid the problem.
> 
> With the above fix, I can not reproduce the deadlock.
> 
>     -- Brett
> --
> Tcsh mailing list
> Tcsh at astron.com
> https://mailman.astron.com/mailman/listinfo/tcsh

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 235 bytes
Desc: Message signed with OpenPGP
URL: <https://mailman.astron.com/pipermail/tcsh/attachments/20200120/f1c482ff/attachment.asc>