[Tcsh] tcsh Deadlock with SIGHUP
Christos Zoulas
christos at zoulas.com
Mon Jan 20 16:40:49 UTC 2020
Thanks, it should be fixed now.
christos
> On Jan 20, 2020, at 9:08 AM, Brett Frankenberger <rbf at rbfnet.com> wrote:
>
> (Resending what I posted last August ... let me know if there's
> anything I can do to get this (or a different fix for the same issue)
> into the tree.)
>
> tcsh can deadlock with itself if savehist is configured with "merge" and
> "lock", and two SIGHUPs are received in rapid succession. The
> mechanism of the deadlock is that the first SIGHUP triggers a rechist(),
> and while that rechist() is executing (and after it has created the lock
> file), the second SIGHUP triggers another rechist(), which then waits
> forever for the lock that the first rechist() created to be released
> (which will never happen, since the first rechist() is suspended
> beneath the second one on the same stack).
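
The mechanism described above can be illustrated with a minimal sketch. It is not the tcsh dot_lock() code: the function names and the lock path are hypothetical, the lock is simply "held while the file exists", and the polling loop is bounded so the demonstration terminates instead of hanging.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a dot lock: held for as long as the file exists. */
int demo_try_lock(const char *lockpath)
{
    int fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd == -1)
        return -1;              /* file already exists: lock is held */
    close(fd);
    return 0;
}

/*
 * Poll for the lock at most max_polls times. A real implementation would
 * sleep between attempts and might poll without any bound; the bound here
 * only keeps the sketch from hanging.
 */
int demo_poll_lock(const char *lockpath, int max_polls)
{
    int i;

    assert(lockpath != NULL);
    for (i = 0; i < max_polls; i++) {
        if (demo_try_lock(lockpath) == 0)
            return 0;
    }
    return -1;
}

void demo_unlock(const char *lockpath)
{
    unlink(lockpath);
}

/*
 * Take the lock, then try to take it again from the same flow of control,
 * as a nested call from a signal-driven handler would. Returns 1 if the
 * nested attempt could not get the lock, 0 if it could, -1 on setup failure.
 */
int nested_acquire_blocks(const char *lockpath)
{
    int blocked;

    unlink(lockpath);                       /* start from a clean state */
    if (demo_poll_lock(lockpath, 1) != 0)
        return -1;
    /* The outer holder cannot release until this nested call returns. */
    blocked = (demo_poll_lock(lockpath, 5) != 0);
    demo_unlock(lockpath);
    return blocked;
}
```

Calling nested_acquire_blocks() with a scratch path such as "/tmp/demo.lock" reports that the inner attempt never succeeds while the outer holder is still active. With an unbounded poll, that inner loop is the hang.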
>
> A backtrace from when it's deadlocked:
>
> #1  0x00007fe3a48f7877 in usleep (useconds=useconds@entry=100000)
>     at ../sysdeps/posix/usleep.c:32
> #2  0x000055c7b9368974 in dot_lock (
>     fname=fname@entry=0x55c7ba174540 "/home/rbf/.history",
>     pollinterval=pollinterval@entry=100) at dotlock.c:166
> #3  0x000055c7b935950f in rechist (fname=0x55c7ba1e5960 L"/home/rbf/.history",
>     ref=<optimized out>) at sh.hist.c:1293
> #4  0x000055c7b9344cc0 in record () at sh.c:2512
> #5  0x000055c7b9346b29 in phup () at sh.c:1842
> #6  0x000055c7b93895a6 in handle_pending_signals () at tc.sig.c:72
> #7  0x000055c7b935ec53 in xwrite (fildes=3,
>     buf=buf@entry=0x55c7b95b6e00 <linbuf>, nbyte=12) at sh.misc.c:690
> #8  0x000055c7b9360104 in flush () at sh.print.c:260
> #9  0x000055c7b9387219 in doprnt (addchar=0x55c7b9360390 <xputchar>,
>     sfmt=sfmt@entry=0x55c7b938d0ad "%S", ap=ap@entry=0x7ffc4fb9cd60)
>     at tc.printf.c:294
> #10 0x000055c7b9387823 in xprintf (fmt=fmt@entry=0x55c7b938d0ad "%S")
>     at tc.printf.c:392
> #11 0x000055c7b935b392 in prlex (sp0=sp0@entry=0x55c7ba17efc0)
>     at sh.lex.c:228
> #12 0x000055c7b9358510 in phist (hp=0x55c7ba17efc0, hflg=<optimized out>)
>     at sh.hist.c:1071
> #13 0x000055c7b93596d3 in dophist (hflg=65, n=200) at sh.hist.c:1114
> #14 dohist (vp=<optimized out>, c=<optimized out>) at sh.hist.c:1177
> #15 0x000055c7b93593f7 in rechist (fname=0x55c7ba1dade0 L"/home/rbf/.history",
>     ref=<optimized out>) at sh.hist.c:1322
> #16 0x000055c7b9344cc0 in record () at sh.c:2512
> #17 0x000055c7b9346b29 in phup () at sh.c:1842
> #18 0x000055c7b93895a6 in handle_pending_signals () at tc.sig.c:72
> #19 0x000055c7b935ebb3 in xread (fildes=16, buf=buf@entry=0x7ffc4fb9e030,
>     nbyte=nbyte@entry=1) at sh.misc.c:662
>
> This patch (which is against 6.20.00 ... but there doesn't appear to be
> anything in 6.21.00 or 6.22.00 which would address this, so I'm
> reasonably confident the problem exists there as well) fixes the
> problem for me. It disables processing of pending SIGHUPs at the start
> of rechist (and then restores the previous setting on completion).
>
> (It does this even if savehist isn't configured with lock; so it avoids
> starting a second write while the first one is in progress even in
> cases where it won't deadlock.)
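
The save/disable/restore idea behind this fix can be modelled in a small self-contained sketch. Everything here is a hypothetical simplification, not tcsh source: hup_pending, hup_disabled, safe_point(), save_history() and run_scenario() are illustrative names, and a "signal" is just a flag that gets acted on at a safe point. The sketch assumes a hangup deferred during the save is picked up at the next safe point after the flag is restored.

```c
/* All names below are hypothetical, chosen only for this sketch. */
int hup_pending = 0;      /* a hangup was noted but not yet acted on */
int hup_disabled = 0;     /* nonzero: safe points leave pending hangups alone */
int hups_to_inject = 0;   /* hangups that will "arrive" during a save */
int save_depth = 0;       /* number of save_history() calls currently active */
int max_save_depth = 0;   /* deepest nesting observed in a scenario */
int saves_done = 0;       /* completed saves in a scenario */

void save_history(int guard);

/* Safe point: where a deferred hangup is acted on, unless disabled. */
void safe_point(int guard)
{
    if (hup_pending && !hup_disabled) {
        hup_pending = 0;
        save_history(guard);    /* the hangup handler records history */
    }
}

/* Non-reentrant work; with guard != 0 it defers hangups while it runs. */
void save_history(int guard)
{
    int saved = hup_disabled;   /* remember the caller's setting */

    if (guard)
        hup_disabled = 1;

    save_depth++;
    if (save_depth > max_save_depth)
        max_save_depth = save_depth;

    /* A second hangup arrives while the save is in progress ... */
    if (hups_to_inject > 0) {
        hups_to_inject--;
        hup_pending = 1;
    }
    /* ... and the save reaches a safe point (think: a write call). */
    safe_point(guard);

    save_depth--;
    saves_done++;
    if (guard)
        hup_disabled = saved;   /* restore on the way out */
}

/* Deliver one hangup, inject a second mid-save, report the deepest nesting. */
int run_scenario(int guard)
{
    hup_pending = 1;
    hup_disabled = 0;
    hups_to_inject = 1;
    save_depth = max_save_depth = saves_done = 0;

    safe_point(guard);   /* first hangup is dispatched here */
    safe_point(guard);   /* a later safe point picks up anything deferred */
    return max_save_depth;
}
```

In this model run_scenario(0) nests one save inside another, while run_scenario(1) keeps the nesting depth at one and still runs the second save afterwards. Saving the old value rather than unconditionally clearing the flag keeps the guard correct if the caller had already disabled hangups, and it is why every return path needs its own restore.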
>
> (I did consider just having handle_pending_signals not redispatch
> phup() if one was already running, but it looks like the same deadlock
> could occur if a single SIGHUP arrived while the shell was saving the
> history for other reasons, although I haven't produced (or tried to
> produce) that behavior.)
>
> --- tcsh-6.20.00.orig/sh.hist.c
> +++ tcsh-6.20.00/sh.hist.c
> @@ -1223,7 +1223,7 @@ void
>  rechist(Char *fname, int ref)
>  {
>      Char *snum, *rs;
> -    int fp, ftmp, oldidfds;
> +    int fp, ftmp, oldidfds, phup_disabled_tmp;
>      struct varent *shist;
>      char path[MAXPATHLEN];
>      struct stat st;
> @@ -1231,6 +1231,10 @@ rechist(Char *fname, int ref)
>
>      if (fname == NULL && !ref)
>          return;
> +
> +    phup_disabled_tmp = phup_disabled;
> +    phup_disabled = 1;
> +
>      /*
>       * If $savehist is just set, we use the value of $history
>       * else we use the value in $savehist
> @@ -1305,6 +1309,7 @@ rechist(Char *fname, int ref)
>      if (fp == -1) {
>          didfds = oldidfds;
>          cleanup_until(fname);
> +        phup_disabled = phup_disabled_tmp;
>          return;
>      }
>      /* Try to preserve ownership and permissions of the original history file */
> @@ -1325,6 +1330,7 @@ rechist(Char *fname, int ref)
>      didfds = oldidfds;
>      (void)rename(path, short2str(fname));
>      cleanup_until(fname);
> +    phup_disabled = phup_disabled_tmp;
>  }
>
> (As background, below is where/how I found this)
>
> For me, this is occurring on Linux; and on systemd systems it's easy to
> recreate -- when systemd attempts to terminate a session, the shell
> often ends up getting two SIGHUPs in rapid succession (my assumption is
> that one is directly from systemd and another is a result of the parent
> sshd terminating). I can get it to happen about 50% of the time with
> "systemctl stop session-XX.scope" when that session is an ssh
> connection that has a tcsh shell configured with "savehist = ( XXX
> merge lock )". (I have a history of about 200 lines to write out.)
>
> Obviously, it's racy ... sometimes the second SIGHUP is early enough,
> or late enough, to avoid the problem.
>
> With the above fix, I cannot reproduce the deadlock.
>
> -- Brett