[Tcsh] tcsh Deadlock with SIGHUP
Brett Frankenberger
rbf at rbfnet.com
Wed Apr 8 01:47:25 UTC 2020
Your fix, which was different from mine, does successfully prevent
deadlocks. However, I have discovered a similar issue: If the STDIN
for the shell is closed (in my particular reproduction, the read() in
which the shell is waiting for input is returning EIO), and then a
SIGHUP is received shortly thereafter, there is a race condition that,
while it cannot deadlock, and result in .history_lock being left in
$HOME, causing hangs when future shells exit. (I am seeing this with
shutdowns of a Ubuntu 20.04 (beta release) server; the timing there
appears to be different than on my other servers.)
What appears to be happening is that the EIO is triggering the shell to
exit, and record() (after the done: label in main()) is being called.
>From within record(), recdirs() is called, and then .history_lock is
created and the history is written. While the history is being
written, the SIGHUP arrives and phup() ends up getting invoked.
phup()'s call to record() will not deadlock because the (again) fix
will cause record to immediately return; but ultimately the shell will
exit without finishing the original recdirs().
So the history will not have been completely written (the .history.XXX
temporary file will remain) and .history_lock will still exist. So (a)
the history is lost, and (b) the remaining .history_lock will hang
future shells that exit.
Separately, although I haven't experienced it, I also think the original
deadlock could occur if a "history -S" was in progress when a SIGHUP
was received. ("history -S" doesn't call record(), so rechist() would
be entered once as a result of "history -S" and again as a result of
phup(). The second entry will deadlock with the first one.)
And, finally, I think all of the issues described above can occur with
savedirs (although the symptoms are different since recdirs() doesn't
appear to use a lockfile or an intermediate temporary file), although I
don't personally use that feature and so have not experienced any
problems there.
Below is the patch I have applied to my tcsh. It includes three
things:
(1) Changes to sh.c to include temporarily disabling phup processing
while in record(). This is intended to fix the issue with a SIGHUP
being received while in record() while the shell is exiting for some
other reason, resulting in the original rechist() (and/or recdirs())
not completing and leaving behind a lock file (in the case of rechist)
or a partially written dirs file (in the case of recdirs)).
(2) Reapplies my original fix (from below) to sh.hist.c to prevent a
deadlock when a SIGHUP is received while processing "history -S".
(3) A similar fix (for a similar reason -- although it won't deadlock
since there's no lock file option) to sh.dirs.c.
That's probably not the only way to fix things. And (1) really doesn't
accomplish much if (2) and (3) are implemented. (And instead of my
patch to sh.c, I wonder if phup processing should just be disabled once
done: is reached in main() ... it looks like the code attempts to
cleanup any pending SIGHUPs there, but doesn't really account for the
race condition if a SIGHUP is received right after
handle_pending_signals() returns).
Anyway, lots of options, but would be nice to get at least (1) or (2)
implemented, or something else that fixes the same issue, to prevent
the problem I am actually seeing on Ubuntu 20.04.
Thanks ...
-- Brett
========= PATCHES ============
Index: tcsh-6.22.02/sh.c
===================================================================
--- tcsh-6.22.02.orig/sh.c
+++ tcsh-6.22.02/sh.c
@@ -2521,14 +2521,21 @@ static void
record(void)
{
static int again = 0;
+ int phup_disabled_tmp;
+
if (again++)
return;
+ phup_disabled_tmp = phup_disabled;
+ phup_disabled = 1;
+
if (!fast) {
recdirs(NULL, adrof(STRsavedirs) != NULL);
rechist(NULL, adrof(STRsavehist) != NULL);
}
displayHistStats("Exiting"); /* no-op unless DEBUG_HIST */
+
+ phup_disabled = phup_disabled_tmp;
}
/*
Index: tcsh-6.22.02/sh.hist.c
===================================================================
--- tcsh-6.22.02.orig/sh.hist.c
+++ tcsh-6.22.02/sh.hist.c
@@ -1219,7 +1219,7 @@ void
rechist(Char *fname, int ref)
{
Char *snum, *rs;
- int fp, ftmp, oldidfds;
+ int fp, ftmp, oldidfds, phup_disabled_tmp;
struct varent *shist;
char path[MAXPATHLEN];
struct stat st;
@@ -1227,6 +1227,10 @@ rechist(Char *fname, int ref)
if (fname == NULL && !ref)
return;
+
+ phup_disabled_tmp = phup_disabled;
+ phup_disabled = 1;
+
/*
* If $savehist is just set, we use the value of $history
* else we use the value in $savehist
@@ -1305,6 +1309,7 @@ rechist(Char *fname, int ref)
if (fp == -1) {
didfds = oldidfds;
cleanup_until(fname);
+ phup_disabled = phup_disabled_tmp;
return;
}
/* Try to preserve ownership and permissions of the original history file */
@@ -1329,6 +1334,7 @@ rechist(Char *fname, int ref)
(void)ReplaceFile( short2str(fname),path,NULL,0,NULL,NULL);
#endif
cleanup_until(fname);
+ phup_disabled = phup_disabled_tmp;
}
Index: tcsh-6.22.02/sh.dir.c
===================================================================
--- tcsh-6.22.02.orig/sh.dir.c
+++ tcsh-6.22.02/sh.dir.c
@@ -1356,7 +1356,7 @@ loaddirs(Char *fname)
void
recdirs(Char *fname, int def)
{
- int fp, ftmp, oldidfds;
+ int fp, ftmp, oldidfds, phup_disabled_tmp;
int cdflag = 0;
struct directory *dp;
unsigned int num;
@@ -1366,6 +1366,9 @@ recdirs(Char *fname, int def)
if (fname == NULL && !def)
return;
+ phup_disabled_tmp = phup_disabled;
+ phup_disabled = 1;
+
if (fname == NULL) {
if ((fname = varval(STRdirsfile)) == STRNULL)
fname = Strspl(varval(STRhome), &STRtildotdirs[1]);
@@ -1378,6 +1381,7 @@ recdirs(Char *fname, int def)
if ((fp = xcreat(short2str(fname), 0600)) == -1) {
cleanup_until(fname);
+ phup_disabled = phup_disabled_tmp;
return;
}
@@ -1413,4 +1417,5 @@ recdirs(Char *fname, int def)
SHOUT = ftmp;
didfds = oldidfds;
cleanup_until(fname);
+ phup_disabled = phup_disabled_tmp;
}
======== PREVIOUS MESSAGES ==========
On Mon, Jan 20, 2020 at 11:40:49AM -0500, Christos Zoulas wrote:
> Thanks, it should be fixed now.
>
> christos
>
> > On Jan 20, 2020, at 9:08 AM, Brett Frankenberger <rbf at rbfnet.com> wrote:
> >
> > (Resending what I posted last August ... let me know if there's
> > anything I can do to get this (or a differnet fix for the same issue)
> > into the tree.)
> >
> > tcsh can deadlock with itself if savehist is confgured with "merge" and
> > "lock", and two SIGHUPs are received in rapid succession. The
> > mechanism of the deadlock is the first SIGHUP triggers a rechist() and
> > while that rechist() is executing (and after it has created the lock
> > file), the second SIGHUP triggers a another rechist() which then waits
> > forever for the lock the the first rechist() created to be released
> > (which will never happen).
> >
> > A backtrace from when it's deadlocked:
> >
> > #1 0x00007fe3a48f7877 in usleep (useconds=useconds at entry=100000)
> > at ../sysdeps/posix/usleep.c:32
> > #2 0x000055c7b9368974 in dot_lock (
> > fname=fname at entry=0x55c7ba174540 "/home/rbf/.history",
> > pollinterval=pollinterval at entry=100) at dotlock.c:166
> > #3 0x000055c7b935950f in rechist (fname=0x55c7ba1e5960L"/home/rbf/.history",
> > ref=<optimized out>) at sh.hist.c:1293
> > #4 0x000055c7b9344cc0 in record () at sh.c:2512
> > #5 0x000055c7b9346b29 in phup () at sh.c:1842
> > #6 0x000055c7b93895a6 in handle_pending_signals () at tc.sig.c:72
> > #7 0x000055c7b935ec53 in xwrite (fildes=3,
> > buf=buf at entry=0x55c7b95b6e00 <linbuf>, nbyte=12) at sh.misc.c:690
> > #8 0x000055c7b9360104 in flush () at sh.print.c:260
> > #9 0x000055c7b9387219 in doprnt (addchar=0x55c7b9360390 <xputchar>,
> > sfmt=sfmt at entry=0x55c7b938d0ad "%S", ap=ap at entry=0x7ffc4fb9cd60)
> > at tc.printf.c:294
> > #10 0x000055c7b9387823 in xprintf (fmt=fmt at entry=0x55c7b938d0ad "%S")
> > at tc.printf.c:392
> > #11 0x000055c7b935b392 in prlex (sp0=sp0 at entry=0x55c7ba17efc0) at
> > sh.lex.c:228
> > #12 0x000055c7b9358510 in phist (hp=0x55c7ba17efc0, hflg=<optimized out>)
> > at sh.hist.c:1071
> > #13 0x000055c7b93596d3 in dophist (hflg=65, n=200) at sh.hist.c:1114
> > #14 dohist (vp=<optimized out>, c=<optimized out>) at sh.hist.c:1177
> > #15 0x000055c7b93593f7 in rechist (fname=0x55c7ba1dade0
> > #L"/home/rbf/.history",
> > ref=<optimized out>) at sh.hist.c:1322
> > #16 0x000055c7b9344cc0 in record () at sh.c:2512
> > #17 0x000055c7b9346b29 in phup () at sh.c:1842
> > #18 0x000055c7b93895a6 in handle_pending_signals () at tc.sig.c:72
> > #19 0x000055c7b935ebb3 in xread (fildes=16, buf=buf at entry=0x7ffc4fb9e030,
> > nbyte=nbyte at entry=1) at sh.misc.c:662
> >
> > This patch (which is to 6.20.00 ... but there's doesn't appear to be
> > anything in 6.21.00 or 6.22.00 which would address this, so I'm
> > reasonably confident the problem exists there as well) fixes the
> > problem for me. It disables processing of pending SIGHUPs at the start
> > of rechist (and then restores on completion).
> >
> > (It does this even if savehist isn't configured with lock; so it avoids
> > starting a second write while the first one is in progress even in
> > cases where it won't deadlock.)
> >
> > (I did consider just having handle_pending_signals not redispatch
> > phup() if one was already running, but it looks like the same deadlock
> > could occur if a single SIGHUP arrived while the shell was saving the
> > history for other reasons, although I haven't produced (or tried to
> > produce) that behavior.)
> >
> > --- tcsh-6.20.00.orig/sh.hist.c
> > +++ tcsh-6.20.00/sh.hist.c
> > @@ -1223,7 +1223,7 @@ void
> > rechist(Char *fname, int ref)
> > {
> > Char *snum, *rs;
> > - int fp, ftmp, oldidfds;
> > + int fp, ftmp, oldidfds, phup_disabled_tmp;
> > struct varent *shist;
> > char path[MAXPATHLEN];
> > struct stat st;
> > @@ -1231,6 +1231,10 @@ rechist(Char *fname, int ref)
> >
> > if (fname == NULL && !ref)
> > return;
> > +
> > + phup_disabled_tmp = phup_disabled;
> > + phup_disabled = 1;
> > +
> > /*
> > * If $savehist is just set, we use the value of $history
> > * else we use the value in $savehist
> > @@ -1305,6 +1309,7 @@ rechist(Char *fname, int ref)
> > if (fp == -1) {
> > didfds = oldidfds;
> > cleanup_until(fname);
> > + phup_disabled = phup_disabled_tmp;
> > return;
> > }
> > /* Try to preserve ownership and permissions of the original history file */
> > @@ -1325,6 +1330,7 @@ rechist(Char *fname, int ref)
> > didfds = oldidfds;
> > (void)rename(path, short2str(fname));
> > cleanup_until(fname);
> > + phup_disabled = phup_disabled_tmp;
> > }
> >
> > (As background, below is where/how I found this)
> >
> > For me, this is occurring on Linux; and on systemd systems it's easy to
> > recreate -- when systemd attempts to terminate a session, the shell
> > often ends up getting two SIGHUPs in rapid succession (my assumption is
> > that one is directly from systemd and another is a result of the parent
> > sshd terminating). I get get it to happen about 50% of the time with
> > "systemctl stop session-XX.scope" when that session is an ssh
> > connection that has a tcsh shell configured with "savehist = ( XXX
> > merge lock )". (I have a history of about 200 lines to write out.)
> >
> > Obviously, it's racy ... sometimes the second SIGHUP is early enough,
> > or late enough, to avoid the problem.
> >
> > With the above fix, I can not reproduce the deadlock.
> >
> > -- Brett
> > --
> > Tcsh mailing list
> > Tcsh at astron.com
> > https://mailman.astron.com/mailman/listinfo/tcsh
>
> --
> Tcsh mailing list
> Tcsh at astron.com
> https://mailman.astron.com/mailman/listinfo/tcsh
More information about the Tcsh
mailing list