[Tcsh] "Readable" Unicode in setenv

Thu Dec 9 13:28:45 UTC 2021

On Thu, 9 Dec 2021 at 14:21, Corinna Vinschen <vinschen at redhat.com> wrote:
> > We do:
> > - \uNNNNNN for unicode code points (six hex digits)
> > - \xNN and \x{NNNN} for "ASCII" char in hex (how is NNNN ASCII?)
>
> \x{NNNN} looks like an extension to support DBCS.

Ok, so maybe ignoring all but the last two hex digits was too limiting
here. I wonder, though, whether we should allow an arbitrary number of
hex digits in the form without curly braces. Currently the hex digit
parsing is limited to 8 digits (based on a cursory look at the code).

> > FreeBSD sh(1) documents these:
> > - \uNNNN and \uNNNNNNNN for unicode code points (four and eight hex digits)
>
> The second one uses \U, not \u.  Ideally tcsh uses the same 4 and 8 hex
> digit expressions.  \uNNNN is sufficient for the base plane, \U is only
> required for the higher values.  These would have to be converted into a
> surrogate pair on systems with sizeof(wchar_t) == 2, e.g., native
> Windows and Cygwin.

Right, the second form was \U (uppercase) -- sorry for the misquote.

Merijn, would you perhaps be interested in, and have time for, making the
necessary modifications to parseescape()?

Also, I apparently got something wrong in the memory management of the
code I added to Dgetdol(): after $'' had been used, the shell would
segfault a few commands later. I've made another commit that appears to
have fixed it for me. (Ideally I would understand whether addla() needs
its argument to be Strsave()'d, but I only added that call based on
reading existing code.)

Fix for SIGSEGV:
https://github.com/tcsh-org/tcsh/commit/f40fead48a6b7c59b2f1e5bf7f5a5b9417277b97

Christos, could you please have a look and let me know if something is
still not right?

Thanks,
+ Kimmo