[Tcsh] "Readable" Unicode in setenv
Kimmo Suominen
kim at netbsd.org
Thu Dec 9 13:28:45 UTC 2021
On Thu, 9 Dec 2021 at 14:21, Corinna Vinschen <vinschen at redhat.com> wrote:
> > We do:
> > - \uNNNNNN for unicode code points (six hex digits)
> > - \xNN and \x{NNNN} for "ASCII" char in hex (how is NNNN ASCII?)
>
> \x{NNNN} looks like an extension to support DBCS.
Ok, so maybe ignoring all but the last two hex digits was limiting
here. I wonder, though, if we should allow an arbitrary number in the
case without curly braces. Currently the hex digit parsing is limited
to 8 digits (based on a cursory look at the code).
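To make the digit-limit question concrete, here is a minimal sketch of a bounded hex reader. It is not the code in parseescape(); the name parse_hex and its signature are hypothetical, and the caller is assumed to pick the limit (2 for \xNN, 4 for \uNNNN, 8 for \UNNNNNNNN).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of tcsh: read at most max_digits hex
 * digits from s, store the value in *out, and return the number of
 * digits consumed (0 if s does not start with a hex digit).
 * Assumes max_digits <= 8 so the value fits in 32 bits.
 */
static size_t
parse_hex(const char *s, size_t max_digits, uint32_t *out)
{
    uint32_t v = 0;
    size_t n = 0;

    while (n < max_digits && isxdigit((unsigned char)s[n])) {
        int c = (unsigned char)s[n];
        int d = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;

        v = (v << 4) | (uint32_t)d;
        n++;
    }
    *out = v;
    return n;
}
```

With a fixed limit the reader stops after the expected width, so any following hex-looking characters stay literal. An unbounded reader would need an explicit terminator such as the curly braces.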
> > FreeBSD sh(1) documents these:
> > - \uNNNN and \uNNNNNNNN for unicode code points (four and eight hex digits)
>
> The second one uses \U, not \u. Ideally tcsh uses the same 4 and 8 hex
> digit expressions. \uNNNN is sufficient for the base plane, \U is only
> required for the higher values. These would have to be converted into a
> surrogate pair on systems with sizeof(wchar_t) == 2, e.g., native
> Windows and Cygwin.
Right, the second form was \U (uppercase) -- sorry for the misquote.
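For reference, the surrogate-pair conversion mentioned in the quoted text follows the standard UTF-16 arithmetic. The sketch below is only an illustration: cp_to_utf16 is a hypothetical name and not an existing tcsh function, and it uses uint16_t rather than wchar_t so that it behaves the same on every platform.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of tcsh: encode a Unicode code point
 * as UTF-16. Writes one or two 16-bit units to out and returns how
 * many were written, or 0 if cp is not a valid scalar value (beyond
 * U+10FFFF, or inside the surrogate range U+D800..U+DFFF).
 */
static int
cp_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x10000) {
        /* Base plane: a single unit, which is all \uNNNN can produce. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Higher planes: split the 20-bit offset into two 10-bit halves. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, while U+20AC stays a single unit.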
Merijn -- would you perhaps have an interest and some time for the
necessary modifications to parseescape?
Also, I apparently got something wrong in the memory management in the
code added to Dgetdol(), as after having used $'' the shell would
segfault after some more commands were run. I've made another commit
that appears to have fixed it for me. (Ideally I would understand if
addla() needs to have its argument Strsave()'d or not, but I really
only added that based on reading existing code.)
Fix for SIGSEGV:
https://github.com/tcsh-org/tcsh/commit/f40fead48a6b7c59b2f1e5bf7f5a5b9417277b97
Christos, could you please have a look and let me know if something is
still not right?
Thanks,
+ Kimmo