[Tcsh] "Readable" Unicode in setenv

Corinna Vinschen vinschen at redhat.com
Fri Dec 10 11:22:14 UTC 2021


On Dec  9 18:54, Kimmo Suominen wrote:
> On Thu, 9 Dec 2021 at 18:34, H.Merijn Brand <tcsh at tux.freedom.nl> wrote:
> > On Thu, 9 Dec 2021 13:47:52 +0200, Kimmo Suominen <kim at netbsd.org> wrote:
> > > What should we do about these differences:
> > >
> > > We do:
> > > - \uNNNNNN for unicode code points (six hex digits)
> >
> > exactly 6 or 2..6?
> 
> I think it is actually 1..6. Or was. I changed it to:
> 
> - \uNNNN (where NNNN is 1-4 hex digits)
> - \UNNNNNN (where NNNNNN is 1-6 hex digits)
> 
> This was to match what other shells do with \u -- they allow at most 4 digits.
> 
> > > - \xNN and \x{NNNN} for "ASCII" char in hex (how is NNNN ASCII?)
> >
> > in perl \xNN is 8-bit clean, \x{NNNN} can be \x{N} .. \x{NNNNNNNN}
> 
> I did not change \x{NNNN} but I documented it incorrectly: it indeed
> allows up to 8 digits, not just 4. I will fix the manual. Based on the
> earlier comments from Corinna, I have a feeling that this is broken
> for Cygwin and Windows, though...

If the final expression is a UTF-8 byte sequence, then that's ok.

If it just creates a 4-byte UTF-32 value, then that's broken on UTF-16
systems (where wchar_t is only 2 bytes) if the value is > 65535.  Bigger
UTF-32 values have to be converted into a UTF-16 surrogate pair, i.e. two
2-byte code units, according to the following formula:

  if (sizeof (wchar_t) > 2 || utf_32_value < 0x10000UL)
    wchar_outputstring[pos++] = (wchar_t) utf_32_value;
  else
    {
      /* Reduce the value to the 20-bit offset above the BMP */
      utf_32_value -= 0x10000UL;
      /* First the high surrogate: upper 10 bits of the offset */
      wchar_outputstring[pos++] = 0xd800 | (utf_32_value >> 10);
      /* next the low surrogate: lower 10 bits of the offset */
      wchar_outputstring[pos++] = 0xdc00 | (utf_32_value & 0x3ff);
    }
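
For illustration only, here is the same conversion as a self-contained
function that can be compiled and checked on its own.  This is a sketch,
not tcsh code: the name encode_utf16 and the use of uint16_t instead of
wchar_t are assumptions made so that the result does not depend on the
platform's wchar_t size.  It also rejects values that are not valid code
points, which the fragment above does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of tcsh: encode one Unicode code point
 * as UTF-16 code units.  Returns the number of units written to out
 * (1 or 2), or 0 if cp is not a valid Unicode scalar value.
 */
size_t
encode_utf16(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    /* Surrogate code points and values above U+10FFFF cannot be encoded. */
    if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
        return 0;

    if (cp < 0x10000UL) {
        /* Basic Multilingual Plane: a single code unit. */
        out[0] = (uint16_t) cp;
        return 1;
    }

    /* Supplementary plane: split the 20-bit offset into two 10-bit halves. */
    cp -= 0x10000UL;
    out[0] = (uint16_t) (0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

As a worked case, U+1F600 minus 0x10000 leaves 0xF600; the upper 10 bits
are 0x3D and the lower 10 bits are 0x200, so the pair is 0xD83D 0xDE00.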


Corinna


