[Tcsh] "Readable" Unicode in setenv

Jamie Landeg-Jones jamie at catflap.org
Tue Nov 16 10:52:58 UTC 2021


"H.Merijn Brand" <tcsh at tux.freedom.nl> wrote:

> I expect \u20AC (not interpreted), but yes, this is becoming a gray
> area where expectations might/will differ and DWIM is not the same
> for all.
>
> > I mean, quotes are optional for strings in shell, and that makes life complicated :-)
> > 
> > christos

Rather than altering anything that already exists, have you thought of adding
a new quoting type, e.g. something like the Bourne shell's $'xxxx' format?

This would remove the ambiguity, and would be easy to remember for those used
to sh. It already covers escaping and Unicode code points:

From the FreeBSD sh(1) man page:

 |    Quoting
 |      Quoting is used to remove the special meaning of certain characters or
 |      words to the shell, such as operators, whitespace, keywords, or alias
 |      names.
 | 
 |      There are four types of quoting: matched single quotes, dollar-single
 |      quotes, matched double quotes, and backslash.
 | 
 |      Single Quotes
 |              Enclosing characters in single quotes preserves the literal
 |              meaning of all the characters (except single quotes, making it
 |              impossible to put single-quotes in a single-quoted string).
 | 
 |      Dollar-Single Quotes
 |              Enclosing characters between $' and ' preserves the literal
 |              meaning of all characters except backslashes and single quotes.
 |              A backslash introduces a C-style escape sequence:
 | 
 |              \a          Alert (ring the terminal bell)
 | 
 |              \b          Backspace
 | 
 |              \cc         The control character denoted by ^c in stty(1).  If c
 |                          is a backslash, it must be doubled.
 | 
 |              \e          The ESC character (ASCII 0x1b)
 | 
 |              \f          Formfeed
 | 
 |              \n          Newline
 | 
 |              \r          Carriage return
 | 
 |              \t          Horizontal tab
 | 
 |              \v          Vertical tab
 | 
 |              \\          Literal backslash
 | 
 |              \'          Literal single-quote
 | 
 |              \"          Literal double-quote
 | 
 |              \nnn        The byte whose octal value is nnn (one to three
 |                          digits)
 | 
 |              \xnn        The byte whose hexadecimal value is nn (one or more
 |                          digits only the last two of which are used)
 | 
 |              \unnnn      The Unicode code point nnnn (four hexadecimal digits)
 | 
 |              \Unnnnnnnn  The Unicode code point nnnnnnnn (eight hexadecimal
 |                          digits)
 | 
 |              The sequences for Unicode code points are currently only useful
 |              with UTF-8 locales.  They reject code point 0 and UTF-16
 |              surrogates.
 | 
 |              If an escape sequence would produce a byte with value 0, that
 |              byte and the rest of the string until the matching single-quote
 |              are ignored.
 | 
 |              Any other string starting with a backslash is an error.
 | 
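
To make the rules quoted above concrete, here is a rough Python sketch of a
decoder for the text between $' and '. It is only an illustration of the man
page wording, not the FreeBSD sh source and not a proposal for how tcsh should
implement it. The function name decode_dollar_single is made up, and several
details are my assumptions: a UTF-8 locale, octal values masked to one byte,
^? mapping to DEL, and "reject" meaning an error is raised.

```python
import string

# Single-character escapes listed in the sh(1) excerpt.
SIMPLE = {
    "a": 0x07, "b": 0x08, "e": 0x1B, "f": 0x0C, "n": 0x0A,
    "r": 0x0D, "t": 0x09, "v": 0x0B, "\\": 0x5C, "'": 0x27, '"': 0x22,
}
HEX = set(string.hexdigits)
OCT = set("01234567")


def decode_dollar_single(body: str) -> bytes:
    """Decode the text between $' and ' into bytes (UTF-8 locale assumed)."""
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")  # ordinary characters are literal
            i += 1
            continue
        i += 1
        if i >= n:
            raise ValueError("trailing backslash")
        esc = body[i]
        i += 1
        if esc in SIMPLE:
            value = bytes([SIMPLE[esc]])
        elif esc == "c":
            # \cc: control character; a backslash here must be doubled.
            if i >= n:
                raise ValueError("incomplete \\c escape")
            c = body[i]
            i += 1
            if c == "\\":
                if i >= n or body[i] != "\\":
                    raise ValueError("backslash after \\c must be doubled")
                i += 1
            # Assumption: ^? is DEL, everything else is masked to 5 bits.
            value = bytes([0x7F if c == "?" else ord(c) & 0x1F])
        elif esc in OCT:
            # \nnn: one to three octal digits (assumption: masked to a byte).
            j = i
            while j < n and j < i + 2 and body[j] in OCT:
                j += 1
            value = bytes([int(esc + body[i:j], 8) & 0xFF])
            i = j
        elif esc == "x":
            # \xnn: one or more hex digits, only the last two are used.
            j = i
            while j < n and body[j] in HEX:
                j += 1
            if j == i:
                raise ValueError("\\x needs at least one hex digit")
            value = bytes([int(body[i:j][-2:], 16)])
            i = j
        elif esc in ("u", "U"):
            # \unnnn / \Unnnnnnnn: exactly four or eight hex digits.
            width = 4 if esc == "u" else 8
            digits = body[i:i + width]
            if len(digits) != width or not set(digits) <= HEX:
                raise ValueError("bad Unicode escape")
            cp = int(digits, 16)
            # Code point 0 and UTF-16 surrogates are rejected.
            if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                raise ValueError("rejected code point")
            value = chr(cp).encode("utf-8")
            i += width
        else:
            raise ValueError("unknown escape \\" + esc)
        if value == b"\x00":
            break  # a NUL byte drops itself and the rest of the string
        out += value
    return bytes(out)
```

With this sketch, decode_dollar_single(r"\u20AC") gives the three UTF-8 bytes
of the Euro sign, whereas a plain single-quoted '\u20AC' would stay
uninterpreted, which is the distinction the separate quoting type is meant to
make explicit.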

