[Tcsh] Multi-byte characters in promptchars

Kimmo Suominen kim at netbsd.org
Fri Apr 12 09:15:18 UTC 2024


On Fri, Apr 05, 2024 at 12:08:59PM +0200, H.Merijn Brand wrote:
> That was not the tone I intended. I'm a volunteer myself and I did not
> see it as a complaint. Sorry if that was not clear!

Thank you for clarifying — I appreciate it.

> On Fri, 5 Apr 2024 12:47:05 +0300, Kimmo Suominen <kim at netbsd.org> wrote:
> > I think this part of the commit is your proposed fix:
> > 
> > diff --git a/ed.refresh.c b/ed.refresh.c
> > index f1913801..bc902f5e 100644
> > --- a/ed.refresh.c
> > +++ b/ed.refresh.c
> > @@ -1155,6 +1160,8 @@ CalcPosition(int w, int th, int *h, int *v)
> >  	    *h += 4;
> >  	    break;
> >  	case NLSCLASS_ILLEGAL2:
> > +	    *h += NLSCLASS_ILLEGAL_SIZE(w);
> > +	    break;
> >  	case NLSCLASS_ILLEGAL3:
> >  	case NLSCLASS_ILLEGAL4:
> >  	case NLSCLASS_ILLEGAL5:
> > 
> > Why does it only apply to NLSCLASS_ILLEGAL2?
> 
> Because that was the smallest change required to make "it work", and I
> do not understand the underlying internals, so keeping the scope as
> small as possible was a way to do it the safest way possible.

I gave this a try.  I noticed that the penguin renders across two
columns.  If you use a character that renders in a single column, the
cursor ends up off by one.  While that could be considered better than
the original behaviour, which is off by several columns, it is clearly
not correct.

You can reproduce this with the hwair character (U+10348):

    set promptchars=$'\U10348#'

versus the penguin character:

    set promptchars=$'\U1F427#'

Do we have something already that correctly provides the rendering width
of the character?  Here is a Stack Overflow answer that points to using
wcwidth(3) and wcswidth(3):

    https://stackoverflow.com/a/9145712/1511370

    https://man.netbsd.org/wcwidth.3
    https://man.netbsd.org/wcswidth.3
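
To make the idea concrete, here is a minimal sketch of how a prompt's
column count could be derived with mbrtowc(3) and wcwidth(3).  The
function name display_columns is hypothetical and nothing here is taken
from the tcsh sources; it also assumes the caller has already selected a
suitable locale with setlocale(3).

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth(3) on glibc */
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of tcsh: number of terminal columns
 * needed to display the multibyte string s in the current LC_CTYPE
 * locale.  Returns -1 if s contains an invalid sequence or a
 * character for which wcwidth(3) reports no width.
 */
int
display_columns(const char *s)
{
    mbstate_t st;
    size_t left;
    int cols = 0;

    assert(s != NULL);
    memset(&st, 0, sizeof(st));
    left = strlen(s);

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);
        int w;

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;          /* invalid or truncated sequence */
        if (n == 0)
            break;              /* NUL decoded; not expected with strlen */
        w = wcwidth(wc);
        if (w < 0)
            return -1;          /* non-printable character */
        cols += w;              /* 0 for combining marks, 2 for wide */
        s += n;
        left -= n;
    }
    return cols;
}
```

In a UTF-8 locale I would expect this to report 2 for the hwair prompt
and 3 for the penguin prompt, but the result depends on the width tables
shipped with the C library, so treat that as an expectation rather than
a guarantee.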

I'm still not at all clear about the meaning of the NLSCLASS_ILLEGAL*
values, but I'm guessing they relate to the number of bytes taken to
represent each character, which does not appear to equate to the
rendering width of the character.
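
As an illustration of why byte count and rendering width differ, here
is a small sketch that computes how many bytes UTF-8 needs for a code
point.  The function utf8_len is hypothetical and says nothing about
what the NLSCLASS_ILLEGAL* macros actually do in tcsh.

```c
/*
 * Hypothetical helper, not part of tcsh: number of bytes UTF-8 uses
 * to encode the Unicode code point cp, or 0 if cp is out of range.
 */
int
utf8_len(unsigned long cp)
{
    if (cp < 0x80)
        return 1;       /* ASCII */
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    if (cp <= 0x10FFFF)
        return 4;       /* supplementary planes */
    return 0;           /* beyond the Unicode range */
}
```

Both utf8_len(0x10348) and utf8_len(0x1F427) give 4, yet hwair occupies
one column and the penguin two, so the encoded length alone cannot tell
us where the cursor belongs.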

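And then should we also handle combining characters and other sequences
where several code points render as a single glyph?  What if I wanted
the Finnish flag (a pair of regional indicator symbols) in my prompt?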

    echo $'\U1F1EB\U1F1EE'

Cheers,
+ Kimmo


