[Tcsh] Multi-byte characters in promptchars

Amol Deshpande amol.vinayak.deshpande at gmail.com
Sat Sep 21 16:28:31 UTC 2024


Fair enough. I'll make a branch from my latest Windows branch. It will
serve as a proof of concept for the fix, and if it works for you, I'll spend
some time cleaning up and merging everything properly back upstream.
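
To make the "consumed" idea from my earlier mail (quoted below) concrete,
here is a minimal standalone sketch. It is not tcsh code: the Char typedef
and the helper names utf8_seq_len, chars_consumed and count_characters are
made up for illustration, and it assumes each Char holds exactly one UTF-8
byte in its low 8 bits. It only shows how a loop can advance by the number
of Chars a character occupies instead of by 1; the display width would
still have to come from something like wcwidth(3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for tcsh's Char: a 32-bit cell that, by
 * assumption, carries one UTF-8 byte in its low 8 bits. */
typedef uint32_t Char;

/* Length in bytes of the UTF-8 sequence introduced by lead byte b.
 * Invalid lead bytes and stray continuation bytes report 1 so that
 * the caller always makes progress. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;   /* ASCII */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 1;
}

/* Number of Chars, starting at cp with n available, that make up the
 * next character. Truncated or malformed sequences count as 1 Char. */
static size_t chars_consumed(const Char *cp, size_t n)
{
    assert(n > 0);
    size_t len = utf8_seq_len((unsigned char)(cp[0] & 0xFF));
    if (len > n)
        return 1;                       /* sequence cut off by end of buffer */
    for (size_t i = 1; i < len; i++)
        if ((cp[i] & 0xC0) != 0x80)     /* not a 10xxxxxx continuation byte */
            return 1;
    return len;
}

/* Walk the buffer the way the refresh loop would, advancing by
 * "consumed" instead of by 1, and count the characters seen. */
static size_t count_characters(const Char *buf, size_t n)
{
    size_t count = 0;
    for (const Char *cp = buf; cp < buf + n; ) {
        cp += chars_consumed(cp, (size_t)(buf + n - cp));
        count++;
    }
    return count;
}
```

For a buffer holding 'a', the four bytes F0 9F 98 80 (one emoji) and 'b',
the loop visits three characters even though six Chars are stored.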



On Sat, Sep 21, 2024 at 3:42 AM H.Merijn Brand <tcsh at tux.freedom.nl> wrote:

> On Fri, 20 Sep 2024 09:30:41 -0700, Amol Deshpande <
> amol.vinayak.deshpande at gmail.com> wrote:
>
> > So, this is how the Windows version works and if you can tell me that
> > this applies to Linux as well, I can hopefully whip up a patch over the
> > weekend.
> >
> > If the prompt string being output is UTF-8, tcsh is still putting each
> > byte into a 32-bit int when WIDE_CHAR is set.
> >
> > Therefore, it may actually be 2 Chars (or 3, or 4) in the prompt that are
> > consumed for displaying an emoji or whatever.
> > However, the code in ed.refresh.c (RefreshPromptpart) is written to
> > assume each Char is an independent byte.
> >
> > So, in addition to wcwidth to find the length of the output, NLSWidthMB
> > should also report how many Chars of the input string it consumed to
> > represent the emoji.  (line 318)
> >
> > We should then increment cp by "+= consumed" in the above function while
> > looping through the buffer, instead of by just 1. (line 320)
> >
> > Does that sound right or am I missing something ?
>
> Knowing Unicode only as well as I do (way insufficiently), I/we will most
> likely miss something.
>
> That said, you could start a branch with what you think is correct, and
> I will be more than happy to test that with my insane prompts
>
> > -amol
> >
> > On Fri, Sep 20, 2024 at 7:38 AM Kimmo Suominen <kim at netbsd.org> wrote:
> >
> > > On Fri, Sep 20, 2024 at 10:46:49AM +0200, H.Merijn Brand wrote:
> > > > With multibyte promptchars and a complicated prompt like
> > > >
> > > >  promptchars ����
> > > >  prompt      xyz%{\e[47;34m0392\e[0m%} %U%m:%u%{\e[1m%}%/ %h %{\e[0;38;2;255;24;0m%}%#%{\e[0m%}
> > > >
> > > > Positioning inside the line when editing still frequently messes up.
> > > > A control-R fixes that, but I guess it should be smooth
> > >
> > > I think the issue is highlighted by this question I made:
> > >
> > > On Fri, 5 Apr 2024 12:47:05 +0300, Kimmo Suominen <kim at netbsd.org>
> > > wrote:
> > > > Why does it only apply to NLSCLASS_ILLEGAL2?
> > >
> > > I think the whole logic there across the NLSCLASS_ILLEGALn cases is
> > > incorrect.
> > >
> > > > Do we have something already that correctly provides the rendering
> > > > width of the character?  Here is a Stack Overflow answer that points
> > > > to using wcwidth(3) and wcswidth(3):
> > > >
> > > >     https://stackoverflow.com/a/9145712/1511370
> > > >
> > > >     https://man.netbsd.org/wcwidth.3
> > > >     https://man.netbsd.org/wcswidth.3
> > >
> > > I think the correct implementation needs to include the detection of
> > > the rendering width of the characters.
> > >
> > > + Kimmo
>
> --
> H.Merijn Brand  https://tux.nl   Perl Monger   http://amsterdam.pm.org/
> using perl5.00307 .. 5.37        porting perl5 on HP-UX, AIX, and Linux
> https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org
>
>