[Tcsh] Multi-byte characters in promptchars
H.Merijn Brand
tcsh at tux.freedom.nl
Sat Sep 21 10:42:28 UTC 2024
On Fri, 20 Sep 2024 09:30:41 -0700, Amol Deshpande <amol.vinayak.deshpande at gmail.com> wrote:
> So, this is how the Windows version works and if you can tell me that this
> applies to Linux as well, I can hopefully whip up a patch over the weekend.
>
> If the prompt string being output is UTF-8, tcsh is still putting each byte
> into a 32-bit int when WIDE_CHAR is set.
>
> Therefore, it may actually be 2 Chars (or 3, or 4) in the prompt that are
> consumed for displaying an emoji or whatever.
> However, the code in ed.refresh.c (RefreshPromptpart) is written to assume
> each Char is an independent byte.
>
> So, in addition to wcwidth to find the length of the output, NLSWidthMB
> should also report how many Chars of the input string it consumed to
> represent the emoji. (line 318)
>
> We should then increment cp by "+= consumed" in the above function while
> looping through the buffer, instead of by just 1. (line 320)
>
> Does that sound right or am I missing something ?
Given how little I know about Unicode (way too little), I/we will most
likely miss something.
That said, you could start a branch with what you think is correct, and
I will be more than happy to test that with my insane prompts.
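
A minimal sketch of the "+= consumed" idea quoted above, not the actual
tcsh code. It assumes each Char cell holds exactly one UTF-8 byte, as
described for the WIDE_CHAR case. The `Char` typedef is a stand-in, and
`utf8_cells_consumed` and `count_sequences` are hypothetical helpers, not
NLSWidthMB or RefreshPromptpart. The display width would still have to come
from something like wcwidth(3) on the decoded code point, which is left out
here because it depends on the locale.

```c
#include <stddef.h>

/* Stand-in for tcsh's Char when WIDE_CHAR is set: a 32-bit cell.
 * Assumption for this sketch: each cell holds ONE byte of UTF-8. */
typedef unsigned int Char;

/* Hypothetical helper (not tcsh's NLSWidthMB): returns how many cells,
 * starting at cp, form one UTF-8 sequence. Returns at least 1 whenever
 * n > 0, so a caller's loop always makes progress on malformed input. */
size_t
utf8_cells_consumed(const Char *cp, size_t n)
{
    size_t need, i;
    unsigned int lead;

    if (n == 0)
        return 0;

    lead = cp[0] & 0xFF;
    if (lead < 0x80)
        need = 1;                   /* plain ASCII */
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return 1;                   /* stray continuation or invalid lead */

    if (need > n)
        return 1;                   /* sequence truncated by end of buffer */

    for (i = 1; i < need; i++)
        if ((cp[i] & 0xC0) != 0x80)
            return 1;               /* not a continuation byte */

    return need;
}

/* Walk a buffer the way the proposed loop would: advance by "consumed"
 * cells per iteration instead of by 1, counting one unit per sequence. */
size_t
count_sequences(const Char *buf, size_t n)
{
    size_t count = 0, i = 0;

    while (i < n) {
        i += utf8_cells_consumed(buf + i, n - i);
        count++;
    }
    return count;
}
```

With a buffer holding 'a', the two-byte sequence 0xCE 0x92, and the
four-byte sequence 0xF0 0x9F 0x98 0x80, the walk visits three sequences
across seven cells. Falling back to one cell on malformed input is a design
choice of this sketch so that the loop cannot stall or overrun the buffer.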
> -amol
>
> On Fri, Sep 20, 2024 at 7:38 AM Kimmo Suominen <kim at netbsd.org> wrote:
>
> > On Fri, Sep 20, 2024 at 10:46:49AM +0200, H.Merijn Brand wrote:
> > > With multibyte promptchars and a complicated prompt like
> > >
> > > promptchars
> > > prompt xyz%{\e[47;34m0392\e[0m%} %U%m:%u%{\e[1m%}%/ %h %{\e[0;38;2;255;24;0m%}%#%{\e[0m%}
> > >
> > > Positioning inside the line when editing still frequently messes up. A
> > > control-R fixes that, but I guess it should be smooth
> >
> > I think the issue is highlighted by this question I made:
> >
> > On Fri, 5 Apr 2024 12:47:05 +0300, Kimmo Suominen <kim at netbsd.org>
> > wrote:
> > > Why does it only apply to NLSCLASS_ILLEGAL2?
> >
> > I think the whole logic there across the NLSCLASS_ILLEGALn cases is
> > incorrect.
> >
> > > Do we have something already that correctly provides the rendering
> > > width of the character? Here is a Stack Overflow answer that points
> > > to using wcwidth(3) and wcswidth(3):
> > >
> > > https://stackoverflow.com/a/9145712/1511370
> > >
> > > https://man.netbsd.org/wcwidth.3
> > > https://man.netbsd.org/wcswidth.3
> >
> > I think the correct implementation needs to include the detection of the
> > rendering width of the characters.
> >
> > + Kimmo
--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.37 porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org