[Tcsh] Multi-byte characters in promptchars

Sat Sep 21 10:42:28 UTC 2024

On Fri, 20 Sep 2024 09:30:41 -0700, Amol Deshpande <amol.vinayak.deshpande at gmail.com> wrote:

> So, this is how the Windows version works and if you can tell me that this
> applies to Linux as well, I can hopefully whip up a patch over the weekend.
> 
> If the prompt string being output is UTF-8, tcsh is still putting each byte
> into a 32-bit int when WIDE_CHAR is set.
> 
> Therefore, it may actually be 2 Chars (or 3, or 4) in the prompt that are
> consumed for displaying an emoji or whatever.
> However, the code in ed.refresh.c (RefreshPromptpart) is written to assume
> each Char is an independent byte.
> 
> So, in addition to wcwidth to find the length of the output, NLSWidthMB
> should also report how many Chars of the input string it consumed to
> represent the emoji. (line 318)
> 
> We should then increment cp by "+= consumed" in the above function while
> looping through the buffer, instead of by just 1. (line 320)
> 
> Does that sound right, or am I missing something?

Knowing as little about Unicode as I do (way too little), I/we will most
likely miss something.

That said, you could start a branch with what you think is correct, and
I will be more than happy to test that with my insane prompts.
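
To make the proposal above concrete, here is a rough sketch of the
"report how many Chars were consumed, then advance by that many" idea.
It is not tcsh code: the Char typedef and the names utf8_consume and
count_characters are hypothetical stand-ins, and it assumes (as described
above) that each Char holds exactly one UTF-8 byte. The real NLSWidthMB
and RefreshPromptpart are not reproduced here. The display width would
come from wcwidth(3) on the decoded code point; that call is left out so
the sketch does not depend on the locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for tcsh's Char: a 32-bit unit that, per the
 * description above, holds a single UTF-8 byte of the prompt. */
typedef uint32_t Char;

/* Decode one UTF-8 sequence starting at s[0], with n units available.
 * Stores the code point in *out and returns the number of Chars consumed.
 * A stray, malformed, or truncated sequence consumes exactly one Char so
 * the caller always makes progress. Overlong encodings are not rejected;
 * this is a sketch, not a validator. */
static size_t
utf8_consume(const Char *s, size_t n, uint32_t *out)
{
    assert(s != NULL && out != NULL && n > 0);

    uint32_t b0 = s[0] & 0xFF;   /* assumption: only the low byte matters */
    uint32_t cp;
    size_t len;

    if (b0 < 0x80)                { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else                          { *out = b0; return 1; } /* stray byte */

    if (len > n) { *out = b0; return 1; }                  /* truncated */

    for (size_t i = 1; i < len; i++) {
        uint32_t b = s[i] & 0xFF;
        if ((b & 0xC0) != 0x80) { *out = b0; return 1; }   /* malformed */
        cp = (cp << 6) | (b & 0x3F);
    }
    *out = cp;
    return len;
}

/* Walk a buffer the way the proposal describes: advance cp by the number
 * of Chars consumed instead of by 1. Returns the number of characters
 * (decoded sequences) visited. */
static size_t
count_characters(const Char *buf, size_t n)
{
    const Char *cp = buf;
    const Char *end = buf + n;
    size_t count = 0;

    while (cp < end) {
        uint32_t code;
        size_t consumed = utf8_consume(cp, (size_t)(end - cp), &code);
        /* A real patch would ask wcwidth() for the width of `code` here. */
        (void)code;
        cp += consumed;
        count++;
    }
    return count;
}
```

With a buffer holding 'a', the four bytes of a 4-byte UTF-8 sequence, and
'b', the loop visits three characters rather than six, which is the
difference between advancing by "consumed" and advancing by 1.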

> -amol
> 
> On Fri, Sep 20, 2024 at 7:38 AM Kimmo Suominen <kim at netbsd.org> wrote:
> 
> > On Fri, Sep 20, 2024 at 10:46:49AM +0200, H.Merijn Brand wrote:  
> > > With multibyte promptchars and a complicated prompt like
> > >
> > >  promptchars ����
> > >  prompt      xyz%{\e[47;34m0392\e[0m%} %U%m:%u%{\e[1m%}%/ %h %{\e[0;38;2;255;24;0m%}%#%{\e[0m%}  
> > >
> > > Positioning inside the line when editing still frequently messes up. A
> > > control-R fixes that, but I guess it should be smooth  
> >
> > I think the issue is highlighted by this question I asked:
> >
> > On Fri, 5 Apr 2024 12:47:05 +0300, Kimmo Suominen <kim at netbsd.org>
> > wrote:  
> > > Why does it only apply to NLSCLASS_ILLEGAL2?  
> >
> > I think the whole logic there across the NLSCLASS_ILLEGALn cases is
> > incorrect.
> >  
> > > Do we have something already that correctly provides the rendering
> > > width of the character?  Here is a Stack Overflow answer that points
> > > to using wcwidth(3) and wcswidth(3):
> > >
> > >     https://stackoverflow.com/a/9145712/1511370
> > >
> > >     https://man.netbsd.org/wcwidth.3
> > >     https://man.netbsd.org/wcswidth.3  
> >
> > I think the correct implementation needs to include the detection of the
> > rendering width of the characters.
> >
> > + Kimmo

-- 
H.Merijn Brand  https://tux.nl   Perl Monger   http://amsterdam.pm.org/
using perl5.00307 .. 5.37        porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org
