[Tcsh] Multi-byte characters in promptchars

Sat Sep 21 23:39:10 UTC 2024

Unfortunately, my experiment was not successful, so I have no clue how to
fix this issue :(

On Sat, Sep 21, 2024 at 9:28 AM Amol Deshpande <
amol.vinayak.deshpande at gmail.com> wrote:

> Fair enough. I'll make a branch from my latest windows branch. It will
> serve as a proof-of-concept for the fix and if it works for you, I'll spend
> some time to clean up and merge everything properly back to upstream.
>
>
>
> On Sat, Sep 21, 2024 at 3:42 AM H.Merijn Brand <tcsh at tux.freedom.nl>
> wrote:
>
>> On Fri, 20 Sep 2024 09:30:41 -0700, Amol Deshpande <
>> amol.vinayak.deshpande at gmail.com> wrote:
>>
>> > So, this is how the Windows version works and if you can tell me that this
>> > applies to Linux as well, I can hopefully whip up a patch over the weekend.
>> >
>> > If the prompt string being output is UTF-8, tcsh is still putting each byte
>> > into a 32-bit int when WIDE_CHAR is set.
>> >
>> > Therefore, it may actually be 2 Chars (or 3, or 4) in the prompt that are
>> > consumed for displaying an emoji or whatever.
>> > However, the code in ed.refresh.c (RefreshPromptpart) is written to assume
>> > each Char is an independent byte.
>> >
>> > So, in addition to wcwidth to find the length of the output, NLSWidthMB
>> > should also report how many Chars of the input string it consumed to
>> > represent the emoji. (line 318)
>> >
>> > We should then increment cp by "+= consumed" in the above function while
>> > looping through the buffer, instead of by just 1. (line 320)
>> >
>> > Does that sound right or am I missing something ?
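
As a rough illustration of the "consumed" idea above, here is a minimal
standalone sketch in C. It is not tcsh code: Char is a stand-in typedef, and
utf8_consumed and count_sequences are hypothetical names. It assumes each
element holds one UTF-8 byte, as described for the WIDE_CHAR case, and shows
a loop that advances cp by the number of Chars consumed instead of by 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for tcsh's wide Char (assumption): one UTF-8 byte per element. */
typedef uint32_t Char;

/*
 * Hypothetical helper: number of Chars that make up the UTF-8 sequence
 * starting at s[0]. Malformed or truncated input yields 1 so that the
 * caller always makes progress. Overlong encodings are not rejected.
 */
static size_t
utf8_consumed(const Char *s, size_t avail)
{
    size_t need, i;

    assert(s != NULL);
    if (avail == 0)
        return 0;
    if (s[0] > 0xFF)                    /* not a single byte at all */
        return 1;
    if (s[0] < 0x80)
        need = 1;                       /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0)
        need = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        need = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        need = 4;
    else
        return 1;                       /* stray continuation or invalid lead */
    if (need > avail)
        return 1;                       /* truncated sequence */
    for (i = 1; i < need; i++)
        if (s[i] > 0xFF || (s[i] & 0xC0) != 0x80)
            return 1;                   /* bad continuation byte */
    return need;
}

/* Walk the buffer by whole sequences: cp += consumed, not cp++. */
static size_t
count_sequences(const Char *buf, size_t len)
{
    const Char *cp = buf, *end = buf + len;
    size_t n = 0;

    while (cp < end) {
        cp += utf8_consumed(cp, (size_t)(end - cp));
        n++;
    }
    return n;
}
```

For a buffer holding 'a', the four bytes F0 9F 98 80 and 'b', the loop takes
three steps instead of six. A real fix would also need the display width of
each sequence, which this sketch leaves out.
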
>>
>> Knowing Unicode as much as I do (way insufficiently), I/we will most
>> likely miss something.
>>
>> That said, you could start a branch with what you think is correct, and
>> I will be more than happy to test that with my insane prompts.
>>
>> > -amol
>> >
>> > On Fri, Sep 20, 2024 at 7:38 AM Kimmo Suominen <kim at netbsd.org> wrote:
>> >
>> > > On Fri, Sep 20, 2024 at 10:46:49AM +0200, H.Merijn Brand wrote:
>> > > > With multibyte promptchars and a complicated prompt like
>> > > >
>> > > >  promptchars ����
>> > > >  prompt      xyz%{\e[47;34m0392\e[0m%} %U%m:%u%{\e[1m%}%/ %h %{\e[0;38;2;255;24;0m%}%#%{\e[0m%}
>> > > >
>> > > > Positioning inside the line when editing still frequently messes up. A
>> > > > control-R fixes that, but I guess it should be smooth
>> > >
>> > > I think the issue is highlighted by this question I made:
>> > >
>> > > On Fri, 5 Apr 2024 12:47:05 +0300, Kimmo Suominen <kim at netbsd.org>
>> > > wrote:
>> > > > Why does it only apply to NLSCLASS_ILLEGAL2?
>> > >
>> > > I think the whole logic there across the NLSCLASS_ILLEGALn cases is
>> > > incorrect.
>> > >
>> > > > Do we have something already that correctly provides the rendering
>> > > > width of the character?  Here is a Stack Overflow answer that points
>> > > > to using wcwidth(3) and wcswidth(3):
>> > > >
>> > > >     https://stackoverflow.com/a/9145712/1511370
>> > > >
>> > > >     https://man.netbsd.org/wcwidth.3
>> > > >     https://man.netbsd.org/wcswidth.3
>> > >
>> > > I think the correct implementation needs to include the detection of the
>> > > rendering width of the characters.
>> > >
>> > > + Kimmo
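
To make the wcwidth(3) suggestion concrete, here is a minimal sketch in C
that reports both the rendering width and the number of bytes consumed for
one character. It is not tcsh's NLSWidthMB: mb_width is a hypothetical name,
it works on a plain char buffer rather than tcsh's Char array, and the
fallback values for invalid input and non-printable characters are
assumptions.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth(3) on some systems */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode one multibyte character from s (at most
 * avail bytes) in the current locale. Returns the number of terminal
 * columns it occupies and stores the bytes used in *consumed.
 */
static int
mb_width(const char *s, size_t avail, size_t *consumed)
{
    mbstate_t st;
    wchar_t wc;
    size_t n;
    int w;

    assert(s != NULL && consumed != NULL);
    if (avail == 0) {
        *consumed = 0;
        return 0;
    }
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    n = mbrtowc(&wc, s, avail, &st);
    if (n == (size_t)-1 || n == (size_t)-2) {
        *consumed = 1;                  /* invalid or truncated: skip a byte */
        return 1;                       /* assumption: shown in one column */
    }
    if (n == 0)
        n = 1;                          /* a NUL byte was decoded */
    *consumed = n;
    w = wcwidth(wc);
    return w < 0 ? 0 : w;               /* assumption: non-printable is 0 */
}
```

The caller has to select a UTF-8 locale first, for example with
setlocale(LC_CTYPE, ""), otherwise multibyte sequences will not decode. The
width returned depends on the platform's wcwidth tables, so emoji may come
out differently across systems.
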
>>
>> --
>> H.Merijn Brand  https://tux.nl   Perl Monger   http://amsterdam.pm.org/
>> using perl5.00307 .. 5.37        porting perl5 on HP-UX, AIX, and Linux
>> https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org
>>
>>