[File] [PATCH] of Magdir/wordprocessors for Corel WordPerfect User Word List *.UWL *.SAV
Christos Zoulas
christos at zoulas.com
Tue Aug 16 12:06:02 UTC 2022
Added, thanks!
christos
> On Aug 13, 2022, at 7:47 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> Some days ago I sent a patch for WordPerfect CBT samples. These are
> found in the subdirectory WritingTools inside the WordPerfect program
> directory "c:\Program Files (x86)\Corel\WordPerfect Office 2021".
> That subdirectory contains more similar files, but with other
> file name extensions such as adv, hyd, icr, lex, mor and sav.
>
> The dozens of SAV examples have file names like: Wt13de.sav
> Wt13dk.sav Wt13fr.sav Wt13it.sav Wt13nl.sav wt13ru.sav WT21us.sav
> WT21DE.UWL WT21fr.sav WT21it.sav Wtcz.sav
>
> These start with the two-letter prefix WT, which is apparently the
> abbreviation for Writing Tools. These letters are often followed by
> digits that correspond to the WordPerfect version: 21 for version
> 2021, 13 for an older version I found, and 15 in the example on the
> web page. The last two letters correspond to the language: DE for
> German, FR for French, IT for Italian and so on. US is used for US
> English and EN for English. For EN the file name extension HWL is
> used, whereas for all other languages it is SAV.
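The naming scheme just described can be expressed as a small pattern. The following Python sketch only illustrates that observed convention and is not an official WordPerfect rule; `parse_wt_name` is a hypothetical helper, and treating the version digits as optional (as in Wtcz.sav and WTUS.UWL) is an assumption drawn from the sample names listed above.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for the observed names: "WT" prefix, optional
# version digits, a two-letter language code and one of the extensions
# SAV, UWL or HWL (case-insensitive, since the samples mix cases).
_WT_NAME = re.compile(r"^wt(\d*)([a-z]{2})\.(sav|uwl|hwl)$", re.IGNORECASE)


def parse_wt_name(name: str) -> Optional[Tuple[Optional[int], str, str]]:
    """Return (version, language, extension), or None if the name does not fit."""
    m = _WT_NAME.match(name)
    if m is None:
        return None
    digits, lang, ext = m.groups()
    version = int(digits) if digits else None  # e.g. "Wtcz.sav" has no digits
    return version, lang.upper(), ext.lower()


for sample in ["Wt13de.sav", "WT21US.UWL", "WT21en.hwl", "Wtcz.sav"]:
    print(sample, parse_wt_name(sample))
```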
>
> The SAV extension is apparently used for the "basic" or "template"
> variant of a word list that is provided by the software producer. When
> the user "activates" a word list for a given language, the SAV file
> is apparently copied to the WritingTools subdirectory of the
> WordPerfect folder in the user's directory, where it gets the
> file name extension UWL. UWL is evidently the abbreviation for user
> word list. That this is true can be seen from the MD5 sums: initially
> they are identical, e.g. 8837a6e4bfabfceba68901e30339c48b for both
> the WT21us.sav and WT21US.UWL samples. The user can add, disable,
> remove, or change his personal word lists (that is, the UWL variant).
> Then, of course, the checksums differ.
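The MD5 comparison can be reproduced with Python's hashlib. This is only a sketch; `md5_hex`, `same_content` and `file_md5` are hypothetical helper names, and comparing the raw bytes directly would work just as well. Equal digests merely show that a UWL file is still an unmodified copy of its SAV template.

```python
import hashlib
from pathlib import Path


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string, as printed by md5sum."""
    return hashlib.md5(data).hexdigest()


def same_content(a: bytes, b: bytes) -> bool:
    """True if two word lists have identical MD5 sums (e.g. an untouched UWL copy)."""
    return md5_hex(a) == md5_hex(b)


def file_md5(path: str) -> str:
    # Word list files are small (a few KiB), so reading them whole is fine.
    return md5_hex(Path(path).read_bytes())


# Usage (file names are examples only):
# file_md5("WT21us.sav") == file_md5("WT21US.UWL")
```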
>
> When running the file command (version 5.42) on such examples I get
> output like:
>
> wt13de.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WT13DE.UWL: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> wt13en.hwl: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> Wt13it.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> wt13pl.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> wt13ru.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> Wt13us.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WT21en.hwl: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WT21it.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WT21us.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WT21US.UWL: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WTUS.UWL: Unknown Corel/Wordperfect product 32, file type 10, v2.0
>
> With the --extension option only ??? is displayed. Furthermore, with
> the -i option only the generic application/octet-stream is shown for my samples.
>
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). It identifies all
> such examples with a low rating as "WordPerfect (generic)" by
> wp-generic.trid.xml and with a high rating
> as "Corel Writing Tools User Word List" by uwl-wp.trid.xml (see
> the appended trid-v-wordperfect-uwl.txt.gz).
>
> Unfortunately I found no information, in particular no file format
> specification, for such WordPerfect files. In the support area of the
> Corel web site I found a page about Writing Tools not working, so I
> chose that page as reference. That is expressed inside
> Magdir/wordprocessors by comment lines like:
>
> # URL: https://support.corel.com/hc/en-us/articles/
> # 215876258-Writing-Tools-Spell-Check-Dictionary
> # -does-not-work-in-WordPerfect-X5
> # http://wordperfect.helpmax.net/en/
> # editing-and-formatting-documents/
> # using-the-writing-tools/working-with-user-word-lists/
> # Reference: http://mark0.net/download/triddefs_xml.7z
> # defs/u/uwl-wp.trid.xml
>
> The description inside Magdir/wordprocessors starts like:
> 0 string \xffWPC
> So the first 4 bytes are the generic magic for all WordPerfect
> samples. Subclassification is done by the bytes at offsets 8 and 9.
> If the subclass is not known, then as the last step the subclass for
> everything else is shown by lines like:
>> 8 default x
>>> 8 byte x Unknown Corel/Wordperfect product %d,
>>>> 9 byte x file type %d
>
> So for my word list examples I must insert, before those, lines like:
>> 8 byte 32
>>> 9 byte 10 Corel Writing Tools User Word List
> !:mime application/x-wordperfect-wordlist
> !:ext uwl/hwl/sav
> Instead of the generic MIME type application/octet-stream I show a
> user-defined one.
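The decision made by these magic lines can be mirrored in a few lines of Python, which may help when checking samples outside of file. This is a hedged sketch: it reproduces only the checks quoted in this message (magic at offset 0, product type at offset 8, file type at offset 9), and `classify_wpc` is a hypothetical name.

```python
from typing import Optional, Tuple

WPC_MAGIC = b"\xffWPC"  # generic WordPerfect magic at offset 0


def classify_wpc(data: bytes) -> Optional[Tuple[str, str]]:
    """Return (description, MIME type), or None if this is not a WordPerfect file."""
    if len(data) < 10 or not data.startswith(WPC_MAGIC):
        return None
    product, file_type = data[8], data[9]
    if product == 32 and file_type == 10:
        # branch added by the patch
        return ("Corel Writing Tools User Word List",
                "application/x-wordperfect-wordlist")
    # fallback corresponding to the ">8 default x" lines
    return (f"Unknown Corel/Wordperfect product {product}, file type {file_type}",
            "application/octet-stream")
```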
>
> According to the unofficial WordPerfect file format description found
> as WPFF_DocumentStructure.htm, the file size (not including pad
> characters at EOF) is stored at offset 20 as a 4-byte little-endian
> integer. So I show that additional information by a line like:
>>>> 0x14 ulelong x \b, %u bytes
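To see what the size field holds, the sketch below reads the value at offset 0x14 as a 4-byte little-endian integer, as the WPFF description states, and compares it with the real file length; any surplus would be EOF padding. It also reads only the low 2 bytes, which agrees with the 4-byte value only while the size stays below 65536. `size_field` is a hypothetical helper, and the layout is assumed from WPFF_DocumentStructure.htm rather than from an official specification.

```python
import struct
from typing import Tuple


def size_field(data: bytes) -> Tuple[int, int, int]:
    """Return (4-byte size, 2-byte size, padding) for a WordPerfect header.

    Assumes the WPFF layout: file size without EOF padding, stored as a
    little-endian integer at offset 0x14.
    """
    (size32,) = struct.unpack_from("<I", data, 0x14)
    (size16,) = struct.unpack_from("<H", data, 0x14)  # low 2 bytes only
    padding = len(data) - size32  # bytes beyond the stored size, if any
    return size32, size16, padding
```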
>
> At offset 4 the pointer to the document area is stored as a 4-byte
> little-endian integer. In my examples this value was always 0x200,
> that is, right after the extended header. So I show this information
> only for unusual values by a line like:
>>>> 4 ulelong !0x200 \b, at %#x document area
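The document area pointer can be checked the same way. This sketch, with the hypothetical name `document_area_note`, reads the 4-byte little-endian value at offset 4 and returns extra text only when it differs from the 0x200 seen in all samples, mirroring the `!0x200` test above.

```python
import struct

EXPECTED_DOC_AREA = 0x200  # value observed in every sample so far


def document_area_note(data: bytes) -> str:
    """Return the extra description for unusual document area offsets."""
    (ptr,) = struct.unpack_from("<I", data, 4)
    if ptr == EXPECTED_DOC_AREA:
        return ""  # usual case: nothing extra is printed
    return f", at {ptr:#x} document area"
```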
>
> So the document area begins at offset 512. Apparently it often
> contains UTF-16 little-endian encoded strings and maybe some additional
> information like string lengths and more. The page about working
> with user word lists says the following: If you often type
> incorrectly, you can define the correctly spelled word as a
> replacement. For example, you can specify "the" as a replacement for
> "hte". You can replace abbreviations or acronyms with words or
> phrases. For example, you can define Chief Executive Officer as the
> replacement for the acronym CEO. So in the output of the file command
> with my patch we see pairs like acsesory/accessory in the USA
> version, abre/aber in the German version and uan/una in the Italian
> version. So I show the beginning (an excerpt) of the word list by lines like:
>>>> (4.l) ubyte x
>>>>> &0 search/91/sb a\0
>>>>>> &0 lestring16 x (...%-.33s...)
> So first jump to the document area, then look for the beginning of the
> word list, which mostly starts with the letter a encoded as UTF-16
> little endian, and then show it as a string.
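The three magic lines above can be approximated in Python to see which bytes end up in the excerpt. This is a rough sketch, not file's implementation: `word_list_excerpt` is a hypothetical name, the 91-byte window and 33-character limit are taken from the magic lines, the search is assumed to start one byte after the document area (after the `ubyte` read), and only the low byte of each UTF-16 unit is kept, with control characters shown as octal escapes. Under these assumptions a 2-byte length 0x0012 in front of "accessory" appears as `\022`, as in the sample output in this message.

```python
from typing import Optional

SEARCH_WINDOW = 91  # from "search/91"
MAX_CHARS = 33      # from "%-.33s"


def word_list_excerpt(data: bytes, doc_area: int) -> Optional[str]:
    """Approximate the lestring16 excerpt printed by the patched magic."""
    start = doc_area + 1  # "&0" continues after the ubyte read at doc_area
    hit = data.find(b"a\x00", start, start + SEARCH_WINDOW)
    if hit < 0:
        return None
    out = []
    for pos in range(hit, len(data) - 1, 2):
        unit = int.from_bytes(data[pos:pos + 2], "little")
        if unit == 0 or len(out) == MAX_CHARS:
            break  # stop at a UTF-16 NUL or at the width limit
        low = unit & 0xFF
        # Printable ASCII is kept; anything else becomes an octal escape.
        out.append(chr(low) if 0x20 <= low < 0x7F else f"\\{low:03o}")
    return "".join(out)
```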
>
> After applying the above modifications with the patch
> file-5.42-wordprocessors-uwl.diff I get more precise output
> like:
>
> Wt13de.sav: Corel Writing Tools User Word List
> (...abre\010aber(\001\024Abschlu\027\...),
> 8610 bytes, v2.0
> WT13DE.UWL: Corel Writing Tools User Word List
> (...abre\010aber(\001\024Abschlu\027\...),
> 8610 bytes, v2.0
> wt13en.hwl: Corel Writing Tools User Word List
> (...alf&\002\02010 key\024number pad....),
> 24249 bytes, v2.0
> Wt13it.sav: Corel Writing Tools User Word List
> (...an\006una*\001\026stapmante\022st...),
> 14492 bytes, v2.0
> wt13pl.sav: Corel Writing Tools User Word List
> (...al\006al.\024\001...),
> 23754 bytes, v2.0
> wt13ru.sav: Corel Writing Tools User Word List,
> 13877 bytes, v2.0
> Wt13us.sav: Corel Writing Tools User Word List
> (...acsesory\022accessory.\001\026aco...),
> 14199 bytes, v2.0
> WT21en.hwl: Corel Writing Tools User Word List
> (...alf&\002\02010 key\024number pad....),
> 24249 bytes, v2.0
> WT21it.sav: Corel Writing Tools User Word List
> (...an\006una*\001\026stapmante\022st...),
> 14098 bytes, v2.0
> WT21us.sav: Corel Writing Tools User Word List
> (...acsesory\022accessory.\001\026aco...),
> 13844 bytes, v2.0
> WT21US.UWL: Corel Writing Tools User Word List
> (...acsesory\022accessory.\001\026aco...),
> 13844 bytes, v2.0
> WTUS.UWL: Corel Writing Tools User Word List,
> 546 bytes, v2.0
>
> I hope my diff file can be applied in future version of
> file utility.
>
> The word list excerpt is done with the lestring16 type. So I get the
> misspelled and the corrected phrase, but with ugly-looking control-like
> characters between them. Digging deeper, it seems that the size
> (number of bytes) of each UTF-16 string is stored before it as a
> 2-byte little-endian integer. So it would be nice if a Pascal variant
> of lestring16 (maybe called plestring16) could be implemented. Then the
> word list excerpt could be shown more nicely, with only the phrase strings.
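The proposed Pascal-style type is easy to prototype outside of file. The sketch below assumes exactly the layout described above, a 2-byte little-endian byte count followed by that many bytes of UTF-16LE text; `read_plestring16` is a hypothetical helper, not an existing libmagic type, and the bytes that separate entries in the real word lists (such as the \001 visible in the excerpts) are not modelled.

```python
import struct
from typing import Tuple


def read_plestring16(data: bytes, offset: int) -> Tuple[str, int]:
    """Read a length-prefixed UTF-16LE string.

    Returns the decoded text and the offset just past it, so that
    consecutive strings can be read in a loop.
    """
    (nbytes,) = struct.unpack_from("<H", data, offset)  # length in bytes
    start = offset + 2
    end = start + nbytes
    if end > len(data):
        raise ValueError(f"string at {offset:#x} runs past end of data")
    return data[start:end].decode("utf-16-le"), end
```

Reading two such strings in a row would yield acsesory and accessory without the \022 between them.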
>
> Maybe somebody could implement such a new type.
>
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
>
>
>
> <file-5_42-wordprocessors-uwl_diff><file-5_42-wordprocessors-uwl_diff_sig><trid-v-wordperfect-uwl.txt.gz>
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file