[File] [PATCH] of Magdir/wordprocessors for Corel WordPerfect User Word List *.UWL *.SAV

Christos Zoulas christos at zoulas.com
Tue Aug 16 12:06:02 UTC 2022


Added, thanks!

christos

> On Aug 13, 2022, at 7:47 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> Some days ago I sent a patch for WordPerfect CBT samples. These are
> found in the sub directory WritingTools inside the WordPerfect program
> directory "c:\Program Files (x86)\Corel\WordPerfect Office 2021".
> In that sub directory there are more similar files, but with other
> file name extensions like adv, hyd, icr, lex, mor and sav.
> 
> The dozens of SAV examples have file names like: Wt13de.sav
> Wt13dk.sav Wt13fr.sav Wt13it.sav Wt13nl.sav wt13ru.sav WT21us.sav
> WT21DE.UWL WT21fr.sav WT21it.sav Wtcz.sav
> 
> These start with the 2 letter phrase WT, which apparently is the
> abbreviation for Writing Tools. The starting letters are often
> followed by digits which correspond to the WordPerfect version. For
> version 2021 these digits are 21, and for an older version I found
> digits 13. On the web page, sample 15 is used. The last letters
> correspond to the language used. For German DE is used, for French
> FR, for Italian IT, and so on. For USA English US is used and for
> English EN is used. For EN the file name extension HWL is used,
> whereas for all other languages it is SAV.
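
The naming scheme described above can be sketched as a small parser. This is only an illustration in Python: the pattern is derived solely from the sample names in this mail, and the names NAME_RE and parse_wordlist_name are hypothetical, not part of file(1) or WordPerfect.

```python
import re

# Assumption: "WT", optional 2 version digits, 2 language letters, and one
# of the three extensions seen in the samples. Case is ignored because the
# samples mix upper and lower case.
NAME_RE = re.compile(r"^wt(\d{2})?([a-z]{2})\.(sav|uwl|hwl)$", re.IGNORECASE)


def parse_wordlist_name(name):
    """Return (version_digits_or_None, LANGUAGE, extension) or None."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    digits, lang, ext = m.groups()
    return digits, lang.upper(), ext.lower()
```

Names without version digits, like Wtcz.sav, yield None for the version part.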
> 
> The SAV extension is apparently used for the "basic" or "template"
> variant of a word list, which is provided by the software producer.
> So when the user "activates" a word list for a distinct language, the
> SAV example is apparently copied to the WritingTools sub directory
> inside the WordPerfect folder in the user directory, where it gets
> the file name extension UWL. Obviously UWL is the abbreviation for
> user word list. That this is true can be seen by looking at the md5
> sums. These are initially the same, like
> 8837a6e4bfabfceba68901e30339c48b for the WT21us.sav and WT21US.UWL
> samples. The user can add, disable, remove, or change the personal
> word lists (that is, the UWL variant). Then of course the check sums
> become different.
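
The checksum comparison described above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard hashlib module; the function names md5_of and is_unmodified_copy are made up for illustration.

```python
import hashlib


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_unmodified_copy(sav_path, uwl_path):
    """True while the user word list is still identical to its template."""
    return md5_of(sav_path) == md5_of(uwl_path)
```

For example, is_unmodified_copy("WT21us.sav", "WT21US.UWL") would be expected to return True until the user edits the UWL file.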
> 
> When running the file command (version 5.42) on such examples I get
> an output like:
> 
> wt13de.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WT13DE.UWL: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> wt13en.hwl: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> Wt13it.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> wt13pl.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> wt13ru.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> Wt13us.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WT21en.hwl: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WT21it.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WT21us.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WT21US.UWL: Unknown Corel/Wordperfect product 32, file type 10, v2.0
> WTUS.UWL:   Unknown Corel/Wordperfect product 32, file type 10, v2.0
> 
> With the --extension option only ??? is displayed. Furthermore, with
> the -i option only the generic application/octet-stream is shown for
> my samples.
> 
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). This identifies all
> such examples with a low rate as "WordPerfect (generic)" by
> wp-generic.trid.xml, and with a high rate as
> "Corel Writing Tools User Word List" by uwl-wp.trid.xml (see
> appended trid-v-wordperfect-uwl.txt.gz).
> 
> Unfortunately I found no information, especially no file format
> specification, about such WordPerfect files. In the support area of
> the Corel web site I found a page about Writing Tools not working.
> So I chose that page as reference. That is expressed inside
> Magdir/wordprocessors by comment lines like:
> 
> # URL:		https://support.corel.com/hc/en-us/articles/
> #		215876258-Writing-Tools-Spell-Check-Dictionary
> #		-does-not-work-in-WordPerfect-X5
> #		http://wordperfect.helpmax.net/en/
> #		editing-and-formatting-documents/
> #		using-the-writing-tools/working-with-user-word-lists/
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/u/uwl-wp.trid.xml
> 
> The description inside Magdir/wordprocessors starts like:
> 0	string	\xffWPC
> So we see that the first 4 bytes are the generic magic for all
> WordPerfect samples. Sub classification is done by the bytes at
> offsets 8 and 9. If the sub class is not known, then as a last step
> the sub class for everything else is shown by lines like:
>> 8	default x
>>> 8	byte	x	Unknown Corel/Wordperfect product %d,
>>>> 9	byte	x	file type %d
> 
> So for my word list examples I must insert before that lines like:
>> 8	byte	32
>>> 9	byte	10	Corel Writing Tools User Word List
> !:mime	application/x-wordperfect-wordlist
> !:ext	uwl/hwl/sav
> Instead of the generic mime type application/octet-stream I show a
> user defined one.
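
The classification step described above can be sketched outside of magic(5) as follows. This is an illustrative Python rendering, not the libmagic implementation; the names WPC_MAGIC and classify are assumptions, and only the product 32 / file type 10 case from this mail is handled specially.

```python
WPC_MAGIC = b"\xffWPC"  # generic magic shared by all WordPerfect files


def classify(header):
    """Classify a WordPerfect header by product (offset 8) and type (offset 9)."""
    if len(header) < 10 or header[:4] != WPC_MAGIC:
        return None  # not a WordPerfect file at all
    product, file_type = header[8], header[9]
    if (product, file_type) == (32, 10):
        return "Corel Writing Tools User Word List"
    # Fallback corresponding to the "default" branch of the magic entry.
    return "Unknown Corel/Wordperfect product %d, file type %d" % (
        product, file_type)
```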
> 
> According to the unofficial WordPerfect File Format description found
> as WPFF_DocumentStructure.htm, the file size (not including pad
> characters at EOF) is stored at offset 20 as a 4 byte little endian
> integer. So I show that additional information by a line like:
>>>> 0x14	ulelong x	\b, %u bytes
> 
> At offset 4 a pointer to the document area is stored as a 4 byte
> little endian integer. In my examples this value was always 200
> hexadecimal. That is right after the extended header. So I show this
> information for unusual cases by a line like:
>>>> 4	ulelong	!0x200	\b, at %#x document area
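
The two header fields described above can be read with Python's struct module. A minimal sketch, assuming the layout stated in this mail (document area pointer at offset 4, file size at offset 20, both 4 byte little endian); the function name read_header_fields is hypothetical.

```python
import struct


def read_header_fields(data):
    """Return (document_area_offset, stored_file_size) from a WPC header."""
    if len(data) < 24 or data[:4] != b"\xffWPC":
        raise ValueError("not a WordPerfect header")
    # "<I" is an unsigned 4 byte little endian integer.
    (doc_offset,) = struct.unpack_from("<I", data, 4)
    (file_size,) = struct.unpack_from("<I", data, 0x14)
    return doc_offset, file_size
```

A document area offset other than 0x200 would be the unusual case that the magic line reports.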
> 
> So the document area begins at offset 512. Obviously it often
> contains UTF-16 little endian encoded strings and maybe some
> additional information like string length and more. On the page
> about Working
> with user word lists the following is written: If you often type
> incorrectly, you can define the correctly spelled word as a
> replacement. For example, you can specify "the" as a replacement for
> "hte". You can replace abbreviations or acronyms with words or
> phrases. For example, you can define Chief Executive Officer as the
> replacement for the acronym CEO. So when we look at the output of
> the file command with my patches we see something like
> acsesory/accessory in the USA version, abre/aber in the German
> version and uan/una in the Italian version. So I show the beginning
> of the word list as an excerpt by lines like:
>>>> (4.l)	ubyte	x
>>>>> &0	search/91/sb	a\0
>>>>>> &0		lestring16	x	(...%-.33s...)
> So first jump to the document area, then look for the beginning of
> the word list, which mostly starts with the letter a encoded as
> UTF-16 little endian, and then show this as a UTF-16 string.
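
The excerpt logic described above can be approximated in Python. This is only a rough sketch and does not reproduce libmagic's exact offset or formatting semantics: it follows the pointer at offset 4, searches a window of 91 bytes for "a" followed by a zero byte, and decodes what follows as UTF-16 little endian up to the first NUL character. The function name wordlist_excerpt and its parameters are assumptions.

```python
import struct


def wordlist_excerpt(data, window=91, max_units=33):
    """Return a short UTF-16LE excerpt starting at the first 'a', or None."""
    (doc_offset,) = struct.unpack_from("<I", data, 4)
    # Look for the byte pair 'a', 0x00 near the start of the document area.
    pos = data.find(b"a\x00", doc_offset, doc_offset + window + 2)
    if pos < 0:
        return None
    raw = bytes(data[pos:pos + 2 * max_units])
    raw = raw[:len(raw) - len(raw) % 2]  # keep whole 16 bit units only
    text = raw.decode("utf-16-le", errors="replace")
    return text.split("\x00", 1)[0]  # stop at the first NUL character
```

Files whose list does not start with the letter a within the window, like the wt13ru.sav sample, give no excerpt.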
> 
> After applying the above mentioned modifications by the patch
> file-5.42-wordprocessors-uwl.diff I get a more precise output
> like:
> 
> Wt13de.sav: Corel Writing Tools User Word List
> 	    (...abre\010aber(\001\024Abschlu\027\...),
> 	    8610 bytes, v2.0
> WT13DE.UWL: Corel Writing Tools User Word List
> 	    (...abre\010aber(\001\024Abschlu\027\...),
> 	    8610 bytes, v2.0
> wt13en.hwl: Corel Writing Tools User Word List
> 	    (...alf&\002\02010 key\024number pad....),
> 	    24249 bytes, v2.0
> Wt13it.sav: Corel Writing Tools User Word List
> 	    (...an\006una*\001\026stapmante\022st...),
> 	    14492 bytes, v2.0
> wt13pl.sav: Corel Writing Tools User Word List
> 	    (...al\006al.\024\001...),
> 	    23754 bytes, v2.0
> wt13ru.sav: Corel Writing Tools User Word List,
> 	    13877 bytes, v2.0
> Wt13us.sav: Corel Writing Tools User Word List
> 	    (...acsesory\022accessory.\001\026aco...),
> 	    14199 bytes, v2.0
> WT21en.hwl: Corel Writing Tools User Word List
> 	    (...alf&\002\02010 key\024number pad....),
> 	    24249 bytes, v2.0
> WT21it.sav: Corel Writing Tools User Word List
> 	    (...an\006una*\001\026stapmante\022st...),
> 	    14098 bytes, v2.0
> WT21us.sav: Corel Writing Tools User Word List
> 	    (...acsesory\022accessory.\001\026aco...),
> 	    13844 bytes, v2.0
> WT21US.UWL: Corel Writing Tools User Word List
> 	    (...acsesory\022accessory.\001\026aco...),
> 	    13844 bytes, v2.0
> WTUS.UWL:   Corel Writing Tools User Word List,
> 	    546 bytes, v2.0
> 
> I hope my diff file can be applied in a future version of the
> file utility.
> 
> The word list excerpt is done by the lestring16 type. So I get the
> misspelled and the corrected phrase, but with ugly looking
> control-like characters in between. When digging deeper it seems
> that before each UTF-16 string the size (number of bytes of the
> string) is stored as a 2 byte little endian integer. So it would be
> nice if a Pascal variant of lestring16 (maybe called plestring16)
> could be implemented. Then the word list excerpt could be shown more
> beautifully, with phrase strings only.
> 
> Maybe somebody could implement such a new type.
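
What such a length-prefixed type would have to do can be sketched in Python. This is not an implementation of a magic(5) type, only an illustration of the layout described above: a 2 byte little endian byte count followed by that many bytes of UTF-16 little endian text. The function name read_pstring16 is hypothetical.

```python
import struct


def read_pstring16(data, offset):
    """Read one length-prefixed UTF-16LE string; return (text, next_offset)."""
    (nbytes,) = struct.unpack_from("<H", data, offset)  # size in bytes
    start = offset + 2
    end = start + nbytes
    if end > len(data):
        raise ValueError("length prefix runs past end of data")
    return data[start:end].decode("utf-16-le"), end
```

Calling it twice in a row, with the returned offset as the next start, would give the misspelled and the corrected phrase without the control-like characters in between, provided the two entries are stored back to back.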
> 
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
