[File] [PATCH] of Magdir/wordprocessors for Corel WordPerfect User Word List *.UWL *.SAV

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sat Aug 13 16:47:57 UTC 2022


Hello,

some days ago I sent a patch for WordPerfect CBT samples. These are
found in the subdirectory WritingTools inside the WordPerfect program
directory "c:\Program Files (x86)\Corel\WordPerfect Office 2021".
In that subdirectory there are more similar files, but with other
file name extensions like adv, hyd, icr, lex, mor and sav.

The dozens of SAV examples have file names like: Wt13de.sav
Wt13dk.sav Wt13fr.sav Wt13it.sav Wt13nl.sav wt13ru.sav WT21us.sav
WT21DE.UWL WT21fr.sav WT21it.sav Wtcz.sav

These start with the two-letter prefix WT. That apparently is the
abbreviation for Writing Tools. The starting letters are often
followed by digits which correspond to the WordPerfect version. For
version 2021 these digits are 21, and for an older version I found
digits 13. On the web page a sample with 15 is used. The last two
letters correspond to the language used. For German DE is used. For
French FR is used. For Italian IT is used and so on. For USA English
US is used and for English EN is used. In the EN case the file name
extension HWL is used, whereas for all other languages it is SAV.
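
The naming scheme described above can be sketched as a small parser.
This is only an illustration: the regular expression and the function
name parse_wt_name are my own assumptions derived from the sample
names, not something taken from Corel documentation.

```python
import re

# Hypothetical pattern: "WT", optional version digits, a two-letter
# language code, then one of the three observed extensions.
WT_NAME = re.compile(
    r"^wt(?P<version>\d*)(?P<lang>[a-z]{2})\.(?P<ext>uwl|hwl|sav)$",
    re.IGNORECASE,
)

def parse_wt_name(name):
    """Split a Writing Tools file name into version, language and extension."""
    m = WT_NAME.match(name)
    if m is None:
        return None
    return {
        "version": m.group("version") or None,  # e.g. "21" or "13", may be absent
        "lang": m.group("lang").upper(),        # e.g. "DE", "US"
        "ext": m.group("ext").lower(),          # "uwl", "hwl" or "sav"
    }

if __name__ == "__main__":
    for sample in ("WT21DE.UWL", "wt13en.hwl", "Wtcz.sav"):
        print(sample, parse_wt_name(sample))
```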

The SAV extension is apparently used for the "basic" or "template"
variant of a word list that is provided by the software producer. So
when the user "activates" a word list for a distinct language, then
apparently the SAV example is copied to the WritingTools subdirectory
inside the WordPerfect folder in his user directory and gets the
file name extension UWL there. Obviously UWL is the abbreviation for
user word list. That this is true can be seen by looking at the MD5
sums. These are the same at the beginning, like
8837a6e4bfabfceba68901e30339c48b for the WT21us.sav and WT21US.UWL
samples. The user can add, disable, remove, or change entries in his
personal word lists (that is, the UWL variant). Then of course the
checksums differ.
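
The checksum comparison can be reproduced with a few lines of Python.
This is a minimal sketch; the helper names md5_of and
is_unmodified_copy are made up for illustration, and the paths in the
usage comment are placeholders.

```python
import hashlib

def md5_of(path):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def is_unmodified_copy(template_path, user_path):
    """True while the user word list is still byte-identical to the template."""
    return md5_of(template_path) == md5_of(user_path)

# Usage (placeholder paths):
#   is_unmodified_copy("WT21us.sav", "WT21US.UWL")
```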

When running the file command (version 5.42) on such examples I get
output like:

wt13de.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
WT13DE.UWL: Unknown Corel/Wordperfect product 32, file type 10, v2.0
wt13en.hwl: Unknown Corel/Wordperfect product 32, file type 10, v2.0
Wt13it.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
wt13pl.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
wt13ru.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
Wt13us.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
WT21en.hwl: Unknown Corel/Wordperfect product 32, file type 10, v2.0
WT21it.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
WT21us.sav: Unknown Corel/Wordperfect product 32, file type 10, v2.0
WT21US.UWL: Unknown Corel/Wordperfect product 32, file type 10, v2.0
WTUS.UWL:   Unknown Corel/Wordperfect product 32, file type 10, v2.0

With the --extension option only ??? is displayed. Furthermore, with
the -i option only the generic application/octet-stream is shown for
my samples.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). This identifies all
such examples with a low rate as "WordPerfect (generic)" by
wp-generic.trid.xml, and with a high rate as
"Corel Writing Tools User Word List" by uwl-wp.trid.xml (see
appended trid-v-wordperfect-uwl.txt.gz).

Unfortunately I found no information, and in particular no file
format specification, about such WordPerfect files. In the support
area of the Corel web site I found a page about Writing Tools not
working. So I chose that page as reference. That is expressed inside
Magdir/wordprocessors by comment lines like:

# URL:		https://support.corel.com/hc/en-us/articles/
#		215876258-Writing-Tools-Spell-Check-Dictionary
#		-does-not-work-in-WordPerfect-X5
#		http://wordperfect.helpmax.net/en/
#		editing-and-formatting-documents/
#		using-the-writing-tools/working-with-user-word-lists/
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/u/uwl-wp.trid.xml

The description inside Magdir/wordprocessors starts like:
 0	string	\xffWPC
So we see that the first 4 bytes are the generic magic for all
WordPerfect samples. Sub classification is done by the bytes at
offsets 8 and 9. If the sub class is not known, then as a last step
the sub class for everything else is shown by lines like:
 >8	default x
 >>8	byte	x	Unknown Corel/Wordperfect product %d,
 >>>9	byte	x	file type %d

So for my word list examples I must insert lines like these before:
 >8	byte	32
 >>9	byte	10	Corel Writing Tools User Word List
 !:mime	application/x-wordperfect-wordlist
 !:ext	uwl/hwl/sav
Instead of the generic MIME type application/octet-stream I show a
user-defined one.
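
The same classification logic can be sketched outside of the magic
language. This is only an illustration of the byte checks described
above (magic \xffWPC, then the bytes at offsets 8 and 9); the function
name classify_wpc is hypothetical and only the 32/10 combination from
my samples is handled.

```python
WPC_MAGIC = b"\xffWPC"

def classify_wpc(header):
    """Classify a WordPerfect-family file from its first bytes."""
    if len(header) < 10 or header[:4] != WPC_MAGIC:
        return None  # not a \xffWPC file at all
    product, file_type = header[8], header[9]
    if (product, file_type) == (32, 10):
        return "Corel Writing Tools User Word List"
    # Fallback corresponding to the "default" branch in the magic file.
    return "Unknown Corel/Wordperfect product %d, file type %d" % (
        product, file_type)

# Usage (placeholder path):
#   with open("WT21US.UWL", "rb") as f:
#       print(classify_wpc(f.read(16)))
```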

According to the unofficial WordPerfect File Format description found
as WPFF_DocumentStructure.htm, the file size (not including pad
characters at EOF) is stored at offset 20 as a 4-byte little-endian
integer. So show that additional information by a line like:
 >>>0x14	ulelong x	\b, %u bytes

At offset 4 a pointer to the document area is stored as a 4-byte
little-endian integer. In my examples this value was always 200
hexadecimal. That is right after the extended header. So show this
information for unusual cases by a line like:
 >>>4	ulelong	!0x200	\b, at %#x document area
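
For illustration, both header fields can be read with Python's struct
module. This is a sketch under the assumption that the unofficial
format description is right, so both values are taken as 4-byte
little-endian integers; the function name read_wpc_fields is made up.

```python
import struct

def read_wpc_fields(header):
    """Return (document_area_offset, file_size) from a \\xffWPC header.

    Offset 4:  pointer to the document area, 4-byte little endian.
    Offset 20: file size without EOF padding, 4-byte little endian.
    """
    if len(header) < 24:
        raise ValueError("need at least 24 header bytes")
    (doc_offset,) = struct.unpack_from("<I", header, 4)
    (file_size,) = struct.unpack_from("<I", header, 20)
    return doc_offset, file_size

# Usage (placeholder path):
#   with open("WT21US.UWL", "rb") as f:
#       doc_offset, file_size = read_wpc_fields(f.read(24))
#       print("%#x" % doc_offset, file_size)
```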

So the document area begins at offset 512. Obviously it often contains
UTF-16 little-endian encoded strings and maybe some additional
information like string length and more. On the page about working
with user word lists the following is written: If you often type
incorrectly, you can define the correctly spelled word as a
replacement. For example, you can specify "the" as a replacement for
"hte". You can replace abbreviations or acronyms with words or
phrases. For example, you can define Chief Executive Officer as the
replacement for the acronym CEO. So when we look at the output of the
file command with my patches, we see something like
acsesory/accessory in the USA version, abre/aber in the German version
and uan/una in the Italian version. So show the beginning of the word
list as an excerpt by lines like:
 >>>(4.l)	ubyte	x
 >>>>&0	search/91/sb	a\0
 >>>>>&0		lestring16	x	(...%-.33s...)
So first jump to the document area, then look for the beginning of the
word list, which mostly starts with the letter a encoded as UTF-16
little endian, and then show this as a UTF-16 string.
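
The three steps can be approximated in Python as follows. This is only
a rough sketch of what the magic lines do, not a re-implementation of
the search and lestring16 semantics of the file command; the function
name wordlist_excerpt and its parameters are hypothetical.

```python
import struct

def wordlist_excerpt(data, search_range=91, max_chars=33):
    """Return the start of the word list as text, or None if not found."""
    if len(data) < 8:
        return None
    # Step 1: jump to the document area via the pointer at offset 4.
    (doc_offset,) = struct.unpack_from("<I", data, 4)
    # The magic entry consumes one byte there before searching.
    start = doc_offset + 1
    # Step 2: look for the letter "a" encoded as UTF-16 little endian.
    pattern = b"a\x00"
    pos = data.find(pattern, start, start + search_range + len(pattern) - 1)
    if pos < 0:
        return None
    # Step 3: decode a limited number of UTF-16LE code units from there.
    raw = data[pos:pos + 2 * max_chars]
    raw = raw[:len(raw) - len(raw) % 2]      # keep whole 16-bit units only
    text = raw.decode("utf-16-le", errors="replace")
    return text.split("\x00", 1)[0]          # stop at a terminating NUL

# Usage (placeholder path):
#   with open("Wt13de.sav", "rb") as f:
#       print(repr(wordlist_excerpt(f.read())))
```

Length fields between the strings show up as control-like characters
in the result, which matches what the lestring16 output looks like.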

After applying the above-mentioned modifications by the patch
file-5.42-wordprocessors-uwl.diff I get more precise output
like:

Wt13de.sav: Corel Writing Tools User Word List
	    (...abre\010aber(\001\024Abschlu\027\...),
	    8610 bytes, v2.0
WT13DE.UWL: Corel Writing Tools User Word List
	    (...abre\010aber(\001\024Abschlu\027\...),
	    8610 bytes, v2.0
wt13en.hwl: Corel Writing Tools User Word List
	    (...alf&\002\02010 key\024number pad....),
	    24249 bytes, v2.0
Wt13it.sav: Corel Writing Tools User Word List
	    (...an\006una*\001\026stapmante\022st...),
	    14492 bytes, v2.0
wt13pl.sav: Corel Writing Tools User Word List
	    (...al\006al.\024\001...),
	    23754 bytes, v2.0
wt13ru.sav: Corel Writing Tools User Word List,
	    13877 bytes, v2.0
Wt13us.sav: Corel Writing Tools User Word List
	    (...acsesory\022accessory.\001\026aco...),
	    14199 bytes, v2.0
WT21en.hwl: Corel Writing Tools User Word List
	    (...alf&\002\02010 key\024number pad....),
	    24249 bytes, v2.0
WT21it.sav: Corel Writing Tools User Word List
	    (...an\006una*\001\026stapmante\022st...),
	    14098 bytes, v2.0
WT21us.sav: Corel Writing Tools User Word List
	    (...acsesory\022accessory.\001\026aco...),
	    13844 bytes, v2.0
WT21US.UWL: Corel Writing Tools User Word List
	    (...acsesory\022accessory.\001\026aco...),
	    13844 bytes, v2.0
WTUS.UWL:   Corel Writing Tools User Word List,
	    546 bytes, v2.0

I hope my diff file can be applied in a future version of the
file utility.

The word list excerpt is produced by the lestring16 type. So I get the
misspelled and corrected phrases, but with ugly-looking control-like
characters in between. When digging deeper, it seems that before each
UTF-16 string the size (number of bytes of the string) is stored as a
2-byte little-endian integer. So it would be nice if a Pascal variant
of lestring16 (maybe called plestring16) could be implemented. Then
the word list excerpt could be shown more beautifully, with only the
phrase strings.

Maybe somebody could implement such a new type.
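
To illustrate what such a Pascal-style type would have to do, here is
a small Python sketch. It assumes my observation is correct, that is,
a 2-byte little-endian byte count directly precedes each UTF-16LE
string; the function name read_pstring16 is hypothetical, and the
other bytes seen between entries are not interpreted.

```python
import struct

def read_pstring16(data, offset=0):
    """Read one length-prefixed UTF-16LE string.

    Returns (text, next_offset). The prefix is assumed to hold the
    string length in bytes, not in characters.
    """
    (nbytes,) = struct.unpack_from("<H", data, offset)
    begin = offset + 2
    end = begin + nbytes
    if end > len(data):
        raise ValueError("length prefix runs past end of data")
    return data[begin:end].decode("utf-16-le", errors="replace"), end

if __name__ == "__main__":
    # Synthetic example data, not copied from a real word list.
    blob = (struct.pack("<H", 8) + "aber".encode("utf-16-le")
            + struct.pack("<H", 18) + "accessory".encode("utf-16-le"))
    first, pos = read_pstring16(blob)
    second, pos = read_pstring16(blob, pos)
    print(first, second)
```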

With best wishes
Jörg Jenderek
--
Jörg Jenderek



-------------- next part --------------
--- file-5.42/magic/Magdir/wordprocessors.old	2021-12-06 16:25:22.000000000 +0100
+++ file-5.42/magic/Magdir/wordprocessors	2022-08-13 17:57:36.429878100 +0200
@@ -196,6 +196,29 @@
 >>9	byte	24	GroupWise admin ADS deferment data file
 >>9	default	x
 >>>9	byte	x	GroupWise: Unknown filetype %d
+# Corel Writing Tools WT*.*
+# From:		Joerg Jenderek
+# URL:		https://support.corel.com/hc/en-us/articles/215876258-Writing-Tools-Spell-Check-Dictionary-does-not-work-in-WordPerfect-X5
+#		http://wordperfect.helpmax.net/en/editing-and-formatting-documents/using-the-writing-tools/working-with-user-word-lists/
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/u/uwl-wp.trid.xml
+>8	byte	32
+>>9	byte	10	Corel Writing Tools User Word List
+#!:mime	application/octet-stream
+!:mime	application/x-wordperfect-wordlist
+# personal user word list UWL under user directory like: WTDE.UWL WTUS.UWL WT21DE.UWL WT21US.UWL WT13DE.UWL ...
+# and "template" SAV/HWL variant under program directory like: wt13en.hwl Wt13de.sav Wt13it.sav wt13ru.sav WT21us.sav Wtcz.sav ...
+!:ext	uwl/hwl/sav
+# jump to document area with some marker and word list
+>>>(4.l)	ubyte	x
+# look for beginning of word list starting mostly with letter a as UTF-16 like: Wt13es.sav
+# but not found in russian wt13ru.sav
+>>>>&0	search/91/sb	a\0
+# word list starting like: "acsesory\022accessory.\001\026acomodate\026accommodate4\001"
+>>>>>&0		lestring16	x	(...%-.33s...)
+# pointer to document area like: 200h
+>>>4	ulelong	!0x200	\b, at %#x document area
+# file size, not including pad characters at EOF
+>>>0x14	ulelong x	\b, %u bytes
 # IntelliTAG
 >8	byte	33
 >>9	byte	10	IntelliTAG (SGML) compiled DTD
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.42-wordprocessors-uwl.diff.sig
Type: application/octet-stream
Size: 1137 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220813/ac960fc6/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-wordperfect-uwl.txt.gz
Type: application/x-gzip
Size: 782 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220813/ac960fc6/attachment.bin>

