[File] [PATCH] of Magdir/msdos for old Microsoft Word documents for Windows, Mac (*.doc *.dot *.mcw *.bak)

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Jun 7 02:01:49 UTC 2019


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,
some days ago I inspected some old Microsoft Word documents with the
file name extensions doc, dot, mcw and bak, and some with no extension.
I ran file version 5.37 on these documents. Many of them are not
recognized or are identified wrongly. The output with the -k option
looks like:


BACKUPM4.BAK:           Microsoft Word for Macintosh 4.0
	dBase III DBT, version number 0,
	next free block index 469776382,
	1st item "test backup of word"
BACKUPM5.BAK:           Microsoft Word for Macintosh 5.0
	dBase III DBT, version number 0,
	next free block index 587216894,
	1st item "test backup of word"
BACKUPW2.BAK:           Microsoft Word 2.0 Document\012-
	Microsoft WinWord 2.0 Document\012-
	Microsoft WinWord 2.0 Document
	FoxPro FPT, blocks size 2308,
	next free block index 3685035264
BACKW1.BAK:             FoxPro FPT, blocks size 512,
	next free block index 2611290368
Business Letter:        Microsoft Word for Macintosh 5.0
BUTERFLY.DOC:           Microsoft Word 2.0 Document\012-
	Microsoft WinWord 2.0 Document\012-
	Microsoft WinWord 2.0 Document\012-
Date and Time Glossary: Microsoft Word for Macintosh 5.0
MacWrite Settings:      data
NORMAL20a.DOT:          Microsoft WinWord 2.0 Document\012-
	Microsoft WinWord 2.0 Document
Resume Glossary:        data
ScriptGlossary:         data
VBS Labels:             Microsoft Word for Macintosh 5.0
WINWORD1.DOC:           FoxPro FPT, blocks size 512,
	next free block index 2611290368
WINWORD2.DOC:           Microsoft Word 2.0 Document\012-
	Microsoft WinWord 2.0 Document\012-
	Microsoft WinWord 2.0 Document
	FoxPro FPT, blocks size 2308,
	next free block index 3685035264
Word 4 ReadMe:          Microsoft Word for Macintosh 4.0
Word 4.0 Settings (5):  data
Word Command Help:      Microsoft Word for Macintosh 3.0
WORDMAC4.MCW:           Microsoft Word for Macintosh 4.0
	dBase III DBT, version number 0,
	next free block index 469776382,
	1st item "Macintosh Test!"
WORDMAC5.MCW:           Microsoft Word for Macintosh 5.0
	dBase III DBT, version number 0,
	next free block index 587216894,
	1st item "Macintosh Test!"

Furthermore, with the --extension option ??? is displayed for
unrecognized variants, and with the --apple option UNKNUNKN is shown.
The file identification tool TrID ( http://mark0.net/soft-trid-e.html )
describes some of the inspected examples as "Microsoft Word for Windows
(v1.x)".
DROID, the file identification program of the UK National Archives,
describes these examples as "Microsoft Word for Windows 1.0". See
https://sourceforge.net/projects/droid

So I added and changed lines in Magdir/msdos. Some information can be
found on the fileformats.archiveteam.org website, so I added a comment
line like
 # URL:       http://fileformats.archiveteam.org/wiki/DOC
That page mentions a website with the source code of Microsoft Word for
Windows 1.1a, so I used the header file Opus/wordtech/file.h as
reference.

Furthermore, Magdir/msdos identifies such files three times,
once by an octal expression
 0 string/b	\333\245-\0\0\0		Microsoft Word 2.0 Document
and twice by a hexadecimal expression
 0 string/b	\xDB\xA5\x2D\x00	Microsoft WinWord 2.0 Document

After removing the duplicates this becomes
 0  string/b	\xDB\xA5\x2D\x00
 >0 use				word-fib
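
The overlap between the octal and the hexadecimal pattern can be checked with a few lines of Python. This is only an illustrative sketch, and the variable names are made up for the example. It shows that the hexadecimal pattern is a prefix of the octal one, so every file matching the octal line also matches the hexadecimal lines.

```python
# Both byte strings are copied from the magic lines quoted above.
octal_magic = b"\333\245-\0\0\0"   # octal escapes, 6 bytes
hex_magic = b"\xDB\xA5\x2D\x00"    # hexadecimal escapes, 4 bytes

print(octal_magic.hex())                  # dba52d000000
print(hex_magic.hex())                    # dba52d00
print(octal_magic.startswith(hex_magic))  # True
```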

Because these Word document variants start with a similar File
Information Block (FIB), I introduced a subroutine to display this
embedded information, like
 0       name    			word-fib
 >0	ulelong		x		Microsoft
 !:mime	application/msword
Then, for the known and observed wIdent + nFib combinations, a specific
version name is displayed, like
 >0	ulelong		0x0021A59B	WinWord 1.0
 >0	ulelong		0x002DA5DB	WinWord 2.x
 >0	ulelong		0x992DA5DB	WinWord 2.0
 >0	ulelong		0x000032fe	Word for Macintosh 1.0
 >0	ulelong		0x000034fe	Word for Macintosh 3.0
 >0	ulelong		0x050037fe	Word for Macintosh 3.x
 >0	ulelong		0x1B0037fe	Word for Macintosh 4.x
 >0	ulelong		0x1c0037fe	Word for Macintosh 4.0
 >0	ulelong		0xA40037fe	Word for Macintosh 4.y
 >0	ulelong		0x230037fe	Word for Macintosh 5.0

According to the documentation, a set fDot bit means that the file is a
template, which has a different file name extension and Apple type.
This is expressed by
 >10	ubyte&0x01	1		template
 !:apple	MSWDWTBN
 !:ext   dot
If this bit is not set, the file is a document with its own Apple
type. This is expressed by a line like
 >10	ubyte&0x01	0		Docu
 !:apple	MSWDWDBN
Then the wIdent value, which is something like 3?FE for the Mac
variant and A5?? for the Windows variant, determines the file name
extensions. For the Mac variant this is mcw or no file name extension.
For Windows the file name extension is doc. If the Windows program
makes backup files, these files get the bak extension. These facts are
now expressed by the lines
 >>0	uleshort&0xF0ff	0x30fe		\bment
 !:ext   mcw/bak/
 >>0	uleshort&0xFF00	0xA500		\bment
 !:ext   doc/bak
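
The decision logic of these lines can be sketched in Python as follows. This is a simplified illustration and not part of the patch. The function name describe_kind and the dictionary layout are assumptions made for the example, and the empty string stands for "no extension".

```python
def describe_kind(header: bytes) -> dict:
    """Classify a FIB header as template or document, as the magic lines do."""
    if len(header) < 11:
        raise ValueError("need at least 11 bytes of the FIB")
    wident = int.from_bytes(header[0:2], "little")  # magic type "uleshort"
    flags = header[10]                              # magic type "ubyte"

    if flags & 0x01:  # fDot bit set: file is a template
        return {"kind": "template", "apple": "MSWDWTBN", "ext": ["dot"]}

    info = {"kind": "document", "apple": "MSWDWDBN", "ext": []}
    if (wident & 0xF0FF) == 0x30FE:    # Mac variant, 3?FE
        info["ext"] = ["mcw", "bak", ""]
    elif (wident & 0xFF00) == 0xA500:  # Windows variant, A5??
        info["ext"] = ["doc", "bak"]
    return info


mac_header = bytes.fromhex("fe370023") + bytes(12)
print(describe_kind(mac_header)["ext"])  # ['mcw', 'bak', '']
```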

The structure of the FIB varies between Word versions, so I am
not sure about the correctness of field values like nFibBack,
fcMin etc. The initial document text is stored from fcMin through
fcMac if fComplex is not set.
So the text is shown for the Macintosh variant by the lines
 >0	uleshort&0xF0ff	0x30fe
 >>20	ubelong		x				\b, 0x%x fcMin
 >>(20.L)	string	x				%s
For the Windows variant this looks like
 >0	uleshort&0xFF00	0xA500
 >>24	ulelong		x				\b, 0x%x fcMin
 >>(24.l)	string	x				%s
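
The indirect offsets (20.L) and (24.l) can be illustrated with a small Python sketch. It assumes the field positions used in the lines above, which may not hold for every Word version. The function name initial_text, the 64-byte limit and the Latin-1 decoding are choices made only for this example.

```python
import struct


def initial_text(data: bytes, limit: int = 64):
    """Return (fcMin, text at fcMin) for an old Word file, or None."""
    if len(data) < 28:
        return None
    wident = int.from_bytes(data[0:2], "little")
    if (wident & 0xF0FF) == 0x30FE:
        # Macintosh variant: fcMin is big-endian at offset 20
        (fc_min,) = struct.unpack_from(">I", data, 20)
    elif (wident & 0xFF00) == 0xA500:
        # Windows variant: fcMin is little-endian at offset 24
        (fc_min,) = struct.unpack_from("<I", data, 24)
    else:
        return None
    raw = data[fc_min:fc_min + limit]
    raw = raw.split(b"\0", 1)[0]  # cut at the first NUL byte
    return fc_min, raw.decode("latin-1")
```

For a file that is read completely into memory, calling initial_text(data) returns the offset and the first bytes of text, comparable to the "0x200 fcMin Macintosh Test!" part of the output below.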

Finally, I also changed all similar Word magic lines in the same
manner. So the following old lines
 >0  belong      0xfe370023      Microsoft Word for Macintosh 5.0
 !:mime	application/msword
 !:ext   mcw
are replaced by the changed lines
 >0  belong      0xfe370023
 >>0	use	word-fib

For variants not recognized so far, like "WinWord 1.0", additional lines are added, like
 0 ubelong	0x9BA52100
 >0 use		word-fib
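
The top-level lines test the first four bytes as big-endian values (belong, ubelong), while the word-fib subroutine reads the same bytes as little-endian (ulelong). A short Python sketch shows how the two notations relate. The helper name be_to_le is hypothetical.

```python
import struct


def be_to_le(value: int) -> int:
    """Reinterpret the four bytes of a big-endian value as little-endian."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


print(hex(be_to_le(0x9BA52100)))  # 0x21a59b   -> WinWord 1.0
print(hex(be_to_le(0xFE370023)))  # 0x230037fe -> Word for Macintosh 5.0
```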

After applying the modifications described above with the patch
file-5.37-msdos-doc_dot.diff, more of these old Microsoft documents
are identified, and they are described more precisely:

BACKUPM4.BAK:           Microsoft Word for Macintosh 4.0 Document,
	nFibBack 0x1900,
	0x200 fcMin test backup of word
BACKUPM5.BAK:           Microsoft Word for Macintosh 5.0 Document,
	nFibBack 0x1900,
	0x200 fcMin test backup of word
BACKUPW2.BAK:           Microsoft WinWord 2.x Document,
	locale 0x409,
	0x180 fcMin test backup of word
BACKW1.BAK:             Microsoft WinWord 1.0 Document,
	locale 0x2,
	0x180 fcMin test backup of word
Business Letter:        Microsoft Word for Macintosh 5.0 Document,
	with pictures, nFibBack 0x1900,
	0x100 fcMin \001
BUTERFLY.DOC:           Microsoft WinWord 2.x Document,
	locale 0x409, with pictures,
	0x180 fcMin test backup word\001
Date and Time Glossary: Microsoft Word for Macintosh 5.0 Document,
	fast saved, nFibBack 0x1900,
	0x100 fcMin
MacWrite Settings:      Microsoft Word for Macintosh 4.y Document,
	nFibBack 0x7, 0x9da0000 fcMin
NORMAL20a.DOT:          Microsoft WinWord 2.x template,
	nProduct 4021, locale 0x409, nFibBack 0x2d,
	0x180 fcMin
Resume Glossary:        Microsoft Word for Macintosh 4.x Document,
	nFibBack 0x1900,
	0x100 fcMin activities\007\245\011
	Activity\007\007\007\245\011Activity\007\007
ScriptGlossary:         Microsoft Word for Macintosh 3.x Document,
	0x100 fcMin
VBS Labels:             Microsoft Word for Macintosh 5.0 Document,
	nFibBack 0x1900,
	0x100 fcMin Name,company,street,street2,City,State,Zip
WINWORD1.DOC:           Microsoft WinWord 1.0 Document,
	locale 0x2,
	0x180 fcMin test
WINWORD2.DOC:           Microsoft WinWord 2.x Document,
	locale 0x409,
	0x180 fcMin test
Word 4 ReadMe:          Microsoft Word for Macintosh 4.0 Document,
	with pictures, nFibBack 0x1900,
	0x100 fcMin Update for\013Microsoft Word Version 4 Users
Word 4.0 Settings (5):  Microsoft Word for Macintosh 4.y Document,
	nFibBack 0xc, 0x6c70000 fcMin
Word Command Help:      Microsoft Word for Macintosh 3.0 Document,
	0x0 fcMin \3764
WORDMAC4.MCW:           Microsoft Word for Macintosh 4.0 Document,
	nFibBack 0x1900,
	0x200 fcMin Macintosh Test!
WORDMAC5.MCW:           Microsoft Word for Macintosh 5.0 Document,
	nFibBack 0x1900,
	0x200 fcMin Macintosh Test!

I hope my diff file can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek









-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCXPnFfwAKCRCv8rHJQhrU
1hMLAJoD6mpQpQ+7b2TblfTlV+w4AflRzwCfRQXXnuf5YEvhVWUTlTnxBlXrFG8=
=l1y4
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.37/magic/Magdir/msdos.old	2019-04-19 00:42:27 +0000
+++ file-5.37/magic/Magdir/msdos	2019-06-07 01:20:16 +0000
@@ -667,15 +667,16 @@
 !:ext   mcw
->0  belong      0xfe340000      Microsoft Word for Macintosh 3.0
-!:mime	application/msword
-!:ext   mcw
->0  belong      0xfe37001c      Microsoft Word for Macintosh 4.0
-!:mime	application/msword
-!:ext   mcw
->0  belong      0xfe370023      Microsoft Word for Macintosh 5.0
-!:mime	application/msword
-!:ext   mcw
+# Microsoft Word for Macintosh 3.0 and higher
+>0  belong      0xfe340000
+>>0	use	word-fib
+>0  belong      0xfe370005
+>>0	use	word-fib
+>0  belong      0xfe3700a4
+>>0	use	word-fib
+>0  belong      0xfe37001b
+>>0	use	word-fib
+>0  belong      0xfe37001c
+>>0	use	word-fib
+>0  belong      0xfe370023
+>>0	use	word-fib
 
-0	string/b	\333\245-\0\0\0			Microsoft Word 2.0 Document
-!:mime	application/msword
-!:ext   doc
 # Note: seems already recognized as "OLE 2 Compound Document" in ./ole2compounddocs
@@ -683,6 +684,88 @@
 #!:mime	application/msword
-
 #
-0	string/b	\xDB\xA5\x2D\x00		Microsoft WinWord 2.0 Document
-!:mime application/msword
+# Update:    Joerg Jenderek
+# URL:       http://fileformats.archiveteam.org/wiki/DOC
+# Reference: d1yx3ys82bpsa0.cloudfront.net/source/Word-1.1a-CHM-Distribution.zip
+#            /Word 1.1a CHM Distribution/Opus/wordtech/file.h
+#	display info for old Microsoft Word documents
+0       name    			word-fib
+# File Information Block Base (FIBBase): wIdent + nFib
+>0	ulelong		x		Microsoft
+!:mime	application/msword
+#>0	uleshort	x		\b, wIdent 0x%4.4x
+>0	ulelong		0x0021A59B	WinWord 1.0
+#>0	ulelong		0x    A59C	wMagicPmWord
+>0	ulelong		0x002DA5DB	WinWord 2.x
+>0	ulelong		0x992DA5DB	WinWord 2.0
+#>0	ulelong		0x00C1A5EC	newer
+#>0	ulelong		0x    CFD0	WinWord 6.0
+>0	ulelong		0x000032fe	Word for Macintosh 1.0
+# maybe 1987 or earlier
+>0	ulelong		0x000034fe	Word for Macintosh 3.0
+#>0	ulelong		0x    35fe	W1
+# maybe 1988 or earlier
+>0	ulelong		0x050037fe	Word for Macintosh 3.x
+>0	ulelong		0x1B0037fe	Word for Macintosh 4.x
+>0	ulelong		0x1c0037fe	Word for Macintosh 4.0
+# maybe march 1989 or earlier
+>0	ulelong		0xA40037fe	Word for Macintosh 4.y
+>0	ulelong		0x230037fe	Word for Macintosh 5.0
+# fDot:1 bit implies file is a DOT
+>10	ubyte&0x01	1		template
+# https://www.macdisk.com/macsigen.php
+!:apple	MSWDWTBN
+!:ext   dot
+# Magdir\msdos, 815: Warning: Current entry does not yet have a description for adding a APPLE type
+# file: could not find any valid magic files! (No error)
+>10	ubyte&0x01	0		Docu
+!:apple	MSWDWDBN
+# Mac variant
+>>0	uleshort&0xF0ff	0x30fe		\bment
+# mcw or no extension
+!:ext   mcw/bak/
+# Windows variant
+>>0	uleshort&0xFF00	0xA500		\bment
+# bak for backup
+!:ext   doc/bak
+# product version written by
+>4	uleshort	!0				\b, nProduct %x
+# language stamp; English_United_States~409 English_United_Kingdom~809
+>6	uleshort	!0				\b, locale 0x%x
+# pnNext; if has file appended, where it starts
+>8	uleshort	!0				\b, PN 0x%x
+# flags
+# fGlsy:1; file is a glossary co-doc 
+>10	ubyte&0x02	2				glossary
+# fComplex:1; file piece table/etc stored (FastSave)
+>10	ubyte&0x04	4				\b, fast saved
+# fHasPic:1; graphics in file
+>10	ubyte&0x08	0x08				\b, with pictures
+# cQuickSaves (4 bits); count of times file quicksaved
+#>10	ubyte&0xF0	>0				\b, 0x%x cQuickSaves
+# nFibBack; how backwards compatiable is this format?
+>12	uleshort	!0				\b, nFibBack 0x%x
+# FIB is defined to extend from pnFib to fcMin 
+# the initial document is represented by fcMin through fcMac if !fComplex
+# Macintosh variant
+>0	uleshort&0xF0ff	0x30fe
+# rgwSpare0 [3]
+>>20	ubelong		x				\b, 0x%x fcMin
+#>(20.L)	ubequad	x				%16.16llx
+>>(20.L)	string	x				%s
+# Windows variant
+>0	uleshort&0xFF00	0xA500
+# rgwSpare0 [5]
+>>24	ulelong		x				\b, 0x%x fcMin
+#>>(24.l)	ubelong	x				%16.16llx
+>>(24.l)	string	x				%s
+# An unsigned integer that specifies the count of 16-bit values corresponding to fibRgW that follow
+#>32	uleshort	!0				\b, csw %x
+#>34	ubequad		!0				\b, FibRgW97 0x%16.16llx
+#
+
+# Winword 1.0 and higher
+0	ubelong		0x9BA52100
+>0	use		word-fib
+0	string/b	\xDB\xA5\x2D\x00
+>0	use		word-fib
 #
@@ -691,4 +774,2 @@
 #
-0	string/b	\xDB\xA5\x2D\x00		Microsoft WinWord 2.0 Document
-!:mime application/msword
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.37-msdos-doc_dot.diff.sig
Type: application/octet-stream
Size: 95 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20190607/ed77d8b2/attachment.obj>

