[File] [PATCH] of Magdir/msdos for old Microsoft Word documents for Windows, Mac (*.doc *.dot *.mcw *.bak)
Jörg Jenderek
joerg.jen.der.ek at gmx.net
Fri Jun 7 02:01:49 UTC 2019
Hello,
Some days ago I inspected some old Microsoft Word documents with the file
extensions doc, dot, mcw, bak and with no extension at all. I ran file
version 5.37 on these documents. Many of the inspected documents are not
recognized or are identified wrongly. The output with the -k option looks like:
BACKUPM4.BAK: Microsoft Word for Macintosh 4.0
dBase III DBT, version number 0,
next free block index 469776382,
1st item "test backup of word"
BACKUPM5.BAK: Microsoft Word for Macintosh 5.0
dBase III DBT, version number 0,
next free block index 587216894,
1st item "test backup of word"
BACKUPW2.BAK: Microsoft Word 2.0 Document\012-
Microsoft WinWord 2.0 Document\012-
Microsoft WinWord 2.0 Document
FoxPro FPT, blocks size 2308,
next free block index 3685035264
BACKW1.BAK: FoxPro FPT, blocks size 512,
next free block index 2611290368
Business Letter: Microsoft Word for Macintosh 5.0
BUTERFLY.DOC: Microsoft Word 2.0 Document\012-
Microsoft WinWord 2.0 Document\012-
Microsoft WinWord 2.0 Document\012-
Date and Time Glossary: Microsoft Word for Macintosh 5.0
MacWrite Settings: data
NORMAL20a.DOT: Microsoft WinWord 2.0 Document\012-
Microsoft WinWord 2.0 Document
Resume Glossary: data
ScriptGlossary: data
VBS Labels: Microsoft Word for Macintosh 5.0
WINWORD1.DOC: FoxPro FPT, blocks size 512,
next free block index 2611290368
WINWORD2.DOC: Microsoft Word 2.0 Document\012-
Microsoft WinWord 2.0 Document\012-
Microsoft WinWord 2.0 Document
FoxPro FPT, blocks size 2308,
next free block index 3685035264
Word 4 ReadMe: Microsoft Word for Macintosh 4.0
Word 4.0 Settings (5): data
Word Command Help: Microsoft Word for Macintosh 3.0
WORDMAC4.MCW: Microsoft Word for Macintosh 4.0
dBase III DBT, version number 0,
next free block index 469776382,
1st item "Macintosh Test!"
WORDMAC5.MCW: Microsoft Word for Macintosh 5.0
dBase III DBT, version number 0,
next free block index 587216894,
1st item "Macintosh Test!"
Furthermore, with the --extension option, ??? is displayed for
unrecognized variants, and with the --apple option, UNKNUNKN is shown.
The file identification tool TrID (http://mark0.net/soft-trid-e.html)
describes some of the inspected examples as "Microsoft Word for Windows
(v1.x)".
Droid, the file identification program of the UK government's National
Archives, describes these examples as "Microsoft Word for Windows 1.0".
See https://sourceforge.net/projects/droid
So I added and changed lines in Magdir/msdos. Some information is found
on the fileformats.archiveteam.org website, so I added a comment line like
# URL: http://fileformats.archiveteam.org/wiki/DOC
That page mentions a website with the source code of Microsoft Word for
Windows 1.1a, so I used the header file Opus/wordtech/file.h as a reference.
Furthermore, Magdir/msdos identifies these documents three times, which is
why WINWORD2.DOC gets three descriptions in the output above:
once with an octal expression
0 string/b \333\245-\0\0\0 Microsoft Word 2.0 Document
and twice with a hexadecimal expression
0 string/b \xDB\xA5\x2D\x00 Microsoft WinWord 2.0 Document
After removing the duplicates, this becomes
0 string/b \xDB\xA5\x2D\x00
>0 use word-fib
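To see why the three entries are duplicates, here is a small Python sketch (not part of the patch) that compares the bytes behind the octal and hexadecimal escapes. It assumes a literal "string/b" test at offset 0 can be modeled as a plain byte-prefix comparison. The helper name string_b_matches is hypothetical.

```python
# Sketch only: compares the byte values behind the octal and hex escapes
# used by the old Magdir/msdos entries.

# Old entry: 0 string/b \333\245-\0\0\0  (octal escapes, 6 bytes)
OCTAL_PATTERN = b"\333\245-\0\0\0"
# Old and new entry: 0 string/b \xDB\xA5\x2D\x00  (hex escapes, 4 bytes)
HEX_PATTERN = b"\xDB\xA5\x2D\x00"


def string_b_matches(pattern: bytes, data: bytes) -> bool:
    """Model a literal 'string/b' test at offset 0 as a byte-prefix check."""
    return data.startswith(pattern)


# \333 == 0xDB, \245 == 0xA5 and '-' == 0x2D: the first four bytes agree.
assert OCTAL_PATTERN[:4] == HEX_PATTERN

# Any file accepted by the 6-byte octal pattern is also accepted by the
# shorter hex pattern, so a single hex entry is enough.
sample = OCTAL_PATTERN + b"...rest of the FIB..."
print(string_b_matches(OCTAL_PATTERN, sample), string_b_matches(HEX_PATTERN, sample))
# prints: True True
```

The merged hex entry is slightly broader than the old octal one. It no longer requires bytes 4 and 5 to be zero, so it also accepts files that the old octal entry would have rejected.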
Because these Word document variants start with a similar File
Information Block (FIB), I introduced a subroutine to display this
embedded information, like
0 name word-fib
>0 ulelong x Microsoft
!:mime application/msword
Then, for each known wIdent + nFib combination found in the samples, a
specific version name is displayed, like
>0 ulelong 0x0021A59B WinWord 1.0
>0 ulelong 0x002DA5DB WinWord 2.x
>0 ulelong 0x992DA5DB WinWord 2.0
>0 ulelong 0x000032fe Word for Macintosh 1.0
>0 ulelong 0x000034fe Word for Macintosh 3.0
>0 ulelong 0x050037fe Word for Macintosh 3.x
>0 ulelong 0x1B0037fe Word for Macintosh 4.x
>0 ulelong 0x1c0037fe Word for Macintosh 4.0
>0 ulelong 0xA40037fe Word for Macintosh 4.y
>0 ulelong 0x230037fe Word for Macintosh 5.0
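The version lookup above can be mirrored in Python as a sketch. It reads the first four bytes as one little-endian 32-bit value, which is what the ">0 ulelong" tests compare, and maps that value to the names listed in the patch. The helper describe_version is hypothetical. It only covers the "Microsoft" plus version part of the word-fib subroutine, not the flags or the text.

```python
import struct
from typing import Optional

# Version names from the word-fib subroutine, keyed by the first four bytes
# read as one little-endian 32-bit value (what ">0 ulelong" compares).
WORD_VERSIONS = {
    0x0021A59B: "WinWord 1.0",
    0x002DA5DB: "WinWord 2.x",
    0x992DA5DB: "WinWord 2.0",
    0x000032FE: "Word for Macintosh 1.0",
    0x000034FE: "Word for Macintosh 3.0",
    0x050037FE: "Word for Macintosh 3.x",
    0x1B0037FE: "Word for Macintosh 4.x",
    0x1C0037FE: "Word for Macintosh 4.0",
    0xA40037FE: "Word for Macintosh 4.y",
    0x230037FE: "Word for Macintosh 5.0",
}


def describe_version(data: bytes) -> str:
    """Hypothetical helper mirroring the start of the word-fib subroutine."""
    (value,) = struct.unpack_from("<I", data, 0)
    version: Optional[str] = WORD_VERSIONS.get(value)
    # ">0 ulelong x Microsoft" always prints; a version name only if known.
    return "Microsoft " + version if version else "Microsoft"


print(describe_version(bytes.fromhex("fe370023")))  # Microsoft Word for Macintosh 5.0
print(describe_version(bytes.fromhex("dba52d00")))  # Microsoft WinWord 2.x
```

The Mac bytes FE 37 00 23, which the old entry tested as "belong 0xfe370023", read as the little-endian value 0x230037FE. This is why the subroutine's constants look byte-swapped compared with the old Mac entries.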
According to the documentation, the fDot bit implies that the file is a
template, which has a different filename extension and Apple type. This is expressed by
>10 ubyte&0x01 1 template
!:apple MSWDWTBN
!:ext dot
If this bit is not set, the file is a document with its own Apple type.
This is expressed by lines like
>10 ubyte&0x01 0 Docu
!:apple MSWDWDBN
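The template/document split can be sketched in Python by testing bit 0 of the flags byte at offset 10, as the two ">10 ubyte&0x01" lines do. The names FibKind and classify_fdot are hypothetical. The sketch assumes the 8-character !:apple value is the creator "MSWD" followed by a 4-character type (WTBN or WDBN).

```python
from typing import NamedTuple, Optional

FDOT = 0x01  # fDot: bit 0 of the flags byte at offset 10


class FibKind(NamedTuple):
    label: str          # description text printed at this level
    apple: str          # !:apple value: creator "MSWD" followed by a 4-char type
    ext: Optional[str]  # !:ext at this level; None means decided later by wIdent


def classify_fdot(data: bytes) -> FibKind:
    """Mirror '>10 ubyte&0x01 1' and '>10 ubyte&0x01 0' from the patch."""
    if data[10] & FDOT:
        return FibKind("template", "MSWDWTBN", "dot")
    # The patch prints "Docu" here and completes it to "Document" in the
    # Mac/Windows sub-tests; the sketch returns the finished word directly.
    return FibKind("Document", "MSWDWDBN", None)


header = bytearray(12)
header[10] = 0x01
print(classify_fdot(bytes(header)).label)  # template
```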
Then, depending on the wIdent value (read as a little-endian short), which
looks like 3?FE for the Mac variant and A5?? for the Windows variant, the
filename extensions are shown. For the Mac variant this is mcw, or no
filename extension at all. For Windows the filename extension is doc. If
Word makes backup files, these files get the bak extension. The "\bment"
text appends "ment" without a space, so "Docu" above becomes "Document".
These facts are now expressed by the lines
>>0 uleshort&0xF0ff 0x30fe \bment
!:ext mcw/bak/
>>0 uleshort&0xFF00 0xA500 \bment
!:ext doc/bak
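The two masked wIdent tests can be sketched in Python as follows. The function name document_extensions is hypothetical. Following the description above, the trailing slash in "mcw/bak/" is represented as an empty string meaning "no extension".

```python
import struct
from typing import List


def document_extensions(data: bytes) -> List[str]:
    """Extensions the patch declares for a non-template Word FIB (sketch).

    The first two bytes are read as a little-endian short, as ">>0 uleshort"
    does, and masked like the patch: 0x30FE for Mac, 0xA500 for Windows.
    """
    (ident,) = struct.unpack_from("<H", data, 0)
    if (ident & 0xF0FF) == 0x30FE:   # Mac variant: bytes FE 3?
        return ["mcw", "bak", ""]    # "mcw/bak/": trailing "" = no extension
    if (ident & 0xFF00) == 0xA500:   # Windows variant: bytes ?? A5
        return ["doc", "bak"]
    return []


print(document_extensions(bytes.fromhex("fe370023")))  # ['mcw', 'bak', '']
print(document_extensions(bytes.fromhex("9ba52100")))  # ['doc', 'bak']
```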
The structure of the FIB varies between Word versions, so I am not sure
that the values of variables like nFibBack, fcMin etc. are correct. If
fComplex is not set, the initial document is represented by fcMin through
fcMac.
So the text of the Macintosh variant is shown by the lines
>0 uleshort&0xF0ff 0x30fe
>>20 ubelong x \b, 0x%x fcMin
>>(20.L) string x %s
For the Windows variant this looks like
>0 uleshort&0xFF00 0xA500
>>24 ulelong x \b, 0x%x fcMin
>>(24.l) string x %s
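The indirect offsets can be sketched in Python. For Mac files, fcMin is a big-endian long at offset 20 ("(20.L)"). For Windows files, it is a little-endian long at offset 24 ("(24.l)"). The text is then read at that offset. The sketch also reports the fComplex bit (0x04 of byte 10), because the text at fcMin only represents the document when fComplex is not set. The helper initial_text is hypothetical. It reads up to the first NUL byte, which approximates but is not identical to how file prints a "%s" string.

```python
import struct
from typing import Tuple

FCOMPLEX = 0x04  # bit 2 of the flags byte at offset 10 (fComplex, "fast saved")


def initial_text(data: bytes) -> Tuple[int, bytes, bool]:
    """Return (fcMin, bytes at fcMin up to the first NUL, fComplex flag)."""
    (ident,) = struct.unpack_from("<H", data, 0)
    if (ident & 0xF0FF) == 0x30FE:      # Mac variant: big-endian long at 20
        (fc_min,) = struct.unpack_from(">I", data, 20)
    elif (ident & 0xFF00) == 0xA500:    # Windows variant: little-endian long at 24
        (fc_min,) = struct.unpack_from("<I", data, 24)
    else:
        raise ValueError("not a Word FIB variant handled by the patch")
    end = data.find(b"\0", fc_min)
    # An fcMin beyond the end of the data simply yields empty text.
    text = data[fc_min:] if end < 0 else data[fc_min:end]
    return fc_min, text, bool(data[10] & FCOMPLEX)


# Build a tiny Windows-style buffer: signature, fcMin = 0x180, text "test".
win = bytearray(0x200)
win[0:4] = bytes.fromhex("dba52d00")
struct.pack_into("<I", win, 24, 0x180)
win[0x180:0x184] = b"test"
fc_min, text, fast_saved = initial_text(bytes(win))
print(hex(fc_min), text, fast_saved)  # 0x180 b'test' False
```

When the fast-saved flag is set, a real decoder would have to follow the piece table instead. The patch does not attempt this and only prints the bytes at fcMin.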
At the end, I also changed all similar Word magic lines in the same
manner. So the following old lines
>0 belong 0xfe370023 Microsoft Word for Macintosh 5.0
!:mime application/msword
!:ext mcw
are replaced by the lines
>0 belong 0xfe370023
>>0 use word-fib
For previously unrecognized variants like "WinWord 1.0", I added lines like
0 ubelong 0x9BA52100
>0 use word-fib
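The resulting structure has several entry signatures that all hand over to one shared subroutine. In Python, this can be sketched as a dispatcher with a callback. The names identify and word_fib are hypothetical stand-ins for the "use word-fib" mechanism. In the patch, the Mac entries sit at level ">0" below a parent test that is not visible in the diff hunk. The sketch treats them as top-level checks.

```python
import struct
from typing import Callable, Optional

# Mac entries from the patch (">0 belong ..." followed by ">>0 use word-fib").
MAC_ENTRIES = {0xFE340000, 0xFE370005, 0xFE3700A4, 0xFE37001B, 0xFE37001C, 0xFE370023}
WIN1_ENTRY = 0x9BA52100             # 0 ubelong 0x9BA52100 (WinWord 1.0)
WIN2_PREFIX = b"\xDB\xA5\x2D\x00"   # 0 string/b \xDB\xA5\x2D\x00 (WinWord 2.x)


def identify(data: bytes, word_fib: Callable[[bytes], str]) -> Optional[str]:
    """Call the shared FIB routine if any entry signature matches (sketch)."""
    if len(data) < 4:
        return None
    (first,) = struct.unpack_from(">I", data, 0)  # big-endian, like belong/ubelong
    if first in MAC_ENTRIES or first == WIN1_ENTRY or data.startswith(WIN2_PREFIX):
        return word_fib(data)  # one shared routine, like "use word-fib"
    return None


print(identify(bytes.fromhex("9ba52100"), lambda d: "word-fib called"))  # word-fib called
print(identify(b"PK\x03\x04", lambda d: "word-fib called"))             # None
```

Keeping the decoding in one named routine means a new Word variant only needs a new entry line, as with the WinWord 1.0 signature above.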
After applying the modifications above with the patch
file-5.37-msdos-doc_dot.diff, more of these old Microsoft documents are
identified, and they are described more precisely, like:
BACKUPM4.BAK: Microsoft Word for Macintosh 4.0 Document,
nFibBack 0x1900,
0x200 fcMin test backup of word
BACKUPM5.BAK: Microsoft Word for Macintosh 5.0 Document,
nFibBack 0x1900,
0x200 fcMin test backup of word
BACKUPW2.BAK: Microsoft WinWord 2.x Document,
locale 0x409,
0x180 fcMin test backup of word
BACKW1.BAK: Microsoft WinWord 1.0 Document,
locale 0x2, 0x180 fcMin test backup of word
Business Letter: Microsoft Word for Macintosh 5.0 Document,
with pictures, nFibBack 0x1900,
0x100 fcMin \001
BUTERFLY.DOC: Microsoft WinWord 2.x Document,
locale 0x409, with pictures,
0x180 fcMin test backup word\001
Date and Time Glossary: Microsoft Word for Macintosh 5.0 Document,
fast saved, nFibBack 0x1900,
0x100 fcMin
MacWrite Settings: Microsoft Word for Macintosh 4.y Document,
nFibBack 0x7, 0x9da0000 fcMin
NORMAL20a.DOT: Microsoft WinWord 2.x template,
nProduct 4021, locale 0x409, nFibBack 0x2d,
0x180 fcMin
Resume Glossary: Microsoft Word for Macintosh 4.x Document,
nFibBack 0x1900,
0x100 fcMin activities\007\245\011
Activity\007\007\007\245\011Activity\007\007
ScriptGlossary: Microsoft Word for Macintosh 3.x Document,
0x100 fcMin
VBS Labels: Microsoft Word for Macintosh 5.0 Document,
nFibBack 0x1900,
0x100 fcMin Name,company,street,street2,City,State,Zip
WINWORD1.DOC: Microsoft WinWord 1.0 Document,
locale 0x2,
0x180 fcMin test
WINWORD2.DOC: Microsoft WinWord 2.x Document,
locale 0x409,
0x180 fcMin test
Word 4 ReadMe: Microsoft Word for Macintosh 4.0 Document,
with pictures, nFibBack 0x1900,
0x100 fcMin Update for\013Microsoft Word Version 4 Users
Word 4.0 Settings (5): Microsoft Word for Macintosh 4.y Document,
nFibBack 0xc, 0x6c70000 fcMin
Word Command Help: Microsoft Word for Macintosh 3.0 Document,
0x0 fcMin \3764
WORDMAC4.MCW: Microsoft Word for Macintosh 4.0 Document,
nFibBack 0x1900,
0x200 fcMin Macintosh Test!
WORDMAC5.MCW: Microsoft Word for Macintosh 5.0 Document,
nFibBack 0x1900,
0x200 fcMin Macintosh Test!
I hope my diff file can be applied in a future version of the
file utility.
With best wishes
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
--- file-5.37/magic/Magdir/msdos.old 2019-04-19 00:42:27 +0000
+++ file-5.37/magic/Magdir/msdos 2019-06-07 01:20:16 +0000
@@ -667,15 +667,16 @@
!:ext mcw
->0 belong 0xfe340000 Microsoft Word for Macintosh 3.0
-!:mime application/msword
-!:ext mcw
->0 belong 0xfe37001c Microsoft Word for Macintosh 4.0
-!:mime application/msword
-!:ext mcw
->0 belong 0xfe370023 Microsoft Word for Macintosh 5.0
-!:mime application/msword
-!:ext mcw
+# Microsoft Word for Macintosh 3.0 and higher
+>0 belong 0xfe340000
+>>0 use word-fib
+>0 belong 0xfe370005
+>>0 use word-fib
+>0 belong 0xfe3700a4
+>>0 use word-fib
+>0 belong 0xfe37001b
+>>0 use word-fib
+>0 belong 0xfe37001c
+>>0 use word-fib
+>0 belong 0xfe370023
+>>0 use word-fib
-0 string/b \333\245-\0\0\0 Microsoft Word 2.0 Document
-!:mime application/msword
-!:ext doc
# Note: seems already recognized as "OLE 2 Compound Document" in ./ole2compounddocs
@@ -683,6 +684,88 @@
#!:mime application/msword
-
#
-0 string/b \xDB\xA5\x2D\x00 Microsoft WinWord 2.0 Document
-!:mime application/msword
+# Update: Joerg Jenderek
+# URL: http://fileformats.archiveteam.org/wiki/DOC
+# Reference: d1yx3ys82bpsa0.cloudfront.net/source/Word-1.1a-CHM-Distribution.zip
+# /Word 1.1a CHM Distribution/Opus/wordtech/file.h
+# display info for old Microsoft Word documents
+0 name word-fib
+# File Information Block Base (FIBBase): wIdent + nFib
+>0 ulelong x Microsoft
+!:mime application/msword
+#>0 uleshort x \b, wIdent 0x%4.4x
+>0 ulelong 0x0021A59B WinWord 1.0
+#>0 ulelong 0x A59C wMagicPmWord
+>0 ulelong 0x002DA5DB WinWord 2.x
+>0 ulelong 0x992DA5DB WinWord 2.0
+#>0 ulelong 0x00C1A5EC newer
+#>0 ulelong 0x CFD0 WinWord 6.0
+>0 ulelong 0x000032fe Word for Macintosh 1.0
+# maybe 1987 or earlier
+>0 ulelong 0x000034fe Word for Macintosh 3.0
+#>0 ulelong 0x 35fe W1
+# maybe 1988 or earlier
+>0 ulelong 0x050037fe Word for Macintosh 3.x
+>0 ulelong 0x1B0037fe Word for Macintosh 4.x
+>0 ulelong 0x1c0037fe Word for Macintosh 4.0
+# maybe march 1989 or earlier
+>0 ulelong 0xA40037fe Word for Macintosh 4.y
+>0 ulelong 0x230037fe Word for Macintosh 5.0
+# fDot:1 bit implies file is a DOT
+>10 ubyte&0x01 1 template
+# https://www.macdisk.com/macsigen.php
+!:apple MSWDWTBN
+!:ext dot
+# Magdir\msdos, 815: Warning: Current entry does not yet have a description for adding a APPLE type
+# file: could not find any valid magic files! (No error)
+>10 ubyte&0x01 0 Docu
+!:apple MSWDWDBN
+# Mac variant
+>>0 uleshort&0xF0ff 0x30fe \bment
+# mcw or no extension
+!:ext mcw/bak/
+# Windows variant
+>>0 uleshort&0xFF00 0xA500 \bment
+# bak for backup
+!:ext doc/bak
+# product version written by
+>4 uleshort !0 \b, nProduct %x
+# language stamp; English_United_States~409 English_United_Kingdom~809
+>6 uleshort !0 \b, locale 0x%x
+# pnNext; if has file appended, where it starts
+>8 uleshort !0 \b, PN 0x%x
+# flags
+# fGlsy:1; file is a glossary co-doc
+>10 ubyte&0x02 2 glossary
+# fComplex:1; file piece table/etc stored (FastSave)
+>10 ubyte&0x04 4 \b, fast saved
+# fHasPic:1; graphics in file
+>10 ubyte&0x08 0x08 \b, with pictures
+# cQuickSaves (4 bits); count of times file quicksaved
+#>10 ubyte&0xF0 >0 \b, 0x%x cQuickSaves
+# nFibBack; how backwards compatible is this format?
+>12 uleshort !0 \b, nFibBack 0x%x
+# FIB is defined to extend from pnFib to fcMin
+# the initial document is represented by fcMin through fcMac if !fComplex
+# Macintosh variant
+>0 uleshort&0xF0ff 0x30fe
+# rgwSpare0 [3]
+>>20 ubelong x \b, 0x%x fcMin
+#>(20.L) ubequad x %16.16llx
+>>(20.L) string x %s
+# Windows variant
+>0 uleshort&0xFF00 0xA500
+# rgwSpare0 [5]
+>>24 ulelong x \b, 0x%x fcMin
+#>>(24.l) ubelong x %16.16llx
+>>(24.l) string x %s
+# An unsigned integer that specifies the count of 16-bit values corresponding to fibRgW that follow
+#>32 uleshort !0 \b, csw %x
+#>34 ubequad !0 \b, FibRgW97 0x%16.16llx
+#
+
+# Winword 1.0 and higher
+0 ubelong 0x9BA52100
+>0 use word-fib
+0 string/b \xDB\xA5\x2D\x00
+>0 use word-fib
#
@@ -691,4 +774,2 @@
#
-0 string/b \xDB\xA5\x2D\x00 Microsoft WinWord 2.0 Document
-!:mime application/msword