[File] [PATCH] Magdir/frame FrameMaker document ; missing some versions+extensions

Jörg Jenderek (GMX) joerg.jen.der.ek at gmx.net
Sun Mar 3 23:24:13 UTC 2024


Hello,

Some weeks ago I sent patches for some Adobe FrameMaker file types. In
this session I will handle FrameMaker documents.

So I looked for such files. When running the file command version 5.45 on
such samples and related files, I get output like:

DIALOG10.HLP:                  FrameMaker document (3.0 F)
ECOLOGY.IX:                    FrameMaker document (4.0 K)
ECOLOGY.TOC:                   FrameMaker document (4.0 K)
LETTER:                        FrameMaker document (4.0 K)
MAINMENU.HLP:                  FrameMaker document (4.0 K)
SampleBookTOC.fm:              FrameMaker document 0)
XREF.HLP:                      FrameMaker document (4.0 K)
allchaps.ix:                   FrameMaker document (4.0 K)
fm-a5doc.doc:                  FrameMaker document (4.0 K)
fmt-190-signature-id-840.fm:   FrameMaker document (5.0 Y)
fmt-533-signature-id-837.fm:   FrameMaker document (2.0 J)
fmt-534-signature-id-838.fm:   FrameMaker document (3.0 F)
fmt-535-signature-id-839.fm:   FrameMaker document (4.0 K)
fmt-536-signature-id-841.fm:   FrameMaker document (5.5 Q)
fmt-537-signature-id-842.fm:   FrameMaker document J)
fmt-538-signature-id-843.fm:   FrameMaker document H)
fmt-539-signature-id-844.fm:   FrameMaker document H)
title.fm:                      FrameMaker document (3.0 F)
title.fm4:                     FrameMaker document (4.0 K)
x-fmt-302-signature-id-395.fm: FrameMaker document H)

When running with the -e soft option, I get output like:

DIALOG10.HLP:                  data
ECOLOGY.IX:                    data
ECOLOGY.TOC:                   data
LETTER:                        data
MAINMENU.HLP:                  data
SampleBookTOC.fm:              data
XREF.HLP:                      data
allchaps.ix:                   data
fm-a5doc.doc:                  data
fmt-190-signature-id-840.fm:   ASCII text, with no line terminators
fmt-533-signature-id-837.fm:   ASCII text, with no line terminators
fmt-534-signature-id-838.fm:   ASCII text, with no line terminators
fmt-535-signature-id-839.fm:   ASCII text, with no line terminators
fmt-536-signature-id-841.fm:   ASCII text, with no line terminators
fmt-537-signature-id-842.fm:   ASCII text, with no line terminators
fmt-538-signature-id-843.fm:   ASCII text, with no line terminators
fmt-539-signature-id-844.fm:   ASCII text, with no line terminators
title.fm:                      data
title.fm4:                     data
x-fmt-302-signature-id-395.fm: ASCII text, with no line terminators

With the --extension option only the 3-byte sequence ??? is shown, and
with the -i option application/x-mif is shown.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). All my inspected samples
are described with low priority as "FrameMaker document" by fm.trid.xml.
Here no mime type is shown, and only one file name suffix (.FM) is shown.
The samples with HLP suffix are described with higher priority as
"FrameMaker Help" by hlp-fm.trid.xml. Here the mime type
application/vnd.framemaker and the correct file name suffix (.HLP) are
shown (see appended trid-v-fm.txt.gz).

For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/). This
identifies most examples, but not the "newest" versions like
SampleBookTOC.fm. The samples are described here as "Adobe FrameMaker
Document" with mime type application/vnd.framemaker. Version
information is also shown here. Nine variants are detected, from 2.0 to
9.0 including one intermediate version 5.5. This is done via PUIDs from
fmt/533 to fmt/538 and x-fmt/302 (see appended droid-fm-hlp.csv.gz).

On Linux, according to the shared MIME-info database, such samples are
called "Adobe FrameMaker document". Here application/vnd.framemaker is
used as mime type and the file name suffix fm is also shown. The samples
are recognized just by looking for the 10-byte sequence <MakerFile at the
beginning. That information can be seen in the source
freedesktop.org.xml.in, found for example on gitlab.freedesktop.org.

With the help of other tools I found a page about FrameMaker on the File
Formats Archive Team web site. This information is expressed by
comment lines inside Magdir/frame like:
# URL:		http://fileformats.archiveteam.org/wiki/FrameMaker
# Reference:	http://mark0.net/download/triddefs_xml.7z/
#		defs/f/fm.trid.xml
#		defs/h/hlp-fm.trid.xml

The description is currently done by lines inside Magdir/frame like:
  0	string		\<MakerFile	FrameMaker document
  !:mime	application/x-mif
  >11	string		5.5		 (5.5
  >11	string		5.0		 (5.0
  >11	string		4.0		 (4.0
  >11	string		3.0		 (3.0
  >11	string		2.0		 (2.0
  >11	string		1.0		 (1.0
  >14	byte		x		  %c)
These lines check for some older known version strings and print the
version information. But for newer and higher versions (from 6.0) no
version information is shown; only 1 character is shown, which is an
uppercase letter for 3-byte version strings or the next digit for a
4-byte version string (like 10.0). I could add parts for newer versions
in the same manner, but then some problems occur. There exists no real
official file format specification. So on the Archive Team web site the
highest mentioned version is 9.0, but according to Wikipedia the highest
mentioned version is 10, which I also found in the example
SampleBookTOC.fm. Furthermore I do not know whether other intermediate
versions like 5.5 exist.

So in my opinion it is better to show the first 3 characters of the
"version string". If the fourth character looks like an uppercase letter
(like F H J K Q Y), then display one space character and the character as
before. If the fourth character is not an uppercase letter, append that
character directly, so I get a 4-character version string like 10.0.

According to IANA, MIF (Maker Interchange Format, with the current
application/x-mif mime type) is text-based, whereas the inspected
documents are binary (as implied by the "data" classification) with mime
type application/vnd.framemaker. The DROID samples like
fmt-533-signature-id-837.fm are not real FrameMaker documents. These
samples contain only some leading bytes of such documents and are used
by the DROID tool as signatures to recognize such documents. Because the
samples start with the ASCII string <MakerFile, they are considered as
"text" by the file command. So such misidentified samples can be excluded
when the string tests are done in binary mode. So the first lines now
become:
  0	string/b	\<MakerFile	FrameMaker document
  !:mime	application/vnd.framemaker
  >11	string		x		(%0.3s
  >>14	ubyte		>0x40		%c
  >>14	ubyte		<0x41		\b%c

FrameMaker was originally developed for Unix workstations and then Apple
Macintosh. On those systems the strange Windows concept that the file
type is determined by the file name suffix did not exist. So in the
examples fm is often used as suffix, but there also exist samples whose
file names have no suffix (like CHAPTER HARVARD LETTER MEMO1 NEWSLTR
REPORT3). Often doc (apparently an abbreviation for document), toc
(apparently an abbreviation for table of contents) and ix (apparently an
abbreviation for index) are also found. On the documentation page bk or
book is also listed as suffix, but I did not find such samples. So I do
not show these suffixes. I also found samples with totally different
suffixes (like title.fm4 wp.filt textre1.htr pmscript.ind change.nbh
books.prd executiv.sum Hyper.Template). I am not sure whether these rare
names are accidents. So I also do not display these suffixes.
Apparently the FrameMaker document format is used by the program for
its own internal help system. There the hlp suffix is used instead of
something like fm. With the help of the TrID tool I implemented such a
sub classification. Apparently characteristic of such help documents are
embedded references to the same or other help documents, which look like
"gotolink xref.hlp:Overview" "openlink syntax1.hlp:firstpage". So I look
by brute force for the "unique" 5-byte phrase .hlp: . Then I check for
the gotolink or openlink keyword before the help file name and display
the link construct text. If it is not such a help document, I assume via
the default directive that it is a "normal" document with the above
mentioned possible suffixes. The default clause only works if I repeat
the test on the fourth version string character. Furthermore Xref-hlp.fm
is described as a "help" file because it is the same as XREF.HLP. So the
sub classification with file name extensions is done by additional lines
that look like:
  >14	ubyte		x
  >>18	search/9688/s	.hlp:		 \b) help
  !:ext	hlp
  #>>>&5	string		x		 LINK_NAME "%s"
  >>>&5	string		x
  >>>&-18	search/18/s	link\040
  >>>>&-4	regex/s		=^\[A-Za-z0-9.:\040]{1,} with "%s"
  >>18	default			x	\b)
  !:ext	/fm/doc/toc/ix
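
The idea behind these lines can be approximated in Python as follows. This is only a sketch under assumptions, not an exact emulation of the search and regex semantics of the file command: the function name `classify`, the sample data and the 64-byte look-ahead are invented for illustration, while the offset 18, the range 9688, the 18-byte look-back and the allowed link characters are taken from the rules above.

```python
import re

HELP_MARK = b".hlp:"
# Keyword, one space, then the characters allowed by the regex in the rule.
LINK_RE = re.compile(rb"(?:goto|open)link [A-Za-z0-9.: ]+")


def classify(data: bytes, search_range: int = 9688) -> str:
    """Return a rough sub classification for a FrameMaker document."""
    if not data.startswith(b"<MakerFile"):
        return "unknown"
    # Brute-force search for the 5-byte phrase, starting at offset 18.
    pos = data.find(HELP_MARK, 18, 18 + search_range + len(HELP_MARK))
    if pos < 0:
        # Corresponds to the default clause: a "normal" document.
        return "document"
    # Look a few bytes back for the gotolink/openlink keyword.
    chunk = data[max(0, pos - 18):pos + 64]
    match = LINK_RE.search(chunk)
    if match and HELP_MARK in match.group():
        return 'help with "%s"' % match.group().decode("ascii")
    return "help"


if __name__ == "__main__":
    # Synthetic data, not a real FrameMaker file.
    help_like = (b"<MakerFile 4.0K>" + b"\x00" * 40
                 + b"gotolink xref.hlp:Overview\x00rest")
    plain = b"<MakerFile 4.0K>" + b"\x00" * 100
    print(classify(help_like))
    print(classify(plain))
```

A document without any .hlp: reference falls through to the "document" case, which corresponds to the suffixes fm, doc, toc and ix.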

After applying the above mentioned modifications with the patch
file-frame-fm.diff, all my inspected FrameMaker documents are still
described, but now version information is always shown. Furthermore only
"real" documents are described, the misidentified DROID signatures are
now skipped, and a sub classification for help documents with the correct
file name suffix is done. This now looks like:

DIALOG10.HLP:                  FrameMaker document (3.0 F) help with
			       "gotolink dialog09.hlp:lastpage"
ECOLOGY.IX:                    FrameMaker document (4.0 K)
ECOLOGY.TOC:                   FrameMaker document (4.0 K)
LETTER:                        FrameMaker document (4.0 K)
MAINMENU.HLP:                  FrameMaker document (4.0 K) help with
			       "openlink syntax1.hlp:firstpage"
SampleBookTOC.fm:              FrameMaker document (10.0)
XREF.HLP:                      FrameMaker document (4.0 K) help with
			       "gotolink xref.hlp:Overview"
allchaps.ix:                   FrameMaker document (4.0 K)
fm-a5doc.doc:                  FrameMaker document (4.0 K)
fmt-190-signature-id-840.fm:   ASCII text, with no line terminators
fmt-533-signature-id-837.fm:   ASCII text, with no line terminators
fmt-534-signature-id-838.fm:   ASCII text, with no line terminators
fmt-535-signature-id-839.fm:   ASCII text, with no line terminators
fmt-536-signature-id-841.fm:   ASCII text, with no line terminators
fmt-537-signature-id-842.fm:   ASCII text, with no line terminators
fmt-538-signature-id-843.fm:   ASCII text, with no line terminators
fmt-539-signature-id-844.fm:   ASCII text, with no line terminators
title.fm:                      FrameMaker document (3.0 F)
title.fm4:                     FrameMaker document (4.0 K)
x-fmt-302-signature-id-395.fm: ASCII text, with no line terminators

I hope my diff file can be applied in a future version of the file
utility.

With best wishes,
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-fm.txt.gz
Type: application/x-gzip
Size: 814 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240304/b7693d7d/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-fm-hlp.csv.gz
Type: application/x-gzip
Size: 838 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240304/b7693d7d/attachment-0001.bin>
-------------- next part --------------
--- file-master/magic/Magdir/frame.old	2024-02-29 21:43:55.786498000 +0100
+++ file-master/magic/Magdir/frame	2024-03-04 00:05:56.452618300 +0100
@@ -8,15 +8,46 @@
 #
 # URL:		https://en.wikipedia.org/wiki/Adobe_FrameMaker
 #
-0	string		\<MakerFile	FrameMaker document
-!:mime	application/x-mif
->11	string		5.5		 (5.5
->11	string		5.0		 (5.0
->11	string		4.0		 (4.0
->11	string		3.0		 (3.0
->11	string		2.0		 (2.0
->11	string		1.0		 (1.0
->14	byte		x		  %c)
+# Update:	Joerg Jenderek 2024 Mar
+# URL:		http://fileformats.archiveteam.org/wiki/FrameMaker
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/f/fm.trid.xml
+# Note:		called "FrameMaker document" by TrID and "Adobe FrameMaker document" by shared MIME-info database
+# skip "text" DROID samples like: fmt-190-signature-id-840.fm fmt-533-signature-id-837.fm fmt-534-signature-id-838.fm fmt-535-signature-id-839.fm fmt-536-signature-id-841.fm
+# fmt-537-signature-id-842.fm fmt-538-signature-id-843.fm fmt-539-signature-id-844.fm x-fmt-302-signature-id-395.fm
+0	string/b	\<MakerFile	FrameMaker document
+#!:mime	application/octet-stream
+# https://www.iana.org/assignments/media-types/application/vnd.framemaker
+!:mime	application/vnd.framemaker
+# version string like 1.0 2.0 3.0 4.0 5.0 5.5 6.0 7.0 8.0 9.0 10.0
+>11	string		x		(%0.3s
+# before closing directive ">" is appended version letter like: F H J K Q Y
+>>14	ubyte		>0x40		%c
+# or last digit of 4 character version string
+>>14	ubyte		<0x41		\b%c
+# test again so that next default clause works
+>14	ubyte		x
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/h/hlp-fm.trid.xml
+# Note:		called "FrameMaker Help" by TrID
+# look for reference to FrameMaker help name suffix like in: index1.hlp
+>>18	search/9688/s	.hlp:		 \b) help
+# the internal FrameMaker help are just FrameMaker document with hlp suffix; XREF.HLP is same as Xref-hlp.fm
+!:ext	hlp
+# For control reason show link name like:
+# "Overview" "lastpage "firstpage "Add File" "Conditional Text" "Table Format" "Creating a reference frame" "firstpageCov" "Spot Colors" "Selecting text" "proceduresbl" "lastpageu" "Introducing HelpHe" "Menu of Syntax Descriptions" "Main FrameMaker window"
+#>>>&5	string		x		 LINK_NAME "%s"
+>>>&5	string		x
+# look for gotolink or openlink keyword before help file name
+>>>&-18	search/18/s	link\040
+# link construct with help name like: "gotolink xref.hlp:Overview" "openlink syntax1.hlp:firstpage"
+>>>>&-4	regex/s		=^\[A-Za-z0-9.:\040]{1,}	with "%s"
+# if not FrameMaker Help assume it is "normal" FrameMaker document
+# shown with closing parenthesis to get look like in frame,v 1.18
+>>18	default			x	\b)
+# sometimes without suffix like: CHAPTER HARVARD LETTER MEMO1 NEWSLTR REPORT3
+# no samples found with .bk or .book extension
+# allchaps.ix (Framemaker Index) and others like:
+# title.fm4 wp.filt textre1.htr pmscript.ind change.nbh books.prd executiv.sum Hyper.Template
+!:ext	/fm/doc/toc/ix
 # URL:		http://fileformats.archiveteam.org/wiki/Maker_Interchange_Format
 # Reference:	https://help.adobe.com/en_US/framemaker/mifreference/mifref.pdf
 # Update:	Joerg Jenderek 2019 Nov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-frame-fm.diff.sig
Type: application/octet-stream
Size: 1680 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20240304/b7693d7d/attachment.obj>

