[File] [PATCH] Magdir/frame FrameMaker document ; missing some versions+extensions

Christos Zoulas christos at zoulas.com
Mon Mar 4 00:35:02 UTC 2024


Committed, thanks!

christos

> On Mar 3, 2024, at 6:24 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> Some weeks ago I sent patches for some Adobe FrameMaker file types. In
> this session I handle FrameMaker documents.
> 
> So I looked for such files. When running the file command version 5.45
> on such samples and related files, I get output like:
> 
> DIALOG10.HLP:                  FrameMaker document (3.0 F)
> ECOLOGY.IX:                    FrameMaker document (4.0 K)
> ECOLOGY.TOC:                   FrameMaker document (4.0 K)
> LETTER:                        FrameMaker document (4.0 K)
> MAINMENU.HLP:                  FrameMaker document (4.0 K)
> SampleBookTOC.fm:              FrameMaker document 0)
> XREF.HLP:                      FrameMaker document (4.0 K)
> allchaps.ix:                   FrameMaker document (4.0 K)
> fm-a5doc.doc:                  FrameMaker document (4.0 K)
> fmt-190-signature-id-840.fm:   FrameMaker document (5.0 Y)
> fmt-533-signature-id-837.fm:   FrameMaker document (2.0 J)
> fmt-534-signature-id-838.fm:   FrameMaker document (3.0 F)
> fmt-535-signature-id-839.fm:   FrameMaker document (4.0 K)
> fmt-536-signature-id-841.fm:   FrameMaker document (5.5 Q)
> fmt-537-signature-id-842.fm:   FrameMaker document J)
> fmt-538-signature-id-843.fm:   FrameMaker document H)
> fmt-539-signature-id-844.fm:   FrameMaker document H)
> title.fm:                      FrameMaker document (3.0 F)
> title.fm4:                     FrameMaker document (4.0 K)
> x-fmt-302-signature-id-395.fm: FrameMaker document H)
> 
> When running with the -e soft option, I get output like:
> 
> DIALOG10.HLP:                  data
> ECOLOGY.IX:                    data
> ECOLOGY.TOC:                   data
> LETTER:                        data
> MAINMENU.HLP:                  data
> SampleBookTOC.fm:              data
> XREF.HLP:                      data
> allchaps.ix:                   data
> fm-a5doc.doc:                  data
> fmt-190-signature-id-840.fm:   ASCII text, with no line terminators
> fmt-533-signature-id-837.fm:   ASCII text, with no line terminators
> fmt-534-signature-id-838.fm:   ASCII text, with no line terminators
> fmt-535-signature-id-839.fm:   ASCII text, with no line terminators
> fmt-536-signature-id-841.fm:   ASCII text, with no line terminators
> fmt-537-signature-id-842.fm:   ASCII text, with no line terminators
> fmt-538-signature-id-843.fm:   ASCII text, with no line terminators
> fmt-539-signature-id-844.fm:   ASCII text, with no line terminators
> title.fm:                      data
> title.fm4:                     data
> x-fmt-302-signature-id-395.fm: ASCII text, with no line terminators
> 
> With the --extension option only the 3-byte sequence ??? is shown, and
> with the -i option application/x-mif is shown.
> 
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). All my inspected samples are
> described with low priority as "FrameMaker document" by fm.trid.xml.
> Here no MIME type is shown, and only one file name suffix (.FM) is
> shown. The samples with the HLP suffix are described with higher
> priority as "FrameMaker Help" by hlp-fm.trid.xml. Here the MIME type
> application/vnd.framemaker and the correct file name suffix (.HLP) are
> shown (see appended trid-v-fm.txt.gz).
> 
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). This identifies
> most examples, but not the "newest" versions like SampleBookTOC.fm.
> The samples are described here as "Adobe FrameMaker Document" with
> MIME type application/vnd.framemaker. Version information is also
> shown. Nine variants are detected, from 2.0 to 9.0 plus one
> intermediate version 5.5. This is done via the PUIDs fmt/533 to
> fmt/538 and x-fmt/302 (see appended droid-fm-hlp.csv.gz).
> 
> On Linux, according to the shared MIME-info database, such samples are
> called "Adobe FrameMaker document". Here application/vnd.framemaker is
> used as MIME type and the file name suffix fm is also shown. The samples
> are recognized just by looking for the 10-byte sequence <MakerFile at
> the beginning. That information can be seen in the source
> freedesktop.org.xml.in, found for example on gitlab.freedesktop.org.
> 
> With the help of other tools I found a page about FrameMaker on the
> file formats Archive Team web site. That information is expressed by
> comment lines inside Magdir/frame like:
> # URL:		http://fileformats.archiveteam.org/wiki/FrameMaker
> # Reference:	http://mark0.net/download/triddefs_xml.7z/
> #		defs/f/fm.trid.xml
> #		defs/h/hlp-fm.trid.xml
> 
> The description is currently done by lines inside Magdir/frame like:
> 0	string		\<MakerFile	FrameMaker document
> !:mime	application/x-mif
> >11	string		5.5		 (5.5
> >11	string		5.0		 (5.0
> >11	string		4.0		 (4.0
> >11	string		3.0		 (3.0
> >11	string		2.0		 (2.0
> >11	string		1.0		 (1.0
> >14	byte		x		  %c)
> It checks for some older known version strings and prints version
> information, but for newer and higher versions (from 6.0) no version
> information is shown. Only 1 character is shown, which is an uppercase
> letter for 3-byte version strings or the next digit for a 4-byte
> version string (like 10.0). I could add parts for newer versions in
> the same manner, but then some problems occur. There exists no real
> official file format specification. On the Archive Team web site the
> highest mentioned version is 9.0, but according to Wikipedia the
> highest mentioned version is 10, which I also found in the example
> SampleBookTOC.fm. Furthermore, I do not know whether other
> intermediate versions like 5.5 exist. So in my opinion it is better to
> show the first 3 characters of the "version string". If the fourth
> character looks like an uppercase letter (like F H J K Q Y), then
> display one space character and the character as before. If the fourth
> character is not an uppercase letter, append that character directly,
> so I get a 4-character version string like 10.0.
> 
> According to IANA, MIF (Maker Interchange Format, with the current
> application/x-mif MIME type) is text-based, whereas the inspected
> documents are binary (implied by the "data" classification), so I use
> the MIME type application/vnd.framemaker instead.
> 
> The DROID samples like fmt-533-signature-id-837.fm are not real
> FrameMaker documents. These samples contain only some leading bytes of
> such documents and are used by the DROID tool as signatures to
> recognize such documents. Because the samples start with the ASCII
> string <MakerFile, they are considered "text" by the file command.
> Such misidentified samples can be excluded when the string test is
> done in binary mode. So the first lines now become:
> 0	string/b	\<MakerFile	FrameMaker document
> !:mime	application/vnd.framemaker
> >11	string		x		(%0.3s
> >>14	ubyte		>0x40		%c
> >>14	ubyte		<0x41		\b%c
> 
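For readers less familiar with magic(5) syntax, here is a minimal Python sketch of the version-string rule described above. It is only an illustration, not part of the patch: the function name and the two sample headers are made up for the example, and the closing parenthesis, which the patch prints in a later rule, is folded into the return value for brevity.

```python
from typing import Optional

MAGIC = b"<MakerFile"  # 10-byte signature at offset 0


def describe_version(data: bytes) -> Optional[str]:
    """Return a string such as '(4.0 K)' or '(10.0)', or None without signature."""
    if not data.startswith(MAGIC) or len(data) < 15:
        return None
    # First 3 characters of the version string at offset 11 (like "%0.3s").
    version = data[11:14].decode("latin-1")
    fourth = data[14]
    if fourth > 0x40:
        # Mirrors ">>14 ubyte >0x40 %c": a space, then the character.
        return f"({version} {chr(fourth)})"
    # Mirrors ">>14 ubyte <0x41 \b%c": the character is appended directly.
    return f"({version}{chr(fourth)})"


# Hypothetical headers, constructed only for this example.
old_style = describe_version(b"<MakerFile 4.0K>")
new_style = describe_version(b"<MakerFile 10.0>")
```

Like the magic rules, the sketch uses the plain threshold 0x40/0x41 and does not verify that the fourth character really is an uppercase letter.
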
> The development of FrameMaker was mainly done for the Apple Macintosh.
> On that system the strange Windows concept that the file type is
> determined by the file name suffix did not exist. So fm is often used
> as suffix, but there are also samples whose file names have no suffix
> (like CHAPTER HARVARD LETTER MEMO1 NEWSLTR REPORT3). Often doc
> (apparently an abbreviation for document), toc (apparently an
> abbreviation for table of contents) and ix (apparently an abbreviation
> for index) are also found. On the documentation page bk or book is
> also listed as suffix, but I did not find such samples, so I do not
> show these suffixes. I also found samples with totally different
> suffixes (like title.fm4 wp.filt textre1.htr pmscript.ind change.nbh
> books.prd executiv.sum Hyper.Template). I am not sure whether these
> rare names are accidents, so I also do not display these suffixes.
> Apparently the FrameMaker document format is used by the program for
> its own internal help system. There the hlp suffix is used instead of
> something like fm. With the help of the TrID tool I implemented such a
> sub-classification. Apparently characteristic of such help documents
> are embedded references to the same or other help documents, which
> look like "gotolink xref.hlp:Overview" or "openlink
> syntax1.hlp:firstpage". So by brute force I look for the "unique"
> 5-byte phrase .hlp: . Then I check for the gotolink or openlink keyword
> before the help file name and display the link construct text. If it
> is not such a help document, I assume via the default directive that
> it is a "normal" document with one of the above-mentioned suffixes.
> The default clause only works if I repeat a test for the fourth
> version string character. Furthermore, Xref-hlp.fm is described as a
> "help" file because it is the same as XREF.HLP. So the
> sub-classification with file name extensions is done by additional
> lines that look like:
> >14	ubyte		x
> >>18	search/9688/s	.hlp:		 \b) help
> !:ext	hlp
> #>>>&5	string		x		 LINK_NAME "%s"
> >>>&5	string		x
> >>>&-18	search/18/s	link\040
> >>>>&-4	regex/s		=^\[A-Za-z0-9.:\040]{1,} with "%s"
> >>18	default			x	\b)
> !:ext	/fm/doc/toc/ix
> 
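The help-document heuristic can be paraphrased in Python roughly as follows. This is a simplified sketch under stated assumptions, not an exact rendering of the magic(5) offset semantics: the function name, the default parameters and the sample bytes are hypothetical, with the window sizes simply borrowed from the search/9688 and search/18 tests.

```python
import re
from typing import Optional

HELP_MARKER = b".hlp:"                       # the "unique" 5-byte phrase
LINK_TEXT = re.compile(rb"[A-Za-z0-9.: ]+")  # characters allowed in a link


def find_help_link(data: bytes, start: int = 18, window: int = 9688,
                   lookback: int = 18) -> Optional[str]:
    """Return link text such as 'gotolink xref.hlp:Overview', else None."""
    pos = data.find(HELP_MARKER, start, start + window)
    if pos < 0:
        return None          # no marker: treated as a "normal" document
    # Look a few bytes back for the tail of the gotolink/openlink keyword.
    kw = data.find(b"link ", max(0, pos - lookback), pos)
    if kw < 4:
        return None
    # Step back 4 bytes to include "goto" or "open", then take the text.
    match = LINK_TEXT.match(data, kw - 4)
    return match.group().decode("ascii") if match else None


# Hypothetical sample bytes, constructed only for this example.
sample = (b"<MakerFile 4.0K>" + b"\x00" * 20
          + b"\x01gotolink xref.hlp:Overview\x00rest")
link = find_help_link(sample)
```

A result of None corresponds to the default branch, where the document is reported with the fm/doc/toc/ix suffixes.
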
> After applying the above-mentioned modifications with the patch
> file-frame-fm.diff, all my inspected FrameMaker documents are still
> described, but now version information is always shown. Furthermore,
> only "real" documents are described, the misidentified DROID
> signatures are now skipped, and the sub-classification for help
> documents with the correct file name suffix is done. This now looks
> like:
> 
> DIALOG10.HLP:                  FrameMaker document (3.0 F) help with
> 			       "gotolink dialog09.hlp:lastpage"
> ECOLOGY.IX:                    FrameMaker document (4.0 K)
> ECOLOGY.TOC:                   FrameMaker document (4.0 K)
> LETTER:                        FrameMaker document (4.0 K)
> MAINMENU.HLP:                  FrameMaker document (4.0 K) help with
> 			       "openlink syntax1.hlp:firstpage"
> SampleBookTOC.fm:              FrameMaker document (10.0)
> XREF.HLP:                      FrameMaker document (4.0 K) help with
> 			       "gotolink xref.hlp:Overview"
> allchaps.ix:                   FrameMaker document (4.0 K)
> fm-a5doc.doc:                  FrameMaker document (4.0 K)
> fmt-190-signature-id-840.fm:   ASCII text, with no line terminators
> fmt-533-signature-id-837.fm:   ASCII text, with no line terminators
> fmt-534-signature-id-838.fm:   ASCII text, with no line terminators
> fmt-535-signature-id-839.fm:   ASCII text, with no line terminators
> fmt-536-signature-id-841.fm:   ASCII text, with no line terminators
> fmt-537-signature-id-842.fm:   ASCII text, with no line terminators
> fmt-538-signature-id-843.fm:   ASCII text, with no line terminators
> fmt-539-signature-id-844.fm:   ASCII text, with no line terminators
> title.fm:                      FrameMaker document (3.0 F)
> title.fm4:                     FrameMaker document (4.0 K)
> x-fmt-302-signature-id-395.fm: ASCII text, with no line terminators
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> With best wishes,
> Jörg Jenderek
> --
> Jörg Jenderek


