From joerg.jen.der.ek at gmx.net  Tue Jan 11 20:24:19 2022
From: joerg.jen.der.ek at gmx.net (Jörg Jenderek)
Date: Tue, 11 Jan 2022 21:24:19 +0100
Subject: [File] [PATCH] Magdir/ole2compounddocs for "newer" Adobe PageMaker
Message-ID: <701dba45-e02a-e297-7f39-916c2df41bf7@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

some days ago I sent a patch for "older" Aldus/Adobe PageMaker documents. It was accepted and is now included in Magdir/wordprocessors. Now I have checked "newer" Adobe PageMaker documents. These documents and templates have file name extensions like PM6, P65, PMD, PT6, T65 and PMT. When running file version 5.41 with the -e cdf option on such documents, I get output like:

    02TEMPLT.T65:   OLE 2 Compound Document, v3.62, SecID 0, 2 FAT sectors, 0 Mini FAT sector : UNKNOWN with names PageMaker
    Charset.pmt:    OLE 2 Compound Document, v3.62, SecID 0x66, 0 Mini FAT sector : UNKNOWN with names PageMaker
    MyPage6.PM6:    OLE 2 Compound Document, v3.62, SecID 0x1, 0 Mini FAT sector : UNKNOWN with names PageMaker
    brochus.pt6:    OLE 2 Compound Document, v3.62, SecID 0x1, 0 Mini FAT sector : UNKNOWN with names PageMaker
    pm-70.pmd:      OLE 2 Compound Document, v3.62, SecID 0, 0 Mini FAT sector : UNKNOWN with names PageMaker
    strategies.p65: OLE 2 Compound Document, v3.62, SecID 0, 24 FAT sectors, Mini FAT start sector 0x2a, 25 Mini FAT sectors : UNKNOWN with names PageMaker ObjectPool 1

Furthermore, with the -i option only the generic application/CDFV2 is shown. With -i and -e cdf the MIME type application/x-ole-storage is shown. With the --extension option only the 3-byte sequence ??? is shown. There is no official MIME type from Microsoft; blame them for that. But at least according to the FreeDesktop.org shared MIME database, "application/x-ole-storage" seems to be the most commonly used one. This information can also be found on the reposcope.com website. So I think the file command should also use this term, or at least use the same term with soft magic and with cdf magic.
So I changed this MIME type in the current src/readcdf.c. The code looked like:

    } else if (ms->flags & MAGIC_MIME_TYPE) {
        if (file_printf(ms, "application/CDFV2") == -1)
            return -1;
    }

When running file with -e soft or with no extra option, I get the same generic line for all examples:

    Composite Document File V2 Document, Cannot read section info

For comparison I ran the file format identification utility TrID (see https://mark0.net/soft-trid-e.html). It also identifies all examples, with low priority, as "Generic OLE2 / Multistream Compound" via docfile.trid.xml. Most examples are described as "Adobe PageMaker document (generic)" with MIME type application/x-pagemaker via pagemaker-generic.trid.xml. The examples are often also described as "Adobe PageMaker document (v6)" by pagemaker-pm6.trid.xml, "Adobe PageMaker document (v6.5)" by pagemaker-pm65.trid.xml and "Page Maker 7 Document" by pmd-pm7.trid.xml, without correct version differentiation. So the 3 file name extensions PM6, P65 and PMD suggested there do not come in the right order. Furthermore, the file name extensions for templates (PT6, T65, PMT), which contain the character T, are missing there (see the attached trid-v-pagemaker-new.txt.gz).

For comparison I also ran the file format identification utility DROID (see https://sourceforge.net/projects/droid/). It identifies all new PageMaker examples as "Pagemaker Document (Generic)" with MIME type application/vnd.pagemaker via PUID fmt/876. But it only shows 2 extensions, PMD and PMT (see the attached DROID-pagemaker-new.csv.gz).

Luckily I also found a page about PageMaker on the File Formats Archive Team website. That information covers the "old" variants as well as the "new" variants.
That information is now expressed by comment lines in Magdir/ole2compounddocs like:

    # URL:       http://fileformats.archiveteam.org/wiki/PageMaker
    # Reference: http://mark0.net/download/triddefs_xml.7z/defs/p
    #            pagemaker-generic.trid.xml
    #            pagemaker-pm6.trid.xml
    #            pagemaker-pm65.trid.xml
    #            pmd-pm7.trid.xml

The PageMaker documents are recognized as "OLE 2 Compound Document" by the starting bytes (\320\317\021\340\241\261\032\341) at the beginning of Magdir/ole2compounddocs. Apparently there is no code fragment for subclass identification, so the examples are described as "UNKNOWN". Furthermore, the examples have no registered root storage object CLSID; this value is nil. With a non-nil CLSID, the file command would display it afterwards in a phrase like ", clsid 0xc0c7266eb98cd311a1c800c04f612452". That means the new code must be added in the branch that handles the nil CLSID GUID. The last entry there was for SoftMaker Presentations or templates (*.prd *.prv) with pictures, so I added lines for my inspected examples after it.

Luckily the file command prints some directory entry names. In all examples this includes the word "PageMaker", encoded as UTF-16. This characteristic is also found in the global string section of the TrID definition, in a line like:

    P'A'G'E'M'A'K'E'R

When I extract this stream, for example with Michal Mutl's Structured Storage Viewer, I get real PageMaker content in the "old" format. This is also described in the documentation, and these parts are recognised by Magdir/wordprocessors. So in the first additional line I look for a second directory entry with the UTF-16 encoded name PageMaker:

    >>>>128     lestring16      PageMaker       :

In the second step I must jump to the stream part.
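The first two checks (the OLE2 signature at offset 0 and a UTF-16 directory entry name) can be sketched in C as below. This is only an illustration of what the magic lines test, not code from file(1); the function names are hypothetical. It assumes the caller has already located the directory sector, and relies on the Compound File layout in which every directory entry is 128 bytes long and begins with a UTF-16LE name of at most 64 bytes, so the second entry starts at offset 128.

```c
#include <stddef.h>
#include <string.h>

/* The 8-byte OLE2 / Compound File Binary signature that
 * Magdir/ole2compounddocs tests at offset 0
 * (\320\317\021\340\241\261\032\341 in octal). */
static const unsigned char ole2_magic[8] = {
    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
};

/* Hypothetical helper: does the buffer start with the OLE2 signature? */
int is_ole2(const unsigned char *buf, size_t len)
{
    return len >= sizeof ole2_magic &&
           memcmp(buf, ole2_magic, sizeof ole2_magic) == 0;
}

/* Hypothetical helper, roughly what "128 lestring16 PageMaker" tests:
 * `dir` points at an already located directory sector whose entries are
 * 128 bytes long and start with a UTF-16LE name (at most 64 bytes).
 * Like a magic string test, this is a prefix comparison. */
int dir_entry_name_is(const unsigned char *dir, size_t dir_len,
                      size_t index, const char *ascii)
{
    size_t off = index * 128;
    size_t n = strlen(ascii);

    if (n > 32 || off > dir_len || dir_len - off < 128)
        return 0;
    for (size_t i = 0; i < n; i++) {
        /* each ASCII character is stored as its low byte followed by 0 */
        if (dir[off + 2 * i] != (unsigned char)ascii[i] ||
            dir[off + 2 * i + 1] != 0)
            return 0;
    }
    return 1;
}
```

With the name stored in the second entry, `dir_entry_name_is(dir, len, 1, "PageMaker")` succeeds, while the same call with index 0 fails.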
There may be more efficient or better ways, but I search by brute force for the start magic of the "old" PageMaker format with a line like:

    >>>>>0      search/0xa900/s \0\0\0\0\0\0\xff\x99

In the third step I handle this stream part with lines like:

    #>>>>>>&0   use             PageMaker
    >>>>>>&0    indirect        x

I first tried to call the subroutine PageMaker from Magdir/wordprocessors directly, but then I got the wrong version. Maybe this is a bug in the file command. When I use the indirect directive instead, I get correct identifications. But I also get an ugly side effect: afterwards an additional, unexpected phrase UNKNOWN0000000000000000 is displayed. It is triggered by the part for the remaining non-nil CLSID, which was done by lines like:

    >>88        default         x       : UNKNOWN
    >>>80       ubequad         !0      \b, clsid %#16.16llx
    >>>88       ubequad         x       \b%16.16llx

This should not happen! I do not know what is wrong here. So I check again for a non-nil GUID, and this part now becomes:

    >>88        default         x
    >>>88       ubequad         !0      : UNKNOWN
    >>>>80      ubequad         !0      \b, clsid %#16.16llx
    >>>>88      ubequad         x       \b%16.16llx

After applying the modifications above with the patch files file-5.41-ole2compounddocs-pagemaker.diff and file-5.41-readcdf-mime.diff, and using the newest Magdir/wordprocessors, all my inspected "newer" PageMaker documents are now described in more detail.
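The brute-force stream search and the patched CLSID branch can be mirrored in C roughly as follows. This is a sketch under assumptions, not file(1) source: the function names are hypothetical, `find_old_pagemaker` only approximates the `search/0xa900` range semantics, and `entry` is assumed to point at the root directory entry, whose 16-byte CLSID sits at offsets 80 to 95. One observation from reading the rules (hedged, since it does not explain why the branch is reached via indirect): the offset-88 line is a sibling of the offset-80 line, so it prints the low quad even when the high quad is zero. With `\b` that yields exactly a text like UNKNOWN0000000000000000 when both halves are nil and the old rules reach the UNKNOWN branch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Start magic of an embedded "old" PageMaker stream, as searched for by
 *   >>>>>0  search/0xa900/s  \0\0\0\0\0\0\xff\x99                     */
static const unsigned char pm_old_magic[8] = { 0, 0, 0, 0, 0, 0, 0xFF, 0x99 };

/* Naive brute-force scan: offset of the first match starting in
 * [start, start + range), or -1 if there is none. */
long find_old_pagemaker(const unsigned char *buf, size_t len,
                        size_t start, size_t range)
{
    for (size_t i = start; i - start < range && i + sizeof pm_old_magic <= len; i++)
        if (memcmp(buf + i, pm_old_magic, sizeof pm_old_magic) == 0)
            return (long)i;
    return -1;
}

static uint64_t be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = v << 8 | p[i];
    return v;
}

/* Mirrors the patched "remaining non null clsid" branch: the CLSID is read
 * as two big-endian quads at 80 and 88 of the root directory entry.
 * `out` must hold at least 64 bytes. Returns the length written. */
int format_unknown_clsid(const unsigned char *entry, char *out, size_t outlen)
{
    uint64_t hi = be64(entry + 80), lo = be64(entry + 88);
    int n;

    if (outlen < 64)
        return -1;
    out[0] = '\0';
    if (lo == 0)                                     /* >>>88 ubequad !0 */
        return 0;
    n = snprintf(out, outlen, ": UNKNOWN");
    if (hi != 0)                                     /* >>>>80 ubequad !0 */
        n += snprintf(out + n, outlen - n, ", clsid %#16.16llx",
                      (unsigned long long)hi);
    n += snprintf(out + n, outlen - n, "%16.16llx",   /* >>>>88 ubequad x */
                  (unsigned long long)lo);
    return n;
}
```

Note that with the patched rules a CLSID whose high quad is zero but whose low quad is not would still print ": UNKNOWN" directly followed by 16 hex digits, without the word clsid.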
With the -e cdf option the output now looks like:

    02TEMPLT.T65:   OLE 2 Compound Document, v3.62, SecID 0, 2 FAT sectors, 0 Mini FAT sector : Adobe PageMaker document, little-endian, version 6.50
    Charset.pmt:    OLE 2 Compound Document, v3.62, SecID 0x66, 0 Mini FAT sector : Adobe PageMaker document, little-endian, version 6.50
    MyPage6.PM6:    OLE 2 Compound Document, v3.62, SecID 0x1, 0 Mini FAT sector : Adobe PageMaker document, little-endian, version 6
    brochus.pt6:    OLE 2 Compound Document, v3.62, SecID 0x1, 0 Mini FAT sector : Adobe PageMaker document, little-endian, version 6
    pm-70.pmd:      OLE 2 Compound Document, v3.62, SecID 0, 0 Mini FAT sector : Adobe PageMaker document, little-endian, version 6.50
    strategies.p65: OLE 2 Compound Document, v3.62, SecID 0, 24 FAT sectors, Mini FAT start sector 0x2a, 25 Mini FAT sectors : Adobe PageMaker document, little-endian, version 6.50

With -e cdf and the --extension option it now looks like:

    02TEMPLT.T65:   p65/t65/pmd/pmt
    Charset.pmt:    p65/t65/pmd/pmt
    MyPage6.PM6:    pm6/pt6
    brochus.pt6:    pm6/pt6
    pm-70.pmd:      p65/t65/pmd/pmt
    strategies.p65: p65/t65/pmd/pmt

I hope my diff files can be applied in a future version of the file utility.

Unfortunately, neither the documentation nor my own inspection revealed a way to distinguish templates (with their different file name extensions) from plain PageMaker publications. I also found no way to distinguish version 6.5 (*.P65 *.T65) from version 7 (*.PMD *.PMT).

Check the facts as far as you can. Listen to what scientists and the experts of the relevant departments recommend. Accordingly, the vaccine is the most suitable measure against Corona. Anyone who believes in fake news also storms the Capitol as a mob, mocks science and terrorizes the silent majority of the population. Stay healthy.
Jörg Jenderek
- -- 
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYd3nXQAKCRCv8rHJQhrU
1nJVAKDWay4r61LNcGvLo/8tNO2b8R/SvgCeIelamPiKS+QVYX0dR78c8xiXUBg=
=JjJT
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.41/src/readcdf.c.old	2019-09-30 17:42:50.000000000 +0200
+++ file-5.41/src/readcdf.c	2022-01-09 22:17:06.936913800 +0100
@@ -675,5 +675,6 @@
 		return -1;
 	} else if (ms->flags & MAGIC_MIME_TYPE) {
-		if (file_printf(ms, "application/CDFV2") == -1)
+		/* https://reposcope.com/mimetype/application/x-ole-storage */
+		if (file_printf(ms, "application/x-ole-storage") == -1)
 			return -1;
 	}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-readcdf-mime.diff.sig
Type: application/octet-stream
Size: 412 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DROID-pagemaker-new.csv.gz
Type: application/x-gzip
Size: 475 bytes
Desc: not available
URL: 
-------------- next part --------------
--- file-5.41/magic/Magdir/ole2compounddocs.old	2021-09-07 09:39:31 +0000
+++ file-5.41/magic/Magdir/ole2compounddocs	2022-01-11 19:53:37 +0000
@@ -260,2 +260,20 @@
 >>>>>>128	lestring16	Pictures	with pictures
+#
+# URL:		http://fileformats.archiveteam.org/wiki/PageMaker
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/p
+#		pagemaker-generic.trid.xml
+#		pagemaker-pm6.trid.xml
+#		pagemaker-pm65.trid.xml
+#		pmd-pm7.trid.xml
+# From:		Joerg Jenderek
+# Note:		since version 6 embedd as stream with PageMaker name the "old" format handled by ./wordprocessors
+#		verified by Michal Mutl Structured Storage Viewer `SSView.exe brochus.pt6`
+# Second directory entry name PageMaker
+>>>>128	lestring16	PageMaker	:
+# look for magic of "old" PageMaker like in 02TEMPLT.T65
+>>>>>0	search/0xa900/s	\0\0\0\0\0\0\xff\x99
+# GRR: jump to PageMaker stream and inspect it by sub routine PageMaker of ./wordprocessors failed with wrong version!
+#>>>>>>&0	use		PageMaker
+# THIS WORKS PARTLY!
+>>>>>>&0	indirect	x
 # remaining null clsid
@@ -269,2 +287,5 @@
 !:mime	application/x-ole-storage
+# according to file version 5.41 with -e soft option
+#!:mime	application/CDFV2
+#!:ext	???
 # look for known clsid GUID
@@ -563,6 +584,12 @@
 # remaining non null clsid
->>88	default		x	: UNKNOWN
+>>88	default		x
+# GRR: check again for non null clsid because wrong when called by indirect directive
+>>>88	ubequad		!0	: UNKNOWN
+# https://reposcope.com/mimetype/application/x-ole-storage
 !:mime	application/x-ole-storage
->>>80	ubequad		!0	\b, clsid %#16.16llx
->>>88	ubequad		x	\b%16.16llx
+# according to file version 5.41 with -e soft option
+#!:mime	application/CDFV2
+#!:ext	???
+>>>>80	ubequad		!0	\b, clsid %#16.16llx
+>>>>88	ubequad		x	\b%16.16llx
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-ole2compounddocs-pagemaker.diff.sig
Type: application/octet-stream
Size: 1044 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-pagemaker-new.txt.gz
Type: application/x-gzip
Size: 843 bytes
Desc: not available
URL: 

From mihaidavid at posteo.net  Sun Jan 16 21:26:13 2022
From: mihaidavid at posteo.net (Horia Mihai David)
Date: Sun, 16 Jan 2022 21:26:13 +0000
Subject: [File] [PATCH] of Magdir/images for Quite OK Image Format *.qoi
In-Reply-To: <60fbe421-8442-29d8-61ed-498ec75d6dad@posteo.net>
References: <60fbe421-8442-29d8-61ed-498ec75d6dad@posteo.net>
Message-ID: <7ee30e8a-7276-519d-00b8-9a589cf7d88b@posteo.net>

Hi,

QOI (Quite OK Image Format), created by Dominic Szablewski, is a lossless image compression format similar to PNG, with a much simpler implementation, very fast decoding and comparable compression rates. It is gaining popularity among game developers.
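Since the attached qoi.patch itself is not preserved in this archive, here is a hypothetical C equivalent of the header checks, based only on the header layout from the QOI specification quoted below and on the test output shown in this message. The function names are illustrative: `qoi_describe` accepts channels 3 or 4 and colorspace 0 or 1 and otherwise reports a bad header, mirroring the listed output strings.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a big-endian 32-bit value (QOI stores width and height BE). */
static uint32_t qoi_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Describe a 14-byte QOI header (magic at 0, width at 4, height at 8,
 * channels at 12, colorspace at 13). Returns -1 without "qoif" magic. */
int qoi_describe(const unsigned char *h, size_t len, char *out, size_t outlen)
{
    static const char *const kind[2][2] = {
        /* colorspace 0: sRGB, linear alpha | 1: all channels linear */
        { "sRGB (linear alpha)",  "RGB (all channels linear)" },   /* 3 channels */
        { "sRGBA (linear alpha)", "RGBA (all channels linear)" },  /* 4 channels */
    };
    unsigned channels, colorspace, width, height;

    if (len < 14 || memcmp(h, "qoif", 4) != 0)
        return -1;
    width = (unsigned)qoi_be32(h + 4);
    height = (unsigned)qoi_be32(h + 8);
    channels = h[12];
    colorspace = h[13];
    if ((channels == 3 || channels == 4) && colorspace <= 1)
        return snprintf(out, outlen, "QOI image data, %u x %u, %s",
                        width, height, kind[channels - 3][colorspace]);
    return snprintf(out, outlen, "QOI image data, %u x %u, bad header",
                    width, height);
}
```

Using a lookup table for the four valid channel/colorspace combinations keeps every other value on the single "bad header" path, as in the output listed for the bad_*.qoi test files.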
Please find attached (qoi.patch) a patch that adds detection support for QOI images to libmagic/file.

NOTE: The patch targets the tip of master on the GitHub mirror (commit f8558a245ab82b58d64c1fbe8f6f310fd8b36259).

I CC'd the author of the QOI format. I added a comment with my name and contact method as required; if the original author steps in, he can replace me.

According to the official spec ():

    A QOI file consists of a 14-byte header, followed by any number of data "chunks" and an 8-byte end marker.

    qoi_header {
        char     magic[4];   // magic bytes "qoif"
        uint32_t width;      // image width in pixels (BE)
        uint32_t height;     // image height in pixels (BE)
        uint8_t  channels;   // 3 = RGB, 4 = RGBA
        uint8_t  colorspace; // 0 = sRGB with linear alpha
                             // 1 = all channels linear
    };

The provided patch detects QOI files and extracts width/height as well as channel and colorspace information. If the header contains unsupported channel or colorspace values, it is reported as a bad header. I added MIME type and file extension descriptions too.

    qoi_test/0a.qoi:                     QOI image data, 800 x 600, sRGBA (linear alpha)
    qoi_test/0b.qoi:                     QOI image data, 800 x 600, RGBA (all channels linear)
    qoi_test/0c.qoi:                     QOI image data, 800 x 600, RGB (all channels linear)
    qoi_test/0d.qoi:                     QOI image data, 800 x 600, sRGB (linear alpha)
    qoi_test/bad_channels.qoi:           QOI image data, 800 x 600, bad header
    qoi_test/bad_chansandcolorspace.qoi: QOI image data, 800 x 600, bad header
    qoi_test/bad_colorspace.qoi:         QOI image data, 800 x 600, bad header

    user at pc:~/file/file/src$ MAGIC=../magic/Magdir/ ./file --mime qoi_test/*.qoi
    qoi_test/0a.qoi:                     image/x-qoi; charset=binary
    qoi_test/0b.qoi:                     image/x-qoi; charset=binary
    qoi_test/0c.qoi:                     image/x-qoi; charset=binary
    qoi_test/0d.qoi:                     image/x-qoi; charset=binary
    qoi_test/bad_channels.qoi:           image/x-qoi; charset=binary
    qoi_test/bad_chansandcolorspace.qoi: image/x-qoi; charset=binary
    qoi_test/bad_colorspace.qoi:         image/x-qoi; charset=binary

    user at pc:~/file/file/src$ MAGIC=../magic/Magdir/ ./file --extension qoi_test/*.qoi
    qoi_test/0a.qoi:                     qoi
    qoi_test/0b.qoi:                     qoi
    qoi_test/0c.qoi:                     qoi
    qoi_test/0d.qoi:                     qoi
    qoi_test/bad_channels.qoi:           qoi
    qoi_test/bad_chansandcolorspace.qoi: qoi
    qoi_test/bad_colorspace.qoi:         qoi

I attach the aforementioned test files, each as its own attachment to this email. The files contain *only* the 14-byte headers; the rest of the data was stripped out to reduce file size.

Best Regards,
- Horia Mihai David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qoi.patch
Type: text/x-patch
Size: 1449 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0a.qoi
Type: application/octet-stream
Size: 14 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0b.qoi
Type: application/octet-stream
Size: 14 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0c.qoi
Type: application/octet-stream
Size: 14 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0d.qoi
Type: application/octet-stream
Size: 14 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bad_channels.qoi
Type: application/octet-stream
Size: 14 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bad_chansandcolorspace.qoi
Type: application/octet-stream
Size: 14 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bad_colorspace.qoi
Type: application/octet-stream
Size: 14 bytes
Desc: not available
URL: 

From joerg.jen.der.ek at gmx.net  Sun Jan 16 01:34:58 2022
From: joerg.jen.der.ek at gmx.net (Jörg Jenderek)
Date: Sun, 16 Jan 2022 02:34:58 +0100
Subject: [File] [PATCH] of Magdir/msvc, database, sysex for Microsoft Visual C or OMF library *.LIB
Message-ID: 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

some days ago I handled some libraries with the file name extension LIB. When running file version 5.41 with the --keep-going option on some examples and related files, I get output like:

    DBFNTX.LIB:             SysEx File - Inventronics
    FLIB7M.LIB:             data
    FP87.LIB:               Microsoft Visual C library SysEx File - ADA
    JMPPM32.LIB:            Microsoft Visual C library SysEx File - ADA
    MOUSE.LIB:              data
    PRESET_6.SYX:           SysEx File -
    QB4UTIL.LIB:            Microsoft Visual C library SysEx File - ADA
    T2.DBT:                 dBase III DBT, next free block index 11, 1st item "First memo\032\032"
    WATTCPWL.LIB:           dBase III DBT, version number 0, next free block index 130544, 1st item "\200 "
    ZLIB.LIB:               dBase III DBT, version number 0, next free block index 130544, 1st item "\200M"
    example-1manband.syx:   SysEx File -
    example-apple.syx:      SysEx File - Apple
    example-somascape2.syx: SysEx File -
    example-webmidijs.syx:  SysEx File -
    example-yamaha.syx:     SysEx File - Yamaha
    mlibce.lib:             Microsoft Visual C library SysEx File - ADA
    mwlibc.lib:             Microsoft Visual C library SysEx File - ADA

With the --extension option the wrong 3-byte extension "dbt" or ??? is displayed, and with the -i option the wrong MIME type application/x-dbt or only the generic application/octet-stream is shown.

For comparison I ran the file format identification utility TrID (see https://mark0.net/soft-trid-e.html). It describes one SysEx example, PRESET_6.SYX, as "MIDI Emulator Project SysEx preset command" via syx--midiemu.trid.xml, whereas the file command displays no vendor name for this type (66h).
The libraries recognised by the file command are also described as "Microsoft Visual C Library" by lib-msvc.trid.xml, but TrID also describes some examples like DBFNTX.LIB correctly (see the attached lib_syx-trid-v.txt.gz). Luckily TrID displays the correct file name extension LIB for the inspected libraries and SYX for a few SysEx files. With the -v option it lists the URL pointing to the file information used. The TrID information points to the page about Microsoft Visual C++ on Wikipedia. That was not very helpful, but on the File Formats Archive Team website I found a page about Microsoft Library (*.lib). From there I got the right hint: the file format used is the Relocatable Object Module Format (OMF). That information is now expressed in Magdir/msvc by comment lines like:

    # URL:  https://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B
    #       http://fileformats.archiveteam.org/wiki/Microsoft_Library
    #       http://fileformats.archiveteam.org/wiki/OMF
    # Ref.: http://mark0.net/download/triddefs_xml.7z
    #       defs/l/lib-msvc.trid.xml
    #       https://pierrelib.pagesperso-orange.fr/exec_formats/OMF_v1.1.pdf

I remember that the same bad recognition problems also existed for OMF object modules (*.o, *.obj). So I looked again at how this was improved in Magdir/xenix for "8086 relocatable (Microsoft)". Because the magic is not very strong, I now put the displaying part inside the subroutine omf-lib, which starts like:

    0           name        omf-lib
    >0          byte        0xF0        Microsoft Visual C/OMF library
    !:mime      application/x-omf-lib
    !:ext       lib
    #>1         uleshort    x           \b, 1st record data length %#x
    >1          uleshort+3  x           \b, page size %u
    >3          ulelong     x           \b, at %#x dictionary
    >7          uleshort    x           with %u blocks
    >(1.s+3)    ubyte       x           \b, 2nd record
    >(1.s+3)    ubyte       !0x80       (type %#x)

The first byte holds the OMF record type. For OMF libraries it has the value F0h (Library Header Record). It is followed by the data length of the first record, stored as a 2-byte little-endian integer. Adding three gives the length of the whole first record; for libraries this is apparently called the page size.
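The header fields that the omf-lib subroutine reads can be collected by a small C sketch like the one below. The struct and function names are hypothetical and not part of file(1). Besides the offsets used by the magic (type at 0, length at 1, dictionary offset at 3, dictionary blocks at 7, flags at 9), it applies the page-size rule from the OMF documentation discussed in this mail: a power of two from 16 to 32768.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of an OMF Library Header Record (type F0h). Hypothetical names. */
struct omf_lib_header {
    unsigned page_size;     /* record data length + 3 */
    uint32_t dict_offset;   /* little-endian 32-bit value at offset 3 */
    unsigned dict_blocks;   /* little-endian 16-bit value at offset 7, 512-byte blocks */
    unsigned flags;         /* byte at offset 9, 1 = case sensitive */
};

static unsigned le16(const unsigned char *p) { return p[0] | (unsigned)p[1] << 8; }

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)le16(p) | (uint32_t)le16(p + 2) << 16;
}

/* Returns 0 on success, -1 if this does not look like an OMF library header. */
int omf_lib_parse(const unsigned char *buf, size_t len, struct omf_lib_header *h)
{
    if (len < 10 || buf[0] != 0xF0)          /* Library Header Record */
        return -1;
    h->page_size = le16(buf + 1) + 3;
    h->dict_offset = le32(buf + 3);
    h->dict_blocks = le16(buf + 7);
    h->flags = buf[9];
    /* the page size must be a power of two from 16 (2**4) to 32768 (2**15) */
    if (h->page_size < 16 || h->page_size > 32768 ||
        (h->page_size & (h->page_size - 1)) != 0)
        return -1;
    return 0;
}
```

For a header beginning F0 0D 00 00 12 00 00 01 00 00, this yields page size 16, dictionary offset 0x1200 and 1 dictionary block, matching the CATDB.LIB figures reported later in this mail.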
According to the documentation, the page size must be a power of two (page size = 2**n). The lowest possible value is sixteen (16 = 2**4) and the highest possible value is 32768 (= 2**15). Printed in hexadecimal, the record length (page size minus 3) therefore looks like ???Dh, so its lowest nibble always has the value D.

Instead of the generic MIME type application/octet-stream I show a user-defined one, "application/x-omf-lib", and the displayed file name extension is the 3-byte string "lib".

At offset 7 the dictionary size is stored as a 2-byte integer giving the number of 512-byte blocks. The documentation says this should be a prime number because of the hashing algorithm. This holds for many of my inspected examples, but not for all. It is not explicitly written, but since the dictionary size is a multiple of 512 (hexadecimal 200), it obviously makes sense that the dictionary itself starts on such a boundary. So for the dictionary offset, stored as a 4-byte integer at offset 3, the lowest byte is then always 0.

Now comes the trick to improve recognition. With the help of the stored record length it is possible to jump to the second record (and further) and inspect it. According to the specification the second record is a Library Module Header Record (LHEADR=82h), but in my inspected examples I found Translator Header Records (THEADR=80h). According to the documentation this does no harm, because THEADR and LHEADR records are handled identically.

So now it is time to interpret and update the magic lines in Magdir/msvc. The identification happens there with three lines like:

    0   string  \360\015\000\000    Microsoft Visual C library
    0   string  \360\075\000\000    Microsoft Visual C library
    0   string  \360\175\000\000    Microsoft Visual C library

The TrID tool behaves similarly. It also checks the first 4 bytes, but it ignores the value of the second byte completely. Instead it checks for the presence of the 4-byte string DATA.
The first line is for examples with page size 16, the second for page size 64 and the third for page size 128. For examples with page size 512 the record length is 01FDh, so the third byte has the value 01 instead of 00, and such examples are not recognized. One possible solution would be to add lines for up to 12 variants (page sizes 2**4 to 2**15), which look like:

    0   string  \360\015\000\000
    >0  use     omf-lib

Instead, the starting lines now become:

    0           ubelong&0xFF0f80ff  =0xF00d0000
    >(1.s+3)    ubyte&0xFD          =0x80
    >>0         use                 omf-lib

The first test line is now expressed in a more general way. It tests that the record type is Library Header Record (F0 hexadecimal), that the record length is a hexadecimal number like ???D (with the top bit of its upper byte clear, since the length is at most 7FFDh), and that the lowest byte of the dictionary offset is 0, as it is for an offset that is a multiple of 512 (?????200 hexadecimal and so on). So the strength of the first test is still 70, but now only about two and a half bytes are used for recognition. The second line checks that OMF record number 2 has a valid type (Translator Header Record THEADR=80h or LHEADR=82h), so now about three and a half bytes are used for recognition. I hope that this is sufficient. If not, more tests for OMF characteristics must be added; such characteristics are displayed by the subroutine.

With the help of the stored dictionary offset I am able to inspect the dictionary itself with lines like:

    >(3.l)      ubequad     x       (%#16.16llx...)

According to the documentation the first 37 bytes correspond to the 37 buckets. They are followed by the FFLAG byte; if it has the value 255, there is no space left. So I show that value with lines like:

    >(3.l+37)   ubyte       <0xFF   (FFLAG=%#x)
    >(3.l+37)   ubyte       =0xFF   (FFLAG=full)

Then come the dictionary entries in the following form: first a length byte for the symbol, then the text bytes of the symbol, and then two bytes specifying the page number. So the first dictionary entry is shown with lines like:

    >(3.l+38)   pstring     x       1st entry %s
    >>&0        uleshort    x       in page %u

For the library ZLIB.LIB I get "zlibCompileFlags_ in page 1" here.
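As a cross-check of the masks, here is a minimal C sketch of the two new start lines and of the first-dictionary-entry lookup. It is an illustration under assumptions, not the patch itself: the function names are hypothetical, and the dictionary layout (37 bucket bytes, FFLAG at +37, first entry as a Pascal string at +38 followed by a little-endian page number) is taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static unsigned le16(const unsigned char *p) { return p[0] | (unsigned)p[1] << 8; }
static uint32_t le32(const unsigned char *p) { return le16(p) | (uint32_t)le16(p + 2) << 16; }
static uint32_t be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* C equivalent of:
 *   0         ubelong&0xFF0f80ff  =0xF00d0000
 *   >(1.s+3)  ubyte&0xFD          =0x80
 * type F0h, length ending in nibble D with the top bit clear, lowest
 * dictionary-offset byte 0, and THEADR (80h) or LHEADR (82h) at page size. */
int omf_lib_magic(const unsigned char *buf, size_t len)
{
    size_t second;

    if (len < 4 || (be32(buf) & 0xFF0F80FFu) != 0xF00D0000u)
        return 0;
    second = (size_t)le16(buf + 1) + 3;      /* offset of the 2nd record */
    return second < len && (buf[second] & 0xFD) == 0x80;
}

/* First dictionary entry: FFLAG at dictionary + 37, then a Pascal string
 * at dictionary + 38 followed by a little-endian page number.
 * Returns the page number, or -1 if something lies outside the buffer. */
long omf_first_dict_entry(const unsigned char *buf, size_t len,
                          char *name, size_t name_len, unsigned *fflag)
{
    size_t dict, n;

    if (len < 10)
        return -1;
    dict = le32(buf + 3);
    if (dict > len || len - dict < 39)
        return -1;
    *fflag = buf[dict + 37];                 /* 0xFF means the block is full */
    n = buf[dict + 38];
    if (len - dict - 39 < n + 2 || n + 1 > name_len)
        return -1;
    memcpy(name, buf + dict + 39, n);
    name[n] = '\0';
    return (long)le16(buf + dict + 39 + n);
}
```

Masking `buf[second]` with 0xFD is what lets one comparison accept both 80h and 82h, since the two values differ only in bit 1.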
This may help to identify unknown libraries.

After the dictionary size a library flags byte is stored. According to the documentation, the value one means case sensitive, and all other bits are reserved for future use and should be 0. But for the old MOUSE.LIB I found the unexpected value 0x4d here. So this value probably cannot be used as a characteristic; it is only shown, with lines like:

    >9          ubyte       =1      case sensitive
    >9          ubyte       >1      \b, flags %#x

The second record is a Translator Header Record (THEADR=80h) or a Library Module Header Record (LHEADR=82h). Its content consists of just one Pascal string followed by a checksum byte. This information is shown with lines like:

    >(1.s+6)    pstring     x       "%s"
    #>>&0       ubyte       x       checksum %#x

Often this string is the source name of the library module, like "dos\crt0.asm" in mlibce.lib, "QB4UTIL.ASM" in QB4UTIL.LIB or "C:\Documents and Settings\Allan Campbell\My Documents\FDOSBoot\ zlib\zutil.c" in ZLIB.LIB. But the name can also be specified directly by the programmer via the TITLE pseudo-operand or the assembler NAME directive. So sometimes I find titles like "87INIT" in FP87.LIB, "ACOSASIN" in MATHC.LIB or "Copyright" in calc-bcc.lib.

Next comes the third record. Its inspection starts with lines like:

    >>&1        ubyte       x       \b, 3rd record
    #>>&1       ubyte       x       (type %#x)

In my inspected examples the third record was a List of Names Record (LNAMES=96h) or a Comment Record (COMENT=88h). The LNAMES branch is handled by lines like:

    >>&1        ubyte       =0x96   LNAMES
    >>>&0       uleshort    >2
    >>>>&0      ubyte       =0
    >>>>>&0     pstring     x       %s
    >>>>>>&0    pstring     x       %s
    >>>>>>>&0   ubyte       <44
    >>>>>>>>&-1 pstring     x       %s

To display only meaningful content, some checks must be done. If the record length is too low (lower than three), there is no content. The Pascal strings themselves can also be empty; this is often the case for the first LNAMES string. The names are used as segment, class, group, overlay and selector names.
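A C sketch of walking from the second record to the third and listing LNAMES entries may make the record layout clearer. Everything here is hypothetical (struct, function names and the limit of 8 names). It assumes the generic OMF record layout of type byte, 16-bit little-endian length and content, where the length includes the trailing checksum byte. Unlike the magic, which skips the checksum with `&1` after the Pascal string, it locates the third record via the stored record length; for a well-formed THEADR/LHEADR both give the same offset.

```c
#include <stddef.h>
#include <string.h>

#define OMF_MAX_NAMES 8
#define OMF_NAME_LEN 256                  /* a Pascal string holds <= 255 bytes */

/* Hypothetical result structure for this sketch. */
struct omf_module_info {
    char title[OMF_NAME_LEN];             /* THEADR/LHEADR name */
    unsigned third_type;                  /* type byte of the 3rd record */
    int nnames;                           /* LNAMES entries read */
    char names[OMF_MAX_NAMES][OMF_NAME_LEN];
};

static unsigned le16(const unsigned char *p) { return p[0] | (unsigned)p[1] << 8; }

/* Copy the Pascal string at buf[pos] into dst; return the offset just past
 * it, or 0 if it would reach `limit` (exclusive). */
static size_t copy_pstring(const unsigned char *buf, size_t pos, size_t limit,
                           char *dst)
{
    size_t n;

    if (pos >= limit)
        return 0;
    n = buf[pos];
    if (n > limit - pos - 1)
        return 0;
    memcpy(dst, buf + pos + 1, n);
    dst[n] = '\0';
    return pos + 1 + n;
}

/* Read the 2nd record at `off` (the page size) and, if the 3rd record is
 * LNAMES (96h), up to OMF_MAX_NAMES of its names. Returns 0 or -1. */
int omf_read_module(const unsigned char *buf, size_t len, size_t off,
                    struct omf_module_info *info)
{
    size_t end, pos, next, limit;
    unsigned reclen;

    info->title[0] = '\0';
    info->third_type = 0;
    info->nnames = 0;
    if (off > len || len - off < 3 || (buf[off] & 0xFD) != 0x80)
        return -1;                        /* neither THEADR nor LHEADR */
    reclen = le16(buf + off + 1);
    if (reclen < 1 || len - off - 3 < reclen)
        return -1;
    end = off + 3 + reclen;               /* start of the 3rd record */
    if (copy_pstring(buf, off + 3, end - 1, info->title) == 0)
        return -1;
    if (len - end < 3)
        return 0;                         /* no complete 3rd record header */
    info->third_type = buf[end];
    if (info->third_type != 0x96)
        return 0;                         /* e.g. COMENT (88h): not decoded here */
    reclen = le16(buf + end + 1);
    if (reclen < 1 || len - end - 3 < reclen)
        return -1;
    limit = end + 2 + reclen;             /* offset of the LNAMES checksum */
    for (pos = end + 3; pos < limit && info->nnames < OMF_MAX_NAMES; pos = next) {
        next = copy_pstring(buf, pos, limit, info->names[info->nnames]);
        if (next == 0)
            return -1;
        info->nnames++;
    }
    return 0;
}
```

An empty first name, as described above for many LNAMES records, simply comes out as an empty string in `names[0]`.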
In the LNAMES records of my examples I get typical 4-byte strings like CODE (mwlibc.lib) and DATA (mwlibc.lib), and longer strings like _TEXT32 (JMPPM32.LIB), _OVLCODE (WOVL.LIB) and DGROUP (MOUSE.LIB). So here we find the word DATA that is checked by the TrID definition.

Now comes the problem with the naming used. My oldest example, MOUSE.LIB, is dated September 1984. According to Wikipedia, the first Visual C++ compiler suite appeared in February 1993. So to be generally true, the word Visual would have to be removed from the phrase "Microsoft Visual C library". According to the reference site about Microsoft Library, such libraries are compiled from source code (BASIC, C, Pascal, etc.). That can be verified with an example like QB4UTIL.LIB: its second record name is "QB4UTIL.ASM", which means the source was assembler code. So the uppercase letter C in the phrase would have to be expanded to something like "C, Assembler, Pascal, BASIC", or removed. This library format was also used not only by Microsoft but by completely different companies like Borland. This can be seen in the example CATDB.LIB, where the third record is a translator comment like "TC86 Borland Turbo C++ 3.00". So the describing phrase shrinks down to the single word library, and the correct describing text would be something like "relocatable Object Module Format (OMF) library". For many examples the old description is correct, so to avoid a totally different look I chose the describing text "Microsoft Visual C/OMF library".

Some libraries are misidentified by Magdir/database as "dBase III DBT". Unfortunately xBase memo files have no strong magic, but luckily the displaying part is done by the subroutine dbase3-memo-print. At the end of that subroutine the first memo item is shown by a line like:

    >512        string      >\0     \b, 1st item "%s"

For real examples I get ASCII text like "WHAT IS XBASE" in the example test.dbt, "Borges, Malte" in biblio.dbt or "First memo\032\032" in T2.DBT.
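The extra guard that this mail places in front of dbase3-memo-print (octal tests on the bytes at offsets 512 and 513) can be written as one C predicate. This is a sketch of that part only, with a hypothetical function name; the preceding dBase III header tests from Magdir/database are not reproduced. It shows why an OMF library with page size 512 is rejected: its THEADR type byte 80h at offset 512 fails the new upper bound.

```c
#include <stddef.h>

/* Sketch of the guard before "use dbase3-memo-print":
 *   512 ubyte >037    first memo character is not a control character
 *   512 ubyte <0200   new: and below 80h, which rejects an OMF THEADR
 *   513 ubyte >037    second memo character is not a control character
 * The octal constants are kept as in the magic: 037 = 1Fh, 0200 = 80h. */
int dbase3_memo_plausible(const unsigned char *buf, size_t len)
{
    if (len < 514)
        return 0;
    return buf[512] > 037 && buf[512] < 0200 && buf[513] > 037;
}
```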
Before calling dbase3-memo-print, the first and second characters of a possible first memo item were tested to be not "too low" by 2 lines like:

    >>>>>>>>>>>512      ubyte   >037
    >>>>>>>>>>>>513     ubyte   >037
    >>>>>>>>>>>>>0      use     dbase3-memo-print

So the bad examples AI070GEP.EPS and gluon-ffhat-1.0-tp-link-tl-wr1043n-nd-v2-sysupgrade.bin were skipped. To also skip some Microsoft Visual C/OMF libraries (like BZ2.LIB, WATTCPWL.LIB and ZLIB.LIB), I inserted one additional test line requiring the first character to be "not too high". For libraries with page size 512 the second record starts at offset 512 with its record type byte (80h=THEADR), which can be misinterpreted as a dBase memo first item starting with \200:

    >>>>>>>>>>>512      ubyte   >037
    >>>>>>>>>>>>512     ubyte   <0200
    >>>>>>>>>>>>>513    ubyte   >037
    >>>>>>>>>>>>>>0     use     dbase3-memo-print

Unfortunately most relocatable Object Module Format (OMF) libraries are also misidentified as "SysEx File" by Magdir/sysex, because the Library Header Record type byte (F0h) at the beginning can be interpreted as the StartSysEx byte (F0h). The first record length of the library is then interpreted as the MIDI vendor ID. So all libraries with record length 1Dh (+3 = page size 32) are described with the MIDI vendor name "Inventronics", and all libraries with record length 0Dh (+3 = page size 16) with the MIDI vendor name "ADA". For libraries with page size 512 (record length 1FDh) the second byte is FDh, so its upper bit is set, which is never the case for a MIDI SysEx vendor ID. Such libraries are therefore not misidentified by the magic lines in Magdir/sysex. The description happens in lines that look like:

    0       ubeshort&0xFF80     0xF000      SysEx File -
    >1      byte                0x01        Sequential
    >1      byte                0x02        IDP
    >1      byte                0x03        OctavePlateau
    ..
    >1      byte                0x0d        ADA
    ..
    >1      byte                0x1d        Inventronics
    >1      byte                0x57        Acoustic tech. lab.
So this now becomes:

    0       ubeshort&0xFF80     0xF000
    #!:strength +0
    >2      search/11           \xF7
    >>0     use                 midi-sysex

    0       name                midi-sysex
    #>1     ubyte               x           SysEx File -
    >1      ubyte               x           MIDI audio System Exclusive (SysEx) message -
    !:mime  audio/x-syx
    !:ext   syx/sysex
    >1      byte                0x01        Sequential
    >1      byte                0x02        IDP
    ...
    >1      byte                0x66        MIDI Emulator

First I put the displaying part inside the subroutine midi-sysex. After the test for the StartSysEx byte and the unused upper bit of the vendor ID, it is possible to add more test lines (as I do later to distinguish SysEx from OMF libraries) or to change the unspecific test. With the --list option I looked at the reported strength of the patterns for the LIB and SYX examples. The MIDI System Exclusive (SysEx) messages with strength=50 come after Microsoft Visual C library with strength=70. I think this is OK, but if not, the total strength value can be raised or lowered by adding or subtracting integers in order to reorder the description texts.

Furthermore I changed the description text "SysEx File", because it irritated me and probably irritates users as well. I read it as "sy sex", which sounds like a sexual practice or behaviour. It is the technical term used and known by audio experts, but not by normal users. It is the abbreviation of "System Exclusive", so the inspected files are System Exclusive messages that control MIDI audio devices. So I looked at how others name such files. The page about the SYX file extension on the website fileinfo.com uses the description "MIDI System Exclusive Message". The page "all about SYX files" on filext.com uses the text "SysEx MIDI File". So I finally chose "MIDI audio System Exclusive (SysEx) message". If you move in audio and computer circles you probably know the term MIDI, but my sister, who works in an office at a PC, did not know it. So instead of the generic MIME type application/octet-stream I also added a user-defined one in the main class audio.
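The new SysEx start test can be mirrored in C as below. This is a sketch, not file(1) code: function names are hypothetical, and since the EOX search window appears both as search/11 and as search/12 in this mail, it is kept as an adjustable constant. The vendor table is only a small excerpt of IDs and names mentioned in this mail, not the full Magdir/sysex list. An OMF header such as F0 0D 00 00 passes the first test, because 0Dh has its top bit clear, and is then presumably rejected because no F7h follows within the window.

```c
#include <stddef.h>

/* Search window for the EOX byte; this mail shows both search/11 and
 * search/12, so treat the value as adjustable. */
#define SYSEX_EOX_RANGE 11

/* Returns the offset of the first EOX (F7h) if the buffer passes
 *   0   ubeshort&0xFF80  0xF000    (StartSysEx F0h, vendor ID below 80h)
 *   >2  search/11        \xF7      (EOX shortly after)
 * and -1 otherwise. */
long sysex_eox_offset(const unsigned char *buf, size_t len)
{
    if (len < 2 || buf[0] != 0xF0 || (buf[1] & 0x80) != 0)
        return -1;
    for (size_t i = 2; i - 2 < SYSEX_EOX_RANGE && i < len; i++)
        if (buf[i] == 0xF7)
            return (long)i;
    return -1;
}

/* A few one-byte manufacturer IDs named in this mail (an excerpt only).
 * Returns "" for IDs outside this excerpt. */
const char *sysex_vendor(unsigned id)
{
    switch (id) {
    case 0x01: return "Sequential Circuits";
    case 0x0d: return "ADA Signal Processors Inc.";
    case 0x1d: return "Inventronics";
    case 0x66: return "MIDI Emulator";
    case 0x7e: return "UNIVERSAL";
    case 0x7f: return "universal real time";
    default:   return "";
    }
}
```

The returned EOX offset corresponds to the "at N EOX" control output that the subroutine prints at its end.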
According to the page about "System Exclusive Files" on onsongapp.com, the second file extension sysex can also be used instead of the 3-byte SYX.

According to the MIDI file specification, the StartSysEx byte is followed by n data bytes, and then by a required "End of Exclusive" (EOX=F7h) byte. While searching the net for information I read that some badly behaved software does not terminate the data bytes, but in my dozen inspected examples the terminating byte was present and the EOX byte occurred only a few bytes later. So the misidentified OMF libraries are skipped by a second additional test line that looks like:

    >2      search/12           \xF7

I do not know whether this is always true, but I think it is better to skip many misidentified files and perhaps miss a few SYX examples. So for control reasons, at the end of the subroutine I display information about the EOX byte (F7h) with lines like:

    >4      ubyte               0xF7        \b, at 4 EOX
    >5      ubyte               0xF7        \b, at 5 EOX
    ...
    >13     ubyte               0xF7        \b, at 13 EOX
    >14     ubyte               0xF7        \b, at 14 EOX

For many examples the vendor ID name, like Apple or Yamaha, is shown after the hyphen by inspecting the next 1 or 3 bytes. Unfortunately, for a few samples like example-1manband.syx, example-syxcom.syx and example-webmidijs.syx nothing is shown there. Looking at Magdir/sysex as part of file version 5.41, I see that this text file has version number 1.10 and is dated April 2019. On the website midi.org I found a page with the assigned manufacturer MIDI SysEx ID numbers dated March 2021, and on GitHub I found a CSV table with MIDI SysEx manufacturer IDs. From these sources I added lines for the unrecognized IDs, but I only check the 1-byte manufacturer numbers. This additional part looks like:

    >1      byte                0x00        ID EXTENSIONS
    >1      byte                0x13        Digidesign Inc.
    >1      byte                0x1e        Key Concepts
    >1      byte                0x20        Passac
    ..
    >1      byte                0x66        MIDI Emulator
    >1      byte                0x7D        PROTOTYPING
    >1      byte                0x7E        UNIVERSAL
    >1      byte                0x7F        universal real time

Then there are a few entries where the manufacturer name in the newer documentation is different.
I do not know the reasons, but probably the company was bought by another one and so the name changed. Such changes are expressed by lines like:

#>1 byte 0x03 OctavePlateau
>1 byte 0x03 Voyetra Turtle Beach
#>1 byte 0x09 Gulbransen
>1 byte 0x09 MIDI9
#>1 byte 0x26 Solton
>1 byte 0x26 Ketron s.r.l.
#>1 byte 0x31 Jomox
>1 byte 0x31 Viscount International Spa

In some cases the ID name in the current lines is an abbreviation, or such a general term, that it points the reader in the wrong direction. Here are some illustrations. For me Garfield is a comic cat, and files with ID 0Eh are described as "SysEx File - Garfield". To me the second phrase sounds like the main part, so I read this as the SysEx of the Garfield cat. With the full name from newer documents, "Garfield Electronics", it becomes clear that this impression is wrong. Examples with ID 2Bh are described as "SysEx File - - SSL". For me this sounds like a module or method of the SSL library. With the full name "Solid State Logic Organ Systems" this misinterpretation vanishes. Examples with ID 0Dh are described as "SysEx File - ADA". When I hear ADA I think of the Ada programming language, so this sounds like a module or format belonging to Ada. The full name "ADA Signal Processors Inc." dispels this misinterpretation. Often an appended Ltd, GmbH or Inc makes it clear that we are talking about company names. Such changes are now expressed by lines like:

#>1 byte 0x01 Sequential
>1 byte 0x01 Sequential Circuits
#>1 byte 0x05 Passport
>1 byte 0x05 Passport Designs
#>1 byte 0x06 Lexicon
>1 byte 0x06 Lexicon Inc.
#>1 byte 0x0a AKG
>1 byte 0x0a AKG Acoustics
#>1 byte 0x0d ADA
>1 byte 0x0d ADA Signal Processors Inc.
#>1 byte 0x0e Garfield
>1 byte 0x0e Garfield Electronics
#>1 byte 0x19 Harmony
>1 byte 0x19 Harmony Systems
#>1 byte 0x21 SIEL
>1 byte 0x21 Proel Labs (SIEL)
#>1 byte 0x2b SSL
>1 byte 0x2b Solid State Logic Organ Systems

After applying the modifications above with the patches file-5.41-msvc-lib.diff, file-5.41-database-lib.diff and file-5.41-sysex-lib.diff, I get more correct output with more details, like:

CATDB.LIB: Microsoft Visual C/OMF library, page size 16, at 0x1200 dictionary with 1 block (FFLAG=0x67) 1st entry CATGETS! in page 57, 2nd record "DB", 3rd record COMMENT class=0 Translator "TC86 Borland Turbo C++ 3.00"
DBFNTX.LIB: Microsoft Visual C/OMF library, page size 32, at 0x18200 dictionary with 1 block (FFLAG=0x37) 1st entry dbfntx1! in page 1 case sensitive, 2nd record "C:\XHARBOUR\SRC\SOURCE\RDD\DBFNTX\dbfntx1.c", 3rd record COMMENT Preserved class=0xaa
FLIB7M.LIB: Microsoft Visual C/OMF library, page size 512, at 0x44a00 dictionary with 31 blocks (FFLAG=0xf7) 1st entry IF@XQABS in page 41, 2nd record "fsystem", 3rd record COMMENT Preserved class=0xa1
FP87.LIB: Microsoft Visual C/OMF library, page size 16, at 0xa00 dictionary with 1 block (FFLAG=0x60) 1st entry 87INIT! in page 1, 2nd record "87INIT", 3rd record LNAMES
JMPPM32.LIB: Microsoft Visual C/OMF library, page size 16, at 0x600 dictionary with 2 blocks (FFLAG=0x23) 1st entry __movehigh@0 in page 47, 2nd record "getcmdl.asm", 3rd record LNAMES _TEXT32 CODE
LIBFL.LIB: Microsoft Visual C/OMF library, page size 16, at 0x600 dictionary with 2 blocks (FFLAG=0x34) 1st entry yy_flex_free in page 1 case sensitive, 2nd record "liballoc.obj", 3rd record COMMENT Preserved class=0xa1 New OMF extensions n=3 IBM style
MOUSE.LIB: Microsoft Visual C/OMF library, page size 512, at 0x800 dictionary with 1 block (FFLAG=0x26) 1st entry MOUSE in page 1, flags 0x4d, 2nd record "MOUSE", 3rd record LNAMES CODE DATA DGROUP
PRESET_6.SYX: MIDI audio System Exclusive (SysEx) message - MIDI Emulator, at 4 EOX
QB4UTIL.LIB: Microsoft Visual C/OMF library, page size 16, at 0x400 dictionary with 1 block (FFLAG=0x38) 1st entry ASCIIDISPLAY in page 1, 2nd record "QB4UTIL.ASM", 3rd record COMMENT class=0xa3 LIBMOD qb4util
REXX.LIB: Microsoft Visual C/OMF library, page size 16, at 0x1800 dictionary with 5 blocks (FFLAG=0x96) 1st entry RXEXITREGISTER in page 31, 2nd record "REXXSAA", 3rd record COMMENT class=0xa0 OMF extensions IMPDEF ordinal REXXSAA exported by REXX
T2.DBT: dBase III DBT, next free block index 11, 1st item "First memo\032\032"
WATTCPWL.LIB: Microsoft Visual C/OMF library, page size 512, at 0x46000 dictionary with 23 blocks (FFLAG=full) 1st entry _w32_millisec_clock_ in page 33 case sensitive, 2nd record "C:\watcom\watt32\src\version.c", 3rd record COMMENT Preserved class=0xa1
ZLIB.LIB: Microsoft Visual C/OMF library, page size 512, at 0x2ce00 dictionary with 3 blocks (FFLAG=0xc6) 1st entry zlibCompileFlags_ in page 1 case sensitive, 2nd record "C:\Documents and Settings\Allan Campbell\My Documents\FDOSBoot\zlib\zutil.c", 3rd record COMMENT Preserved class=0xa1
example-1manband.syx: MIDI audio System Exclusive (SysEx) message - UNIVERSAL, at 5 EOX
example-apple.syx: MIDI audio System Exclusive (SysEx) message - Apple, at 4 EOX
example-chromakinetics.syx: MIDI audio System Exclusive (SysEx) message - Roland, at 14 EOX
example-roland.syx: MIDI audio System Exclusive (SysEx) message - Roland, at 13 EOX
example-somascape2.syx: MIDI audio System Exclusive (SysEx) message - UNIVERSAL, at 5 EOX
example-syxcom.syx: MIDI audio System Exclusive (SysEx) message - ID EXTENSIONS, at 10 EOX
example-webmidijs.syx: MIDI audio System Exclusive (SysEx) message - PROTOTYPING, at 8 EOX
example-yamaha.syx: MIDI audio System Exclusive (SysEx) message - Yamaha, at 8 EOX
mlibce.lib: Microsoft Visual C/OMF library, page size 16, at 0x2b400 dictionary with 31 blocks (FFLAG=0xde) 1st entry $i4_m4 in page 8783, 2nd record "dos\crt0.asm", 3rd record COMMENT class=0xa3 LIBMOD crt0
mwlibc.lib: Microsoft Visual C/OMF library, page size 16, at 0x15a00 dictionary with 17 blocks (FFLAG=0x93) 1st entry atoi! in page 458, 2nd record "_exit", 3rd record LNAMES CODE DATA DGROUP
unrar-bcc.lib: Microsoft Visual C/OMF library, page size 16, at 0x400 dictionary with 1 block (FFLAG=0x84) 1st entry RARCloseArchive in page 6, 2nd record "DllGetVersion", 3rd record COMMENT class=0xa0 OMF extensions IMPDEF DllGetVersion exported by unrar3.dll

I hope my three diff files can be applied in a future version of the file utility.
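The vendor names after the hyphen in these SysEx results come from the manufacturer ID byte at offset 1. The following is a minimal C sketch of that lookup, not code from file(1). The helper `sysex_vendor_name` is hypothetical, and the table is a small, deliberately incomplete subset of the names that appear in the patched Magdir/sysex and in the output above. Like the patch, it reports ID 00h only as "ID EXTENSIONS" and does not decode the two bytes that follow it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, incomplete vendor table (names as printed by the magic). */
struct sysex_vendor {
    uint8_t id;
    const char *name;
};

static const struct sysex_vendor vendors[] = {
    {0x00, "ID EXTENSIONS"},       /* escape: real ID continues in bytes 2-3 */
    {0x0d, "ADA Signal Processors Inc."},
    {0x0e, "Garfield Electronics"},
    {0x41, "Roland"},
    {0x43, "Yamaha"},
    {0x66, "MIDI Emulator"},
    {0x7D, "PROTOTYPING"},
    {0x7E, "UNIVERSAL"},
    {0x7F, "universal real time"},
};

/* Returns the vendor name for the one-byte manufacturer ID at offset 1,
 * or NULL if the buffer is not a SysEx message or the ID is unknown. */
static const char *sysex_vendor_name(const uint8_t *buf, size_t len)
{
    if (len < 2 || buf[0] != 0xF0)
        return NULL;
    for (size_t i = 0; i < sizeof vendors / sizeof vendors[0]; i++)
        if (vendors[i].id == buf[1])
            return vendors[i].name;
    return NULL;
}
```

An unknown ID yields NULL here. In file(1), an unmatched byte simply adds no text, which is why some examples showed nothing after the hyphen before the new entries were added.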
With best wishes
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYeN2MAAKCRCv8rHJQhrU
1grVAKC/VBJ1l0qBPrarc1tHKRBYexfoPACcC0mB6HAFzVnujoDxQPLMp29fj08=
=OPSA
-----END PGP SIGNATURE-----
-------------- next part --------------
--- file-5.41/magic/Magdir/database.old 2021-10-18 14:20:03 +0000
+++ file-5.41/magic/Magdir/database 2022-01-01 17:47:22 +0000
@@ -416,7 +416,9 @@
 # skip AI070GEP.EPS by printable 1st character of 1st memo item
 >>>>>>>>>>>512 ubyte >037
+# skip some Microsoft Visual C, OMF library like: BZ2.LIB WATTCPWL.LIB ZLIB.LIB
+>>>>>>>>>>>>512 ubyte <0200
 # skip gluon-ffhat-1.0-tp-link-tl-wr1043n-nd-v2-sysupgrade.bin by printable 2nd character
->>>>>>>>>>>>513 ubyte >037
->>>>>>>>>>>>>0 use dbase3-memo-print
+>>>>>>>>>>>>>513 ubyte >037
+>>>>>>>>>>>>>>0 use dbase3-memo-print
 # dBASE IV DBT with positive block size
 >>>>>>>20 uleshort >0
@@ -441,5 +443,8 @@
 >20 uleshort !0 \b, block length %u
 # dBase III memo field terminated by \032\032
+# like: "WHAT IS XBASE" test.dbt "Borges, Malte" biblio.dbt "First memo\032\032" T2.DBT
 >512 string >\0 \b, 1st item "%s"
+# For DEBUGGING
+#>512 ubelong x \b, 1ST item %#8.8x
 # https://www.clicketyclick.dk/databases/xbase/format/dbt.html
 # Print the information of dBase IV DBT memo file
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-database-lib.diff.sig
Type: application/octet-stream
Size: 779 bytes
Desc: not available
URL: 
-------------- next part --------------
--- file-5.41/magic/Magdir/sysex.old 2020-05-31 10:34:41 +0000
+++ file-5.41/magic/Magdir/sysex 2022-01-16 00:05:47 +0000
@@ -8,11 +8,33 @@
 # https://en.wikipedia.org/wiki/MIDI
-0 ubeshort&0xFF80 0xF000 SysEx File -
-
+# test for StartSysEx byte and upper unsed bit of vendor ID
+0 ubeshort&0xFF80 0xF000
+# MIDI System Exclusive (SysEx) messages (strength=50) after Microsoft Visual C library (strength=70)
+#!:strength +0
+# skip Microsoft Visual C library with page size 16 misidentifed as ADA and
+# page size 32 misidentifed as Inventronics by looking for terminating End Of eXclusive byte (EOX)
+>2 search/12 \xF7
+>>0 use midi-sysex
+# display information about MIDI System Exclusive (SysEx) messages
+0 name midi-sysex
+# https://fileinfo.com/extension/syx
+>1 ubyte x MIDI audio System Exclusive (SysEx) message -
+# Note: file (version 5.41) labeled the above entry as "SysEx File"
+#!:mime application/octet-stream
+!:mime audio/x-syx
+# https://onsongapp.com/docs/features/formats/sysex
+!:ext syx/sysex
+# https://www.midi.org/specifications-old/item/manufacturer-id-numbers
+# https://raw.githubusercontent.com/insolace/MIDI-Sysex-MFG-IDs/master/Sysex%20ID%20Tables/MIDI%20Sysex%20MFG%20IDs.csv
+# SysEx manufacturer ID; originally one byte, but now 0 is used as an escapement to reach the next two
 # North American Group
->1 byte 0x01 Sequential
+#>1 byte 0x01 Sequential
+>1 byte 0x01 Sequential Circuits
 >1 byte 0x02 IDP
->1 byte 0x03 OctavePlateau
+#>1 byte 0x03 OctavePlateau
+>1 byte 0x03 Voyetra Turtle Beach
 >1 byte 0x04 Moog
->1 byte 0x05 Passport
->1 byte 0x06 Lexicon
+#>1 byte 0x05 Passport
+>1 byte 0x05 Passport Designs
+#>1 byte 0x06 Lexicon
+>1 byte 0x06 Lexicon Inc.
 >1 byte 0x07 Kurzweil/Future Retro
@@ -40,8 +62,13 @@
 >1 byte 0x08 Fender
->1 byte 0x09 Gulbransen
->1 byte 0x0a AKG
+#>1 byte 0x09 Gulbransen
+>1 byte 0x09 MIDI9
+#>1 byte 0x0a AKG
+>1 byte 0x0a AKG Acoustics
 >1 byte 0x0b Voyce
 >1 byte 0x0c Waveframe
->1 byte 0x0d ADA
->1 byte 0x0e Garfield
+# not ADA programming language
+#>1 byte 0x0d ADA
+>1 byte 0x0d ADA Signal Processors Inc.
+#>1 byte 0x0e Garfield
+>1 byte 0x0e Garfield Electronics
 >1 byte 0x0f Ensoniq
@@ -61,3 +88,4 @@
 >1 byte 0x18 E-mu
->1 byte 0x19 Harmony
+#>1 byte 0x19 Harmony
+>1 byte 0x19 Harmony Systems
 >1 byte 0x1a ART
@@ -69,3 +97,4 @@
 # European Group
->1 byte 0x21 SIEL
+#>1 byte 0x21 SIEL
+>1 byte 0x21 Proel Labs (SIEL)
 >1 byte 0x22 Synthaxe
@@ -73,3 +102,4 @@
 >1 byte 0x25 Twister
->1 byte 0x26 Solton
+#>1 byte 0x26 Solton
+>1 byte 0x26 Ketron s.r.l.
 >1 byte 0x27 Jellinghaus
@@ -78,4 +108,6 @@
 >1 byte 0x2a JEN
->1 byte 0x2b SSL
->1 byte 0x2c AudioVertrieb
+#>1 byte 0x2b SSL
+>1 byte 0x2b Solid State Logic Organ Systems
+#>1 byte 0x2c AudioVertrieb
+>1 byte 0x2c Audio Veritrieb-P. Struven
@@ -85,3 +117,4 @@
 >1 byte 0x30 Dynacord
->1 byte 0x31 Jomox
+#>1 byte 0x31 Jomox
+>1 byte 0x31 Viscount International Spa
 >1 byte 0x33 Clavia
@@ -204,3 +237,4 @@
 >1 byte 0x47 Akai
->1 byte 0x48 Victor
+#>1 byte 0x48 Victor
+>1 byte 0x48 Victor Company of Japan. Ltd.
 >1 byte 0x49 Mesosha
@@ -211,3 +245,4 @@
 >1 byte 0x51 Fostex
->1 byte 0x52 Zoom
+#>1 byte 0x52 Zoom
+>1 byte 0x52 Zoom Corporation
 >1 byte 0x54 Matsushita
@@ -319,2 +354,76 @@
+# Update: Joerg Jenderek; January 2022
+>1 byte 0x00 ID EXTENSIONS
+>1 byte 0x13 Digidesign Inc.
+>1 byte 0x1e Key Concepts
+>1 byte 0x20 Passac
+>1 byte 0x23 Stepp
+>1 byte 0x2d Neve
+>1 byte 0x2e Soundtracs Ltd.
+>1 byte 0x32 Drawmer
+>1 byte 0x34 Audio Architecture
+>1 byte 0x35 Generalmusic Corp SpA
+>1 byte 0x36 Cheetah Marketing
+>1 byte 0x37 C.T.M.
+>1 byte 0x38 Simmons UK
+>1 byte 0x3a Steinberg
+>1 byte 0x3b Wersi GmbH
+>1 byte 0x3c AVAB Niethammer AB
+>1 byte 0x3d Digigram
+>1 byte 0x3f Quasimidi
+#
+>1 byte 0x40 Kawai Musical Instruments MFG. CO. Ltd
+#>1 byte 0x45 foo
+#>1 byte 0x4a foo
+#>1 byte 0x4d foo
+#>1 byte 0x4f foo
+#>1 byte 0x53 foo
+>1 byte 0x55 Suzuki Musical Instruments MFG. Co. Ltd.
+>1 byte 0x56 Fuji Sound Corporation Ltd.
+#>1 byte 0x58 foo
+>1 byte 0x59 Faith. Inc.
+>1 byte 0x5a Internet Corporation
+#>1 byte 0x5b foo
+>1 byte 0x5c Seekers Co. Ltd.
+#>1 byte 0x5d foo
+#>1 byte 0x5e foo
+>1 byte 0x5f SD Card Association
+# Reserved for other uses for 60H to 7FH
+# URL: https://www.philscomputerlab.com/roland-midi-emulator-project-20.html
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/s/syx--midiemu.trid.xml
+# Note: called by TrID "MIDI Emulator Project SysEx preset command"
+>1 byte 0x66 MIDI Emulator
+# https://electronicmusic.fandom.com/wiki/List_of_MIDI_Manufacturer_IDs
+# Educational, prototyping, test, private use and experimentation
+>1 byte 0x7D PROTOTYPING
+# universal non-real-time (sample dump, tuning table, etc.)
+>1 byte 0x7E UNIVERSAL
+# universal real time (MIDI time code, MIDI Machine control, etc.)
+>1 byte 0x7F universal real time
+# display information about End Of eXclusive byte (EOX=F7)
+#>2 ubyte 0xF7 \b, at 2 EOX
+#>3 ubyte 0xF7 \b, at 3 EOX
+# https://tttapa.github.io/Control-Surface-doc/new-input/Doxygen/d2/d93/SysEx-Send-Receive_8ino-example.html
+>4 ubyte 0xF7 \b, at 4 EOX
+# http://www.1manband.nl/tutorials2/sysex.htm
+>5 ubyte 0xF7 \b, at 5 EOX
+# http://www.somascape.org/midi/tech/mfile.html#sysex
+>6 ubyte 0xF7 \b, at 6 EOX
+#
+>7 ubyte 0xF7 \b, at 7 EOX
+# https://webmidijs.org/forum/discussion/34/how-to-send-or-receive-system-exclusive-messages
+>8 ubyte 0xF7 \b, at 8 EOX
+#
+>9 ubyte 0xF7 \b, at 9 EOX
+# https://www.chd-el.cz/wp-content/uploads/845010_syxcom.pdf
+>10 ubyte 0xF7 \b, at 10 EOX
+# https://stackoverflow.com/questions/52906076/handling-midi-the-input-of-multiple-system-exclusive-messages-in-vb
+>11 ubyte 0xF7 \b, at 11 EOX
+# https://www.2writers.com/eddie/TutSysEx.htm
+>12 ubyte 0xF7 \b, at 12 EOX
+>13 ubyte 0xF7 \b, at 13 EOX
+# http://www.chromakinetics.com/handsonic/rolSysEx.htm
+>14 ubyte 0xF7 \b, at 14 EOX
+#>15 ubyte 0xF7 \b, at 15 EOX
+
 0 string T707 Roland TR-707 Data
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-sysex-lib.diff.sig
Type: application/octet-stream
Size: 2814 bytes
Desc: not available
URL: 
-------------- next part --------------
--- file-5.41/magic/Magdir/msvc.old 2020-05-31 10:34:40 +0000
+++ file-5.41/magic/Magdir/msvc 2022-01-16 00:50:19 +0000
@@ -22,5 +22,158 @@
 #.lib
-0 string \360\015\000\000 Microsoft Visual C library
-0 string \360\075\000\000 Microsoft Visual C library
-0 string \360\175\000\000 Microsoft Visual C library
+# URL: https://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B
+# http://fileformats.archiveteam.org/wiki/Microsoft_Library
+# http://fileformats.archiveteam.org/wiki/OMF
+# Reference: http://mark0.net/download/triddefs_xml.7z/defs/l/lib-msvc.trid.xml
+# https://pierrelib.pagesperso-orange.fr/exec_formats/OMF_v1.1.pdf
+# Update: Joerg Jenderek
+#0 string \360\015\000\000 Microsoft Visual C library
+#0 string \360\075\000\000 Microsoft Visual C library
+#0 string \360\175\000\000 Microsoft Visual C library
+# test for RecordType~LibraryHeaderRecord=0xF0 + RecordLength=???Dh + dictionary offset is multiple of 0x200
+0 ubelong&0xFF0f80ff =0xF00d0000
+# Microsoft Visual C library (strength=70) before MIDI SysEx messages (strength=50) handled by ./sysex
+#!:strength +0
+# test for valid 2nd RecordType~Translator Header Record=THEADR=80h or LHEADR=82h
+>(1.s+3) ubyte&0xFD =0x80
+>>0 use omf-lib
+# display information about Microsoft Visual C/OMF library
+0 name omf-lib
+# RecordType~LibraryHeaderRecord=0xF0
+#>0 byte 0xF0 Microsoft Visual C library
+# the above description was used in file version 5.41
+>0 byte 0xF0 Microsoft Visual C/OMF library
+#>0 byte 0xF0 relocatable Object Module Format (OMF) libray
+#!:mime application/octet-stream
+!:mime application/x-omf-lib
+!:ext lib
+# 1st record data length like 13=0Dh 29=1Dh 61=3Dh 125=7Dh 509=01FDh ... 32765=7FFDh
+#>1 uleshort x \b, 1st record data length %u
+#>1 uleshort x \b, 1st record data length %#x
+# 2**4=16 <= RecordLength+3 = PageSize = 2**n {16 32 512 no examples 64 128 256 1024 2048 ...32768} <= 2**15=32768
+>1 uleshort+3 x \b, page size %u
+# dictionary offset like: 400h 600h a00h c00h 1200h 1800h 2400h 5600h 12800h 19200h 28a00h
+>3 ulelong x \b, at %#x dictionary
+# dictionary block a 512 bytes; the first 37 bytes correspond to the 37 buckets
+#>(3.l) ubequad x (%#16.16llx...)
+# dictionary size; length in 512-byte blocks; a prime number? like:
+# 1 2 3 4 5 6 7 9 11 13 15 16 18 21 22 23 24 25 31 50 53 89 101 117 277
+>7 uleshort x with %u block
+# plurals s
+>7 uleshort >1 \bs
+# If dictionary byte 38 (FFLAG) has the value 255, there is no space left
+>(3.l+37) ubyte <0xFF (FFLAG=%#x)
+>(3.l+37) ubyte =0xFF (FFLAG=full)
+# dictionary entry; length byte of following symbol, the following text bytes of symbol, two bytes specifies the page number
+# like: dbfntx1! DBFNTX.LIB zlibCompileFlags_ ZLIB.LIB atoi! mwlibc.lib
+>(3.l+38) pstring x 1st entry %s
+# like: 1 33 41 47 458 8783
+>>&0 uleshort x in page %u
+# library flags; 0 or 1, but WHAT IS 0x4d in MOUSE.LIB ?
+>9 ubyte >1 \b, flags %#x
+>9 ubyte =1 case sensitive
+# In the library after header comes first object module with a Library Module Header Record (LHEADR=82h)
+# but in examples Translator Header Record (THEADR=80h) which is handled identically
+>(1.s+3) ubyte x \b, 2nd record
+>(1.s+3) ubyte !0x80 (type %#x)
+#>(1.s+4) uleshort x \b, 2nd record data length %u
+# Module name often source name like "dos\crt0.asm" in mlibce.lib or "QB4UTIL.ASM" in QB4UTIL.LIB
+# or "C:\Documents and Settings\Allan Campbell\My Documents\FDOSBoot\zlib\zutil.c" in ZLIB.LIB
+# or title like "87INIT" in FP87.LIB or "ACOSASIN" in MATHC.LIB or "Copyright" in calc-bcc.lib
+>(1.s+6) pstring x "%s"
+# 2nd record checksum
+#>>&0 ubyte x checksum %#x
+# 3rd RecordType: 96h~LNAMES 88h~COMENT
+>>&1 ubyte x \b, 3rd record
+>>&1 ubyte !0x88
+>>>&-1 ubyte !0x96
+# 3rd unusual record type
+>>>>&-1 ubyte x (type %#x)
+# 3rd record is a List of Names Record (LNAMES=96h)
+>>&1 ubyte =0x96 LNAMES
+# LNAMES record length like: 2 15 19
+#>>>&0 uleshort x \b, LNAMES record length %u
+>>>&0 uleshort >2
+# 1st LNAME string length; null is valid; maximal 255
+#>>>>&0 ubyte x 1st LNAME length %u
+>>>>&0 ubyte =0
+# 2nd LNAME length like: 4 7 8 17 31
+#>>>>>&0 ubyte x 2nd LNAME length %u
+# name used for segment, class, group, overlay, etc like:
+# CODE (mwlibc.lib) _TEXT32 (JMPPM32.LIB) _OVLCODE (WOVL.LIB)
+>>>>>&0 pstring x %s
+# 3rd LNAME length like: 4 5
+#>>>>>>&0 ubyte x 3rd LNAME length %u
+# like: DATA (mwlibc.lib) CODE (JMPPM32.LIB) _TEXT (EMU87.LIB)
+>>>>>>&0 pstring x %s
+# maybe 4th LNAME length like: 4 6
+>>>>>>>&0 ubyte <44
+# like: DATA (DEBUG.LIB) DGROUP (mwlibc.lib MOUSE.LIB)
+>>>>>>>>&-1 pstring x %s
+# 3rd record is a COMMENT (Including all comment class extensions)
+>>&1 ubyte =0x88 COMMENT
+# comment record length like: 3 FLIB7M.LIB 1Bh 1Eh 23h 27h 2Bh 30h freetype-bcc.lib
+#>>>&0 uleshort x \b, record length %#x
+# real comment length = record length - 1 (comment type) - 1 (comment Class) - 1 (checksum) -1 (char count)
+# like: 2 LIBFL.LIB 4 "UUID" 5 "dscap" 6 "int386" 7 "qb4util" 8 "AMSGEXIT" 16 REXX.LIB 20 27 35 44 freetype-bcc.lib
+#>>>>&-2 uleshort-4 >0 \b, comment length %u
+# check that record contain at least comment type (1 byte), comment class (1 byte), checksum (1 byte)
+# probably always true
+>>>&0 uleshort >2
+# comment type: 80h~NP~no purge bit 40h~NL~no list bit
+#>>>>&0 ubyte !0 Type %#x
+>>>>&0 ubyte &0x80 Preserved
+# no example
+>>>>&0 ubyte &0x40 NoList
+# comment class like: 0~Translator A0~OMF extensions A3~LIBMOD A1~New OMF extensions AA~UNKNOWN
+>>>>&1 ubyte x class=%#x
+# check that comment record contains at least real content
+>>>>&-2 uleshort >3
+# Translator comment record (0); it may name the source language or translator
+>>>>>&1 ubyte =0 Translator
+#>>>>>>&0 ubyte x Translator length %u
+# like: "TC86 Borland Turbo C 2.01 " (GEMS.LIB) "TC86 Borland Turbo C++ 3.00" (CATDB.LIB)
+>>>>>>&0 pstring x "%s"
+# OMF extensions comment record (A0); first byte of commentary string identifies subtype
+>>>>>&1 ubyte =0xA0 OMF extensions
+# A0 subtype like: 1~IMPDEF
+>>>>>>&0 ubyte !1 subtype %#x
+# Import Definition Record (Comment Class A0, Subtype 01~IMPDEF)
+>>>>>>&0 ubyte 1 IMPDEF
+# ordinal flag; determines form of Entry Ident field. If nonzero (seems to be 1) Entry is ordinal
+>>>>>>>&0 ubyte !0 ordinal
+# like: IMPORT.LIB DOSCALLS.LIB mlibw.lib mwinlibc.lib REXX.LIB
+>>>>>>>>&-1 ubyte >1 %u
+# Internal Name in count, char string format; module name for the imported symbol
+# like: 7 "REXXSAA" 9 11 13 14 15 16 20 21 26 "_Z10_clip_linePdS_S_S_dddd"
+#>>>>>>>&1 ubyte x internal name length %u
+# internal module name like: _DllGetVersion DllGetVersion BezierTerminationTest Copyright
+>>>>>>>&1 pstring x %s
+# module name in count, char string format; DLL name that supplies a matching export symbol
+# like: jpeg62.dll (jpeg-bcc.lib) unrar3.dll (unrar-bcc.lib) REXX (REXX.LIB)
+>>>>>>>>&0 pstring x exported by %s
+# Entry Ident; 16-bit if ordinal flag != 0 or imported name in count, char string format if ordinal flag = 0
+# like: \0 (calc-bcc.lib) DllGetVersion (libtiff-bcc.lib) UTF8ToHtml (libxml2-bcc.lib) xslAddCall (libxslt-bcc.lib)
+#>>>>>>>>>&0 pstring >\0 entry ident %s
+# "New OMF" extensions comment (A1); indicate version of symbolic debug information
+# like: LIBFL.LIB
+>>>>>&1 ubyte =0xA1 New OMF extensions
+# symbolic debug information version n
+>>>>>>&0 ubyte x n=%u
+# symbolic debug information style like: HL~IBM PM Debugger style (LIBFL.LIB) DX~AIX style CV~Microsoft symbol and type style
+>>>>>>>&0 string HL IBM style
+>>>>>>>&0 string DX AIX style
+>>>>>>>&0 string CV Microsoft style
+# LIBMOD comment record (A3) used only by the librarian
+# Microsoft extension added for LIB version 3.07 in macro assembler (MASM 5.0)
+>>>>>&1 ubyte =0xA3 LIBMOD
+# The A3 LIBMOD record contains only the ASCII string of the module name in count char format
+#>>>>>>&0 ubyte x LIBMOD length %u
+# LIBMOD comment record module name without path and extension like:
+# qb4util (QB4UTIL.LIB) affaldiv (libh.lib) crt0 (slibc.lib) clipper (DDDRAWS.LIB) dinpdev (DINPUTS.LIB) UUID (UUID.LIB)
+>>>>>>&0 pstring x %s
+# GRR: WHAT iS THAT? AA foo comment record
+#>>>>>&1 ubyte =0xAA AA-comment
+# like: OS220
+#>>>>>>&0 string x what=%-5.5s
+#
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-msvc-lib.diff.sig
Type: application/octet-stream
Size: 3642 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lib_syx-trid-v.txt.gz
Type: application/x-gzip
Size: 681 bytes
Desc: not available
URL: 

From christos at zoulas.com  Mon Jan 17 17:02:47 2022
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 17 Jan 2022 12:02:47 -0500
Subject: [File] [PATCH] Magdir/ole2compounddocs for "newer" Adobe PageMaker
In-Reply-To: <701dba45-e02a-e297-7f39-916c2df41bf7@gmx.net>
References: <701dba45-e02a-e297-7f39-916c2df41bf7@gmx.net>
Message-ID: <34469F78-09E9-4AB2-BCB6-35A2D5504B4E@zoulas.com>

Committed, thanks!

christos

> On Jan 11, 2022, at 3:24 PM, Jörg Jenderek wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> some days ago i send patch for "older" Aldus/Adobe PageMaker
> documents, which is accepted and now included inside
> Magdir/wordprocessors. Now i check "newer" Adobe PageMaker documents.
> The documents and templates are files with file name extensions
> like PM6 P65 PMD PT6 T65 PMT.
> 
> When running the file command version 5.41 with the -e cdf option on such documents, I get output like:
> 
> 02TEMPLT.T65: OLE 2 Compound Document, v3.62, SecID 0,
> 2 FAT sectors,
> 0 Mini FAT sector :
> UNKNOWN with names PageMaker
> Charset.pmt: OLE 2 Compound Document, v3.62, SecID 0x66,
> 0 Mini FAT sector :
> UNKNOWN with names PageMaker
> MyPage6.PM6: OLE 2 Compound Document, v3.62, SecID 0x1,
> 0 Mini FAT sector :
> UNKNOWN with names PageMaker
> brochus.pt6: OLE 2 Compound Document, v3.62, SecID 0x1,
> 0 Mini FAT sector :
> UNKNOWN with names PageMaker
> pm-70.pmd: OLE 2 Compound Document, v3.62, SecID 0,
> 0 Mini FAT sector :
> UNKNOWN with names PageMaker
> strategies.p65: OLE 2 Compound Document, v3.62, SecID 0,
> 24 FAT sectors,
> Mini FAT start sector 0x2a,
> 25 Mini FAT sectors :
> UNKNOWN with names PageMaker ObjectPool 1
> 
> Furthermore, with the -i option only the generic application/CDFV2 is shown. With -i and the -e cdf option, the MIME type application/x-ole-storage is shown. With the --extension option, only the 3-byte sequence ??? is shown.
> 
> No official MIME type comes from Microsoft. Blame them. But at least according to the FreeDesktop.org shared MIME database, "application/x-ole-storage" seems to be the most commonly used one. This information can also be found on the reposcope.com website. So I think the file command should also use this term, or at least use the same term for soft and cdf magic. So I changed this MIME type in the current src/readcdf.c. That looked like:
> } else if (ms->flags & MAGIC_MIME_TYPE) {
>         if (file_printf(ms, "application/CDFV2") == -1)
>                 return -1;
> }
> When running the file command with -e soft or no extra option, I get a generic line for all examples like:
> Composite Document File V2 Document, Cannot read section info
> 
> For comparison, I ran the file format identification utility TrID (see https://mark0.net/soft-trid-e.html). It identifies all examples, with low priority, as "Generic OLE2 / Multistream Compound" by docfile.trid.xml. Most examples are described as "Adobe PageMaker document (generic)" with MIME type application/x-pagemaker by pagemaker-generic.trid.xml. The examples are often also described as "Adobe PageMaker document (v6)" by pagemaker-pm6.trid.xml, "Adobe PageMaker document (v6.5)" by pagemaker-pm65.trid.xml and "Page Maker 7 Document" by pmd-pm7.trid.xml, without correct version differentiation. So the 3 file name extensions mentioned, PM6, P65 and PMD, are also not in the right order. Furthermore, the file name extensions for templates (PT6 T65 PMT), with the character T, are missing here (see appended trid-v-pagemaker-new.txt.gz).
> 
> For comparison, I also ran the file format identification utility DROID (see https://sourceforge.net/projects/droid/). It identifies all new PageMaker examples as "Pagemaker Document (Generic)" with MIME type application/vnd.pagemaker by PUID fmt/876, but it only shows 2 extensions, PMD and PMT (see appended DROID-pagemaker-new.csv.gz).
> 
> Luckily, I also found a page about PageMaker on the File Formats Archive Team web site. That information covers both the "old" and the "new" variants. It is expressed by comment lines inside Magdir/ole2compounddocs like:
> # URL: http://fileformats.archiveteam.org/wiki/PageMaker
> # Reference: http://mark0.net/download/triddefs_xml.7z/defs/p
> # pagemaker-generic.trid.xml
> # pagemaker-pm6.trid.xml
> # pagemaker-pm65.trid.xml
> # pmd-pm7.trid.xml
> 
> The PageMaker documents are recognized as "OLE 2 Compound Document" by the starting bytes (\320\317\021\340\241\261\032\341) inside Magdir/ole2compounddocs. Obviously no code fragment exists to do subclass identification, so the examples are described as "UNKNOWN". Furthermore, the examples have no registered root storage object CLSID, or this value is nil. In that case the file command would afterwards display this information with a phrase like ", clsid 0xc0c7266eb98cd311a1c800c04f612452". That means that code must be added in the branch handling CLSID GUID 0. The last entry there was for SoftMaker Presentations or templates (*.prd *.prv) with pictures.
> 
> So I added lines for my inspected examples afterwards. Luckily, the file command prints some directory entry names. In all examples this is the word "PageMaker" encoded as UTF-16. This characteristic is also found in the global strings section of the TrID definition, in a line like:
> P'A'G'E'M'A'K'E'R
> When I extract this stream, for example with Michal Mutl's Structured Storage Viewer, I get real PageMaker content in the "old" format. This is also described in the documentation, and these parts are recognized by Magdir/wordprocessors. So with the first additional line I look for the second directory entry with the UTF-16 encoded name PageMaker. That looks like:
>>>>> 128 lestring16 PageMaker :
> 
> In the second step I must jump to the stream part. Maybe there are more efficient or better ways, but I brute-force search for the start magic of the "old" PageMaker with a line like:
>>>>>> 0 search/0xa900/s \0\0\0\0\0\0\xff\x99
> In the third step I handle this stream part with lines like:
> #>>>>>>&0 use PageMaker
>>>>>>> &0 indirect x
> I first tried to call the subroutine PageMaker from Magdir/wordprocessors directly, but then I got a wrong version. Maybe this is a bug in the file command. When I use the indirect directive instead, I get correct identifications. But I also get an ugly side effect: afterwards, an additional unexpected phrase UNKNOWN0000000000000000 is displayed.
> 
> This was triggered by the part for the remaining non-nil clsid. That was done by lines like:
>>> 88 default x : UNKNOWN
>>>> 80 ubequad !0 \b, clsid %#16.16llx
>>>> 88 ubequad x \b%16.16llx
> This should not happen! I do not know what is wrong here. So I check again for a non-nil GUID. So this now becomes like:
>>> 88 default x
>>>> 88 ubequad !0 : UNKNOWN
>>>>> 80 ubequad !0 \b, clsid %#16.16llx
>>>>> 88 ubequad x \b%16.16llx
> 
> After applying the modifications above with the patches file-5.41-ole2compounddocs-pagemaker.diff and file-5.41-readcdf-mime.diff, and using the newest Magdir/wordprocessors, all my inspected "newer" PageMaker documents are now described with more details. With the -e cdf option this now looks like:
> 02TEMPLT.T65: OLE 2 Compound Document, v3.62, SecID 0,
> 2 FAT sectors,
> 0 Mini FAT sector :
> Adobe PageMaker document, little-endian, version 6.50
> Charset.pmt: OLE 2 Compound Document, v3.62, SecID 0x66,
> 0 Mini FAT sector :
> Adobe PageMaker document, little-endian, version 6.50
> MyPage6.PM6: OLE 2 Compound Document, v3.62, SecID 0x1,
> 0 Mini FAT sector :
> Adobe PageMaker document, little-endian, version 6
> brochus.pt6: OLE 2 Compound Document, v3.62, SecID 0x1,
> 0 Mini FAT sector :
> Adobe PageMaker document, little-endian, version 6
> pm-70.pmd: OLE 2 Compound Document, v3.62, SecID 0,
> 0 Mini FAT sector :
> Adobe PageMaker document, little-endian, version 6.50
> strategies.p65: OLE 2 Compound Document, v3.62, SecID 0,
> 24 FAT sectors,
> Mini FAT start sector 0x2a,
> 25 Mini FAT sectors :
> Adobe PageMaker document, little-endian, version 6.50
> 
> With -e cdf and the --extension option this now looks like:
> 02TEMPLT.T65: p65/t65/pmd/pmt
> Charset.pmt: p65/t65/pmd/pmt
> MyPage6.PM6: pm6/pt6
> brochus.pt6: pm6/pt6
> pm-70.pmd: p65/t65/pmd/pmt
> strategies.p65: p65/t65/pmd/pmt
> 
> I hope my diff files can be applied in a future version of the file utility. Unfortunately, I found no described way to distinguish templates (with their other file name extensions) from pure PageMaker publications. I also found no way to distinguish version 6.5 (*.P65 *.T65) from version 7 (*.PMD *.PMT).
> 
> Check the facts as far as you can. Listen to what scientists and the experts of the departments recommend. Accordingly, the vaccine is the most suitable measure against Corona. Anyone who believes in fake news also storms the Capitol as a mob, mocks science and terrorizes the too-silent majority of the population. Stay healthy.
> 
> Jörg Jenderek
> - --
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYd3nXQAKCRCv8rHJQhrU
> 1nJVAKDWay4r61LNcGvLo/8tNO2b8R/SvgCeIelamPiKS+QVYX0dR78c8xiXUBg=
> =JjJT
> -----END PGP SIGNATURE-----
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 235 bytes
Desc: Message signed with OpenPGP
URL: 

From christos at zoulas.com  Mon Jan 17 17:19:24 2022
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 17 Jan 2022 12:19:24 -0500
Subject: [File] [PATCH] of Magdir/msvc, database, sysex for Microsoft Visual C or OMF library *.LIB
In-Reply-To: 
References: 
Message-ID: <9BFC5AFF-8448-4BF2-8EAD-C1DBA2AC3129@zoulas.com>

Wow, applied thanks!

christos

> On Jan 15, 2022, at 8:34 PM, Jörg Jenderek wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> some days ago I handled some libraries with the file name extension LIB.
> > When running file command version 5.41 on some examples > and related files with --keep-going option i get an output like: > > DBFNTX.LIB: SysEx File - Inventronics > FLIB7M.LIB: data > FP87.LIB: Microsoft Visual C library > SysEx File - ADA > JMPPM32.LIB: Microsoft Visual C library > SysEx File - ADA > MOUSE.LIB: data > PRESET_6.SYX: SysEx File - > QB4UTIL.LIB: Microsoft Visual C library > SysEx File - ADA > T2.DBT: dBase III DBT, > next free block index 11, > 1st item "First memo\032\032" > WATTCPWL.LIB: dBase III DBT, version number 0, > next free block index 130544, > 1st item "\200 " > ZLIB.LIB: dBase III DBT, version number 0, > next free block index 130544, > 1st item "\200M" > example-1manband.syx: SysEx File - > example-apple.syx: SysEx File - Apple > example-somascape2.syx: SysEx File - > example-webmidijs.syx: SysEx File - > example-yamaha.syx: SysEx File - Yamaha > mlibce.lib: Microsoft Visual C library > SysEx File - ADA > mwlibc.lib: Microsoft Visual C library > SysEx File - ADA > > With --extension option wrong 3 byte extension "dbt" or ??? are > displayed and with -i option wrong mime type application/x-dbt or > only generic application/octet-stream is shown. > > For comparison reason i run the file format identification utility > TrID ( See https://mark0.net/soft-trid-e.html). This describes one > SysEx example PRESET_6.SYX as "MIDI Emulator Project SysEx preset > command" by syx--midiemu.trid.xml whereas the file command display no > vendor name for this type (66h). > > The libraries recognised by file command are also described as > "Microsoft Visual C Library" by lib-msvc.trid.xml, but TrID also > describe some examples like DBFNTX.LIB correctly. > (See appended lib_syx-trid-v.txt.gz) > > Luckily TrID tool displays correct file name extension LIB for > inspected libraries and SYX for few SysEx files. > This list with -v option the related URL pointing to used file > information. 
>
> The information from TrID points to the page about Microsoft Visual C++ on Wikipedia. This was not very helpful, but on the File Formats Archive Team website I found a page about Microsoft Library (*.lib). From there I got the right hint that the file format used is the Relocatable Object Module Format (OMF). This information is now expressed in Magdir/msvc by comment lines like:
> # URL: https://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B
> # http://fileformats.archiveteam.org/wiki/Microsoft_Library
> # http://fileformats.archiveteam.org/wiki/OMF
> # Ref.: http://mark0.net/download/triddefs_xml.7z
> # defs/l/lib-msvc.trid.xml
> # https://pierrelib.pagesperso-orange.fr/exec_formats/OMF_v1.1.pdf
>
> I remember that the same recognition problems also exist for OMF object modules (*.o, *.obj), so I looked again at how this was improved in Magdir/xenix for "8086 relocatable (Microsoft)".
>
> Because the magic is not very strong, I now put the display part into the subroutine omf-lib, which starts like:
>
> 0 name omf-lib
>> 0 byte 0xF0 Microsoft Visual C/OMF library
> !:mime application/x-omf-lib
> !:ext lib
> #>1 uleshort x \b, 1st record data length %#x
>> 1 uleshort+3 x \b, page size %u
>> 3 ulelong x \b, at %#x dictionary
>> 7 uleshort x with %u blocks
>> (1.s+3) ubyte x \b, 2nd record
>> (1.s+3) ubyte !0x80 (type %#x)
>
> The first byte stores the OMF record type. For OMF libraries this has the value F0h (Library Header Record). It is followed by the data length of the first record, stored as a 2-byte little-endian integer. Adding three (the type byte plus the two length bytes) gives the length of the whole first record, which for libraries is called the page size. According to the documentation the page size must be a power of two (page size = 2**n). The lowest possible value is sixteen (16 = 2**4) and the highest is 32768 (= 2**15). Printed in hexadecimal, the record length therefore looks like ???Dh, so its lowest nibble (the last hex digit) always has the value D.
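The header fields just described can be decoded with a short C sketch. This is not code from file(1) or from the patch; the struct and function names are made up for illustration, and the sketch only reads the fields named in the mail: the type byte F0h, the record length at offset 1, the dictionary offset at offset 3, and the dictionary block count at offset 7.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the first record (Library Header Record, F0h) of an OMF library. */
struct omf_lib_header {
    unsigned page_size;     /* record data length + 3 */
    uint32_t dict_offset;   /* 32-bit little-endian value at offset 3 */
    unsigned dict_blocks;   /* 16-bit little-endian value at offset 7 */
};

static unsigned le16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 and fills *h if buf starts like an OMF library, -1 otherwise. */
static int omf_lib_parse(const unsigned char *buf, size_t len,
                         struct omf_lib_header *h)
{
    if (len < 9 || buf[0] != 0xF0)
        return -1;
    unsigned page = le16(buf + 1) + 3;   /* type byte + 2 length bytes + data */
    /* The page size must be a power of two from 16 (2**4) to 32768 (2**15);
       that is why the stored record length always ends in hex digit D. */
    if (page < 16 || page > 32768 || (page & (page - 1)) != 0)
        return -1;
    h->page_size = page;
    h->dict_offset = le32(buf + 3);
    h->dict_blocks = le16(buf + 7);
    return 0;
}
```

The power-of-two check is what guarantees the trailing D nibble: 2**n - 3 ends in hex D for every n >= 4.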
> Instead of the generic MIME type application/octet-stream, I show a user-defined one, "application/x-omf-lib", and the displayed file name extension is the 3-byte string "lib".
> At offset 7 the dictionary size is stored as a 2-byte integer giving the number of blocks (of 512 bytes each). The documentation says this should be a prime number because of the hashing algorithm; this holds for many, but not all, of my inspected examples. It is not written explicitly, but since the dictionary size is a multiple of 512 (hexadecimal 200), it obviously makes sense for the dictionary itself to start on such a boundary. So for the dictionary offset, stored as a 4-byte integer at offset 3, the lowest byte is always 0.
>
> Now comes the trick to improve recognition. With the help of the stored record length it is possible to jump to the second record and beyond and inspect it. According to the specification the second record is a Library Module Header Record (LHEADR=82h), but in my inspected examples I found Translator Header Records (THEADR=80h). According to the documentation this does not hurt, because THEADR and LHEADR records are handled identically.
>
> So now it is time to interpret and update the magic lines in Magdir/msvc. The identification happens there by three lines:
> 0 string \360\015\000\000 Microsoft Visual C library
> 0 string \360\075\000\000 Microsoft Visual C library
> 0 string \360\175\000\000 Microsoft Visual C library
>
> The TrID tool behaves similarly. It also checks the first 4 bytes, but it ignores the value of the second byte completely; instead it checks for the existence of the 4-byte string DATA.
>
> The first line is for examples with page size 16, the second for page size 64 and the third for page size 128. For examples with page size 512 the record length is 01FDh, so the third byte has the value 01 instead of 00, and such examples are not recognized.
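The three fixed prefixes above only cover page sizes 16, 64 and 128. The generalized first test proposed in this patch (`0 ubelong&0xFF0f80ff =0xF00d0000`, followed by `(1.s+3) ubyte&0xFD =0x80`) can be mirrored in C to see exactly which header bits it constrains. This is an illustrative sketch with invented function names, not file(1) source.

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian 32-bit read, matching magic's "ubelong". */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * First test: byte 0 == F0h, low nibble of byte 1 == Dh (record length ends
 * in hex D), top bit of byte 2 clear (record length below 8000h), and
 * byte 3 == 0 (lowest byte of the dictionary offset).
 */
static int omf_lib_first_test(const unsigned char *buf, size_t len)
{
    return len >= 4 && (be32(buf) & 0xFF0F80FFu) == 0xF00D0000u;
}

/* Second test: the record at (1.s+3) must be THEADR (80h) or LHEADR (82h). */
static int omf_lib_second_test(const unsigned char *buf, size_t len)
{
    if (len < 3)
        return 0;
    size_t next = ((size_t)buf[1] | ((size_t)buf[2] << 8)) + 3;
    return next < len && (buf[next] & 0xFD) == 0x80;
}
```

Masking the second-record type with 0xFD folds 80h and 82h together, because the two values differ only in bit 1.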
> One possible solution would be to add lines for up to 12 variants, which look like:
> 0 string \360\015\000\000
>> 0 use omf-lib
>
> Instead, the starting lines now become:
> 0 ubelong&0xFF0f80ff =0xF00d0000
>> (1.s+3) ubyte&0xFD =0x80
>>> 0 use omf-lib
>
> The first test line is now expressed in a more general way. It tests that the record type is Library Header Record (F0 hexadecimal), that the record length is a hexadecimal number like ???D below 8000h, and that the lowest byte of the dictionary offset is 0, as it is for offsets that are multiples of 512 (?????200 hexadecimal and so on). The strength of the first test is still 70, but now only about 2 and a half bytes are used for recognition. The second line checks OMF record number 2 for a valid type (Translator Header Record THEADR=80h or LHEADR=82h), so now about three and a half bytes are used for recognition. I hope that this is sufficient. If not, more tests for OMF characteristics, such as those displayed by the subroutine, must be used.
>
> With the help of the stored dictionary offset I am able to inspect the dictionary itself by lines like:
>> (3.l) ubequad x (%#16.16llx...)
> According to the documentation, the first 37 bytes correspond to the 37 buckets. Afterwards the FFLAG byte is stored. If this has the value 255, there is no space left. So I show that value by lines like:
>> (3.l+37) ubyte <0xFF (FFLAG=%#x)
>> (3.l+37) ubyte =0xFF (FFLAG=full)
> Afterwards come the dictionary entries in the following form: first a length byte for the following symbol, then the text bytes of the symbol, and then two bytes specifying the page number. So the first dictionary entry is shown by lines like:
>> (3.l+38) pstring x 1st entry %s
>>> &0 uleshort x in page %u
> So for the library ZLIB.LIB I get "zlibCompileFlags_ in page 1" here. This may help to identify unknown libraries.
>
> After the dictionary size a library flag byte is stored.
> According to the documentation, the value 1 means case sensitive and all other bits are reserved for future use and should be 0, but in the old MOUSE.LIB I found the unexpected value 0x4d here. So this value probably cannot be used as a characteristic; it is shown by lines like:
>> 9 ubyte =1 case sensitive
>> 9 ubyte >1 \b, flags %#x
>
> The second record is a Translator Header Record (THEADR=80h) or a Library Module Header Record (LHEADR=82h). Its content consists of just one Pascal string followed by a checksum byte. This information is shown by lines like:
>> (1.s+6) pstring x "%s"
> #>>&0 ubyte x checksum %#x
>
> Often this string is the source name of the library module, like "dos\crt0.asm" in mlibce.lib, "QB4UTIL.ASM" in QB4UTIL.LIB or "C:\Documents and Settings\Allan Campbell\My Documents\FDOSBoot\zlib\zutil.c" in ZLIB.LIB. But the name can also be specified directly by the programmer via the TITLE pseudo-operand or the assembler NAME directive, so sometimes I find titles like "87INIT" in FP87.LIB, "ACOSASIN" in MATHC.LIB or "Copyright" in calc-bcc.lib.
>
> Afterwards comes the third record. Its inspection starts with lines like:
>>> &1 ubyte x \b, 3rd record
> #>>&1 ubyte x (type %#x)
>
> In my inspected examples the third record was a List of Names Record (LNAMES=96h) or a Comment Record (COMENT=88h). The LNAMES branch is handled by lines like:
>>> &1 ubyte =0x96 LNAMES
>>>> &0 uleshort >2
>>>>> &0 ubyte =0
>>>>>> &0 pstring x %s
>>>>>>> &0 pstring x %s
>>>>>>>> &0 ubyte <44
>>>>>>>>> &-1 pstring x %s
> To display only meaningful content, some checks must be done. If the record length is too low (less than 3), there is no content. The Pascal strings themselves can also be empty; this is often the case for the first LNAMES string. The names themselves are used as segment, class, group, overlay, and selector names.
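The record walk above (jump over a record using its stored length, then list the non-empty LNAMES strings while stopping before the checksum byte) can be sketched in C as follows. The helper names are hypothetical; the layout assumed is the one described in this mail: a type byte, a 16-bit little-endian length covering the content plus the checksum, then a run of Pascal strings and one checksum byte.

```c
#include <stddef.h>
#include <string.h>

/* Offset of the record after the one at rec (type + 2 length bytes + body),
   or 0 if the header of the record at rec is truncated. */
static size_t omf_next_record(const unsigned char *buf, size_t len, size_t rec)
{
    if (rec + 3 > len)
        return 0;
    return rec + 3 + ((size_t)buf[rec + 1] | ((size_t)buf[rec + 2] << 8));
}

/*
 * Writes the non-empty names of the LNAMES (96h) record at buf[rec] into out,
 * separated by spaces. Returns the number of names, or -1 if the record is
 * not a well-formed LNAMES record or out is too small.
 */
static int omf_lnames(const unsigned char *buf, size_t len, size_t rec,
                      char *out, size_t outsz)
{
    if (outsz == 0 || rec + 3 > len || buf[rec] != 0x96)
        return -1;
    size_t reclen = (size_t)buf[rec + 1] | ((size_t)buf[rec + 2] << 8);
    if (reclen < 1 || rec + 3 + reclen > len)
        return -1;
    size_t off = rec + 3;
    size_t end = rec + 3 + reclen - 1;   /* index of the checksum byte */
    size_t used = 0;
    int count = 0;
    out[0] = '\0';
    while (off < end) {
        size_t n = buf[off];             /* Pascal string length byte */
        if (off + 1 + n > end)
            return -1;                   /* name would overlap the checksum */
        if (n > 0) {                     /* the first name is often empty */
            if (used + n + 2 > outsz)
                return -1;
            if (used > 0)
                out[used++] = ' ';
            memcpy(out + used, buf + off + 1, n);
            used += n;
            out[used] = '\0';
            count++;
        }
        off += 1 + n;
    }
    return count;
}
```

With a record containing an empty first name followed by CODE, DATA and DGROUP, this yields the same "CODE DATA DGROUP" list that the magic prints for MOUSE.LIB and mwlibc.lib.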
> So here I get typical 4-byte strings like CODE (mwlibc.lib) and DATA (mwlibc.lib), and longer strings like _TEXT32 (JMPPM32.LIB), _OVLCODE (WOVL.LIB) and DGROUP (MOUSE.LIB). So here we find the word DATA that the TrID definition checks for.
>
> Now comes a problem with the naming. My oldest example, MOUSE.LIB, dates from September 1984. According to Wikipedia the first Visual C compiler suite appeared in February 1993. So to be generally true, the word Visual must be removed from the phrase "Microsoft Visual C library". According to the reference site about Microsoft Library, such libraries are compiled from source code (BASIC, C, Pascal, etc.). This can be verified with an example like QB4UTIL.LIB: here the second record name is "QB4UTIL.ASM", which means the source was assembler code. So the uppercase letter C in the phrase must be expanded to something like "C, Assembler, Pascal, BASIC" or removed. This library format was used not only by Microsoft but also by completely different companies like Borland. This can be seen in the example CATDB.LIB, where the third record is a translator comment like "TC86 Borland Turbo C++ 3.00". So the describing phrase shrinks to just the single word library, and the correct description would be "relocatable Object Module Format (OMF) library". For many examples the old description is correct, so to avoid a totally different look I chose the description "Microsoft Visual C/OMF library".
>
> Some libraries are misidentified by Magdir/database as "dBase III DBT". Unfortunately xBase memo files have no strong magic, but luckily the display part is done by the subroutine dbase3-memo-print. At the end of that subroutine the first memo item is shown by a line like:
>> 512 string >\0 \b, 1st item "%s"
> For real examples I get ASCII text like "WHAT IS XBASE" in test.dbt, "Borges, Malte" in biblio.dbt or "First memo\032\032" in T2.DBT.
>
> Before calling this subroutine, the first and second characters of a possible first memo item were tested for not being "too low" by 2 lines like:
>>>>>>>>>>>> 512 ubyte >037
>>>>>>>>>>>>> 513 ubyte >037
>>>>>>>>>>>>>> 0 use dbase3-memo-print
> So bad examples like AI070GEP.EPS and gluon-ffhat-1.0-tp-link-tl-wr1043n-nd-v2-sysupgrade.bin were skipped.
>
> To also skip some Microsoft Visual C/OMF libraries (like BZ2.LIB, WATTCPWL.LIB and ZLIB.LIB), I inserted one additional test line checking that the first character is "not too high". For libraries with page size 512 the second record starts at offset 512 with its record type byte (80h=THEADR), which can be misinterpreted as a dBase memo first item starting with \200.
>>>>>>>>>>>> 512 ubyte >037
>>>>>>>>>>>>> 512 ubyte <0200
>>>>>>>>>>>>>> 513 ubyte >037
>>>>>>>>>>>>>>> 0 use dbase3-memo-print
>
> Unfortunately most relocatable Object Module Format (OMF) libraries are also misidentified as "SysEx File" by Magdir/sysex, because the Library Header Record type byte (F0h) at the beginning can be interpreted as the StartSysEx byte (F0h). The library's first record length is then interpreted as the MIDI vendor ID. So all libraries with record length 1Dh (+3 = page size 32) are described with the MIDI vendor name "Inventronics", and all libraries with record length 0Dh (+3 = page size 16) with the MIDI vendor name "ADA". For libraries with page size 512 (that is, record length 1FDh) the second byte is FDh, so its upper bit is set. That is not valid for MIDI SysEx, so such libraries are not misidentified by the magic lines in Magdir/sysex. The description is produced by lines that look like:
>
> 0 ubeshort&0xFF80 0xF000 SysEx File -
>> 1 byte 0x01 Sequential
>> 1 byte 0x02 IDP
>> 1 byte 0x03 OctavePlateau
> ..
>> 1 byte 0x0d ADA
> ..
>> 1 byte 0x1d Inventronics
>> 1 byte 0x57 Acoustic tech. lab.
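Both heuristics discussed above can be expressed as small C predicates. They make it easy to see why page-size-16 and page-size-32 libraries pass the old SysEx test while a page-size-512 library does not, and why the extra `<0200` test rejects a THEADR byte at offset 512. This is an illustrative sketch with invented names, not code from file(1).

```c
#include <stddef.h>

/* Old Magdir/sysex test "0 ubeshort&0xFF80 0xF000": byte 0 is F0h and the
   vendor ID byte has its top bit clear. */
static int old_sysex_test(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0xF0 && (buf[1] & 0x80) == 0;
}

/* Tightened dBase III memo guard from the patch: byte 512 must be above 037
   and below 0200 (octal), byte 513 must be above 037. */
static int dbase_memo_guard(const unsigned char *buf, size_t len)
{
    return len > 513 && buf[512] > 037 && buf[512] < 0200 && buf[513] > 037;
}
```

The record lengths 0Dh and 1Dh are exactly the vendor IDs that Magdir/sysex names ADA and Inventronics, which is where the misleading descriptions came from.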
>
> So this now becomes:
> 0 ubeshort&0xFF80 0xF000
> #!:strength +0
>> 2 search/11 \xF7
>>> 0 use midi-sysex
> 0 name midi-sysex
> #>1 ubyte x SysEx File -
>> 1 ubyte x MIDI audio System Exclusive (SysEx) message -
> !:mime audio/x-syx
> !:ext syx/sysex
>> 1 byte 0x01 Sequential
>> 1 byte 0x02 IDP
> ...
>> 1 byte 0x66 MIDI Emulator
>
> First I put the display part into the subroutine midi-sysex.
>
> After the test for the StartSysEx byte and the unused upper bit of the vendor ID, it is possible to add more test lines (as I do later to distinguish SysEx from OMF libraries) or to change the unspecific test.
>
> With the --list option I looked at the reported strength of the patterns for the LIB and SYX examples. The MIDI System Exclusive (SysEx) messages, with strength=50, come after Microsoft Visual C library, with strength=70. I think this is OK; if not, the total strength can be raised or lowered by adding or subtracting integers in order to reorder the descriptions.
>
> Furthermore, I changed the description text "SysEx File" because I found it irritating, and users probably do too. I read it as "sy sex", which sounds like a sexual practice or behaviour. It is the technical term used and known by audio experts, but not by normal users; it is the abbreviation for "System Exclusive". So the inspected files are System Exclusive messages to control MIDI audio devices. I looked at what others call such files. The page about the SYX file extension on fileinfo.com uses the description "MIDI System Exclusive Message". The page "All about SYX files" on filext.com uses "SysEx MIDI File". So I finally chose "MIDI audio System Exclusive (SysEx) message". If you move in circles dealing with audio and computers you probably know the term MIDI, but my sister, who works at a PC in an office, did not know it. So instead of the generic MIME type application/octet-stream I also added a user-defined one belonging to the main class audio.
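The recognition order for the new SysEx entry (F0h start byte, vendor ID with the top bit clear, then an EOX byte F7h within a few bytes) can be sketched in C. The window length is a parameter here because this thread quotes both `search/11` and `search/12`; the function name is invented for illustration.

```c
#include <stddef.h>

/*
 * Returns the offset of the first EOX byte (F7h) found in
 * buf[2 .. 2 + window - 1], or -1 if the buffer does not start like a SysEx
 * message or no EOX byte appears in that window.
 */
static long sysex_eox_offset(const unsigned char *buf, size_t len, size_t window)
{
    if (len < 2 || buf[0] != 0xF0 || (buf[1] & 0x80) != 0)
        return -1;
    for (size_t i = 2; i < len && i < 2 + window; i++)
        if (buf[i] == 0xF7)
            return (long)i;
    return -1;
}
```

As the mail itself admits, this is a plausibility check rather than a guarantee: an OMF header could contain F7h inside the window by chance, and a badly terminated SysEx dump would be missed.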
>
> According to the page about "System Exclusive Files" on onsongapp.com, the file extension sysex can also be used instead of the 3-byte SYX.
>
> According to the MIDI specification, the StartSysEx byte is followed by n data bytes, and afterwards comes a required "End of Exclusive" (EOX=F7h) byte. When searching the net for information I read that some badly behaved software does not terminate the data bytes, but in my dozen inspected examples the terminating byte was present and the EOX byte occurred only a few bytes later. So the misidentified OMF libraries are skipped by a second additional test line that looks like:
>> 2 search/12 \xF7
> I do not know if this is always true, but I think it is better to skip many misidentified files and perhaps miss a few SYX examples. So for control reasons, at the end of the subroutine I display information about the EOX byte (F7h) by lines like:
>> 4 ubyte 0xF7 \b, at 4 EOX
>> 5 ubyte 0xF7 \b, at 5 EOX
> ...
>> 13 ubyte 0xF7 \b, at 13 EOX
>> 14 ubyte 0xF7 \b, at 14 EOX
>
> For many examples the vendor ID name, like Apple or Yamaha, is shown after the hyphen by inspecting the next 1 or 3 bytes. Unfortunately, for a few samples like example-1manband.syx, example-syxcom.syx and example-webmidijs.syx nothing is shown here. Looking at Magdir/sysex as shipped with file version 5.41, I see that this file has version number 1.10 and is dated April 2019. On the midi.org website I found a page with assigned manufacturer MIDI SysEx ID numbers dated March 2021, and on GitHub I found a CSV table with MIDI SysEx manufacturer IDs. Using these sources I added lines for unrecognized IDs, but I only check the 1-byte manufacturer numbers. The additional part looks like:
>
>> 1 byte 0x00 ID EXTENSIONS
>> 1 byte 0x13 Digidesign Inc.
>> 1 byte 0x1e Key Concepts
>> 1 byte 0x20 Passac
> ..
>> 1 byte 0x66 MIDI Emulator
>> 1 byte 0x7D PROTOTYPING
>> 1 byte 0x7E UNIVERSAL
>> 1 byte 0x7F universal real time
>
> Then there are a few entries where the manufacturer differs in newer documentation. I do not know the reasons, but probably the company was bought by another and so the name changed. Such things are expressed by lines like:
>
> #>1 byte 0x03 OctavePlateau
>> 1 byte 0x03 Voyetra Turtle Beach
> #>1 byte 0x09 Gulbransen
>> 1 byte 0x09 MIDI9
> #>1 byte 0x26 Solton
>> 1 byte 0x26 Ketron s.r.l.
> #>1 byte 0x31 Jomox
>> 1 byte 0x31 Viscount International Spa
>
> In some cases the ID name in the current lines is an abbreviation or such a general term that it points in the wrong direction. I will illustrate this. For me Garfield is a comic cat, and files with ID 0e are described as "SysEx File - Garfield". To me that sounds like the second phrase is the main part, so I read this as the SysEx of the Garfield cat. With the full name "Garfield Electronics" from newer documents it becomes clear that this impression is wrong. Examples with ID 2b are described as "SysEx File - SSL". For me this sounds like a module or method of the SSL library; with the full name "Solid State Logic Organ Systems" this misinterpretation vanishes. Examples with ID 0d are described as "SysEx File - ADA". When I hear ADA, I associate it with the Ada programming language, so this sounds like a module or format belonging to Ada. The full name "ADA Signal Processors Inc." dispels this misinterpretation. Often an appended Ltd, GmbH or Inc. makes it clear that we are talking about company names. So such changes are now expressed by lines like:
> #>1 byte 0x01 Sequential
>> 1 byte 0x01 Sequential Circuits
> #>1 byte 0x05 Passport
>> 1 byte 0x05 Passport Designs
> #>1 byte 0x06 Lexicon
>> 1 byte 0x06 Lexicon Inc.
> #>1 byte 0x0a AKG >> 1 byte 0x0a AKG Acoustics > #>1 byte 0x0d ADA >> 1 byte 0x0d ADA Signal Processors Inc. > #>1 byte 0x0e Garfield >> 1 byte 0x0e Garfield Electronics > #>1 byte 0x19 Harmony >> 1 byte 0x19 Harmony Systems > #>1 byte 0x21 SIEL >> 1 byte 0x21 Proel Labs (SIEL) > #>1 byte 0x2b SSL >> 1 byte 0x2b Solid State Logic Organ Systems > > After applying the above mentioned modifications by patches > file-5.41-msvc-lib.diff, file-5.41-database-lib.diff and > file-5.41-sysex-lib.diff then i get a more correct output with more > details like: > > CATDB.LIB: Microsoft Visual C/OMF library, > page size 16, at 0x1200 dictionary with > 1 block (FFLAG=0x67) 1st entry > CATGETS! in page 57, > 2nd record > "DB", > 3rd record COMMENT class=0 Translator > "TC86 Borland Turbo C++ 3.00" > DBFNTX.LIB: Microsoft Visual C/OMF library, > page size 32, at 0x18200 dictionary with > 1 block (FFLAG=0x37) 1st entry > dbfntx1! in page 1 > case sensitive, > 2nd record > "C:\XHARBOUR\SRC\SOURCE\RDD\DBFNTX\ > dbfntx1.c", > 3rd record COMMENT Preserved class=0xaa > FLIB7M.LIB: Microsoft Visual C/OMF library, > page size 512, at 0x44a00 dictionary with > 31 blocks (FFLAG=0xf7) 1st entry > IF at XQABS in page 41, > 2nd record > "fsystem", > 3rd record COMMENT Preserved class=0xa1 > FP87.LIB: Microsoft Visual C/OMF library, > page size 16, at 0xa00 dictionary with > 1 block (FFLAG=0x60) 1st entry > 87INIT! 
in page 1, > 2nd record > "87INIT", > 3rd record LNAMES > JMPPM32.LIB: Microsoft Visual C/OMF library, > page size 16, at 0x600 dictionary with > 2 blocks (FFLAG=0x23) 1st entry > __movehigh at 0 in page 47, > 2nd record > "getcmdl.asm", > 3rd record LNAMES _TEXT32 CODE > LIBFL.LIB: Microsoft Visual C/OMF library, > page size 16, at 0x600 dictionary with > 2 blocks (FFLAG=0x34) 1st entry > yy_flex_free in page 1 > case sensitive, > 2nd record > "liballoc.obj", > 3rd record COMMENT Preserved class=0xa1 > New OMF extensions n=3 IBM style > MOUSE.LIB: Microsoft Visual C/OMF library, > page size 512, at 0x800 dictionary with > 1 block (FFLAG=0x26) 1st entry > MOUSE in page 1, flags 0x4d, > 2nd record > "MOUSE", > 3rd record LNAMES CODE DATA DGROUP > PRESET_6.SYX: MIDI audio System Exclusive (SysEx) > message - MIDI Emulator, at 4 EOX > QB4UTIL.LIB: Microsoft Visual C/OMF library, > page size 16, at 0x400 dictionary with > 1 block (FFLAG=0x38) 1st entry > ASCIIDISPLAY in page 1, > 2nd record > "QB4UTIL.ASM", > 3rd record COMMENT class=0xa3 > LIBMOD qb4util > REXX.LIB: Microsoft Visual C/OMF library, > page size 16, at 0x1800 dictionary with > 5 blocks (FFLAG=0x96) 1st entry > RXEXITREGISTER in page 31, > 2nd record > "REXXSAA", > 3rd record COMMENT class=0xa0 > OMF extensions IMPDEF ordinal > REXXSAA exported by REXX > T2.DBT: dBase III DBT, next free block index 11, > 1st item "First memo\032\032" > WATTCPWL.LIB: Microsoft Visual C/OMF library, > page size 512, at 0x46000 dictionary with > 23 blocks (FFLAG=full) 1st entry > _w32_millisec_clock_ in page 33 > case sensitive, > 2nd record > "C:\watcom\watt32\src\version.c", > 3rd record COMMENT Preserved class=0xa1 > ZLIB.LIB: Microsoft Visual C/OMF library, > page size 512, at 0x2ce00 dictionary with > 3 blocks (FFLAG=0xc6) 1st entry > zlibCompileFlags_ in page 1 > case sensitive, > 2nd record > "C:\Documents and Settings\Allan Campbell > \My Documents\FDOSBoot\zlib\zutil.c", > 3rd record COMMENT Preserved class=0xa1 > 
example-1manband.syx: MIDI audio System Exclusive (SysEx) > message - UNIVERSAL, at 5 EOX > example-apple.syx: MIDI audio System Exclusive (SysEx) > message - Apple, at 4 EOX > example-chromakinetics.syx: MIDI audio System Exclusive (SysEx) > message - Roland, at 14 EOX > example-roland.syx: MIDI audio System Exclusive (SysEx) > message - Roland, at 13 EOX > example-somascape2.syx: MIDI audio System Exclusive (SysEx) > message - UNIVERSAL, at 5 EOX > example-syxcom.syx: MIDI audio System Exclusive (SysEx) > message - ID EXTENSIONS, at 10 EOX > example-webmidijs.syx: MIDI audio System Exclusive (SysEx) > message - PROTOTYPING, at 8 EOX > example-yamaha.syx: MIDI audio System Exclusive (SysEx) > message - Yamaha, at 8 EOX > mlibce.lib: Microsoft Visual C/OMF library, > page size 16, at 0x2b400 dictionary with > 31 blocks (FFLAG=0xde) 1st entry > $i4_m4 in page 8783, > 2nd record > "dos\crt0.asm", > 3rd record COMMENT class=0xa3 LIBMOD crt0 > mwlibc.lib: Microsoft Visual C/OMF library, > page size 16, at 0x15a00 dictionary with > 17 blocks (FFLAG=0x93) 1st entry > atoi! in page 458, > 2nd record > "_exit", > 3rd record LNAMES CODE DATA DGROUP > unrar-bcc.lib: Microsoft Visual C/OMF library, > page size 16, at 0x400 dictionary with > 1 block (FFLAG=0x84) 1st entry > RARCloseArchive in page 6, > 2nd record > "DllGetVersion", > 3rd record COMMENT class=0xa0 > OMF extensions IMPDEF > DllGetVersion exported by unrar3.dll > > I hope my three diff files can be applied in future version of > file utility. 
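The vendor lookup discussed in this mail (one ID byte, or 00h followed by two more bytes) can be sketched as a table-driven C function. Only a handful of the names quoted above are included, and the function and table names are invented; like the patch, the sketch only decodes one-byte IDs and reports "ID EXTENSIONS" for 00h without decoding the two extension bytes.

```c
#include <stddef.h>

/* A few one-byte manufacturer IDs with the names used in the patched Magdir/sysex. */
static const char *const sysex_vendor_names[128] = {
    [0x01] = "Sequential Circuits",
    [0x05] = "Passport Designs",
    [0x06] = "Lexicon Inc.",
    [0x0a] = "AKG Acoustics",
    [0x0d] = "ADA Signal Processors Inc.",
    [0x0e] = "Garfield Electronics",
    [0x13] = "Digidesign Inc.",
    [0x2b] = "Solid State Logic Organ Systems",
    [0x66] = "MIDI Emulator",
    [0x7d] = "PROTOTYPING",
    [0x7e] = "UNIVERSAL",
    [0x7f] = "universal real time",
};

/*
 * Returns the vendor name for the SysEx message in buf (NULL if unknown or
 * not SysEx) and stores the manufacturer ID length in *id_len: 1 for IDs
 * 01h-7Fh, 3 for 00h followed by two extension bytes, 0 on failure.
 */
static const char *sysex_vendor(const unsigned char *buf, size_t len, int *id_len)
{
    *id_len = 0;
    if (len < 2 || buf[0] != 0xF0 || (buf[1] & 0x80) != 0)
        return NULL;
    if (buf[1] == 0x00) {
        if (len < 4)
            return NULL;
        *id_len = 3;
        return "ID EXTENSIONS";   /* the two extension bytes are not decoded */
    }
    *id_len = 1;
    return sysex_vendor_names[buf[1]];
}
```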
>
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
From sgrubb at redhat.com Mon Jan 17 18:23:44 2022 From: sgrubb at redhat.com (Steve Grubb) Date: Mon, 17 Jan 2022 13:23:44 -0500 Subject: [File] qemu disk file format Message-ID: <8885163.CDJkKcVGEf@x2>

Hello,

I ran across something that I'd like to report. On Linux systems, libvirt is the main way of running virtualized operating systems. It uses the qemu copy-on-write file format (qcow2). If I point file at one of these images, I get this:

# file /var/lib/libvirt/images/rhel8.2-s390x.qcow2
/var/lib/libvirt/images/rhel8.2-s390x.qcow2: QEMU QCOW2 Image (v3), 21474836480 bytes

Good. It understands what it's looking at. But checking the mime-type:

# file --mime-type /var/lib/libvirt/images/rhel8.2-s390x.qcow2
/var/lib/libvirt/images/rhel8.2-s390x.qcow2: application/octet-stream

It doesn't know what it's looking at. The text in magic/Magdir/virtual is kind of complicated or I would take a stab at patching this. The shared-mime-info package seems to think it should be application/x-qemu-disk.
Best Regards,
-Steve
From astron.com.bwoj at manchmal.in-ulm.de Tue Jan 18 06:53:29 2022 From: astron.com.bwoj at manchmal.in-ulm.de (Christoph Biedl) Date: Tue, 18 Jan 2022 07:53:29 +0100 Subject: [File] qemu disk file format In-Reply-To: <8885163.CDJkKcVGEf@x2> References: <8885163.CDJkKcVGEf@x2> Message-ID: <1642488712@msgid.manchmal.in-ulm.de>

Steve Grubb wrote...

> It doesn't know what it's looking at. The text in magic/Magdir/virtual is
> kind of complicated or I would take a stab at patching this. The shared-mime-info
> package seems to think it should be application/x-qemu-disk.

The related change ("Add mime type for qemu files (Steve Grubb)") however triggers a build failure:

| ../src/file -C -m magic
| magic/virtual, 223: Warning: Current entry does not yet have a description for adding a MIME type

Christoph
From christos at zoulas.com Tue Jan 18 14:08:27 2022 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 18 Jan 2022 09:08:27 -0500 Subject: [File] qemu disk file format In-Reply-To: <1642488712@msgid.manchmal.in-ulm.de> References: <8885163.CDJkKcVGEf@x2> <1642488712@msgid.manchmal.in-ulm.de> Message-ID: <93308844-49C7-43F1-8C1D-E78F354A36D7@zoulas.com>

Fixed, thanks!

christos

> On Jan 18, 2022, at 1:53 AM, Christoph Biedl wrote:
>
> Steve Grubb wrote...
>
>> It doesn't know what it's looking at. The text in magic/Magdir/virtual is
>> kind of complicated or I would take a stab at patching this. The shared-mime-info
>> package seems to think it should be application/x-qemu-disk.
>
> The related change ("Add mime type for qemu files (Steve Grubb)")
> however triggers a build failure:
>
> | ../src/file -C -m magic
> | magic/virtual, 223: Warning: Current entry does not yet have a description for adding a MIME type
>
> Christoph
From mgorny at gentoo.org Tue Jan 18 19:40:26 2022 From: mgorny at gentoo.org (Michał Górny) Date: Tue, 18 Jan 2022 20:40:26 +0100 Subject: [File] [PATCH] Add rules for FreeBSD kernel minidumps Message-ID: <20220118194026.220526-1-mgorny@gentoo.org>

Recognize FreeBSD kernel minidump magic and print its architecture. For amd64, arm, arm64 and i386 also print the version as the offset and endianness are clear there. I do not have VMs for other architectures to test right now.
---
 magic/Magdir/freebsd | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/magic/Magdir/freebsd b/magic/Magdir/freebsd
index a01ac4a2..b04ac50b 100644
--- a/magic/Magdir/freebsd
+++ b/magic/Magdir/freebsd
@@ -142,3 +142,16 @@
 >9 byte 2 %d bytes in header,
 >>10 byte x %d chars wide by
 >>11 byte x %d chars high
+
+#
+# FreeBSD kernel minidumps
+#
+0 string minidump\040FreeBSD/ FreeBSD kernel minidump
+>17 string amd64 for %s,
+>>24 lelong x version %d
+>17 string arm for %s,
+>>24 lelong x version %d
+>17 string i386 for %s,
+>>24 lelong x version %d
+>17 default x
+>>17 string >\0 for %s
-- 
2.34.1
From christos at zoulas.com Tue Jan 18 23:46:00 2022 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 18 Jan 2022 18:46:00 -0500 Subject: [File] [PATCH] Add rules for FreeBSD kernel minidumps In-Reply-To: <20220118194026.220526-1-mgorny@gentoo.org> References: <20220118194026.220526-1-mgorny@gentoo.org> Message-ID: <987153CA-0CE1-4E93-B960-A8B7145AE152@zoulas.com>

Committed, thanks!

christos

> On Jan 18, 2022, at 2:40 PM, Michał Górny wrote:
>
> Recognize FreeBSD kernel minidump magic and print its architecture.
> For amd64, arm, arm64 and i386 also print the version as the offset
> and endianness are clear there. I do not have VMs for other
> architectures to test right now.
> ---
>  magic/Magdir/freebsd | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/magic/Magdir/freebsd b/magic/Magdir/freebsd
> index a01ac4a2..b04ac50b 100644
> --- a/magic/Magdir/freebsd
> +++ b/magic/Magdir/freebsd
> @@ -142,3 +142,16 @@
>  >9	byte	2	%d bytes in header,
>  >>10	byte	x	%d chars wide by
>  >>11	byte	x	%d chars high
> +
> +#
> +# FreeBSD kernel minidumps
> +#
> +0	string	minidump\040FreeBSD/	FreeBSD kernel minidump
> +>17	string	amd64	for %s,
> +>>24	lelong	x	version %d
> +>17	string	arm	for %s,
> +>>24	lelong	x	version %d
> +>17	string	i386	for %s,
> +>>24	lelong	x	version %d
> +>17	default	x
> +>>17	string	>\0	for %s
> -- 
> 2.34.1
>
> -- 
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file

From mgorny at gentoo.org  Wed Jan 19 12:08:19 2022
From: mgorny at gentoo.org (Michał Górny)
Date: Wed, 19 Jan 2022 13:08:19 +0100
Subject: [File] [PATCH] Improve FreeBSD kernel minidump recognition
Message-ID: <20220119120819.718045-1-mgorny@gentoo.org>

(sorry for not thinking about it in yesterday's patch)

- add explicit support for powerpc which is the only "special case"
  right now

- use the version field to guess endianness

Example output:

minidumps/amd64: FreeBSD kernel minidump for amd64, little endian, version 3
minidumps/arm64: FreeBSD kernel minidump for arm64, little endian, version 2
minidumps/i386: FreeBSD kernel minidump for i386, little endian, version 2
minidumps/ppc64be: FreeBSD kernel minidump for powerpc64, mmu_radix, big endian, version 2
minidumps/ppc64le: FreeBSD kernel minidump for powerpc64, mmu_radix, little endian, version 2
---
 magic/Magdir/freebsd | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/magic/Magdir/freebsd b/magic/Magdir/freebsd
index 69a07970..5215f420 100644
--- a/magic/Magdir/freebsd
+++ b/magic/Magdir/freebsd
@@ -147,11 +147,18 @@
 # FreeBSD kernel minidumps
 #
 0	string	minidump\040FreeBSD/	FreeBSD kernel minidump
->17	string	amd64	for %s,
->>24	lelong	x	version %d
->17	string	arm	for %s,
->>24	lelong	x	version %d
->17	string	i386	for %s,
->>24	lelong	x	version %d
+# powerpc uses 32-byte magic, followed by 32-byte mmu kind, then version
+>17	string	powerpc
+>>17	string	>\0	for %s,
+>>>32	string	>\0	%s,
+>>>>64	byte	0	big endian,
+>>>>>64	belong	x	version %d
+>>>>64	default	x	little endian,
+>>>>>64	lelong	x	version %d
+# all other architectures use 24-byte magic, followed by version
 >17	default	x
->>17	string	>\0	for %s
+>>17	string	>\0	for %s,
+>>>24	byte	0	big endian,
+>>>>24	belong	x	version %d
+>>>24	default	x	little endian,
+>>>>24	lelong	x	version %d
-- 
2.34.1

From mgorny at gentoo.org  Wed Jan 19 12:20:49 2022
From: mgorny at gentoo.org (Michał Górny)
Date: Wed, 19 Jan 2022 13:20:49 +0100
Subject: [File] [PATCH v2] Improve FreeBSD kernel minidump recognition
In-Reply-To: <20220119120819.718045-1-mgorny@gentoo.org>
References: <20220119120819.718045-1-mgorny@gentoo.org>
Message-ID: <20220119122049.719629-1-mgorny@gentoo.org>

- add explicit support for powerpc which is the only "special case"
  right now

- use the version field to guess endianness

Example output:

minidumps/amd64: FreeBSD kernel minidump for amd64, little endian, version 3
minidumps/arm64: FreeBSD kernel minidump for arm64, little endian, version 2
minidumps/i386: FreeBSD kernel minidump for i386, little endian, version 2
minidumps/ppc64be: FreeBSD kernel minidump for powerpc64, mmu_radix, big endian, version 2
minidumps/ppc64le: FreeBSD kernel minidump for powerpc64, mmu_radix, little endian, version 2
---
 magic/Magdir/freebsd | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

Changed in v2:
- removed unnecessary nesting in ppc rule

diff --git a/magic/Magdir/freebsd b/magic/Magdir/freebsd
index 69a07970..1582d99a 100644
--- a/magic/Magdir/freebsd
+++ b/magic/Magdir/freebsd
@@ -147,11 +147,18 @@
 # FreeBSD kernel minidumps
 #
 0	string	minidump\040FreeBSD/	FreeBSD kernel minidump
->17	string	amd64	for %s,
->>24	lelong	x	version %d
->17	string	arm	for %s,
->>24	lelong	x	version %d
->17	string	i386	for %s,
->>24	lelong	x	version %d
+# powerpc uses 32-byte magic, followed by 32-byte mmu kind, then version
+>17	string	powerpc
+>>17	string	>\0	for %s,
+>>32	string	>\0	%s,
+>>>64	byte	0	big endian,
+>>>>64	belong	x	version %d
+>>>64	default	x	little endian,
+>>>>64	lelong	x	version %d
+# all other architectures use 24-byte magic, followed by version
 >17	default	x
->>17	string	>\0	for %s
+>>17	string	>\0	for %s,
+>>>24	byte	0	big endian,
+>>>>24	belong	x	version %d
+>>>24	default	x	little endian,
+>>>>24	lelong	x	version %d
-- 
2.34.1

From christos at zoulas.com  Wed Jan 19 12:44:28 2022
From: christos at zoulas.com (Christos Zoulas)
Date: Wed, 19 Jan 2022 07:44:28 -0500
Subject: [File] [PATCH v2] Improve FreeBSD kernel minidump recognition
In-Reply-To: <20220119122049.719629-1-mgorny@gentoo.org>
References: <20220119120819.718045-1-mgorny@gentoo.org>
 <20220119122049.719629-1-mgorny@gentoo.org>
Message-ID:

No problem, committed!

christos

> On Jan 19, 2022, at 7:20 AM, Michał Górny wrote:
>
> - add explicit support for powerpc which is the only "special case"
>   right now
>
> - use the version field to guess endianness
>
> Example output:
>
> minidumps/amd64: FreeBSD kernel minidump for amd64, little endian, version 3
> minidumps/arm64: FreeBSD kernel minidump for arm64, little endian, version 2
> minidumps/i386: FreeBSD kernel minidump for i386, little endian, version 2
> minidumps/ppc64be: FreeBSD kernel minidump for powerpc64, mmu_radix, big endian, version 2
> minidumps/ppc64le: FreeBSD kernel minidump for powerpc64, mmu_radix, little endian, version 2
> ---
>  magic/Magdir/freebsd | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
>
> Changed in v2:
> - removed unnecessary nesting in ppc rule
>
> diff --git a/magic/Magdir/freebsd b/magic/Magdir/freebsd
> index 69a07970..1582d99a 100644
> --- a/magic/Magdir/freebsd
> +++ b/magic/Magdir/freebsd
> @@ -147,11 +147,18 @@
>  # FreeBSD kernel minidumps
>  #
>  0	string	minidump\040FreeBSD/	FreeBSD kernel minidump
> ->17	string	amd64	for %s,
> ->>24	lelong	x	version %d
> ->17	string	arm	for %s,
> ->>24	lelong	x	version %d
> ->17	string	i386	for %s,
> ->>24	lelong	x	version %d
> +# powerpc uses 32-byte magic, followed by 32-byte mmu kind, then version
> +>17	string	powerpc
> +>>17	string	>\0	for %s,
> +>>32	string	>\0	%s,
> +>>>64	byte	0	big endian,
> +>>>>64	belong	x	version %d
> +>>>64	default	x	little endian,
> +>>>>64	lelong	x	version %d
> +# all other architectures use 24-byte magic, followed by version
>  >17	default	x
> ->>17	string	>\0	for %s
> +>>17	string	>\0	for %s,
> +>>>24	byte	0	big endian,
> +>>>>24	belong	x	version %d
> +>>>24	default	x	little endian,
> +>>>>24	lelong	x	version %d
> -- 
> 2.34.1

From joerg.jen.der.ek at gmx.net  Thu Jan 20 17:26:11 2022
From: joerg.jen.der.ek at gmx.net (Jörg Jenderek)
Date: Thu, 20 Jan 2022 18:26:11 +0100
Subject: [File] [PATCH] of Magdir/msvc for Borland C++ project *.IDE
Message-ID: <18ebf9ab-1d3c-d3d7-3c97-5196a60f85c5@gmx.net>

Hello,

Some days ago I handled some libraries with the file name extension LIB, which are recognized by Magdir/msvc. Now I looked at the other definition in that file, the one for IDE samples.

When running file command version 5.41 on such IDE examples I get output like:

PLAYMIDI.IDE: MSVC .ide
libdsk.ide: MSVC .ide

With the --extension option only the 3-byte extension ??? is displayed, and with the -i option only the generic application/octet-stream is shown.

For comparison, I ran the file format identification utility TrID (see https://mark0.net/soft-trid-e.html). It describes the IDE examples as "Borland C++ Project definition" via ide-borland-c.trid.xml (see the appended ide-trid-v.txt.gz).

Luckily, TrID displays the correct file name extension IDE for the inspected examples. With the -v option it also lists the related URL pointing to further information. TrID's information points to the company page borland.com, but the company no longer exists; it is now part of Micro Focus. So I chose the Wikipedia page about Borland instead. That information is now expressed inside Magdir/msvc by comment lines like:

# URL:		https://en.wikipedia.org/wiki/Borland
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/i/ide-borland-c.trid.xml

The recognition happens inside Magdir/msvc by a line like:

0	string	\102\157\162\154\141\156\144\040\103\053\053\040\120\162\157\152\145\143\164\040\106\151\154\145\012\000\032\000\002\000\262\000\272\276\372\316	MSVC .ide

The string part is encoded as octal values.
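The octal decoding can be double-checked with a small script. The following is a minimal Python sketch, not code from file itself: `decode_magic_octal` is a hypothetical helper that only understands the three-digit `\NNN` octal escapes used in this pattern. It decodes the octal string quoted above (with the mail line-wrapping spaces removed), prints the ASCII prefix and the three control characters after it, and reads bytes 27 to 34 as an unsigned big-endian 64-bit value, the way a magic `ubequad` test would.

```python
import re
import struct

# Octal-escaped pattern as quoted above from Magdir/msvc (wrapping spaces removed).
OLD_MAGIC = (
    r"\102\157\162\154\141\156\144\040\103\053\053\040\120\162\157"
    r"\152\145\143\164\040\106\151\154\145\012\000\032\000\002\000"
    r"\262\000\272\276\372\316"
)


def decode_magic_octal(pattern: str) -> bytes:
    """Hypothetical helper: turn \\NNN octal escapes into raw bytes.

    Only three-digit octal escapes are handled; any other character is
    copied through unchanged (it must be in the Latin-1 range).
    """
    out = bytearray()
    for octal, plain in re.findall(r"\\([0-7]{3})|(.)", pattern, re.DOTALL):
        out.append(int(octal, 8) if octal else ord(plain))
    return bytes(out)


raw = decode_magic_octal(OLD_MAGIC)

# 24 ASCII characters, then line feed, NUL and Control-Z (0x1A).
print(raw[:24].decode("ascii"))   # Borland C++ Project File
print(raw[24:27])                 # b'\n\x00\x1a'

# Bytes 27..34 as an unsigned big-endian 64-bit value, like magic's ubequad.
(marker,) = struct.unpack_from(">Q", raw, 27)
print(f"{marker:#018x}")          # 0x000200b200babefa
```

The value at offsets 27 to 34 is the constant that the proposed `ubequad` test compares against, and offset 35, right after it, is where the proposed `search/5490` for a DOS drive letter begins.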
When this string is written in more human-readable ASCII form and compared with the TrID definition, it becomes clear that the starting magic is a line with 4 space-separated ASCII words, "Borland C++ Project File", followed by a line feed character, a terminating null byte and a Control-Z character. So this now becomes:

0	string	Borland\040C++\040Project\040File\012\000\032	MSVC .ide
!:mime	application/x-borland-ide
!:ext	ide

Instead of the generic MIME type application/octet-stream, I show a user-defined one, "application/x-borland-ide", and the shown file name extension is the 3-byte string "ide".

Next comes the problem with the used name "MSVC .ide". No normal user understands or knows what this means. So I looked at how others name such files. Searching the website file-extension.net for the IDE extension gives description texts like:

Borland C 4.x IDE project - BCW.EXE
Borland C++ Project definition
Borland C++ project
C++ Project (Borland Software Corporation)
Integrated Development Environment Configuration File
Project file

So IDE is apparently the abbreviation for Integrated Development Environment. That 3-letter word is used as the file name extension, and this is now shown by the --extension option. MSVC is apparently the abbreviation for "Microsoft Visual C++". However, the IDE samples were apparently created by the compiler suite from Borland, a company completely different from Microsoft. This file format may also be used for other programming languages like "C", but the start magic explicitly mentions "C++". So I finally chose "Borland C++ project" as the description.

After the start magic come some bytes that seem to be constant in most cases. Maybe this is a version. So this value is printed only when these bytes have an unusual value, by a line like:

>27	ubequad	!0x000200B200BABEFA	\b, maybe version %#16.16llx

Furthermore, in my inspected examples I found DOS file names like "D:\BC45\INCLUDE\STDIO.H" or "E:\BC5\INCLUDE\STRING.H".
So this information is also shown, by lines like:

>35	search/5490	:\\	\b, 1st header file with directory
>>&-5	pstring/h	x	"%s"

After applying the above-mentioned modifications with the patch file-5.41-msvc-ide.diff, I get a more descriptive output with more details, like:

PLAYMIDI.IDE: Borland C++ project, 1st header file with directory "D:\BC45\INCLUDE\STDIO.H"
libdsk.ide: Borland C++ project, 1st header file with directory "E:\BC5\INCLUDE\STRING.H"

With best wishes
Jörg Jenderek
-- 
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ide-trid-v.txt.gz
Type: application/x-gzip
Size: 464 bytes
Desc: not available
-------------- next part --------------
--- file-5.41/magic/Magdir/msvc.old	2020-05-31 12:34:40.000000000 +0200
+++ file-5.41/magic/Magdir/msvc	2022-01-20 18:13:29.911576000 +0100
@@ -13,4 +13,19 @@
 
 # .ide
+# URL:		https://en.wikipedia.org/wiki/Borland
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/i/ide-borland-c.trid.xml
+# Update:	Joerg Jenderek
+# Note:		called by TrID "Borland C++ Project definition"
 #too long 0	string	\102\157\162\154\141\156\144\040\103\053\053\040\120\162\157\152\145\143\164\040\106\151\154\145\012\000\032\000\002\000\262\000\272\276\372\316	MSVC .ide
-0	string	\102\157\162\154\141\156\144\040\103\053\053\040\120\162\157	MSVC .ide
+# ./msvc (1.10) of file (version 5.41) labeled the entry as "MSVC .ide"
+0	string	Borland\040C++\040Project\040File\012\000\032	Borland C++ project
+#!:mime	application/octet-stream
+!:mime	application/x-borland-ide
+# Integrated Development Environment
+!:ext	ide
+# maybe version part like: 000200B200BABEFA
+>27	ubequad		!0x000200B200BABEFA	\b, maybe version %#16.16llx
+# look for DOS drive letter
+>35	search/5490	:\\			\b, 1st header file with directory
+# DOS names like: "D:\BC45\INCLUDE\STDIO.H" "E:\BC5\INCLUDE\STRING.H"
+>>&-5	pstring/h	x			"%s"

From jgmiller at pt.LU  Mon Jan 24 22:52:42 2022
From: jgmiller at pt.LU (J G Miller)
Date: Mon, 24 Jan 2022 23:52:42 +0100
Subject: [File] file v5.41 -- double space in some descriptions
Message-ID: <1nC8CZ-0003sk-Qn@ip-88-207-140-249.dyn.luxdsl.pt.lu>

===============================================================================

Dear File Program Mailing List,

Having just installed a /usr/local/bin version of file v5.41 because of bugs in the /usr/bin version of file v5.25 (yes, I know I really, really, really must get around to doing a full system distribution-upgrade), I have noticed that file v5.41 puts a double space in some output strings between the leading "a" and the following "path".

/usr/local/bin/file -v
file-5.41
magic file from /usr/local/share/misc/magic

/usr/local/bin/file bikegear
bikegear: a  /usr/bin/wish script, ASCII text executable

/usr/local/bin/file convert.lua
convert.lua: a  /usr/bin/lua script, ASCII text executable

The expected output is "...a /usr/bin/..." with a single space.

Presumably this double spacing is not intentional, since it does not always occur, and it makes checking the output more complex, viz. the necessity of adding "[[:space:]][[:space:]]*" to regular expressions to parse the output.

Thanking you for your continuing maintenance of this essential file utility,

J G Miller
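Until the spacing is fixed, a script that parses this output can simply accept either form. The following is a minimal Python sketch, assuming output lines shaped like the two examples above (`NAME: a PATH script, ...`); the pattern and the function name `script_interpreter` are illustrative only. Here `\s+` plays roughly the role of the POSIX `[[:space:]][[:space:]]*`, which in an extended regular expression can also be written `[[:space:]]+`.

```python
import re

# "a", then one or more whitespace characters, then the interpreter path:
# accepts both "a /usr/bin/lua" and "a  /usr/bin/lua".
SCRIPT_RE = re.compile(r"^(?P<name>[^:]+):\s+a\s+(?P<interp>\S+)\s+script\b")


def script_interpreter(line: str):
    """Return (file name, interpreter path), or None if the line does not match."""
    m = SCRIPT_RE.match(line)
    return (m.group("name"), m.group("interp")) if m else None


for line in (
    "bikegear: a /usr/bin/wish script, ASCII text executable",
    "convert.lua: a  /usr/bin/lua script, ASCII text executable",
):
    print(script_interpreter(line))
# ('bikegear', '/usr/bin/wish')
# ('convert.lua', '/usr/bin/lua')
```

Matching one or more spaces keeps the parser working both with the current double space and with a corrected single space, so it does not need to change again once the output is fixed.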