From joerg.jen.der.ek at gmx.net Thu Dec 2 00:11:29 2021
From: joerg.jen.der.ek at gmx.net (Jörg Jenderek)
Date: Thu, 2 Dec 2021 01:11:29 +0100
Subject: [File] [PATCH] Magdir/fonts, pdp Adobe Multiple Master font *.MMM misidentified as PDP-11 executable
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

some days ago I installed an older Adobe software package that came with some fonts, so I checked some other font files as well. The inspected examples have the file name extension MMM.

Running the file command version 5.41 on such font examples gives output like:

_MI_____.MMM:                 PDP-11 executable not stripped - version 99
_MRG____.MMM:                 PDP-11 executable not stripped - version 99
fmt-521-signature-id-814.mmm: data
sample.mmm:                   Adobe Multiple Master font
zx______.mmm:                 Adobe Multiple Master font
zy______.mmm:                 Adobe Multiple Master font

Furthermore, the --extension option shows only the 3-character sequence ???, and the -i option shows only the generic MIME type application/octet-stream.

For comparison I ran the file format identification utility TrID (see https://mark0.net/soft-trid-e.html). All real examples are described as "Adobe Type Manager Multiple Master Metrics" by mmm-atm.trid.xml (see appended mmm-trid-v.txt.gz).

I also ran the file format identification utility DROID (see https://sourceforge.net/projects/droid/). It identifies many MMM examples as "Adobe Multiple Master Metrics font file" by PUID fmt/521 (see appended mmm-droid.csv.gz).

With the -v option TrID displays the 3-byte file name extension MMM and a reference URL pointing to Adobe Type Manager on Wikipedia. That page contains an internal link to Adobe Multiple master fonts.
This information is now expressed by additional comment lines inside Magdir/fonts like:

# URL:       https://en.wikipedia.org/wiki/Multiple_master_fonts
# Reference: http://mark0.net/download/triddefs_xml.7z
#            defs/m/mmm-atm.trid.xml
#            http://www.nationalarchives.gov.uk/pronom/fmt/521

In the current version two patterns describe Adobe MMM examples:

0   string   \007\001\001\000Copyright\ (c)\ 199
0   string   \012\001\001\000Copyright\ (c)\ 199

None of my examples is described by the first pattern, and for undetected examples like _MI_____.MMM and _MRG____.MMM an equivalent pattern would look like:

0   string   \007\001\002\000Copyright\ (c)\ 199

Unfortunately I found no file format specification. So I now put the displaying part inside the subroutine mmm-font. This routine starts like:

0       name    mmm-font
>0x53   ubyte   x   Adobe Multiple Master font Metric
!:mime  application/x-font-mmm
!:ext   mmm

Instead of the generic MIME type application/octet-stream a user-defined one is shown, and the MMM file name extension is now shown as well. Furthermore I add the word "Metric" because the MMM examples contain only the metrics (as described by DROID and TrID), whereas the real fonts are stored inside PFB examples.

The file starts with 4 bytes whose purpose is unknown to me. As a check they can be shown by a debugging line like:

>0   ubelong   x   \b, at 0 %#8.8x

All identifier tools assume that the byte at offset 3 is nil. At offset 4 there is apparently a 0-terminated copyright message, which looks like:

Copyright (c) 1992, 1993, 1994, 1999 Adobe Systems Incorporated. All R
Copyright (c) 1992, 1994 Adobe Systems Incorporated. All Rights Reserv
Copyright (c) 1993, 1994, 1999 Adobe Systems Incorporated. All Rights

For debugging purposes this can be shown by a line like:

>4   string   x   "%s"

The DROID tool checks only for the leading keyword Copyright followed by one space character and does not check the year part of the message (like 199?).
The TrID tool only checks for the space character (0x20) after the parenthesised copyright sign (c), which can be checked by a debugging line like:

>17   byte   !0x20   \b, at 17 "%c"

So the copyright string may have a year part outside the 1990s, or may be completely different. But for now I check for the message part and the leading nil byte, similar to the previous file version and matching what I found in my inspected examples. So this now becomes:

3    string   \000Copyright\ (c)\ 199
>0   use      mmm-font

If this is not always true, the test line must be changed or further test lines must be added before calling the subroutine.

After the copyright message, probably a "foo factor" string (like: 001.001 001.002 001.003) is stored. It can be displayed by a debugging line like:

>0x4c   string   x   \b, factor %s

That string is also 0-terminated, which can be checked by a line like:

>0x53   byte   !0   \b, at 0x53 %x

Afterwards a third string follows, which appears to be the font name with an optional indicator MM for Multiple Master font (like MyriadMM-It MyriadMM AdobeSansMM AdobeSerifMM). So this useful information is shown by additional lines like:

>0x53    ubyte    =0
>>0x54   string   x   "%s"

Finally I looked at what the other tools check, since these facts might be usable as additional tests. Some hundred bytes later the DROID tool checks for the value 76000000E803E803h, which is true for the examples zx______.mmm and zy______.mmm, but for the examples _MI_____.MMM and _MRG____.MMM the value is 69000000E803E803h. This can be checked by a line like:

>0xb8   ubequad   !0x76000000e803e803h   \b, at 0xB8 %#llx

The TrID tool looks for keywords like Weight and Width.
Transferred into magic lines, these checks look like:

>0x55   search/0x10B5   Weight\0\0   \b, FOUND Weight
>0x55   search/0x1131   Width\0\0    \b, FOUND Width

MMM examples such as _MI_____.MMM and _MRG____.MMM, which start with \007\001\002\000Copyright, are misidentified as PDP-11 a.out via Magdir/pdp by lines like:

0     leshort   0407   PDP-11 executable
>8    leshort   >0     not stripped
>15   byte      >0     - version %d

because the 2 leading bytes are the same. I have no deeper knowledge of the PDP executable file format, but the positions where the executable stores numeric integer values are occupied by the copyright message in the MMM examples. So the c character (0x63=99) of the parenthesised copyright sign (c) is misinterpreted as version number 99. An additional test line for the copyright message string therefore skips the misidentified MMM examples. This now looks like:

0      leshort   0407
>4     string    !Copyright\040   PDP-11 executable
>>8    leshort   >0               not stripped
>>15   byte      >0               - version %d

After applying the above modifications with the patches file-5.41-fonts-mmm.diff and file-5.41-pdp-mmm.diff, the misidentification vanishes and the identification gets more detailed (font name). This now looks like:

_MI_____.MMM:                 Adobe Multiple Master font Metric "MyriadMM-It"
_MRG____.MMM:                 Adobe Multiple Master font Metric "MyriadMM"
fmt-521-signature-id-814.mmm: data
sample.mmm:                   Adobe Multiple Master font Metric "AdobeSerifMM"
zx______.mmm:                 Adobe Multiple Master font Metric "AdobeSansMM"
zy______.mmm:                 Adobe Multiple Master font Metric "AdobeSerifMM"

I hope my 2 diff files can be applied in a future version of the file utility.
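As a rough cross-check of the offsets described above, here is a minimal Python sketch. It is not part of the patches; the function names describe_mmm, old_pdp_view and make_mmm_sample are hypothetical, and the sample bytes are synthetic (built only from the layout described in this mail, not copied from a real MMM file). It shows why the old PDP-11 lines match such a file and how the new test and the font name lookup behave.

```python
import struct

# Bytes tested at offset 3 by the new Magdir/fonts line.
MMM_TEST = b"\x00Copyright (c) 199"


def describe_mmm(data: bytes):
    """Return a description like the patched magic would, or None."""
    if data[3:3 + len(MMM_TEST)] != MMM_TEST:
        return None
    if len(data) <= 0x53:
        return None
    text = "Adobe Multiple Master font Metric"
    # Offset 0x53 is the nul byte ending the "foo factor" string;
    # the third string (probably the font name) starts at 0x54.
    if data[0x53] == 0:
        name = data[0x54:].split(b"\x00", 1)[0].decode("latin-1")
        text += f' "{name}"'
    return text


def old_pdp_view(data: bytes):
    """Read the same bytes the way the old Magdir/pdp lines did."""
    (magic,) = struct.unpack_from("<H", data, 0)    # 0  leshort 0407
    (symbols,) = struct.unpack_from("<h", data, 8)  # >8 leshort >0
    version = data[15]                              # >15 byte >0
    return magic == 0o407, symbols > 0, version


def make_mmm_sample() -> bytes:
    """Synthetic test data following the layout described in the mail."""
    header = b"\x07\x01\x02\x00"
    message = b"Copyright (c) 1992, 1994 Adobe Systems Incorporated."
    message = message.ljust(0x4C - 4, b"\x00")      # pad up to offset 0x4c
    return header + message + b"001.002\x00" + b"MyriadMM-It\x00"


if __name__ == "__main__":
    sample = make_mmm_sample()
    print(old_pdp_view(sample))
    print(describe_mmm(sample))
```

In the synthetic sample the byte at offset 15 is the letter c of "(c)", which is where the reported "version 99" comes from.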
With best wishes J?rg Jenderek - -- J?rg Jenderek -----BEGIN PGP SIGNATURE----- Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/ iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYagPMQAKCRCv8rHJQhrU 1l2yAJ9BMEJS+NMIXtiAXuC085IwhDeRJQCgl6sVgvi7gESXw2N70kREGBmNsRk= =Olxn -----END PGP SIGNATURE----- -------------- next part -------------- --- file-5.41/magic/Magdir/pdp.old 2020-05-31 10:34:40 +0000 +++ file-5.41/magic/Magdir/pdp 2021-11-28 20:47:00 +0000 @@ -9,7 +9,13 @@ # PDP-11 a.out # -0 leshort 0407 PDP-11 executable ->8 leshort >0 not stripped ->15 byte >0 - version %d +# updated by Joerg Jenderek at Nov 2021 +# GRR: line below too general as it catches some Adobe Multiple Master font handled by ./fonts +0 leshort 0407 +# c character (0x63=99) of font copy right message embraced by parentheses +#>15 string x \b, at 15 %.1s +# skip font _MI_____.MMM _MRG____.MMM with 0701h and copy right message near the beginning +>4 string !Copyright\040 PDP-11 executable +>>8 leshort >0 not stripped +>>15 byte >0 - version %d # updated by Joerg Jenderek at Mar 2013 -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.41-pdp-mmm.diff.sig Type: application/octet-stream Size: 591 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mmm-droid.csv.gz Type: application/x-gzip Size: 388 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mmm-trid-v.txt.gz Type: application/x-gzip Size: 430 bytes Desc: not available URL: -------------- next part -------------- --- file-5.41/magic/Magdir/fonts.old 2021-05-12 16:30:24 +0000 +++ file-5.41/magic/Magdir/fonts 2021-12-01 23:54:10 +0000 @@ -316,6 +316,44 @@ >>>>>>>&(&-14.S-17) lestring16 x \b, %-11.96s -0 string \007\001\001\000Copyright\ (c)\ 199 Adobe Multiple Master font -0 string \012\001\001\000Copyright\ (c)\ 199 Adobe Multiple Master font +# Update: Joerg Jenderek +# URL: https://en.wikipedia.org/wiki/Multiple_master_fonts +# Reference: http://mark0.net/download/triddefs_xml.7z +# defs/m/mmm-atm.trid.xml +# http://www.nationalarchives.gov.uk/pronom/fmt/521 +# Note: still used in Adobe Acrobat Reader +#0 string \007\001\001\000Copyright\ (c)\ 199 Adobe Multiple Master font +#0 string \012\001\001\000Copyright\ (c)\ 199 Adobe Multiple Master font +#0 string \007\001\002\000Copyright\ (c)\ 199 Adobe Multiple Master font +3 string \000Copyright\ (c)\ 199 +>0 use mmm-font +# display Adobe Multiple Master font Metric information +0 name mmm-font +>0x53 ubyte x Adobe Multiple Master font Metric +#!:mime application/octet-stream +!:mime application/x-font-mmm +# http://file.fyicenter.com/c/sample.mmm +!:ext mmm +# unknown like: 07010200 0A010100 07010100 (no example) +#>0 ubelong x \b, at 0 %#8.8x +# probably copyright message like: +# Copyright (c) 1992, 1993, 1994, 1999 Adobe Systems Incorporated. All R +# Copyright (c) 1992, 1994 Adobe Systems Incorporated. All Rights Reserv +# Copyright (c) 1993, 1994, 1999 Adobe Systems Incorporated. 
All Rights +#>4 string x "%s" +# According to TrID space character (0x20) after font copyright character embraced by parentheses +#>17 byte !0x20 \b, at 17 "%c" +# after copy right message probably foo factor like: 001.001 001.002 001.003 +#>0x4c string x \b, factor %s +# nul terminating character of foo factor +#>0x53 byte !0 \b, at 0x53 %x +>0x53 ubyte =0 +# 3rd string part probably font name with optional indicator MM like: +# AdobeSansMM AdobeSerifMM MyriadMM MyriadMM-It +>>0x54 string x "%s" +# According to DROID 76000000E803E803h but also 69000000E803E803h (_MI_____.MMM _MRG____.MMM) +#>0xb8 ubequad !0x76000000e803e803h \b, at 0xB8 %#llx +# According to TrID keywords like: Weight Width +#>0x55 search/0x10B5 Weight\0\0 \b, FOUND Weight +#>0x55 search/0x1131 Width\0\0 \b, FOUND Width # TrueType/OpenType font collections (.ttc) -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.41-fonts-mmm.diff.sig Type: application/octet-stream Size: 1205 bytes Desc: not available URL: From christos at zoulas.com Mon Dec 6 15:06:08 2021 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 6 Dec 2021 10:06:08 -0500 Subject: [File] [PATCH] Magdir/wordprocessors for Aldus/Adobe PageMaker In-Reply-To: References: Message-ID: <422D35F7-7C71-4BBC-AAC2-359E06BCC16F@zoulas.com> Committed, thanks! christos > On Nov 27, 2021, at 4:12 PM, J?rg Jenderek wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello, > > some times ago i installed an older Aldus PageMaker software. > The documents and templates are files with file name extensions > like PM4 PM5 PM6 P65 PMD PT3 PT6 T65 PMT. > > When running file command version 5.41 on such documents all are > described as "data". > > For comparison reason i run the file format identification utility > TrID ( See https://mark0.net/soft-trid-e.html). 
This identifies the > "middle aged" examples like BCOMDOC2.PM4 as "Aldus PageMaker document > (v4)" by pm4-pagemaker.trid.xml and example Mytest5.PM5 as "Aldus > PageMaker document (v5)" by pm5-pagemaker.trid.xml (See appended > trid-v-pagemaker.txt.gz ). This mentions page on Wikipedia and used > file name extension. > Luckily i also found a page about PageMaker on file formats archive > team web site. That informations are expressed by comment lines > inside Magdir/wordprocessors like: > # URL: http://fileformats.archiveteam.org/wiki/PageMaker > # https://en.wikipedia.org/wiki/Adobe_PageMaker > # Reference: http://mark0.net/download/triddefs_xml.7z/defs/p > # pm4-pagemaker.trid.xml > # pm5-pagemaker.trid.xml > > Unfortunately the documentation is neither official nor complete. So > i put displaying part inside sub routine PageMaker. > > At the end according to documentation the numeric version (like: 4 5 > 6 6.50) is shown by lines like: > > #>110 uleshort x \b, VERSION=%#x >> 110 uleshort >0x03FF >>> 110 uleshort/256 x \b, version %u >>> 110 uleshort%256 >0 \b.%u >> 110 uleshort <0x0400 \b, maybe version 3 > > > Unfortunately for version 3 examples the mentioned numeric version > is zero and for version 7 the numeric value is 6.50 as for version 6. > 5. > > - From version part some sub classification are depending. It started > as Aldus PageMaker, but later (since version 6) it was acquired from > Adobe. So these different names are expressed by subroutine > starting like: > 0 name PageMaker >> 110 uleshort <0x0600 Aldus >> 110 uleshort >0x05FF Adobe >> 110 uleshort x PageMaker > !:mime application/vnd.pagemaker > > Depending from version are the used file name extensions and the > APPLE creator and type mentioned on page about signatures of > Macintosh Files on web site macdisk.com. So for version 3 this > looks like: >> 110 uleshort/256 =0 document > !:apple ALB3ALD3 > !:ext pm3/pt3 > The PT3 extension is used for templates. 
Nothing is mentioned in > documentation if it is possible to distinguish template from pure > document. > > For major version 6 there exist 2 variants 6 and 6.5. So this look > a little bit different like: >> 110 uleshort =0x0600 document > !:apple ALD6ALB6 > !:ext pm6/pt6 >> 110 uleshort =0x0632 document > !:apple AD65AB65 > !:ext p65/t65/pmd/pmt > > According to documentation PageMaker documents begin with the hex > values "FF 99" at offset 6 for little endian and according to TrID > for version 4 and 5 the prepending bytes are nil. That is what i > found in my examples, but in version 3 samples only 2 byte before > are nil. So this is used as test by starting lines like: > 4 ubelong =0x0000FF99 >> 0 use PageMaker > Most of my inspected samples are little endian, but i least i was > able to extract one big endian example Templates-3-BE.pt3. There > byte order is changed. So that example with inverted logic is > described by additional lines like: > 4 ubelong =0x000099FF >> 0 use \^PageMaker > > After applying the above mentioned modifications by patch > file-5.41-wordprocessors-pagemaker.diff then all my inspected > PageMaker documents are now described. 
This now looks like: > > 02TEMPLT-stream.T65: Adobe PageMaker document, > little-endian, version 6.50 > BCOMDOC2.PM4: Aldus PageMaker document, > little-endian, version 4 > MyPage6-stream.PM6: Adobe PageMaker document, > little-endian, version 6 > Mytest5.PM5: Aldus PageMaker document, > little-endian, version 5 > SPECSHT.PT3: Aldus PageMaker document, > little-endian, maybe version 3 > Templates-3-BE.pt3: Aldus PageMaker document, > big-endian, maybe version 3 > brochus-stream.pt6: Adobe PageMaker document, > little-endian, version 6 > pm-70-stream.pmd: Adobe PageMaker document, > little-endian, version 6.50 > pm-70-template-stream.pmt: Adobe PageMaker document, > little-endian, version 6.50 > strategies-stream.p65: Adobe PageMaker document, > little-endian, version 6.50 > > I hope my diff file can be applied in future version of file utility. > > Since version 6 such documents are embedded inside Compound > Documents. So such examples must be handled by modifications of > Magdir/ole2compounddocs. I will try to do this in a future session. > > With best wishes > J?rg Jenderek > - -- > J?rg Jenderek > -----BEGIN PGP SIGNATURE----- > Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/ > > iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYaKfRAAKCRCv8rHJQhrU > 1g/kAJ9lrDRP6vFm2zeaiaqiKqAtsHIjCQCgjP/DW7dEaCRGeacQLG7114+7KnI= > =g6es > -----END PGP SIGNATURE----- > -- > File mailing list > File at astron.com > https://mailman.astron.com/mailman/listinfo/file > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 235 bytes Desc: Message signed with OpenPGP URL: From vmihalko at redhat.com Wed Dec 8 11:38:14 2021 From: vmihalko at redhat.com (Vincent Mihalkovic) Date: Wed, 8 Dec 2021 12:38:14 +0100 Subject: [File] regression: javascript executables Message-ID: Hi, this regression (https://bugzilla.redhat.com/show_bug.cgi?id=2029975) was introduced with file-5.41 The problematic commit & line is https://github.com/file/file/commit/c07b2a18eb1c5d3854e3ecc72319a2336e361d9e#diff-85466710385fb2ac02303e18020a937c563abbea6d4050ba3aff96cf6c8e6866R10 which overwhelms the https://github.com/file/file/blob/master/magic/Magdir/javascript patterns. The powerful (with huge strength) "wild-card match for interpreters" pattern is the cause of the regression. After running file -- checking-printout --list: 100: > 0 string/wt,=#! ,"a"] <-- used detection pattern 101: >> 1 string,x,"%s script text executable"] ... 16: > 0 search/1,=#!/usr/bin/env nodejs,"Node.js script text executable"] <-- expected detection pattern I want to ask how to fix this - whether to increase the strength of the JavaScript detection patterns or to remove the "wild-card match for interpreters" pattern... regards, vincent mihalkovic -------------- next part -------------- An HTML attachment was scrubbed... URL: From christos at zoulas.com Wed Dec 8 13:42:50 2021 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 8 Dec 2021 08:42:50 -0500 Subject: [File] regression: javascript executables In-Reply-To: References: Message-ID: <64EFB51B-BD9E-43EE-B4AA-87E73FADA5B8@zoulas.com> Fixed; the issue was that the javascript magic used "search" instead of "string" and that ranked it lower than the #! magic in commands. 
Best, christos -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 235 bytes Desc: Message signed with OpenPGP URL: From vmihalko at redhat.com Wed Dec 8 14:30:59 2021 From: vmihalko at redhat.com (Vincent Mihalkovic) Date: Wed, 8 Dec 2021 15:30:59 +0100 Subject: [File] regression: javascript executables In-Reply-To: <64EFB51B-BD9E-43EE-B4AA-87E73FADA5B8@zoulas.com> References: <64EFB51B-BD9E-43EE-B4AA-87E73FADA5B8@zoulas.com> Message-ID: Great, thanks! On Wed, Dec 8, 2021 at 2:51 PM Christos Zoulas wrote: > Fixed; the issue was that the javascript magic used "search" instead of > "string" and that ranked it lower than the #! magic in commands. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From joerg.jen.der.ek at gmx.net Thu Dec 9 01:07:32 2021 From: joerg.jen.der.ek at gmx.net (=?UTF-8?Q?J=c3=b6rg_Jenderek?=) Date: Thu, 9 Dec 2021 02:07:32 +0100 Subject: [File] [PATCH] Magdir/images, intel for Atari DEGAS bitmap *.pi1 *.pi2 *.pi3 *.pc1 *.pc2 *.pc3 Message-ID: <7da42095-3a20-050b-8edf-ff0f868a56fb@gmx.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello, some times ago i send patches for lif file because the starting 2 byte magic 8000h was too weak. Unfortunately this starting sequence also occur for Atari DEGAS low-resolution bitmap (*.pc1). Now i inspect such Atari images with 6 different file name extension (pi1 pi2 pi3 pc1 pc2 pc3). 
Running the file command version 5.41 on such bitmaps and related files gives output like:

ARTIS3.PC2: data
BEETHVEN.IMG: GEM Image data 224 x 131, 1 planes, 352 x 352 pixelsize
CHURCH.IMG: GEM Image data 224 x 170, 1 planes, 352 x 352 pixelsize
Faux-Spitzen.abr: data
GAMEOVR4.IMG: GEM XIMG Image data 256 x 176, 4 planes, 372 x 372 pixelsize
GNUCHESS.PC1: data
LEREDACT.PI3: data
SMTHDRAW.PC3: data
TBX_DEMO.PI3: data
bigspid.pi1: X11 SNF font data, MSB first
bilboule.pi1: data
clinton.img: GEM HYPERPAINT Image data 77 x 87, 4 planes, 338 x 372 pixelsize
g3test.g3: data
gnucash-4.8.setup.exe.aria2: data
hpcc88.lif: lif file "HPCC88", version 1, directory length 12, extensions 0x4d00000002..., 1st file MCITFYP4
load-v0001.aria2: data
medres.pi2: data
msvcrt.lib: Intel ia64 COFF object file, not stripped, 91 sections, symbol offset=0xa8d3, 644 symbols, created Tue Sep 19 05:18:08 2006, 1st section name ".drectve"
plpbt.iso: ISO 9660 CD-ROM filesystem data 'Plop Boot Manager 5.0.14' (bootable)
sigirl1.pi3: Intel ia64 COFF object file, not stripped, 256 sections, 1st section name ""

For comparison I ran the file format identification utility TrID (see https://mark0.net/soft-trid-e.html). Most Atari DEGAS bitmaps are described correctly by TrID as DEGAS bitmap based on the first two bytes, sometimes with a low recognition rate. Only the PI1 examples are not recognised, probably because the patterns are too unspecific.

The PI2 examples are described as "DEGAS med-res bitmap" by definition bitmap-pi2-degas.trid.xml.
The PI3 examples are described as "DEGAS hi-res bitmap" by definition bitmap-pi3-degas.trid.xml.
The PC1 examples are described as "DEGAS low-res compressed bitmap" by bitmap-pc1-degas.trid.xml.
The PC2 examples are described as "DEGAS med-res compressed bitmap" by bitmap-pc2-degas.trid.xml.
The PC3 examples are described as "DEGAS hi-res compressed bitmap" by bitmap-pc3-degas.trid.xml.
(See appended DEGAS-trid-v.txt.gz.)
Luckily TrID with the -v option displays the correct file name extension and also a URL pointing to the file format specification. That is expressed inside Magdir/images by additional comment lines like:

# URL:       http://fileformats.archiveteam.org/wiki/DEGAS_image
# Reference: https://wiki.multimedia.cx/index.php?title=Degas
#            http://mark0.net/download/triddefs_xml.7z/defs/b
#            bitmap-pi2-degas.trid.xml bitmap-pi3-degas.trid.xml
#            bitmap-pc1-degas.trid.xml bitmap-pc2-degas.trid.xml
#            bitmap-pc3-degas.trid.xml

That site mentions download links for examples and graphic tools. I verified the information with the NetPBM tool pi3topbm, the XnView command line tool nconvert and the Deark software.

Unfortunately there is no strong magic pattern (only 2 bytes) for such bitmap images. So I put the displaying part inside the subroutine degas-bitmap, which starts like:

0         name      degas-bitmap
>0        ubyte     x        Atari DEGAS
!:mime    image/x-atari-degas
>0        ubyte     =0x80    Elite compressed
>>32042   ubequad   x        Elite
>0        beshort   0x0000   bitmap
!:ext     pi1
>0        beshort   0x0001   bitmap
!:ext     pi2
>0        beshort   0x0002   bitmap
!:ext     pi3
>0        beshort   0x8000   bitmap
!:ext     pc1
>0        beshort   0x8001   bitmap
!:ext     pc2
>0        beshort   0x8002   bitmap
!:ext     pc3
>1        ubyte     =0       320 x 200 x 16
>1        ubyte     =1       640 x 200 x 4
>1        ubyte     =2       640 x 400 x 2

The first byte determines whether the image bytes are compressed. The value 80h means the compressed variant and 0 means uncompressed. The second byte determines the image resolution and colour depth. The value 0 means low resolution (320×200, 16 colours), 1 means medium resolution (640×200, 4 colours) and 2 means high resolution (640×400, 2 colours). The file name extension varies with compression and resolution. The documentation also mentions a second extension SUH for uncompressed high resolution, but I found no such examples myself. Instead of the generic MIME type application/octet-stream I show a user-defined one.

For the uncompressed images there exists an "Elite" variant.
There, after the pixel data, animation information about the colours to change, the direction and the time delay is stored. This can also be shown at the end by additional debugging lines like:

>32034    ubequad   !0   \b, color animations %16.16llx (left)
>>32042   ubequad   !0   %16.16llx (right)
>32050    ubequad   !0   \b, channel directions %16.16llx
>32058    ubequad   !0   \b, channel delays %16.16llx

After the 2 starting bytes the colour palette is stored up to offset 34. The Atari ST palette has room for 16 colour entries. Each entry occupies 2 bytes (16 bits) in big endian order. Not all 16 bits are used. Some examples use only 9 bits (?????RRR?GGG?BBB with R for red, G for green, B for blue and ? for unused; that means 512 different colours) and some use 12 bits (????RRRRGGGGBBBB; that means 4096 different colours). The documentation also mentions an Atari Spectrum 512 Enhanced 15 bit palette, but luckily this variant is apparently not used for DEGAS images. According to the documentation you cannot rely on the unused bits being zero, as in the mentioned "bad" example bilboule.pi1, but I found that in most examples the unused bits are zero (so replace ? by 0 in the bit mask). As a check, show the first colour entries by lines like:

>2    ubeshort   x   \b, color palette %4.4x
>4    ubeshort   x   %4.4x
>6    ubeshort   x   %4.4x
>8    ubeshort   x   %4.4x
>10   ubeshort   x   %4.4x

Often we see the value 0000, which means black. The value 0777 means white in the 9 bit variant and 0fff means white in the 12 bit variant. So this information can be used as an additional test. DEGAS low-resolution compressed bitmaps (like: BATTLSHP.PC1 GNUCHESS.PC1 MEDUSABL.PC1 MOONLORD.PC1 WILDROSE.PC1) are recognized by lines like:

0     beshort           0x8000
>2    ubeshort&0xF000   0
>>0   use               degas-bitmap

The first test for the two starting bytes is also true for lif files like hpcc88.lif handled by Magdir/lif. But there the volume label (ASCII like HPCC88, or hexadecimal 485043433838 in example hpcc88.lif) is stored at offset 2.
So the second test line for unused (that means zero) colour palette bits skips the lif examples, and the test succeeds for PC1 bitmaps.

PI1 examples like bigspid.pi1 and bilboule.pi1 are handled by lines like:

0      beshort           0x0000
>2     uquad             !0
>>4    ubeshort&0xF000   0
>>>0   use               degas-bitmap

Here the second test line skips some "bad" ISO 9660 CD-ROM filesystems like plpbt.iso. A zero value here would be interpreted as 4 black colours at the beginning of the colour palette. In real DEGAS bitmaps one may find one black colour (that means value 0000) in the colour palette, but then the other entries hold other colours (non-zero values). So the example plpbt.iso with an 8-byte zero value is skipped.

For Atari mid-res DEGAS bitmap PI2 examples the lines look like:

0                 beshort   0x0001
...
>>>>>>>32026      quad      x
>>>>>>>>0         use       degas-bitmap

After the test for the weak 2 starting bytes, test line eight skips the GEM HYPERPAINT Image clinton.img by checking for the existence of bytes at the end of DEGAS images.

The differentiation between DEGAS PI3 examples and Adobe PhotoShop Brush ABR is a little bit tricky. For debugging purposes show my observed information about ABR by lines like:

>>19    ubyte        !0   \b, NOTE LENGTH %u
>>>21   lestring16   x    \b, BRUSH NOTE "%s"

For example Faux-Spitzen.abr I get the note string "Gitter - klein " with length 15. For example "Verschiedene Spitzen.abr" I get the note string "Kreis 1 " with length 8. So if this string length is zero I assume that it is not an ABR, and such examples must be DEGAS images. Many examples (like: 4th_ofj2.pi3 GEMINI03.PI3 PEOPLE18.PI3 POWERFIX.PI3 abydos.pi3 highres.pi3 sigirl1.pi3 vanna5.pi3) are handled by a branch like:

>>19   ubyte   =0
>>>0   use     degas-bitmap

Because the brush note string is stored as UTF-16, the stored string length multiplied by 2 gives the number of bytes occupied by that string.
So this information can be used to inspect the last character of the Adobe PhotoShop Brush UTF16-LE string and the terminating nil character (together 4 bytes) by lines like:

>>19           ubyte     !0
>>>(19.b*2)    ubequad   x
>>>>&8         ubelong   x   \b, LAST CHAR+NIL %8.8x

For example "Faux-Spitzen.abr" I get here the hexadecimal value 006e0000 (character n) and for "Verschiedene Spitzen.abr" I get the hexadecimal value 00310000 (character 1). So when the test for such nil bytes gives a non-zero value, it must be a DEGAS image. Many PI3 examples (like ARABDEMO.PI3 ELMRSESN.PI3 GEMVIEW.PI3 LEREDACT.PI3 PICCOLO.PI3 REPRO_JR.PI3 ST_TOOLS.PI3 TBX_DEMO.PI3 evgem7.pi3) are handled by another branch with additional nil test lines like:

>>>>&8   ubelong&0xff00ffFF   !0
>>>>>0   use                  degas-bitmap

If the test for such nil bytes gives a zero value, test again, but now check whether the place of the last character of the note string contains a valid value. If this value is "too low" (that means a non-printable character) it must be a DEGAS image. If this value is "high" enough it is a "normal" printable character, which means it is a Photoshop ABR. This last branch handles the remaining DEGAS bitmaps (like BASICNES.PI3 DB_HELP.PI3 DB_WRITR.PI3 LEREDACT.PI3) by lines like:

>>>>&8     ubelong&0xff00ffFF   =0
>>>>>&-4   ubelong&0x00FF0000   <0x00200000
>>>>>>0    use                  degas-bitmap

Some DEGAS high-res uncompressed bitmaps (like GEMINI03.PI3 MODEM2.PI3 POWERFIX.PI3 sigirl1.pi3 vanna5.pi3) are misidentified as "Intel ia64 COFF object file" because the 2 byte start pattern is the same. That was expressed by lines inside Magdir/intel like:

0    leshort   0x0200
>0   use       display-coff

Luckily the displaying part is done by the subroutine display-coff inside Magdir/coff. So only additional test lines must be inserted before calling the subroutine. For all my misidentified PI3 examples the interpreted first section name was nil, whereas for real COFF object files we get typical 8-byte sized names here (like .text .data .debug$S .drectve .testseg).
So for most COFF objects the starting character is a point character (0x2E). If i remember right some Borland compiler for example use DATA instead of .data. So at least i assume that starting character is like ASCII printable (that means value "high enough"), whereas for DEGAS image at that offset the colour palette entry number 10 starts. Because of 4 reserved bits (that are in most cases zero) we get here a low value. So DEGAS images can be distinguished with very high rate from COFF object files. Unfortunately the section name can appear later if COFF sample contains an optional header. But in documentation is written that COFF object files have not header part compared with COFF executables. So i must check by additional second test for F_EXEC flag bit. If flag is set, then it is an executable and i can call directly sub routine. If this bit is not set, is is an object file and i check for starting character of section name befor calling sub routine. So magic lines now become like: 0 leshort 0x0200 >18 leshort ^0x0002 >>20 ubyte >0x1F >>>0 use display-coff >18 leshort &0x0002 >>0 use display-coff After applying the above mentioned modifications by patches file-5.41-images-degas.diff and file-5.41-intel-pi3.diff then all my Degas bitmaps are correctly identified and some misidetfication vanish like: ARTIS3.PC2: Atari DEGAS Elite compressed bitmap 640 x 200 x 4, color palette 0fff 0f00 00f0 0000 0007 ... BEETHVEN.IMG: GEM Image data 224 x 131, 1 planes, 352 x 352 pixelsize CHURCH.IMG: GEM Image data 224 x 170, 1 planes, 352 x 352 pixelsize Faux-Spitzen.abr: data GAMEOVR4.IMG: GEM XIMG Image data 256 x 176, 4 planes, 372 x 372 pixelsize GNUCHESS.PC1: Atari DEGAS Elite compressed bitmap 320 x 200 x 16, color palette 0221 0000 0310 0420 0530 ... LEREDACT.PI3: Atari DEGAS Elite bitmap 640 x 400 x 2, color palette 0fff 0f00 00f0 0000 0fff ... SMTHDRAW.PC3: Atari DEGAS Elite compressed bitmap 640 x 400 x 2, color palette 0777 0700 0070 0000 0777 ... 
TBX_DEMO.PI3: Atari DEGAS Elite bitmap 640 x 400 x 2, color palette 0777 0700 0070 0000 0777 ...
bigspid.pi1: Atari DEGAS Elite bitmap 320 x 200 x 16, color palette 0004 0025 0037 0000 0410 ...
bilboule.pi1: Atari DEGAS Elite bitmap 320 x 200 x 16, color palette 0000 0111 8222 0333 fcc4 ...
clinton.img: GEM HYPERPAINT Image data 77 x 87, 4 planes, 338 x 372 pixelsize
g3test.g3: data
gnucash-4.8.setup.exe.aria2: data
hpcc88.lif: lif file "HPCC88", version 1, directory length 12, extensions 0x4d00000002..., 1st file MCITFYP4
load-v0001.aria2: data
medres.pi2: Atari DEGAS Elite bitmap 640 x 200 x 4, color palette 0777 0700 0070 0000 0007 ...
msvcrt.lib: Intel ia64 COFF object file, not stripped, 91 sections, symbol offset=0xa8d3, 644 symbols, created Tue Sep 19 05:18:08 2006, 1st section name ".drectve"
plpbt.iso: ISO 9660 CD-ROM filesystem data 'Plop Boot Manager 5.0.14' (bootable)
sigirl1.pi3: Atari DEGAS Elite bitmap 640 x 400 x 2, color palette 0001 0000 0000 0000 0000 ...

I hope my diff files can be applied in a future version of the file utility.
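
For readers who do not read magic(5) syntax fluently, the two PI3 disambiguation heuristics from this message can be restated as a small Python sketch. This is only an illustration under assumptions: the function names are made up for this example, the masks and thresholds are copied from the magic lines in the message, and the ABR helper takes the already extracted 32-bit big-endian word (last note character plus nil) as input, because the offset arithmetic of the indirect magic lines is not reproduced here.

```python
import struct

F_EXEC = 0x0002  # flag bit tested by ">18 leshort &0x0002"


def looks_like_ia64_coff(data: bytes) -> bool:
    """Restate the patched Magdir/intel test for the start value 0x0200."""
    if len(data) < 21:
        return False
    (machine,) = struct.unpack_from("<H", data, 0)
    if machine != 0x0200:
        return False
    (flags,) = struct.unpack_from("<H", data, 18)
    if flags & F_EXEC:
        # Executable: the section name may follow an optional header,
        # so no further check is made.
        return True
    # Object file: the first section name starts at offset 20 and is
    # expected to begin with a printable character (often 0x2E, ".").
    return data[20] > 0x1F


def abr_word_means_degas(word: int) -> bool:
    """Classify the 4 bytes 'last char + nil' of a possible brush note.

    Returns True when the word cannot belong to a Photoshop ABR note,
    which the magic lines treat as a DEGAS PI3 image.
    """
    if word & 0xFF00FFFF:
        # The bytes that should be nil are not nil.
        return True
    # The nil bytes are fine, so look at the character itself.
    return (word & 0x00FF0000) < 0x00200000
```

The hexadecimal values quoted in the message and in the diff comments (006e0000 and 00310000 for the two brushes, 03730773 for TBX_DEMO.PI3, 00000000 for LEREDACT.PI3) can be passed to `abr_word_means_degas` to retrace the decisions by hand.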
With best wishes J?rg Jenderek - -- J?rg Jenderek -----BEGIN PGP SIGNATURE----- Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/ iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYbFWxwAKCRCv8rHJQhrU 1rqHAKCpHT9jL3dmki5XL/+jSjpzVHjyYwCfXx5NiYn9nsJ3jwLeoDMKgJCbPqg= =Wm0D -----END PGP SIGNATURE----- -------------- next part -------------- --- file-5.41/magic/Magdir/images.old 2021-10-18 14:20:03 +0000 +++ file-5.41/magic/Magdir/images 2021-12-09 00:20:48 +0000 @@ -1242,4 +1242,163 @@ 0 leshort 0x0296 Atari ATR image +# URL: http://fileformats.archiveteam.org/wiki/DEGAS_image +# Reference: https://wiki.multimedia.cx/index.php?title=Degas +# From: Joerg Jenderek +# http://mark0.net/download/triddefs_xml.7z/defs/b +# bitmap-pi2-degas.trid.xml bitmap-pi3-degas.trid.xml +# bitmap-pc1-degas.trid.xml bitmap-pc2-degas.trid.xml bitmap-pc3-degas.trid.xml +# Note: verified by NetPBM `pi3topbm sigirl1.pi3 | file` +# `deark -m degas -l -d2 ataribak.pi1` +# XnView `nconvert -fullinfo *.p??` +# DEGAS low-res uncompressed bitmap *.pi1 +0 beshort 0x0000 +# skip some ISO 9660 CD-ROM filesystems like plpbt.iso by test for 4 non black colors in palette entries +>2 quad !0 +# skip g3test.g3 by test for unused bits of 2nd color entry +>>4 ubeshort&0xF000 0 +>>>0 use degas-bitmap +# DEGAS mid-res uncompressed bitmap *.pi2 (strength=50) after GEM Images like: +# BEETHVEN.IMG CHURCH.IMG GAMEOVR4.IMG TURKEY.IMG clinton.img +0 beshort 0x0001 +#!:strength +0 +# skip many control files like gnucash-4.8.setup.exe.aria2 by test for non black in 4 palette entries +>2 quad !0 +# skip control file load-v0001.aria2 by test for unused bits of 5th color palette entry +>>10 ubeshort&0xF000 0 +# skip many GEM Image data like DANCER.IMG GAMEOVR4.IMG SHIP.IMG by test for unused bits of 8th color palette entry +>>>16 ubeshort&0xF000 0 +# skip many GEM Image data like BEETHVEN.IMG CABINETS.IMG MEMO.IMG by test for unused bits of 14th color palette entry +>>>>28 ubeshort&0xF000 0 +# skip few GEM 
Image data like CHURCH.IMG by test for unused bits of 15th color palette entry +>>>>>30 ubeshort&0xF000 0 +# skip many GEM Image data like TIGER.IMG TURKEY.IMG XMAS.IMG by test for unused bits of 16th color palette entry +>>>>>>32 ubeshort&0xF000 0 +# skip GEM Image data like clinton.img by test for existing bytes at the end +>>>>>>>32026 quad x +>>>>>>>>0 use degas-bitmap +# DEGAS high-res uncompressed bitmap *.pi3 +0 beshort 0x0002 +# skip Intel ia64 COFF msvcrt.lib by test for unused bits of 1st atari color palette entry +>2 ubeshort&0xF000 0 +# skip few Adobe PhotoShop Brushes like Faux-Spitzen.abr by check +# for invalid Adobe PhotoShop Brush UTF16-LE string length +>>19 ubyte =0 +# many like: 4th_ofj2.pi3 GEMINI03.PI3 PEOPLE18.PI3 POWERFIX.PI3 abydos.pi3 highres.pi3 sigirl1.pi3 vanna5.pi3 +>>>0 use degas-bitmap +# Adobe PhotoShop Brush UTF16-LE string length 15 "Gitter - klein " 8 "Kreis 1 " +>>19 ubyte !0 +#>>19 ubyte !0 \b, NOTE LENGTH %u +#>>>21 lestring16 x \b, BRUSH NOTE "%s" +>>>(19.b*2) ubequad x +# maybe last character of Adobe PhotoShop Brush UTF16-LE string and terminating nul char like +# 006e0000 for n in "Faux-Spitzen.abr" 00310000 for 1 in "Verschiedene Spitzen.abr" +# 00000000 "LEREDACT.PI3" 03730773 "TBX_DEMO.PI3" +#>>>>&8 ubelong x \b, LAST CHAR+NIL %8.8x +>>>>&8 ubelong&0xff00ffFF !0 +# many DEGAS bitmap like: ARABDEMO.PI3 ELMRSESN.PI3 GEMVIEW.PI3 LEREDACT.PI3 PICCOLO.PI3 REPRO_JR.PI3 ST_TOOLS.PI3 TBX_DEMO.PI3 evgem7.pi3 +>>>>>0 use degas-bitmap +# test for last character of Adobe PhotoShop Brush UTF16-LE string and terminating nul char +>>>>&8 ubelong&0xff00ffFF =0 +# select last DEGAS bitmaps by invalid last char of brush note like BASICNES.PI3 DB_HELP.PI3 DB_WRITR.PI3 LEREDACT.PI3 +>>>>>&-4 ubelong&0x00FF0000 <0x00200000 +>>>>>>0 use degas-bitmap +# last character of Adobe PhotoShop Brush UTF16-LE note +#>>>>>&-4 ubelong&0x00FF0000 >0x001F0000 \b, THAT IS ABR +# DEGAS low-res compressed bitmap *.pc1 like: BATTLSHP.PC1 GNUCHESS.PC1 
MEDUSABL.PC1 MOONLORD.PC1 WILDROSE.PC1 +0 beshort 0x8000 +# skip lif files handled via ./lif by test for unused bits of 1st palette entry +>2 ubeshort&0xF000 0 +>>0 use degas-bitmap +# DEGAS mid-res compressed bitmap *.pc2 like: abydos.pc2 ARTIS3.PC2 SMTHDRAW.PC2 STAR_2K.PC2 TX2_DEMO.PC2 +0 beshort 0x8001 +>0 use degas-bitmap +# DEGAS high-res compressed bitmap *.pc3 like: abydos.pc3 COYOTE.PC3 ELEPHANT.PC3 TX2_DEMO.PC3 SMTHDRAW.PC3 +0 beshort 0x8002 +>0 use degas-bitmap +# display information of Atari DEGAS and DEGAS Elite bitmap images +0 name degas-bitmap +>0 ubyte x Atari DEGAS +#!:mime application/octet-stream +!:mime image/x-atari-degas +# compressed +>0 ubyte =0x80 Elite compressed +# uncompressed +#>0 ubyte =0x00 uncompressed +#>0 ubyte =0x00 un. +>0 ubyte =0x00 +# check for existence of footer for DEGAS Elite images +>>32042 ubequad x Elite +>0 beshort 0x0000 bitmap +!:ext pi1 +>0 beshort 0x0001 bitmap +!:ext pi2 +>0 beshort 0x0002 bitmap +# no example with SUH extension found +#!:ext pi3/suh +!:ext pi3 +>0 beshort 0x8000 bitmap +!:ext pc1 +>0 beshort 0x8001 bitmap +!:ext pc2 +>0 beshort 0x8002 bitmap +!:ext pc3 +# low resolution; 320x200, 16 colors +>1 ubyte =0 320 x 200 x 16 +# medium resolution; 640x200, 4 colors +>1 ubyte =1 640 x 200 x 4 +# high resolution; 640x400, 2 colors +>1 ubyte =2 640 x 400 x 2 +# http://fileformats.archiveteam.org/wiki/Atari_ST_color_palette +# hardware_palette[16]; 9 bit ?????RRR?GGG?BBB; 12 bit ????RRRRGGGGBBBB for Atari STE +# for Atari DEGAS apparently no Spectrum 512 Enhanced 15 bit palette RGB?RRRRGGGGBBBB +# Red Green Blue unused bit ? 
often 0 but not bilboule.pi1; color_value (examples or numbers) +# 1st color palette entry like: 0777 (61) 0fff (LEREDACT.PI3) 0fcf (devgem7.pi3) 0001 (9) +>2 ubeshort x \b, color palette %4.4x +# 2nd palette entry like: 0000 (32) 0700 (38) 0f00 (LEREDACT.PI3 devgem7.pi3) +>4 ubeshort x %4.4x +# 3rd palette entry +>6 ubeshort x %4.4x +# 4th palette entry like: 0000 (72) +>8 ubeshort x %4.4x +# 5th palette entry +>10 ubeshort x %4.4x +>2 ubeshort x ... +# 6th palette entry +#>12 ubeshort x %4.4x +# 7th palette entry like: 0000 (16) 0001 (ELMRSESN.PI3 elmrsesn.pi3) 0070 (51) 00f0 (BASICNES.PI3 LEREDACT.PI3) 00f8 (devgem7.pi3) +#>14 ubeshort x %4.4x +# 8th palette entry +#>16 ubeshort x %4.4x +# 9 palette entry +#>18 ubeshort x %4.4x +# 10 palette entry +#>20 ubeshort x %4.4x +# 11 palette entry +#>22 ubeshort x %4.4x +# 12 palette entry +#>24 ubeshort x %4.4x +# 13 palette entry +#>26 ubeshort x %4.4x +# 14th palette entry +#>28 ubeshort x %4.4x +# 15th palette entry +#>30 ubeshort x %4.4x +# 16th palette entry +#>32 ubeshort x %4.4x +# data[16000] for uncompressed images; pixel data +#>34 ubequad x \b, DATA %#16.16llx... 
+# footer for Elite variant images +# https://www.fileformat.info/format/atari/egff.htm +# https://pulkomandy.tk/projects/GrafX2/wiki/Develop/FileFormats/Atari +# left_color_animation[4]; like: 0000000000000000 0000000100020003 fffafff000000030 (bigspid.pi1) +#>32034 ubequad !0 \b, color animations %16.16llx (left) +# right_color_animation[4]; like: 0000000000000000 0000000100020003 +#>>32042 ubequad !0 %16.16llx (right) +# channel_direction[4]; 0~left 1~none 2~right like: 0001000100010001 0002000000010001 (cycle2.pi1) +# sometimes unexpected like: feaafc0000000000 (bigspid.pi1) +#>32050 ubequad !0 \b, channel directions %16.16llx +# channel_delay[4]; 128 - channel delay, timebase 1/60 s +#>32058 ubequad !0 \b, channel delays %16.16llx + # From: Joerg Jenderek # URL: http://fileformats.archiveteam.org/wiki/ImageLab/PrintTechnic -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.41-images-degas.diff.sig Type: application/octet-stream Size: 2913 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: DEGAS-trid-v.txt.gz
Type: application/x-gzip
Size: 1837 bytes
Desc: not available
URL:
-------------- next part --------------
--- file-5.41/magic/Magdir/intel.old 2021-05-12 16:30:24 +0000
+++ file-5.41/magic/Magdir/intel 2021-12-09 00:33:26 +0000
@@ -44,5 +44,14 @@
 #>22 leshort >0 - version %d
 0 leshort 0x0200
->0 use display-coff
+# no F_EXEC flag bit implies Intel ia64 COFF object file without optional header
+>18 leshort ^0x0002
+# skip some DEGAS high-res uncompressed bitmap *.pi3 handled by ./images like
+# GEMINI03.PI3 MODEM2.PI3 POWERFIX.PI3 sigirl1.pi3 vanna5.pi3
+# by test for valid starting character (often point 0x2E) of 1st section name
+>>20 ubyte >0x1F
+>>>0 use display-coff
+# F_EXEC flag bit implies Intel ia64 COFF executable
+>18 leshort &0x0002
+>>0 use display-coff
 0 leshort 0x8664
 >0 use display-coff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-intel-pi3.diff.sig
Type: application/octet-stream
Size: 594 bytes
Desc: not available
URL:

From vmihalko at redhat.com Thu Dec 9 16:05:54 2021
From: vmihalko at redhat.com (Vincent Mihalkovic)
Date: Thu, 9 Dec 2021 17:05:54 +0100
Subject: [File] json magic - output string
Message-ID:

Hi,

from https://bugzilla.redhat.com/show_bug.cgi?id=2020715:

"Description of problem:
The command "file" applied to a JSON file now outputs "JSON data", without including the string "text" in the output. Earlier versions behaved differently - on CentOS 6 file returned just "ASCII text" for JSON file, and this is also contrary to the documentation of file (man file), which states:

The type printed will usually contain one of the words text (the file contains only printing characters and a few common control characters and is probably safe to read on an ASCII terminal), executable (the file contains the result of compiling a program in a form understandable to some UNIX kernel or another), or data meaning anything else (data is usually 'binary' or non-printable). Exceptions are well-known file formats (core files, tar archives) that are known to contain binary data. When modifying magic files or the program itself, make sure to preserve these keywords. Users depend on knowing that all the readable files in a directory have the word 'text' printed. Don't do as Berkeley did and change 'shell commands text' to 'shell script'.

Apparently file has now done as Berkeley did..."

What do you think about changing the "JSON data" to "JSON text data"?

https://github.com/file/file/blob/b56b58d499dbe58f2bed28e6b3c297fe7add992e/src/is_json.c#L417

vincent mihalkovic
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-JSON-text.patch
Type: text/x-patch
Size: 387 bytes
Desc: not available
URL:

From christos at zoulas.com Thu Dec 9 18:38:59 2021
From: christos at zoulas.com (Christos Zoulas)
Date: Thu, 9 Dec 2021 13:38:59 -0500
Subject: [File] json magic - output string
In-Reply-To:
References:
Message-ID: <7DBEAB20-EF27-49FC-80F6-E7783E5CD8CD@zoulas.com>

You got it!

christos

> On Dec 9, 2021, at 11:05 AM, Vincent Mihalkovic wrote:
>
> Hi,
>
> from https://bugzilla.redhat.com/show_bug.cgi?id=2020715 :
>
> "Description of problem:
>
> The command "file" applied to a JSON file now outputs "JSON data", without
> including the string "text" in the output. Earlier versions behaved differently -
> on CentOS 6 file returned just "ASCII text" for JSON file, and this is also
> contrary to the documentation of file (man file), which states:
>
> The type printed will usually contain one of the words text (the file
> contains only printing characters and a few common control characters and
> is probably safe to read on an ASCII terminal), executable (the file
> contains the result of compiling a program in a form understandable to some
> UNIX kernel or another), or data meaning anything else (data is usually
> 'binary' or non-printable). Exceptions are well-known file formats (core
> files, tar archives) that are known to contain binary data. When modifying
> magic files or the program itself, make sure to preserve these
> keywords. Users depend on knowing that all the readable files in a
> directory have the word 'text' printed. Don't do as Berkeley did and
> change 'shell commands text' to 'shell script'.
>
> Apparently file has now done as Berkeley did..."
>
> What do you think about changing the "JSON data" to "JSON text data"?
>
> https://github.com/file/file/blob/b56b58d499dbe58f2bed28e6b3c297fe7add992e/src/is_json.c#L417
>
> vincent mihalkovic
>
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 235 bytes
Desc: Message signed with OpenPGP
URL:

From rootkea at gmail.com Fri Dec 10 17:32:53 2021
From: rootkea at gmail.com (Avinash Sonawane)
Date: Fri, 10 Dec 2021 17:32:53 +0000
Subject: [File] json magic - output string
Message-ID:

We also need to update the JSON tests. Please find the attached patch fixing the tests.

Regards,
Avinash Sonawane (rootKea)
https://www.rootkea.me
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-JSON-tests.patch Type: text/x-patch Size: 1023 bytes Desc: not available URL: From christos at zoulas.com Fri Dec 10 18:29:22 2021 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 10 Dec 2021 13:29:22 -0500 Subject: [File] [PATCH] Magdir/images, intel for Atari DEGAS bitmap *.pi1 *.pi2 *.pi3 *.pc1 *.pc2 *.pc3 In-Reply-To: <7da42095-3a20-050b-8edf-ff0f868a56fb@gmx.net> References: <7da42095-3a20-050b-8edf-ff0f868a56fb@gmx.net> Message-ID: <372BB12B-97DB-4C55-BA0F-E7EDC0FBAA8B@zoulas.com> Committed, thanks! christos > On Dec 8, 2021, at 8:07 PM, J?rg Jenderek wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello, > some times ago i send patches for lif file because the starting 2 > byte magic 8000h was too weak. Unfortunately this starting sequence > also occur for Atari DEGAS low-resolution bitmap (*.pc1). > Now i inspect such Atari images with 6 different file name extension > (pi1 pi2 pi3 pc1 pc2 pc3). > > When running running file command version 5.41 on such bitmaps and > related files i get an output like: > ARTIS3.PC2: data > BEETHVEN.IMG: GEM Image data 224 x 131, > 1 planes, 352 x 352 pixelsize > CHURCH.IMG: GEM Image data 224 x 170, > 1 planes, 352 x 352 pixelsize > Faux-Spitzen.abr: data > GAMEOVR4.IMG: GEM XIMG Image data 256 x 176, > 4 planes, 372 x 372 pixelsize > GNUCHESS.PC1: data > LEREDACT.PI3: data > SMTHDRAW.PC3: data > TBX_DEMO.PI3: data > bigspid.pi1: X11 SNF font data, MSB first > bilboule.pi1: data > clinton.img: GEM HYPERPAINT Image data 77 x 87, > 4 planes, 338 x 372 pixelsize > g3test.g3: data > gnucash-4.8.setup.exe.aria2: data > hpcc88.lif: lif file "HPCC88", version 1, > directory length 12, > extensions 0x4d00000002..., > 1st file MCITFYP4 > load-v0001.aria2: data > medres.pi2: data > msvcrt.lib: Intel ia64 COFF object file, > not stripped, 91 sections, > symbol offset=0xa8d3, 644 symbols, > created Tue Sep 19 05:18:08 2006, > 1st section name ".drectve" > plpbt.iso: ISO 9660 CD-ROM filesystem data 
> 'Plop Boot Manager 5.0.14' (bootable) > sigirl1.pi3: Intel ia64 COFF object file, > not stripped, 256 sections, > 1st section name "" > > For comparison reason i run the file format identification utility > TrID ( See https://mark0.net/soft-trid-e.html). > Most Atari Degas bitmap are described correctly by TrID as > DEGAS bitmap by first two bytes, sometimes with low recognition rate. > Only the PI1 examples are not recognised, probably because the > patterns are too unspecific. > The PI2 examples are described as "DEGAS med-res bitmap" by > definition bitmap-pi2-degas.trid.xml. The PI3 examples are described > as "DEGAS hi-res bitmap" by definition bitmap-pi3-degas.trid.xml. > The PC1 examples are described as "DEGAS low-res compressed bitmap" > by bitmap-pc1-degas.trid.xml. The PC2 examples are described as > "DEGAS med-res compressed bitmap" by bitmap-pc2-degas.trid.xml. > The PC3 examples are described as "DEGAS hi-res compressed bitmap" > by bitmap-pc3-degas.trid.xml. (See appended DEGAS-trid-v.txt.gz ). > > Luckily TrID with -v option display correct file name extension and > also URL pointing to file format specification. That is expressed > inside Magdir/images by additional comment lines likes: > # URL: http://fileformats.archiveteam.org/wiki/DEGAS_image > # Reference: https://wiki.multimedia.cx/index.php?title=Degas > # http://mark0.net/download/triddefs_xml.7z/defs/b > # bitmap-pi2-degas.trid.xml bitmap-pi3-degas.trid.xml > # bitmap-pc1-degas.trid.xml bitmap-pc2-degas.trid.xml > # bitmap-pc3-degas.trid.xml > > On that site download links for examples and graphic tools are > mentioned. I verified information by NetPBM tool pi3topbm, XnView > command line tool nconvert and Deark software. > > Unfortunately there exist no strong significant magic pattern (only 2 > bytes) for such bitmap images. 
So i put displaying part inside sub > routine degas-bitmap which starts like: > 0 name degas-bitmap >> 0 ubyte x Atari DEGAS > !:mime image/x-atari-degas >> 0 ubyte =0x80 Elite compressed >>> 32042 ubequad x Elite >> 0 beshort 0x0000 bitmap > !:ext pi1 >> 0 beshort 0x0001 bitmap > !:ext pi2 >> 0 beshort 0x0002 bitmap > !:ext pi3 >> 0 beshort 0x8000 bitmap > !:ext pc1 >> 0 beshort 0x8001 bitmap > !:ext pc2 >> 0 beshort 0x8002 bitmap > !:ext pc3 >> 1 ubyte =0 320 x 200 x 16 >> 1 ubyte =1 640 x 200 x 4 >> 1 ubyte =2 640 x 400 x 2 > > The first byte determinate if the image bytes are compressed. The > value 80h means compressed variant and 0 means uncompressed. > The second byte determinate the image resolution and colour depth. > The value 0 means low resolution (320?200, 16 colours). One means > medium resolution (640?200, 4 colours) and value two means high > resolution (640?400, 2 colours). Depending on compression and > resolution the file name extension varies. The documentation also > mention a second extension SUH for uncompressed high resolution, > but i myself found no such examples. Instead of generic mime type > application/octet-stream i show a user defined one. For the > uncompressed images exist a "elite" variant. There after the pixel > data animation information about colour to change, direction and > time delay is stored. This can also be shown at the end by > additional debugging lines like: >> 32034 ubequad !0 \b, color animations %16.16llx (left) >>> 32042 ubequad !0 %16.16llx (right) >> 32050 ubequad !0 \b, channel directions %16.16llx >> 32058 ubequad !0 \b, channel delays %16.16llx > > After the 2 starting bytes the colour palette is stored til offset > 34. The Atari ST palette has place for 16 colour entries. Each entry > occupies 2 bytes ( or 16 bits) in big endian order. Not all 16 bits > are used. Some examples use only 9 bits ( ?????RRR?GGG?BBB with R for > red, G for green, B for Blue and ? for unused. 
That means 512 > different colours) and some use 12 bits ( ????RRRRGGGGBBBB . That > means 4096 different colours). In documentation also an Atari > Spectrum 512 Enhanced 15 bit palette is mentioned, but luckily this > variant apparently does not be used for DEGAS images. According to > documentation you can not rely that unused bits are zero like in > mentioned "bad" example bilboule.pi1, but i myself found that in most > examples unused bits are zero ( so replace ? by 0 in bit mask). > For control reason show the first colour entries by lines like: >> 2 ubeshort x \b, color palette %4.4x >> 4 ubeshort x %4.4x >> 6 ubeshort x %4.4x >> 8 ubeshort x %4.4x >> 10 ubeshort x %4.4x > So often we see value 0000. That means black colour. The value 0777 > means white colour in 9 bit variant and 0fff means white in 12 varian > t. > So this information can be used as an additional test. So DEGAS > low-resolution compressed bitmaps (like: BATTLSHP.PC1 GNUCHESS.PC1 > MEDUSABL.PC1 MOONLORD.PC1 WILDROSE.PC1) are recognized by lines like: > 0 beshort 0x8000 >> 2 ubeshort&0xF000 0 >>> 0 use degas-bitmap > > The first test for starting two bytes is also true for lif files like > hpcc88.lif handled by Magdir/lif. But there at offset 2 the volume > label ( ASCII like HPCC88 or hexadecimal 485043433838 in example > hpcc88.lif) is stored. So by second test line for unused (that means > zero) colour palette bits lif examples are skipped and the test > succeeds for PC1 bitmaps. > PI1 examples like bigspid.pi1 and bilboule.pi1 are done by lines like > : > 0 beshort 0x0000 >> 2 uquad !0 >>> 4 ubeshort&0xF000 0 >>>> 0 use degas-bitmap > Here by second test line some "bad" ISO 9660 CD-ROM filesystems like > plpbt.iso are skipped. Zero value here would be interpreted as 4 > black colours at the beginning of the colour palette. 
For real DEGAS > bitmaps one could find one black colour (that means value 0000) in > the colour palette, but then the other entries have other colours > (non zero values) in other entries. So example plpbt.iso with > 8-byte zero value is skipped. > > For Atari mid-res DEGAS bitmap PI2 examples the lines looks like > 0 beshort 0x0001 > ... >>>>>>>> 32026 quad x >>>>>>>>> 0 use degas-bitmap > After test for weak 2 starting bytes by test line eight the GEM > HYPERPAINT Image clinton.img is skipped by check for existence of > bytes at the end of DEGAS images. > > The differentiation between DEGAS PI3 examples and Adobe PhotoShop > Brush ABR is a little bit tricky. For debugging purpose show my > observed information about ABR by lines like: >>> 19 ubyte !0 \b, NOTE LENGTH %u >>>> 21 lestring16 x \b, BRUSH NOTE "%s" > So for example Faux-Spitzen.abr i get note string "Gitter - klein " > with length 15. So for example "Verschiedene Spitzen.abr" i get with > length 8 the note string "Kreis 1 ". > > So if this string length is zero i assume that is not an ABR. So such > examples must be DEGAS images. So many examples ( like: 4th_ofj2.pi3 > GEMINI03.PI3 PEOPLE18.PI3 POWERFIX.PI3 abydos.pi3 highres.pi3 > sigirl1.pi3 vanna5.pi3) are handled by branch like: >>> 19 ubyte =0 >>>> 0 use degas-bitmap > Because brush note string is stored as UTF-16 the stored string > length multiplied with 2 gives the number of bytes occupies by that > string. So this information can be used to inspect the last > last character of Adobe PhotoShop Brush UTF16-LE string and > terminating nil character (that are 4 bytes) by lines like >>> 19 ubyte !0 >>>> (19.b*2) ubequad x >>>>> &8 ubelong x \b, LAST CHAR+NIL %8.8x > For example "Faux-Spitzen.abr" here i get hexadecimal value 006e0000 > (character n) and for "Verschiedene Spitzen.abr" i get hexadecimal > value 00310000 (character 1). > > So when test for such nil bytes gives non zero value, it must be > DEGAS image. 
So many PI3 examples (like ARABDEMO.PI3 ELMRSESN.PI3 > GEMVIEW.PI3 LEREDACT.PI3 PICCOLO.PI3 REPRO_JR.PI3 ST_TOOLS.PI3 > TBX_DEMO.PI3 evgem7.pi3) are skipped by other branch with additional > nil test lines like: >>>>> &8 ubelong&0xff00ffFF !0 >>>>>> 0 use degas-bitmap > > If test for such nil bytes gives zero value, test it again but now > looks if place of last character of note string contains value is > valid. If this value is "too low" (that means non printable > character) it must be a DEGAS image. If this value is "high" enough > is is a "normal" printable character. That means it is a Photoshop > ABR. By this last branch the remaining last DEGAS bitmaps ( like > BASICNES.PI3 DB_HELP.PI3 DB_WRITR.PI3 LEREDACT.PI3) are skipped by > lines like: >>>>> &8 ubelong&0xff00ffFF =0 >>>>>> &-4 ubelong&0x00FF0000 <0x00200000 >>>>>>> 0 use degas-bitmap > > Some DEGAS high-res uncompressed bitmaps (like GEMINI03.PI3 > MODEM2.PI3 POWERFIX.PI3 sigirl1.pi3 vanna5.pi3) are misidentified as > "Intel ia64 COFF object file" because the 2 byte start pattern is the > same. That was expressed by lines inside Magdir/intel like: > 0 leshort 0x0200 >> 0 use display-coff > Luckily the displaying part is done by subroutine display-coff inside > Magdir/coff. So only additional test lines must be inserted before > calling sub routine. For all my misidentified PI3 examples the > interpreted first section name was nil, whereas for real COFF object > file we get here typical 8-byte sized names (like .text .data > .debug$S .drectve .testseg). So for most COFF objects the starting > character is a point character (0x2E). If i remember right some > Borland compiler for example use DATA instead of .data. So at least > i assume that starting character is like ASCII printable (that > means value "high enough"), whereas for DEGAS image at that offset > the colour palette entry number 10 starts. Because of 4 reserved > bits (that are in most cases zero) we get here a low value. 
> So DEGAS images can be distinguished with very high rate from COFF > object files. Unfortunately the section name can appear later if COFF > sample contains an optional header. But in documentation is written > that COFF object files have not header part compared with COFF > executables. So i must check by additional second test for F_EXEC > flag bit. If flag is set, then it is an executable and i can > call directly sub routine. If this bit is not set, is is an object > file and i check for starting character of section name befor calling > sub routine. So magic lines now become like: > 0 leshort 0x0200 >> 18 leshort ^0x0002 >>> 20 ubyte >0x1F >>>> 0 use display-coff >> 18 leshort &0x0002 >>> 0 use display-coff > > After applying the above mentioned modifications by patches > file-5.41-images-degas.diff and file-5.41-intel-pi3.diff > then all my Degas bitmaps are correctly identified and some > misidetfication vanish like: > > ARTIS3.PC2: Atari DEGAS Elite compressed bitmap > 640 x 200 x 4, color palette > 0fff 0f00 00f0 0000 0007 ... > BEETHVEN.IMG: GEM Image data 224 x 131, > 1 planes, 352 x 352 pixelsize > CHURCH.IMG: GEM Image data 224 x 170, > 1 planes, 352 x 352 pixelsize > Faux-Spitzen.abr: data > GAMEOVR4.IMG: GEM XIMG Image data 256 x 176, > 4 planes, 372 x 372 pixelsize > GNUCHESS.PC1: Atari DEGAS Elite compressed bitmap > 320 x 200 x 16, color palette > 0221 0000 0310 0420 0530 ... > LEREDACT.PI3: Atari DEGAS Elite bitmap > 640 x 400 x 2, color palette > 0fff 0f00 00f0 0000 0fff ... > SMTHDRAW.PC3: Atari DEGAS Elite compressed bitmap > 640 x 400 x 2, color palette > 0777 0700 0070 0000 0777 ... > TBX_DEMO.PI3: Atari DEGAS Elite bitmap > 640 x 400 x 2, color palette > 0777 0700 0070 0000 0777 ... > bigspid.pi1: Atari DEGAS Elite bitmap > 320 x 200 x 16, color palette > 0004 0025 0037 0000 0410 ... > bilboule.pi1: Atari DEGAS Elite bitmap > 320 x 200 x 16, color palette > 0000 0111 8222 0333 fcc4 ... 
> clinton.img: GEM HYPERPAINT Image data 77 x 87, > 4 planes, 338 x 372 pixelsize > g3test.g3: data > gnucash-4.8.setup.exe.aria2: data > hpcc88.lif: lif file "HPCC88", version 1, > directory length 12, > extensions 0x4d00000002..., > 1st file MCITFYP4 > load-v0001.aria2: data > medres.pi2: Atari DEGAS Elite bitmap > 640 x 200 x 4, color palette > 0777 0700 0070 0000 0007 ... > msvcrt.lib: Intel ia64 COFF object file, > not stripped, 91 sections, > symbol offset=0xa8d3, 644 symbols, > created Tue Sep 19 05:18:08 2006, > 1st section name ".drectve" > plpbt.iso: ISO 9660 CD-ROM filesystem data > 'Plop Boot Manager 5.0.14' (bootable) > sigirl1.pi3: Atari DEGAS Elite bitmap > 640 x 400 x 2, color palette > 0001 0000 0000 0000 0000 ... > > I hope my diff files can be applied in future version of file utility > . > > > With best wishes > J?rg Jenderek > - -- > J?rg Jenderek > > > -----BEGIN PGP SIGNATURE----- > Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/ > > iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYbFWxwAKCRCv8rHJQhrU > 1rqHAKCpHT9jL3dmki5XL/+jSjpzVHjyYwCfXx5NiYn9nsJ3jwLeoDMKgJCbPqg= > =Wm0D > -----END PGP SIGNATURE----- > -- > File mailing list > File at astron.com > https://mailman.astron.com/mailman/listinfo/file > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 235 bytes Desc: Message signed with OpenPGP URL: From rootkea at gmail.com Fri Dec 10 23:50:35 2021 From: rootkea at gmail.com (Avinash Sonawane) Date: Sat, 11 Dec 2021 05:20:35 +0530 Subject: [File] Don't count `\0` in string length Message-ID: <20211211052035.2fc23e1d@optimus> Hello! At present, we count `\0` byte in string length for `desired_len` but not for `result_len` (uses strlen). So, on current master currently for every successful test `desired_len = result_len + 1` but since we make comparison using strcmp we don't see it (for successful test). 
This patch correctly ignores `\0` while calculating `desired_len`. Thanks! Regards, Avinash Sonawane (rootKea) https://www.rootkea.me -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Don-t-count-0-in-string-length.patch Type: text/x-patch Size: 611 bytes Desc: not available URL: From rootkea at gmail.com Sat Dec 11 00:04:08 2021 From: rootkea at gmail.com (Avinash Sonawane) Date: Sat, 11 Dec 2021 05:34:08 +0530 Subject: [File] json magic - output string In-Reply-To: References: Message-ID: <20211211053408.23e440c3@optimus> On Fri, 10 Dec 2021 17:32:53 +0000 Avinash Sonawane wrote: > We also need to update the JSON tests. The tests are still broken. I think it's because we introduced newlines at the end of json[1-3].result files? BTW, since result files are text files we should indeed place newline at the end of files? Trying to be POSIX compliant.[0] This will need a change in test.c/slurp(). [0] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206 Regards, Avinash Sonawane (rootKea) https://www.rootkea.me From christos at zoulas.com Sun Dec 12 16:27:07 2021 From: christos at zoulas.com (Christos Zoulas) Date: Sun, 12 Dec 2021 11:27:07 -0500 Subject: [File] json magic - output string In-Reply-To: <20211211053408.23e440c3@optimus> References: <20211211053408.23e440c3@optimus> Message-ID: <45BDC7D9-DE4B-40BE-AC9E-878CCB9C7CF4@zoulas.com> Perhaps, but let's remove the newlines for now since this is the minimal change. christos > On Dec 10, 2021, at 7:04 PM, Avinash Sonawane wrote: > > On Fri, 10 Dec 2021 17:32:53 +0000 > Avinash Sonawane wrote: > >> We also need to update the JSON tests. > > The tests are still broken. > > I think it's because we introduced newlines at the end of > json[1-3].result files? > > BTW, since result files are text files we should indeed place newline > at the end of files? Trying to be POSIX compliant.[0] > > This will need a change in test.c/slurp(). 
>
> [0]
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
>
> Regards,
> Avinash Sonawane (rootKea)
> https://www.rootkea.me
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 235 bytes
Desc: Message signed with OpenPGP
URL:

From joerg.jen.der.ek at gmx.net Wed Dec 22 22:02:00 2021
From: joerg.jen.der.ek at gmx.net (Jörg Jenderek)
Date: Wed, 22 Dec 2021 23:02:00 +0100
Subject: [File] Magdir for aria control file *.aria2
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,
some time ago I sent patches for Atari DEGAS bitmaps. Unfortunately the weak starting sequence 0x0001 of the medium resolution bitmap *.PI2 also occurs in aria control files. Now I inspect such control files with file name extension aria2, which are produced by the download client aria2c while a download is in progress.

When running file command version 5.41 on such control files and related files I get an output like:

8GadgetPackSetup.msi: Composite Document File V2 Document, Little Endian, Os: Windows, Version 6.2, MSI Installer, Code page: 1252, Title: Installation Database, Subject: 8GadgetPack, Author: 8GadgetPack.net,
8GadgetPackSetup.msi.aria2: data
DIAGRAM1.PI2: dBase III DBT, version number 0, next free block index 256, 1st item "\377\377"
GAMEOVR4.IMG: GEM XIMG Image data 256 x 176, 4 planes, 372 x 372 pixelsize
ST_TOOLS.PI2: data
gnucash-4.8.setup.exe: PE32 executable (GUI) Intel 80386, for MS Windows
gnucash-4.8.setup.exe.aria2: data
load-nonBt-v0001.aria2: data
load-v0001.aria2: data
medres.pi2: data

For comparison I ran the file format identification utility TrID (see https://mark0.net/soft-trid-e.html).
All aria control files are misidentified as "DEGAS med-res bitmap"
with PI2 file name extension by bitmap-pi2-degas.trid.xml, because of
the same weak 2-byte starting pattern (see appended aria-trid-v.txt.gz).

Luckily the German Wikipedia has a page about the Aria software. The
software page on GitHub referenced there contains a document with
technical notes (especially on the aria2 control file format). That is
expressed inside Magdir aria.txt by comment lines like:
# URL:       https://de.wikipedia.org/wiki/Aria_(Software)
# Reference: https://github.com/aria2/aria2/
#            blob/master/doc/manual-src/en/technical-notes.rst

Unfortunately there is no strong magic pattern (only 2 bytes) for such
control files. So I put the displaying part inside the subroutine aria,
which starts like:
0       name            aria
>0      beshort         x       aria2 control file, version %u
!:mime  application/x-aria
!:ext   aria2

The first 2 bytes determine the version. It should be either
version 0 (0x0000) or version 1 (0x0001). In version 1, all
multi-byte integers are saved in network byte order (big endian). In
version 0, all multi-byte integers are saved in host byte order. I
handle only the version 1 variant, because I found no older version 0
examples. Instead of the generic mime type application/octet-stream I
show a user defined one and display the used filename extension aria2.

Afterwards 4 EXTENSION bytes are stored. Here I found only value 0 or
1. According to the specification aria2 checks whether the saved InfoHash
and the one of the current download are the same (called "infoHashCheck"
extension) if EXT[3]&1 == 1.
So show a non-zero value by a line like:
>2      ubelong         !0      \b, infoHashCheck %#x

Afterwards show the info hash information if available (like in example
load-v0001.aria2) by lines like:
>6      ubelong         !0      \b, %#x bytes info hash
>>10    ubequad         x       %#16.16llx...
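
To cross-check the fields described so far outside of file(1), a minimal Python sketch could decode the version, the EXTENSION bytes and the info hash of a version 1 control file. The function name parse_aria2_prefix, the dictionary keys and the sample bytes are made up for illustration; only the offsets and the big-endian byte order are taken from the description above.

```python
import struct


def parse_aria2_prefix(data: bytes) -> dict:
    """Decode version, EXTENSION and info hash of a version 1 control file."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes for version, EXT and hash length")
    # offset 0: 2-byte version, offset 2: 4 EXTENSION bytes,
    # offset 6: 4-byte info hash length; all big endian in version 1
    version, ext, hash_len = struct.unpack_from(">HII", data, 0)
    if version != 1:
        raise ValueError("only version 1 (network byte order) is handled")
    if len(data) < 10 + hash_len:
        raise ValueError("data ends inside the info hash")
    return {
        "version": version,
        "info_hash_check": bool(ext & 1),  # same bit as EXT[3] & 1
        "info_hash": data[10:10 + hash_len],
    }


# Hypothetical sample: version 1, infoHashCheck set, 20-byte info hash.
sample = struct.pack(">HII", 1, 1, 20) + bytes.fromhex("1122334455667788") + bytes(12)
print(parse_aria2_prefix(sample))
```

Reading the three leading fields with one struct format keeps the offsets 0, 2 and 6 in a single place, which matches the way the magic lines address them.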
Afterwards show the other length information by lines like:
>(6.L+10)       ubelong         x       \b, piece length 0x%x
>(6.L+14)       ubequad         x       \b, total length %llu
>(6.L+22)       ubequad         !0      \b, upload length %#llx

With this information I was able to identify such control files by
lines like:
0               beshort                 0x0001
>2              ubelong&0xffFFffFE      0x00000000
>>(6.L+14)      ubequad                 >0
>>>0            use                     aria

The first test line checks for the valid version number of the one
handled variant. This test is also true for DEGAS PI2 bitmaps and GEM
IMG bitmaps. By the second test for a valid infoHashCheck extension most
DEGAS PI2 bitmaps are skipped. Interpreting the EXT value as DEGAS
colour palette entries would mean a bitmap with black (0000) as first
entry and black again (0000) or "nearly black with a little blue" (0001)
as second entry. But PI2 images have only 4 colours, so you may find
one black entry, but typically the second entry is a contrasting
colour like white (0777 or 0FFF). Nevertheless I found one such
unlikely example, DIAGRAM1.PI2, with 2 black colour entries at the
beginning of the palette. GEM IMG graphics are also skipped by the second
test, because there the header size is stored at offset 2, which is
significantly higher than 0 or 1. By the third test I check for a valid
total download length (at the moment greater than zero). By this
test line DIAGRAM1.PI2 is skipped, because there the value at the
corresponding offset gives the invalid length 0.

After applying the above mentioned modifications by aria.txt and the
former diff for DEGAS images, all my aria control files are
correctly identified and distinguished from Atari DEGAS bitmaps and
GEM IMG graphics. This now looks like:

8GadgetPackSetup.msi:        Composite Document File V2 Document,
                             Little Endian, Os: Windows, Version 6.2,
                             MSI Installer, Code page: 1252,
                             Title: Installation Database,
                             Subject: 8GadgetPack,
                             Author: 8GadgetPack.net,
8GadgetPackSetup.msi.aria2:  aria2 control file, version 1,
                             piece length 0x100000,
                             total length 27517992,
                             0x4 bytes bitfield 0x1...
DIAGRAM1.PI2:                Atari DEGAS Elite bitmap 640 x 200 x 4,
                             color palette 0000 0000 0777 0777 0000
GAMEOVR4.IMG:                GEM XIMG Image data 256 x 176,
                             4 planes, 372 x 372 pixelsize
ST_TOOLS.PI2:                Atari DEGAS Elite bitmap 640 x 200 x 4,
                             color palette 0fff 0f00 00f0 0000 0fff
gnucash-4.8.setup.exe:       PE32 executable (GUI) Intel 80386,
                             for MS Windows
gnucash-4.8.setup.exe.aria2: aria2 control file, version 1,
                             piece length 0x100000,
                             total length 153249768,
                             0x13 bytes bitfield
load-nonBt-v0001.aria2:      aria2 control file, version 1,
                             piece length 0x400,
                             total length 81920,
                             0xa bytes bitfield 0xffffffffffffffff...
load-v0001.aria2:            aria2 control file, version 1,
                             infoHashCheck 0x1,
                             0x14 bytes
                             info hash 0x1122334455667788...,
                             piece length 0x400,
                             total length 81920,
                             upload length 0x400,
                             0xa bytes bitfield 0xffffffffffffffff...
medres.pi2:                  Atari DEGAS Elite bitmap 640 x 200 x 4,
                             color palette 0777 0700 0070 0000 0007

I hope my aria template for Magdir can be applied in a future version
of the file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek

-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYcOgWAAKCRCv8rHJQhrU
1nmhAKCX/RoBtNY516Dz8WiKgzs8S1QUZgCdEgKx/8GXEJI62nLf3wrpScVLUTQ=
=cJOs
-----END PGP SIGNATURE-----
-------------- next part --------------

#------------------------------------------------------------------------------
# URL:          https://de.wikipedia.org/wiki/Aria_(Software)
# Reference:    https://github.com/aria2/aria2/blob/master/doc/manual-src/en/technical-notes.rst
# From:         Joerg Jenderek
# Note:         only version 1 suited
# check for valid version one
0               beshort                 0x0001
# skip most uncompressed DEGAS med-res bitmap *.PI2 and GEM bitmap (v1) *.IMG
# by test for valid infoHashCheck extension
>2              ubelong&0xffFFffFE      0x00000000
# skip DEGAS med-res bitmap DIAGRAM1.PI2 by test for valid length of download
>>(6.L+14)      ubequad                 >0
>>>0            use                     aria
0       name            aria
# version; (0x0000) or (0x0001); for 0 all multi-byte integers are in host byte order. For 1 big endian
>0      beshort         x       aria2 control file, version %u
#!:mime application/octet-stream
!:mime  application/x-aria
!:ext   aria2
# EXTension; if EXT[3]&1 == 1 checks whether saved InfoHash and current downloading the same; infoHashCheck extension
>2      ubelong         !0      \b, infoHashCheck %#x
# info hash length like: 0 14h
>6      ubelong         !0      \b, %#x bytes info hash
# info hash; BitTorrent InfoHash
>>10    ubequad         x       %#16.16llx...
# piece length; the length of the piece like: 400h 100000h
>(6.L+10)       ubelong         x       \b, piece length 0x%x
# total length; the total length of the download
>(6.L+14)       ubequad         x       \b, total length %llu
#>(6.L+14)      ubequad         x       \b, total length %#llx
# upload length; the uploaded length of download like: 0 400h
>(6.L+22)       ubequad         !0      \b, upload length %#llx
# bitfield length; the length of bitfield like: 4 6 Ah 10h 13h 167h
>(6.L+30)       ubelong         x       \b, %#x bytes bitfield
# bitfield; bitfield which represents current download progress
>(6.L+34)       ubequad         !0      %#llx...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aria.txt.sig
Type: application/octet-stream
Size: 941 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aria-trid-v.txt.gz
Type: application/x-gzip
Size: 1420 bytes
Desc: not available
URL:

From christos at zoulas.com Fri Dec 24 18:09:08 2021
From: christos at zoulas.com (Christos Zoulas)
Date: Fri, 24 Dec 2021 13:09:08 -0500
Subject: [File] Magdir for aria control file *.aria2
In-Reply-To:
References:
Message-ID: <2FB95730-5867-486B-B762-9AE6F00C16CC@zoulas.com>

Committed thanks!

Happy Holidays,

christos

> On Dec 22, 2021, at 5:02 PM, Jörg Jenderek wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
> some time ago I sent patches for Atari DEGAS bitmaps.
>
> Unfortunately the weak starting sequence 0x0001 of the medium resolution
> bitmap *.PI2 also occurs in aria control files.
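
As a footnote to this thread, the three magic tests of the aria entry (version, EXTENSION mask, total length at the indirect offset 6.L+14) can be mirrored in a short Python sketch, together with the length fields that follow the info hash. The names looks_like_aria2_v1 and read_lengths and both sample byte strings are assumptions for illustration; the offsets are the ones used in the magic lines of the patch, and this is not the code file(1) itself runs.

```python
import struct


def looks_like_aria2_v1(data: bytes) -> bool:
    """Mirror the three tests of the magic entry for version 1 files."""
    if len(data) < 10:
        return False
    version, ext, hash_len = struct.unpack_from(">HII", data, 0)
    if version != 1:               # 0 beshort 0x0001
        return False
    if ext & 0xFFFFFFFE:           # >2 ubelong&0xffFFffFE 0x00000000
        return False
    total_off = hash_len + 14      # indirect offset (6.L+14)
    if len(data) < total_off + 8:  # offset beyond the data: no match
        return False
    (total_length,) = struct.unpack_from(">Q", data, total_off)
    return total_length > 0        # >>(6.L+14) ubequad >0


def read_lengths(data: bytes) -> dict:
    """Read the big-endian length fields stored after the info hash."""
    (hash_len,) = struct.unpack_from(">I", data, 6)
    piece, total, upload, bitfield_len = struct.unpack_from(
        ">IQQI", data, hash_len + 10)
    return {
        "piece_length": piece,           # (6.L+10), 4 bytes
        "total_length": total,           # (6.L+14), 8 bytes
        "upload_length": upload,         # (6.L+22), 8 bytes
        "bitfield_length": bitfield_len  # (6.L+30), 4 bytes
    }


# Hypothetical control file without info hash: piece length 0x400,
# total length 81920, upload length 0, 10 bitfield bytes.
sample = struct.pack(">HIIIQQI", 1, 0, 0, 0x400, 81920, 0, 10) + b"\xff" * 10
# Hypothetical DEGAS-like start: resolution 0x0001 followed by palette words.
degas_like = bytes.fromhex("00010777070000700000") + bytes(24)

print(looks_like_aria2_v1(sample), read_lengths(sample))
print(looks_like_aria2_v1(degas_like))
```

The second sample is rejected by the EXTENSION mask alone, which is the same reason the patch gives for skipping most PI2 bitmaps.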