[File] Magdir for Data Interchange Format; *.dif

Jörg Jenderek joerg.jen.der.ek at gmx.net
Sun Apr 5 22:58:13 UTC 2020


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

some days ago I used SoftMaker Office. Its spreadsheet module can
read/write documents with the filename extension dif.
When running file command version 5.38 on such documents, these
examples are described generically as text.

For comparison I also ran other file identification tools.
TrID (see https://mark0.net/soft-trid-e.html) identifies such
documents as "Data Interchange Format".
DROID (see https://digital-preservation.github.io/droid/)
describes some documents as "VisiCalc Database" by x-fmt/368 or
x-fmt/41.

Information about the file format can be found on the Archive Team
file formats wiki and on Wikipedia. This is expressed by comment
lines like:
 # URL: http://en.wikipedia.org/wiki/Data_Interchange_Format
 # http://fileformats.archiveteam.org/wiki/Data_Interchange_Format
According to the documentation, the document characteristically
begins with the upper-case text identifier TABLE. That is used as
the first test line:
 0	string		TABLE
Unfortunately this is still not unique enough. To skip plain text
that starts with the word TABLE, look for the numeric version number
on the 2nd line (CRLF or NL terminated) with an additional test line:
 >6	search/2	0,
At this point identification is still not unique. To skip the DROID
sample x-fmt-41-signature-id-380.dif, look near the beginning for the
upper-case text identifier TUPLES with a third test line:
 >>27	search/128	TUPLES		Data Interchange Format
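
The three nested tests above can be sketched in Python. This is only
an approximation of how the magic lines behave, not code from the file
utility: the helper names search and looks_like_dif and the sample
byte strings are made up for illustration, and search/N is assumed to
try N start positions beginning at the given offset.

```python
def search(data: bytes, offset: int, count: int, pattern: bytes) -> int:
    """Approximate a magic 'search/N' test.

    Try `count` start positions beginning at `offset` and return the
    first position where `pattern` starts, or -1 if none matches.
    """
    for pos in range(offset, offset + count):
        if data.startswith(pattern, pos):
            return pos
    return -1


def looks_like_dif(data: bytes) -> bool:
    """Mirror the three test lines: TABLE, then '0,', then TUPLES."""
    # 0    string      TABLE
    if not data.startswith(b"TABLE"):
        return False
    # >6   search/2    0,    (offset 6 with NL, offset 7 with CRLF)
    if search(data, 6, 2, b"0,") < 0:
        return False
    # >>27 search/128  TUPLES
    return search(data, 27, 128, b"TUPLES") >= 0


# Hypothetical sample headers, not taken from the inspected files.
lf_sample = (b'TABLE\n0,1\n"EXCEL"\nVECTORS\n0,3\n""\n'
             b'TUPLES\n0,2\n""\nDATA\n0,0\n""\n')
crlf_sample = lf_sample.replace(b"\n", b"\r\n")
plain_text = b"TABLE of contents\n1. Introduction\n"

print(looks_like_dif(lf_sample), looks_like_dif(crlf_sample),
      looks_like_dif(plain_text))
```

The search range of 2 is what lets one test line cover both line
terminators: "0," starts at offset 6 after "TABLE" plus NL and at
offset 7 after "TABLE" plus CRLF.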

Data Interchange Format (.dif) is a text file format used to
import/export single spreadsheets between spreadsheet programs
(OpenOffice.org Calc, Excel, Gnumeric, StarCalc, Lotus 1-2-3,
FileMaker, dBase, Framework, Multiplan, PlanMaker etc). Sometimes
a generator comment like EXCEL is found on the third line, enclosed
in double quotes (" = 0x22). If no comment is used, two double quotes
with nothing between them appear instead. So show this information,
if available, with the lines:
 >>>10	search/3	"
 >>>>&0	ubyte		!0x22		\b, generator or table name
 >>>>>&-2		string	x	%s
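
The effect of these three lines can be sketched in Python as well.
Again this is an illustration under assumptions, not the file
utility's implementation: the function name generator_name and the
sample bytes are hypothetical, and the string is assumed to be printed
up to the end of the line.

```python
from typing import Optional


def generator_name(data: bytes) -> Optional[str]:
    """Return the quoted text on the third line, or None if it is empty."""
    # >>>10 search/3 "   : the opening quote sits at offset 10 (NL)
    # or at offset 12 (CRLF), so try positions 10, 11 and 12.
    quote = -1
    for pos in range(10, 13):
        if data[pos:pos + 1] == b'"':
            quote = pos
            break
    if quote < 0:
        return None
    # >>>>&0 ubyte !0x22 : skip the empty "" case (or a truncated file).
    if data[quote + 1:quote + 2] in (b'"', b""):
        return None
    # >>>>>&-2 string x  : step back to the opening quote and take the
    # rest of the line, quotes included.
    end = quote
    while end < len(data) and data[end:end + 1] not in (b"\r", b"\n"):
        end += 1
    return data[quote:end].decode("latin-1")


# Hypothetical sample headers, not taken from the inspected files.
print(generator_name(b'TABLE\n0,1\n"EXCEL"\nVECTORS\n'))
print(generator_name(b'TABLE\r\n0,1\r\n"pmw"\r\nVECTORS\r\n'))
print(generator_name(b'TABLE\n0,1\n""\nVECTORS\n'))
```

The relative offset &-2 is needed because reading the byte after the
quote has already moved two bytes past the opening quote.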

According to the documentation the extension dif is used. This is
expressed by the line:
 !:ext	dif

Because DIF examples are just pure text files, the generic mime type
"text/plain" could be used. application/x-dif-spreadsheet seems well
suited, but the explanation for dif found on the pcmatic.com web site
was not very precise. I sometimes find this type for Gnumeric. The
type application/vnd.ms-excel is found on systems with the registered
name "Microsoft Office Excel Data Interchange Format".
LibreOffice and OpenOffice sometimes seem to use
application/x-dif-document.
application/x-dif seems to be used most often. That type is also
mentioned in the Wikipedia list of file formats. So I chose this type.
That is expressed by the line:
 !:mime	application/x-dif

After applying the above mentioned modifications from the appended
DIF.txt, all inspected Data Interchange Format examples are described
correctly and I get output like:

lsusb-wiko.dif:                ASCII text
PGP262I.DIF:                   ASCII text, with CRLF line terminators
TABLE.txt:                     ASCII text, with CRLF line terminators
x-fmt-41-signature-id-380.dif: ISO-8859 text
visicalc.dif:                  Data Interchange Format
LibreOffice-dif.dif:           Data Interchange Format,
                               generator or table name "MyTable1"
planmaker-dif.dif:             Data Interchange Format,
                               generator or table name "pmw"
excel-dif.dif:                 Data Interchange Format,
                               generator or table name "EXCEL"

Because the file format is pure text, the magic lines may belong in
Magdir/wordprocessors. Because the file format is a spreadsheet, the
lines may belong in Magdir/database.
The remaining DIFs are really pure text, like PGP262I.DIF or the
output of the diff command.

I hope my lines for Magdir can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek




-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCXopicwAKCRCv8rHJQhrU
1tMEAJ9EqxFDdEQSj9mVk0YCiqjSzfgv9wCdFV+kH3xTS8ZxUHAWMSkV0qzqZ94=
=2kAL
-----END PGP SIGNATURE-----
-------------- next part --------------

#------------------------------------------------------------------------------

# From:	Joerg Jenderek
# URL:	http://en.wikipedia.org/wiki/Data_Interchange_Format
#	http://fileformats.archiveteam.org/wiki/Data_Interchange_Format
# Note:	called by TrID "Data Interchange Format",
#	by DROID x-fmt/368 "VisiCalc Database"
0	string		TABLE
# skip text starting with TABLE by looking for numeric version on 2nd line
>6	search/2	0,
# skip DROID x-fmt-41-signature-id-380.dif by looking for key word TUPLES at the beginning
>>27	search/128	TUPLES		Data Interchange Format
# https://www.pcmatic.com/company/libraries/fileextension/detail.asp?ext=dif.html
#!:mime	application/x-dif-spreadsheet	Gnumeric
# https://github.com/LibreOffice/online/blob/master/discovery.xml
#!:mime	application/x-dif-document	LibreOffice
# https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/File_formats/Lists/File_formats
!:mime	application/x-dif
# https://extension.nirsoft.net/dif
#!:mime	application/vnd.ms-excel
#!:mime	text/plain
!:ext	dif
# look for double quote 0x22 on 3rd line
>>>10	search/3	"
# skip if next character is also a double quote
>>>>&0	ubyte		!0x22		\b, generator or table name
# comment like EXCEL, pmw enclosed in double quotes
>>>>>&-2	string	x		%s
