[File] [PATCH] of Magdir/msdos Microsoft OneNote Package misidentified as Microsoft Cabinet archive

Jörg Jenderek joerg.jen.der.ek at gmx.net
Tue Sep 6 22:56:07 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Some days ago I ran the cleaning tool czkawka, found at
https://qarmin.github.io/czkawka/. One menu item concerns bad
extensions. After running the tool I looked in the saved file list
results_bad_extensions.txt for examples of bad extensions.
One of the listed extensions is ONEPKG.

When running the file command (version 5.42) on such examples I get
output like:

DemoNotebook.onepkg: Microsoft Cabinet archive data,
		     many, 234775 bytes, 8 files,
		     at 0x2c
		     +A "Bespechungsnotizen.one"
		     +A "Forschung.one"
		     , number 1,
		     12 datablocks, 0x1203 compression
Notebook03.onepkg:   Microsoft Cabinet archive data,
		     many, 1272589 bytes, 2 files,
		     at 0x44
		     +Utf "Editor \303\266ffnen.onetoc2"
		     +Utf "Allgemein.one"
		     , flags 0x4, number 1,
		     extra bytes 20 in head,
		     44 datablocks, 0xf03 compression
ONGuide.onepkg:      Microsoft Cabinet archive data,
		     many, 248915 bytes, 2 files,
		     at 0x44
		     + "Editor \303\266ffnen.onetoc2"
		     + "Erste Schritte - Beta 1.one"
		     , flags 0x4, number 1,
		     extra bytes 20 in head,
		     9 datablocks, 0xf03 compression
test-onenote.onepkg: Microsoft Cabinet archive data,
		     Windows 2000/XP setup, 3977 bytes, 1 file,
		     at 0x2c
		     +A "test-onenote.one"
		     , number 1,
		     1 datablock, 0x1203 compression

With the --extension option, cab or _/?_/??_ is displayed.
Furthermore, with the -i option only the generic
application/vnd.ms-cab-compressed is shown.

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html). It identifies all
such examples with a low rate as "Microsoft Cabinet Archive" by
ark-cab.trid.xml. Many examples are described with a high rate
as "Microsoft OneNote Package" by onepkg.trid.xml
(see appended trid-v-onepkg.txt.gz).

For comparison I also ran the file format identification
utility DROID (see https://sourceforge.net/projects/droid/). It
identifies a few examples only generically as "Windows Cabinet File"
with mime type application/vnd.ms-cab-compressed by PUID x-fmt/414,
but it complains about the file suffix ONEPKG. Many examples are
described as "Microsoft OneNote Package File" by fmt/987 (see
appended output/droid-onepkg.csv). This utility identifies onepkg by
looking for the file name extension onetoc inside the first 2 KB.
But this does not always work, because some packages have no table
of contents.
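
To illustrate why a check for onetoc misses some packages, here is
a minimal Python sketch of the heuristic as it is described above.
It is not DROID's real fmt/987 signature. The function name and the
two byte strings are hypothetical stand-ins for real packages.

```python
def droid_style_match(data: bytes, window: int = 2048) -> bool:
    """Sketch only: cabinet magic plus 'onetoc' within the first `window` bytes."""
    head = data[:window]
    return head.startswith(b"MSCF") and b"onetoc" in head.lower()


# Hypothetical stand-ins: a package with a table of contents and one without.
with_toc = b"MSCF" + b"\0" * 60 + b"Open Notebook.onetoc2\0" + b"Allgemein.one\0"
without_toc = b"MSCF" + b"\0" * 60 + b"test-onenote.one\0"

print(droid_style_match(with_toc))
print(droid_style_match(without_toc))
```

The second sample contains only a .one member, so a check that
relies on onetoc alone does not recognize it.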

Luckily TrID with the -v option shows related URLs and the file name
extension. With this help, information about the file format can be
added. It is expressed inside Magdir/msdos by comment lines like:

# URL:		https://en.wikipedia.org/wiki/Microsoft_OneNote
#		http://fileformats.archiveteam.org/wiki/OneNote
# Reference:	https://mark0.net/download/triddefs_xml.7z
#		defs/o/onepkg.trid.xml

According to that documentation, ONEPKG files are just CAB archives
containing Microsoft OneNote files with the 3-byte file name
extension one. OneNote tables of contents have the file name
extension ONETOC or ONETOC2.

So when looking at the current output of the file command, we see
that the first member name is something like "Class Notes.one",
"test-onenote.one", "Open Notebook.onetoc2" or
"Editor Öffnen.onetoc2". This can be verified with a command line
unpacking tool, for example:
	7z l -tcab *onepkg

The description inside Magdir/msdos starts like:
0	string/b MSCF\0\0\0\0	Microsoft Cabinet archive data

Now I must insert lines looking for the 3-byte one file name suffix.
The jump to the first member entry and the search for the dot
character before the suffix are done by lines like:
 >>(16.l+16)	ubyte	x
 .
 >>>>&-1	search/255 	.
Now I am in the branch for the file name extension. After the last
entry of that kind (that is, theme for Windows 7 or 8 Theme Pack)
I insert lines for OneNote Package. This looks like:
 >>>>>&0	string/c	one		\b, OneNote Package
 !:mime	application/msonenote
 !:ext	onepkg
Instead of the generic mime type application/vnd.ms-cab-compressed
or application/octet-stream I show a type mentioned on the NirSoft
web site. But it is not officially registered.
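
The logic of the magic lines above can be approximated in Python as
a rough sketch. It assumes the usual cabinet layout, where the
32-bit little-endian value at offset 16 points to the first member
entry and the member name starts 16 bytes into that entry. The
helper names, including fake_cab, are hypothetical and exist only
for illustration. The real matching is done by file itself.

```python
import struct
from typing import Optional

CAB_MAGIC = b"MSCF\0\0\0\0"


def first_member_name(data: bytes) -> Optional[bytes]:
    """Return the raw name of the first member, or None if this is no cabinet."""
    if len(data) < 20 or not data.startswith(CAB_MAGIC):
        return None
    # Like (16.l+16): read the offset of the first member entry at byte 16,
    # then skip the 16 fixed bytes in front of the NUL-terminated name.
    (files_offset,) = struct.unpack_from("<I", data, 16)
    name_start = files_offset + 16
    name_end = data.find(b"\0", name_start)
    if name_start >= len(data) or name_end < 0:
        return None
    return data[name_start:name_end]


def looks_like_onepkg(data: bytes) -> bool:
    """Approximate search/255 for '.' followed by string/c 'one'."""
    name = first_member_name(data)
    if name is None:
        return False
    dot = name[:255].find(b".")
    if dot < 0:
        return False
    # Prefix comparison, so both .one and .onetoc2 are accepted.
    return name[dot + 1:dot + 4].lower() == b"one"


def fake_cab(name: bytes, files_offset: int = 0x2C) -> bytes:
    """Hypothetical helper: build just enough synthetic header for a demo."""
    header = CAB_MAGIC + struct.pack("<III", 0, 0, files_offset)
    return header.ljust(files_offset, b"\0") + b"\0" * 16 + name + b"\0"


print(looks_like_onepkg(fake_cab(b"test-onenote.one")))
print(looks_like_onepkg(fake_cab(b"setup.exe")))
```

Because only the first three characters after the dot are compared,
a package whose first member is a table of contents such as
"Open Notebook.onetoc2" is accepted as well.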

After applying the above-mentioned modifications with the patch
file-5.42-msdos-onepkg.diff, my OneNote Packages are described
more precisely, like:

DemoNotebook.onepkg: Microsoft Cabinet archive data,
		     OneNote Package, 234775 bytes, 8 files,
		     at 0x2c
		     +A "Bespechungsnotizen.one"
		     +A "Forschung.one"
		     , number 1,
		     12 datablocks, 0x1203 compression
Notebook03.onepkg:   Microsoft Cabinet archive data,
		     OneNote Package, 1272589 bytes, 2 files,
		     at 0x44
		     +Utf "Editor \303\266ffnen.onetoc2"
		     +Utf "Allgemein.one"
		     , flags 0x4, number 1,
		     extra bytes 20 in head,
		     44 datablocks, 0xf03 compression
ONGuide.onepkg:      Microsoft Cabinet archive data,
		     OneNote Package, 248915 bytes, 2 files,
		     at 0x44
		     + "Editor \303\266ffnen.onetoc2"
		     + "Erste Schritte - Beta 1.one"
		     , flags 0x4, number 1,
		     extra bytes 20 in head,
		     9 datablocks, 0xf03 compression
test-onenote.onepkg: Microsoft Cabinet archive data,
		     OneNote Package, 3977 bytes, 1 file,
		     at 0x2c
		     +A "test-onenote.one"
		     , number 1,
		     1 datablock, 0x1203 compression

I hope my diff file can be applied in a future version of the
file utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCYxfQBgAKCRCv8rHJQhrU
1v9rAKCdBfg22WJHViuJPPCmi4tT1XFSyQCgqhcIFHC+MO6hLZ4FT+hdNE2DxKw=
=CI3n
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-onepkg.txt.gz
Type: application/x-gzip
Size: 470 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220907/e3529648/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-onepkg.csv.gz
Type: application/x-gzip
Size: 442 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220907/e3529648/attachment-0001.bin>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: file-5.42-msdos-openpkg.diff
URL: <https://mailman.astron.com/pipermail/file/attachments/20220907/e3529648/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.42-msdos-openpkg.diff.sig
Type: application/octet-stream
Size: 1010 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220907/e3529648/attachment.obj>

