[File] [PATCH] Magdir/windows codepage translator missed Cyrillic variant 12510866.CPX

Jörg Jenderek joerg.jen.der.ek at gmx.net
Thu Oct 28 20:05:38 UTC 2021


Hello,

some time ago I sent patches to recognize Windows code page
translator files with the file name extension CPX. A few days ago I
installed an old piece of Windows software. Just out of interest I ran
the file command inside the installation directory.

When running file command version 5.41 on such CPX examples I
get output like:
12500852.CPX: Windows codepage translator
12510866.CPX: ASCII text, with CRLF line terminators
12520437.CPX: Windows codepage translator
12520850.CPX: Windows codepage translator
12520860.CPX: Windows codepage translator
12520861.CPX: Windows codepage translator
12520863.CPX: Windows codepage translator
12520865.CPX: Windows codepage translator
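
As an illustration only (not part of the patch), outliers in a listing
like the one above can be picked out with a small Python sketch. The
function name find_misidentified and the idea of filtering on the
description text are assumptions made for this example; it only parses
text in the "name: description" form shown above and does not call
file itself.

```python
def find_misidentified(file_output, expected="Windows codepage translator"):
    """Return the names whose description does not contain `expected`.

    `file_output` is plain text in the "name: description" form that the
    file command prints, one entry per line.
    """
    missed = []
    for line in file_output.splitlines():
        name, sep, description = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        if expected not in description:
            missed.append(name.strip())
    return missed


sample = """\
12500852.CPX: Windows codepage translator
12510866.CPX: ASCII text, with CRLF line terminators
12520437.CPX: Windows codepage translator
"""

print(find_misidentified(sample))  # ['12510866.CPX']
```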

For comparison I ran the file format identification utility
TrID (see https://mark0.net/soft-trid-e.html).
All CPX examples are described correctly by TrID as "Windows Codepage
translator" by cpx.trid.xml. I also got the description "Generic INI
configuration" by ini.trid.xml, which is also correct but less
specific (see appended cpx_trid-v.txt.gz).

The document obdcdriv_character_translation.htm about Character
Translation, referenced inside Magdir/windows, is now found at another
place. So this information is now expressed by comment lines inside
Magdir/windows like:
# URL:		https://documentation.basis.com/BASISHelp/WebHelp/
#		b3odbc/ODBC_Driver/obdcdriv_character_translation.htm
# Ref.:	http://mark0.net/download/triddefs_xml.7z/defs/c/cpx.trid.xml

Most examples are correctly described as "Windows codepage
translator" with CPX extension, but example 12510866.CPX is only
described as ASCII text with CRLF line terminators and unknown
extension. Most examples have starting lines like:
[Windows Latin 1/437 (English)]
[Windows Latin 1(1252)/850 (Multilingual-Latin 1)]
[Windows Latin 1(1252)/860 (Portugal)]
[Windows Latin 1(1252)/861 (Iceland)]
[Windows Latin 1(1252)/863 (French Canada)]

Unfortunately no official or complete documentation exists for that
file format. So I rely on Gary Kessler's file signature table
(see garykessler.net). That information was used to describe such CPX
examples inside the subroutine ini-file by lines like:
  >>&0 regex/c	\^(Windows\ Latin)	Windows codepage translator
  !:mime	text/plain
  !:ext	cpx

The undetected example 12510866.CPX has a starting line like:
[Windows Cyrillic(1251)/866 (Russian)]
So here the second word is Cyrillic instead of Latin. I mention this
fact in a comment line. The magic lines now become:
  >>&0 regex/c	\^(Windows\ )(Latin|Cyrillic)	Windows codepage translator
  !:mime	text/x-ms-cpx
  !:ext	cpx
Instead of the generic mime type text/plain I chose the user-defined
one mentioned by TrID.
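
The effect of widening the pattern can be tried outside of file with a
minimal Python sketch. It rests on several assumptions: that the magic
escapes "\^" and "\ " stand for a plain "^" anchor and a literal
space, that the /c flag means a case-insensitive match, and that the
pattern is applied right after the opening bracket. Python's re module
is only a stand-in for the regex engine used by file, and the helper
name matches_header is hypothetical.

```python
import re

# Assumption: "\^" and "\ " in the magic line reduce to "^" and " ",
# and regex/c corresponds to a case-insensitive match.
OLD_PATTERN = re.compile(r"^(Windows Latin)", re.IGNORECASE)
NEW_PATTERN = re.compile(r"^(Windows )(Latin|Cyrillic)", re.IGNORECASE)


def matches_header(first_line, pattern):
    """Test the text following the opening bracket against `pattern`."""
    if not first_line.startswith("["):
        return False
    return pattern.match(first_line[1:]) is not None


headers = [
    "[Windows Latin 1(1252)/850 (Multilingual-Latin 1)]",
    "[Windows Cyrillic(1251)/866 (Russian)]",
]

for header in headers:
    old = matches_header(header, OLD_PATTERN)
    new = matches_header(header, NEW_PATTERN)
    print(f"{header}: old={old} new={new}")
```

Under these assumptions the Latin header matches both patterns, while
the Cyrillic header is rejected by the old pattern and accepted by the
new one.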

After applying the above-mentioned modifications with the patch
file-5.41-windows-cpx.diff, all my CPX examples are still
correctly identified and the misidentification vanishes:

12500852.CPX: Windows codepage translator
12510866.CPX: Windows codepage translator
12520437.CPX: Windows codepage translator
12520850.CPX: Windows codepage translator
12520860.CPX: Windows codepage translator
12520861.CPX: Windows codepage translator
12520863.CPX: Windows codepage translator
12520865.CPX: Windows codepage translator

I hope my diff file can be applied in a future version of the file
utility. Maybe more non-Latin codepage translator files exist.

With best wishes
Jörg Jenderek
--
Jörg Jenderek

-------------- next part --------------
--- file-5.41/magic/Magdir/windows.old	2021-05-12 16:30:24 +0000
+++ file-5.41/magic/Magdir/windows	2021-10-28 19:42:38 +0000
@@ -577,11 +577,15 @@
 !:mime text/plain
 !:ext	hhp
 # From:		Joerg Jenderek
-# URL:		https://documentation.basis.com/BASISHelp/WebHelp/b3odbc/obdcdriv_character_translation.htm
+# URL:		https://documentation.basis.com/BASISHelp/WebHelp/b3odbc/ODBC_Driver/obdcdriv_character_translation.htm
 # Reference:	https://www.garykessler.net/library/file_sigs.html
+#		http://mark0.net/download/triddefs_xml.7z/defs/c/cpx.trid.xml
 # Note:		stored in directory %WINDIR%\SysWOW64 or %WINDIR%\system
->>&0	regex/c		\^(Windows\ Latin)				Windows codepage translator
-!:mime	text/plain
+#		second word often Latin but sometimes Cyrillic like in 12510866.CPX
+>>&0	regex/c		\^(Windows\ )(Latin|Cyrillic)			Windows codepage translator
+#!:mime	text/plain
+!:mime	text/x-ms-cpx
+# like: 12510866.CPX 
 !:ext	cpx
 # unknown keyword after opening bracket
 >>&0	default				x
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.41-windows-cpx.diff.sig
Type: application/octet-stream
Size: 713 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211028/26fc5ae7/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cpx_trid-v.txt.gz
Type: application/x-gzip
Size: 455 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20211028/26fc5ae7/attachment.bin>

