[File] [PATCH] Magdir/windows codepage translator missed Cyrillic variant 12510866.CPX

Christos Zoulas christos at zoulas.com
Thu Oct 28 20:23:08 UTC 2021


Committed, thanks!

christos

> On Oct 28, 2021, at 4:05 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> Some time ago I sent patches to recognize Windows code page
> translator files with the file name extension CPX. A few days ago I
> installed some old Windows software. Out of interest I ran the
> file command inside the installation directory.
> 
> When running file command version 5.41 on such CPX examples I
> get output like:
> 12500852.CPX: Windows codepage translator
> 12510866.CPX: ASCII text, with CRLF line terminators
> 12520437.CPX: Windows codepage translator
> 12520850.CPX: Windows codepage translator
> 12520860.CPX: Windows codepage translator
> 12520861.CPX: Windows codepage translator
> 12520863.CPX: Windows codepage translator
> 12520865.CPX: Windows codepage translator
> 
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html).
> All CPX examples are described correctly by TrID as "Windows Codepage
> translator" by cpx.trid.xml. I also got the description "Generic INI
> configuration" by ini.trid.xml, which is also correct but less
> specific (see appended cpx_trid-v.txt.gz).
> 
> The document obdcdriv_character_translation.htm about character
> translation, referenced inside Magdir/windows, has moved to another
> location. So this information is now expressed by comment lines inside
> Magdir/windows like:
> # URL:		https://documentation.basis.com/BASISHelp/WebHelp/
> #		b3odbc/ODBC_Driver/obdcdriv_character_translation.htm
> # Ref.:	http://mark0.net/download/triddefs_xml.7z/defs/c/cpx.trid.xml
> 
> Most examples are described correctly as "Windows codepage
> translator" with the CPX extension, but example 12510866.CPX is only
> described as ASCII text with CRLF line terminators and without a known
> extension. Most examples have starting lines like:
> [Windows Latin 1/437 (English)]
> [Windows Latin 1(1252)/850 (Multilingual-Latin 1)]
> [Windows Latin 1(1252)/860 (Portugal)]
> [Windows Latin 1(1252)/861 (Iceland)]
> [Windows Latin 1(1252)/863 (French Canada)]
> 
> Unfortunately no official or complete documentation exists for that
> file format. So I rely on Gary Kessler's file signature table
> (see garykessler.net). That information was used to describe such CPX
> examples inside the subroutine ini-file by lines like:
> >>&0 regex/c	\^(Windows\ Latin)	Windows codepage translator
> !:mime	text/plain
> !:ext	cpx
> 
> The undetected example 12510866.CPX has a starting line like:
> [Windows Cyrillic(1251)/866 (Russian)]
> So here the second word is Cyrillic instead of Latin. I mention this
> fact in a comment line. The magic lines now become:
> >>&0 regex/c \^(Windows\ )(Latin|Cyrillic)
> 					Windows codepage translator
> !:mime	text/x-ms-cpx
> !:ext	cpx
> Instead of the generic MIME type text/plain I chose the user-defined
> one mentioned by TrID.
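
For anyone who wants to try the extended pattern outside of file itself, here is a minimal Python sketch of what the rule is meant to accept. It assumes the rule is applied right at the opening bracket of the first section header and that the /c flag means case-insensitive matching. The names CPX_HEADER and looks_like_cpx are made up for illustration and are not part of file or Magdir/windows.

```python
import re

# Assumption: the magic rule is evaluated at the "[" that opens the first
# section header, and regex/c is treated as a case-insensitive match.
CPX_HEADER = re.compile(r"\[(Windows )(Latin|Cyrillic)", re.IGNORECASE)


def looks_like_cpx(first_line: str) -> bool:
    """Return True if the line starts like a CPX section header."""
    return CPX_HEADER.match(first_line.lstrip()) is not None


if __name__ == "__main__":
    # Header lines quoted in this mail, plus one unrelated INI header.
    for line in (
        "[Windows Latin 1(1252)/850 (Multilingual-Latin 1)]",
        "[Windows Cyrillic(1251)/866 (Russian)]",
        "[boot loader]",
    ):
        print(looks_like_cpx(line), line)
```

This is only an approximation of the intent of the rule, not a reimplementation of how file evaluates Magdir/windows.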
> 
> After applying the above-mentioned modifications with the patch
> file-5.41-windows-cpx.diff, all my CPX examples are still
> correctly identified and the misidentification vanishes:
> 
> 12500852.CPX: Windows codepage translator
> 12510866.CPX: Windows codepage translator
> 12520437.CPX: Windows codepage translator
> 12520850.CPX: Windows codepage translator
> 12520860.CPX: Windows codepage translator
> 12520861.CPX: Windows codepage translator
> 12520863.CPX: Windows codepage translator
> 12520865.CPX: Windows codepage translator
> 
> I hope my diff file can be applied in a future version of the file
> utility. There may be more non-Latin codepage translator files.
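
To look for such further variants, a small sketch like the following could list CPX files whose first section header falls outside the Latin/Cyrillic pattern. The function names first_header and unmatched are hypothetical, and decoding as latin-1 is an assumption made only so that any byte sequence can be read without errors.

```python
import re
from pathlib import Path

# Same intent as the extended magic rule: "Windows Latin" or "Windows Cyrillic"
# directly after the opening bracket, ignoring case.
KNOWN = re.compile(r"\[Windows (Latin|Cyrillic)", re.IGNORECASE)


def first_header(text: str) -> str:
    """Return the first line that starts with "[", or "" if there is none."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            return line
    return ""


def unmatched(directory: str) -> dict:
    """Map CPX file names to first headers that the pattern does not cover."""
    result = {}
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() != ".cpx" or not path.is_file():
            continue
        # latin-1 maps every byte to a character, so decoding cannot fail
        header = first_header(path.read_text(encoding="latin-1"))
        if not KNOWN.match(header):
            result[path.name] = header
    return result
```

Calling unmatched() on an installation directory returns an empty dict when every CPX file there starts with a covered header. Any entry it does return is a candidate for another alternative in the regex.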
> 
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
> <file-5_41-windows-cpx_diff.DEFANGED-194436><file-5_41-windows-cpx_diff_sig.DEFANGED-194437><cpx_trid-v.txt.gz>


