[File] [PATCH] of Magdir/misctools vCard visiting card; NON-standards

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri May 21 18:27:41 UTC 2021


Hello,

Some days ago I wanted to transfer some contacts from an old PC system
to a new Android system. These contacts are stored as vCards with the
file name extension VCF.

Because I had trouble with some contacts, I checked all my VCF
examples by running the file command, version 5.40, on them and got
output like:


$R00H5MZ.vcf:                 ASCII text, with CRLF line terminators
basic_vcard_addressbook.vcf:  vCard visiting card, version 2.1
example-2.1.vcard:            vCard visiting card, version 2.1
example-3.vcf:                vCard visiting card, version 3.0
example-4.vcf:                vCard visiting card, version 4.0
fmt-395-signature-id-634.vcf: vCard visiting card, version END
Joerg_Jenderek_67.vcf:        vCard visiting card
std.vcf:                      vCard visiting card, version 3.0
test.vcf:                     vCard visiting card, version 4.0
unknown-2.1.vcf:              ASCII text, with CRLF line terminators

Furthermore, with the --extension option only ??? is displayed, and
with the --apple option only UNKNUNKN is displayed.

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). It describes most examples
as "vCard - Business Card" according to its definition vcf.trid.xml
(see the attached vcf-trid-v.txt.gz). It also displays the file name
extensions in use, "VCF" and "VCARD".

That extension information and the Apple type (used by the --apple
option) are now provided by additional lines after the MIME type:
  !:apple	????vCrd
  !:ext		vcf/vcard

But a few examples like $R00H5MZ.vcf and unknown-2.1.vcf are only
described as ASCII text.
The Wikipedia page about vCard states that all vCards begin with
BEGIN:VCARD. That is not quite true: most examples start this way, but
the unrecognized examples do not start with this upper case phrase.

RFC 2425, the MIME directory framework on which vCard version 3.0
(RFC 2426) is built, states that type names and parameter names are
case-insensitive (e.g., the type name "fn" is the same as "FN" and
"Fn"). If I understand this correctly, a vCard could even start with
a phrase like BeGiN:vCARd, but in the real world, besides the commonly
used upper case variant, I only occasionally found an all lower case
variant.

To match this lower case variant, I changed the wrong line in
Magdir/misctools
  0	string/c	BEGIN:VCARD	vCard visiting card
to the correct line
  0	string/c	begin:vcard
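The old line fails on these files because of how the /c flag works: according to the magic(5) manual page, lower case characters in the pattern match both lower and upper case characters in the file, whereas upper case characters in the pattern only match upper case. The following Python function is a simplified sketch of that comparison rule, not libmagic's actual code; the name `c_flag_match` is hypothetical.

```python
def c_flag_match(pattern: bytes, data: bytes, offset: int = 0) -> bool:
    """Simplified emulation of libmagic's string/c comparison (sketch).

    Lower case ASCII letters in the pattern match either case in the data;
    every other pattern byte, including upper case letters, must match exactly.
    """
    chunk = data[offset:offset + len(pattern)]
    if len(chunk) < len(pattern):
        return False  # file too short for the pattern
    for p, d in zip(pattern, chunk):
        if ord("a") <= p <= ord("z"):
            # accept the same letter in lower case or upper case
            if d not in (p, p - 0x20):
                return False
        elif d != p:
            return False
    return True


if __name__ == "__main__":
    old, new = b"BEGIN:VCARD", b"begin:vcard"
    for header in (b"BEGIN:VCARD\r\n", b"begin:vcard\r\n", b"BeGiN:vCARd\r\n"):
        print(header.strip(), "old:", c_flag_match(old, header),
              "new:", c_flag_match(new, header))
```

With the old upper case pattern only the upper case header matches; with the new lower case pattern all three sample headers match.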

This lower case variant violates the newer RFC 6350, but some older
and/or "bad" software produces such vCards. To inform the user, I
report such non-standard examples with an additional line:
  >>0	string		!BEGIN			\b, not up case
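Taken together, the case-insensitive header test and the case-sensitive !BEGIN test can be sketched in Python as below. This is a hypothetical helper (`case_note`) that only mirrors those two magic lines; `bytes.lower()` lowercases ASCII letters only, which is sufficient for this header.

```python
from typing import Optional


def case_note(data: bytes) -> Optional[str]:
    """Hypothetical helper mirroring two magic lines of the patch:

        0    string/c  begin:vcard   (case-insensitive header test)
        >>0  string    !BEGIN        (case-sensitive, adds ", not up case")

    Returns None if the header test fails, otherwise the extra text.
    """
    if data[:11].lower() != b"begin:vcard":
        return None  # not recognized as a vCard at all
    if not data.startswith(b"BEGIN"):
        return ", not up case"  # lower case or mixed-case header
    return ""  # standard upper case header, no extra note


if __name__ == "__main__":
    for sample in (b"BEGIN:VCARD\r\n", b"begin:vcard\r\n", b"BEGIN:VCALENDAR\r\n"):
        print(sample.strip(), repr(case_note(sample)))
```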

For some examples like Joerg_Jenderek_67.vcf no version information
is shown, so I enlarged the range in which the version string is
searched. The lines concerned currently are
  >12	search/14000/c	VERSION:
  >>&0	string		x			\b, version %-.3s

These now become:
  >>12	search/0x113b4/c	version:
  >>>&0	string		x			\b, version %-.3s
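The new range 0x113b4 is 70580 (1·65536 + 1·4096 + 3·256 + 11·16 + 4), about five times the old 14000, and because the pattern version: is now written in lower case, /c also lets it match VERSION:. A rough Python sketch of what the two lines do, assuming (per magic(5)) that the search range is the number of positions at which a match is attempted starting at offset 12, that &0 continues right after the match, and that %-.3s prints at most three characters; `find_version` is a hypothetical name.

```python
from typing import Optional

SEARCH_START = 12       # from ">>12 search/0x113b4/c version:"
PATTERN = b"version:"   # lower case, so the /c flag also accepts VERSION:


def find_version(data: bytes, search_range: int = 0x113B4) -> Optional[str]:
    """Rough emulation of search/N/c followed by "&0 string x %-.3s".

    Returns up to 3 bytes after the first match, or None if nothing is found.
    """
    lowered = data.lower()  # ASCII-only lowering; fine for a lower case pattern
    # A match may start at SEARCH_START .. SEARCH_START + search_range - 1.
    end = SEARCH_START + search_range - 1 + len(PATTERN)
    pos = lowered.find(PATTERN, SEARCH_START, end)
    if pos < 0:
        return None
    start = pos + len(PATTERN)  # &0: continue right after the match
    return data[start:start + 3].decode("latin-1")  # %-.3s: at most 3 chars


if __name__ == "__main__":
    # Hypothetical vCard whose VERSION line sits behind a large PHOTO value.
    big = (b"BEGIN:VCARD\r\nPHOTO:" + b"A" * 20000
           + b"\r\nVERSION:2.1\r\nEND:VCARD\r\n")
    print(find_version(big, 14000), find_version(big))
```

With the old range of 14000 the sketch finds nothing in this sample, while the enlarged range reaches the VERSION line.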

The last magic line above shows the version number. Typically one of
three values (2.1, 3.0, 4.0) occurs here, but for the example
fmt-395-signature-id-634.vcf the three-byte string END is shown.
This is not a real vCard but just a test pattern for the file
identification utility DROID (see
https://sourceforge.net/projects/droid/). To skip that pseudo-vCard,
I added a second test line:
    >13	string		!VERSION:END		vCard visiting card
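Offset 13 is just past BEGIN:VCARD (11 bytes) plus a two-byte line break. A minimal Python sketch of this negated test, assuming exactly that layout (the names `is_droid_test_pattern`, `MARKER` and `OFFSET` are hypothetical):

```python
MARKER = b"VERSION:END"
OFFSET = 13  # len(b"BEGIN:VCARD") + 2 bytes of line break


def is_droid_test_pattern(data: bytes) -> bool:
    """Mirror of '>13 string !VERSION:END' (sketch).

    True means the negated test fails, so the patched magic does not
    print 'vCard visiting card' for this file.
    """
    return data[OFFSET:OFFSET + len(MARKER)] == MARKER


if __name__ == "__main__":
    print(is_droid_test_pattern(b"BEGIN:VCARD\r\nVERSION:END"))
    print(is_droid_test_pattern(b"BEGIN:VCARD\r\nVERSION:4.0\r\n"))
```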

Earlier versions of vCard, like 2.1, allowed the VERSION property to
be placed anywhere in the vCard object.
According to RFC 6350, the VERSION property MUST appear immediately
after BEGIN:VCARD, and its value MUST be "4.0".
That is the main difference between the variants.

For version 3.0 the situation is a little unclear. RFC 6350 states
that earlier versions of vCard allowed the VERSION property to be
placed anywhere in the vCard object, but the German Wikipedia page
about vCard states that VERSION must directly follow the BEGIN
property, except in vCard 2.1. That seems to be common practice: in
487 of my inspected version 3.0 examples the VERSION property occurs
on the second line.

To mark non-standard examples like std.vcf and test.vcf, I check, for
every version except 2.1, whether the second line starts with the
upper case string VERSION:, and otherwise display a warning with
additional lines:
  >>>&0	string	x	\b, version %-.3s
  >>>&0	string	!2.1
  >>>>13	string	!VERSION: \b, 2nd line does not start with VERSION:
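The logic of these nested lines can be sketched in Python as follows, assuming the version string has already been extracted (the name `second_line_note` is hypothetical). Like the magic line, the sketch compares at the fixed offset 13, i.e. it assumes the first line is BEGIN:VCARD followed by CRLF; in a file with LF-only line ends the second line starts at offset 12, so the note also appears there even when VERSION: is on the second line.

```python
NOTE = ", 2nd line does not start with VERSION:"


def second_line_note(data: bytes, version: str) -> str:
    r"""Mirror of the nested magic lines (sketch):

        >>>&0   string  !2.1
        >>>>13  string  !VERSION:   \b, 2nd line does not start with VERSION:

    version is the 3-character value already read after VERSION:.
    """
    if version == "2.1":
        return ""  # vCard 2.1 allows VERSION anywhere, so no check
    # Plain string test: case-sensitive, fixed offset 13 (assumes a CRLF first line)
    if data[13:21] != b"VERSION:":
        return NOTE
    return ""
```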

One of my 334 version 2.1 examples is not described by the TrID
definition vcf-v2.trid.xml: basic_vcard_addressbook.vcf, found in the
Thunderbird sources (at least for versions 60.5.3 and 78.10.1).
Inspecting this example, I see that only the line feed character 0x0A
is used to terminate lines, whereas according to RFC 6350 individual
lines within a vCard are delimited by a line break, which is a CRLF
sequence (U+000D followed by U+000A).

So these non-standard examples are also tagged by an additional magic
line:
  >>11	beshort	 !0x0D0A	\b, lines not separated by CRLF
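The beshort test reads the two bytes directly after BEGIN:VCARD as one big-endian 16-bit value, so a CR LF pair reads as 0x0D0A. A minimal Python sketch of that check using `struct.unpack_from` (the name `crlf_note` is hypothetical; for files shorter than 13 bytes this sketch simply adds no note, which may differ from libmagic's handling):

```python
import struct

NOTE = ", lines not separated by CRLF"


def crlf_note(data: bytes) -> str:
    """Mirror of '>>11 beshort !0x0D0A' (sketch): compare the big-endian
    16-bit value at offset 11 with CR LF."""
    if len(data) < 13:
        return ""  # too short to read two bytes at offset 11
    (value,) = struct.unpack_from(">H", data, 11)
    return "" if value == 0x0D0A else NOTE
```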

After applying the modifications described above with the patch file
file-5.40-misctools-vcf.diff, all vCards are now described, with more
detail (especially for non-standard formats):


$R00H5MZ.vcf:                 vCard visiting card, version 2.1
			      , not up case
basic_vcard_addressbook.vcf:  vCard visiting card, version 2.1
			      , lines not separated by CRLF
example-2.1.vcard:            vCard visiting card, version 2.1
example-3.vcf:                vCard visiting card, version 3.0
example-4.vcf:                vCard visiting card, version 4.0
fmt-395-signature-id-634.vcf: ISO-8859 text, with no line terminators
Joerg_Jenderek_67.vcf:        vCard visiting card, version 2.1
std.vcf:                      vCard visiting card, version 3.0
			      , 2nd line does not start with VERSION:
test.vcf:                     vCard visiting card, version 4.0
			      , 2nd line does not start with VERSION:
			      , lines not separated by CRLF
unknown-2.1.vcf:              vCard visiting card, version 2.1
			      , not up case

I hope my diff file can be applied in a future version of the file utility.

With best wishes
Jörg Jenderek
--
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vcf-trid-v.txt.gz
Type: application/x-gzip
Size: 611 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210521/ea65150d/attachment.bin>
-------------- next part --------------
--- file-5.40/magic/Magdir/misctools.old	2021-02-22 23:49:24 +0000
+++ file-5.40/magic/Magdir/misctools	2021-05-14 20:07:59 +0000
@@ -7,15 +7,30 @@
 0	string/c	BEGIN:VCALENDAR		vCalendar calendar file
 !:mime	text/calendar
-# updated by Joerg Jenderek at Apr 2015
-# Extension: .vcf
+# updated by Joerg Jenderek at Apr 2015, May 2021
 # https://en.wikipedia.org/wiki/VCard
-0	string/c	BEGIN:VCARD		vCard visiting card
+# URL: 	http://fileformats.archiveteam.org/wiki/VCard
+# https://datatracker.ietf.org/doc/html/rfc6350
+# the value is case-insensitive
+0	string/c	begin:vcard
+# skip DROID fmt-395-signature-id-634.vcf
+>13	string		!VERSION:END		vCard visiting card
 # deprecated
 #!:mime	text/x-vcard
 !:mime	text/vcard
+!:apple	????vCrd
+!:ext	vcf/vcard
 # VERSION must come right after BEGIN for 3.0 or 4.0 except in 2.1 , where it can be anywhere
->12	search/14000/c	VERSION:
+# Joerg_Jenderek_67.vcf
+>>12	search/0x113b4/c	version:
 # VERSION 2.1 , 3.0 or 4.0
->>&0	string		x			\b, version %-.3s
+>>>&0	string		x			\b, version %-.3s
+>>>&0	string		!2.1
+>>>>13	string		!VERSION:		\b, 2nd line does not start with VERSION:
+# downcase violates RFC 6350, but some "bad" software produce such vcards
+>>0	string		!BEGIN			\b, not up case
+# http://ftp.mozilla.org/pub/thunderbird/candidates/
+# 78.10.1-candidates/build1/source/thunderbird-78.10.1.source.tar.xz
+# thunderbird-78.10.1/comm/mailnews/import/test/unit/resources/basic_vcard_addressbook.vcf
+>>11	beshort		!0x0D0A			\b, lines not separated by CRLF
 
 # Summary: Libtool library file
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.40-misctools-vcf.diff.sig
Type: application/octet-stream
Size: 95 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20210521/ea65150d/attachment.obj>

