[File] [PATCH] Magdir/misctools vCalendar calendar *.vcs versus iCalendar *.ics

Jörg Jenderek joerg.jen.der.ek at gmx.net
Fri Jan 27 23:26:53 UTC 2023


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Some days ago I planned an event and tried to record it with a
calendar program. I had some problems with my calendar event files,
so I looked for other calendar files on my systems.

When running the file command version 5.44 on a dozen calendar
events and related files I get output like:

Juish.ics:                        vCalendar calendar file
Meister des Alltags.vcs:          vCalendar calendar file
Sport Today.ics:                  vCalendar calendar file
Sport Today.vcs:                  vCalendar calendar file
calendar.ics:                     vCalendar calendar file
ferien_NRW_2023.ics:              vCalendar calendar file
fmt-387-signature-id-572.vcs:     vCalendar calendar file
fmt-388-signature-id-573-b.ics:   vCalendar calendar file
fmt-388-signature-id-573.ics:     vCalendar calendar file
holidays_NRW_2014.ics:            vCalendar calendar file
import-real-world-2004-11-19.ics: vCalendar calendar file
import-with-timezone.ics:         vCalendar calendar file
wikipedia-busy.ifb:               vCalendar calendar file

With option -i the plausible mime type text/calendar is shown.
However, with the --extension option ??? is displayed and with the
--apple option UNKNUNKN is shown. When running with option -e soft I
get a more surprising output like:

Juish.ics:                        Unicode text, UTF-8 text,
				  with CRLF line terminators
Meister des Alltags.vcs:          ASCII text,
	    			  with CRLF, LF line terminators
Sport Today.ics:                  ASCII text,
      				  with CRLF, LF line terminators
Sport Today.vcs:                  ASCII text,
      				  with CRLF, LF line terminators
calendar.ics:                     ASCII text,
				  with CRLF line terminators
ferien_NRW_2023.ics:              ASCII text
fmt-387-signature-id-572.vcs:     data
fmt-388-signature-id-573-b.ics:   ISO-8859 text,
				  with no line terminators
fmt-388-signature-id-573.ics:     data
holidays_NRW_2014.ics:            ASCII text
import-real-world-2004-11-19.ics: ASCII text
import-with-timezone.ics:         ASCII text
wikipedia-busy.ifb:               ASCII text,
				  with CRLF line terminators

For comparison I ran the file format identification utility TrID
(see https://mark0.net/soft-trid-e.html). Surprisingly I get a little
more information there. All events are described with highest
priority as "iCalendar - vCalendar" by vcs.trid.xml, and 2 file name
suffixes are listed there (.ICS/VCS; see appended
trid-v-calendar.txt.gz).

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). There the VCS
examples are described as "VCalendar format" with mime type
text/x-vCalendar by PUID fmt/387. The samples whose suffix starts
with ICS are described as "Internet Calendar and Scheduling format"
with mime type text/calendar by PUID fmt/388. But only the pure ICS
suffix is accepted as correct; the ICSALARM suffix is considered
"bad", and so is the IFB extension. A few examples like Juish.ics,
import-with-timezone.ics and "Sport Today.ics" are not recognized
(see appended droid-calendar.csv.gz).

With the help of the TrID output I found pages on the File Formats
Archive Team web site. This information is expressed inside
Magdir/misctools by additional comment lines like:
# URL:		http://fileformats.archiveteam.org/wiki/ICalendar
# 		http://fileformats.archiveteam.org/wiki/VCalendar
#		https://en.wikipedia.org/wiki/ICalendar
# Reference:	https://www.rfc-editor.org/rfc/rfc5545
#		http://mark0.net/download/triddefs_xml.7z
#		defs/v/vcs.trid.xml
There you also find links to calendars that can be downloaded.

The current description is found inside Magdir/misctools. There exist
2 lines for such calendars. These look like:
 0	string/c	BEGIN:VCALENDAR	vCalendar calendar file
 !:mime	text/calendar

According to TrID the second word can also occur in lower case. With
the /c flag, lower case letters in the magic match both cases in the
target, so the first line now becomes:
 0	string/c			BEGIN:vcalendar
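
A rough illustration of why the lower case pattern is enough, assuming
the behaviour of the c flag as documented in magic(5). The helper name
match_c is hypothetical and only mimics the comparison at offset 0; it
is not code taken from file(1):

```python
def match_c(pattern: bytes, data: bytes) -> bool:
    """Mimic a string/c test at offset 0 (sketch, not file(1) source)."""
    if len(data) < len(pattern):
        return False
    for p, d in zip(pattern, data):
        if bytes([p]).islower():
            # a lower case magic byte matches either case in the target
            if bytes([d]).lower() != bytes([p]):
                return False
        elif p != d:
            # upper case letters and all other bytes must match exactly
            return False
    return True


print(match_c(b"BEGIN:vcalendar", b"BEGIN:VCALENDAR\r\n"))
print(match_c(b"BEGIN:vcalendar", b"BEGIN:vcalendar\n"))
print(match_c(b"BEGIN:VCALENDAR", b"BEGIN:vcalendar\n"))
```

The first two calls return True and the last one False, so the old
all upper case pattern would miss samples with a lower case second
word.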

According to the current file format specification RFC 5545 (which
strictly applies only to ICS samples), CarriageReturn (CR=0x0D)
followed by LineFeed (LF=0x0A) should be used as line separator.
This becomes visible by the phrase "with CRLF line terminators" when
running the file command with the -e soft option. Unfortunately some
samples only use the LF character as separator. This becomes visible
by the absence of the phrase "line terminators" when running the file
command with the -e soft option. I found such samples like
import-real-world-2004-11-19.ics inside the sources of emacs-28.1. I
also got such samples like ferien_NRW_2023.ics or
holidays_NRW_2014.ics from the web site schulferien.org, which offers
school holidays for Germany as calendar events. I also have samples
like "Meister des Alltags.vcs", "Sport Today.ics" and "Sport
Today.vcs" with CRLF, LF line terminators. These samples were
generated by the calendar plug-in of TV-Browser (see web site
www.tvbrowser.org). I could import all these "relaxed" samples into
Thunderbird as calendar events although they do not follow the strict
rules. Other calendar applications may refuse to import such samples.
So for the most unusual samples I show this information at the end by
a line like:
 >>15	ubeshort		!0x0D0A		\b, without CRLF
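
What this test looks at can be sketched in Python as follows. Offset
15 is the position right after the 15 characters of BEGIN:VCALENDAR.
The function name without_crlf is hypothetical, and returning False
for input that is too short is my assumption about how a failed read
behaves:

```python
import struct


def without_crlf(data: bytes) -> bool:
    """True if the 2 bytes after BEGIN:VCALENDAR are not CR LF."""
    if len(data) < 17:
        # assumption: too short to read a 16 bit value, so nothing is reported
        return False
    (value,) = struct.unpack_from(">H", data, 15)  # ubeshort at offset 15
    return value != 0x0D0A


print(without_crlf(b"BEGIN:VCALENDAR\r\nVERSION:2.0\r\n"))
print(without_crlf(b"BEGIN:VCALENDAR\nVERSION:2.0\n"))
```

Only the terminator of the first line is inspected. A sample with
mixed CRLF, LF terminators whose first line ends with CRLF is
therefore not flagged.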

The samples like fmt-387-signature-id-572.vcs and
fmt-388-signature-id-573.ics are used by DROID to identify such
calendar samples. Therefore these samples contain only some starting
bytes and are not real calendar events. Instead of CR or LF, the
first line is followed by a separating character with value 0x0 or
0xAB. So I skip the DROID samples by an additional second test line
which looks like:
 >15	ubyte&0xF8			=0x08
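
The effect of the mask can be checked with a few lines of Python. This
is only a sketch of the arithmetic; the function name separator_ok is
hypothetical:

```python
def separator_ok(data: bytes) -> bool:
    """Mimic '>15 ubyte&0xF8 =0x08' (sketch of the arithmetic only)."""
    # the byte right after BEGIN:VCALENDAR must lie in the range 0x08..0x0F
    return len(data) > 15 and (data[15] & 0xF8) == 0x08


for sep in (0x0D, 0x0A, 0x00, 0xAB):
    sample = b"BEGIN:VCALENDAR" + bytes([sep])
    print(hex(sep), separator_ok(sample))
```

CR (0x0D) and LF (0x0A) pass, whereas 0x00 and 0xAB from the DROID
signature samples are rejected. The mask is deliberately loose: any
byte from 0x08 to 0x0F is accepted, not only CR and LF.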

As described in the documentation, the second line is often
VERSION:2.0 for the iCalendar variant (that often has the ics suffix)
whereas for the vCalendar variant (that has the vcs suffix) it is
something like VERSION:1.0. Unfortunately this can also occur later,
like in example holidays_NRW_2014.ics. So I first look for the
VERSION keyword by a line like:
 >>0	search/188			VERSION

Unfortunately I found one example import-with-timezone.ics (inside
the sources of emacs-28.1) without a version tag. This violates the
current specification but this sample is accepted by Thunderbird. So
I describe such samples by lines like:
>>0	default	x	vCalendar calendar file, without VERSION
!:mime			text/x-vcalendar
!:ext			ics/vcs
Because this sample violates the specification I do not use the
officially registered mime type text/calendar here. Instead I use the
deprecated predecessor type. I found only one such example, with
suffix ics, but I assume that this can also occur for VCS samples.

For checking purposes I look at the text after the VERSION keyword by
a debug line like:
>>> &0		string		x		AFTER_VERSION=%.15s
Often I get the expected :2.0 part here. But in some examples there
is a gap between the version keyword and the version value. In
example Juish.ics, generated by web site www.webcal.guru when looking
for Jewish religious days, I get ;VALUE=TEXT:2.0. If I understand the
specification right then an optional verparam in the form of
;other-param like ;VALUE=TEXT is allowed to be inserted. I also found
an example like import-real-world-2004-11-19.ics (inside the sources
of emacs-28.1) where this text looks like \n\040:2.0. If I understand
the specification right then this is also allowed as a method to fold
long lines. So I look for version 2.0, which implies the iCalendar
variant, by lines starting like:
 >>>&0		search/81	:2.0	iCalendar calendar
If no VERSION 2.0 is found then I assume it is VERSION 1.0, that is
the older vCalendar variant. This sub class is called "VCalendar
format" by DROID via fmt/387. It is described with mime type
text/x-vcalendar by lines like:
 >>>&0		default		x	vCalendar calendar file
 !:mime					text/x-vcalendar
 !:ext					vcs
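
The version logic described so far can be approximated in Python as
follows. This is a sketch under assumptions, not the code of file(1):
the helper names search and calendar_variant are hypothetical, and I
assume that search/N tries N start positions beginning at the given
offset:

```python
def search(data: bytes, needle: bytes, start: int, rng: int) -> int:
    """Approximate a magic search/rng test beginning at offset start.

    Returns the offset just after the match, or -1 if there is none.
    """
    pos = data.find(needle, start, start + rng - 1 + len(needle))
    return -1 if pos < 0 else pos + len(needle)


def calendar_variant(data: bytes) -> str:
    after_version = search(data, b"VERSION", 0, 188)
    if after_version < 0:
        return "vCalendar calendar file, without VERSION"
    # tolerate a gap such as ;VALUE=TEXT or a folded line before :2.0
    if search(data, b":2.0", after_version, 81) >= 0:
        return "iCalendar calendar file"
    return "vCalendar calendar file"


print(calendar_variant(b"BEGIN:VCALENDAR\r\nVERSION;VALUE=TEXT:2.0\r\n"))
print(calendar_variant(b"BEGIN:VCALENDAR\nVERSION\n :2.0\n"))
print(calendar_variant(b"BEGIN:VCALENDAR\r\nVERSION:1.0\r\n"))
```

The first two inputs imitate the gaps seen in Juish.ics and
import-real-world-2004-11-19.ics and are both reported as iCalendar;
the third is reported as vCalendar.
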
The version 2.0 variant is called "Internet Calendar and Scheduling
format" by DROID via fmt/388 with mime type text/calendar. DROID only
assumes an optional gap of at most 2 characters. So sample
import-real-world-2004-11-19.ics is recognized but not Juish.ics.
According to the documentation there exist variants with different
apple type codes and distinguishing file name suffixes. So I add sub
branches to handle such items. For version 2.0 variants I first look
for Free/Busy components. Unfortunately I found no real examples of
these on my systems. So I use the example mentioned on the Wikipedia
page about ICalendar. This sample gets its own suffix and apple type.
This is done by lines like:
 >>>>15	search/278 :VFREEBUSY	file, with Free/Busy component
 !:mime				text/calendar
 !:apple			????iFBf
 !:ext				ifb
Afterwards I look for iCalendar samples with an ALARM component. I
found such samples on macOS beneath /Users/$USER/Library/Calendars as
EventAllDayAlarms.icsalarm or EventTimedAlarms.icsalarm. So another
suffix is used here. Such samples are handled by lines like:
 >>>>15		default		x
 >>>>>15	search/154 	:VALARM	file, with ALARM component
 !:mime					text/calendar
 !:apple				????iCal
 !:ext					icsalarm/ics

The remaining other iCalendar samples are handled by lines like:
 >>>>>15			default		x	file
 !:mime							text/calendar
 !:apple						????iCal
 !:ext							ics
According to the documentation the suffix ical or icalendar can also
occur, but in my samples I only found the suffix ics.
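
The order of the sub branches for version 2.0 samples can be sketched
in Python like this. The function name icalendar_details is
hypothetical, and the handling of the search ranges is my
approximation of search/278 and search/154 starting at offset 15:

```python
def icalendar_details(data: bytes) -> tuple:
    """Return (suffix, apple code) for a version 2.0 sample (sketch)."""

    def found(needle: bytes, rng: int) -> bool:
        # approximate search/rng starting at offset 15
        return data.find(needle, 15, 15 + rng - 1 + len(needle)) >= 0

    if found(b":VFREEBUSY", 278):
        return ("ifb", "????iFBf")
    if found(b":VALARM", 154):
        return ("icsalarm/ics", "????iCal")
    return ("ics", "????iCal")


head = b"BEGIN:VCALENDAR\r\nVERSION:2.0\r\n"
print(icalendar_details(head + b"BEGIN:VFREEBUSY\r\n"))
print(icalendar_details(head + b"BEGIN:VEVENT\r\nBEGIN:VALARM\r\n"))
print(icalendar_details(head + b"BEGIN:VEVENT\r\n"))
```

Free/Busy is tested first, so a sample containing both components
would be reported with suffix ifb. The limited search ranges also mean
that a component starting far into the file is not seen.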

After applying the above mentioned modifications by patch
file-5.44-misctools-calendar.diff all my calendar samples are still
described, but with more details and with differentiation between
vCalendar and iCalendar with different suffixes (vcs versus ics).
Also some "bad" examples like the DROID test signature samples are no
longer misidentified. This now looks like:

EventAllDayAlarms.icsalarm:       iCalendar calendar file
				  , with ALARM component
EventTimedAlarms.icsalarm:        iCalendar calendar file
				  , with ALARM component
Juish.ics:                        iCalendar calendar file
Meister des Alltags.vcs:          vCalendar calendar file
Sport Today.ics:                  iCalendar calendar file
Sport Today.vcs:                  vCalendar calendar file
calendar.ics:                     iCalendar calendar file
ferien_NRW_2023.ics:              iCalendar calendar file
				  , without CRLF
fmt-387-signature-id-572.vcs:     data
fmt-388-signature-id-573-b.ics:   ISO-8859 text,
				  with no line terminators
fmt-388-signature-id-573.ics:     data
holidays_NRW_2014.ics:            iCalendar calendar file
				  , without CRLF
import-real-world-2004-11-19.ics: iCalendar calendar file
				  , without CRLF
import-with-timezone.ics:         vCalendar calendar file
				  , without VERSION
				  , without CRLF
wikipedia-busy.ifb:               iCalendar calendar file
				  , with Free/Busy component

Now with the --extension option I get the expected file name suffix.
This looks like:
EventAllDayAlarms.icsalarm:       icsalarm/ics
EventTimedAlarms.icsalarm:        icsalarm/ics
Juish.ics:                        ics
Meister des Alltags.vcs:          vcs
Sport Today.ics:                  ics
Sport Today.vcs:                  vcs
calendar.ics:                     ics
ferien_NRW_2023.ics:              ics
fmt-387-signature-id-572.vcs:     ???
fmt-388-signature-id-573-b.ics:   ???
fmt-388-signature-id-573.ics:     ???
holidays_NRW_2014.ics:            ics
import-real-world-2004-11-19.ics: ics
import-with-timezone.ics:         ics/vcs
wikipedia-busy.ifb:               ifb

I hope my diff file can be applied in a future version of the file
utility.

With best wishes
Jörg Jenderek
- --
Jörg Jenderek




-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY9RdvAAKCRCv8rHJQhrU
1nRPAKCHXh9eOg9L7pz5HbX1bQVFTlDyHQCgh9AepKgi6Ns1vXMrj5ySGA4nLUo=
=gOhy
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-calendar.txt.gz
Type: application/x-gzip
Size: 793 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230128/08663787/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: droid-calendar.csv.gz
Type: application/x-gzip
Size: 801 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230128/08663787/attachment-0001.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/misctools.old	2021-07-03 16:49:16.000000000 +0200
+++ file-5.44/magic/Magdir/misctools	2023-01-27 23:52:02.237084100 +0100
@@ -6,4 +6,64 @@
 0	search/1	%%!!			X-Post-It-Note text
-0	string/c	BEGIN:VCALENDAR		vCalendar calendar file
-!:mime	text/calendar
+# URL:		http://fileformats.archiveteam.org/wiki/ICalendar
+#		https://en.wikipedia.org/wiki/ICalendar
+# Update:	Joerg Jenderek
+# Reference:	https://www.rfc-editor.org/rfc/rfc5545
+#		http://mark0.net/download/triddefs_xml.7z/defs/v/vcs.trid.xml
+# Note:		called "iCalendar - vCalendar" by TrID
+0	string/c			BEGIN:vcalendar
+# skip DROID fmt-387-signature-id-572.vcs fmt-388-signature-id-573.ics
+# with invalid separator 0x0 or 0xAB instead of CarriageReturn (0x0D) or LineFeed (0x0A)
+>15	ubyte&0xF8			=0x08
+# look for VERSION keyword often on second line but sometimes later as in holidays_NRW_2014.ics
+>>0	search/188			VERSION
+# after VERSION keyword :1.0 or often :2.0 but sometimes also ;VALUE=TEXT:2.0 like in Jewish religious Juish.ics
+# http://www.webcal.guru/de-DE/kalender_herunterladen?calendar_instance_id=217
+# \n\040:2.0 like in import-real-world-2004-11-19.ics found at
+# https://ftp.gnu.org/gnu/emacs/emacs-28.1.tar.xz
+# emacs-28.1/test/lisp/calendar/icalendar-resources/import-real-world-2004-11-19.ics
+#>>>&0		string			x		AFTER_VERSION=%.15s
+# Note:		called "Internet Calendar and Scheduling format" by DROID via PUID fmt/388
+# skip optional verparam=;other-param like ;VALUE=TEXT and look for version 2.0 that implies iCalendar variant
+>>>&0		search/81		:2.0		iCalendar calendar
+# look for Free/Busy component
+>>>>15			search/278	:VFREEBUSY	file, with Free/Busy component
+!:mime							text/calendar
+!:apple							????iFBf
+# no real examples found but only example on Wikipedia page
+!:ext							ifb
+# iCalendar calendar without Free/Busy component
+>>>>15			default		x
+# look for ALARM component
+>>>>>15				search/154 	:VALARM	file, with ALARM component
+!:mime							text/calendar
+!:apple							????iCal
+# found on macOS beneath /Users/$USER/Library/Calendars/ as EventAllDayAlarms.icsalarm or EventTimedAlarms.icsalarm
+# no ics examples found
+!:ext							icsalarm/ics
+# iCalendar calendar without Free/Busy component and ALARM component
+>>>>>15				default		x	file
+!:mime							text/calendar
+!:apple							????iCal
+# no examples found with .ical .icalendar suffix
+!:ext							ics
+# if no VERSION 2.0 is found then assume it is VERSION 1.0, that is older vCalendar
+# URL:		http://fileformats.archiveteam.org/wiki/VCalendar
+# Note:		called "VCalendar format" by DROID via fmt/387
+>>>&0		default			x		vCalendar calendar file
+# deprecated
+!:mime							text/x-vcalendar
+!:ext							vcs
+# GRR: without VERSION keyword violates specification but accepted by Thunderbird like
+# https://ftp.gnu.org/gnu/emacs/emacs-28.1.tar.xz
+# emacs-28.1/test/lisp/calendar/icalendar-resources/import-with-timezone.ics
+>>0	default				x		vCalendar calendar file, without VERSION
+!:mime							text/x-vcalendar
+#!:mime							text/calendar
+# no vcs example found
+!:ext							ics/vcs
+# GRR: According to newest specification CarriageReturn (0xD) and LineFeed (0xA) should be used as separator but others accepted by Thunderbird
+# like CRLF,LF in Sport Today.vcs created by calendar plugin of TV-Browser https://enwiki.tvbrowser.org/index.php/Calendar_Export
+# or LF like https://www.schulferien.org/media/ical/deutschland/ferien_nordrhein-westfalen_2023.ics?k=foo
+>>15	ubeshort			!0x0D0A		\b, without CRLF
+
 # updated by Joerg Jenderek at Apr 2015, May 2021
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-misctools-calendar.diff.sig
Type: application/octet-stream
Size: 1764 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230128/08663787/attachment.obj>

