[File] [PATCH] Magdir/misctools vCalendar calendar *.vcs versus iCalendar *.ics

Christos Zoulas christos at zoulas.com
Fri Feb 3 20:43:59 UTC 2023


Committed, thanks!

christos

> On Jan 27, 2023, at 6:26 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hello,
> 
> Some days ago I planned an event and tried to record it with a
> calendar program. I had some problems with my calendar event files,
> so I looked for other calendar files on my systems.
> 
> When running the file command, version 5.44, on a dozen calendar
> events and related files, I get output like:
> 
> Juish.ics:                        vCalendar calendar file
> Meister des Alltags.vcs:          vCalendar calendar file
> Sport Today.ics:                  vCalendar calendar file
> Sport Today.vcs:                  vCalendar calendar file
> calendar.ics:                     vCalendar calendar file
> ferien_NRW_2023.ics:              vCalendar calendar file
> fmt-387-signature-id-572.vcs:     vCalendar calendar file
> fmt-388-signature-id-573-b.ics:   vCalendar calendar file
> fmt-388-signature-id-573.ics:     vCalendar calendar file
> holidays_NRW_2014.ics:            vCalendar calendar file
> import-real-world-2004-11-19.ics: vCalendar calendar file
> import-with-timezone.ics:         vCalendar calendar file
> wikipedia-busy.ifb:               vCalendar calendar file
> 
> With option -i the reasonable MIME type text/calendar is shown.
> Furthermore, with the --extension option ??? is displayed and with
> the --apple option UNKNUNKN is shown. When running with option
> -e soft I get more surprising output like:
> 
> Juish.ics:                        Unicode text, UTF-8 text,
> 				  with CRLF line terminators
> Meister des Alltags.vcs:          ASCII text,
> 	    			  with CRLF, LF line terminators
> Sport Today.ics:                  ASCII text,
>      				  with CRLF, LF line terminators
> Sport Today.vcs:                  ASCII text,
>      				  with CRLF, LF line terminators
> calendar.ics:                     ASCII text,
> 				  with CRLF line terminators
> ferien_NRW_2023.ics:              ASCII text
> fmt-387-signature-id-572.vcs:     data
> fmt-388-signature-id-573-b.ics:   ISO-8859 text,
> 				  with no line terminators
> fmt-388-signature-id-573.ics:     data
> holidays_NRW_2014.ics:            ASCII text
> import-real-world-2004-11-19.ics: ASCII text
> import-with-timezone.ics:         ASCII text
> wikipedia-busy.ifb:               ASCII text,
> 				  with CRLF line terminators
> 
> For comparison I ran the file format identification utility TrID
> (see https://mark0.net/soft-trid-e.html). Surprisingly, I get
> slightly more differentiation there. All events are described with
> highest priority as "iCalendar - vCalendar" by vcs.trid.xml, and two
> file name suffixes are listed (.ICS/VCS; see appended
> trid-v-calendar.txt.gz).
> 
> For comparison I also ran the file format identification utility
> DROID (see https://sourceforge.net/projects/droid/). Here the VCS
> examples are described as "VCalendar format" with MIME type
> text/x-vCalendar by PUID fmt/387. The samples whose suffix starts
> with ICS are described as "Internet Calendar and Scheduling format"
> with MIME type text/calendar by PUID fmt/388. But only a pure ICS
> suffix is accepted as correct; the ICSALARM suffix is considered
> "bad", and so is the IFB extension. A few examples like Juish.ics,
> import-with-timezone.ics and "Sport Today.ics" are not recognized
> (see appended droid-calendar.csv.gz).
> 
> With the help of the TrID output I found pages on the Archive Team
> file formats web site. That information is expressed inside
> Magdir/misctools by additional comment lines like:
> # URL:		http://fileformats.archiveteam.org/wiki/ICalendar
> # 		http://fileformats.archiveteam.org/wiki/VCalendar
> #		https://en.wikipedia.org/wiki/ICalendar
> # Reference:	https://www.rfc-editor.org/rfc/rfc5545
> #		http://mark0.net/download/triddefs_xml.7z
> #		defs/v/vcs.trid.xml
> There you also find links to calendars that can be downloaded.
> 
> The current description is inside Magdir/misctools. There are two
> lines for such calendars. These look like:
> 0	string/c	BEGIN:VCALENDAR	vCalendar calendar file
> !:mime	text/calendar
> 
> According to TrID the second word can also occur in lower case. So
> the first line now becomes:
> 0	string/c			BEGIN:vcalendar
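
A minimal Python sketch of why the pattern changes from BEGIN:VCALENDAR to BEGIN:vcalendar, assuming the string/c semantics as I read them in magic(5): lower case letters in the pattern match either case in the file, while upper case letters match only upper case. The function name magic_c_match is hypothetical and not part of file(1).

```python
def magic_c_match(pattern: bytes, data: bytes, offset: int = 0) -> bool:
    """Approximate a magic(5) 'string/c' comparison at a fixed offset.

    Assumption: a lower case pattern letter accepts both cases in the data,
    every other pattern byte must match exactly.
    """
    chunk = data[offset:offset + len(pattern)]
    if len(chunk) < len(pattern):
        return False  # not enough bytes to compare
    for p, d in zip(pattern, chunk):
        pc, dc = chr(p), chr(d)
        if pc.islower():
            # lower case in the pattern: case-insensitive for this byte
            if dc.lower() != pc:
                return False
        elif pc != dc:
            # upper case letters, ':' and so on: exact match only
            return False
    return True


if __name__ == "__main__":
    for sample in (b"BEGIN:VCALENDAR\r\n", b"BEGIN:vcalendar\n"):
        print(sample, magic_c_match(b"BEGIN:vcalendar", sample))
```

Under this reading the old all upper case pattern only accepts an upper case second word, which is why the new pattern writes that word in lower case.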
> 
> According to the current file format specification, RFC 5545 (which
> strictly applies only to ICS samples), CarriageReturn (CR=0x0D)
> followed by LineFeed (LF=0x0A) should be used as line separator.
> This becomes visible by the phrase "with CRLF line terminators" when
> running the file command with the -e soft option. Unfortunately some
> samples use only the LF character as separator. This becomes visible
> by the absence of any "line terminators" phrase when running the
> file command with the -e soft option. I found such samples, like
> import-real-world-2004-11-19.ics, inside the sources of emacs-28.1.
> I also got such samples, like ferien_NRW_2023.ics or
> holidays_NRW_2014.ics, from the web site schulferien.org, which
> offers school holidays for Germany as calendar events. I also have
> samples like "Meister des Alltags.vcs", "Sport Today.ics" and
> "Sport Today.vcs" with CRLF, LF line terminators. These samples were
> generated by the calendar plug-in of TV-Browser (see web site
> www.tvbrowser.org). I could import all these "relaxed" samples into
> Thunderbird as calendar events, although they do not follow the
> strict rules. Maybe other calendar applications refuse to import
> such samples. So for the most unusual samples I show this
> information at the end by a line like:
> >>15	ubeshort		!0x0D0A		\b, without CRLF
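
The CRLF test above can be sketched in Python as follows. This is only an illustration: the helper name lacks_crlf_after_first_line is hypothetical, and treating a too-short file as "no message" is my assumption about how an unreadable offset behaves.

```python
import struct

# "BEGIN:VCALENDAR" is 15 bytes long, so offset 15 is the first byte
# after the magic string, where the line terminator is expected.
MAGIC_LEN = len(b"BEGIN:VCALENDAR")


def lacks_crlf_after_first_line(data: bytes) -> bool:
    """Mimic 'ubeshort !0x0D0A' at offset 15 (unsigned big-endian 16 bit)."""
    if len(data) < MAGIC_LEN + 2:
        return False  # assumption: nothing is reported if the value cannot be read
    (value,) = struct.unpack_from(">H", data, MAGIC_LEN)
    return value != 0x0D0A


if __name__ == "__main__":
    print(lacks_crlf_after_first_line(b"BEGIN:VCALENDAR\r\nVERSION:2.0\r\n"))
    print(lacks_crlf_after_first_line(b"BEGIN:VCALENDAR\nVERSION:2.0\n"))
```

Note that only the terminator of the first line is inspected, so a file with mixed CRLF, LF terminators is reported according to its first line.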
> 
> Samples like fmt-387-signature-id-572.vcs and
> fmt-388-signature-id-573.ics are used by DROID to identify such
> calendar samples. Therefore they contain only some starting bytes
> and are not real calendar events. Instead of CR or LF, they contain
> a separator character with value 0x0 or 0xAB after the first line.
> So I skip the DROID samples by an additional second test line, which
> looks like:
> >15	ubyte&0xF8			=0x08
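
To see which separator bytes this mask test lets through, here is a small Python sketch. The function name separator_accepted is hypothetical; the arithmetic is just the mask and comparison from the line above.

```python
def separator_accepted(byte: int) -> bool:
    """Mimic 'ubyte&0xF8 =0x08': keep the upper five bits and compare."""
    return (byte & 0xF8) == 0x08


# All byte values that pass the test: the range 0x08 .. 0x0F,
# which contains both LF (0x0A) and CR (0x0D).
accepted = [b for b in range(256) if separator_accepted(b)]

if __name__ == "__main__":
    print([hex(b) for b in accepted])
    for name, value in (("CR", 0x0D), ("LF", 0x0A), ("NUL", 0x00), ("0xAB", 0xAB)):
        print(name, separator_accepted(value))
```

So CR and LF are accepted, while the 0x0 and 0xAB separators of the DROID signature samples are rejected.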
> 
> As described in the documentation, the second line is often
> VERSION:2.0 for the iCalendar variant (which often has the ics
> suffix), whereas for the vCalendar variant (which has the vcs
> suffix) it is something like VERSION:1.0. Unfortunately this line
> can also occur later, as in example holidays_NRW_2014.ics. So I
> first look for the VERSION keyword by a line like:
> >>0	search/188			VERSION
> 
> Unfortunately I found one example, import-with-timezone.ics (inside
> the sources of emacs-28.1), without a version tag. This violates the
> current specification, but the sample is accepted by Thunderbird.
> So I describe such samples by lines like:
> >>0	default	x	vCalendar calendar file, without VERSION
> !:mime			text/x-vcalendar
> !:ext			ics/vcs
> Because this sample violates the specification I do not use the
> officially registered MIME type text/calendar here. Instead I use
> the deprecated predecessor type. I found only one such example, with
> suffix ics, but I assume that this can also occur for VCS samples.
> 
> As a check I look at the text after the version keyword with a
> debug line like:
> >>>&0		string		x		AFTER_VERSION=%.15s
> Often I get the expected :2.0 part here. But in some examples there
> is a gap between the version keyword and the version value. In
> example Juish.ics, generated from web site www.webcal.guru when
> looking for Jewish religious days, I get ;VALUE=TEXT:2.0. If I
> understand the specification right, an optional verparam in the form
> of ;other-param like ;VALUE=TEXT is allowed to be inserted. I also
> found an example, import-real-world-2004-11-19.ics (inside the
> sources of emacs-28.1), where this text looks like \n\040:2.0. If I
> understand the specification right, this is also allowed as a method
> to fold long lines. So I look for version 2.0, which implies the
> iCalendar variant, by lines starting like:
> >>>&0		search/81	:2.0	iCalendar calendar
> If no version 2.0 is found, I assume it is version 1.0, that is,
> the older vCalendar variant. This subclass is called "VCalendar
> format" by DROID via fmt/387, with MIME type text/x-vcalendar, and
> is described by lines like:
> >>>&0		default		x	vCalendar calendar file
> !:mime					text/x-vcalendar
> !:ext					vcs
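
The VERSION handling described above (keyword search, then a search for :2.0 that tolerates a gap, then the fallbacks) can be sketched in Python roughly as follows. The function name classify_version is hypothetical, and the window arithmetic is only my approximation of what search/188 and search/81 cover; it is not taken from the file(1) sources.

```python
KEYWORD = b"VERSION"
VALUE = b":2.0"


def classify_version(data: bytes) -> str:
    """Return the description the rules above would roughly produce."""
    # Approximation of 'search/188 VERSION' from offset 0: the keyword
    # may start anywhere in the first 188 bytes or so.
    pos = data.find(KEYWORD, 0, 188 + len(KEYWORD))
    if pos < 0:
        return "vCalendar calendar file, without VERSION"
    # Approximation of '&0 search/81 :2.0': look shortly after the keyword,
    # which tolerates ';VALUE=TEXT' parameters and folded lines.
    after = pos + len(KEYWORD)
    window = data[after:after + 81 + len(VALUE)]
    if VALUE in window:
        return "iCalendar calendar file"
    # No version 2.0 found: assume the older vCalendar variant.
    return "vCalendar calendar file"


if __name__ == "__main__":
    samples = [
        b"BEGIN:VCALENDAR\r\nVERSION:2.0\r\n",
        b"BEGIN:VCALENDAR\r\nVERSION;VALUE=TEXT:2.0\r\n",
        b"BEGIN:VCALENDAR\nVERSION\n :2.0\n",
        b"BEGIN:VCALENDAR\r\nVERSION:1.0\r\n",
        b"BEGIN:VCALENDAR\nBEGIN:VTIMEZONE\n",
    ]
    for sample in samples:
        print(classify_version(sample))
```

The second and third samples imitate the gap cases mentioned for Juish.ics and import-real-world-2004-11-19.ics; they are made-up byte strings, not the real files.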
> The version 2.0 variant is called "Internet Calendar and Scheduling
> format" by DROID via fmt/388 with MIME type text/calendar. DROID
> only allows an optional gap of at most 2 characters, so sample
> import-real-world-2004-11-19.ics is recognized but Juish.ics is not.
> According to the documentation there are variants with different
> Apple type codes and distinguishing file name suffixes, so I add
> sub-branches to handle such items. For version 2.0 variants I first
> look for Free/Busy components. Unfortunately I found no real
> examples of these on my systems, so I use the example mentioned on
> the Wikipedia page about iCalendar. This sample gets its own suffix
> and Apple type. This is done by lines like:
> >>>>15	search/278 :VFREEBUSY	file, with Free/Busy component
> !:mime				text/calendar
> !:apple			????iFBf
> !:ext				ifb
> Afterwards I look for iCalendar samples with an ALARM component. I
> found such samples on macOS beneath /Users/$USER/Library/Calendars
> as EventAllDayAlarms.icsalarm or EventTimedAlarms.icsalarm, so
> another suffix is used here. Such samples are handled by lines like:
> >>>>15		default		x
> >>>>>15	search/154 	:VALARM	file, with ALARM component
> !:mime					text/calendar
> !:apple				????iCal
> !:ext					icsalarm/ics
> 
> The remaining iCalendar samples are handled by lines like:
> >>>>>15			default		x	file
> !:mime							text/calendar
> !:apple						????iCal
> !:ext							ics
> According to the documentation the suffix ical or icalendar can
> also occur, but in my samples I only found the suffix ics.
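
The order of the iCalendar sub-branches (Free/Busy first, then ALARM, then everything else) can be sketched in Python like this. The function name icalendar_variant is hypothetical, and the search windows are again only an approximation of search/278 and search/154 starting at offset 15.

```python
from typing import Tuple

START = 15  # first byte after "BEGIN:VCALENDAR"


def icalendar_variant(data: bytes) -> Tuple[str, str]:
    """Return (description suffix, file name suffixes) for a version 2.0 sample."""
    body = data[START:]
    # Free/Busy component is tested first and gets its own suffix.
    if b":VFREEBUSY" in body[:278 + len(b":VFREEBUSY")]:
        return ("with Free/Busy component", "ifb")
    # Otherwise look for an ALARM component.
    if b":VALARM" in body[:154 + len(b":VALARM")]:
        return ("with ALARM component", "icsalarm/ics")
    # All remaining iCalendar samples.
    return ("", "ics")


if __name__ == "__main__":
    head = b"BEGIN:VCALENDAR\r\nVERSION:2.0\r\n"
    print(icalendar_variant(head + b"BEGIN:VFREEBUSY\r\n"))
    print(icalendar_variant(head + b"BEGIN:VEVENT\r\nBEGIN:VALARM\r\n"))
    print(icalendar_variant(head + b"BEGIN:VEVENT\r\n"))
```

Because the ALARM search only covers a limited window, an event whose alarm component starts far into the file would fall through to the plain ics branch in this sketch.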
> 
> After applying the above mentioned modifications with patch
> file-5.44-misctools-calendar.diff, all my calendar samples are
> still described, but with more details and with differentiation
> between vCalendar and iCalendar with different suffixes (vcs versus
> ics). Also some "bad" examples like the DROID test signature samples
> are no longer misidentified. This now looks like:
> 
> EventAllDayAlarms.icsalarm:       iCalendar calendar file
> 				  , with ALARM component
> EventTimedAlarms.icsalarm:        iCalendar calendar file
> 				  , with ALARM component
> Juish.ics:                        iCalendar calendar file
> Meister des Alltags.vcs:          vCalendar calendar file
> Sport Today.ics:                  iCalendar calendar file
> Sport Today.vcs:                  vCalendar calendar file
> calendar.ics:                     iCalendar calendar file
> ferien_NRW_2023.ics:              iCalendar calendar file
> 				  , without CRLF
> fmt-387-signature-id-572.vcs:     data
> fmt-388-signature-id-573-b.ics:   ISO-8859 text,
> 				  with no line terminators
> fmt-388-signature-id-573.ics:     data
> holidays_NRW_2014.ics:            iCalendar calendar file
> 				  , without CRLF
> import-real-world-2004-11-19.ics: iCalendar calendar file
> 				  , without CRLF
> import-with-timezone.ics:         vCalendar calendar file
> 				  , without VERSION
> 				  , without CRLF
> wikipedia-busy.ifb:               iCalendar calendar file
> 				  , with Free/Busy component
> 
> Now with the --extension option I get the expected file name
> suffix. This looks like:
> EventAllDayAlarms.icsalarm:       icsalarm/ics
> EventTimedAlarms.icsalarm:        icsalarm/ics
> Juish.ics:                        ics
> Meister des Alltags.vcs:          vcs
> Sport Today.ics:                  ics
> Sport Today.vcs:                  vcs
> calendar.ics:                     ics
> ferien_NRW_2023.ics:              ics
> fmt-387-signature-id-572.vcs:     ???
> fmt-388-signature-id-573-b.ics:   ???
> fmt-388-signature-id-573.ics:     ???
> holidays_NRW_2014.ics:            ics
> import-real-world-2004-11-19.ics: ics
> import-with-timezone.ics:         ics/vcs
> wikipedia-busy.ifb:               ifb
> 
> I hope my diff file can be applied in a future version of the file
> utility.
> 
> With best wishes
> Jörg Jenderek
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
> 
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY9RdvAAKCRCv8rHJQhrU
> 1nRPAKCHXh9eOg9L7pz5HbX1bQVFTlDyHQCgh9AepKgi6Ns1vXMrj5ySGA4nLUo=
> =gOhy
> -----END PGP SIGNATURE-----
