[File] [PATCH] Magdir/misctools vCalendar calendar *.vcs versus iCalender *.ics
Christos Zoulas
christos at zoulas.com
Fri Feb 3 20:43:59 UTC 2023
Committed, thanks!
christos
> On Jan 27, 2023, at 6:26 PM, Jörg Jenderek <joerg.jen.der.ek at gmx.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> A few days ago I planned an event and tried to record it with a
> calendar program. I had some problems with my calendar event files,
> so I looked for other calendar files on my systems.
>
> When running the file command version 5.44 on a dozen calendar
> events and related files I get output like:
>
> Juish.ics: vCalendar calendar file
> Meister des Alltags.vcs: vCalendar calendar file
> Sport Today.ics: vCalendar calendar file
> Sport Today.vcs: vCalendar calendar file
> calendar.ics: vCalendar calendar file
> ferien_NRW_2023.ics: vCalendar calendar file
> fmt-387-signature-id-572.vcs: vCalendar calendar file
> fmt-388-signature-id-573-b.ics: vCalendar calendar file
> fmt-388-signature-id-573.ics: vCalendar calendar file
> holidays_NRW_2014.ics: vCalendar calendar file
> import-real-world-2004-11-19.ics: vCalendar calendar file
> import-with-timezone.ics: vCalendar calendar file
> wikipedia-busy.ifb: vCalendar calendar file
>
> With option -i the reasonable-looking mime type text/calendar is
> shown. Furthermore, with the --extension option ??? is displayed and
> with the --apple option UNKNUNKN is shown. When running with option
> -e soft I get a more surprising output like:
>
> Juish.ics: Unicode text, UTF-8 text, with CRLF line terminators
> Meister des Alltags.vcs: ASCII text, with CRLF, LF line terminators
> Sport Today.ics: ASCII text, with CRLF, LF line terminators
> Sport Today.vcs: ASCII text, with CRLF, LF line terminators
> calendar.ics: ASCII text, with CRLF line terminators
> ferien_NRW_2023.ics: ASCII text
> fmt-387-signature-id-572.vcs: data
> fmt-388-signature-id-573-b.ics: ISO-8859 text, with no line terminators
> fmt-388-signature-id-573.ics: data
> holidays_NRW_2014.ics: ASCII text
> import-real-world-2004-11-19.ics: ASCII text
> import-with-timezone.ics: ASCII text
> wikipedia-busy.ifb: ASCII text, with CRLF line terminators
>
> For comparison I ran the file format identification utility
> TrID (see https://mark0.net/soft-trid-e.html). Surprisingly, there I
> get a little more differentiation. All events are described with
> highest priority as "iCalendar - vCalendar" by vcs.trid.xml, and 2
> name suffixes are listed there (.ICS/VCS; see appended
> trid-v-calendar.txt.gz).
>
> For comparison I also ran the file format identification
> utility DROID (see https://sourceforge.net/projects/droid/). Here
> the VCS examples are described as "VCalendar format" with mime type
> text/x-vCalendar by PUID fmt/387. The samples whose suffix starts
> with ICS are described as "Internet Calendar and Scheduling format"
> with mime type text/calendar by PUID fmt/388. But only the pure ICS
> suffix is accepted as correct; the ICSALARM suffix is considered
> "bad", and so is the IFB extension. A few examples like Juish.ics,
> import-with-timezone.ics and "Sport Today.ics" are not recognized
> (see appended droid-calendar.csv.gz).
>
> With the help of the TrID output I found pages on the File Formats
> Archive Team web site. That information is now expressed inside
> Magdir/misctools by additional comment lines like:
> # URL: http://fileformats.archiveteam.org/wiki/ICalendar
> # http://fileformats.archiveteam.org/wiki/VCalendar
> # https://en.wikipedia.org/wiki/ICalendar
> # Reference: https://www.rfc-editor.org/rfc/rfc5545
> # http://mark0.net/download/triddefs_xml.7z
> # defs/v/vcs.trid.xml
> There you also find links to calendars that can be downloaded.
>
> The current description lives inside Magdir/misctools. There are
> 2 lines for such calendars. These look like:
> 0 string/c BEGIN:VCALENDAR vCalendar calendar file
> !:mime text/calendar
>
> According to TrID the second word is not always upper case. With the
> /c flag, lower case letters in the magic match both cases, so the
> first line now becomes:
> 0 string/c BEGIN:vcalendar
>
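A minimal Python sketch of how the /c flag is understood to behave here, following magic(5): lower case letters in the pattern match either case in the file, while every other pattern byte must match exactly. The function name magic_string_c is hypothetical, and this is an approximation, not the code that file(1) itself runs.

```python
def magic_string_c(data: bytes, pattern: bytes) -> bool:
    """Approximate `0 string/c pattern` at offset 0.

    Lower case pattern letters match both cases in the data;
    all other pattern bytes (upper case letters, ':') must match exactly.
    """
    if len(data) < len(pattern):
        return False
    for got, want in zip(data, pattern):
        if ord("a") <= want <= ord("z"):
            # ASCII upper case is 32 below the lower case letter
            if got not in (want, want - 32):
                return False
        elif got != want:
            return False
    return True


OLD_PATTERN = b"BEGIN:VCALENDAR"
NEW_PATTERN = b"BEGIN:vcalendar"
```

Under this reading, the old pattern only accepts an upper case second word, while the new one accepts both BEGIN:VCALENDAR and BEGIN:vcalendar.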
> According to the current file format specification RFC 5545 (strictly
> valid only for ICS samples), CarriageReturn (CR=0x0D) followed by
> LineFeed (LF=0x0A) should be used as line separator. This becomes
> visible by the phrase "with CRLF line terminators" when running the
> file command with the -e soft option. Unfortunately some samples only
> use the LF character as separator. This becomes visible by the
> absence of the phrase "line terminators" when running the file
> command with the -e soft option. I found such samples, like
> import-real-world-2004-11-19.ics, inside the sources of emacs-28.1. I
> also got such samples, like ferien_NRW_2023.ics or
> holidays_NRW_2014.ics, from the web site schulferien.org, which
> offers school holidays for Germany as calendar events. I also have
> samples like "Meister des Alltags.vcs", "Sport Today.ics" and "Sport
> Today.vcs" with CRLF, LF line terminators. These samples were
> generated by the calendar plug-in of TV-Browser (see web site
> www.tvbrowser.org). I could import all these "relaxed" samples into
> Thunderbird as calendar events although they do not follow the strict
> rules. Other calendar applications may refuse to import such samples.
> So for the most unusual samples I show this information at the end by
> a line like:
>>> 15 ubeshort !0x0D0A \b, without CRLF
>
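The check above can be sketched in Python as follows. Offset 15 is the position right after the 15 bytes of BEGIN:VCALENDAR, and ubeshort reads two bytes as an unsigned big-endian value. The function name lacks_crlf and the handling of too-short input are assumptions for illustration only.

```python
import struct

HEADER_LEN = len(b"BEGIN:VCALENDAR")  # 15, the offset used by the magic line


def lacks_crlf(data: bytes) -> bool:
    """True when the two bytes after the header are not CR LF (0x0D0A)."""
    if len(data) < HEADER_LEN + 2:
        # Assumption: input too short to test is simply not flagged
        return False
    (value,) = struct.unpack_from(">H", data, HEADER_LEN)
    return value != 0x0D0A
```

Only the terminator of the first line is inspected, so a sample with mixed CRLF and LF terminators is not flagged as long as its first line ends with CRLF.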
> The samples fmt-387-signature-id-572.vcs and
> fmt-388-signature-id-573.ics are used by DROID to identify such
> calendar samples. Therefore these samples contain only a few
> starting bytes and are not real calendar events. Instead of CR or LF
> they contain, after the first line, a separating character with value
> 0x0 or 0xAB. So I skip the DROID samples by an additional second test
> line which looks like:
>> 15 ubyte&0xF8 =0x08
>
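As a small worked check of that mask in Python: a byte b passes when (b & 0xF8) equals 0x08, which holds for the values 0x08 to 0x0F and therefore for both LF (0x0A) and CR (0x0D), but not for 0x00 or 0xAB. The function name is hypothetical.

```python
def has_control_byte_after_header(data: bytes, offset: int = 15) -> bool:
    """Mimic `15 ubyte&0xF8 =0x08`: the byte must lie in 0x08..0x0F."""
    if len(data) <= offset:
        return False
    return (data[offset] & 0xF8) == 0x08


# Every byte value that passes the masked comparison
ACCEPTED = [b for b in range(256) if (b & 0xF8) == 0x08]
```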
> As described in the documentation, the second line is often
> VERSION:2.0 for the iCalendar variant (which often has the ics
> suffix), whereas the vCalendar variant (which has the vcs suffix) has
> something like VERSION:1.0. Unfortunately this can also occur later,
> as in the example holidays_NRW_2014.ics. So I first look for the
> VERSION keyword by a line like:
>>> 0 search/188 VERSION
>
> Unfortunately I found one example, import-with-timezone.ics (inside
> the sources of emacs-28.1), without a version tag. This violates the
> newest specification, but the sample is accepted by Thunderbird. So I
> describe such samples by lines like:
>>> 0 default x vCalendar calendar file, without VERSION
> !:mime text/x-vcalendar
> !:ext ics/vcs
> Because this sample violates the specification I do not use the
> officially registered mime type text/calendar here. Instead I use the
> deprecated predecessor type. I found only one such example, with
> suffix ics, but I assume that this can also occur for VCS samples.
>
> As a check I look at the text after the version keyword by a debug
> line like:
>>>> &0 string x AFTER_VERSION=%.15s
> Often I get the expected :2.0 part here. But in some examples there
> is a gap between the version keyword and the version value. In the
> example Juish.ics, generated from the web site www.webcal.guru when
> looking for Jewish religious days, I get ;VALUE=TEXT:2.0. If I
> understand the specification right, an optional verparam in the form
> of ;other-param, like ;VALUE=TEXT, is allowed to be inserted. I also
> found an example, import-real-world-2004-11-19.ics (inside the
> sources of emacs-28.1), where this text looks like \n\040:2.0. If I
> understand the specification right, this is also allowed as a method
> to fold long lines. So I look for version 2.0, which implies the
> iCalendar variant, by lines starting like:
>>>> &0 search/81 :2.0 iCalendar calendar
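For illustration, the folding seen in that emacs sample can be undone with a short Python sketch. RFC 5545 folds a long line by inserting CRLF followed by one space or tab; accepting a bare LF as well is an assumption made here to cover the relaxed samples. The helper name unfold is hypothetical and plays no part in the magic lines themselves, which instead tolerate the gap through the search range.

```python
import re

# A line break followed by exactly one space or tab marks a folded line.
# Assumption: bare LF is accepted in addition to the CRLF required by RFC 5545.
_FOLD = re.compile(rb"\r?\n[ \t]")


def unfold(content: bytes) -> bytes:
    """Join folded content lines back into single logical lines."""
    return _FOLD.sub(b"", content)
```

With this, the folded text VERSION, line break, space, :2.0 becomes the single logical line VERSION:2.0.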
> If no VERSION 2.0 is found then I assume it is VERSION 1.0, that is
> the older vCalendar variant. This sub class is called "VCalendar
> format" by DROID via fmt/387, and mime type text/x-vcalendar is
> described by lines like:
>>>> &0 default x vCalendar calendar file
> !:mime text/x-vcalendar
> !:ext vcs
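The decision between the variants can be sketched in Python roughly as follows. The helper magic_search approximates search/N as trying N match positions from the start offset and returning the offset just past the match, which is where a following &0 test continues. Both function names are hypothetical, and the return strings only imitate the descriptions produced by the magic lines.

```python
def magic_search(data: bytes, pattern: bytes, start: int, rng: int) -> int:
    """Approximate `search/rng`: the match may begin at start..start+rng-1.

    Returns the offset just after the match, or -1 if nothing is found.
    """
    end = start + rng - 1 + len(pattern)
    pos = data.find(pattern, start, end)
    return -1 if pos < 0 else pos + len(pattern)


def classify(data: bytes) -> str:
    # `0 search/188 VERSION`
    after_version = magic_search(data, b"VERSION", 0, 188)
    if after_version < 0:
        return "vCalendar calendar file, without VERSION"
    # `&0 search/81 :2.0`, relative to the end of the VERSION match
    if magic_search(data, b":2.0", after_version, 81) >= 0:
        return "iCalendar calendar file"
    # default branch: assume the older version 1.0 variant
    return "vCalendar calendar file"
```

Because the second search starts right after the keyword and spans a range, both the ;VALUE=TEXT parameter and a folded line between VERSION and :2.0 are tolerated.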
> The version 2.0 variant is called "Internet Calendar and Scheduling
> format" by DROID via fmt/388 with mime type text/calendar. DROID only
> allows an optional gap of at most 2 characters, so the sample
> import-real-world-2004-11-19.ics is recognized but Juish.ics is not.
> According to the documentation there exist variants with different
> Apple type codes and distinguishing file name suffixes, so I add sub
> branches to handle such items. For version 2.0 variants I first
> look for Free/Busy components. Unfortunately I found no real
> examples on my systems, so I use the example mentioned on the
> Wikipedia page about ICalendar. This sample gets its own suffix and
> Apple type. This is done by lines like:
>>>>> 15 search/278 :VFREEBUSY file, with Free/Busy component
> !:mime text/calendar
> !:apple ????iFBf
> !:ext ifb
> Afterwards I look for iCalendar samples with an ALARM component. I
> found such samples on macOS beneath /Users/$USER/Library/Calendars as
> EventAllDayAlarms.icsalarm or EventTimedAlarms.icsalarm, so another
> suffix is used here. Such samples are handled by lines like:
>>>>> 15 default x
>>>>>> 15 search/154 :VALARM file, with ALARM component
> !:mime text/calendar
> !:apple ????iCal
> !:ext icsalarm/ics
>
> The remaining other iCalendar samples are handled by lines like:
>>>>>> 15 default x file
> !:mime text/calendar
> !:apple ????iCal
> !:ext ics
> According to the documentation the suffixes ical or icalendar can
> also occur, but in my samples I only found the suffix ics.
>
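The order of the three version 2.0 sub-branches can be sketched in Python like this: Free/Busy first, then ALARM, then the plain case. The offsets and ranges are taken from the magic lines above; the function names and the returned tuples of description and suffix list are hypothetical.

```python
def search_hit(data: bytes, pattern: bytes, start: int, rng: int) -> bool:
    """Approximate `search/rng`: the match may begin at start..start+rng-1."""
    return data.find(pattern, start, start + rng - 1 + len(pattern)) >= 0


def icalendar_variant(data: bytes) -> tuple:
    """Return (description, suffixes) for a sample already known to be 2.0."""
    if search_hit(data, b":VFREEBUSY", 15, 278):
        return ("iCalendar calendar file, with Free/Busy component", "ifb")
    if search_hit(data, b":VALARM", 15, 154):
        return ("iCalendar calendar file, with ALARM component", "icsalarm/ics")
    return ("iCalendar calendar file", "ics")
```

Since the searches are bounded, a component that first appears beyond the given range would fall through to the plain ics case.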
> After applying the above mentioned modifications with the patch
> file-5.44-misctools-calendar.diff all my calendar samples are
> still described, but with more details and with differentiation
> between vCalendar and iCalendar with different suffixes (vcs versus
> ics). Also some "bad" examples like the DROID test signature samples
> are no longer misidentified. This now looks like:
>
> EventAllDayAlarms.icsalarm: iCalendar calendar file, with ALARM component
> EventTimedAlarms.icsalarm: iCalendar calendar file, with ALARM component
> Juish.ics: iCalendar calendar file
> Meister des Alltags.vcs: vCalendar calendar file
> Sport Today.ics: iCalendar calendar file
> Sport Today.vcs: vCalendar calendar file
> calendar.ics: iCalendar calendar file
> ferien_NRW_2023.ics: iCalendar calendar file, without CRLF
> fmt-387-signature-id-572.vcs: data
> fmt-388-signature-id-573-b.ics: ISO-8859 text, with no line terminators
> fmt-388-signature-id-573.ics: data
> holidays_NRW_2014.ics: iCalendar calendar file, without CRLF
> import-real-world-2004-11-19.ics: iCalendar calendar file, without CRLF
> import-with-timezone.ics: vCalendar calendar file, without VERSION, without CRLF
> wikipedia-busy.ifb: iCalendar calendar file, with Free/Busy component
>
> Now with the --extension option I get the expected file name suffix.
> This looks like:
> EventAllDayAlarms.icsalarm: icsalarm/ics
> EventTimedAlarms.icsalarm: icsalarm/ics
> Juish.ics: ics
> Meister des Alltags.vcs: vcs
> Sport Today.ics: ics
> Sport Today.vcs: vcs
> calendar.ics: ics
> ferien_NRW_2023.ics: ics
> fmt-387-signature-id-572.vcs: ???
> fmt-388-signature-id-573-b.ics: ???
> fmt-388-signature-id-573.ics: ???
> holidays_NRW_2014.ics: ics
> import-real-world-2004-11-19.ics: ics
> import-with-timezone.ics: ics/vcs
> wikipedia-busy.ifb: ifb
>
> I hope my diff file can be applied in a future version of the file
> utility.
>
> With best wishes
> Jörg Jenderek
> - --
> Jörg Jenderek
>
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iF0EARECAB0WIQS5/qNWKD4ASGOJGL+v8rHJQhrU1gUCY9RdvAAKCRCv8rHJQhrU
> 1nRPAKCHXh9eOg9L7pz5HbX1bQVFTlDyHQCgh9AepKgi6Ns1vXMrj5ySGA4nLUo=
> =gOhy
> -----END PGP SIGNATURE-----
> <trid-v-calendar.txt.gz><droid-calendar.csv.gz><file-5_44-misctools-calendar_diff.DEFANGED-1111><file-5_44-misctools-calendar_diff_sig.DEFANGED-1112>--