[File] [PATCH] Magdir/linux Journal file *.journal~
Christos Zoulas
christos at zoulas.com
Mon Jul 17 20:19:43 UTC 2023
Committed, thanks!
christos
> On Jul 10, 2023, at 8:18 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
>
> Hello,
>
> Some weeks ago I installed Linux Mint 21.1. The main partition was
> 89 GiB, but I ran out of space and hit 100% usage. I tried to delete some
> unnecessary files, but the freed space was immediately filled again. That
> was annoying: nowadays you get a notification for every little thing, but
> not for the really important ones. My problem was deciding what can be
> deleted. I know I can remove some old log files, backup files, downloads
> and cache files, but I could not do this in a guided way with BleachBit or
> Czkawka, because the graphical desktop environment no longer started.
> I tried command-line tools like du, df and ncdu, but these do not work
> reliably on a btrfs file system. Furthermore, it is difficult to find many
> small files. For this purpose I tried many different disk space
> visualisation tools, running them from a rescue system or another
> operating system. Tools like baobab, k4dirstat and Filelight are not
> useful to me, because they draw a coloured map of the disk, but the
> colours are not correlated with the file type, which is what I needed.
> gdmap at least has this feature, but only a few file types are predefined
> by extension. So I spent a day adding more colours for "big" or "numerous"
> file types that were shown in grey. The other option was SequoiaView, but
> it requires a Wine environment. Nearly 2 GiB were occupied by files
> beneath /var/log/journal. When I looked in the subdirectory named after
> the "machine id", I found many similar files.
>
> Running the file command version 5.44 on such journal files gives
> output like:
>
> system.journal: Journal file, online
> system@0005fbf676d65363-7341c5cfb7780156.journal~: Journal file, offline
> system@0005febec199e7eb-f21f00dabead02cd.journal~: Journal file empty, offline
> system@0005febee06c5ddc-0354971f29c02bec.journal~: Journal file empty, offline
> system@0005febee06e2ff2-f7ea54d10e4346ff.journal~: Journal file empty, online
> user-1000.journal: Journal file, online
> user-1001.journal: Journal file, offline
>
> With option -i, only the generic application/octet-stream is shown, and
> with the --extension option, ??? is displayed.
>
> For comparison, I ran other utilities. DROID (Digital Record and
> Object Identification) is a software tool developed by The National
> Archives of the UK to perform automated batch identification of file
> formats. See
> https://digital-preservation.github.io/droid/
> It does not recognize the samples.
>
> The file identifier tool TrID (see http://mark0.net/soft-trid-e.html)
> does recognize the files. All are described as "systemd journal" by
> journal-sysd.trid.xml, but the same generic MIME type is shown. Only the
> suffix journal is listed as acceptable; the journal~ suffix is not.
> With the -v option, the tool shows a related URL, which is the same one
> mentioned in Magdir/linux (see the attached trid-v-journal.txt.gz).
>
> However, that URL is marked as obsolete and has been replaced, so this
> information is now expressed in Magdir/linux by comment lines like:
> # URL: https://systemd.io/JOURNAL_FILE_FORMAT/
> # Reference: http://mark0.net/download/triddefs_xml.7z
> # defs/j/journal-sysd.trid.xml
>
> In Magdir/linux, detection first checks the 8-byte magic signature
> (signature[8]). The second test checks that the state is one of the
> known values (STATE_OFFLINE=0, STATE_ONLINE=1, STATE_ARCHIVED=2); the
> mask 252 lets values 0 to 3 through. The next tests check that each
> 8-byte half of the three id128s (file_id, machine_id, boot_id) is
> non-zero. So this looks like:
> 0 string LPKSHHRH
> >16 ubyte&252 0
> >>24 ubequad >0
> >>>32 ubequad >0
> >>>>40 ubequad >0
> >>>>>48 ubequad >0
> >>>>>>56 ubequad >0
> >>>>>>>64 ubequad >0 Journal file
> Then, instead of the generic MIME type application/octet-stream, I show
> a user-defined one. This is done by an additional line:
> !:mime application/x-linux-journal
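The detection chain above can be restated as a small C check. This is a minimal sketch, not code from file: the function name `looks_like_journal` is hypothetical, and it assumes the header is already in memory. Byte order does not matter for a non-zero test, so the reads stay big-endian as with `ubequad` in the magic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Big-endian 64-bit read, matching the "ubequad" magic type. */
static uint64_t read_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper mirroring the nested magic tests:
 *   0      string     LPKSHHRH   signature
 *   16     ubyte&252  0          state & 0xfc == 0, i.e. 0..3
 *   24..64 ubequad    >0         every 8-byte half of file_id,
 *                                machine_id and boot_id is non-zero */
bool looks_like_journal(const unsigned char *buf, size_t len)
{
    if (len < 72)
        return false;
    if (memcmp(buf, "LPKSHHRH", 8) != 0)
        return false;
    if ((buf[16] & 252) != 0)
        return false;
    for (size_t off = 24; off <= 64; off += 8)
        if (read_be64(buf + off) == 0)
            return false;   /* one zero half fails the whole chain */
    return true;
}
```

Because each magic line is nested under the previous one, a single zero half anywhere in the three IDs rejects the file; the early return in the loop mirrors that.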
>
> Next, head_entry_realtime is handled. According to the documentation,
> it contains a POSIX timestamp in microseconds. If the journal has no
> entries (it is empty), this timestamp field is zero. This is shown by
> the line:
> >>>>>>>>184 leqdate 0 empty
> I now also show non-zero timestamp values with an additional line:
> >>>>>>>>184 leqdate/1000000 !0 \b, %s
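The `leqdate/1000000` line scales microseconds down to seconds before formatting. A sketch of the same conversion in C, assuming a 64-bit `time_t` and UTC output (leqdate is a UTC date type); the helper names are hypothetical, and the exact date layout file prints may differ slightly from this `strftime` format.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Little-endian 64-bit read, matching the "le" prefix of leqdate. */
static uint64_t read_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Broken-down UTC time of head_entry_realtime (offset 184, microseconds
 * since the epoch), or NULL when the field is 0, the case the
 * "184 leqdate 0 empty" line reports. */
const struct tm *head_realtime_utc(const unsigned char *hdr)
{
    uint64_t usec = read_le64(hdr + 184);
    if (usec == 0)
        return NULL;
    time_t secs = (time_t)(usec / 1000000);   /* the "/1000000" step */
    return gmtime(&secs);
}

/* Formats the timestamp roughly like file's ", %s" output.
 * Returns the string length, or 0 for an empty journal or a short buffer. */
size_t format_head_realtime(const unsigned char *hdr, char *out, size_t outlen)
{
    const struct tm *tm = head_realtime_utc(hdr);
    if (tm == NULL)
        return 0;
    return strftime(out, outlen, "%a %b %e %H:%M:%S %Y", tm);
}
```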
>
> To distinguish journal from journal~, I also looked at the fields not
> used so far, starting with the 7 reserved bytes (apparently zero) and
> seqnum_id, and ending with entry_array_offset. Most of these fields are
> not useful, so for them I add magic lines only as comments, like:
> #>>>>>>>>72 ubequad x \b, seqnum_id %#16.16llx
> #>>>>>>>>80 ubequad x \b%16.16llx
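For reference while reading these offsets, the header fields up to head_entry_realtime can be written down as a C struct with compile-time offset checks. This is a sketch under assumptions: the struct name is hypothetical, field names follow the journal format description at the URL above as far as I know it, the struct stops at head_entry_realtime, and it documents offsets only; it should not be overlaid on raw bytes, since the on-disk integers are little-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, truncated mirror of the journal header (offsets only). */
struct journal_header_layout {
    uint8_t  signature[8];            /* offset 0: "LPKSHHRH" */
    uint32_t compatible_flags;        /* offset 8 */
    uint32_t incompatible_flags;      /* offset 12 */
    uint8_t  state;                   /* offset 16: 0/1/2 */
    uint8_t  reserved[7];             /* offsets 17..23, apparently zero */
    uint8_t  file_id[16];             /* offset 24 */
    uint8_t  machine_id[16];          /* offset 40 */
    uint8_t  boot_id[16];             /* offset 56 */
    uint8_t  seqnum_id[16];           /* offset 72 */
    uint64_t header_size;             /* offset 88, 0x100 in the samples */
    uint64_t arena_size;
    uint64_t data_hash_table_offset;
    uint64_t data_hash_table_size;
    uint64_t field_hash_table_offset;
    uint64_t field_hash_table_size;
    uint64_t tail_object_offset;
    uint64_t n_objects;
    uint64_t n_entries;               /* offset 152 */
    uint64_t tail_entry_seqnum;
    uint64_t head_entry_seqnum;
    uint64_t entry_array_offset;      /* offset 176 */
    uint64_t head_entry_realtime;     /* offset 184, microseconds */
};

/* Compile-time checks that the layout matches the offsets in the magic. */
_Static_assert(offsetof(struct journal_header_layout, state) == 16, "state");
_Static_assert(offsetof(struct journal_header_layout, file_id) == 24, "file_id");
_Static_assert(offsetof(struct journal_header_layout, seqnum_id) == 72, "seqnum_id");
_Static_assert(offsetof(struct journal_header_layout, header_size) == 88, "header_size");
_Static_assert(offsetof(struct journal_header_layout, n_entries) == 152, "n_entries");
_Static_assert(offsetof(struct journal_header_layout, head_entry_realtime) == 184,
               "head_entry_realtime");
```

All 64-bit fields start on 8-byte boundaries, so no compiler padding shifts them on common ABIs.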
>
> But a few fields are useful. The header_size in all samples was 100h,
> so unusual values are mentioned by lines at the end, in case somebody
> wants to inspect the fields after the header. This is done by an
> additional line:
> >>>>>>>>88 ulequad !0x100 \b, header size %#llx
> The number of entries is stored in the field n_entries and is shown by
> the line:
> >>>>>>>>152 ulequad >0 \b, entries %#llx
> For empty journals the value is obviously zero, so nothing is gained
> there, but for non-zero cases I now get a quantitative value. This can
> be verified with a command line like:
> journalctl --file=user-1000.journal | wc -l
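The two `ulequad` lines can be mirrored in C as below. This is a hedged sketch: the helper names are hypothetical, and the order of the two phrases (entries first, then an unusual header size "at the end") is my reading of the text, not taken from the patch.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Little-endian 64-bit read, matching the "ulequad" magic type. */
static uint64_t read_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

uint64_t journal_header_size(const unsigned char *hdr) { return read_le64(hdr + 88); }
uint64_t journal_n_entries(const unsigned char *hdr)   { return read_le64(hdr + 152); }

/* Builds the text of
 *   152 ulequad >0     \b, entries %#llx
 *   88  ulequad !0x100 \b, header size %#llx
 * Returns the length written, or 0 if nothing applies or out is too small. */
size_t describe_sizes(const unsigned char *hdr, char *out, size_t outlen)
{
    uint64_t entries = journal_n_entries(hdr);
    uint64_t size = journal_header_size(hdr);
    size_t used = 0;
    int n;

    if (outlen == 0)
        return 0;
    out[0] = '\0';
    if (entries > 0) {
        n = snprintf(out, outlen, ", entries %#" PRIx64, entries);
        if (n < 0 || (size_t)n >= outlen)
            return 0;
        used = (size_t)n;
    }
    if (size != 0x100) {
        n = snprintf(out + used, outlen - used, ", header size %#" PRIx64, size);
        if (n < 0 || (size_t)n >= outlen - used)
            return 0;
        used += (size_t)n;
    }
    return used;
}
```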
>
> So far, only the first bit of incompatible_flags has been considered,
> by the line:
> >>>>>>>>12 ulelong&1 1 \b, compressed
> According to the documentation, that bit means compression with XZ.
> But the documentation also lists other compression methods
> (COMPRESSED_LZ4=2, COMPRESSED_ZSTD=8) that can appear; my inspected
> samples used zstd. Other information is stored as bits in that field
> too, such as the use of the keyed siphash24 hash function instead of the
> unkeyed Jenkins hash function. The newer binary format, which uses less
> disk space than the original format, is flagged as
> HEADER_INCOMPATIBLE_COMPACT with value 16. So I show all flag bits with
> additional lines like:
> #>>>>>>>>12 ulelong x FLAGS=%#x
> >>>>>>>>12 ulelong&2 !0 \b, compressed lz4
> >>>>>>>>12 ulelong&4 !0 \b, keyed hash siphash24
> >>>>>>>>12 ulelong&8 !0 \b, compressed zstd
> >>>>>>>>12 ulelong&16 !0 \b, compact
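Decoding incompatible_flags (little-endian 32-bit at offset 12) into the same phrases could look like the sketch below. The bit values are the ones used in the magic lines above; the enum, table and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values as used by the magic lines (offset 12, little-endian). */
enum {
    JF_COMPRESSED_XZ   = 1,
    JF_COMPRESSED_LZ4  = 2,
    JF_KEYED_HASH      = 4,
    JF_COMPRESSED_ZSTD = 8,
    JF_COMPACT         = 16,
};

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static const struct { uint32_t bit; const char *text; } flag_names[] = {
    { JF_COMPRESSED_XZ,   ", compressed" },
    { JF_COMPRESSED_LZ4,  ", compressed lz4" },
    { JF_KEYED_HASH,      ", keyed hash siphash24" },
    { JF_COMPRESSED_ZSTD, ", compressed zstd" },
    { JF_COMPACT,         ", compact" },
};

/* Appends one phrase per set bit, in bit order like the magic lines.
 * Returns how many recognised bits were written (stops if out is full). */
int describe_incompatible_flags(const unsigned char *hdr, char *out, size_t outlen)
{
    uint32_t flags = read_le32(hdr + 12);
    size_t used = 0;
    int count = 0;

    if (outlen == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof flag_names / sizeof flag_names[0]; i++) {
        if (!(flags & flag_names[i].bit))
            continue;
        int n = snprintf(out + used, outlen - used, "%s", flag_names[i].text);
        if (n < 0 || (size_t)n >= outlen - used)
            break;   /* buffer too small: stop rather than truncate */
        used += (size_t)n;
        count++;
    }
    return count;
}
```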
>
> Now come the lines that are relevant to me. The state of the journal
> is shown by lines like:
> >>>>>>>>16 ubyte 0 \b, offline
> >>>>>>>>16 ubyte 1 \b, online
> >>>>>>>>16 ubyte 2 \b, archived
>
> The Linux manual page systemd-journald.service(8) says that if the
> daemon is stopped uncleanly, or if the files are found to be corrupted,
> they are renamed using the ".journal~" suffix, and the daemon starts
> writing to a new file. Unfortunately, it does not explain how this is
> expressed in the journal structure itself, and the journal~ suffix is
> not used the way my intuition expected. By trial and error, I can only
> say that for the empty offline/online variants I always got the suffix
> journal~. So the file name suffix information is now shown by lines like:
>
> >>>>>>>>16 ubyte 0 \b,
> >>>>>>>>>184 leqdate 0 offline
> !:ext journal~
> >>>>>>>>>184 leqdate !0 offline
> !:ext journal/journal~
> >>>>>>>>16 ubyte 1 \b,
> >>>>>>>>>184 leqdate 0 online
> !:ext journal~
> >>>>>>>>>184 leqdate !0 online
> !:ext journal
> >>>>>>>>16 ubyte 2 \b, archived
> !:ext journal
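The state/suffix mapping above is empirical, so it may help to see it as a plain decision table. A sketch in C with a hypothetical function name; it reproduces only the rules in the magic lines above (empty means head_entry_realtime is zero), not any documented systemd behaviour.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t read_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns the state word printed by the magic and stores the suggested
 * !:ext value in *ext; returns NULL (and *ext = NULL) for unknown states. */
const char *journal_state(const unsigned char *hdr, const char **ext)
{
    int empty = read_le64(hdr + 184) == 0;   /* no head_entry_realtime */

    switch (hdr[16]) {
    case 0:
        *ext = empty ? "journal~" : "journal/journal~";
        return "offline";
    case 1:
        *ext = empty ? "journal~" : "journal";
        return "online";
    case 2:
        *ext = "journal";
        return "archived";
    default:
        *ext = NULL;
        return NULL;
    }
}
```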
>
>
> After applying the above modifications with the patch
> file-5.44-linux-journal.diff, I get a warning like:
> # Magdir/linux, 463: Warning: EXTENSION type ` journal~' has bad char '~'
> To overcome this, I added the tilde character ~ in the function
> parse_ext in src/apprentice.c with the patch
> file-5.44-apprentice-journal.diff, so the relevant line there now
> becomes:
> sizeof(me->mp[0].ext), "EXTENSION", ",!+-/@?_$&~", 0);
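Why the '~' was rejected can be illustrated with a simplified character check: a value passes only if every character is alphanumeric or in the allowed extra set. This is an approximation under assumptions, not file's actual code in src/apprentice.c; in particular, the pre-patch set is assumed to be the patched string without the trailing '~', and the macro and function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Extra characters allowed in !:ext values.
 * EXT_EXTRA_BEFORE is an assumption: the patched set minus '~'. */
#define EXT_EXTRA_BEFORE ",!+-/@?_$&"
#define EXT_EXTRA_AFTER  ",!+-/@?_$&~"

/* True if every character of value is alphanumeric or listed in extra. */
bool ext_value_ok(const char *value, const char *extra)
{
    for (const char *p = value; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalnum(c) && strchr(extra, c) == NULL)
            return false;   /* the "has bad char" case */
    }
    return true;
}
```

With the old set, `journal/journal~` fails on the final '~'; with the extended set it passes, while characters such as '*' are still rejected.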
>
> After applying my two patches, I get output like:
>
> system.journal: Journal file, Sat Jul 8 20:48:18 2023, online, keyed hash siphash24, compressed zstd, entries 0xaa17
> system@0005fbf676d65363-7341c5cfb7780156.journal~: Journal file, Wed May 17 00:05:28 2023, offline, keyed hash siphash24, compressed zstd, entries 0x3125
> system@0005febec199e7eb-f21f00dabead02cd.journal~: Journal file empty, offline, keyed hash siphash24, compressed zstd
> system@0005febee06c5ddc-0354971f29c02bec.journal~: Journal file empty, offline, keyed hash siphash24, compressed zstd
> system@0005febee06e2ff2-f7ea54d10e4346ff.journal~: Journal file empty, online, keyed hash siphash24, compressed zstd
> user-1000.journal: Journal file, Sat Jul 8 20:52:22 2023, online, keyed hash siphash24, compressed zstd, entries 0x270
> user-1001.journal: Journal file, Sat Jul 8 21:33:16 2023, offline, keyed hash siphash24, compressed zstd, entries 0x1e
>
> I hope my diff files can be applied in a future version of the file
> utility. Now I know that I can delete the empty *.journal~ files to gain
> a few hundred MiB of free space.
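To find such candidates without a GUI, a small C helper can combine the suffix test with the empty-journal test from the magic. A minimal sketch with a hypothetical function name; it reads only the first 192 header bytes and checks neither the state nor the IDs, so its results should be reviewed (for example with file itself) before anything is deleted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* True if path ends in ".journal~", starts with the journal signature,
 * and head_entry_realtime (offset 184, little-endian) is zero, i.e. the
 * case file reports as "Journal file empty". */
bool is_empty_journal_tilde(const char *path)
{
    static const char suffix[] = ".journal~";
    size_t plen = strlen(path), slen = sizeof suffix - 1;
    unsigned char hdr[192];
    bool empty = false;
    FILE *f;

    if (plen < slen || strcmp(path + plen - slen, suffix) != 0)
        return false;
    f = fopen(path, "rb");
    if (f == NULL)
        return false;
    if (fread(hdr, 1, sizeof hdr, f) == sizeof hdr &&
        memcmp(hdr, "LPKSHHRH", 8) == 0) {
        uint64_t realtime = 0;
        for (int i = 7; i >= 0; i--)
            realtime = (realtime << 8) | hdr[184 + i];
        empty = realtime == 0;
    }
    fclose(f);
    return empty;
}
```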
>
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
> <trid-v-journal.txt.gz><file-5_44-linux-journal_diff.DEFANGED-0><file-5_44-linux-journal_diff_sig.DEFANGED-1><file-5_44-apprentice-journal_diff.DEFANGED-2><file-5_44-apprentice-journal_diff_sig.DEFANGED-3>--
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 235 bytes
Desc: Message signed with OpenPGP
URL: <https://mailman.astron.com/pipermail/file/attachments/20230717/ef02b81d/attachment.asc>