[File] [PATCH] Magdir/linux Journal file *.journal~

Christos Zoulas christos at zoulas.com
Mon Jul 17 20:19:43 UTC 2023


Committed, thanks!

christos

> On Jul 10, 2023, at 8:18 PM, Jörg Jenderek (GMX) <joerg.jen.der.ek at gmx.net> wrote:
> 
> Hello,
> 
> some weeks ago I installed Linux Mint 21.1. The main partition size was
> 89 GiB, but I ran out of space and reached 100% usage. I tried to delete
> some unnecessary files, but the freed space was immediately filled
> again. That was annoying. Nowadays you get a notification for every
> little item but not for the really important things. The problem for me
> is knowing what can be deleted. I know I can remove some old log files,
> backup files, downloads and cache files, but I cannot do this in a
> guided way with bleachbit or Czkawka because the graphical desktop
> environment does not start any more. I tried command line tools like
> du, df and ncdu, but these do not work reliably on a btrfs file system.
> Furthermore it is difficult to find many small files. For this purpose
> I tried many different disk space visualisation tools running from a
> rescue system or another operating system. For me tools like baobab,
> k4dirstat and Filelight are not useful because I get a coloured map of
> my disk, but the colours are not correlated to a file type, which is
> what I needed. At least gdmap has this feature, but only a few file
> types are predefined by extension. So I spent one day adding more
> colours for "big" or "many" file types which were shown in grey. The
> other solution was the tool SequoiaView, but this requires a wine
> environment. Nearly 2 GiB were occupied by files beneath
> /var/log/journal. When I look in the subdirectory named after the
> "machine id" I get many similar files.
> 
> When running the file command version 5.44 on such journal examples I
> get an output like:
> 
> system.journal:                                    Journal file
> 						   , online
> system@0005fbf676d65363-7341c5cfb7780156.journal~: Journal file
> 						   , offline
> system@0005febec199e7eb-f21f00dabead02cd.journal~: Journal file
> 						   empty, offline
> system@0005febee06c5ddc-0354971f29c02bec.journal~: Journal file
> 						   empty, offline
> system@0005febee06e2ff2-f7ea54d10e4346ff.journal~: Journal file
> 						   empty, online
> user-1000.journal:                                 Journal file
> 						   , online
> user-1001.journal:                                 Journal file
> 						   , offline
> 
> With option -i only the generic application/octet-stream is shown.
> Furthermore, with the --extension option ??? is displayed.
> 
> For comparison I ran other utilities. DROID (Digital Record and
> Object Identification) is a software tool developed by The National
> Archives of the UK to perform automated batch identification of file
> formats. See
> 	https://digital-preservation.github.io/droid/
> This does not recognize the samples.
> 
> The file identifier tool TrID (see http://mark0.net/soft-trid-e.html)
> does recognize the files. All are described as "systemd journal" by
> journal-sysd.trid.xml. Here the same generic mime type is shown.
> Only the suffix journal is listed as acceptable, whereas the journal~
> suffix is not. With the -v option the tool shows a related URL. That
> is the same one mentioned inside Magdir/linux (see appended
> trid-v-journal.txt.gz).
> 
> But this URL is described as obsoleted and replaced. So this
> information is now expressed inside Magdir/linux by comment lines like:
> # URL:		https://systemd.io/JOURNAL_FILE_FORMAT/
> # Reference:	http://mark0.net/download/triddefs_xml.7z
> #		defs/j/journal-sysd.trid.xml
> 
> The detection happens inside Magdir/linux by first checking for the
> magic signature[8]. As a second test the state is checked for one of
> the known values (STATE_OFFLINE~0 STATE_ONLINE~1 STATE_ARCHIVED~2). The
> next tests check for non-zero values of the 3 id128s (file_id,
> machine_id, boot_id). So this looks like:
>  0	string	LPKSHHRH
>  >16		ubyte&252	0
>  >>24		ubequad		>0
>  >>>32		ubequad		>0
>  >>>>40	ubequad		>0
>  >>>>>48	ubequad		>0
>  >>>>>>56	ubequad		>0
>  >>>>>>>64	ubequad		>0	Journal file
> Afterwards, instead of the generic mime type application/octet-stream,
> I show a user defined one. This is done by an additional line like:
> !:mime application/x-linux-journal
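
For readers who want to try the same tests outside of file, here is a minimal Python sketch of the checks described above. It is only an illustration, not part of the patch: the names looks_like_journal and make_header are hypothetical, and the offsets are simply the ones used in the magic lines. Note that ubyte&252 accepts state values 0 to 3, so the sketch does the same.

```python
import struct

SIGNATURE = b"LPKSHHRH"  # signature[8] at offset 0


def looks_like_journal(header: bytes) -> bool:
    """Mirror the magic tests: signature, state, six non-zero quads."""
    if len(header) < 72 or header[:8] != SIGNATURE:
        return False
    # ">16 ubyte&252 0": the state byte must not have any bit above bit 1 set
    if header[16] & 252:
        return False
    # Offsets 24..71 hold file_id, machine_id and boot_id (16 bytes each);
    # the magic lines test each 8-byte half as a big-endian quad > 0.
    quads = struct.unpack_from(">6Q", header, 24)
    return all(q > 0 for q in quads)


def make_header(state: int = 1) -> bytes:
    """Build a synthetic 256-byte header for experiments (hypothetical helper)."""
    h = bytearray(256)
    h[:8] = SIGNATURE
    h[16] = state
    for off in range(24, 72, 8):
        struct.pack_into(">Q", h, off, 1)
    return bytes(h)


if __name__ == "__main__":
    print(looks_like_journal(make_header(state=1)))
    print(looks_like_journal(bytes(256)))
```

On a real system the first 256 bytes of a file beneath /var/log/journal could be passed in instead of the synthetic header.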
> 
> Afterwards the head_entry_realtime is handled. According to the
> documentation this contains a POSIX timestamp stored in microseconds.
> Obviously if the journal is not filled (it is empty) the time stamp
> field is nil. So this information is shown by a line like:
>  >>>>>>>>184	leqdate		0	empty
> So I now also show non-zero time stamp values by an additional line like:
>  >>>>>>>>184	leqdate/1000000	!0	\b, %s
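
The division by 1000000 in the line above converts microseconds to seconds before the date is formatted. A small Python sketch of the same conversion, assuming the field is a little-endian 64-bit value at offset 184 as in the magic line; the function name head_entry_time is hypothetical.

```python
import struct
from datetime import datetime, timezone

HEAD_ENTRY_REALTIME_OFFSET = 184  # offset taken from the magic line


def head_entry_time(header: bytes):
    """Return the first entry's time as a UTC datetime, or None if empty."""
    (usec,) = struct.unpack_from("<Q", header, HEAD_ENTRY_REALTIME_OFFSET)
    if usec == 0:
        return None  # corresponds to the "empty" case
    # Integer division mirrors leqdate/1000000 (microseconds to seconds)
    return datetime.fromtimestamp(usec // 1_000_000, tz=timezone.utc)


if __name__ == "__main__":
    demo = bytearray(256)
    when = datetime(2023, 7, 8, 20, 48, 18, tzinfo=timezone.utc)
    struct.pack_into("<Q", demo, HEAD_ENTRY_REALTIME_OFFSET,
                     int(when.timestamp()) * 1_000_000)
    print(head_entry_time(bytes(demo)))
    print(head_entry_time(bytes(256)))
```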
> 
> In order to distinguish journal and journal~ I also looked at the
> fields not used so far, starting with the 7 reserved bytes (apparently
> nil) and seqnum_id and ending with entry_array_offset. Most of these
> fields are not useful. So for the "not useful" fields I add magic lines
> as comment lines like:
> #>>>>>>>>72	ubequad		x	\b, seqnum_id %#16.16llx
> #>>>>>>>>80	ubequad		x	\b%16.16llx
> 
> But a few fields are useful. The header_size in all samples was
> 100h. So I mention unusual cases by an additional line at the end, just
> in case somebody wants to inspect the fields after the header. This is
> done by an additional line like:
>  >>>>>>>>88	ulequad		!0x100	\b, header size %#llx
> The number of entries is stored inside the field n_entries. This
> information is shown by a line like:
>  >>>>>>>>152	ulequad		>0	\b, entries %#llx
> For empty journals the value is obviously zero, so nothing is gained
> there, but for non-zero cases I now get a quantitative value. This can
> be verified by a command line like:
> 	journalctl --file=user-1000.journal | wc -l
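
Since the magic line prints the count in hexadecimal, a decimal value is handier for comparing with the wc -l result. A short Python sketch that reads both fields, assuming little-endian 64-bit values at offsets 88 and 152 as in the magic lines; the function name is hypothetical.

```python
import struct


def header_size_and_entries(header: bytes):
    """Return (header_size, n_entries) read from a journal header."""
    (header_size,) = struct.unpack_from("<Q", header, 88)
    (n_entries,) = struct.unpack_from("<Q", header, 152)
    return header_size, n_entries


if __name__ == "__main__":
    demo = bytearray(256)
    struct.pack_into("<Q", demo, 88, 0x100)   # header size seen in all samples
    struct.pack_into("<Q", demo, 152, 0x270)  # count from one sample's output
    size, entries = header_size_and_entries(bytes(demo))
    print(f"header size {size:#x}, entries {entries} ({entries:#x})")
```

For example, an output of "entries 0x270" corresponds to 624 entries in decimal.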
> 
> For incompatible_flags only the first bit was considered so far. This
> was done by a line like:
>  >>>>>>>>12	ulelong&1	1	\b, compressed
> According to the documentation that means compressed by the XZ method.
> But according to the documentation other compression methods
> (COMPRESSED_LZ4~2 COMPRESSED_ZSTD~8) can also appear. In my inspected
> samples zstd was used. Other information, like the use of the keyed
> siphash24 hash function instead of the unkeyed Jenkins hash function,
> is also stored as a bit in that field. The new binary format that uses
> less space on disk compared to the original format is likewise flagged
> by HEADER_INCOMPATIBLE_COMPACT with value 16. So I show all flag bits
> by additional lines like:
>  #>>>>>>>>12	ulelong		x	FLAGS=%#x
>  >>>>>>>>12	ulelong&2	!0	\b, compressed lz4
>  >>>>>>>>12	ulelong&4	!0	\b, keyed hash siphash24
>  >>>>>>>>12	ulelong&8	!0	\b, compressed zstd
>  >>>>>>>>12	ulelong&16	!0	\b, compact
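
The bit tests above can be sketched in Python as a simple mask table. This is only an illustration: the labels are the ones printed by the magic lines, the field is assumed to be a little-endian 32-bit value at offset 12, and the function name is hypothetical.

```python
import struct

# Bit values as described above; labels follow the magic output
INCOMPATIBLE_FLAGS = {
    1: "compressed",            # XZ according to the documentation
    2: "compressed lz4",
    4: "keyed hash siphash24",
    8: "compressed zstd",
    16: "compact",
}


def describe_incompatible_flags(header: bytes):
    """Return the labels of all known bits set in incompatible_flags."""
    (flags,) = struct.unpack_from("<I", header, 12)
    return [label for bit, label in INCOMPATIBLE_FLAGS.items() if flags & bit]


if __name__ == "__main__":
    demo = bytearray(256)
    struct.pack_into("<I", demo, 12, 4 | 8)
    print(", ".join(describe_incompatible_flags(bytes(demo))))
```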
> 
> Now come the lines that are relevant for me. The state of the journal
> was shown by lines like:
>  >>>>>>>>16	ubyte		0	\b, offline
>  >>>>>>>>16	ubyte		1	\b, online
>  >>>>>>>>16	ubyte		2	\b, archived
> 
> In the Linux manual page systemd-journald.service(8) it is written that
> if the daemon is stopped uncleanly, or if the files are found to be
> corrupted, they are renamed using the ".journal~" suffix, and the daemon
> starts writing to a new file. Unfortunately it is not explained how
> this is expressed inside the journal structure itself. The suffix
> journal~ is not used as I expected by intuition. So by trial and error
> I can only say that for empty variants of offline/online I always got
> the suffix journal~. So the file name suffix information is now shown
> by lines like:
> 
>  >>>>>>>>16	ubyte		0	\b,
>  >>>>>>>>>184	leqdate		0	offline
>  !:ext		journal~
>  >>>>>>>>>184	leqdate		!0	offline
>  !:ext		journal/journal~
>  >>>>>>>>16	ubyte		1	\b,
>  >>>>>>>>>184	leqdate		0	online
>  !:ext		journal~
>  >>>>>>>>>184	leqdate		!0	online
>  !:ext		journal
>  >>>>>>>>16	ubyte		2	\b, archived
>  !:ext		journal
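
The mapping expressed by these lines can be summarised in a small Python sketch. It only restates the trial-and-error observations above and is not a rule taken from the systemd documentation; the function name suggested_extensions is hypothetical.

```python
def suggested_extensions(state: int, head_entry_realtime: int):
    """Return the suffixes the magic lines report for a given header.

    state: 0 offline, 1 online, 2 archived
    head_entry_realtime: 0 means the journal is empty
    """
    empty = head_entry_realtime == 0
    if state == 0:  # offline
        return ["journal~"] if empty else ["journal", "journal~"]
    if state == 1:  # online
        return ["journal~"] if empty else ["journal"]
    if state == 2:  # archived
        return ["journal"]
    return []       # unknown state: no suggestion


if __name__ == "__main__":
    print(suggested_extensions(0, 0))
    print(suggested_extensions(1, 1688849298000000))
```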
> 
> 
> After applying the above mentioned modifications by patch
> file-5.44-linux-journal.diff I get a warning message like:
> # Magdir/linux, 463: Warning:
> EXTENSION type `		journal~' has bad char '~'
> To overcome this I added the tilde character ~ to the allowed
> characters inside function parse_ext in src/apprentice.c by patch
> file-5.44-apprentice-journal.diff. So the relevant line there now
> becomes:
> 	    sizeof(me->mp[0].ext), "EXTENSION", ",!+-/@?_$&~", 0);
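
The quoted string acts as a list of extra characters that an !:ext value may contain. A rough Python sketch of such an allow-list check, under the assumption that ASCII letters and digits are always accepted and everything else must appear in the extra string; this is a reading of the warning, not the actual code of apprentice.c, and the function name is hypothetical.

```python
OLD_EXTRA = ",!+-/@?_$&"   # allowed extra characters before the patch
NEW_EXTRA = ",!+-/@?_$&~"  # with the tilde added by the patch


def ext_is_acceptable(ext: str, extra: str = NEW_EXTRA) -> bool:
    """Accept ASCII alphanumerics plus any character listed in extra."""
    return all(c.isascii() and (c.isalnum() or c in extra) for c in ext)


if __name__ == "__main__":
    print(ext_is_acceptable("journal/journal~", OLD_EXTRA))
    print(ext_is_acceptable("journal/journal~", NEW_EXTRA))
```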
> 
> After applying my 2 patches I then get an output like:
> 
> system.journal:                                    Journal file
> 						   , Sat Jul  8
> 						   20:48:18 2023
> 						   , online
> 						   , keyed hash
> 						   siphash24
> 						   , compressed zstd
> 						   , entries 0xaa17
> system@0005fbf676d65363-7341c5cfb7780156.journal~: Journal file
> 						   , Wed May 17
> 						   00:05:28 2023
> 						   , offline
> 						   , keyed hash
> 						   siphash24
> 						   , compressed zstd
> 						   , entries 0x3125
> system@0005febec199e7eb-f21f00dabead02cd.journal~: Journal file
> 						   empty
> 						   , offline
> 						   , keyed hash
> 						   siphash24
> 						   , compressed zstd
> system@0005febee06c5ddc-0354971f29c02bec.journal~: Journal file
> 						   empty
> 						   , offline
> 						   , keyed hash
> 						   siphash24
> 						   , compressed zstd
> system@0005febee06e2ff2-f7ea54d10e4346ff.journal~: Journal file
> 						   empty
> 						   , online
> 						   , keyed hash
> 						   siphash24
> 						   , compressed zstd
> user-1000.journal:                                 Journal file
> 						   , Sat Jul  8
> 						   20:52:22 2023
> 						   , online
> 						   , keyed hash
> 						   siphash24
> 						   , compressed zstd
> 						   , entries 0x270
> user-1001.journal:                                 Journal file
> 						   , Sat Jul  8
> 						   21:33:16 2023
> 						   , offline
> 						   , keyed hash
> 						   siphash24
> 						   , compressed zstd
> 						   , entries 0x1e
> 
> I hope my diff files can be applied in a future version of the file
> utility. Now I know that I can delete empty *.journal~ samples to get
> some hundred MiB more free space.
> 
> With best wishes
> Jörg Jenderek
> --
> Jörg Jenderek
> <trid-v-journal.txt.gz><file-5_44-linux-journal_diff.DEFANGED-0><file-5_44-linux-journal_diff_sig.DEFANGED-1><file-5_44-apprentice-journal_diff.DEFANGED-2><file-5_44-apprentice-journal_diff_sig.DEFANGED-3>--
