[File] [PATCH] Magdir/linux Journal file *.journal~

Jörg Jenderek (GMX) joerg.jen.der.ek at gmx.net
Tue Jul 11 00:18:10 UTC 2023


Hello,

Some weeks ago I installed Linux Mint 21.1. The main partition was
89 GiB, but I ran out of space and reached 100% usage. I tried to delete
some unnecessary files, but the freed space was filled again
immediately. That was annoying. Nowadays you get a notification for
every little item, but not for the really important things. The problem
for me was deciding what can be deleted. I know I can remove old log
files, backup files, downloads and cache files, but I could not do this
in a guided way with bleachbit or Czkawka because the graphical desktop
environment no longer started. I tried command line tools like du, df
and ncdu, but these do not work reliably on a btrfs file system.
Furthermore, it is difficult to find many small files. For this purpose
I tried several disk space visualisation tools running from a rescue
system or another operating system. Tools like baobab, k4dirstat and
Filelight were not useful for me because they show a coloured map of
the disk in which the colours are not correlated to file types, and
that is what I needed. At least gdmap has this feature, but only a few
file types are predefined by extension. So I spent one day adding more
colours for "big" or "many" file types that were shown in grey. The
other solution was the tool SequoiaView, but this requires a wine
environment. Nearly 2 GiB were occupied by files beneath
/var/log/journal. When I looked into the subdirectory named after the
"machine id" I found many similar files.

When running the file command version 5.44 on such journal samples I get
output like:

system.journal:                                     Journal file, online
system@0005fbf676d65363-7341c5cfb7780156.journal~:  Journal file, offline
system@0005febec199e7eb-f21f00dabead02cd.journal~:  Journal file empty, offline
system@0005febee06c5ddc-0354971f29c02bec.journal~:  Journal file empty, offline
system@0005febee06e2ff2-f7ea54d10e4346ff.journal~:  Journal file empty, online
user-1000.journal:                                  Journal file, online
user-1001.journal:                                  Journal file, offline

With option -i only the generic application/octet-stream is shown.
Furthermore, with option --extension only ??? is displayed.

For comparison I ran other utilities. DROID (Digital Record and
Object Identification) is a software tool developed by The National
Archives of the UK to perform automated batch identification of file
formats. See
	https://digital-preservation.github.io/droid/
It does not recognize the samples.

The file identifier tool TrID (see http://mark0.net/soft-trid-e.html)
does recognize the files. All are described as "systemd journal" by
journal-sysd.trid.xml. Here too only the generic mime type is shown.
Only the suffix journal is listed as acceptable, whereas the suffix
journal~ is not. With option -v the tool shows a related URL. That is
the same one mentioned inside Magdir/linux (see the appended
trid-v-journal.txt.gz).

But this URL is described as obsolete and replaced. So this
information is now expressed inside Magdir/linux by comment lines like:
# URL:		https://systemd.io/JOURNAL_FILE_FORMAT/
# Reference:	http://mark0.net/download/triddefs_xml.7z
#		defs/j/journal-sysd.trid.xml

The detection inside Magdir/linux first checks for the magic
signature[8]. As a second test the state is checked against the known
values (STATE_OFFLINE~0 STATE_ONLINE~1 STATE_ARCHIVED~2). The next test
checks that each half of the 3 id128s (file_id, machine_id, boot_id) is
non-zero. So this looks like:
   0	string	LPKSHHRH
   >16		ubyte&252	0
   >>24		ubequad		>0
   >>>32		ubequad		>0
   >>>>40	ubequad		>0
   >>>>>48	ubequad		>0
   >>>>>>56	ubequad		>0
   >>>>>>>64	ubequad		>0	Journal file
Afterwards, instead of the generic mime type application/octet-stream I
show a user-defined one. This is done by an additional line like:
!:mime application/x-linux-journal
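
The three detection tests above can also be tried outside of file. The
following is only a minimal Python sketch of them, not part of the
patch. The names looks_like_journal and make_sample_header are made up
for illustration, and the sample header uses invented ids.

```python
import struct

MAGIC = b"LPKSHHRH"  # signature[8] at offset 0


def looks_like_journal(header: bytes) -> bool:
    """Apply the three tests described above to the start of a file."""
    if len(header) < 72 or header[:8] != MAGIC:
        return False
    # state at offset 16: ubyte&252 must be 0, so only values 0..3 pass
    if header[16] & 252:
        return False
    # six big-endian quads at offsets 24..64: both halves of
    # file_id, machine_id and boot_id must be non-zero
    halves = struct.unpack_from(">6Q", header, 24)
    return all(half != 0 for half in halves)


def make_sample_header(state: int = 1) -> bytes:
    """Build a synthetic 256-byte header (invented ids, illustration only)."""
    header = bytearray(256)
    header[0:8] = MAGIC
    header[16] = state
    struct.pack_into(">6Q", header, 24, 1, 2, 3, 4, 5, 6)
    return bytes(header)
```

Note that the mask 252 also lets the value 3 pass, although only the
states 0, 1 and 2 are named.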

Afterwards the head_entry_realtime is handled. According to the
documentation this contains a POSIX timestamp stored in microseconds.
Obviously, if the journal is not filled (it is empty) the timestamp
field is nil. This information is shown by a line like:
   >>>>>>>>184	leqdate		0	empty
So I now also show non-zero timestamp values by an additional line like:
   >>>>>>>>184	leqdate/1000000	!0	\b, %s

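The division by 1000000 in the line above turns microseconds into
seconds. A minimal Python sketch of the same step, assuming the field
is a little-endian 64-bit value at offset 184 as in the magic lines
(the function name head_entry_time is made up for illustration):

```python
import struct
from datetime import datetime, timezone


def head_entry_time(header: bytes):
    """Return head_entry_realtime as a UTC datetime, or None if it is 0."""
    (usec,) = struct.unpack_from("<Q", header, 184)
    if usec == 0:
        return None  # corresponds to the "empty" case
    # microseconds since the epoch -> whole seconds, like leqdate/1000000
    return datetime.fromtimestamp(usec // 1_000_000, tz=timezone.utc)
```

The sketch always reports UTC. How file itself renders the time zone is
not examined here.
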
In order to distinguish journal from journal~ I also looked at the
fields not used so far, starting with the 7 reserved bytes (apparently
nil) and seqnum_id and ending with entry_array_offset. Most of these
fields are not useful. So for the "not useful" fields I added magic
lines as comment lines like:
#>>>>>>>>72	ubequad		x	\b, seqnum_id %#16.16llx
#>>>>>>>>80	ubequad		x	b%16.16llx

But a few fields are useful. The header_size in all samples was 100h.
So I mention unusual cases by an additional line at the end, just in
case somebody wants to inspect the fields after the header. This is
done by a line like:
   >>>>>>>>88	ulequad		!0x100h	\b, header size %#llx
The number of entries is stored in the field n_entries. This information
is shown by a line like:
   >>>>>>>>152	ulequad		>0	\b, entries %#llx
For empty journals the value is obviously zero, so nothing is gained
there, but for non-zero cases I now get a quantitative value. This can
be cross-checked by a command line like:
	journalctl --file=user-1000.journal | wc -l
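
The two fields can also be read directly from the header. This is only
a minimal Python sketch that mirrors the two magic lines above (offsets
88 and 152, little-endian 64-bit values). The function name
size_and_entries is made up for illustration.

```python
import struct


def size_and_entries(header: bytes) -> list:
    """Return the text fragments the two magic lines would contribute."""
    (header_size,) = struct.unpack_from("<Q", header, 88)
    (n_entries,) = struct.unpack_from("<Q", header, 152)
    parts = []
    if header_size != 0x100:  # only unusual header sizes are mentioned
        parts.append(f"header size {header_size:#x}")
    if n_entries > 0:         # empty journals show no entry count
        parts.append(f"entries {n_entries:#x}")
    return parts
```

The count is printed in hexadecimal, so it has to be converted before
comparing it with a decimal line count.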

For incompatible_flags only the first bit was considered so far. This
was done by a line like:
   >>>>>>>>12	ulelong&1	1	\b, compressed
According to the documentation that means compressed by the XZ method.
But according to the documentation other compression methods
(COMPRESSED_LZ4~2 COMPRESSED_ZSTD~8) can also appear. In my inspected
samples zstd was used. Other information, like the use of the keyed
siphash24 hash function instead of the unkeyed Jenkins hash function,
is also stored as a bit in that field. The new binary format that uses
less space on disk than the original format is likewise flagged there
as HEADER_INCOMPATIBLE_COMPACT with value 16. So I show all flag bits
by additional lines like:
   #>>>>>>>>12	ulelong		x	FLAGS=%#x
   >>>>>>>>12	ulelong&2	!0	\b, compressed lz4
   >>>>>>>>12	ulelong&4	!0	\b, keyed hash siphash24
   >>>>>>>>12	ulelong&8	!0	\b, compressed zstd
   >>>>>>>>12	ulelong&16	!0	\b, compact

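The bit tests above amount to decoding a small bit field. A minimal
Python sketch of that decoding, using the bit values and texts from the
magic lines (the table and function names are made up for
illustration):

```python
import struct

# bit value -> text printed by the magic lines above
INCOMPATIBLE_FLAGS = [
    (1, "compressed"),             # COMPRESSED_XZ
    (2, "compressed lz4"),         # COMPRESSED_LZ4
    (4, "keyed hash siphash24"),   # KEYED_HASH
    (8, "compressed zstd"),        # COMPRESSED_ZSTD
    (16, "compact"),               # COMPACT
]


def describe_incompatible_flags(header: bytes) -> list:
    """Decode the little-endian 32-bit incompatible_flags at offset 12."""
    (flags,) = struct.unpack_from("<I", header, 12)
    return [text for bit, text in INCOMPATIBLE_FLAGS if flags & bit]
```

A flags value of 12 (4 + 8), for example, yields "keyed hash siphash24"
followed by "compressed zstd".
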
Now come the lines that are relevant for me. The state of the journal
is shown by lines like:
   >>>>>>>>16	ubyte		0	\b, offline
   >>>>>>>>16	ubyte		1	\b, online
   >>>>>>>>16	ubyte		2	\b, archived

The Linux manual page systemd-journald.service(8) says that if the
daemon is stopped uncleanly, or if the files are found to be corrupted,
they are renamed using the ".journal~" suffix, and the daemon starts
writing to a new file. Unfortunately it is not explained how this is
expressed inside the journal structure itself. The suffix journal~ is
not used the way my intuition expected. So by trial and error I can only
say that for the empty variants of offline/online I always got the
suffix journal~. So the file name suffix information is now shown by
lines like:

   >>>>>>>>16	ubyte		0	\b,
   >>>>>>>>>184	leqdate		0	offline
   !:ext		journal~
   >>>>>>>>>184	leqdate		!0	offline
   !:ext		journal/journal~
   >>>>>>>>16	ubyte		1	\b,
   >>>>>>>>>184	leqdate		0	online
   !:ext		journal~
   >>>>>>>>>184	leqdate		!0	online
   !:ext		journal
   >>>>>>>>16	ubyte		2	\b, archived
   !:ext		journal
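
The mapping above is purely empirical. Written as a decision table it
looks like the following minimal Python sketch. The function name
suggested_extensions is made up for illustration, and the result only
reflects what was seen in these samples.

```python
import struct


def suggested_extensions(header: bytes) -> list:
    """Map state (offset 16) and emptiness (offset 184) to suffixes."""
    state = header[16]
    (head_realtime,) = struct.unpack_from("<Q", header, 184)
    empty = head_realtime == 0
    if state == 0:  # offline
        return ["journal~"] if empty else ["journal", "journal~"]
    if state == 1:  # online
        return ["journal~"] if empty else ["journal"]
    if state == 2:  # archived
        return ["journal"]
    return []       # unknown state, no suggestion
```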


After applying the above mentioned modifications by patch
file-5.44-linux-journal.diff I get a warning message like:
# Magdir/linux, 463: Warning:
EXTENSION type `		journal~' has bad char '~'
To overcome this I added the tilde character ~ inside function
parse_ext in src/apprentice.c by patch
file-5.44-apprentice-journal.diff. So the relevant line there now
becomes:
	    sizeof(me->mp[0].ext), "EXTENSION", ",!+-/@?_$&~", 0);
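
The effect of the extended character list can be modelled roughly as
follows. This minimal Python sketch assumes that the check accepts
ASCII letters and digits plus the listed extra characters. That is a
simplification and not a copy of the C code, and the name
first_bad_char is made up for illustration.

```python
import string

OLD_EXTRA = ",!+-/@?_$&"      # extra characters accepted by file 5.44
NEW_EXTRA = OLD_EXTRA + "~"   # with the tilde added by the patch


def first_bad_char(ext: str, extra: str):
    """Return the first character that is not accepted, or None."""
    allowed = set(string.ascii_letters + string.digits + extra)
    for ch in ext:
        if ch not in allowed:
            return ch
    return None
```

With OLD_EXTRA the string journal~ is rejected at the tilde, while with
NEW_EXTRA both journal~ and journal/journal~ pass.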

After applying my 2 patches I get output like:

system.journal:                                     Journal file, Sat Jul  8 20:48:18 2023, online, keyed hash siphash24, compressed zstd, entries 0xaa17
system@0005fbf676d65363-7341c5cfb7780156.journal~:  Journal file, Wed May 17 00:05:28 2023, offline, keyed hash siphash24, compressed zstd, entries 0x3125
system@0005febec199e7eb-f21f00dabead02cd.journal~:  Journal file empty, offline, keyed hash siphash24, compressed zstd
system@0005febee06c5ddc-0354971f29c02bec.journal~:  Journal file empty, offline, keyed hash siphash24, compressed zstd
system@0005febee06e2ff2-f7ea54d10e4346ff.journal~:  Journal file empty, online, keyed hash siphash24, compressed zstd
user-1000.journal:                                  Journal file, Sat Jul  8 20:52:22 2023, online, keyed hash siphash24, compressed zstd, entries 0x270
user-1001.journal:                                  Journal file, Sat Jul  8 21:33:16 2023, offline, keyed hash siphash24, compressed zstd, entries 0x1e

I hope my diff files can be applied in a future version of the file
utility. Now I know that I can delete the empty *.journal~ samples to
gain some hundred MiB of free space.
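
Without a patched file, such candidates can also be listed by a small
script. The following minimal Python sketch only lists files and
deletes nothing. It assumes that "empty" means a zero
head_entry_realtime at offset 184, as in the magic lines above, and
the name empty_backup_journals is made up for illustration.

```python
import struct
from pathlib import Path


def empty_backup_journals(directory) -> list:
    """List *.journal~ files in directory whose head_entry_realtime is 0."""
    result = []
    for path in sorted(Path(directory).glob("*.journal~")):
        with path.open("rb") as f:
            header = f.read(192)
        # skip files that are too short or lack the journal signature
        if len(header) < 192 or header[:8] != b"LPKSHHRH":
            continue
        (head_realtime,) = struct.unpack_from("<Q", header, 184)
        if head_realtime == 0:
            result.append(path)
    return result
```

It could be called with the "machine id" subdirectory beneath
/var/log/journal as argument. Reading that directory may need root
privileges.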

With best wishes
Jörg Jenderek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trid-v-journal.txt.gz
Type: application/x-gzip
Size: 483 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230711/4c9999ee/attachment.bin>
-------------- next part --------------
--- file-5.44/magic/Magdir/linux.old	2022-11-30 00:10:29.000000000 +0100
+++ file-5.44/magic/Magdir/linux	2023-07-10 02:31:50.604803700 +0200
@@ -380,26 +380,96 @@
 # Systemd journald files
 # See https://www.freedesktop.org/wiki/Software/systemd/journal-files/.
 # From: Zbigniew Jedrzejewski-Szmek <zbyszek at in.waw.pl>
-
-# check magic
+# Update: 	Joerg Jenderek
+# URL:		https://systemd.io/JOURNAL_FILE_FORMAT/
+# Reference:	http://mark0.net/download/triddefs_xml.7z/defs/j/journal-sysd.trid.xml
+# Note:		called "systemd journal" by TrID
+#		verified by `journalctl --file=user-1000.journal`
+# check magic signature[8]
 0	string	LPKSHHRH
 # check that state is one of known values
+# STATE_OFFLINE~0 STATE_ONLINE~1 STATE_ARCHIVED~2
 >16		ubyte&252	0
 # check that each half of three unique id128s is non-zero
+# file_id
 >>24		ubequad		>0
 >>>32		ubequad		>0
+# machine_id
 >>>>40		ubequad		>0
 >>>>>48		ubequad		>0
+# boot_id; last writer
 >>>>>>56	ubequad		>0
 >>>>>>>64	ubequad		>0	Journal file
-!:mime application/octet-stream
+#!:mime application/octet-stream
+!:mime application/x-linux-journal
 # provide more info
+# head_entry_realtime; contains a POSIX timestamp stored in microseconds
+>>>>>>>>184	leqdate/1000000	!0	\b, %s
 >>>>>>>>184	leqdate		0	empty
->>>>>>>>16	ubyte		0	\b, offline
->>>>>>>>16	ubyte		1	\b, online
+# If a file is closed after writing the state field should be set to STATE_OFFLINE
+>>>>>>>>16	ubyte		0	\b,
+# for offline and empty only journal~ extension found
+>>>>>>>>>184	leqdate		0	offline
+# https://man7.org/linux/man-pages/man8/systemd-journald.service.8.html
+# GRR: add char ~ inside parse_ext in ../../src/apprentice.c to avoid in file version 5.44 error like:
+# Magdir/linux, 463: Warning: EXTENSION type `		journal~' has bad char '~'
+!:ext		journal~
+# for offline and non empty often *.journal~ but also user-1001.journal
+>>>>>>>>>184	leqdate		!0	offline
+!:ext		journal/journal~
+# if a file is opened for writing the state field should be set to STATE_ONLINE
+>>>>>>>>16	ubyte		1	\b,
+# for online and empty only journal~ extension found
+>>>>>>>>>184	leqdate		0	online
+# system@0005febee06e2ff2-f7ea54d10e4346ff.journal~
+!:ext		journal~
+# for online and non empty only journal extension found
+>>>>>>>>>184	leqdate		!0	online
+# system.journal user-1000.journal
+!:ext		journal
+# after a file has been rotated it should be set to STATE_ARCHIVED
 >>>>>>>>16	ubyte		2	\b, archived
+!:ext		journal
+# no *.journal~ found
+#!:ext		journal/journal~
+# compatible_flags
 >>>>>>>>8	ulelong&1	1	\b, sealed
+# incompatible_flags; COMPRESSED_XZ~1 COMPRESSED_LZ4~2 KEYED_HASH~4 COMPRESSED_ZSTD~8 COMPACT~16
+#>>>>>>>>12	ulelong		x	FLAGS=%#x
 >>>>>>>>12	ulelong&1	1	\b, compressed
+>>>>>>>>12	ulelong&2	!0	\b, compressed lz4
+>>>>>>>>12	ulelong&4	!0	\b, keyed hash siphash24
+>>>>>>>>12	ulelong&8	!0	\b, compressed zstd
+>>>>>>>>12	ulelong&16	!0	\b, compact
+# uint8_t reserved[7]; apparently nil
+#>>17		long		!0	\b, reserved %#8.8x
+# seqnum_id; like: 0 e623691afec94b5aa968ae2d726c49cc f98b2af481924b29 8d6816ca3639edc6
+#>>>>>>>>72	ubequad		x	\b, seqnum_id %#16.16llx
+#>>>>>>>>80	ubequad		x	b%16.16llx
+# header_size like: 100h
+>>>>>>>>88	ulequad		!0x100h	\b, header size %#llx
+# arena_size  like: 0 7fff00h ffff00h 17fff00h
+#>>>>>>>>96	ulequad		>0	\b, arena size %#llx
+# data_hash_table_offset like: 0 15f0h 15f0h
+#>>>>>>>>104	ulequad		>0	\b, hash table offset %#llx
+# data_hash_table_size like: 0 38e380h
+#>>>>>>>>112	ulequad		>0	\b, hash table size %#llx
+# field_hash_table_offset like: 0 110h
+#>>>>>>>>120	ulequad		>0	\b, field hash table offset %#llx
+# field_hash_table_size like: 0 14d0h
+#>>>>>>>>128	ulequad		>0	\b, field hash table size %#llx
+# tail_object_offset like: 0 43edd8h 511278h c68968h d487d0h efaa98h
+#>>>>>>>>136	ulequad		>0	\b, tail object offset %#llx
+# n_objects like: 0 1032h 5a2eh 92bdh a8b5h aa75h 112adh 40c23h 4714eh
+#>>>>>>>>144	ulequad		>0	\b, objects %#llx
+# n_entries like: 0 3aeh 235ah 2dc4h 3125h 16129h 187a1h
+>>>>>>>>152	ulequad		>0	\b, entries %#llx
+# tail_entry_seqnum like: 0 1988h 16249h 24c12h 24c12h 41e64h 9fefdh
+#>>>>>>>>160	ulequad		>0	\b, tail entry seqnum %#llx
+# head_entry_seqnum like: 0 1h 15dbh 6552h 213bfh 213bfh 3e672h 9a28ah
+#>>>>>>>>168	ulequad		>0	\b, head entry seqnum %#llx
+# entry_array_offset like: 0 390058h 3909d8h 3909e0h
+#>>>>>>>>176	ulequad		>0	\b, entry array offset %#llx
 
 # BCache backing and cache devices
 # From: Gabriel de Perthuis <g2p.code at gmail.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-linux-journal.diff.sig
Type: application/octet-stream
Size: 2106 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230711/4c9999ee/attachment.obj>
-------------- next part --------------
--- file-5.44/src/apprentice.c.old	2022-12-26 19:19:11.000000000 +0100
+++ file-5.44/src/apprentice.c	2023-07-08 22:56:19.406406000 +0200
@@ -2564,11 +2564,12 @@
 parse_ext(struct magic_set *ms, struct magic_entry *me, const char *line,
     size_t len)
 {
 	return parse_extra(ms, me, line, len,
 	    CAST(off_t, offsetof(struct magic, ext)),
-	    sizeof(me->mp[0].ext), "EXTENSION", ",!+-/@?_$&", 0); /* & for b&w */
+	    sizeof(me->mp[0].ext), "EXTENSION", ",!+-/@?_$&~", 0); /* & for b&w */
+						/* ~ for journal~ */
 }
 
 /*
  * parse a MIME annotation line from magic file, put into magic[index - 1]
  * if valid
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.44-apprentice-journal.diff.sig
Type: application/octet-stream
Size: 531 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20230711/4c9999ee/attachment-0001.obj>

