[File] [PATCH] speedup text file matching

Dirk Müller dirk at dmllr.de
Fri Mar 11 23:43:29 UTC 2022


Hi all,

rpm(1), rpmlint(1), and probably other applications use libmagic to
determine the file type of many files at once when processing them into
a package. While looking at that, I noticed that libmagic is quite slow
on large text-like files. This is because many complicated regexps are
processed, and they are compiled repeatedly. Regexp compilation is very
time-consuming compared to regexp matching, so it helps to compile each
regexp only once and cache the result for further processing.
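
The caching idea can be sketched roughly as follows. This is only an
illustration in Python, not the attached patch and not libmagic's C code:
the RegexCache class, the first_match helper, and the two rules are
hypothetical names made up for the example. Each pattern is compiled the
first time it is needed and the compiled object is reused for every later
file.

```python
import re


class RegexCache:
    """Compile each (pattern, flags) pair once and reuse the result."""

    def __init__(self):
        self._compiled = {}
        self.compile_count = 0  # number of times this cache called re.compile

    def get(self, pattern, flags=0):
        key = (pattern, flags)
        rx = self._compiled.get(key)
        if rx is None:
            # Cache miss: pay the compilation cost once and remember it.
            rx = re.compile(pattern, flags)
            self._compiled[key] = rx
            self.compile_count += 1
        return rx


def first_match(cache, rules, text):
    """Return the label of the first rule whose regex is found in text."""
    for label, pattern in rules:
        if cache.get(pattern).search(text):
            return label
    return None


# Hypothetical rules for illustration only; real magic entries differ.
RULES = [
    ("shell script", r"^#!\s*/bin/(ba)?sh"),
    ("python script", r"^#!.*python"),
]

cache = RegexCache()
samples = ("#!/bin/sh\necho hi\n", "#!/usr/bin/env python3\n", "plain text\n")
labels = [first_match(cache, RULES, text) for text in samples]
```

Three inputs are checked against two rules here, but each pattern is
compiled only once. The trade-off is that the cache keeps every compiled
regexp in memory for as long as it lives.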

On a 64-bit host this adds about 200 KB of RAM usage, but it improves
matching speed by a factor of 2 to 4, which is very noticeable when
processing a larger directory of files.

Thanks,
Dirk
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Cache-compiled-regexps-between-magic-matches.patch
Type: text/x-patch
Size: 11144 bytes
Desc: not available
URL: <https://mailman.astron.com/pipermail/file/attachments/20220312/09f34a48/attachment.bin>

