[File] Mis-detection of *roff as ReStructuredText file

Christos Zoulas christos at zoulas.com
Mon Apr 27 02:17:17 UTC 2020


Should be fixed now, thanks!

christos

> On Apr 26, 2020, at 1:29 PM, Christoph Biedl <astron.com.bwoj at manchmal.in-ulm.de> wrote:
> 
> Hello,
> 
> Debian bug report #949878[1]:
> 
> The commit
> 
> | commit 61fc8e453a9988be03b1183be912d3f1c9bad24b
> | Author: Christos Zoulas <christos at zoulas.com>
> | Date:   Sat Nov 2 18:37:58 2019 +0000
> |
> |     Add ReStructuredText
> 
> introduced a regression: troff files, at least those generated by Perl's
> pod2man are mis-detected as ReStructuredText.
> 
> Since it took a while to understand why, here are the details of my analysis:
> 
> The generated troff starts with:
> 
> | .\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35)
> | .\"
> | .\" Standard preamble:
> | .\" ========================================================================
> | .de Sp \" Vertical space (when we can't use .PP)
> 
> The first search[2] in magic/Magdir/rst
> 
> | 0       search/256      \=\=
> 
> will match in the fourth line, column 4. The subsequent regexp search
> 
> | >&0     regex/256       \^[\=]+$
> 
> starts at that very place *and* treats it as the start of a line, resulting
> in a match. In other words, this combination always matches if there is a
> line ending with equal signs in the search range. Things go downhill
> from there; more precisely, the fifth line matches "^..[a-zA-Z]".
> 
> 
> Not sure how to fix this in a sane way. One solution would be to start the
> regex search a bit earlier, i.e. ">&-10", but honestly, no.
> 
> It seems to be possible to include a newline in the first search, i.e.
> "\n==". This will fail on files with a different line ending (do we
> care?) and on files where the equal signs are in the very first line.
> Since the equal signs are meant to underline the previous line, that
> should be acceptable.
> 
> Regards,
> 
>    Christoph
> 
> PS: Possibly the escaping in the other regexp is incomplete:
> 
> | >>>&0 regex/512       \^\.\.[A-Za-z]  ReStructuredText file
> 
> Checking for lines that start with two dots and a letter would require
> 
> | >>>&0 regex/512       \^\\.\\.[A-Za-z]  ReStructuredText file
> 
> ... but I couldn't find either pattern in descriptions of RST.
> 
> [1] https://bugs.debian.org/949878
> [2] As an aside, I fail to see the need to escape the equal sign.
> --
> File mailing list
> File at astron.com
> https://mailman.astron.com/mailman/listinfo/file
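
The anchoring problem described in the report can be reproduced outside of file(1) with a rough Python model. This is a sketch under stated assumptions, not file's implementation: `underline_rule`, `TROFF_HEAD` and `RST_DOC` are hypothetical names, the 256-character windows stand in for the `search/256` and `regex/256` limits, and the continuation is assumed to start at the offset where the search matched, as the report describes. Slicing the buffer at that offset is what lets `^` match in the middle of a line.

```python
import re

# Opening lines of pod2man output as quoted in the report
# (the number of '=' characters is illustrative).
TROFF_HEAD = "\n".join([
    '.\\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35)',
    '.\\"',
    '.\\" Standard preamble:',
    '.\\" ' + "=" * 72,
    ".de Sp \\\" Vertical space (when we can't use .PP)",
]) + "\n"

# A minimal RST document with a real section underline.
RST_DOC = "Title\n=====\n\nSome body text.\n"


def underline_rule(data: str, needle: str, window: int = 256) -> bool:
    """Rough model of `0 search/256 <needle>` followed by `>&0 regex/256 ^[=]+$`."""
    pos = data[:window].find(needle)
    if pos < 0:
        return False  # the search fails, so the continuation is never tried
    # Assumption: the regex is applied to a buffer that begins at the match
    # offset, so `^` also matches there even when that offset is mid-line.
    tail = data[pos:pos + window]
    return re.search(r"^[=]+$", tail, re.MULTILINE) is not None


if __name__ == "__main__":
    print(underline_rule(TROFF_HEAD, "=="))    # True: the reported mis-detection
    print(underline_rule(TROFF_HEAD, "\n=="))  # False: no line starts with '='
    print(underline_rule(RST_DOC, "\n=="))     # True: a real underline still matches
```

With the plain `==` needle the troff header is accepted even though none of its lines consists only of equal signs. With the `\n==` needle proposed in the report, the search itself fails on the troff header, while a genuine underline after a title line is still found.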

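The escaping question from the PS can be checked the same way. Taking the report's reading at face value (the backslashes are consumed before the regex is compiled, so the engine receives `^..[A-Za-z]`), the rule accepts almost any line whose third character is a letter. The constants and the `matches` helper below are hypothetical, for illustration only.

```python
import re

# Fifth line of the pod2man output quoted in the report.
FIFTH_LINE = ".de Sp \\\" Vertical space (when we can't use .PP)"

# Pattern as the report says the regex engine receives it: '.' matches any character.
AS_RECEIVED = r"^..[A-Za-z]"
# Pattern for "two literal dots followed by a letter".
LITERAL_DOTS = r"^\.\.[A-Za-z]"


def matches(pattern: str, line: str) -> bool:
    """Return True if the anchored `pattern` matches at the start of `line`."""
    return re.search(pattern, line) is not None


if __name__ == "__main__":
    for line in (FIFTH_LINE, "Plain prose", "..literal"):
        print(f"{line!r}: {matches(AS_RECEIVED, line)} / {matches(LITERAL_DOTS, line)}")
```

Both the troff line and ordinary prose pass the first pattern but fail the second; only a line that really begins with two dots and a letter passes the literal-dot version.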

