[File] Mis-detection of *roff as ReStructuredText file

Christoph Biedl astron.com.bwoj at manchmal.in-ulm.de
Sun Apr 26 17:29:09 UTC 2020


Hello,

Debian bug report #949878[1]:

The commit

| commit 61fc8e453a9988be03b1183be912d3f1c9bad24b
| Author: Christos Zoulas <christos at zoulas.com>
| Date:   Sat Nov 2 18:37:58 2019 +0000
|
|     Add ReStructuredText

introduced a regression: troff files, at least those generated by Perl's
pod2man, are mis-detected as ReStructuredText.

Since it took a while to understand why, here are the details of my analysis:

The generated troff starts with:

| .\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35)
| .\"
| .\" Standard preamble:
| .\" ========================================================================
| .de Sp \" Vertical space (when we can't use .PP)

The first search[2] in magic/Magdir/rst

| 0       search/256      \=\=

will match in the fourth line, column 4. The subsequent regexp search

| >&0     regex/256       \^[\=]+$

starts at that very place *and* treats it as a start of line, resulting
in a match. In other words, this combination always matches if there's a
line ending with equal signs in the search range. And things go downhill
from there; more precisely, the fifth line matches "^..[a-zA-Z]".
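
The effect can be reproduced outside of file(1). What follows is a minimal
Python sketch, not the actual matching code in file: it assumes the regex is
applied to the buffer starting at the offset of the search hit, so that "^"
anchors there. The function name and the length of the run of equal signs are
made up for illustration.

```python
import re

# First lines of a pod2man-generated page, as quoted above.
# The number of equal signs is arbitrary for this illustration.
sample = "\n".join([
    '.\\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35)',
    '.\\"',
    '.\\" Standard preamble:',
    '.\\" ' + "=" * 72,
    ".de Sp \\\" Vertical space (when we can't use .PP)",
]) + "\n"


def naive_rst_underline(text, window=256):
    """Model of the two rules: search for '==', then regex from that offset."""
    pos = text[:window].find("==")
    if pos < 0:
        return False
    # Assumption: the regex sees the buffer starting at the hit, so '^'
    # matches there even though the hit is in the middle of a line.
    tail = text[pos:pos + window]
    return re.match(r"^=+$", tail, re.MULTILINE) is not None


print(naive_rst_underline(sample))
```

Under that assumption the troff comment line is accepted as an underline,
while a line such as "a == b" is not, because the run of equal signs does not
extend to the end of the line.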


Not sure how to fix this in a sane way. One solution would be to start the
regex search a bit earlier, i.e. ">&-10", but honestly, no.

It seems to be possible to include a newline in the first search, i.e.
"\n==". This will fail on files with a different line ending (do we
care?), and if the equal signs are in the very first line. Since the
semantics are underlining of a previous line, the latter should be acceptable.
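
A rough Python model of that idea, again only a sketch under the assumption
that the regex starts at the search hit; the function name is hypothetical
and the CRLF result reflects Python's "$", which may differ from the regex
library file uses:

```python
import re


def newline_anchored_underline(text, window=256):
    """Model of searching for '\\n==' instead of '=='."""
    pos = text[:window].find("\n==")
    if pos < 0:
        return False
    # Skip the newline itself; the regex then starts at a real line start.
    tail = text[pos + 1:pos + 1 + window]
    return re.match(r"^=+$", tail, re.MULTILINE) is not None


troff = '.\\" Standard preamble:\n.\\" ' + "=" * 72 + "\n.de Sp\n"
rst = "Title\n=====\n\nSome text.\n"

print(newline_anchored_underline(troff))                    # troff comment line
print(newline_anchored_underline(rst))                      # underlined title
print(newline_anchored_underline("=====\nTitle\n"))         # equal signs in line 1 only
print(newline_anchored_underline("Title\r\n=====\r\n"))     # CRLF line endings
```

In this model only the underlined title is accepted; the troff comment, the
first-line-only case and the CRLF file are all rejected, which matches the
limitations listed above.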

Regards,

    Christoph

PS: Possibly the escaping in the other regexp is incomplete:

| >>>&0 regex/512       \^\.\.[A-Za-z]  ReStructuredText file

Checking for lines that start with two dots and a letter would require

| >>>&0 regex/512       \^\\.\\.[A-Za-z]  ReStructuredText file

... but I couldn't find either in descriptions of RST.
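
To illustrate the difference, here is a small Python comparison. It assumes,
as suggested above, that the rule as written ends up being evaluated as
"^..[A-Za-z]" (any two characters), whereas the doubled backslashes would
yield literal dots. The variable names are made up.

```python
import re

effective = re.compile(r"^..[A-Za-z]")    # dots match any character
intended = re.compile(r"^\.\.[A-Za-z]")   # dots match only a literal '.'

troff_line = '.de Sp \\" Vertical space'
rst_directive = ".. note::"

for name, pattern in (("effective", effective), ("intended", intended)):
    print(name,
          bool(pattern.match(troff_line)),
          bool(pattern.match(rst_directive)))
```

The unescaped form accepts the troff macro line, and neither form accepts a
usual RST directive such as ".. note::", since a space follows the two dots.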

[1] https://bugs.debian.org/949878
[2] As an aside, I fail to see the need to escape the equal sign.

