[File] Mis-detection of *roff as ReStructuredText file
Christoph Biedl
astron.com.bwoj at manchmal.in-ulm.de
Sun Apr 26 17:29:09 UTC 2020
Hello,
Debian bug report #949878[1]:
The commit
| commit 61fc8e453a9988be03b1183be912d3f1c9bad24b
| Author: Christos Zoulas <christos at zoulas.com>
| Date: Sat Nov 2 18:37:58 2019 +0000
|
| Add ReStructuredText
introduced a regression: troff files, at least those generated by Perl's
pod2man, are mis-detected as ReStructuredText.
Since it took a while to understand why, here are the details of my analysis:
The generated troff starts with:
| .\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35)
| .\"
| .\" Standard preamble:
| .\" ========================================================================
| .de Sp \" Vertical space (when we can't use .PP)
The first search[2] in magic/Magdir/rst
| 0 search/256 \=\=
will match in the fourth line, column 4. The subsequent regexp search
| >&0 regex/256 \^[\=]+$
starts at that very place *and* treats it as the start of a line, resulting
in a match. In other words, this combination always matches if there's a
line ending in equal signs within the search range. From there things go
downhill; more precisely, the fifth line matches "^..[a-zA-Z]".
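To make the mechanism concrete, here is a rough Python emulation of the first two rst tests run against the pod2man header. It is only a sketch: Python's re module stands in for file's regex engine, the function name old_rst_underline_test is made up, the run of equal signs is approximated as 72 characters, and I'm assuming the ">&0" continuation begins at the end of the "==" match (starting at its beginning gives the same result here). Slicing the buffer at that offset reproduces the effect described above: "^" is allowed to match in the middle of a line.

```python
import re

# First lines of the pod2man output quoted above (run of '=' approximated).
POD2MAN_HEAD = "\n".join([
    r'.\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35)',
    r'.\"',
    r'.\" Standard preamble:',
    r'.\" ' + "=" * 72,
    r".de Sp \" Vertical space (when we can't use .PP)",
]) + "\n"


def old_rst_underline_test(data: str) -> bool:
    # Emulates: 0 search/256 "==", then >&0 regex/256 "^[=]+$"
    # Step 1: literal search for "==" in the first 256 characters.
    idx = data[:256].find("==")
    if idx < 0:
        return False
    # Step 2 (assumption): the relative offset continues after the match.
    start = idx + 2
    # The regex only sees the buffer from `start` on, so ^ matches there
    # even though it is mid-line; MULTILINE makes ^/$ work per line.
    window = data[start:start + 256]
    return re.search(r"^=+$", window, re.MULTILINE) is not None


print(old_rst_underline_test(POD2MAN_HEAD))  # True: the *roff comment matches
```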
I'm not sure how to fix this in a sane way. One option would be to start the
regex search a bit earlier, e.g. ">&-10", but honestly, no.
It seems possible to include a newline in the first search, i.e.
"\n==". This will fail on files with a different line ending (do we
care?) and when the equal signs are on the very first line. Since the
equal signs underline a previous line, that should be acceptable.
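A quick way to check the "\n==" idea is to run the same two-step test with both needles. Again a hedged sketch: underline_test is a hypothetical helper, the sample inputs are made up, Python's re stands in for file's regex engine, and the continuation offset is assumed to be the end of the search match.

```python
import re


def underline_test(data: str, needle: str) -> bool:
    # Hypothetical helper: search for `needle` in the first 256 characters,
    # then look for a line of '=' starting right after the match.
    idx = data[:256].find(needle)
    if idx < 0:
        return False
    start = idx + len(needle)
    window = data[start:start + 256]
    return re.search(r"^=+$", window, re.MULTILINE) is not None


ROFF = r'.\" Standard preamble:' + "\n" + r'.\" ' + "=" * 72 + "\n"
RST = "Title\n=====\n\nSome text.\n"

for name, data in [("roff", ROFF), ("rst", RST)]:
    print(name, underline_test(data, "=="), underline_test(data, "\n=="))
# roff True False
# rst True True
```

With the newline in the needle, the equal signs inside the *roff comment no longer qualify, while a regular RST underline still does.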
Regards,
Christoph
PS: Possibly the escaping in the other regexp is incomplete:
| >>>&0 regex/512 \^\.\.[A-Za-z] ReStructuredText file
Checking for lines that start with two dots and a letter would require
| >>>&0 regex/512 \^\\.\\.[A-Za-z] ReStructuredText file
... but I couldn't find either pattern in any description of RST.
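To illustrate the difference between the two regexps, here is a small Python comparison. It assumes, as the analysis above suggests, that the regex engine ends up receiving "^..[A-Za-z]" (unescaped dots, i.e. any character) from the current magic line; Python's re stands in for file's engine and the sample lines are made up apart from the pod2man one. Note that RST explicit markup such as directives starts with two dots followed by a space, so neither pattern matches it.

```python
import re

LOOSE = r"^..[A-Za-z]"     # '.' matches any character
STRICT = r"^\.\.[A-Za-z]"  # two literal dots, then a letter

SAMPLES = [
    r".de Sp \" Vertical space (when we can't use .PP)",  # loose: yes, strict: no
    "Hello, world",                                       # loose: yes, strict: no
    ".. note:: RST directive",                            # neither matches
    "..Foo",                                              # both match
]

for line in SAMPLES:
    # re.match anchors at the start of the string, like ^ on a single line.
    print(bool(re.match(LOOSE, line)), bool(re.match(STRICT, line)), line)
```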
[1] https://bugs.debian.org/949878
[2] As an aside, I fail to see the need to escape the equal sign.