[File] Java source file incorrectly identified as HTML document

Aman Sharma amansha at kth.se
Sun Apr 20 20:53:31 UTC 2025


Hi,


I have two files, Reference.txt<https://github.com/user-attachments/files/19689452/Reference.txt> and Rebuild.txt<https://github.com/user-attachments/files/19689451/Rebuild.txt>. The file command reports their types as:


$ file Reference.txt Rebuild.txt

Reference.txt: HTML document, ASCII text, with very long lines (6135)
Rebuild.txt:   Java source, ASCII text, with very long lines (6135)


Both are Java source files, but Reference.txt is incorrectly identified as an HTML document. As Chris suggested here<https://lists.reproducible-builds.org/pipermail/diffoscope/2025-April/002838.html>, if line 30 of Reference.txt, `"  <title>(.*?)<\\/title>"`, is removed, the file command correctly classifies it as Java source.
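
A minimal sketch of how the observation above could be reproduced, assuming Reference.txt has been downloaded into the current directory. The helper name drop_line and the output name Reference-no-title.txt are hypothetical, chosen only for illustration:

```shell
# Hypothetical helper: write a copy of a file with one line removed.
#   $1 = input file, $2 = line number to delete, $3 = output file
drop_line() {
    sed "${2}d" "$1" > "$3"
}

# Assumption: Reference.txt is present in the current directory.
# Remove line 30 (the <title> regex string) and compare what file reports
# for the original and the modified copy.
if [ -f Reference.txt ]; then
    drop_line Reference.txt 30 Reference-no-title.txt
    file Reference.txt Reference-no-title.txt
fi
```

If the behaviour described above holds, the modified copy should be reported as Java source while the original is still reported as an HTML document.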


Regards,
Aman Sharma

PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
<http://www.kth.se>
<https://www.kth.se/profile/amansha>
https://algomaster99.github.io/

