[File] suggestion or maybe a documentation point

Patrice Duroux patrice.duroux at gmail.com
Tue Mar 7 09:15:29 UTC 2023


Goal: also get the character set and end-of-line (EOL) encoding for script files such as Perl scripts.

$ file -v
magic file from /etc/magic:/usr/share/misc/magic
$ file test.cgi
test.cgi: Perl script text executable
$ cat test.cgi

print "€\n";

The man page says:
     If a file does not match any of the entries in the magic file, it is
     examined to see if it seems to be a text file.  ASCII, ISO-8859-x,
     8-bit extended-ASCII character sets (such as those used on Macintosh
     and IBM PC systems), UTF-8-encoded Unicode, UTF-16-encoded Unicode, and
     EBCDIC character sets can be distinguished by the different ranges and
     sequences of bytes that constitute printable text in each set.  If a
     file passes any of these tests, its character set is reported.  ASCII,
     ISO-8859-x, UTF-8, and extended-ASCII files are identified as “text”
     because they will be mostly readable on nearly any terminal; UTF-16 and
     EBCDIC are only “character data” because, while they contain text, it is
     text that will require translation before it can be read.  In addition,
     file will attempt to determine other characteristics of text-type files.
     If the lines of a file are terminated by CR, CRLF, or NEL, instead of the
     Unix-standard LF, this will be reported.  Files that contain embedded
     escape sequences or overstriking will also be identified.

So, if I understand correctly, it works like the following pseudo-code:
magic(input) || text(input)

Would it then be possible to have an option that gives something like
magic(input) ; text(input)
or maybe just a '--no-magic' option?
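The two pseudo-code forms happen to be valid shell syntax: `||` runs its right-hand side only when the left-hand side fails, while `;` runs both commands unconditionally. A minimal sketch, assuming two hypothetical stub functions `magic` and `text` (they are not part of file(1) and only echo placeholder results):

```shell
#!/bin/sh
# Hypothetical stand-ins for the two stages described in the man page;
# they only echo placeholders and are not part of file(1).
magic() {
    # Pretend the magic database matched; exit status 0 means "matched".
    echo "$1: <magic result>"
    return 0
}
text() {
    # Pretend the text tests ran and reported charset and line terminators.
    echo "$1: <text result: charset, line terminators>"
    return 0
}

# Described behaviour: text() runs only if magic() fails (non-zero status).
current() { magic "$1" || text "$1"; }

# Suggested behaviour: both stages run, whatever magic() returns.
proposed() { magic "$1"; text "$1"; }

current test.cgi   # prints only the magic line
proposed test.cgi  # prints the magic line, then the text line
```

With a `magic` stub that returned a non-zero status, `current` would fall through to `text`; per the man page excerpt, that is the only case in which the character set and line terminators get reported.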

Admittedly, I have not tried creating a dummy (empty) magic file and
passing it with the -m option.

If this has already been addressed (probably many times) and solved
in some way, could it be added to the examples section of the man page?

Many thanks,
