[File] mixed and seemingly inconsistent results....
Christos Zoulas
christos at zoulas.com
Sun Jul 26 17:06:19 UTC 2020
Hi,
> On Jul 25, 2020, at 9:03 PM, Astara <file at tlinx.org> wrote:
>
> When I run the file command on my system over some RAID stripes, I'm seeing
> a couple of inconsistencies and/or oddities.
>
> The main oddity is in files that have an interpreter line, for
> example:
>
> #!/usr/bin/perl -w = a /usr/bin/perl -w script executable (binary data)
> #!/bin/sh = POSIX shell script executable (binary data)
> #!/bin/bash = Bourne-Again shell script executable (binary data)
>
> Note, in perl, we also see:
> Perl5 module source, ASCII text (starts with package NAME)
> Perl POD document, ASCII text (starts with '=head1 NAME')
> Perl POD document, UTF-8 Unicode text
> Perl Script text executable
>
> Maybe part of this is that I'm looking at disk stripes: while many start
> with a file, a single 64K stripe may hold several files, each followed by
> a run of binary zeros that pads it out to a sector boundary (4K sector size).
>
> When I started this post, I didn't understand the binary data annotation,
> since the sources in them were not binary -- but that's likely explained by
> file looking at a 64K disk stripe and seeing multiple files separated by
> NULs.
>
> The other oddity is the different names given to various Perl files.
> What I mean is that I have a Perl module file that is a module,
> has POD documentation for the module in it, can be executed
> like a program, and contains UTF-8 characters.
>
> It ID'd as a Perl Script text executable, but would also be a:
> Perl5 module source, UTF-8 text
> Perl POD document, UTF-8 Unicode text
> (isn't "Unicode" after "UTF-8" redundant?)
Well, it is... The encoding magic text was inconsistent with the
Unicode magic file anyway, so I made it match. It will now print:
Unicode text, UTF-8
Still redundant but at least consistent everywhere within the program.
The rationale is to print Unicode text (which it is) followed by the
encoding, and optionally followed by endianness.
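A minimal Python sketch of the ordering described above: the generic label first, then the encoding, then the byte order only when one applies. The function name describe_text is hypothetical; this is not file's own code and it does not claim to reproduce file's exact output strings, only the order of the parts.

```python
from typing import Optional


def describe_text(encoding: str, endianness: Optional[str] = None) -> str:
    """Build a label in the order "Unicode text", encoding, [endianness].

    Hypothetical illustration of the ordering rule only; file's real
    output strings may differ.
    """
    parts = ["Unicode text", encoding]
    if endianness is not None:  # byte order is only meaningful for some encodings
        parts.append(endianness)
    return ", ".join(parts)


print(describe_text("UTF-8"))                    # Unicode text, UTF-8
print(describe_text("UTF-16", "little-endian"))  # Unicode text, UTF-16, little-endian
```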
>
> Beginning of file looks like:
> #!/usr/bin/perl -w
> # vim=:SetNumberAndWidth
>
> =encoding utf-8
>
> =head1 NAME
> P - Safer, friendlier printf/print/sprintf + say
>
> =head1 VERSION
>
> Version "1.1.38"
>
> =cut
>
> { package P;
> use warnings; use strict;use mem;
> our $VERSION='1.1.38';
>
>
> I feel 'file' made an acceptable choice in calling it a
> Perl script text executable, though its primary purpose is being a
> module: the executable part was to demo features of the
> module.
Well, if it has #!, it was meant to be executed.
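The "#! means executable" rule comes down to looking at the first two bytes of the buffer. Here is a hypothetical Python sketch (interpreter_line is an illustrative name, not part of file or libmagic) that recognizes such a line and pulls out the interpreter it names, tolerating an optional space after "#!".

```python
from typing import Optional


def interpreter_line(buf: bytes) -> Optional[str]:
    """Return the interpreter named on a leading "#!" line, or None.

    Hypothetical helper for illustration; file's own detection is done
    through its magic database, not through this function.
    """
    if not buf.startswith(b"#!"):
        return None
    first_line = buf.split(b"\n", 1)[0]
    # Drop "#!" plus surrounding whitespace, e.g. "#! /bin/sh" or a trailing "\r".
    return first_line[2:].strip().decode("ascii", errors="replace")


print(interpreter_line(b"#!/usr/bin/perl -w\n# vim=:SetNumberAndWidth\n"))  # /usr/bin/perl -w
print(interpreter_line(b"{ package P;\n"))                                   # None
```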
>
>
>
> Conversely, some C-source files that also had NULs between
> them were simply labeled:
> "fname: data".
>
> They were several C-source files separated by the NULs, and
> started out:
>
> // SPDX-License-Identifier: GPL-2.0-or-later
> /*
> * CRC32C
> *@Article{castagnoli-crc,
> * authors = { Guy C. Stefan B. and Martin H.},
> * month = {June},
> *}
> * Used by the iSCSI driver, possibly others, and derived from the
> * the iscsi-crc.c module of the linux-iscsi driver at
> * http://linux-iscsi.sourceforge.net.
> */
> #include <crypto/hash.h>
> #include <linux/err.h>
>
> static struct crypto_shash *tfm;
>
> I.e. it's C source. Looking past the NULs, it turns out the file "ldb"
> is several C source files, each followed by zero-filled space after its EOF.
This is an unusual setup that you have with all the NULs in
the data. Perhaps what's needed here is an option to ignore them.
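One way such an option could work, sketched in Python under stated assumptions: split the buffer at runs of NUL bytes and identify each remaining chunk on its own. This is not an existing file option; split_on_nul_runs and its min_run threshold are hypothetical, and the right threshold for "this is padding" is a guess.

```python
import re
from typing import List, Tuple


def split_on_nul_runs(buf: bytes, min_run: int = 1) -> List[Tuple[int, bytes]]:
    """Split buf at runs of >= min_run NUL bytes.

    Returns (offset, chunk) pairs for the non-NUL pieces, so each piece
    could be handed to a type detector separately. Hypothetical helper.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    pattern = re.compile(rb"\x00{%d,}" % min_run)
    segments = []
    start = 0
    for m in pattern.finditer(buf):
        if m.start() > start:  # keep the data that precedes this NUL run
            segments.append((start, buf[start:m.start()]))
        start = m.end()        # resume just past the padding
    if start < len(buf):       # trailing data with no padding after it
        segments.append((start, buf[start:]))
    return segments


# A made-up stripe: two small C files, each padded with zeros.
stripe = b"// first file\n" + b"\x00" * 4096 + b"#include <linux/err.h>\n" + b"\x00" * 2048
for offset, chunk in split_on_nul_runs(stripe, min_run=16):
    print(offset, len(chunk))
```

Raising min_run to a few hundred bytes would leave stray NULs inside genuinely binary data alone while still catching sector-sized padding.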
>
>
> For the stripes starting with C source files, I'm guessing the
> NULs would have made file want to label them (binary data),
> but that would conflict with C source, even though 'file' had
> no problem displaying some script files with a (binary data) tag.
> Not wrong, exactly, just inconsistent.
There are two types of magic, ASCII and binary. If the file contains
NULs, only binary magic is consulted.
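The rule above can be modelled as a single test on the buffer. This toy Python version (magic_kind is a hypothetical name) mirrors only the NUL check as described here; it does not model which individual magic entries file goes on to try. It shows why one run of sector padding after otherwise plain C source is enough to take text magic off the table.

```python
def magic_kind(buf: bytes) -> str:
    """Toy model: any NUL byte means only binary magic is consulted."""
    return "binary" if b"\x00" in buf else "ascii"


c_source = b"#include <linux/err.h>\n"
print(magic_kind(c_source))                   # ascii
print(magic_kind(c_source + b"\x00" * 4096))  # binary
```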
> In these cases, it's almost like it needs to look at content
> to know what type of source file it is (Perl/Bash/C), but rather
> than label the file as (binary data), it would be more useful
> (not sure what's involved) to note that the binary part was
> NUL padding between C source files, rather than labelling the whole
> thing as just 'data' (as it did with C source files, but not script
> files).
>
> I've written too much (sorry), but I was trying to be clear
> with examples. I use file a lot, so please don't think I'm criticizing
> or even that I "need a fix"; I just wanted to point out that having many
> disparate sources may be creating inconsistencies in the output
> (some of which I'm contributing to by running it over image files
> that are stripes from a RAID disk).
>
> Thanks for the fish & tool!
> Astara (aka L.A. Walsh)
You are welcome.
christos