[File] new feature of recognizing JSON (1.35) / -k not working?

Yaroslav Halchenko list-file at onerussian.com
Mon Apr 29 13:38:02 UTC 2019


Thank you Christos for the reply,

On Sat, 27 Apr 2019, Christos Zoulas wrote:
>    I understand what you are trying to do and this is a valid request. There
>    are three separate issues here:
>    1. You want to just identify text vs binary files
>        There is no direct way to do this, file tries to print "text" in the
>    description but not in the mime output
>        when there is an application. Perhaps you can use --mime-encoding

hm, perhaps it would work!  So the rule would then be: if the encoding
is "binary", treat the file as binary (commit to annex); otherwise
consider it "text" (commit to git).
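
To make that rule concrete, here is a minimal sketch in Python.  It
assumes that `file -b --mime-encoding` prints just the encoding name
(e.g. "binary", "us-ascii", "utf-8"); the function names and the
"annex"/"git" labels are only illustrative and are not part of
git-annex or libmagic.

```python
import subprocess


def classify_encoding(encoding: str) -> str:
    """Map an encoding name reported by file(1) to a destination.

    Illustrative rule only: "binary" goes to annex, anything else to git.
    """
    return "annex" if encoding.strip().lower() == "binary" else "git"


def mime_encoding(path: str) -> str:
    """Ask file(1) for the encoding of `path`.

    -b (brief) drops the "filename: " prefix from the output.
    """
    result = subprocess.run(
        ["file", "-b", "--mime-encoding", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def destination(path: str) -> str:
    """Hypothetical helper: decide where `path` should be committed."""
    return classify_encoding(mime_encoding(path))
```

Whether other encoding names should also count as "binary" is exactly
the open question below, so the comparison in classify_encoding() may
well need to grow into a set of names.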

Would you be so kind as to point me to the list of possible
encodings?  I just want to see if there is anything else we might
consider "binary" for our use case.

>    2. New magic changes the output (JSON in this case). You can exclude the
>    json identification
>        with -e json. In fact perhaps you should exclude all the tests except
>    "text" in your application.

I should have RTFM more closely.  I had tried -e appinfo, but now
I see that it does not mean "exclude all application/ types".

Confirming that such a workaround could work:

	$> file --mime 1.json 
	1.json: application/json; charset=utf-8

	$> file --mime -e json 1.json
	1.json: text/plain; charset=utf-8

Is there an environment variable that could be set for libmagic to
pick up the list of tests to exclude?  I did not find one.
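
Independent of whether such a variable exists, the exclusion could be
applied on the caller's side.  A rough sketch in Python, where the
helper names and the default exclusion list are hypothetical and only
the `--mime` and `-e json` options come from the commands above:

```python
import subprocess


def build_file_command(path, excluded=("json",)):
    """Build a file(1) command line with one -e option per excluded test."""
    cmd = ["file", "--mime"]
    for test_name in excluded:
        cmd += ["-e", test_name]
    # "--" ends option parsing so a path starting with "-" is not
    # mistaken for an option.
    cmd += ["--", path]
    return cmd


def mime_description(path, excluded=("json",)):
    """Run file(1) and return its output line for `path`."""
    result = subprocess.run(
        build_file_command(path, excluded),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

This only helps callers that invoke the file binary themselves; code
that links against libmagic directly would need the equivalent setting
through the library.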

>    3. the -k option is buggy. Please file a bug report to
>    https://bugs.astron.com/ with reproducers.

done: https://bugs.astron.com/view.php?id=77

>    Perhaps we can add some code to improve things with --include flag to only
>    include what specified,
>    by fixing -k to work and adding a separate option to print file's idea (if
>    a file is contains text or is binary).

I will need to wait for Joey (git-annex author) to chime in on what he
thinks would be the best way for git-annex.  I feel that relying on
whether mime-encoding is "binary" or not (which would require adding
support for that in git-annex) is the best way to go.

Cheers,
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        

