[File] new feature of recognizing JSON (1.35) / -k not working?

Christos Zoulas christos at zoulas.com
Mon Apr 29 15:55:36 UTC 2019

On Apr 29,  9:38am, list-file at onerussian.com (Yaroslav Halchenko) wrote:
-- Subject: Re: [File] new feature of recognizing JSON (1.35) / -k not workin

| Thank you Christos for the reply,
| On Sat, 27 Apr 2019, Christos Zoulas wrote:
| >    I understand what you are trying to do and this is a valid request. There
| >    are three separate issues here:
| >    1. You want to just identify text vs binary files.
| >        There is no direct way to do this; file tries to print "text" in the
| >        description, but not in the mime output when there is an
| >        application/ type. Perhaps you can use --mime-encoding.
| Hm, perhaps it would work!  So the rule should be that a file is binary
| if its encoding is "binary" (so we commit it to annex), and we consider it
| to be "text" otherwise (commit to git).
| Would you be so kind as to point me to the list of possible
| encodings?  I just want to see if there is anything else we might
| consider "binary" for our use case.

It is in encoding.c; the others are all textual.
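
A minimal sketch of the rule discussed above, assuming file(1) is on PATH
and that "binary" is the only encoding that should go to annex. The helper
names (is_binary, mime_encoding, destination) are made up for illustration
and are not part of file or git-annex:

```python
import subprocess


def is_binary(encoding):
    """Rule from the thread: 'binary' means binary, anything else is text."""
    return encoding.strip().lower() == "binary"


def mime_encoding(path):
    """Ask file(1) for the encoding only; -b drops the 'name:' prefix."""
    proc = subprocess.run(
        ["file", "-b", "--mime-encoding", "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip()


def destination(path):
    """Hypothetical routing: binary files to annex, everything else to git."""
    return "annex" if is_binary(mime_encoding(path)) else "git"
```

Something like destination("1.json") would then pick git for any file whose
reported encoding is textual (us-ascii, utf-8, and so on).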

| >    2. New magic changes the output (JSON in this case). You can exclude the
| >    json identification with -e json. In fact perhaps you should exclude all
| >    the tests except "text" in your application.

| I should have RTFM more closely.  I have tried -e appinfo but now
| I see that it is not "exclude all application/" types.
| Confirming that such a workaround could potentially work:
| 	$> file --mime 1.json 
| 	1.json: application/json; charset=utf-8
| 	$> file --mime -e json 1.json
| 	1.json: text/plain; charset=utf-8
| There is no environment variable which could be set for libmagic to
| consume the types to exclude, is there?

No, but you can use magic_open/magic_setflags with the appropriate
flags to set it.
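
A rough sketch of that approach, calling libmagic through Python's ctypes.
The numeric flag values are copied from magic.h as I understand it for
releases that have the JSON test; treat them as assumptions and check them
against your own header. The function name mime_without_json is made up for
illustration:

```python
import ctypes
import ctypes.util
import os

# Assumed values from magic.h; verify against your installed header.
MAGIC_MIME_TYPE = 0x0000010
MAGIC_MIME_ENCODING = 0x0000400
MAGIC_MIME = MAGIC_MIME_TYPE | MAGIC_MIME_ENCODING
MAGIC_NO_CHECK_JSON = 0x0400000


def mime_without_json(path):
    """Return libmagic's MIME string for path with the JSON test disabled."""
    libname = ctypes.util.find_library("magic")
    if libname is None:
        raise OSError("libmagic not found")
    lib = ctypes.CDLL(libname)

    # Declare the C prototypes so pointers are not truncated to int.
    lib.magic_open.restype = ctypes.c_void_p
    lib.magic_open.argtypes = [ctypes.c_int]
    lib.magic_setflags.restype = ctypes.c_int
    lib.magic_setflags.argtypes = [ctypes.c_void_p, ctypes.c_int]
    lib.magic_load.restype = ctypes.c_int
    lib.magic_load.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.magic_file.restype = ctypes.c_char_p
    lib.magic_file.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.magic_close.restype = None
    lib.magic_close.argtypes = [ctypes.c_void_p]

    cookie = lib.magic_open(MAGIC_MIME)
    if not cookie:
        raise OSError("magic_open failed")
    try:
        # magic_setflags replaces the whole flag set, so repeat MAGIC_MIME.
        if lib.magic_setflags(cookie, MAGIC_MIME | MAGIC_NO_CHECK_JSON) != 0:
            raise OSError("magic_setflags failed")
        # Passing NULL loads the default magic database.
        if lib.magic_load(cookie, None) != 0:
            raise OSError("magic_load failed")
        result = lib.magic_file(cookie, os.fsencode(path))
        if result is None:
            raise OSError("magic_file failed")
        return result.decode()
    finally:
        lib.magic_close(cookie)
```

This is meant to mirror what file --mime -e json does on the command line;
other MAGIC_NO_CHECK_* flags can be OR-ed in the same way.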

| >    3. the -k option is buggy. Please file a bug report to
| >    https://bugs.astron.com/ with reproducers.
| done: https://bugs.astron.com/view.php?id=77


| >    Perhaps we can add some code to improve things with an --include flag to
| >    only include what is specified, by fixing -k to work, and by adding a
| >    separate option to print file's idea of whether a file contains text or
| >    is binary.
| I will need to wait for Joey (git-annex author) to chime in on what he
| thinks would be the best way for git-annex.  I feel that relying on
| mime-encoding (which would require adding support for that in git-annex)
| being "binary" or not is the best way to go.



