[File] new feature of recognizing JSON (1.35) / -k not working?
Christos Zoulas
christos at zoulas.com
Mon Apr 29 15:55:36 UTC 2019
On Apr 29, 9:38am, list-file at onerussian.com (Yaroslav Halchenko) wrote:
-- Subject: Re: [File] new feature of recognizing JSON (1.35) / -k not workin
| Thank you Christos for the reply,
|
| On Sat, 27 Apr 2019, Christos Zoulas wrote:
| > I understand what you are trying to do and this is a valid request. There
| > are three separate issues here:
| > 1. You want to just identify text vs binary files
| > There is no direct way to do this; file tries to print "text" in the
| > description, but not in the MIME output
| > when there is an application type. Perhaps you can use --mime-encoding
|
| hm, perhaps it would work! So the rule would then be: a file is binary if
| its encoding is "binary" (so we commit it to annex), and we consider it to be
| "text" otherwise (commit to git).
|
| Would you be so kind to point me to the list of possible
| encodings? I just want to see if there is anything else we might
| consider "binary" for our use case.
It is in encodings.c; the others are all textual.
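
The rule discussed above (encoding "binary" goes to annex, anything else goes to git) could be sketched roughly as follows. This is only an illustration, not how git-annex does it: the function names and the "annex"/"git" labels are made up for the example, and it assumes a `file` binary on PATH that supports --brief and --mime-encoding.

```python
import subprocess


def is_binary_encoding(encoding: str) -> bool:
    """Return True when file's reported encoding is the literal "binary"."""
    return encoding.strip().lower() == "binary"


def mime_encoding(path: str) -> str:
    """Ask file(1) for the encoding only, e.g. "utf-8", "us-ascii", "binary"."""
    result = subprocess.run(
        ["file", "--brief", "--mime-encoding", "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def destination(path: str) -> str:
    """Hypothetical policy: binary content goes to annex, the rest to git."""
    return "annex" if is_binary_encoding(mime_encoding(path)) else "git"
```

Because the decision looks only at the encoding, a new magic entry such as JSON changing the MIME type should not affect it.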
| > 2. New magic changes the output (JSON in this case). You can exclude the
| > json identification
| > with -e json. In fact perhaps you should exclude all the tests except
| > "text" in your application.
|
| I should have RTFM more closely. I have tried -e appinfo but now
| I see that it is not "exclude all application/" types.
|
| confirming that such a workaround could potentially work:
|
| $> file --mime 1.json
| 1.json: application/json; charset=utf-8
|
| $> file --mime -e json 1.json
| 1.json: text/plain; charset=utf-8
|
| There is no environment variable which could be set for libmagic to
| consume the types to exclude, is there?
No, but you can use magic_open/magic_setflags with the appropriate
flags to set it.
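
For a caller that does not go through the command line, the magic_open/magic_setflags route might look like the sketch below, written with Python's ctypes against the C library. The numeric flag values are assumptions copied from magic.h and should be checked against the installed header; MAGIC_NO_CHECK_JSON in particular is assumed to exist only in releases that have the JSON test. The helper names are hypothetical.

```python
import ctypes
import ctypes.util
import os

# Assumed values from magic.h; verify against the header you build with.
MAGIC_MIME_TYPE = 0x0000010
MAGIC_MIME_ENCODING = 0x0000400
MAGIC_NO_CHECK_JSON = 0x0400000

# MIME type plus charset, with the JSON test switched off.
MAGIC_FLAGS = MAGIC_MIME_TYPE | MAGIC_MIME_ENCODING | MAGIC_NO_CHECK_JSON


def load_libmagic():
    """Locate libmagic and declare the few C prototypes used below."""
    name = ctypes.util.find_library("magic")
    if name is None:
        raise OSError("libmagic not found")
    lib = ctypes.CDLL(name)
    lib.magic_open.restype = ctypes.c_void_p
    lib.magic_open.argtypes = [ctypes.c_int]
    lib.magic_setflags.restype = ctypes.c_int
    lib.magic_setflags.argtypes = [ctypes.c_void_p, ctypes.c_int]
    lib.magic_load.restype = ctypes.c_int
    lib.magic_load.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.magic_file.restype = ctypes.c_char_p
    lib.magic_file.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.magic_error.restype = ctypes.c_char_p
    lib.magic_error.argtypes = [ctypes.c_void_p]
    lib.magic_close.restype = None
    lib.magic_close.argtypes = [ctypes.c_void_p]
    return lib


def mime_without_json(path):
    """Return file's MIME description of path with the JSON test excluded."""
    lib = load_libmagic()
    cookie = lib.magic_open(MAGIC_MIME_TYPE | MAGIC_MIME_ENCODING)
    if not cookie:
        raise OSError("magic_open failed")
    try:
        # Replace the flags on the existing cookie, as suggested above.
        if lib.magic_setflags(cookie, MAGIC_FLAGS) != 0:
            raise OSError("magic_setflags failed")
        # None (NULL) asks libmagic to load its default magic database.
        if lib.magic_load(cookie, None) != 0:
            raise OSError((lib.magic_error(cookie) or b"magic_load failed").decode())
        result = lib.magic_file(cookie, os.fsencode(path))
        if result is None:
            raise OSError((lib.magic_error(cookie) or b"magic_file failed").decode())
        return result.decode()
    finally:
        lib.magic_close(cookie)
```

Calling mime_without_json("1.json") is intended to mirror `file --mime -e json 1.json`; the flags could equally be passed to magic_open directly, and magic_setflags is used here only to show the call named above.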
| > 3. the -k option is buggy. Please file a bug report to
| > https://bugs.astron.com/ with reproducers.
|
| done: https://bugs.astron.com/view.php?id=77
Thanks.
|
| > Perhaps we can add some code to improve things with an --include flag to only
| > include what is specified,
| > by fixing -k to work and adding a separate option to print file's idea of
| > whether a file contains text or is binary.
|
| I will need to wait for Joey (git-annex author) to chime in on what he
| thinks would be the best way for git-annex. I feel that relying on
| mime-encoding (would require adding support for that in git-annex) to be
| "binary" or not is the best way to go.
Ok.
christos