<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">I understand what you are trying to do, and this is a valid request. There are three separate issues here:<div class="">1. You want to just identify text vs. binary files.</div><div class=""> There is no direct way to do this: file tries to print "text" in the description, but not in the MIME output</div><div class=""> when the MIME type is application/*. Perhaps you can use --mime-encoding.</div><div class="">2. New magic changes the output (JSON in this case). You can exclude the json identification</div><div class=""> with -e json. In fact, perhaps you should exclude all the tests except "text" in your application.</div><div class="">3. The -k option is buggy. Please file a bug report at <a href="https://bugs.astron.com/" class="">https://bugs.astron.com/</a> with reproducers.</div><div class=""><br class=""></div><div class="">Perhaps we can add some code to improve things: an --include flag to include only what is specified,</div><div class="">a fix for -k, and a separate option to print file's idea of whether a file contains text or is binary.</div><div class=""><br class=""></div><div class="">Best,</div><div class=""><br class=""></div><div class="">christos<br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Apr 26, 2019, at 5:16 PM, Yaroslav Halchenko &lt;<a href="mailto:list-file@onerussian.com" class="">list-file@onerussian.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Dear Magic(al) File people,<br class=""><br class="">A little background: while using git-annex for managing repositories with both<br class="">text files to be committed to git and data files to go under git-annex control,<br class="">we rely on libmagic to provide the mimetype that drives git-annex's decision. 
We say<br class="">that all files of some text/ mimetype should get committed directly into git<br class="">and the rest managed by git-annex. It worked really splendidly -- all<br class="">.py, .txt, .md, .json, .yaml etc. files got automagically committed into<br class="">git, while data/binary files of any kind (images, neuroimaging data etc.) went to<br class="">annex.<br class=""><br class="">Today, after some time of troubleshooting (a bit of a pain, since we used a<br class="">bundled newer libmagic1 while the system-wide file was an old one ;)) we<br class="">realized that since libmagic/file 5.35 there is special discovery of .json<br class="">files, and they are no longer text/* <br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>$> file --mime-type sample.json<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>sample.json: application/json<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>$> file --version <br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>file-5.35<br class=""><br class="">which of course breaks our workflow, and I see a potential problem with such<br class="">discovery in general:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>json is a subset of yaml. So any json file can also be reported as a YAML file.<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>It is not reported ATM since there seems to be no application/yaml parser.<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>But then what will happen when such a parser appears? Change again?<br class=""><br class="">And I wondered how we could remedy the situation for our use case. I looked at<br class="">the file --help options, and it seems that -k, --keep-going could<br class="">potentially report all the matches, from more specific (application/json) to<br class="">less specific (e.g. 
application/yaml and then text/plain). But that<br class="">option does not seem to perform correctly in our .json case:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>$> file --mime -kr 1.json <br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>1.json: application/json<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>- <br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>- ; charset=utf-8<br class=""><br class="">and it is similarly ugly on a sample .log file lying around:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>$> file --mime -kr tests.log<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>tests.log: text/plain<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>- ; charset=utf-8<br class=""><br class=""><br class="">So it seems that in the json case it detects some additional MIME type<br class="">(not sure what it is for the .log, if any), but does not actually output it.<br class="">Note that on an older file/magic version there is no strange empty second<br class="">entry for that .log file:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>$> file --mime -kr tests.log <br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>tests.log: text/plain; charset=utf-8<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>$> file --version <br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>file-5.22<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>magic file from /etc/magic:/usr/share/misc/magic<br class=""><br class=""><br class="">Bug? 
<br class=""><br class="">More info -- for some files I found it reporting multiple types:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>$> file -kr --mime-type dcmqrscp<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>dcmqrscp: application/x-pie-executable<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>- application/octet-stream<br class=""><br class="">and for a sweep under /usr/bin:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>hopa:/usr/bin<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>$> file --mime-type -k * | grep '/.*/' | sed -e 's,.*: *,,g' | sort | uniq -c<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>187 application/x-executable\012- application/octet-stream<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span> 2828 application/x-pie-executable\012- application/octet-stream<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> 9 text/x-perl\012- text/html\012- <br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> 35 text/x-perl\012- text/x-c\012- <br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> 1 text/x-perl\012- text/x-java\012- <br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> 1 text/x-perl\012- text/x-makefile\012- <br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> 46 text/x-perl\012- text/x-perl\012- <br class=""><span class="Apple-tab-span" style="white-space:pre"> 
</span><span class="Apple-tab-span" style="white-space:pre"> </span> 2 text/x-perl\012- text/x-tex\012- <br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> 1 text/x-shellscript\012- application/octet-stream<br class=""><br class=""><br class="">but for many it is that second blank one like in our case with .json:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>hopa:/usr/bin<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>$> file --mime-type -k * | sed -e 's,.*: *,,g' | grep '\\012- \\012-' | sort | uniq -c<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>406 text/x-perl\012- \012- <br class=""><br class=""><br class="">Overall question is on how we could reliably (now and in the future and ideally<br class="">in the past of file/magic;)) discover if any file could be considered<br class="">some kind of a text/ file?<br class=""><br class="">Thank you in advance!<br class="">-- <br class="">Yaroslav O. Halchenko<br class="">Center for Open Neuroscience <a href="http://centerforopenneuroscience.org" class="">http://centerforopenneuroscience.org</a><br class="">Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755<br class="">Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419<br class="">WWW: <a href="http://www.linkedin.com/in/yarik" class="">http://www.linkedin.com/in/yarik</a> <br class="">-- <br class="">File mailing list<br class=""><a href="mailto:File@astron.com" class="">File@astron.com</a><br class="">https://mailman.astron.com/mailman/listinfo/file<br class=""></div></div></blockquote></div><br class=""></div></body></html>