[File] Magic for Stata data files

Christos Zoulas christos at zoulas.com
Thu Oct 8 17:52:06 UTC 2020


Added, thanks!

christos

> On Oct 7, 2020, at 7:42 PM, Rémi Rampin <remirampin at gmail.com> wrote:
> 
> Hi,
> 
> Stata is a statistical software tool that was created in 1985. While I
> don't personally use it, data files in its native (proprietary) format
> are common (.dta files).
> 
> Because they are so common, especially in statistical and social
> sciences, Stata files and SPSS files can be opened by a lot of modern
> software. For example, Python's pandas package provides built-in
> support for them (read_stata() and read_spss()).
> 
> I noticed that the magic database includes an entry for SPSS files but
> not Stata files. Stata files for Stata 13 and newer (formats 117, 118,
> and 119) always begin with the string "<stata_dta><header>" as per
> https://www.stata.com/help.cgi?dta#definition
> 
> The format version number always follows, for example:
>    <stata_dta><header><release>117</release>
>    <stata_dta><header><release>118</release>
> 
> Therefore the following line would do the trick:
>    0       string  <stata_dta><header>     Stata Data File
> 
> (I'm sure the version number could be captured as well but I did not
> manage this without a regex)
> 
> Unfortunately, the previous formats (created by Stata versions before 13,
> which was released in 2013) are harder to recognize. Format 115 starts with the
> four bytes 0x73010100 or 0x73020100, format 114 with 0x72010100 or
> 0x72020100, format 113 with 0x71010101 or 0x71020101.
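The byte patterns listed above can be turned into a small lookup table. The sketch below copies the six values exactly as given in the message and has not been checked against the Stata specification. The name old_stata_format is hypothetical. Because a four-byte signature is weak, a match should be treated as a hint rather than a firm identification.

```python
from typing import Optional

# First four bytes of each pre-13 format, copied from the message above.
# These values are taken on trust from the message, not from the spec.
OLD_STATA_MAGIC = {
    bytes.fromhex("73010100"): 115,
    bytes.fromhex("73020100"): 115,
    bytes.fromhex("72010100"): 114,
    bytes.fromhex("72020100"): 114,
    bytes.fromhex("71010101"): 113,
    bytes.fromhex("71020101"): 113,
}


def old_stata_format(head: bytes) -> Optional[int]:
    """Return 113, 114 or 115 if the first four bytes match, else None.

    Hypothetical helper for illustration only.
    """
    return OLD_STATA_MAGIC.get(head[:4])
```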
> 
> For additional reference, the Library of Congress website has an entry
> for the Stata Data File Format 118:
> https://www.loc.gov/preservation/digital/formats/fdd/fdd000471.shtml
> 
> Examples of such files can be found on Zenodo:
> https://zenodo.org/search?page=1&size=20&q=&file_type=dta
> 
> Best regards
> --
> Rémi Rampin
> Research Engineer, New York University
