
Statistical Files (SPSS, Stata, SAS)

SPSS, Stata, and SAS data files
Supported

Supported Formats

  1. SPSS: .sav files (IBM SPSS Statistics)
  2. Stata: .dta files (Versions 8-18)
  3. SAS Transport: .xpt files (V5/V8)
  4. SAS Native: .sas7bdat files (uncompressed or RLE)
  5. Upload a single file or a ZIP archive containing multiple files
  6. Note: Encrypted files, .zsav, and .por files are not supported

Guide

Statistical software data files store research and analytical data along with specialized metadata such as variable labels, missing value codes, and case weights.

What You Can Upload

  • .sav files (SPSS)
  • .dta files (Stata)
  • .sas7bdat files (SAS datasets)
  • .xpt files (SAS transport format)
  • ZIP archives with multiple files

What You Get Out

DataMeans extracts your data into multiple modern formats:

| Output | Description |
| --- | --- |
| csv/{TableName}.csv | One CSV file per table with all row data |
| xlsx/{TableName}.xlsx | Excel workbook per table |
| xls/{TableName}.xls | Legacy Excel format per table |
| json/{TableName}.json | JSON array of records per table |
| json/{TableName}.jsonl | Newline-delimited JSON (streaming-friendly) |
| postgres.sql | PostgreSQL CREATE TABLE + INSERT statements |
| schema/schema-graph.json | Relationship graph for visualization |
| schema/er-model.json | ER model for diagram tools |
| report.json | Structured extraction report |
| report.md | Human-readable extraction summary |
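
As a rough illustration of why the newline-delimited output is described as streaming-friendly, the sketch below reads a .jsonl file one record at a time instead of loading a whole JSON array into memory. The function name `iter_jsonl` and the sample field names are hypothetical and not part of DataMeans.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one record per non-empty line of a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines, such as a trailing newline
            yield json.loads(line)


# Hypothetical sample standing in for json/{TableName}.jsonl
sample = io.StringIO('{"id": 1, "age": 34}\n{"id": 2, "age": null}\n')
records = list(iter_jsonl(sample))
```

With a real file, the same function can be given the handle returned by `open(path, encoding="utf-8")`, so only one line is held in memory at a time.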

How to Export / Obtain Files

No special export needed - upload files directly:

  1. Locate your statistical data file
  2. Upload as-is or in a ZIP archive
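
If several files need to go up together, the ZIP archive can be built with any tool. A minimal sketch using Python's standard `zipfile` module follows; the function name `bundle_for_upload` and the file names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def bundle_for_upload(files: Iterable[Path], archive: Path) -> Path:
    """Write the given data files into a single ZIP archive."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store each file under its bare name so the archive has no folders.
            zf.write(path, arcname=path.name)
    return archive
```

For example, `bundle_for_upload([Path("wave1.sav"), Path("wave2.dta")], Path("upload.zip"))` would produce one archive containing both files. Storing entries by bare file name is a choice made for this sketch, so two inputs with the same name in different folders would collide.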

Alternative exports from statistical software:

  • SPSS: File > Save As > SPSS Statistics (.sav)
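  • Stata: save "filename.dta" to write a .dta file (export delimited using "filename.csv" produces CSV instead)
  • SAS: save the data set to a permanent library as .sas7bdat (PROC EXPORT produces delimited or other external formats instead)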

Supported Features

  • Complete record extraction
  • Variable label preservation
  • Value labels (factor levels)
  • Missing value codes
  • Data type detection
  • Survey weights (documented)

Known Limitations

  • Complex survey designs are documented but may need manual review
  • Some version-specific format variations may not be fully supported
  • Very large datasets may need chunked processing

Last updated: January 2026

Technical reference

Overview

Statistical data files are proprietary formats used by statistical analysis software packages like SPSS, Stata, and SAS. These formats store datasets with variables, observations, and metadata optimized for statistical computing. They support complex data types, missing values, and metadata structures designed for quantitative research, data analysis, and statistical modeling in social sciences, economics, and scientific research.

History and Background

  • 1960s: Early statistical software development (SPSS first released in 1968).
  • 1970s: SPSS Inc. incorporated (1975); SAS Institute founded (1976).
  • 1980s: Stata first released for MS-DOS (1985); SAS rewritten in C (1985), enabling PC versions.
  • 1990s: Stata Corporation formed (1993); SAS 7 introduces the .sas7bdat format (1998).
  • 2000s: Open-source alternatives emerge (R 1.0.0 released in 2000), but proprietary formats persist.
  • 2010s: Stata 13 adds strL long strings (2013); SAS 9.4 adds extended attributes (2013).
  • Present: Open-source libraries such as pandas and haven read all three formats outside their native packages.

Stata:

  • 1985: Stata developed by William Gould at Computing Resource Center, first released for MS-DOS.
  • 1993: Company renamed Stata Corporation (now StataCorp) and relocated to College Station, Texas.
  • 2005: Stata 9 introduces the Mata matrix programming language.
  • 2009: Stata 11 adds factor variables and multiple imputation.
  • 2015: Stata 14 introduces Unicode support and Bayesian analysis.
  • 2017: Stata 15 introduces format 119 for datasets with more than 32,767 variables (supported in Stata/MP only).
  • 2019: Stata 16 adds Python integration.

File Format Specifications

SPSS (.sav format):

  • Proprietary binary format with SPSS-specific record structure
  • File header contains metadata (case count, creation date and time, file label)
  • Data stored in rectangular format as fixed 8-byte elements
  • Variable dictionary includes names, types, labels, and missing value definitions
  • File signature at offset 0: $FL2 (uncompressed or bytecode-compressed files) or $FL3 (ZLIB-compressed files)
  • Optional bytecode or ZLIB compression, indicated by a header flag
  • File extension: .sav
  • Maximum variables: 2,147,483,647 (32,767 before version 10)
  • Maximum cases: 2,147,483,647
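
The signature check described above can be sketched in a few lines. This only inspects the first four bytes and does not validate the rest of the file; the names `SAV_SIGNATURES` and `sav_signature` are hypothetical.

```python
from typing import Optional

# Signatures at offset 0, as listed in the format notes above.
SAV_SIGNATURES = {
    b"$FL2": "uncompressed or bytecode-compressed",
    b"$FL3": "ZLIB-compressed",
}


def sav_signature(header: bytes) -> Optional[str]:
    """Describe a .sav file from its first four bytes, or return None."""
    return SAV_SIGNATURES.get(header[:4])
```

In practice the argument would come from something like `open(path, "rb").read(4)`. A `None` result means the file does not start with a known .sav signature.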

SAS (.sas7bdat format):

  • Proprietary binary format introduced in SAS 7
  • File consists of a header (dataset name, timestamps, page size and count) followed by data pages
  • Optional RLE (COMPRESS=CHAR) or RDC (COMPRESS=BINARY) compression
  • Column names, labels, and display formats stored in metadata subheaders
  • File extension: .sas7bdat
  • Maximum observations: Limited by disk space

Stata (.dta format):

  • Binary format with version-specific structure
  • Header contains dataset metadata (variables, observations, labels)
  • Data stored observation by observation (row-major order)
  • Formats 114 and 115 (Stata 10–12) limit fixed-width strings to 244 bytes
  • Supports value labels and variable characteristics
  • File extension: .dta (format version recorded in the file header)
  • Maximum variables: Version-dependent (up to 120,000 in Stata/MP)
  • Maximum observations: Limited by memory
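
Because the format version is recorded in the header, it can be read without parsing the whole file. The sketch below assumes the commonly documented layout, which is not spelled out in this reference: formats 113 to 115 store the format number in the first byte, and formats 117 and later open with an XML-like `<release>` tag. The function name is hypothetical.

```python
import re
from typing import Optional

# Assumed opening of format 117+ files: <stata_dta><header><release>NNN</release>
_RELEASE_TAG = re.compile(rb"<stata_dta><header><release>(\d{3})</release>")


def dta_format_version(header: bytes) -> Optional[int]:
    """Return the .dta format number from the first bytes of a file, or None."""
    match = _RELEASE_TAG.match(header)
    if match:
        return int(match.group(1))
    # Assumed legacy layout: the first byte holds the format number (113-115).
    if header and 113 <= header[0] <= 115:
        return header[0]
    return None
```

Reading the first 64 bytes of a file is enough for either branch. Formats older than 113 are deliberately reported as `None` here, since this page lists support from Stata 8 onward.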

Data Types and Structures

| Type | SPSS | SAS | Stata | Description |
| --- | --- | --- | --- | --- |
| Numeric | 8-byte float | Float, 3-8 bytes (default 8) | 4-byte float, 8-byte double | Real numbers with decimal precision |
| Integer | Stored as 8-byte float | Stored as floating point | 1-, 2-, and 4-byte (byte, int, long) | Whole numbers |
| String | Fixed width, up to 32,767 bytes | Fixed width, up to 32,767 bytes | str1-str2045 fixed; strL up to 2 GB | Text data |
| Date | Seconds since October 14, 1582 | Days since January 1, 1960 | Days since January 1, 1960 | Date values |
| Time | Seconds | Seconds (datetime: seconds since January 1, 1960) | Milliseconds since January 1, 1960 (datetime) | Time values |
| Boolean | 0/1 numeric | 0/1 numeric | 0/1 numeric | True/false values (no native type) |
| Missing | System and user-defined | ., ._, and .A-.Z | . and .a-.z | Null/unknown values |
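
The date and time encodings in the table differ only in epoch and unit, so converting them to ordinary timestamps is simple arithmetic. A minimal sketch follows, using Python's `datetime` (which applies the proleptic Gregorian calendar); the function names are hypothetical and missing-value handling is left out.

```python
from datetime import datetime, timedelta

SPSS_EPOCH = datetime(1582, 10, 14)    # SPSS counts seconds from this date
SAS_STATA_EPOCH = datetime(1960, 1, 1)  # SAS and Stata share this epoch


def spss_seconds_to_datetime(seconds: float) -> datetime:
    """Convert an SPSS date or datetime value (seconds since 1582-10-14)."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


def days_since_1960_to_datetime(days: float) -> datetime:
    """Convert a SAS or Stata date value (days since 1960-01-01)."""
    return SAS_STATA_EPOCH + timedelta(days=days)


def stata_ms_to_datetime(milliseconds: float) -> datetime:
    """Convert a Stata datetime value (milliseconds since 1960-01-01)."""
    return SAS_STATA_EPOCH + timedelta(milliseconds=milliseconds)


print(days_since_1960_to_datetime(366))  # 1961-01-01 00:00:00 (1960 is a leap year)
```

A SAS datetime (seconds since January 1, 1960) would follow the same pattern with `timedelta(seconds=...)` added to the 1960 epoch.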

Common Structures:

  • Variables: Columns with names, types, and metadata
  • Observations: Rows representing data records
  • Value Labels: Coded categorical data with text descriptions
  • Variable Labels: Descriptive names for variables
  • Formats: Display formatting for output
  • Weights: Case weighting for analysis
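
To make the relationship between coded values, value labels, and user-defined missing codes concrete, here is a small sketch that decodes one column. It is an illustration under assumed inputs, not a description of how DataMeans processes files; the function name and the sample codes are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Set


def apply_value_labels(
    codes: Iterable[Optional[Hashable]],
    labels: Dict[Hashable, str],
    missing: Optional[Set[Hashable]] = None,
) -> List[Optional[object]]:
    """Replace coded values with their labels and map missing codes to None."""
    missing = missing or set()
    decoded: List[Optional[object]] = []
    for code in codes:
        if code is None or code in missing:
            decoded.append(None)  # system-missing or user-defined missing
        else:
            decoded.append(labels.get(code, code))  # keep unlabeled codes as-is
    return decoded


# Hypothetical column: 1 = Male, 2 = Female, 9 = user-defined missing
sex = apply_value_labels([1, 2, 9, 1], {1: "Male", 2: "Female"}, missing={9})
```

Leaving unlabeled codes unchanged is a choice made for this sketch; it keeps values such as ages or counts intact when only a few codes carry labels.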

Version Differences

| Software | Version | Year | Key Format Changes | Compatibility |
| --- | --- | --- | --- | --- |
| SPSS | 12.0 | 2004 | Variable names up to 64 bytes | Long names stored in extension record; 8-byte short names retained |
| SPSS | 16.0 | 2007 | Unicode (UTF-8) mode | Unicode files unreadable before version 16 |
| SAS | 7 | 1998 | New .sas7bdat format; 32-character variable names | Replaces Version 6 data formats |
| SAS | 9.4 | 2013 | Extended attributes for data sets | Data sets with extended attributes unreadable before 9.4 |
| Stata | 8 | 2003 | New format 113; extended missing values (.a-.z) | Not readable by earlier Stata versions |
| Stata | 13 | 2013 | New format 117; strL strings up to 2 billion bytes | Requires Stata 13 or later |
| Stata | 14 | 2015 | New format 118; UTF-8 string encoding | Requires Stata 14 or later |
| Stata | 15 | 2017 | New format 119; supports datasets with more than 32,767 variables | Stata/SE cannot read format 119; use only when variable count exceeds format 118 capacity |

Compatibility Notes:

  • Newer SPSS versions read .sav files written by earlier versions
  • SAS .sas7bdat requires SAS 7 or later
  • Older Stata versions cannot read .dta files written in newer formats
  • All formats include version information in headers
  • Third-party tools may have limited version support

To learn how to use this format with DataMeans, see the User Guide.