Supported Formats
- SPSS: .sav files (IBM SPSS Statistics)
- Stata: .dta files (Versions 8-18)
- SAS Transport: .xpt files (V5/V8)
- SAS Native: .sas7bdat files (uncompressed or RLE)
- Upload a single file or ZIP with multiple files
- Note: Encrypted files and .zsav/.por not supported
Statistical software databases store research and analytical data with specialized metadata like variable labels, missing value codes, and data weights.
What You Can Upload
- .sav files (SPSS)
- .dta files (Stata)
- .sas7bdat files (SAS datasets)
- .xpt files (SAS transport format)
- ZIP archives with multiple files
What You Get Out
DataMeans extracts your data into multiple modern formats:
| Output | Description |
|---|---|
| csv/{TableName}.csv | One CSV file per table with all row data |
| xlsx/{TableName}.xlsx | Excel workbook per table |
| xls/{TableName}.xls | Legacy Excel format per table |
| json/{TableName}.json | JSON array of records per table |
| json/{TableName}.jsonl | Newline-delimited JSON (streaming-friendly) |
| postgres.sql | PostgreSQL CREATE TABLE + INSERT statements |
| schema/schema-graph.json | Relationship graph for visualization |
| schema/er-model.json | ER model for diagram tools |
| report.json | Structured extraction report |
| report.md | Human-readable extraction summary |
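The two JSON outputs differ in how they are consumed: the .json file is a single array parsed in one step, while the .jsonl file can be read one record at a time. The following is a minimal Python sketch of that difference. The sample records, the table name Survey, the use of null for a missing value, and the helper name iter_jsonl are all assumptions made for illustration, not a description of DataMeans' exact output.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one record per non-empty line of a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical content of json/Survey.json (one JSON array).
SAMPLE_JSON = '[{"id": 1, "age": 34}, {"id": 2, "age": null}]'
# Hypothetical content of json/Survey.jsonl (one JSON object per line).
SAMPLE_JSONL = '{"id": 1, "age": 34}\n{"id": 2, "age": null}\n'

# The array form is parsed in a single call and held in memory as a list.
array_records = json.loads(SAMPLE_JSON)

# The JSONL form is parsed line by line, so a large file never has to be
# loaded in full; list() is used here only to compare the two results.
stream_records = list(iter_jsonl(io.StringIO(SAMPLE_JSONL)))
```

With a real file, the io.StringIO object would be replaced by an open file handle, for example open("json/Survey.jsonl", encoding="utf-8").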
How to Export / Obtain Files
No special export needed - upload files directly:
- Locate your statistical data file
- Upload as-is or in a ZIP archive
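If you have several files to upload at once, one way to build the ZIP archive is with Python's standard zipfile module. This is only a sketch: the function name bundle_for_upload is hypothetical, and storing the files in a flat layout (no subfolders) is an assumption rather than a documented requirement.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def bundle_for_upload(paths: Iterable[str], zip_path: str) -> List[str]:
    """Write the given files into one ZIP archive and return the stored names."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for p in paths:
            path = Path(p)
            # Store only the file name so the archive has a flat layout
            # (an assumption; nested folders are simply not used here).
            archive.write(path, arcname=path.name)
        return archive.namelist()
```

A call such as bundle_for_upload(["wave1.sav", "wave2.dta"], "upload.zip") would produce an archive containing both files under their original names.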
Alternative exports from statistical software:
- SPSS: File > Save As > SPSS Statistics (.sav)
- Stata: export delimited using "filename.csv" (or save the dataset as .dta)
- SAS: PROC EXPORT or save as .sas7bdat
Supported Features
- Complete record extraction
- Variable label preservation
- Value labels (factor levels)
- Missing value codes
- Data type detection
- Survey weights (documented)
Known Limitations
- Complex survey designs documented but may need review
- Some version-specific format variations
- Very large datasets may need chunked processing
Last updated: January 2026
Overview
Statistical data files are proprietary formats used by statistical analysis software packages like SPSS, Stata, and SAS. These formats store datasets with variables, observations, and metadata optimized for statistical computing. They support complex data types, missing values, and metadata structures designed for quantitative research, data analysis, and statistical modeling in social sciences, economics, and scientific research.
History and Background
- 1960s: Early statistical software development (SPSS first released in 1968).
- 1970s: SPSS Inc. incorporated (1975); SAS Institute founded (1976).
- 1980s: Stata first released for MS-DOS (1985); SAS rewritten in C (1985), enabling PC versions.
- 1990s: Stata Corporation formed (1993); SAS 7 introduces the .sas7bdat format (1998).
- 2000s: Open-source alternatives emerge (R 1.0.0 released in 2000), but proprietary formats persist.
- 2010s: Stata 13 adds strL long strings (2013); SAS 9.4 adds extended attributes (2013).
- Present: Open-source libraries such as pandas and haven read all three formats outside their native packages.
Stata:
- 1985: Stata developed by William Gould at Computing Resource Center, first released for MS-DOS.
- 1993: Company renamed Stata Corporation (now StataCorp) and relocated to College Station, Texas.
- 2005: Stata 9 introduces the Mata matrix programming language.
- 2009: Stata 11 adds factor variables and multiple imputation.
- 2015: Stata 14 introduces Unicode support and Bayesian analysis.
- 2017: Stata 15 introduces format 119, enabling datasets with more than 32,767 variables (a capacity available only in Stata/MP).
- 2019: Stata 16 adds Python integration.
File Format Specifications
SPSS (.sav format):
- Proprietary binary format with SPSS-specific record structure
- File header contains metadata (case count, creation date and time, file label)
- Data stored in rectangular format as fixed 8-byte elements
- Variable dictionary includes names, types, labels, and missing value definitions
- File signature at offset 0: $FL2 (uncompressed or bytecode-compressed files) or $FL3 (ZLIB-compressed files)
- Optional bytecode or ZLIB compression, indicated by a header flag
- File extension: .sav
- Maximum variables: 2,147,483,647 (32,767 before version 10)
- Maximum cases: 2,147,483,647
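The four-byte signature at offset 0 makes it possible to check a .sav file before attempting a full parse. The Python sketch below classifies a file from its first bytes using only the two signatures listed above; the function name sniff_sav_signature and the returned strings are hypothetical, and the check says nothing about whether the rest of the file is valid.

```python
def sniff_sav_signature(first_bytes: bytes) -> str:
    """Classify a .sav file from the four bytes at offset 0."""
    magic = first_bytes[:4]
    if magic == b"$FL2":
        # $FL2 covers both uncompressed and bytecode-compressed files;
        # telling them apart requires reading the header's compression flag.
        return "uncompressed or bytecode-compressed"
    if magic == b"$FL3":
        return "ZLIB-compressed"
    return "not a recognized .sav signature"
```

In practice the bytes would come from the file itself, for example by opening it in binary mode and passing the result of read(4).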
SAS (.sas7bdat format):
- Proprietary binary format introduced in SAS 7
- File consists of a header (dataset name, timestamps, page size and count) followed by data pages
- Optional RLE (COMPRESS=CHAR) or RDC (COMPRESS=BINARY) compression
- Column names, labels, and display formats stored in metadata subheaders
- File extension: .sas7bdat
- Maximum observations: Limited by disk space
Stata (.dta format):
- Binary format with version-specific structure
- Header contains dataset metadata (variables, observations, labels)
- Data stored observation by observation (row-major order); formats 114 and 115 (Stata 10–12) limit fixed strings to 244 bytes
- Supports value labels and variable characteristics
- File extension: .dta (format version recorded in the file header)
- Maximum variables: Version-dependent (up to 120,000 in Stata/MP)
- Maximum observations: Limited by memory
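Because the format version is recorded at the start of the file, it can be read without parsing the dataset. The Python sketch below relies on the layout given in Stata's published .dta format description: formats 117 and later open with a tagged header containing a release element, while formats 113 to 115 store the format number in the first byte. The function name sniff_dta_release is hypothetical, and formats older than 113 are deliberately left out because they fall outside the supported range.

```python
import re
from typing import Optional

# Formats 117+ begin with this XML-like tag sequence.
_RELEASE_TAG = re.compile(rb"<stata_dta><header><release>(\d{3})</release>")

# Older formats in the supported range keep the format number in byte 0.
_LEGACY_FORMATS = (113, 114, 115)


def sniff_dta_release(head: bytes) -> Optional[int]:
    """Return the .dta format number found at the start of a file, or None."""
    match = _RELEASE_TAG.match(head)
    if match:
        return int(match.group(1))
    if head and head[0] in _LEGACY_FORMATS:
        return head[0]
    return None
```

Reading the first 64 bytes or so of the file is enough input for this check. A return value of None means only that the header was not recognized by this sketch.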
Data Types and Structures
| Type | SPSS | SAS | Stata | Description |
|---|---|---|---|---|
| Numeric | 8-byte float | Float, 3-8 bytes (default 8) | 4-byte float, 8-byte double | Real numbers with decimal precision |
| Integer | Stored as 8-byte float | Stored as floating point | 1-, 2-, and 4-byte (byte, int, long) | Whole numbers |
| String | Fixed width, up to 32,767 bytes | Fixed width, up to 32,767 bytes | str1-str2045 fixed; strL up to 2 GB | Text data |
| Date | Seconds since October 14, 1582 | Days since January 1, 1960 | Days since January 1, 1960 | Date values |
| Time | Seconds | Seconds (datetime: seconds since January 1, 1960) | Milliseconds since January 1, 1960 (datetime) | Time values |
| Boolean | 0/1 numeric | 0/1 numeric | 0/1 numeric | True/false values (no native type) |
| Missing | System and user-defined | ., ._, and .A-.Z | . and .a-.z | Null/unknown values |
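The Date and Time rows matter in practice because the same calendar date is stored as a very different number in each package. The following Python sketch converts raw stored values to standard datetime objects using the epochs from the table. The function names are hypothetical, and the sketch ignores display formats, time zones, and missing-value codes, all of which a real reader has to handle.

```python
from datetime import date, datetime, timedelta

# SPSS counts seconds from the day before the Gregorian calendar took effect.
SPSS_EPOCH = datetime(1582, 10, 14)
# SAS and Stata both count from the start of 1960.
SAS_STATA_EPOCH = datetime(1960, 1, 1)


def spss_seconds_to_datetime(seconds: float) -> datetime:
    """Convert an SPSS date or datetime value (seconds since 1582-10-14)."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


def days_since_1960_to_date(days: int) -> date:
    """Convert a SAS or Stata date value (days since 1960-01-01)."""
    return (SAS_STATA_EPOCH + timedelta(days=days)).date()


def sas_seconds_to_datetime(seconds: float) -> datetime:
    """Convert a SAS datetime value (seconds since 1960-01-01)."""
    return SAS_STATA_EPOCH + timedelta(seconds=seconds)


def stata_ms_to_datetime(milliseconds: float) -> datetime:
    """Convert a Stata datetime value (milliseconds since 1960-01-01)."""
    return SAS_STATA_EPOCH + timedelta(milliseconds=milliseconds)
```

Note that a raw number alone does not reveal its unit: the variable's display format in the source file is what indicates whether a value is a date, a time, or a datetime.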
Common Structures:
- Variables: Columns with names, types, and metadata
- Observations: Rows representing data records
- Value Labels: Coded categorical data with text descriptions
- Variable Labels: Descriptive names for variables
- Formats: Display formatting for output
- Weights: Case weighting for analysis
Version Differences
| Software | Version | Year | Key Format Changes | Compatibility |
|---|---|---|---|---|
| SPSS | 12.0 | 2004 | Variable names up to 64 bytes | Long names stored in extension record; 8-byte short names retained |
| SPSS | 16.0 | 2007 | Unicode (UTF-8) mode | Unicode files unreadable before version 16 |
| SAS | 7 | 1998 | New .sas7bdat format; 32-character variable names | Replaces Version 6 data formats |
| SAS | 9.4 | 2013 | Extended attributes for data sets | Data sets with extended attributes unreadable before 9.4 |
| Stata | 8 | 2003 | New format 113; extended missing values (.a-.z) | Not readable by earlier Stata versions |
| Stata | 13 | 2013 | New format 117; strL strings up to 2 billion bytes | Requires Stata 13 or later |
| Stata | 14 | 2015 | New format 118; UTF-8 string encoding | Requires Stata 14 or later |
| Stata | 15 | 2017 | New format 119; supports datasets with more than 32,767 variables | Stata/SE cannot read format 119; use only when variable count exceeds format 118 capacity |
Compatibility Notes:
- Newer SPSS versions read .sav files written by earlier versions
- SAS .sas7bdat requires SAS 7 or later
- Older Stata versions cannot read .dta files written in newer formats
- All formats include version information in headers
- Third-party tools may have limited version support
Technical References
- PSPP Developers Guide: System File Format
- Stata Help: Description of the .dta File Format
- SAS7BDAT Database Binary Format
- SAS 9.4 Data Set Options Reference: COMPRESS=
- Stata Manual: Datetime Values and Variables
To learn how to use this format with DataMeans, see the User Guide.