CSV / Delimited Text

Comma- or tab-separated values

Supported

How to Prepare Your Files

Prepare one or more .csv or .tsv files
If you have multiple files, create a ZIP archive containing them
Make sure the first row of each file contains column names
Upload the .csv/.tsv file or ZIP archive below
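
For several files, the archive step can be scripted. The following is a minimal sketch using Python's standard zipfile module; the function name bundle_delimited_files and the paths are hypothetical, and it assumes all files sit in one folder.

```python
import zipfile
from pathlib import Path


def bundle_delimited_files(source_dir: str, archive_path: str) -> list[str]:
    """Collect the .csv and .tsv files in source_dir into one ZIP archive."""
    source = Path(source_dir)
    files = sorted(
        p for p in source.iterdir()
        if p.is_file() and p.suffix.lower() in {".csv", ".tsv"}
    )
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store each file at the archive root under its own file name.
            archive.write(path, arcname=path.name)
    return [p.name for p in files]
```

Calling bundle_delimited_files("exports", "upload.zip") would return the names of the files it added, which is a quick way to confirm nothing was missed.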

Guide

CSV (Comma-Separated Values) and other delimited text files are simple, widely used formats for tabular data.

What You Can Upload

.csv files (comma-separated)
.tsv files (tab-separated)
Other delimited text files (semicolon, pipe, etc.)
ZIP archives with multiple files

What You Get Out

DataMeans extracts your data into several output formats:

Output	Description
`csv/{TableName}.csv`	One CSV file per table with all row data
`xlsx/{TableName}.xlsx`	Excel workbook per table
`xls/{TableName}.xls`	Legacy Excel format per table
`json/{TableName}.json`	JSON array of records per table
`json/{TableName}.jsonl`	Newline-delimited JSON (streaming-friendly)
`postgres.sql`	PostgreSQL CREATE TABLE + INSERT statements
`schema/schema-graph.json`	Relationship graph for visualization
`schema/er-model.json`	ER model for diagram tools
`report.json`	Structured extraction report
`report.md`	Human-readable extraction summary

How to Export / Obtain Files

Most applications support CSV export:

Excel: File > Save As > CSV
Google Sheets: File > Download > Comma-separated values (.csv)
Databases: Export to CSV option
Applications: Look for "Export Data" feature

Supported Features

Multiple delimiter detection (comma, tab, semicolon, pipe)
Character encoding detection (UTF-8, Latin-1, Windows-1252)
Automatic header row detection
Type inference (numbers, dates, text)
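
To illustrate what delimiter and encoding detection involve, here is a minimal sketch in Python. It is not DataMeans' actual detection logic: the function names, the fallback order of encodings, and the "same field count on every sampled line" heuristic are all assumptions chosen for the example.

```python
import csv

CANDIDATE_DELIMITERS = [",", "\t", ";", "|"]


def decode_bytes(raw: bytes) -> tuple[str, str]:
    """Try UTF-8 (with or without BOM), then Windows-1252, then Latin-1."""
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this final step cannot fail.
    return raw.decode("latin-1"), "latin-1"


def guess_delimiter(text: str, sample_lines: int = 20) -> str:
    """Pick the candidate that splits every sampled line into the same number of fields."""
    lines = [line for line in text.splitlines()[:sample_lines] if line.strip()]
    best, best_fields = ",", 1
    for delimiter in CANDIDATE_DELIMITERS:
        widths = {len(row) for row in csv.reader(lines, delimiter=delimiter)}
        if len(widths) == 1:
            fields = widths.pop()
            # Prefer the delimiter that yields the most columns.
            if fields > best_fields:
                best, best_fields = delimiter, fields
    return best
```

The heuristic samples physical lines, so quoted fields that span several lines can confuse it; a production detector would need to handle that case.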

Known Limitations

No built-in type system (types inferred from data)
No nested/hierarchical data support
Quote escaping varies between implementations
Date format detection may need verification

Best Practices

Always include a header row
Use UTF-8 encoding when possible
Quote fields containing delimiters or newlines
Use consistent date formats within columns
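
These practices can be applied when producing a file programmatically. The sketch below uses Python's standard csv module; the function name write_table, the SAMPLE_ROWS data, and the choice of ISO 8601 dates are assumptions made for illustration.

```python
import csv
from datetime import date

# Hypothetical sample data: one field contains a comma, another a line break.
SAMPLE_ROWS = [
    {"id": 1, "name": "Lovelace, Ada", "joined": date(2023, 12, 25),
     "note": "line one\nline two"},
    {"id": 2, "name": "Hopper", "joined": date(2024, 1, 5), "note": ""},
]


def write_table(path: str, fieldnames: list[str], rows: list[dict]) -> None:
    """Write rows as UTF-8 CSV with a header row and ISO 8601 dates."""
    # newline="" lets the csv module control record line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            # Render every date the same way so the column stays consistent.
            writer.writerow({
                key: value.isoformat() if isinstance(value, date) else value
                for key, value in row.items()
            })
```

The writer's default quoting mode adds double quotes only where a field contains the delimiter, a quote, or a line break, so no manual quoting is needed.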

Last updated: January 2026

Technical reference

Overview

CSV (Comma-Separated Values) is a simple text format for tabular data exchange that uses delimiters to separate fields and line breaks to separate records. It is documented by RFC 4180, which also registers the text/csv MIME type. CSV files are widely used for data import and export between applications because they are simple and almost universally supported. Although CSV is not a database format, it serves as a common interchange format for relational data; delimiters, quoting, and escaping vary across implementations.

History and Background

1972: IBM's Fortran (level H extended) compiler under OS/360 supports list-directed input/output with commas between values.
1978: ANSI approves FORTRAN 77, which formally standardises list-directed I/O using * as the format specifier; commas or spaces separate values, and quoted character strings may contain commas.
1983: The manual for the Osborne Executive computer documents the CSV quoting convention of the bundled SuperCalc spreadsheet.
2005: RFC 4180 published through the Internet Engineering Task Force (IETF), documenting the common format and registering the text/csv MIME type.
2013: W3C charters the CSV on the Web Working Group on 10 December; the group closes on 29 February 2016 after delivering its recommendations.
2014: RFC 7111 defines URI fragment identifiers (row, column, cell) for the text/csv media type.
2015: W3C publishes the CSV on the Web recommendations, including the Model for Tabular Data and Metadata on the Web, in December.
Present: Ubiquitous format supported by virtually all major applications and programming languages.

File Format Specifications

CSV files are plain text with specific structural rules.

Basic Structure:

Records: Lines separated by line breaks (CRLF per RFC 4180; LF widely accepted in practice)
Final record: The last record in the file may or may not have an ending line break (RFC 4180)
Fields: Values within records separated by delimiters (comma by default)
Headers: Optional first record with column names
Encoding: UTF-8 common today (RFC 4180 cites US-ASCII as common usage); a BOM is sometimes used to signal Unicode

Delimiters and Quoting:

Field delimiter: Comma (,), semicolon (;), tab (\t), pipe (|)
Quote character: Double quote (") for fields containing delimiters, quotes, or line breaks

Field Encapsulation:

Fields containing commas, quotes, or line breaks must be enclosed in double quotes
Double quotes within fields are escaped by doubling ("")
Leading/trailing whitespace preserved unless trimmed by parser
The W3C tabular data model strips leading and trailing whitespace when parsing cells whose datatype is not a string type
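
The encapsulation rules can be seen by round-tripping a single record. The sketch below relies on the default dialect of Python's csv module, which quotes only when needed and escapes quotes by doubling; encode_record and decode_record are hypothetical helper names.

```python
import csv
import io


def encode_record(fields: list[str]) -> str:
    """Serialize one record using the csv module's default dialect."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue()


def decode_record(line: str) -> list[str]:
    """Parse one record; whitespace around fields is kept as-is."""
    return next(csv.reader(io.StringIO(line, newline="")))


encoded = encode_record(["plain", "a,b", 'say "hi"', "two\nlines"])
# encoded == 'plain,"a,b","say ""hi""","two\nlines"\r\n'
```

Decoding 'a, b ,c' with decode_record keeps the spaces around b, which matches the "preserved unless trimmed by parser" behaviour described above.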

Variations:

Delimiter: Tab (\t) for TSV, semicolon (;) in locales where comma is the decimal separator, pipe (|)
Quote character: Single quotes (') sometimes used
Escape sequences: Backslash escaping in some implementations
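
Each variation corresponds to a parser setting. As a sketch, Python's csv module exposes them as the delimiter, quotechar, doublequote, and escapechar options; the parse helper below is hypothetical and simply forwards those options.

```python
import csv
import io


def parse(text: str, **dialect_options) -> list[list[str]]:
    """Parse delimited text with the given csv dialect options."""
    return list(csv.reader(io.StringIO(text, newline=""), **dialect_options))


# Semicolon delimiter, as exported in locales that use a decimal comma.
semicolon_rows = parse("name;price\nTea;3,50\n", delimiter=";")

# Tab delimiter (TSV).
tab_rows = parse("id\tnote\n1\thello\n", delimiter="\t")

# Single quotes as the quote character.
single_quote_rows = parse("'a,b',c\n", quotechar="'")

# Backslash escaping instead of doubled quotes.
backslash_rows = parse('"say \\"hi\\"",x\n', doublequote=False, escapechar="\\")
```

Note that the decimal comma in "3,50" is still text after parsing; converting it to a number is a separate, locale-aware step.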

Key Specifications:

Maximum field length: No limit in the format itself; individual parsers and applications may impose one
Maximum record length: No limit in the format itself
Maximum file size: Limited by filesystem and application
Character encoding: UTF-8 recommended, but varies; the W3C tabular data model further advises Unicode Normalization Form C
MIME type parameters: charset selects the character set; header takes the values present or absent (RFC 4180); when charset is absent, UTF-8 should be assumed (per the IANA text/csv registration, last updated 2014-01-17)
CSVW datatype aliases: number maps to xsd:double, binary maps to xsd:base64Binary, datetime maps to xsd:dateTime, and any maps to xsd:anyAtomicType (W3C CSVW Metadata Vocabulary)
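
The MIME type parameters can be read from a Content-Type header value. This is a minimal sketch using the standard library's email.message.Message for parameter parsing; the helper name csv_media_type_params is hypothetical, and the UTF-8 fallback follows the default described above.

```python
from email.message import Message


def csv_media_type_params(content_type: str) -> dict:
    """Extract the charset and header parameters from a Content-Type value."""
    message = Message()
    message["Content-Type"] = content_type
    return {
        "media_type": message.get_content_type(),
        # Fall back to UTF-8 when no charset parameter is given.
        "charset": (message.get_param("charset") or "utf-8").lower(),
        # "present", "absent", or None when the parameter is omitted.
        "header": message.get_param("header"),
    }
```

For example, "text/csv; header=present" yields charset "utf-8" and header "present".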

Data Types and Structures

CSV has no native data types; every field is stored as text, and the importing application decides how to interpret it.

Interpreted Type	Example	Notes
String/Text	"Hello World"	Default type for all fields
Integer	123	Parsed by applications
Float/Decimal	123.45	Locale-dependent decimal separator
Date	2023-12-25	Various formats (ISO, US, European)
Time	14:30:00	`xsd:time` under the W3C tabular data model
Datetime	2023-12-25T14:30:00	`xsd:dateTime` under the W3C tabular data model
Duration	PT2H30M	`xsd:duration` under the W3C tabular data model
Boolean	true, false, 1, 0	Application-specific
Null/Empty	"" or missing	Empty fields are usually read as null; CSV itself cannot distinguish null from an empty string
anyURI	https://example.com	`xsd:anyURI` under the W3C CSVW metadata vocabulary
Binary (hex)	48656C6C6F	`xsd:hexBinary` under the W3C CSVW metadata vocabulary
Binary (base64)	SGVsbG8=	`xsd:base64Binary`; aliased as `binary` in the W3C CSVW metadata vocabulary
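
Because typing is left to the importer, each application applies its own inference rules. The sketch below shows one possible set of rules in Python; infer_value is a hypothetical helper, and its choices (only true/false count as booleans, only ISO 8601 dates and times are recognised, empty means null) are assumptions rather than a standard.

```python
from datetime import date, datetime, time


def infer_value(raw: str):
    """Convert one CSV field to a Python value, falling back to the original text."""
    text = raw.strip()
    if text == "":
        return None  # treat an empty field as null
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    # Try the narrowest interpretation first; 1 and 0 therefore stay integers.
    parsers = (int, float, date.fromisoformat,
               datetime.fromisoformat, time.fromisoformat)
    for parser in parsers:
        try:
            return parser(text)
        except ValueError:
            continue
    return raw
```

Values such as the duration PT2H30M match none of these parsers and are returned unchanged as text.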

File Structure:

Optional header row with column names
Data rows with equal number of fields
Trailing empty lines ignored
Comments not standardized (some use # prefix)
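
A loader can enforce these structural expectations explicitly. The following is a minimal sketch in Python; load_rows is a hypothetical helper, and treating a leading # as a comment marker is an assumption, since comments are not standardized.

```python
import csv
import io


def load_rows(text: str, comment_prefix: str = "#"):
    """Return (header, rows, problems); problems lists (line_number, field_count)."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header, rows, problems = None, [], []
    for row in reader:
        if not row:
            continue  # blank line, including trailing empty lines
        if row[0].startswith(comment_prefix):
            continue  # comment line (non-standard convention)
        if header is None:
            header = row
            continue
        if len(row) != len(header):
            # Record the mismatch instead of guessing how to repair it.
            problems.append((reader.line_num, len(row)))
            continue
        rows.append(row)
    return header, rows, problems
```

One caveat of this approach is that a data row whose first field genuinely starts with # would also be skipped.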

Version Differences

Variant	Year	Key Characteristics	Compatibility
RFC 4180	2005	Comma delimiter, CRLF record breaks, optional header line, quotes escaped by doubling	Registers the text/csv MIME type with IANA
TSV (tab-separated values)	1993	Tab delimiter; header line required; tabs not allowed within fields, so no quoting mechanism	IANA media type text/tab-separated-values
RFC 7111	2014	URI fragment identifiers for rows, columns, and cells (row=, col=, cell=)	Updates the text/csv media type registration of RFC 4180
W3C CSV on the Web	2015	JSON metadata describing dialect, column datatypes, and annotations for CSV tables	W3C Recommendation (17 December 2015); parsing model based on RFC 4180
CSV Schema (The National Archives, UK)	2014	Textual language defining the structure, datatypes, and validation rules for CSV data; version 1.0 published 28 August 2014, revised as version 1.1	No format change — external schema used to validate `.csv` files
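
The TSV row in the table is the simplest variant to implement, because it has no quoting. As a sketch in Python, a writer only needs to join values with tabs and reject values it cannot represent; to_tsv is a hypothetical helper, and rejecting line breaks as well as tabs is an assumption that follows from the lack of a quoting mechanism.

```python
def to_tsv(header: list, rows: list[list]) -> str:
    """Serialize a header and rows as tab-separated text."""

    def check(value) -> str:
        text = str(value)
        # Without quoting, a tab or line break inside a value cannot be encoded.
        if "\t" in text or "\n" in text or "\r" in text:
            raise ValueError(f"value cannot be represented in TSV: {text!r}")
        return text

    lines = ["\t".join(check(value) for value in header)]
    for row in rows:
        lines.append("\t".join(check(value) for value in row))
    return "\n".join(lines) + "\n"
```

Data that may contain tabs is better written as CSV, where quoting can protect the value.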

Compatibility Notes:

No formal versioning; dialects vary by application
Character encoding mismatches cause garbled text (UTF-8 vs legacy "ANSI" code pages such as Windows-1252)
Line endings differ between Windows (CRLF) and Unix (LF)
Quote handling varies (some tools quote every field, others only when required)
Field count mismatches between rows are a common error
Spreadsheet grid limits constrain imports: an Excel worksheet holds at most 1,048,576 rows by 16,384 columns, with up to 32,767 characters per cell
CSV injection (formula injection): fields beginning with =, +, -, @, tab, carriage return, or line feed may be interpreted as formulas by spreadsheet software
TSV constraint: the IANA text/tab-separated-values registration explicitly prohibits tab characters within field values, so TSV has no quoting mechanism for tabs; CSV is preferred when field data may contain tab characters
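
One common mitigation for CSV injection is to prefix risky values with a single quote before export so that spreadsheet software treats them as text. The sketch below shows that approach in Python; neutralize_formula is a hypothetical helper, and the trigger list simply mirrors the characters named above.

```python
FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r", "\n")


def neutralize_formula(value: str) -> str:
    """Prefix values that a spreadsheet might evaluate as a formula."""
    if value.startswith(FORMULA_TRIGGERS):
        # A leading apostrophe marks the cell content as text.
        return "'" + value
    return value
```

This sketch also prefixes legitimate negative numbers such as -5, so it is best applied only to text columns that will be opened in a spreadsheet.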

Technical References

To learn how to use this format with DataMeans, see the User Guide.