CSV / Delimited Text

Comma or tab-separated values
Supported

How to Prepare Your Files

  1. Prepare one or more .csv or .tsv files
  2. If you have multiple files, create a ZIP archive containing them
  3. First row of each file should contain column names
  4. Upload the .csv/.tsv file or ZIP archive below
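
Step 2 can be scripted. The following is a minimal sketch, assuming a flat archive (no subfolders) is acceptable; the helper name bundle_delimited_files and the sample file names are hypothetical and used only for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_delimited_files(source_dir: Path, archive_path: Path) -> list[str]:
    """Zip every .csv and .tsv file found directly in source_dir."""
    members = sorted(
        p for p in source_dir.iterdir() if p.suffix.lower() in {".csv", ".tsv"}
    )
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in members:
            # Store only the file name so the archive has no nested folders
            # (a flat layout is an assumption of this sketch).
            zf.write(path, arcname=path.name)
    return [p.name for p in members]


# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "export"
    src.mkdir()
    (src / "customers.csv").write_text("id,name\n1,Ana\n", encoding="utf-8")
    (src / "orders.tsv").write_text("id\ttotal\n1\t9.50\n", encoding="utf-8")
    (src / "notes.txt").write_text("not tabular\n", encoding="utf-8")

    archive = Path(tmp) / "upload.zip"
    bundled = bundle_delimited_files(src, archive)
    with zipfile.ZipFile(archive) as zf:
        archived_names = sorted(zf.namelist())
```

Only the two delimited files end up in the archive; the unrelated notes.txt is skipped.
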
Guide

CSV (Comma-Separated Values) and other delimited text files are simple, widely used formats for tabular data.

What You Can Upload

  • .csv files (comma-separated)
  • .tsv files (tab-separated)
  • Other delimited text files (semicolon, pipe, etc.)
  • ZIP archives with multiple files

What You Get Out

DataMeans extracts your data into multiple modern formats:

| Output | Description |
| --- | --- |
| csv/{TableName}.csv | One CSV file per table with all row data |
| xlsx/{TableName}.xlsx | Excel workbook per table |
| xls/{TableName}.xls | Legacy Excel format per table |
| json/{TableName}.json | JSON array of records per table |
| json/{TableName}.jsonl | Newline-delimited JSON (streaming-friendly) |
| postgres.sql | PostgreSQL CREATE TABLE + INSERT statements |
| schema/schema-graph.json | Relationship graph for visualization |
| schema/er-model.json | ER model for diagram tools |
| report.json | Structured extraction report |
| report.md | Human-readable extraction summary |
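
To illustrate the difference between the two JSON outputs, here is a small sketch using only the Python standard library. It is not DataMeans's implementation: the sample data is made up, and values stay as strings because no type inference is applied here.

```python
import csv
import io
import json

csv_text = "id,name\n1,Ana\n2,Bo\n"
records = list(csv.DictReader(io.StringIO(csv_text)))

# JSON array of records: a single document that holds every row.
as_json = json.dumps(records)

# Newline-delimited JSON: one self-contained object per line,
# so a consumer can process rows without loading the whole file.
as_jsonl = "\n".join(json.dumps(record) for record in records)
```
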

How to Export / Obtain Files

Most applications support CSV export:

  • Excel/Sheets: File > Save As > CSV
  • Databases: Export to CSV option
  • Applications: Look for "Export Data" feature
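
When a database has no export button, a short script can do the same job. The sketch below assumes an SQLite database and a hypothetical customers table; it writes the column names as the first row, followed by the data.

```python
import csv
import io
import sqlite3

# In-memory database with a made-up table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Li, Wei")]
)

cursor = conn.execute("SELECT id, name FROM customers ORDER BY id")
buffer = io.StringIO()
writer = csv.writer(buffer)
# cursor.description holds one tuple per column; item 0 is the column name.
writer.writerow([column[0] for column in cursor.description])
writer.writerows(cursor)
conn.close()

exported = buffer.getvalue()
```

The name that contains a comma is quoted automatically, so it stays in one field.
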

Supported Features

  • Multiple delimiter detection (comma, tab, semicolon, pipe)
  • Character encoding detection (UTF-8, Latin1, Windows-1252)
  • Automatic header row detection
  • Type inference (numbers, dates, text)
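
Delimiter and header detection can be approximated with Python's csv.Sniffer. This is a sketch of the general technique on made-up data, not a description of how DataMeans performs detection; Sniffer relies on heuristics and can guess wrong on small or irregular samples.

```python
import csv

sample = "name;age;city\nAlice;30;Paris\nBob;25;Berlin\n"

sniffer = csv.Sniffer()
# Restrict the candidates to the four delimiters listed above.
dialect = sniffer.sniff(sample, delimiters=",\t;|")
# has_header compares the first row against the types seen in later rows.
has_header = sniffer.has_header(sample)
```
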

Known Limitations

  • No built-in type system (types inferred from data)
  • No nested/hierarchical data support
  • Quote escaping varies between implementations
  • Date format detection may need verification

Best Practices

  • Always include a header row
  • Use UTF-8 encoding when possible
  • Quote fields containing delimiters or newlines
  • Use consistent date formats within columns
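
The practices above can be sketched with Python's csv module, which quotes fields when needed. The file name and sample rows are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    {"id": 1, "note": "plain text", "created": "2023-12-25"},
    {"id": 2, "note": "has, a comma", "created": "2023-12-26"},
    {"id": 3, "note": "spans\ntwo lines", "created": "2023-12-27"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.csv"
    # UTF-8 encoding; newline="" lets the csv module control line endings.
    with path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "note", "created"])
        writer.writeheader()  # header row first
        writer.writerows(rows)  # fields with commas or newlines get quoted

    # Read the file back to confirm the awkward fields survive intact.
    with path.open(encoding="utf-8", newline="") as f:
        round_trip = list(csv.DictReader(f))
```
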

Last updated: January 2026

Technical reference

Overview

CSV (Comma-Separated Values) is a simple text format for tabular data exchange that uses delimiters to separate fields and line breaks to separate records. RFC 4180 documents the common format and registers the text/csv MIME type. CSV files are widely used for data import and export between applications because of their simplicity and near-universal support. Although CSV is not a database format, it serves as a common interchange format for relational data, and delimiters, quoting, and escaping vary across implementations.

History and Background

  • 1972: IBM's Fortran (level H extended) compiler under OS/360 supports list-directed input/output with commas between values.
  • 1978: ANSI approves FORTRAN 77, which formally standardises list-directed I/O using * as the format specifier; commas or spaces separate values, and quoted character strings may contain commas.
  • 1983: The manual for the Osborne Executive computer documents the CSV quoting convention of the bundled SuperCalc spreadsheet.
  • 2005: RFC 4180 published through the Internet Engineering Task Force (IETF), documenting the common format and registering the text/csv MIME type.
  • 2013: W3C charters the CSV on the Web Working Group on 10 December; the group closes on 29 February 2016 after delivering its recommendations.
  • 2014: RFC 7111 defines URI fragment identifiers (row, column, cell) for the text/csv media type.
  • 2015: W3C publishes the CSV on the Web recommendations, including the Model for Tabular Data and Metadata on the Web, in December.
  • Present: Ubiquitous format supported by virtually all major applications and programming languages.

File Format Specifications

CSV files are plain text with specific structural rules.

Basic Structure:

  • Records: Lines separated by line breaks (CRLF per RFC 4180; LF widely accepted in practice)
  • Final record: The last record in the file may or may not have an ending line break (RFC 4180)
  • Fields: Values within records separated by delimiters (comma by default)
  • Headers: Optional first record with column names
  • Encoding: UTF-8 common today (RFC 4180 cites US-ASCII as common usage); a BOM is sometimes used to signal Unicode
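
A reader that tolerates these structural variations might look like the sketch below. The Latin-1 fallback is an assumption chosen for illustration (it accepts any byte sequence, so it never fails but may mislabel other encodings); the function name read_csv_bytes is hypothetical.

```python
import csv
import io


def read_csv_bytes(data: bytes) -> list[list[str]]:
    """Decode raw bytes and parse them as comma-separated records."""
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Fallback chosen for this sketch only.
        text = data.decode("latin-1")
    # newline="" passes CRLF and LF through so the csv module can handle both.
    return list(csv.reader(io.StringIO(text, newline="")))


# BOM, CRLF record breaks, and no line break after the final record.
with_bom = b"\xef\xbb\xbfid,name\r\n1,Ana\r\n2,Bo"
# LF record breaks in a legacy single-byte encoding.
legacy = "id,name\n1,caf\u00e9\n".encode("latin-1")
```
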

Delimiters and Quoting:

  • Field delimiter: Comma (,), semicolon (;), tab (\t), pipe (|)
  • Quote character: Double quote (") for fields containing delimiters, quotes, or line breaks

Field Encapsulation:

  • Fields containing commas, quotes, or line breaks must be enclosed in double quotes
  • Double quotes within fields are escaped by doubling ("")
  • Leading/trailing whitespace preserved unless trimmed by parser
  • The W3C tabular data model strips leading and trailing whitespace when parsing cells whose datatype is not a string type
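
The encapsulation rules can be seen in a round trip through Python's csv module, shown here as an illustrative sketch with made-up field values.

```python
import csv
import io

fields = ['He said "hi"', "a,b", "line one\nline two", "  padded  ", "plain"]

buffer = io.StringIO()
csv.writer(buffer).writerow(fields)
# The first field is written as "He said ""hi""": enclosed in double
# quotes, with each inner quote doubled.
encoded = buffer.getvalue()

# Parsing the text restores the original values, including the embedded
# line break and the leading/trailing spaces.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))
```
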

Variations:

  • Delimiter: Tab (\t) for TSV, semicolon (;) in locales where comma is the decimal separator, pipe (|)
  • Quote character: Single quotes (') sometimes used
  • Escape sequences: Backslash escaping in some implementations
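
Parsers usually expose these variations as options. The sketch below uses Python's csv.reader on two made-up inputs: a semicolon-delimited file with single-quote quoting and decimal commas, and a pipe-delimited file that uses backslash escaping instead of quoting.

```python
import csv
import io

# Semicolon delimiter, single-quote quoting, comma as decimal separator.
semicolon_text = "name;price\n'M\u00fcller; Hans';3,50\nDupont;12,00\n"
reader = csv.reader(
    io.StringIO(semicolon_text, newline=""), delimiter=";", quotechar="'"
)
header, *rows = reader
# The parser returns strings; converting decimal commas is up to the caller.
prices = [float(price.replace(",", ".")) for _, price in rows]

# Pipe delimiter with backslash escaping and no quoting at all.
escaped_text = "id|comment\n1|left\\|right\n"
escaped_rows = list(
    csv.reader(
        io.StringIO(escaped_text, newline=""),
        delimiter="|",
        quoting=csv.QUOTE_NONE,
        escapechar="\\",
    )
)
```
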

Key Specifications:

  • Maximum field length: Unlimited (implementation-dependent)
  • Maximum record length: Unlimited
  • Maximum file size: Limited by filesystem and application
  • Character encoding: UTF-8 recommended, but varies; the W3C tabular data model further advises Unicode Normalization Form C
  • MIME type parameters: charset selects the character set; header takes the values present or absent (RFC 4180); when charset is absent, UTF-8 should be assumed (per the IANA text/csv registration, last updated 2014-01-17)
  • CSVW datatype aliases: number maps to xsd:double, binary maps to xsd:base64Binary, datetime maps to xsd:dateTime, and any maps to xsd:anyAtomicType (W3C CSVW Metadata Vocabulary)
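
As an illustration of the MIME type parameters, the following sketch reads charset and header from a Content-Type value using Python's standard email.message module. The function name csv_media_params and the "unspecified" placeholder are assumptions of this example, not part of any specification.

```python
from email.message import Message


def csv_media_params(content_type: str) -> dict[str, str]:
    """Extract the media type plus its charset and header parameters."""
    msg = Message()
    msg["Content-Type"] = content_type
    return {
        "type": msg.get_content_type(),
        # UTF-8 is assumed when the charset parameter is absent.
        "charset": msg.get_param("charset", "utf-8"),
        # "unspecified" is a placeholder used by this sketch only.
        "header": msg.get_param("header", "unspecified"),
    }
```

For example, csv_media_params("text/csv; charset=iso-8859-1; header=present") reports the charset as iso-8859-1 and the header parameter as present.
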

Data Types and Structures

CSV has no native data types: every field is stored as text, and the importing application decides how to interpret it.

| Interpreted Type | Example | Notes |
| --- | --- | --- |
| String/Text | "Hello World" | Default type for all fields |
| Integer | 123 | Parsed by applications |
| Float/Decimal | 123.45 | Locale-dependent decimal separator |
| Date | 2023-12-25 | Various formats (ISO, US, European) |
| Time | 14:30:00 | xsd:time under the W3C tabular data model |
| Datetime | 2023-12-25T14:30:00 | xsd:dateTime under the W3C tabular data model |
| Duration | PT2H30M | xsd:duration under the W3C tabular data model |
| Boolean | true, false, 1, 0 | Application-specific |
| Null/Empty | "" or missing | Empty fields represent null values |
| anyURI | https://example.com | xsd:anyURI under the W3C CSVW metadata vocabulary |
| Binary (hex) | 48656C6C6F | xsd:hexBinary under the W3C CSVW metadata vocabulary |
| Binary (base64) | SGVsbG8= | xsd:base64Binary; aliased as binary in the W3C CSVW metadata vocabulary |
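
Because typing is left to the importer, each application applies its own rules. The sketch below shows one possible set of rules for a few of the types above; the function name infer_value and the order of checks are assumptions made for illustration, and real importers (including DataMeans) may behave differently. Note that 1 and 0 are treated as integers here, not booleans.

```python
from datetime import date, datetime, time


def infer_value(raw: str):
    """Guess a Python value for one CSV field, falling back to the raw text."""
    text = raw.strip()
    if text == "":
        return None  # empty field treated as null
    if text.lower() in {"true", "false"}:
        return text.lower() == "true"
    # Try the strictest parsers first. date must come before datetime,
    # because datetime.fromisoformat also accepts a bare date.
    parsers = (
        int,
        float,
        date.fromisoformat,
        datetime.fromisoformat,
        time.fromisoformat,
    )
    for parse in parsers:
        try:
            return parse(text)
        except ValueError:
            continue
    return raw  # anything unrecognized stays a string
```

Only ISO-style dates and times are recognized in this sketch; US or European date formats would need explicit format strings.
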

File Structure:

  • Optional header row with column names
  • Data rows should each have the same number of fields
  • Trailing empty lines are typically ignored by parsers
  • Comments not standardized (some use # prefix)
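
These structural expectations can be checked before import. The following sketch flags rows whose field count differs from the first data-bearing row; treating rows that start with # as comments is an assumption, since comments are not standardized, and the function name find_ragged_rows is hypothetical.

```python
import csv
import io


def find_ragged_rows(text: str, comment_prefix: str = "#"):
    """Return the expected field count and (line number, count) mismatches."""
    reader = csv.reader(io.StringIO(text, newline=""))
    expected = None
    problems = []
    for row in reader:
        # Blank lines parse as empty rows; comment rows are skipped by choice.
        if not row or row[0].startswith(comment_prefix):
            continue
        if expected is None:
            expected = len(row)  # the first real row (the header) sets the width
        elif len(row) != expected:
            # line_num is the number of source lines read so far.
            problems.append((reader.line_num, len(row)))
    return expected, problems


sample = "# exported sample\nid,name,city\n1,Ana,Lisbon\n2,Bo\n3,Cy,Oslo,extra\n\n"
expected_width, ragged = find_ragged_rows(sample)
```

A data row whose first field genuinely begins with # would also be skipped, which is one reason comment handling needs to be agreed on per file.
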

Version Differences

| Variant | Year | Key Characteristics | Compatibility |
| --- | --- | --- | --- |
| RFC 4180 | 2005 | Comma delimiter, CRLF record breaks, optional header line, quotes escaped by doubling | Registers the text/csv MIME type with IANA |
| TSV (tab-separated values) | 1993 | Tab delimiter; header line required; tabs not allowed within fields, so no quoting mechanism | IANA media type text/tab-separated-values |
| RFC 7111 | 2014 | URI fragment identifiers for rows, columns, and cells (row=, col=, cell=) | Updates the text/csv media type registration of RFC 4180 |
| W3C CSV on the Web | 2015 | JSON metadata describing dialect, column datatypes, and annotations for CSV tables | W3C Recommendation (17 December 2015); parsing model based on RFC 4180 |
| CSV Schema (The National Archives, UK) | 2014 | Textual language defining the structure, datatypes, and validation rules for CSV data; version 1.0 published 28 August 2014, revised as version 1.1 | No format change; external schema used to validate .csv files |

Compatibility Notes:

  • No formal versioning; dialects vary by application
  • Character encoding differences cause issues (UTF-8 vs ANSI)
  • Line endings differ between Windows (CRLF) and Unix (LF)
  • Quote handling varies (required vs optional)
  • Field count mismatches are a common error
  • Spreadsheet grid limits constrain imports: an Excel worksheet holds at most 1,048,576 rows by 16,384 columns, with up to 32,767 characters per cell
  • CSV injection (formula injection): fields beginning with =, +, -, @, tab, carriage return, or line feed may be interpreted as formulas by spreadsheet software
  • TSV constraint: the IANA text/tab-separated-values registration explicitly prohibits tab characters within field values, so TSV has no quoting mechanism for tabs; CSV is preferred when field data may contain tab characters
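
Two of the notes above lend themselves to a pre-export check. The sketch below prefixes risky values with a single quote, a commonly recommended mitigation for formula injection, and tests data against the Excel grid limits. The function names are hypothetical, and the prefixing rule is deliberately blunt: it also alters legitimate values such as negative numbers.

```python
FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r", "\n")

EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLUMNS = 16_384
EXCEL_MAX_CELL_CHARS = 32_767


def neutralize_formula(value: str) -> str:
    """Prefix values that a spreadsheet might interpret as a formula."""
    if value.startswith(FORMULA_TRIGGERS):
        return "'" + value
    return value


def fits_excel_grid(rows: list[list[str]]) -> bool:
    """Check row count, column count, and cell length against Excel limits."""
    if len(rows) > EXCEL_MAX_ROWS:
        return False
    return all(
        len(row) <= EXCEL_MAX_COLUMNS
        and all(len(cell) <= EXCEL_MAX_CELL_CHARS for cell in row)
        for row in rows
    )
```
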

To learn how to use this format with DataMeans, see the User Guide.