COBOL Data Files

Guide

COBOL (Common Business-Oriented Language) has been the backbone of enterprise computing for decades. COBOL applications use various file formats including sequential, indexed (ISAM), and relative files.

What You Can Upload

  • VSAM (Virtual Storage Access Method) files
  • ISAM (Indexed Sequential Access Method) files
  • Sequential files with COBOL copybook definitions
  • Relative files
  • ZIP archive with data files and copybooks

What You Get Out

DataMeans extracts your data into multiple modern formats:

| Output | Description |
|---|---|
| csv/{TableName}.csv | One CSV file per table with all row data |
| xlsx/{TableName}.xlsx | Excel workbook per table |
| xls/{TableName}.xls | Legacy Excel format per table |
| json/{TableName}.json | JSON array of records per table |
| json/{TableName}.jsonl | Newline-delimited JSON (streaming-friendly) |
| postgres.sql | PostgreSQL CREATE TABLE + INSERT statements |
| schema/schema-graph.json | Relationship graph for visualization |
| schema/er-model.json | ER model for diagram tools |
| report.json | Structured extraction report |
| report.md | Human-readable extraction summary |
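
As an illustration of why the newline-delimited output is described as streaming-friendly, the following minimal Python sketch reads one record at a time instead of loading a whole table. The helper name `iter_jsonl`, the UTF-8 encoding, and the example path are assumptions made for this sketch, not documented DataMeans behavior.

```python
import json

def iter_jsonl(path):
    """Yield one decoded record per non-empty line of a .jsonl file."""
    # Assumption: the file is UTF-8 text with one JSON object per line.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)

# Hypothetical usage; "CUSTOMER" stands in for an extracted table name:
# for record in iter_jsonl("json/CUSTOMER.jsonl"):
#     print(record)
```

Because the generator holds only one line in memory at a time, the same loop works for tables that are too large to load as a single JSON array.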

How to Export / Obtain Files

  1. Identify your COBOL data files on the mainframe/server
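  2. Export the data files, or transfer them by FTP in binary mode so that packed and binary fields are not altered by character translation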
  3. Locate corresponding copybook definitions (.cpy, .cob files)
  4. Create a ZIP with data files and copybooks together
  5. Upload the ZIP to DataMeans
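
Step 4 above can be scripted. The sketch below uses Python's standard `zipfile` module to bundle data files and copybooks into one archive. The function name, the file names in the usage comment, and the flat archive layout are assumptions for illustration; this page does not state what layout DataMeans expects inside the ZIP.

```python
import zipfile
from pathlib import Path

def bundle_for_upload(data_files, copybooks, zip_path):
    """Write data files and copybooks into a single ZIP archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, list(data_files) + list(copybooks)):
            # Assumption: a flat layout, each file stored under its bare name.
            # File contents are copied byte for byte, with no translation.
            archive.write(path, arcname=path.name)
    return zip_path

# Hypothetical usage:
# bundle_for_upload(["CUSTOMER.dat"], ["CUSTOMER.cpy"], "upload.zip")
```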

Supported Features

  • Copybook parsing for field definitions
  • COMP (binary) and COMP-3 (packed decimal) field conversion
  • REDEFINES clause handling
  • OCCURS clause (array) normalization
  • Automatic EBCDIC to ASCII conversion
  • Numeric precision preservation

Known Limitations

  • Copybook definitions required for accurate field interpretation
  • Complex nested REDEFINES may require manual review
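  • Variable-length records can be read only if the file retains its length indicators (for example, the 4-byte record-length fields of recording mode V)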

Troubleshooting

| Issue | Solution |
|---|---|
| Fields misaligned | Verify copybook matches data file layout |
| Garbled characters | Check EBCDIC vs ASCII encoding settings |
| Numeric errors | Confirm COMP-3 vs DISPLAY field types |
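
When characters come out garbled, a rough local check can suggest whether a text field is EBCDIC or ASCII before you change any encoding settings. The sketch below is a heuristic only, not the detection DataMeans performs: it counts bytes that fall in the EBCDIC letter, digit, and space ranges against bytes in the ASCII printable range. The function name is hypothetical.

```python
def guess_text_encoding(sample: bytes) -> str:
    """Return "ebcdic" or "ascii" depending on which byte ranges dominate."""
    def is_ebcdic_text(b: int) -> bool:
        return (
            b == 0x40                                                        # space
            or 0x81 <= b <= 0x89 or 0x91 <= b <= 0x99 or 0xA2 <= b <= 0xA9  # a-z
            or 0xC1 <= b <= 0xC9 or 0xD1 <= b <= 0xD9 or 0xE2 <= b <= 0xE9  # A-Z
            or 0xF0 <= b <= 0xF9                                             # 0-9
        )

    ebcdic_hits = sum(is_ebcdic_text(b) for b in sample)
    ascii_hits = sum(0x20 <= b <= 0x7E for b in sample)
    return "ebcdic" if ebcdic_hits > ascii_hits else "ascii"

print(guess_text_encoding("HELLO 123".encode("cp037")))  # ebcdic
print(guess_text_encoding(b"HELLO 123"))                  # ascii
```

Apply it to bytes taken from a text (PIC X) field. Packed or binary fields contain arbitrary byte values and will skew the counts.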

Last updated: January 2026

Technical reference

Overview

COBOL (Common Business-Oriented Language) defines file organizations and data structures for business data processing, supporting sequential, indexed, and relative file access methods. COBOL data files use record-based structures with hierarchical field definitions, designed for batch processing and transaction systems in mainframe and enterprise environments. Unlike modern databases, COBOL files emphasize fixed-length records and efficient sequential processing.

History and Background

  • 1959: CODASYL committee begins COBOL development.
  • 1960: COBOL 60 first specification released.
  • 1961: COBOL 61 revision addresses flaws in the original specification.
  • 1963: COBOL-61 Extended adds the sort and Report Writer facilities.
  • 1965: COBOL, Edition 1965 adds mass storage file handling and table facilities.
  • 1968: First ANSI COBOL standard (COBOL 68, ANSI X3.23-1968).
  • 1974: COBOL 74 adds indexed and relative file organizations, the DELETE statement, and the segmentation module.
  • 1985: COBOL 85 adds structured programming features: scope terminators, EVALUATE, inline PERFORM, and nested subprograms.
  • 1989: ANSI X3.23a-1989 amendment adds the intrinsic function module.
  • 2002: COBOL 2002 (ISO/IEC 1989:2002) adds object orientation, Unicode support, and bit/Boolean data types.
  • 2014: COBOL 2014 adds IEEE 754 arithmetic data types, dynamic-capacity tables, and method overloading.
  • 2023: COBOL 2023 adds transaction processing (COMMIT/ROLLBACK), asynchronous messaging (SEND/RECEIVE), and the LINE SEQUENTIAL file organization.

File Format Specifications

COBOL defines three primary file organizations (sequential, indexed, and relative), implemented through various access methods; line sequential is a fourth, text-oriented organization.

File Organizations:

  • Sequential: Records stored in order written, accessed sequentially.
  • Indexed: Key-based access with ISAM-like indexing.
  • Relative: Direct access by record number (RRDS-like).
  • Line Sequential: Text-line records delimited by line terminators - a line feed (hex 0A) on UNIX-type GnuCOBOL builds, carriage-return/line-feed on native Windows; standardized in COBOL 2023.

File Extensions:

  • No standard extensions - platform-dependent
  • Mainframe: z/OS datasets use dataset names without file extensions
  • PC compilers: commonly .dat for data, with a separate .idx index file for Micro Focus indexed files

File Structure:

  • Records: Fixed or variable-length data structures; z/OS QSAM recording modes are F (fixed), V (variable), U (undefined; each block is treated as one record), and S (spanned)
  • Descriptors: Recording mode V prefixes each record with a 4-byte record-length field and each block with a 4-byte block descriptor; mode S records span blocks, with a segment descriptor on each segment
  • Fields: Defined by PICTURE clauses and hierarchical levels
  • Keys: Primary and alternate record keys for indexed files; VSAM limits key length to 255 bytes
  • Headers: Implementation-specific; e.g., GnuCOBOL relative files prefix each fixed-length record with a 4-byte header holding the record's data length
  • Limits: QSAM records hold up to 32,760 bytes; block size may reach 2,147,483,647 bytes with large block interface support (DFSMS 2.10 or later)
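
The recording mode V descriptors above can be walked with a few lines of code. This minimal sketch assumes the file was transferred with its 4-byte record descriptors intact but without block descriptors, and that each descriptor holds a 2-byte big-endian length (counting the descriptor itself) followed by two reserved bytes. The function name is hypothetical.

```python
import struct

def iter_rdw_records(data: bytes):
    """Yield the payload of each variable-length record in `data`."""
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError(f"truncated record descriptor at offset {pos}")
        # The length field counts the 4 descriptor bytes plus the payload.
        length, _reserved = struct.unpack_from(">HH", data, pos)
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        yield data[pos + 4:pos + length]
        pos += length

sample = b"\x00\x07\x00\x00ABC" + b"\x00\x05\x00\x00Z"
print(list(iter_rdw_records(sample)))  # [b'ABC', b'Z']
```

If the transfer stripped the descriptors, the record boundaries cannot be recovered this way, which is why the length indicators need to be preserved.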

Data Types and Structures

COBOL data types are defined by PICTURE (PIC) clauses rather than explicit type names:

| PIC Clause | Type | Description | Example |
|---|---|---|---|
| PIC 9 | Numeric | Digits only | PIC 9(5) - 5-digit number |
| PIC X | Alphanumeric | Any characters | PIC X(10) - 10-character string |
| PIC A | Alphabetic | Letters only | PIC A(5) - 5-letter word |
| PIC S9 | Signed numeric | Positive/negative | PIC S9(4)V99 - signed decimal |
| PIC 9V9 | Decimal | Implied decimal point | PIC 9(3)V99 - 123.45 |
| PIC 1 | Boolean | Boolean data (COBOL 2002+) | PIC 1 USAGE BIT - one bit |
| PIC Z | Numeric-edited | Leading zeros replaced by spaces | PIC ZZZ9 - blanks leading zeros |
| PIC P | Scaled numeric | Assumed decimal scaling position; not counted in item size | PIC 9(3)P(3) - scales by 1000 |
| PIC 9 COMP | Binary | Two's complement; 2, 4, or 8 bytes for 1-4, 5-9, or 10-18 digits | PIC S9(4) COMP - 2 bytes |
| PIC 9 COMP-3 | Packed decimal | Two digits per byte; trailing half-byte holds the sign | PIC S9(7) COMP-3 - 4 bytes |
| PIC 9 COMP-5 | Native binary | Uses full capacity of its 2-, 4-, or 8-byte binary field | PIC S9(4) COMP-5 - 2 bytes |
| USAGE COMP-1 | Floating point | Single precision, 4 bytes; PICTURE string not allowed | 01 RATE USAGE COMP-1 |
| USAGE COMP-2 | Floating point | Double precision, 8 bytes; PICTURE string not allowed | 01 TOTAL USAGE COMP-2 |
| PIC N | National | UTF-16 character data, 2 bytes per character | PIC N(10) - 20 bytes |
| PIC G | DBCS | Double-byte characters with USAGE DISPLAY-1, 2 bytes each | PIC G(5) - 10 bytes |

In IBM Enterprise COBOL, COMP and COMP-4 are synonyms of BINARY, and COMP-3 of PACKED-DECIMAL; packed-decimal items hold up to 18 digits, or 31 under the ARITH(EXTEND) compiler option.
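
To make the packed-decimal and binary layouts concrete, here is a minimal Python sketch that decodes both. It assumes the usual sign nibbles (B and D negative; A, C, E, and F positive or unsigned) and big-endian binary as on z/OS. The function names and the `scale` parameter, which stands for the number of digits after the implied decimal point (V), are introduced only for this example.

```python
from decimal import Decimal

def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field into an exact Decimal."""
    digits = []
    for byte in raw[:-1]:
        digits += [byte >> 4, byte & 0x0F]      # two digits per byte
    digits.append(raw[-1] >> 4)                 # last byte: one digit ...
    sign = raw[-1] & 0x0F                       # ... and the sign half-byte
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError("not a valid packed-decimal field")
    value = int("".join(map(str, digits)))
    if sign in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)

def unpack_comp(raw: bytes, scale: int = 0, signed: bool = True) -> Decimal:
    """Decode a big-endian two's-complement (COMP) field."""
    return Decimal(int.from_bytes(raw, "big", signed=signed)).scaleb(-scale)

def comp3_size(digit_count: int) -> int:
    """Bytes occupied by a COMP-3 item with the given number of digits."""
    return digit_count // 2 + 1

print(unpack_comp3(bytes.fromhex("0012345C"), scale=2))  # 123.45
print(unpack_comp(b"\xff\xfe"))                          # -2
print(comp3_size(7))                                     # 4
```

Returning `Decimal` rather than `float` keeps every digit exact, in line with the numeric precision preservation listed among the supported features.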

Record Structure:

  • 01 level: Record definition
  • 02-49 levels (conventionally 05, 10, 15): Group and elementary fields nested under the record
  • OCCURS: Arrays and tables
  • REDEFINES: Overlay definitions
  • FILLER: Unused space
  • VALUE: Initial values
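
The way a record description maps onto bytes can be sketched as a list of field sizes walked from left to right. The example below hand-translates a small hypothetical copybook (shown in the comment) into such a list and slices a fixed-length EBCDIC record with it, expanding the OCCURS item into a Python list. It handles only DISPLAY fields and is an illustration, not a copybook parser.

```python
# Hypothetical copybook this layout mirrors:
#   01 CUSTOMER-REC.
#      05 CUST-ID    PIC 9(5).
#      05 CUST-NAME  PIC X(10).
#      05 PHONE      PIC X(4) OCCURS 2 TIMES.
LAYOUT = [
    # (field name, size in bytes, number of occurrences)
    ("CUST-ID", 5, 1),
    ("CUST-NAME", 10, 1),
    ("PHONE", 4, 2),
]

def parse_record(record: bytes, layout=LAYOUT, encoding="cp037") -> dict:
    """Slice one fixed-length record into named, decoded text fields."""
    fields, pos = {}, 0
    for name, size, occurs in layout:
        values = []
        for _ in range(occurs):
            values.append(record[pos:pos + size].decode(encoding).rstrip())
            pos += size
        fields[name] = values[0] if occurs == 1 else values
    if pos != len(record):
        raise ValueError(f"layout covers {pos} bytes, record has {len(record)}")
    return fields

record = "00042JANE DOE  55501234".encode("cp037")
print(parse_record(record))
# {'CUST-ID': '00042', 'CUST-NAME': 'JANE DOE', 'PHONE': ['5550', '1234']}
```

The length check at the end is a cheap way to catch a copybook that does not match the data file, the usual cause of misaligned fields.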

Version Differences

| Version | Year | Key Changes | File Support |
|---|---|---|---|
| COBOL 60 | 1960 | Initial specification | Sequential only |
| COBOL 65 | 1965 | Mass storage files, table handling | Sequential, mass storage facilities |
| COBOL 68 | 1968 | First ANSI standard (X3.23-1968) | Sequential, random access module |
| COBOL 74 | 1974 | DELETE statement, segmentation module | Adds indexed and relative organizations |
| COBOL 85 | 1985 | Scope terminators, EVALUATE, nested programs | No new organizations; adds I/O status codes |
| COBOL 85 Amendment 1 | 1989 | Intrinsic function module (ANSI X3.23a-1989) | No format change |
| COBOL 2002 | 2002 | Object orientation, Unicode, free-form code | No format change |
| COBOL 2014 | 2014 | IEEE 754 arithmetic, dynamic-capacity tables | No format change; Report Writer made optional |
| COBOL 2023 | 2023 | COMMIT/ROLLBACK, asynchronous messaging | Adds LINE SEQUENTIAL organization, DELETE FILE statement |

Compatibility Notes:

  • COBOL 85 widely supported in legacy systems
  • COBOL 74 and COBOL 85 each changed or removed features, breaking some earlier programs
  • File organizations vary by platform/compiler
  • EBCDIC vs ASCII differences affect both character encoding and collating sequence, altering sort, merge, comparison, and indexed-file key order
  • USAGE BINARY data is big-endian on z/OS with the sign in the leftmost bit; COMP-5 follows the platform's native binary representation
  • Packed-decimal items defined with an odd number of digits use all bits of every byte; even digit counts leave an unused half-byte
  • The RECORDING MODE clause applies only to QSAM files and is ignored for VSAM files
  • Record formats may include platform-specific headers
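
Two of the notes above, collating sequence and byte order, are easy to see in a short experiment. The sketch below sorts the same three keys by their ASCII and EBCDIC byte values, using Python's built-in `cp037` codec as one representative EBCDIC code page (an assumption; a given system may use a different one), and then reads the same two bytes as big-endian and little-endian binary.

```python
keys = ["a1", "A1", "11"]

# ASCII orders digits < uppercase < lowercase;
# EBCDIC orders lowercase < uppercase < digits.
ascii_order = sorted(keys, key=lambda k: k.encode("ascii"))
ebcdic_order = sorted(keys, key=lambda k: k.encode("cp037"))
print(ascii_order)   # ['11', 'A1', 'a1']
print(ebcdic_order)  # ['a1', 'A1', '11']

# The same two bytes read as a binary field under each byte order.
raw = b"\x01\x00"
big_endian = int.from_bytes(raw, "big")        # z/OS USAGE BINARY layout
little_endian = int.from_bytes(raw, "little")  # typical native layout on x86
print(big_endian, little_endian)  # 256 1
```

A file whose indexed keys were ordered on the mainframe will therefore appear out of order after conversion to ASCII unless it is re-sorted.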


To learn how to use this format with DataMeans, see the User Guide.