Prepare
- Upload COBOL data
COBOL (Common Business-Oriented Language) has been the backbone of enterprise computing for decades. COBOL applications use various file formats including sequential, indexed (ISAM), and relative files.
What You Can Upload
- VSAM (Virtual Storage Access Method) files
- ISAM (Indexed Sequential Access Method) files
- Sequential files with COBOL copybook definitions
- Relative files
- ZIP archive with data files and copybooks
What You Get Out
DataMeans extracts your data into multiple modern formats:
| Output | Description |
|---|---|
| csv/{TableName}.csv | One CSV file per table with all row data |
| xlsx/{TableName}.xlsx | Excel workbook per table |
| xls/{TableName}.xls | Legacy Excel format per table |
| json/{TableName}.json | JSON array of records per table |
| json/{TableName}.jsonl | Newline-delimited JSON (streaming-friendly) |
| postgres.sql | PostgreSQL CREATE TABLE + INSERT statements |
| schema/schema-graph.json | Relationship graph for visualization |
| schema/er-model.json | ER model for diagram tools |
| report.json | Structured extraction report |
| report.md | Human-readable extraction summary |
How to Export / Obtain Files
- Identify your COBOL data files on the mainframe/server
- Export or FTP the binary data files
- Locate the corresponding copybook definitions (.cpy, .cob files)
- Create a ZIP with data files and copybooks together
- Upload the ZIP to DataMeans
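The bundling step above can be sketched in Python with the standard `zipfile` module. This is a minimal illustration, not a DataMeans tool: the function name `build_upload_zip` and the directory layout are assumptions, and the copybook extensions are the ones listed in the steps. It also assumes the data files were transferred in binary mode, so that no character translation altered packed or binary fields on the way.

```python
import zipfile
from pathlib import Path

# Extensions treated as copybooks (taken from the steps above).
COPYBOOK_EXTENSIONS = {".cpy", ".cob"}


def build_upload_zip(source_dir, zip_path):
    """Bundle every file under source_dir into one ZIP archive.

    Hypothetical helper: returns (data_files, copybooks) as lists of
    archive member names so the caller can confirm both kinds are present.
    """
    source = Path(source_dir)
    target = Path(zip_path).resolve()
    data_files, copybooks = [], []
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories and the archive itself if it lives in source_dir.
            if not path.is_file() or path.resolve() == target:
                continue
            name = path.relative_to(source).as_posix()
            archive.write(path, arcname=name)
            if path.suffix.lower() in COPYBOOK_EXTENSIONS:
                copybooks.append(name)
            else:
                data_files.append(name)
    return data_files, copybooks
```

Checking that the returned `copybooks` list is not empty before uploading is a cheap way to catch an archive that contains data files only.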
Supported Features
- Copybook parsing for field definitions
- COMP, COMP-3 (packed decimal) field conversion
- REDEFINES clause handling
- OCCURS clause (array) normalization
- Automatic EBCDIC to ASCII conversion
- Numeric precision preservation
Known Limitations
- Copybook definitions required for accurate field interpretation
- Complex nested REDEFINES may require manual review
- Variable-length records need length indicators
Troubleshooting
| Issue | Solution |
|---|---|
| Fields misaligned | Verify copybook matches data file layout |
| Garbled characters | Check EBCDIC vs ASCII encoding settings |
| Numeric errors | Confirm COMP-3 vs DISPLAY field types |
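For the garbled-characters case, a quick local check can show whether a text field is EBCDIC or ASCII before adjusting any settings. The sketch below is a rough heuristic under stated assumptions: the helper names are hypothetical, it uses Python's built-in `cp037` codec as a stand-in for whichever EBCDIC code page your system actually uses, and it is only meaningful on character (PIC X) fields, not on packed or binary data.

```python
# Byte values that EBCDIC uses for space, letters, and digits.
EBCDIC_TEXT_BYTES = frozenset(
    [0x40]                                   # space
    + list(range(0x81, 0x8A))                # a-i
    + list(range(0x91, 0x9A))                # j-r
    + list(range(0xA2, 0xAA))                # s-z
    + list(range(0xC1, 0xCA))                # A-I
    + list(range(0xD1, 0xDA))                # J-R
    + list(range(0xE2, 0xEA))                # S-Z
    + list(range(0xF0, 0xFA))                # 0-9
)


def guess_text_encoding(raw: bytes) -> str:
    """Return "cp037" if the bytes look more like EBCDIC text, else "ascii"."""
    ebcdic_hits = sum(1 for b in raw if b in EBCDIC_TEXT_BYTES)
    ascii_hits = sum(1 for b in raw if 0x20 <= b <= 0x7E)
    return "cp037" if ebcdic_hits > ascii_hits else "ascii"


def decode_text_field(raw: bytes, encoding: str) -> str:
    """Decode a fixed-width character field and drop trailing padding."""
    return raw.decode(encoding).rstrip()


if __name__ == "__main__":
    field = "HELLO 123".encode("cp037") + b"\x40\x40"   # EBCDIC, space-padded
    encoding = guess_text_encoding(field)
    print(encoding, repr(decode_text_field(field, encoding)))
```

If the guess is EBCDIC but the decoded text still contains wrong punctuation or currency symbols, the data probably uses a different EBCDIC code page than `cp037`.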
Last updated: January 2026
Overview
COBOL (Common Business-Oriented Language) defines file organizations and data structures for business data processing, supporting sequential, indexed, and relative file access methods. COBOL data files use record-based structures with hierarchical field definitions, designed for batch processing and transaction systems in mainframe and enterprise environments. Unlike modern databases, COBOL files emphasize fixed-length records and efficient sequential processing.
History and Background
- 1959: CODASYL committee begins COBOL development.
- 1960: COBOL 60 first specification released.
- 1961: COBOL 61 revision addresses flaws in the original specification.
- 1963: COBOL-61 Extended adds the sort and Report Writer facilities.
- 1965: COBOL, Edition 1965 adds mass storage file handling and table facilities.
- 1968: First ANSI COBOL standard (COBOL 68, ANSI X3.23-1968).
- 1974: COBOL 74 adds indexed and relative file organizations, the DELETE statement, and the segmentation module.
- 1985: COBOL 85 adds structured programming features: scope terminators, EVALUATE, inline PERFORM, and nested subprograms.
- 1989: ANSI X3.23a-1989 amendment adds the intrinsic function module.
- 2002: COBOL 2002 (ISO/IEC 1989:2002) adds object orientation, Unicode support, and bit/Boolean data types.
- 2014: COBOL 2014 adds IEEE 754 arithmetic data types, dynamic-capacity tables, and method overloading.
- 2023: COBOL 2023 adds transaction processing (COMMIT/ROLLBACK), asynchronous messaging (SEND/RECEIVE), and the LINE SEQUENTIAL file organization.
File Format Specifications
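COBOL defines three primary file organizations (sequential, indexed, and relative), implemented through various access methods. Line sequential, listed below as a fourth, was standardized only in COBOL 2023.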
File Organizations:
- Sequential: Records stored in order written, accessed sequentially.
- Indexed: Key-based access with ISAM-like indexing.
- Relative: Direct access by record number (RRDS-like).
- Line Sequential: Text-line records delimited by line terminators: a line feed (hex 0A) on UNIX-type GnuCOBOL builds, carriage-return/line-feed on native Windows; standardized in COBOL 2023.
File Extensions:
- No standard extensions - platform-dependent
- Mainframe: z/OS datasets use dataset names without file extensions
- PC compilers: commonly .dat for data, with a separate .idx index file for Micro Focus indexed files
File Structure:
- Records: Fixed or variable-length data structures; z/OS QSAM recording modes are F (fixed), V (variable), U (undefined, with each block treated as one record), and S (spanned)
- Descriptors: Recording mode V prefixes each record with a 4-byte record-length field and each block with a 4-byte block descriptor; mode S records span blocks, with a segment descriptor on each segment
- Fields: Defined by PICTURE clauses and hierarchical levels
- Keys: Primary and alternate record keys for indexed files; VSAM limits key length to 255 bytes
- Headers: Implementation-specific; e.g., GnuCOBOL relative files prefix each fixed-length record with a 4-byte header holding the record's data length
- Limits: QSAM records hold up to 32,760 bytes; block size may reach 2,147,483,647 bytes with large block interface support (DFSMS 2.10 or later)
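The 4-byte record-length field described above for recording mode V can be walked with a few lines of Python. This is a sketch under assumptions: the function name `iter_rdw_records` is hypothetical, the file is assumed to contain only record descriptors (block descriptors already stripped during transfer), and each descriptor is read as a 2-byte big-endian length that includes the descriptor itself, followed by 2 bytes that are ignored here. Spanned (mode S) records are not reassembled.

```python
import io
import struct

RDW_SIZE = 4  # bytes in the record descriptor


def iter_rdw_records(stream):
    """Yield the payload of each length-prefixed record in a binary stream."""
    while True:
        header = stream.read(RDW_SIZE)
        if not header:
            return                      # clean end of file
        if len(header) < RDW_SIZE:
            raise ValueError("truncated record descriptor")
        # First halfword: total length including the descriptor itself.
        length, _segment_info = struct.unpack(">HH", header)
        if length < RDW_SIZE:
            raise ValueError(f"implausible record length {length}")
        payload = stream.read(length - RDW_SIZE)
        if len(payload) != length - RDW_SIZE:
            raise ValueError("truncated record payload")
        yield payload


if __name__ == "__main__":
    # Two synthetic records: 5-byte and 2-byte payloads.
    blob = struct.pack(">HH", 9, 0) + b"HELLO" + struct.pack(">HH", 6, 0) + b"AB"
    for record in iter_rdw_records(io.BytesIO(blob)):
        print(len(record), record)
```

If the first "length" read this way is far larger than any expected record, the file most likely has no length indicators (fixed-length records) or still carries block descriptors.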
Data Types and Structures
COBOL data types are defined by PICTURE (PIC) clauses rather than explicit type names:
| PIC Clause | Type | Description | Example |
|---|---|---|---|
| PIC 9 | Numeric | Digits only | PIC 9(5) - 5-digit number |
| PIC X | Alphanumeric | Any characters | PIC X(10) - 10-character string |
| PIC A | Alphabetic | Letters only | PIC A(5) - 5-letter word |
| PIC S9 | Signed numeric | Positive/negative | PIC S9(4)V99 - signed decimal |
| PIC 9V9 | Decimal | Implied decimal point | PIC 9(3)V99 - 123.45 |
| PIC 1 | Boolean | Boolean data (COBOL 2002+) | PIC 1 USAGE BIT - one bit |
| PIC Z | Numeric-edited | Leading zeros replaced by spaces | PIC ZZZ9 - blanks leading zeros |
| PIC P | Scaled numeric | Assumed decimal scaling position; not counted in item size | PIC 9(3)P(3) - scales by 1000 |
| PIC 9 COMP | Binary | Two's complement; 2, 4, or 8 bytes for 1-4, 5-9, or 10-18 digits | PIC S9(4) COMP - 2 bytes |
| PIC 9 COMP-3 | Packed decimal | Two digits per byte; trailing half-byte holds the sign | PIC S9(7) COMP-3 - 4 bytes |
| PIC 9 COMP-5 | Native binary | Uses full capacity of its 2-, 4-, or 8-byte binary field | PIC S9(4) COMP-5 - 2 bytes |
| USAGE COMP-1 | Floating point | Single precision, 4 bytes; PICTURE string not allowed | 01 RATE USAGE COMP-1 |
| USAGE COMP-2 | Floating point | Double precision, 8 bytes; PICTURE string not allowed | 01 TOTAL USAGE COMP-2 |
| PIC N | National | UTF-16 character data, 2 bytes per character | PIC N(10) - 20 bytes |
| PIC G | DBCS | Double-byte characters with USAGE DISPLAY-1, 2 bytes each | PIC G(5) - 10 bytes |
In IBM Enterprise COBOL, COMP and COMP-4 are synonyms of BINARY, and COMP-3 of PACKED-DECIMAL; packed-decimal items hold up to 18 digits, or 31 under the ARITH(EXTEND) compiler option.
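To make the binary and packed-decimal rows of the table concrete, here is a minimal Python sketch of how such fields can be decoded and sized. The function names are hypothetical and this is not DataMeans's implementation; it assumes big-endian two's complement for COMP (as on z/OS) and the common sign nibbles for COMP-3 (C, A, E, and F as positive or unsigned, B and D as negative). The `scale` argument is the number of digits after the implied decimal point (the `V` in the PICTURE).

```python
from decimal import Decimal

NEGATIVE_SIGNS = {0x0B, 0x0D}
POSITIVE_SIGNS = {0x0A, 0x0C, 0x0E, 0x0F}


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    if not raw:
        raise ValueError("empty packed-decimal field")
    digits = []
    for byte in raw[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(raw[-1] >> 4)
    sign = raw[-1] & 0x0F
    if any(d > 9 for d in digits) or sign not in NEGATIVE_SIGNS | POSITIVE_SIGNS:
        raise ValueError(f"not valid packed decimal: {raw.hex()}")
    value = int("".join(str(d) for d in digits))
    if sign in NEGATIVE_SIGNS:
        value = -value
    return Decimal(value).scaleb(-scale)   # apply the implied decimal point


def unpack_comp(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a signed big-endian binary (COMP) field."""
    return Decimal(int.from_bytes(raw, "big", signed=True)).scaleb(-scale)


def comp3_size(digits: int) -> int:
    """Bytes occupied by a packed-decimal item with the given digit count."""
    return digits // 2 + 1


def comp_size(digits: int) -> int:
    """Bytes occupied by a COMP item: 2, 4, or 8 for 1-4, 5-9, or 10-18 digits."""
    if not 1 <= digits <= 18:
        raise ValueError("digit count out of range for this sketch")
    return 2 if digits <= 4 else 4 if digits <= 9 else 8


if __name__ == "__main__":
    print(unpack_comp3(bytes.fromhex("1234567D")))        # PIC S9(7) COMP-3
    print(unpack_comp3(bytes.fromhex("12345C"), scale=2)) # PIC S9(3)V99 COMP-3
    print(unpack_comp(b"\xff\xfe"))                       # PIC S9(4) COMP
```

Using `Decimal` rather than `float` keeps the exact digits of the source field, which matters for monetary amounts.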
Record Structure:
- 01 level: Record definition
- 05/10 levels: Field groupings
- OCCURS: Arrays and tables
- REDEFINES: Overlay definitions
- FILLER: Unused space
- VALUE: Default values
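As an illustration of how these clauses translate into byte offsets, the sketch below slices one fixed-length record using a hand-written layout. Everything in it is hypothetical: the copybook, the field names, and the `(name, length, occurs)` tuple format are invented for the example, and the way OCCURS items are turned into child rows is only one possible normalization, not necessarily the one DataMeans applies. The record is assumed to contain DISPLAY (character) fields only.

```python
# Hypothetical copybook this layout mirrors:
#   01 CUSTOMER-REC.
#      05 CUST-ID    PIC 9(5).
#      05 CUST-NAME  PIC X(10).
#      05 PHONE      PIC X(7) OCCURS 2 TIMES.
LAYOUT = [
    ("CUST-ID", 5, 1),
    ("CUST-NAME", 10, 1),
    ("PHONE", 7, 2),
]


def record_length(layout) -> int:
    """Total bytes per record: each field length times its OCCURS count."""
    return sum(length * occurs for _, length, occurs in layout)


def split_record(record: bytes, layout, encoding: str = "ascii"):
    """Return (parent_row, child_rows) for one fixed-length record.

    Fields that occur once go into the parent row; repeated fields become
    one child row per occurrence, numbered from 1 as COBOL subscripts are.
    """
    if len(record) != record_length(layout):
        raise ValueError("record length does not match the layout")
    parent, children = {}, []
    offset = 0
    for name, length, occurs in layout:
        for index in range(occurs):
            text = record[offset:offset + length].decode(encoding).rstrip()
            offset += length
            if occurs == 1:
                parent[name] = text
            else:
                children.append({"field": name, "index": index + 1, "value": text})
    return parent, children


if __name__ == "__main__":
    raw = b"00042" + b"JANE DOE".ljust(10) + b"5550100" + b"5550200"
    print(split_record(raw, LAYOUT))
```

A length mismatch raised here is the same symptom as the "fields misaligned" case: the copybook and the data file do not describe the same record.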
Version Differences
| Version | Year | Key Changes | File Support |
|---|---|---|---|
| COBOL 60 | 1960 | Initial specification | Sequential only |
| COBOL 65 | 1965 | Mass storage files, table handling | Sequential, mass storage facilities |
| COBOL 68 | 1968 | First ANSI standard (X3.23-1968) | Sequential, random access module |
| COBOL 74 | 1974 | DELETE statement, segmentation module | Adds indexed and relative organizations |
| COBOL 85 | 1985 | Scope terminators, EVALUATE, nested programs | No new organizations; adds I/O status codes |
| COBOL 85 Amendment 1 | 1989 | Intrinsic function module (ANSI X3.23a-1989) | No format change |
| COBOL 2002 | 2002 | Object orientation, Unicode, free-form code | No format change |
| COBOL 2014 | 2014 | IEEE 754 arithmetic, dynamic-capacity tables | No format change; Report Writer made optional |
| COBOL 2023 | 2023 | COMMIT/ROLLBACK, asynchronous messaging | Adds LINE SEQUENTIAL organization, DELETE FILE statement |
Compatibility Notes:
- COBOL 85 widely supported in legacy systems
- COBOL 74 and COBOL 85 each changed or removed features, breaking some earlier programs
- File organizations vary by platform/compiler
- EBCDIC vs ASCII differences affect both character encoding and collating sequence, altering sort, merge, comparison, and indexed-file key order
- USAGE BINARY data is big-endian on z/OS with the sign in the leftmost bit; COMP-5 follows the platform's native binary representation
- Packed-decimal items defined with an odd number of digits use all bits of every byte; even digit counts leave an unused half-byte
- The RECORDING MODE clause applies only to QSAM files and is ignored for VSAM files
- Record formats may include platform-specific headers
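The collating-sequence note can be demonstrated with a short Python sketch: the same keys sort differently depending on whether their EBCDIC or ASCII byte values are compared. The function name is hypothetical, and `cp037` is used as an assumed EBCDIC code page; other code pages keep the same relative order of lowercase letters, uppercase letters, and digits.

```python
def sort_by_encoding(keys, encoding):
    """Sort strings by the byte values they have in the given encoding."""
    return sorted(keys, key=lambda key: key.encode(encoding))


if __name__ == "__main__":
    keys = ["B2", "a1", "9Z"]
    # ASCII orders digits < uppercase < lowercase.
    print(sort_by_encoding(keys, "ascii"))
    # EBCDIC orders lowercase < uppercase < digits.
    print(sort_by_encoding(keys, "cp037"))
```

This is why an indexed file read in key order on a mainframe may appear unsorted after its keys are converted to ASCII and compared with default string ordering.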
Technical References
- ISO/IEC 1989:2023 COBOL Standard (INCITS)
- Wikipedia: COBOL
- GnuCOBOL Programmer's Guide
- IBM Enterprise COBOL for z/OS Language Reference
- The Open Group Technical Standard: COBOL Language
To learn how to use this format with DataMeans, see the User Guide.