Prepare
- Upload GIS / Geospatial files
Geographic Information System databases store spatial and mapping data with coordinates, projections, and geographic metadata.
Planned Support
- ESRI Shapefiles (.shp, .shx, .dbf, .prj)
- MapInfo TAB format (.tab, .dat, .id, .map)
- Coordinate extraction and projection handling
- Attribute data normalization
- PostgreSQL PostGIS compatibility
What You Get Out
Once the parser ships, DataMeans will extract your data into multiple modern formats:
| Output | Description |
|---|---|
| csv/{TableName}.csv | One CSV file per table with all row data |
| xlsx/{TableName}.xlsx | Excel workbook per table |
| xls/{TableName}.xls | Legacy Excel format per table |
| json/{TableName}.json | JSON array of records per table |
| json/{TableName}.jsonl | Newline-delimited JSON (streaming-friendly) |
| postgres.sql | PostgreSQL CREATE TABLE + INSERT statements |
| schema/schema-graph.json | Relationship graph for visualization |
| schema/er-model.json | ER model for diagram tools |
| report.json | Structured extraction report |
| report.md | Human-readable extraction summary |
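Because the parser has not shipped yet, the exact record layout is not fixed. As a minimal sketch, assuming each line of json/{TableName}.jsonl holds one JSON object, the newline-delimited output could be consumed one record at a time like this (the helper name `iter_jsonl` and the sample records are illustrative, not part of DataMeans):

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one record per non-empty line of a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Hypothetical sample standing in for a json/{TableName}.jsonl file.
sample = io.StringIO('{"id": 1, "name": "Park"}\n{"id": 2, "name": "Lake"}\n')
records = list(iter_jsonl(sample))
```

Reading line by line keeps memory use flat for large tables, which is the practical difference from loading the single JSON array in json/{TableName}.json.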
File Requirements
For Shapefiles:
- .shp (shape geometry)
- .shx (shape index)
- .dbf (attribute data)
- .prj (projection info), optional
For MapInfo:
- .tab (table definition)
- .dat (data file)
- .id (index)
- .map (map display)
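Before uploading, it can help to confirm that every required companion file is present. The following is a rough sketch of such a check, not DataMeans behavior; the function name `missing_components` and the `REQUIRED` mapping are assumptions built from the lists above, with .prj treated as optional.

```python
from pathlib import PurePath

# Required extensions per format, taken from the lists above (.prj is optional).
REQUIRED = {
    "shapefile": {".shp", ".shx", ".dbf"},
    "mapinfo": {".tab", ".dat", ".id", ".map"},
}


def missing_components(filenames, kind):
    """Return the required extensions absent from `filenames`, sorted."""
    present = {PurePath(name).suffix.lower() for name in filenames}
    return sorted(REQUIRED[kind] - present)


print(missing_components(["roads.shp", "roads.dbf"], "shapefile"))
```

This sketch only compares extensions. A stricter check would also confirm that all files share the same base name.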
Current Status
Parser development is in the planning phase. Shapefile support is well-defined because Esri publishes the format specification; MapInfo TAB is a proprietary format and harder to target.
Technical Notes
Spatial data requires specialized coordinate transformation libraries. The focus will be on extracting attribute data with coordinate columns for PostGIS import.
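To illustrate what "attribute data with coordinate columns" could look like on the PostGIS side, here is a minimal sketch that turns one extracted row into a parameterized INSERT. The column names `x` and `y`, the target column `geom`, the default SRID 4326, and the function name `point_insert` are all assumptions for illustration; `ST_MakePoint` and `ST_SetSRID` are standard PostGIS functions, and `%s` is the placeholder style used by psycopg.

```python
def point_insert(table, row, x_col="x", y_col="y", srid=4326):
    """Build a parameterized INSERT that stores x/y columns as a PostGIS point.

    Table and column names are interpolated directly, so they must come
    from a trusted source; only the values are passed as parameters.
    """
    attrs = {k: v for k, v in row.items() if k not in (x_col, y_col)}
    cols = [f'"{name}"' for name in attrs] + ["geom"]
    vals = ["%s"] * len(attrs) + ["ST_SetSRID(ST_MakePoint(%s, %s), %s)"]
    sql = f'INSERT INTO "{table}" ({", ".join(cols)}) VALUES ({", ".join(vals)})'
    params = [*attrs.values(), row[x_col], row[y_col], srid]
    return sql, params


sql, params = point_insert("wells", {"id": 7, "x": -117.18, "y": 34.06})
```

No reprojection happens here: the coordinates are stored as given and merely labeled with an SRID, which matches the stated focus on extraction rather than transformation.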
Last updated: January 2026
Overview
Geographic Information Systems (GIS) use specialized file formats to store spatial data with geometric shapes, coordinates, and attribute information for mapping and location-based analysis. These formats support vector geometries (points, lines, polygons), coordinate reference systems, and spatial relationships, enabling applications in urban planning, environmental monitoring, and geographic research. Unlike traditional databases, GIS formats include spatial indexing and projection metadata for accurate geographic representation.
History and Background
- 1960s: Early GIS development at universities and government labs.
- 1969: Esri (Environmental Systems Research Institute) founded by Jack and Laura Dangermond in Redlands, California.
- 1970s: Canada Geographic Information System (CGIS) operational.
- 1980s: Commercial GIS software emerges (ArcInfo, MapInfo).
- 1990s: ESRI Shapefile becomes de facto standard for vector data.
- 1994: OpenGIS Consortium founded to develop geospatial standards (renamed Open Geospatial Consortium in 2004).
- 2000s: Web-based GIS with Google Maps, OpenStreetMap.
- 2004: Google acquires Keyhole, Inc., the originator of the KML format.
- 2008: GeoJSON specification for JSON-based geospatial data.
- 2014: GeoPackage adopted as an OGC standard.
- 2016: GeoJSON standardized by the IETF as RFC 7946.
File Format Specifications
GIS formats vary from proprietary binary to open standards, supporting vector and raster data.
File Extensions:
- .shp - ESRI Shapefile geometry
- .dbf - Attribute data (dBase format)
- .shx - Positional index of feature geometry
- .prj - Projection information
- .geojson - JSON-based geospatial
- .kml - Keyhole Markup Language
- .kmz - Zipped KML archive
- .gml - Geography Markup Language (XML)
- .gpkg - GeoPackage SQLite database
- .fgb - FlatGeobuf binary encoding
- .tab - MapInfo table format
File Structure:
- Geometry: Coordinate-based shapes and locations
- Attributes: Tabular data linked to geometries
- Metadata: Projection, bounds, and spatial reference
- Index: Spatial indexing for query performance
- Topology: Relationships between spatial features
- Shapefile Header: .shp opens with a fixed 100-byte header (file code, version, shape type, bounding box)
- Shapefile Records: 8-byte record headers; lengths counted in 16-bit words; coordinates stored as 8-byte IEEE doubles
- Index Records: each .shx entry is 8 bytes (record offset and content length)
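The fixed header and index layouts described above can be read with Python's standard struct module. The sketch below follows the published Shapefile layout, in which the file code (9994) and file length are big-endian while the version, shape type, and bounding box are little-endian. The function names are illustrative, and this is not the DataMeans parser.

```python
import struct


def parse_shp_header(data: bytes) -> dict:
    """Decode the fixed 100-byte header at the start of a .shp file."""
    if len(data) < 100:
        raise ValueError("a .shp header is 100 bytes")
    # Bytes 0-27: file code, 20 unused bytes, file length (big-endian).
    file_code, length_words = struct.unpack_from(">i20xi", data, 0)
    if file_code != 9994:
        raise ValueError(f"unexpected file code {file_code}")
    # Bytes 28-35: version and shape type (little-endian).
    version, shape_type = struct.unpack_from("<ii", data, 28)
    # Bytes 36-67: Xmin, Ymin, Xmax, Ymax as 8-byte doubles (little-endian).
    bbox = struct.unpack_from("<4d", data, 36)
    return {
        "file_length_bytes": length_words * 2,  # lengths are in 16-bit words
        "version": version,
        "shape_type": shape_type,
        "bbox": bbox,
    }


def parse_shx_record(entry: bytes) -> tuple:
    """Return (offset_bytes, content_length_bytes) for one 8-byte .shx entry."""
    offset_words, length_words = struct.unpack(">ii", entry)
    return offset_words * 2, length_words * 2
```

In practice the first 100 bytes of the file would be passed in, for example `parse_shp_header(open(path, "rb").read(100))`, where `path` is whatever .shp file is being inspected.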
Key Components:
- Features: Individual spatial objects
- Layers: Collections of related features
- Coordinate Systems: Geographic or projected
- Spatial Reference: Datum and projection parameters
- Bounding Box: Extent of spatial data
Data Types and Structures
| Type | Description | Storage |
|---|---|---|
| POINT | Single coordinate location | X,Y coordinates |
| LINESTRING | Connected line segments | Array of coordinates |
| POLYGON | Closed area boundary | Outer/inner rings |
| MULTIPOINT | Multiple point locations | Array of points |
| MULTILINESTRING | Multiple line features | Array of linestrings |
| MULTIPOLYGON | Multiple polygon areas | Array of polygons |
| GEOMETRYCOLLECTION | Mixed geometry types | Collection of geometries |
| CIRCULARSTRING | Curve with circular arcs between points | Arc-defining coordinates |
| POLYHEDRALSURFACE | Surface of connected polygon patches | Polygon patches |
| TIN | Triangulated irregular network surface | Connected triangles |
Spatial Model:
- Vector data represents discrete features
- Attributes provide descriptive information
- Spatial relationships (contains, intersects, etc.)
- Coordinate precision and accuracy
- Topology rules for data integrity
- WKT (text) and WKB (binary) geometry encodings defined by OGC Simple Features
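As a small illustration of the two encodings, the sketch below writes the same 2D point as WKT and as little-endian WKB. In WKB, the first byte is the byte-order flag (1 for little-endian), followed by a 32-bit geometry type (1 for Point) and the coordinates as 8-byte doubles. The helper names are illustrative only.

```python
import struct


def point_wkt(x: float, y: float) -> str:
    """Encode a 2D point as Well-Known Text."""
    return f"POINT ({x} {y})"


def point_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian Well-Known Binary (21 bytes)."""
    # byte-order flag 1 = little-endian, geometry type 1 = Point
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_point_wkb(blob: bytes) -> tuple:
    """Decode a little-endian WKB point back into (x, y)."""
    order, geom_type, x, y = struct.unpack("<BIdd", blob)
    if order != 1 or geom_type != 1:
        raise ValueError("not a little-endian WKB Point")
    return x, y


print(point_wkt(30.0, 10.0))
print(point_wkb(30.0, 10.0).hex())
```

WKT is convenient for logs and SQL literals, while WKB is the more compact form typically exchanged with spatial databases.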
Version Differences
| Format | Year | Key Changes | Compatibility |
|---|---|---|---|
| Shapefile | early 1990s | Multi-file binary vector format (spec published 1998) | 2 GB limit per .shp/.dbf file |
| GeoJSON | 2008 | JSON text encoding of features | RFC 7946 (2016) requires WGS 84 coordinates |
| KML 2.2 | 2008 | Google Earth XML format adopted by OGC | Supports altitude and TimeSpan/TimeStamp |
| KML 2.3 | 2015 | Minor revision (OGC 12-007r2); XML namespace unchanged from 2.2 | KML 2.2 documents remain valid |
| GML 3.2 | 2007 | OGC XML encoding, published as ISO 19136:2007 | Validated against XML Schema |
| GeoPackage | 2014 | SQLite 3 database container (.gpkg) | Vector features and raster tiles in one file |
| FlatGeobuf | 2018 | FlatBuffers binary encoding | Optional packed Hilbert R-tree index; streamable |
Compatibility Notes:
- Shapefiles lack topology and are file-based
- GeoJSON is text-based and web-friendly
- KML supports 3D placement (altitude) and time animation
- Shapefile .dbf attribute fields are limited to 10-character names
- Shapefile .dbf tables allow at most 255 fields; text fields hold up to 254 characters
- GeoJSON (RFC 7946) orders coordinates longitude, latitude and removed the 2008 draft's crs member
- KML altitude is measured in meters above the WGS 84 EGM96 geoid; .kmz files are zipped KML archives
- FlatGeobuf deliberately omits random-write support; its spatial index is optional so files can be written as a stream
- GeoPackage stores all content in a single SQLite 3 database file
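The Shapefile .dbf limits listed above are easy to check mechanically before converting data into that format. The following is a minimal sketch; the function name `dbf_schema_problems` and the input shape (a list of name and text-width pairs) are assumptions made for illustration.

```python
MAX_FIELDS = 255    # maximum number of fields in a .dbf table
MAX_NAME_LEN = 10   # maximum field-name length in characters
MAX_TEXT_LEN = 254  # maximum width of a text field


def dbf_schema_problems(fields):
    """Return human-readable violations of the .dbf limits.

    `fields` is a list of (name, text_width) pairs; text_width is None
    for non-text fields (a hypothetical shape chosen for this sketch).
    """
    problems = []
    if len(fields) > MAX_FIELDS:
        problems.append(f"{len(fields)} fields exceeds {MAX_FIELDS}")
    for name, width in fields:
        if len(name) > MAX_NAME_LEN:
            problems.append(f"{name!r}: name longer than {MAX_NAME_LEN} characters")
        if width is not None and width > MAX_TEXT_LEN:
            problems.append(f"{name!r}: text width {width} exceeds {MAX_TEXT_LEN}")
    return problems


print(dbf_schema_problems([("population_2020", None), ("notes", 500)]))
```

An empty list means the schema fits within the limits. Formats such as GeoPackage do not share these restrictions, which is one reason long column names often need renaming only when the target is a Shapefile.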
Technical References
- Open Geospatial Consortium (OGC)
- Wikipedia: Geographic Information System
- ESRI Shapefile Technical Description
- GeoJSON Specification
- OGC GeoPackage Encoding Standard
To learn how to use this format with DataMeans, see the User Guide.