API Reference

Data Converters

Seismic Data

Note

By default, the SEG-Y ingestion tool uses Python’s multiprocessing to speed up parsing the data. This almost always requires a __main__ guard in any Python code that is executed directly, e.g. with python file.py. The guard is NOT needed when running inside Jupyter.

if __name__ == "__main__":
    segy_to_mdio(...)

When the CLI is invoked, this is already handled.

See the official Python multiprocessing documentation (https://docs.python.org/3/library/multiprocessing.html) for details.

Conversion from SEG-Y to MDIO v1 format.

mdio.converters.segy.segy_to_mdio(segy_spec, mdio_template, input_path, output_path, overwrite=False, grid_overrides=None, segy_header_overrides=None)

Convert a SEG-Y file to an MDIO v1 file.

The file is ingested according to segy_spec, which can be a spec from the registry or a custom one.

Parameters:
  • segy_spec (SegySpec) – The SEG-Y specification to use for the conversion.

  • mdio_template (AbstractDatasetTemplate) – The MDIO template to use for the conversion.

  • input_path (UPath | Path | str) – The universal path of the input SEG-Y file.

  • output_path (UPath | Path | str) – The universal path for the output MDIO v1 file.

  • overwrite (bool) – Whether to overwrite the output file if it already exists. Defaults to False.

  • grid_overrides (dict[str, Any] | None) – Optional grid overrides to apply during ingestion.

  • segy_header_overrides (SegyHeaderOverrides | None) – Option to override specific SEG-Y headers during ingestion.

Raises:

FileExistsError – If the output location already exists and overwrite is False.

Return type:

None
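
Examples

A minimal ingestion sketch, assuming a SEG-Y Rev 1 spec from the segy package's registry; the mdio_template value is a placeholder for whichever AbstractDatasetTemplate fits your data. When executing as a script, run this under a __main__ guard as described in the note above.

>>> from segy.standards import get_segy_standard
>>> from mdio.converters.segy import segy_to_mdio
>>>
>>> segy_to_mdio(
...     segy_spec=get_segy_standard(1.0),  # Rev 1 spec from the segy registry
...     mdio_template=mdio_template,  # placeholder: an AbstractDatasetTemplate instance
...     input_path="prefix/file.segy",
...     output_path="prefix/file.mdio",
...     overwrite=True,
... )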

Conversion from MDIO to various other formats.

mdio.converters.mdio.mdio_to_segy(segy_spec, input_path, output_path, selection_mask=None, client=None)

Convert MDIO file to SEG-Y format.

We export N-D seismic data to the flattened SEG-Y format used in data transmission.

The input headers are preserved as-is and transferred to the output file.

The input MDIO can be local or cloud-based. However, the output SEG-Y is always generated locally.

A selection_mask can be provided (same shape as spatial grid) to export a subset.

Parameters:
  • segy_spec (SegySpec) – The SEG-Y specification to use for the conversion.

  • input_path (UPath | Path | str) – Store or URL (and cloud options) for MDIO file.

  • output_path (UPath | Path | str) – Path to the output SEG-Y file.

  • selection_mask (np.ndarray) – Boolean array (same shape as the spatial grid) marking the subset of traces to export.

  • client (distributed.Client) – Dask client. If None, the local threaded scheduler is used. If "auto", multiple processes are created (with 8 threads each).

Raises:
  • ImportError – If the distributed package is requested but not installed.

  • ValueError – If the selection mask is empty, i.e. no traces would be written.

Return type:

None

Examples

To export an existing local MDIO file to SEG-Y, use the code snippet below. This exports the full MDIO file (without padding) to SEG-Y format. A segy_spec is required; here we assume a SEG-Y Rev 1 spec from the segy package's registry.

>>> from upath import UPath
>>> from segy.standards import get_segy_standard
>>> from mdio import mdio_to_segy
>>>
>>> segy_spec = get_segy_standard(1.0)
>>> input_path = UPath("prefix2/file.mdio")
>>> output_path = UPath("prefix/file.segy")
>>> mdio_to_segy(segy_spec, input_path, output_path)
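
To export only a subset of traces, pass a boolean selection_mask with the same shape as the spatial grid. The grid shape below is a hypothetical (inline, crossline) size for illustration.

>>> import numpy as np
>>>
>>> mask = np.zeros((345, 188), dtype=bool)  # hypothetical (inline, crossline) grid shape
>>> mask[50:100, 20:40] = True  # export only this rectangle of traces
>>> mdio_to_segy(segy_spec, input_path, output_path, selection_mask=mask)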

Core Functionality

Dimensions

Dimension (grid) abstraction and serializers.

class mdio.core.dimension.Dimension(coords, name)

Dimension class.

Dimension has a name and coordinates associated with it. The Dimension coordinates can only be a vector.

Parameters:
  • coords (list | tuple | NDArray | range) – Vector of coordinates.

  • name (str) – Name of the dimension.

coords

Vector of coordinates.

Type:

list | tuple | NDArray | range

name

Name of the dimension.

Type:

str

max()

Get maximum value of dimension.

Return type:

NDArray[float]

min()

Get minimum value of dimension.

Return type:

NDArray[float]

property size: int

Size of the dimension.
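
A minimal usage sketch, assuming the signature documented above (the coordinate values are illustrative):

>>> from mdio.core.dimension import Dimension
>>>
>>> inline = Dimension(coords=range(10, 21, 2), name="inline")  # coords 10, 12, ..., 20
>>> inline.size  # 6 coordinate values
>>> inline.min(), inline.max()  # smallest and largest coordinates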

Optimization

Optimize MDIO seismic datasets for fast access patterns using ZFP compression and Dask.

This module provides tools to create compressed, rechunked transpose views of seismic data for efficient access along dataset dimensions. It uses configurable ZFP compression based on data statistics and supports parallel processing with Dask Distributed.

pydantic model mdio.optimize.access_pattern.OptimizedAccessPatternConfig

Configuration for fast access pattern optimization.

JSON schema:
{
   "title": "OptimizedAccessPatternConfig",
   "description": "Configuration for fast access pattern optimization.",
   "type": "object",
   "properties": {
      "optimize_dimensions": {
         "additionalProperties": {
            "items": {
               "type": "integer"
            },
            "type": "array"
         },
         "description": "Optimize dims and desired chunks.",
         "title": "Optimize Dimensions",
         "type": "object"
      },
      "processing_chunks": {
         "additionalProperties": {
            "type": "integer"
         },
         "description": "Chunk sizes for processing the original variable.",
         "title": "Processing Chunks",
         "type": "object"
      },
      "compressor": {
         "anyOf": [
            {
               "$ref": "#/$defs/Blosc"
            },
            {
               "$ref": "#/$defs/ZFP"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Compressor to use for access patterns.",
         "title": "Compressor"
      }
   },
   "$defs": {
      "Blosc": {
         "additionalProperties": false,
         "description": "Data Model for Blosc options.",
         "properties": {
            "name": {
               "default": "blosc",
               "description": "Name of the compressor.",
               "title": "Name",
               "type": "string"
            },
            "cname": {
               "$ref": "#/$defs/BloscCname",
               "default": "zstd",
               "description": "Compression algorithm name."
            },
            "clevel": {
               "default": 5,
               "description": "Compression level (integer 0\u20139)",
               "maximum": 9,
               "minimum": 0,
               "title": "Clevel",
               "type": "integer"
            },
            "shuffle": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/BloscShuffle"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Shuffling mode before compression."
            },
            "typesize": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The size in bytes that the shuffle is performed over.",
               "title": "Typesize"
            },
            "blocksize": {
               "default": 0,
               "description": "The size (in bytes) of blocks to divide data before compression.",
               "title": "Blocksize",
               "type": "integer"
            }
         },
         "title": "Blosc",
         "type": "object"
      },
      "BloscCname": {
         "description": "Enum for compression library used by blosc.",
         "enum": [
            "lz4",
            "lz4hc",
            "blosclz",
            "zstd",
            "snappy",
            "zlib"
         ],
         "title": "BloscCname",
         "type": "string"
      },
      "BloscShuffle": {
         "description": "Enum for shuffle filter used by blosc.",
         "enum": [
            "noshuffle",
            "shuffle",
            "bitshuffle"
         ],
         "title": "BloscShuffle",
         "type": "string"
      },
      "ZFP": {
         "additionalProperties": false,
         "description": "Data Model for ZFP options.",
         "properties": {
            "name": {
               "default": "zfp",
               "description": "Name of the compressor.",
               "title": "Name",
               "type": "string"
            },
            "mode": {
               "$ref": "#/$defs/ZFPMode"
            },
            "tolerance": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Fixed accuracy in terms of absolute error tolerance.",
               "title": "Tolerance"
            },
            "rate": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Fixed rate in terms of number of compressed bits per value.",
               "title": "Rate"
            },
            "precision": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Fixed precision in terms of number of uncompressed bits per value.",
               "title": "Precision"
            }
         },
         "required": [
            "mode"
         ],
         "title": "ZFP",
         "type": "object"
      },
      "ZFPMode": {
         "description": "Enum for ZFP algorithm modes.",
         "enum": [
            "fixed_rate",
            "fixed_precision",
            "fixed_accuracy",
            "reversible"
         ],
         "title": "ZFPMode",
         "type": "string"
      }
   },
   "required": [
      "optimize_dimensions",
      "processing_chunks"
   ]
}

field compressor: Blosc | ZFP | None = None

Compressor to use for access patterns.

field optimize_dimensions: dict[str, tuple[int, ...]] [Required]

Optimize dims and desired chunks.

field processing_chunks: dict[str, int] [Required]

Chunk sizes for processing the original variable.
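
For example, a config with an explicit ZFP compressor could be built as below. This is a sketch: the import path for the ZFP model may differ between versions, and the compressor settings are illustrative.

>>> from mdio import OptimizedAccessPatternConfig
>>> from mdio.builder.schemas.compressors import ZFP  # assumption: import path may vary by version
>>>
>>> conf = OptimizedAccessPatternConfig(
...     optimize_dimensions={"inline": (4, 512, 512)},
...     processing_chunks={"inline": 512, "crossline": 512, "time": 512},
...     compressor=ZFP(mode="fixed_accuracy", tolerance=0.01),  # illustrative settings
... )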

mdio.optimize.access_pattern.optimize_access_patterns(dataset, config, n_workers=1, threads_per_worker=1)

Optimize MDIO dataset for fast access along dimensions.

Optimize an MDIO dataset by creating compressed, rechunked views for fast access along configurable dimensions, then appending them to the existing MDIO file.

This uses ZFP compression with a tolerance derived from the data's standard deviation. It requires Dask Distributed for parallel execution and will try to use an existing distributed.Client or create its own; an existing Client is kept running after optimization.

Parameters:
  • dataset (Dataset) – MDIO Dataset containing the seismic data.

  • config (OptimizedAccessPatternConfig) – Configuration object with optimize dimensions, processing chunks, and an optional compressor.

  • n_workers (int) – Number of Dask workers. Default is 1.

  • threads_per_worker (int) – Threads per Dask worker. Default is 1.

Raises:

ValueError – If required attrs/stats are missing or the dataset is invalid.

Return type:

None

Examples

For post-stack 3D seismic data, we can optimize the inline, crossline, and time dimensions.

>>> from mdio import optimize_access_patterns, OptimizedAccessPatternConfig
>>> from mdio import open_mdio
>>>
>>> conf = OptimizedAccessPatternConfig(
...     optimize_dimensions={
...         "inline": (4, 512, 512),
...         "crossline": (512, 4, 512),
...         "time": (512, 512, 4),
...     },
...     processing_chunks={"inline": 512, "crossline": 512, "time": 512},
... )
>>>
>>> ds = open_mdio("/path/to/seismic.mdio")
>>> optimize_access_patterns(ds, conf, n_workers=4)