by Al Danial

Read YAML, ini, TOML files in MATLAB with Python

Part 2 of the Python Is The Ultimate MATLAB Toolbox series.

YAML, TOML, ini: Convenient formats for program configuration data

As applications evolve, their inputs tend to become more complex. Parsing text files for program configuration data is a hassle, so I typically store such data in formats such as YAML, TOML, ini, JSON, or XML. MATLAB natively supports JSON and XML, but of the five listed options these are the least attractive to me. JSON does not support comments and XML is just an all-around drag to view, edit, and code for. ini isn’t great either but suffices for simple inputs.

FileExchange options exist for reading YAML, TOML, and ini files. Alternatively, you can use Python to read and write files in these formats.

As an example, say you’re writing an optimization program in MATLAB and you want to load your program’s configuration data into a struct, config, with these fields:

% MATLAB
config.max_iter = 1000;
config.newmark.alpha = 0.25;
config.newmark.beta  = 0.5;
config.input_dir = "/xfer/sim_data/2022/05/28";
config.tune_coeff = [1.2e-4  -3.25  58.2];

One option is to store the above lines in a .m file then invoke the name of the file in your application. While convenient, this technique combines code and data—definitely not a best practice. Instead, let’s store the data in YAML, ini, and TOML files then populate our config struct by loading these files with Python.

Reading YAML

Hierarchy in YAML is defined by horizontal whitespace, just as in Python. This YAML file

# optim_config.yaml
max_iter : 1000
newmark :
  alpha : 0.25
  beta : 0.5
input_dir : "/xfer/sim_data/2022/05/28"
tune_coeff : [1.2e-4, -3.25, 58.2]

loads into a dictionary like this in Python:

# Python
In : import yaml
In : with open('optim_config.yaml') as fh:
   :     config = yaml.safe_load(fh)
In : print(config)

{'max_iter': 1000, 'newmark': {'alpha': 0.25, 'beta': 0.5},
 'input_dir': '/xfer/sim_data/2022/05/28',
 'tune_coeff': [0.00012, -3.25, 58.2]}

Let’s try it in MATLAB. If you haven’t already, set up your MATLAB+Python environment. Then try

% MATLAB
>> config = py.yaml.safe_load(py.open('optim_config.yaml'))
config =

  Python dict with no properties.

    {'max_iter': 1000, 'newmark': {'alpha': 0.25, 'beta': 0.5},
     'input_dir': '/xfer/sim_data/2022/05/28',
     'tune_coeff': [0.00012, -3.25, 58.2]}

It worked! Sort of—config is a Python dictionary within MATLAB, not a MATLAB struct. Individual values are accessible, but only in a clumsy manner:

% MATLAB
>> config.get('newmark').get('beta')

    0.5000

What we really want is a native MATLAB struct, not a Python dictionary. Enter py2mat.m….

py2mat.m

py2mat.m is a generic Python-to-MATLAB variable converter I wrote to simplify MATLAB/Python interaction. We can use it to convert the Python dictionary returned by yaml.safe_load() to a native MATLAB struct:

% MATLAB
>> config = py2mat( py.yaml.safe_load(py.open('optim_config.yaml')) )

  struct with fields:

      max_iter: 1000
       newmark: [1x1 struct]
     input_dir: "/xfer/sim_data/2022/05/28"
    tune_coeff: {[1.2000e-04]  [-3.2500]  [58.2000]}

and now we can access fields the way we intended all along:

% MATLAB
>> config.newmark.beta

    0.5000

py2mat.m converts arbitrarily nested Python dictionaries, lists, tuples, sets, scalars, NumPy arrays, SciPy sparse matrices—even datetimes with correct timezone handling—to corresponding MATLAB structs and cell arrays containing scalars, strings, dense and sparse matrices, and MATLAB datetimes.

mat2py.m

The counterpart to py2mat.m is mat2py.m. It takes a native MATLAB variable and converts it to an equivalent native Python variable within MATLAB. This is handy for passing MATLAB data into Python functions, for example when using the Python yaml module to store a MATLAB variable in a YAML file. We’ll give that a try in the Writing YAML section below.

df2t.m, t2df.m for Pandas Dataframes and MATLAB Tables

Do you work with Pandas dataframes or MATLAB tables? Artem Lenskiy wrote a pair of converters, df2t.m and t2df.m, that are the dataframe and table equivalents of py2mat.m and mat2py.m. You can find df2t.m and t2df.m at his PandasToMatlab GitHub project.

Writing YAML

The dump() function from the Python yaml module needs two things to write a YAML file: a variable whose contents are to be written and a file handle to write to. To do this from MATLAB, however, we’ll need to provide a Python variable and a Python file handle. We can get a Python version of the MATLAB variable with mat2py() and a Python file handle simply by calling py.open() instead of open(). This example echoes our current config variable to a new YAML file:

% MATLAB
>> fh = py.open('new_config.yaml','w');  % Python, not MATLAB, file handle
>> py_config = mat2py(config)

  Python dict with no properties.

      {'max_iter': 1000, 'newmark': {'alpha': 0.25, 'beta': 0.5},
       'input_dir': '/xfer/sim_data/2022/05/28',
       'tune_coeff': [0.00012, -3.25, 58.2]}

>> py.yaml.dump(py_config, fh)
>> fh.close()

The newly written file, new_config.yaml, looks like this:

input_dir: /xfer/sim_data/2022/05/28
max_iter: 1000
newmark:
  alpha: 0.25
  beta: 0.5
tune_coeff:
- 0.00012
- -3.25
- 58.2

While the sequence of entries and layout differ from the original optim_config.yaml file, the data loads into the same structure in both Python and MATLAB.
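If keeping the original key order and a more compact layout matters, PyYAML's dump functions accept keyword arguments that influence both. The following is a minimal Python sketch, assuming PyYAML 5.1 or newer (where sort_keys is available); the dictionary is typed in by hand here rather than produced by mat2py:

```python
import yaml  # PyYAML, a third-party package

config = {
    'max_iter': 1000,
    'newmark': {'alpha': 0.25, 'beta': 0.5},
    'input_dir': '/xfer/sim_data/2022/05/28',
    'tune_coeff': [1.2e-4, -3.25, 58.2],
}

# sort_keys=False keeps the dictionary's insertion order instead of
# sorting keys alphabetically; default_flow_style=None lets PyYAML write
# collections that hold only scalars inline rather than one item per line.
text = yaml.safe_dump(config, sort_keys=False, default_flow_style=None)
print(text)

# The text loads back into the same data.
reloaded = yaml.safe_load(text)
```

When calling from MATLAB, keyword arguments such as these would be passed with pyargs. I have not verified whether mat2py preserves field order, so treat the ordering benefit as an assumption on the MATLAB side.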

Reading TOML

Like YAML, TOML stores hierarchical data in text files, but without the need for correctly aligned horizontal whitespace. Our configuration data can be stored in TOML like this:

# optim_config.toml
max_iter = 1000
input_dir = "/xfer/sim_data/2022/05/28"
tune_coeff = [1.2e-4, -3.25, 58.2]
[newmark]
alpha = 0.25
beta = 0.5

The toml module’s load function can take a file name directly, so there’s no need for a separate call to open(). The load function returns a dictionary:

# Python
In : import toml
In : config = toml.load('optim_config.toml')
In : print(config)
{'max_iter': 1000, 'input_dir': '/xfer/sim_data/2022/05/28',
 'tune_coeff': [0.00012, -3.25, 58.2],
 'newmark': {'alpha': 0.25, 'beta': 0.5}}

Loading TOML files in MATLAB is a one-line operation, just as it is in Python:

% MATLAB
>> config = py2mat(py.toml.load('optim_config.toml'))
  struct with fields:

      max_iter: 1000
     input_dir: "/xfer/sim_data/2022/05/28"
    tune_coeff: {[1.2000e-04]  [-3.2500]  [58.2000]}
       newmark: [1x1 struct]

>> config.newmark.alpha

    0.2500

Reading ini

ini files are less versatile than YAML or TOML since 1) they don’t allow one to store arbitrarily nested hierarchical data and 2) all values come in as strings. An additional nuisance is that the read function in Python’s ini handling module, configparser, does not throw an error if the given file is missing or cannot be opened; it merely returns an empty list instead of the list of files it read. (A file that opens but is malformed does raise a parsing error.)
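Since read() returns the list of files it managed to read, a caller can check that list to catch a missing file early. A minimal sketch; the helper name read_ini_strict and the file name are hypothetical, made up for illustration:

```python
import configparser

def read_ini_strict(path):
    """Return a ConfigParser loaded from path, or raise if it was not read."""
    parser = configparser.ConfigParser()
    files_read = parser.read(path)   # list of files successfully read
    if not files_read:               # empty list: nothing was loaded
        raise FileNotFoundError(f"could not read ini file: {path}")
    return parser

# A missing file now fails loudly instead of yielding an empty parser.
try:
    read_ini_strict('no_such_file.ini')
except FileNotFoundError as err:
    print(err)
```

The same check works from MATLAB, where the return value of parser.read() can be tested for zero length before any values are requested.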

Our configuration data might be stored in ini like the file below. While it somewhat resembles the TOML file, every value to the right of an equals sign is loaded as a string. Even 1.2e-4, -3.25, 58.2 is a single string that we have to deal with ourselves by splitting it at the commas and then converting the pieces to numbers:

; optim_config.ini
[max_iter]
value = 1000
[newmark]
alpha = 0.25
beta = 0.5
[input_dir]
value = /xfer/sim_data/2022/05/28
[tune_coeff]
value = 1.2e-4, -3.25, 58.2

Loading this file in Python involves considerably more work than loading YAML or TOML.

# Python
In : import configparser
In : parser = configparser.ConfigParser()
In : parser.read('optim_config.ini')
In : max_iter = int(parser.get('max_iter', 'value'))
In : newmark_alpha = float(parser.get('newmark', 'alpha'))
In : newmark_beta  = float(parser.get('newmark', 'beta'))
In : input_dir  = parser.get('input_dir', 'value')
In : tune_coeff = [float(_) for _ in parser.get('tune_coeff', 'value').split(',')]

In : newmark_beta
Out: 0.5

In : tune_coeff
Out: [0.00012, -3.25, 58.2]

The MATLAB version is even messier:

% MATLAB
>> parser = py.configparser.ConfigParser();
>> parser.read('optim_config.ini');
>> max_iter = int64(py.int(parser.get('max_iter', 'value')));
>> newmark_alpha = double(py.float(parser.get('newmark', 'alpha')));
>> newmark_beta  = double(py.float(parser.get('newmark', 'beta' )));
>> input_dir  = string(parser.get('input_dir', 'value'));
>> tune_coeff = cellfun(@(x) double(x), ...
                        py2mat(parser.get('tune_coeff', 'value').split(',')));

>> newmark_beta

    0.5000

>> tune_coeff

    0.0001   -3.2500   58.2000
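One way to cut down on the field-by-field conversions is to do the type coercion once, in Python, and return a nested dictionary that a converter such as py2mat could then turn into a struct. The sketch below is only an illustration under several assumptions: the helper names coerce and ini_to_dict are hypothetical, any value containing a comma is treated as a list, and a section holding a lone "value" key is collapsed to that value. A string that legitimately contains a comma would be split by mistake.

```python
import configparser

def coerce(text):
    """Turn an ini string into an int, a float, or a list of those if possible."""
    if ',' in text:
        return [coerce(part.strip()) for part in text.split(',')]
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # not numeric, keep the string

def ini_to_dict(ini_text):
    """Parse ini text into a nested dictionary with coerced values."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)  # use parser.read(path) for a file
    config = {}
    for section in parser.sections():
        items = {key: coerce(val) for key, val in parser.items(section)}
        # [name] sections with only "value = ..." become plain entries
        config[section] = items['value'] if list(items) == ['value'] else items
    return config

INI_TEXT = """\
; optim_config.ini
[max_iter]
value = 1000
[newmark]
alpha = 0.25
beta = 0.5
[input_dir]
value = /xfer/sim_data/2022/05/28
[tune_coeff]
value = 1.2e-4, -3.25, 58.2
"""

config = ini_to_dict(INI_TEXT)
print(config)
```

To call something like this from MATLAB, the functions would have to live in a Python module on the Python search path. That setup is not shown here.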

Next: SQLite

Join me again on June 11, 2022 for a discussion on reading and writing SQLite files in MATLAB.