by Al Danial

Read YAML, ini, TOML files in MATLAB with Python

Part 2 of the Python Is The Ultimate MATLAB Toolbox series.

YAML, TOML, ini: Convenient formats for program configuration data

As applications evolve, their inputs tend to become more complex. Parsing text files for program configuration data is a hassle so I typically store such data in formats such as YAML, TOML, ini, JSON, or XML. MATLAB natively supports JSON and XML but of the five listed options, these are the least attractive to me. JSON does not support comments and XML is just an all-around drag to view, edit, and code for. ini isn’t great either but suffices for simple inputs.

FileExchange options exist for reading YAML, TOML, and ini files. Alternatively you can use Python to read and write files in these formats.

As an example, say you’re writing an optimization program in MATLAB and you want to load your program’s configuration data into a struct, config, with these fields:

% MATLAB
config.max_iter = 1000;
config.newmark.alpha = 0.25;
config.newmark.beta  = 0.5;
config.input_dir = "/xfer/sim_data/2022/05/28";
config.tune_coeff = [1.2e-4  -3.25  58.2];

One option is to store the above lines in a .m file then invoke the name of the file in your application. While convenient, this technique combines code and data—definitely not a best practice. Instead, let’s store the data in YAML, ini, and TOML files then populate our config struct by loading these files with Python.

Reading YAML

Hierarchy in YAML is defined by horizontal whitespace, just as in Python. This YAML file

# optim_config.yaml
max_iter : 1000
newmark :
  alpha : 0.25
  beta : 0.5
input_dir : "/xfer/sim_data/2022/05/28"
tune_coeff : [1.2e-4, -3.25, 58.2]

loads into a dictionary like this in Python:

# Python
In : import yaml
In : with open('optim_config.yaml') as fh:
   :     config = yaml.safe_load(fh)
In : print(config)

{'max_iter': 1000, 'newmark': {'alpha': 0.25, 'beta': 0.5},
 'input_dir': '/xfer/sim_data/2022/05/28',
 'tune_coeff': [0.00012, -3.25, 58.2]}

Let’s try it in MATLAB. If you haven’t already, set up your MATLAB+Python environment. Then try

% MATLAB
>> config = py.yaml.safe_load(py.open('optim_config.yaml'))
config =

  Python dict with no properties.

    {'max_iter': 1000, 'newmark': {'alpha': 0.25, 'beta': 0.5},
     'input_dir': '/xfer/sim_data/2022/05/28',
     'tune_coeff': [0.00012, -3.25, 58.2]}

It worked! Sort of—config is a Python dictionary within MATLAB, not a MATLAB struct. Individual values are accessible, but only in a clumsy manner:

% MATLAB
>> config.get('newmark').get('beta')

    0.5000

What we really want is a native MATLAB struct, not a Python dictionary. Enter py2mat.m….

py2mat.m

py2mat.m is a generic Python-to-MATLAB variable converter I wrote to simplify MATLAB/Python interaction. We can use it to convert the Python dictionary returned by yaml.safe_load() to a native MATLAB struct:

% MATLAB
>> config = py2mat( py.yaml.safe_load(py.open('optim_config.yaml')) )

  struct with fields:

      max_iter: 1000
       newmark: [1x1 struct]
     input_dir: "/xfer/sim_data/2022/05/28"
    tune_coeff: {[1.2000e-04]  [-3.2500]  [58.2000]}

and now we can access fields the way we intended all along:

% MATLAB
>> config.newmark.beta

    0.5000

py2mat.m converts arbitrarily nested Python dictionaries, lists, tuples, sets, scalars, NumPy arrays, SciPy sparse matrices—even datetimes with correct timezone handling—to corresponding MATLAB structs and cell arrays containing scalars, strings, dense and sparse matrices, and MATLAB datetimes.

mat2py.m

The counterpart to py2mat.m is mat2py.m. It takes a native MATLAB variable and converts it to an equivalent native Python variable within MATLAB. This is handy for passing MATLAB data into Python functions, as when using the Python yaml module to store a MATLAB variable in a YAML file. Let’s give that a try.

df2t.m, t2df.m for Pandas Dataframes and MATLAB Tables

Do you work with Pandas dataframes or MATLAB tables? Artem Lenskiy wrote a pair of converters, df2t.m and t2df.m, that are the dataframe and table equivalents of py2mat.m and mat2py.m. You can find df2t.m and t2df.m at his PandasToMatlab Github project.

Writing YAML

The dump() function from the Python yaml module needs two things to write a YAML file: a variable whose contents are to be written and a file handle to write to. To do this from MATLAB however, we’ll need to provide a Python variable and a Python file handle. We can get a Python version of the MATLAB variable with mat2py() and a Python file handle simply by calling py.open() instead of open(). This example echoes our current config variable to a new YAML file:

% MATLAB
>> fh = py.open('new_config.yaml','w');  % Python, not MATLAB, file handle
>> py_config = mat2py(config)

  Python dict with no properties.

      {'max_iter': 1000, 'newmark': {'alpha': 0.25, 'beta': 0.5},
       'input_dir': '/xfer/sim_data/2022/05/28',
       'tune_coeff': [0.00012, -3.25, 58.2]}

>> py.yaml.dump(py_config, fh)
>> fh.close()

The newly written file, new_config.yaml, looks like this:

input_dir: /xfer/sim_data/2022/05/28
max_iter: 1000
newmark:
  alpha: 0.25
  beta: 0.5
tune_coeff:
- 0.00012
- -3.25
- 58.2

While the sequence of entries and layout differ from the original optim_config.yaml file, the data loads into the same structure in both Python and MATLAB.
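
The round trip can also be checked from the Python side. The sketch below assumes PyYAML 5.1 or newer, where dump() accepts a sort_keys argument; passing sort_keys=False keeps the keys in their original order rather than sorting them alphabetically. The config dictionary is retyped by hand here purely for illustration.

```python
import yaml

# Same data as optim_config.yaml, typed in by hand for this sketch.
config = {
    "max_iter": 1000,
    "newmark": {"alpha": 0.25, "beta": 0.5},
    "input_dir": "/xfer/sim_data/2022/05/28",
    "tune_coeff": [1.2e-4, -3.25, 58.2],
}

# sort_keys=False (PyYAML >= 5.1) preserves insertion order instead of
# sorting the keys alphabetically as in new_config.yaml.
text = yaml.dump(config, sort_keys=False)

# Reloading the dumped text should give back an equal dictionary.
reloaded = yaml.safe_load(text)
same = (reloaded == config)
first_key = text.split(":")[0]
print(text)
```

If key order does not matter to you, the default dump() call shown earlier is enough; the data compares equal either way.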

Reading TOML

TOML files can store hierarchical data in text files like YAML without need for correctly-aligned horizontal whitespace. Our configuration data can be stored in TOML like this:

# optim_config.toml
max_iter = 1000
input_dir = "/xfer/sim_data/2022/05/28"
tune_coeff = [1.2e-4, -3.25, 58.2]
[newmark]
alpha = 0.25
beta = 0.5

The toml module’s load() function can take a file name directly, so there’s no need for a separate call to open(). It returns a dictionary:

# Python
In : import toml
In : config = toml.load('optim_config.toml')
In : print(config)
{'max_iter': 1000, 'input_dir': '/xfer/sim_data/2022/05/28',
 'tune_coeff': [0.00012, -3.25, 58.2],
 'newmark': {'alpha': 0.25, 'beta': 0.5}}
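
Since Python 3.11 the standard library includes its own TOML reader, tomllib, which may save installing the third-party toml module when files only need to be read. A minimal sketch, assuming Python 3.11 or newer (the fallback to the tomli backport assumes that package is installed on older versions). Note that tomllib.load() wants a file opened in binary mode and that tomllib has no writer. The TOML text is inlined so the example is self-contained.

```python
try:
    import tomllib                 # standard library, Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib        # backport; assumed installed on older Pythons

toml_text = """
max_iter = 1000
input_dir = "/xfer/sim_data/2022/05/28"
tune_coeff = [1.2e-4, -3.25, 58.2]
[newmark]
alpha = 0.25
beta = 0.5
"""

# loads() parses a string; for a file on disk use
#   with open("optim_config.toml", "rb") as fh:
#       config = tomllib.load(fh)
config = tomllib.loads(toml_text)
print(config["newmark"]["beta"])
```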

Loading TOML files in MATLAB is a one-line operation, just as it is in Python:

% MATLAB
>> config = py2mat(py.toml.load('optim_config.toml'))
  struct with fields:

      max_iter: 1000
     input_dir: "/xfer/sim_data/2022/05/28"
    tune_coeff: {[1.2000e-04]  [-3.2500]  [58.2000]}
       newmark: [1x1 struct]

>> config.newmark.alpha

    0.2500

Reading ini

ini files are less versatile than YAML or TOML since 1) they don’t allow one to store arbitrarily nested hierarchical data and 2) all values come in as strings. An additional nuisance is that the read function in Python’s ini handling module, configparser, does not throw an error if the given file cannot be opened; it silently skips the file and returns an empty list. (Malformed content, such as a missing section header, does raise an exception.)
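
One way to guard against a silently skipped file is to inspect the list that read() returns, or to open the file yourself and pass the handle to read_file() so that the usual FileNotFoundError surfaces. A minimal sketch; the file name no_such_file.ini is a hypothetical stand-in for a path that does not exist.

```python
import configparser
import os
import tempfile

# A path inside a fresh temporary directory, so it cannot exist
# (the file name itself is hypothetical).
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.ini")

parser = configparser.ConfigParser()
files_read = parser.read(missing)      # no exception, just an empty list
if not files_read:
    print(f"warning: could not read {missing}")

# Opening the file explicitly makes the failure loud instead.
try:
    with open(missing) as fh:
        parser.read_file(fh)
    opened = True
except FileNotFoundError:
    opened = False
```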

Although the ini file below somewhat resembles the TOML file, every value to the right of an equals sign is loaded as a string. Even 1.2e-4, -3.25, 58.2 is a single string that we have to split into words and then convert to numbers ourselves. Our configuration data might be stored in ini like so:

; optim_config.ini
[max_iter]
value = 1000
[newmark]
alpha = 0.25
beta = 0.5
[input_dir]
value = /xfer/sim_data/2022/05/28
[tune_coeff]
value = 1.2e-4, -3.25, 58.2

Loading this file in Python involves considerably more work than loading YAML or TOML.

# Python
In : import configparser
In : parser = configparser.ConfigParser()
In : parser.read('optim_config.ini')
In : max_iter = int(parser.get('max_iter', 'value'))
In : newmark_alpha = float(parser.get('newmark', 'alpha'))
In : newmark_beta  = float(parser.get('newmark', 'beta'))
In : input_dir  = parser.get('input_dir', 'value')
In : tune_coeff = [float(_) for _ in parser.get('tune_coeff', 'value').split(',')]

In : newmark_beta
Out: 0.5

In : tune_coeff
Out: [0.00012, -3.25, 58.2]
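
configparser also offers typed getters, getint() and getfloat(), which fold the conversion into the lookup. Here is a sketch of the same reads, with the ini text passed through read_string() so the example runs without a file on disk:

```python
import configparser

ini_text = """
[max_iter]
value = 1000
[newmark]
alpha = 0.25
beta = 0.5
[input_dir]
value = /xfer/sim_data/2022/05/28
[tune_coeff]
value = 1.2e-4, -3.25, 58.2
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Typed getters convert the stored strings during lookup.
max_iter = parser.getint("max_iter", "value")
newmark_alpha = parser.getfloat("newmark", "alpha")
newmark_beta = parser.getfloat("newmark", "beta")
input_dir = parser.get("input_dir", "value")

# Lists have no typed getter, so the comma-separated string
# is still split and converted by hand.
tune_coeff = [float(s) for s in parser.get("tune_coeff", "value").split(",")]
```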

The MATLAB version is even messier:

% MATLAB
>> parser = py.configparser.ConfigParser();
>> parser.read('optim_config.ini');
>> max_iter = int64(py.int(parser.get('max_iter', 'value')));
>> newmark_alpha = double(py.float(parser.get('newmark', 'alpha')));
>> newmark_beta  = double(py.float(parser.get('newmark', 'beta' )));
>> input_dir  = string(parser.get('input_dir', 'value'));
>> tune_coeff = cellfun(@(x) double(x), ...
                        py2mat(parser.get('tune_coeff', 'value').split(',')));

>> newmark_beta

    0.5000

>> tune_coeff

    0.0001   -3.2500   58.2000

by Al Danial

Python Is The Ultimate MATLAB Toolbox


2022-12-17: The complete series of “Python is the Ultimate MATLAB Toolbox” articles is available as an eBook in PDF and epub formats (174 pages, Creative Commons license).


MATLAB’s ability to run Python code is fantastic—it extends MATLAB’s capabilities to encompass everything Python can do. Oddly, few MATLAB developers I know are aware of MATLAB’s binary API to Python, or of the vast possibilities Python offers MATLAB developers. I wrote Python for MATLAB Development to teach Python to MATLAB programmers and to demonstrate many of the benefits Python can bring them. My presentation at the MATLAB EXPO 2022 (slides) demonstrates the fundamentals of calling Python code from MATLAB.

Over the next few months I’ll post selections from the book that demonstrate how to call Python code from MATLAB. Prepare by setting up your MATLAB+Python environment (PfMD § 2.5). Then join me every other week on a tour of examples that show how…

Python helps MATLAB to:

What about toolboxes from the MathWorks?

The MathWorks has a large collection of toolboxes that can perform most of the tasks above. Their toolboxes are the preferred solutions because they offer

  • unique capability: many MATLAB toolboxes solve such complex or specialized problems that no alternatives–Python or otherwise–exist
  • programming convenience: a solution coded entirely in MATLAB is easier to write, debug, and maintain
  • configuration convenience: you don’t need to create a special MATLAB-friendly Python virtual environment or configure your MATLAB setup to use it
  • stability: hybrid MATLAB/Python solutions are vulnerable to library compatibility problems caused by routine operating system updates. MathWorks toolboxes don’t have this problem.

Python-based solutions become attractive if you don’t have timely access to the necessary MathWorks toolboxes. License procurement at large organizations often takes months and irritates both budget managers and the people needing the toolbox. Software purchase requests are typically challenged with questions like: Why was this need not anticipated? Which project’s funds should we cut to pay for this? Can you justify raising our overhead rates with this purchase? Is this a one-time need or will you need it again next year? If one can get organizational approval and can wait a few months, then MathWorks toolboxes are the way to go.

What about the FileExchange?

The MathWorks' code sharing site, FileExchange, hosts more than 40,000 freely available packages to supplement MATLAB, including solutions to several of the Python-powered options listed above. Why turn to Python when pure-MATLAB options exist on the FileExchange? There are several reasons:

  • Value-added Python distributions like Anaconda come bundled with a large collection of popular modules for numeric, scientific, engineering, and financial computation. If your organization supports Anaconda, chances are good that Python modules you need to augment your MATLAB work are already available.

  • Organizations that have commercial support agreements with Python distributors like Anaconda and ActiveState, or Linux providers such as Red Hat, have access to security-vetted packages from these vendors' curated repositories. Installing Python modules from these sources is a safer bet than pulling files from the FileExchange.

  • The Python community is several times larger than MATLAB’s. This means popular Python modules (including all listed above) see heavier use and so tend to be more extensively exercised and feature-rich than corresponding FileExchange options. Although project star counts are crude proxies for popularity and use, compare the number of stars in two popular MySQL client interfaces: the MATLAB MySQL Database Connector has 32 stars on the FileExchange while Python’s PyMySQL module (which itself is less popular than the official MySQL connector released by the MySQL development team) has 6,800 stars on Github.

  • Open source Python modules tend to be developed, discussed, tracked, and released on code collaboration sites such as Github, Gitlab, or SourceForge. These allow one to report and track bugs, track forks, follow code commits, and view code in the browser. In contrast, the FileExchange is essentially a storage location with options for comments. Compared to the collaboration sites, the FileExchange offers less insight on a project’s vitality, code content, or number of contributors.

