Read YAML, ini, TOML files in MATLAB with Python
Part 2 of the Python Is The Ultimate MATLAB Toolbox series.
YAML, TOML, ini: Convenient formats for program configuration data
As applications evolve, their inputs tend to become more complex.
Parsing text files for program configuration data is a hassle so
I typically store such data in formats such as YAML, TOML, ini, JSON, or XML.
MATLAB natively supports JSON and XML but of the five
listed options, these are the least attractive to me.
JSON does not support comments
and XML is just an all-around drag to view, edit, and code for.
ini isn’t great either but suffices for simple inputs.
FileExchange options exist for reading YAML, TOML, and ini files. Alternatively, you can use Python to read and write
files in these formats.
As an example, say you’re writing an optimization program in MATLAB
and you want to load your program’s configuration data
into a struct, config, with these fields:
% MATLAB
config.max_iter = 1000;
config.newmark.alpha = 0.25;
config.newmark.beta = 0.5;
config.input_dir = "/xfer/sim_data/2022/05/28";
config.tune_coeff = [1.2e-4 -3.25 58.2];
One option is to store the above lines in a .m file then
invoke the name of the file in your application.
While convenient, this technique combines code and data—definitely
not a best practice.
Instead, let’s store the data in YAML, ini, and TOML files then
populate our config struct by loading these files with Python.
Reading YAML
Hierarchy in YAML is defined by horizontal whitespace, just as in Python. This YAML file
# optim_config.yaml
max_iter : 1000
newmark :
  alpha : 0.25
  beta : 0.5
input_dir : "/xfer/sim_data/2022/05/28"
tune_coeff : [1.2e-4, -3.25, 58.2]
loads into a dictionary like this in Python:
# Python
In : import yaml
In : with open('optim_config.yaml') as fh:
   :     config = yaml.safe_load(fh)
In : print(config)
{'max_iter': 1000, 'newmark': {'alpha': 0.25, 'beta': 0.5},
'input_dir': '/xfer/sim_data/2022/05/28',
'tune_coeff': [0.00012, -3.25, 58.2]}
Let’s try it in MATLAB. If you haven’t already, set up your MATLAB+Python environment. Then try
% MATLAB
>> config = py.yaml.safe_load(py.open('optim_config.yaml'))
config =
Python dict with no properties.
{'max_iter': 1000, 'newmark': {'alpha': 0.25, 'beta': 0.5},
'input_dir': '/xfer/sim_data/2022/05/28',
'tune_coeff': [0.00012, -3.25, 58.2]}
It worked! Sort of: config is a Python dictionary within MATLAB, not a MATLAB struct.
Individual values are accessible, but only in a clumsy manner:
% MATLAB
>> config.get('newmark').get('beta')
0.5000
What we really want is a native MATLAB struct, not a Python dictionary.
Enter py2mat.m…
py2mat.m
py2mat.m is a generic Python-to-MATLAB variable converter I wrote to
simplify MATLAB/Python interaction.
We can use it to convert the Python dictionary returned by
yaml.safe_load() to a native MATLAB struct:
% MATLAB
>> config = py2mat( py.yaml.safe_load(py.open('optim_config.yaml')) )
struct with fields:
max_iter: 1000
newmark: [1x1 struct]
input_dir: "/xfer/sim_data/2022/05/28"
tune_coeff: {[1.2000e-04] [-3.2500] [58.2000]}
and now we can access fields the way we intended all along:
% MATLAB
>> config.newmark.beta
0.5000
py2mat.m converts arbitrarily nested
Python dictionaries, lists, tuples, sets, scalars, NumPy arrays,
SciPy sparse matrices, and even datetimes (with correct timezone
handling) to corresponding MATLAB
structs and cell arrays containing
scalars, strings, dense and sparse matrices, and MATLAB datetimes.
mat2py.m
The counterpart to py2mat.m is mat2py.m.
It takes a native MATLAB variable and converts it to an equivalent
native Python variable within MATLAB.
This is handy for passing MATLAB data into Python functions, as when
using the Python yaml module to store a MATLAB variable in a YAML file.
Let’s give that a try.
df2t.m, t2df.m for Pandas Dataframes and MATLAB Tables
Do you work with Pandas dataframes or MATLAB tables?
Artem Lenskiy wrote a pair of converters, df2t.m and t2df.m,
that are the dataframe and table equivalents of
py2mat.m and mat2py.m.
You can find df2t.m and t2df.m at his PandasToMatlab GitHub project.
Writing YAML
The dump() function from the Python yaml module
needs two things to write a YAML file:
a variable whose contents are to be written and
a file handle to write to.
To do this from MATLAB, however, we’ll need to
provide a Python variable and a Python file handle.
We can get a Python version of the MATLAB variable
with mat2py() and a Python file handle
simply by calling py.open() instead of open().
This example echoes our current config variable to
a new YAML file:
% MATLAB
>> fh = py.open('new_config.yaml','w'); % Python, not MATLAB, file handle
>> py_config = mat2py(config)
Python dict with no properties.
{'max_iter': 1000, 'newmark': {'alpha': 0.25, 'beta': 0.5},
'input_dir': '/xfer/sim_data/2022/05/28',
'tune_coeff': [0.00012, -3.25, 58.2]}
>> py.yaml.dump(py_config, fh)
>> fh.close()
The newly written file, new_config.yaml, looks like this:
input_dir: /xfer/sim_data/2022/05/28
max_iter: 1000
newmark:
  alpha: 0.25
  beta: 0.5
tune_coeff:
- 0.00012
- -3.25
- 58.2
While the sequence of entries and layout differ from the original
optim_config.yaml file,
the data loads into the same structure in both Python and MATLAB.
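The reordering comes from the yaml module itself: by default dump() sorts dictionary keys alphabetically. Here is a minimal Python-only sketch, assuming PyYAML 5.1 or newer (which accepts a sort_keys argument), that writes the configuration both ways and checks that each result loads back into the same dictionary. The config literal below simply repeats the values used throughout this post.

```python
import yaml  # third-party PyYAML package, the same module used above

config = {
    "max_iter": 1000,
    "newmark": {"alpha": 0.25, "beta": 0.5},
    "input_dir": "/xfer/sim_data/2022/05/28",
    "tune_coeff": [1.2e-4, -3.25, 58.2],
}

# Default dump: keys come out sorted alphabetically, as in new_config.yaml.
sorted_text = yaml.safe_dump(config)

# sort_keys=False keeps the dictionary's insertion order instead.
ordered_text = yaml.safe_dump(config, sort_keys=False)

# Either way, the text loads back into an identical dictionary.
assert yaml.safe_load(sorted_text) == config
assert yaml.safe_load(ordered_text) == config

print(ordered_text)
```

With no file handle given, safe_dump() returns the YAML text as a string, which is convenient for quick checks like this one.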
Reading TOML
Like YAML, TOML files can store hierarchical data as text, but without the need for correctly aligned horizontal whitespace. Our configuration data can be stored in TOML like this:
# optim_config.toml
max_iter = 1000
input_dir = "/xfer/sim_data/2022/05/28"
tune_coeff = [1.2e-4, -3.25, 58.2]
[newmark]
alpha = 0.25
beta = 0.5
The toml module’s load function can take a file name directly
so there’s no need for a separate call to open().
The load function returns a dictionary:
# Python
In : import toml
In : config = toml.load('optim_config.toml')
In : print(config)
{'max_iter': 1000, 'input_dir': '/xfer/sim_data/2022/05/28',
'tune_coeff': [0.00012, -3.25, 58.2],
'newmark': {'alpha': 0.25, 'beta': 0.5}}
Loading TOML files in MATLAB is a one-line operation, just as it is in Python:
% MATLAB
>> config = py2mat(py.toml.load('optim_config.toml'))
struct with fields:
max_iter: 1000
input_dir: "/xfer/sim_data/2022/05/28"
tune_coeff: {[1.2000e-04] [-3.2500] [58.2000]}
newmark: [1x1 struct]
>> config.newmark.alpha
0.2500
Reading ini
ini files are less versatile than YAML or TOML since 1) they
don’t allow one to store arbitrarily nested
hierarchical data and 2) all values come
in as strings.
An additional nuisance is that the read function in
Python’s ini handling module, configparser, does not throw an
error if the given file cannot be opened; the file is silently
skipped and the reader merely returns an empty list.
(A file that opens but is malformed does raise an exception.)
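The silent failure is easy to guard against. Here is a minimal sketch, using a hypothetical file name that is assumed not to exist, showing the empty return value and one possible workaround: open the file yourself and pass the handle to read_file().

```python
import configparser

parser = configparser.ConfigParser()

# read() returns the list of files it managed to open and parse.
loaded = parser.read("no_such_file.ini")  # hypothetical missing file
print(loaded)             # an empty list: the file was silently skipped
print(parser.sections())  # no sections were loaded either

# One way to fail loudly: open the file yourself and use read_file().
try:
    with open("no_such_file.ini") as fh:
        parser.read_file(fh)
    found = True
except FileNotFoundError:
    found = False
print(found)
```

Checking that the list returned by read() is non-empty works just as well if you would rather not open the file yourself.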
While the ini file somewhat resembles the TOML file,
all values to the right of the equals sign are loaded as strings.
Even 1.2e-4, -3.25, 58.2 is a single string that we have
to deal with ourselves by
separating the string into words then
converting those words to numbers.
Our configuration data might be stored in ini like so:
; optim_config.ini
[max_iter]
value = 1000
[newmark]
alpha = 0.25
beta = 0.5
[input_dir]
value = /xfer/sim_data/2022/05/28
[tune_coeff]
value = 1.2e-4, -3.25, 58.2
Loading this file in Python involves considerably more work than loading YAML or TOML.
# Python
In : import configparser
In : parser = configparser.ConfigParser()
In : parser.read('optim_config.ini')
In : max_iter = int(parser.get('max_iter', 'value'))
In : newmark_alpha = float(parser.get('newmark', 'alpha'))
In : newmark_beta = float(parser.get('newmark', 'beta'))
In : input_dir = parser.get('input_dir', 'value')
In : tune_coeff = [float(_) for _ in parser.get('tune_coeff', 'value').split(',')]
In : newmark_beta
Out: 0.5
In : tune_coeff
Out: [0.00012, -3.25, 58.2]
The MATLAB version is even messier:
% MATLAB
>> parser = py.configparser.ConfigParser();
>> parser.read('optim_config.ini');
>> max_iter = int64(py.int(parser.get('max_iter', 'value')));
>> newmark_alpha = double(py.float(parser.get('newmark', 'alpha')));
>> newmark_beta = double(py.float(parser.get('newmark', 'beta' )));
>> input_dir = string(parser.get('input_dir', 'value'));
>> tune_coeff = cellfun(@(x) double(x), ...
                     py2mat(parser.get('tune_coeff', 'value').split(',')));
>> newmark_beta
0.5000
>> tune_coeff
0.0001 -3.2500 58.2000
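Much of the per-key conversion can be folded into a small Python helper. The sketch below is one possible approach, not something configparser provides. The names convert and ini_to_dict are hypothetical, and the conversion rules are assumptions chosen to suit this particular file: try int, then float, then a comma-separated list of floats, otherwise keep the string. The ini text is read from a string so the example runs without a file on disk.

```python
import configparser

INI_TEXT = """
; optim_config.ini
[max_iter]
value = 1000
[newmark]
alpha = 0.25
beta = 0.5
[input_dir]
value = /xfer/sim_data/2022/05/28
[tune_coeff]
value = 1.2e-4, -3.25, 58.2
"""

def convert(text):
    """Best-effort conversion of an ini string to int, float, or list of floats."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    if "," in text:
        try:
            return [float(word) for word in text.split(",")]
        except ValueError:
            pass
    return text  # leave anything else as a string

def ini_to_dict(parser):
    """Hypothetical helper: build a nested dict from a ConfigParser."""
    config = {}
    for section in parser.sections():
        items = {key: convert(val) for key, val in parser[section].items()}
        # Sections holding only a 'value' key collapse to that value.
        config[section] = items["value"] if list(items) == ["value"] else items
    return config

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
config = ini_to_dict(parser)
print(config)
```

The resulting dictionary has the same shape as the ones returned by the YAML and TOML loaders, so in principle it could be handed to py2mat in the same way, though I have only sketched the Python half here.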