py2mat.m and mat2py.m Performance Improvements
Introduction
Last week a GitHub user, https://github.com/hcommin, posted a pair of issues, #4 and #5, on my book’s GitHub repo showing ways to make py2mat.m and mat2py.m run much faster.
TL;DR
The method I used to import the Python modules numpy, datetime, and scipy.sparse into the two MATLAB scripts was suboptimal.
As a result, both scripts suffered unnecessarily large load latencies.
Directly using a Python module via MATLAB’s py. prefix is faster than explicitly importing the module. For example, x = py.numpy.array(y) is much faster than np = py.importlib.import_module('numpy'); x = np.array(y) even if np is defined only once at the top of the MATLAB program.
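One way to check this claim on your own machine is a small timing sketch like the one below. It is not taken from py2mat.m or mat2py.m. The variable names t_alias and t_direct are made up for illustration, and the sketch assumes MATLAB is configured with a Python environment that has numpy installed.

```matlab
% Compare an aliased module handle against a direct py. reference.
y = rand(1000);

% Alias created once, outside the timed code.
np = py.importlib.import_module('numpy');

% timeit calls each handle repeatedly and returns a typical run time in seconds.
t_alias  = timeit(@() np.array(y));
t_direct = timeit(@() py.numpy.array(y));

fprintf('alias  np.array(y)       : %.4f s\n', t_alias);
fprintf('direct py.numpy.array(y) : %.4f s\n', t_direct);
```

The absolute numbers will depend on the MATLAB release, the Python version, and the hardware, so only the ratio between the two lines is meaningful.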
Invoke Python modules directly via py. instead of explicitly importing them
https://github.com/hcommin profiled py2mat.m and mat2py.m and found that most of the time was spent loading Python modules into MATLAB via py.importlib.import_module().
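The issues do not say exactly how the profiling was done, so the following is only a sketch of one way to reproduce that kind of measurement with MATLAB's built-in profiler. It assumes py2mat.m and mat2py.m are on the MATLAB path; the choice to print the top five entries is arbitrary.

```matlab
% Profile repeated calls to py2mat and list the most expensive functions.
x_py = mat2py(rand(1000));

profile on
for i = 1:100
    y_mat = py2mat(x_py);
end
profile off

% profile('info') returns a struct whose FunctionTable has one entry per function.
p = profile('info');
[~, order] = sort([p.FunctionTable.TotalTime], 'descend');

% Print the five entries with the largest total time.
for k = order(1:min(5, numel(order)))
    f = p.FunctionTable(k);
    fprintf('%-40s %8.3f s  (%d calls)\n', f.FunctionName, f.TotalTime, f.NumCalls);
end
```

Running profile viewer after profile off opens the same data in MATLAB's graphical report, which also shows time per line.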
Original, slow versions
py2mat.m and mat2py.m used to import and reference Python modules like this:
py2mat.m:
Im = @py.importlib.import_module;
np = Im('numpy');
:
x_mat = int64(x_py.astype(np.float64));
mat2py.m:
Im = @py.importlib.import_module;
np = Im('numpy');
sp = Im('scipy.sparse');
dt = Im('datetime');
tz = Im('dateutil.tz');
:
x_py = np.array(x_mat);
New, fast versions
It turns out that explicitly importing Python modules via py.importlib.import_module() is, at least for the modules above, not only unnecessary but also slower than using the modules directly via, for example, py.numpy.*.
The faster versions of py2mat.m and mat2py.m committed on Sept. 15, 2023 no longer call py.importlib.import_module() and dispense with the np, sp, dt, and tz aliases.
Their equivalent lines of code are
py2mat.m:
x_mat = int64(x_py.astype('float64'));
mat2py.m:
x_py = py.numpy.array(x_mat);
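The same direct style applies to the other modules that used to be aliased. The lines below are an illustrative sketch, not an excerpt from mat2py.m: the variable names utc and stamp_py and the chosen date are made up, and the sketch assumes the Python environment has dateutil installed.

```matlab
% Direct py. references; no import_module aliases are needed.
utc = py.dateutil.tz.tzutc();

% Python integer arguments must be passed as MATLAB integer types.
% pyargs supplies the keyword argument tzinfo=utc.
stamp_py = py.datetime.datetime(int32(2023), int32(9), int32(15), ...
                                pyargs('tzinfo', utc));

% Convert the Python string returned by isoformat() to a MATLAB char array.
disp(char(stamp_py.isoformat()))
```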
Benchmarks
The seemingly small code changes have a large impact on performance, especially when the utilities are called frequently. These MATLAB code snippets were used to measure conversion times before and after the change. The tests were run with MATLAB R2022b on Linux (Ubuntu 2020).
py2mat.m:
x_mat = rand(1000);
x_py = mat2py(x_mat);
tic;
for i = 1:100
    y_mat = old_py2mat(x_py);
end
fprintf('OLD time to convert rand(1000) to MATLAB= %.3f s\n', toc)
tic;
for i = 1:100
    y_mat = py2mat(x_py);
end
fprintf('NEW time to convert rand(1000) to MATLAB= %.3f s\n', toc)
clear x_mat
x_mat.a = {'abc', [12, 13]};
x_mat.b = rand(5);
x_py = mat2py(x_mat);
tic;
for i = 1:100
    y_mat = old_py2mat(x_py);
end
fprintf('OLD time to convert struct to MATLAB = %.3f s\n', toc)
tic;
for i = 1:100
    y_mat = py2mat(x_py);
end
fprintf('NEW time to convert struct to MATLAB = %.3f s\n', toc)
Python to MATLAB matrix conversion became about 3x faster, while converting a structured variable from Python to MATLAB was nearly 8x faster:
OLD time to convert rand(1000) to MATLAB= 0.976 s
NEW time to convert rand(1000) to MATLAB= 0.302 s
OLD time to convert struct to MATLAB = 0.779 s
NEW time to convert struct to MATLAB = 0.099 s
Next, going in the other direction, from MATLAB to Python variables:
mat2py.m:
x_mat = rand(1000);
tic;
for i = 1:100
    y_py = old_mat2py(x_mat);
end
fprintf('OLD time to convert rand(1000) to Python= %.3f s\n', toc)
tic;
for i = 1:100
    y_py = mat2py(x_mat);
end
fprintf('NEW time to convert rand(1000) to Python= %.3f s\n', toc)
clear x_mat
x_mat.a = {'abc', [12, 13]};
x_mat.b = rand(5);
tic;
for i = 1:100
    y_py = old_mat2py(x_mat);
end
fprintf('OLD time to convert struct to Python = %.3f s\n', toc)
tic;
for i = 1:100
    y_py = mat2py(x_mat);
end
fprintf('NEW time to convert struct to Python = %.3f s\n', toc)
The performance boost for matrices is even more noticeable in the MATLAB to Python direction, at nearly 8x, and the structured variable converts about 7x faster:
OLD time to convert rand(1000) to Python= 0.953 s
NEW time to convert rand(1000) to Python= 0.120 s
OLD time to convert struct to Python = 0.968 s
NEW time to convert struct to Python = 0.140 s
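For reference, the speedup factors quoted in this post are simply the old time divided by the new time. The short sketch below recomputes them from the four pairs of timings printed above; the variable names are made up for illustration.

```matlab
% Timings in seconds, copied from the benchmark output above.
labels = {'py2mat rand(1000)', 'py2mat struct', ...
          'mat2py rand(1000)', 'mat2py struct'};
t_old  = [0.976, 0.779, 0.953, 0.968];
t_new  = [0.302, 0.099, 0.120, 0.140];

% Speedup factor = old time / new time.
speedup = t_old ./ t_new;

for k = 1:numel(labels)
    fprintf('%-18s %.1fx faster\n', labels{k}, speedup(k));
end
```

This gives factors of 3.2, 7.9, 7.9, and 6.9 respectively.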