Read and write SQLite databases in MATLAB with Python
Part 3 of the Python Is The Ultimate MATLAB Toolbox series.
SQLite: the most widely deployed database on earth
Most likely the application you’re using to read this article is running SQLite because SQLite is built into the Firefox, Chrome, and Safari web browsers. The computer you’re on most definitely has it: SQLite is built into Android, iOS, macOS, Windows 10 and up, and is standard on Linux distributions. In addition to being everywhere, SQLite is also blazingly fast.
Given its ubiquity and high performance for storing and retrieving data, you may want to use SQLite in MATLAB. Your tool of choice for interacting with SQLite in MATLAB is MathWorks' Database Toolbox. If you have access to the Database Toolbox, the rest of this post may not interest you.
MATLAB + SQLite via Python
An alternative to the Database Toolbox is to use Python’s sqlite3
module
in MATLAB.
In this article I’ll explore the CRUD—create, read, update, delete—operations with SQLite in MATLAB with a simple database representing package deliveries between offices of a corporation. The database has a table that stores each package’s type, weight, origin, and destination, and a second table with price per unit weight for each type. Here are represetative values:
orders
table
origin | item | weight | destination |
---|---|---|---|
Z66 | tube | 1.73 | N51 |
O64 | package | 3.66 | J43 |
U74 | tube | 4.73 | U71 |
K54 | letter | 1.14 | V08 |
W09 | letter | 2.58 | D50 |
T89 | package | 2.53 | H15 |
P70 | tube | 2.58 | E95 |
L96 | card | 4.20 | N56 |
rates
table
item | cost by weight |
---|---|
tube | 0.46 |
package | 0.65 |
letter | 0.16 |
card | 0.05 |
Create (and delete)
The “create” part of CRUD refers to three things: creating the
database itself, creating tables within the database,
and populating the tables with rows.
An SQLite database is a file, typically saved with a .db
extension,
and is created the first time you attempt to “connect” to it.
SQLite tables are created by sending SQL “create table” commands
to the connected database, and popualted with SQL “insert” commands.
SQL table creation commands
Conventional SQL commands to create these two tables might look like this:
create table orders (origin text,
item text,
weight float,
dest text);
create table rates ( item text,
cost_by_weight float);
If you put the six lines above in a text file
called make_deliveries_db.sql
, you can create a SQLite
database by piping the file into the sqlite3
command line tool:
cat make_deliveries_db.sql | sqlite3 deliveries.db # Linux, macOS
type make_deliveries_db.sql | sqlite3.exe deliveries.db # Windows
What we’re really after though is doing this in MATLAB.
To get there we’ll start with a Python version since Python
will be our gateway to using SQLite in MATLAB.
A complete Python program to create the
SQLite database file deliveries.db
with our tables orders
and rates
is:
#!/usr/bin/env python
import sqlite3
conn = sqlite3.connect('deliveries.db') # creates this if it does not exist
cursor = conn.cursor()
cursor.execute("drop table if exists orders;")
cursor.execute("drop table if exists rates;")
create_orders = """create table orders (origin text,
item text,
weight float,
dest text);"""
create_rates = """create table rates ( item text,
cost_by_weight float);"""
cursor.execute(create_orders)
cursor.execute(create_rates)
conn.commit()
conn.close()
The two “drop table if exists
” lines delete the named tables from
the database if they are already defined.
If we didn’t include these lines, the second time we try to run
the program we’ll hit a “table already exists” error.
By dropping the tables (if they exist) first, we’ll
be able to rerun without errors—very handy during development.
Next: same thing in MATLAB. If you haven’t already, set up your MATLAB+Python environment. You’ll note the MATLAB code looks almost the same has the Python equivalent above.
% MATLAB
sqlite3 = py.importlib.import_module('sqlite3');
conn = sqlite3.connect('deliveries.db'); % creates this if it does not exist
cursor = conn.cursor();
cursor.execute("drop table if exists rates");
cursor.execute("drop table if exists orders");
create_order_table = "create table orders (" + ...
"origin text," + ...
"item text," + ...
"weight float," + ...
"dest text);";
cursor.execute(create_order_table);
cursor.execute("create table rates(item text, cost_by_weight float);");
conn.commit();
conn.close();
Inserts
Our tables are ready for content. Generic SQL commands to populate our two tables with the representative data shown above look like this:
insert into orders values('Z66', 'tube' , 1.73, 'N51');
insert into orders values('O64',' package', 3.66, 'J43');
insert into orders values('U74',' tube' , 4.73, 'U71');
insert into orders values('K54',' letter' , 1.14, 'V08');
insert into orders values('W09',' letter' , 2.58, 'D50');
insert into orders values('T89',' package', 2.53, 'H15');
insert into orders values('P70',' tube' , 2.58, 'E95');
insert into orders values('L96',' card' , 4.20, 'N56');
insert into rates values('tube' , 0.46);
insert into rates values('package', 0.65);
insert into rates values('letter' , 0.16);
insert into rates values('card' , 0.05);
Row inserts can be done with Python’s sqlite3
module
either individually or in bulk.
After you’ve created the
cursor
variable, the two forms of
inserts can be done like this in Python:
# Python
# single row insert
row = [ 'Z66', 'tube' , 1.73, 'N51']
cursor.execute('insert into orders values (?,?,?,?)', row)
# multiple row insert
rows = [ [ 'O64',' package', 3.66, 'J43'],
[ 'U74',' tube' , 4.73, 'U71'],
[ 'K54',' letter' , 1.14, 'V08'],
[ 'W09',' letter' , 2.58, 'D50'],
[ 'T89',' package', 2.53, 'H15'],
[ 'P70',' tube' , 2.58, 'E95'],
[ 'L96',' card' , 4.20, 'N56'], ]
cursor.executemany('insert into orders values (?,?,?,?)', rows)
The multiple row insertion is done in a single transaction and yields much higher performance than single row inserts. The lines above look like this in MATLAB:
% MATLAB
% single row insert
row = { 'Z66', 'tube' , 1.73, 'N51'};
cursor.execute('insert into orders values (?,?,?,?)', row);
% multiple row insert
rows = { { 'O64',' package', 3.66, 'J43'}, ...
{ 'U74',' tube' , 4.73, 'U71'}, ...
{ 'K54',' letter' , 1.14, 'V08'}, ...
{ 'W09',' letter' , 2.58, 'D50'}, ...
{ 'T89',' package', 2.53, 'H15'}, ...
{ 'P70',' tube' , 2.58, 'E95'}, ...
{ 'L96',' card' , 4.20, 'N56'}, };
cursor.executemany('insert into orders values (?,?,?,?)', rows);
A data generator
A database whose biggest table has just eight rows underwhelms.
This small MATLAB program creates a file, orders.txt
,
containing an arbitrary number of rows of random entries
for the orders
table (100,000 in this case).
We can use it to produce a large database and test insert
and query performance.
Aside: Python’s Faker module is an excellent resource for generating arbitrary fake data for testing.
File: fake_data.m
|
|
Output is a text file, orders.txt
, with four columns of
random values.
The first few lines of orders.txt
might be
Y91 package 4.79 M81
R77 card 3.28 E70
H00 package 3.47 I90
L37 package 0.93 M46
S72 tube 3.28 E14
Y35 letter 3.76 G56
Order data saved to this text file needs to be loaded by our
programs into a row major variable suitable for use by
cursor.executemany()
.
In other words, the first level index i to the variable
returns the i-th row of data.
In Python this can be done with a list comprehension over
lines of the file, where each line is separated into
a list of values:
# Python
import pathlib
P = pathlib.Path('orders.txt')
rows = [_.split() for _ in P.read_text().split('\n') if _]
MATLAB’s data loaders such as textscan()
are column major
so a bit of manipulation is needed to coerce the required structure:
% MATLAB
fh = fopen('orders.txt');
cols = textscan(fh, '%s %s %f %s');
fclose(fh);
n_rows = length(cols{1});
rows = cell(1,n_rows);
for r = 1:n_rows
rows{r} = { cols{1}{r}, cols{2}{r}, cols{3}(r), cols{4}{r} };
end
Note the subtle indexing difference for column 3. Parentheses, rather than braces, must be used to dereference numeric values within a cell.
Complete programs to create and populate our deliveries
database using data from orders.txt
now look like this.
Python:
File: make_db.py
|
|
MATLAB:
File: make_db.m
|
|
Insert performance
Our programs' insert performance depends on how rapidly they can ingest the data from files and how long the bulk insert itself takes. Times in seconds on my Ubuntu 20.04 laptop using MATLAB 2022a and Python 3.8.8 are
n_rows | MATLAB read | MATLAB insert | Python read | Python insert |
---|---|---|---|---|
100,000 | 1.10 | 0.68 | 0.10 | 0.20 |
1,000,000 | 6.47 | 4.65 | 2.36 | 2.15 |
MATLAB is slower, but it must also do more work: it has to 1) transpose
the loaded data from a column-major to a row-major container and
2) internally translate MATLAB strings and numeric values
to Python equivalents when calling cursor.executemany()
.
We can boost MATLAB’s speed by using Python to ease both of these
issues simply by replacing MATLAB’s file reading code with the
Python reader.
The Python reader creates a Python-native variable that’s
immediately ready for the bulk insert with
cursor.executemany()
.
This bypasses the
MATLAB-to-Python impedance mismatch that MATLAB
otherwise handles for us under the hood—at a time expense, of course.
Here’s a small Python module with a function that implements the file loader we can call from MATLAB:
File: load_orders.py
|
|
All that remains is to replace the MATLAB ingest code
with a call to the above Python function.
Here are the before and after MATLAB lines, where the ‘before’
code is blocked via if (0)
and the ‘after’ code exists
in the else
block:
if (0)
% BEFORE -- MATLAB file read + transpose
fh = fopen('orders.txt');
cols = textscan(fh, '%s %s %f %s');
fclose(fh);
n_rows = length(cols{1});
rows = cell(1,n_rows);
for r = 1:n_rows
rows{r} = { cols{1}{r}, cols{2}{r}, cols{3}(r), cols{4}{r} };
end
else
% AFTER -- Python file read
rows = py.load_orders.readfile('orders.txt');
end
MATLAB’s performance
with load_orders.readfile()
is much closer to Python’s:
n_rows | MATLAB+Python read | MATLAB insert | Python read | Python insert |
---|---|---|---|---|
100,000 | 1.10 | 0.68 | 0.10 | 0.20 |
1,000,000 | 3.11 | 2.34 | 2.36 | 2.15 |
Read
Database reads are done with the SQL select
statement.
Here’s a query that computes the cummulative cost of shipments
by originating office:
select origin,sum(weight)*cost_by_weight from orders O, rates R
where O.item = R.item group by origin;
The query can be done programmatically in Python with
File: select.py
|
|
and equivalently in MATLAB with
File: select.m
|
|
Query performance is essentially the same in Python and MATLAB.
Update
A database update modifies the content of existing rows. Say the scale used to weigh packages (as opposed to the scale used for cards, letters, and tubes) was set to the wrong units, for example pounds instead of kilograms. This SQL update scales the weight of packages by 2.2 while leaving the wieghts of cards, letters, and tubes alone:
update orders set weight = weight*2.2 where item = 'package';
The programs below repeat the select
statement from the
previous section to show total cost by origin.
Next the programs apply the weight update for packages then
repeat the query.
The programs below show how this is
done programmatically
in Python:
File: select_update.py
|
|
and in MATLAB:
File: select_update.m
|
|
Table of Contents | Previous: Read and write YAML, .ini, and TOML files | Next: Write custom Excel files