by Al Danial

Read and write SQLite databases in MATLAB with Python

Part 3 of the Python Is The Ultimate MATLAB Toolbox series.

SQLite: the most widely deployed database on earth

Most likely the application you’re using to read this article is running SQLite because SQLite is built into the Firefox, Chrome, and Safari web browsers. The computer you’re on most definitely has it: SQLite is built into Android, iOS, macOS, Windows 10 and up, and is standard on Linux distributions. In addition to being everywhere, SQLite is also blazingly fast.

Given its ubiquity and high performance for storing and retrieving data, you may want to use SQLite in MATLAB. Your tool of choice for interacting with SQLite in MATLAB is MathWorks' Database Toolbox. If you have access to the Database Toolbox, the rest of this post may not interest you.

MATLAB + SQLite via Python

An alternative to the Database Toolbox is to use Python’s sqlite3 module in MATLAB.

In this article I’ll explore the CRUD—create, read, update, delete—operations with SQLite in MATLAB with a simple database representing package deliveries between offices of a corporation. The database has a table that stores each package’s type, weight, origin, and destination, and a second table with price per unit weight for each type. Here are represetative values:

orders table

origin item weight destination
Z66 tube 1.73 N51
O64 package 3.66 J43
U74 tube 4.73 U71
K54 letter 1.14 V08
W09 letter 2.58 D50
T89 package 2.53 H15
P70 tube 2.58 E95
L96 card 4.20 N56

rates table

item cost by weight
tube 0.46
package 0.65
letter 0.16
card 0.05

Create (and delete)

The “create” part of CRUD refers to three things: creating the database itself, creating tables within the database, and populating the tables with rows. An SQLite database is a file, typically saved with a .db extension, and is created the first time you attempt to “connect” to it. SQLite tables are created by sending SQL “create table” commands to the connected database, and popualted with SQL “insert” commands.

SQL table creation commands

Conventional SQL commands to create these two tables might look like this:

create table orders (origin text,
                     item   text,
                     weight float,
                     dest   text);
create table rates ( item   text,
                     cost_by_weight float);

If you put the six lines above in a text file called make_deliveries_db.sql, you can create a SQLite database by piping the file into the sqlite3 command line tool:

cat  make_deliveries_db.sql | sqlite3 deliveries.db      # Linux, macOS
type make_deliveries_db.sql | sqlite3.exe deliveries.db  # Windows

What we’re really after though is doing this in MATLAB. To get there we’ll start with a Python version since Python will be our gateway to using SQLite in MATLAB. A complete Python program to create the SQLite database file deliveries.db with our tables orders and rates is:

#!/usr/bin/env python
import sqlite3
conn = sqlite3.connect('deliveries.db') # creates this if it does not exist
cursor = conn.cursor()
cursor.execute("drop table if exists orders;")
cursor.execute("drop table if exists rates;")
create_orders =   """create table orders (origin text,
                                          item   text,
                                          weight float,
                                          dest   text);"""
create_rates  =   """create table rates ( item   text,
                                          cost_by_weight float);"""
cursor.execute(create_orders)
cursor.execute(create_rates)
conn.commit()
conn.close()

The two “drop table if exists” lines delete the named tables from the database if they are already defined. If we didn’t include these lines, the second time we try to run the program we’ll hit a “table already exists” error. By dropping the tables (if they exist) first, we’ll be able to rerun without errors—very handy during development.

Next: same thing in MATLAB. If you haven’t already, set up your MATLAB+Python environment. You’ll note the MATLAB code looks almost the same has the Python equivalent above.

% MATLAB
sqlite3 = py.importlib.import_module('sqlite3');
conn = sqlite3.connect('deliveries.db'); % creates this if it does not exist
cursor = conn.cursor();
cursor.execute("drop table if exists rates");
cursor.execute("drop table if exists orders");
create_order_table = "create table orders (" + ...
                           "origin text,"    + ...
                           "item   text,"    + ...
                           "weight float,"   + ...
                           "dest   text);";
cursor.execute(create_order_table);
cursor.execute("create table rates(item text, cost_by_weight float);");
conn.commit();
conn.close();

Inserts

Our tables are ready for content. Generic SQL commands to populate our two tables with the representative data shown above look like this:

insert into orders values('Z66', 'tube'   , 1.73, 'N51');
insert into orders values('O64',' package', 3.66, 'J43');
insert into orders values('U74',' tube'   , 4.73, 'U71');
insert into orders values('K54',' letter' , 1.14, 'V08');
insert into orders values('W09',' letter' , 2.58, 'D50');
insert into orders values('T89',' package', 2.53, 'H15');
insert into orders values('P70',' tube'   , 2.58, 'E95');
insert into orders values('L96',' card'   , 4.20, 'N56');

insert into rates values('tube'   , 0.46);
insert into rates values('package', 0.65);
insert into rates values('letter' , 0.16);
insert into rates values('card'   , 0.05);

Row inserts can be done with Python’s sqlite3 module either individually or in bulk. After you’ve created the cursor variable, the two forms of inserts can be done like this in Python:

# Python

# single row insert
row  =   [ 'Z66', 'tube'   , 1.73, 'N51']
cursor.execute('insert into orders values (?,?,?,?)', row)

# multiple row insert
rows = [ [ 'O64',' package', 3.66, 'J43'],
         [ 'U74',' tube'   , 4.73, 'U71'],
         [ 'K54',' letter' , 1.14, 'V08'],
         [ 'W09',' letter' , 2.58, 'D50'],
         [ 'T89',' package', 2.53, 'H15'],
         [ 'P70',' tube'   , 2.58, 'E95'],
         [ 'L96',' card'   , 4.20, 'N56'], ]
cursor.executemany('insert into orders values (?,?,?,?)', rows)

The multiple row insertion is done in a single transaction and yields much higher performance than single row inserts. The lines above look like this in MATLAB:

% MATLAB

% single row insert
row  =   { 'Z66', 'tube'   , 1.73, 'N51'};
cursor.execute('insert into orders values (?,?,?,?)', row);

% multiple row insert
rows = { { 'O64',' package', 3.66, 'J43'}, ...
         { 'U74',' tube'   , 4.73, 'U71'}, ...
         { 'K54',' letter' , 1.14, 'V08'}, ...
         { 'W09',' letter' , 2.58, 'D50'}, ...
         { 'T89',' package', 2.53, 'H15'}, ...
         { 'P70',' tube'   , 2.58, 'E95'}, ...
         { 'L96',' card'   , 4.20, 'N56'}, };
cursor.executemany('insert into orders values (?,?,?,?)', rows);

A data generator

A database whose biggest table has just eight rows underwhelms. This small MATLAB program creates a file, orders.txt, containing an arbitrary number of rows of random entries for the orders table (100,000 in this case). We can use it to produce a large database and test insert and query performance.

Aside: Python’s Faker module is an excellent resource for generating arbitrary fake data for testing.

File: fake_data.m

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
% MATLAB
n_rows = 100000;

item = {'letter', 'card', 'tube', 'package'};
n_item = length(item);
fh = fopen('orders.txt','w');
for i = 1:n_rows
    origin = char([64 + randi(26)  47 + randi(10,1,2)]);
    obj    = item{randi(n_item)};
    weight = 5*rand();
    dest   = char([64 + randi(26)  47 + randi(10,1,2)]);
    fprintf(fh,'%s %-8s  %5.2f %s\n', origin, obj, weight, dest);
end
fclose(fh);

Output is a text file, orders.txt, with four columns of random values. The first few lines of orders.txt might be

Y91 package    4.79 M81
R77 card       3.28 E70
H00 package    3.47 I90
L37 package    0.93 M46
S72 tube       3.28 E14
Y35 letter     3.76 G56

Order data saved to this text file needs to be loaded by our programs into a row major variable suitable for use by cursor.executemany(). In other words, the first level index i to the variable returns the i-th row of data. In Python this can be done with a list comprehension over lines of the file, where each line is separated into a list of values:

# Python
import pathlib
P = pathlib.Path('orders.txt')
rows = [_.split() for _ in P.read_text().split('\n') if _]

MATLAB’s data loaders such as textscan() are column major so a bit of manipulation is needed to coerce the required structure:

% MATLAB
fh = fopen('orders.txt');
cols = textscan(fh, '%s %s %f %s');
fclose(fh);
n_rows = length(cols{1});
rows = cell(1,n_rows);
for r = 1:n_rows
    rows{r} = { cols{1}{r}, cols{2}{r}, cols{3}(r), cols{4}{r} };
end

Note the subtle indexing difference for column 3. Parentheses, rather than braces, must be used to dereference numeric values within a cell.

Complete programs to create and populate our deliveries database using data from orders.txt now look like this.

Python:

File: make_db.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#!/usr/bin/env python
import time
import sqlite3
import pathlib
conn = sqlite3.connect('deliveries.db') # creates if the file doesn't exist
cursor = conn.cursor()

# delete/create
cursor.execute("drop table if exists orders;")
cursor.execute("drop table if exists rates;")
cursor.execute("create table rates(item text, cost_by_weight float);");
create_orders_table = """create table orders (
                              origin text,
                              item   text,
                              weight float,
                              dest   text);"""
cursor.execute(create_orders_table)

rows = [ ['tube'   , 0.46],
         ['package', 0.65],
         ['letter' , 0.16],
         ['card'   , 0.05] ]
cursor.executemany('insert into rates values (?,?)', rows)

# load orders data from the text file
T_s = time.time()
P = pathlib.Path('orders.txt')
rows = [_.split() for _ in P.read_text().split('\n') if _]
print(f'Data read time = {time.time() - T_s} seconds')

T_s = time.time()
cursor.executemany('insert into orders values (?,?,?,?)', rows)
conn.commit()
print(f'Data insert time = {time.time() - T_s} seconds')

conn.close()

MATLAB:

File: make_db.m

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
% MATLAB
sqlite3 = py.importlib.import_module('sqlite3');
conn = sqlite3.connect('deliveries.db'); % create if it does not exist
cursor = conn.cursor();

% delete/create
cursor.execute("drop table if exists rates");
cursor.execute("drop table if exists orders");
cursor.execute("create table rates(item text, cost_by_weight float);");
create_order_table = "create table orders (" + ...
                           "origin text,"    + ...
                           "item   text,"    + ...
                           "weight float,"   + ...
                           "dest   text);";
cursor.execute(create_order_table);

% populate the rates table
rows = { {'tube'   , 0.46}, ...
         {'package', 0.65}, ...
         {'letter' , 0.16}, ...
         {'card'   , 0.05} };
cursor.executemany('insert into rates values (?,?)', rows);

% load orders data from the text file and insert to orders table
tic;
if (0)
    % BEFORE -- MATLAB file read + transpose
    fh = fopen('orders.txt');
    cols = textscan(fh, '%s %s %f %s');
    fclose(fh);
    n_rows = length(cols{1});
    rows = cell(1,n_rows);
    for r = 1:n_rows
        rows{r} = { cols{1}{r}, cols{2}{r}, cols{3}(r), cols{4}{r} };
    end
else
    % AFTER -- Python file read
    rows = py.load_orders.readfile('orders.txt');
end
fprintf('Data read time = %.6f seconds\n', toc)

tic;
cursor.executemany('insert into orders values (?,?,?,?)', rows);
conn.commit();
fprintf('Data insert time = %.6f seconds\n', toc)
conn.close();

Insert performance

Our programs' insert performance depends on how rapidly they can ingest the data from files and how long the bulk insert itself takes. Times in seconds on my Ubuntu 20.04 laptop using MATLAB 2022a and Python 3.8.8 are

n_rows MATLAB read MATLAB insert Python read Python insert
100,000 1.10 0.68 0.10 0.20
1,000,000 6.47 4.65 2.36 2.15

MATLAB is slower, but it must also do more work: it has to 1) transpose the loaded data from a column-major to a row-major container and 2) internally translate MATLAB strings and numeric values to Python equivalents when calling cursor.executemany().

We can boost MATLAB’s speed by using Python to ease both of these issues simply by replacing MATLAB’s file reading code with the Python reader. The Python reader creates a Python-native variable that’s immediately ready for the bulk insert with cursor.executemany(). This bypasses the MATLAB-to-Python impedance mismatch that MATLAB otherwise handles for us under the hood—at a time expense, of course.

Here’s a small Python module with a function that implements the file loader we can call from MATLAB:

File: load_orders.py

1
2
3
4
5
# Python; file load_orders.py
import pathlib
def readfile(File):
    P = pathlib.Path(File)
    return [_.split() for _ in P.read_text().split('\n') if _]

All that remains is to replace the MATLAB ingest code with a call to the above Python function. Here are the before and after MATLAB lines, where the ‘before’ code is blocked via if (0) and the ‘after’ code exists in the else block:

if (0)
    % BEFORE -- MATLAB file read + transpose
    fh = fopen('orders.txt');
    cols = textscan(fh, '%s %s %f %s');
    fclose(fh);
    n_rows = length(cols{1});
    rows = cell(1,n_rows);
    for r = 1:n_rows
        rows{r} = { cols{1}{r}, cols{2}{r}, cols{3}(r), cols{4}{r} };
    end
else
    % AFTER -- Python file read
    rows = py.load_orders.readfile('orders.txt');
end

MATLAB’s performance with load_orders.readfile() is much closer to Python’s:

n_rows MATLAB+Python read MATLAB insert Python read Python insert
100,000 1.10 0.68 0.10 0.20
1,000,000 3.11 2.34 2.36 2.15

Read

Database reads are done with the SQL select statement. Here’s a query that computes the cummulative cost of shipments by originating office:

select origin,sum(weight)*cost_by_weight from orders O, rates R
  where O.item = R.item group by origin;

The query can be done programmatically in Python with

File: select.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
#!/usr/bin/env python
import sqlite3
conn = sqlite3.connect('deliveries.db')
cursor = conn.cursor()

query = "select origin,sum(weight)*cost_by_weight from orders O, rates R" \
        "    where O.item = R.item group by origin order by origin;"
for row in cursor.execute(query).fetchall():
    print(f'{row[0]}  {row[1]:7.2f}')

conn.close()

and equivalently in MATLAB with

File: select.m

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
% MATLAB
sqlite3 = py.importlib.import_module('sqlite3');
conn = sqlite3.connect('deliveries.db');
cursor = conn.cursor();

query = "select origin,sum(weight)*cost_by_weight from orders O, rates R" + ...
        "    where O.item = R.item group by origin order by origin;";
for row = cursor.execute(query).fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

conn.close()

Query performance is essentially the same in Python and MATLAB.

Update

A database update modifies the content of existing rows. Say the scale used to weigh packages (as opposed to the scale used for cards, letters, and tubes) was set to the wrong units, for example pounds instead of kilograms. This SQL update scales the weight of packages by 2.2 while leaving the wieghts of cards, letters, and tubes alone:

update orders set weight = weight*2.2 where item = 'package';

The programs below repeat the select statement from the previous section to show total cost by origin. Next the programs apply the weight update for packages then repeat the query. The programs below show how this is done programmatically in Python:

File: select_update.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
#!/usr/bin/env python
import sqlite3
conn = sqlite3.connect('deliveries.db')
cursor = conn.cursor()

query = "select origin,sum(weight)*cost_by_weight from orders O, rates R" \
        "    where O.item = R.item group by origin order by origin;"
print('Original cost:')
for row in cursor.execute(query).fetchall():
    print(f'{row[0]}  {row[1]:7.2f}')

# update:  scale the weight of packages
cursor.execute("update orders set weight = weight*2.2 where item = 'package'")

print('After the weight correction:')
for row in cursor.execute(query).fetchall():
    print(f'{row[0]}  {row[1]:7.2f}')

conn.close()

and in MATLAB:

File: select_update.m

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
% MATLAB
sqlite3 = py.importlib.import_module('sqlite3');
conn = sqlite3.connect('deliveries.db');
cursor = conn.cursor();

query = "select origin,sum(weight)*cost_by_weight from orders O, rates R" + ...
        "    where O.item = R.item group by origin order by origin;";
fprintf('Original cost:\n')
for row = cursor.execute(query).fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

% update:  scale the weight of packages
cursor.execute("update orders set weight = weight*2.2 where item = 'package'");

fprintf('After the weight correction:\n')
for row = cursor.execute(query).fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

conn.close()

Join me again on June 25, 2022 to see how MATLAB can use Python’s openpyxl module to create richly-formatted Excel files on Windows, macOS, and Linux.