Part 5 of the
Python Is The Ultimate MATLAB Toolbox series.
In Read and write SQLite databases in MATLAB with Python
I showed how MATLAB can work with SQLite database files using
the sqlite3
module from the Python standard library.
Here I’ll demonstrate CRUD—create, read, update, delete—operations
using the
PostgreSQL database
engine and the
psycopg2
Python module.
The best way to work with PostgreSQL in MATLAB is
to use the MathWorks'
Database Toolbox.
This post may only interest you if you lack access to
the Database Toolbox and you’re willing to
include Python in your MATLAB workflow.
Install psycopg2
Unlike sqlite3
, psycopg2
is not in the standard Python
library.
Check your Python installation to see if it has psycopg2
by starting the Python command line REPL and importing
the module. If it imports successfully you can also check
the version number:
> python
>>> import psycopg2
>>> psycopg2.__version__
'2.9.3 (dt dec pq3 ext lo64)'
If you get a ModuleNotFoundError
, you’ll need to install it.
That can be done on an Anaconda Python installation with
or with
> python -m pip install psycopg2
using other Python distributions.
Connecting to a PostgreSQL Database
The most challenging step in this post is setting up, configuring,
and making an initial connection to a PostgreSQL database service—all
of which are beyond the scope of this post.
Ideally you are already familiar with the process, or
have access to an existing database service, or
know someone who can get you set up.
If not, follow the PostgreSQL
getting started
documentation.
The rest of this post assumes you have
- an account and password
- permission to create and delete databases
- network access
on or to a PostgreSQL server.
Briefly, a PostgreSQL server administrator on a Linux
computer might issue commands such as
> sudo -u postgres psql
postgres=# create role Al with login password 'SuperSecret';
postgres=# create database deliveries;
postgres=# grant all privileges on database deliveries to Al;
postgres=# quit
to set me up.
Run this small Python program to check if you can connect to
your PostgreSQL server:
#!/usr/bin/env python
# test_pg_connection.py
import psycopg2
conn = psycopg2.connect(host="big_db_server", port=5432, # default port
user="Al", password="SuperSecret",
database="deliveries")
conn.close()
Of course change the hostname, username, password, and/or
port number as appropriate for your setup.
All is well if the program runs without errors.
If there are errors, they’ll likely be one
of these:
psycopg2.OperationalError: could not connect to server: Connection refused
psycopg2.OperationalError: FATAL: password authentication failed for user Al
psycopg2.OperationalError: FATAL: database deliveries does not exist
The first means either the database service is down,
or a network issue (such as a
firewall) is preventing access.
The second and third errors show the database service
is reachable, but the account or database you want
to use either does not exist or is
not configured properly.
You or your database administrator will need to reexamine
the setup.
Create (and delete)
As hinted by the deliveries
database in the SQL
commands
above, I’ll repeat the package delivery example from
the MATLAB+SQLite
article.
To start, I’ll run the
fake_data.m
program in MATLAB to
create a text file, orders.txt
, containing 100,000 lines
of data to insert into the database.
MATLAB:
A Python program that reads orders.txt
, creates and inserts the
values into the orders table in batches of 10,000 rows
at a time looks like this:
File: make_pg_db.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
|
#!/usr/bin/env python
# make_pg_db.py
import time
import psycopg2
import pathlib
conn = psycopg2.connect(host="localhost", port=5432,
user="Al", password="SuperSecret",
database="deliveries")
cursor = conn.cursor()
# delete/create
cursor.execute("drop table if exists orders;")
cursor.execute("drop table if exists rates;")
cursor.execute("""create table rates(item varchar primary key,
cost_by_weight float);""")
create_orders_table = """create table orders (
origin varchar,
item varchar,
weight float,
dest varchar);"""
cursor.execute(create_orders_table)
rows = [ ['tube' , 0.46],
['package', 0.65],
['letter' , 0.16],
['card' , 0.05] ]
insert = "insert into rates (item,cost_by_weight) values(%s,%s)"
cursor.executemany(insert, rows)
# load orders data from the text file
T_s = time.time()
P = pathlib.Path('orders.txt')
rows = [_.split() for _ in P.read_text().split('\n') if _]
print(f'Data read time = {time.time() - T_s:.3f} seconds')
insert = "insert into orders (origin,item,weight,dest) values(%s,%s,%s,%s)"
T_s = time.time()
batch_size = 10_000 # insert this many rows at a time
for i in range(0, len(rows), batch_size):
rows_subset = rows[i:i+batch_size]
cursor.executemany(insert, rows_subset)
print(f'inserted {i+1} to {i+batch_size}')
conn.commit()
print(f'Data insert time = {time.time() - T_s:.3f} seconds')
conn.close()
|
I’m connecting to a PostgreSQL instance running
on the same computer (2015 Dell XPS 13 laptop running
Ubuntu 20.04) where I run make_pg_db.py
.
Insert performance is about 8,000 rows per second,
50x slower that SQLite inserts.
I’m sure there are PostgreSQL configuration optimizations I could make
to increase insert performance if I knew more about this database engine.
As with inserts to a SQLite database, MATLAB needs to
store its insertion data row-major.
We already saw that
reading and reshuffling orders.txt
from column-major to row-major in MATLAB is slow
so our MATLAB implementation of the PostgreSQL
insert program will use the same
load_orders.py
Python module used to speed up MATLAB in the SQLite example.
Two equivalent MATLAB insertion program appear below.
The first relies on the Python function keyword argument
enhancements in 2022a.
File: make_pg_db.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
|
% MATLAB 2022a and newer
% make_pg_db.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(host="localhost", port=int64(5432), ...
user="Al", password="SuperSecret", ...
database="deliveries");
cursor = conn.cursor();
% delete/create
cursor.execute("drop table if exists rates");
cursor.execute("drop table if exists orders");
cursor.execute("create table rates(item varchar primary key, " + ...
"cost_by_weight float);");
create_orders_table = "create table orders (" + ...
"origin varchar," + ...
"item varchar," + ...
"weight float," + ...
"dest varchar);";
cursor.execute(create_orders_table);
% populate the rates table
rows = { {'tube' , 0.46}, ...
{'package', 0.65}, ...
{'letter' , 0.16}, ...
{'card' , 0.05} };
insert = "insert into rates (item,cost_by_weight) values(%s,%s)";
cursor.executemany(insert, rows);
% load orders data from the text file and insert to orders table
tic;
rows = py.load_orders.readfile('orders.txt');
fprintf('Data read time = %.6f seconds\n', toc)
insert = "insert into orders (origin,item,weight,dest) values(%s,%s,%s,%s)";
tic
batch_size = 10000; % insert this many rows at a time
n_rows = length(rows);
for i = 1: batch_size:n_rows
i_end = min(i+batch_size-1, n_rows);
rows_subset = rows(i:i_end);
cursor.executemany(insert, rows_subset)
fprintf('inserted %d to %d\n', i, i_end);
end
conn.commit();
fprintf('Data insert time = %.6f seconds\n', toc)
conn.close();
|
while the version below uses the older pyargs()
function to wrap
Python keyword arguments. It works with MATLAB 2020b and newer.
The two versions of this program differ only at lines 4, 5 and 6.
File: make_pg_db_2020b.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
|
% MATLAB 2020b and newer
% make_pg_db_2020b.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(pyargs("host","localhost","port",int64(5432), ...
"user","Al","password","SuperSecret", ...
"database","deliveries"));
cursor = conn.cursor();
% delete/create
cursor.execute("drop table if exists rates");
cursor.execute("drop table if exists orders");
cursor.execute("create table rates(item varchar primary key, " + ...
"cost_by_weight float);");
create_orders_table = "create table orders (" + ...
"origin varchar," + ...
"item varchar," + ...
"weight float," + ...
"dest varchar);";
cursor.execute(create_orders_table);
% populate the rates table
rows = { {'tube' , 0.46}, ...
{'package', 0.65}, ...
{'letter' , 0.16}, ...
{'card' , 0.05} };
insert = "insert into rates (item,cost_by_weight) values(%s,%s)";
cursor.executemany(insert, rows);
% load orders data from the text file and insert to orders table
tic;
rows = py.load_orders.readfile('orders.txt');
fprintf('Data read time = %.6f seconds\n', toc)
insert = "insert into orders (origin,item,weight,dest) values(%s,%s,%s,%s)";
tic
batch_size = 10000; % insert this many rows at a time
n_rows = length(rows);
for i = 1: batch_size:n_rows
i_end = min(i+batch_size-1, n_rows);
rows_subset = rows(i:i_end);
cursor.executemany(insert, rows_subset)
fprintf('inserted %d to %d\n', i, i_end);
end
conn.commit();
fprintf('Data insert time = %.6f seconds\n', toc)
conn.close();
|
MATLAB insert performance matches Python’s, about
8,000 rows per second—in other words,
not great compared to SQLite inserts.
Read
This query computes the cummulative cost of shipments
by originating office.
select origin,sum(weight*cost_by_weight) from orders O, rates R
where O.item = R.item group by origin;
Programmatic implementations of the query in
Python and MATLAB resemble those in the
SQLite select example:
File: select_pg.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
#!/usr/bin/env python
# select_pg.py
import psycopg2
conn = psycopg2.connect(host="localhost", port=5432,
user="Al", password="SuperSecret",
database="deliveries")
cursor = conn.cursor()
query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" \
" where O.item = R.item group by origin order by origin;"
cursor.execute(query)
for row in cursor.fetchall():
print(f'{row[0]} {row[1]:7.2f}')
conn.close()
|
and equivalently in MATLAB 2022a+ with
File: select_pg.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
% MATLAB 2022a and newer
% select_pg.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(host="localhost", port=int64(5432), ...
user="Al", password="SuperSecret", ...
database="deliveries");
cursor = conn.cursor();
query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" + ...
" where O.item = R.item group by origin order by origin;";
cursor.execute(query);
for row = cursor.fetchall()
fprintf('%s %7.2f\n', string(row{1}{1}), row{1}{2});
end
conn.close()
|
and in MATLAB 2020b+ with
File: select_pg_2020b.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
% MATLAB 2020b and newer
% select_pg.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(pyargs("host","localhost","port",int64(5432), ...
"user","Al","password","SuperSecret", ...
"database","deliveries"));
cursor = conn.cursor();
query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" + ...
" where O.item = R.item group by origin order by origin;";
cursor.execute(query);
for row = cursor.fetchall()
fprintf('%s %7.2f\n', string(row{1}{1}), row{1}{2});
end
conn.close()
|
Update
As in the
SQLite update example,
this SQL update scales the weight of packages by 2.2
while leaving the wieghts of cards, letters, and tubes alone:
update orders set weight = weight*2.2 where item = 'package';
The programs below repeat the select
statement from the
previous section to show total cost by origin.
The SQL update
is implemente with line 17 of the Python
program and lines 18 in the two MATLAB versions.
After updating the weight
field the programs re-query
for the new cummulative costs:
File: select_update_pg.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
|
#!/usr/bin/env python
# select_update_pg.py
import psycopg2
conn = psycopg2.connect(host="localhost", port=5432,
user="Al", password="SuperSecret",
database="deliveries")
cursor = conn.cursor()
query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" \
" where O.item = R.item group by origin order by origin;"
print('Original cost:')
cursor.execute(query)
for row in cursor.fetchall():
print(f'{row[0]} {row[1]:7.2f}')
# update: scale the weight of packages
cursor.execute("update orders set weight = weight*2.2 where item = 'package'")
print('After the weight correction:')
cursor.execute(query)
for row in cursor.fetchall():
print(f'{row[0]} {row[1]:7.2f}')
conn.close()
|
MATLAB 2022a+:
File: select_update_pg.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
|
% MATLAB 2022a and newer
% select_update_pg.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(pyargs("host","localhost","port",int64(5432), ...
"user","Al","password","SuperSecret", ...
"database","deliveries"));
cursor = conn.cursor();
query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" + ...
" where O.item = R.item group by origin order by origin;";
fprintf('Original cost:\n')
cursor.execute(query);
for row = cursor.fetchall()
fprintf('%s %7.2f\n', string(row{1}{1}), row{1}{2});
end
% update: scale the weight of packages
cursor.execute("update orders set weight = weight*2.2 where item = 'package'");
fprintf('After the weight correction:\n')
cursor.execute(query);
for row = cursor.fetchall()
fprintf('%s %7.2f\n', string(row{1}{1}), row{1}{2});
end
conn.close()
|
MATLAB 2020b+:
File: select_update_pg_2020b.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
|
% MATLAB 2020b and newer
% select_update_pg_2020b.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(pyargs("host","localhost","port",int64(5432), ...
"user","Al","password","SuperSecret", ...
"database","deliveries"));
cursor = conn.cursor();
query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" + ...
" where O.item = R.item group by origin order by origin;";
fprintf('Original cost:\n')
cursor.execute(query);
for row = cursor.fetchall()
fprintf('%s %7.2f\n', string(row{1}{1}), row{1}{2});
end
% update: scale the weight of packages
cursor.execute("update orders set weight = weight*2.2 where item = 'package'");
fprintf('After the weight correction:\n')
cursor.execute(query);
for row = cursor.fetchall()
fprintf('%s %7.2f\n', string(row{1}{1}), row{1}{2});
end
conn.close()
|
Join me again on July 23, 2022 to see how MATLAB can work with
MongoDB
databases using the
pymongo
Python module.
Table of Contents |
Previous: Write custom Excel spreadsheets | Next: Interact with MongoDB