Recent Posts (page 5 / 8)

by Al Danial

Work with MongoDB in MATLAB Using the pymongo Python Module

Part 6 of the Python Is The Ultimate MATLAB Toolbox series.

Earlier posts showed how Python can help MATLAB interact with SQLite and PostgreSQL. I’ll conclude the database portion of this series with examples showing how to work with MongoDB in MATLAB using the Python module pymongo. As with the other databases, the best way to interact with MongoDB in MATLAB is to use the MathWorks' Database Toolbox. This post may only be useful to you if you can include Python in your MATLAB workflow and don’t have access to the Database Toolbox.

Install pymongo

You’ll need the pymongo Python module in your installation to call it from MATLAB. Check your Python installation to see if it has pymongo by starting the Python command line REPL and importing the module. If it imports successfully you can also check the version number:

> python
>>> import pymongo
>>> pymongo.__version__
'3.11.0'

If you get a ModuleNotFoundError, you’ll need to install it. That can be done on an Anaconda Python installation with

> conda install pymongo

or with

> python -m pip install pymongo

using other Python distributions.

Connecting to a MongoDB Database

As with the PostgreSQL post, probably the hardest step here is making the initial connection to a MongoDB server. That will only succeed if you have

  • an account and password
  • permission to create and delete databases
  • network access to the MongoDB server (if it isn’t on your local computer)

To keep things simple, here I’ll work with a MongoDB instance running on my localhost. Refer to instructions on starting a MongoDB database on Ubuntu, Red Hat, Windows, or macOS if you’re unfamiliar with the process. The mongo command line tool shows that the database engine is running and that I’m able to connect to it:

> mongo
MongoDB shell version v3.6.8
connecting to: mongodb://127.0.0.1:27017
Implicit session: session { "id" : UUID("e629be9e-e016-4e37-872c-e3f4ea3eb739") }
MongoDB server version: 3.6.8
Server has startup warnings:
2022-07-08T09:40:01.686-0700 I STORAGE  [initandlisten]
2022-07-08T09:40:03.337-0700 I CONTROL  [initandlisten]
2022-07-08T09:40:03.337-0700 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2022-07-08T09:40:03.337-0700 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2022-07-08T09:40:03.337-0700 I CONTROL  [initandlisten]

> show dbs;
admin            0.000GB
config           0.000GB
local            0.000GB

Connecting manually works so I’ll try it programmatically, first in Python, then in MATLAB:

Python: mongo_connect.py

#!/usr/bin/env python
# file: mongo_connect.py
from pymongo import MongoClient
client = MongoClient('localhost:27017')
for db in client.list_databases():
    # db is a dictionary with entries like
    #   {'name': 'admin', 'sizeOnDisk': 32768.0, 'empty': False}
    print(db['name'])
client.close()

Running this gives the same result as show dbs; in the mongo shell:

> ./mongo_connect.py
admin
config
local

Now in MATLAB:

MATLAB: mongo_connect.m

% file: mongo_connect.m
pymongo = py.importlib.import_module('pymongo');
client = pymongo.MongoClient('localhost:27017');
for db = py.list(client.list_databases())
    fprintf('%s\n', string(db{1}{'name'}))
end
client.close()

The MATLAB version is a bit more complex for a couple of reasons:

  1. The for loop cannot iterate over client.list_databases() directly as Python can. Attempting to do so raises the error "Array formation and parentheses-style indexing with objects of class 'py.pymongo.command_cursor.CommandCursor' is not allowed. Use objects of class 'py.pymongo.command_cursor.CommandCursor' only as scalars or use a cell array." The work-around is to explicitly expand this function call to a Python list.
  2. The loop variable db is a single-item Python list containing the dictionary we expect, rather than the dictionary itself. The {1} index before the {'name'} key retrieves the dictionary from the list. Next, the database name returned by db{1}{'name'} is a Python string, but if we pass that directly to fprintf(), MATLAB prints the string's individual characters. To get around that we have to explicitly convert the Python string to a MATLAB string, so we end up with string(db{1}{'name'}).

MATLAB 2020b and higher then give the expected output:

>> mongo_connect
admin
config
local

While relatively simple, the issues we’re stumbling across using MongoDB in MATLAB with pymongo foreshadow more challenges ahead.

Aside: connecting to a remote MongoDB Database

The client connection to a local MongoDB instance done with

client = MongoClient('localhost:27017')

takes a notably different form when connecting to a remote database that requires authentication. In that case the connection string passed to the MongoClient() function is more involved:

Python:

db_user = 'al'
db_passwd = 'SuperSecret'
db_host = '413.198.231.158'
db_port = 27027
uri = "mongodb://%s:%s@%s:%d/purchase_orders" % (
         db_user, db_passwd, db_host, db_port)
client = MongoClient(uri)

MATLAB 2020b and higher:

db_user = 'al';
db_passwd = 'SuperSecret';
db_host = '413.198.231.158';
db_port = 27027;
uri = sprintf("mongodb://%s:%s@%s:%d/purchase_orders", ...
        db_user, db_passwd, db_host, db_port);
client = pymongo.MongoClient(uri);
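A caveat when building connection strings this way: reserved URI characters in the username or password (such as ':', '/', and '@') must be percent-escaped before they go into the mongodb:// string, or the URI will parse incorrectly. A minimal pure-Python sketch, using made-up credentials (the host and database name are placeholders):

```python
# Sketch: percent-escape credentials before embedding them in a MongoDB URI.
# quote_plus() converts URI-reserved characters to %XX escapes so the
# resulting connection string parses unambiguously.
from urllib.parse import quote_plus

db_user = 'al'
db_passwd = 'p@ss:word/1'          # contains URI-reserved characters
db_host = '413.198.231.158'
db_port = 27017

uri = "mongodb://%s:%s@%s:%d/purchase_orders" % (
        quote_plus(db_user), quote_plus(db_passwd), db_host, db_port)
print(uri)
# -> mongodb://al:p%40ss%3Aword%2F1@413.198.231.158:27017/purchase_orders
```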

Insert

I’ll insert the same package delivery data used in the SQLite+MATLAB and PostgreSQL+MATLAB examples. Well, not exactly the same, since NoSQL database engines such as MongoDB don’t have a concept of tables. Rather than separating the delivery data into orders and rates tables, all information will be inserted into documents (BSON records) of a collection.

The first rows of each SQL table in the SQLite example are

orders table

origin   item   weight   destination
Z66      tube   1.73     N51

rates table

item   cost_by_weight
tube   0.46

These can be represented by the following JSON record (JSON is a subset of BSON):

  { "origin" : "Z66",
    "item" : "tube",
    "weight" : 1.73,
    "dest" : "N51",
    "cost_by_weight" : 0.46}
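To make the denormalization concrete, here is a pure-Python sketch of how such a document could be assembled from an orders row plus a rates lookup. The rate values match the tables above; the helper function name is mine:

```python
# Sketch: denormalizing the two SQL tables into one MongoDB-style document.
# The rates table becomes an in-memory lookup; each order row is merged with
# its item's cost_by_weight at write time.
rates = {'tube': 0.46, 'package': 0.65, 'letter': 0.16, 'card': 0.05}

def make_document(origin, item, weight, dest):
    return {'origin': origin, 'item': item, 'weight': weight,
            'dest': dest, 'cost_by_weight': rates[item]}

doc = make_document('Z66', 'tube', 1.73, 'N51')
print(doc)
# {'origin': 'Z66', 'item': 'tube', 'weight': 1.73, 'dest': 'N51', 'cost_by_weight': 0.46}
```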

Inserting a single record

Creating a collection and inserting a document takes just a few lines of code. This is an expanded form of the connection tester which inserts the record above:

Python: mongo_insert_one.py

#!/usr/bin/env python
# file: mongo_insert_one.py
from pymongo import MongoClient
client = MongoClient('localhost:27017')
db = client.deliveries
record =  { "origin" : "Z66", "item" : "tube",
            "weight" : 1.73, "dest" : "N51",
            "cost_by_weight" : 0.46}
db.deliveries.insert_one( record )
client.close()

Note that I didn’t pass a JSON string to the .insert_one() method. Instead, I gave it a Python dictionary. This is a convenience provided by the pymongo module, which converts dictionaries to JSON—or BSON if needed—for us. Once again the MATLAB version requires more effort because it won’t create the deliveries collection automatically the way Python does. Here we have to explicitly spell out a database and a collection to create:

MATLAB: mongo_insert_one.m

% file: mongo_insert_one.m
pymongo = py.importlib.import_module('pymongo');
client = pymongo.MongoClient('localhost:27017');
deliveries = client.get_database('deliveries').get_collection('deliveries');
record = py.dict(pyargs( ...
                 "origin", "Z66", "item", "tube", ...
                 "weight", 1.73, "dest", "N51",   ...
                 "cost_by_weight", 0.46));
deliveries.insert_one( record )
client.close()

Now that we’ve covered the basic mechanics of connecting to a MongoDB database and inserting a record we can move on to a more realistic program that reads many records from a text file then inserts them in bulk.

Inserting multiple records

Once again I’ll reuse the fake_data.m m-file from the SQLite article to create a text file of delivery data. Running it in MATLAB,

>> fake_data
>>

leaves me with the text file, orders.txt, which has 100,000 lines of delivery orders.

I’d also like to reuse the four-line Python module load_orders.py to speed up data ingest in MATLAB, as I did for the SQLite and PostgreSQL examples, but the output of its readfile() function isn’t well-suited to MongoDB. We saw that pymongo’s insert_one() method takes a Python dictionary as an input argument. The corresponding bulk insert method, insert_many(), takes a list of dictionaries. The load_orders.readfile() function returns a list of lists, which would have to be manipulated further into a list of dictionaries. Rather than doing that, I’ll implement a new file reader that directly returns the structure I want. It looks like this:
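For comparison, here is roughly what massaging the list-of-lists output into a list of dictionaries would look like. Note the weights would still be strings, one more reason to write a dedicated reader; the field order is assumed to match orders.txt:

```python
# Sketch: converting load_orders.readfile()-style list-of-lists output into
# insert_many()-ready dictionaries.  Column order is assumed to be
# origin, item, weight, dest, matching orders.txt.
fields = ['origin', 'item', 'weight', 'dest']
rows = [['Z66', 'tube', '1.73', 'N51'],
        ['Q38', 'card', '0.05', 'C30']]
records = [dict(zip(fields, row)) for row in rows]
print(records[0])
# {'origin': 'Z66', 'item': 'tube', 'weight': '1.73', 'dest': 'N51'}
```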

Python: load_orders_for_mongodb.py

# file: load_orders_for_mongodb.py
import pathlib
def readfile(File):
    cbw = {'tube'   : 0.46, 'package': 0.65,
           'letter' : 0.16, 'card'   : 0.05}
    P = pathlib.Path(File)
    all_orders = []
    rows = [_.split() for _ in P.read_text().split('\n') if _]
    for origin, item, weight, dest in rows:
        order = { 'origin' : origin, 'item' : item,
                  'weight' : float(weight), 'dest' : dest,
                  'cost_by_weight' : cbw[item] }
        all_orders.append( order )
    return all_orders

I’ll import this module and call its readfile() function in both Python and MATLAB bulk insert programs:

Python: mongo_insert_many.py

#!/usr/bin/env python
# file: mongo_insert_many.py
from pymongo import MongoClient
from load_orders_for_mongodb import readfile
import time
client = MongoClient('localhost:27017')
db = client.deliveries

tic = time.time()
all_records = readfile('orders.txt')
toc = time.time() - tic
print(f'read   {len(all_records)} took {toc:.3f} s')

# delete the deliveries collection if it already exists
if 'deliveries' in db.list_collection_names():
    db['deliveries'].drop()

tic = time.time()
db.deliveries.insert_many( all_records )
toc = time.time() - tic
print(f'insert {len(all_records)} took {toc:.3f} s')

client.close()

MATLAB: mongo_insert_many.m

% file: mongo_insert_many.m
pymongo = py.importlib.import_module('pymongo');
LO = py.importlib.import_module('load_orders_for_mongodb');
client = pymongo.MongoClient('localhost:27017');
db = client.get_database('deliveries');
deliveries = db.get_collection('deliveries');

tic;
all_records = LO.readfile('orders.txt');
fprintf('read   %d took %.3f s\n', length(all_records), toc)

% delete the deliveries collection if it already exists
if db.list_collection_names().count('deliveries') > 0
    db.drop_collection('deliveries');
end

tic;
deliveries.insert_many( all_records );
fprintf('insert %d took %.3f s\n', length(all_records), toc)

client.close()

Insert performance is fairly impressive. (Times in seconds; hardware is a 2015 Dell XPS 13 laptop running Ubuntu 20.04, Anaconda Python 2021.05, and MATLAB 2020b)

n_rows      MATLAB+Python read   MATLAB insert   Python read   Python insert
100,000           0.33                1.41           0.18           1.30
1,000,000         2.89               15.15           2.23          14.87

Query

The SQL statement

select origin,sum(weight*cost_by_weight) from orders O, rates R
  where O.item = R.item group by origin;

used in the SQLite and PostgreSQL examples takes a starkly different form in MongoDB. There the query can be expressed as

db.deliveries.aggregate([{$group :
    {_id : {F1:'$origin', F2:'$dest'},
     tot : {$sum : '$weight'} } } ])

which can be implemented like this in Python:

Python: mongo_query.py

#!/usr/bin/env python
# file: mongo_query.py
from pymongo import MongoClient
client = MongoClient('localhost:27017')
db = client.deliveries

n_records = db.deliveries.count_documents({})
print(f'number of records in deliveries = {n_records}')

pipeline = [{'$group' : {'_id' : {'F1':'$origin', 'F2':'$dest'},
                         'tot' : {'$sum' : '$weight'}}}]

for x in db.deliveries.aggregate(pipeline=pipeline):
    origin = x['_id']['F1']
    dest   = x['_id']['F2']
    sum_weight = x['tot']
    print(f'{origin} -> {dest}   {sum_weight:6.2f}')

client.close()
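For intuition, the $group stage above can be modeled in a few lines of pure Python: group the documents by the (origin, dest) pair and sum their weights, the same accumulation MongoDB performs server-side. The sample documents here are made up:

```python
# Sketch: a pure-Python model of the aggregation pipeline's $group stage.
# Documents are grouped by (origin, dest) and their weights summed.
from collections import defaultdict

docs = [{'origin': 'Z66', 'dest': 'N51', 'weight': 1.73},
        {'origin': 'Z66', 'dest': 'N51', 'weight': 0.50},
        {'origin': 'Q38', 'dest': 'C30', 'weight': 2.00}]

totals = defaultdict(float)
for d in docs:
    totals[(d['origin'], d['dest'])] += d['weight']

for (origin, dest), tot in sorted(totals.items()):
    print(f'{origin} -> {dest}   {tot:6.2f}')
```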

The nested py.dict(pyargs(...)) calls in the MATLAB code below show the rather clumsy notation needed to express the multiply-nested dictionaries of the aggregation pipeline:

MATLAB: mongo_query.m

% file: mongo_query.m
pymongo = py.importlib.import_module('pymongo');
client = pymongo.MongoClient('localhost:27017');
deliveries = client.get_database('deliveries').get_collection('deliveries');

n_records = deliveries.count_documents(py.dict());
fprintf('number of records in deliveries = %d\n', n_records)

query = py.dict(pyargs('$group', py.dict(pyargs(             ...
        '_id', py.dict(pyargs('F1','$origin','F2','$dest')), ...
        'tot', py.dict(pyargs('$sum','$weight')) )) ));
pipeline = py.list();
pipeline.append(query);

for x = py.list(deliveries.aggregate(pyargs('pipeline',pipeline)))
    origin = string(x{1}{'_id'}{'F1'});
    dest   = string(x{1}{'_id'}{'F2'});
    sum_weight = x{1}{'tot'};
    fprintf('%s -> %s   %6.2f\n', origin, dest, sum_weight)
end
client.close()

Update

In the SQLite and PostgreSQL examples I updated those respective databases by scaling the weight of packages by 2.2 with this SQL statement:

update orders set weight = weight*2.2 where item = 'package';

The equivalent in MongoDB is considerably more complex with my setup because the version I’m running, v3.6.8, does not support multiple-record updates with operators (this is supported in v4.2). My options are to write a loop over the single-record update function update_one(), or to completely replace the records which have item = 'package' with new entries. Looping over a single-record update function sounds slow, so I’ll take the second approach.
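The replace approach boils down to three steps: copy each matching record with its weight scaled, drop the server-assigned _id so MongoDB mints fresh ids on re-insert, then delete-and-insert in bulk. A pure-Python sketch of just the record manipulation (the sample records are made up):

```python
# Sketch: preparing replacement records for a bulk delete-and-reinsert.
# Package records are copied with weight scaled by 2.2 and the
# server-assigned _id removed so fresh ids are generated on insert.
records = [{'_id': 1, 'item': 'package', 'weight': 2.5},
           {'_id': 2, 'item': 'card',    'weight': 0.02}]

updated = []
for r in records:
    if r['item'] != 'package':
        continue
    new_r = dict(r)            # copy so the original is untouched
    new_r['weight'] *= 2.2
    del new_r['_id']
    updated.append(new_r)

print(updated)
```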

Python: mongo_update.py

#!/usr/bin/env python
# file: mongo_update.py
from pymongo import MongoClient

def print_records(cursor):
    for i,x in enumerate(cursor):
        origin = x['origin']
        item = x['item']
        weight = x['weight']
        dest = x['dest']
        print(f'{i+1:3d}. {origin} {item:8s} {weight:10.6f} {dest}')

def heaviest_packages(n, db):
    packages = { "item": { "$eq": "package" } }
    return db.find(packages).sort([('weight', -1)]).limit(n)

client = MongoClient('localhost:27017')
db = client.deliveries

print('Before scaling weight of packages')
print_records(heaviest_packages(3, db.deliveries))

packages = { "item": { "$eq": "package" } }
package_records = db.deliveries.find(packages)
updated_packages = []
for record in package_records:
    new_record = record
    new_record['weight'] *= 2.2
    del new_record['_id']
    updated_packages.append(new_record)

db.deliveries.delete_many(packages)
db.deliveries.insert_many(updated_packages)

print('After scaling weight of packages')
print_records(heaviest_packages(3, db.deliveries))

client.close()

MATLAB: mongo_update.m

% file: mongo_update.m
pymongo = py.importlib.import_module('pymongo');
client = pymongo.MongoClient('localhost:27017');
deliveries = client.get_database('deliveries').get_collection('deliveries');

fprintf('Before scaling weight of packages\n')
print_records(heaviest_packages(3, deliveries))

packages = py.dict(pyargs("item",py.dict(pyargs("$eq","package"))));
%        = { "item" : { "$eq" : "package" } }
package_records = deliveries.find(packages);
updated_packages = py.list();
for record = py.list(package_records)
    new_record = record;
    new_record{1}.update(pyargs('weight', new_record{1}{'weight'} * 2.2));
    new_record{1}.pop('_id'); % == del new_record['_id']
    updated_packages.append(new_record{1});
end

deliveries.delete_many(packages);
deliveries.insert_many(updated_packages);

fprintf('After  scaling weight of packages\n')
print_records(heaviest_packages(3, deliveries))

client.close()

function print_records(cursor)
    i = 0;
    for x = py.list(cursor)
        i = i + 1;
        origin = string(x{1}{'origin'});
        item = string(x{1}{'item'});
        weight = x{1}{'weight'};
        dest = string(x{1}{'dest'});
        fprintf('%3d. %s %8s %10.6f %s\n', ...
            i, origin, item, weight, dest)
    end
end

function [cursor] = heaviest_packages(n, db)
    packages = py.dict(pyargs("item",py.dict(pyargs("$eq","package"))));
    by_weight = py.list({py.tuple({'weight', int64(-1)})});
    cursor = db.find(packages).sort(by_weight).limit(int64(n));
end

As a sanity check, both programs print records having the three heaviest packages before and after the update. The output from both is correct but also a bit disappointing:

Before scaling weight of packages
  1. C30 package    5.000000 Q38
  2. Y33 package    5.000000 E99
  3. B72 package    5.000000 L20
After scaling weight of packages
  1. C30 package   11.000000 Q38
  2. Y33 package   11.000000 E99
  3. B72 package   11.000000 L20

There’s no variation in the weight because the fake data generator only used three significant figures for weight. Consequently the maximum value of 5.00 appears many times in 100,000 records. A more convincing test is to work with a small database (for example, set n_rows to 50 in fake_data.m) then print all records before and after.


Table of Contents | Previous: Interact with PostgreSQL | Next : Interact with Redis

by Al Danial

Work with PostgreSQL in MATLAB Using the psycopg2 Python Module

Part 5 of the Python Is The Ultimate MATLAB Toolbox series.

In Read and write SQLite databases in MATLAB with Python I showed how MATLAB can work with SQLite database files using the sqlite3 module from the Python standard library. Here I’ll demonstrate CRUD—create, read, update, delete—operations using the PostgreSQL database engine and the psycopg2 Python module. The best way to work with PostgreSQL in MATLAB is to use the MathWorks' Database Toolbox. This post may only interest you if you lack access to the Database Toolbox and you’re willing to include Python in your MATLAB workflow.

Install psycopg2

Unlike sqlite3, psycopg2 is not in the standard Python library. Check your Python installation to see if it has psycopg2 by starting the Python command line REPL and importing the module. If it imports successfully you can also check the version number:

> python
>>> import psycopg2
>>> psycopg2.__version__
'2.9.3 (dt dec pq3 ext lo64)'

If you get a ModuleNotFoundError, you’ll need to install it. That can be done on an Anaconda Python installation with

> conda install psycopg2

or with

> python -m pip install psycopg2

using other Python distributions.

Connecting to a PostgreSQL Database

The most challenging step in this post is setting up, configuring, and making an initial connection to a PostgreSQL database service—all of which are beyond the scope of this post. Ideally you are already familiar with the process, or have access to an existing database service, or know someone who can get you set up. If not, follow the PostgreSQL getting started documentation. The rest of this post assumes you have

  • an account and password
  • permission to create and delete databases
  • network access

on or to a PostgreSQL server. Briefly, a PostgreSQL server administrator on a Linux computer might issue commands such as

> sudo -u postgres psql
postgres=# create role Al with login password 'SuperSecret';
postgres=# create database deliveries;
postgres=# grant all privileges on database deliveries to Al;
postgres=# quit

to set me up.

Run this small Python program to check if you can connect to your PostgreSQL server:

#!/usr/bin/env python
# test_pg_connection.py
import psycopg2
conn = psycopg2.connect(host="big_db_server", port=5432, # default port
                        user="Al", password="SuperSecret",
                        database="deliveries")
conn.close()

Of course change the hostname, username, password, and/or port number as appropriate for your setup. All is well if the program runs without errors. If there are errors, they’ll likely be one of these:

  • psycopg2.OperationalError: could not connect to server: Connection refused
  • psycopg2.OperationalError: FATAL: password authentication failed for user Al
  • psycopg2.OperationalError: FATAL: database deliveries does not exist

The first means either the database service is down, or a network issue (such as a firewall) is preventing access. The second and third errors show the database service is reachable, but the account or database you want to use either does not exist or is not configured properly. You or your database administrator will need to reexamine the setup.

Create (and delete)

As hinted by the deliveries database in the SQL commands above, I’ll repeat the package delivery example from the MATLAB+SQLite article. To start, I’ll run the fake_data.m program in MATLAB to create a text file, orders.txt, containing 100,000 lines of data to insert into the database.

MATLAB:

>> fake_data
>>

A Python program that reads orders.txt, creates the tables, and inserts the values into the orders table in batches of 10,000 rows at a time looks like this:

File: make_pg_db.py

#!/usr/bin/env python
# make_pg_db.py
import time
import psycopg2
import pathlib
conn = psycopg2.connect(host="localhost", port=5432,
                        user="Al", password="SuperSecret",
                        database="deliveries")

cursor = conn.cursor()

# delete/create
cursor.execute("drop table if exists orders;")
cursor.execute("drop table if exists rates;")
cursor.execute("""create table rates(item varchar primary key,
                                     cost_by_weight float);""")
create_orders_table = """create table orders (
                              origin varchar,
                              item   varchar,
                              weight float,
                              dest   varchar);"""
cursor.execute(create_orders_table)

rows = [ ['tube'   , 0.46],
         ['package', 0.65],
         ['letter' , 0.16],
         ['card'   , 0.05] ]
insert = "insert into rates (item,cost_by_weight) values(%s,%s)"
cursor.executemany(insert, rows)

# load orders data from the text file
T_s = time.time()
P = pathlib.Path('orders.txt')
rows = [_.split() for _ in P.read_text().split('\n') if _]
print(f'Data read time = {time.time() - T_s:.3f} seconds')

insert = "insert into orders (origin,item,weight,dest) values(%s,%s,%s,%s)"
T_s = time.time()
batch_size = 10_000 # insert this many rows at a time
for i in range(0, len(rows), batch_size):
    rows_subset = rows[i:i+batch_size]
    cursor.executemany(insert, rows_subset)
    print(f'inserted {i+1} to {i+batch_size}')
conn.commit()
print(f'Data insert time = {time.time() - T_s:.3f} seconds')

conn.close()

I’m connecting to a PostgreSQL instance running on the same computer (2015 Dell XPS 13 laptop running Ubuntu 20.04) where I run make_pg_db.py. Insert performance is about 8,000 rows per second, 50x slower than SQLite inserts. I’m sure there are PostgreSQL configuration optimizations I could make to increase insert performance if I knew more about this database engine.
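The batching pattern in the insert loop is worth isolating: range(0, len(rows), batch_size) produces the chunk start indices, and Python list slices never run past the end of the list, so the final short batch needs no special casing:

```python
# Sketch: the batching pattern used by make_pg_db.py, isolated.
# Slicing with range(start, stop, step) yields fixed-size chunks;
# the last chunk is simply shorter.
rows = list(range(25))          # stand-in for parsed order rows
batch_size = 10
batches = [rows[i:i+batch_size] for i in range(0, len(rows), batch_size)]
print([len(b) for b in batches])   # -> [10, 10, 5]
```

(psycopg2 also provides psycopg2.extras.execute_values(), which is generally reported to be much faster than executemany() for bulk inserts; I haven't benchmarked it here.)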

As with inserts to a SQLite database, MATLAB needs to store its insertion data row-major. We already saw that reading and reshuffling orders.txt from column-major to row-major in MATLAB is slow so our MATLAB implementation of the PostgreSQL insert program will use the same load_orders.py Python module used to speed up MATLAB in the SQLite example.

Two equivalent MATLAB insertion programs appear below. The first relies on the Python function keyword argument enhancements introduced in MATLAB 2022a.

File: make_pg_db.m

% MATLAB 2022a and newer
% make_pg_db.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(host="localhost", port=int64(5432), ...
                        user="Al", password="SuperSecret",  ...
                        database="deliveries");
cursor = conn.cursor();

% delete/create
cursor.execute("drop table if exists rates");
cursor.execute("drop table if exists orders");
cursor.execute("create table rates(item varchar primary key, " + ...
                                   "cost_by_weight float);");
create_orders_table = "create table orders (" + ...
                           "origin varchar,"  + ...
                           "item   varchar,"  + ...
                           "weight float,"    + ...
                           "dest   varchar);";
cursor.execute(create_orders_table);

% populate the rates table
rows = { {'tube'   , 0.46}, ...
         {'package', 0.65}, ...
         {'letter' , 0.16}, ...
         {'card'   , 0.05} };
insert = "insert into rates (item,cost_by_weight) values(%s,%s)";
cursor.executemany(insert, rows);

% load orders data from the text file and insert to orders table
tic;
rows = py.load_orders.readfile('orders.txt');
fprintf('Data read time = %.6f seconds\n', toc)

insert = "insert into orders (origin,item,weight,dest) values(%s,%s,%s,%s)";                                                                                          
tic
batch_size = 10000; % insert this many rows at a time
n_rows = length(rows);
for i = 1: batch_size:n_rows
    i_end = min(i+batch_size-1, n_rows);
    rows_subset = rows(i:i_end);
    cursor.executemany(insert, rows_subset)
    fprintf('inserted %d to %d\n', i, i_end);
end
conn.commit();
fprintf('Data insert time = %.6f seconds\n', toc)
conn.close();

while the version below uses the older pyargs() function to wrap Python keyword arguments. It works with MATLAB 2020b and newer. The two versions of this program differ only in how keyword arguments are passed to psycopg2.connect().

File: make_pg_db_2020b.m

% MATLAB 2020b and newer
% make_pg_db_2020b.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(pyargs("host","localhost","port",int64(5432), ...
                               "user","Al","password","SuperSecret",  ...
                               "database","deliveries"));
cursor = conn.cursor();

% delete/create
cursor.execute("drop table if exists rates");
cursor.execute("drop table if exists orders");
cursor.execute("create table rates(item varchar primary key, " + ...
                                   "cost_by_weight float);");
create_orders_table = "create table orders (" + ...
                           "origin varchar,"  + ...
                           "item   varchar,"  + ...
                           "weight float,"    + ...
                           "dest   varchar);";
cursor.execute(create_orders_table);

% populate the rates table
rows = { {'tube'   , 0.46}, ...
         {'package', 0.65}, ...
         {'letter' , 0.16}, ...
         {'card'   , 0.05} };
insert = "insert into rates (item,cost_by_weight) values(%s,%s)";
cursor.executemany(insert, rows);

% load orders data from the text file and insert to orders table
tic;
rows = py.load_orders.readfile('orders.txt');
fprintf('Data read time = %.6f seconds\n', toc)

insert = "insert into orders (origin,item,weight,dest) values(%s,%s,%s,%s)";                                                                                          
tic
batch_size = 10000; % insert this many rows at a time
n_rows = length(rows);
for i = 1: batch_size:n_rows
    i_end = min(i+batch_size-1, n_rows);
    rows_subset = rows(i:i_end);
    cursor.executemany(insert, rows_subset)
    fprintf('inserted %d to %d\n', i, i_end);
end
conn.commit();
fprintf('Data insert time = %.6f seconds\n', toc)
conn.close();

MATLAB insert performance matches Python’s, about 8,000 rows per second—in other words, not great compared to SQLite inserts.

Read

This query computes the cumulative cost of shipments by originating office.

select origin,sum(weight*cost_by_weight) from orders O, rates R
  where O.item = R.item group by origin;
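For intuition, the join-and-group-by can be modeled in a few lines of pure Python: multiply each order's weight by its item's rate and accumulate per origin. The sample data here is made up:

```python
# Sketch: a pure-Python model of the join-and-group-by query.
# Each order's weight is multiplied by its item's cost_by_weight and
# the products are accumulated per originating office.
from collections import defaultdict

rates  = {'tube': 0.46, 'card': 0.05}
orders = [('Z66', 'tube', 1.73, 'N51'),
          ('Z66', 'card', 0.05, 'C30'),
          ('Q38', 'tube', 2.00, 'B72')]

cost = defaultdict(float)
for origin, item, weight, dest in orders:
    cost[origin] += weight * rates[item]

for origin in sorted(cost):
    print(f'{origin}  {cost[origin]:7.2f}')
```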

Programmatic implementations of the query in Python and MATLAB resemble those in the SQLite select example:

File: select_pg.py

#!/usr/bin/env python
# select_pg.py
import psycopg2
conn = psycopg2.connect(host="localhost", port=5432,
                        user="Al", password="SuperSecret",
                        database="deliveries")
cursor = conn.cursor()

query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" \
        "    where O.item = R.item group by origin order by origin;"
cursor.execute(query)
for row in cursor.fetchall():
    print(f'{row[0]}  {row[1]:7.2f}')

conn.close()

and equivalently in MATLAB 2022a+ with

File: select_pg.m

% MATLAB 2022a and newer
% select_pg.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(host="localhost", port=int64(5432), ...
                        user="Al", password="SuperSecret",  ...
                        database="deliveries");
cursor = conn.cursor();

query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" + ...
        "    where O.item = R.item group by origin order by origin;";
cursor.execute(query);
for row = cursor.fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

conn.close()

and in MATLAB 2020b+ with

File: select_pg_2020b.m

% MATLAB 2020b and newer
% select_pg_2020b.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(pyargs("host","localhost","port",int64(5432), ...
                               "user","Al","password","SuperSecret",  ...
                               "database","deliveries"));
cursor = conn.cursor();

query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" + ...
        "    where O.item = R.item group by origin order by origin;";
cursor.execute(query);
for row = cursor.fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

conn.close()

Update

As in the SQLite update example, this SQL update scales the weight of packages by 2.2 while leaving the weights of cards, letters, and tubes alone:

update orders set weight = weight*2.2 where item = 'package';

The programs below repeat the select statement from the previous section to show total cost by origin. The SQL update itself is a single cursor.execute() call in each program. After updating the weight field the programs re-query for the new cumulative costs:

File: select_update_pg.py

#!/usr/bin/env python
# select_update_pg.py
import psycopg2
conn = psycopg2.connect(host="localhost", port=5432,
                        user="Al", password="SuperSecret",
                        database="deliveries")
cursor = conn.cursor()

query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" \
        "    where O.item = R.item group by origin order by origin;"
print('Original cost:')
cursor.execute(query)
for row in cursor.fetchall():
    print(f'{row[0]}  {row[1]:7.2f}')

# update:  scale the weight of packages
cursor.execute("update orders set weight = weight*2.2 where item = 'package'")

print('After the weight correction:')
cursor.execute(query)
for row in cursor.fetchall():
    print(f'{row[0]}  {row[1]:7.2f}')

conn.close()

MATLAB 2022a+:

File: select_update_pg.m

% MATLAB 2022a and newer
% select_update_pg.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(host="localhost", port=int64(5432), ...
                        user="Al", password="SuperSecret",  ...
                        database="deliveries");
cursor = conn.cursor();

query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" + ...
        "    where O.item = R.item group by origin order by origin;";
fprintf('Original cost:\n')
cursor.execute(query);
for row = cursor.fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

% update:  scale the weight of packages
cursor.execute("update orders set weight = weight*2.2 where item = 'package'");

fprintf('After the weight correction:\n')
cursor.execute(query);
for row = cursor.fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

conn.close()

MATLAB 2020b+:

File: select_update_pg_2020b.m

% MATLAB 2020b and newer
% select_update_pg_2020b.m
psycopg2 = py.importlib.import_module('psycopg2');
conn = psycopg2.connect(pyargs("host","localhost","port",int64(5432), ...
                               "user","Al","password","SuperSecret",  ...
                               "database","deliveries"));
cursor = conn.cursor();

query = "select origin,sum(weight*cost_by_weight) from orders O, rates R" + ...
        "    where O.item = R.item group by origin order by origin;";
fprintf('Original cost:\n')
cursor.execute(query);
for row = cursor.fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

% update:  scale the weight of packages
cursor.execute("update orders set weight = weight*2.2 where item = 'package'");

fprintf('After the weight correction:\n')
cursor.execute(query);
for row = cursor.fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

conn.close()

Join me again on July 23, 2022 to see how MATLAB can work with MongoDB databases using the pymongo Python module.


Table of Contents | Previous: Write custom Excel spreadsheets | Next: Interact with MongoDB