Recent Posts (page 6 / 8)

by Al Danial

Create Excel files in MATLAB with Python's openpyxl module

Part 4 of the Python Is The Ultimate MATLAB Toolbox series.

Your latest MATLAB masterpiece produced a goldmine of incisive data and now it’s time to present the results to management. MATLAB Live Scripts are an excellent reporting mechanism but these may not suit an organization’s workflow or documentation standards. Sometimes it’s necessary to fall back to showing results in a basic tool like Microsoft Excel.

MATLAB + Excel = not easy

MATLAB’s ability to create Excel files is modest at best. The writecell(), writematrix(), and writetable() functions can populate an Excel file with data, but only using the default text style. If you want to change typefaces, fonts, colors, merge cells and so on, you’re in for some hardship. Custom formatting is only supported on Windows using the COM layer (see Write Data to Excel Spreadsheets, bottom of the page) and examples demonstrating this process are hard to find.

If you’re using MATLAB on macOS or Linux, you’re completely out of luck writing rich Excel files.

MATLAB + Python + Excel = easy!

If you include Python with your MATLAB setup on the otherhand, creating richly-formatted Excel files on any platform is easy. Excel files can be created in MATLAB by calling functions from the Python openpyxl module. The first step is to check if you have openpyxl and to install it if you don’t.

Install openpyxl

Check your Python installation to see if it has openpyxl by starting the Python command line REPL and importing the module. If it imports successfully you can also check the version number:

> python
>>> import openpyxl
>>> openpyxl.__version__
'3.0.9'

If you get a ModuleNotFoundError, you’ll need to install it. That can be done on an Anaconda Python installation with

> conda install openpyxl

or with

> python -m pip install openpyxl

using other Python distributions.

Start a new .xlsx file

The small program below creates an empty Excel file, my_analysis.xlsx, containing two worksheets, or tabs, called “Amazing Results” and “Summary”. For special emphasis, the Summary tab’s color is changed to green. Unfortunately colors in openpyxl must be provided as RGB or aRGB (‘a’ = alpha, to control transparency) hex values rather than words like “green”.

Python file: empty.py:

1
2
3
4
5
6
7
#!/usr/bin/env python
import openpyxl as OP
book = OP.Workbook()
sheet_data = book.create_sheet("Amazing Results", 0)
sheet_summary = book.create_sheet("Summary", 1)
sheet_summary.sheet_properties.tabColor = "15AA15"
book.save("my_analysis.xlsx")

The MATLAB version is empty.m:

1
2
3
4
5
6
7
% MATLAB
OP = py.importlib.import_module('openpyxl');
book = OP.Workbook();
sheet_data = book.create_sheet("Amazing Results", int64(0));
sheet_summary = book.create_sheet("Summary", int64(1));
sheet_summary.sheet_properties.tabColor = "15AA15";
book.save("my_analysis.xlsx");

my_analysis.xlsx created by empty.py or empty.m

my_analysis.xlsx created by empty.py or empty.m

The create_sheet() method’s arguments are the worksheet’s title and an optional position in the tab sequence, where 0 is the leftmost tab. Note the presence of the ‘Sheet’ tab. This was created by default and was pushed to the right when our sheets were added. The ‘Sheet’ tab can be deleted like this:

# Python
del book['Sheet']
   #  -  or  -
existing_sheet = book['Sheet']
book.remove_sheet(existing_sheet)
   #  -  or  -
existing_sheet = book.get_sheet_by_name('Sheet')
book.remove_sheet(existing_sheet)

MATLAB won’t let us use [ ] to index the Python book object so only the third option works there:

% MATLAB
existing_sheet = book.get_sheet_by_name('Sheet');
book.remove_sheet(existing_sheet);

Avoiding bracket notation

As noted above, MATLAB won’t let us index Python objects with brackets. This prevents us from using openpyxl shortcuts such as assigning a cell’s value using something like sheet['C8'] = 3.14159. In Section 7.8 of Python for MATLAB Development, I suggested using a bridge module to overcome this interface limitation.

After the book came out, David Garrison at the MathWorks informed me that a bridge module is unnecessary since openpyxl offers method- and attribute-based ways to achieve the same thing. Setting cell C8’s value can also be done with

C8 = sheet.cell(3,8)
C8.value = 3.14159

All examples below use method and attribute access methods to accomodate MATLAB. The code examples therefore don’t demonstrate the most idiomatic use of openpyxl, but they have the benefits that Python and MATLAB code samples look almost identical, and that the MATLAB implementation won’t need a bridge module.

Fill a cell with a value

To populate a cell with a number or string, get a handle to the cell using the worksheet’s .cell() method, pass it the 1-based row and column indices, then set the handle’s .value attribute with the number or string of interest.

These lines put the string “Total” in cell B9 which corresponds to row 2, column 9:

# Python
B9 = sheet_data.cell(2, 9)
B9.value = "Total"

It looks like this in MATLAB:

% MATLAB
B9 = sheet_data.cell(int64(2), int64(9));
B9.value = "Total";

The int64() type casts are necessary because numeric literals in MATLAB have type double while the Python function expects integers.

Typeface size, font, color

A cell’s font can be modified by creating a “font style” with the Font class then applying this style to the cell. The code example below creates a style, ft_title, having a 14 point blue Arial typeface with bold italics font.

# Python
import openpyxl.styles as styles
Font = styles.Font
ft_title = Font(name="Arial", size=14, color="0000FF",
                bold=True, italics=True)

MATLAB 2021b and earlier use pyargs() to set keyword value pairs:

% MATLAB 2021b and newer
ft_title = Font( pyargs("name","Arial", "size",int64(14),  ...
                        "color","0000FF", "bold",py.True,  ...
                        "italics",py.True) );

MATLAB 2022a and newer can use pyargs() as above or, even better, let you assign keyword values the same way as Python:

% MATLAB 2022a and newer
ft_title = Font(name=Arial, size=int64(14), color="0000FF", ...
                bold=py.True, italics=py.True);

The font style can then be applied to the text in a cell by setting the cell’s .font attribute:

# Python
B9 = sheet_data.cell(2, 9)
B9.value = "Total"
B9.font  = ft_title

and

% MATLAB
B9 = sheet_data.cell(int64(2), int64(9));
B9.value = "Total";
B9.font  = ft_title;

Alignment

Text in a cell can be aligned vertically and horizontally, rotated, indented, and shrunk to fit a given container using the Alignment class. Text in merged cells looks good when it is centered both horizontally and vertically. That can be done with

# Python
import openpyxl.styles as styles
Alignment = styles.Alignment
B9.alignment = Alignment(horizontal="center", vertical="center")

and

% MATLAB 2021b and newer
B9.alignment = Alignment(pyargs("horizontal","center", ...
                                "vertical","center"));
% MATLAB 2022a and newer
B9.alignment = Alignment(horizontal="center", vertical="center");

Currency, percent, and other numeric formats

A cell can be formatted to show numeric values as currency or a percentage through the cell’s .number_format attribute. The numerical format notation is rather unusual in that it allows positive values to be formatted diffently than negative values. If such a distinction is desired, a semicolon and the negative value format are appended to the postive value format. Financial ledgers sometimes show negative values in red and/or in parentheses so these variations are shown in the example below. Oddly, one may use words like ‘Red’ or ‘Blue’ to define the color within a .number_format but all other color definition options only allow hex values.

The .number_format attribute also controls date format styles and scientific notation. See the documentation for numeric styles for all options.

B9.number_format = '"$"#,##0.00;[Red]("$"#,##0.00)'  # eg  $12,345.67
B9.number_format = '0.0%;[Red]0.0%'                  # eg  98.7%

The dollar symbol can be replaced with other currency symbols simply by replacing "$" with the desired character — "€" for example.

Cell background color

Cell backgrounds can be set to a variety of colors and patterns with the PatternFill class. Following the same technique as with fonts and number styles, one first defines a pattern style then applies this to a cell’s .fill attribute. The style bg_green defines a solid lime green background color:

# Python
import openpyxl.styles as styles
PatternFill = styles.PatternFill
bg_green = PatternFill(fgColor="C5FD2F", fill_type="solid")
B9 = sheet_data.cell(2,9)
B9.value = "Total"
B9.font  = ft_title
B9.fill  = bg_green
% MATLAB 2020b and newer
bg_green = PatternFill(pyargs("fgColor","C5FD2F","fill_type","solid"));
B9 = sheet_data.cell(int64(2),int64(9));
B9.value = "Total";
B9.font  = ft_title;
B9.fill  = bg_green;
% MATLAB 2022a and newer
bg_green = PatternFill(fgColor="C5FD2F", fill_type="solid");
B9 = sheet_data.cell(int64(2),int64(9));
B9.value = "Total";
B9.font  = ft_title;
B9.fill  = bg_green;

Merged cells

A block of adjacent cells can be coalessed into a single cell using a worksheet’s .merge_cells() method. The method takes a cell block range as an argument. The example below merges the nine cells B2, B3, B4, C2, C3, C4, D2, D3, and D4 into one. After the merge, the cell’s handle is accessed using the row and column coordinates of the top left cell:

# Python
sheet_data.merge_cells("B2:D4")
B2 = sheet_data.cell(2,2)
B2.value = "A Title"
B2.font = ft_title
B2.alignment = Alignment(horizontal="center", vertical="center")
% MATLAB 2020b and newer
sheet_data.merge_cells("B2:D4");
B2 = sheet_data.cell(int64(2),int64(2));
B2.value = "A Title";
B2.font = ft_title;
B2.alignment = Alignment(pyargs("horizontal","center", ...
                                "vertical","center");
% MATLAB 2022a and newer
sheet_data.merge_cells("B2:D4");
B2 = sheet_data.cell(int64(2),int64(2));
B2.value = "A Title";
B2.font = ft_title;
B2.alignment = Alignment(horizontal="center", vertical="center");

Equations

Equations are easily added to spreasheets as they are merely cells containing strings that begin with an equals sign and an operator. For example, to add an equation that sums values in the cell range E2:E50, one merely populates a cell with '=SUM(E2:E50)'.

B9.value = '=SUM(E2:E50)'

Putting it all together

This spreadsheet includes all customizations described above:

worldwide_sales.xlsx created by all_together.py or all_together.m. Note: values are randomly generated and so will vary with each run.

worldwide_sales.xlsx created by all_together.py or all_together.m. Note: values are randomly generated and so will vary with each run.

File: all_together.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
#!/usr/bin/env python
import numpy as np
import openpyxl as OP
import openpyxl.styles as styles
Font = styles.Font
Alignment = styles.Alignment
PatternFill = styles.PatternFill

book = OP.Workbook()
sheet_data = book.create_sheet("Profits 2022", 0)
sheet_data.sheet_properties.tabColor = "15AA15"
Region = ['N America', 'S America', 'Europe', 'Asia', 'Australia', 'Antarctica']
nR = len(Region)
profits = 10000 * (0.75 - np.random.rand(nR,2))
dollars = '"$"#,##0.00;[Red]("$"#,##0.00)'
pct     = '0.0%;[Red]0.0%'

sheet_data.merge_cells("B2:F3")
B2 = sheet_data.cell(2,2)
B2.value = 'Worldwide Sales'
B2.font = Font(name="Arial", size=16, color="4444FF", bold=True)
B2.alignment = Alignment(horizontal="center", vertical="center")

row, col = 5, 2

header_font = Font(name="Arial", size=12, bold=True)

region_cell = sheet_data.cell(row, 2)   # B5
region_cell.value = "Region"
region_cell.font  = header_font

last_year = sheet_data.cell(row, 3)     # C5
last_year.value = "Last Year"
last_year.font  = header_font

this_year = sheet_data.cell(row, 4)     # D5
this_year.value = "This Year"
this_year.font  = header_font

growth = sheet_data.cell(row, 5)        # E5
growth.value = "Growth"
growth.font  = header_font

rank = sheet_data.cell(row,6)           # F5
rank.value = 'Rank'
rank.font  = header_font

row += 1
highest_growth, highest_growth_row = -9.9e+99, 0
all_growth = np.zeros((nR,))
for r in range(nR):
    where     = sheet_data.cell(row+r, 2)
    where.value = Region[r]
    where.font = Font(name="Calibri", italic=True)
    last_year = sheet_data.cell(row+r, 3)
    this_year = sheet_data.cell(row+r, 4)
    growth    = sheet_data.cell(row+r, 5)
    last_year.value = profits[r,0]
    last_year.number_format = dollars
    this_year.value = profits[r,1]
    this_year.number_format = dollars
    growth.value = (this_year.value - last_year.value)/abs(last_year.value)
    all_growth[r] = growth.value
    growth.number_format = pct
    if growth.value > highest_growth:
        highest_growth = growth.value
        highest_growth_row = row + r

# highlight row with highest growth
highlight_color = PatternFill(fgColor="E4FF00",fill_type="solid")
for c in range(2,7):  # columns B to F
    cell = sheet_data.cell(highest_growth_row,c)
    cell.fill = highlight_color

# add values in the ranking column
rank = nR - all_growth.argsort().argsort()
for r in range(nR):
    cell = sheet_data.cell(row+r,6)
    cell.value = rank[r]

# cummulative row
total = sheet_data.cell(row+nR, 2)
total.value = "Total"
total.font  = header_font
for i,letter in enumerate(['C', 'D']):
    cell = sheet_data.cell(row+nR,i+3)
    cell.value = f'=SUM({letter}{row}:{letter}{row+nR-1})'
    cell.number_format = dollars

# cummulative growth
row += nR
cell = sheet_data.cell(row,5)
cell.value = f'=(D{row} - C{row})/ABS(C{row})'
cell.number_format = pct

book.save("worldwide_sales.xlsx")

MATLAB 2020b & newer using pyargs():

File: all_together_2020b.m

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
% MATLAB 2020b & newer
clear
OP = py.importlib.import_module('openpyxl');
styles = py.importlib.import_module('openpyxl.styles');

Font = styles.Font;
Alignment = styles.Alignment;
PatternFill = styles.PatternFill;

book = OP.Workbook();
sheet_data = book.create_sheet("Profits 2022", int64(0));

sheet_data.sheet_properties.tabColor = "15AA15";
Region = {'N America', 'S America', 'Europe', 'Asia', 'Australia', 'Antarctica'};
nR = int64(length(Region));
profits = 10000 * (0.75 - rand(nR,2));
dollars = '"$"#,##0.00;[Red]("$"#,##0.00)';
pct     = '0.0%;[Red]0.0%';

sheet_data.merge_cells("B2:E3");
B2 = sheet_data.cell(int64(2),int64(2));
B2.value = 'Worldwide Sales';
B2.font = Font(pyargs("name","Arial","size",int64(16),...
                      "color","4444FF","bold",py.True));
B2.alignment = Alignment(pyargs("horizontal","center","vertical","center"));

row = int64(5);
col = int64(2);

header_font = Font(pyargs("name","Arial","size",int64(12),"bold",py.True));

region_cell = sheet_data.cell(row, int64(2));   % B5
region_cell.value = "Region";
region_cell.font  = header_font;

last_year = sheet_data.cell(row, int64(3));     % C5
last_year.value = "Last Year";
last_year.font  = header_font;

this_year = sheet_data.cell(row, int64(4));     % D5
this_year.value = "This Year";
this_year.font  = header_font;

growth       = sheet_data.cell(row, int64(5));  % E5
growth.value = "Growth";
growth.font  = header_font;

rank = sheet_data.cell(row, int64(6));          % F5
rank.value = "Rank";
rank.font  = header_font;

row = row + 1;
highest_growth = -9.9e+99;
highest_growth_row = int64(0);
all_growth = zeros(nR,1);
for r = 1:nR
    where = sheet_data.cell(row+r, int64(2));
    where.value = Region{r};
    where.font = Font(pyargs("name","Calibri","italic",py.True));
    last_year = sheet_data.cell(row+r, int64(3));
    this_year = sheet_data.cell(row+r, int64(4));
    growth    = sheet_data.cell(row+r, int64(5));
    last_year.value = profits(r,1);
    last_year.number_format = dollars;
    this_year.value = profits(r,2);
    this_year.number_format = dollars;
    growth.value = (this_year.value - last_year.value)/abs(last_year.value);
    all_growth(r) = growth.value;
    growth.number_format = pct;
    if growth.value > highest_growth
        highest_growth = growth.value;
        highest_growth_row = row + r;
    end
end

% highlight row with highest growth
highlight_color = PatternFill(pyargs("fgColor","E4FF00","fill_type","solid"));
for c = int64(2):int64(5)
    cell = sheet_data.cell(highest_growth_row,c);
    cell.fill = highlight_color;
end

% add values in the ranking column
[~, index] = sort(all_growth);
[~, rank] = sort(index);
rank = 1 + nR - int64(rank);
for r = 1:nR
    cell = sheet_data.cell(row+r,int64(6));
    cell.value = rank(r);
end

% cummulative row
row = row + int64(1);
total = sheet_data.cell(row+nR, int64(2));
total.value = "Total";
total.font  = header_font;
i = 0;
for letter = {'C', 'D'}
    cell = sheet_data.cell(row+nR,int64(i+3));
    cell.value = sprintf('=SUM(%s%d:%s%d)', ...
                         letter{1}, row, letter{1}, row+nR-1);
    cell.number_format = dollars;
    i = i + 1;
end

% cummulative growth
row = row + nR;
cell = sheet_data.cell(row,int64(5));
cell.value = sprintf('=(D%d - C%d)/ABS(C%d)', row, row, row);
cell.number_format = pct;

book.save("worldwide_sales.xlsx")

MATLAB 2022a & newer using x=y keyword argument assignments instead of pyargs():

File: all_together.m

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
% MATLAB 2022a & newer
clear
OP = py.importlib.import_module('openpyxl');
styles = py.importlib.import_module('openpyxl.styles');

Font = styles.Font;
Alignment = styles.Alignment;
PatternFill = styles.PatternFill;

book = OP.Workbook();
sheet_data = book.create_sheet("Profits 2022", int64(0));

sheet_data.sheet_properties.tabColor = "15AA15";
Region = {'N America', 'S America', 'Europe', 'Asia', 'Australia', 'Antarctica'};
nR = int64(length(Region));
profits = 10000 * (0.75 - rand(nR,2));
dollars = '"$"#,##0.00;[Red]("$"#,##0.00)';
pct     = '0.0%;[Red]0.0%';

sheet_data.merge_cells("B2:E3");
B2 = sheet_data.cell(int64(2),int64(2));
B2.value = 'Worldwide Sales';
B2.font = Font(name="Arial", size=int64(16), color="4444FF", bold=py.True);
B2.alignment = Alignment(horizontal="center", vertical="center");

row = int64(5);
col = int64(2);

header_font = Font(name="Arial", size=int64(12), bold=py.True);

region_cell = sheet_data.cell(row, int64(2));   % B5
region_cell.value = "Region";
region_cell.font  = header_font;

last_year = sheet_data.cell(row, int64(3));     % C5
last_year.value = "Last Year";
last_year.font  = header_font;

this_year = sheet_data.cell(row, int64(4));     % D5
this_year.value = "This Year";
this_year.font  = header_font;

growth       = sheet_data.cell(row, int64(5));  % E5
growth.value = "Growth";
growth.font  = header_font;

rank = sheet_data.cell(row, int64(6));          % F5
rank.value = "Rank";
rank.font  = header_font;

row = row + 1;
highest_growth = -9.9e+99;
highest_growth_row = int64(0);
all_growth = zeros(nR,1);
for r = 1:nR
    where = sheet_data.cell(row+r, int64(2));
    where.value = Region{r};
    where.font = Font(name="Calibri", italic=py.True);
    last_year = sheet_data.cell(row+r, int64(3));
    this_year = sheet_data.cell(row+r, int64(4));
    growth    = sheet_data.cell(row+r, int64(5));
    last_year.value = profits(r,1);
    last_year.number_format = dollars;
    this_year.value = profits(r,2);
    this_year.number_format = dollars;
    growth.value = (this_year.value - last_year.value)/abs(last_year.value);
    all_growth(r) = growth.value;
    growth.number_format = pct;
    if growth.value > highest_growth
        highest_growth = growth.value;
        highest_growth_row = row + r;
    end
end

% highlight row with highest growth
highlight_color = PatternFill(fgColor="E4FF00",fill_type="solid");
for c = int64(2):int64(5)
    cell = sheet_data.cell(highest_growth_row,c);
    cell.fill = highlight_color;
end

% add values in the ranking column
[~, index] = sort(all_growth);
[~, rank] = sort(index);
rank = 1 + nR - int64(rank);
for r = 1:nR
    cell = sheet_data.cell(row+r,int64(6));
    cell.value = rank(r);
end

% cummulative row
row = row + int64(1);
total = sheet_data.cell(row+nR, int64(2));
total.value = "Total";
total.font  = header_font;
i = 0;
for letter = {'C', 'D'}
    cell = sheet_data.cell(row+nR,int64(i+3));
    cell.value = sprintf('=SUM(%s%d:%s%d)', ...
                         letter{1}, row, letter{1}, row+nR-1);
    cell.number_format = dollars;
    i = i + 1;
end

% cummulative growth
row = row + nR;
cell = sheet_data.cell(row,int64(5));
cell.value = sprintf('=(D%d - C%d)/ABS(C%d)', row, row, row);
cell.number_format = pct;

book.save("worldwide_sales.xlsx")

openpyxl can do more

Only a small fraction of openpyxl’s capabilities are demonstrated here. This module can also create column filters, 2D and 3D stacked plots, 2D and 3D line plots with optional log scales, bar charts, scatter plots, pie charts, donut charts, radar plots, stock plots with high/low markers, 2D and 3D contour plots, and gauges.

Motivated MATLAB users will be able to make extraordinary Excel spreadsheets with openpyxl.


Table of Contents | Previous: Read and write SQLite databases | Next: Interact with PostgreSQL

by Al Danial

Read and write SQLite databases in MATLAB with Python

Part 3 of the Python Is The Ultimate MATLAB Toolbox series.

SQLite: the most widely deployed database on earth

Most likely the application you’re using to read this article is running SQLite because SQLite is built into the Firefox, Chrome, and Safari web browsers. The computer you’re on most definitely has it: SQLite is built into Android, iOS, macOS, Windows 10 and up, and is standard on Linux distributions. In addition to being everywhere, SQLite is also blazingly fast.

Given its ubiquity and high performance for storing and retrieving data, you may want to use SQLite in MATLAB. Your tool of choice for interacting with SQLite in MATLAB is MathWorks' Database Toolbox. If you have access to the Database Toolbox, the rest of this post may not interest you.

MATLAB + SQLite via Python

An alternative to the Database Toolbox is to use Python’s sqlite3 module in MATLAB.

In this article I’ll explore the CRUD—create, read, update, delete—operations with SQLite in MATLAB with a simple database representing package deliveries between offices of a corporation. The database has a table that stores each package’s type, weight, origin, and destination, and a second table with price per unit weight for each type. Here are represetative values:

orders table

origin item weight destination
Z66 tube 1.73 N51
O64 package 3.66 J43
U74 tube 4.73 U71
K54 letter 1.14 V08
W09 letter 2.58 D50
T89 package 2.53 H15
P70 tube 2.58 E95
L96 card 4.20 N56

rates table

item cost by weight
tube 0.46
package 0.65
letter 0.16
card 0.05

Create (and delete)

The “create” part of CRUD refers to three things: creating the database itself, creating tables within the database, and populating the tables with rows. An SQLite database is a file, typically saved with a .db extension, and is created the first time you attempt to “connect” to it. SQLite tables are created by sending SQL “create table” commands to the connected database, and popualted with SQL “insert” commands.

SQL table creation commands

Conventional SQL commands to create these two tables might look like this:

create table orders (origin text,
                     item   text,
                     weight float,
                     dest   text);
create table rates ( item   text,
                     cost_by_weight float);

If you put the six lines above in a text file called make_deliveries_db.sql, you can create a SQLite database by piping the file into the sqlite3 command line tool:

cat  make_deliveries_db.sql | sqlite3 deliveries.db      # Linux, macOS
type make_deliveries_db.sql | sqlite3.exe deliveries.db  # Windows

What we’re really after though is doing this in MATLAB. To get there we’ll start with a Python version since Python will be our gateway to using SQLite in MATLAB. A complete Python program to create the SQLite database file deliveries.db with our tables orders and rates is:

#!/usr/bin/env python
import sqlite3
conn = sqlite3.connect('deliveries.db') # creates this if it does not exist
cursor = conn.cursor()
cursor.execute("drop table if exists orders;")
cursor.execute("drop table if exists rates;")
create_orders =   """create table orders (origin text,
                                          item   text,
                                          weight float,
                                          dest   text);"""
create_rates  =   """create table rates ( item   text,
                                          cost_by_weight float);"""
cursor.execute(create_orders)
cursor.execute(create_rates)
conn.commit()
conn.close()

The two “drop table if exists” lines delete the named tables from the database if they are already defined. If we didn’t include these lines, the second time we try to run the program we’ll hit a “table already exists” error. By dropping the tables (if they exist) first, we’ll be able to rerun without errors—very handy during development.

Next: same thing in MATLAB. If you haven’t already, set up your MATLAB+Python environment. You’ll note the MATLAB code looks almost the same has the Python equivalent above.

% MATLAB
sqlite3 = py.importlib.import_module('sqlite3');
conn = sqlite3.connect('deliveries.db'); % creates this if it does not exist
cursor = conn.cursor();
cursor.execute("drop table if exists rates");
cursor.execute("drop table if exists orders");
create_order_table = "create table orders (" + ...
                           "origin text,"    + ...
                           "item   text,"    + ...
                           "weight float,"   + ...
                           "dest   text);";
cursor.execute(create_order_table);
cursor.execute("create table rates(item text, cost_by_weight float);");
conn.commit();
conn.close();

Inserts

Our tables are ready for content. Generic SQL commands to populate our two tables with the representative data shown above look like this:

insert into orders values('Z66', 'tube'   , 1.73, 'N51');
insert into orders values('O64',' package', 3.66, 'J43');
insert into orders values('U74',' tube'   , 4.73, 'U71');
insert into orders values('K54',' letter' , 1.14, 'V08');
insert into orders values('W09',' letter' , 2.58, 'D50');
insert into orders values('T89',' package', 2.53, 'H15');
insert into orders values('P70',' tube'   , 2.58, 'E95');
insert into orders values('L96',' card'   , 4.20, 'N56');

insert into rates values('tube'   , 0.46);
insert into rates values('package', 0.65);
insert into rates values('letter' , 0.16);
insert into rates values('card'   , 0.05);

Row inserts can be done with Python’s sqlite3 module either individually or in bulk. After you’ve created the cursor variable, the two forms of inserts can be done like this in Python:

# Python

# single row insert
row  =   [ 'Z66', 'tube'   , 1.73, 'N51']
cursor.execute('insert into orders values (?,?,?,?)', row)

# multiple row insert
rows = [ [ 'O64',' package', 3.66, 'J43'],
         [ 'U74',' tube'   , 4.73, 'U71'],
         [ 'K54',' letter' , 1.14, 'V08'],
         [ 'W09',' letter' , 2.58, 'D50'],
         [ 'T89',' package', 2.53, 'H15'],
         [ 'P70',' tube'   , 2.58, 'E95'],
         [ 'L96',' card'   , 4.20, 'N56'], ]
cursor.executemany('insert into orders values (?,?,?,?)', rows)

The multiple row insertion is done in a single transaction and yields much higher performance than single row inserts. The lines above look like this in MATLAB:

% MATLAB

% single row insert
row  =   { 'Z66', 'tube'   , 1.73, 'N51'};
cursor.execute('insert into orders values (?,?,?,?)', row);

% multiple row insert
rows = { { 'O64',' package', 3.66, 'J43'}, ...
         { 'U74',' tube'   , 4.73, 'U71'}, ...
         { 'K54',' letter' , 1.14, 'V08'}, ...
         { 'W09',' letter' , 2.58, 'D50'}, ...
         { 'T89',' package', 2.53, 'H15'}, ...
         { 'P70',' tube'   , 2.58, 'E95'}, ...
         { 'L96',' card'   , 4.20, 'N56'}, };
cursor.executemany('insert into orders values (?,?,?,?)', rows);

A data generator

A database whose biggest table has just eight rows underwhelms. This small MATLAB program creates a file, orders.txt, containing an arbitrary number of rows of random entries for the orders table (100,000 in this case). We can use it to produce a large database and test insert and query performance.

Aside: Python’s Faker module is an excellent resource for generating arbitrary fake data for testing.

File: fake_data.m

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
% MATLAB
n_rows = 100000;

item = {'letter', 'card', 'tube', 'package'};
n_item = length(item);
fh = fopen('orders.txt','w');
for i = 1:n_rows
    origin = char([64 + randi(26)  47 + randi(10,1,2)]);
    obj    = item{randi(n_item)};
    weight = 5*rand();
    dest   = char([64 + randi(26)  47 + randi(10,1,2)]);
    fprintf(fh,'%s %-8s  %5.2f %s\n', origin, obj, weight, dest);
end
fclose(fh);

Output is a text file, orders.txt, with four columns of random values. The first few lines of orders.txt might be

Y91 package    4.79 M81
R77 card       3.28 E70
H00 package    3.47 I90
L37 package    0.93 M46
S72 tube       3.28 E14
Y35 letter     3.76 G56

Order data saved to this text file needs to be loaded by our programs into a row major variable suitable for use by cursor.executemany(). In other words, the first level index i to the variable returns the i-th row of data. In Python this can be done with a list comprehension over lines of the file, where each line is separated into a list of values:

# Python
import pathlib
P = pathlib.Path('orders.txt')
rows = [_.split() for _ in P.read_text().split('\n') if _]

MATLAB’s data loaders such as textscan() are column major so a bit of manipulation is needed to coerce the required structure:

% MATLAB
fh = fopen('orders.txt');
cols = textscan(fh, '%s %s %f %s');
fclose(fh);
n_rows = length(cols{1});
rows = cell(1,n_rows);
for r = 1:n_rows
    rows{r} = { cols{1}{r}, cols{2}{r}, cols{3}(r), cols{4}{r} };
end

Note the subtle indexing difference for column 3. Parentheses, rather than braces, must be used to dereference numeric values within a cell.

Complete programs to create and populate our deliveries database using data from orders.txt now look like this.

Python:

File: make_db.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#!/usr/bin/env python
import time
import sqlite3
import pathlib
conn = sqlite3.connect('deliveries.db') # creates if the file doesn't exist
cursor = conn.cursor()

# delete/create
cursor.execute("drop table if exists orders;")
cursor.execute("drop table if exists rates;")
cursor.execute("create table rates(item text, cost_by_weight float);");
create_orders_table = """create table orders (
                              origin text,
                              item   text,
                              weight float,
                              dest   text);"""
cursor.execute(create_orders_table)

rows = [ ['tube'   , 0.46],
         ['package', 0.65],
         ['letter' , 0.16],
         ['card'   , 0.05] ]
cursor.executemany('insert into rates values (?,?)', rows)

# load orders data from the text file
T_s = time.time()
P = pathlib.Path('orders.txt')
rows = [_.split() for _ in P.read_text().split('\n') if _]
print(f'Data read time = {time.time() - T_s} seconds')

T_s = time.time()
cursor.executemany('insert into orders values (?,?,?,?)', rows)
conn.commit()
print(f'Data insert time = {time.time() - T_s} seconds')

conn.close()

MATLAB:

File: make_db.m

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
% MATLAB
sqlite3 = py.importlib.import_module('sqlite3');
conn = sqlite3.connect('deliveries.db'); % create if it does not exist
cursor = conn.cursor();

% delete/create
cursor.execute("drop table if exists rates");
cursor.execute("drop table if exists orders");
cursor.execute("create table rates(item text, cost_by_weight float);");
create_order_table = "create table orders (" + ...
                           "origin text,"    + ...
                           "item   text,"    + ...
                           "weight float,"   + ...
                           "dest   text);";
cursor.execute(create_order_table);

% populate the rates table
rows = { {'tube'   , 0.46}, ...
         {'package', 0.65}, ...
         {'letter' , 0.16}, ...
         {'card'   , 0.05} };
cursor.executemany('insert into rates values (?,?)', rows);

% load orders data from the text file and insert to orders table
tic;
if (0)
    % BEFORE -- MATLAB file read + transpose
    fh = fopen('orders.txt');
    cols = textscan(fh, '%s %s %f %s');
    fclose(fh);
    n_rows = length(cols{1});
    rows = cell(1,n_rows);
    for r = 1:n_rows
        rows{r} = { cols{1}{r}, cols{2}{r}, cols{3}(r), cols{4}{r} };
    end
else
    % AFTER -- Python file read
    rows = py.load_orders.readfile('orders.txt');
end
fprintf('Data read time = %.6f seconds\n', toc)

tic;
cursor.executemany('insert into orders values (?,?,?,?)', rows);
conn.commit();
fprintf('Data insert time = %.6f seconds\n', toc)
conn.close();

Insert performance

Our programs' insert performance depends on how rapidly they can ingest the data from files and how long the bulk insert itself takes. Times in seconds on my Ubuntu 20.04 laptop using MATLAB 2022a and Python 3.8.8 are

n_rows MATLAB read MATLAB insert Python read Python insert
100,000 1.10 0.68 0.10 0.20
1,000,000 6.47 4.65 2.36 2.15

MATLAB is slower, but it must also do more work: it has to 1) transpose the loaded data from a column-major to a row-major container and 2) internally translate MATLAB strings and numeric values to Python equivalents when calling cursor.executemany().

We can boost MATLAB’s speed by using Python to ease both of these issues simply by replacing MATLAB’s file reading code with the Python reader. The Python reader creates a Python-native variable that’s immediately ready for the bulk insert with cursor.executemany(). This bypasses the MATLAB-to-Python impedance mismatch that MATLAB otherwise handles for us under the hood—at a time expense, of course.

Here’s a small Python module with a function that implements the file loader we can call from MATLAB:

File: load_orders.py

1
2
3
4
5
# Python; file load_orders.py
import pathlib
def readfile(File):
    P = pathlib.Path(File)
    return [_.split() for _ in P.read_text().split('\n') if _]

All that remains is to replace the MATLAB ingest code with a call to the above Python function. Here are the before and after MATLAB lines, where the ‘before’ code is blocked via if (0) and the ‘after’ code exists in the else block:

if (0)
    % BEFORE -- MATLAB file read + transpose
    fh = fopen('orders.txt');
    cols = textscan(fh, '%s %s %f %s');
    fclose(fh);
    n_rows = length(cols{1});
    rows = cell(1,n_rows);
    for r = 1:n_rows
        rows{r} = { cols{1}{r}, cols{2}{r}, cols{3}(r), cols{4}{r} };
    end
else
    % AFTER -- Python file read
    rows = py.load_orders.readfile('orders.txt');
end

MATLAB’s performance with load_orders.readfile() is much closer to Python’s:

n_rows MATLAB+Python read MATLAB insert Python read Python insert
100,000 1.10 0.68 0.10 0.20
1,000,000 3.11 2.34 2.36 2.15

Read

Database reads are done with the SQL select statement. Here’s a query that computes the cummulative cost of shipments by originating office:

select origin,sum(weight)*cost_by_weight from orders O, rates R
  where O.item = R.item group by origin;

The query can be done programmatically in Python with

File: select.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
#!/usr/bin/env python
import sqlite3
conn = sqlite3.connect('deliveries.db')
cursor = conn.cursor()

query = "select origin,sum(weight)*cost_by_weight from orders O, rates R" \
        "    where O.item = R.item group by origin order by origin;"
for row in cursor.execute(query).fetchall():
    print(f'{row[0]}  {row[1]:7.2f}')

conn.close()

and equivalently in MATLAB with

File: select.m

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
% MATLAB
sqlite3 = py.importlib.import_module('sqlite3');
conn = sqlite3.connect('deliveries.db');
cursor = conn.cursor();

query = "select origin,sum(weight)*cost_by_weight from orders O, rates R" + ...
        "    where O.item = R.item group by origin order by origin;";
for row = cursor.execute(query).fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

conn.close()

Query performance is essentially the same in Python and MATLAB.

Update

A database update modifies the content of existing rows. Say the scale used to weigh packages (as opposed to the scale used for cards, letters, and tubes) was set to the wrong units, for example pounds instead of kilograms. This SQL update scales the weight of packages by 2.2 while leaving the wieghts of cards, letters, and tubes alone:

update orders set weight = weight*2.2 where item = 'package';

The programs below repeat the select statement from the previous section to show total cost by origin. Next the programs apply the weight update for packages then repeat the query. The programs below show how this is done programmatically in Python:

File: select_update.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
#!/usr/bin/env python
import sqlite3
conn = sqlite3.connect('deliveries.db')
cursor = conn.cursor()

query = "select origin,sum(weight)*cost_by_weight from orders O, rates R" \
        "    where O.item = R.item group by origin order by origin;"
print('Original cost:')
for row in cursor.execute(query).fetchall():
    print(f'{row[0]}  {row[1]:7.2f}')

# update:  scale the weight of packages
cursor.execute("update orders set weight = weight*2.2 where item = 'package'")

print('After the weight correction:')
for row in cursor.execute(query).fetchall():
    print(f'{row[0]}  {row[1]:7.2f}')

conn.close()

and in MATLAB:

File: select_update.m

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
% MATLAB
sqlite3 = py.importlib.import_module('sqlite3');
conn = sqlite3.connect('deliveries.db');
cursor = conn.cursor();

query = "select origin,sum(weight)*cost_by_weight from orders O, rates R" + ...
        "    where O.item = R.item group by origin order by origin;";
fprintf('Original cost:\n')
for row = cursor.execute(query).fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

% update:  scale the weight of packages
cursor.execute("update orders set weight = weight*2.2 where item = 'package'");

fprintf('After the weight correction:\n')
for row = cursor.execute(query).fetchall()
    fprintf('%s  %7.2f\n', string(row{1}{1}), row{1}{2});
end

conn.close()


Table of Contents | Previous: Read and write YAML, .ini, and TOML files | Next: Write custom Excel files