Storing Sparse Matrices: C++
In this post, we will talk about sparse matrices and the data structures that can be used to store them. We will mainly discuss the Triplet Form and the Compressed Column Storage form.
Let us take an example of a simple sparse matrix with $m = 4$ rows and $n = 5$ columns:
\[A = \begin{bmatrix}2 & 0 & 1& 1 & 0\\0 & 1 & 0 & 2 &0\\0 &0 &1 &0 &3\\2 &0 &4 &0 &0 \end{bmatrix}\]We will be using 0-based indexing for coding and 1-based indexing whenever we are discussing linear algebraic notation.
The simplest way to represent sparse matrix is by using triplet form. We will simply list the row index column index and the value for each non-zero entry in the matrix $A$
row_index col_index value 0 0 2 3 0 2 1 1 1 0 2 1 2 2 1 3 2 4 0 3 1 1 3 2 2 4 3
It should be noted that we need not have the elements in the triplet format to in sorted format. We clearly see that we need less space to represent the sparse matrix. But we can do better by using Compressed Column Storage (CCS).
Compressed Column Storage (CCS)
A matrix stored in CCS format needs three vectors to represent it, namely two <int>
vectors, col_ptr
and row_index
and one <double>
vector, vals
. The vals
store all the non-zero values of matrix $A$ in a contiguous piece of memory. We can traverse the matrix $A$ column by column. The size of vals
vector is nnz(A)
(number of non-zeros in matrix $A$ ). The size of row_index
vector is also nnz(A)
. The row-index
contains the row index of each non-zero element of $A$.
Going column by column, we can write the vals
vectors as follows
The corresponding row indexes of each of the above value in vals
vector is (we are using 0 based index )
It should be noted that we have the same values of row_index
and vals
vectors in both triplet and CCS format.
The col_ptr
vector has the locations in the vals
vector where each column starts. Its size is n+1
It can also be thought as the cumulative sum of column counts. The col_ptr
starts with 0 and since the matrix $A$’s first column has two elements, the second element in col_ptr
is 2. The next element in col_ptr
will be col_ptr[1] + nnz(A(:,2) = 2 + 1 = 3
. The last element of col_ptr
is nnz+1
.
The data structure to represent compressed column storage is given below.
#include "SSM_Includes.h"
template <typename T, typename S>
struct SSparseMat{
T nzmax; //maximum number of entries
T m; //Number of rows
T n; //number of columns
std::vector<T> row_index;
std::vector<T> col_ptr;
std::vector<S> values;
int nz;
SSparseMat() = default;
SSparseMat(T numRows, T numCols, T nzFlag, T nzmaxVal) :
m(numRows), n(numCols), nz(nzFlag), nzmax(nzmaxVal)
{
col_ptr = std::vector<T> ((numCols+1));
row_index = std::vector<T>(nzmaxVal);
values = std::vector<S>(nzmaxVal);
}
};
We can use the above struture to store both the triplet matrices and CCS matrices.
A simple main function to use the above datastructure is below:
#include <iostream>
#include "SSM_Includes.h"
int main() {
SSparseMat<int, double> C(4,5,9,9);
std::vector<int>v1 {0,3,1,0,2,3,0,1,2};
std::vector<int>v2 {0,0,1,2,2,2,3,3,4};
std::vector<double>v3{2,2,1,1,1,4,1,2,3};
C.row_index = v1;
C.col_ptr = v2;
C.values = v3;
printVector(v3);
You can find the full source at github
Most of this discussion is based on my readings of the Excellent book by Tim Davis. I encourage eveyone to take a look at the book.
References
- Tim Davis, Direct Methods for Sparse Linear Systems, 2006, SIAM