[patch] Add solve_algo::markowitz for large sparse systems

29 May 2018

      Hi, folks. I've been trying to apply GiNaC to solving systems of
sparse linear equations: it is currently too slow, and I have
an improvement.

The setup of the issue is this: I have a system of 2279 equations
in 749 variables, which you can find in 'sys749.txt' at [1].
When I give this system directly to lsolve(), like this:

    // compile with:
    // c++ -std=c++11 -o sys749 sys749.cpp -lginac -lcln -ldl

    #include <ginac/ginac.h>
    #include <ginac/parser.h>
    #include <fstream>

    using namespace GiNaC;
    using namespace std;

    int
    main(int argc, char *argv[])
    {
        symtab table;
        parser reader(table);
        ifstream i("sys749.txt");
        ex data = reader(i);
        lst eqns;
        for (auto &&e : data.op(0)) {
            if (!e.is_zero()) eqns.append(expand(e) == 0);
        }
        cout << lsolve(eqns, data.op(1), solve_algo::gauss);
    }

... after ~5 hours of time and 1.5GB of RAM, I conclude that
this will never terminate. In contrast, it takes ~2.5 minutes
for Maple to solve this system, so I know we can do better.

The main slowdown for GiNaC comes from the classical 'fill-in'
problem during Gaussian elimination. Consider a sparse matrix
like this:

    [* * * * *]
    [* * 0 0 0]
    [* 0 * 0 0]
    [* 0 0 * 0]
    [* 0 0 0 *]

When you apply naive Gaussian elimination to this matrix, on
the first step you will subtract the first row from the rest
of the matrix -- this will make all the zero entries non-zero
(this is the "fill-in" of zeros), thus making the matrix dense,
and killing all the performance because each subsequent row
elimination step will require N normal() calls. On the other
hand, if you where to permute the matrix like so:

    [* 0 0 0 *]
    [0 * 0 0 *]
    [0 0 * 0 *]
    [0 0 0 * *]
    [* * * * *]

... then naive Gaussian elimination would fully preserve the
sparsity structure, and at each step most of the operations
you'll need to perform would amount to subtracting zeros, and
could be optimized away (so, nothing to normal()ize).

This sort of permutation to reduce fill-in is an essential part
of solving sparse linear systems, and is the main reason for
Maple's performance, I think. In fact, if you'll just sort the
list of equations by the number of terms before passing that list
into lsolve(), this already allows GiNaC to solve sys749.txt,
in under an hour (don't remember the exact timing). We can do
better than this, though.

There's an overview of the current state of the art in this sort
of techniques at [2]. Out of all the tricks described there,
the most straightforward to me is the Markowitz ordering, which
is a heuristic that says: at each step of Gaussian elimination,
choose the pivot (i,j) that minimizes the amount of fill-in you'll
need to perform in this step, approximated as (r[i]-1)*(c[j]-1),
where r[i] is the number of non-zero entries in row i, and c[j]
is the number of non-zero entries in column j. This heuristic
is expensive to calculate, but it is well worth it for sys749.

The patch I'm attaching implements Markowitz-ordered Gaussian
elimination as a new solve_algo::markowitz.

An additional part of the slowness is the part of lsolve()
that constructs the coefficient matrix: it does so by calling
coeff() repeatedly for every variable -- this alone takes about
10 minutes for sys749. To optimize this part I first obtain the
full set of all symbols in an equation, and then call coeff()
only if the symbol in question belongs to that set. This changes
10 minutes into 10 seconds.

Still, I think there's more room for improvement here: it should
be possible to collect the whole coefficient row in one pass
over the equation, without calling coeff() multiple times. Since
it takes about 1.3 seconds to read the whole expression from
disk, it should take no more than that to determine the full
coefficient matrix, I think. If you know how to achieve that
though, I'd be interested in learning.

In summary, with the proposed optimizations GiNaC can solve
sys749 in 60-90 seconds (differs every time; at one point I
was getting a consistent 45 seconds, but I've probably changed
something since then), which is actually faster than Maple, at
least for this particular system.

I would appreciate some correctness testing on this new algorithm,
because Markowitz ordering requires full pivoting, and that is
trickier to get right than the partial pivoting which GiNaC has
had used until now, especially when 'vars' in matrix::solve()
is wider than n*1. I've modified the check_lsolve test to also
test this algorithm, but it doesn't look comprehensive. Style
changes or algorithmic suggestions are also welcome.

[1] http://tx97.net/~magv/sys749.txt
[2] http://faculty.cse.tamu.edu/davis/publications_files/survey_tech_report.pdf

Vitaly Magerya

tags

participants (1)