[patch] Add solve_algo::markowitz for large sparse systems
Hi, folks. I've been trying to apply GiNaC to solving systems of sparse linear equations: it is currently too slow, and I have an improvement. The setup of the issue is this: I have a system of 2279 equations in 749 variables, which you can find in 'sys749.txt' at [1]. When I give this system directly to lsolve(), like this: // compile with: // c++ -std=c++11 -o sys749 sys749.cpp -lginac -lcln -ldl #include <ginac/ginac.h> #include <ginac/parser.h> #include <fstream> using namespace GiNaC; using namespace std; int main(int argc, char *argv[]) { symtab table; parser reader(table); ifstream i("sys749.txt"); ex data = reader(i); lst eqns; for (auto &&e : data.op(0)) { if (!e.is_zero()) eqns.append(expand(e) == 0); } cout << lsolve(eqns, data.op(1), solve_algo::gauss); } ... after ~5 hours of time and 1.5GB of RAM, I conclude that this will never terminate. In contrast, it takes ~2.5 minutes for Maple to solve this system, so I know we can do better. The main slowdown for GiNaC comes from the classical 'fill-in' problem during Gaussian elimination. Consider a sparse matrix like this: [* * * * *] [* * 0 0 0] [* 0 * 0 0] [* 0 0 * 0] [* 0 0 0 *] When you apply naive Gaussian elimination to this matrix, on the first step you will subtract the first row from the rest of the matrix -- this will make all the zero entries non-zero (this is the "fill-in" of zeros), thus making the matrix dense, and killing all the performance because each subsequent row elimination step will require N normal() calls. On the other hand, if you where to permute the matrix like so: [* 0 0 0 *] [0 * 0 0 *] [0 0 * 0 *] [0 0 0 * *] [* * * * *] ... then naive Gaussian elimination would fully preserve the sparsity structure, and at each step most of the operations you'll need to perform would amount to subtracting zeros, and could be optimized away (so, nothing to normal()ize). This sort of permutation to reduce fill-in is an essential part of solving sparse linear systems, and is the main reason for Maple's performance, I think. In fact, if you'll just sort the list of equations by the number of terms before passing that list into lsolve(), this already allows GiNaC to solve sys749.txt, in under an hour (don't remember the exact timing). We can do better than this, though. There's an overview of the current state of the art in this sort of techniques at [2]. Out of all the tricks described there, the most straightforward to me is the Markowitz ordering, which is a heuristic that says: at each step of Gaussian elimination, choose the pivot (i,j) that minimizes the amount of fill-in you'll need to perform in this step, approximated as (r[i]-1)*(c[j]-1), where r[i] is the number of non-zero entries in row i, and c[j] is the number of non-zero entries in column j. This heuristic is expensive to calculate, but it is well worth it for sys749. The patch I'm attaching implements Markowitz-ordered Gaussian elimination as a new solve_algo::markowitz. An additional part of the slowness is the part of lsolve() that constructs the coefficient matrix: it does so by calling coeff() repeatedly for every variable -- this alone takes about 10 minutes for sys749. To optimize this part I first obtain the full set of all symbols in an equation, and then call coeff() only if the symbol in question belongs to that set. This changes 10 minutes into 10 seconds. Still, I think there's more room for improvement here: it should be possible to collect the whole coefficient row in one pass over the equation, without calling coeff() multiple times. Since it takes about 1.3 seconds to read the whole expression from disk, it should take no more than that to determine the full coefficient matrix, I think. If you know how to achieve that though, I'd be interested in learning. In summary, with the proposed optimizations GiNaC can solve sys749 in 60-90 seconds (differs every time; at one point I was getting a consistent 45 seconds, but I've probably changed something since then), which is actually faster than Maple, at least for this particular system. I would appreciate some correctness testing on this new algorithm, because Markowitz ordering requires full pivoting, and that is trickier to get right than the partial pivoting which GiNaC has had used until now, especially when 'vars' in matrix::solve() is wider than n*1. I've modified the check_lsolve test to also test this algorithm, but it doesn't look comprehensive. Style changes or algorithmic suggestions are also welcome. [1] http://tx97.net/~magv/sys749.txt [2] http://faculty.cse.tamu.edu/davis/publications_files/survey_tech_report.pdf
participants (1)
-
Vitaly Magerya