[GiNaC-list] A bad idea: GiNaC as the symbolic manipulation engine in Sage

11 Aug 2008

      Hello!

On Mon, Aug 11, 2008 at 10:30:59AM +0200, Burcin Erocal wrote:
...
While GiNaC seems to be the best option for Sage,
GiNaC is a special purpose symbolic manipulation library. The purpose of
GiNaC is to do "simple" manipulations with "large" expressions using a common
programming language (as opposed to ill designed, unstable ones as offered
by most of CASes). So, I don't think GiNaC is a good choice for Sage
(which attempts to be a general purpose _interactive_ CAS).
...
I think all these problems can be resolved with some effort
Don't take me wrong, but I think it's better to put this effort elsewhere.
I.e. most of "issues" you've mentioned are design decisions.
...
and even the speed of basic arithmetic in GiNaC can be improved with
modifying it's data structures.
No matter what data representation you choose, there are always some expensive
operations. Sure, one can optimize data structures so that some particular
operation is fast (at the expence of slowing down other operations). So, to
write a good code one need to be aware of the complexity of operations and use
expensive operations consciously.

To be more specific, here is an example. Let's construct the expression
A_N = \sum_{i = 0}{N} (x+y)^i (with N = 10000). A naive way:

$ cat testadd_dumb.cpp

#include <ginac/ginac.h>
#include <cstdlib> // for atoi
using namespace GiNaC;
using namespace std;

ex do_dumb(const ex& base, const unsigned max_degree)
{
        ex e = 0;
        for (unsigned i = 0; i <= max_degree; i++)
                e += pow(base, i);
        return e;
}

int main(int argc, char** argv)
{
        unsigned max_degree = 10000;
        if (argc > 1)
                max_degree = atoi(argv[1]);

        const symbol x("x"), y("y");
        ex base = x+y;
        ex e = do_dumb(base, max_degree);
        return 0;
}

$ g++ -O2 -march=k8 -finline-limit=2000 testadd_dump.cpp -lginac
$ time ./a.out

real    0m4.916s
user    0m4.644s
sys     0m0.272s

It's very slow, because every += operation calls add:eval() method, which
puts the terms of the sum into a canonical order. A better variant is

$ cat testadd.cpp

#include <ginac/ginac.h>
#include <cstdlib> // for atoi()
using namespace GiNaC;
using namespace std;

// same as in above Maxima code: first construct all terms,
// than make a sum of them.
ex do_smart(const ex& base, const unsigned max_degree)
{
        exvector v;
        v.reserve(max_degree + 1);
        for (unsigned i = 0; i <= max_degree; i++)
                v.push_back(pow(base, i));
        return (new add(v))->setflag(status_flags::dynallocated);
        // setflag incantation tells GiNaC to manage memory for us.
}

int main(int argc, char** argv)
{
        unsigned max_degree = 10000;
        if (argc > 1)
                max_degree = atoi(argv[1]);
        const symbol x("x"), y("y");
        ex base = x+y;
        ex e = do_smart(base, 10000);
        return 0;
}

Here the sum is constructed in a one shot, so terms are sorted only once
instead of 10000 times, so the program works *much* faster:

$ g++ -O2 -march=k8 -finline-limit=2000 testadd.cpp -lginac
$ time ./a.out

real    0m0.078s
user    0m0.064s
sys     0m0.016s

The same applies to Maxima. Constructing the sum in a proper way takes
a reasonable time:

$ cat testadd.mac
args : []; for i from 0 thru 10000 do ( args : cons((x+y)^i, args) )$
expr : apply("+", args)$

$ time ( maxima --batch=testadd.mac >/dev/null )

real    0m0.519s
user    0m0.428s
sys     0m0.092s

N.B. The timing is not quite fair, since it includes the startup time.
...
Sage via the maxima interface didn't finish this task in < 5 minutes.
I guess that's because you (or Sage) do it in an extremly inefficient manner.
In other words, if you use Maxima, write the code the Maxima way, not
the Mathematica (Maple, whatever) one, and everybody will be happier. BTW,
the same applies to almost every piece of a software.
...
- cln duplicates functionality which is provided by different libraries
already distributed with Sage (mpfr, mpir, flint)
The libraries duplicating CLN functionality [1] typically have several
serious drawbacks:
 a. The syntax is really awkward (to put it very mildly). For example,
    instead of a simple 
		z = x+a*y;
		one need to write several lines of unreadable code.
 b. The user is responsible for memory management. This is absolutely
    inacceptable.

None of these libraries gives any considerable performance gain, so why
bother?

[1] As a matter of fact, CLN existed *long* before the libraries you've
mentioned were developed.
...
at a considerable speed penalty.
Could you please give any benchmark backing up this claim?
...
- Creating symbolic functions at runtime is not supported.
That's a design decision.
...
- IIRC, GiNaC documentation suggests the printing order for the
variables is stable between different sessions, regardless of
creation order.
Could you please point out which document (and where) says that?

The order of expressions (in particular, symbols) is not stable. This is
a price for a fast comparison of expressions. 

Anyway, since GiNaC is non-interactive, the order of terms doesn't actually
matter: typically the output of a program is a small expression (quite often
it is a single number), and the big intermediate expressions are not going
to be examined by human (except for debugging).
...
- It takes ages to build
  The packages above took ~25 minutes to build on my desktop machine
Just for the record: building GiNaC and CLN takes ~30 minutes on my old
Duron 800 box with 640 Mb of RAM. I wonder what kind of machine do you use
as a desktop. That said, patches reducing the build time are wellcome (if
they don't reduce the performance and don't make the code more complicated).
...
- cln seems to support on only gcc,
1. CLN maintainers accept patches to support other platforms.
2. Cleaning up CLN from non-portable code is in my TODO list (although
   the priority is *very* low).
3. gcc is available for (almost) every architecture, so who cares, anyway?

Best regards,
	Alexei

-- 
All science is either physics or stamp collecting.

[GiNaC-list] A bad idea: GiNaC as the symbolic manipulation engine in Sage

Alexei Sheplyakov