Hello! On Mon, Aug 11, 2008 at 10:30:59AM +0200, Burcin Erocal wrote:
While GiNaC seems to be the best option for Sage,
GiNaC is a special purpose symbolic manipulation library. The purpose of GiNaC is to do "simple" manipulations with "large" expressions using a common programming language (as opposed to ill designed, unstable ones as offered by most of CASes). So, I don't think GiNaC is a good choice for Sage (which attempts to be a general purpose _interactive_ CAS).
I think all these problems can be resolved with some effort
Don't take me wrong, but I think it's better to put this effort elsewhere. I.e. most of "issues" you've mentioned are design decisions.
and even the speed of basic arithmetic in GiNaC can be improved with modifying it's data structures.
No matter what data representation you choose, there are always some expensive operations. Sure, one can optimize data structures so that some particular operation is fast (at the expence of slowing down other operations). So, to write a good code one need to be aware of the complexity of operations and use expensive operations consciously. To be more specific, here is an example. Let's construct the expression A_N = \sum_{i = 0}{N} (x+y)^i (with N = 10000). A naive way: $ cat testadd_dumb.cpp #include <ginac/ginac.h> #include <cstdlib> // for atoi using namespace GiNaC; using namespace std; ex do_dumb(const ex& base, const unsigned max_degree) { ex e = 0; for (unsigned i = 0; i <= max_degree; i++) e += pow(base, i); return e; } int main(int argc, char** argv) { unsigned max_degree = 10000; if (argc > 1) max_degree = atoi(argv[1]); const symbol x("x"), y("y"); ex base = x+y; ex e = do_dumb(base, max_degree); return 0; } $ g++ -O2 -march=k8 -finline-limit=2000 testadd_dump.cpp -lginac $ time ./a.out real 0m4.916s user 0m4.644s sys 0m0.272s It's very slow, because every += operation calls add:eval() method, which puts the terms of the sum into a canonical order. A better variant is $ cat testadd.cpp #include <ginac/ginac.h> #include <cstdlib> // for atoi() using namespace GiNaC; using namespace std; // same as in above Maxima code: first construct all terms, // than make a sum of them. ex do_smart(const ex& base, const unsigned max_degree) { exvector v; v.reserve(max_degree + 1); for (unsigned i = 0; i <= max_degree; i++) v.push_back(pow(base, i)); return (new add(v))->setflag(status_flags::dynallocated); // setflag incantation tells GiNaC to manage memory for us. } int main(int argc, char** argv) { unsigned max_degree = 10000; if (argc > 1) max_degree = atoi(argv[1]); const symbol x("x"), y("y"); ex base = x+y; ex e = do_smart(base, 10000); return 0; } Here the sum is constructed in a one shot, so terms are sorted only once instead of 10000 times, so the program works *much* faster: $ g++ -O2 -march=k8 -finline-limit=2000 testadd.cpp -lginac $ time ./a.out real 0m0.078s user 0m0.064s sys 0m0.016s The same applies to Maxima. Constructing the sum in a proper way takes a reasonable time: $ cat testadd.mac args : []; for i from 0 thru 10000 do ( args : cons((x+y)^i, args) )$ expr : apply("+", args)$ $ time ( maxima --batch=testadd.mac >/dev/null ) real 0m0.519s user 0m0.428s sys 0m0.092s N.B. The timing is not quite fair, since it includes the startup time.
Sage via the maxima interface didn't finish this task in < 5 minutes.
I guess that's because you (or Sage) do it in an extremly inefficient manner. In other words, if you use Maxima, write the code the Maxima way, not the Mathematica (Maple, whatever) one, and everybody will be happier. BTW, the same applies to almost every piece of a software.
- cln duplicates functionality which is provided by different libraries already distributed with Sage (mpfr, mpir, flint)
The libraries duplicating CLN functionality [1] typically have several serious drawbacks: a. The syntax is really awkward (to put it very mildly). For example, instead of a simple z = x+a*y; one need to write several lines of unreadable code. b. The user is responsible for memory management. This is absolutely inacceptable. None of these libraries gives any considerable performance gain, so why bother? [1] As a matter of fact, CLN existed *long* before the libraries you've mentioned were developed.
at a considerable speed penalty.
Could you please give any benchmark backing up this claim?
- Creating symbolic functions at runtime is not supported.
That's a design decision.
- IIRC, GiNaC documentation suggests the printing order for the variables is stable between different sessions, regardless of creation order.
Could you please point out which document (and where) says that? The order of expressions (in particular, symbols) is not stable. This is a price for a fast comparison of expressions. Anyway, since GiNaC is non-interactive, the order of terms doesn't actually matter: typically the output of a program is a small expression (quite often it is a single number), and the big intermediate expressions are not going to be examined by human (except for debugging).
- It takes ages to build The packages above took ~25 minutes to build on my desktop machine
Just for the record: building GiNaC and CLN takes ~30 minutes on my old Duron 800 box with 640 Mb of RAM. I wonder what kind of machine do you use as a desktop. That said, patches reducing the build time are wellcome (if they don't reduce the performance and don't make the code more complicated).
- cln seems to support on only gcc,
1. CLN maintainers accept patches to support other platforms. 2. Cleaning up CLN from non-portable code is in my TODO list (although the priority is *very* low). 3. gcc is available for (almost) every architecture, so who cares, anyway? Best regards, Alexei -- All science is either physics or stamp collecting.