GiNaC as the symbolic manipulation engine in Sage
Hello, This e-mail is going to two lists (sage-devel and ginac-list), so there will be some redundant information. I hope you still take the time to read it to the end, where you can see some cool benchmarks. :) William often states that Sage is now capable of fully supporting his research. Unfortunately, for my research in symbolic summation & integration Sage lacks some of the very basic facilities. If Sage is to become a "viable alternative to Maple and Mathematica", it needs to be able to support research in symbolic computation, and development of new algorithms. At a bare minimum this means providing support for fast symbolic manipulation and arithmetic as well as some form of pattern matching. Most people reading this e-mail are probably aware of the shortcomings of the symbolics code in Sage at the moment, but here are some timings to make the situation clear. MMA: In[1]:= a = (x+y)^2; In[2]:= Timing[Sum[a^i,{i,1,10000}];] Out[2]= {0.028001, Null} Maple:
a:=(x+y)^2: st:=time(): t:=add(a^i,i=1..10000): time()-st; 0.024
Sage via the maxima interface didn't finish this task in < 5 minutes. Timings for some of the alternatives that were considered as a backend for symbolics: Sympy (version 0.6.0 in Sage-3.0.6) didn't complete the above operation in < 5 minutes. sympycore (SVN revision 1047): sage: from sympycore import Symbol sage: x, y = Symbol("x"), Symbol("y") sage: a = (x+y)^2 sage: %time t = sum( [a^i for i in xrange(10000)] ) CPU times: user 3.45 s, sys: 0.78 s, total: 4.23 s Wall time: 4.23 s Note that even with its highly optimized structures, sympycore is far from reaching Maple/MMA performance from pure Python. At the ACA'08 conference, I've met some physicists using GiNaC (http://www.ginac.de/), a C++ library that provides symbolic manipulation and pattern matching facilities. Its lastest release was on Apr 04 2008, it also has extensive documentation, and active mailing lists. I wrote a basic Cython wrapper to benchmark its performance. Here are the numbers: sage: var("x y z",ns=1) (x, y, z) sage: a = (x+y)^2 sage: %time t = sum( [a^i for i in xrange(10000)] ) CPU times: user 1.83 s, sys: 0.00 s, total: 1.83 s Wall time: 1.83 s And some more: sage: b = (y+z)^2 sage: %time t = sum( [a^i*b^j for i in xrange(1,200,5) for j in xrange (1, 300, 3)] ) CPU times: user 0.35 s, sys: 0.00 s, total: 0.35 s Wall time: 0.36 s Maple:
a := (x+y)^2: b := (y+z)^2: st := time(): S := 0: for i from 1 to \ 200 by 5 do for j from 1 to 300 by 3 d\ o S := S+a^i*b^j end do end do: time() - st; 3.464
MMA: In[1]:= a=(x+y)^2; In[2]:= b=(y+z)^2; In[3]:= S = 0; In[4]:= Timing[ Do[ Do[ S=S+a^i*b^j, {j, 1, 300, 3}], {i, 1, 200, 5}]; ] Out[4]= {11.0527, Null} Thinking that these timings might be off because I don't know the proper way to do this in Maple or MMA, here is a different python timing: sage: s = a.parent(0) sage: st = cputime() sage: for i in xrange(1,200,5): ....: for j in xrange(1,300,3): ....: s += a^i*b^j ....: sage: et = cputime() sage: et-st 0.42002600000000001 Like many other subsystems in Sage, the symbolic arithmetic / manipulation backend should consist of a c/c++ library with a wrapper which allows the user to implement more advanced algorithms such as substitutions/tree traversals in Sage/Cython. One can of course implement the arithmetic in pure C from Cython, but this goes very much against the Sage philosophy of not reinventing the wheel. I suggest GiNaC should be the library we use in Sage for this backend. Its performance is very impressive and it can get us to a competitive position in the short term. Once we have a stable symbolics interface, we can work on more advanced algorithms such as limits / integration and improving the backend at the same time. For those who want to try my wrapper, you need to install these two spkgs (in order): http://sage.math.washington.edu/home/burcin/spkg/cln-1.2.2.spkg http://sage.math.washington.edu/home/burcin/spkg/ginac-1.4.3.spkg And apply this patch: http://www.risc.jku.at/people/berocal/sage/ginac-wrapper.patch While GiNaC seems to be the best option for Sage, it has some problems (apart from the awkward capitalization of the name which is a pain to type). - It takes ages to build The packages above took ~25 minutes to build on my desktop machine (15 for cln, 9 for ginac) - cln seems to support on only gcc, which might be a problem for the windows port (which will be using ms compilers) - cln duplicates functionality which is provided by different libraries already distributed with Sage (mpfr, mpir, flint) at a considerable speed penalty. - Creating symbolic functions at runtime is not supported. I can probably work around this in the wrapper. I've done something similar in my SCrypt package which provides symbolic computation capabilities over characteristic 2 for experimenting in crypto. It relies on PolyBoRi for arithmetic, which is just a polynomial library, i.e. knows nothing about symbolics. It would still be better to have support for runtime creation of symbolic functions in GiNaC. - IIRC, GiNaC documentation suggests the printing order for the variables is stable between different sessions, regardless of creation order. However, running the doctests in the experimental wrapper reveals different results. We may need to write a custom pretty printer. - The gethash() function of GiNaC objects doesn't return stable results, which is probably the cause of the problem above. We need a stablehash() function. I think all these problems can be resolved with some effort, and even the speed of basic arithmetic in GiNaC can be improved with modifying it's data structures. Admittedly some of these tasks are much easier than others, but supporting an already stable and mature library like GiNaC is a much better approach than trying to write our own. Thoughts? Comments? Cheers, Burcin
Hi from GiNaC side, just some short comments: Burcin Erocal schrieb:
- It takes ages to build The packages above took ~25 minutes to build on my desktop machine (15 for cln, 9 for ginac)
Did you configure with the --disable-static option? Otherwise your build time is just twice the usual time (looks like it).
- cln seems to support on only gcc, which might be a problem for the windows port (which will be using ms compilers)
<blame others> That is a real problem, but it can only be solved by the cln people. </blame others>
- cln duplicates functionality which is provided by different libraries already distributed with Sage (mpfr, mpir, flint) at a considerable speed penalty.
It would be quite useful for ginac if the numeric class that wraps the cln library would allow for other libraries as well, even pure C double precision ones. The problem in realizing this idea was always that other libraries were missing certain features (some functions, modular arithmetic,...) and to make up for these would incur a lot of extra work. It didn't seem like it was worth the hassle (yet).
- IIRC, GiNaC documentation suggests the printing order for the variables is stable between different sessions, regardless of creation order. However, running the doctests in the experimental wrapper reveals different results. We may need to write a custom pretty printer.
True.
- The gethash() function of GiNaC objects doesn't return stable results, which is probably the cause of the problem above. We need a stablehash() function.
Yes, the two points above are related and I am to a certain point responsible for that mess. To explain: The original tinfo system in ginac was changed a few years ago to avoid fixed magic numbers which are a problem if you have a lot of user supplied ginac classes (made more urgent by the plan to make functions into full ginac classes). The current system assigns these dynamically during compilation. But it is not done in a smart way, so the tinfo numbers usually differ between different compilations. Now, these tinfo numbers also go into the hash calculation and the term ordering ... Personally, I would like to have this issue solved and have ginac to produce a fixed term ordering (as long as the declaration order of the symbols done by the user is the same). But at the moment I don't have the time to do this. Any volunteers? :-) Regards, Jens
On Mon, 11 Aug 2008 11:39:42 +0200 Jens Vollinga <jensv@nikhef.nl> wrote:
Hi from GiNaC side,
just some short comments:
Burcin Erocal schrieb:
- It takes ages to build The packages above took ~25 minutes to build on my desktop machine (15 for cln, 9 for ginac)
Did you configure with the --disable-static option? Otherwise your build time is just twice the usual time (looks like it).
I didn't do anything to try to make these times better. I was more concerned with benchmarking speed of basic manipulations, and thought that these problems can be solved easily later. Your suggestion reduces the compilation time of ginac to 5 minutes. Looks like we're already below William's 8 minute limit. Now can we fix the cln problem? :) Thanks. Burcin
- cln seems to support on only gcc, which might be a problem for the windows port (which will be using ms compilers)
<blame others> That is a real problem, but it can only be solved by the cln people. </blame others>
- cln duplicates functionality which is provided by different libraries already distributed with Sage (mpfr, mpir, flint) at a considerable speed penalty.
It would be quite useful for ginac if the numeric class that wraps the cln library would allow for other libraries as well, even pure C double precision ones. The problem in realizing this idea was always that other libraries were missing certain features (some functions, modular arithmetic,...) and to make up for these would incur a lot of extra work. It didn't seem like it was worth the hassle (yet).
- IIRC, GiNaC documentation suggests the printing order for the variables is stable between different sessions, regardless of creation order. However, running the doctests in the experimental wrapper reveals different results. We may need to write a custom pretty printer.
True.
- The gethash() function of GiNaC objects doesn't return stable results, which is probably the cause of the problem above. We need a stablehash() function.
Yes, the two points above are related and I am to a certain point responsible for that mess. To explain: The original tinfo system in ginac was changed a few years ago to avoid fixed magic numbers which are a problem if you have a lot of user supplied ginac classes (made more urgent by the plan to make functions into full ginac classes). The current system assigns these dynamically during compilation. But it is not done in a smart way, so the tinfo numbers usually differ between different compilations. Now, these tinfo numbers also go into the hash calculation and the term ordering ...
Personally, I would like to have this issue solved and have ginac to produce a fixed term ordering (as long as the declaration order of the symbols done by the user is the same). But at the moment I don't have the time to do this. Any volunteers? :-)
Regards, Jens _______________________________________________ GiNaC-list mailing list GiNaC-list@ginac.de https://www.cebix.net/mailman/listinfo/ginac-list
On Mon, 11 Aug 2008 12:45:36 +0200 Burcin Erocal <burcin@erocal.org> wrote:
On Mon, 11 Aug 2008 11:39:42 +0200 Jens Vollinga <jensv@nikhef.nl> wrote:
Hi from GiNaC side,
just some short comments:
Burcin Erocal schrieb:
- It takes ages to build The packages above took ~25 minutes to build on my desktop machine (15 for cln, 9 for ginac)
Did you configure with the --disable-static option? Otherwise your build time is just twice the usual time (looks like it).
I didn't do anything to try to make these times better. I was more concerned with benchmarking speed of basic manipulations, and thought that these problems can be solved easily later.
Your suggestion reduces the compilation time of ginac to 5 minutes. Looks like we're already below William's 8 minute limit. Now can we fix the cln problem? :)
Using the --disable-static option while configuring cln brings down its build&install time to 9 minutes. One thing that remains in favor of cln is that it uses memory pools. This would be hard to do with a simple rewrite. I don't know how much this effects performance though. Burcin
Hi, On Mon, Aug 11, 2008 at 11:39:42AM +0200, Jens Vollinga wrote:
It would be quite useful for ginac if the numeric class that wraps the cln library would allow for other libraries as well,
What is the point of that? As I've already mentioned, those libraries provide very unnatural syntax, and leave the memory management as an "exercise for the reader".
even pure C double precision ones.
I think wrapping doubles (and integers) is not a good idea. A better option is to make them immediate using the same tricks as CLN [1]. This idea is particulary attractive for x86_64 CPUs. Here we have extra 16 bits for type tags. We can use these to encode the whole GiNaC class hierarcy, so our custom RTTI would be *really* fast. [1] http://www.ginac.de/~kreckel/talks/ICMS2006.pdf
- The gethash() function of GiNaC objects doesn't return stable results, which is probably the cause of the problem above. We need a stablehash() function.
Yes, the two points above are related and I am to a certain point responsible for that mess.
Not really. Ordering of symbols always was unstable, so was the ordering of expressions. New tinfo mechanism just made that instability more explicit.
Now, these tinfo numbers also go into the hash calculation and the term ordering ...
I think a custom printing function is a better approach. Any attempt to "stabilize" the hash function (i.e. comparing symbol names instead of their serials, comparing class names instead of tinfos, etc.) would substantially slow it down, which means virtually every operation would be much slower. Best regards, Alexei -- All science is either physics or stamp collecting.
Hello! On Mon, Aug 11, 2008 at 10:30:59AM +0200, Burcin Erocal wrote:
While GiNaC seems to be the best option for Sage,
GiNaC is a special purpose symbolic manipulation library. The purpose of GiNaC is to do "simple" manipulations with "large" expressions using a common programming language (as opposed to ill designed, unstable ones as offered by most of CASes). So, I don't think GiNaC is a good choice for Sage (which attempts to be a general purpose _interactive_ CAS).
I think all these problems can be resolved with some effort
Don't take me wrong, but I think it's better to put this effort elsewhere. I.e. most of "issues" you've mentioned are design decisions.
and even the speed of basic arithmetic in GiNaC can be improved with modifying it's data structures.
No matter what data representation you choose, there are always some expensive operations. Sure, one can optimize data structures so that some particular operation is fast (at the expence of slowing down other operations). So, to write a good code one need to be aware of the complexity of operations and use expensive operations consciously. To be more specific, here is an example. Let's construct the expression A_N = \sum_{i = 0}{N} (x+y)^i (with N = 10000). A naive way: $ cat testadd_dumb.cpp #include <ginac/ginac.h> #include <cstdlib> // for atoi using namespace GiNaC; using namespace std; ex do_dumb(const ex& base, const unsigned max_degree) { ex e = 0; for (unsigned i = 0; i <= max_degree; i++) e += pow(base, i); return e; } int main(int argc, char** argv) { unsigned max_degree = 10000; if (argc > 1) max_degree = atoi(argv[1]); const symbol x("x"), y("y"); ex base = x+y; ex e = do_dumb(base, max_degree); return 0; } $ g++ -O2 -march=k8 -finline-limit=2000 testadd_dump.cpp -lginac $ time ./a.out real 0m4.916s user 0m4.644s sys 0m0.272s It's very slow, because every += operation calls add:eval() method, which puts the terms of the sum into a canonical order. A better variant is $ cat testadd.cpp #include <ginac/ginac.h> #include <cstdlib> // for atoi() using namespace GiNaC; using namespace std; // same as in above Maxima code: first construct all terms, // than make a sum of them. ex do_smart(const ex& base, const unsigned max_degree) { exvector v; v.reserve(max_degree + 1); for (unsigned i = 0; i <= max_degree; i++) v.push_back(pow(base, i)); return (new add(v))->setflag(status_flags::dynallocated); // setflag incantation tells GiNaC to manage memory for us. } int main(int argc, char** argv) { unsigned max_degree = 10000; if (argc > 1) max_degree = atoi(argv[1]); const symbol x("x"), y("y"); ex base = x+y; ex e = do_smart(base, 10000); return 0; } Here the sum is constructed in a one shot, so terms are sorted only once instead of 10000 times, so the program works *much* faster: $ g++ -O2 -march=k8 -finline-limit=2000 testadd.cpp -lginac $ time ./a.out real 0m0.078s user 0m0.064s sys 0m0.016s The same applies to Maxima. Constructing the sum in a proper way takes a reasonable time: $ cat testadd.mac args : []; for i from 0 thru 10000 do ( args : cons((x+y)^i, args) )$ expr : apply("+", args)$ $ time ( maxima --batch=testadd.mac >/dev/null ) real 0m0.519s user 0m0.428s sys 0m0.092s N.B. The timing is not quite fair, since it includes the startup time.
Sage via the maxima interface didn't finish this task in < 5 minutes.
I guess that's because you (or Sage) do it in an extremly inefficient manner. In other words, if you use Maxima, write the code the Maxima way, not the Mathematica (Maple, whatever) one, and everybody will be happier. BTW, the same applies to almost every piece of a software.
- cln duplicates functionality which is provided by different libraries already distributed with Sage (mpfr, mpir, flint)
The libraries duplicating CLN functionality [1] typically have several serious drawbacks: a. The syntax is really awkward (to put it very mildly). For example, instead of a simple z = x+a*y; one need to write several lines of unreadable code. b. The user is responsible for memory management. This is absolutely inacceptable. None of these libraries gives any considerable performance gain, so why bother? [1] As a matter of fact, CLN existed *long* before the libraries you've mentioned were developed.
at a considerable speed penalty.
Could you please give any benchmark backing up this claim?
- Creating symbolic functions at runtime is not supported.
That's a design decision.
- IIRC, GiNaC documentation suggests the printing order for the variables is stable between different sessions, regardless of creation order.
Could you please point out which document (and where) says that? The order of expressions (in particular, symbols) is not stable. This is a price for a fast comparison of expressions. Anyway, since GiNaC is non-interactive, the order of terms doesn't actually matter: typically the output of a program is a small expression (quite often it is a single number), and the big intermediate expressions are not going to be examined by human (except for debugging).
- It takes ages to build The packages above took ~25 minutes to build on my desktop machine
Just for the record: building GiNaC and CLN takes ~30 minutes on my old Duron 800 box with 640 Mb of RAM. I wonder what kind of machine do you use as a desktop. That said, patches reducing the build time are wellcome (if they don't reduce the performance and don't make the code more complicated).
- cln seems to support on only gcc,
1. CLN maintainers accept patches to support other platforms. 2. Cleaning up CLN from non-portable code is in my TODO list (although the priority is *very* low). 3. gcc is available for (almost) every architecture, so who cares, anyway? Best regards, Alexei -- All science is either physics or stamp collecting.
Hi, Alexei Sheplyakov schrieb:
GiNaC is a special purpose symbolic manipulation library. The purpose of GiNaC is to do "simple" manipulations with "large" expressions using a common programming language (as opposed to ill designed, unstable ones as offered
well, I don't quite agree but maybe I am misinterpreting the quotation marks ;-). If you want to handle large expression, then use FORM. And what a simple manipulation is depends on whom you ask. Maybe for a mathematician symbolic integration for example is a very simple manipulation ... GiNaC's raison d'etre is to help doing physics calculations. And it might be that even some not so simple symbolic manipulation are needed for that end.
by most of CASes). So, I don't think GiNaC is a good choice for Sage (which attempts to be a general purpose _interactive_ CAS).
They would use ginac via a python wrapper, so I don't see any problems here.
The order of expressions (in particular, symbols) is not stable. This is a price for a fast comparison of expressions.
Additional nitpicking: that is not quite true. There is a difference between a stable order and a human-friendly order. I mentioned the reasons for non-stability in my last email. It has nothing to do with speed. Speed of comparison has only an effect on the human-friendliness. In a GUI system with mostly short expressions an additional reordering for the output is feasible and could be done outside ginac by the python code. Regards, Jens
On Mon, Aug 11, 2008 at 04:50:54PM +0200, Jens Vollinga wrote:
well, I don't quite agree but maybe I am misinterpreting the quotation marks ;-). If you want to handle large expression, then use FORM.
No way. Its programming language is extremly brain dead. It has a lot of stupid "features", i.e. it unconditionally expands every expression. It lacks basic polynomial operations (gcd, collect, to name few). FORM is impossible to extend. Last, but not least, FORM is a proprietary software.
what a simple manipulation is depends on whom you ask.
Sure, hence the quotation marks. An example of a "simple" manipulation is reading an input expression, or adding several expressions and collecting similar terms. Apparently not every CAS is able to to this, see e.g. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=407109
The order of expressions (in particular, symbols) is not stable. This is a price for a fast comparison of expressions.
Additional nitpicking: that is not quite true.
I don't agree. The purpose of custom GiNaC RTTI ("tinfo keys") is exactly fast ordering of expressions. A long(er) explanation follows. The purpose of hash function is to quickly guess if two expressions are NOT the same in order to minimize the number of (expensive) element-to-element comparison operations. Notice that two expressions of different type are always different. Hence the need for an efficient ordering operation for _types_. IOW, we need a fast RTTI. Standard C++ library provides RTTI, but in ancient versions of GNU C++ library ordering operation (i.e. typeinfo::before() method) was very slow, because it was implemented as a comparison of (mangled) type names. Hence GiNaC invented its own (fast) RTTI with tinfo keys. As a side note, RTTI implementation in the up to date versions of GNU libstdc++ is much better. It uses the technique similar to GiNaC's tinfo keys, although it's a bit more complicated (since it needs to handle multiple inheritence and other obscure stuff). As a matter of fact, using it instead of our custom RTTI doesn't yeld any measurable overhead. So we can stick with standard C++ RTTI. This won't solve term ordering "issue", but writing new GiNaC classes would be easier.
I mentioned the reasons for non-stability in my last email.
The only missing part is why GiNaC needs those "tinfo keys" in first place... Best regards, Alexei -- All science is either physics or stamp collecting.
On Mon, 11 Aug 2008 18:27:54 +0400 Alexei Sheplyakov <varg@theor.jinr.ru> wrote:
Hello!
On Mon, Aug 11, 2008 at 10:30:59AM +0200, Burcin Erocal wrote:
While GiNaC seems to be the best option for Sage,
GiNaC is a special purpose symbolic manipulation library. The purpose of GiNaC is to do "simple" manipulations with "large" expressions using a common programming language (as opposed to ill designed, unstable ones as offered by most of CASes). So, I don't think GiNaC is a good choice for Sage (which attempts to be a general purpose _interactive_ CAS).
I suggested using it in Sage for the symbolic manipulation engine only. I n your first two sentences, you seem to point out that GiNaC is good for this purpose. Sage relies on many other libraries to be "general purpose." It uses libSingular for multivariate polynomials, flint for univariate polynomials over integers, and so on. Are you denying that GiNaC is the best "symbol manipulation engine" I can find? Your statements also seem to suggest that there is something wrong with interactive use. GiNaC also provides support for this via ginsh, pyginac, etc.
I think all these problems can be resolved with some effort
Don't take me wrong, but I think it's better to put this effort elsewhere. I.e. most of "issues" you've mentioned are design decisions.
I am disappointed.
and even the speed of basic arithmetic in GiNaC can be improved with modifying it's data structures.
No matter what data representation you choose, there are always some expensive operations. Sure, one can optimize data structures so that some particular operation is fast (at the expence of slowing down other operations). So, to write a good code one need to be aware of the complexity of operations and use expensive operations consciously.
OK, so I chose a bad example for a benchmark, and this was pointed out before. When I said modifying data structures, I meant that you can keep the arguments of an operator in a different structure, instead of a list. This would probably only make sense on very large expressions, and at this point, I don't know what the penalty for smaller expressions would be. As for different ways of storing the arguments of an operator, we can look at methods for storing and calculating with polynomials. One method is to use a hash table to keep the monomials ,Magma, a commercial system that has the fastest polynomial arithmetic, uses this method. Another is to use GeoBuckets which is explained here: http://portal.acm.org/citation.cfm?id=275394 Singular, which happens to be the fastest free software library for multivariate polynomial arithmetic uses this. You can also use binary heaps as suggested here: http://portal.acm.org/citation.cfm?id=1086847 Michael Monagan and Roman Pierce use this method in the new blazingly fast multivariate polynomial libray they wrote for maple: http://www.cecm.sfu.ca/~mmonagan/papers/sdmp19.pdf <snip>
- cln duplicates functionality which is provided by different libraries already distributed with Sage (mpfr, mpir, flint)
The libraries duplicating CLN functionality [1] typically have several serious drawbacks: a. The syntax is really awkward (to put it very mildly). For example, instead of a simple z = x+a*y; one need to write several lines of unreadable code. b. The user is responsible for memory management. This is absolutely inacceptable.
None of these libraries gives any considerable performance gain, so why bother?
[1] As a matter of fact, CLN existed *long* before the libraries you've mentioned were developed.
at a considerable speed penalty.
Could you please give any benchmark backing up this claim?
- Creating symbolic functions at runtime is not supported.
That's a design decision.
- IIRC, GiNaC documentation suggests the printing order for the variables is stable between different sessions, regardless of creation order.
Could you please point out which document (and where) says that?
I might be remembering this wrong. I just put it down as an item from my notes while writing the wrapper. I don't think writing a pretty printer which sorts the terms would be hard.
The order of expressions (in particular, symbols) is not stable. This is a price for a fast comparison of expressions.
Anyway, since GiNaC is non-interactive, the order of terms doesn't actually matter: typically the output of a program is a small expression (quite often it is a single number), and the big intermediate expressions are not going to be examined by human (except for debugging).
- It takes ages to build The packages above took ~25 minutes to build on my desktop machine
Just for the record: building GiNaC and CLN takes ~30 minutes on my old Duron 800 box with 640 Mb of RAM. I wonder what kind of machine do you use as a desktop. That said, patches reducing the build time are wellcome (if they don't reduce the performance and don't make the code more complicated). <snip>
I got the above timing on this machine: $ cat /proc/cpuinfo | grep name model name : Intel(R) Pentium(R) D CPU 3.40GHz Cheers, Burcin
On Mon, Aug 11, 2008 at 05:09:30PM +0200, Burcin Erocal wrote:
Are you denying that GiNaC is the best "symbol manipulation engine" I can find?
That depends on a definition of a "symbol manipulation engine", i.e. what kind of features do you expect from it (i.e. GiNaC does not implement polynomial factorization, which is necessary for symbolic integration). Also to use GiNaC (or Maxima) in an efficient manner you need to forget your Mathematica (Maple, whatever) experience and learn from scratch.
Your statements also seem to suggest that there is something wrong with interactive use.
GiNaC is not designed for this kind of use. Interactive manipulation is kind of pointless if the expressions in question consists of 10^6 -- 10^8 of terms, isn't it?
OK, so I chose a bad example for a benchmark, and this was pointed out before.
I think the benchmark is OK. My point is a bit different: the method you've used to construct the sum is very inefficient, at least for GiNaC and Maxima. Apparently it's also inefficient for Mathematica (I guess it's also inefficient for almost any CAS too, but I can't tell for sure): $ cat mma_bench1.m A = 0; For[i = 0, i <= 10000, i = i + 1, A = A + (x+y)^i; ]; $ time math < mma_bench.m >/dev/null real 0m15.535s user 0m15.509s sys 0m0.016s Hence Mathematica provides a more efficient method, it's called Sum: $ cat mma_bench2.m A = Sum[(x+y)^i, {i, 0, 10000}]; $ time math < mma_bench2.m >/dev/null real 0m0.035s user 0m0.028s sys 0m0.004s The point is: although both methods are correct, the non-obvious one performs _3 orders of magnitude_ better. Likewise, GiNaC (and Maxima) also provide efficient methods for doing some operations. You might want to learn them before doing some changes to data structures.
When I said modifying data structures, I meant that you can keep the arguments of an operator in a different structure, instead of a list.
I understand (BTW GiNaC uses arrays for that, not lists).
This would probably only make sense on very large expressions, and at this point, I don't know what the penalty for smaller expressions would be.
I don't agree. I think the large expression should be kept as flat as possible for (at least) two reasons: 1. Otherwise eval() and other inherently recursive functions/methods would run out of stack. 2. Any other structure requires more memory to store the same terms.
As for different ways of storing the arguments of an operator, we can look at methods for storing and calculating with polynomials. One method is to use a hash table to keep the monomials
The same method was used in ancient versions of GiNaC. That was a bad idea. It makes code much more complicated and does not really improve performance (in some situation it's even worse). N.B. You might want to grep GiNaC source for EXPAIRSEQ_USE_HASHTAB. To reiterate, a flat N-ary tree (which is what GiNaC::expairseq essentially is) seems to be the best way to store big expressions.
I got the above timing on this machine:
$ cat /proc/cpuinfo | grep name model name : Intel(R) Pentium(R) D CPU 3.40GHz
Although the build time depends on optimization options and a number of other factors, 25 minutes is way too long for such a box (even if you build both static and shared libraries). Unless you use some insane optimization options, this looks like a compiler and/or linker bug (see e.g. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=376066). What are exactly the compiler and binutils versions (as in g++ --version; ld --version)? Anyway, there are a number of ways to speed up the build. Jens already mentioned --disable-static. Another useful option is to run build in a parallel manner, i.e. invoking make as make -j N where N \approx number of CPU cores + 1, and appending '-pipe' option to CXXFLAGS. Note: this is useful even if your CPU is single core, because CLN consists of lots of small source files. Yet another method is to use a fast(er) shell, such as ash or pdksh, i.e. SHELL=/bin/ash CONFIG_SHELL=/bin/ash /bin/ash configure --disable-static This works because GNU libtool (which is responsible for building GiNaC and CLN, as well as a number of other Free and even not so free libraries) is a shell script. Of course, ash is not suitable as an interactive shell, but it's very good at parsing/running big scripts (such as GNU libtool). @developers: libtool version 2.2 is reportedly much faster than 1.5 (which is what we use), so we might want to upgrade to that version when releasing next version of GiNaC. Best regards, Alexei -- All science is either physics or stamp collecting.
Hi! Alexei Sheplyakov wrote:
Your statements also seem to suggest that there is something wrong with interactive use.
GiNaC is not designed for this kind of use.
So many correct things have been said, but I do feel that I have to address this one: Many aspects in GiNaC are designed such that interactive use from interpreters *is* easily possible. Just consider the entire ex wrapper class, for instance.
Interactive manipulation is kind of pointless if the expressions in question consists of 10^6 -- 10^8 of terms, isn't it?
It sure is. :) Best -richy. -- Richard B. Kreckel <http://www.ginac.de/~kreckel/>
Hi! Burcin Erocal wrote:
- It takes ages to build The packages above took ~25 minutes to build on my desktop machine (15 for cln, 9 for ginac)
What is the reason the Sage project is so focused on compilation time? BTW: The best way to improve that would be by improving GCC.
- cln seems to support on only gcc, which might be a problem for the windows port (which will be using ms compilers)
Oh, a good patch would be more than welcome!
- cln duplicates functionality which is provided by different libraries already distributed with Sage (mpfr, mpir, flint) at a considerable speed penalty.
There are areas where MPFR beats CLN speed-wise, and vice-versa. But there is definitely no "considerable speed penalty". And both libraries sit on top of GMP's MPN layer and benefit equally from any improvement there (like better x86_64 code). Cheers -richy. -- Richard B. Kreckel <http://www.ginac.de/~kreckel/>
Hello! On Mon, Aug 11, 2008 at 10:56:49PM +0200, Richard B. Kreckel wrote:
- cln seems to support on only gcc, which might be a problem for the windows port (which will be using ms compilers)
Oh, a good patch would be more than welcome!
The patchset which replaces CL_REQUIRE/CL_PROVIDE cruft is available here: http://theor.jinr.ru/~varg/no-cl-require.tar.bz2 It's not quite nice (like any patch which fixes ugly code), and it's impossible to build CLN as woe32 DLL non-GNU C++ compilers yet (because proper __dllexport/__dllimport junk^W^W annotations are missing). Best regards, Alexei -- All science is either physics or stamp collecting.
participants (4)
-
Alexei Sheplyakov
-
Burcin Erocal
-
Jens Vollinga
-
Richard B. Kreckel