Hello, This e-mail is going to two lists (sage-devel and ginac-list), so there will be some redundant information. I hope you still take the time to read it to the end, where you can see some cool benchmarks. :) William often states that Sage is now capable of fully supporting his research. Unfortunately, for my research in symbolic summation & integration Sage lacks some of the very basic facilities. If Sage is to become a "viable alternative to Maple and Mathematica", it needs to be able to support research in symbolic computation, and development of new algorithms. At a bare minimum this means providing support for fast symbolic manipulation and arithmetic as well as some form of pattern matching. Most people reading this e-mail are probably aware of the shortcomings of the symbolics code in Sage at the moment, but here are some timings to make the situation clear. MMA: In[1]:= a = (x+y)^2; In[2]:= Timing[Sum[a^i,{i,1,10000}];] Out[2]= {0.028001, Null} Maple:
a:=(x+y)^2: st:=time(): t:=add(a^i,i=1..10000): time()-st; 0.024
Sage via the maxima interface didn't finish this task in < 5 minutes. Timings for some of the alternatives that were considered as a backend for symbolics: Sympy (version 0.6.0 in Sage-3.0.6) didn't complete the above operation in < 5 minutes. sympycore (SVN revision 1047): sage: from sympycore import Symbol sage: x, y = Symbol("x"), Symbol("y") sage: a = (x+y)^2 sage: %time t = sum( [a^i for i in xrange(10000)] ) CPU times: user 3.45 s, sys: 0.78 s, total: 4.23 s Wall time: 4.23 s Note that even with its highly optimized structures, sympycore is far from reaching Maple/MMA performance from pure Python. At the ACA'08 conference, I've met some physicists using GiNaC (http://www.ginac.de/), a C++ library that provides symbolic manipulation and pattern matching facilities. Its lastest release was on Apr 04 2008, it also has extensive documentation, and active mailing lists. I wrote a basic Cython wrapper to benchmark its performance. Here are the numbers: sage: var("x y z",ns=1) (x, y, z) sage: a = (x+y)^2 sage: %time t = sum( [a^i for i in xrange(10000)] ) CPU times: user 1.83 s, sys: 0.00 s, total: 1.83 s Wall time: 1.83 s And some more: sage: b = (y+z)^2 sage: %time t = sum( [a^i*b^j for i in xrange(1,200,5) for j in xrange (1, 300, 3)] ) CPU times: user 0.35 s, sys: 0.00 s, total: 0.35 s Wall time: 0.36 s Maple:
a := (x+y)^2: b := (y+z)^2: st := time(): S := 0: for i from 1 to \ 200 by 5 do for j from 1 to 300 by 3 d\ o S := S+a^i*b^j end do end do: time() - st; 3.464
MMA: In[1]:= a=(x+y)^2; In[2]:= b=(y+z)^2; In[3]:= S = 0; In[4]:= Timing[ Do[ Do[ S=S+a^i*b^j, {j, 1, 300, 3}], {i, 1, 200, 5}]; ] Out[4]= {11.0527, Null} Thinking that these timings might be off because I don't know the proper way to do this in Maple or MMA, here is a different python timing: sage: s = a.parent(0) sage: st = cputime() sage: for i in xrange(1,200,5): ....: for j in xrange(1,300,3): ....: s += a^i*b^j ....: sage: et = cputime() sage: et-st 0.42002600000000001 Like many other subsystems in Sage, the symbolic arithmetic / manipulation backend should consist of a c/c++ library with a wrapper which allows the user to implement more advanced algorithms such as substitutions/tree traversals in Sage/Cython. One can of course implement the arithmetic in pure C from Cython, but this goes very much against the Sage philosophy of not reinventing the wheel. I suggest GiNaC should be the library we use in Sage for this backend. Its performance is very impressive and it can get us to a competitive position in the short term. Once we have a stable symbolics interface, we can work on more advanced algorithms such as limits / integration and improving the backend at the same time. For those who want to try my wrapper, you need to install these two spkgs (in order): http://sage.math.washington.edu/home/burcin/spkg/cln-1.2.2.spkg http://sage.math.washington.edu/home/burcin/spkg/ginac-1.4.3.spkg And apply this patch: http://www.risc.jku.at/people/berocal/sage/ginac-wrapper.patch While GiNaC seems to be the best option for Sage, it has some problems (apart from the awkward capitalization of the name which is a pain to type). - It takes ages to build The packages above took ~25 minutes to build on my desktop machine (15 for cln, 9 for ginac) - cln seems to support on only gcc, which might be a problem for the windows port (which will be using ms compilers) - cln duplicates functionality which is provided by different libraries already distributed with Sage (mpfr, mpir, flint) at a considerable speed penalty. - Creating symbolic functions at runtime is not supported. I can probably work around this in the wrapper. I've done something similar in my SCrypt package which provides symbolic computation capabilities over characteristic 2 for experimenting in crypto. It relies on PolyBoRi for arithmetic, which is just a polynomial library, i.e. knows nothing about symbolics. It would still be better to have support for runtime creation of symbolic functions in GiNaC. - IIRC, GiNaC documentation suggests the printing order for the variables is stable between different sessions, regardless of creation order. However, running the doctests in the experimental wrapper reveals different results. We may need to write a custom pretty printer. - The gethash() function of GiNaC objects doesn't return stable results, which is probably the cause of the problem above. We need a stablehash() function. I think all these problems can be resolved with some effort, and even the speed of basic arithmetic in GiNaC can be improved with modifying it's data structures. Admittedly some of these tasks are much easier than others, but supporting an already stable and mature library like GiNaC is a much better approach than trying to write our own. Thoughts? Comments? Cheers, Burcin