Ordering of expressions

Luigi Capozza

8 Aug 2009 8 Aug '09

7:28 p.m.

Dear people, I have the following issue with the canonicalization of expressions. If I run many times the same program without changing anything, I get the output for the same expression ordered in different ways. As far as I understand, that seems to depend on the hash value that is assigned to each symbol, which happens to be different every time I run the program. Here is an example (example.cxx): ------------------------------------------ #include <iostream> #include "ginac/ginac.h" using namespace std; using namespace GiNaC; int main() { symbol x("x"),y("y"),z("z"); ex Ex1 = 2*(x + y); ex Ex2 = z*(x + y); cout<<x.gethash()<<endl <<y.gethash()<<endl <<z.gethash()<<endl; cout<<Ex1<<endl <<Ex2<<endl; return 0; } ------------------------------------------ And here is the output I get (I am running on Ubuntu 8.04.1 with the GCC 4.2.4 compiler): -------------------------------------------------- =>g++ -lginac example.cxx -o example =>example 2178355982 851138097 3818887508 2*y+2*x (y+x)*z =>example 3784401678 2457183793 1129965908 2*y+2*x (y+x)*z =>example 2632340238 1305122353 4272871764 2*y+2*x (y+x)*z =>example 3711329038 2384111153 1056893268 2*y+2*x z*(y+x) => ... -------------------------------------------------- With more complicated expressions also the run time can vary from execution to execution up to a factor of 2 or more. Is this normal or am I doing something wrong? I saw that the hash value is calculated in a fairly sophisticated manner but is it eventually possible to set it by hand? Thanks in advance. Regards, Luigi

Show replies by date

Alexei Sheplyakov

9 Aug 9 Aug

12:22 p.m.

Hello, On Sat, Aug 08, 2009 at 07:28:49PM +0200, Luigi Capozza wrote:

...

I have the following issue with the canonicalization of expressions. If I run many times the same program without changing anything, I get the output for the same expression ordered in different ways.

This behavior is documented in the manual (section 5.7.2, titled as 'Expanding and collecting'): "Again, since the canonical form in GiNaC is not easy to guess you should be prepared to see different orderings of terms in such sums!".

...

With more complicated expressions also the run time can vary from execution to execution up to a factor of 2 or more.

What exactly is a `run time'? CPU time? Wall clock time? Something else?

...

Is this normal or am I doing something wrong? I saw that the hash value is calculated in a fairly sophisticated manner but is it eventually possible to set it by hand?

No, this is an implementation detail, and users are not supposed to fiddle with it (the only exception is implementing calchash() method in user defined classes). @developers: I guess we should add this question to the FAQ, shouldn't we? Best regards, Alexei

Luigi Capozza

5:19 p.m.

Hi Alexei, On Sun, 9 Aug 2009, Alexei Sheplyakov wrote:

...

...
I have the following issue with the canonicalization of expressions. If I run many times the same program without changing anything, I get the output for the same expression ordered in different ways.

This behavior is documented in the manual (section 5.7.2, titled as 'Expanding and collecting'): "Again, since the canonical form in GiNaC is not easy to guess you should be prepared to see different orderings of terms in such sums!".

Thanks, I had not interpreted that sentence in the right way.

...

...
With more complicated expressions also the run time can vary from execution to execution up to a factor of 2 or more.

What exactly is a `run time'? CPU time? Wall clock time? Something else?

I mean CPU time, sorry.

...

...
Is this normal or am I doing something wrong? I saw that the hash value is calculated in a fairly sophisticated manner but is it eventually possible to set it by hand?

No, this is an implementation detail, and users are not supposed to fiddle with it (the only exception is implementing calchash() method in user defined classes).

I was considering doing that. For the given simple example I found the easy solution of implementing a class inheriting from symbol and setting the hash value already in the constructor. Alas, it does not seem to scale to more elaborated expressions. As you suggest: I'll just stop fiddling about with the hash. Thank you. Best regards, Luigi

Burcin Erocal

10 Aug 10 Aug

10:44 a.m.

Hi Luigi, On Sun, 9 Aug 2009 17:19:52 +0200 (CEST) Luigi Capozza <capozza@kph.uni-mainz.de> wrote:

...

On Sun, 9 Aug 2009, Alexei Sheplyakov wrote:

...
...
I have the following issue with the canonicalization of expressions. If I run many times the same program without changing anything, I get the output for the same expression ordered in different ways. <snip> Is this normal or am I doing something wrong? I saw that the hash value is calculated in a fairly sophisticated manner but is it eventually possible to set it by hand?

No, this is an implementation detail, and users are not supposed to fiddle with it (the only exception is implementing calchash() method in user defined classes).

I was considering doing that. For the given simple example I found the easy solution of implementing a class inheriting from symbol and setting the hash value already in the constructor. Alas, it does not seem to scale to more elaborated expressions. As you suggest: I'll just stop fiddling about with the hash.

In order to use ginac as the symbolics backend in Sage [1], we also had to work around this random printing problem. For this purpose, I changed the comparison functions to enforce something similar to a degree lexicographic ordering, where the variables are ordered alphabetically. [1] http://sagemath.org/ You can find the bulk of the changes here: http://pynac.sagemath.org/hg/rev/bc4751a44758 and the bug fixes that came afterwards: http://pynac.sagemath.org/hg/rev/f33d6d8b3bfd http://pynac.sagemath.org/hg/rev/615ec8c2b9bf http://pynac.sagemath.org/hg/rev/ad8d3aff5b75 In retrospect, this was not the right way to fix the printing problem. I should have done as the ginac developers suggested when I first asked about this [2]. Unfortunately, I didn't know much about the design of the ginac library then, and I wanted to get things done quickly. [2] http://www.ginac.de/pipermail/ginac-list/2008-August/001406.html When I find the time, I plan to move the new comparison functions out of the way and bring back the old ones to make things fast again. Then, add some code in expairseq to sort its arguments with the new comparison functions before printing, and cache the sorted sequence somewhere. I don't expect to have the time to do this in the near future however. I hope some of this will be useful if you try to stabilize printing order in ginac. Cheers, Burcin

Richard B. Kreckel

11:06 p.m.

Burcin Erocal wrote:

...

In order to use ginac as the symbolics backend in Sage [1], we also had to work around this random printing problem.

May I ask why stable printing is a requirement for use in Sage?

...

In retrospect, this was not the right way to fix the printing problem. I should have done as the ginac developers suggested when I first asked about this [2]. Unfortunately, I didn't know much about the design of the ginac library then, and I wanted to get things done quickly.

[2] http://www.ginac.de/pipermail/ginac-list/2008-August/001406.html

When I find the time, I plan to move the new comparison functions out of the way and bring back the old ones to make things fast again. Then, add some code in expairseq to sort its arguments with the new comparison functions before printing, and cache the sorted sequence somewhere. I don't expect to have the time to do this in the near future however.

I hope some of this will be useful if you try to stabilize printing order in ginac.

It won't be difficult to sort terms lexicographically, when printing, as opposed to based on hash values. But I doubt that caching the sorted sequence will be of practical use. Cheers -richy. -- Richard B. Kreckel <http://www.ginac.de/~kreckel/>

Burcin Erocal

12 Aug 12 Aug

2:45 p.m.

On Mon, 10 Aug 2009 23:06:30 +0200 "Richard B. Kreckel" <kreckel@ginac.de> wrote:

...

Burcin Erocal wrote:

...
In order to use ginac as the symbolics backend in Sage [1], we also had to work around this random printing problem.

May I ask why stable printing is a requirement for use in Sage?

Sage is also meant to be used interactively, besides just a computation engine where you run a program and expect a number as output. Inspection of intermediate results is so much easier if the output is consistent between different sessions.

...

...
In retrospect, this was not the right way to fix the printing problem. I should have done as the ginac developers suggested when I first asked about this [2]. Unfortunately, I didn't know much about the design of the ginac library then, and I wanted to get things done quickly.

[2] http://www.ginac.de/pipermail/ginac-list/2008-August/001406.html

When I find the time, I plan to move the new comparison functions out of the way and bring back the old ones to make things fast again. Then, add some code in expairseq to sort its arguments with the new comparison functions before printing, and cache the sorted sequence somewhere. I don't expect to have the time to do this in the near future however.

I hope some of this will be useful if you try to stabilize printing order in ginac.

It won't be difficult to sort terms lexicographically, when printing, as opposed to based on hash values. But I doubt that caching the sorted sequence will be of practical use.

I thought that caching might help with printing large expressions repeatedly. It could well be that it is pointless. Now that I have the compression functions, using them only for printing should be easy. I just don't have much time to work on this. BTW, I would welcome any comments and suggestions on the patches we've added to GiNaC for Sage, which you can browse here: http://pynac.sagemath.org/hg/ Especially, automatic simplification of powers and multiples of exp http://pynac.sagemath.org/hg/rev/24e8ecd16228 exp(a)*exp(b) -> exp(a+b) http://pynac.sagemath.org/hg/rev/5c4862f90e17 (e^x)^y -> e^(x*y) and new constants for infinity http://pynac.sagemath.org/hg/rev/7ebab844bcef http://pynac.sagemath.org/hg/rev/0b112e7dd282 might be useful. Note that there is still more work to be done on making special functions handle infinity properly, what I added in the second patch above is a blind copy of Mathematica behavior. I was hoping that the capability to work with infinity would help series expansions, where 1/ (1/0) could be evaluated as 1/infinity -> 0. Here is an example which demonstrates this problem: http://groups.google.com/group/sage-devel/browse_thread/thread/d95c952e05951... Cheers, Burcin

Alexei Sheplyakov

4:28 p.m.

New subject: Silent cancellation of singularities, and others

Hi, On Wed, Aug 12, 2009 at 02:45:07PM +0200, Burcin Erocal wrote:

...

I thought that caching might help with printing large expressions repeatedly. It could well be that it is pointless.

I doubt someone will want to look at large expressions except for debugging. And for debugging one needs a printing function which is fast, simple, and free of side effects (such as sorting the sequence).

...

I was hoping that the capability to work with infinity would help series expansions, where 1/ (1/0) could be evaluated as 1/infinity -> 0.

I *really* dislike when singularities get silently canceled. Also, is it +0 or -0? 0 + i 0 or 0 - i 0? (these details are important more often than not).

...

Especially, automatic simplification of powers and multiples of exp

http://pynac.sagemath.org/hg/rev/24e8ecd16228 exp(a)*exp(b) -> exp(a+b)

The code looks correct. But I dislike random features like this.

...

http://pynac.sagemath.org/hg/rev/5c4862f90e17 (e^x)^y -> e^(x*y)

I think it can be trivially done with pattern matching. Best regards, Alexei

Alexei Sheplyakov

10 Aug 10 Aug

9:29 p.m.

New subject: Variations of CPU time for the *same* calculation

Hi Luigi, On Sun, Aug 09, 2009 at 05:19:52PM +0200, Luigi Capozza wrote:

...

...
...
With more complicated expressions also the run time can vary from execution to execution up to a factor of 2 or more.

What exactly is a `run time'? CPU time? Wall clock time? Something else?

I mean CPU time, sorry.

I haven't experienced such a weird variations, neither with `real-life' code nor with benchmarks (which try to make term ordering even more random *on purpose*). So you definitely found something abnormal unless some of the following conditions are met: - you have other CPU-bound task(s) running - you have real-time processes running (although these guys typically don't consume much CPU time they can preempt other processes very frequently) - you use power saving utility (either user-space or in-kernel). Best regards, Alexei

Richard B. Kreckel

9 Aug 9 Aug

11:39 p.m.

Alexei Sheplyakov wrote:

...

This behavior is documented in the manual (section 5.7.2, titled as 'Expanding and collecting'): "Again, since the canonical form in GiNaC is not easy to guess you should be prepared to see different orderings of terms in such sums!".

[...]

...

No, this is an implementation detail, and users are not supposed to fiddle with it (the only exception is implementing calchash() method in user defined classes).

@developers: I guess we should add this question to the FAQ, shouldn't we?

I suppose you mean the canonicalization question? Or do you mean the one about hash values? -richy. -- Richard B. Kreckel <http://www.ginac.de/~kreckel/>

Alexei Sheplyakov

10 Aug 10 Aug

7:45 a.m.

Hello, On Sun, Aug 09, 2009 at 11:39:28PM +0200, Richard B. Kreckel wrote:

...

...
@developers: I guess we should add this question to the FAQ, shouldn't we?

I suppose you mean the canonicalization question?

Sure. Best regards, Alexei

5800

Age (days ago)

5804

Last active (days ago)

List overview

Download

9 comments

4 participants

participants (4)

Alexei Sheplyakov
Burcin Erocal
Luigi Capozza
Richard B. Kreckel