Hello, On Sun, May 25, 2014 at 12:30:54AM +0200, Valery Yundin wrote:
I happen to know what kind of expressions one can get in high energy physics quite well. Determinism has nothing to do with humans reading output, but with reproducibility of results.
The output of GiNaC is reproducible: the same calculation gives equal expressions, it's just the term order which might differ.
Reproducibility is an important property of any computational program, even Monte-Carlo simulations are deterministic. Being able to compare two 10G files of expressions by simple diff or md5sum/etc is rather convenient.
( echo 'expand(('; cat file1; echo ') - ('; cat file2 '));' ) | ginsh
You are right that it isn't doing what was intended and any class which doesn't overload calchash will use the same seed, which is a mistake. It is of course arguable whether different classes must use different seed, because the only drawback of a hash collision for different types is one extra typeid call in basic::compare.
I'm terribly sorry, but everything which makes the hash function worse (slower or produces more collisions) is not acceptable.
I'm attaching a revised version of a patch, it adds virtual functions to every derived class of basic, which effectively cache the class name hashes in local static variables.
Unfortunately this version is even worse than the previous one.
--- a/ginac/basic.h +++ b/ginac/basic.h @@ -226,4 +226,5 @@ protected:
virtual unsigned calchash() const; + virtual unsigned gethashseed() const;
1) Now everyone have to copy-paste gethashseed() into their classes.
@@ -782,5 +781,5 @@ return_type_t basic::return_type_tinfo() const unsigned basic::calchash() const { - unsigned v = make_hash_seed(typeid(*this)); + unsigned v = gethashseed(); for (size_t i=0; i<nops(); i++) { v = rotate_left(v); @@ -797,4 +796,11 @@ unsigned basic::calchash() const
2) Previously the hash seed was computed by a simple (inline) function. Now the calculation involves a call of the virtual method (which is *much* slower).
It is rather small (if noticeable) price to pay for greater convenience for users.
If you don't mind making your computations 10% slower you can just unconditionally #define GINAC_HASH_USE_MANGLED_NAME 1 in hash_seed.h (no, this change is not going to be applied to the official version of GiNaC).
Yeah, it always puzzled me why GiNaC doesn't have a sorted_print function but suggests users to write their own.
None of developers happened to need such a function. Best regards, Alexei