CLN-list

Download

cln-list@ginac.de

November 2025

3 participants
2 discussions

Integer hash collision
by Florian Lorkowski 28 Nov '25

28 Nov '25

Dear Bruno and Richard, I encountered an issue with the way CLN computes hashes of small integers. The following test program computes the hashes of the integers 0-49: > #include <iostream> > #include <cln/integer.h> > > int main(int argc, char** args) { > for(int i = 0; i < 50; i++) > std::cout << "hash(" << i << ") = " << > cln::equal_hashcode(cln::cl_I(i)) << std::endl; > return 0; > } The output is > hash(0) = 0 > hash(1) = 65 > hash(2) = 66 > hash(3) = 98 > hash(4) = 67 > hash(5) = 83 > hash(6) = 99 > hash(7) = 115 > hash(8) = 68 > hash(9) = 76 > hash(10) = 84 > hash(11) = 92 > hash(12) = 100 > hash(13) = 108 > hash(14) = 116 > hash(15) = 124 > hash(16) = 69 > hash(17) = 73 > hash(18) = 77 > hash(19) = 81 > hash(20) = 85 > hash(21) = 89 > hash(22) = 93 > hash(23) = 97 > hash(24) = 101 > hash(25) = 105 > hash(26) = 109 > hash(27) = 113 > hash(28) = 117 > hash(29) = 121 > hash(30) = 125 > hash(31) = 129 > hash(32) = 70 > hash(33) = 72 > hash(34) = 74 > hash(35) = 76 > hash(36) = 78 > hash(37) = 80 > hash(38) = 82 > hash(39) = 84 > hash(40) = 86 > hash(41) = 88 > hash(42) = 90 > hash(43) = 92 > hash(44) = 94 > hash(45) = 96 > hash(46) = 98 > hash(47) = 100 > hash(48) = 102 > hash(49) = 104 Already in this range, there are several hash collisions: hash(9) = hash(35) hash(3) = hash(46) hash(10) = hash(39) hash(11) = hash(43) hash(12) = hash(47) Hash collisions should be extremely rare. The fact that they happen for such simple examples seems concerning to me. This is a problem for programs that compare hashes to check complicated expressions for equality, as expressions might appear equal despite having different integer coefficients inside them. The issue is that small integers also yield small hashes. This is incorrect behaviour for a hash function, which should return values uniformly distributed in its output range, in this case [0,2^32-1]. Larger integers sometimes give appropriate hashes, but not always, e.g.: hash(1234) = 536871000 hash(2000) = 136 The problem also affects rational numbers, whose hashes are just the difference of the hashes between their numerators and denominators, e.g.: hash(3/4) = hash(7/10) hash(12/13) = hash(25/27) I believe this is a bug in the computation of integer hashes. Is there a reason to a custom implementation of this purpose, rather than a standard algorithm such as CRC32? Best regards Florian

3 3

Re: Integer hash collision
by Bruno Haible 27 Nov '25

27 Nov '25

Florian Lorkowski wrote: > We were working on an algorithm that caches composite objects in GiNaC > by their hash value. We will implement a system to handle hash > collisions. You don't need to "implement a system to handle hash collisions". You can use the hash tables defined in cln/src/base/hash/{cl_hash.h,cl_hash1.h,cl_hash2.h}. Or, if you prefer the C++ Standard Library, I'm sure it has something for that use-case as well. Bruno

1 0