Dear Bruno and Richard, I encountered an issue with the way CLN computes hashes of small integers. The following test program computes the hashes of the integers 0-49:
#include <iostream> #include <cln/integer.h>
int main(int argc, char** args) { for(int i = 0; i < 50; i++) std::cout << "hash(" << i << ") = " << cln::equal_hashcode(cln::cl_I(i)) << std::endl; return 0; }
The output is
hash(0) = 0 hash(1) = 65 hash(2) = 66 hash(3) = 98 hash(4) = 67 hash(5) = 83 hash(6) = 99 hash(7) = 115 hash(8) = 68 hash(9) = 76 hash(10) = 84 hash(11) = 92 hash(12) = 100 hash(13) = 108 hash(14) = 116 hash(15) = 124 hash(16) = 69 hash(17) = 73 hash(18) = 77 hash(19) = 81 hash(20) = 85 hash(21) = 89 hash(22) = 93 hash(23) = 97 hash(24) = 101 hash(25) = 105 hash(26) = 109 hash(27) = 113 hash(28) = 117 hash(29) = 121 hash(30) = 125 hash(31) = 129 hash(32) = 70 hash(33) = 72 hash(34) = 74 hash(35) = 76 hash(36) = 78 hash(37) = 80 hash(38) = 82 hash(39) = 84 hash(40) = 86 hash(41) = 88 hash(42) = 90 hash(43) = 92 hash(44) = 94 hash(45) = 96 hash(46) = 98 hash(47) = 100 hash(48) = 102 hash(49) = 104
Already in this range, there are several hash collisions: hash(9) = hash(35) hash(3) = hash(46) hash(10) = hash(39) hash(11) = hash(43) hash(12) = hash(47) Hash collisions should be extremely rare. The fact that they happen for such simple examples seems concerning to me. This is a problem for programs that compare hashes to check complicated expressions for equality, as expressions might appear equal despite having different integer coefficients inside them. The issue is that small integers also yield small hashes. This is incorrect behaviour for a hash function, which should return values uniformly distributed in its output range, in this case [0,2^32-1]. Larger integers sometimes give appropriate hashes, but not always, e.g.: hash(1234) = 536871000 hash(2000) = 136 The problem also affects rational numbers, whose hashes are just the difference of the hashes between their numerators and denominators, e.g.: hash(3/4) = hash(7/10) hash(12/13) = hash(25/27) I believe this is a bug in the computation of integer hashes. Is there a reason to a custom implementation of this purpose, rather than a standard algorithm such as CRC32? Best regards Florian