Dear Bruno and Richard,

I encountered an issue with the way CLN computes hashes of small integers. The following test program computes the hashes of the integers 0-49:
#include <iostream>
#include <cln/integer.h>

int main(int argc, char** args) {
    for (int i = 0; i < 50; i++)
        std::cout << "hash(" << i << ") = " << cln::equal_hashcode(cln::cl_I(i)) << std::endl;
    return 0;
}
The output is
hash(0) = 0
hash(1) = 65
hash(2) = 66
hash(3) = 98
hash(4) = 67
hash(5) = 83
hash(6) = 99
hash(7) = 115
hash(8) = 68
hash(9) = 76
hash(10) = 84
hash(11) = 92
hash(12) = 100
hash(13) = 108
hash(14) = 116
hash(15) = 124
hash(16) = 69
hash(17) = 73
hash(18) = 77
hash(19) = 81
hash(20) = 85
hash(21) = 89
hash(22) = 93
hash(23) = 97
hash(24) = 101
hash(25) = 105
hash(26) = 109
hash(27) = 113
hash(28) = 117
hash(29) = 121
hash(30) = 125
hash(31) = 129
hash(32) = 70
hash(33) = 72
hash(34) = 74
hash(35) = 76
hash(36) = 78
hash(37) = 80
hash(38) = 82
hash(39) = 84
hash(40) = 86
hash(41) = 88
hash(42) = 90
hash(43) = 92
hash(44) = 94
hash(45) = 96
hash(46) = 98
hash(47) = 100
hash(48) = 102
hash(49) = 104
Already in this range, there are several hash collisions:

hash(3) = hash(46)
hash(9) = hash(35)
hash(10) = hash(39)
hash(11) = hash(43)
hash(12) = hash(47)

Hash collisions should be extremely rare. The fact that they happen for such simple examples seems concerning to me. This is a problem for programs that compare hashes to check complicated expressions for equality, as expressions might appear equal despite having different integer coefficients inside them.

The issue is that small integers also yield small hashes. This is incorrect behaviour for a hash function, which should return values uniformly distributed in its output range, in this case [0, 2^32-1]. Larger integers sometimes give appropriate hashes, but not always, e.g.:

hash(1234) = 536871000
hash(2000) = 136

The problem also affects rational numbers, whose hashes are just the difference between the hashes of their numerators and denominators, e.g.:

hash(3/4) = hash(7/10)
hash(12/13) = hash(25/27)

I believe this is a bug in the computation of integer hashes. Is there a reason for a custom implementation for this purpose, rather than a standard algorithm such as CRC32?

Best regards
Florian
Hi Florian,
> Already in this range, there are several hash collisions:
>
> hash(9) = hash(35)
> hash(3) = hash(46)
> hash(10) = hash(39)
> hash(11) = hash(43)
> hash(12) = hash(47)
This is not a problem. It would be a problem if, for instance, 25% of your inputs produced the same hash code; such cases lead to bad performance [1]. But a *few* inputs that map to the same hash code are not a problem, since the hash tables use a list structure for the entries with the same hash code. [2]
> Hash collisions should be extremely rare.
No. What one needs is a decent trade-off between the time to compute the hash code and the time to do the entry lookup in the table. You can create a hash function that is 5 times more expensive to compute and has fewer collisions, but that is not a global win if the entry lookup in the table is fast anyway. And in CLN it *is* fast.
> The issue is that small integers also yield small hashes. This is incorrect
> behaviour for a hash function, which should return values uniformly
> distributed in its output range, in this case [0,2^32-1].
You are missing a step in your thinking:

1. The hashcode function gets computed.
2. The hash code value is taken modulo the table size.
3. The entry or collision list is looked up.

There is no requirement that the output of step 1 is uniformly distributed across [0,2^32-1]. It is the output of step 2 that should be uniformly distributed across [0,size-1], and the modulo operation provides a large part of this effect when the table size is relatively prime to 2^32.

Generally, for learning about hash codes and hash tables, I recommend Donald E. Knuth: The Art of Computer Programming, vol. 3: Sorting and Searching. He focuses more on open-coded hash tables than on the approach with a list structure, but the main principles are the same.
> Is there a reason for a custom implementation for this purpose, rather than
> a standard algorithm such as CRC32?
CRC32 is a particularly expensive algorithm: ca. 40%-50% of the CPU time of the 'gzip' program is spent computing CRCs. [3][4]

Additionally, the implementation of equal_hashcode() does not have arbitrary freedom. It must ensure that numbers of different types with the same value, such as 17, 17f0, 17e0, 17d0, 17L0, produce the same hash code.

Bruno

[1] https://www.haible.de/bruno/hashfunc.html
[2] cln/src/base/hash/cl_hashset.h, method 'get'
[3] https://lists.gnu.org/archive/html/bug-gnulib/2024-10/msg00071.html
[4] https://lists.gnu.org/archive/html/bug-gnulib/2024-10/msg00074.html
Dear Bruno,

thank you for the detailed explanation.

We were working on an algorithm that caches composite objects in GiNaC by their hash value. We will implement a system to handle hash collisions. In the long run this is safer anyway: had we not caught this issue at this stage, we might have run into some really hard-to-investigate bugs later on.

Best regards
Florian
Hi Florian,

The idea is indeed to specialize std::hash<GiNaC::ex> around ex::gethash(), which falls back to CLN's hashes for objects of numeric type. Is there anything wrong with this simple approach? Is it the size_t return value, which is 64 bit on modern systems, so that the 32 most significant bits remain unused? (How big do you expect your hash maps to get?)

Have a look at ginac/hash_map.h, remembering that std::unordered_map<> is basically a hash map!

All my best,
 -richard.

PS: CC'ing to the ginac-devel list.

--
Richard B. Kreckel <https://in.terlu.de/~kreckel/>

On 11/27/25 9:23 AM, Florian Lorkowski wrote:
> Dear Bruno,
>
> thank you for the detailed explanation.
>
> We were working on an algorithm that caches composite objects in GiNaC by
> their hash value. We will implement a system to handle hash collisions. In
> the long run this is safer anyway. Had we not caught this issue at this
> stage, we might have run into some really hard to investigate bugs later on.
>
> Best regards
> Florian
participants (3)

- Bruno Haible
- Florian Lorkowski
- Richard B. Kreckel