From: Richard B. Kreckel <kreckel@thep.physik.uni-mainz.de>
Hmm, I think a simple cast to long would get the job done, in the same manner as the numeric ctors from builtin types work.
Yes, a change to `long' would push the limit for even arguments to 65534. However I think the real problem lies in the "philosophical" nature of my question. I guess nobody will want to calculate Bernoulli numbers this big (the limit of the 1.0.3 code seems to be 8190). Thus I think things could be left as they are, since the CLN manual hints that conversions from `long' are less efficient. By altogether abolishing the theoretical limit and letting CLN calculate (p3-i)*(p2-i), we would slow the procedure down by some per cent without any real gain. But I suppose this road should be taken if 23168 did become an issue, like if (p < 23168) { normal_inner_loop; } else { slow_inner_loop_with_CLN; } If you like this better I can make a patch. Markus
On Fri, 18 Jan 2002, Markus Nullmeier wrote:
Yes, a change to `long' would push the limit for even arguments to 65534. However I think the real problem lies in the "philosophical" nature of my question. I guess nobody will want to calculate Bernoulli numbers this big (the limit of the 1.0.3 code seems to be 8190).
Sure, they are notoriously untractable. But was there really such a limit in the old code? I was under the impression that I once had it compute B_{30000} but I might be wrong...
Thus I think things could be left as they are, since the CLN manual hints that conversions from `long' are less efficient.
They involve a function call and the constructed number isn't immiediate any more but heap-allocated instead. But I doubt you'll see the differnce in this case.
By altogether abolishing the theoretical limit and letting CLN calculate (p3-i)*(p2-i), we would slow the procedure down by some per cent without any real gain. But I suppose this road should be taken if 23168 did become an issue, like if (p < 23168) { normal_inner_loop; } else { slow_inner_loop_with_CLN; } If you like this better I can make a patch.
I do not think the difference in times will be worth the effort, but I haven't tried. Making it safe such that somebody who wants to see it break down would have to let it run for a week or so is more important. Regards -richy. -- Richard B. Kreckel <Richard.Kreckel@Uni-Mainz.DE> <http://wwwthep.physik.uni-mainz.de/~kreckel/>
Sure, they are notoriously untractable. But was there really such a limit in the old code? I was under the impression that I once had it compute B_{30000} but I might be wrong...
Oh, sorry about this confusion. I overlooked some points while reading the 1.0.3 code. I rather should have said that the (long) value of (m*n) overflows 2^32-1 for B_{32768}. I'm guessing that this should impact the calculated values.
I do not think the difference in times will be worth the effort, but I haven't tried. Making it safe such that somebody who wants to see it break down would have to let it run for a week or so is more important.
Make-safe patch to follow ... :) Markus
Hi, On Sat, 19 Jan 2002, Markus Nullmeier wrote:
Sure, they are notoriously untractable. But was there really such a limit in the old code? I was under the impression that I once had it compute B_{30000} but I might be wrong...
Oh, sorry about this confusion. I overlooked some points while reading the 1.0.3 code. I rather should have said that the (long) value of (m*n) overflows 2^32-1 for B_{32768}. I'm guessing that this should impact the calculated values.
Ah, okay.
I do not think the difference in times will be worth the effort, but I haven't tried. Making it safe such that somebody who wants to see it break down would have to let it run for a week or so is more important.
Make-safe patch to follow ... :)
While you are at it, could you also make sure to call resize() on results with an appropiate argument before entering the loop in wich the push_back() is being done? This is, because push_back() is likely to trigger a couple of reallocations in there and this is quite expensive for any vectors, since they are contiguous in their memory layout. (The old code should have already done that.) Regards -richy. -- Richard B. Kreckel <Richard.Kreckel@Uni-Mainz.DE> <http://wwwthep.physik.uni-mainz.de/~kreckel/>
On Sat, 19 Jan 2002, Douglas Gregor wrote:
On Saturday 19 January 2002 10:05 am, you wrote:
While you are at it, could you also make sure to call resize() on results
Just to be picky, you want to call reserve() and not resize().
Err, sure. I guess I didn't have enough cereals this morning...
Make-safe patch to follow ... :)
Here it is, relative to the first patch and maybe a bit ugly. But using "(p < 32768)" does gain a speedup of about 5% for arguments smaller than approximately 1000. Cancelling 2 from both arguments of exquo changed the strange limiting value of the previous patch. Storing the binomial coefficients would certainly accelerate the calculation, again at the expense of something like doubled memory usage... Regards, Markus --- GiNaC-1.0.3_new_bernoulli/ginac/numeric.cpp Fri Jan 18 15:48:29 2002 +++ GiNaC-1.0.3_new_bernoulli_safe/ginac/numeric.cpp Mon Jan 21 18:02:03 2002 @@ -1534,24 +1534,35 @@ // algorithm not applicable to B(2), so just store it if (!next_r) { + results.push_back(); // results[0] is not used results.push_back(cln::recip(cln::cl_RA(6))); next_r = 4; } + if (n < next_r) + return results[n/2]; + + results.reserve(n/2 + 1); for (unsigned p = next_r; p <= n; p += 2) { - cln::cl_I c = 1; + cln::cl_I c = 1; // binonmial coefficients cln::cl_RA b = cln::cl_RA(1-p)/2; unsigned p3 = p+3; - unsigned p2 = p+2; unsigned pm = p-2; - unsigned i, k; - for (i=2, k=0; i <= pm; i += 2, k++) { - c = cln::exquo(c * ((p3 - i)*(p2 - i)), (i - 1)*i); - b = b + c * results[k]; - } + unsigned i, k, p_2; + // test if intermediate unsigned int results < 2^29 + if (p < 32768) + for (i=2, k=1, p_2=p/2; i <= pm; i += 2, k++, p_2--) { + c = cln::exquo(c * ((p3 - i) * p_2), (i - 1) * k); + b = b + c * results[k]; + } + else + for (i=2, k=1, p_2=p/2; i <= pm; i += 2, k++, p_2--) { + c = cln::exquo((c * (p3 - i)) * p_2, cln::cl_I(i - 1) * k); + b = b + c * results[k]; + } results.push_back(- b / (p + 1)); - next_r += 2; - } - return results[n/2 - 1]; + } + next_r = n + 2; + return results[n/2]; }
On Mon, 21 Jan 2002, Markus Nullmeier wrote: [...]
But using "(p < 32768)" does gain a speedup of about 5% for [...] + if (p < 32768) [...]
I suspect the condition should really be `(p < (1UL<<cl_value_len/2))', in order to be portable across platforms... By the way, I just noticed that the condition cln::cl_immediate_p in numeric::numeric(int) is total bogus. cln::cl_immediate_p() is not meant for this. EhrgglgasdaaAAAAAAAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHHH!!!!! Oh, no. Pass the brown paper bag again, please... Embarrassed -richy. -- Richard B. Kreckel <Richard.Kreckel@Uni-Mainz.DE> <http://wwwthep.physik.uni-mainz.de/~kreckel/>
I suspect the condition should really be `(p < (1UL<<cl_value_len/2))', in order to be portable across platforms...
Well, I just followed the documentation which made me believe that int -> cl_I conversion should work at least up to values of 2^29-1. So maybe cl_value_len should be documented, as it indeed makes one feel somewhat more secure. Now reading the headers it looks like a weird (hypothetical?) system with alignment = 8 and sizeof(long) = "32 bits" will break the documented behaviour :-/ Anyhow, I'm just too lazy to incorporate this version of p^2/2 < 2^(cl_value_len-1) into my sources and waiting for the next release... Cheers, Markus
On Tue, 22 Jan 2002, Markus Nullmeier wrote:
I suspect the condition should really be `(p < (1UL<<cl_value_len/2))', in order to be portable across platforms...
Well, I just followed the documentation which made me believe that int -> cl_I conversion should work at least up to values of 2^29-1.
Hmm, from CLN's docu: : Small integers (typically in the range `-2^29'...`2^29-1', for 32-bit : machines) are especially efficient, because they consume no heap : allocation. Otherwise the distinction between these immediate integers : (called "fixnums") and heap allocated integers (called "bignums") is : completely transparent.
So maybe cl_value_len should be documented, as it indeed makes one feel somewhat more secure. Now reading the headers it looks like a weird (hypothetical?) system with alignment = 8 and sizeof(long) = "32 bits" will break the documented behaviour :-/
??? On any machine where an address is 64 Bit (Alpha, ia64,...) and matching alignment we store a full 32-Bit word and tag it as not being an address (as immediate). That gives the range `-2^31'...`2^31-1'. Pointer size and alignment==8, but sizeof(long)==4, hmmm, dunno what would have to be changed... What do you say is gonna break there? (BTW, the `-2^29'...`2^29-1' range should even hold for m68k, where alignment==2. There we just have one tag bit less.)
Anyhow, I'm just too lazy to incorporate this version of p^2/2 < 2^(cl_value_len-1) into my sources and waiting for the next release...
I have already put it into CVS with the (p<(1UL<<cl_value_len/2)) condition. It looks correct, at least to me. And it's fast. Thanks! Regards -richy. -- Richard B. Kreckel <Richard.Kreckel@Uni-Mainz.DE> <http://wwwthep.physik.uni-mainz.de/~kreckel/>
Now the point I was going to make is simply to update the CLN documentation. As for the debatable optimization I made in the posted patches, I totally agree to use cl_value_len there.
weird (hypothetical?) system with alignment = 8 and sizeof(long) = "32 bits" will break the documented behaviour :-/
well, that would mean cl_value_len = 29 insted of the normal cl_value_len = 30 for 32 bit architectures and cl_value_len = 32 for 64 bit architectures, if my interpretation of CLN's header didn't fail (but such failure is quite likely anyway...): 4-byte alignment eats 2 bits, 8-byte alignment eats 3 bits off cl_value_len, at least for 32 bit architectures. And, the headers get the architecture idea from something like sizeof(long) in intparam.h: e.g. #define long_bitsize 32 (made by autoconf, I guess).
Pointer size and alignment==8, but sizeof(long)==4, hmmm, dunno what would
I don't know if (Pointer size) != sizeof(long) will work for CLN's headers. I was thinking of Pointer size = 4 and alignment = 8. Maybe I'm a fool and anybody will prove that there were never such systems and never will be...
have to be changed... What do you say is gonna break there? (BTW, the
The only thing that would break on these non-existing (?) system is the "quick conversion" from C++'s int to CLN's cl_I, for values of e. g. 2^28, contrary to the documentation. Therefore IMHO CLN's documetation could be updated to speak of 2^(cl_value_len-1) instead of 2^29. Anybody still reading? Markus
Hi, Markus! On Fri, 1 Feb 2002, Markus Nullmeier wrote:
Now the point I was going to make is simply to update the CLN documentation.
Maybe. I'll try to think up a formulation.
As for the debatable optimization I made in the posted patches, I totally agree to use cl_value_len there.
weird (hypothetical?) system with alignment = 8 and sizeof(long) = "32 bits" will break the documented behaviour :-/
well, that would mean cl_value_len = 29 insted of the normal cl_value_len = 30 for 32 bit architectures and cl_value_len = 32 for 64 bit architectures,
if my interpretation of CLN's header didn't fail (but such failure is quite likely anyway...): 4-byte alignment eats 2 bits, 8-byte alignment eats 3 bits off cl_value_len, at least for 32 bit architectures.
No, your interpretation did not fail, at least as long as you refer to the value of cl_word_alignment. See below...
And, the headers get the architecture idea from something like sizeof(long) in intparam.h: e.g. #define long_bitsize 32 (made by autoconf, I guess).
Yes, intparam.h is generated by compiling and running intparam.cc while running configure.
Pointer size and alignment==8, but sizeof(long)==4, hmmm, dunno what would
I don't know if (Pointer size) != sizeof(long) will work for CLN's headers.
Okay, just for the record: I have never heard of any 64-bit machine which does not adhere to the LP64-model. Has anybody else?
I was thinking of Pointer size = 4 and alignment = 8. Maybe I'm a fool and anybody will prove that there were never such systems and never will be...
Well, we should see if anybody really comes up with something thus pervert and just fix the header <cln/object.h> then... ...by dropping the macro cl_word_alignment into <cln/object.h> as usual. I am talking about lying, here. We can simply tell CLN the alignment is actually 4. Nothing is spoiled if the system never sets this additional bit in the malloc'ed address. They can even make it 12, 16, 20 or 128 if they think this is a bright idea, as long as the last two bits are not set. ;-) Sounds reasonable?
have to be changed... What do you say is gonna break there? (BTW, the
The only thing that would break on these non-existing (?) system is the "quick conversion" from C++'s int to CLN's cl_I, for values of e. g. 2^28, contrary to the documentation. Therefore IMHO CLN's documetation could be updated to speak of 2^(cl_value_len-1) instead of 2^29.
OTOH, if you compute with int variables you need to be careful in C, anyways: int x = 1000000; int y = x*x; is not a mistake but anybody who expects y to be equal to 1000000000000 then is naive, at best. This limit just happens to be pushed down from 2^32 to 2^29.
Anybody still reading? Markus
Even answering! -richy. -- Richard B. Kreckel <Richard.Kreckel@Uni-Mainz.DE> <http://wwwthep.physik.uni-mainz.de/~kreckel/>
participants (3)
-
Douglas Gregor
-
Markus Nullmeier
-
Richard B. Kreckel