Re: [GiNaC-list] Term ordering and compiling C++ code
kreckel wrote: ...and second, sharing is currently not pursued aggressively! Rather, it is exploited whenever it is trivial to do so. (The product rule of differentiation is an example where exactly the same terms pop up again and again so exploiting sharing comes at no cost.) I think the benefit would still be large, even if coverage isn't 100%. See below. Well, I think that if the size of generated code is so prohibitively large and compiler CSE doesn't help you may be better off writing your own algorithm collecting similar terms in an associative array. You could then artificially introduce temporaries, in order to help the compiler. This would boil down to a more aggressive variant of GiNaC's subexpression sharing. What do you think? This is exactly what I have done. As mentioned, I will post that solution here in the next few days. This works well, but is very slow, as it takes something that is sub-linear in the size of the expression tree and makes it O(n^2). If GiNaC expression tree nodes had a unique identifier, this approach could achieve sub-linear performance. Specifically, it would be linear in the size of the unique subexpressions, not the size of the actual expression. (If you have a lot of sharing, your expression can be large, but the number of unique expressions can be small.) Even if GiNaC doesn't identify every single instance of a shared sub-expression, the benefits could still be large. In fact, I tried an experiment to test this. I treated ex::gethash() as a unique identifier, even though it isn't. The code I generated was wrong, of course, but it was very fast to generate it, and very fast to compile and execute. Thus, whatever number of unique expressions were not actually identified as such was small enough that this approach would still be quite useful. Also, FYI, when using the hash value as a unique identifier, I only got a few collisions out of thousands of sub-expressions. That is, I only found a few expressions with the same hash even though they were different expressions. Thanks, Doug
Doug wrote: kreckel wrote: ...and second, sharing is currently not pursued aggressively! Rather, it is exploited whenever it is trivial to do so. (The product rule of differentiation is an example where exactly the same terms pop up again and again so exploiting sharing comes at no cost.) I think the benefit would still be large, even if coverage isn't 100%. See below. I fully agree with this, at least when it is done to a level that does not affect overall performance, possibly there is room for improvement in this direction. Well, I think that if the size of generated code is so prohibitively large and compiler CSE doesn't help you may be better off writing your own algorithm collecting similar terms in an associative array. You could then artificially introduce temporaries, in order to help the compiler. This would boil down to a more aggressive variant of GiNaC's subexpression sharing. What do you think? This is exactly what I have done. As mentioned, I will post that solution here in the next few days. This works well, but is very slow, as it takes something that is sub-linear in the size of the expression tree and makes it O(n^2). If GiNaC expression tree nodes had a unique identifier, this approach could achieve sub-linear performance. Specifically, it would be linear in the size of the unique subexpressions, not the size of the actual expression. (If you have a lot of sharing, your expression can be large, but the number of unique expressions can be small.) I fully agree with that, and I think this should be possible, probably simple, and won't hurt anybody. Even if GiNaC doesn't identify every single instance of a shared sub-expression, the benefits could still be large. In fact, I tried an experiment to test this. I treated ex::gethash() as a unique identifier, even though it isn't. The code I generated was wrong, of course, but it was very fast to generate it, and very fast to compile and execute. Thus, whatever number of unique expressions were not actually identified as such was small enough that this approach would still be quite useful. I don't know how much would impact in performance the increase of expression sharing. I'm not aware of the copy on write implementation, but I think that restricting the effect of copy to the non-shareable parts, can be probably the single piece needed to increase performance. The implementation should be straightforward, and the computational overhead, probably is not that big. I think this performance can be there, without hurting anybody just providing a mechanism to runtime activate it. Memory savings could be Huge!!!. May be it is enough to adopt a programing style to construct expresions that would increase sharing without further changes within GiNaC, or any other mechanism is available to increase sharing??. Also, FYI, when using the hash value as a unique identifier, I only got a few collisions out of thousands of sub-expressions. That is, I only found a few expressions with the same hash even though they were different expressions. Is this correct behavior?. Shouldn't hash values be unique?. Thanks, Doug We are looking forward to your code. I would be nice I you could describe your algorithm, when you finish, in a little more detail. Are you exporting a single expression or a set of them?. Thank you, Javier
participants (2)
-
Doug
-
jros