On 12/03/2017 10:28 AM, Vladimir V. Kisil wrote:
On Sun, 3 Dec 2017 01:54:14 +0100, robens <robens@pa.msu.edu> said:
TR> thanks. Are there any efforts to change this ?? I think it might
TR> be useful (in general) for future users/ developments/ etc
My understanding: for automated simplifications GiNaC produces a large number of dynamically created objects, and they are internally indexed by consecutive numbers. If parts of an expression are transformed in parallel threads, there will be inconsistencies between different objects carrying the same index.
Additionally, GiNaC and CLN both use reference counting for garbage collection of their objects. The naive way to make that thread-safe is to add locks around the reference-counting operations, and that carries a performance penalty (which probably grows with the number of processors). I did a small experiment in which I simply changed the reference counts into atomic variables (roughly as in the first sketch at the end of this message), and got an instant 30% performance hit for single-threaded code on a 2-core machine, and maybe a 20% overall performance improvement for one particular '#pragma omp for' loop from my code (in the instances when it didn't crash; changing the reference counts to atomics is not enough to make the whole thing thread-safe, I only did it to measure the potential performance impact).

There is potentially a different way to solve the garbage-collection problem: drop all of the reference-counting code and link with something like Boehm GC [1], which is known to work (and even work well) in a multi-threaded environment. If anyone wants to investigate the potential performance impact/benefit of this approach, I would really like to see the results.

In the meanwhile, I think a more practical way to scale GiNaC code to multi-processor systems is process-based parallelism and message passing. Fortunately, GiNaC's 'archive' class provides a fairly convenient and reasonably fast way to serialize/deserialize arbitrary 'ex' objects, so you could build a sort of map-reduce scheme on top of that and your favourite message-passing library (e.g. MPI, ZeroMQ, that sort of thing) and use it instead of '#pragma omp for'; a small serialization sketch is included at the end as well. This is much less convenient than OpenMP, of course, and requires a good bit more engineering effort.

[1] http://www.hboehm.info/gc/
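
For reference, the refcount experiment looked roughly like this. This is only an illustrative sketch in plain C++, not GiNaC's or CLN's actual refcounted classes, and the type names are made up:

  #include <atomic>
  #include <cstddef>

  // Plain counter, as the libraries effectively use today: cheap, but
  // concurrent add_ref()/release() from several threads is a data race.
  struct plain_counted {
      std::size_t refcount = 0;
      void add_ref()  { ++refcount; }
      bool release()  { return --refcount == 0; }   // true => delete the object
  };

  // The experiment: make the counter atomic. Increments and decrements
  // become thread-safe, but each one is now an atomic read-modify-write,
  // which is where the ~30% single-threaded slowdown came from.
  struct atomic_counted {
      std::atomic<std::size_t> refcount{0};
      void add_ref()  { refcount.fetch_add(1, std::memory_order_relaxed); }
      bool release()  { return refcount.fetch_sub(1, std::memory_order_acq_rel) == 1; }
  };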
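
And here is a minimal sketch of shipping an 'ex' between processes via the 'archive' class. The GiNaC calls (archive_ex/unarchive_ex and the archive stream operators) are the real API; the transport itself (MPI_Send, a ZeroMQ socket, whatever you prefer) is left out, you would just move the resulting string across:

  #include <ginac/ginac.h>
  #include <sstream>
  #include <string>
  using namespace GiNaC;

  // Worker side: serialize an expression into a string and hand it to
  // the message-passing layer.
  std::string pack(const ex &e)
  {
      archive ar;
      ar.archive_ex(e, "result");
      std::ostringstream os;
      os << ar;                      // archives have a compact binary stream format
      return os.str();
  }

  // Master side: rebuild the expression. The symbols occurring in the
  // expression must be supplied so both processes agree on them.
  ex unpack(const std::string &buf, const lst &syms)
  {
      archive ar;
      std::istringstream is(buf);
      is >> ar;
      return ar.unarchive_ex(syms, "result");
  }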