Little optimization => lots of memory usage
Hello everybody,

I've been using GiNaC for some months and it has worked very well, but recently I've been facing a problem with the amount of memory used to store some expressions. I have a linear system of equations with around 8000 unknowns.

My first approach to building the matrix was to use a map<string, ex*> where each element corresponds to a row of the matrix, and all unknowns of that row live inside the expression ex. For a 4x4 system of equations, the storage would look like this:

<"certainKey1", ex1*> = B1*symbol2 + A1*symbol1 + D1*symbol4
<"certainKey2", ex2*> = D2*symbol4 + B2*symbol2 + C2*symbol3
<"certainKey3", ex3*> = B3*symbol2 + A3*symbol1 + D3*symbol4 + C3*symbol3
<"certainKey4", ex4*> = A4*symbol1 + B4*symbol2 + C4*symbol3

This method used an acceptable amount of memory, around 15% of 32 GB, which was pretty good. The downside was that when building the GiNaC::matrix I had to sort the elements and search for each of them in its corresponding row; additionally, some were hidden because they had coefficient 0. So I kept a list of symbols and, on each row, searched for each of them in sorted order using the "coeff" method that GiNaC provides. But, I repeat, this was slow.

The point is that I decided to look for another solution, aiming to reduce the time needed to build the GiNaC::matrix. I came up with the idea of using map<string, map<string, ex>> mapMatrix, which is effectively the matrix itself, sorted automatically because the strings are the unknowns' key codes. Now building the matrix is almost instant: no search needed, and all elements already sorted.

However, the program uses all the RAM (32 GB) on the 8000x8000 problem. I checked and rechecked for memory leaks but haven't found any so far. Now the question: why did memory increase so much just from splitting the expressions into separate smaller ones? It is the same number of terms, just in separate sub-expressions. Is the ex object that expensive?
If I understand correctly, previously you were storing N strings, now you are storing N^2. This is 64M strings, as opposed to 8000. This might account for at least some of the memory! Regards, James On 20 Oct 2010, at 16:45, Cristobal Navarro <axischire@gmail.com> wrote:
_______________________________________________ GiNaC-list mailing list GiNaC-list@ginac.de https://www.cebix.net/mailman/listinfo/ginac-list
Yes, it adds some memory usage, but doing the math I found that it isn't too much. For the 8000x8000 problem, each string should be at worst 21 characters long (for example string s = "Ld0x1d2x3d4x5d6x7d8x9"). Assuming std::string costs 1 byte per character:

stringMemory <= 21 bytes * 64M <= 1344 MB <= 1.3 GB

I was considering the equivalent situation for the expressions, which are now 64M smaller ones compared to the 8000 bigger ones from before. Could it be that they use so much memory?

On Wed, Oct 20, 2010 at 1:27 PM, James Jackson <james.jackson@cern.ch> wrote:
I kept investigating. Based on the tests I've made, I'm almost sure (99%) that the increase in memory usage is due to the segmentation of the expressions. Based on this, I can deduce that somehow storing the equation system is much more economical than storing the matrix form, up to an order of magnitude more economical. Can the developers confirm this?

On Wed, Oct 20, 2010 at 2:47 PM, Cristobal Navarro <axischire@gmail.com> wrote:
Hi again,

On Fri, Oct 22, 2010 at 12:22 AM, Cristobal Navarro <axischire@gmail.com> wrote:
I kept investigating. Based on the tests I've made, I'm almost sure (99%) that the increase in memory usage is due to the segmentation of the expressions.
I don't think the problem has anything to do with GiNaC. Hint: try sticking ints (instead of ex*) into that map, check the memory usage, and compare it with your calculations. Use a different data structure (perhaps a hash map or a sorted array), and store ex instead of pointers.

Best regards,
Alexei
Alexei, thanks for helping. I am going to do something equivalent to what you suggested: I will fill all expressions of the mapMatrix with zeros and see how much RAM I consume because of the strings alone. I'll post my results in a moment.

Best regards,
Cristobal

On Fri, Oct 22, 2010 at 9:53 AM, Alexei Sheplyakov <alexei.sheplyakov@gmail.com> wrote:
I made the test. Here is what I can tell you from my experiments:

1) For a mapMatrix (map<string, map<string, ex>>) of 8524 x 8524 where all expressions were 0, the RAM usage was 6 GB, about 18% of 32 GB, very close to your estimation, Alexei; you were right on that.

2) However, if I repeat the experiment but let the expressions be what they should be (which are pretty large), then RAM usage goes beyond 32 GB; I predict something like 40 GB.

3) I wouldn't be complaining about the high amount of RAM usage if it weren't for the following peculiarity: if, instead of a matrix, I store the expressions in equation-system form with their corresponding variables (so that all elements of a row are grouped into one big polynomial expression), RAM usage becomes low again!

2) and 3) have the same number of terms, just grouped in a different way, which is strange. I am going to do some more tests and will post back with any news. Alexei, what is your opinion on my experience?

On Fri, Oct 22, 2010 at 12:34 PM, Cristobal Navarro <axischire@gmail.com> wrote:
Hello,

On Wed, Oct 20, 2010 at 8:47 PM, Cristobal Navarro <axischire@gmail.com> wrote:
Yes, it adds some memory usage, but doing the math I found that it isn't too much. For the 8000x8000 problem, each string should be at worst 21 characters long (for example string s = "Ld0x1d2x3d4x5d6x7d8x9"); assuming std::string costs 1 byte per character: stringMemory <= 21 bytes * 64M <= 1344 MB <= 1.3 GB
I'm afraid your calculation is wrong. Let's forget about GiNaC for a moment and estimate the memory usage of map<string, map<string, long>>.

std::map is a red-black tree. Every node has 3 pointers (2 children and the parent), a color (int), and the actual data (key and value). So every node consumes

3*sizeof(void*) + sizeof(int) + padding + sizeof(string) + sizeof(long) + strlen(TypicalString)
= 4*sizeof(void*) + sizeof(int) + padding + sizeof(long) + strlen(TypicalString)

(std::string is essentially a char* as far as memory usage is concerned). We have N trees of N nodes each, thus we need

(4*sizeof(void*) + sizeof(int) + padding + sizeof(long) + strlen(TypicalString)) * N^2

bytes. On x86_64 this will be (48 + strlen(TypicalString))*N^2. If the typical string is 16 characters long, this equals 64*N^2:

# N     kb
1000    62500
2000    250000
3000    562500
4000    1000000

To check this theory, consider the following program (it doesn't use GiNaC):

```cpp
// mapmap.cpp
#include <map>
#include <string>
#include <sstream>
#include <iostream>
#include <cstdlib>
#include <unistd.h>
#include <sys/resource.h>
using namespace std;

int main(int argc, char** argv)
{
    cout << "sizeof(string) " << sizeof(string) << endl;
    const int dflt_size = 800;
    int size = dflt_size;
    if (argc >= 2)
        size = atoi(argv[1]);
    if (size <= 10)
        size = dflt_size;
    map<string, map<string, long> > big_map;
    for (int i = 0; i < size; ++i) {
        map<string, long> m;
        for (int j = 0; j < size; ++j) {
            ostringstream oss;
            oss << "Row " << i << " Col" << j; // ~ 16 chars
            m[oss.str()] = i + j;
        }
        ostringstream oss;
        oss << "Row" << i;
        big_map[oss.str()] = m;
    }
    struct rusage rus;
    getrusage(RUSAGE_SELF, &rus);
    cout << size << "\t" << rus.ru_maxrss << std::endl << std::flush;
    return 0;
}
```

$ g++ -O2 -Wall -pipe -march=native -o mapmap mapmap.cpp
$ for i in 1000 2000 3000 4000; do ./mapmap $i; done

On my system (Athlon 64 X2 running Debian GNU/Linux) it prints

# N     kb
1000    112096
2000    450804
3000    1019464
4000    1833620

The experimental values are about 2x bigger than the ones predicted by theory; perhaps I've missed a pointer to a vtbl somewhere. Anyway, it's clear that memory usage is O(N^2), and N = 8000 would consume about 2^2 * 2 GB = 8 GB (I'm unable to check this experimentally; my computer has 2 GB of RAM).

Conclusion: the binary tree is not suitable for large amounts of data. Perhaps you need to use a different data structure. Also, use ex instead of ex* (ex is actually a wrapped pointer).

Hope this helps,
Alexei
participants (3)
- Alexei Sheplyakov
- Cristobal Navarro
- James Jackson