Questions about expairseq::match().
Hi! I've stumbled upon the following code in ginac/exparseq.cpp:expairseq::match() 422 // Now, for every term of the pattern, look for a matching term in 423 // the expression and remove the match 424 for (size_t i=0; i<pattern.nops(); i++) { 425 ex p = pattern.op(i); 426 if (has_global_wildcard && p.is_equal(global_wildcard)) 427 continue; 428 exvector::iterator it = ops.begin(), itend = ops.end(); 429 while (it != itend) { So far so good... 430 lst::const_iterator last_el = repl_lst.end(); 431 --last_el; 432 if (it->match(p, repl_lst)) { 433 ops.erase(it); 434 goto found; 435 } This fragment looks a bit confusing. it->match(p, repl_lst) can change repl_lst, so iterators become invalid. Fortunately, in that case match returns true, so we jump out of the loop. So, we can initialize the iterator last_el *after* checking for possible matches, like this: if (it->match(p, repl_lst)) { ops.erase(it); goto found; } lst::const_iterator last_el = repl_lst.end(); --last_el; Unless I'm missing something this code is equivalent to the previous variant, but it's more readable (i.e. no confusion about possible invalidation of last_el iterator). 436 while(true) { 437 lst::const_iterator next_el = last_el; 438 ++next_el; 439 if(next_el == repl_lst.end()) 440 break; 441 else 442 repl_lst.remove_last(); 443 } There's definitely something fishy about this chunk. First of all, it seems like next_el at the line 439 is always repl_lst.end(), so this code doesn't do anything at all. Secondly, if it next_el ever changes its value from repl_lst.end(), repl_lst.remove_last() invalidates iterators (both next_el *and* last_el), so the result of next iteration is unpredictable. So, the questions are: 1. What I'm missing here? 2. What is the actual purpose of this code (lines 436 -- 443)? 3. According to git-blame, this (a little bit strange) code was introduced by commit 73f0ce4cf8d91f073f35a45443f5fbe886921c5c. The commit message says: "Fixed bug in ::match." What was the bug in first place? Could anyone please enlighten me? Best regards, Alexei -- All science is either physics or stamp collecting.
Hi, see http://www.ginac.de/pipermail/ginac-devel/2006-April/000942.html for an explanation. Alexei Sheplyakov schrieb:
This fragment looks a bit confusing. it->match(p, repl_lst) can change repl_lst, so iterators become invalid. Fortunately, in that case match returns true, so we jump out of the loop. So, we can initialize the iterator last_el *after* checking for possible matches, like this:
if (it->match(p, repl_lst)) { ops.erase(it); goto found; } lst::const_iterator last_el = repl_lst.end(); --last_el;
Unless I'm missing something this code is equivalent to the previous variant, but it's more readable (i.e. no confusion about possible invalidation of last_el iterator).
It also looks strange to me, but I am not 100% sure whether it is wrong or serves some purpose, yet.
436 while(true) { 437 lst::const_iterator next_el = last_el; 438 ++next_el; 439 if(next_el == repl_lst.end()) 440 break; 441 else 442 repl_lst.remove_last(); 443 }
There's definitely something fishy about this chunk. First of all, it seems like next_el at the line 439 is always repl_lst.end(), so this code doesn't do anything at all. Secondly, if it next_el ever changes its value from repl_lst.end(), repl_lst.remove_last() invalidates iterators (both next_el *and* last_el), so the result of next iteration is unpredictable.
First, in the current code last_el can be different from repl_lst.end() if in line 432 the call to match changes repl_lst. Second, (almost) no iterators are invalidated here. These are list iterators, not vector or deque iterators! The "almost" refers to the recursive call of match in line 432. Obviously, repl_lst is expected to change, but the code seems to be okay only for the case that the last element in repl_lst is not deleted. I am not sure about this yet, as I wrote above. Regards, Jens
Dear Jens, On Tue, Jul 15, 2008 at 08:58:09AM +0200, Jens Vollinga wrote:
see http://www.ginac.de/pipermail/ginac-devel/2006-April/000942.html for an explanation.
Thanks, now I understand my mistake.
This fragment looks a bit confusing. it->match(p, repl_lst) can change repl_lst, so iterators become invalid. Fortunately, in that case match returns true, so we jump out of the loop.
This statement is incorrect. The manual (ginac.texi) says: 4419 This function returns @code{true} when the expression matches the pattern 4420 and @code{false} if it doesn't. If used in the second form, the actual 4421 subexpressions matched by the wildcards get returned in the @code{repls} 4422 object as a list of relations of the form @samp{wildcard == expression}. 4423 If @code{match()} returns false, @code{repls} is undefined. I propose to change this (a bit confusing) behaviour and leave repls unchanged if the expression does not match the pattern. Best regards, Alexei -- All science is either physics or stamp collecting.
As of now the match() method modifies the list of matched subexpressions (its second argument) even if the expression in question does not match the pattern. Thus, this simple program #include <iostream> #include <ginac/ginac.h> using namespace GiNaC; int main(int argc, char** argv) { symbol x; ex e = pow(x, 5); ex pattern = pow(wild(), -1); lst repl; bool test = e.match(pattern, repl); std::cout << "repl = " << repl << std::endl; } prints repl = {x} Such behaviour is a bit unexpected. Sometimes it confuses even GiNaC developers, see e.g. http://www.ginac.de/pipermail/ginac-devel/2006-April/000942.html Hence this patch. Now the above program prints repl = {} as expected. --- doc/tutorial/ginac.texi | 5 +---- ginac/basic.cpp | 8 +++++++- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/doc/tutorial/ginac.texi b/doc/tutorial/ginac.texi index 555b2d6..05fdca4 100644 --- a/doc/tutorial/ginac.texi +++ b/doc/tutorial/ginac.texi @@ -4420,10 +4420,7 @@ This function returns @code{true} when the expression matches the pattern and @code{false} if it doesn't. If used in the second form, the actual subexpressions matched by the wildcards get returned in the @code{repls} object as a list of relations of the form @samp{wildcard == expression}. -If @code{match()} returns false, the state of @code{repls} is undefined. -For reproducible results, the list should be empty when passed to -@code{match()}, but it is also possible to find similarities in multiple -expressions by passing in the result of a previous match. +If @code{match()} returns false, @code{repls} remains unmodified. The matching algorithm works as follows: diff --git a/ginac/basic.cpp b/ginac/basic.cpp index a3de04a..7a0633d 100644 --- a/ginac/basic.cpp +++ b/ginac/basic.cpp @@ -585,12 +585,18 @@ bool basic::match(const ex & pattern, lst & repl_lst) const if (!match_same_type(ex_to<basic>(pattern))) return false; + // Even if the expression does not match the pattern, some of + // its subexpressions could match it. For example, x^5*y^(-1) + // does not match the pattern $0^5, but its subexpression x^5 + // does. So, save repl_lst in order to not add bogus entries. + lst tmp_repl = repl_lst; // Otherwise the subexpressions must match one-to-one for (size_t i=0; i<nops(); i++) - if (!op(i).match(pattern.op(i), repl_lst)) + if (!op(i).match(pattern.op(i), tmp_repl)) return false; // Looks similar enough, match found + repl_lst = tmp_repl; return true; } } -- 1.5.6 -- All science is either physics or stamp collecting.
participants (2)
-
Alexei Sheplyakov
-
Jens Vollinga