Meaningful Xml Diffs with SWI-Prolog
Introduction
Building on the
previous article I wrote on using the SWI Prolog Xpath predicate for
analyzing Xml Node Set relationships, I now describe a way to produce
a tree oriented diff based on similar techniques.
The goal of such a
diff is to identify objects of 2 Xml trees which differ by a given
sub-node.
In a sense is that
diff not a fully general diff, as it focuses on 2 or 3 levels:
- object
- sub-or-descendant node resp. property
- when required, property of the sub-or-descendant node
specified by the
user, within the Xml tree structure.
Another important
goal of this diff is to be able to accommodate a different ordering
of these objects in the Xml structure, which is normally a constraint
which most editors or developer tools can’t cope with.
In the following
sections I will again use a similar fictitious tree example
as in the precedent article.
Resources for this article
Prolog source code and XML examples are available on [github], and are linked to this article.
To help enhance the readability of the predicates’ output, some Xml elements below are sometimes “ellipsed” with a “…” .
Specifying a diff of nests for which birds elements are changed
Birds change events
in this example will be:
- a bird has a changed attribute – e.g. age or weight (this is a recursive definition which applies on child or descendant nodes)
- a bird is new in the nest, given the same nest in the original tree
- a bird has left the nest, compared to the same nest in the original tree
Fallen or picked up
oranges are ignored.
To describe this
diff I will use queries such as the following being applied to the
Xml original (tree1) and modified
(tree2) trees:
diff(on(//’NEST’,@type),
from(//bird,@name))
diff(on(//’NEST’,@type),
from(//bird))
diff(on(//’NEST’),
from(//bird))
with the following
semantics:
- match the nests in the original and modified Xml source, using their type property resp. their root tag & attributes
- match the birds for each of the matched nests, using their name property resp. their root tag & attributes
- show common birds with some differences, and new or deleted birds
In this example is
the following output expected:
?- diff(on(//'NEST',
@type), from(//bird, @name)).
%
no matching NEST in tree2
not found:
cuckoos-nest
N1 = element('NEST',
[type='cuckoos-nest'], []) ;
%
owli is one month younger in tree1 compared to tree2
N1 = element('NEST',
[type='owl-nest'], …,
N2 = element('NEST',
[type='owl-nest'], …,
B = element(bird,
[name=owli, type=owl, age='2'], []) ;
%
cucko is lighter in tree1 compared to tree2
N1 = element('NEST',
[type='owl-nest'], …,
N2 = element('NEST',
[type='owl-nest'], …,
B = element(bird,
[name=cucko, type=cuckoo, age='6'], [element(weight, [], ['400'])]) ;
%
owli is one month older in tree2 compared to tree1
N1 = element('NEST',
[type='owl-nest'], …,
N2 = element('NEST',
[type='owl-nest'], …,
B = element(bird,
[name=owli, type=owl, age='3'], []) ;
%
cucko is heavier in tree2 compared to tree1
N1 = element('NEST',
[type='owl-nest'], …,
N2 = element('NEST',
[type='owl-nest'], …,
B = element(bird,
[name=cucko, type=cuckoo, age='6'], [element(weight, [], ['411'])]) ;
%
jack has left his nest
N1 = element('NEST',
[type='sparrow-nest'], …,
N2 = element('NEST',
[type='sparrow-nest'], …,
B = element(bird,
[name=jack, type=sparrow, age='12'], []) ;
%
junior is born
N1 = element('NEST',
[type='sparrow-nest'], …,
N2 = element('NEST',
[type='sparrow-nest'], …,
B = element(bird,
[name=junior, type=sparrow, age='1'], []) ;
%
there’s a new nest with woody
N2 = element('NEST',
[type='woodpecker-nest'], …,
B = element(bird,
[name=woody, age='10'], []) ;
false.
Implementing the diff solution based on SWI-Prolog Xpath predicates
%
diff(on(//’NEST’,@type), from(//bird,@name))
?-
tree_d2_map(//'NEST', /self(@type), //bird, /self(@name), N1, N2, B).
%
diff(on(//’NEST’,@type),
from(//bird))
?-
tree_d2_map(//'NEST', /self(@type), //bird, /self, N1, N2, B).
This predicate is
using unification on the U1 variable, to match tree1 and tree2 NEST
nodes,
as 2 synchronized
Node Sets referenced by the SE1 and SE2 variables.
3 strategies came to
my mind, for comparing the list of sub-nodes (birds) for the matched
nests:
- using the “longuest common subsequence” a.k.a. “LCS” to handle the case where some identical sub-nodes are present for a given nest … I realized that this might not be the most meaningful technique to use in the context of the tree based diff, as some intermediary nodes might play a role in the distribution resp. grouping of the sub-nodes, so LCS applied on lists of sub-nodes are not in every case adequate
- building xpath-like expressions representing the navigation path from matched objects down to their sub-nodes, and unify these paths to match sub-nodes of tree 1 and 2 ; this strategy will be investigated in a next article, along with an interesting way to structure the code for several diff strategies
- using a map (bird_id or bird_tag => bird node) to limit comparisons to nodes with the same “identity”
This map is based on
Prolog dynamic predicates…foundbird1 and foundbird2. Their key can
be either:
- an atom, such as ‘owli’,
- or an element e.g. element(bird, [name=woody, age='10'], [])
depending on the
xpath expression passed to tree_d2_map.
The map is actually
initialized by a side effect, using a backtracking expression
encapsulated by an ignore predicate, equivalent to a Prolog findall.
Another important
aspect with Prolog’s pattern matching is that one can very simply
compare sub-trees for equality, using e.g.:
which allows one to
use write/reuse very general solutions with “node as value”
comparisons.
Implementing the diff predicate
In a next article I
will demonstrate the use of SWI Prolog shift/reset
to implement a more generic diff based on a reusable code structure
that works with multiple Node Set match strategies.
Comments
Post a Comment