Powerful Xml Analysis with Prolog

Introduction

SWI-Prolog provides a Prolog-friendly version of Xpath, in which Xpath expressions are written as Prolog terms.

The actual API is documented in the SWI-Prolog manual, under library(xpath).

Anyone used to standard Xpath expressions (e.g. with XSLT) faces a few hurdles when transposing them into the Prolog flavor.

Once these differences from the standard have been absorbed, one quickly appreciates the flexibility these Prolog xpath predicates offer for complex Xml analysis.

Key to this is the ability to unify intermediate Xml nodes with Prolog variables and to reuse them as the current (local) context node, the node from which an Xpath “Axis” is followed.
Another Xpath concept, the NodeSet, is also covered elegantly: its members are simply enumerated as successive solutions on backtracking.
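
As a minimal sketch of these two ideas (not the code of this article), the block below parses a small document from an in-memory string. The document content and the predicates sample_dom/1 and bird_on_branch/2 are hypothetical names made up for illustration. Branch is bound by the first xpath/3 call and then reused as the context of the following calls, and the members of each node set arrive one at a time on backtracking.

```prolog
:- use_module(library(sgml)).   % load_structure/3
:- use_module(library(xpath)).  % xpath/3

% Hypothetical stand-in document (not the article's actual Xml source).
sample_dom(DOM) :-
    atomic_list_concat(
        [ "<tree>",
          "<branch height='310'>",
          "<bird type='owl' age='2'/>",
          "<bird type='cuckoo' age='6'/>",
          "</branch>",
          "<branch height='520'>",
          "<bird type='sparrow' age='12'/>",
          "</branch>",
          "</tree>"
        ], Text),
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM, [dialect(xml)]),
        close(In)).

% Branch is an intermediate node: once bound by the first xpath/3 call,
% it is the context node of the two calls that follow.
bird_on_branch(Height, Type) :-
    sample_dom(DOM),
    xpath(DOM, //branch, Branch),
    xpath(Branch, /self(@height), Height),
    xpath(Branch, bird(@type), Type).
```

A plain query `?- bird_on_branch(H, T).` yields one Height/Type pair per solution, while `findall(H-T, bird_on_branch(H, T), Pairs)` collects the whole set in one list.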

I would now like to showcase this with a small example, using a fictitious tree description and some imaginary information one would like to extract from it.


Resources for this article

Prolog source code and XML examples are available on [github], and are linked to this article.

To improve the readability of the predicates’ output, some Xml elements below are abbreviated with “…”.

Below is the actual Xml source for my example, reproduced.

Getting the GPS position of the tree

% first a broken example…
false.

% …which is then resolved by:
V = '46.8-7.15' ;
false.

In the first attempt, gps-position does not qualify as a Prolog atom: because of the “-” character, Prolog reads it as the compound term gps - position. One can quickly check this fact with:

?- atom(gps-position).
false.

?- atom('gps-position').
true.

The same rule applies to Xml node names that start with an uppercase letter, which Prolog would otherwise read as variables. They have to be single-quoted in the Prolog expression, otherwise the xpath predicate will silently return “false”.
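
Both quoting rules can be checked independently of any Xml file. The following is only a sketch: the helper read_as/2, the predicates gps_position/1 and nest_type/1, and the tiny document are hypothetical and are not the article's own definitions. read_as/2 reports how Prolog reads a name given as text, and the two queries use the quoted forms.

```prolog
:- use_module(library(sgml)).   % load_structure/3
:- use_module(library(xpath)).  % xpath/3

% How Prolog reads a name written as text: variable, atom or compound.
read_as(Text, Kind) :-
    term_string(Term, Text),
    (   var(Term)      -> Kind = variable
    ;   atom(Term)     -> Kind = atom
    ;   compound(Term) -> Kind = compound
    ;   Kind = other
    ).

% Hypothetical stand-in document (not the article's actual Xml source).
sample_dom(DOM) :-
    Text = "<tree><gps-position>46.8-7.15</gps-position><NEST type='owl-nest'/></tree>",
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM, [dialect(xml)]),
        close(In)).

% Quoted element names: a hyphenated name and an uppercase name.
gps_position(V) :-
    sample_dom(DOM),
    xpath(DOM, //'gps-position'(text), V).

nest_type(T) :-
    sample_dom(DOM),
    xpath(DOM, //'NEST'(@type), T).
```

For instance, `read_as("gps-position", K)` gives K = compound and `read_as("NEST", K)` gives K = variable, whereas the quoted texts "'gps-position'" and "'NEST'" are both read as atoms.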

Getting all the birds in the tree

?- get_birds(Bird).
Bird = element(bird, [type=owl, age='2'], []) ;
Bird = element(bird, [type=cuckoo, age='6'], []) ;
Bird = element(bird, [type=owl, age='36'], []) ;
Bird = element(bird, [type=sparrow, age='12'], []) ;
Bird = element(bird, [type=sparrow, age='28'], []) ;
false.

Unifying on intermediate nodes makes it natural to refine extractions

?- get_cuckoos(C).
C = element(bird, [type=cuckoo, age='6'], []) ;
false.
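
One possible shape for such a pair of predicates is sketched below. The names get_birds/1, get_cuckoos/1 and load_tree/1 are taken from this article, but their bodies here are assumptions, and the document is a hypothetical stand-in. The point is that get_cuckoos/1 does not repeat the search: it reuses get_birds/1 and filters the intermediate Bird node with a second xpath/3 call.

```prolog
:- use_module(library(sgml)).   % load_structure/3
:- use_module(library(xpath)).  % xpath/3

% Hypothetical loader with a stand-in document (not the article's source).
load_tree(DOM) :-
    atomic_list_concat(
        [ "<tree>",
          "<branch height='310'><NEST type='owl-nest'>",
          "<bird type='owl' age='2'/><bird type='cuckoo' age='6'/>",
          "</NEST></branch>",
          "<branch height='520'><NEST type='sparrow-nest'>",
          "<bird type='sparrow' age='12'/>",
          "</NEST></branch>",
          "</tree>"
        ], Text),
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM, [dialect(xml)]),
        close(In)).

% Every bird element, anywhere in the tree, one per solution.
get_birds(Bird) :-
    load_tree(DOM),
    xpath(DOM, //bird, Bird).

% Refinement: keep only the birds whose type attribute is cuckoo.
get_cuckoos(Bird) :-
    get_birds(Bird),
    xpath(Bird, /self(@type), cuckoo).
```

Further refinements can be stacked the same way, for example by adding another xpath/3 goal on Bird that constrains its age attribute.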

Where are the cuckoos? (bottom-up search)

A slight digression
In my work, XML configuration files often have to be processed in a “bottom-up” fashion: matches on inner nodes (e.g. a property of a Java EE web application, such as a servlet mapping or filter) dictate which upper nodes (e.g. the web-app node describing the web application) have to be considered for maintenance or comparison.
The example with cuckoos and branches/nests is in that sense a paraphrase of this process.

In the output below, each call to load_tree is now made visible by a message.

?- nests_with_cuckoos(B, N).
Loading Xml tree.
Loading Xml tree.
Loading Xml tree.
B = element(branch, [height='310'], ['\n ', element(orientation, [], [north]), …,
N = element('NEST', [type='owl-nest'], ['\n ', element(bird, [type=owl, age='2']…
Loading Xml tree.

However, one finds that without further “cutting”, the load_tree predicate is called several times on backtracking.
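
To make the repeated loading concrete, here is one way such a bottom-up query could be assembled. This is a sketch under assumptions: the bodies of load_tree/1, get_cuckoos/1 and nests_with_cuckoos/2, the helper get_branches/1 and the document are all hypothetical. Because get_cuckoos/1 and get_branches/1 each call load_tree/1, the loading message appears again whenever one of them is retried.

```prolog
:- use_module(library(sgml)).   % load_structure/3
:- use_module(library(xpath)).  % xpath/3

% Hypothetical loader with a stand-in document (not the article's source).
load_tree(DOM) :-
    format(user_error, "Loading Xml tree.~n", []),
    atomic_list_concat(
        [ "<tree>",
          "<branch height='310'><orientation>north</orientation>",
          "<NEST type='owl-nest'>",
          "<bird type='owl' age='2'/><bird type='cuckoo' age='6'/>",
          "</NEST></branch>",
          "<branch height='520'><orientation>south</orientation>",
          "<NEST type='sparrow-nest'><bird type='sparrow' age='12'/></NEST>",
          "</branch>",
          "</tree>"
        ], Text),
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM, [dialect(xml)]),
        close(In)).

get_cuckoos(Bird) :-
    load_tree(DOM),
    xpath(DOM, //bird(@type=cuckoo), Bird).

get_branches(Branch) :-
    load_tree(DOM),
    xpath(DOM, //branch, Branch).

% Bottom-up: start from the inner match (a cuckoo), then keep the
% nest and branch whose content unifies with that bird element.
nests_with_cuckoos(Branch, Nest) :-
    get_cuckoos(Cuckoo),
    get_branches(Branch),
    xpath(Branch, 'NEST', Nest),
    xpath(Nest, bird, Cuckoo).
```

Note that the last goal matches by unification, so two birds with identical attributes in different nests would both match; this is a limitation of the sketch rather than of the approach.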

Introducing Tabling

Adding the following directives at the beginning of our source resolves the previous problem:
:- use_module(library(tabling)).
:- table load_tree/1.
The result of load_tree is then memoized once and for all by tabling.

?- nests_with_cuckoos(B, N).
Loading Xml tree.
B = element(branch, [height='310'], ['\n ', element(orientation, [], [north]), …,
N = element('NEST', [type='owl-nest'], ['\n ', element(bird, [type=owl, age='2']…;
false.
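
The effect of the table directive can also be checked in isolation. The sketch below assumes a recent SWI-Prolog in which the table directive is available without an explicit import (the source of this article loads library(tabling)); the document, the tree_loads flag, loads_after_two_calls/1 and this load_tree/1 body are hypothetical.

```prolog
:- use_module(library(sgml)).   % load_structure/3

:- table load_tree/1.           % memoize the answers of load_tree/1

% Hypothetical loader: parses a stand-in document from a string and
% counts, in the flag tree_loads, how often its body really runs.
load_tree(DOM) :-
    flag(tree_loads, N, N+1),
    Text = "<tree><branch height='310'/></tree>",
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM, [dialect(xml)]),
        close(In)).

% Calls load_tree/1 twice and reports how many real loads happened.
loads_after_two_calls(Count) :-
    load_tree(_),
    load_tree(_),
    flag(tree_loads, Count, Count).
```

With the table directive in place the second call is answered from the table, so the counter stays at 1; removing the directive makes the body run on every call.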

Conclusion

The example predicates demonstrated here are just 2 or 3 lines long, since Prolog unification makes it possible to reuse predicates and combine them for:
  • NodeSet selection and incremental filtering
  • expressing complex relationships between NodeSets
… in a way that really feels natural.

In an upcoming article I’d like to build on this knowledge and present various strategies for implementing meaningful “diffs” between several Xml trees in Prolog.
