Prolog DCG "distilled" - Part 1

Introduction


Here’s a new series of short articles that concentrate on one powerful tool within Prolog Programming, “Definite Clause Grammars”. These are covered in a number of tutorials on the web (refer to the information in the appendix). 

However I found a step by step approach, "distilled" with tests, to be the most appropriate learning path, as the topic might otherwise be a bit daunting for newcomers to comprehend.

The context where I personally used the technology is for parsing application server logs and parsing programming languages such as Java.

A number of problems arise when parsing text, such as:
  • the ability to reuse generic structures such as lists and options to structure grammars
  • the ease of capturing the elements recognized by the parser, and maintaining state across several rules (productions)
  • the handling of ambiguous contexts when looking up text tokens (“lookahead”)
  • the handling of rules with left recursion
The effectiveness of the Prolog DCG comes from offering a great deal of control over such aspects.

Forthcoming articles

  • Building your own vocabulary to structure grammars, covered in this article and the next.
  • Capturing and assembling textual elements from rules.
  • Handling conflicting rules with lookahead information.
  • Handling left recursion.
  • Parsing Xml Files
  • Parsing Yaml Files
  • Parsing Java 8 Code

Defining reusable building blocks for grammars

The code and tests for this article may be found [here]

The vocabulary defines 4 reusable DCG rules, which may be combined with other rules as follows:


hat(H) --> {H='hat'}. % terminal (atom)
hut(H) --> {H='hut'}. % terminal (atom)
chapeau(H) --> (      % grammar production
    hat(H)
  | hut(H)
  ).
chapeaux(Cs) --> list_of(chapeau, Cs).      % list 0.. chapeaux
chap_min(Cs) --> list_min1(chapeau, Cs).    % list 1.. chapeaux
chap_sep(Cs) --> list_sep(chapeau, ',', Cs).% list 1.. chapeaux with separator
chap_opt(C)  --> maybe(chapeau, C).         % optional chapeau

One of the tests may be run in SWI-Prolog with those steps:

              ?- consult('/path/to/dcg_vocabulary.pl').
              true.
            ?- test_list_of_2(Cs).
            Cs = [hat, hut];
            false.

This technique of combining DCGs to build a common vocabulary (list_of, list_sep, etc... ) is an important step to structure more complex grammars for any substantial parsing work.

Defining reusable building blocks for parsing strings

Similar to the previous rules designed to recognize lists of atoms such as ‘hat’, ‘hut’ etc…, one can define the following 4 rules (conc_of, conc_min1, conc_sep, maybe_str) to recognize concatenation of strings (in SWI-Prolog strings can be converted to lists of character codes thanks to the string_codes/2 predicate).

The conc_sep rule presented below is slightly more generic than list_sep, as it accepts a reference to a rule defining which sequence of characters constitutes a separator:

bird1(B)  --> "cuckoo", { string_codes("cuckoo", B) }.    % codes are [99, 117, 99, 107, 111, 111]
bird2(B)  --> "sparrow", { string_codes("sparrow", B) }. % codes are [115…]
birdie(B) --> ( bird1(B) | bird2(B) ).                                 % grammar production
wspace(W) --> {W=32}.                                                  % code is 32

% separator is either the comma or a sequence of whitespaces
separ(C)  --> (
    list_min1(wspace, C)
  | ",",    {string_codes(",", C)}
  ).
birdies(Bs) --> conc_sep(birdie, separ, Bs). % a list (concatenation) of birds, with a separator
bird_opt(B) --> maybe_str(birdie, B).              % a bird or nothing

In the next article we'll see in detail how this works, with the help of the SWI-Prolog debugger.

Additional Reading

The predicates presented here are analogous to those offered by this SWI Prolog library: dcg/high_order

This article covers EBNF grammar syntax and support with library dcg4pt:(PDF)

Appendix : Introductory articles on DCG




Comments

Popular posts from this blog

A possible solution for using the janus interface on MacOS with docker

Meaningful Xml Diffs with SWI-Prolog

Prolog DCG "distilled" - Part 3 - Capturing and assembling textual elements from rules