13.2.2.1 Ambiguous Regular Expressions

The regular expressions of the grammar's terminal symbols must not be ambiguous, i.e. there must not be two regular expressions in a grammar definition that match the same character sequence.

Examples of ambiguous regular expressions are given in the table Ambiguous regular expressions.

Ambiguous regular expressions
Regular expression                        Match
[A-Za-z0-9] and [0-9]                     Both expressions match a character sequence consisting only of digits, e.g. 123, 0333, 98.
[A-Za-z0-9] and ERROR | WARNING | INFO    Both expressions match the character sequences ERROR, WARNING and INFO. The overlap consists of exactly these three sequences.
[~(0-9)] and [A-Z]                        Both expressions match a character sequence consisting only of capital letters, e.g. ERROR, LOG, FOLDER, but not Line2 or A&O.

Ambiguous regular expressions lead to problems when Ascolog Insight parses a log. Ascolog Insight displays an error message when it encounters an ambiguous regular expression. If an ambiguity is detected, keep the regular expression that better describes the character sequences which can occur in the log file. Using a fairly general regular expression, e.g. [A-Za-z] instead of [A-Z], is not necessarily a disadvantage.
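The overlaps listed in the table can be reproduced with a small script. The following is a minimal sketch in Python, not a description of how Ascolog Insight detects ambiguity. The terminal names, the transcription of the patterns to Python's re syntax and the added + quantifier are assumptions made for illustration. The check only reports overlaps for the samples it is given, so it can demonstrate an ambiguity but cannot prove that none exists.

```python
import re

# Hypothetical terminal definitions, transcribed to Python's `re` syntax.
# The "+" quantifier is an assumption so that a pattern covers a whole sequence.
TERMINALS = {
    "numeric": r"[0-9]+",
    "alphanumeric": r"[A-Za-z0-9]+",
    "severity": r"ERROR|WARNING|INFO",
}


def matching_terminals(text, terminals=TERMINALS):
    """Return the names of all terminals whose pattern matches the whole text."""
    return [name for name, pattern in terminals.items()
            if re.fullmatch(pattern, text)]


def ambiguous_samples(samples, terminals=TERMINALS):
    """Map each sample matched by more than one terminal to those terminals."""
    result = {}
    for sample in samples:
        names = matching_terminals(sample, terminals)
        if len(names) > 1:
            result[sample] = names
    return result


if __name__ == "__main__":
    for sample, names in ambiguous_samples(["123", "ERROR", "Line2", "A&O"]).items():
        print(sample, "is matched by", ", ".join(names))
```

Samples such as Line2 are matched by a single terminal and A&O by none, so neither is reported.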

When Ascolog Insight analyzes a log, the first step is the lexical analysis. The goal of the lexical analysis is to assign character sequences of the log to terminals. If terminals with ambiguous regular expressions were allowed (e.g. [0-9] and [A-Za-z0-9]), the application could not decide which terminal better matches a character sequence, because Ascolog Insight does not know anything about the meaning of the information in the log. The log records in the example Two log records with a hexadecimal code after the timestamp illustrate the problem with ambiguous regular expressions.

2011-09-30 23:59:14 1234 Setup started.
2011-09-30 23:59:16 AB12 Setup did not complete.

Two log records with a hexadecimal code after the timestamp

The character sequence 1234 from the first log record is both numeric and alphanumeric. If terminals with ambiguous regular expressions are used (e.g. [0-9] and [A-Za-z0-9]), the application cannot decide which terminal better matches 1234, because Ascolog Insight does not know the meaning of the information (that 1234 is a hexadecimal code). The character sequence AB12 in the second log record is alphanumeric but not numeric.

To see the consequences, assume that Ascolog Insight identifies 1234 as a numeric terminal and AB12 as an alphanumeric terminal. To identify the headers of log records, Ascolog Insight uses production rules in the syntactic analysis, which follows the lexical analysis. Since 1234 was identified as a numeric terminal, the production rule <LogRecordType> ::= alphanumericTerminal does not match the first log record. This production rule must therefore be replaced by <LogRecordType> ::= alphanumericTerminal | numericTerminal; otherwise the header cannot be identified correctly. This illustrates that ambiguous regular expressions require more complex production rules.
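The effect on the production rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Ascolog Insight's implementation: the classify function, the tie-break "numeric wins over alphanumeric", the + quantifier and the way the code is taken from the third whitespace-separated field are all hypothetical.

```python
import re

# Hypothetical terminals in Python `re` syntax. The list order encodes the
# assumed tie-break: a sequence matching both patterns is classified as numeric.
TERMINALS = [
    ("numericTerminal", r"[0-9]+"),
    ("alphanumericTerminal", r"[A-Za-z0-9]+"),
]


def classify(token):
    """Return the first terminal whose pattern matches the whole token."""
    for name, pattern in TERMINALS:
        if re.fullmatch(pattern, token):
            return name
    return None


def log_record_type_matches(token, alternatives):
    """Check a token against the alternatives on the right-hand side of <LogRecordType>."""
    return classify(token) in alternatives


RECORDS = [
    "2011-09-30 23:59:14 1234 Setup started.",
    "2011-09-30 23:59:16 AB12 Setup did not complete.",
]
# The code is the third whitespace-separated field of each record.
codes = [record.split()[2] for record in RECORDS]

# <LogRecordType> ::= alphanumericTerminal
simple_rule = {"alphanumericTerminal"}
# <LogRecordType> ::= alphanumericTerminal | numericTerminal
extended_rule = {"alphanumericTerminal", "numericTerminal"}

simple_results = [log_record_type_matches(code, simple_rule) for code in codes]
extended_results = [log_record_type_matches(code, extended_rule) for code in codes]
```

With the simple rule, only AB12 is accepted as a log record type. With the extended rule, both 1234 and AB12 are accepted.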