13.1 How Grammars are used in Ascolog Insight

In Ascolog Insight, a grammar (formal grammar) is used to describe the structure of log records. A grammar consists of nonterminal symbols, terminal symbols, production rules, and a start symbol. These elements can be modified in the Grammar Definition window. The table “Ascolog Insight's usage of a grammar” shows how the elements of a grammar are used in Ascolog Insight to identify the pieces of information in log records.

Ascolog Insight's usage of a grammar
Terminology | Explanation | Usage in Ascolog Insight
Terminal symbols

A terminal symbol is a symbol which cannot be replaced by other symbols.

Terminals in Ascolog Insight can be, for example, timestamps, numbers, alphanumeric characters, letters, or colons.

Ascolog Insight uses regular expressions to define the terminal symbols.

Alphabet

An alphabet is the set of all terminal symbols of a formal grammar.

Nonterminal symbols

A nonterminal symbol is replaced by other nonterminal or terminal symbols by applying a production rule.

Notation: Nonterminals are marked in this documentation by angle brackets: <nonterminal>

Example: The header of a log record can be represented by a nonterminal symbol. By applying production rules, an abstract header is transformed into a “real” header (a word in formal grammar terminology):

<Header>::=<Timestamp>;<Description> is transformed to

2013-12-24 10:23:23,000;A dummy description!

Production rule

A production rule describes how to transform a nonterminal symbol. The nonterminal symbol to be replaced is located on the left side of the character sequence “::=”. It is replaced by the sequence of symbols on the right side of “::=”. Both nonterminal and terminal symbols are allowed on the right side. On the left side of “::=” there is exactly one nonterminal symbol in the grammars supported by Ascolog Insight.

Example: <Severity> ::= HIGH|MEDIUM|LOW

Explanation: The nonterminal <Severity> can be replaced by one of the terminals HIGH, MEDIUM, or LOW.

Example:

<LogRecordHeader>::=<Timestamp>;<Severity>|<Timestamp>

Explanation: A header of a log record consists either of a timestamp (nonterminal) followed by a semicolon (terminal) and a severity level (nonterminal), or of a timestamp only.

Word or string

A word is a finite sequence of characters. A word is created by applying production rules, starting from the start symbol, until only terminal symbols of the alphabet remain.

The words in Ascolog Insight are the valid headers (the complete header, not only parts!) for the log records of a log type. Example: The severity INFO is not a word but a terminal!

Formal language

A formal language is the set of all strings which can be created by applying the production rules of the formal grammar.

In Ascolog Insight, the formal language is the set of all valid headers of logs belonging to a certain log type.

“|”

The vertical bar indicates a choice.

The following statement defines that a header consists of a <Level> OR the terminal “STOP”:

<Header>::=<Level>|STOP

Epsilon

Epsilon denotes the empty terminal. The empty terminal is used to make symbols optional.

The following statement makes the nonterminal <Level> optional:

<Header>::=<Level>|epsilon

Symbol repetitions

Repetitions must be defined by using recursion, i.e. the nonterminal symbol on the left side also occurs on the right side.

In the following statement, the nonterminal <Digit> may occur one or more times:

<DigitSequence>::=<Digit>|<Digit><DigitSequence>
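
The table entries for choice, epsilon, and recursion can be tried out with a small sketch. It is not part of Ascolog Insight; the dictionary layout, the function name words, the alternatives chosen for <Level>, and the two-digit alphabet for <Digit> are assumptions made for illustration only. An empty list stands for epsilon.

```python
from itertools import product

# Hypothetical in-memory form of a few rules from the table above.
# Keys are nonterminals; each alternative is a list of symbols.
# An empty list stands for epsilon (the empty terminal).
GRAMMAR = {
    "<Header>": [["<Level>"], []],                 # <Header>::=<Level>|epsilon
    "<Level>": [["HIGH"], ["MEDIUM"], ["LOW"]],    # assumed alternatives
    "<DigitSequence>": [["<Digit>"], ["<Digit>", "<DigitSequence>"]],
    "<Digit>": [["0"], ["1"]],                     # shortened alphabet (assumption)
}

def words(symbol, depth):
    """Return all words derivable from symbol with at most `depth` nested rule applications."""
    if symbol not in GRAMMAR:      # a terminal stands for itself
        return {symbol}
    if depth == 0:                 # bound the recursion so the result stays finite
        return set()
    result = set()
    for alternative in GRAMMAR[symbol]:
        parts = [words(s, depth - 1) for s in alternative]
        # Concatenate every combination of sub-words; an epsilon alternative
        # has no parts and therefore contributes the empty word.
        for combo in product(*parts):
            result.add("".join(combo))
    return result

print(sorted(words("<Header>", 2)))
print(sorted(words("<DigitSequence>", 3)))
```

With these assumed rules, <Header> yields the three levels plus the empty word, and the recursive <DigitSequence> rule yields digit sequences whose maximum length grows with the depth bound.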

When Ascolog Insight applies a grammar to a log file, it first performs a lexical analysis, i.e. it splits the content of the log file into meaningful pieces of information (tokens) based on the terminal definitions of the grammar. The next step is the syntactic analysis (parsing), where the tokens and the production rules of the grammar are used to create a parse tree. In this step, Ascolog Insight checks whether the log file is valid according to the grammar. If the log file is valid, the parse tree is traversed in order to assign the pieces of information of the log file to columns.
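
The three steps described above (lexical analysis, syntactic analysis, assignment to columns) can be sketched as follows for the rule <Header>::=<Timestamp>;<Description>. This is a minimal illustration, not Ascolog Insight's implementation: the regular expressions, the token names, and the functions tokenize, parse_header, and to_columns are assumptions.

```python
import re

# Terminal definitions as regular expressions (assumed patterns).
# Order matters: TIMESTAMP is tried before the more general TEXT.
TERMINALS = [
    ("TIMESTAMP", r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"),
    ("SEMICOLON", r";"),
    ("TEXT", r"[^;\r\n]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TERMINALS))

def tokenize(line):
    """Lexical analysis: split a line into (terminal name, text) tokens."""
    tokens, pos = [], 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise ValueError(f"no terminal matches at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

def parse_header(tokens):
    """Syntactic analysis for <Header>::=<Timestamp>;<Description>; returns a parse tree."""
    if [kind for kind, _ in tokens] != ["TIMESTAMP", "SEMICOLON", "TEXT"]:
        raise ValueError("line is not a valid header")
    return ("<Header>", [("<Timestamp>", tokens[0][1]),
                         (";", tokens[1][1]),
                         ("<Description>", tokens[2][1])])

def to_columns(tree):
    """Walk the parse tree and map each nonterminal child to a column."""
    _, children = tree
    return {name.strip("<>"): text for name, text in children if name.startswith("<")}

line = "2013-12-24 10:23:23,000;A dummy description!"
print(to_columns(parse_header(tokenize(line))))
```

For the sample header used earlier in this section, the sketch produces one column named Timestamp and one named Description.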

Ascolog Insight uses a context-free LL(1) grammar. Context-free means that a production rule does not consider the context of a nonterminal symbol, i.e. its preceding and succeeding symbols. Thus, on the left side of the character sequence “::=” there must be exactly one nonterminal symbol. LL(1) means that the production rule to be applied depends only on the next token in the log file that has not been processed yet. Ascolog Insight uses a notation similar to the Backus-Naur form to describe a grammar. The Backus-Naur form is a common notation for context-free grammars.
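
How a single token of lookahead selects the production rule can be illustrated with a table-driven parser. The sketch below is a generic textbook technique, not Ascolog Insight's code. It assumes a hypothetical grammar <Header>::=<Timestamp><Tail>, <Tail>::=;<Severity>|epsilon, and for brevity treats TIMESTAMP, SEMICOLON, and SEVERITY as token kinds already produced by the lexical analysis. The names TABLE, NONTERMINALS, and parse are illustrative.

```python
# Parse table: (nonterminal, kind of the next token) -> right side of the rule to apply.
TABLE = {
    ("<Header>", "TIMESTAMP"): ["TIMESTAMP", "<Tail>"],
    ("<Tail>", "SEMICOLON"): ["SEMICOLON", "SEVERITY"],
    ("<Tail>", "END"): [],  # epsilon: the tail is optional
}
NONTERMINALS = {"<Header>", "<Tail>"}

def parse(token_kinds):
    """Return the list of applied rules, or raise ValueError for invalid input."""
    tokens = list(token_kinds) + ["END"]   # END marks the end of the input
    stack = ["END", "<Header>"]            # start symbol on top of the stack
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]            # the only token the decision depends on
        if top in NONTERMINALS:
            rule = TABLE.get((top, lookahead))
            if rule is None:
                raise ValueError(f"unexpected {lookahead} while expanding {top}")
            applied.append((top, rule))
            stack.extend(reversed(rule))   # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1                       # terminal matches: consume the token
        else:
            raise ValueError(f"expected {top}, found {lookahead}")
    return applied

print(parse(["TIMESTAMP", "SEMICOLON", "SEVERITY"]))
print(parse(["TIMESTAMP"]))
```

Each table entry is keyed by exactly one nonterminal and one lookahead token, which is what the "1" in LL(1) refers to.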

In Ascolog Insight, a header must be located after a carriage return or at the beginning of the log. Once all log records are identified, the structure of the log is known. If no header is found at the beginning of a log, an error message is displayed. In many cases, regular expressions would also be sufficient to identify a header. However, regular expressions cannot be used to identify the individual parts of a header, which is needed to analyze the content of a header effectively.
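
The rule that a header starts at the beginning of the log or after a line break can be sketched with a regular expression that only locates record boundaries. The pattern HEADER_START and the function split_records are assumptions for illustration; the sketch recognizes line starts after a line feed (including the CR LF sequence), which is a simplification, and it does not break a header into its parts.

```python
import re

# Assumed header shape: a timestamp at the start of a line (hypothetical pattern).
HEADER_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}", re.MULTILINE)

def split_records(log_text):
    """Split a log into records; each record begins with a header at a line start."""
    starts = [m.start() for m in HEADER_START.finditer(log_text)]
    if not starts or starts[0] != 0:
        raise ValueError("no header found at the beginning of the log")
    ends = starts[1:] + [len(log_text)]
    # Lines without a header stay attached to the preceding record.
    return [log_text[s:e].rstrip("\r\n") for s, e in zip(starts, ends)]

log = (
    "2013-12-24 10:23:23,000;first\n"
    "  continuation line\n"
    "2013-12-24 10:23:24,000;second\n"
)
for record in split_records(log):
    print(repr(record))
```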