Nina: automata description language



Automaton describing part

This part describes the body of the automaton.

States

A state is drawn as a rectangle made of characters. You can write freely inside the rectangle. (Some parsing engines may parse this inner region.) The character at the upper-left corner is recognized as a metacharacter.

  1. A normal state which has an uppercase label
     *LABEL****
     *        *
     **********
    
  2. A normal state which has a label; the label name must be enclosed in single quotes (')
     *'a label'****
     *            *
     **************
    
  3. A normal state which has an uppercase label and the type of the state's attribute
     *LABEL<String>**
     *              *
     ****************
    
  4. A normal state which has no label
     **************
     *            *
     **************
    
  5. An accept state which has an uppercase label
     @LABEL@@@@
     @        @
     @@@@@@@@@@
    
  6. The initial state; the automaton's transitions begin from this state. The initial state must have the label "S". Every automaton must have one initial state.
     *S********** ===========
     *new style * =old style=
     ************ ===========
    
  7. The initial and accept state
     @S@@@@@@@@@@ &&&&&&&&&&&
     @new style @ &old style&
     @@@@@@@@@@@@ &&&&&&&&&&&
    
  8. The dead state, which has the label "D".
    If the automaton's engine finds no state to move to and a dead state is specified, the engine tries to move to the dead state.
     *D**********
     *          *
     ************
    
  9. Exception handling (Java, C#). A state whose name ends with 'Exception' captures the exception that corresponds to the label name.
     *'ArithmeticException'*
     *                     *
     ***********************
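
The way the top line of a state encodes its kind, label and attribute type can be illustrated with a small Python sketch. This is not part of Nina: the function name parse_state_header and the keys of the returned dictionary are hypothetical, the label pattern is a simplification (no escape sequences inside quotes), and reading '=' and '&' as the old-style initial markers is inferred from the diagrams in items 6 and 7.

```python
import re

# Simplified header grammar: an optional label (quoted or uppercase),
# optionally followed by an attribute type in angle brackets.
_HEADER = re.compile(
    r"(?:'(?P<quoted>[^']*)'|(?P<plain>[A-Z][A-Z0-9_]*))?"
    r"(?:<(?P<type>[^>]+)>)?"
)

def parse_state_header(line):
    """Classify a state from the top line of its rectangle."""
    meta = line[0]                   # the upper-left metacharacter
    body = line[1:].rstrip(meta)     # drop the rest of the border
    m = _HEADER.fullmatch(body)
    if m is None:
        raise ValueError("unrecognized state header: " + line)
    label = m.group("quoted")
    if label is None:
        label = m.group("plain")
    return {
        "accept": meta in "@&",
        "initial": label == "S" or meta in "=&",
        "dead": label == "D",
        "label": label,
        "type": m.group("type"),
    }
```

For example, parse_state_header("@S@@@@@@@@@@") reports a state that is both initial and accepting, and a header made only of '*' characters has no label.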
    

Edges

Edges are drawn with the characters '-', '|', and '+'.
An edge can have a transition input. If an edge has no input, the transition is an empty (ε) transition.
Character  Feature
-          moves left or right
|          moves up or down
+          branches the edge
(*) Edges can only branch, not join. The following notation is illegal.
****
*  >-a-\
****   |   ****
       +--->  *
****   |   ****
*  >-b-/
****
    
'...' specifies a label as a string enclosed in single quotes.
Inside the single quotes, the escape sequences \n, \r, \t, \\, and \' are available.
If the label is a string, the Nina translator builds an automaton fragment that recognizes the string.
That is, the following automaton fragment
      ****         ****
      *  >--'abc'-->  *
      ****         ****
    
is equivalent to the following automaton fragment.
      ****     ****     ****     ****
      *  >--a-->  >--b-->  >--c-->  *
      ****     ****     ****     ****
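
The expansion of a string label into one-character transitions can be sketched in Python as follows. This only illustrates the equivalence described here and is not Nina's implementation; the function name expand_string_label and the intermediate state names q1, q2, ... are hypothetical.

```python
def expand_string_label(start, label, end):
    """Expand one edge labeled with a string into a chain of one-character edges.

    Returns a list of (from_state, character, to_state) tuples.
    """
    if not label:
        raise ValueError("label must not be empty")
    # Hypothetical names for the states inserted between start and end.
    middle = ["q%d" % i for i in range(1, len(label))]
    states = [start] + middle + [end]
    return [(states[i], ch, states[i + 1]) for i, ch in enumerate(label)]
```

Calling expand_string_label("A", "abc", "B") yields three edges, one for each of 'a', 'b' and 'c', with two intermediate states between A and B.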
    
"..." specifies a label by a string which enclosed by "".
If the builder is DFABuilder and the input alphabet is char, the engine tries to match every character of the string and backtracks over the input it has read if it cannot match all of them. That is, the following automaton fragment
      ****         ****
      *  >--"abc"-->  *
      ****         ****
    
matches only the input "abc". If the input is "abd", the input "ab" that was already read is backtracked.
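The all-or-nothing behavior of a double-quoted label can be sketched in Python as follows. This is a minimal illustration under the assumption that the input is a string and the read position is an index; the function name match_literal is hypothetical and the sketch is not the code Nina generates.

```python
def match_literal(text, pos, literal):
    """Try to match literal at text[pos:].

    Returns (True, new_position) on success. On failure returns
    (False, pos): everything read so far is given back (backtracked).
    """
    i = pos
    for ch in literal:
        if i >= len(text) or text[i] != ch:
            return False, pos   # backtrack to where matching started
        i += 1
    return True, i
```

With the input "abd" and the literal "abc", the sketch reads 'a' and 'b', fails on 'd', and returns the original position 0.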
".../lookahead" works the same as "...", but if the builder is DFABuilder and the input alphabet is char, the input corresponding to lookahead is unread.
".../`lookahead-regexp`" works the same as "...", but if the builder is DFABuilder and the input alphabet is char, it accepts if the end of the input is matched by lookahead-regexp, and the input corresponding to lookahead-regexp is unread.
".../!lookahead" works the same as "...", but if the builder is DFABuilder and the input alphabet is char, it accepts if the end of the input is NOT matched by lookahead, and the input corresponding to lookahead is unread.
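One way to read the plain and negated lookahead forms is sketched below in Python. This is an interpretation for illustration only, not Nina's implementation: the function name match_with_lookahead and its negate parameter are hypothetical, and the regexp form of lookahead is not covered.

```python
def match_with_lookahead(text, pos, literal, lookahead, negate=False):
    """Match literal at pos, then test what follows against lookahead.

    The lookahead characters are only inspected, never consumed, which
    corresponds to them being "unread". With negate=True the match
    succeeds only when lookahead does NOT follow the literal.
    """
    end = pos + len(literal)
    if text[pos:end] != literal:
        return False, pos
    follows = text.startswith(lookahead, end)
    if follows == negate:
        return False, pos
    return True, end   # position stops before the lookahead text
```

For the input "abc;", matching "abc" with lookahead ";" succeeds and leaves the position at 3, so the ";" is still available to the next transition.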
[...] specifies a character set. The following metacharacters are available in a character set.
  a character (which is not a metacharacter) represents the character itself
  \xnn represents the character whose code is nn (hexadecimal)
  \unnnn represents the character whose Unicode code is nnnn
  character1-character2 represents a range of characters
  \d, \D represent an ASCII digit and a character that is not an ASCII digit, respectively
  \w, \W represent an ASCII word character and a character that is not an ASCII word character, respectively
  \s, \S represent an ASCII space and a character that is not an ASCII space, respectively
  \p{category}, \P{category} represent a character that is matched (\p) or not matched (\P) by the Unicode category
    The category names are the same as in Java regular expressions.
  character/rangecharacter/range (items written one after another) represents the union of these characters/ranges
  character/range&character/range represents the intersection of these characters/ranges. Its precedence is lower than that of union.
  [character/range] groups the character set
  [^character/range] represents the complement of the character set
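The set operations above, including the rule that intersection binds less tightly than union, can be modeled with Python sets. This is only a sketch of the semantics: the helper names char_range and complement are hypothetical, the universe is restricted to 7-bit ASCII as a simplifying assumption, and the example set a-cx-z&b-y is made up for illustration.

```python
def char_range(first, last):
    """All characters from first to last, inclusive (like first-last)."""
    return {chr(code) for code in range(ord(first), ord(last) + 1)}

# Assumption for illustration only: the alphabet is 7-bit ASCII.
UNIVERSE = {chr(code) for code in range(128)}

def complement(chars):
    """The characters of the universe that are not in chars (like [^...])."""
    return UNIVERSE - chars

# a-cx-z&b-y: union binds tighter than intersection, so the set reads as
# (a-c union x-z) intersected with b-y. Python's own operator precedence
# is the other way around, hence the explicit parentheses.
result = (char_range("a", "c") | char_range("x", "z")) & char_range("b", "y")
```

Here result contains exactly the characters b, c, x and y.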
{...} invokes the sub-automaton whose name is enclosed in {}.
Sub-automata can be invoked recursively, so you can describe a grammar corresponding to LL(1).
(...) moves to the labeled terminal whose name is enclosed in ().
(*) Terminal names must be one-to-one. The following description is not allowed.
****           ****
*  >-a(1) (1)-->  *
****           ****

               ****
          (1)-->  *
               ****
    
You can specify the name of a state as a label name.
If you use the name of a state as the name, the name may be one-to-many.
label/'...' specifies an action to perform when the transition on the label is taken
/regex/ specifies a label that matches by a regular expression (GNFA).
The following metacharacters are available.
  a character (which is not a metacharacter) represents the character itself
  \xnn represents the character whose code is nn (hexadecimal)
  \unnnn represents the character whose Unicode code is nnnn
  \d, \D represent an ASCII digit and a character that is not an ASCII digit, respectively
  \w, \W represent an ASCII word character and a character that is not an ASCII word character, respectively
  \s, \S represent an ASCII space and a character that is not an ASCII space, respectively
  \p{category}, \P{category} represent a character that is matched (\p) or not matched (\P) by the Unicode category
    The category names are the same as in Java regular expressions.
  [range] represents a character set
  regexregex represents concatenation of regexes
  regex|regex represents alternation of regexes
  regex* represents repetition with 0 or more repeats
  regex+ represents repetition with 1 or more repeats
  regex? represents repetition with 0 or 1 repeats
  regex{n,m} represents repetition with a minimum of n repeats and a maximum of m repeats
  regex{,m} represents repetition with a maximum of m repeats
  regex{n,} represents repetition with a minimum of n repeats
  (regex) groups the regex; the engine does not capture

    The matched result is captured in $buffer, whose type is StringBuffer (Java), StringBuilder (C#), or string (JavaScript). The following prefixes are available.
    ?: does not capture in $buffer
    ?> matches "atomically"
    ?= matches as lookahead
  ${name} refers to the label defined by #label or #define
  $ matches the end of input
  . When the builder is NFABuilder, this matches any character.
    When the builder is DFABuilder, this matches the other characters.
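The repetition bounds can be checked with Python's re module, which happens to use the same quantifier notation. This is an analogy only: re is a different engine from Nina's, the helper name matches is hypothetical, and Nina-specific features such as $buffer, ${name} and the builder-dependent '.' are not modeled.

```python
import re

def matches(pattern, text):
    """True if the whole text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None

# regex*  : 0 or more repeats, so "a" alone is accepted by ab*
star_accepts_zero = matches("ab*", "a")
# regex+  : 1 or more repeats, so "a" alone is rejected by ab+
plus_rejects_zero = not matches("ab+", "a")
# regex{,m}: at most m repeats, including none at all
upper_bound_only = matches("a{,2}", "") and not matches("a{,2}", "aaa")
# regex{n,}: at least n repeats
lower_bound_only = matches("a{2,}", "aaaa") and not matches("a{2,}", "a")
```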
/ turns the direction. You should use '+' instead of this.
  before  after
  up      right
  right   up
  down    left
  left    down
\ turns the direction. You should use '+' instead of this.
  before  after
  up      left
  right   down
  down    right
  left    up
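The two turn tables can be written as lookup tables, as in the following Python sketch. The names TURNS and turn are hypothetical and the direction names are plain strings chosen for illustration; the entries themselves are taken from the tables above.

```python
# For each corner character: direction before the corner -> direction after it.
TURNS = {
    "/": {"up": "right", "right": "up", "down": "left", "left": "down"},
    "\\": {"up": "left", "right": "down", "down": "right", "left": "up"},
}

def turn(corner, direction):
    """Return the direction an edge travels in after passing a corner."""
    return TURNS[corner][direction]
```

For example, an edge moving right that meets '\' continues downward, and one that meets '/' continues upward.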

Connecting a state and an edge

Connectors are drawn with the characters '^', '>', '<', and 'v'.
Character  Feature
^          An edge starts from this state when the connector is on the upper side of the state.
           An edge is connected to this state when the connector is on the lower side of the state.
v          An edge starts from this state when the connector is on the lower side of the state.
           An edge is connected to this state when the connector is on the upper side of the state.
<          An edge starts from this state when the connector is on the left side of the state.
           An edge is connected to this state when the connector is on the right side of the state.
>          An edge starts from this state when the connector is on the right side of the state.
           An edge is connected to this state when the connector is on the left side of the state.
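The connector rules can be summarized as follows: a connector points in the direction the edge travels, so it marks an outgoing edge on the side it points away from and an incoming edge on the opposite side. The Python sketch below encodes this reading; it is an interpretation of the table rather than Nina's code, and the names LEAVING_SIDE, OPPOSITE and connector_role are hypothetical.

```python
# Side of the state on which each connector marks an edge leaving the state.
LEAVING_SIDE = {"^": "upper", "v": "lower", "<": "left", ">": "right"}
OPPOSITE = {"upper": "lower", "lower": "upper", "left": "right", "right": "left"}

def connector_role(connector, side):
    """Return "leaves", "enters", or None for a connector on a given side."""
    leaving = LEAVING_SIDE[connector]
    if side == leaving:
        return "leaves"
    if side == OPPOSITE[leaving]:
        return "enters"
    return None   # the table does not define this combination
```

In the fragments shown earlier, the '>' on the right side of the first state marks an edge that leaves it, and the '>' on the left side of the second state marks the edge that enters it.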


Yuichiro Moriguchi
yuichiro-moriguchi＠nifty.com