Camelot

(c) Lada 'Ray' Lostak
(c) Orcave inc.
(c) 2003

Content: Camelot - brief documentation

History:
03/04/2003 Ray - initial version 1.0.


Grammar file syntax
-------------------

For parsing, Camelot uses 'rules' collected into a single file (optionally
several files). This file can be used to generate a valid FC component that
'parses' the source, or it can be executed directly. Direct execution is
slower than generated code, but it is useful for debugging or for runtime
parsing (where the rules are generated in some way at run time).

The parser (generated or interpreted) uses a 'destiny' object to store the
input data. The functions of this destiny object depend entirely on the
user. Generated code calls these functions directly; the interpreted version
uses callbacks. Their order, parameters and other details are up to the
creator. The destiny object and the given grammar file work in conjunction.

The grammar itself recognizes 2 modes:

1. char stream
   The source stream is processed char by char, without any knowledge of
   tokens. This mode is useful for source types where code generates other
   code (the source can't be parsed before it is executed). A typical use of
   char stream mode is a preprocessor.

2. token stream
   Before processing, the source is converted to a token stream, which makes
   parsing significantly faster. This mode is useful for static sources,
   i.e. sources which are not changed during execution itself.


Grammar file itself
-------------------

The syntax of the grammar file is slightly different for token and char
modes. Differences are marked.

The file supports standard C-style comments, both short (//) and long
(/* */). Everything between /* and */, or after // up to the end of the
line, is treated as a comment and is not processed in any way.

The file consists of 2 parts:

    grammar-properties
    %%
    rules

The two sections are separated by %% followed by a new line. Rules can
appear only in the rules section, just as grammar properties can appear only
in their own section.
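The comment rules and the %% split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Camelot's own loader: the function names are hypothetical, the separator is taken to be a line containing only %% (surrounding blanks ignored), double-quoted message texts are protected from comment stripping, and quotes or comment markers inside match definitions are not handled.

```python
def strip_comments(text: str) -> str:
    """Remove // line comments and /* */ block comments.

    Assumption: text inside double quotes (e.g. %msg texts) is kept as-is.
    Match definitions are NOT protected in this sketch.
    """
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        c = text[i]
        if in_string:
            out.append(c)
            if c == "\\" and i + 1 < n:      # keep an escaped char verbatim
                out.append(text[i + 1])
                i += 2
                continue
            if c == '"':
                in_string = False
            i += 1
        elif c == '"':
            in_string = True
            out.append(c)
            i += 1
        elif text.startswith("//", i):       # short comment: skip to end of line
            end = text.find("\n", i)
            i = n if end == -1 else end      # the newline itself is kept
        elif text.startswith("/*", i):       # long comment: skip to */
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            out.append("\n" * text.count("\n", i, end))  # keep line count
            i = end + 2
        else:
            out.append(c)
            i += 1
    return "".join(out)


def split_grammar(text: str) -> tuple[str, str]:
    """Split a grammar file into (properties, rules) at the %% line."""
    lines = strip_comments(text).splitlines()
    for idx, line in enumerate(lines):
        if line.strip() == "%%":
            return "\n".join(lines[:idx]), "\n".join(lines[idx + 1:])
    raise ValueError("missing %% separator line")
```

Because only a line consisting of %% alone is treated as the separator, a %% escape inside a message text such as "100%% sure" does not split the file.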
grammar-properties
------------------

Grammar properties hold various kinds of information, such as error message
texts, token definitions, regular expression matches and others. Every
grammar property starts with the % character, which must be the first
non-blank character on the line. The name of the property follows
immediately after the % char. Some properties also take additional
parameters.


messages
--------

    %msg type identifier "text"

The message instruction defines a 'message': its facility (type),
identifier and text. The facility is one of the following keywords:

    fatal   - if any rule raises this message, Camelot stops parsing
    error   - after raising this message, Camelot starts processing the
              raised exception
    warning - similar to error

The identifier is any combination of letters and numbers. Rules raise or
show messages by this identifier.

The text is the 'body' of the message. It can reference parameters from the
'stack' by referring to %1 ... %9. If you want a single % character, you
have to write %% instead. \ is processed as the escape character, as usual.


regular-expression-matches
--------------------------

    %[NAME] regular_expression

The regular expression is a standard one: you can use character sets [],
the repeats ? * + and boolean or |. There is one big difference from normal
regular expressions: the expression is kept SIMPLE so that it can be matched
fast. This means () are not supported, and when repeating, the regular
expression always takes the LONGEST match. | can't be used in conjunction
with repeating.

Regular expressions start with / and also end with /, for example:

    /[a-zA-Z_][a-zA-Z0-9_]*/

If the definition doesn't start with / but with ANY other character, it is
taken as an 'exact' match, and all characters up to the end of the line
(including blanks!) are matched as-is. Matches can call other matches from
their bodies. Regular expression names should be written in UPPERCASE.
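To make these matching rules concrete, here is a small Python sketch of such a simplified matcher. It is hypothetical, not Camelot's implementation. It assumes a repeat is possessive, i.e. it grabs the longest run and never gives characters back (one reading of "always the LONGEST match"); it also assumes \ escapes the next character, and it leaves out | and calls to other matches entirely.

```python
def compile_pattern(body):
    """Turn a pattern body (without slashes) into (charset, repeat) items."""
    items = []
    i = 0
    while i < len(body):
        c = body[i]
        if c == "[":
            end = body.index("]", i + 1)          # ValueError if ] is missing
            spec, chars, j = body[i + 1:end], set(), 0
            while j < len(spec):
                if j + 2 < len(spec) and spec[j + 1] == "-":   # range like a-z
                    chars.update(chr(k) for k in range(ord(spec[j]), ord(spec[j + 2]) + 1))
                    j += 3
                else:
                    chars.add(spec[j])
                    j += 1
            i = end + 1
        elif c == "\\" and i + 1 < len(body):      # assumed: \x is a literal x
            chars = {body[i + 1]}
            i += 2
        else:
            chars = {c}
            i += 1
        repeat = ""
        if i < len(body) and body[i] in "?*+":
            repeat = body[i]
            i += 1
        items.append((frozenset(chars), repeat))
    return items


def compile_match(definition):
    """/body/ is a regular expression; anything else is an exact match."""
    if len(definition) >= 2 and definition.startswith("/") and definition.endswith("/"):
        return compile_pattern(definition[1:-1])
    return [(frozenset(ch), "") for ch in definition]   # blanks included


def match_prefix(items, text, pos=0):
    """Return the length matched at text[pos:], or -1 if there is no match.

    Each repeat consumes as many characters as it can and never gives any
    back, so one left-to-right scan decides the result.
    """
    start = pos
    for chars, repeat in items:
        limit = 1 if repeat in ("", "?") else len(text)
        count = 0
        while count < limit and pos < len(text) and text[pos] in chars:
            pos += 1
            count += 1
        if count == 0 and repeat in ("", "+"):
            return -1
    return pos - start
```

Under this reading, /[a-z]*a/ does not match "banana", because [a-z]* consumes the final a; a backtracking regex engine would match the whole word. In this sketch, giving up backtracking keeps each match attempt a single left-to-right scan.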
additional-properties
---------------------

    %start start_rule

Sets the starting rule for parsing (if not given, the first rule is
started).

    %mode mode,mode

Sets the working modes:

    emit  - enable collecting 'output'
    char  - enable char mode
    token - enable token mode

Modes from the same group (char/token, ...) can't be used together, of
course.


grammar-rules
-------------

Grammar rules can be grouped into 'sections', which can be called. Calls go
through the stack (after the called section has been processed, execution
continues with the next rule). A section starts with the rule name (a
combination of letters, numbers and the - char) followed by : and a new
line. A rule section ends with ; followed by a new line.

    example-rule:
    ;

Every line inside a rule section holds a physical rule. Its general form is:

    [flags] match [| match2 | ...]:
        command1
        command2
        command3
    ;

or the single-'line' form:

    match [| match2 | ...]: command

So a new line after ':' opens a sub-block. A rule thus consists of a match
and the commands which are executed if the match succeeds. The match can be
omitted and only a 'command' given; in this case the : is also omitted.
Rules are executed (tested) top-down and left-right.

What can be matched?

1. a regular expression match defined in the first part of the grammar file
2. another rule section (sub-block)
3. special operators (reserved words)
4. stack contents - %X, where X is a decimal number
5. any character, by using .
6. anything without eating input - just leave the match empty

What commands can be executed?

1. stack-call of a rule section: just the name of the rule section to
   execute
2. jump to the specified rule section: put > immediately before the rule
   section name
3. special commands (raise, catch, destiny referencing, ...)

What flags can be used?

1. >  if the rule is matched, process its commands and return from the
      current section
2. +  store the match result to the 'calling stack'
3. -  do not store the result of the match to the stack
4. !  reverse the rule match
5. |  process only if the previous rule was NOT matched

If a match consists of several sub-matches delimited by the | character, the
rule is executed if ONE of the given matches is matched. Rules are matched
top-down. This means that if ONE rule is matched, its commands are processed
and then the NEXT rule in the list is tried. If the following rules use | in
their flags, they are skipped.

Every match stores its result on the stack. The stack can be referenced by
%X, where X is a decimal number meaning 'how far back into the stack you
want to look'. This default behaviour can of course be changed by flags.

Every rule NEEDS TO BE written on ONE SINGLE line. Only its commands can be
written on several lines, by using a block (see above).

Example:

    test-rule:
        blanks               // calls rule section 'blanks'
        identifier:          // if rule section 'identifier' is matched, call the 'destiny.fuck' method
            destiny.fuck
        ;
        DONE: >away          // if regular expression DONE is matched, jump to section 'away'
        RES_1: res_1         // if regular expression RES_1 is matched, call section res_1 (and continue with rule PROP_1)
        | RES_2: res_2       // if RES_1 wasn't matched and RES_2 is matched, call section res_2 (and continue with PROP_1)
        | . : >error         // else jump to section 'error'
        PROP_1: prop_1       // if PROP_1 is matched, call section prop_1
        !END: >error         // if END is not matched, jump to section 'error'
    ;

Special commands:

    raise(text_iden)

Raises the text identifier as an exception. Camelot goes back through the
call stack and looks for a handler. If no handler is found, it walks all
rules from start to end and tries to find a match. When a match is found,
execution jumps there.

Special matches:

    catch list_of_text_idens

Catches exceptions with the given text identifiers. * and ? can be used as
wildcards. When a catch match is processed while no exception is pending,
it is *NOT* matched and is skipped.


Stack
-----

Every match stores its 'result' on the stack (unless the - flag is given).
These results are accessible through %1, %2, .... The stack is valid only
within the context of the current call. This means that if a rule section
pushes ABC to the stack, it can be accessed by its 'children' but NOT by its
parent.
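The scoping rule above can be modelled with a single list plus a record of the stack height at each section entry. In this hypothetical sketch (the class and method names are not part of Camelot), %1 is taken to be the most recent result, which is one reading of 'how far back into the stack you want to look'; the - flag suppresses the push, and leaving a section discards everything it pushed.

```python
class ResultStack:
    """Hypothetical model of the Camelot result stack (not the real API)."""

    def __init__(self):
        self._items = []   # all visible results, most recent last
        self._marks = []   # stack height at each rule-section entry

    def enter(self):
        """A rule section is stack-called: remember the current height."""
        self._marks.append(len(self._items))

    def leave(self):
        """The rule section returns: drop everything it pushed."""
        del self._items[self._marks.pop():]

    def store(self, result, flags=""):
        """Store a match result unless the - flag is present."""
        if "-" not in flags:
            self._items.append(result)

    def ref(self, x):
        """Resolve %x, where %1 is the most recent visible result."""
        if not 1 <= x <= len(self._items):
            raise IndexError(f"%{x} is not on the stack")
        return self._items[-x]


s = ResultStack()
s.enter()                  # parent section
s.store("ABC")
s.enter()                  # child section
s.store("x")
s.store(" ", flags="-")    # not stored because of the - flag
print(s.ref(1), s.ref(2))  # the child sees its own result and the parent's
s.leave()                  # back in the parent: "x" is gone
```

A single list with height marks means entering and leaving a section costs nothing beyond recording or truncating one integer position.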
Function calling stack
----------------------

If you need to call the destiny collector, you have to push the parameters
to the calling stack. The parameters, their count and other details depend
on the destiny type.
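The interpreted path (callbacks instead of direct calls) can be sketched as a lookup by method name. Everything here is hypothetical: the destiny class, its methods, the 'destiny.method' command string handling, and the policy of handing all pushed parameters to the callback in push order and then clearing the calling stack. A real destiny defines its own parameters and counts.

```python
class CollectingDestiny:
    """Hypothetical destiny that just records what the parser reports."""

    def __init__(self):
        self.events = []

    def identifier(self, name):
        self.events.append(("identifier", name))

    def assign(self, target, value):
        self.events.append(("assign", target, value))


def run_destiny_command(destiny, command, calling_stack):
    """Interpret a 'destiny.method' command as a callback.

    All parameters pushed to the calling stack (e.g. by the + flag) are
    passed in push order, then the calling stack is cleared.
    """
    prefix = "destiny."
    if not command.startswith(prefix):
        raise ValueError(f"not a destiny command: {command}")
    callback = getattr(destiny, command[len(prefix):])  # AttributeError if unknown
    args = list(calling_stack)
    calling_stack.clear()
    return callback(*args)


d = CollectingDestiny()
calling_stack = []
calling_stack.append("x")                 # pushed by a '+' match
run_destiny_command(d, "destiny.identifier", calling_stack)
calling_stack.extend(["x", "42"])
run_destiny_command(d, "destiny.assign", calling_stack)
print(d.events)
```

Generated code would instead call d.assign("x", "42") directly, with no lookup by name; if the pushed parameters don't fit the method's signature, this sketch fails with a TypeError at call time.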