Topic: microB tokenizer

Aurel

  • Sr. Member
  • ****
  • Posts: 402
microB tokenizer
« on: March 24, 2019, 01:42:12 AM »
Hi,
I am opening a new topic because I don't want to make a mess of the other topic, "token engine".
This should be a minimal but complete tokenizer (lexer) with error checking, for a small subset of a BASIC-like interpreter.
It should be added to the main program as an include file, but for now it is standalone for testing purposes.
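For readers following along in another language, here is a rough Python sketch of the same idea: scan character by character, collect strings, identifiers, numbers and single-character operators, and check paren/bracket balance at each end of line. It is not the microB code. The names `tokenize`, `check_balance` and `TokenError` are made up for this illustration, and unlike the O2 listing it raises an error for any unrecognised character.

```python
# Hypothetical Python sketch of a minimal tokenizer with error checking.
# All names here are illustrative assumptions, not part of microB.

SYMBOLS = "+-*/(),:<=>[]&|!"
CLOSERS = {"(": ")", "[": "]"}


class TokenError(Exception):
    """Tokenizer error; the message carries the 1-based line number."""


def check_balance(counts, line):
    """Compare opening and closing counts, as the end-of-line check does."""
    for opener, closer in CLOSERS.items():
        if counts[opener] > counts[closer]:
            raise TokenError(f"Missing '{closer}' at line {line}")
        if counts[opener] < counts[closer]:
            raise TokenError(f"Missing '{opener}' at line {line}")


def tokenize(src):
    tokens, line, i = [], 1, 0
    counts = {"(": 0, ")": 0, "[": 0, "]": 0}
    while i < len(src):
        ch = src[i]
        if ch in " \t\r":                      # skip blanks, tabs and CR
            i += 1
        elif ch == "\n":                       # end of line: check pairs
            check_balance(counts, line)
            line += 1
            i += 1
        elif ch == '"':                        # quoted string on one line
            end = src.find('"', i + 1)
            newline = src.find("\n", i + 1)
            if end == -1 or (newline != -1 and newline < end):
                raise TokenError(f"Unclosed quote at line {line}")
            tokens.append(src[i + 1:end])
            i = end + 1
        elif "a" <= ch <= "z":                 # identifier: [a-z][a-z0-9]*
            start = i
            while i < len(src) and ("a" <= src[i] <= "z" or "0" <= src[i] <= "9"):
                i += 1
            tokens.append(src[start:i])
        elif "0" <= ch <= "9":                 # number: digits and dots
            start = i
            while i < len(src) and ("0" <= src[i] <= "9" or src[i] == "."):
                i += 1
            tokens.append(src[start:i])
        elif ch in SYMBOLS:                    # single-character operators
            if ch in counts:
                counts[ch] += 1
            tokens.append(ch)
            i += 1
        else:                                  # anything else is an error
            raise TokenError(f"Unknown token [{ch}] at line {line}")
    check_balance(counts, line)                # last line may lack an EOL
    return tokens
```

Calling `tokenize` on the second test line from the listing, `if a>b: arr 10]`, raises a `TokenError` because of the unmatched `]`.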


Code:
'microB tokenizer by Aurel 24.3.2019
Include "microBh.inc"
int tkNULL=0, tkPLUS=1, tkMINUS=2, tkMULTI=3, tkDIVIDE=4
int tkCOLON=5, tkCOMMA=6, tkLPAREN=7, tkRPAREN=8, tkLBRACKET=9, tkRBRACKET=10
int tkPRINT=11, tkDOT=12, tkLINE=13, tkCIRCLE=14 , tkEOL = 20
string tokList[256] : int typList[256]   'token/type arrays
int p = 1, start = p, tp, n              'init (start is declared only once)
int lineCount, Lpar, Rpar, Lbrk, Rbrk, tokerr
string code,ch,tch,tk ,crlf=chr(13)+chr(10),bf
'--------------------------------------------------------------------
code = "func(a,b): var1+ 0.5*7: str s="+ chr(34)+ "micro"+chr(34) + crlf + "if a>b: arr 10]" + crlf     ' test or load_src?
'--------------------------------------------------------------------
sub tokenizer(src as string) as int
lineCount=1
while p <= len(src)
    ' print "P:" + str(p)       
     ch = mid(src,p,1)                                                 'get char

 If asc(ch)=32 : p=p+1 : end if                                        ' skip blank space[ ]
 If asc(ch)=9  : p=p+1 : end if                                        ' skip TAB [    ]
 If asc(ch)=13 and mid(src,p+1,1)= chr(10)                             ' skip CRLF & lineCount+1 / EOL
if Lpar > Rpar  : tokerr=3 : goto tokExit : end if        ' if Rparen ((...)
if Lpar < Rpar  : tokerr=4 : goto tokExit : end if   ' if Lparen (...))
if Lbrk > Rbrk  : tokerr=5 : goto tokExit : end if                ' if Lbracket [..
if Lbrk < Rbrk  : tokerr=6 : goto tokExit : end if                ' if Rbracket ...]
 lineCount++ : p=p+2
 End if
 
'--------------------------------------------------------
 If asc(ch)=34                                                         ' if char is QUOTE "
   p++ : ch = mid(src,p,1)                                             ' skip opening quote, look at next char
   while asc(ch) <> 34                                                 ' collect chars up to the closing quote
     IF ch = chr(10) or p > len(src) : tokerr = 2 : goto tokExit : end if   ' EOL or end of source: unclosed quote
     tk=tk+ch : p++ : ch = mid(src,p,1)
   wend
   tp++ : tokList[tp] = tk : tk="":ch="": p++                          ' add quoted string to token list, skip closing quote
 End if
'-------------------------------------------------------           
 If (asc(ch)>96 and asc(ch)<123)          ' [a-z]
   while (asc(ch)>96 and asc(ch)<123) or (asc(ch)>47 and asc(ch)<58)   ' [a-z0-9]*
         tk=tk+ch : p++ : ch = mid(src,p,1)
   wend
      'print "TOK-AZ:" + tk + " PAZ:" + p
       tp++ : tokList[tp] = tk : tk="":ch=""       
       'return IDENT;
 End If
'--------------------------------------------------------------
'While (Asc(Look) > 47 And Asc(Look) < 58) Or Asc(Look) = 46'
 If (asc(ch)>47 and asc(ch)<58)                                       ' [0-9.]
    while (asc(ch)>47 AND asc(ch)<58) OR asc(ch)=46                   ' [0-9[0.0]]*
        tk=tk+ch :p++ : ch = mid(src,p,1)
    wend
        'print "Pnum:" + str(p)
       tp++ : tokList[tp] = tk : tk="":ch=""
       'return NUMBER;
 End if
'---------------------------------------------------
 If asc(ch)=43 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' + plus
 If asc(ch)=45 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' - minus
 If asc(ch)=42 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' * multiply
 If asc(ch)=47 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' / divide
 If asc(ch)=40 : tp++ : tokList[tp] = ch : ch="" : p++ : Lpar++ : End if  ' ( Lparen
 If asc(ch)=41 : tp++ : tokList[tp] = ch : ch="" : p++ : Rpar++ : End if  ' ) Rparen
 If asc(ch)=44 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' , comma
 If asc(ch)=58 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' : colon
 If asc(ch)=60 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' < less
 If asc(ch)=61 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' = equal
 If asc(ch)=62 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' > more(greater)
 If asc(ch)=91 : tp++ : tokList[tp] = ch : ch="" : p++ : Lbrk++ :End if  ' [ Lbracket
 If asc(ch)=93 : tp++ : tokList[tp] = ch : ch="" : p++ : Rbrk++ :End if  ' ] Rbracket
 If asc(ch)=38 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' & AND
 If asc(ch)=124 :tp++ : tokList[tp] = ch : ch="": p++ : End if  ' | OR
 If asc(ch)=33 : tp++ : tokList[tp] = ch : ch="" : p++ : End if  ' ! NOT

 'elseif...
 'End if
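' ch is cleared by every token handler, so anything left here (except skipped whitespace) was not recognised;
' reporting it avoids looping forever on the same character
IF ch<>"" and asc(ch)<>32 and asc(ch)<>9 and asc(ch)<>13 : tokerr = 1 : goto tokExit : END IF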

wend
return tp
tokExit:
  IF tokerr > 0
if tokerr = 1: MsgBox "Unknown token!-[ " + ch +" ] at LINE: " + str(lineCount),"T:Error"  : end if
if tokerr = 2: MsgBox "Unclosed Quote!- at LINE: " + str(lineCount),"T:Error"  : end if
if tokerr = 3: MsgBox "Missing right paren! ((...)- at LINE: " + str(lineCount),"T:Error"  : end if
if tokerr = 4: MsgBox "Missing left paren!- at LINE: " + str(lineCount),"T:Error"  : end if
if tokerr = 5: MsgBox "Missing right bracket!- at LINE: " + str(lineCount),"T:Error"  : end if
if tokerr = 6: MsgBox "Missing left bracket!- at LINE: " + str(lineCount),"T:Error"  : end if

Return 0
  END IF
end sub

'call tokenizer..tested(ident,numbers)
int tn: tn = tokenizer(code) : if tn=0 then goto ExitProgram
print "Number of tokens: " + str(tn) + crlf + "Number of lines: " + str(lineCount)
for n = 1 to tn : bf = bf + tokList[n] + crlf : next n
print  bf

ExitProgram:
if tn=0: print "Program Terminated!": end if

and the include file microBh.inc:
Code:
'microB.include by Aurel 24.3.2019
! MessageBox Lib "user32.dll" Alias "MessageBoxA" (ByVal hWnd As Long, ByVal lpText As String, ByVal lpCaption As String, ByVal dwType As Long) As Long
! MsgBox (byval lpText AS STRING,byval lpCaption AS STRING) as INT

'MsgBox---------------------------------------------------------------------
Function MsgBox (byval lpText AS STRING,byref lpCaption AS STRING) as INT
If lpCaption = "" then lpCaption="<MsgBox>"
Function =  MessageBox 0, lpText, lpCaption, 0
End Function
« Last Edit: March 24, 2019, 02:09:51 AM by Aurel »
my site:BLOG and FORUM
https://aurelsoft.ucoz.com/

Aurel

Re: microB tokenizer
« Reply #1 on: March 26, 2019, 09:51:57 AM »
After some fixing, it is now time to execute an expression.
This is just a first test of the token evaluator, and it works only with numbers.
The complete syntax/logic error checking should be added later as a separate routine.
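As a side-by-side reference, the usual recursive descent layout (expr handles + and -, term handles * and /, factor handles numbers and parentheses) can be sketched in Python as below. This is only an illustration working on an already tokenized list. The class `Evaluator` and the helper `evaluate` are invented names, and the explicit check for the closing paren is an addition that the O2 test code does not have.

```python
# Hypothetical sketch of a recursive descent evaluator over a token list.
# Class and function names are illustrative assumptions, not part of microB.

class Evaluator:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the current token without consuming it."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOL"

    def advance(self):
        """Return the current token and move to the next one."""
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := ["-"] term { ("+" | "-") term }
        if self.peek() == "-":
            self.advance()                 # consume the unary minus first
            value = -self.term()
        else:
            value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor { ("*" | "/") factor }
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor := NUMBER | "(" expr ")"
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        try:
            return float(tok)
        except ValueError:
            raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(tokens):
    """Evaluate one expression given as a list of string tokens."""
    return Evaluator(tokens).expr()


tokens = ["2", "+", "3", "+", "4", "*", "(", "-", "2", "+", "3", ")", "*", "0.55", "EOL"]
print("RESULT=" + str(evaluate(tokens)))
```

The token list in the usage lines is the one for the test expression `2+3+4*(-2+3)*0.55`.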

microB:
Code:
'microB_Interpreter - with recursive descent token evaluator
' by Aurel - 26.3.2019
include "microAT.inc"                               'tokenizer include
#lookahead
int tc=0                                            ' token count
string token                                        ' def token

'test 1 - load source string from microB.inc Tokenizer
string code = "2+3+4*(-2+3)*0.55" + crlf             'enter your expression / crlf=EOL
codeLen=len(code)
tn = run_tokenizer(code)
MsgBox  str(tn) ,"Tokenizer Out"                     ' tokenizer result, nonzero means OK
'---------------------------------------------------------------
' syntax / logic error block ?
'---------------------------------------------------------------
'if tokenization error=0 then OK!..execute
If tokerr = 0
exec()
End if




'-----------------------------------------------------
sub gettok()
tc++
token = tokList[tc]
'test
if tokList[tc+1] <> "" then return
end sub
'----------------------------------------------------
sub expr() as float
float v
If token = "-"                 ' unary minus: consume "-" before parsing the term
 gettok()
 v = -(term())
else
 v = term()
end if
 
while token = "+" or token = "-"
if token = "+": gettok() : v = v + term(): end if
if token = "-": gettok() : v = v - term(): end if
wend

return v
end sub
'---------------------------------------------------
sub term() as float
float v
v = factor()

while token = "*" or token = "/"
if token = "*": gettok() : v = v * factor(): end if
if token = "/": gettok() : v = v / factor(): end if
wend

return v
end sub
'-------------------------------------------------------

sub factor() as float
float v
if asc(token)>47  and asc(token)<58      ' number literal
 v = val(token)
 'print str(v)+ " factor"
 gettok()
elseif asc(token)=40                     ' "(" opens a sub-expression
 gettok() : v = expr() : gettok()        ' skip "(", evaluate inside, skip ")"
end if


return v
end sub

'execute-----------------------------------------------------
sub exec
gettok()'start
float res = expr()
print "RESULT=" + str(res)
end sub



microAT.inc:
Code:
'microAT tokenizer by Aurel 26.3.2019
Include "microBh.inc"
declare sub tokenizer( src as string) as INT
int tkNULL=0, tkPLUS=1, tkMINUS=2, tkMULTI=3, tkDIVIDE=4
int tkCOLON=5, tkCOMMA=6, tkLPAREN=7, tkRPAREN=8, tkLBRACKET=9, tkRBRACKET=10
int tkIDENT = 11 , tkNUMBER = 12 , tkSTRING = 13, tkCOMMAND =14 ,tkEOL = 15
int tkEQUAL = 16, tkMORE = 17, tkLESS =18,tkAND=19, tkOR=20, tkNOT = 21
int tkHASH=22 , tkSSTR=23, tkMOD=24

string tokList[1024] : int typList[1024]   'token/type arrays
int p = 1, start = p, tp, tn, n, ltp=1     'init (start is declared only once)
int lineCount, Lpar, Rpar, Lbrk, Rbrk, tokerr ,codeLen=0
string code,ch,tch,tk ,crlf=chr(13)+chr(10),bf,ntk
'--------------------------------------------------------------------
'code = "2*(3+4)"     + crlf  +  ' line 1
       '"': b =6 "   + crlf  +  ' line 2
      ' ":if a>b"    + crlf     ' line 3
'--------------------------------------------------------------------
sub tokenizer(src as string) as int
'print "tokenizer run;" + src
lineCount=0:ltp=start
while p <= len(src)
 '................................................................................................         
    ch = mid(src,p,1)                                                  'get char
 If asc(ch)=32 : p=p+1 : end if                                        ' skip blank space[ ]
 If asc(ch)=9  : p=p+1 : end if                                        ' skip TAB [    ]
 if asc(ch)=13 : p=p+1 : end if                                        ' skip CR
 if asc(ch)=39                                                         ' skip comment line[ ' ]
    while asc(ch) <> 10 and p <= len(src)                              ' stop at LF or at end of source
      p++ : ch = mid(src,p,1)
    wend
   p++: goto endLoop                                                   ' skip the LF and jump to end of loop
 end if

 If asc(ch)=10                                                         ' EOL
if Lpar > Rpar  : tokerr=3 : goto tokExit : end if   ' if Rparen ((...)
if Lpar < Rpar  : tokerr=4 : goto tokExit : end if   ' if Lparen (...))
if Lbrk > Rbrk  : tokerr=5 : goto tokExit : end if   ' if Lbracket [..
if Lbrk < Rbrk  : tokerr=6 : goto tokExit : end if   ' if Rbracket ...]
 lineCount++ : tp++ : tokList[tp]="EOL" :typList[tp]= tkEOL: tk="": ch="" : p++
 End if
'--------------------------------------------------------
 If asc(ch)=34                                                         ' if char is QUOTE "
   p++ : ch = mid(src,p,1)                                             ' skip opening quote, look at next char
   while asc(ch) <> 34                                                 ' collect chars up to the closing quote
     IF ch = chr(10) or p > len(src) : tokerr = 2 : goto tokExit : end if   ' EOL or end of source: unclosed quote
     tk=tk+ch : p++ : ch = mid(src,p,1)
   wend
   tp++ : tokList[tp]= tk :typList[tp]= tkSTRING: tk="":ch="": p++     ' add quoted string to token list, skip closing quote
 End if
'-------------------------------------------------------           
 If (asc(ch)>96 and asc(ch)<123)          ' [a-z]
   while (asc(ch)>96 and asc(ch)<123) or (asc(ch)>47 and asc(ch)<58)   ' [a-z0-9]*
         tk=tk+ch : p++ : ch = mid(src,p,1)
   wend
      ' ' add token ,add token type/IDENT:{VAR/COMMAND}
       tp++ : tokList[tp] = tk :typList[tp]= tkIDENT: tk="":ch=""       
 End If
'--------------------------------------------------------------
 If (asc(ch)>47 and asc(ch)<58)                                       ' [0-9.]
    while (asc(ch)>47 AND asc(ch)<58) OR asc(ch)=46                   ' [0-9[0.0]]*
        tk=tk+ch :p++ : ch = mid(src,p,1)
    wend
       ' add token ,add token type/NUMBER
       tp++ : tokList[tp] = tk : typList[tp]= tkNUMBER: tk="":ch=""
 End if
'--------------------------------------------------------------------
 If asc(ch)=43 : tp++ : tokList[tp] = ch :typList[tp]= tkPLUS:    ch="" : p++ : End if  ' + plus
 If asc(ch)=45 : tp++ : tokList[tp] = ch :typList[tp]= tkMINUS:   ch="" : p++ : End if  ' - minus
 If asc(ch)=42 : tp++ : tokList[tp] = ch :typList[tp]= tkMULTI:   ch="" : p++ : End if  ' * multiply
 If asc(ch)=47 : tp++ : tokList[tp] = ch :typList[tp]= tkDIVIDE:  ch="" : p++ : End if ' / divide
 If asc(ch)=40 : tp++ : tokList[tp] = ch :typList[tp]= tkLPAREN:  ch="" : p++ : Lpar++ : End if ' ( Lparen
 If asc(ch)=41 : tp++ : tokList[tp] = ch :typList[tp]= tkRPAREN:  ch="" : p++ : Rpar++ : End if ' ) Rparen
 If asc(ch)=44 : tp++ : tokList[tp] = ch :typList[tp]= tkCOMMA:   ch="" : p++ : End if  ' , comma
 If asc(ch)=58 : tp++ : tokList[tp] = ch :typList[tp]= tkCOLON:   ch="" : p++ : End if  ' : colon
 If asc(ch)=60 : tp++ : tokList[tp] = ch :typList[tp]= tkLESS:    ch="" : p++ : End if  ' < less
 If asc(ch)=61 : tp++ : tokList[tp] = ch :typList[tp]= tkEQUAL:   ch="" : p++ : End if  ' = equal
 If asc(ch)=62 : tp++ : tokList[tp] = ch :typList[tp]= tkMORE:    ch="" : p++ : End if  ' > more(greater)
 If asc(ch)=91 : tp++ : tokList[tp] = ch :typList[tp]= tkLBRACKET:ch="" : p++ : Lbrk++ :End if  ' [ Lbracket
 If asc(ch)=93 : tp++ : tokList[tp] = ch :typList[tp]= tkRBRACKET:ch="" : p++ : Rbrk++ :End if  ' ] Rbracket
 If asc(ch)=38 : tp++ : tokList[tp] = ch :typList[tp]= tkAND:     ch="" : p++ : End if  ' & AND
 If asc(ch)=124 :tp++ : tokList[tp] = ch :typList[tp]= tkOR:      ch="" : p++ : End if       ' | OR
 If asc(ch)=33 : tp++ : tokList[tp] = ch :typList[tp]= tkNOT:     ch="" : p++ : End if  ' ! NOT
 If asc(ch)=35 : tp++ : tokList[tp] = ch :typList[tp]= tkHASH:    ch="" : p++ : End if  ' # hash
 If asc(ch)=36 : tp++ : tokList[tp] = ch :typList[tp]= tkSSTR:    ch="" : p++ : End if  ' $ $TRING
 If asc(ch)=37 : tp++ : tokList[tp] = ch :typList[tp]= tkMOD :    ch="" : p++ : End if  ' % percent/MOD
 
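' ch is cleared by every token handler, so anything left here (except skipped whitespace) was not recognised;
' reporting it avoids looping forever on the same character
IF ch<>"" and asc(ch)<>32 and asc(ch)<>9 and asc(ch)<>13 : tokerr = 1 : goto tokExit : END IF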
'.............................................................................................
endLoop:
wend
Return tp
tokExit:
  IF tokerr > 0
if tokerr = 1: MsgBox "Unknown token!-[ " + ch +" ] at LINE: " + str(lineCount),"T:Error"  : end if
if tokerr = 2: MsgBox "Unclosed Quote!- at LINE: " + str(lineCount),"T:Error"              : end if
if tokerr = 3: MsgBox "Missing right paren! ((...)- at LINE: " + str(lineCount),"T:Error"  : end if
if tokerr = 4: MsgBox "Missing left paren!- at LINE: " + str(lineCount),"T:Error"          : end if
if tokerr = 5: MsgBox "Missing right bracket!- at LINE: " + str(lineCount),"T:Error"       : end if
if tokerr = 6: MsgBox "Missing left bracket!- at LINE: " + str(lineCount),"T:Error"        : end if
Return 0
  END IF
end sub

/*'call tokenizer..tested(ident,numbers) /////////////////////////////////
int tn: tn = tokenizer(code)
*/
 'if tn=0 then goto ExitProgram
sub run_tokenizer(s as string ) as int
 tn = tokenizer(s)
print "Number of tokens: " + str(tn) + crlf + "Number of lines: " + str(lineCount)
for n = 1 to tn : bf = bf + tokList[n] + crlf : next n
MsgBox bf,"Token List:"
return tn                                   ' hand the token count back to the caller (0 = error)
end sub

if codeLen>0
ExitProgram:
print "Program Terminated!"
end if

microBh.inc:
Code:
'microBh.include by Aurel 24.3.2019
! MessageBox Lib "user32.dll" Alias "MessageBoxA" (ByVal hWnd As Long, ByVal lpText As String, ByVal lpCaption As String, ByVal dwType As Long) As Long
! MsgBox (byval lpText AS STRING,byval lpCaption AS STRING) as INT

'MsgBox---------------------------------------------------------------------
Function MsgBox (byval lpText AS STRING,byref lpCaption AS STRING) as INT
If lpCaption = "" then lpCaption="<MsgBox>"
Function =  MessageBox 0, lpText, lpCaption, 0
End Function

Aurel

Re: microB tokenizer
« Reply #2 on: August 25, 2019, 02:12:32 AM »
Compiles fine and works properly with 0.2.6
 :D

Just one small quirk:
I had to change
Dim user32 .. to -> INT user32 to define the lib holder
« Last Edit: August 25, 2019, 03:41:59 AM by Aurel »

jack

  • Full Member
  • ***
  • Posts: 136
Re: microB tokenizer
« Reply #3 on: August 25, 2019, 03:20:03 AM »
Aurel, there are a number of places where you use Sub instead of Function. Even though O2 will tolerate that, it does not look good to me.
I translated your code to FreeBASIC because my PC is a Mac and therefore O2 is not available. It seems to run OK.

Aurel

Re: microB tokenizer
« Reply #4 on: August 25, 2019, 03:36:20 AM »
Quote
even though O2 will tolerate

Jack
In o2 it is the same to use SUB or FUNCTION
Sorry, I didn't know that it is not the same in FreeBASIC, because I use it very rarely
(only when I find some interesting example, then I compile it with FB).

I use the name SUB when a routine is small and doesn't require a returned value.
If it is larger, then I use FUNCTION.

The code for this program is as simple as can be and should work in many BASIC dialects.
« Last Edit: August 25, 2019, 03:43:05 AM by Aurel »

Aurel

Re: microB tokenizer
« Reply #5 on: August 25, 2019, 03:39:09 AM »
jack

Can you attach a screenshot of the program running in FB on the Mac?

jack

Re: microB tokenizer
« Reply #6 on: August 25, 2019, 03:59:43 AM »
Aurel, here's the output. I replaced MessageBox with a simple Print.
Code:
Number of tokens: 15
Number of lines: 1
2
+
3
+
4
*
(
-
2
+
3
)
*
0.55
EOL

15
RESULT=7.2

jack

Re: microB tokenizer
« Reply #7 on: August 25, 2019, 04:04:07 AM »
Quote
I use the name SUB when a routine is small and doesn't require a returned value.
If it is larger, then I use FUNCTION.

Code:
sub tokenizer(src as string) as int
There are about 8 more similar to that.

jack

Re: microB tokenizer
« Reply #8 on: August 25, 2019, 04:56:24 AM »
Quote
Jack
In o2 it is the same to use SUB or FUNCTION

From the OxygenBasic doc:
Quote
Oxygen Basic Procedures
...
function   Defines a procedure that returns a value.
sub   Defines a procedure that does not return a value.

Aurel

Re: microB tokenizer
« Reply #9 on: August 25, 2019, 05:18:55 AM »
Ahh yes,
you are right, I use it in this program, most probably because I am too lazy to type function.
Thanks for the test, it is the right result. :D

That doc is not right.
In Oxygen, sub and function are the same.
Put it this way: a function is a sub-routine with a fancy name.

YES, they work the same in o2.

(If you don't like to call it sub, just use an editor and replace all the subs with functions.)