
Thread: Tokenizer - user-keys, but how???

  1. #1
    ReneMiner (thinBasic MVPs)

    Tokenizer - user-keys, but how???

    I'm trying to use the Tokenizer engine to parse a script and recognize thinBasic keywords and equates.

    But how does it work?

    I tried it this way (and a few others already):

    Uses "console", "Tokenizer" 
    
    Begin Const
    
      %Token_TBKeyword  = 100
      %Token_TBEquate
      %Token_Comment
      %Token_Parenthesis
    End Const
    
    
    Function TBMain()
      
      ' read in all keywords:
      '    ---  enter a valid path here if thinBasic is not installed on "C:\" !
    
      SetupKeywords( "c:\thinBasic\thinAir\Syntax\thinBasic\thinBasic_Keywords.ini" )
      ' run tokenizer on this script:
      Tokenize(APP_SourceName)
      
      PrintL "------------------------- key to end"
      WaitKey
       
    End Function
    
    Sub SetupKeywords(ByVal sFile As String)
    
      Local allKeywords() As String
    
      Local i     As Long
           
      Parse File sFile, allKeywords, $CRLF
      Array Sort allKeywords, Descend       ' brings empty elements to the end
      
      While StrPtrLen(StrPtr(allKeywords(UBound(allKeywords)))) = 0  
                                            ' remove empty elements  
        ReDim Preserve allKeywords(UBound(allKeywords)-1)      
      Wend
      Array Sort allKeywords, Ascend        ' now sort as needed
      
      Tokenizer_Default_Char("#", %TOKENIZER_DEFAULT_ALPHA)
      Tokenizer_Default_Char("$", %TOKENIZER_DEFAULT_ALPHA)
      Tokenizer_Default_Char("%", %TOKENIZER_DEFAULT_ALPHA)
      Tokenizer_Default_Char(":", %TOKENIZER_DEFAULT_NEWLINE)
      
      Tokenizer_KeyAdd("'", %Token_Comment,      0)
      Tokenizer_KeyAdd("(", %Token_Parenthesis,  1)
      Tokenizer_KeyAdd(")", %Token_Parenthesis, -1)
      
      For i = 1 To UBound(allKeywords)
        Select Case Peek(Byte, StrPtr(allKeywords(i)))
          Case 36, 37  ' $, %
             Tokenizer_KeyAdd(allKeywords(i), %Token_TBEquate, i)
          Case Else
             Tokenizer_KeyAdd(allKeywords(i), %Token_TBKeyword, i)
        End Select
      Next
      
    End Sub
    
    Sub Tokenize(sFile As String)
      
      
      Local sToken, sCode                           As String 
      Local lPos, lMain, lSub, lParenthesis, lLines As Long
      Local pKey                                    As DWord
      
      sCode = Load_File(sFile)
      If StrPtrLen(StrPtr(sCode)) = 0 Then Exit Sub
      
      lPos  = 1
      Do
        pKey = Tokenizer_GetNextToken(sCode, lPos, lMain, sToken, lSub)    
        Incr lLines
        
        Select Case lMain
          Case %TOKENIZER_FINISHED
            Exit Do
          Case %TOKENIZER_ERROR
            Exit Do
          Case %TOKENIZER_QUOTE
            PrintL "quoted string : " & sToken
          Case %TOKENIZER_DELIMITER
            PrintL "delimiter : " & sToken
          Case %TOKENIZER_NUMBER
            PrintL "number : " & sToken
          Case %TOKENIZER_EOL
            PrintL "(new line)"  
          Case Else
            Select Case Tokenizer_KeyGetMainType(pKey)
              Case %Token_TBKeyword 
                PrintL "TBKeyword :" & sToken
              Case %Token_TBEquate
                PrintL "TBEquate :" & sToken
              Case %Token_Parenthesis 
                lParenthesis += Tokenizer_KeyGetSubType(pKey)
                PrintL = "parenthesis :" & sToken & Str$(lParenthesis)
              Case %Token_Comment
                Tokenizer_MoveToEol(sCode, lPos, TRUE)
                PrintL "comment"
              Case Else 
                PrintL "other token :" & sToken  
            End Select
        End Select 
        
        If lLines > 20 Then
          PrintL "------------------- key to continue --------------"
          WaitKey
          lLines = 0
        EndIf
      
      Loop
      
      If lParenthesis <> 0 Then PrintL "found unbalanced parenthesis"
      
    End Sub
    

    Why does it not recognize my user tokens?

  2. #2
    ErosOlmi (thinBasic author)
    Hi René,

    Sorry for the delay in replying, but I was going crazy over this while the problem is actually quite simple.
    When you use Tokenizer_KeyAdd(...) you must specify the key in upper case, otherwise the tokenizer is not able to find it.

    So change your key load loop to something like:
      For i = 1 To UBound(allKeywords)
        Select Case Peek(Byte, StrPtr(allKeywords(i)))
          Case 36, 37  ' $, %
             Tokenizer_KeyAdd(ucase$(allKeywords(i)), %Token_TBEquate, i)
          Case Else
             Tokenizer_KeyAdd(ucase$(allKeywords(i)), %Token_TBKeyword, i)
        End Select
      Next
    
    This is not written in the manual.
    I'm thinking about doing it automatically in the module.
    I need to double-check whether this change will work in all situations.
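
    Until then, you could wrap the call in a small helper inside your script (just a sketch, not a module function):

      ' Hypothetical helper, not part of the Tokenizer module:
      ' it simply upper-cases the key before registering it.
      Sub KeyAddUcase(ByVal sKey As String, ByVal lMainType As Long, ByVal lSubType As Long)
        Tokenizer_KeyAdd(Ucase$(sKey), lMainType, lSubType)
      End Sub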

    Ciao
    Eros

  3. #3
    ReneMiner (thinBasic MVPs)
    Thanks for the info. Now it's getting interesting, and the Tokenizer becomes much more powerful.

    Now a very basic question about this: assume I create tokens from global variables or UDTs/Types/Subs/Functions in a script, and then I want to load another script to tokenize; the tokens of the previous script are then no longer valid. I was searching for something like
    pKey = Tokenizer_KeyFind("MY_UNNEEDED_TOKEN")
    
    Tokenizer_KeyRemove(pKey) ' but i did not find anything like this
    
    What operation do I have to perform to kill or remove a user key from the Tokenizer engine?


    +++ One more (see the example above, including your suggested changes):
    I did something like
      Tokenizer_KeyAdd("'", %Token_Comment,      0)
      Tokenizer_KeyAdd("(", %Token_Parenthesis,  1)
      Tokenizer_KeyAdd(")", %Token_Parenthesis, -1)
    ' Tokenizer_KeyAdd($DQ & "console" & $DQ, %Token_Comment, 123) ' - i also tried Ucase$() here...
    
    but these are still treated just as delimiters. I also tried quoted strings, as shown above.

    Are user tokens limited to the %Tokenizer_String group only?

  4. #4
    ErosOlmi (thinBasic author)
    I think in the short term I can add something like Tokenizer_Reset in order to clear all internal Tokenizer data structures, so the programmer can reload a new grammar.

    Regarding string delimiters, most of them are "predefined" and you cannot change a "predefined" one like ( ) ' ...
    I will check what I can do; maybe the possibility is already in place but not explained.

    Will look at it tonight.
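
    Roughly, the workflow would then look like this (Tokenizer_Reset does not exist yet; SetupKeywords and Tokenize are the routines from your example above, and the file/script names are only placeholders):

      ' Hypothetical usage of a future Tokenizer_Reset:
      SetupKeywords("c:\thinBasic\thinAir\Syntax\thinBasic\thinBasic_Keywords.ini")
      Tokenize(sFirstScript)

      Tokenizer_Reset                          ' would drop all user keys and default overrides
      SetupKeywords(sOtherKeywordFile)         ' load a different grammar
      Tokenize(sSecondScript)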

  5. #5
    ErosOlmi (thinBasic author)
    René,

    I've checked the module's code and I think we cannot go much further with the current setup unless we break syntax compatibility with the module functions as they are developed now.
    I'm developing a Tokenizer module class that wraps the current functionality and can be developed further so that many tokenizers (different objects of the Tokenizer class) can work at the same time, each with its own rules.

    As soon as I have something to test, I will publish it here.

    Ciao
    Eros

  6. #6
    ReneMiner (thinBasic MVPs)
    Quote Originally Posted by ErosOlmi View Post
    ...
    I'm developing a Tokenizer module class that wraps the current functionality and can be developed further so that many tokenizers (different objects of the Tokenizer class) can work at the same time, each with its own rules.

    As soon as I have something to test, I will publish here.

    Ciao
    Eros
    Sounds good.
    It would be very nice if different tokenizers could run with different setups depending on the current task.

  7. #7
    ErosOlmi (thinBasic author)
    I'm working on the Tokenizer. So far I have done some initial work to create a module class able to set up some features and to load and check keys.
    Not really something that can do interesting things yet, but it seems to be taking shape.
    Now I'm working on the real tokenizing part. After that I will publish a working module.

    All previous features will continue to work as they do now.

    Below is an example of how the syntax looks:

    '---
    '---Loads needed modules
    '---
    Uses "File"
    Uses "Console"
    Uses "Tokenizer"
    
    
    Long cBrightCyanBGBlue  = %CONSOLE_FOREGROUND_BLUE | %CONSOLE_FOREGROUND_GREEN | %CONSOLE_FOREGROUND_INTENSITY | %CONSOLE_BACKGROUND_BLUE
    Long cBrightRed         = %CONSOLE_FOREGROUND_RED | %CONSOLE_FOREGROUND_INTENSITY
    
    
    
    
    Function TBMain() As Long
      Quad T0, T1       '---Performance timing
      Long Counter
    
    
      '---Declare a new tokenizer engine
      Dim MyParser As cTokenizer
    
    
      '---Instantiate new tokenizer
      MyParser = New cTokenizer()
    
    
      PrintL "-Configure Tokenizer---" In cBrightCyanBGBlue
        '---Change some default behavior in char parsing
        MyParser.Default_Char("$", %TOKENIZER_DEFAULT_ALPHA)
        MyParser.Default_Char("%", %TOKENIZER_DEFAULT_ALPHA)
        MyParser.Default_Char(":", %TOKENIZER_DEFAULT_NEWLINE)
        
        '---Those two lines are equivalent
        MyParser.Default_Char(";", %TOKENIZER_DEFAULT_NEWLINE)
        MyParser.Default_Code(59, %TOKENIZER_DEFAULT_NEWLINE)   ' 59 = Asc(";")
        
        '---A set of chars can be indicated in one go using Default_Set method
        MyParser.Default_Set("$%", %TOKENIZER_DEFAULT_ALPHA)
        MyParser.Default_Set(":;", %TOKENIZER_DEFAULT_NEWLINE)
      
        MyParser.Options.CaseSensitive = %FALSE
        PrintL "Tokenizer option, CaseSensitive = " & MyParser.Options.CaseSensitive
    
    
      PrintL
      PrintL "-Loading keys---" In cBrightCyanBGBlue
        '---Create a new keywords group. Assign a value >= 100
        Dim MyKeys    As Long Value 100
      
        Dim sFile         As String = "c:\thinBasic\thinAir\Syntax\thinBasic\thinBasic_Keywords.ini"
        Dim allKeywords() As String
        Dim nKeys         As Long
      
        T0 = Timer
        nKeys = Parse File sFile, allKeywords, $CRLF
        PrintL "Number of keys I'm going to load:", nkeys
        For Counter = 1 To UBound(allKeywords)
          MyParser.Key.Add(allKeywords(Counter), MyKeys, Counter)
        Next
        T1 = Timer
        PrintL "Loading time: " & Format$(T1 - T0) & " mSec"
    
    
      
      PrintL
      PrintL "-Checking some keys---" In cBrightCyanBGBlue
        
        PrintL "If the following few lines will return a number, all is ok"
        PrintL MyParser.Contains("Dim"   )
        PrintL MyParser.Contains("As"    )
        PrintL MyParser.Contains("PrintL" )
        PrintL MyParser.Contains("Uses"  )
        PrintL MyParser.Contains("Zer")
        
      WaitKey
    
    
    End Function
    

  8. #8
    Petr Schreiber (Super Moderator)
    Eros,

    Do you ever sleep? Fantastic...

    I must admit some concepts in the original Tokenizer are a bit confusing - for example, the Default* functions talk about groups, but these are character groups, while the user groups are about whole tokens and strings... This could be made clearer and more separated.

    For example:
    MyParser.SpecialChars.Set|Get|Add|AddAscii|Remove

    - Set => assigns them all at once, erasing any previous setup
    - Get => returns the current characters as a string
    - Add => adds a character, if not present already
    - AddAscii => adds a character by its ASCII code, if not present already
    - Remove => removes a character from SpecialChars
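
    Usage could then look something like this (purely hypothetical, none of these methods exist yet):

      ' Hypothetical syntax for the proposed SpecialChars interface:
      MyParser.SpecialChars.Set("$%")        ' replace the whole set
      MyParser.SpecialChars.Add(":")         ' add one more character
      MyParser.SpecialChars.AddAscii(59)     ' same thing by ASCII code, 59 = ";"
      PrintL MyParser.SpecialChars.Get       ' would print the current set, e.g. "$%:;"
      MyParser.SpecialChars.Remove("%")      ' drop a character again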

    Then we have user keywords.

    I think the user should not have to care about reserved items in the range 0..99. He could simply be provided with .CreateGroupType to receive a "handle" he can work with further. Then .DestroyGroupType could release all data for a given group.

    tbKeywords = myParser.CreateKeyGroup("ThinBASIC Keywords")
    myParse.Keys.Add("DIM", tbKeywords)
    myParse.Keys.Add("AS", tbKeywords)
    myParse.Keys.Add("LONG", tbKeywords)

    SubType could come as a 3rd parameter, because it could be optional.

    .Keys.Remove could be useful.

    Just ideas, better to discuss via Skype, maybe?


    Petr

  9. #9
    ErosOlmi (thinBasic author)
    In reality, DEFAULTs is an internal array with one entry per ASCII code, indexed by the ASC code of each char.
    Inside the array each char is marked with its default type: alphabetic, numeric, delimiter, new line, ... and so on.
    When you set some default, you just change the type of that char.
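
    Conceptually something like this (a simplified sketch only, not the module's actual internals):

      ' A table with one slot per ASCII code; each slot holds that char's default type.
      ' (thinBasic arrays are 1-based, so slot Asc(char) + 1 is used here.)
      Dim CharDefault(256) As Long

      ' Tokenizer_Default_Char("$", %TOKENIZER_DEFAULT_ALPHA) then conceptually just does:
      CharDefault(Asc("$") + 1) = %TOKENIZER_DEFAULT_ALPHA

      ' ...and Tokenizer_Default_Char(":", %TOKENIZER_DEFAULT_NEWLINE) does:
      CharDefault(Asc(":") + 1) = %TOKENIZER_DEFAULT_NEWLINE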

    Anyway, I agree the syntax is not that elegant in this case.
    I will consider a change.

    Regarding internal generation of parsing dictionary groups ... great idea!

    In the meantime I've developed key search:
        PrintL "Checking MainType and SubType of DIM key"
        PrintL "DIM MainType: " & MyParser.Key("Dim").MainType
        PrintL "DIM SubType : " & MyParser.Key("Dim").SubType
    
    
      PrintTitle "-A BIG search of 1M Keys ---"    
    lTimer.Start
        For Counter = 1 To 1000000
          MyParser.Key("Dim").MainType
        Next
    lTimer.Stop
        PrintL "1M search time in mSec: " & lTimer.Elapsed(%CTIMER_MILLISECONDS)
    
    I'm using the PowerBasic PowerCollection to store the keys. It is not as efficient as my personal hash table, but it allows me more flexibility.
    Searching 1M times for a key among almost 7000 keys takes 4 seconds on my PC.
    I hope to halve that time in the end.

    Thanks
    Eros

  10. #10
    ErosOlmi (thinBasic author)
    Quote Originally Posted by Petr Schreiber View Post
    Do you ever sleep?
    Today: 2 hours travelling to work, 10 hours at work, 1.5 hours to return home.
    Some food, 1.5 hours of programming at home.
    Now I want a BIG 6 hours of sleep.
    Tomorrow morning, awake at 6:30
    GOTO Today
    
    oops: I've used a GOTO
