
Thread: Is there any existing code that tokenizes a single line?

  1. #1

    Is there any existing code that tokenizes a single line?

    Hi,

    I've been all over the web and tried some of the code from several Basic dialects, but I can't seem to find any simple code that will just take a single line/string (not a file), which might contain quoted strings, and extract either symbolic tokens or just the elements. Bint32 was mentioned several times, but I haven't been able to locate the source for it.


    The code I'm looking for doesn't need while...wend or if...endif support. It would just need to be able to go, for example,

    from
    sVar := "whats happening?" (or sVar = "whats happening?" or sVar="whats happening?")

    into
    {
    "sVar",
    ":=",
    "whats happening?"
    }


    Is there anything like that around? One of the problems with trying out every 'tokenizer' or 'lexer' mentioned on the web is that you never know ahead of time whether it will handle quoted strings.


    Sorry if this is off-topic or already addressed, but I really have searched a lot over the past month.

  2. #2
    Super Moderator Petr Schreiber's Avatar
    Join Date
    Aug 2005
    Location
    Brno - Czech Republic
    Posts
    7,128
    Rep Power
    732
    Hi TBQuerier,

    I think the ThinBASIC Tokenizer module can do what you ask for. Have a look at this basic example (derived from code by Eros):
    Uses "Console", "Tokenizer"
    
    Function TBMain()
                           
      String MyBuffer        ' -- Will contain string buffer to be parsed
      Long   CurrentPosition ' -- Current buffer pointer position
      Long   TokenMainType   ' -- Will contain current token main type
      String Token           ' -- Will contain current string token
      Long   TokenSubType    ' -- Will contain current token sub type       
      
      ' -- Parser tuning   
      %CustomKeywords = 100
      %CustomKeyword_Var = 1           
      %CustomKeyword_String = 2           
      Tokenizer_KeyAdd("VAR"                  , %CustomKeywords, %CustomKeyword_Var)
      Tokenizer_KeyAdd("STRING"               , %CustomKeywords, %CustomKeyword_String)
      
      
      Tokenizer_Default_Set(";", %TOKENIZER_DEFAULT_NEWLINE)
      
      ' -- Prepare text for parsing
      MyBuffer = "var sVar : string;" +
                 "sVar := ""whats happening""" 
                 
      
      
      ' -- Init current buffer position. THIS IS IMPORTANT
      CurrentPosition = 1
      
      ' -- Loops until token is end of buffer
      While TokenMainType <> %TOKENIZER_FINISHED
      
        ' -- Here we are. The most important point is that all passed parameters
        '   must be single variables and not expressions. This is necessary because
        '   the parameters are passed by reference in order to return information about the token
        ' --
        '   MyBuffer        must contain the string you want to parse
        '   CurrentPosition must be initialized to 1. After execution this parameter will contain
        '                   the current position just after the current token
        '   TokenMainType   on exit, it will contain the main type of the token found
        '   Token           on exit, it will contain the string representation of the token found
        '   TokenSubType    on exit, it will contain the sub type of the token found (if relevant)
        ' --
        Tokenizer_GetNextToken(MyBuffer, CurrentPosition, TokenMainType, Token, TokenSubType)                                           
        
        ' -- Write some info
        PrintL LSet$(Token, 32) + DecodeType_ToString(TokenMainType, TokenSubType)    
        
      Wend
        
      PrintL "Press any key to quit..."
      WaitKey
    
    End Function
    
    Function DecodeType_ToString( nType As Long, nSubType As Long ) As String
      String sResult
      
      Select Case nType 
        Case %TOKENIZER_FINISHED
          Return "Tokenizer finished..."
        
        Case %TOKENIZER_ERROR
          sResult = "Error"
          
        Case %TOKENIZER_UNDEFTOK
          sResult = "Undefined token"
          
        Case %TOKENIZER_EOL
          sResult = "End of line"
        
        Case %TOKENIZER_DELIMITER
          sResult = "Delimiter"
          
        Case %TOKENIZER_NUMBER
          sResult = "Number"
    
        Case %TOKENIZER_STRING
          sResult = "String"
    
        Case %TOKENIZER_QUOTE
          sResult = "Quoted"
          
        Case %CustomKeywords
          sResult = "Custom keyword / " + Choose$(nSubType, "%CustomKeyword_Var", "%CustomKeyword_String")      
      End Select   
      
      Return sResult
    
    End Function
    
    You can check ThinBasic/SampleScripts/Tokenizer for 2 more examples.
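    If you would rather collect the tokens into a single value instead of printing them one by one, something along these lines should work. This is just an untested sketch reusing only the calls shown above; skipping the end-of-line tokens and joining with a comma are my own assumptions, so adjust to taste:
    Uses "Console", "Tokenizer"

    Function TBMain()

      String MyBuffer        ' -- Will contain string buffer to be parsed
      Long   CurrentPosition ' -- Current buffer pointer position
      Long   TokenMainType   ' -- Will contain current token main type
      String Token           ' -- Will contain current string token
      Long   TokenSubType    ' -- Will contain current token sub type
      String TokenList       ' -- Will collect all tokens found so far

      ' -- Single line to parse, embedded quotes are doubled
      MyBuffer        = "sVar := ""whats happening?"""
      CurrentPosition = 1

      While TokenMainType <> %TOKENIZER_FINISHED

        Tokenizer_GetNextToken(MyBuffer, CurrentPosition, TokenMainType, Token, TokenSubType)

        ' -- Keep everything except the end-of-buffer and end-of-line pseudo tokens
        If TokenMainType <> %TOKENIZER_FINISHED And TokenMainType <> %TOKENIZER_EOL Then
          TokenList = TokenList + Token + ", "
        End If

      Wend

      PrintL "{ " + TokenList + "}"

      PrintL "Press any key to quit..."
      WaitKey

    End Function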


    Petr
    Last edited by Petr Schreiber; 19-08-2013 at 09:19.
    Learn 3D graphics with ThinBASIC, learn TBGL!
    Windows 10 64bit - Intel Core i5-3350P @ 3.1GHz - 16 GB RAM - NVIDIA GeForce GTX 1050 Ti 4GB

  3. #3
    thinBasic author ErosOlmi's Avatar
    Join Date
    Sep 2004
    Location
    Milan - Italy
    Age
    57
    Posts
    8,777
    Rep Power
    10
    Hi TBQuerier and welcome to the thinBasic community forum.

    Petr, thanks a lot for the example.
    It should give TBQuerier an idea of the Tokenizer module.

    The Tokenizer module has been designed to be general enough to adapt to any string data, identify tokens, and transform them into something else.
    I developed it because I too was quite frustrated searching for a simple tokenizer able to do the job without much effort spent writing grammars or things like that.
    www.thinbasic.com | www.thinbasic.com/community/ | help.thinbasic.com
    Windows 10 Pro for Workstations 64bit - 32 GB - Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz - NVIDIA Quadro RTX 3000

  4. #4
    Perfect, thanks Petr and Eros, I'll play around with it this week.

  5. #5

    Thanks for the foresight, Eros. We're allowed to use any of the ThinBasic DLLs with apps from other Basic dialects (Power, Pure, Oxygen, etc.). Is that correct?

  6. #6
    thinBasic author ErosOlmi's Avatar
    Join Date
    Sep 2004
    Location
    Milan - Italy
    Age
    57
    Posts
    8,777
    Rep Power
    10
    In theory yes, I have no problem.

    In practice, none of the thinBasic modules (most of the DLLs present in the \thinBasic\Lib\ directory, for example thinBasic_Tokenizer.dll) can be used without the main thinBasic core DLL, thinCore.dll, because all modules call special parsing functions present in thinCore.dll.

    So, in a few words, you cannot just call the functions inside the module you need as if it were a standard DLL. You need to either run a thinBasic script or embed thinCore.dll as an internal scripting language inside another application.
    www.thinbasic.com | www.thinbasic.com/community/ | help.thinbasic.com
    Windows 10 Pro for Workstations 64bit - 32 GB - Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz - NVIDIA Quadro RTX 3000

  7. #7

    Ok, thanks Eros.
