Thread: Bracket parsing

  1. #1
    Super Moderator Petr Schreiber's Avatar
    Join Date
    Aug 2005
    Location
    Brno - Czech Republic
    Posts
    7,128
    Rep Power
    732

    Bracket parsing

    Hi Eros,

    I am trying to use the new Tokenizer for parsing JSON, but I am having a hard time defining []{} as my custom types - they keep being treated as "delimiters". Any ideas?
    #MINVERSION 1.9.16.6
    
    Uses "Console", "Tokenizer"
    
    String buffer = "[{""a"":1, ""b"":2},{""c"":3, ""d"":4}]"
    
    Dim tokenizer As cTokenizer
    tokenizer = New cTokenizer()
    
    Long brackets  = tokenizer.NewMainType("brackets")
    tokenizer.Char.Set.Delim   (",:")
    tokenizer.Keys.Add("[", brackets, 10)
    tokenizer.Keys.Add("]", brackets, 11)
    tokenizer.Keys.Add("{", brackets, 20)
    tokenizer.Keys.Add("}", brackets, 21)
    
    
    tokenizer.Scan(buffer)
    
    Long i
    For i = 1 To tokenizer.Tokens.Count
      If tokenizer.Token(i).Data = "[" Then ' -- No idea why this IF passes even for non "[" btw
          PrintL "Data    :" & tokenizer.Token(i).Data
          PrintL "MainType:" & tokenizer.Token(i).MainType & " (" & tokenizer.Token(i).MainType.ToString & ")"
          PrintL "SubType :" & tokenizer.Token(i).SubType
          PrintL                  
      End If
    Next                             
    
    WaitKey
    

    Petr
    Last edited by Petr Schreiber; 17-12-2015 at 21:16.
    Learn 3D graphics with ThinBASIC, learn TBGL!
    Windows 10 64bit - Intel Core i5-3350P @ 3.1GHz - 16 GB RAM - NVIDIA GeForce GTX 1050 Ti 4GB

  2. #2
    thinBasic author ErosOlmi's Avatar
    Join Date
    Sep 2004
    Location
    Milan - Italy
    Age
    57
    Posts
    8,777
    Rep Power
    10
    Ciao Petr,

    attached a new version of thinBasic_Tokenizer.dll module that implements
    <tokenizer>.Char.Set.SubType (sDelimiters, UserSubType)
    
    I'm sorry, but because of how I implemented delimiter recognition, a standard delimiter like []{} cannot be used as a key.
    But using something like
    tokenizer.Char.Set.SubType ("[", 10)
    
    you can associate a subtype number (from 1 to 255) with a single-character delimiter.
    In this way you can recognize the special delimiter characters you want to keep track of among all the other delimiters.
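
    To make the idea concrete, here is a small language-neutral sketch in Python (not thinBasic, and not how the Tokenizer module is actually implemented). The delimiter set and the convention that subtype 0 means "not tagged" are assumptions made for the illustration; the subtype numbers are the ones used in this thread. Quoted strings are ignored for brevity.

```python
# Illustrative sketch only: not the thinBasic Tokenizer implementation.
DELIMITERS = set(",:[]{}")                             # assumed delimiter set
DELIM_SUBTYPE = {"[": 10, "]": 11, "{": 20, "}": 21}   # subtypes used in this thread


def delimiter_tokens(text):
    """Return (char, subtype) for every delimiter; subtype 0 means 'not tagged'."""
    return [(ch, DELIM_SUBTYPE.get(ch, 0)) for ch in text if ch in DELIMITERS]


tokens = delimiter_tokens('[{"a":1}]')
# Keep only delimiters that carry a user subtype, ignoring plain ones such as ":"
tagged = [t for t in tokens if t[1]]
print(tagged)
```

    Filtering on a non-zero subtype is what lets the interesting delimiters stand out from ordinary ones like "," and ":".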


    Regarding the IF not working ... it is something I knew would occur sooner or later, but so far I have no solution. It is a little tricky.
    When the thinBasic Core engine encounters an IF ... or a SELECT ... it has to decide whether the next expression evaluates to a string or to a number.
    To achieve this it uses the so-called "look ahead" technique: the parser saves the current pointer and then looks ahead one or two tokens to try to understand whether the next statement evaluates to a string or a numeric expression. Then it goes back to the saved pointer and decides how to go on.

    But now that we have dotted notation with multiple levels, understanding whether the next expression evaluates to a string or a number is not that simple.
    The parser would have to go on for many tokens, and it would also need a way to interrogate a module class, asking whether such a sequence of tokens evaluates to a string or a number.
    At the moment I have no way to do that. I know it is a BIG PROBLEM and I will keep trying to find a way.
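
    A toy model of the look-ahead described above, written in Python purely for illustration (it is not thinBasic's parser; the function name, the two-token window and the "numeric unless a quote is seen" default are assumptions). It shows why a long dotted expression is guessed as numeric while an expression that starts with a string literal is not.

```python
# Illustrative sketch only: a toy model of "look ahead", not thinBasic's parser.
def guess_kind(tokens, pos=0, lookahead=2):
    """Peek at up to `lookahead` tokens from `pos` and guess the expression kind."""
    saved = pos                              # save the current pointer
    kind = "numeric"                         # assumed default when no quote is seen
    while pos < len(tokens) and pos < saved + lookahead:
        if tokens[pos].startswith('"'):      # a quote means a string expression
            kind = "string"
            break
        pos += 1
    pos = saved                              # go back to the saved pointer
    return kind, pos


# tokenizer.Token(i).Data = "["   (the string literal is far beyond the window)
dotted_first = ["tokenizer", ".", "Token", "(", "i", ")", ".", "Data", "=", '"["']
# "[" = tokenizer.Token(i).Data   (the string literal is the first token)
string_first = ['"["', "=", "tokenizer", ".", "Token", "(", "i", ")", ".", "Data"]

print(guess_kind(dotted_first)[0])   # numeric
print(guess_kind(string_first)[0])   # string
```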

    A simple workaround is to add an empty string "". In this way thinBasic will find a quote and immediately understand that what follows is a string expression
    If "" & tokenizer.Token(i).Data = "[" Then
    
    or
    If "[" = tokenizer.Token(i).Data Then
    
    Otherwise it is treated as a numeric expression, which evaluates as
    If Val(tokenizer.Token(i).Data) = Val("[") Then
    
    and in the end the above gives
    If 0 = 0 Then
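
    The effect can be reproduced with a BASIC-style Val written in Python. This is only a sketch: the exact rules of thinBasic's Val are not given in this thread, so the regular expression below is an assumption (leading number, otherwise 0).

```python
import re


def val(text):
    """BASIC-style Val sketch: value of the leading number, or 0 if there is none."""
    match = re.match(r"\s*[+-]?(\d+\.?\d*|\.\d+)", text)
    return float(match.group(0)) if match else 0.0


# Any token without a leading number compares equal to "[" once both sides go through val()
for data in ["[", "{", "a", ","]:
    print(repr(data), val(data) == val("["))
```

    Under these assumptions every non-numeric token gives 0 = 0, so the IF passes for it just as it does for "[".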
    

    Here is a complete example
    #MINVERSION 1.9.16.6
    
    Uses "Console", "Tokenizer"
    
    String buffer = "[{""a"":1, ""b"":2},{""c"":3, ""d"":4}]"
    
    Dim tokenizer As CTOKENIZER
    tokenizer = New CTOKENIZER()
    
    Long brackets = tokenizer.NewMainType("brackets")
    tokenizer.Char.Set.Delim (",:")
    ' Standard delimiters cannot be keys, so these stay commented out
    'tokenizer.Keys.Add("[", brackets, 10)
    'tokenizer.Keys.Add("]", brackets, 11)
    'tokenizer.Keys.Add("{", brackets, 20)
    'tokenizer.Keys.Add("}", brackets, 21)
    ' Associate a user subtype with each bracket delimiter instead
    tokenizer.Char.Set.SubType ("[", 10)
    tokenizer.Char.Set.SubType ("]", 11)
    tokenizer.Char.Set.SubType ("{", 20)
    tokenizer.Char.Set.SubType ("}", 21)
    
    tokenizer.Scan(buffer)
    
    Long i
    For i = 1 To tokenizer.Tokens.Count
      ' Report only delimiters that carry a user subtype
      If tokenizer.Token(i).MainType = %TOKENIZER_DELIMITER Then
        If tokenizer.Token(i).SubType Then 
          PrintL "Data :" & tokenizer.Token(i).Data
          PrintL "MainType:" & tokenizer.Token(i).MainType & " (" & tokenizer.Token(i).MainType.ToString & ")"
          PrintL "SubType :" & tokenizer.Token(i).SubType
          PrintL
        End If
      End If
    Next
    
    WaitKey
    
    Last edited by ErosOlmi; 17-12-2015 at 22:15.
    www.thinbasic.com | www.thinbasic.com/community/ | help.thinbasic.com
    Windows 10 Pro for Workstations 64bit - 32 GB - Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz - NVIDIA Quadro RTX 3000

  3. #3
    thinBasic author ErosOlmi's Avatar
    I had another idea but with some compromise.

    You can force []{} to be Alphabetic and not delimiters.
    In this way you can use Keys.

    The compromise is that consecutive Alphabetic chars are considered a single token until a delimiter or a space is encountered.
    So [{ or }] (or any other combination) is considered a single token, and you need to define a key for every possible combination.
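
    To see which combinations a given input would require, one can list the runs of consecutive bracket characters. The Python snippet below is only a rough illustration (it is not part of the Tokenizer module, and it does not skip brackets that appear inside quoted strings).

```python
import re

buffer = '[{"a":1, "b":2},{"c":3, "d":4}]'

# Every maximal run of bracket characters would become one token,
# so each distinct run needs its own key.
runs = re.findall(r"[\[\]{}]+", buffer)
print(runs)                # ['[{', '}', '{', '}]']
print(sorted(set(runs)))   # ['[{', '{', '}', '}]']
```

    Deeper nesting produces longer runs such as `}]}`, which is why the list of keys can grow quickly with this approach.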

    Ciao
    Eros


    #MINVERSION 1.9.16.8
    
    Uses "Console", "Tokenizer"
    
    String buffer = "[{""a"":1, ""b"":2},{""c"":3, ""d"":4}]"
    
    Dim tokenizer As CTOKENIZER
    tokenizer = New CTOKENIZER()
    
    tokenizer.Char.Set.Delim (",:")
    tokenizer.Char.Set.Alpha ("[]{}")  '---<<< Define as alphabetic
    
    ' Consecutive alphabetic chars form one token, so combined keys are needed too
    Long brackets = tokenizer.NewMainType("brackets")
    tokenizer.Keys.Add("[", brackets, 10)
    tokenizer.Keys.Add("]", brackets, 11)
    tokenizer.Keys.Add("{", brackets, 20)
    tokenizer.Keys.Add("}", brackets, 21)
    tokenizer.Keys.Add("[{", brackets, 1020)
    tokenizer.Keys.Add("}]", brackets, 2111)
    
    tokenizer.Scan(buffer)
    
    Long i
    For i = 1 To tokenizer.Tokens.Count
      ' Report only tokens recognized as "brackets" keys
      If tokenizer.Token(i).MainType = brackets Then
        PrintL "Data :" & tokenizer.Token(i).Data
        PrintL "MainType:" & tokenizer.Token(i).MainType & " (" & tokenizer.Token(i).MainType.ToString & ")"
        PrintL "SubType :" & tokenizer.Token(i).SubType
        PrintL
      End If
    Next
    
    WaitKey
    

  4. #4
    thinBasic author ErosOlmi's Avatar
    Attached is another version of thinBasic_Tokenizer.dll.

    I've added an Alpha_Single option to specify characters that are considered alphabetic but must each be taken as a single token:
    <tokenizer>.Char.Set.Alpha_Single ("[]{}")
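
    The difference between grouped alphabetic characters and single ones can be pictured with a toy scanner. This is a Python sketch for illustration only, not the module's code; the function name and its two parameters are made up for the example.

```python
# Illustrative sketch only: a toy scanner, not the thinBasic Tokenizer module.
def scan_alpha(text, alpha, alpha_single):
    """Group consecutive `alpha` chars into one token; emit `alpha_single` chars one by one."""
    tokens, run = [], ""
    for ch in text:
        if ch in alpha_single:
            if run:                 # a single char always closes the current run
                tokens.append(run)
                run = ""
            tokens.append(ch)
        elif ch in alpha:
            run += ch               # keep extending the current run
        else:
            if run:                 # anything else ends the current run
                tokens.append(run)
                run = ""
    if run:
        tokens.append(run)
    return tokens


print(scan_alpha("a[{b}]", alpha="ab[]{}", alpha_single=""))    # ['a[{b}]']
print(scan_alpha("a[{b}]", alpha="ab", alpha_single="[]{}"))    # ['a', '[', '{', 'b', '}', ']']
```

    With the brackets taken singly, a letter directly in front of a bracket no longer merges with it, and only the four single-character keys are needed.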
    
    Let me know if it works in all cases.
    Ciao
    Eros

    Example:
    #MINVERSION 1.9.16.8
    
    Uses "Console", "Tokenizer"
    
    ' Buffer starts with a letter to check that "a" and "[" are not merged
    String buffer = "a[{""a"":1, ""b"":2},{""c"":3, ""d"":4}]"
    
    Dim tokenizer As CTOKENIZER
    tokenizer = New CTOKENIZER()
    
    tokenizer.Char.Set.Delim (",:")
    tokenizer.Char.Set.Alpha_Single ("[]{}")
    
    Long brackets = tokenizer.NewMainType("brackets")
    tokenizer.Keys.Add("[", brackets, 10)
    tokenizer.Keys.Add("]", brackets, 11)
    tokenizer.Keys.Add("{", brackets, 20)
    tokenizer.Keys.Add("}", brackets, 21)
    ' Combined keys are no longer needed with Alpha_Single
    'tokenizer.Keys.Add("[{", brackets, 1020)
    'tokenizer.Keys.Add("}]", brackets, 2111)
    
    tokenizer.Scan(buffer)
    
    Long i
    For i = 1 To tokenizer.Tokens.Count
      ' Report only tokens recognized as "brackets" keys
      If tokenizer.Token(i).MainType = brackets Then
        PrintL "Data :" & tokenizer.Token(i).Data
        PrintL "MainType:" & tokenizer.Token(i).MainType & " (" & tokenizer.Token(i).MainType.ToString & ")"
        PrintL "SubType :" & tokenizer.Token(i).SubType
        PrintL
      End If
    Next
    
    WaitKey
    

  5. #5
    Super Moderator Petr Schreiber's Avatar
    You are a saint, thank you Eros

    So far, so good. I will let you know if I find anything


    Petr

  6. #6
    thinBasic MVPs
    Join Date
    Oct 2012
    Location
    Germany
    Age
    54
    Posts
    1,525
    Rep Power
    170
    More than 700 of the 3000 pages in the help look like this

    Navigation: ThinBASIC Modules > Tokenizer > Tokenizer Module Classes > cTokenizer >

    <cTokenizer>.Token




    Enter topic text here.

    Approximately 300 pages show something like



    For more information about xyz see

    MSDN: http://msdn.microsoft.com/en-us/libr...24(VS.85).aspx

    Not to mention that these links do not lead to the desired information.

    On my hard drive there are more than 1800 (!!!) scripts that I wrote myself in the past 3 years.
    Probably 60 of them are working. That's about 3.3%.

    75% of the scripts report bugs that are actually none - all If/EndIf, Select Case/End Select, Function/End Function, Do's and Loops are checked, and the parenthesis count is correct and was double-checked using 2 different programs that parse the scripts char by char.

    The interpreter was in much better shape in the years 2014 to 2016 - nesting of For, While and Repeat loops worked correctly down to an unbelievable depth - today a script with 2 or 3 nested loops barely gets to the point where the innermost of 5 loops should start.


    The most frequent cause is a default value on optional parameters. This has been broken for more than 2 1/2 years - workarounds using Function_CParam are not reliable

    - when variants are involved, Function_CParam fails.

    Also, when numerals with a value of 0 or strings with the content "" are passed, the count is incorrect.

    If a script contains large comment blocks or multiline strings, thinAir complains about many bugs that are none.

    There is no function provided for simple tasks such as counting matching parentheses, which would eliminate them as the very first major source of bugs - especially parsing bugs, because the interpreter understands that something is wrong but does not check for the correct reason.
    That is the second most common reason why scripts do not run and wrong errors are diagnosed, such as a missing If for EndIf, or an Invalid Type declaration inside of Sub - even though the script does not contain a single Sub.
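
    For what it is worth, a check of this kind is small. Below is a Python sketch of a bracket-balance checker (not a thinBasic function, and not something the interpreter or thinAir provides). It assumes that `'` starts a comment running to the end of the line and that strings do not span lines; multiline strings and other comment styles are not handled.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def first_bracket_error(source):
    """Return None if brackets balance, else (line, column, message) of the first problem."""
    stack = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        in_string = False
        for col, ch in enumerate(line, start=1):
            if ch == '"':
                in_string = not in_string      # a doubled "" toggles twice, which is harmless
            elif in_string:
                continue                       # ignore brackets inside string literals
            elif ch == "'":
                break                          # assumed: ' starts a comment to end of line
            elif ch in "([{":
                stack.append((ch, line_no, col))
            elif ch in PAIRS:
                if not stack or stack[-1][0] != PAIRS[ch]:
                    return (line_no, col, f"unexpected {ch!r}")
                stack.pop()
    if stack:
        ch, line_no, col = stack[-1]
        return (line_no, col, f"unclosed {ch!r}")
    return None


print(first_bracket_error('PrintL "(" \' )'))        # None
print(first_bracket_error("If All(True, (True)"))    # (1, 7, "unclosed '('")
```

    Reporting the line and column of the first mismatch narrows the search to one place before the script is run.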

    An If line such as
    If All(True, True, True)
    
    Else
    
    EndIf
    
    is a real killer. The missing "Then" is not detected - the script gets started ... the consequences... try it 3, 4 or 5 times...

    In other places it detects "End Function" where something else was expected - but all there is are commented lines - not even the keyword END anywhere in the text.

    The third most common undetected bug that brings thinAir to a crash is a faulty error report, where scriptname.tbasic.lastruntimeError.ini has a size of 300 to 400 kB. Other editors also crash when trying to load this file, but there I just check the file size and erase the file if it is larger than 5 kB - in thinAir that mechanism is hidden, and after a crash it is impossible to interfere before thinAir kills itself.
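
    The size check itself can be as simple as the following Python sketch (an illustration of the idea only, not what thinAir or any particular editor does; the function name is made up, and the 5 kB threshold is the one mentioned above).

```python
import os

MAX_BYTES = 5 * 1024   # the 5 kB threshold mentioned above


def remove_oversized_report(path, max_bytes=MAX_BYTES):
    """Delete `path` if it exists and is larger than `max_bytes`; return True if deleted."""
    try:
        if os.path.getsize(path) > max_bytes:
            os.remove(path)
            return True
    except FileNotFoundError:
        pass               # nothing to clean up
    return False
```

    It would be called with the report that belongs to the script, for example `remove_oversized_report("myscript.tbasic.lastruntimeError.ini")`, where `myscript` is a placeholder name, before the editor tries to load that file.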

    Also that graph thing - it is the most useless feature to add at all - the interesting but purposeless look is not worth the performance cost. It tells nothing except how a script can be slowed down.


    And now the tokenizer: I assigned the keyword STOP, not case-sensitive, as a member of the custom group "TERMINATE" with
    a user-defined ID of -1.
    MainType.ToString should return "TERMINATE", but what does it do?
    It gets an ID of 12 and returns the name of an equate! How does that get in there?

    I do not understand this language any more
    Ready...>stop
    
    
    number of tokens : 1
    Data :stop
    MainType:12 (%TOKENIZER_STRING)
    SubType :0
    Function to call : Action_%TOKENIZER_STRING
    Ready...>
    
    Last edited by ReneMiner; 19-06-2022 at 21:54.
    I think there are missing some Forum-sections as beta-testing and support

  7. #7
    Super Moderator Petr Schreiber's Avatar
    Hi Rene,

    I am sorry the docs are not complete - we are working on them per partes (piece by piece) whenever our jobs allow, the volume is huge.

    I can confirm the cTokenizer docs need improvement. I will consider it as the next step once I finish documenting the cAppLog interface.


    Petr
