
Thread: regular expressions usage

  1. #1

    regular expressions usage

    I want to remind users about using regular expressions to search for patterns. Suppose you want to search pi for your birth year 1902, followed by zero or more digits, then your wife's birth year 1924, without overlapping matches. We will use this example as a template: VBRegExp_Test_MatchesAndCollections.tbasic in C:\thinBasic\SampleScripts\VBRegExp
    The pattern is "1902.*?1924" and the text is the attached pi.txt (beware: it is one continuous run of up to 2 million digits with no line breaks, so Notepad or WordPad may hang on Windows XP; I am using the freeware Notepad++ from http://notepad-plus-plus.org/release/5.9, which can display such a file).
    The search result is saved to a file, since the console can't display potentially large text.
    If the string to search is 34190242819244412441902234192456, then applying the regex 1902.*?1924 gives these matches:
    19024281924
    19022341924
    The meaning of .*? in 1902.*?1924: . matches any character (except a newline), * means zero or more of the previous item (the .), and ? suppresses the engine's greedy behaviour of matching the widest span possible, so that it finds the smallest matches instead.
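To see the lazy and greedy forms side by side on the short sample string, here is a minimal sketch in Python rather than thinBasic. It assumes that Python's re module treats .* and .*? the same way as the VBRegExp engine for this pattern.

```python
import re

text = "34190242819244412441902234192456"

# Lazy quantifier: each match stops at the first 1924 after a 1902.
lazy = re.findall(r"1902.*?1924", text)

# Greedy quantifier: one match running from the first 1902 to the last 1924.
greedy = re.findall(r"1902.*1924", text)

print(lazy)
print(greedy)
```

The lazy form gives the two short matches listed above, while the greedy form swallows both into a single long match.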
    Attached are the same VBRegExp_Test_MatchesAndCollections.tbasic, slightly modified, and pi.txt. I attached pi.txt so you can experiment with a huge text, but you can use any text and any regex.
      '---The following code illustrates how to obtain a Matches collection from a regular 
      '---expression search and how to access its individual members.
      Uses "VBREGEXP", "file"
     
      dim lpRegExp  as dword
      dim lpMatches as dword
      dim lpMatch   as dword
      Dim strValue, sPi  As String
     
      '---Allocate a new regular expression instance
      lpRegExp = VBREGEXP_New
      sPi = FILE_Load(APP_SourcePath + "pi.txt")
      '---Check if it was possible to allocate and if not stop the script
      if isfalse lpRegExp then
        MSGBOX 0, "Unable to create an instance of the RegExp object." & $crlf & "Script terminated"
        stop
      end if
     
      '---Set pattern
      VBRegExp_SetPattern lpRegExp, "1902.*?1924"
      '---Set case insensitivity
      VBREGEXP_SetIgnoreCase lpRegExp, -1
      '---Set global applicability
      VBRegExp_SetGlobal lpRegExp, -1
      '---Execute search
      lpMatches = VBRegExp_Execute(lpRegExp, sPi)
      IF ISFALSE lpMatches THEN
        MSGBOX 0, "1. No match found"
      else
     
        dim nCount as long value VBMatchCollection_GetCount(lpMatches)
        IF nCount = 0 THEN
          MSGBOX 0, "2. No match found"
        else
          '---Iterate the Matches collection
          dim I as long
          strValue += "Total matches found:  " & nCount & $CRLF & string$(50, "-") & $crlf
          FOR i = 1 TO nCount
            lpMatch = VBMatchCollection_GetItem(lpMatches, i)
            IF ISFALSE lpMatch THEN EXIT FOR
     
            strValue += "Match number " & i & " found at position: " & VBMatch_GetFirstIndex(lpMatch) & " length: " & VBMatch_Getlength(lpMatch) & $CRLF
            strValue += "Value is: " & VBMatch_GetValue(lpMatch) & $CRLF
            strValue += "--------------" & $CRLF
     
            VBREGEXP_Release lpMatch
     
          NEXT
     
          'MSGBOX 0, strValue
          'PrintL strValue 
          FILE_Save(APP_SourcePath +"results.txt",strValue)
        END IF
     
      END IF
     
      IF istrue lpMatches  THEN VBREGEXP_Release(lpMatches)
      IF istrue lpRegExp   THEN VBREGEXP_Release(lpRegExp)
      MsgBox 0,"results saved to a results.txt"
    
    Last edited by zak; 22-04-2011 at 14:04.

  2. #2
    Junior Member
    Join Date
    Jan 2009
    Location
    UK
    Posts
    15
    Rep Power
    17

    regular expressions usage

    Quote: Originally posted by zak
    I want to remind users about using regular expressions to search for patterns...

    ...the pattern "1902.*?1924"...

    ...the meaning of .*? in 1902.*?1924: . matches any character, * means zero or more of the previous item (the .), and ? suppresses the engine's greedy behaviour of matching the widest span possible, so that it finds the smallest matches instead.

    Hi Zak,

    Thank you very much for the very useful example you gave.
    It has re-awakened my interest in using RegEx for web-scraping (obtaining useful data from webpages).

    I have re-written your example in the form of a simplified function, so that I can call it repeatedly (each time, with different start and ending strings, which frame the data of interest in the webpage) from the main section of the program.

    However, I have realised that the .*? sequence will not allow me to match every character which is found in HTML code. For example, I think it won't match \ or " or * or & or < and several other symbols.

    I have tried to read several RegEx textbooks on the web, but I find them very hard to understand. Could you possibly advise a replacement for the .*? sequence which will get matches for any symbol found in HTML, please?

    regards
    JohnP

  3. #3
    Hi JohnP
    The previously attached example is not mine; it is from Eros, who included it in C:\thinBasic\SampleScripts\VBRegExp.
    Regarding the special characters which can't match: there is an escape character, \, and when we insert it before a special character, that character is then matched literally. As an example,
    replace the text in pi.txt with the following:
    eeeeeeexhttp://www.google.commmrtjjhttp://www.google.comewer

    and in the code sample replace the corresponding line with:
    VBRegExp_SetPattern lpRegExp, "http\:\/\/www\.google\.com"
    
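    i.e. we want to use the pattern "http\:\/\/www\.google\.com"
    Note that / : . are each preceded by \ so that they are matched literally (strictly, only . is a metacharacter here; escaping : and / is harmless but not required).
    When we run the code, the result should be: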
    Total matches found: 2
    --------------------------------------------------
    Match number 1 found at position: 9 length: 21
    Value is: http://www.google.com
    --------------
    Match number 2 found at position: 36 length: 21
    Value is: http://www.google.com
    --------------
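The same escaping idea can be sketched in Python (not thinBasic; this assumes Python's re module behaves like the VBRegExp engine for a literal pattern such as this one). The sketch escapes only the dots and also shows re.escape, Python's built-in helper for escaping a literal string.

```python
import re

text = "eeeeeeexhttp://www.google.commmrtjjhttp://www.google.comewer"

# Only the dots need escaping; ':' and '/' are ordinary characters.
pattern = r"http://www\.google\.com"

# m.start() is a 0-based offset into text.
hits = [(m.start(), len(m.group())) for m in re.finditer(pattern, text)]
print(hits)

# re.escape builds an escaped pattern from a literal string automatically.
auto = re.escape("http://www.google.com")
print(re.findall(auto, text) == re.findall(pattern, text))
```

Both patterns find the same two occurrences, each 21 characters long.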

    Yes, regular expressions can be hard at first. I know little about them, but once learned they are a very powerful tool.
    The best introductory book is:
    Sams Teach Yourself Regular Expressions in 10 Minutes
    By Ben Forta
    Note that most Perl regular expressions work in thinBasic, which uses the VBRegExp engine, with a few restrictions such as lookbehind (i.e. matching "zxc" only when it is, or is not, preceded by a "y" character). So the tutorials available for Perl mostly apply here.
    There is also a freeware program for testing regexes, Expresso, from:
    http://www.ultrapico.com/Expresso.htm

    PS: there is a long list of pattern meanings in the thinBasic help file. Put the cursor on VBREGEXP_New in line 13 of the code, press F1, then go back twice. That help page lists the pattern meanings, e.g. \d means a digit, \D a non-digit, etc.
    Last edited by zak; 23-04-2011 at 11:03.

  4. #4

    regular expressions usage

    Zak,

    Thank you for a really full response to my request for help.

    Your explanation has revealed one of my misunderstandings and led me to a workable solution.

    I now have a couple of functions which work well; one inserts escape characters into my selected Start and End marker strings (which bracket the wanted data), where needed, and the other uses RegEx, with those Start & End marker strings, to extract the wanted data from the webpage. The data is stored in a 2-D string array and can easily be displayed complete, or selected items extracted as required. As you say, it works very quickly.
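JohnP did not post his functions, so the following is only a guess at their general shape, written in Python rather than thinBasic. The name extract_between and the sample HTML are hypothetical. Note that . already matches characters such as \ " * & and <; the two things that usually go wrong are metacharacters inside the marker strings (handled here by re.escape) and data that spans several lines (handled here by [\s\S]*? in place of .*?).

```python
import re

def extract_between(text, start, end):
    """Return every shortest span of text found between start and end."""
    # re.escape neutralises metacharacters such as * or . in the markers.
    # [\s\S]*? is a lazy "any character, including newlines" wildcard.
    pattern = re.escape(start) + r"([\s\S]*?)" + re.escape(end)
    return re.findall(pattern, text)

# Hypothetical page fragment: the start marker contains a '*', and the
# second cell spans two lines.
html = '<td class="p*">12.5 &amp; "x\\y"</td>\n<td class="p*">7\n8</td>'
cells = extract_between(html, '<td class="p*">', "</td>")
print(cells)
```

Each call returns a list, so results from several marker pairs could be collected into a two-dimensional structure much like the string array described above.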

    It's a really useful start to re-working some of my existing programs.

    Thanks again for your time,

    JohnP

  5. #5
    Testing the primality of numbers is the last thing I expected to be possible using regular expressions. I have found here
    http://www.noulakaz.net/weblog/2007/...prime-numbers/
    and here
    http://montreal.pm.org/tech/neil_kandalgaonkar.shtml
    how to do that. It is described in the first link, with a program in Ruby. Attached is the thinBasic version. The program converts the number to a string of that many '1' characters (for example, 5 becomes "11111"), and the regex to run is ^1?$|^(11+?)\1+$
    Indeed, I still don't understand the regex and will try to relearn the subject.
    When you try a number like 123479 it takes several seconds, so I guess it would take forever for bigger numbers.
    Put your number in the yourNumber variable.
    As a template I used the same thinBasic example found in C:\thinBasic\SampleScripts\VBRegExp
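For anyone else puzzled by the regex, here is one way to read it, sketched in Python rather than thinBasic (assuming Python's re module handles this pattern like the VBRegExp engine does). The first alternative catches 0 and 1, which are not prime. The second tries to split the string of 1s into equal blocks of two or more, repeated at least twice; if that succeeds, the length has a divisor and the number is composite. The name is_prime is only illustrative.

```python
import re

# ^1?$          matches "" or "1", i.e. the numbers 0 and 1 (not prime).
# ^(11+?)\1+$   matches a block of two or more 1s repeated at least twice,
#               i.e. a length that factors as block_size * repeats.
NOT_PRIME = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime(n):
    """Unary regex primality test; only practical for small n."""
    return NOT_PRIME.match("1" * n) is None

print([n for n in range(30) if is_prime(n)])
```

The engine finds a divisor by trial and backtracking over block sizes, which is why the running time grows so quickly as the number gets larger.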

      '---The following code tests a number for primality by matching a string of that 
      '---many "1" characters against a regular expression.
      Uses "VBREGEXP", "console"
     
      dim lpRegExp  as dword
      dim lpMatches as dword
      Dim lpMatch   As DWord
      Dim yourNumber As Long
      Dim strValue, sNumber  As String
     
      '---Allocate a new regular expression instance
      lpRegExp = VBREGEXP_New
      yourNumber = 12347
      sNumber  = String$(yourNumber, "1")
      '---Check if it was possible to allocate and if not stop the script
      if isfalse lpRegExp then
        MSGBOX 0, "Unable to create an instance of the RegExp object." & $crlf & "Script terminated"
        stop
      end if
     
      '---Set pattern
      VBRegExp_SetPattern lpRegExp, "^1?$|^(11+?)\1+$"
      '---Set case insensitivity
      VBREGEXP_SetIgnoreCase lpRegExp, -1
      '---Set global applicability
      VBRegExp_SetGlobal lpRegExp, -1
      '---Execute search
      lpMatches = VBRegExp_Execute(lpRegExp, sNumber)
      IF ISFALSE lpMatches THEN
        MsgBox 0, "1. No match found"
      else
     
        dim nCount as long value VBMatchCollection_GetCount(lpMatches)
        IF nCount = 0 THEN
          PrintL yourNumber, "is prime --   press any key to continue "
        else
          '---Iterate the Matches collection
          dim I as long
          strValue += "Total matches found:  " & nCount & $CRLF & string$(50, "-") & $crlf
          FOR i = 1 TO nCount
            lpMatch = VBMatchCollection_GetItem(lpMatches, i)
            IF ISFALSE lpMatch THEN EXIT FOR
            strValue = "is not prime   --   press any key to continue "
     
            VBREGEXP_Release lpMatch
     
          NEXT
     
          PrintL yourNumber, strValue 
     
        END IF
     
      END IF
     
      IF istrue lpMatches  THEN VBREGEXP_Release(lpMatches)
      IF istrue lpRegExp   THEN VBREGEXP_Release(lpRegExp)
     
      WaitKey
    
    Last edited by zak; 25-04-2011 at 04:54.
