Results 1 to 10 of 10

Thread: Replacing ARRAY SORT

  1. #1

    Replacing ARRAY SORT

    powerbasics ARRAY SORT is a convenient way sort arrays of any data type. It also allows you to tag a second array of any kind to be sorted. Under the hood there is a lot going on and it took many years to implement.

    When converting PB code to another language, this functionality needs to be replaced. This thread accomplishes that task by using a string sorting algorithm posted by Stan Durham for string arrays and Quick sort for integer arrays. Both are as fast or faster than the PB implementation and the code is in a high level language for easy porting.

    Under the hood, PB selects sorting algorithms for the data type. Since the data type of an array is held in the array descriptor table (and only available using the keyword ARRAYATTR) a seperate function must be used for each datatype. Most real world implementations will be complete with three, STRING, LONG, QUAD, as most everything can be squeezed into these (floats can be multiplied by 1000 for example and fed through sorting as integers)

    Strings are by far the tricky one. The challenge is to not only implement a fast method, but give it traction meaning tag an array, allow case to be ignored and sort ascend/descend with a call that is not overly long and combersome to use. For strings the criteria are:

    Handle any length String including Zero length
    ASCENDING/DESCENDING order
    TAG an ARRAY of pointers
    Select FIRST & LAST elements TO SORT
    Equivalent speed to PBs SORT ARRAY
    Zero or 1+ based Arrays
    A generic CALL that is user friendly
    High level language compatible (no ASM or PB keywords)

    The Algorithm sorts the handles the PB Dynamic BStr Strings.
    A Dynamic String array in powerbasic is an array of pointers to BStr Strings.
    pArr = VARPTR(b(1)) will return the address of the first element of this array
    pStr = @pArr will return a pointer to the actual BStr STRING
    @pStr will return the STRING contents.
    This is equivelent to the powerbasic reserved word pStr = STRPTR(b(1))

    But these two operations are not equivelent.
    When a zero length string is assigned with b(x) = "" powerbasic does not request memory
    for a non existant string, so the pointer is set to zero (NULL Pointer).
    To compensate for this, STRPTR returns a pointer to a real memory location (one address is used for all NULL strings) to avoid a GPF when dereferencing a NULL pointer. Unfortunatly this is not the case when when you do the two step de-referencing.

    As this algorithm derives part of its speed by sorting the pointers to the strings and swapping them
    (not swapping the actual strings themselves) it uses the two step deferencing to find the character
    at each position in each string. When a zero length string is encountered a GPF would occur.
    To make this implementation work, a conditional statement must check each pointer.

    In testing with 1300 first names, it performed a little slower 5400 Clks than PBs 3400.
    This is because there are many slots of two names that are swapped unecessarily. This
    is a product of the method the algorithm uses
    Additional testing would be required to determine if additional conditional statements
    would make a significant difference to this.

    But when 2000, random strings up to 32 Chars are used it performed 2x faster 1700 Clks than PBs 3500

    Undoubtably PB uses ASM which would speed this algorithm up a little, but the purpose is to use
    a high level language to replace the PB ARRAY SORT functionality and thus convert PB code to
    another language (with friendly support).


    First = First element in the array = LBOUND()
    Last = Last element in the array = UBOUND()

    In this example code, a UDT is sorted as the TAG array. Notice the Tagged pointer array
    is declared AS MyTYPE PTR. This is tricky, because PB does not allow an array of pointers
    to be passed BYREF for some reason. To get around this, we use the same trick of passing
    a pointer to the array table (an array of DWORDS).

    Then we use DIM AT to create and array in the SUB that can be used to access the
    pointers as though we had sent them BYREF.

    This was found to be 2x faster when sorting 1m LONG values with QuickSort rather than using
    pointer offsets @pArr[i].






  2. #2

    Re: Replacing ARRAY SORT

    Quick Sort implementation:

    Sorting an array of 100k LONGs, PB avg time is 1650 Clks/String, QuickSort is 1450 Clks/String


    A LONG/QUAD Array sorting algorithm designed for:
    Handle all values including Zero
    ASCENDING/DESCENDING order
    TAG an ARRAY of pointers
    Select FIRST & LAST elements TO SORT
    Equivelent speed to PBs SORT ARRAY
    Zero or 1+ based Arrays
    A generic CALL that is user friendly
    High level language compatible (no ASM or PB keywords)

    Since PB does not allow passing a pointer array BYREF as an argument,
    a pointer to the array is passed and then DIM AT is used to create an
    array of the pointers within the Function.

    This function was found to be 2x faster (when sorting 1m elements) using
    an array of pointers rather than pointer offsets ie @pArr[i].


    ' Sorting an array of 100k LONGs, PB avg time is 1650 Clks/String, QuickSort is 1450 Clks/String
           
    ' A littler faster WITHOUT the benfit of ASM.
    
    
    ' http://www.cse.iitk.ac.in/users/dsrk...quickSort.html
    ' http://en.wikipedia.org/wiki/Quick_sort
            
    #If 0
    
     A LONG/QUAD Array sorting algorithm designed for: 
       Handle all values including Zero
       ASCENDING/DESCENDING order
       TAG an ARRAY of pointers
       Select FIRST & LAST elements TO SORT 
       Equivelent speed to PBs SORT ARRAY 
       Zero or 1+ based Arrays 
       A generic CALL that is user friendly 
       High level language compatible (no ASM or PB keywords)
        
    Since PB does not allow passing a pointer array BYREF as an argument,
    a pointer to the array is passed and then DIM AT is used to create an
    array of the pointers within the Function. 
    
    This function was found to be 2x faster (when sorting 1m elements) using 
    an array of pointers rather than pointer offsets ie @pArr[i].
    
    
    #EndIf
      
    
    ' Mike Trader 2009
    
    #COMPILE EXE "QwikSort.exe"
    #DIM ALL 
    
    
    #INCLUDE "WIN32API.INC"
    
    GLOBAL hDbg, TotSwaps AS LONG
               
    
    TYPE MyTYPE 
      Rand AS LONG
      Num  AS LONG
      Beg  AS LONG 
      sName AS STRING * 32
    END TYPE 
        
    GLOBAL MaxIx AS LONG
    
    
    '¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
    SUB time_stamp_count(tick AS QUAD) ' CPU Clock count Charles E V Pegge
    
     '              ' approx because it is not a serialised instruction
     '              ' it may execute before or after other instructions in the pipeline.
     ! mov ebx,tick       ' var address where count is to be stored.
     ! db &h0f,&h31       ' RDTSC read time-stamp counter into edx:eax hi lo.
     ! mov [ebx],eax       ' save low order 4 bytes.
     ! mov [ebx+4],edx      ' save high order 4 bytes. 
    
    END SUB
              
      
    
    '¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤' 
    SUB QSortQUAD( BYVAL pArr AS DWORD, BYVAL nElem AS LONG, OPT BYVAL pTag AS DWORD, BYVAL DescendFlag AS LONG )
                              
     LOCAL i, j, StkIx, First, Last AS LONG
     LOCAL Temp AS QUAD '          
                
    
      i = 128 * (1+(Last\1000000)) ' 128 per 1m elements should be more than double what's needed
      DIM QStack(i) AS LONG ' 
         
      First = 1 
      Last = First + nElem
    
      IF pTag = 0 THEN ' Create a Dummy array        
       DIM Tg(First TO Last) AS LONG 
       pTag = VARPTR(Tg(0)) ' ALL Zero Based Arrays from here on 
      END IF 
      DIM Arr(First TO Last) AS LONG AT pArr 
      DIM Tag(First TO Last) AS DWORD AT pTag 
    
      DO
       DO
        Temp = Arr((Last+First)\2) ' seek midpoint
        i = First
        j = Last ' 
          
        IF DescendFlag = 0 THEN
         DO ' 
          WHILE Arr(i) < Temp
           INCR i        
          WEND ' 
          WHILE Arr(j) > Temp    
           DECR j       
          WEND ' 
          IF i > j THEN EXIT DO
          IF i < j THEN SWAP Arr(i), Arr(j) : SWAP Tag(i), Tag(j) ' : INCR TotSwaps 
           INCR i        
           DECR j        
         LOOP WHILE i <= j 
    
        ELSE ' Descending
         DO 
          WHILE Arr(i) > Temp
           INCR i        
          WEND          
          WHILE Arr(j) < Temp  
           DECR j        
          WEND
          IF i > j THEN EXIT DO
          IF i < j THEN SWAP Arr(i), Arr(j) : SWAP Tag(i), Tag(j) ' : INCR TotSwaps 
           INCR i       
           DECR j       
         LOOP WHILE i <= j  
        END IF ' 
    
        IF i < Last THEN       ' Done
         QStack(StkIx)   = i    ' Push i
         QStack(StkIx + 1) = Last  ' Push Last
         StkIx = StkIx + 2      '
        END IF               
                          
        Last = j              
       LOOP WHILE First < Last       
                          
       IF StkIx = 0 THEN EXIT DO     
       StkIx = StkIx - 2        ' 
       First = QStack(StkIx)      ' Pop First
       Last = QStack(StkIx + 1)    ' Pop Last
      LOOP               
          
    END SUB
    
    
      
    
    '&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;' 
    SUB QSortLONG( BYVAL pArr AS DWORD, BYVAL nElem AS LONG, OPT BYVAL pTag AS DWORD, BYVAL DescendFlag AS LONG )
                              
     LOCAL i, j, StkIx, First, Last AS LONG
     LOCAL Temp AS LONG '          
             
    
      i = 128 * (1+(Last\1000000)) ' 128 per 1m elements should be more than double what's needed
      DIM QStack(i) AS LONG ' 
         
      First = 1 
      Last = First + nElem
    
      IF pTag = 0 THEN ' Create a Dummy array        
       DIM Tg(First TO Last) AS LONG 
       pTag = VARPTR(Tg(0)) ' ALL Zero Based Arrays from here on 
      END IF 
      DIM Arr(First TO Last) AS LONG AT pArr 
      DIM Tag(First TO Last) AS DWORD AT pTag 
    
      DO
       DO
        Temp = Arr((Last+First)\2) ' seek midpoint
        i = First
        j = Last ' 
          
        IF DescendFlag = 0 THEN
         DO ' PRINT #hDbg, "A: Temp=" + STR$(Temp)  
          WHILE Arr(i) < Temp
           INCR i        
          WEND ' 
          WHILE Arr(j) > Temp    
           DECR j       
          WEND ' PRINT #hDbg, "B: j=" + STR$(j)
          IF i > j THEN EXIT DO
          IF i < j THEN SWAP Arr(i), Arr(j) : SWAP Tag(i), Tag(j) ' : INCR TotSwaps 
           INCR i        
           DECR j        
         LOOP WHILE i <= j 
    
        ELSE ' Descending
         DO 
          WHILE Arr(i) > Temp
           INCR i        
          WEND          
          WHILE Arr(j) < Temp  
           DECR j        
          WEND
          IF i > j THEN EXIT DO
          IF i < j THEN SWAP Arr(i), Arr(j) : SWAP Tag(i), Tag(j) ' : INCR TotSwaps 
           INCR i       
           DECR j       
         LOOP WHILE i <= j  
        END IF ' 
    
        IF i < Last THEN       ' Done
         QStack(StkIx)   = i    ' Push i
         QStack(StkIx + 1) = Last  ' Push Last
         StkIx = StkIx + 2      ' IF StkIx > MaxIx THEN MaxIx = StkIx : PRINT #hDbg, "A MaxIx=" + STR$(MaxIx) 
        END IF               
                          
        Last = j              
       LOOP WHILE First < Last       
                          
       IF StkIx = 0 THEN EXIT DO     
       StkIx = StkIx - 2        ' IF StkIx > MaxIx THEN MaxIx = StkIx : PRINT #hDbg, "B MaxIx=" + STR$(MaxIx) 
       First = QStack(StkIx)      ' Pop First
       Last = QStack(StkIx + 1)    ' Pop Last
      LOOP               
          
    END SUB
    
    
    
         
    
    '&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;' 
    FUNCTION PBMAIN() 
         
    hDbg = FREEFILE : OPEN "QwikSort_Debug.txt" FOR OUTPUT LOCK SHARED AS hDbg 
    PRINT #hDbg, "Time=" + TIME$ + ", Date=" + DATE$
    
    
     LOCAL i, j, RetVal, First, Last, TotElem, Count, Descending AS LONG
     LOCAL s AS STRING 
    
     LOCAL pStr AS STRING PTR 
    
     LOCAL sTxt() AS STRING 
      
     LOCAL cBeg, cEnd AS QUAD ' for time stamp, measuring cpu Clks
    
     LOCAL LongVal AS LONG    
    
    
      First = 1 
      Last = 99000 
    
      DIM aI(Last)    AS LONG ' 
      DIM bI(Last)    AS LONG ' 
      DIM aQ(Last)    AS QUAD ' LONG 
      DIM bQ(Last)    AS QUAD ' LONG 
      DIM DataType(Last) AS MyTYPE 
      DIM pData(Last)  AS MyTYPE PTR ' Cannot pass an array of pointers BYVAL with PB         
       
      TotElem = Last - First + 1 ' 
    PRINT #hDbg, "TotElem=" + STR$(TotElem)
             
           
      '- Fill Arrays
      Count = 0 
      FOR i = First TO Last 
        INCR Count           
        LongVal      = RND(-2147483648, 2147483647) ' LongVal = RND(-254, 255) 
        aI(i)       = LongVal
        bI(i)       = LongVal '
        aQ(i)       = LongVal
        bQ(i)       = LongVal ' 
        DataType(i).Rand = LongVal 
        DataType(i).Num  = i 
        DataType(i).sName = STRING$(3, STR$(LongVal)) 
        pData(i)     = VARPTR(DataType(i)) 
    IF i < 100 THEN PRINT #hDbg, STR$(i) + " " + STR$(@pData(i).Rand) + " " + STR$(@pData(i).Num)
      NEXT i
            
    
      s = s + "TotElem="+STR$(TotElem) + $CRLF + $CRLF 
      time_stamp_count(cBeg) ' measuring cpu Clks. The overhead just for making this call is about 25 clocks
      ARRAY SORT aI(First)' , DESCEND 
    '  ARRAY SORT aQ(First)' , DESCEND
      time_stamp_count(cEnd) ' measuring cpu Clks. The overhead just for making this call is about 25 clocks
      s = s + "PB ARRAY SORT Clks="+STR$( (cEnd-cBeg)\TotElem ) + $CRLF 
          
       
    Descending = 0
      TotSwaps = 0 
      time_stamp_count(cBeg) ' measuring cpu Clks. The overhead just for making this call is about 25 clocks 
    
    '  CALL QSortLONG( VARPTR(bI(First)), Last-First ) ' Simple Call 
      CALL QSortLONG( VARPTR(bI(First)), Last-First, VARPTR(pData(First)), Descending ) ' Descending 
    
      time_stamp_count(cEnd) ' measuring cpu Clks. The overhead just for making this call is about 25 clocks
      s = s + "QSortLONG Clks="+STR$( (cEnd-cBeg)\TotElem ) + $CRLF 
      s = s + "TotSwaps="+STR$(TotSwaps) + ", MaxIx=" + STR$(MaxIx) + $CRLF 
    
         
    '  TotSwaps = 0     
    '  time_stamp_count(cBeg) ' measuring cpu Clks. The overhead just for making this call is about 25 clocks
    '  CALL QSortQUAD( VARPTR(bQ(First)), Last-First, VARPTR(pData(First)), Descending ) ' Descending
    '  time_stamp_count(cEnd) ' measuring cpu Clks. The overhead just for making this call is about 25 clocks
    '  s = s + "QSortQUAD Clks="+STR$( (cEnd-cBeg)\TotElem ) + $CRLF  
    '  s = s + "TotSwaps="+STR$(TotSwaps) + ", MaxIx=" + STR$(MaxIx) + $CRLF 
     
    
    PRINT #hDbg, $CRLF + " New Order" 
      FOR i = First TO Last   
    IF i < 100 THEN PRINT #hDbg, STR$(i)+ " "+STR$(bI(i))+ " "+STR$(@pData(i).Rand)+ " "+STR$(@pData(i).Num)
      NEXT i
    
      
    
    PRINT #hDbg, "Compare sort results" 
      '- Compare sort results
      FOR i = First TO Last
       IF aI(i) <> bI(i) THEN
        MSGBOX "out of order at element" + STR$(i)
        EXIT FOR
       END IF 
      NEXT i  
           
           
    
    'PRINT #hDbg, s
    '  FOR i = First TO Last 
    'PRINT #hDbg, "" 
    'PRINT #hDbg, "PB: " + STR$(aI(i))
    'PRINT #hDbg, "QS: " + TRIM$(@pData(i).sName) 
    'IF i > 100 THEN EXIT FOR
    '  NEXT i
    
    
     MSGBOX s,64,"QSort"
    
    
    END FUNCTION 
    
    '&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;&#164;'
    

  3. #3
    thinBasic MVPs kryton9's Avatar
    Join Date
    Nov 2006
    Location
    Naples, Florida & Duluth, Georgia
    Age
    61
    Posts
    3,869
    Rep Power
    398

    Re: Replacing ARRAY SORT

    That's really great that the PowerBasic programmers are adding to thinBasic. Thanks Mike.
    Acer Notebook: Win 10 Home 64 Bit, Core i7-4702MQ @ 2.2Ghz, 12 GB RAM, nVidia GTX 760M and Intel HD 4600
    Raspberry Pi 3: Raspbian OS use for Home Samba Server and Test HTTP Server

  4. #4

    Re: Replacing ARRAY SORT

    This version, is based on Stan's re-write and is 5x faster than PB's SORT ARRAY

    Sorting an array of 100k strings of random length (0-255 Chars), PB avg time is 9550 Clks/String, Sidewinder is 2040 Clks/String


    ' Mike Trader 2009 (based on work by Stan Durham)
    
    ' This version requires a little more memory (12*nElem bytes)
        
    ' Sorting an array of 100k strings of random length (0-255 Chars), PB avg time is 9550 Clks/String, Sidewinder is 2040 Clks/String
           
    ' Thats 5x faster WITHOUT the benfit of ASM.
    
    ' Can be called with: 
    
    '  CALL SWSort( VARPTR(b(First)), Last-First ) ' Simplist call 
      
    ' or with all the options:
    
    '  CALL SWSort( VARPTR(b(First)), Last-First, VARPTR(pData(First)), CaseIgnore, Descending ) ' Full Call 
           
    'Where:
    ' First = First element in the ARRAY = LBOUND()
    ' Last = Last element in the ARRAY = UBOUND()
    
    #If 0
    
     A LONG/QUAD ARRAY sorting algorithm designed FOR: 
       Handle any length String including Zero length
       ASCENDING/DESCENDING order
       TAG an ARRAY of pointers
       Select FIRST & LAST elements TO SORT 
       Equivelent speed to PBs SORT ARRAY 
       Zero or 1+ based Arrays 
       A generic CALL that is user friendly
       High level language compatible (no ASM or PB keywords)
        
     PB does not allow passing arrays BYREF as an argument,
     The function would be 2x faster using an array of pointers 
     rather than pointer offsets ie @pArr[i].
    
    #EndIf 
    
    #COMPILE EXE "Sidewinder.exe"
    #DIM ALL 
    
    
    #INCLUDE "WIN32API.INC"
    
    GLOBAL hDbg AS LONG
    
    TYPE BucketInfoTYPE ' For Sidewinder
      BukNum(255) AS LONG
      BktBeg(255) AS LONG
      BktEnd(255) AS LONG 
      LastBktNum AS LONG 
      LastBktBeg AS LONG 
    END TYPE
          
    
    TYPE MyTYPE ' An example UDT to Tag Sort
      Rand AS LONG
      Num  AS LONG
      Beg  AS LONG 
      sName AS STRING * 255
    END TYPE 
          
         
    
    'いいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいい'
    SUB time_stamp_count(tick AS QUAD) ' CPU Clock count Charles E V Pegge
    
     '---------------------------'
     '              ' approx because it is not a serialised instruction
     '              ' it may execute before or after other instructions
     '              ' in the pipeline.
     ! mov ebx,tick       ' var address where count is to be stored.
     ! db &h0f,&h31       ' RDTSC read time-stamp counter into edx:eax hi lo.
     ! mov [ebx],eax       ' save low order 4 bytes.
     ! mov [ebx+4],edx      ' save high order 4 bytes.
     '---------------------------'
    
    END SUB
         
      
        
    'いいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいい 
    FUNCTION ReadTextFile( sPathnFile AS STRING, sTxt() AS STRING ) AS LONG ' Zero Based Array
        
     LOCAL hFile, TotRec, Count AS LONG              
                
    
      hFile = FREEFILE ' nElements = LOF(hFile) \ SIZEOF (MyUDT)
      OPEN sPathnFile FOR INPUT AS #hFile ' LEN = SIZEOF(UDT(0)) ' path and file in element 2
       IF ERR THEN  
        CLOSE hFile 
        FUNCTION = -ERR
        EXIT FUNCTION
       END IF  
           
       FILESCAN #hFile, RECORDS TO TotRec         
       REDIM sTxt(TotRec) 
    
       Count = 0
       WHILE NOT EOF(hFile)
         INCR Count
         LINE INPUT #hFile, sTxt(Count)
       WEND ' PRINT #hDbg, "TotRec="+STR$(TotRec) + ", Count="+STR$(Count) 
    
      CLOSE hFile 
    
     FUNCTION = Count
             
    END FUNCTION 
    
    
             
    
           
    'いいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいい'
    SUB SidewinderRecur( BYVAL pArr AS LONG PTR, _ ' Pointer to String array                     
               BYVAL pBuf AS LONG PTR, _ ' Pointer to Long array for string sorting             
               BYVAL pTag AS LONG PTR, _ ' Pointer to Tagged Pointer array                 
               BYVAL pTuf AS LONG PTR, _ ' Pointer to Long array for Tagged Pointer array sorting      
               BYVAL pChr AS LONG PTR, _ ' Pointer to Array of character values               
               BYVAL Pos  AS LONG, _   ' Character position in the string                 
               BYVAL First AS LONG, _   ' First element in the String Array to sort            
               BYVAL Last AS LONG, _   ' Last element in the String Array to sort             
               BYVAL CsIg AS LONG, _   ' Ignore Case Flag                         
               BYVAL Desc AS LONG )   ' Sort Descending Flag                       
    
     LOCAL i, bVal AS LONG 
     LOCAL pByt AS BYTE PTR 
     LOCAL BI AS BucketInfoTYPE ' Faster than DIM each seperatly
          
    
    ' DIM Arr(First TO Last) AS QUAD AT pArr + 4*First ' PB cannot pass an array of pointers BYREF
    ' DIM Buf(First TO Last) AS QUAD AT pBuf + 4*First 
    ' DIM Tag(First TO Last) AS DWORD AT pTag + 4*First 
    ' DIM Tuf(First TO Last) AS QUAD AT pTuf + 4*First 
    ' DIM Chr(First TO Last) AS QUAD AT pChr + 4*First 
                     
      
      INCR Pos 
      BI.LastBktBeg = First
      BI.LastBktNum = 0  
      IF Desc THEN ' Descending
       '- Count occurence of each bVal 
       FOR i = First TO Last 
        pByt   = @pArr[i]  ' Arr(i)
        @pChr[i] = @pByt[Pos] ' PRINT #hDbg, "i=" + STR$(i) + ", bVal=" + STR$(@pChr[i]) 
        IF CsIg AND @pChr[i] > 96 AND @pChr[i] < 123 THEN @pChr[i] = @pChr[i] - 32  
        INCR BI.BukNum(@pChr[i])
       NEXT i 
            
       '- Determine the start pos/ Size of each bucket at the current character Pos
       FOR bVal = 255 TO 0 STEP -1 
        IF BI.BukNum(bVal) THEN
         BI.BktBeg(bVal) = BI.LastBktBeg  + BI.LastBktNum
         BI.BktEnd(bVal) = BI.BktBeg(bVal) - 1
         BI.LastBktBeg  = BI.BktBeg(bVal)
         BI.LastBktNum  = BI.BukNum(bVal)
        END IF
       NEXT bVal 
    
      ELSE ' Ascending
       FOR i = Last TO First STEP -1 
        pByt   = @pArr[i]  
        @pChr[i] = @pByt[Pos] 
        IF CsIg AND @pChr[i] > 96 AND @pChr[i] < 123 THEN @pChr[i] = @pChr[i] - 32  
        INCR BI.BukNum(@pChr[i])
       NEXT i 
    
       FOR bVal = 0 TO 255 
        IF BI.BukNum(bVal) THEN
         BI.BktBeg(bVal) = BI.LastBktBeg  + BI.LastBktNum
         BI.BktEnd(bVal) = BI.BktBeg(bVal) - 1
         BI.LastBktBeg  = BI.BktBeg(bVal)
         BI.LastBktNum  = BI.BukNum(bVal)
        END IF
       NEXT bVal 
      END IF
    
    
      '- Place each string in proper bucket
      FOR i = First TO Last 
       INCR BI.BktEnd(@pChr[i]) ' Chr(i)
       @pBuf[BI.BktEnd(@pChr[i])] = @pArr[i] ' Arr(i) 
       @pTuf[BI.BktEnd(@pChr[i])] = @pTag[i] ' Tag(i)
      NEXT i
    
    
      '- Copy sorted order to original array
      FOR i = First TO Last
       @pArr[i] = @pBuf[i] ' Buf(i)
       @pTag[i] = @pTuf[i] ' Tuf(i)
      NEXT i ' PRINT #hDbg, "Round"+STR$(Pos+1)+" Completed --------" + $CRLF
    
          
      '- Recursively sort each slot of elements for each Pos
      IF Desc THEN ' Descending
       FOR bVal = 1 TO 255
        IF BI.BukNum(bVal) > 1 THEN CALL SidewinderRecur( pArr, pBuf, pTag, pTuf, pChr, Pos, BI.BktBeg(bVal), BI.BktEnd(bVal), CsIg, Desc )
       NEXT bVal
      ELSE ' Ascending 
       FOR bVal = 255 TO 1 STEP - 1 
        IF BI.BukNum(bVal) > 1 THEN CALL SidewinderRecur( pArr, pBuf, pTag, pTuf, pChr, Pos, BI.BktBeg(bVal), BI.BktEnd(bVal), CsIg, Desc )
       NEXT bVal 
      END IF
    
    END SUB
    
    
    
    'いいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいい'
    SUB SWSort( BYVAL pArr AS DWORD PTR, BYVAL nElem AS LONG, _ 
          OPT BYVAL pTag AS DWORD PTR, BYVAL CsIg AS LONG, BYVAL Desc AS LONG ) 
    
     LOCAL i, First AS LONG 
              
    
      IF pTag = 0 THEN ' Create a Dummy array        
       DIM Tag(nElem) AS LONG 
       pTag = VARPTR(Tag(0)) ' ALL Zero Based Arrays from here on 
      END IF
    
    '  DIM Tag(nElem) AS DWORD AT pTag ' PB cannot pass an array of pointers BYREF
    '  DIM Arr(nElem) AS DWORD AT pArr 
           
      IF Desc THEN 
       FOR i = nElem TO 0 STEP -1
        IF @pArr[i] = 0 THEN ' Swap any NULL strings to far left 
         IF i < nElem THEN SWAP @pArr[i], @pArr[nElem] : SWAP @pTag[i], @pTag[nElem]
         DECR nElem
        END IF
       NEXT i ' PRINT #hDbg, "First=" + STR$(First) 
    
      ELSE 
       FOR i = 0 TO nElem
        IF @pArr[i] = 0 THEN ' Swap any NULL strings to far left 
         IF i > First THEN SWAP @pArr[i], @pArr[First] : SWAP @pTag[i], @pTag[First]
         INCR First
        END IF
       NEXT i ' PRINT #hDbg, "First=" + STR$(First)
    
      END IF
    
      DIM Chr(nElem) AS LONG ' Holds a Character from each string
      DIM Buf(nElem) AS LONG ' Buffer for sorting Stings
      DIM Tuf(nElem) AS LONG ' Buffer for sorting Tagged Pointers
    
      CALL SidewinderRecur( pArr, VARPTR(Buf(0)), pTag, VARPTR(Tuf(0)), VARPTR(Chr(0)), -1, First, nElem, CsIg, Desc )
    
    END SUB
       
    
    'いいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいい'
    FUNCTION PBMAIN() 
    
     LOCAL i, j, RetVal, CaseIgnore, Descending AS LONG 
     LOCAL First, Last, TotElem, Count AS LONG 
     LOCAL shortestString, longestString, lowestChar, highestChar AS LONG  
     LOCAL cBeg, cEnd AS QUAD ' for time stamp, measuring cpu Clks 
     LOCAL s AS STRING
    
     LOCAL sTxt() AS STRING 
     LOCAL a() AS STRING
     LOCAL b() AS STRING 
           
               
    
      hDbg = FREEFILE : OPEN "Sidewinder_Debug.txt" FOR OUTPUT LOCK SHARED AS hDbg 
      PRINT #hDbg, "Time=" + TIME$ + ", Date=" + DATE$
      '=========   
       
      First = 1 
      DIM sTxt(First)  
      TotElem = ReadTextFile( "UCaseNames.csv", sTxt() ) ' base 1
      PRINT #hDbg, "Read "+STR$(TotElem)+" Names" 
    
    
      Last = TotElem 
      Last = 99000
           
    
      DIM a(Last)    AS STRING ' 
      DIM b(Last)    AS STRING ' 
      DIM DataType(Last) AS MyTYPE 
      DIM pData(Last)  AS MyTYPE PTR ' Cannot pass an array of pointers BYVAL with PB  
    
    
    
      '- Xfer random length strings 
    '  FOR i = First TO Last 
    'IF i = 4 THEN sTxt(i) = "" ' Test NULL string handling
    'IF i = 9 THEN sTxt(i) = ""
    '   a(i)       = sTxt(i)
    '   b(i)       = sTxt(i) ' 
    '   DataType(i).sName = sTxt(i) 
    '   DataType(i).Rand = RND(-2147483648, 2147483647)
    '   DataType(i).Num  = i 
    '   pData(i)     = VARPTR(DataType(i)) 
    'IF i < 100 THEN PRINT #hDbg, STR$(i) + STR$(@pData(i).Num) + " " + STR$(STRPTR(b(i))) + " >" + TRIM$(@pData(i).sName) + "< "
    '  NEXT ' 
    'PRINT #hDbg, " --------" + $CRLF 
      '==========
          
    
            
      shortestString = 0  ' shortest random string         
      longestString = 255 ' longest random string         
      lowestChar   = 32 ' ASC() smallest random character    
      highestChar  = 126 ' ASC() largest random character   
         
    
      '- Make a series of random length strings
      FOR i = First TO Last
       s = ""
       FOR j = First TO RND(shortestString, longestString)
        s = s + CHR$(RND(lowestChar, highestChar))
       NEXT j
       a(i) = s
       b(i) = s 
       DataType(i).sName = s 
       DataType(i).Rand = RND(-2147483648, 2147483647)
       DataType(i).Num  = i 
       pData(i)     = VARPTR(DataType(i)) 
    IF i < 100 THEN PRINT #hDbg, STR$(i) + " Len=" + STR$(LEN(TRIM$(@pData(i).sName))) + " pData=" + STR$(pData(i)) + " >" + TRIM$(@pData(i).sName) + "< " + STR$(@pData(i).Num)
      NEXT i
      '========== 
              
    
    
      s = ""
      time_stamp_count(cBeg) ' measuring CPU Clks. The overhead just for making this call is about 25 clocks
      ARRAY SORT a(First) ', DESCEND
      time_stamp_count(cEnd) ' measuring CPU Clks. The overhead just for making this call is about 25 clocks
      s = s + "ARRAY SORT Clks="+STR$( (cEnd-cBeg)\Last ) + $CRLF 
          
           
    
    CaseIgnore = 0
    Descending = 0
      time_stamp_count(cBeg) ' measuring CPU Clks. The overhead just for making this call is about 25 clocks 
    '  CALL SWSort( VARPTR(b(First)), Last-First ) ' Simplist call w/no tag array
      CALL SWSort( VARPTR(b(First)), Last-First, VARPTR(pData(First)), CaseIgnore, Descending ) ' Full Call 
      time_stamp_count(cEnd) ' Measuring CPU Clks. The overhead just for making this call is about 25 clocks
      s = s + "SideWinder Clks="+STR$( (cEnd-cBeg)\Last ) + $CRLF + $CRLF 
      '=========
        
    
    
      '- Compare Sort Results 
    '  PRINT #hDbg, ""
    '  FOR i = First TO Last 
    '   IF TRIM$(a(i)) <> TRIM$(b(i)) THEN 
    '    PRINT #hDbg, STR$(i) + $TAB + ">" +TRIM$(a(i)) + "<  >" + TRIM$(b(i)) + "<--- out of order"
    '   ELSE
    '    IF i < 100 THEN PRINT #hDbg, STR$(i) + $TAB + a(i)
    '   END IF
    '  NEXT i 
    
    
      PRINT #hDbg, ""
      FOR i = First TO Last 
       IF TRIM$(a(i)) <> TRIM$(@pData(i).sName) THEN 
        PRINT #hDbg, STR$(i)
        PRINT #hDbg, ">" +TRIM$(a(i)) + "<"
        PRINT #hDbg, ">" +TRIM$(@pData(i).sName) + "<--- Bad Match"
    MSGBOX STR$(i)+"Sorted Element out of order" : EXIT FOR
    
       ELSE
    IF i < 100 THEN PRINT #hDbg, STR$(i) + $TAB + a(i)
       END IF
      NEXT i 
          
    
     PRINT #hDbg, s
     MSGBOX s,64,STR$(Last-First)+" Random Length STRINGs"
    
    END FUNCTION 
    
    'いいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいい'
    

  5. #5

    Re: Replacing ARRAY SORT


    Hi Mike,

    An efficient general purpose sorting algorithm requires about n*log2(n) cycles to sort an array of n elements. So for 1024 elements: 20x1024 cycles are required. for 2048 it would be 21x2048 cycles, and so forth.

    Taking the overall sorting time and dividing through by this factor will give you the time per cycle and this is a fair predictor of performance over a wide range of sample sizes. In your example of 2550 clocks for 100k strings, we can divide this figure by 17 to get an estimate of the per cycle clock count. Which yield a figure of 150 clocks. Similarly in your previous posting: 2000 strings in 1700 clocks per string when divided through by 11 yields 154 clocks per cycle.

    I hope this will be a useful benchmark.

    Charles

  6. #6

    Re: Replacing ARRAY SORT

    Charles,

    I bet some of that sort code you did for the word count code challenge could help with Mike's project. (if Oxygen is planed to be used)


    John
    ScriptBasic Project Manager
    Project Site
    support@scriptbasic.org

  7. #7

    Re: Replacing ARRAY SORT


    Yes John,

    Here is an example of the merge sort algorithm that I used in your code challenge. This is not the fastest version but it is almost entirely Basic.

    It uses a callback to a comparator function which decides which 1 of 2 indices points to the lower ordered element. This separates the sorting algorigm from the comparator function completely, allowing any type of data and comparison procedure to be used.

    This merge sort shuttles an array of indexes back and forth successively sorting groups of 2 4 8 16 32 .. elements each time until all the element indexes are within 1 fully sorted group.

    Charles

    [code=thinbasic]

    '======
    ' SORTM
    '######

    '
    ' Indexer using a MergeSort and Callbacks
    '
    '
    ' Charles E V Pegge
    ' 9 Aug 2009



    ' The sortm function is written in Oxygen, and based on a MergeSort
    ' which is one of the most efficient sort algorithms available, requiring
    ' n/2 * log2(n) comparisons.
    '
    ' For example, A database of 1 meg data elements would require 10 meg comparisons
    ' whereas a simple pick-one sort would take n*n/2 = 500 gig comparisons.
    '
    ' The sortm function requires twice the workspace of the final index, since it
    ' shuttles the indices from one buffer to the other during the merge sort process.
    '
    ' This function uses a callback Chooser function to make each comparison, so
    ' it can be applied to any type of data on any criterion. The output is
    ' an array of 4 byte integers indexing the array of data elements.

    ' The index array must first be initialised with a set of indices, one for each
    ' data element. The merge sort function then rearranges the order of these indices
    ' in the array.

    uses "oxygen","file"


    '-----------------------
    'NUMBER OF WORDS TO SORT
    '=======================


    dim sz as long=10000


    '-------------------------
    'FOR INDEX AND DATA ARRAYS
    '=========================


    dim ri(sz*2+1) as long ' must be twice the number of data elements
    dim rs(sz+1) as string ' test sample of random string data

    '---------------
    'FOR SAMPLE DATA
    '===============

    dim i,j as long
    dim r as long
    dim rst as string


    '---------------------------------
    'GENERATE RANDOM STRINGS AND INDEX
    '=================================

    for i=1 to sz
    ri(i)=i ' this is our initial index for each element in the string array.
    rst=string$(16," ")
    for j= 1 to 16 ' generate a string of 16 random uppercase characters
    r=rnd(1)*25+65
    mid$(rst,j)=chr$(r)
    next
    rs(i)=rst ' store a random string in each data element in the string array
    'print rst
    next


    dim src as string

    src="

    #basic

    '-------------------------------------
    ' CALLBACK FUNCTION TO MAKE THE CHOICE
    '=====================================

    ' choose between first and second or abort the sorting process
    ' this is called by the bisort function
    '
    ' Parameters:
    ' first index number for data element
    ' second index number for data element
    '
    ' Return:
    ' 1 for first choice 2 for second choice 0 to abort the bisort function
    '


    '----------------------------------------------------------------
    function ChooseWhich (byval first, byval second) as long callback
    '================================================================
    '
    ' BASIC
    '------
    ' dim as bstr rs at #rs
    ' if rs(first) > rs(second) then function=1
    ' exit function

    'ASM USING DIRECT STRING COMPARE
    '-------------------------------
    mov ecx,#rs : mov eax,first : mov edx,second
    comparestr [ecx+edx*4-4], [ecx+eax*4-4] 'DIRECT STRING COMPARE
    ( : jle exit : inc _function : )

    end function


    '-------------------
    'CALL THE MERGE SORT
    '===================

    'declare function sortm (byval p as long ptr,byval sz as long, byval cbk as long) as long


    '###########
    ' MERGE SORT
    '###########


    '
    ' requires an index array of twice the number of elements
    '
    ' parameters
    ' 1 p pointer to index array
    ' 2 sz size of index array in elements
    ' 3 cbk address of Choosing function for callback

    ' Return:
    ' 0
    '
    '----------------------------------------------------
    function sortmb(byval dp, byval sz, byval cb) as long
    '====================================================

    ' dp ' pointer to base of data index array
    ' cb ' pointer to comparator callback function
    ' sz ' point to number of elements to sort
    dim a ' comparator flag
    dim c ' chooser index
    dim e ' end of upper buffer block
    dim g ' element counter
    dim i ' first selector
    dim j ' second selector
    dim k ' target index
    dim n ' number of elements
    dim q ' flag for toggling source/target buffer blocks
    dim bi ' boundary for fist selector
    dim bj ' boundary for second selector
    dim st ' start of source buffer
    dim en ' end of source buffer

    dim dd at dp
    '
    'INITIAL
    '-------
    '
    n=*sz : e=n+n
    st=1 : g=n+1
    '
    do
    if g>n then
    if st>=n then exit do
    '
    'SWAP SOURCE SOURCE INDEX BLOCK AND TARGET INDEX BLOCK
    '
    q=1-q
    if q then
    k=n+1 : i=1 : en=n
    else
    k=1 : i=n+1 : en=e
    end if
    '
    j=i+st : bi=j : bj=bi+st
    if bj>en then bj=en+1 ' LIMIT
    shl st,1 'DOUBLE NEXT STRIDE
    g=1
    end if
    '
    if i>=bi then
    if j>=bj then
    i=bj : bi+=st : bj+=st : j=bi ' NEXT SET
    if bj>en then bj=en+1 ' LIMIT
    if bi>en then bi=en+1 ' LIMIT
    continue do
    else
    c=j : inc j
    end if
    elseif j>=bj then
    c=i : inc i
    else
    a=call cb dd(i),dd(j)
    if a then c=j : inc j else c=i : inc i
    end if
    '
    dd(k)=dd(c)
    inc k : inc g
    loop

    'TRANSFER SHUTTLE BUFFER TO START IF NECESSARY

    if q then for i=1 to n : dd(i)=dd(i+n) : next

    end function



    '-----------------------------------------------------------
    function check_sample(byval ds, byval ri,byval sz) as string
    '===========================================================

    dim as bstr rs at ds ' array of data strings
    dim as long dd at ri ' array of indexes to data
    dim as long e at sz ' number of elements
    '
    dim i,j
    function=`
    SAMPLE:
    ASCENDING ORDER


    `
    j=e
    if j>16 then j=16
    for i=1 to j
    function+=rs(dd(i))+`
    `
    next

    end function


    '-----------------
    'PERFORMANCE CHECK
    '=================
    '
    DECLARE FUNCTION QueryPerformanceCounter LIB `KERNEL32.DLL` ALIAS `QueryPerformanceCounter` (lpPerformanceCount AS QUAD) AS LONG
    DECLARE FUNCTION QueryPerformanceFrequency LIB `KERNEL32.DLL` ALIAS `QueryPerformanceFrequency` (lpFrequency AS QUAD) AS LONG
    dim as quad t1,t2,fr

    QueryPerformanceFrequency fr



    '----
    'MAIN
    '====

    '----------------
    'PERFORM THE TEST
    '================

    QueryPerformanceCounter t1
    sortmb #ri,#sz, &choosewhich
    QueryPerformanceCounter t2

    '------------
    'PREPARE DATA
    '============

    dim as string r,tab=chr 9
    r=check_sample (#rs,#ri,#sz)
    dim n=[#sz]
    dim as single t
    t=(t2-t1)/fr

    '------
    'REPORT
    '======


    print r `
    ` str(n) ` strings
    sorting time ` str(round(t*1000)) ` mSecs

    per cycle: ` str(round(t*1e9/(n*log2(n)))) ` nSec
    `
    terminate


    "





    'msgbox 0,o2_view src
    msgbox 0,sz+" words ready to sort"

    o2_basic src

    if len(o2_error) then
    msgbox 0,"Error:"+$cr+o2_error
    stop
    end if


    o2_exec


    '------------
    ' SAVE RESULT
    '============

    stop

    dim f as long
    f=file_open ("t.txt","output")
    for i=1 to e
    a=ri(i)
    file_LinePrint (f,str$(i)+$tab+str$(a)+$tab+rs(a))
    next
    file_close(f)

    'msgbox 0,"okay"
    [/code]

  8. #8

    Re: Replacing ARRAY SORT

    Thanks Charles. I don't have time to convert it today, but do you think that would be faster?

  9. #9

    Re: Replacing ARRAY SORT


    Hi Mike,

    What you have already performs very well. Mine could problably run a bit faster but with the current processing power of today's average PC, sorts of up to 100k elements are fast enough to be barely noticeable. But if you land a contract to build the Califoria State Health Service Database then you could use other techniques like bunching the data into linked lists based on the first two letters then sorting each list independently. This would further boost performans by a factor of around 8 - 10x.

    There's plenty you can do for special situations.

    Charles

  10. #10

    Re: Replacing ARRAY SORT

    Yes.
    With that in mind this version of the INTEGER ARRAY sorting is probably better

    SUB QSortLONG( BYVAL pArr AS LONG PTR, BYVAL nElem AS LONG, OPT BYVAL pTag AS DWORD PTR, BYVAL DescendFlag AS LONG )
                              
     LOCAL i, j, StkIx, First, Last AS LONG
     LOCAL MidPt AS LONG '          
         
      IF nElem < 1 THEN EXIT SUB ' nothing to sort      
    
      i = 128 * (1+(Last\1000000)) ' 128 per 1m elements should be more than double what's needed   
      DIM QStack(i) AS LONG ' 
         
      First = 0 
      Last = nElem
    
      IF pTag = 0 THEN ' Create a Dummy array        
       DIM Tg(First TO Last) AS LONG 
       pTag = VARPTR(Tg(0)) ' ALL Zero Based Arrays from here on 
      END IF 
    
    
      DO
       DO 
        i = (Last+First)\2 '
        MidPt = @pArr[i] ' seek midpoint
        i = First
        j = Last ' 
          
        IF DescendFlag = 0 THEN
         DO '
          WHILE @pArr[i] < MidPt                                      
           INCR i                                            
          WEND ' 
          WHILE @pArr[j] > MidPt                                      
           DECR j                                            
          WEND '                    
          IF i > j THEN EXIT DO '                                     
          IF i < j THEN SWAP @pArr[i], @pArr[j] : SWAP @pTag[i], @pTag[j] ' : INCR TotSwaps 
           INCR i        
           DECR j        
         LOOP WHILE i <= j 
    
        ELSE ' Descending
         DO 
          WHILE @pArr[i] > MidPt                                      
           INCR i                                              
          WEND                                                    
          WHILE @pArr[j] < MidPt                                      
           DECR j                                                 
          WEND                                              
          IF i > j THEN EXIT DO           
          IF i < j THEN SWAP @pArr[i], @pArr[j] : SWAP @pTag[i], @pTag[j] ' : INCR TotSwaps 
           INCR i       
           DECR j       
         LOOP WHILE i <= j  
        END IF ' 
    
        IF i < Last THEN       ' Done 
         QStack(StkIx)   = i    ' Push i 
         QStack(StkIx + 1) = Last  ' Push Last
         StkIx = StkIx + 2      ' 
        END IF               
                          
        Last = j              
       LOOP WHILE First < Last       
                          
       IF StkIx = 0 THEN EXIT DO     
       StkIx = StkIx - 2        ' 
       First = QStack(StkIx)      ' Pop First
       Last = QStack(StkIx + 1)    ' Pop Last
      LOOP               
          
    END SUB
    

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •