
Thread: Internal (SDK mostly) Vartype-thoughts about encoding and other stuff

  1. #1
    thinBasic MVPs
    Join Date: Oct 2012

    Internal (SDK mostly) Vartype-thoughts about encoding and other stuff

    I had a few thoughts concerning the vartypes in the SDK, which are divided into VarMainType and VarSubType.
    Just as a reminder:
      %VarMainType_IsNumber             =  20&     ' 0x0014
      %VarMainType_IsString             =  30&     ' 0x001E
      %VarMainType_IsAsciiZ             =  25&     ' 0x0019
      %VarMainType_IsVariant            =  50&     ' 0x0032
      %VarMainType_IsUDT                =  60&
      %VarMainType_IsPTR                =  70&
      %VarMainType_IsObject             =  80&
      %VarMainType_IsClass              =  90&
      %VarMainType_IsFunction           =  95&
      %VarMainType_IsDispatch           = 100&
      %VarSubType_Byte                  =   1&
      %VarSubType_Integer               =   2&
      %VarSubType_Word                  =   3&
      %VarSubType_DWord                 =   4&
      %VarSubType_Long                  =   5&
      %VarSubType_Quad                  =   6&
      %VarSubType_Single                =   7&
      %VarSubType_Double                =   8&
      %VarSubType_Currency              =   9&     ' 0x0900
      %VarSubType_Ext                   =  10&     ' 0x0A00
      %VarSubType_Variant               =  50&     ' 0x3200
    That's essentially it. For a little script I am currently working on, I decided to save on parameters by putting maintype and subtype into one equate. As the comments above already indicate, the rightmost byte holds the maintype. The subtype values simply have to be multiplied by 256 (0x100) and then added. For the Variant types, sub- and maintype add up to 0x3232, and we have the %VT_ equates to determine exactly what kind of data the variant holds: strings, bytes or whatever type.
    With the maintype at the end, the layout is easy to see and easy to understand for a human reader: 16 bits with no meaning yet + 8 bits of detail + 8 bits of major information.

    the more left,
    the higher the number -
    the higher the detail

    Keep in mind that we use Long variables for the equates, i.e. 32 bits. So wherever you see a hex number above, you see only the lower half of the value.
    No additional equates are required if we know that var-subtype and -maintype are combined like
    %VarMainType_IsNumber + 256 * %VarSubType_Ext   ' = 0x00000A14
    and it does not take much imagination to add further information, as in
    %VarSubType_Variant * 256 + %VarMainType_IsVariant + 65536 * %VT_BSTR
    = 0x00083232
    and we have very detailed information about this variable in one parameter. Internal functions to decode this are not difficult to create, since the bytes of the value
    can be read by just placing a byte array over it. The array is one-based and the Long is stored little-endian, so the rightmost byte of the hex number comes first:
    Local B(4) As Byte At Varptr(myParameter)
    maintype = B(1)   ' lowest byte
    subtype  = B(2)
    and the meaning of the other 2 bytes, B(3) and B(4), depends on the main- & subtype.
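Below is a minimal sketch of the packing and unpacking described above, written in Python only for illustration. The helpers `pack_vartype` and `unpack_vartype` are hypothetical and not thinBasic SDK functions. The sketch assumes a 32-bit little-endian layout, as on x86 Windows, so the byte at the lowest address (the first element of a byte overlay) is the maintype. VT_BSTR = 8 is the standard COM VARIANT tag.

```python
import struct

# Values from the equate list above; VT_BSTR = 8 is the COM VARIANT tag for a BSTR.
VAR_MAINTYPE_IS_NUMBER = 0x14   # 20
VAR_MAINTYPE_IS_VARIANT = 0x32  # 50
VAR_SUBTYPE_EXT = 0x0A          # 10
VAR_SUBTYPE_VARIANT = 0x32      # 50
VT_BSTR = 8


def pack_vartype(maintype: int, subtype: int = 0, detail: int = 0) -> int:
    """Hypothetical helper: maintype in bits 0-7, subtype in bits 8-15, detail in bits 16-31."""
    return (detail & 0xFFFF) << 16 | (subtype & 0xFF) << 8 | (maintype & 0xFF)


def unpack_vartype(value: int) -> tuple[int, int, int]:
    """Hypothetical helper: split a packed vartype back into (maintype, subtype, detail)."""
    return value & 0xFF, (value >> 8) & 0xFF, (value >> 16) & 0xFFFF


ext = pack_vartype(VAR_MAINTYPE_IS_NUMBER, VAR_SUBTYPE_EXT)
bstr_variant = pack_vartype(VAR_MAINTYPE_IS_VARIANT, VAR_SUBTYPE_VARIANT, VT_BSTR)
print(hex(ext))           # 0xa14
print(hex(bstr_variant))  # 0x83232

# Byte view of the 32-bit Long as stored in memory (little-endian): index 0 is the lowest byte.
raw = struct.pack("<I", bstr_variant)
print(list(raw))          # [50, 50, 8, 0]
```

In memory the maintype comes first, then the subtype, then the two detail bytes. A one-based overlay therefore finds the maintype in its first element.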

    I made no changes to the current enumeration, but I've used new names here and tried to pick names that say more than the current ones, which exist mostly for compatibility with PowerBASIC.
    But to be honest: if you don't know what DWORD means, you cannot guess it from anything but WORD plus the knowledge that the English word "double" means something like "twice".

    Once you know that "Int" means "Integer", "U" means "Unsigned" and the number is the count of bits, "UInt32" is faster to understand for someone
    who is willing to learn a language like thinBasic and knows nothing of PowerBASIC or other languages that use similar names.
    And if they do know some of those names:
    "Integer" - wow, but even in Visual Studio, Integer sometimes still means 16, sometimes 32 and now also 64 bits...
    "Short" is sometimes a byte, sometimes the size of a word...
    Long, LongLong and ExtraLargeLongLongDingDong: what will be next?
    Someone please open the door and kick 'em out.

    If we repeat the pattern Int8, UInt8, Int16, UInt16, Int32, UInt32, Int64 and, if there will ever be one, UInt64, new users learn quickly, since one of these names always appears somewhere.
    We can continue the pattern already present for floating point numbers: Float32, Float64 and Float80. We could add Dec32 or Dec64 if we had decimals. If 64-bit decimals were available with 2 and 4 digits after the decimal delimiter, they could, to keep it short, become Dec2d64 and Dec4d64.
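Here is a minimal sketch of what a Dec&lt;n&gt;d64 type could mean. It assumes such a type stores a signed 64-bit integer scaled by 10^n, where n is the number of digits after the decimal delimiter. Dec2d64 and Dec4d64 are only the names proposed above; `to_fixed` and `from_fixed` are hypothetical helpers built on Python's standard `decimal` module. With n = 4 this is the same layout COM and PowerBASIC use for Currency: a 64-bit integer scaled by 10,000.

```python
from decimal import Decimal

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def to_fixed(text: str, digits: int) -> int:
    """Store a decimal string as a signed 64-bit integer scaled by 10**digits."""
    scaled = Decimal(text).scaleb(digits)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{text} has more than {digits} decimal digits")
    raw = int(scaled)
    if not INT64_MIN <= raw <= INT64_MAX:
        raise OverflowError(f"{text} does not fit in 64 bits with {digits} digits")
    return raw


def from_fixed(raw: int, digits: int) -> Decimal:
    """Turn the stored 64-bit integer back into an exact decimal value."""
    return Decimal(raw).scaleb(-digits)


print(to_fixed("12.34", 2))      # 1234    (Dec2d64)
print(to_fixed("12.34", 4))      # 123400  (Dec4d64, same layout as Currency)
print(from_fixed(INT64_MAX, 4))  # 922337203685477.5807, the largest Dec4d64 value
```

The digit count is baked into the type name, so no extra scale byte has to be stored with each value.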

    Now we also have strings. Originally there was only one type of them, named just String. Through all the Unicode madness, ANSI, DOS, codepage and OEM were all squeezed into that same type. Then someone saw the need for another type that does not work like the original native string. It has no length to tell its size in advance. Instead it is terminated by a character that is never used in text: the Chr$(0) glued to its end. That became AsciiZ or StringZ. Then strings became wider, using 2 bytes per char, and
    WAsciiZ was not invented because there are no wide ASCII chars, but we have WString and WStringZ... and that's 16 bit...

    Meanwhile 32 bit became necessary, because those who "invented" Unicode did not really invent a single code. The name is wrong: it is definitely not only one, and it should be titled
    16Bit-Multicode. But let's not bother with names... only facts, and as clear as possible.

    I started with an approach that looks like this:
      ' All subtypes multiplied by 256 and added to the maintypes
      %VarType_Numeral            = &H0014
      %VarType_Literal            = &H001E
      %VarType_LiteralZ           = &H0019
      ' For strings, the subtype bits define the details:
      '   bits 0-2 (1, 2, 4) : count of bytes used per char (UTF7 = 1, UTF16 = 2, UTF32 = 4)
      '                        (in the hope never to have 64-bit chars in a 32-bit environment)
      '   bit 3    (8)       : a BOM (byte order mark) is used
      '   bits 4-7 (16..128) : the encoding, where it applies
      %VarType_String             = &H001E     ' plain: not encoded / unknown encoding, or data
      %VarType_SUTF7              = &H011E     ' 7-bit: plain ASCII codes 0 to 127 only
      %VarType_SUTF8              = &H811E     ' UTF8, no BOM
      %VarType_WString            = &H021E     ' UTF16
      %VarType_SUTF32             = &H041E     ' UTF32
      %VarType_AsciiZ             = &H0019
      %VarType_WStringZ           = &H0219
      %VarType_SZUTF32            = &H0419
      %VarType_UDT                = &H003C
      %VarType_PTR                = &H0046
      %VarType_Object             = &H0050
      %VarType_Class              = &H005A
      %VarType_Function           = &H005F
      %VarType_Dispatch           = &H0064
      %VarType_Byte               = &H0114
      %VarType_Integer            = &H0214
      %VarType_Word               = &H0314
      %VarType_DWord              = &H0414
      %VarType_Long               = &H0514
      %VarType_Quad               = &H0614
      %VarType_Single             = &H0714
      %VarType_Double             = &H0814
      %VarType_Currency           = &H0914
      %VarType_Ext                = &H0A14
      %VarType_Variant            = &H3232

    The upper bits of the subtype byte would apply like this:
                        UTF7 (1 byte)   UTF16 (2 bytes)                   UTF32 (4 bytes)
      128 | &H80        UTF8            not thought through in detail,    no need
       64 | &H40        ANSI            but used                          no need
       32 | &H20        OEM                                               no need
       16 | &H10        DOS                                               no need
        8 | &H08        BOM             BOM                               BOM

    A UTF7 string holds plain ASCII codes 0 to 127 only. If any char in it has a value > 127, the string has to be considered buggy and requires conversion to ANSI | UTF8 | OEM | DOS.
    ANSI text has values > 127 once in a while, and only rarely several in a row. UTF8 never has a single value above 127 on its own: every non-ASCII char is a sequence of 2 to 4 bytes >= 128.
    With this code, the Vartype_String 0x001E (unformatted) becomes SUTF8 0x811E (UTF8, no BOM). A zero-terminated UTF8 string, SZUTF8, becomes 0x00008119 (no BOM) or 0x00008919 (with BOM).
    For the DOS/OEM/ANSI 7-bit strings and for UTF16, the 2 leftmost bytes of the 32-bit vartype equate could hold the codepage, DOS notation etc. needed to "convert" once the encoding is known. The current unformatted 0x0000001E (String) or 0x00000019 (AsciiZ) could then change to a number that tells whether it is OEM or a codepage, ANSI or a DOS keyboard enumeration, since the upper bits of the subtype byte already give a clue about what is there.
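The content rules in the remarks above can be turned into a guesser. Below is a minimal Python sketch for 1-byte-per-char strings; `guess_string_vartype` is a hypothetical helper, and the constants follow the proposed equates. One swap to note: instead of counting runs of bytes >= 128, it lets Python's strict UTF-8 decoder check the 2-to-4-byte structure. Text that fails that check is left as plain &H001E, because the bytes alone cannot tell whether it is ANSI, OEM or DOS.

```python
import codecs

# Values from the proposal above: maintype &H1E (String) plus the subtype byte.
VARTYPE_STRING = 0x001E   # plain: not encoded / unknown encoding, or data
VARTYPE_SUTF7 = 0x011E    # 7-bit text, ASCII codes 0..127 only
VARTYPE_SUTF8 = 0x811E    # UTF8 flag (&H80) + 1 byte per char
BOM_FLAG = 0x08 << 8      # BOM bit (&H08) inside the subtype byte


def guess_string_vartype(data: bytes) -> int:
    """Guess the vartype of a 1-byte-per-char string from its bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    if not has_bom and all(b < 0x80 for b in body):
        return VARTYPE_SUTF7
    try:
        # Strict decoding: every byte >= 128 must belong to a valid 2-4 byte sequence.
        body.decode("utf-8")
    except UnicodeDecodeError:
        # Probably ANSI/OEM/DOS; which codepage cannot be told from the bytes alone.
        return VARTYPE_STRING
    return VARTYPE_SUTF8 | (BOM_FLAG if has_bom else 0)


samples = [b"Hello", "Grüße".encode("utf-8"), "Grüße".encode("cp1252"), codecs.BOM_UTF8 + b"abc"]
for sample in samples:
    print(hex(guess_string_vartype(sample)))
```

This prints 0x11e, 0x811e, 0x1e and 0x891e: 7-bit text, UTF8, an undecidable legacy string, and UTF8 with a BOM.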
    As you can see, the string values carry the count of bytes used per char, while the names follow the bit-count usage of the numerals, and the UTF encodings reflect the bits too. A leading S stands for String, a chain of concatenated characters, because the C is already taken by classes. Chars are not a datatype (yet), but I lean toward "Literals" Lit8, Lit16, Lit32 if we want to keep Unicode separate from legacy char, wchar, OEM, ANSI and stuff.

    As described in the remarks, the 2 left bytes of string-derived datatypes could hold the codepage, like the variant %VT_ values do above. They could also hold whatever DOS, OEM and ANSI use; these all use different charsets. ANSI does not mean one certain set of characters but a collection of lookup tables for whatever keyboards; I did not study it completely.
    UTF16 probably also has a 16-bit table of pages for encodings. Let the lowest 3 bits of the subtype byte say "bytes used per char", where 2 means 16 bit. Then the 2 bytes on the left of our 32-bit equate are still unused, so there is no question where to store the 16 bits that give string types all their details. The maintype would never change with another encoding.
    It only tells the two string kinds apart. One is the length-prefixed string, whose length sits in a DWORD left of the StrPtr, so we know its end in advance. The other is zero-terminated, and it learns from the 3 rightmost bits of the subtype whether to terminate with MKByt$(0), MKWrd$(0) or MKDwd$(0)...
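To show how a string vartype could be read back, here is a minimal Python sketch. `decode_string_vartype` and `terminator` are hypothetical helpers. The flag values are the ones from the post: UTF8 &H80, ANSI &H40, OEM &H20, DOS &H10, BOM &H08. Storing a codepage number in the upper 16 bits is only the suggestion made above, not an existing convention. The terminator width mirrors MKByt$(0), MKWrd$(0) and MKDwd$(0).

```python
from dataclasses import dataclass

MAINTYPE_STRING = 0x1E    # length-prefixed ("String")
MAINTYPE_STRINGZ = 0x19   # zero-terminated ("AsciiZ")

# Encoding flags in the upper subtype bits, used for 1-byte-per-char strings only.
ENCODING_FLAGS = {0x80: "UTF8", 0x40: "ANSI", 0x20: "OEM", 0x10: "DOS"}


@dataclass
class StringVartype:
    zero_terminated: bool
    bytes_per_char: int   # lowest 3 bits of the subtype: 1, 2 or 4 (0 = plain/unknown)
    has_bom: bool
    encodings: list[str]
    codepage: int         # hypothetical use of the two upper bytes


def decode_string_vartype(value: int) -> StringVartype:
    maintype = value & 0xFF
    subtype = (value >> 8) & 0xFF
    if maintype not in (MAINTYPE_STRING, MAINTYPE_STRINGZ):
        raise ValueError(f"maintype {maintype:#04x} is not a string type")
    bytes_per_char = subtype & 0x07
    return StringVartype(
        zero_terminated=maintype == MAINTYPE_STRINGZ,
        bytes_per_char=bytes_per_char,
        has_bom=bool(subtype & 0x08),
        encodings=[name for bit, name in ENCODING_FLAGS.items() if subtype & bit]
        if bytes_per_char <= 1 else [],
        codepage=(value >> 16) & 0xFFFF,
    )


def terminator(info: StringVartype) -> bytes:
    """Zero terminator as wide as one char, like MKByt$(0), MKWrd$(0) or MKDwd$(0)."""
    if not info.zero_terminated:
        return b""                              # length is known from the prefix
    return bytes(max(info.bytes_per_char, 1))   # plain/unknown (0) falls back to one byte


print(decode_string_vartype(0x00008919))             # SZUTF8 with BOM
print(terminator(decode_string_vartype(0x0219)))     # WStringZ -> two zero bytes
print(decode_string_vartype((1252 << 16) | 0x411E))  # ANSI string, codepage 1252 in the upper word
```

The maintype alone decides whether a terminator exists at all; the subtype only decides how wide it is.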

    Just thoughts... future-compatible.

    Edit: yes, improved a bit.
    Last edited by ReneMiner; 05-02-2022 at 18:04.
    I think there are missing some Forum-sections as beta-testing and support

  2. #2
    thinBasic MVPs
    Join Date: Oct 2012
    Just adding a few things.

    There is no equate for the GUID I used, and I stumbled across a unique exception.
    I gave it the maintype 0x70 (112), and the subtype is just the length in bytes, because there are several ways to represent a GUID:
    in a string it usually has 0x26 (38) bytes, and as AsciiZ one more, 0x27 (39);
    WString doubles it to 0x4C (76), and adding MKWrd$(0) gives WStringZ, 0x4E (78).
      ' For a GUID no type equates are available
      %VarType_GUID                  =  &H1070  ' binary / string * 16
      %VarType_sGUID                 =  &H2670  ' string
      %VarType_szGUID                =  &H2770  ' asciiz
      %VarType_wsGUID                =  &H4C70  ' wstring
      %VarType_wszGUID               =  &H4E70  ' wstringz
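Here is a quick check of the GUID lengths above, as a Python sketch. `guid_vartype` is a hypothetical helper and `uuid` is Python's standard module. The sketch assumes the braced, hyphenated text form {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}, with UTF-16 for the wide variants; the raw `bytes_le` form is used only to get the 16-byte binary length.

```python
import uuid

MAINTYPE_GUID = 0x70  # maintype proposed in the post


def guid_vartype(length_in_bytes: int) -> int:
    """Subtype = length of the GUID representation in bytes, maintype = &H70."""
    return (length_in_bytes << 8) | MAINTYPE_GUID


g = uuid.uuid4()
text = "{" + str(g).upper() + "}"                   # {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
sizes = {
    "GUID": len(g.bytes_le),                        # 16 raw bytes
    "sGUID": len(text.encode("ascii")),             # 38
    "szGUID": len(text.encode("ascii")) + 1,        # 39: one zero byte
    "wsGUID": len(text.encode("utf-16-le")),        # 76
    "wszGUID": len(text.encode("utf-16-le")) + 2,   # 78: MKWrd$(0)
}
for name, size in sizes.items():
    print(f"%VarType_{name:8} = &H{guid_vartype(size):04X}")
```

This prints the same five values as the equate list: &H1070, &H2670, &H2770, &H4C70 and &H4E70.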
    Pointers currently have only a maintype, but I think we have a use for 64-bit pointers for compatibility. Even when a DLL runs under SysWOW64 (32-bit emulation mode),
    some code uses DWORDLONG pointers (UINT64).
    %VarType_PTR               =      &H0046    ' currently existing maintype
    %VarType_Ptr32             =      &H0446    ' subtype 4 = 4 bytes
    %VarType_Ptr64             =      &H0846    ' subtype 8 = 8 bytes
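Following the same idea that the subtype is the size in bytes, here is a small Python sketch that picks the pointer vartype for the running process. `ptr_vartype` is a hypothetical helper. `struct.calcsize("P")` is Python's standard way to get the native pointer size, so the result depends on whether the interpreter itself is 32- or 64-bit.

```python
import struct

MAINTYPE_PTR = 0x46  # %VarType_PTR, 70 decimal


def ptr_vartype(pointer_size: int) -> int:
    """Subtype = pointer size in bytes: 4 gives &H0446 (Ptr32), 8 gives &H0846 (Ptr64)."""
    if pointer_size not in (4, 8):
        raise ValueError("only 32- and 64-bit pointers are covered")
    return (pointer_size << 8) | MAINTYPE_PTR


native = struct.calcsize("P")    # size of a native pointer in this process
print(hex(ptr_vartype(native)))  # 0x446 in a 32-bit process, 0x846 in a 64-bit one
```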
    Currency (CUR) and Currency Extended (CUX) are not subtypes, none at all. What do we have? Only CUR, I guess. Both use 64 bits, but I don't see a use, especially since
    the variants provide currency too. I also have something in mind about Decimal using 96 bits; there was something special about those decimals, but I am not sure. I think they are
    not available in some versions of VBA or VB...
    That was it, for now.
    See ya
    Last edited by ReneMiner; 06-02-2022 at 01:54.
