
Thread: A bug and thinking about utf8 encoding

  1. #1
    Member
    Join Date
    Jan 2017
    Location
    changsha China
    Age
    32
    Posts
    80
    Rep Power
    16

    A bug and thinking about utf8 encoding

    I noticed that the changelog for versions 1.11.3.0 to 1.11.7.0 included a change to the FILE_Exists function, and then found that some of my scripts with Chinese file names no longer run through thinbasic.exe.

    I then did some experiments and confirmed that this is a UTF-8 encoding issue; scripts in plain English paths still work.

    However, even when I converted the path to UTF-8 before starting thinbasic.exe, the script still would not run. In addition, a script in a pure English path that references external scripts through #include must use UTF-8 encoding for those references to work properly.

    An irresponsible guess: maybe thinbasic.exe loads the script file by calling FILE_Exists, FILE_Load and similar functions, and these functions were changed to support UTF-8 in later versions, which led to this whole problem?
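
    A minimal Python sketch of why such a change could break non-ASCII paths, assuming the path reaches the file functions as bytes in the system code page (cp936 on Simplified Chinese Windows) while the functions now expect UTF-8. The file name and the helper decode_as_utf8 are hypothetical; this illustrates the guess and is not thinBasic's actual implementation.

```python
# Hypothetical script name, encoded the two ways discussed above.
name = "测试.tbasic"
ansi_bytes = name.encode("cp936")   # system code page on Simplified Chinese Windows
utf8_bytes = name.encode("utf-8")   # what a UTF-8 aware function would expect


def decode_as_utf8(raw: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# The UTF-8 bytes decode back to the original name.
utf8_ok = decode_as_utf8(utf8_bytes) == name

# The cp936 bytes of this name are not valid UTF-8, so the lookup would fail.
ansi_result = decode_as_utf8(ansi_bytes)

# A pure ASCII name is byte-identical in both encodings, which would explain
# why plain English paths keep working.
ascii_name = "test.tbasic"
ascii_same = ascii_name.encode("cp936") == ascii_name.encode("utf-8")

print(utf8_ok, ansi_result, ascii_same)
```

    Only the non-ASCII name is affected, which matches the behaviour described above.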

    Multi-language encoding support is a great thing, and I originally suggested that the author make changes in this direction, but over time I realized that this might not be a very good choice.

    The OEM encoding always ships with Windows and matches the preferred language of each national version of Windows; we usually lump it together with ANSI. Its advantage is that it needs no extra understanding and is very easy to use. Unless a project must be available in all countries, or must handle several languages on one PC, Unicode is not a necessary choice.

    Of course, this doesn't mean I'm against Unicode. One of my bigger frustrations is that when extensions like UI and File add UTF-8 support, working with multiple languages becomes more complicated. Over the last year, while training people around me to program in thinBasic, I often had to explain why character encoding matters in this area, and then what character encoding is and why there are so many of them.

    Of course, this confusion may come from the fact that thinBasic has not moved all functions to UTF-8 consistently. If every function used UTF-8, no encoding conversion would be needed, but that is obviously not the case. As a result, many encoding conversions are needed when calling the system API, which reduces the efficiency of software execution.
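
    To make the cost concrete, here is a small Python sketch of the conversion chain, assuming text starts in the system code page (cp936 here), must be UTF-8 for some functions, and must end up as UTF-16 for the wide Windows API. The helper names ansi_to_utf8 and utf8_to_utf16 are hypothetical and only illustrate the extra steps.

```python
def ansi_to_utf8(raw: bytes, code_page: str = "cp936") -> bytes:
    """Convert code-page bytes to UTF-8 bytes (first conversion)."""
    return raw.decode(code_page).encode("utf-8")


def utf8_to_utf16(raw: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16LE bytes, as wide Windows APIs expect."""
    return raw.decode("utf-8").encode("utf-16-le")


# The same two-character caption in three encodings.
title_ansi = "文件".encode("cp936")
title_utf8 = ansi_to_utf8(title_ansi)
title_wide = utf8_to_utf16(title_utf8)

# Both conversions preserve the text; only the byte representation changes.
round_trip_ok = title_wide == "文件".encode("utf-16-le")

print(len(title_ansi), len(title_utf8), len(title_wide))  # 4 6 4
```

    Every call that crosses an encoding boundary pays for one of these steps, which is the overhead described above.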

    BASIC-like languages have always been known for their simplicity, so maybe it is time for me to suggest that the authors drop UTF-8, or perhaps make the encoding switchable? For example, a command could set the encoding, and after switching, functions would automatically convert text according to that setting. This is a big job, though, and would take a long time.
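
    The switchable idea could look roughly like the following Python sketch. The class TextEncoding and its methods are hypothetical names invented for illustration; thinBasic has no such command as far as this thread shows.

```python
import codecs


class TextEncoding:
    """Holds the currently selected encoding and converts text accordingly."""

    def __init__(self, name: str = "cp936"):
        self.set(name)

    def set(self, name: str) -> None:
        # codecs.lookup raises LookupError for an unknown encoding name,
        # so a typo is caught when switching rather than on first use.
        codecs.lookup(name)
        self.name = name

    def decode(self, raw: bytes) -> str:
        """Interpret incoming bytes using the selected encoding."""
        return raw.decode(self.name)

    def encode(self, text: str) -> bytes:
        """Produce outgoing bytes using the selected encoding."""
        return text.encode(self.name)


encoding = TextEncoding("cp936")
caption = "确定"

# With the code page selected, code-page bytes are read correctly.
from_ansi = encoding.decode(caption.encode("cp936"))

# After switching, the same calls handle UTF-8 bytes instead.
encoding.set("utf-8")
from_utf8 = encoding.decode(caption.encode("utf-8"))

print(from_ansi == caption, from_utf8 == caption)
```

    Calling code stays the same after the switch; only the single setting changes.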

    The above is my thinking on this matter. The problem that thinbasic.exe cannot run scripts under a non-ASCII path is real, though; perhaps fixing it can be prioritized in a future version.

    Translated with www.DeepL.com/Translator (free version)

  2. #2
    Member
    Join Date
    Jan 2017
    Location
    changsha China
    Age
    32
    Posts
    80
    Rep Power
    16
    I have collected the functions that now use utf8 encoding by default.

    <Textbox>.Text
    <ButtonName>.Text

    MENU SET TEXT
    MENU GET TEXT
    MENU ADD STRING
    MENU ADD POPUP
    DIALOG NEW
    CONTROL_GetText
    CONTROL GET TEXT
    CONTROL_SetText
    CONTROL SET TEXT
    DIALOG GET TEXT
    DIALOG SET TEXT
    Control Append Text

    FILE_Exists
    FILE_Load
    FILE_Save
    FILE_Append

    Load_File
    Save_File

  3. #3
    Super Moderator Petr Schreiber's Avatar
    Join Date
    Aug 2005
    Location
    Brno - Czech Republic
    Posts
    7,128
    Rep Power
    732
    Hi xLeaves,

    thank you very much for your message.

    I fully agree there is a long way to go before thinBasic can be considered Unicode friendly; we are getting there one step at a time.

    It is good to be aware of this limitation. I will think about how to reflect it in the documentation.


    Petr
    Learn 3D graphics with ThinBASIC, learn TBGL!
    Windows 10 64bit - Intel Core i5-3350P @ 3.1GHz - 16 GB RAM - NVIDIA GeForce GTX 1050 Ti 4GB

  4. #4
    Member
    Join Date
    Jan 2017
    Location
    changsha China
    Age
    32
    Posts
    80
    Rep Power
    16
    Perhaps we could split the Unicode and OEM-encoded operations into two groups of functions: stay with OEM encoding at this stage and gradually widen the scope of Unicode support in the future. This may increase the size of the program, but as hardware progresses people are less sensitive to size. I expect the increase to be a few dozen KB, which is not a very serious problem.

    The Unicode group could be:

    <Textbox>.TextW
    <ButtonName>.TextW

    MENU SET TEXTW
    MENU GET TEXTW
    MENU ADD STRINGW
    MENU ADD POPUPW
    DIALOG NEWW
    CONTROL_GetTextW
    CONTROL GET TEXTW
    CONTROL_SetTextW
    CONTROL SET TEXTW
    DIALOG GET TEXTW
    DIALOG SET TEXTW
    Control Append TextW

    FILE_ExistsW
    FILE_LoadW
    FILE_SaveW
    FILE_AppendW

    Load_FileW
    Save_FileW



    And the following functions would keep using the OEM encoding:

    <Textbox>.Text
    <ButtonName>.Text

    MENU SET TEXT
    MENU GET TEXT
    MENU ADD STRING
    MENU ADD POPUP
    DIALOG NEW
    CONTROL_GetText
    CONTROL GET TEXT
    CONTROL_SetText
    CONTROL SET TEXT
    DIALOG GET TEXT
    DIALOG SET TEXT
    Control Append Text

    FILE_Exists
    FILE_Load
    FILE_Save
    FILE_Append

    Load_File
    Save_File



    By "OEM encoding" I mean CP_ACP or CP_OEMCP, which is usually loosely called ANSI: the default multi-byte code page of Windows, which differs between the language versions of Windows used in different countries.

    When I was overriding the AnsiToUTF8$ function earlier, I found that it seemed to use a fixed English code page instead of the system default code page, which means text in multi-byte code pages other than English is not converted to UTF-8 correctly.
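
    The effect of converting with a fixed Western code page can be reproduced with a short Python sketch. It assumes the input is cp936 (Simplified Chinese) bytes and uses cp1252 as a stand-in for "a fixed English code page"; the helper convert_to_utf8 is hypothetical and is not the AnsiToUTF8$ implementation.

```python
def convert_to_utf8(raw: bytes, code_page: str) -> bytes:
    """Decode raw bytes with the given code page and re-encode as UTF-8."""
    return raw.decode(code_page).encode("utf-8")


# Chinese text as stored in the system code page on Simplified Chinese Windows.
raw = "中文".encode("cp936")

# Converting with a fixed Western code page produces valid UTF-8,
# but of the wrong characters.
wrong = convert_to_utf8(raw, "cp1252").decode("utf-8")

# Converting with the code page the bytes were written in gives the right text.
right = convert_to_utf8(raw, "cp936").decode("utf-8")

print(wrong)  # ÖÐÎÄ
print(right)  # 中文
```

    No error is raised in the wrong case, so the damage only shows up later as garbled text.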

