Thread: Line termination LF vs CRLF vs CR

  1. #1

    Line termination LF vs CRLF vs CR

    Hi.

    I had a multi-line file that I was processing with a thinBasic program I wrote. It came back reporting only one line was read.

    The problem was that the file used Unix line terminations (no CR preceding the LF).

    Would it be possible to change the FILE_LineInput function to accept any of the following as valid line terminations: LF (Unix-standard), CRLF (DOS/Windows), or CR (Macintosh)?

    Secondly, would you consider adding a FILE_LineTermination function that sets the character(s) written at the end of each line? Examples:

    [code=thinbasic]
    FILE_LineTermination($CR) ' Write file with Macintosh line terminations
    FILE_LineTermination($CRLF) ' Write file with DOS/Windows line terminations
    FILE_LineTermination($LF) ' Write file with Unix line terminations
    [/code]

    With those changes, one could read an ASCII file regardless of the system on which it was created and write a file that conformed to whichever line termination standard was needed.

    Just a suggestion...
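
To make the request concrete, here is a minimal sketch of the behaviour being asked for, written in Python rather than thinBasic so that it does not depend on any thinBasic function that may not exist. The names `split_lines` and `join_lines` are hypothetical and only illustrate the idea: accept LF, CRLF, or CR when reading, and let the caller pick the terminator when writing.

```python
import re

# One line terminator. CRLF is listed first so it is not read as CR followed by LF.
_TERMINATOR = re.compile(r"\r\n|\r|\n")

def split_lines(text):
    """Split text on CRLF, LF, or CR. A trailing terminator adds no empty line."""
    lines = _TERMINATOR.split(text)
    if lines and lines[-1] == "":
        lines.pop()
    return lines

def join_lines(lines, terminator="\r\n"):
    """Join lines, ending every line (including the last) with the terminator."""
    return "".join(line + terminator for line in lines)
```

For example, `join_lines(split_lines(text), "\n")` would turn a file with any of the three conventions into one with Unix line terminations.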

  2. #2
    thinBasic author ErosOlmi's Avatar
    Join Date
    Sep 2004
    Location
    Milan - Italy
    Age
    57
    Posts
    8,777
    Rep Power
    10

    Re: Line termination LF vs CRLF vs CR

    Good idea. I will check what I can do. I may have to change the native function.

    In the meantime, one option is to load the whole file and parse it into an array of lines, using something like the following. For files up to a few MB it should be very fast.

    [code=thinbasic]
    uses "FILE"

    '---Change file name as needed
    dim MyFile as string value APP_SourceFullName

    '---Will contain all lines loaded
    Dim MyLines() AS STRING

    '---Will count number of lines found
    DIM nLines AS LONG

    '---Load the full file and parse it into tokens separated by $LF
    '---Returns number of tokens (in this case lines) found
    '---MyLines array will have all tokens loaded inside
    nLines = PARSE(file_load(MyFile), MyLines, $lf)

    msgbox 0, "Lines loaded: " & nLines
    [/code]

    Also (again, as a possible alternative) consider FILE_Load and FILE_Save, which work on the full file buffer. Then you can handle the file content in a string buffer.
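
One thing to watch with the whole-buffer approach: splitting on LF alone leaves a trailing CR on every line of a CRLF file, and finds only one line in a CR-only file. A rough sketch of normalising the buffer first is shown here in Python for illustration only. The function name `convert_line_endings` is an assumption, not part of thinBasic.

```python
def convert_line_endings(data, terminator=b"\r\n"):
    """Return a copy of the buffer with every CRLF, CR, or LF replaced by terminator."""
    # Step 1: collapse CRLF to LF first, so its CR is not converted a second time.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Step 2: expand LF to the requested terminator.
    if terminator == b"\n":
        return unified
    return unified.replace(b"\n", terminator)
```

After normalising to LF, the buffer can be split on LF only, or saved back as-is with the terminator that is needed.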

    Ciao
    Eros

    www.thinbasic.com | www.thinbasic.com/community/ | help.thinbasic.com
    Windows 10 Pro for Workstations 64bit - 32 GB - Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz - NVIDIA Quadro RTX 3000

  3. #3

    Just in case you haven't had the opportunity to look at the PARSE function: it has an additional parameter (called FieldDelim) that lets you parse the fields inside each parsed line and automatically fill the array (in this case a matrix). See the following example.

    Ciao
    Eros

    [code=thinbasic]
    '-------------------------------------
    '---Matrix Example
    '-------------------------------------
    Dim MyMatrix() As String
    Dim nLines As Long

    '---The following line will automatically dimension and fill MyMatrix to 3 rows and 5 columns
    ' nLines will contain the number of lines parsed, that is 3 in this case
    nLines = PARSE("1,2,3,4,5|6,7,8,9,10|A,B,C", MyMatrix, "|", ",")

    MSGBOX 0, "Number of lines : " & UBound(MyMatrix(1))
    MSGBOX 0, "Number of columns: " & UBound(MyMatrix(2))
    [/code]
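
For readers who do not have thinBasic at hand, the two-level split can be sketched in Python as follows. This is only an illustration of the idea, not the PARSE implementation. The name `parse_matrix` is hypothetical, and padding short rows with empty strings is an assumption made so that the result is rectangular (3 rows by 5 columns for the sample string).

```python
def parse_matrix(text, row_delim="|", field_delim=","):
    """Split text into rows, then fields, padding short rows with empty strings."""
    rows = [row.split(field_delim) for row in text.split(row_delim)]
    width = max(len(row) for row in rows)
    return [row + [""] * (width - len(row)) for row in rows]

matrix = parse_matrix("1,2,3,4,5|6,7,8,9,10|A,B,C")
print(len(matrix), len(matrix[0]))  # number of rows, number of columns
```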

  4. #4

    Fred,

    Do you have a sample file to test with? If so, could you please attach it to this thread?
    I think I have a solution: developing a new dedicated module able to parse, line by line, files from DOS, Unix, and Mac systems.

    Thanks a lot
    Eros

  5. #5

    Eros,

    The file I've been working on is several million lines in length (and I can process it okay once the line terminations are CRLF), so it's not practical to include that here. It also is a bit large for the parse function to be used on the entire file. Also, each line consists of comma-separated data, so I would have to parse twice, once on LF to break it into lines, and once on commas to break it into data elements.

    The problem is that I do not know, in advance, which line terminations the file will have when I receive it. It depends on whether it is transferred as FTP ASCII or Binary, whether it is copied through removable media or a mapped, shared network drive, whether they are using a Mac, Linux PC, Windows PC, Solaris, or something else.
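
To illustrate the kind of processing described above (one record at a time, unknown line termination, comma-separated fields), here is a small Python sketch. It is not thinBasic code, and `iter_records` is a made-up name. It relies on Python's text mode with `newline=None`, which translates LF, CRLF, and CR to a single newline while reading, so the file never has to be held in memory as a whole.

```python
def iter_records(path, field_delim=","):
    """Yield each line of a text file as a list of fields, one line at a time."""
    # newline=None enables universal newlines: "\n", "\r\n" and "\r" all end a line.
    with open(path, "r", newline=None, encoding="ascii", errors="replace") as f:
        for line in f:
            yield line.rstrip("\n").split(field_delim)
```

Each record is produced as soon as its line has been read, so the line split and the comma split happen in a single pass.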

    Thanks.

    Regards,
    Fred

  6. #6

    OK Fred. I think I have something to work with.
    Please find enclosed a new module and a test script.

    I've tested all your example files plus a Unix file of around 25 MB containing about 518,000 lines. It took less than 2 seconds to read it line by line (without console output, of course).
    I didn't have time to add documentation, but it should be easy to follow.

    Copy thinBasic_FileLine.dll into your \thinBasic\Lib\ directory.
    Copy the test script wherever you prefer. Change the file references inside it so that it finds the input files to test.

    The module is almost completely untested, so apologies in advance for any errors or GPFs. But we can improve it if we are on the right road.
    The module uses the memory-mapped files technique, so the size of the file should have little influence on memory consumption.
    Line separators ($CRLF, $LF, $CR) are recognised automatically, so there is no need to specify anything. The important thing is that the files are TEXT files.
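
The general technique (map the file into memory, then scan for terminators) can be sketched as follows. This is an illustrative Python version and not the code of the module. The function name `iter_lines_mmap` is an assumption, and the byte-by-byte loop is written for clarity rather than speed. Unlike a parser that merges consecutive terminators, this sketch returns empty lines as empty byte strings.

```python
import mmap
import os

LF, CR = 0x0A, 0x0D

def iter_lines_mmap(path):
    """Yield each line of the file as bytes, without its terminator."""
    if os.path.getsize(path) == 0:
        return  # an empty file cannot be memory mapped
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = len(mm)
        start = pos = 0
        while pos < size:
            byte = mm[pos]
            if byte in (LF, CR):
                yield mm[start:pos]
                # A CR directly followed by LF counts as one CRLF terminator.
                if byte == CR and pos + 1 < size and mm[pos + 1] == LF:
                    pos += 1
                start = pos + 1
            pos += 1
        if start < size:
            yield mm[start:size]  # last line without a terminator
```

The operating system pages the file in on demand, so only the slices that are actually yielded get copied into the program's own memory.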

    Let me know about your tests.

    Ciao
    Eros

  7. #7

    Eros,

    Initial testing at this end looks great. Seems to work with every text file thrown at it. I may try some mixed up ones that have multiple line termination types.

    The need to indicate the line termination type was for the File_LinePrint function, so that it would terminate the output file lines according to the user's needs (for users writing Mac or Unix files). I do not have a current need to write files in Unix or Mac format, but I know that I will at some time and so will others. Might as well give the flexibility to specify any string as a line termination.

    I will be converting my program to use the new feature you just provided (understanding that it is relatively untested) for further testing. If it works, will you be rolling the functions into the regular FILE module or will they remain separate?

    Regards,
    Fred

  8. #8

    Quote Originally Posted by fmaxwell
    If it works, will you be rolling the functions into the regular FILE module or will they remain separate?
    No, this will be developed as a new official thinBasic module, because it does not use the standard file I/O approach. Instead it uses memory-mapped files to solve the problem of reading big text files. All the hard work is done by the operating system, not by the module code.

    This module will be dedicated to reading files, not writing them (for the moment).
    Writing the output is relatively simple, because you can just create a new standard file in BINARY mode and PUT into it what you need, followed by the line terminator you need as a standard string.
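
The same idea, shown as a Python sketch rather than thinBasic (the name `write_lines` and its parameters are assumptions made for illustration): open the output in binary mode so nothing is translated, and append the terminator you want to each line yourself.

```python
def write_lines(path, lines, terminator="\r\n", encoding="ascii"):
    """Write each line followed by the chosen terminator, with no newline translation."""
    ending = terminator.encode(encoding)
    with open(path, "wb") as f:  # binary mode: bytes are written exactly as given
        for line in lines:
            f.write(line.encode(encoding) + ending)
```

Calling `write_lines("out.txt", lines, "\n")` would produce Unix line terminations and `"\r"` the old Macintosh ones, whatever platform the script runs on.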

    Maybe I will change the name of the module (I'm not satisfied with the "FileLine" name :-\ ) and therefore the names of its functions. Some functions may also get a slightly different syntax. So for the moment please do not develop big scripts; just test the new functions to catch bugs or suggest improvements.

    Ciao
    Eros


  9. #9

    I forgot to mention a possible limitation of this module's approach.

    Multiple consecutive $CRLF, $LF or $CR terminators (in whatever sequence they occur) are treated as a single terminator. So empty lines in a file are simply skipped by the parser, and the next line containing one or more characters is returned.
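
The effect can be pictured with a short Python sketch. It only imitates the behaviour described above and is not the module's code. The name `split_collapsing` is hypothetical.

```python
import re

def split_collapsing(text):
    """Treat any run of CR/LF characters as one separator, so empty lines vanish."""
    return [part for part in re.split(r"[\r\n]+", text) if part]

sample = "first\r\n\r\nsecond\n\n\nthird\r"
print(split_collapsing(sample))   # ['first', 'second', 'third']
print(len(sample.splitlines()))   # 6 when empty lines are kept
```

The practical consequence is that line counts, and any line numbers reported to the user, will not match the original file whenever it contains blank lines.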

    I will see if I can change this approach (if needed).
    Eros
