99.9999% compression ratio using thinBasic.
Lossless compression.
100 MB of Wikipedia compressed to 1 KB.
Next, to test this week: the compression ratio for 1 GB.
Theoretically, this new codec can compress GB, TB, or PB into a few KB.
Last edited by alberto; 16-07-2019 at 22:38. Reason: typo
Are you kidding us? Please produce some proofs.
www.thinbasic.com | www.thinbasic.com/community/ | help.thinbasic.com
Windows 10 Pro for Workstations 64bit - 32 GB - Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz - NVIDIA Quadro RTX 3000
Hi Eros,
The background that made it possible is here: https://largestprimes.xyz
I am participating in the Prize for Compressing Human Knowledge with this codec.
They will test it for a month and evaluate whether we broke their record.
It seems that their current compression record is near 15 MB for a 100 MB Wikipedia file, using the codecs phda9, decomp8, and paq8.
As I was already working with very large numbers for primes, it was easy to compress 100 million digits using thinBasic.
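The sizes quoted in this thread can be turned into space-saving percentages with a little arithmetic. The sketch below is in Python rather than thinBasic, and it assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes); the function name `space_saving` is made up for illustration.

```python
def space_saving(original_bytes: int, compressed_bytes: int) -> float:
    """Return the fraction of space saved: 1 - compressed / original."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 1.0 - compressed_bytes / original_bytes


# Assumption: decimal units, as stated in the lead-in.
KB = 1_000
MB = 1_000_000

if __name__ == "__main__":
    # About 15 MB for the 100 MB file (the record size mentioned above).
    print(f"{space_saving(100 * MB, 15 * MB):.4%}")
    # 1 KB for the 100 MB file (the size claimed for the new codec).
    print(f"{space_saving(100 * MB, 1 * KB):.4%}")
```

Under these assumptions, 15 MB out of 100 MB is a saving of 85%, and 1 KB out of 100 MB is a saving of 99.999%.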
You mean this one?
https://en.wikipedia.org/wiki/Hutter_Prize
http://prize.hutter1.net/
This is my very small contribution to the challenge.
The attached script performs the following:
- downloads the zipped file used for the challenge, if it is not already present in the current script directory
- extracts the included file into a string buffer of 100 MB
- compresses it into a new string
- reports the results .... very poor compared to the current challenge results
Ciao
Eros
' Modules used by the script
uses "ZLib"
uses "File"
uses "console"
uses "inet"

printl "---------------------------------------------------------------"
printl "Challenge: https://en.wikipedia.org/wiki/Hutter_Prize"
printl " http://prize.hutter1.net/"
printl "---------------------------------------------------------------"
printl "download zipped file used for the challenge if not already present in current script directory"
printl "extract included file into a string buffer of 100MB"
printl "compress it into a new string"
printl "report results .... very poor compared to current challenge results"
printl "---------------------------------------------------------------"
PrintL
printl "Press any key to Start---" IN %CCOLOR_FYELLOW
WaitKey

' Source URL and local name of the zipped challenge file
string sUrlZipFile = "http://mattmahoney.net/dc/enwik8.zip"
string sLocalZipFileName = APP_SourcePath & "enwik8.zip"

' Download the zip only if it is not already in the script directory
printl "---Start downloading", sUrlZipFile
if FILE_Exists(sLocalZipFileName) Then
  printl "---File already downloaded"
Else
  printl " Downloading ..."
  INET_UrlDownload(sUrlZipFile, sLocalZipFileName)
end if
printl " Local file name", sLocalZipFileName
PrintL

' Extract the enwik8 file from the zip into a string buffer
string sUncompressedFileName = "enwik8"
printl "---Extracting " & sUncompressedFileName & " to string"
printL " start", Time$
string sOriginal = ZLib_ExtractToString(sLocalZipFileName, sUncompressedFileName)
printL " end", Time$
printl " Extraction done. Size of string uncompressed:", LenF(sOriginal)
printl

' Compress the buffer into a new string and report both lengths
string sCompress
printl "---Start compressing", Time$
sCompress = StrZip$(sOriginal)
printl " End compressing", Time$
printl " Len Original string.....", lenf(sOriginal)
printl " Len compressed string...", lenf(sCompress)
PrintL
printl "Press any key to end---" IN %CCOLOR_FYELLOW
WaitKey
Hi Eros,
Yes, exactly, that is the prize.
Thank you, Eros, for your experience and contribution.
thinBasic is very powerful,
with lots of commands to learn...
Hi,
good question.
They say their 100 MB data file, enwik8, is fairly uniform.
Their link "Information about the enwik8 data file" is:
http://mattmahoney.net/dc/textdata.html
There you will find detailed information about the data, including statistics and graphs of its distribution:
This competition ranks lossless data compression programs by the compressed size (including the size of the decompression program) of the first 10^9 bytes of the XML text dump of the English version of Wikipedia on Mar. 3, 2006.
enwik8: compressed size of first 10^8 bytes of enwik9. This data is used for the Hutter Prize, and is also ranked here but has no effect on this ranking.
enwik9: compressed size of first 10^9 bytes of enwiki-20060303-pages-articles.xml
They have been benchmarking well-known codecs for years.
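The relationship described above (enwik8 being the first 10^8 bytes of enwik9, with programs ranked by compressed size) can be sketched in Python. This is only an illustration: it uses zlib from the standard library instead of any of the benchmarked codecs, it works on synthetic in-memory data rather than the real files, and the helper name `prefix_compressed_size` is hypothetical. It also ignores the size of the decompression program, which the ranking does count.

```python
import zlib


def prefix_compressed_size(data: bytes, n: int, level: int = 9) -> int:
    """Compress the first n bytes of data with zlib and return the compressed length."""
    prefix = data[:n]
    compressed = zlib.compress(prefix, level)
    # Lossless check: decompressing must give back the prefix exactly.
    assert zlib.decompress(compressed) == prefix
    return len(compressed)


if __name__ == "__main__":
    # Synthetic stand-in for a text dump (assumption: this is not the real enwik9).
    sample = b"<page><title>Example</title><text>lorem ipsum</text></page>\n" * 10_000
    n = 10**5  # scaled-down analogue of the 10^8-byte prefix
    size = prefix_compressed_size(sample, n)
    print(f"first {n} bytes -> {size} bytes compressed")
```

Highly repetitive input such as this sample compresses far better than real text, so the printed figure says nothing about results on enwik8 itself.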
Hi,
I wonder if you are referring to the certainty of the outcomes of the compressed files generated.
In that case the entropy is zero.
"Entropy is zero when one outcome is certain".
http://basicknowledge101.com/pdf/km/...%20theory).pdf
2 shannons of entropy: Information entropy is the log-base-2 of the number of possible outcomes; with two coins there are four outcomes, and the entropy is two bits.
Entropy is zero when one outcome is certain.
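The two statements quoted above can be checked numerically. The following Python sketch (the function name `entropy_bits` is chosen here for illustration) computes Shannon entropy in bits, H = sum of p * log2(1/p), for a list of outcome probabilities.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits: sum of p * log2(1/p) over outcomes with p > 0."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


two_fair_coins = [0.25, 0.25, 0.25, 0.25]  # HH, HT, TH, TT, all equally likely
certain = [1.0]                            # a single outcome that always happens

print(entropy_bits(two_fair_coins))  # 2.0
print(entropy_bits(certain))         # 0.0
```

With four equally likely outcomes the result equals log2(4) = 2 bits, matching the two-coin example, and a single certain outcome gives zero.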
It is the first time I have read about Shannon entropy; it's good to learn something each day.
Thanks