summaryrefslogtreecommitdiffstats
path: root/vendor/github.com/klauspost/compress/zstd/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'vendor/github.com/klauspost/compress/zstd/README.md')
-rw-r--r--vendor/github.com/klauspost/compress/zstd/README.md393
1 files changed, 393 insertions, 0 deletions
diff --git a/vendor/github.com/klauspost/compress/zstd/README.md b/vendor/github.com/klauspost/compress/zstd/README.md
new file mode 100644
index 0000000000..bc977a3023
--- /dev/null
+++ b/vendor/github.com/klauspost/compress/zstd/README.md
@@ -0,0 +1,393 @@
+# zstd
+
+[Zstandard](https://facebook.github.io/zstd/) is a real-time compression algorithm, providing high compression ratios.
+It offers a very wide range of compression / speed trade-off, while being backed by a very fast decoder.
+A high performance compression algorithm is implemented. For now focused on speed.
+
+This package provides [compression](#Compressor) to and [decompression](#Decompressor) of Zstandard content.
+Note that custom dictionaries are not supported yet, so if your code relies on that,
+you cannot use the package as-is.
+
+This package is pure Go and without use of "unsafe".
+If a significant speedup can be achieved using "unsafe", it may be added as an option later.
+
+The `zstd` package is provided as open source software using a Go standard license.
+
+Currently the package is heavily optimized for 64 bit processors and will be significantly slower on 32 bit processors.
+
+## Installation
+
+Install using `go get -u github.com/klauspost/compress`. The package is located in `github.com/klauspost/compress/zstd`.
+
+Godoc Documentation: https://godoc.org/github.com/klauspost/compress/zstd
+
+
+## Compressor
+
+### Status:
+
+STABLE - there may always be subtle bugs, a wide variety of content has been tested and the library is actively
+used by several projects. This library is being continuously [fuzz-tested](https://github.com/klauspost/compress-fuzz),
+kindly supplied by [fuzzit.dev](https://fuzzit.dev/).
+
+There may still be specific combinations of data types/size/settings that could lead to edge cases,
+so as always, testing is recommended.
+
+For now, a high speed (fastest) and medium-fast (default) compressor has been implemented.
+
+The "Fastest" compression ratio is roughly equivalent to zstd level 1.
+The "Default" compression ratio is roughly equivalent to zstd level 3 (default).
+
+In terms of speed, it is typically 2x as fast as the stdlib deflate/gzip in its fastest mode.
+The compression ratio compared to stdlib is around level 3, but usually 3x as fast.
+
+Compared to cgo zstd, the speed is around level 3 (default), but compression slightly worse, between level 1&2.
+
+
+### Usage
+
+An Encoder can be used for either compressing a stream via the
+`io.WriteCloser` interface supported by the Encoder or as multiple independent
+tasks via the `EncodeAll` function.
+Smaller encodes are encouraged to use the EncodeAll function.
+Use `NewWriter` to create a new instance that can be used for both.
+
+To create a writer with default options, do like this:
+
+```Go
+// Compress input to output.
+func Compress(in io.Reader, out io.Writer) error {
+ w, err := NewWriter(output)
+ if err != nil {
+ return err
+ }
+ _, err := io.Copy(w, input)
+ if err != nil {
+ enc.Close()
+ return err
+ }
+ return enc.Close()
+}
+```
+
+Now you can encode by writing data to `enc`. The output will be finished writing when `Close()` is called.
+Even if your encode fails, you should still call `Close()` to release any resources that may be held up.
+
+The above is fine for big encodes. However, whenever possible try to *reuse* the writer.
+
+To reuse the encoder, you can use the `Reset(io.Writer)` function to change to another output.
+This will allow the encoder to reuse all resources and avoid wasteful allocations.
+
+Currently stream encoding has 'light' concurrency, meaning up to 2 goroutines can be working on part
+of a stream. This is independent of the `WithEncoderConcurrency(n)`, but that is likely to change
+in the future. So if you want to limit concurrency for future updates, specify the concurrency
+you would like.
+
+You can specify your desired compression level using `WithEncoderLevel()` option. Currently only pre-defined
+compression settings can be specified.
+
+#### Future Compatibility Guarantees
+
+This will be an evolving project. When using this package it is important to note that both the compression efficiency and speed may change.
+
+The goal will be to keep the default efficiency at the default zstd (level 3).
+However the encoding should never be assumed to remain the same,
+and you should not use hashes of compressed output for similarity checks.
+
+The Encoder can be assumed to produce the same output from the exact same code version.
+However, the may be modes in the future that break this,
+although they will not be enabled without an explicit option.
+
+This encoder is not designed to (and will probably never) output the exact same bitstream as the reference encoder.
+
+Also note, that the cgo decompressor currently does not [report all errors on invalid input](https://github.com/DataDog/zstd/issues/59),
+[omits error checks](https://github.com/DataDog/zstd/issues/61), [ignores checksums](https://github.com/DataDog/zstd/issues/43)
+and seems to ignore concatenated streams, even though [it is part of the spec](https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#frames).
+
+#### Blocks
+
+For compressing small blocks, the returned encoder has a function called `EncodeAll(src, dst []byte) []byte`.
+
+`EncodeAll` will encode all input in src and append it to dst.
+This function can be called concurrently, but each call will only run on a single goroutine.
+
+Encoded blocks can be concatenated and the result will be the combined input stream.
+Data compressed with EncodeAll can be decoded with the Decoder, using either a stream or `DecodeAll`.
+
+Especially when encoding blocks you should take special care to reuse the encoder.
+This will effectively make it run without allocations after a warmup period.
+To make it run completely without allocations, supply a destination buffer with space for all content.
+
+```Go
+import "github.com/klauspost/compress/zstd"
+
+// Create a writer that caches compressors.
+// For this operation type we supply a nil Reader.
+var encoder, _ = zstd.NewWriter(nil)
+
+// Compress a buffer.
+// If you have a destination buffer, the allocation in the call can also be eliminated.
+func Compress(src []byte) []byte {
+ return encoder.EncodeAll(src, make([]byte, 0, len(src)))
+}
+```
+
+You can control the maximum number of concurrent encodes using the `WithEncoderConcurrency(n)`
+option when creating the writer.
+
+Using the Encoder for both a stream and individual blocks concurrently is safe.
+
+### Performance
+
+I have collected some speed examples to compare speed and compression against other compressors.
+
+* `file` is the input file.
+* `out` is the compressor used. `zskp` is this package. `gzstd` is gzip standard library. `zstd` is the Datadog cgo library.
+* `level` is the compression level used. For `zskp` level 1 is "fastest", level 2 is "default".
+* `insize`/`outsize` is the input/output size.
+* `millis` is the number of milliseconds used for compression.
+* `mb/s` is megabytes (2^20 bytes) per second.
+
+```
+The test data for the Large Text Compression Benchmark is the first
+10^9 bytes of the English Wikipedia dump on Mar. 3, 2006.
+http://mattmahoney.net/dc/textdata.html
+
+file out level insize outsize millis mb/s
+enwik9 zskp 1 1000000000 343833033 5840 163.30
+enwik9 zskp 2 1000000000 317822183 8449 112.87
+enwik9 gzstd 1 1000000000 382578136 13627 69.98
+enwik9 gzstd 3 1000000000 349139651 22344 42.68
+enwik9 zstd 1 1000000000 357416379 4838 197.12
+enwik9 zstd 3 1000000000 313734522 7556 126.21
+
+GOB stream of binary data. Highly compressible.
+https://files.klauspost.com/compress/gob-stream.7z
+
+file out level insize outsize millis mb/s
+gob-stream zskp 1 1911399616 234981983 5100 357.42
+gob-stream zskp 2 1911399616 208674003 6698 272.15
+gob-stream gzstd 1 1911399616 357382641 14727 123.78
+gob-stream gzstd 3 1911399616 327835097 17005 107.19
+gob-stream zstd 1 1911399616 250787165 4075 447.22
+gob-stream zstd 3 1911399616 208191888 5511 330.77
+
+Highly compressible JSON file. Similar to logs in a lot of ways.
+https://files.klauspost.com/compress/adresser.001.gz
+
+file out level insize outsize millis mb/s
+adresser.001 zskp 1 1073741824 18510122 1477 692.83
+adresser.001 zskp 2 1073741824 19831697 1705 600.59
+adresser.001 gzstd 1 1073741824 47755503 3079 332.47
+adresser.001 gzstd 3 1073741824 40052381 3051 335.63
+adresser.001 zstd 1 1073741824 16135896 994 1030.18
+adresser.001 zstd 3 1073741824 17794465 905 1131.49
+
+VM Image, Linux mint with a few installed applications:
+https://files.klauspost.com/compress/rawstudio-mint14.7z
+
+file out level insize outsize millis mb/s
+rawstudio-mint14.tar zskp 1 8558382592 3648168838 33398 244.38
+rawstudio-mint14.tar zskp 2 8558382592 3376721436 50962 160.16
+rawstudio-mint14.tar gzstd 1 8558382592 3926257486 84712 96.35
+rawstudio-mint14.tar gzstd 3 8558382592 3740711978 176344 46.28
+rawstudio-mint14.tar zstd 1 8558382592 3607859742 27903 292.51
+rawstudio-mint14.tar zstd 3 8558382592 3341710879 46700 174.77
+
+
+The test data is designed to test archivers in realistic backup scenarios.
+http://mattmahoney.net/dc/10gb.html
+
+file out level insize outsize millis mb/s
+10gb.tar zskp 1 10065157632 4883149814 45715 209.97
+10gb.tar zskp 2 10065157632 4638110010 60970 157.44
+10gb.tar gzstd 1 10065157632 5198296126 97769 98.18
+10gb.tar gzstd 3 10065157632 4932665487 313427 30.63
+10gb.tar zstd 1 10065157632 4940796535 40391 237.65
+10gb.tar zstd 3 10065157632 4638618579 52911 181.42
+
+Silesia Corpus:
+http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip
+
+file out level insize outsize millis mb/s
+silesia.tar zskp 1 211947520 73025800 1108 182.26
+silesia.tar zskp 2 211947520 67674684 1599 126.41
+silesia.tar gzstd 1 211947520 80007735 2515 80.37
+silesia.tar gzstd 3 211947520 73133380 4259 47.45
+silesia.tar zstd 1 211947520 73513991 933 216.64
+silesia.tar zstd 3 211947520 66793301 1377 146.79
+```
+
+### Converters
+
+As part of the development process a *Snappy* -> *Zstandard* converter was also built.
+
+This can convert a *framed* [Snappy Stream](https://godoc.org/github.com/golang/snappy#Writer) to a zstd stream.
+Note that a single block is not framed.
+
+Conversion is done by converting the stream directly from Snappy without intermediate full decoding.
+Therefore the compression ratio is much less than what can be done by a full decompression
+and compression, and a faulty Snappy stream may lead to a faulty Zstandard stream without
+any errors being generated.
+No CRC value is being generated and not all CRC values of the Snappy stream are checked.
+However, it provides really fast re-compression of Snappy streams.
+
+
+```
+BenchmarkSnappy_ConvertSilesia-8 1 1156001600 ns/op 183.35 MB/s
+Snappy len 103008711 -> zstd len 82687318
+
+BenchmarkSnappy_Enwik9-8 1 6472998400 ns/op 154.49 MB/s
+Snappy len 508028601 -> zstd len 390921079
+```
+
+
+```Go
+ s := zstd.SnappyConverter{}
+ n, err = s.Convert(input, output)
+ if err != nil {
+ fmt.Println("Re-compressed stream to", n, "bytes")
+ }
+```
+
+The converter `s` can be reused to avoid allocations, even after errors.
+
+
+## Decompressor
+
+Staus: STABLE - there may still be subtle bugs, but a wide variety of content has been tested.
+
+This library is being continuously [fuzz-tested](https://github.com/klauspost/compress-fuzz),
+kindly supplied by [fuzzit.dev](https://fuzzit.dev/).
+The main purpose of the fuzz testing is to ensure that it is not possible to crash the decoder,
+or run it past its limits with ANY input provided.
+
+### Usage
+
+The package has been designed for two main usages, big streams of data and smaller in-memory buffers.
+There are two main usages of the package for these. Both of them are accessed by creating a `Decoder`.
+
+For streaming use a simple setup could look like this:
+
+```Go
+import "github.com/klauspost/compress/zstd"
+
+func Decompress(in io.Reader, out io.Writer) error {
+ d, err := zstd.NewReader(input)
+ if err != nil {
+ return err
+ }
+ defer d.Close()
+
+ // Copy content...
+ _, err := io.Copy(out, d)
+ return err
+}
+```
+
+It is important to use the "Close" function when you no longer need the Reader to stop running goroutines.
+See "Allocation-less operation" below.
+
+For decoding buffers, it could look something like this:
+
+```Go
+import "github.com/klauspost/compress/zstd"
+
+// Create a reader that caches decompressors.
+// For this operation type we supply a nil Reader.
+var decoder, _ = zstd.NewReader(nil)
+
+// Decompress a buffer. We don't supply a destination buffer,
+// so it will be allocated by the decoder.
+func Decompress(src []byte) ([]byte, error) {
+ return decoder.DecodeAll(src, nil)
+}
+```
+
+Both of these cases should provide the functionality needed.
+The decoder can be used for *concurrent* decompression of multiple buffers.
+It will only allow a certain number of concurrent operations to run.
+To tweak that yourself use the `WithDecoderConcurrency(n)` option when creating the decoder.
+
+### Allocation-less operation
+
+The decoder has been designed to operate without allocations after a warmup.
+
+This means that you should *store* the decoder for best performance.
+To re-use a stream decoder, use the `Reset(r io.Reader) error` to switch to another stream.
+A decoder can safely be re-used even if the previous stream failed.
+
+To release the resources, you must call the `Close()` function on a decoder.
+After this it can *no longer be reused*, but all running goroutines will be stopped.
+So you *must* use this if you will no longer need the Reader.
+
+For decompressing smaller buffers a single decoder can be used.
+When decoding buffers, you can supply a destination slice with length 0 and your expected capacity.
+In this case no unneeded allocations should be made.
+
+### Concurrency
+
+The buffer decoder does everything on the same goroutine and does nothing concurrently.
+It can however decode several buffers concurrently. Use `WithDecoderConcurrency(n)` to limit that.
+
+The stream decoder operates on
+
+* One goroutine reads input and splits the input to several block decoders.
+* A number of decoders will decode blocks.
+* A goroutine coordinates these blocks and sends history from one to the next.
+
+So effectively this also means the decoder will "read ahead" and prepare data to always be available for output.
+
+Since "blocks" are quite dependent on the output of the previous block stream decoding will only have limited concurrency.
+
+In practice this means that concurrency is often limited to utilizing about 2 cores effectively.
+
+
+### Benchmarks
+
+These are some examples of performance compared to [datadog cgo library](https://github.com/DataDog/zstd).
+
+The first two are streaming decodes and the last are smaller inputs.
+
+```
+BenchmarkDecoderSilesia-8 20 642550210 ns/op 329.85 MB/s 3101 B/op 8 allocs/op
+BenchmarkDecoderSilesiaCgo-8 100 384930000 ns/op 550.61 MB/s 451878 B/op 9713 allocs/op
+
+BenchmarkDecoderEnwik9-2 10 3146000080 ns/op 317.86 MB/s 2649 B/op 9 allocs/op
+BenchmarkDecoderEnwik9Cgo-2 20 1905900000 ns/op 524.69 MB/s 1125120 B/op 45785 allocs/op
+
+BenchmarkDecoder_DecodeAll/z000000.zst-8 200 7049994 ns/op 138.26 MB/s 40 B/op 2 allocs/op
+BenchmarkDecoder_DecodeAll/z000001.zst-8 100000 19560 ns/op 97.49 MB/s 40 B/op 2 allocs/op
+BenchmarkDecoder_DecodeAll/z000002.zst-8 5000 297599 ns/op 236.99 MB/s 40 B/op 2 allocs/op
+BenchmarkDecoder_DecodeAll/z000003.zst-8 2000 725502 ns/op 141.17 MB/s 40 B/op 2 allocs/op
+BenchmarkDecoder_DecodeAll/z000004.zst-8 200000 9314 ns/op 54.54 MB/s 40 B/op 2 allocs/op
+BenchmarkDecoder_DecodeAll/z000005.zst-8 10000 137500 ns/op 104.72 MB/s 40 B/op 2 allocs/op
+BenchmarkDecoder_DecodeAll/z000006.zst-8 500 2316009 ns/op 206.06 MB/s 40 B/op 2 allocs/op
+BenchmarkDecoder_DecodeAll/z000007.zst-8 20000 64499 ns/op 344.90 MB/s 40 B/op 2 allocs/op
+BenchmarkDecoder_DecodeAll/z000008.zst-8 50000 24900 ns/op 219.56 MB/s 40 B/op 2 allocs/op
+BenchmarkDecoder_DecodeAll/z000009.zst-8 1000 2348999 ns/op 154.01 MB/s 40 B/op 2 allocs/op
+
+BenchmarkDecoder_DecodeAllCgo/z000000.zst-8 500 4268005 ns/op 228.38 MB/s 1228849 B/op 3 allocs/op
+BenchmarkDecoder_DecodeAllCgo/z000001.zst-8 100000 15250 ns/op 125.05 MB/s 2096 B/op 3 allocs/op
+BenchmarkDecoder_DecodeAllCgo/z000002.zst-8 10000 147399 ns/op 478.49 MB/s 73776 B/op 3 allocs/op
+BenchmarkDecoder_DecodeAllCgo/z000003.zst-8 5000 320798 ns/op 319.27 MB/s 139312 B/op 3 allocs/op
+BenchmarkDecoder_DecodeAllCgo/z000004.zst-8 200000 10004 ns/op 50.77 MB/s 560 B/op 3 allocs/op
+BenchmarkDecoder_DecodeAllCgo/z000005.zst-8 20000 73599 ns/op 195.64 MB/s 19120 B/op 3 allocs/op
+BenchmarkDecoder_DecodeAllCgo/z000006.zst-8 1000 1119003 ns/op 426.48 MB/s 557104 B/op 3 allocs/op
+BenchmarkDecoder_DecodeAllCgo/z000007.zst-8 20000 103450 ns/op 215.04 MB/s 71296 B/op 9 allocs/op
+BenchmarkDecoder_DecodeAllCgo/z000008.zst-8 100000 20130 ns/op 271.58 MB/s 6192 B/op 3 allocs/op
+BenchmarkDecoder_DecodeAllCgo/z000009.zst-8 2000 1123500 ns/op 322.00 MB/s 368688 B/op 3 allocs/op
+```
+
+This reflects the performance around May 2019, but this may be out of date.
+
+# Contributions
+
+Contributions are always welcome.
+For new features/fixes, remember to add tests and for performance enhancements include benchmarks.
+
+For sending files for reproducing errors use a service like [goobox](https://goobox.io/#/upload) or similar to share your files.
+
+For general feedback and experience reports, feel free to open an issue or write me on [Twitter](https://twitter.com/sh0dan).
+
+This package includes the excellent [`github.com/cespare/xxhash`](https://github.com/cespare/xxhash) package Copyright (c) 2016 Caleb Spare.