Tags · parquet-go/parquet-go

v0.24.0

simplify big endian compat (#182)

Follow up to #164, this PR removes the `getOffset` function and moves
all offset computation to build time constants.

I also moved most use of `cpu.IsBigEndian` to using separate files with
build tags for consistency and to keep the main code logic as clean as
possible. In some cases, this might also help with inlining since
functions contain less code.

Let me know if you have any concerns about the change.

Nov 7, 2024
2d1aca6

v0.23.0

rm min/max funcs in favor of built-ins (#148)
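For context, Go 1.21 added `min` and `max` as language built-ins, which is presumably what this change relies on. The sketch below contrasts a hand-written helper with the built-ins; `minInt` and `clamp` are hypothetical names chosen for illustration and are not taken from the parquet-go code base.

```go
package main

import "fmt"

// minInt is a hypothetical example of the kind of hand-written helper
// that the Go 1.21 built-ins make unnecessary.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// clamp restricts v to the range [lo, hi] using the built-in min and max.
// It requires Go 1.21 or later.
func clamp(v, lo, hi int) int {
	return min(max(v, lo), hi)
}

func main() {
	fmt.Println(minInt(3, 7), min(3, 7))
	fmt.Println(clamp(15, 0, 10), clamp(-4, 0, 10), clamp(5, 0, 10))
}
```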

Jun 18, 2024
4bc33ce

v0.22.0

Sorting merge data corruption (#140)

This adds a unit test that recreates the failure seen in #139.

The merged row reader will overwrite values in its internal buffers when
filling up the buffer during the `ReadRows` call.
This causes data corruption in the final `rows` that are returned to the
caller.

This adds a potential fix for this issue: by calling the `rowAllocator`
capture method, we can copy the byte arrays. However, this causes some
significant hits to the benchmarks seen below. Is there a better
solution to prevent this buffer corruption?
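The failure mode described above can be reproduced in miniature. The following is a simplified sketch, not the parquet-go implementation: `reusingReader` and `readAll` are hypothetical names, and the reader simply refills one shared buffer on every call. Values returned without a copy alias that buffer and are overwritten by later reads; copying each value detaches it, at the cost of an extra allocation per value.

```go
package main

import "fmt"

// reusingReader is a hypothetical reader that refills one internal buffer
// on every call, so each returned slice aliases the same memory.
type reusingReader struct {
	buf  []byte
	data []string
	pos  int
}

// next returns the next value as a slice backed by r.buf,
// or nil once the input is exhausted.
func (r *reusingReader) next() []byte {
	if r.pos >= len(r.data) {
		return nil
	}
	r.buf = append(r.buf[:0], r.data[r.pos]...)
	r.pos++
	return r.buf
}

// readAll collects every value from data. When copyValues is false, the
// collected slices share the reader's buffer and later reads overwrite them.
func readAll(data []string, copyValues bool) []string {
	r := &reusingReader{data: data, buf: make([]byte, 0, 16)}
	var rows [][]byte
	for v := r.next(); v != nil; v = r.next() {
		if copyValues {
			v = append([]byte(nil), v...) // detach from the shared buffer
		}
		rows = append(rows, v)
	}
	out := make([]string, len(rows))
	for i, row := range rows {
		out[i] = string(row)
	}
	return out
}

func main() {
	data := []string{"aaa", "bbb", "ccc"}
	fmt.Println("aliased:", readAll(data, false))
	fmt.Println("copied: ", readAll(data, true))
}
```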

May 30, 2024
d0d9efa

v0.21.0

The minvalue of the column statistic incorrectly ignores empty string (#131)

fix #130
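The commit message does not say how the empty string came to be ignored. One plausible way for this class of bug to arise, shown here purely as an assumption, is using the empty value as a "not set yet" marker while computing the minimum. The function names below are hypothetical and are not taken from parquet-go.

```go
package main

import "fmt"

// minIgnoringEmpty treats "" as "no minimum yet", so an empty value
// can never be reported as the minimum. This is the buggy variant.
func minIgnoringEmpty(values []string) string {
	lowest := ""
	for _, v := range values {
		if lowest == "" || v < lowest {
			lowest = v
		}
	}
	return lowest
}

// minTracked records separately whether a minimum has been seen,
// so "" is a legitimate result. ok is false for empty input.
func minTracked(values []string) (lowest string, ok bool) {
	for _, v := range values {
		if !ok || v < lowest {
			lowest, ok = v, true
		}
	}
	return lowest, ok
}

func main() {
	values := []string{"b", "", "a"}
	fmt.Printf("%q\n", minIgnoringEmpty(values))
	lowest, ok := minTracked(values)
	fmt.Printf("%q %v\n", lowest, ok)
}
```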

Apr 29, 2024
e0969aa

v0.20.1

Bug: Respect current page offset in reslice (#109)

Currently, if you reslice a boolean page, it will not respect its
internal bit offset into the first byte. The impact is that if you slice
a boolean page multiple times on values that are not byte-aligned, the
page will return misaligned values when reading.
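To make the offset arithmetic concrete, here is a minimal sketch of a bit-packed boolean page. The `boolPage` type and its fields are hypothetical simplifications, not the parquet-go types. The key point is that `slice` computes the new start position relative to the page's current bit offset; omitting `p.offset` there would produce the misalignment described above on the second reslice.

```go
package main

import "fmt"

// boolPage is a hypothetical, simplified bit-packed boolean page.
// Value k lives at bit (offset+k) of bits, least significant bit first.
type boolPage struct {
	bits   []byte
	offset int // bit offset of the first value within bits[0], 0..7
	count  int // number of values in the page
}

// newBoolPage packs values into a byte-aligned page.
func newBoolPage(values []bool) boolPage {
	bits := make([]byte, (len(values)+7)/8)
	for k, v := range values {
		if v {
			bits[k/8] |= 1 << (k % 8)
		}
	}
	return boolPage{bits: bits, count: len(values)}
}

// at returns the k-th value of the page.
func (p boolPage) at(k int) bool {
	bit := p.offset + k
	return p.bits[bit/8]&(1<<(bit%8)) != 0
}

// slice returns the values in [i, j). Positions are computed relative to
// the current bit offset so that repeated reslicing stays aligned.
func (p boolPage) slice(i, j int) boolPage {
	start := p.offset + i
	end := p.offset + j
	return boolPage{
		bits:   p.bits[start/8 : (end+7)/8],
		offset: start % 8,
		count:  j - i,
	}
}

func main() {
	values := make([]bool, 20)
	for k := range values {
		values[k] = k%3 == 0
	}
	// Two reslices on non-byte-aligned boundaries: values 5..12 of the original.
	sub := newBoolPage(values).slice(3, 20).slice(2, 10)
	for k := 0; k < sub.count; k++ {
		fmt.Println(5+k, sub.at(k), values[5+k])
	}
}
```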

Feb 2, 2024
cdb927e

v0.20.0

Return errors for missing page indices (#94)

As a result of #84, we can
now return a `nil` column or offset index when the `SkipPageIndex`
option is set. Furthermore, if the page index is not skipped when
opening a file, the return value for missing indices will be a
zero-value struct instead of nil.

This commit reconciles this inconsistency by returning
`ErrMissingColumnIndex` and `ErrMissingOffsetIndex` respectively when
either index is missing.
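From a caller's perspective, returning a sentinel error replaces the need to distinguish `nil` from a zero-value struct. The sketch below illustrates the pattern with locally defined stand-ins: `errMissingColumnIndex`, `columnChunk`, and `describe` are hypothetical and only mirror the idea, not the library's actual types or method signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissingColumnIndex is a hypothetical stand-in for the sentinel
// error named above.
var errMissingColumnIndex = errors.New("missing column index")

// columnIndex and columnChunk are simplified placeholder types.
type columnIndex struct{ nullCounts []int64 }

type columnChunk struct {
	index *columnIndex
}

// columnIndex returns the index or a sentinel error, never a nil index
// with a nil error and never a zero-value placeholder.
func (c *columnChunk) columnIndex() (*columnIndex, error) {
	if c.index == nil {
		return nil, errMissingColumnIndex
	}
	return c.index, nil
}

// describe shows how a caller can branch on the sentinel with errors.Is.
func describe(c *columnChunk) string {
	_, err := c.columnIndex()
	switch {
	case errors.Is(err, errMissingColumnIndex):
		return "no column index"
	case err != nil:
		return "error: " + err.Error()
	default:
		return "column index present"
	}
}

func main() {
	fmt.Println(describe(&columnChunk{}))
	fmt.Println(describe(&columnChunk{index: &columnIndex{}}))
}
```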

Follow up to
#85 (comment).

Dec 5, 2023
fba3d61

v0.19.0

Error when uncompressed page size exceeds the max int32 value (#81)

This prevents writing corrupted files due to integer overflows in cases
where the uncompressed page size gets too large.
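The general technique is a range check before narrowing to `int32`, since a non-constant integer conversion in Go wraps silently instead of failing. This is a minimal sketch under that assumption; `pageSizeInt32` and `errPageTooLarge` are hypothetical names and do not reflect the actual change in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errPageTooLarge is a hypothetical error for sizes that do not fit in int32.
var errPageTooLarge = errors.New("uncompressed page size exceeds max int32")

// pageSizeInt32 converts size to int32 and returns an error instead of
// letting the conversion wrap around.
func pageSizeInt32(size int64) (int32, error) {
	if size < 0 || size > math.MaxInt32 {
		return 0, fmt.Errorf("%w: %d", errPageTooLarge, size)
	}
	return int32(size), nil
}

func main() {
	for _, size := range []int64{1 << 20, math.MaxInt32, math.MaxInt32 + 1} {
		n, err := pageSizeInt32(size)
		fmt.Println(n, err)
	}

	// Without the check, the narrowing conversion silently wraps.
	var big int64 = math.MaxInt32 + 1
	fmt.Println(int32(big))
}
```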

The PR helps to mitigate #79

Oct 23, 2023
1748e4b

v0.18.0

Migrate WriterTest to use parquet-cli (#52)

Replace the deprecated parquet-tools command used in the WriterTest with
parquet-cli. Validate the output of the "meta" and "pages" subcommands,
which gives most of the same coverage as parquet-tools.

Ideally this could have included the "cat" command to validate each row.
However, the command has some problems with parquet schemas that don't
map cleanly onto Avro definitions, including many uses of repeated
fields, so this doesn't reliably work.

The PARQUET_GO_TEST_CLI environment variable provides an alternative way
to configure the parquet-cli command for use in the test, and supports
both an alternative executable name and prefix CLI arguments. This is
necessary to support the method for running the CLI described in the
parquet-cli README:

`java -cp 'target/parquet-cli-1.12.3.jar:target/dependency/*'
org.apache.parquet.cli.Main`

See:
https://github.com/apache/parquet-mr/blob/master/parquet-cli/README.md
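As an illustration of how an environment variable can carry both an executable name and prefix arguments, here is a minimal sketch. It is not the test's actual parsing code: `cliCommand` is a hypothetical helper, the fallback name `parquet` is an assumption, and splitting on whitespace with `strings.Fields` does not interpret shell quoting, so a quoted classpath such as the one above would keep its quote characters.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cliCommand splits a PARQUET_GO_TEST_CLI-style value into an executable
// name and prefix arguments. The "parquet" fallback is an assumption.
func cliCommand(value string) (name string, prefix []string) {
	fields := strings.Fields(value)
	if len(fields) == 0 {
		return "parquet", nil
	}
	return fields[0], fields[1:]
}

func main() {
	name, prefix := cliCommand(os.Getenv("PARQUET_GO_TEST_CLI"))

	// Append the subcommand and file after any prefix arguments.
	args := append(append([]string(nil), prefix...), "meta", "file.parquet")
	fmt.Println(name, args)
}
```

The resulting `name` and `args` could then be passed to `exec.Command(name, args...)` from `os/exec`.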

Aug 24, 2023
5895359

v0.17.0

Merge pull request #35 from parquet-go/remove-preview-from-changelog

remove preview from changelog

Jul 22, 2023
623ff76