Add Iceberg table provider #360

Draft
wants to merge 1 commit into
base: main

linhr (Contributor) commented Jan 27, 2025

Closes #172.

linhr mentioned this pull request Jan 27, 2025

Spark Test Report

Commit Information

Commit  Revision  Branch
After   4f11fed   refs/pull/360/merge
Before  e600fd2   refs/heads/main

Test Summary

Suite Commit Failed Passed Skipped Warnings Time (s)
doctest-column After 1 32 3 6.09
Before 1 32 3 6.01
doctest-dataframe After 32 74 1 4 8.01
Before 32 74 1 4 7.91
doctest-functions After 159 243 7 8 12.09
Before 159 243 7 8 12.16
test-connect After 239 797 135 282 130.66
Before 239 797 135 282 130.20
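
As a quick cross-check of the summary, the Failed counts for the After commit can be added up and compared with the "Total" figure under Error Counts. The sketch below assumes the first number in each row is the Failed column; the doctest-column row has one fewer value than the other rows, so its column assignment is an assumption. The dictionary name is made up for illustration.

```python
# Failed counts per suite for the "After" commit, copied from the Test Summary.
# Assumption: the first number in each row belongs to the Failed column.
failed_after = {
    "doctest-column": 1,
    "doctest-dataframe": 32,
    "doctest-functions": 159,
    "test-connect": 239,
}

# Sum the failures across all four suites.
total_failed = sum(failed_after.values())
print(total_failed)  # prints 431
```

Under that assumption the sum agrees with the 431 reported as "Total" in the Error Counts listing.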

Test Details

Error Counts
          431 Total
          235 Total Unique
-------- ---- ----------------------------------------------------------------------------------------------------------
           26 DocTestFailure
           15 UnsupportedOperationException: streaming query manager command
           13 AssertionError: AnalysisException not raised
           13 UnsupportedOperationException: lambda function
           10 PySparkAssertionError: [DIFFERENT_PANDAS_DATAFRAME] DataFrames are not almost equal:
           10 UnsupportedOperationException: unsupported data source format: Some("text")
           10 handle add artifacts
            8 UnsupportedOperationException: hint
            7 AssertionError: False is not true
            6 UnsupportedOperationException: function: window
            6 UnsupportedOperationException: write stream operation start
            5 AnalysisException: Cannot cast to Decimal128(14, 7). Overflowing on NaN
            5 AnalysisException: Execution error: 'Utf8("INTERVAL '0 00:00:00.000123' DAY TO SECOND") = CAST(#1 AS...
            5 UnsupportedOperationException: function: monotonically_increasing_id
            5 UnsupportedOperationException: sample
            4 AssertionError: "TABLE_OR_VIEW_NOT_FOUND" does not match "Error during planning: No table named 'v'"
            4 PySparkNotImplementedError: [NOT_IMPLEMENTED] rdd() is not implemented.
            4 UnsupportedOperationException: sample by
            4 UnsupportedOperationException: unknown aggregate function: hll_sketch_agg
            4 UnsupportedOperationException: unpivot
            3 AnalysisException: Error during planning: Error during planning: spark_array does not support zero a...
            3 ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type day_time_interval ...
            3 IllegalArgumentException: invalid argument: empty data source paths
            3 UnsupportedOperationException: function: input_file_name
            3 UnsupportedOperationException: function: pmod
            3 UnsupportedOperationException: function: ~
            3 UnsupportedOperationException: handle analyze input files
            3 ValueError: Converting to Python dictionary is not supported when duplicate field names are present
            2 AnalysisException: Error during planning: two values expected: [Column(Column { relation: None, name...
            2 AnalysisException: Invalid or Unsupported Configuration: Could not find config namespace "spark"
            2 AssertionError
            2 AssertionError: AnalysisException not raised by <lambda>
            2 AssertionError: Lists differ: [Row([22 chars](key=1, value='1'), Row(key=10, value='10'), R[2402 cha...
            2 IllegalArgumentException: expected value at line 1 column 1
            2 KeyError: 22
            2 PythonException:  ZeroDivisionError: division by zero
            2 SparkRuntimeException: Internal error: start_from index out of bounds.
            2 UnsupportedOperationException: Aggregate can not be used as a sliding accumulator because `retract_b...
            2 UnsupportedOperationException: PlanNode::IsCached
            2 UnsupportedOperationException: approx quantile
            2 UnsupportedOperationException: collect metrics
            2 UnsupportedOperationException: freq items
            2 UnsupportedOperationException: function: bitmap_bit_position
            2 UnsupportedOperationException: function: crc32
            2 UnsupportedOperationException: function: encode
            2 UnsupportedOperationException: function: format_number
            2 UnsupportedOperationException: function: from_csv
            2 UnsupportedOperationException: function: from_json
            2 UnsupportedOperationException: function: inline
            2 UnsupportedOperationException: function: map_from_arrays
            2 UnsupportedOperationException: function: sec
            2 UnsupportedOperationException: function: shiftrightunsigned
            2 UnsupportedOperationException: handle analyze is local
            2 UnsupportedOperationException: handle analyze same semantics
            2 UnsupportedOperationException: pivot
            2 UnsupportedOperationException: position with 3 arguments is not supported yet
            2 UnsupportedOperationException: rebalance partitioning by expression
            2 UnsupportedOperationException: tail
            2 UnsupportedOperationException: unknown aggregate function: collect_set
            2 UnsupportedOperationException: unresolved regex
            2 UnsupportedOperationException: unsupported data source format: Some("orc")
            2 UnsupportedOperationException: user defined data type should only exist in a field
            2 handle artifact statuses
            2 received metadata size exceeds hard limit (19831 vs. 16384);  :status:42B content-type:60B grpc-stat...
            1 AnalysisException: Cannot cast string 'abc' to value of Float64 type
            1 AnalysisException: Cannot cast value 'abc' to value of Boolean type
            1 AnalysisException: Error during planning: Error during planning: Failed to coerce arguments to satis...
            1 AnalysisException: Error during planning: Execution error: User-defined coercion failed with Interna...
            1 AnalysisException: Error during planning: Failed to parse placeholder id: cannot parse integer from ...
            1 AnalysisException: Error during planning: Inconsistent data type across values list at row 1 column ...
(+1)        1 AnalysisException: Error during planning: UNION queries have different number of columns: left has 2...
            1 AnalysisException: Error during planning: three values expected: [Column(Column { relation: None, na...
            1 AnalysisException: Error during planning: three values expected: [Literal(Int32(1)), Literal(Int32(3...
            1 AnalysisException: Error during planning: two values expected: [Column(Column { relation: None, name...
            1 AnalysisException: Execution error: 'Utf8("1970-01-01 00:00:00") = CAST(#1 AS Utf8)' is not true!
            1 AnalysisException: Execution error: 'Utf8("2012-02-02 02:02:02") = CAST(#1 AS Utf8)' is not true!
            1 AnalysisException: Execution error: Error parsing timestamp from '2023-01-01' using format '%d-%m-%Y...
            1 AnalysisException: Execution error: Unable to find factory for TEXT
            1 AnalysisException: Execution error: map requires all value types to be the same
            1 AnalysisException: Invalid or Unsupported Configuration: could not find config namespace for key "ig...
            1 AnalysisException: Invalid or Unsupported Configuration: could not find config namespace for key "li...
            1 AnalysisException: cannot resolve attribute: ObjectName([Identifier("name")])
            1 AssertionError: "2000000" does not match "Internal error: raise_error expects a single UTF-8 string ...
(+1)        1 AssertionError: "Database 'memory:58cf41f2-fada-4c27-9c8b-a9f68b6f3a1a' dropped." does not match "in...
(+1)        1 AssertionError: "Database 'memory:9bfdafba-4204-4bfe-98ff-21aa13458f8c' dropped." does not match "in...
            1 AssertionError: "TABLE_OR_VIEW_NOT_FOUND" does not match "Execution error: The table test_table alre...
            1 AssertionError: "attribute.*missing" does not match "cannot resolve attribute: ObjectName([Identifie...
            1 AssertionError: "foobar" does not match "Internal error: raise_error expects a single UTF-8 string a...
            1 AssertionError: "timestamp values are not equal (timestamp='1968-12-31 17:01:01': data[0][1]='1969-0...
            1 AssertionError: '+---[17 chars]-----+\n|                        x|\n+--------[132 chars]-+\n' != '+-...
            1 AssertionError: 2 != 3
            1 AssertionError: ArrayIndexOutOfBoundsException not raised
            1 AssertionError: Exception not raised
            1 AssertionError: Lists differ: [Row([14 chars] _c1=25, _c2='I am Hyukjin\n\nI love Spark!'),[86 chars...
            1 AssertionError: Lists differ: [Row([23 chars](2019, 1, 1, 8, 0), aware=datetime.datetime(2019, 1, 1,...
            1 AssertionError: Lists differ: [Row(key='0'), Row(key='1'), Row(key='10'), Ro[1439 chars]99')] != [Ro...
            1 AssertionError: Lists differ: [Row(ln(id)=0.0, ln(id)=0.0, struct(id, name)=Row(id=[1232 chars]0'))]...
            1 AssertionError: Lists differ: [Row(name='Andy', age=30), Row(name='Justin', [34 chars]one)] != [Row(...
            1 AssertionError: Row(point=ExamplePoint([,1), pypoint=ExamplePoint([,3)) != Row(point='(1.0, 2.0)', p...
            1 AssertionError: StorageLevel(False, True, True, False, 1) != StorageLevel(False, False, False, False...
            1 AssertionError: Struc[31 chars]stampNTZType(), True), StructField('val', Inte[13 chars]ue)]) != Stru...
            1 AssertionError: Struc[32 chars]e(), False), StructField('b', DoubleType(), Fa[158 chars]ue)]) != Str...
            1 AssertionError: Struc[40 chars]ue), StructField('val', ArrayType(DoubleType(), False), True)]) != St...
            1 AssertionError: Struc[64 chars]Type(), True), StructField('i', StringType(), True)]), False)]) != St...
            1 AssertionError: Struc[69 chars]e(), True), StructField('name', StringType(), True)]), True)]) != Str...
            1 AssertionError: YearMonthIntervalType(0, 1) != YearMonthIntervalType(0, 0)
            1 AssertionError: [1.0, 2.0] != ExamplePoint(1.0,2.0)
            1 AssertionError: datetime.datetime(1970, 1, 1, 0, 0) != datetime.datetime(1970, 1, 1, 8, 0)
            1 AssertionError: {} != {'max_age': 5}
            1 AttributeError: 'DataFrame' object has no attribute '_ipython_key_completions_'
            1 AttributeError: 'DataFrame' object has no attribute '_joinAsOf'
(+1)        1 FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmphao7put0'
(+1)        1 FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmppzbu5gs2'
            1 IllegalArgumentException: 83140 is too large to store in a Decimal128 of precision 4. Max is 9999
            1 IllegalArgumentException: invalid argument: invalid digit found in string
            1 IllegalArgumentException: invalid argument: sql parser error: Expected: (, found: AS at Line: 1, Col...
            1 IllegalArgumentException: invalid argument: sql parser error: Expected: (, found: EOF
            1 KeyError: 'max'
            1 PySparkNotImplementedError: [NOT_IMPLEMENTED] foreach() is not implemented.
            1 PySparkNotImplementedError: [NOT_IMPLEMENTED] foreachPartition() is not implemented.
            1 PySparkNotImplementedError: [NOT_IMPLEMENTED] localCheckpoint() is not implemented.
            1 PySparkNotImplementedError: [NOT_IMPLEMENTED] sparkContext() is not implemented.
            1 PySparkNotImplementedError: [NOT_IMPLEMENTED] toJSON() is not implemented.
            1 PythonException:  AttributeError: 'NoneType' object has no attribute 'partitionId'
            1 PythonException:  AttributeError: 'list' object has no attribute 'x'
            1 PythonException:  AttributeError: 'list' object has no attribute 'y'
            1 QueryExecutionException: Json error: Not valid JSON: EOF while parsing a list at line 1 column 1
            1 QueryExecutionException: Json error: Not valid JSON: expected value at line 1 column 2
            1 SparkRuntimeException: External error: Arrow error: Invalid argument error: column types must match ...
            1 SparkRuntimeException: Optimizer rule 'optimize_projections' failed
            1 SparkRuntimeException: type_coercion
            1 UnsupportedOperationException: Aggregate can not be used as a sliding accumulator because `retract_b...
            1 UnsupportedOperationException: Aggregate can not be used as a sliding accumulator because `retract_b...
            1 UnsupportedOperationException: COUNT DISTINCT with multiple arguments
            1 UnsupportedOperationException: Insert into not implemented for this table
            1 UnsupportedOperationException: SQL show functions
            1 UnsupportedOperationException: bucketing
            1 UnsupportedOperationException: deduplicate within watermark
            1 UnsupportedOperationException: function exists
            1 UnsupportedOperationException: function: array_insert
            1 UnsupportedOperationException: function: array_sort
            1 UnsupportedOperationException: function: arrays_zip
            1 UnsupportedOperationException: function: bin
            1 UnsupportedOperationException: function: bit_count
            1 UnsupportedOperationException: function: bit_get
            1 UnsupportedOperationException: function: bitmap_bucket_number
            1 UnsupportedOperationException: function: bitmap_count
            1 UnsupportedOperationException: function: bround
            1 UnsupportedOperationException: function: conv
            1 UnsupportedOperationException: function: convert_timezone
            1 UnsupportedOperationException: function: csc
            1 UnsupportedOperationException: function: decode
            1 UnsupportedOperationException: function: e
            1 UnsupportedOperationException: function: elt
            1 UnsupportedOperationException: function: format_string
            1 UnsupportedOperationException: function: from_utc_timestamp
            1 UnsupportedOperationException: function: getbit
            1 UnsupportedOperationException: function: inline_outer
            1 UnsupportedOperationException: function: java_method
            1 UnsupportedOperationException: function: json_object_keys
            1 UnsupportedOperationException: function: json_tuple
            1 UnsupportedOperationException: function: last_day
            1 UnsupportedOperationException: function: make_dt_interval
            1 UnsupportedOperationException: function: make_interval
            1 UnsupportedOperationException: function: make_timestamp
            1 UnsupportedOperationException: function: make_timestamp_ltz
            1 UnsupportedOperationException: function: make_timestamp_ntz
            1 UnsupportedOperationException: function: make_ym_interval
            1 UnsupportedOperationException: function: map_concat
            1 UnsupportedOperationException: function: map_from_entries
            1 UnsupportedOperationException: function: mask
            1 UnsupportedOperationException: function: months_between
            1 UnsupportedOperationException: function: next_day
            1 UnsupportedOperationException: function: parse_url
            1 UnsupportedOperationException: function: printf
            1 UnsupportedOperationException: function: reflect
            1 UnsupportedOperationException: function: regexp_count
            1 UnsupportedOperationException: function: regexp_extract
            1 UnsupportedOperationException: function: regexp_extract_all
            1 UnsupportedOperationException: function: regexp_instr
            1 UnsupportedOperationException: function: regexp_substr
            1 UnsupportedOperationException: function: schema_of_csv
            1 UnsupportedOperationException: function: schema_of_json
            1 UnsupportedOperationException: function: sentences
            1 UnsupportedOperationException: function: session_window
            1 UnsupportedOperationException: function: sha
            1 UnsupportedOperationException: function: sha1
            1 UnsupportedOperationException: function: soundex
            1 UnsupportedOperationException: function: spark_partition_id
            1 UnsupportedOperationException: function: split
            1 UnsupportedOperationException: function: stack
            1 UnsupportedOperationException: function: str_to_map
            1 UnsupportedOperationException: function: to_char
            1 UnsupportedOperationException: function: to_csv
            1 UnsupportedOperationException: function: to_json
            1 UnsupportedOperationException: function: to_number
            1 UnsupportedOperationException: function: to_unix_timestamp
            1 UnsupportedOperationException: function: to_utc_timestamp
            1 UnsupportedOperationException: function: to_varchar
            1 UnsupportedOperationException: function: try_add
            1 UnsupportedOperationException: function: try_divide
            1 UnsupportedOperationException: function: try_element_at
            1 UnsupportedOperationException: function: try_multiply
            1 UnsupportedOperationException: function: try_subtract
            1 UnsupportedOperationException: function: try_to_binary
            1 UnsupportedOperationException: function: try_to_number
            1 UnsupportedOperationException: function: try_to_timestamp
            1 UnsupportedOperationException: function: typeof
            1 UnsupportedOperationException: function: url_decode
            1 UnsupportedOperationException: function: url_encode
            1 UnsupportedOperationException: function: width_bucket
            1 UnsupportedOperationException: function: xpath
            1 UnsupportedOperationException: function: xpath_boolean
            1 UnsupportedOperationException: function: xpath_double
            1 UnsupportedOperationException: function: xpath_float
            1 UnsupportedOperationException: function: xpath_int
            1 UnsupportedOperationException: function: xpath_long
            1 UnsupportedOperationException: function: xpath_number
            1 UnsupportedOperationException: function: xpath_short
            1 UnsupportedOperationException: function: xpath_string
            1 UnsupportedOperationException: handle analyze semantic hash
            1 UnsupportedOperationException: list functions
            1 UnsupportedOperationException: unknown aggregate function: bitmap_or_agg
            1 UnsupportedOperationException: unknown aggregate function: count_if
            1 UnsupportedOperationException: unknown aggregate function: count_min_sketch
            1 UnsupportedOperationException: unknown aggregate function: grouping_id
            1 UnsupportedOperationException: unknown aggregate function: histogram_numeric
            1 UnsupportedOperationException: unknown aggregate function: percentile
            1 UnsupportedOperationException: unknown aggregate function: try_avg
            1 UnsupportedOperationException: unknown aggregate function: try_sum
            1 UnsupportedOperationException: unknown function: distributed_sequence_id
            1 UnsupportedOperationException: unknown function: product
            1 ValueError: Code in Status proto (StatusCode.INTERNAL) doesn't match status code (StatusCode.RESOURC...
            1 ValueError: The column label 'id' is not unique.
            1 ValueError: The column label 'struct' is not unique.
(-1)        0 AnalysisException: Error during planning: UNION queries have different number of columns: left has 3...
(-1)        0 AssertionError: "Database 'memory:d0a46646-0aea-4a9f-9b03-4c9ad2314e70' dropped." does not match "in...
(-1)        0 AssertionError: "Database 'memory:e68e8b16-3d20-4adc-af82-eac7c60a9f95' dropped." does not match "in...
(-1)        0 FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmper9_ivuk'
(-1)        0 FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmprl1lseq9'
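
The listing above follows a regular shape: an optional change marker such as (+1) or (-1), a count, then the error message. A minimal sketch of a parser for that shape is given below. It is not part of the report tooling; the names `ErrorCount` and `parse_line` are hypothetical, and reading the marker as the change relative to the Before commit is an assumption.

```python
import re
from typing import NamedTuple, Optional


class ErrorCount(NamedTuple):
    delta: int    # value of the change marker, 0 when the line has none
    count: int    # number of occurrences reported on the line
    message: str  # error message, possibly truncated with "..."


# Optional "(+N)" or "(-N)" marker, then a count, then the message text.
LINE = re.compile(r"^\s*(?:\(([+-]\d+)\))?\s*(\d+)\s+(.*)$")


def parse_line(line: str) -> Optional[ErrorCount]:
    """Parse one listing line; return None for separators and other lines."""
    m = LINE.match(line)
    if m is None:
        return None
    delta = int(m.group(1)) if m.group(1) else 0
    return ErrorCount(delta, int(m.group(2)), m.group(3))


sample = [
    "           26 DocTestFailure",
    "(+1)        1 FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmphao7put0'",
    "(-1)        0 FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmper9_ivuk'",
    "-------- ---- ------",
]

rows = [r for r in map(parse_line, sample) if r is not None]
changed = [r for r in rows if r.delta != 0]
for r in changed:
    print(r.delta, r.count, r.message)
```

Filtering on a non-zero `delta` isolates the lines that differ between the two commits. In this report those are mostly messages containing run-specific values such as temporary paths and in-memory database identifiers.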

Passed Tests Diff

(empty)

Successfully merging this pull request may close these issues.

Iceberg Integration