Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to close #1453 by exposing several DataFusion scalar functions that exist upstream but were not previously available in the Python API, along with adding Python unit tests for the new bindings.
Changes:
- Added Python wrappers and exports for
arrow_metadata,get_field,union_extract,union_tag,version, plus a Python-levelrowalias forstruct. - Added unit tests covering the newly exposed functions (notably union functions and
version). - Updated
codespellskip paths formatting inpyproject.toml.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
python/datafusion/functions.py |
Adds new Python-level function wrappers/exports (arrow_metadata, get_field, union_*, version, row). |
crates/core/src/functions.rs |
Exposes new functions from the Rust extension module to Python via pyo3 (arrow_metadata, get_field, union_extract, union_tag, version). |
python/tests/test_functions.py |
Adds tests for the newly exposed functions. |
pyproject.toml |
Normalizes codespell skip path entries (removes ./ prefixes). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
192593f to
2771621
Compare
…row_metadata, version, row Expose upstream DataFusion scalar functions that were not yet available in the Python API. Closes apache#1453. - get_field: extracts a field from a struct or map by name - union_extract: extracts a value from a union type by field name - union_tag: returns the active field name of a union type - arrow_metadata: returns Arrow field metadata (all or by key) - version: returns the DataFusion version string - row: alias for the struct constructor Note: arrow_try_cast was listed in the issue but does not exist in DataFusion 53, so it is not included. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests for get_field, arrow_metadata, version, row, union_tag, and union_extract. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allow arrow_cast, get_field, and union_extract to accept plain str arguments instead of requiring Expr wrappers. Also improve arrow_metadata test coverage and fix parameter shadowing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4384c1f to
df1ead1
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def arrow_cast(expr: Expr, data_type: Expr | str) -> Expr: | ||
| """Casts an expression to a specified data type. | ||
|
|
||
| Examples: | ||
| >>> ctx = dfn.SessionContext() |
There was a problem hiding this comment.
This PR declares Closes #1453, but the issue also lists arrow_try_cast as a missing scalar function. I verified there is no arrow_try_cast wrapper anywhere in the repo (no Python wrapper in python/datafusion/functions.py and no Rust binding in crates/core/src/functions.rs). Either add arrow_try_cast (and a corresponding unit test) or adjust the PR description/linked issue closure so we’re not closing the issue prematurely.
Which issue does this PR close?
Closes #1453
Rationale for this change
These functions exist upstream but were not exposed to Python.
What changes are included in this PR?
Expose functions to Python
Add unit testss
Are there any user-facing changes?
New addition only.