Conversation
There was a problem hiding this comment.
Pull request overview
This PR exposes several upstream DataFusion aggregate functions that were previously missing from the Python API, and adds unit tests to validate the new bindings.
Changes:
- Expose
groupingandpercentile_contfrom Rust bindings to the Python API. - Add Python-level wrapper for
var_populationas an alias ofvar_pop. - Add unit tests covering
percentile_cont,grouping, andvar_population.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
crates/core/src/functions.rs |
Enables the grouping aggregate binding and adds a new percentile_cont pyfunction export. |
python/datafusion/functions.py |
Adds public Python wrappers/exports for grouping, percentile_cont, and var_population. |
python/tests/test_functions.py |
Adds unit tests for the newly exposed aggregate functions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ation Expose upstream DataFusion aggregate functions that were not yet available in the Python API. Closes apache#1454. - grouping: returns grouping set membership indicator (rewritten by the ResolveGroupingFunction analyzer rule before physical planning) - percentile_cont: computes exact percentile using continuous interpolation (unlike approx_percentile_cont which uses t-digest) - var_population: alias for var_pop Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0c8dc78 to
d16cff1
Compare
ntjohnson1
left a comment
There was a problem hiding this comment.
Grouping examples would make it a lot clearer. Without looking around or thinking about the description more deeply the usage isn't clear to me.
| Args: | ||
| expression: The column to check grouping status for | ||
| distinct: If True, compute on distinct values only | ||
| filter: If provided, only compute against rows for which the filter is True |
python/tests/test_functions.py
Outdated
| assert result.column(0)[0].as_py() == 3.0 | ||
|
|
||
|
|
||
| def test_percentile_cont_with_filter(): |
There was a problem hiding this comment.
NIT: Would be nicer to just be a parametrized test since the only diff is the filter and the result
| assert result.column(0)[0].as_py() == 3.5 | ||
|
|
||
|
|
||
| def test_grouping(): |
There was a problem hiding this comment.
If you add more examples then probably don't need to also add tests here but this seems to only test one of the grouping cases
Add docstring example to grouping(), parametrize percentile_cont tests, and add multi-column grouping test case. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
Closes #861
Closes #925
Closes #1454
Rationale for this change
These functions exist upstream but were not exposed to the Python API
What changes are included in this PR?
Expose functions to Python API
Add unit tests
Are there any user-facing changes?
New addition only.