gh-142659: Optimize set_swap_bodies for intersection_update #148155
Siyet wants to merge 2 commits into python:main from
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the
@Siyet Could you add a few benchmarks for the affected cases?
Benchmarks on a dedicated server (AMD EPYC 1 vCPU, 2 GB RAM, Ubuntu 24.04, no background load).
Default build (with GIL):
Free-threading build (
The end-to-end improvement is modest (~1%) because
Benchmark script
"""Benchmark set.intersection_update for gh-142659.

Measures the performance of intersection_update across different set sizes
and usage patterns. Each benchmark recreates the set on every iteration
to measure the full intersection_update path including set_replace_body.
"""
import statistics
import sys
import timeit

ROUNDS = 50

BENCHMARKS = {
    "small (5 & 5)": {
        "setup": "",
        "stmt": "set(range(5)).intersection_update({3, 4, 5, 6, 7})",
        "number": 2_000_000,
    },
    "medium (100 & 100)": {
        "setup": "b = set(range(50, 150))",
        "stmt": "set(range(100)).intersection_update(b)",
        "number": 500_000,
    },
    "large (10k & 10k)": {
        "setup": "b = set(range(5000, 15000))",
        "stmt": "set(range(10000)).intersection_update(b)",
        "number": 2_000,
    },
    "multi-arg (100 & 100 & 50)": {
        "setup": "b = set(range(50, 150)); c = set(range(75, 125))",
        "stmt": "set(range(100)).intersection_update(b, c)",
        "number": 500_000,
    },
    "empty result (100 & 0 overlap)": {
        "setup": "b = set(range(200, 300))",
        "stmt": "set(range(100)).intersection_update(b)",
        "number": 500_000,
    },
}


def main():
    print(f"Python {sys.version}")
    print(f"Rounds: {ROUNDS}")
    print()
    print(f"{'Benchmark':<32} {'Mean (ns)':>10} {'Stdev':>10} {'Min':>10} {'Max':>10}")
    print("-" * 78)
    for name, bench in BENCHMARKS.items():
        times = timeit.repeat(
            bench["stmt"],
            setup=bench["setup"],
            number=bench["number"],
            repeat=ROUNDS,
        )
        per_iter = [t / bench["number"] * 1e9 for t in times]
        mean = statistics.mean(per_iter)
        stdev = statistics.stdev(per_iter)
        print(
            f"{name:<32} {mean:>10.1f} {stdev:>9.1f} {min(per_iter):>10.1f} {max(per_iter):>10.1f}"
        )


if __name__ == "__main__":
    main()
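Before comparing timings, it may help to confirm that each benchmark case produces the intersection it is meant to. The sketch below is not part of the PR; `CASES` and `check_cases` are hypothetical names that mirror the benchmark inputs, with the non-mutating `&` operator used as an independent reference.

```python
# Hypothetical sanity check mirroring the benchmark inputs; not part of the PR.
CASES = {
    "small (5 & 5)": (set(range(5)), [{3, 4, 5, 6, 7}]),
    "medium (100 & 100)": (set(range(100)), [set(range(50, 150))]),
    "large (10k & 10k)": (set(range(10000)), [set(range(5000, 15000))]),
    "multi-arg (100 & 100 & 50)": (
        set(range(100)),
        [set(range(50, 150)), set(range(75, 125))],
    ),
    "empty result (100 & 0 overlap)": (set(range(100)), [set(range(200, 300))]),
}


def check_cases(cases=CASES):
    """Return the result size per case after verifying it against `&`."""
    sizes = {}
    for name, (target, others) in cases.items():
        # Reference result built without mutating anything.
        expected = set(target)
        for other in others:
            expected = expected & other

        # Path under test: in-place update on a fresh copy.
        result = set(target)
        returned = result.intersection_update(*others)
        assert returned is None, name  # in-place methods return None
        assert result == expected, name
        sizes[name] = len(result)
    return sizes


if __name__ == "__main__":
    for name, size in check_cases().items():
        print(f"{name:<32} {size:>6} elements remain")
```

Running this on both the baseline and the patched build gives a quick functional check that is independent of the timing numbers.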
Replace the general-purpose set_swap_bodies() with a specialized set_replace_body() that exploits the invariant that src is always a uniquely-referenced temporary about to be discarded.
d3f1a20 to 37a95e6
Summary
Replace the general-purpose `set_swap_bodies()` with a specialized `set_replace_body()` that exploits the invariant that the source argument is always a uniquely-referenced temporary set about to be discarded.
Follow-up to the observation in #132290 (comment) and #142659.
Changes
`set_swap_bodies()` was designed for arbitrary two-way swaps between any two sets, but it is only called from `set_intersection_update()` and `set_intersection_update_multi_impl()`, where the second argument (`tmp`) is always a freshly created temporary with `Py_REFCNT == 1`.
The new `set_replace_body()` exploits this invariant:
- `src`: the temporary is not visible to other threads, so plain assignments suffice (saves atomic fence overhead in the free-threaded build).
- `copy_small_table` for `src`: use plain `memcpy` instead of per-entry atomic stores when writing back to src's smalltable (`Py_GIL_DISABLED` path).
- `assert`).
- `src` is never shared (enforced via `assert`), so only one direction of the shared-marking check is needed: propagate shared status from `dst` to `src` for proper deallocation of old entries.
- `Py_hash_t h` variable.
All assumptions are guarded by `assert()` to document and enforce the contract.