apache spark - Efficient boolean reductions `any`, `all` for PySpark RDD?
PySpark supports common reductions like `sum`, `min`, and `count`. Does it also support the boolean reductions `all` and `any`?

I can `fold` over `or_` and `and_`, but this seems inefficient.
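What I have in mind is something like the following sketch (assuming an existing SparkContext named `sc`; the data is made up for illustration):

```python
from operator import and_, or_

# Assumes `sc` is an existing SparkContext; the data is illustrative.
flags = sc.parallelize(range(10)).map(lambda x: x % 2 == 0)

# Folding touches every element, even once the outcome is decided.
all_even = flags.fold(True, and_)   # behaves like all()
any_even = flags.fold(False, or_)   # behaves like any()
print(all_even, any_even)           # False True
```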
No. The underlying Scala API doesn't have them, so the Python one won't either. I don't think they will be added, since it's easy to define them in terms of `filter`.
Yes. Using `fold` is inefficient because it cannot short-circuit: every element is evaluated even once the result is already decided. Instead, use `.filter(!condition).take(1).isEmpty` to mean `.forall(condition)`, and `.filter(condition).take(1).nonEmpty` to mean `.exists(condition)`.
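Those are Scala idioms; a rough PySpark translation might look like this (the helper names `forall` and `exists` are mine, not part of the PySpark API, and `sc` is assumed to be an existing SparkContext):

```python
def forall(rdd, predicate):
    # No counterexample found => the predicate holds everywhere.
    return len(rdd.filter(lambda x: not predicate(x)).take(1)) == 0

def exists(rdd, predicate):
    # One witness is enough; take(1) lets Spark stop scanning early.
    return len(rdd.filter(predicate).take(1)) > 0

numbers = sc.parallelize(range(10))
print(forall(numbers, lambda x: x >= 0))  # True
print(exists(numbers, lambda x: x > 8))   # True
```

Unlike the `fold` version, `take(1)` runs jobs over only as many partitions as it needs, so the scan stops as soon as a witness or counterexample turns up.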
(General suggestion: the underlying Scala API is more flexible than the Python API, so I suggest you move to it - it makes debugging easier, since there are fewer layers to dig through. Scala stands for Scalable Language - it's better suited to scalable applications and more robust than dynamically typed languages.)