Efficient boolean reductions `any`, `all` for PySpark RDD?
PySpark supports the common reductions like `sum`, `min`, `count`, and so on. Does it support the boolean reductions `all` and `any`?

I could fold over `or_` and `and_`, but that seems inefficient.
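For concreteness, here is a sketch of the fold-based version I have in mind (the RDD contents and the predicate are just placeholders):

```python
from operator import and_, or_
from pyspark import SparkContext

sc = SparkContext("local", "bool-reductions")
rdd = sc.parallelize([1, 2, 3, -4, 5])
pred = lambda x: x > 0

# "any": False is the identity of or_, so OR all mapped values together.
any_positive = rdd.map(pred).fold(False, or_)   # True

# "all": True is the identity of and_, so AND all mapped values together.
all_positive = rdd.map(pred).fold(True, and_)   # False
```

This visits every element even when the answer is already determined by the first one.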
No. The underlying Scala API doesn't have them, so the Python API won't either. I don't think they will be added, since it's easy to define them in terms of `filter`.
And yes, using `fold` would be inefficient, because it can't short-circuit: every element gets evaluated even once the result is known. Instead, `.filter(!condition).take(1).isEmpty` gives you `.forall(condition)`, and `.filter(condition).take(1).nonEmpty` gives you `.exists(condition)`.
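Translated to PySpark for the original question (a sketch; `exists` and `forall` are helper names I'm introducing here, not RDD methods):

```python
def exists(rdd, condition):
    # "any": succeed as soon as one matching element is found;
    # take(1) lets Spark stop scanning once it has that element.
    return len(rdd.filter(condition).take(1)) > 0

def forall(rdd, condition):
    # "all": look for a single counterexample instead of checking everything.
    return len(rdd.filter(lambda x: not condition(x)).take(1)) == 0
```

For example, `exists(rdd, lambda x: x < 0)` returns True as soon as one negative element turns up, without touching the rest of the data.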
(A general suggestion: the underlying Scala API is more flexible than the Python API, so I suggest moving to it - it makes debugging easier since there are fewer layers to dig through. Scala stands for "scalable language": it's better suited to scalable applications and more robust than dynamically typed languages.)