Efficient boolean reductions `any`, `all` for PySpark RDD?
PySpark supports the common reductions like `sum`, `min`, `count`, and so on. Does it support the boolean reductions `all` and `any`?

I could fold over `or_` and `and_`, but that seems inefficient.
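For concreteness, here is a sketch of the fold-based version I have in mind (the RDD contents and the predicate are just placeholders):

```python
from operator import and_, or_
from pyspark import SparkContext

sc = SparkContext("local", "bool-reductions")
rdd = sc.parallelize([1, 2, 3, -4, 5])
pred = lambda x: x > 0

# "any": False is the identity of or_, so OR all mapped values together.
any_positive = rdd.map(pred).fold(False, or_)   # True

# "all": True is the identity of and_, so AND all mapped values together.
all_positive = rdd.map(pred).fold(True, and_)   # False
```

This visits every element even when the answer is already determined by the first one.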
No. The underlying Scala API doesn't have them, so the Python API won't either. I don't think they will be added, since it's easy to define them in terms of `filter`.
And yes, using `fold` would be inefficient, because it can't short-circuit: every element gets evaluated even once the result is known. Instead, `.filter(!condition).take(1).isEmpty` gives you `.forall(condition)`, and `.filter(condition).take(1).nonEmpty` gives you `.exists(condition)`.
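Translated to PySpark for the original question (a sketch; `exists` and `forall` are helper names I'm introducing here, not RDD methods):

```python
def exists(rdd, condition):
    # "any": succeed as soon as one matching element is found;
    # take(1) lets Spark stop scanning once it has that element.
    return len(rdd.filter(condition).take(1)) > 0

def forall(rdd, condition):
    # "all": look for a single counterexample instead of checking everything.
    return len(rdd.filter(lambda x: not condition(x)).take(1)) == 0
```

For example, `exists(rdd, lambda x: x < 0)` returns True as soon as one negative element turns up, without touching the rest of the data.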
(A general suggestion: the underlying Scala API is more flexible than the Python API, so I suggest moving to it - it makes debugging easier since there are fewer layers to dig through. Scala stands for "scalable language": it's better suited to scalable applications and more robust than dynamically typed languages.)