apache spark - Efficient boolean reductions `any`, `all` for PySpark RDD?


PySpark supports common reductions such as sum, min, and count. Does it also support the boolean reductions all and any?

I can fold over or_ and and_, but that seems inefficient.
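For example, a minimal sketch of that fold approach (assuming an existing SparkContext sc):

    from operator import and_, or_

    rdd = sc.parallelize([1, 2, 3, 4, 5])
    flags = rdd.map(lambda x: x % 2 == 0)  # RDD of booleans

    # "all": combine with logical AND, starting from True
    all_even = flags.fold(True, and_)   # False

    # "any": combine with logical OR, starting from False
    any_even = flags.fold(False, or_)   # True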

No. The underlying Scala API doesn't have one, so the Python API won't either. I don't think they will add it, since it's easy to define in terms of filter.

Yes, using fold would be inefficient, because it can't short-circuit and has to process every element. Instead, .filter(!condition).take(1).isEmpty means .forall(condition), and .filter(condition).take(1).nonEmpty means .exists(condition).
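In PySpark that pattern might look like the sketch below; rdd_all and rdd_any are hypothetical helper names, and sc is again assumed to be an existing SparkContext:

    def rdd_all(rdd, predicate):
        # forall: true iff no element violates the predicate.
        # take(1) lets Spark stop once a single counterexample is found.
        return len(rdd.filter(lambda x: not predicate(x)).take(1)) == 0

    def rdd_any(rdd, predicate):
        # exists: true iff at least one element satisfies the predicate.
        return len(rdd.filter(predicate).take(1)) > 0

    nums = sc.parallelize(range(10))
    rdd_all(nums, lambda x: x >= 0)   # True
    rdd_any(nums, lambda x: x > 8)    # True (9 satisfies it)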

(General suggestion: the underlying Scala API is more flexible than the Python API, so I suggest moving to it - it also makes debugging easier, since you have fewer layers to dig through. Scala stands for Scalable Language - it's better suited to scalable applications and more robust than dynamically typed languages.)

