How to use GzipCodec or BZip2Codec for shuffle spill compression with Spark shell
When I start the Spark shell with -Dspark.io.compression.codec=org.apache.hadoop.io.compress.GzipCodec, I get the exception shown below.

Due to space limitations on our cluster, we want to use a more aggressive compression codec. How can I use BZip2Codec and avoid the exception? Is that possible at all?
java.lang.NoSuchMethodException: org.apache.hadoop.io.compress.BZip2Codec.<init>(org.apache.spark.SparkConf)
    at java.lang.Class.getConstructor0(Class.java:2810)
    at java.lang.Class.getConstructor(Class.java:1718)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:48)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:42)
    at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:106)
    at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcast.scala:70)
    at org.apache.spark.broadcast.BroadcastManager.initialize(Broadcast.scala:81)
    at org.apache.spark.broadcast.BroadcastManager.<init>(Broadcast.scala:68)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:175)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:141)
    at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:956)
    at $iwC$$iwC.<init>(<console>:8)
    at $iwC.<init>(<console>:14)
    at <init>(<console>:16)
    at .<init>(<console>:20)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:795)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:840)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:752)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:119)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:118)
    at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:258)
    at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:118)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:55)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:912)
    at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:140)
    at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:55)
    at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:102)
    at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:55)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:929)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:883)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:981)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
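For context on the trace: the NoSuchMethodException is thrown from org.apache.spark.io.CompressionCodec$.createCodec (CompressionCodec.scala:48), which reflectively looks up a one-argument constructor taking a SparkConf. Hadoop codecs such as BZip2Codec only expose a no-argument constructor, so that lookup fails. A minimal self-contained sketch of the same reflective lookup, using illustrative stand-in classes rather than the real Spark and Hadoop types:

```scala
// Stand-ins for the real types (names here are assumptions for illustration).
class SparkConfStub                          // plays the role of org.apache.spark.SparkConf
class SparkStyleCodec(conf: SparkConfStub)   // Spark codecs take a SparkConf in their constructor
class HadoopStyleCodec                       // Hadoop codecs have only a no-arg constructor

// Mirrors the reflective constructor lookup that createCodec performs.
def hasSparkConstructor(c: Class[_]): Boolean =
  try { c.getConstructor(classOf[SparkConfStub]); true }
  catch { case _: NoSuchMethodException => false }

println(hasSparkConstructor(classOf[SparkStyleCodec]))  // true
println(hasSparkConstructor(classOf[HadoopStyleCodec])) // false: the failure in the trace
```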
Running hadoop checknative shows:

14/06/13 17:41:24 INFO bzip2.Bzip2Factory: Loaded & initialized native-bzip2 library system-native
14/06/13 17:41:24 INFO zlib.ZlibFactory: Loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:   true /lib/x86_64-linux-gnu/libz.so.1
snappy: true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4:    true revision:99
bzip2:  true /lib/x86_64-linux-gnu/libbz2.so.1
After starting the shell, a simple import statement should let you use bzip2 compression the way you want, although I am still unclear on part of your question:
import org.apache.hadoop.io.compress.BZip2Codec
Also, when starting the shell I ran the following command and it gave me no error, so check that you are entering it correctly:
./spark-shell -D spark.io.compression.codec=org.apache.hadoop.io.compress.BZip2Codec
This is, of course, run from ~/spark/bin/.
Similarly for gzip:
./spark-shell -D spark.io.compression.codec=org.apache.hadoop.io.compress.GzipCodec
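An alternative to the -D flag is to set the property programmatically on a SparkConf before creating the context. The following is an untested sketch: whether a Hadoop codec class is accepted for spark.io.compression.codec depends on your Spark version, and the stack trace in the question suggests Spark expects one of its own codecs (e.g. org.apache.spark.io.LZFCompressionCodec or org.apache.spark.io.SnappyCompressionCodec, which have the SparkConf constructor it looks for). The app name and codec class below are illustrative choices, not values from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: configure the shuffle/spill compression codec programmatically.
// Substitute a codec class your Spark version actually supports.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("compression-codec-demo")
  .set("spark.io.compression.codec", "org.apache.spark.io.SnappyCompressionCodec")
val sc = new SparkContext(conf)
```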