Spark threw an exception while writing to MongoDB:
```
Job aborted due to stage failure: Task 26 in stage 16.0 failed 4 times, most recent failure: Lost task 26.3 in stage 16.0 (TID 21384, 10.10.22.202, executor 1): com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast 409.922222 into a BsonValue. DecimalType(10,6) has no matching BsonValue.
    at com.mongodb.spark.sql.MapFunctions$.com$mongodb$spark$sql$MapFunctions$$elementTypeToBsonValue(MapFunctions.scala:153)
    at com.mongodb.spark.sql.MapFunctions$$anonfun$4.apply(MapFunctions.scala:86)
    at com.mongodb.spark.sql.MapFunctions$$anonfun$4.apply(MapFunctions.scala:86)
    at scala.util.Try$.apply(Try.scala:192)
    at com.mongodb.spark.sql.MapFunctions$.com$mongodb$spark$sql$MapFunctions$$convertToBsonValue(MapFunctions.scala:86)
    at com.mongodb.spark.sql.MapFunctions$$anonfun$rowToDocument$1.apply(MapFunctions.scala:58)
    at com.mongodb.spark.sql.MapFunctions$$anonfun$rowToDocument$1.apply(MapFunctions.scala:56)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at com.mongodb.spark.sql.MapFunctions$.rowToDocument(MapFunctions.scala:56)
    at com.mongodb.spark.MongoSpark$$anonfun$1.apply(MongoSpark.scala:162)
    at com.mongodb.spark.MongoSpark$$anonfun$1.apply(MongoSpark.scala:162)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:1076)
    at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:1091)
    at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1128)
    at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1132)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1.apply(MongoSpark.scala:132)
    at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1.apply(MongoSpark.scala:131)
    at com.mongodb.spark.MongoConnector$$anonfun$withCollectionDo$1.apply(MongoConnector.scala:186)
    at com.mongodb.spark.MongoConnector$$anonfun$withCollectionDo$1.apply(MongoConnector.scala:184)
    at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
    at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
    at com.mongodb.spark.MongoConnector.withMongoClientDo(MongoConnector.scala:154)
    at com.mongodb.spark.MongoConnector.withDatabaseDo(MongoConnector.scala:171)
    at com.mongodb.spark.MongoConnector.withCollectionDo(MongoConnector.scala:184)
    at com.mongodb.spark.MongoSpark$$anonfun$save$1.apply(MongoSpark.scala:131)
    at com.mongodb.spark.MongoSpark$$anonfun$save$1.apply(MongoSpark.scala:130)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
```
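For context, the failure happens inside MongoSpark.save while it converts DataFrame rows to BSON documents. A minimal sketch of the kind of write that triggers it (the URI, database, collection, and column names here are made up for illustration):

```scala
import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DecimalType

// Hypothetical output URI, just to reproduce the shape of the failing write.
val spark = SparkSession.builder()
  .appName("mongo-decimal-repro")
  .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.metrics")
  .getOrCreate()
import spark.implicits._

// A DecimalType(10,6) column, the same type named in the error message.
val df = Seq("409.922222").toDF("raw")
  .select($"raw".cast(DecimalType(10, 6)).as("value"))

// With a too-old connector, this is where MongoTypeConversionException
// is thrown: rowToDocument finds no BSON mapping for DecimalType.
MongoSpark.save(df)
```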
Root cause analysis:
BSON is the same idea as JSON, so I went to the MongoDB official site to check, and found that the MongoDB Connector for Spark also has version-matching requirements against Spark. That explains the error: the connector version we were running had no BSON mapping for Spark SQL's DecimalType, so the row-to-document conversion failed on any decimal column (newer connectors map DecimalType to BSON Decimal128, which requires MongoDB 3.4 or later).
| MongoDB Connector for Spark | Spark Version | MongoDB Version |
| --- | --- | --- |
| 2.3.1 | 2.3.x | 2.6 or later |
| 2.2.5 | 2.2.x | 2.6 or later |
| 2.1.4 | 2.1.x | 2.6 or later |
| 2.0.0 | 2.0.x | 2.6 or later |
| 1.1.0 | 1.6.x | 2.6 or later |
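If the job ships its own dependencies instead of relying on jars dropped into the cluster, the same version pin can live in the build file. A minimal sbt sketch, assuming a Spark 2.3.x cluster on Scala 2.11 (the pairing the table above calls for):

```scala
// build.sbt — pin the connector to the cluster's Spark line.
// Adjust both versions per the compatibility table above.
libraryDependencies ++= Seq(
  "org.mongodb.spark" %% "mongo-spark-connector" % "2.3.1",
  "org.mongodb"        % "mongo-java-driver"     % "3.8.0"
)
```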
Solution:
I went straight to Maven and downloaded mongo-spark-connector_2.11-2.3.1.jar and mongo-java-driver-3.8.0.jar to match the cluster's Spark version, dropped them into Spark's jars directory, and tried again right away, without restarting anything. It worked.
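For what it's worth, the upgrade works because connector 2.x paired with a 3.4+ Java driver can map Spark's DecimalType to BSON Decimal128 (which also needs MongoDB 3.4 or later on the server side). If upgrading the jars were not an option, the usual fallback would be to cast decimal columns to double before saving, trading exact decimal precision for a type the old connector can handle. Continuing the df from the repro sketch above:

```scala
import org.apache.spark.sql.functions.col

// Fallback when the connector cannot be upgraded: cast DecimalType
// columns to double so every field has a BSON mapping the old
// connector understands. Loses exact decimal semantics.
val writable = df.withColumn("value", col("value").cast("double"))
MongoSpark.save(writable)
```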