1. Problem scenario
df.write
  .format("com.mongodb.spark.sql")
  .mode("overwrite")
  .option("uri", mongoUri)
  .option("database", dbName)
  .option("collection", tableName)
  .save()
Writing the DataFrame to MongoDB with the code above fails with the following exception:
21/03/15 14:59:21 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 45.0 (TID 11196, slave203, executor 9): com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast null into a StringType
at com.mongodb.spark.sql.MapFunctions$$anonfun$com$mongodb$spark$sql$MapFunctions$$wrappedDataTypeToBsonValueMapper$1.apply(MapFunctions.scala:87)
at com.mongodb.spark.sql.MapFunctions$$anonfun$com$mongodb$spark$sql$MapFunctions$$wrappedDataTypeToBsonValueMapper$1.apply(MapFunctions.scala:83)
at com.mongodb.spark.sql.MapFunctions$$anonfun$9$$anonfun$apply$6.apply(MapFunctions.scala:139)
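Sometimes the message is instead: Cannot cast null into a Decimal
2. Problem analysis
Read literally, the message says that null cannot be converted to a string type. That is not really what happens: the exception is not raised by the type conversion itself but by the wrapper around it, which checks the outcome of the conversion and reports any failure with this message.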
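First, let's look at MongoDB's data types:
Object ID: the _id generated automatically for each document
String: a string, which must be UTF-8
Boolean: true or false (a pitfall here: in Python the literals are capitalized, True and False)
Integer: an integer (Int32 or Int64; it is enough to know there is an Int, and Int32 is what we normally use)
Double: a floating-point number (there is no float type; every fractional number is a Double)
Arrays: an array or list, several values stored under one key (a list in Python)
Object: if you know Python this one is easy to grasp, it is simply a Python dict
Null: the empty data type, a special concept (None in Python, null elsewhere)
Timestamp: a timestamp
Date: stores a date or time in Unix time format (we generally do not use Date; a timestamp covers every time-related need)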
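The list indeed contains neither StringType nor decimal. Note that the names differ too: StringType is Spark SQL's name for the type, while MongoDB calls it String. (The list above is not exhaustive: MongoDB 3.4 and later also provide a Decimal128 type.)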
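To see which Spark SQL types a DataFrame actually carries, it can help to print the schema before writing. The following is a minimal sketch that only needs spark-sql on the classpath; the object name SchemaTypeNames and the three fields are hypothetical and stand in for df.schema of the real DataFrame.

```scala
import org.apache.spark.sql.types._

object SchemaTypeNames {
  // Hypothetical schema, for illustration only; in practice use df.schema.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("price", DecimalType(10, 2), nullable = true),
    StructField("score", DoubleType, nullable = true)
  ))

  // One line per field: name, Spark type object, its typeName and nullability.
  def describe(s: StructType): Seq[String] =
    s.fields.toSeq.map { f =>
      s"${f.name}: ${f.dataType} (typeName = ${f.dataType.typeName}, nullable = ${f.nullable})"
    }

  def main(args: Array[String]): Unit =
    describe(schema).foreach(println)
}
```

The typeName of a decimal column starts with "decimal", which is the prefix the connector source quoted in this post tests for.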
Now look at the connector source, MapFunctions.scala:82:
private def wrappedDataTypeToBsonValueMapper(elementType: DataType): (Any) => BsonValue = {
  element =>
    Try(dataTypeToBsonValueMapper(elementType)(element)) match {
      case Success(bsonValue) => bsonValue
      case Failure(ex: MongoTypeConversionException) => throw ex
      case Failure(e) => throw new MongoTypeConversionException(s"Cannot cast $element into a $elementType")
    }
}
private def dataTypeToBsonValueMapper(elementType: DataType): (Any) => BsonValue = {
  elementType match {
    case BinaryType => (element: Any) => new BsonBinary(element.asInstanceOf[Array[Byte]])
    case BooleanType => (element: Any) => new BsonBoolean(element.asInstanceOf[Boolean])
    case DateType => (element: Any) => new BsonDateTime(element.asInstanceOf[Date].getTime)
    case DoubleType => (element: Any) => new BsonDouble(element.asInstanceOf[Double])
    case IntegerType => (element: Any) => new BsonInt32(element.asInstanceOf[Int])
    case LongType => (element: Any) => new BsonInt64(element.asInstanceOf[Long])
    case StringType => (element: Any) => new BsonString(element.asInstanceOf[String])
    case TimestampType => (element: Any) => new BsonDateTime(element.asInstanceOf[Timestamp].getTime)
    case arrayType: ArrayType => {
      val mapper = arrayTypeToBsonValueMapper(arrayType.elementType)
      (element: Any) => mapper(element.asInstanceOf[Seq[_]])
    }
    case schema: StructType => {
      val mapper = structTypeToBsonValueMapper(schema)
      (element: Any) => mapper(element.asInstanceOf[Row])
    }
    case mapType: MapType =>
      mapType.keyType match {
        case StringType => element => mapTypeToBsonValueMapper(mapType.valueType)(element.asInstanceOf[Map[String, _]])
        case _ => element => throw new MongoTypeConversionException(
          s"Cannot cast $element into a BsonValue. MapTypes must have keys of StringType for conversion into a BsonDocument"
        )
      }
    case _ if elementType.typeName.startsWith("decimal") =>
      val jBigDecimal = (element: Any) => element match {
        case jDecimal: java.math.BigDecimal => jDecimal
        case _ => element.asInstanceOf[BigDecimal].bigDecimal
      }
      (element: Any) => new BsonDecimal128(new Decimal128(jBigDecimal(element)))
    case _ =>
      (element: Any) => throw new MongoTypeConversionException(s"Cannot cast $element into a BsonValue. $elementType has no matching BsonValue.")
  }
}
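The mapper looks for a match for each Spark type. A type with no match throws "Cannot cast $element into a BsonValue. $elementType has no matching BsonValue." Any other failure during the conversion is caught by wrappedDataTypeToBsonValueMapper and rethrown as: Cannot cast $element into a $elementType. The message in the log takes this second form, so the conversion of a null element failed and the original cause was replaced by the generic message.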
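The wrapping behaviour can be reproduced without Spark or the connector. The sketch below is not the connector's code: ConversionException and FakeBsonString are hypothetical stand-ins for MongoTypeConversionException and org.bson.BsonString (whose constructor rejects null), and the two mappers only imitate the StringType and decimal branches shown above.

```scala
import scala.util.{Failure, Success, Try}

object WrapperDemo {
  // Hypothetical stand-in for MongoTypeConversionException.
  final class ConversionException(message: String) extends RuntimeException(message)

  // Hypothetical stand-in for org.bson.BsonString: construction fails on null.
  final case class FakeBsonString(value: String) {
    require(value != null, "Value can not be null")
  }

  // Imitates the StringType branch: the cast succeeds for null, the constructor does not.
  def stringMapper(element: Any): Any = FakeBsonString(element.asInstanceOf[String])

  // Imitates the decimal branch: null falls through to `case _`,
  // and calling .bigDecimal on null throws NullPointerException.
  def decimalMapper(element: Any): Any = element match {
    case j: java.math.BigDecimal => j
    case _ => element.asInstanceOf[BigDecimal].bigDecimal
  }

  // Same shape as the wrapper above: any non-conversion failure is
  // replaced by the generic "Cannot cast ..." message.
  def wrapped(typeName: String, mapper: Any => Any)(element: Any): Any =
    Try(mapper(element)) match {
      case Success(value) => value
      case Failure(ex: ConversionException) => throw ex
      case Failure(_) => throw new ConversionException(s"Cannot cast $element into a $typeName")
    }

  def main(args: Array[String]): Unit = {
    println(wrapped("StringType", stringMapper)("abc"))
    println(Try(wrapped("StringType", stringMapper)(null)).failed.get.getMessage)
    println(Try(wrapped("DecimalType(10,2)", decimalMapper)(null)).failed.get.getMessage)
  }
}
```

In both failing calls the underlying exception (an IllegalArgumentException or a NullPointerException) is discarded, which is why the log only shows the generic message.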
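3. Solution:
When computing and writing the results, use data types that MongoDB handles directly wherever possible. In my case, every numeric column uses double instead of decimal.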
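One way to apply this to an existing DataFrame is to cast every decimal column to double just before the write. This is a minimal sketch, not part of the connector: DecimalToDouble and decimalsToDouble are hypothetical names, only top-level columns are handled, and column names are assumed to contain no dots. Keep in mind that a double cannot represent every decimal value exactly.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DecimalType, DoubleType}

object DecimalToDouble {
  // Returns a copy of df in which every top-level DecimalType column
  // has been cast to DoubleType; all other columns are left untouched.
  def decimalsToDouble(df: DataFrame): DataFrame =
    df.schema.fields.foldLeft(df) { (acc, field) =>
      field.dataType match {
        case _: DecimalType =>
          acc.withColumn(field.name, col(field.name).cast(DoubleType))
        case _ => acc
      }
    }
}
```

The result can then be written exactly as in section 1, for example DecimalToDouble.decimalsToDouble(df).write followed by the same format, mode and option calls.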