Spark SQL field types: MapType and ArrayType

2020-12-23 10:01:02

1.ArrayType

I previously used ArrayType when reading nested data from MongoDB with Spark.

Reading nested documents from MongoDB

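    import org.apache.spark.sql.types._

    // subjectiveList is an array of structs; each struct holds another array of structs (fastMark).
    val schema = StructType(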
      Array(
        StructField("subjectiveList",
          ArrayType(StructType(Array(
            StructField("questionNo", StringType),
            StructField("score", DoubleType),
            StructField("isEffective", BooleanType),
            StructField("fastMark",
              ArrayType(StructType(Array(
                StructField("subQuestionNo", StringType),
                StructField("score", DoubleType)
              )))
            )
          )))
        ),
        StructField("studentId", StringType),
        StructField("classId", StringType)
      )
    )

    spark.read.format("com.mongodb.spark.sql")
      .schema(schema)
      .option("spark.mongodb.input.uri", mongoUri)
      .option("spark.mongodb.input.partitioner", "MongoSplitVectorPartitioner")
      .option("spark.mongodb.input.partitionerOptions.partitionSizeMB",32)
      .load()

Lifting the nested array up to the top level

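    // Needs: import org.apache.spark.sql.functions.explode and import spark.implicits._
    val objectiveQuestionRdd = questionsRdd.select(
      (questionsRdd.schema.fieldNames.map(f => questionsRdd(f)) :+ explode($"objectiveList").as("Info")): _*)
    objectiveQuestionRdd.withColumn("no", objectiveQuestionRdd("Info")("questionNo"))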
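
To see what explode does to an array-of-structs column without a MongoDB instance, here is a minimal sketch on an in-memory DataFrame. The case classes, the sample values and the `ExplodeSketch` object are assumptions made up for illustration; only the `objectiveList`, `Info`, `questionNo` and `no` names come from the snippet above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical stand-ins for the Mongo documents (names and values are assumptions).
case class Question(questionNo: String, score: Double)
case class Student(studentId: String, objectiveList: Seq[Question])

object ExplodeSketch {
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val students = Seq(
      Student("s1", Seq(Question("1", 2.0), Question("2", 3.5))),
      Student("s2", Seq(Question("1", 1.0)))
    ).toDF()

    students
      // One output row per array element; each element lands in "Info".
      .select(col("studentId"), explode(col("objectiveList")).as("Info"))
      // Copy a field of the struct into its own top-level column.
      .withColumn("no", col("Info")("questionNo"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("explode-sketch")
      .getOrCreate()
    run(spark).show(false)
    spark.stop()
  }
}
```

The two input rows carry three array elements in total, so the result has three rows with the columns `studentId`, `Info` and `no`.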

2.MapType
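
The post gives no MapType example of its own, so the following is only a minimal sketch of how such a field might be declared. The `MapTypeSketch` object and the `scoreByQuestion` field are hypothetical names, not taken from the original data.

```scala
import org.apache.spark.sql.types._

object MapTypeSketch {
  // Hypothetical schema: the field names are assumptions for illustration.
  val schema: StructType = StructType(Array(
    StructField("studentId", StringType),
    // Keys are question numbers, values are scores; values may be null.
    StructField("scoreByQuestion", MapType(StringType, DoubleType, valueContainsNull = true))
  ))
}
```

With a DataFrame read under a schema like this, a single value could be looked up by key with `col("scoreByQuestion").getItem("1")`.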

 

https://www.itdiandi.net/view/1678

 
