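For cumulative sums I used to accumulate the values one by one with map, and later I came across examples that did the summing with a Spark user-defined function (UDF). It turns out there is also a pure SQL way to do it. I honestly did not expect that to work.
Compute the running cumulative sum of Salary within each Role: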
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum
val cumSum = sampleData.withColumn("cumulativeSum", sum(sampleData("Salary"))
  .over(Window.partitionBy("Role").orderBy("Salary")))
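Summing over a window that is also ordered by Salary produces the running total: once an ordering is specified, the default frame spans from the start of each Role partition to the current row.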
scala> cumSum.show
+------+---------+------+-------------+
|  Name|     Role|Salary|cumulativeSum|
+------+---------+------+-------------+
| simon|Developer| 98000|        98000|
|  mark|Developer|108000|       206000|
| henry|Developer|110000|       316000|
|   bob|Developer|125000|       441000|
|  eric|Developer|144000|       585000|
| peter|Developer|185000|       770000|
|   jon|   Tester| 65000|        65000|
|  carl|   Tester| 70000|       135000|
|carlos|   Tester| 75000|       210000|
| roman|   Tester| 82000|       292000|
+------+---------+------+-------------+
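
The pure SQL way mentioned at the top could look roughly like the sketch below. Several things here are my own assumptions rather than part of the original example: the post never shows how sampleData was built, so I rebuild it from the rows printed above; the view name employees, the object name CumulativeSumSql and the local[*] master are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CumulativeSumSql {

  // Rebuilds the sample rows from the printed table and returns the
  // per-Role running total computed with a SQL window function.
  def cumulativeSums(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Assumption: the original sampleData has exactly these three columns.
    val sampleData = Seq(
      ("bob", "Developer", 125000),
      ("mark", "Developer", 108000),
      ("carl", "Tester", 70000),
      ("peter", "Developer", 185000),
      ("jon", "Tester", 65000),
      ("roman", "Tester", 82000),
      ("simon", "Developer", 98000),
      ("eric", "Developer", 144000),
      ("carlos", "Tester", 75000),
      ("henry", "Developer", 110000)
    ).toDF("Name", "Role", "Salary")

    // "employees" is a hypothetical view name used only in this sketch.
    sampleData.createOrReplaceTempView("employees")

    // Same window as the DataFrame version: partition by Role, order by Salary.
    spark.sql(
      """SELECT Name, Role, Salary,
        |       SUM(Salary) OVER (PARTITION BY Role ORDER BY Salary) AS cumulativeSum
        |FROM employees""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; a real job would get its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("cumulative-sum-sql")
      .master("local[*]")
      .getOrCreate()

    cumulativeSums(spark).show()
    spark.stop()
  }
}
```

One thing to watch: with only ORDER BY in the window, rows that tie on Salary within the same Role are treated as peers and receive the same cumulative value. If a strictly row-by-row total is wanted, adding ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to the OVER clause is the usual way to ask for it.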