Find below expected output through spark and scala

Find below expected output through spark and scala

File Name : dxr.txt

11111,"2021-01-15",10

11111,"2021-01-16",15

11111,"2021-01-17",30

11112,"2021-01-15",10

11112,"2021-01-16",20

11112,"2021-01-17",30

Output Required :

Solution :

val schemax = new StructType()

.add("sensorid",IntegerType,true)

.add("timestamp",TimestampType,true)

.add("values",IntegerType,true)


val df = spark.read.format("csv").schema(schemax).load("file:///D:/dxr.txt")

df.show()

df.printSchema()  


val dx = df.withColumn("tx",lead("values",1).over(Window.partitionBy("sensorid") orderBy("timestamp")))

dx.show()


val dr = dx.withColumn("diff",col("tx")-col("values"))

.filter(col("tx").isNotNull)

.drop("values","tx")

.withColumnRenamed("diff","values")

dr.show()



Comments