Find below expected output through spark and scala
File Name : dxr.txt
11111,"2021-01-15",10
11111,"2021-01-16",15
11111,"2021-01-17",30
11112,"2021-01-15",10
11112,"2021-01-16",20
11112,"2021-01-17",30
Output Required :
Solution :
val schemax = new StructType()
.add("sensorid",IntegerType,true)
.add("timestamp",TimestampType,true)
.add("values",IntegerType,true)
val df = spark.read.format("csv").schema(schemax).load("file:///D:/dxr.txt")
df.show()
df.printSchema()
val dx = df.withColumn("tx",lead("values",1).over(Window.partitionBy("sensorid") orderBy("timestamp")))
dx.show()
val dr = dx.withColumn("diff",col("tx")-col("values"))
.filter(col("tx").isNotNull)
.drop("values","tx")
.withColumnRenamed("diff","values")
dr.show()
Comments
Post a Comment