How to find products with increasing and decreasing sales in a DataFrame using PySpark
from pyspark.sql import SparkSession
from pyspark.sql.window import Window
from pyspark.sql.functions import lag, col

spark = SparkSession.builder.getOrCreate()

# Expected columns: ProductID, ProductName, year, Sold
df = spark.read.format("csv").option("header", "true").load("file:///home/cloudera/medi.txt")
df.show()
# Previous year's sales per product, ordered by year
windowspec = Window.partitionBy("ProductID").orderBy("year")
resultdf = df.withColumn("prev_sold", lag("Sold", 1).over(windowspec))
resultdf.show()
# Year-over-year change; null for each product's first year, so that row drops out of both filters
dff = resultdf.withColumn("comp", col("Sold") - col("prev_sold"))
dff.show()
# Products with at least one non-decreasing year-over-year change
res = dff.filter(col("comp") >= 0).select("ProductID", "ProductName").distinct()
res.show()
# Products with at least one decrease (note: a product with mixed years can appear in both lists)
res2 = dff.filter(col("comp") < 0).select("ProductID", "ProductName").distinct()
res2.show()
When should the lag and lit functions be used?