How to calculate a cumulative amount over the table below in PySpark
Input:
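Assume acc.txt is a CSV file with a header row and a single amount column; the sample values below are hypothetical stand-ins:

+------+
|amount|
+------+
|   100|
|   200|
|   300|
|   400|
+------+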
Output:
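For that hypothetical input, the output adds a cumulative_sum column holding the running total:

+------+--------------+
|amount|cumulative_sum|
+------+--------------+
|   100|           100|
|   200|           300|
|   300|           600|
|   400|          1000|
+------+--------------+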
Solution:
#Load the data into a DataFrame; inferSchema makes the amount column numeric so it can be summed
df = spark.read.format("csv").option("header","True").option("inferSchema","True").load("file:///home/cloudera/acc.txt")
df.show()
from pyspark.sql import Window
from pyspark.sql import functions as F
#Add a "cumulative_sum" column: sum "amount" over an unpartitioned window whose frame
#runs from the first row up to the current row. Without an explicit orderBy the row
#order, and therefore the running total, is not guaranteed to be deterministic.
dfc = df.withColumn("cumulative_sum", F.sum("amount").over(Window.partitionBy().rowsBetween(Window.unboundedPreceding, Window.currentRow)))
dfc.show()
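A running total computed without any orderBy depends on whatever row order Spark happens to produce. In practice you would usually order by a timestamp or id column, and often partition by an account. A minimal sketch of that variant, assuming hypothetical account and txn_date columns that are not part of the original data:

from pyspark.sql import Window
from pyspark.sql import functions as F

#"account" and "txn_date" are hypothetical column names used for illustration
w = Window.partitionBy("account").orderBy("txn_date").rowsBetween(Window.unboundedPreceding, Window.currentRow)
dfg = df.withColumn("cumulative_sum", F.sum("amount").over(w))
dfg.show()

With an orderBy in place Spark defaults to a running frame anyway, but stating rowsBetween explicitly makes the intent clear and counts tied rows one at a time rather than summing ties together.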