How to calculate a cumulative amount over the table below in PySpark
Input:
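Assume acc.txt is a CSV file with a header row and a single amount column; the sample values below are hypothetical stand-ins:

+------+
|amount|
+------+
|   100|
|   200|
|   300|
|   400|
+------+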
Output:
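For that hypothetical input, the output adds a cumulative_sum column holding the running total:

+------+--------------+
|amount|cumulative_sum|
+------+--------------+
|   100|           100|
|   200|           300|
|   300|           600|
|   400|          1000|
+------+--------------+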
Solution:
#Load the data into a DataFrame; inferSchema makes the amount column numeric so it can be summed
df = spark.read.format("csv").option("header","True").option("inferSchema","True").load("file:///home/cloudera/acc.txt")
df.show()
from pyspark.sql import Window
from pyspark.sql import functions as F
#Add a "cumulative_sum" column: sum "amount" over an unpartitioned window whose frame
#runs from the first row up to the current row. Without an explicit orderBy the row
#order, and therefore the running total, is not guaranteed to be deterministic.
dfc = df.withColumn("cumulative_sum", F.sum("amount").over(Window.partitionBy().rowsBetween(Window.unboundedPreceding, Window.currentRow)))
dfc.show()
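A running total computed without any orderBy depends on whatever row order Spark happens to produce. In practice you would usually order by a timestamp or id column, and often partition by an account. A minimal sketch of that variant, assuming hypothetical account and txn_date columns that are not part of the original data:

from pyspark.sql import Window
from pyspark.sql import functions as F

#"account" and "txn_date" are hypothetical column names used for illustration
w = Window.partitionBy("account").orderBy("txn_date").rowsBetween(Window.unboundedPreceding, Window.currentRow)
dfg = df.withColumn("cumulative_sum", F.sum("amount").over(w))
dfg.show()

With an orderBy in place Spark defaults to a running frame anyway, but stating rowsBetween explicitly makes the intent clear and counts tied rows one at a time rather than summing ties together.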