How to get the desired output with PySpark


AccionLabs
Input:




Output:

Solution:

# Load the data into a DataFrame (df). Only the header option is set, so every column is read as a string.

df = spark.read.format("csv").option("header","True").load("file:///home/cloudera/acbalance.txt")

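# This code groups the DataFrame df by the "account_number" column and aggregates each group as follows:


It takes the maximum value of "transaction_id" and aliases the result as "transaction_id". Because the column is read as a string, this maximum is lexicographic unless the column is cast to a numeric type.

It takes the last value of "balance" and aliases the result as "balance". Note that last returns whichever row Spark happens to see last within the group, which is not guaranteed to be the row with the maximum "transaction_id".

The resulting DataFrame is sorted by the "balance" column in ascending order using orderBy.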

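from pyspark.sql import functions as F  # namespaced import avoids shadowing Python's built-in max

dfg = df.groupby("account_number").agg(F.max("transaction_id").alias("transaction_id"), F.last("balance").alias("balance")).orderBy("balance")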

dfg.show()
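
As a cross-check on what the aggregation is meant to produce, here is a minimal plain-Python sketch of the same per-account logic. It assumes the intent is "the balance from the row with the highest transaction_id in each account", which is a guess, since the input and output are not reproduced in this post. The sample rows and the helper name latest_balance_per_account are hypothetical. Values are cast to int before comparing, because as strings "9" would rank above "10".

```python
# Hypothetical sample rows: (account_number, transaction_id, balance).
# All values are strings, as they would be after a header-only CSV read.
rows = [
    ("A1", "9", "500"),
    ("A1", "10", "300"),
    ("B2", "3", "1200"),
    ("B2", "2", "900"),
]


def latest_balance_per_account(rows):
    """Return (account, max transaction_id, balance at that transaction), sorted by balance."""
    latest = {}
    for account, txn_id, balance in rows:
        txn_id, balance = int(txn_id), int(balance)
        # Keep the row with the numerically largest transaction_id per account.
        if account not in latest or txn_id > latest[account][0]:
            latest[account] = (txn_id, balance)
    result = [(acct, txn, bal) for acct, (txn, bal) in latest.items()]
    # Ascending balance, mirroring orderBy("balance").
    return sorted(result, key=lambda r: r[2])


print(latest_balance_per_account(rows))
# [('A1', 10, 300), ('B2', 3, 1200)]
```

If the Spark result differs from this kind of reference on your own data, the likely causes are the string-typed columns or the row order that last depends on.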


