How to compare marks in PySpark and label the result as low, more, or same


Allstate
Input:

Output:


Solution:

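# Imports used by the solution below
from pyspark.sql import Window
from pyspark.sql.functions import expr, lead

# Load the data into a DataFrame (df); inferSchema reads marks as a number instead of a string
df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("file:///home/cloudera/veeneth.txt")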

df.show()
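
One detail worth checking before comparing marks: Spark's CSV reader returns every column as a string unless a schema is supplied or inferred, and strings are compared character by character. Plain Python strings behave the same way for digits, so the small sketch below (with made-up values, not the article's data) shows how a text comparison can disagree with a numeric one.

```python
# Hypothetical marks, once as text (how an untyped CSV column arrives)
# and once as integers.
as_text = ("100", "95")
as_int = (100, 95)

# Text comparison looks at "1" versus "9" first, so "100" sorts before "95".
text_says_lower = as_text[0] < as_text[1]
# Numeric comparison gives the expected answer.
int_says_lower = as_int[0] < as_int[1]

print(text_says_lower, int_says_lower)
```

If the marks in the file can have different numbers of digits, reading them as numbers (or casting the column) avoids this mismatch.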

The first new column, "ranks", is created with the "lead" function from the pyspark.sql.functions module over the window Window.partitionBy("subject").orderBy("subject"). "lead" returns the value of the "marks" column from the next row in the same partition, that is, the next row with the same "subject"; the last row of each partition gets null. Note that ordering by the partitioning column leaves every row in a partition tied, so Spark does not guarantee which row counts as "next". If the data has a column that defines the intended order, ordering by that column makes the result deterministic.


The second new column, "result", is created with a "case when" expression that compares "ranks" (the next row's marks) with the current row's "marks". It is set to "more" when "ranks" is greater than "marks", "low" when it is smaller, "same" when the two are equal, and null when there is no next row.


Finally, the "drop" method is used to remove the "ranks" column from the resulting DataFrame.

dfg = (
    df.withColumn("ranks", lead("marks").over(Window.partitionBy("subject").orderBy("subject")))
    .withColumn("result", expr("case when ranks > marks then 'more' when ranks < marks then 'low' when marks = ranks then 'same' else null end"))
    .drop("ranks")
)

dfg.show()
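
To see what the lead-and-compare step computes without a Spark session, here is a minimal plain-Python sketch of the same logic. The sample rows and the helper name compare_with_next are hypothetical and only for illustration; the row order inside each subject is taken from the input list, which stands in for the window ordering.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows (subject, marks); not the article's input data.
rows = [
    ("maths", 50),
    ("maths", 60),
    ("maths", 60),
    ("science", 70),
    ("science", 40),
]

def compare_with_next(rows):
    """Mimic lead("marks") over a per-subject window, then the CASE WHEN labels."""
    out = []
    # sorted() is stable, so the input order inside each subject is preserved.
    by_subject = sorted(rows, key=itemgetter(0))
    for subject, group in groupby(by_subject, key=itemgetter(0)):
        group = list(group)
        for i, (_, marks) in enumerate(group):
            # Equivalent of lead(): marks of the next row, or None at the end.
            nxt = group[i + 1][1] if i + 1 < len(group) else None
            if nxt is None:
                result = None
            elif nxt > marks:
                result = "more"
            elif nxt < marks:
                result = "low"
            else:
                result = "same"
            out.append((subject, marks, result))
    return out

labelled = compare_with_next(rows)
for row in labelled:
    print(row)
```

The last row of each subject has no following row, so its label is None, matching the null that Spark produces for the final row of each partition.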
