Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I would like to prepend a string to an existing column. For example, df['col1'] has values such as '1', '2', '3', and I would like to concatenate the string '000' on the left of col1 so that I get a column (new, or replacing the old one; it doesn't matter) with values '0001', '0002', '0003'.

I thought I should use df.withColumn('col1', '000'+df['col1']), but it does not work. Is that because PySpark DataFrames are immutable?

This should be an easy task, but I didn't find anything online. I hope someone can give me some help!

Thank you!


1 Answer

from pyspark.sql.functions import concat, col, lit

# concat joins column values; lit wraps a plain Python string as a literal column
df.select(concat(col("firstname"), lit(" "), col("lastname"))).show(5)
+------------------------------+
|concat(firstname,  , lastname)|
+------------------------------+
|                Emanuel Panton|
|              Eloisa Cayouette|
|                   Cathi Prins|
|             Mitchel Mozdzierz|
|               Angla Hartzheim|
+------------------------------+
only showing top 5 rows

http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#module-pyspark.sql.functions

