Question
I'm new to PySpark and I'm trying to fill a column based on a condition that uses a list. How can I do this?
Python logic
if matchedPortfolios == 0:
    print("ALL")
else:
    print(Portfolios)
PySpark attempt with error
#Check matching column values in order to find common portfolio names
Portfolios = set(portfolio_DomainItemLookup) & set(portfolio_dataset_standardFalse)
Portfolios  # shows the set of matched names, or an empty set
matchedPortfolios = len(Portfolios)
matchedPortfolios  # shows 0 or the number of matched names
dataset_standardFalse.withColumn('PortfolioRule', f.when( matchedPortfolios == 0, "ALL").otherwise(Portfolios)).show()
TypeError: condition should be a Column
matchedPortfolios is a plain Python integer and Portfolios is a Python set, so neither is a Column. How can I fill a column based on a condition that uses a list?
My Current dataframe
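+----------------+----------------+-------------+
|SourceSystemName|       Portfolio|PortfolioRule|
+----------------+----------------+-------------+
|          ABCorp|   ABC Portfolio|         null|
|          ABCorp|   ABC Portfolio|         null|
|          ABCorp|   ABC Portfolio|         null|
+----------------+----------------+-------------+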
Expected outcomes
if matchedPortfolios == 0 logic
+----------------+----------------+-------------+
|SourceSystemName|       Portfolio|PortfolioRule|
+----------------+----------------+-------------+
|          ABCorp|   ABC Portfolio|          ALL|
|          ABCorp|   ABC Portfolio|          ALL|
|          ABCorp|   ABC Portfolio|          ALL|
+----------------+----------------+-------------+
else logic
+----------------+----------------+-------------+
|SourceSystemName|       Portfolio|PortfolioRule|
+----------------+----------------+-------------+
|          ABCorp|   ABC Portfolio|ABC Portfolio|
|          ABCorp|   ABC Portfolio|ABC Portfolio|
|          ABCorp|   ABC Portfolio|ABC Portfolio|
+----------------+----------------+-------------+
Answer 1:
This fills PortfolioRule with 'ALL' for every row whose Portfolio value has no match in Portfolios. Where there is a match, the value is copied straight from the Portfolio column.
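from pyspark.sql import functions as f
# Copy Portfolio where it appears in Portfolios; otherwise fall back to 'ALL'
new = dataset_standardFalse.withColumn('PortfolioRule', f.when(dataset_standardFalse['Portfolio'].isin(Portfolios), dataset_standardFalse['Portfolio']).otherwise('ALL'))
display(new)  # display() is a notebook helper (e.g. Databricks); new.show() works in plain PySpark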
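For context on the error: f.when() expects a Column as its condition, while matchedPortfolios == 0 is evaluated by Python into a plain bool before Spark ever sees it. The sketch below shows two ways around that, under some assumptions. The helper names (matched_portfolios, add_portfolio_rule, add_portfolio_rule_global) are hypothetical and not part of PySpark, and the "global" variant is only one reading of the if/else logic in the question: the branch is decided once in Python, so every row either gets 'ALL' or a copy of its own Portfolio value.

```python
def matched_portfolios(lookup_names, dataset_names):
    """Return the names present in both collections as a sorted list (possibly empty)."""
    return sorted(set(lookup_names) & set(dataset_names))


def add_portfolio_rule(df, portfolios, default="ALL"):
    """Row-wise rule, as in the answer above.

    Rows whose Portfolio is in `portfolios` keep that value;
    all other rows get `default`.
    """
    # Imported inside the function so the pure-Python helper above
    # can be used even where PySpark is not installed.
    from pyspark.sql import functions as f

    rule = (
        f.when(f.col("Portfolio").isin(list(portfolios)), f.col("Portfolio"))
        .otherwise(f.lit(default))
    )
    return df.withColumn("PortfolioRule", rule)


def add_portfolio_rule_global(df, portfolios, default="ALL"):
    """Whole-frame rule (assumed reading of the question's Python if/else).

    The condition depends only on a Python value, so it is decided once in
    Python and no f.when() is needed.
    """
    from pyspark.sql import functions as f

    # An empty collection is falsy, which matches `matchedPortfolios == 0`.
    rule = f.col("Portfolio") if portfolios else f.lit(default)
    return df.withColumn("PortfolioRule", rule)
```

With the names from the question, this could be used as add_portfolio_rule(dataset_standardFalse, matched_portfolios(portfolio_DomainItemLookup, portfolio_dataset_standardFalse)).show(). The row-wise variant fits when only some rows match; the global variant fits when the choice is meant to apply to the whole DataFrame at once.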
Source: https://stackoverflow.com/questions/61787976/populating-column-in-dataframe-with-pyspark