How do you create merge_asof functionality in PySpark?
Question: Table A has many columns, including a date column; Table B has a datetime column and a value. Rows in both tables are generated sporadically, with no regular interval. Table A is small; Table B is massive.

I need to join B to A under the condition that a given element a of A.datetime corresponds to B[B['datetime'] <= a]['datetime'].max(), i.e. each row of A is matched to the most recent row of B at or before it. There are a couple of ways to do this, but I would like the most efficient one.

Option 1: Broadcast the small dataset as a Pandas DataFrame. Set up a Spark UDF that performs the merge_asof against the broadcast frame; a sketch follows below.
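For context, if both tables fit in memory the stated condition is exactly what pandas.merge_asof computes with direction="backward". A minimal sketch, assuming hypothetical pandas frames a_pdf and b_pdf that both carry a datetime column:

```python
import pandas as pd

# For each row of A, take the latest B row with B.datetime <= A.datetime.
# merge_asof requires both frames to be sorted on the join key.
out = pd.merge_asof(
    a_pdf.sort_values("datetime"),   # small table A
    b_pdf.sort_values("datetime"),   # table B (only viable in pandas if B fits in memory)
    on="datetime",
    direction="backward",            # match the greatest B.datetime <= A.datetime
)
```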
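And a sketch of Option 1 on Spark, under stated assumptions (all names here are hypothetical, not from the question): df_a and df_b are Spark DataFrames, A has a unique key id and a timestamp datetime, B has datetime and value, and both timestamp columns share the same type. The idea: broadcast A as pandas, let each chunk of B propose its best backward match per row of A via merge_asof, then keep the globally latest candidate:

```python
import pandas as pd
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Small table A: collect it to the driver and broadcast it, pre-sorted for merge_asof.
a_pdf = df_a.select("id", "datetime").toPandas().sort_values("datetime")
a_bc = spark.sparkContext.broadcast(a_pdf)

def asof_per_chunk(chunks):
    # Each chunk is a pandas DataFrame holding a slice of the massive table B.
    # Within a chunk, merge_asof finds, for every row of A, the latest B row
    # with B.datetime <= A.datetime that this particular chunk contains.
    for b in chunks:
        b = b.sort_values("datetime")
        b["b_datetime"] = b["datetime"]  # keep B's timestamp so candidates can be ranked later
        merged = pd.merge_asof(a_bc.value, b, on="datetime", direction="backward")
        # Drop A rows this chunk could not match; reorder columns to fit the schema below.
        yield merged.dropna(subset=["b_datetime"])[["id", "datetime", "b_datetime", "value"]]

candidates = df_b.mapInPandas(
    asof_per_chunk,
    schema="id long, datetime timestamp, b_datetime timestamp, value double",
)

# Each chunk of B contributed at most one candidate per row of A; the true
# as-of match is the candidate with the greatest B timestamp. Taking the max
# of a struct compares its fields in order, so b_datetime decides the winner.
result = (
    candidates.groupBy("id", "datetime")
    .agg(F.max(F.struct("b_datetime", "value")).alias("best"))
    .select("id", "datetime", "best.b_datetime", "best.value")
)
```

This shape keeps B out of driver memory entirely: only the small table A and the per-chunk candidate rows move across the cluster.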