I have a PySpark DataFrame df with the following columns:
-> document_id: string (nullable = true)
-> probability: vector (nullable = true)