I have two DataFrames A and B:
A has columns (id, info1, info2) with about 200 Million rows
If I understand your question correctly, you want to use a broadcast join that replicates DataFrame B on every node so that the semi-join computation (i.e., using a join to filter id from DataFrame A) can compute independently on every node instead of having to communicate information back-and-forth between each other (i.e., shuffle join).
You can run join functions that explicitly call for a broadcast join to achieve what you're trying to do:
import org.apache.spark.sql.functions.broadcast
val joinExpr = A.col("id") === B.col("id")
val filtered_A = A.join(broadcast(B), joinExpr, "left_semi")
You can run filtered_A.explain() to verify that a broadcast join is being used.