Is there an operation in pandas that does the same as flatMap in pyspark?
flatMap example:
>>> rdd = sc.parallelize([2, 3, 4])
>>> sort
I suspect that the answer is "no, not efficiently."
Pandas isn't built for nested data like this. I suspect that the case you're considering in Pandas looks a bit like the following:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'x': [[1, 2], [3, 4, 5]]})
In [3]: df
Out[3]:
           x
0     [1, 2]
1  [3, 4, 5]
And that you want something like the following:
   x
0  1
0  2
1  3
1  4
1  5
It is far more typical to normalize your data in plain Python before you hand it to Pandas. Even if Pandas did offer this operation, it would probably have to walk the nested Python objects one at a time, so it would run at slow Python speeds rather than fast C speeds.
Generally, one does a bit of data munging before moving on to tabular computation.
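As a rough sketch of that "normalize first" approach, the nested lists from the example above can be flattened in plain Python and only then handed to Pandas. The names `nested`, `index`, `values` and `flat` are made up for illustration, and this assumes the data fits comfortably in memory as Python lists:

```python
from itertools import chain

import pandas as pd

# Hypothetical nested input, matching the example frame above.
nested = [[1, 2], [3, 4, 5]]

# Repeat each outer position once per inner element so that every
# flattened value keeps the label of the row it came from.
index = list(chain.from_iterable([i] * len(inner) for i, inner in enumerate(nested)))

# Flatten the inner lists in plain Python (this is the flatMap-like step).
values = list(chain.from_iterable(nested))

# Only now build the DataFrame, from already-flat data.
flat = pd.DataFrame({'x': values}, index=index)
```

Here `flat` holds the values 1 through 5 in column `x` with the index 0, 0, 1, 1, 1, which is the shape asked for above. The flattening itself still runs at Python speed; the point is just that it happens once, up front, before any tabular work.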