Check Type: How to check if something is a RDD or a dataframe?

江枫思渺然 提交于 2019-12-01 04:56:56

问题


I'm using python, and this is Spark Rdd/dataframes.

I tried isinstance(thing, RDD) but RDD wasn't recognized.

The reason I need to do this:

I'm writing a function where both RDD and dataframes could be passed in, so I'll need to do input.rdd to get the underlying rdd if a dataframe is passed in.


回答1:


isinstance will work just fine:

from pyspark.sql import DataFrame
from pyspark.rdd import RDD

def foo(x):
    if isinstance(x, RDD):
        return "RDD"
    if isinstance(x, DataFrame):
        return "DataFrame"

foo(sc.parallelize([]))
## 'RDD'
foo(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'

but single dispatch is much more elegant approach:

from functools import singledispatch

@singledispatch
def bar(x):
    pass 

@bar.register(RDD)
def _(arg):
    return "RDD"

@bar.register(DataFrame)
def _(arg):
    return "DataFrame"

bar(sc.parallelize([]))
## 'RDD'

bar(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'

If you don't mind additional dependencies multipledispatch is also an interesting option:

from multipledispatch import dispatch

@dispatch(RDD)
def baz(x):
    return "RDD"

@dispatch(DataFrame)
def baz(x):
    return "DataFrame"

baz(sc.parallelize([]))
## 'RDD'

baz(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'

Finally the most Pythonic approach is to simply check an interface:

def foobar(x):
    if hasattr(x, "rdd"):
        ## It is a DataFrame
    else:
        ## It (probably) is a RDD



回答2:


another way to check is type

type(object)

that return the type of the object like

pyspark.sql.dataframe.DataFrame



来源:https://stackoverflow.com/questions/36731365/check-type-how-to-check-if-something-is-a-rdd-or-a-dataframe

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!