We are reading data from a MongoDB collection. A collection column has values of two different types (e.g. (bson.Int64, int) and (int, float)).
I am trying to get the data type using PySpark.
My problem is that some columns contain values of more than one data type.
Assume quantity and weight are the columns:
quantity          weight
----------------  --------
12300             656
123566000000      789.6767
1238              56.22
345               23
345566677777789   21
Actually, we didn't define a data type for any column of the Mongo collection.
When I query the count from the PySpark DataFrame with dataframe.count(), I get an exception like this:
"Cannot cast STRING into a DoubleType (value: BsonString{value='200.0'})"
Your question is broad, thus my answer will also be broad.
To get the data types of your DataFrame columns, you can use dtypes, i.e.:
>>> df.dtypes
[('age', 'int'), ('name', 'string')]
This means your column age is of type int and name is of type string.
I don't know how you are reading from MongoDB, but if you are using the MongoDB connector, the data types will be automatically converted to Spark types. To get the Spark SQL types, just use the schema attribute like this:
df.schema
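For example, a minimal sketch of loading a collection through the connector and inspecting the inferred schema might look like the following. The URI, database, and collection names are placeholders; the "mongodb" format string is for connector 10.x, while older connector versions use "mongo" and the spark.mongodb.input.uri setting.

from pyspark.sql import SparkSession

# Placeholder connection details -- replace with your own.
# Requires the mongo-spark-connector package on the classpath.
spark = (SparkSession.builder
         .appName("mongo-schema-check")
         .config("spark.mongodb.read.connection.uri", "mongodb://localhost:27017")
         .getOrCreate())

df = (spark.read.format("mongodb")           # "mongo" on connector versions < 10
      .option("database", "mydb")            # placeholder database name
      .option("collection", "mycollection")  # placeholder collection name
      .load())

df.printSchema()   # tree view of the Spark SQL types the connector inferred
print(df.schema)   # the underlying StructType
print(df.dtypes)   # list of (column name, type) pairs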
It looks like your actual data and your metadata have different types: the actual data is of type string while the metadata says double.
As a solution, I would recommend recreating the collection with the correct data types.
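If rebuilding the collection is not practical, another option is to stop Spark from guessing: supply an explicit schema that reads the mixed columns as strings, then cast them afterwards. This is only a sketch, assuming the column names quantity and weight from the question and that the connector can render each BSON value as a string:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()   # reuse or create the session

# Read the mixed columns as plain strings so the load itself does not cast.
explicit_schema = StructType([
    StructField("quantity", StringType(), True),
    StructField("weight", StringType(), True),
])

raw_df = (spark.read.format("mongodb")
          .schema(explicit_schema)
          .option("database", "mydb")            # placeholder
          .option("collection", "mycollection")  # placeholder
          .load())

# Cast to the types you actually want; values that cannot be parsed become null.
clean_df = (raw_df
            .withColumn("quantity", col("quantity").cast("long"))
            .withColumn("weight", col("weight").cast("double")))

clean_df.printSchema()
print(clean_df.count())   # should no longer trip over BsonString values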
I am assuming you are looking to get the data type of the data you read.
input_data = [Read from Mongo DB operation]
You can use type(input_data) to inspect the data type.
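For example, assuming input_data is the DataFrame returned by a connector read like the one sketched earlier:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
input_data = spark.read.format("mongodb").load()   # hypothetical read, options omitted

print(type(input_data))
# <class 'pyspark.sql.dataframe.DataFrame'>

# type() only reports the Python object type (a DataFrame here);
# use input_data.dtypes or input_data.schema for the per-column types.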
Source: https://stackoverflow.com/questions/45033315/get-datatype-of-column-using-pyspark