Question
I'm trying to pad text sequences for a seq2seq model.
from keras_preprocessing.sequence import pad_sequences
x=[["Hello, I'm Bhaskar", "This is Keras"], ["This is an", "experiment"]]
pad_sequences(sequences=x, maxlen=5, dtype='object', padding='pre', value="<PAD>")
I get the following error:
ValueError: `dtype` object is not compatible with `value`'s type: <class 'str'>
You should set `dtype=object` for variable length strings.
However, when I do the same with integers, it works fine.
x=[[1, 2, 3], [4, 5, 6]]
pad_sequences(sequences=x, maxlen=5, padding='pre', value=0)
Output:
array([[0, 0, 1, 2, 3],
       [0, 0, 4, 5, 6]], dtype=int32)
I expect the output to be:
[["<PAD>", "<PAD>", "<PAD>", "Hello, I'm Bhaskar", "This is Keras"], ["<PAD>", "<PAD>","<PAD>", "This is an", "experiment"]]
Answer 1:
As the error message suggests, change dtype to object (the built-in object type itself, not the string 'object'), and it will work.
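The difference between the two spellings can be seen in plain Python. This is only a sketch of the kind of comparison that presumably triggers the error: accepts_string_padding is a hypothetical helper, not part of Keras, and the actual validation inside pad_sequences may differ (for example, it may also accept string dtypes).

```python
def accepts_string_padding(dtype, value):
    # Hypothetical stand-in for the library's validation (an assumption):
    # a string padding value is accepted only when dtype is the built-in
    # `object` type itself, not the string 'object'.
    return not isinstance(value, str) or dtype is object


# The string 'object' is a different thing from the type `object`.
rejected = accepts_string_padding('object', "<PAD>")
accepted = accepts_string_padding(object, "<PAD>")
# Non-string padding values are not affected by this check.
integer_case = accepts_string_padding('int32', 0)

print(rejected, accepted, integer_case)
```

With dtype=object, the original call becomes: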
from keras.preprocessing.sequence import pad_sequences
x=[["Hello, I'm Bhaskar", "This is Keras"], ["This is an", "experiment"]]
pad_sequences(sequences=x, maxlen=5, dtype=object, padding='pre', value="<PAD>")
Output:
array([['<PAD>', '<PAD>', '<PAD>', "Hello, I'm Bhaskar", 'This is Keras'],
       ['<PAD>', '<PAD>', '<PAD>', 'This is an', 'experiment']],
      dtype=object)
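For reference, the same 'pre' padding can be reproduced without Keras. The following is a minimal sketch, assuming plain Python lists are acceptable as output; pad_pre is a hypothetical helper written for illustration, not a Keras function. It pads on the left and, for sequences longer than maxlen, keeps the last maxlen items.

```python
def pad_pre(sequences, maxlen, value="<PAD>"):
    """Left-pad every sequence to `maxlen` items (hypothetical helper).

    Sequences longer than `maxlen` keep only their last `maxlen` items.
    """
    padded = []
    for seq in sequences:
        items = list(seq)
        if len(items) > maxlen:
            # Drop items from the front so that exactly `maxlen` remain.
            items = items[len(items) - maxlen:]
        # Prepend as many padding values as needed to reach `maxlen`.
        padded.append([value] * (maxlen - len(items)) + items)
    return padded


x = [["Hello, I'm Bhaskar", "This is Keras"], ["This is an", "experiment"]]
result = pad_pre(x, maxlen=5)
print(result)
```

Because the result is a list of lists rather than a NumPy array, it works for strings of any length without a dtype argument.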
Source: https://stackoverflow.com/questions/55220072/difference-in-padding-integer-and-string-in-keras