Question
I'm confused about how to solve this problem. I'm trying to use sklearn's MinMaxScaler, but I get an error that seems to say I'm setting an array element with a sequence.
The code is:
from sklearn.preprocessing import MinMaxScaler
raw_values = series.values
# transform data to be stationary (difference() comes from the tutorial)
diff_series = difference(raw_values, 1)
diff_values = diff_series.values
diff_values = diff_values.reshape(len(diff_values), 1)
# rescale values to 0,1
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_values); print(scaled_values)
scaled_values = scaled_values.reshape(len(scaled_values), 1)
"series" is a time series loaded with pandas; I difference it and then try to rescale the differenced values to [0, 1] with MinMaxScaler.
I get the following error when running the code:
ValueError: setting an array element with a sequence.
What I don't understand is that the code runs fine when there is only one feature (a single column), but in this case I have two features, each in its own column.
Traceback:
File "C:/....py", line 88, in prepare_data
scaled_values = scaler.fit_transform(diff_values); print(scaled_values)
File "C:\Users\name\AppData\Roaming\Python\Python35\site-packages\sklearn\base.py", line 494, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "C:\Users\name\AppData\Roaming\Python\Python35\site-packages\sklearn\preprocessing\data.py", line 292, in fit
return self.partial_fit(X, y)
File "C:\Users\name\AppData\Roaming\Python\Python35\site-packages\sklearn\preprocessing\data.py", line 318, in partial_fit
estimator=self, dtype=FLOAT_DTYPES)
File "C:\Users\name\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 382, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
And this is what I get when I print diff_values:
[[array([ -1.3, 119. ])]
[array([ 0.5, -9. ])]
[array([ 0.8, 17. ])]
...,
[array([ 2.8, 742. ])]
[array([ 1.50000000e+00, -1.65900000e+03])]
[array([ -2., 856.])]]
The full code is not mine; it's taken from a tutorial.
EDIT:
Here is my dataset
To reproduce, just change the file name 'shampoo-sales.csv' to 'datos2.csv' and change this line:
return datetime.strptime('190'+x, '%Y-%m')
to this one:
return datetime.strptime(''+x, '%Y-%m-%d')
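Only the date format differs between the two files. A quick check of both parsers, assuming (from the '190' prefix in the tutorial's parser) that the shampoo file stores months as strings like '1-01'; the names tutorial_parser and datos2_parser are hypothetical:

```python
from datetime import datetime

# Tutorial-style parser (assumed input like "1-01"): "190" is prepended to
# build "1901-01", which is then parsed as year-month.
def tutorial_parser(x):
    return datetime.strptime('190' + x, '%Y-%m')

# Parser for datos2.csv, whose dates are full dates such as "2012-01-01".
def datos2_parser(x):
    return datetime.strptime(x, '%Y-%m-%d')

print(tutorial_parser('1-01'))      # 1901-01-01 00:00:00
print(datos2_parser('2012-01-01'))  # 2012-01-01 00:00:00
```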
Answer 1:
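In the tutorial you linked to, the object series is actually a pandas Series: a one-dimensional vector of values with a named index. Your dataset, however, contains two data columns in addition to the time-series index, which makes it a DataFrame. That is why the tutorial code breaks with your data: difference() subtracts whole rows, so each differenced value is a two-element array rather than a single number (exactly what your printout of diff_values shows), and MinMaxScaler cannot convert a column of arrays into a float matrix.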
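A minimal reproduction of that failure, built from the five rows shown in the df.head() output below. The difference function here is a hypothetical stand-in, assumed to behave like the tutorial's (subtract each row from the one before it and wrap the results in a pandas Series):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in modeled on the tutorial's difference().
def difference(dataset, interval=1):
    diff = [dataset[i] - dataset[i - interval] for i in range(interval, len(dataset))]
    return pd.Series(diff)

# First rows of datos2.csv (two data columns) as shown in df.head().
df = pd.DataFrame({1: [10.9, 10.3, 9.0, 9.5, 10.3],
                   2: [3736, 3570, 3689, 3680, 3697]})

raw_values = df.values                 # 2-D float array, shape (5, 2)
diff_series = difference(raw_values, 1)
print(diff_series.dtype)               # object: each element is a length-2 array

# Same reshape as in the question: an (n, 1) object array of arrays.
diff_values = diff_series.values.reshape(len(diff_series), 1)
try:
    MinMaxScaler().fit_transform(diff_values)
    error = None
except ValueError as exc:              # NumPy cannot turn the arrays into floats
    error = exc
print(error)
```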
Here's a sample from your data:
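from datetime import datetime
import pandas as pd
def parser(x):
    return datetime.strptime(''+x, '%Y-%m-%d')
# squeeze=True from the tutorial is dropped: with two data columns it has no effect,
# and the parameter was removed in pandas 2.0
df = pd.read_csv("datos2.csv", header=None, parse_dates=[0],
                 index_col=0, date_parser=parser)
df.head()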
               1     2
0
2012-01-01  10.9  3736
2012-01-02  10.3  3570
2012-01-03   9.0  3689
2012-01-04   9.5  3680
2012-01-05  10.3  3697
And the equivalent section from the tutorial:
"Running the example loads the dataset as a Pandas Series and prints the first 5 rows."
Month
1901-01-01 266.0
1901-02-01 145.9
1901-03-01 183.1
1901-04-01 119.3
1901-05-01 180.3
Name: Sales, dtype: float64
To verify this, select one of your fields and store it as series, and then try running the MinMaxScaler. You'll see that it runs without error:
series = df[1]
# ... compute difference and do scaling ...
print(scaled_values)
[[ 0.58653846]
[ 0.55288462]
[ 0.63942308]
...,
[ 0.75 ]
[ 0.6875 ]
[ 0.51923077]]
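A self-contained check of the single-column case, sketched on the five rows shown in df.head() above, again with a hypothetical difference written to mimic the tutorial's function. Because df[1] is one-dimensional, each difference is a plain float and the scaler receives an ordinary (n, 1) float matrix:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in modeled on the tutorial's difference().
def difference(dataset, interval=1):
    diff = [dataset[i] - dataset[i - interval] for i in range(interval, len(dataset))]
    return pd.Series(diff)

# First rows of datos2.csv as shown in df.head().
df = pd.DataFrame({1: [10.9, 10.3, 9.0, 9.5, 10.3],
                   2: [3736, 3570, 3689, 3680, 3697]})

series = df[1]                                   # one column is a 1-D Series
diff_values = difference(series.values, 1).values.reshape(-1, 1)
print(diff_values.dtype)                         # float64, not object

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_values)
print(scaled_values.shape)                       # (4, 1)
```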
Note: one other minor difference from the tutorial data is that your file has no header row. Set header=None so that your first row of data is not used as column names.
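A small sketch of what header=None changes, using the five rows shown in df.head() above as an in-memory CSV. With the default header inference, the first data row becomes the column names and is lost as data; this is consistent with the question's diff_values starting at [-1.3, 119] rather than [-0.6, -166]:

```python
from io import StringIO
import pandas as pd

# First five rows of datos2.csv (date, value 1, value 2), with no header row.
CSV = """2012-01-01,10.9,3736
2012-01-02,10.3,3570
2012-01-03,9.0,3689
2012-01-04,9.5,3680
2012-01-05,10.3,3697
"""

# Default header inference treats the first data row as column names.
df_default = pd.read_csv(StringIO(CSV), index_col=0)
# header=None keeps every row as data and numbers the columns 0, 1, 2;
# column 0 becomes the index, leaving columns 1 and 2.
df_none = pd.read_csv(StringIO(CSV), header=None, index_col=0)

print(len(df_default), list(df_default.columns))  # 4 ['10.9', '3736']
print(len(df_none), list(df_none.columns))        # 5 [1, 2]
```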
UPDATE
To pass your entire dataset to MinMaxScaler, run difference() on each column and pass the differenced columns in for scaling. MinMaxScaler accepts 2-D input of shape (n_samples, n_features), such as a DataFrame with one column per feature, and scales each column independently:
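import pandas as pd
from sklearn.preprocessing import MinMaxScaler
ncol = 2
# difference each column separately with the tutorial's difference(), then join the results side by side
diff_df = pd.concat([difference(df[i], 1) for i in range(1, ncol + 1)], axis=1)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_df)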
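For comparison, a sketch of the same idea on the five rows shown in df.head() above, but using pandas' built-in DataFrame.diff() in place of the tutorial's difference() (a swapped-in technique, not the tutorial's code). It differences both columns in one call; MinMaxScaler then learns a separate min and max per column, and inverse_transform undoes the scaling, which is useful when converting scaled forecasts back to the original units:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# First rows of datos2.csv as shown in df.head(), with a daily date index.
df = pd.DataFrame({1: [10.9, 10.3, 9.0, 9.5, 10.3],
                   2: [3736, 3570, 3689, 3680, 3697]},
                  index=pd.date_range('2012-01-01', periods=5, freq='D'))

# DataFrame.diff() differences every column at once; the first row has no
# predecessor and becomes NaN, so it is dropped.
diff_df = df.diff(periods=1).dropna()

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_df)   # shape (4, 2), each column in [0, 1]

# inverse_transform maps the scaled values back to the original differences.
restored = scaler.inverse_transform(scaled_values)
print(np.allclose(restored, diff_df.values))    # True
```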
Source: https://stackoverflow.com/questions/44076195/array-inside-list