Question
I am using ConvNets to build a model to make weather forecasts. My input data is 10K samples of a 96x144 matrix (which represents a geographic region), with values of a variable Z (geopotential height) at each grid point at a fixed height. If I include 3 different heights (Z is very different at different heights), then I have this input shape: (num_samples, 96, 144, 3). The samples are hourly: one sample = one hour, and I have nearly 2 years of data. The input data (Z) represents the state of the atmosphere in that hour.
That can be thought of as an image with 3 channels, but instead of pixel values in a 0-255 range I have values of Z in a much larger range (the last height channel ranges from about 7500 to 9500 and the first one from about 500 to 1500).
I want to predict precipitation (will it rain or not? Just that: binary, yes or no).
In that grid (the region of space covering my country), I only have output data at specific (x, y) points: just 122 weather stations with rain data in the entire region. At those 122 (x, y) points I have a value of 1 (it rained that hour) or 0 (it didn't).
So my output is a (num_samples, 122) matrix, which contains a 1 at a station's index if it rained in that sample (that hour) and a 0 if it didn't.
So I used a mix between the VGG16 model and this one: https://github.com/prl900/precip-encoder-decoders/blob/master/encoder_vgg16.py, which is a model used for this specific application that I found in a paper.
I want to know whether I'm building the model the right way. I just changed the input layer to match my shape, and the last FC layer to match my number of classes (122, because for a specific input sample I want a 1x122 vector with a 0 or 1 depending on whether it rained at each station; is this right?). And because the probabilities are not mutually exclusive (I can have many 1s if it rained at more than one station), I used the 'sigmoid' activation in the last layer.
I DON'T know which metric to use in compile(): acc, mae, and categorical accuracy just stay the same across epochs (they increase a little in the second epoch, but after that, acc and val_acc stay the same for every epoch).
ALSO, there are null values in the output matrix (hours in which a station doesn't have data); I am just filling those NaNs with a -1 value (like an 'I don't know' label). Could this be the reason nothing works?
Thanks for the help and sorry for the over-explanation.
from keras.models import Sequential
from keras.layers import BatchNormalization, Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD
from keras import metrics

def get_vgg16():
    model = Sequential()

    # Conv Block 1
    model.add(BatchNormalization(axis=3, input_shape=(96, 144, 3)))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # Conv Block 2
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # Conv Block 3
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # Conv Block 4
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # Conv Block 5
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # FC layers
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dense(4096, activation='relu'))
    # 122 independent sigmoid outputs: one rain/no-rain probability per station
    model.add(Dense(122, activation='sigmoid'))

    #adam = Adam(lr=0.001)
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='binary_crossentropy', optimizer=sgd,
                  metrics=[metrics.categorical_accuracy, metrics.binary_accuracy, 'acc'])

    print(model.summary())
    return model
Answer 1:
There are various things to consider in order to improve the model:
Your choice of loss
You could do various things here. Using an L2 loss (squared distance minimization) is one option, with targets of no rain (0) or rain (1) for each station. Another (more accurate) option is to treat each output as the probability of rain at that station, and apply a binary cross-entropy loss to each of the output values.
Binary cross-entropy is just the regular cross entropy applied to a two-class classification problem. Note that with only two possible outcomes, P(y=0) = 1 - P(y=1), so a single sigmoid output per station is sufficient; you don't need to add any extra neurons.
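For concreteness, here is a minimal NumPy sketch of that per-output binary cross entropy (the y_true / y_pred names are just for illustration):

import numpy as np

def per_station_bce(y_true, y_pred, eps=1e-7):
    # y_true: (num_samples, 122) array of 0/1 targets
    # y_pred: (num_samples, 122) array of sigmoid probabilities
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    # -[t*log(p) + (1-t)*log(1-p)], one loss term per station
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))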
Mask the loss
Do not set the missing targets to -1. This does not make sense and only introduces noise into the training. Imagine you are using an L2 loss: if your network predicts rain at a station whose target is -1, the error would be (1 - (-1))^2 = 4, a very high prediction error. Instead, you want the network to ignore these cases.
You can do that by masking the losses. Let's say you make predictions Y of shape (num_samples, 122) and have an equally shaped target matrix T. You can define a binary mask M of the same size, with ones at the values you know and zeros at the missing-value locations. Your loss is then L = M * loss(Y, T). For missing values the loss is always 0, with no gradient: nothing is learnt from them.
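A minimal sketch of such a masked loss in Keras, assuming (as in your setup) that missing targets are encoded as -1, so the mask can be built inside the loss function:

import keras.backend as K

def masked_binary_crossentropy(y_true, y_pred):
    # M: 1 where the target is known (0 or 1), 0 where it is missing (-1)
    mask = K.cast(K.not_equal(y_true, -1), K.floatx())
    # Clip the -1 placeholders into [0, 1]; their loss terms are zeroed by
    # the mask anyway, clipping just keeps the targets valid.
    t = K.clip(y_true, 0.0, 1.0)
    # Elementwise binary cross entropy, zeroed at missing locations
    bce = mask * K.binary_crossentropy(t, y_pred)
    # Average only over the observed entries
    return K.sum(bce) / K.maximum(K.sum(mask), 1.0)

You would then pass it to compile() in place of the built-in loss, e.g. model.compile(loss=masked_binary_crossentropy, optimizer=sgd, metrics=['binary_accuracy']).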
Normalize the inputs
It is always good practice to normalize/standardize the inputs. This prevents some features from carrying more weight than others simply because of their scale, and speeds up training. In cases where the inputs have very large magnitudes, it also helps stabilize training by preventing exploding gradients.
In your case, you have three channels, each following a different distribution (a different minimum and maximum value). For each channel (height) separately, compute the min/max (for normalization) or mean/std (for standardization) over all samples, and then apply those values to normalize/standardize that channel across all samples. That is, given a tensor of size (N,96,144,3), normalize/standardize each sub-tensor of size (N,96,144,1) separately. You will need to apply the same transform to the test data, so save the scaling values for later.
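A minimal per-channel standardization sketch, assuming hypothetical X_train / X_test arrays of shape (N, 96, 144, 3):

import numpy as np

# Per-channel statistics over all training samples and grid points
mean = X_train.mean(axis=(0, 1, 2), keepdims=True)  # shape (1, 1, 1, 3)
std = X_train.std(axis=(0, 1, 2), keepdims=True)

X_train_std = (X_train - mean) / std
# Reuse the *training* statistics on the test data
X_test_std = (X_test - mean) / std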
Source: https://stackoverflow.com/questions/57011658/convnet-with-missing-output-data-for-weather-forecast