MNIST (Modified National Institute of Standards and Technology)
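MNIST is often called the beginner's village of computer vision: it is the "hello world" of convolutional neural networks (CNNs) and a common first exercise with TensorFlow. Each sample in the dataset is a 28*28 grayscale matrix, and the task is to recognize which handwritten digit it represents.
Loading the dataset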
import pandas as pd

train = pd.read_csv('./input/train.csv')
test = pd.read_csv('./input/test.csv')
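The later snippets use `X_train` and `Y_train`, which this post never defines. Here is a minimal sketch of how they could be derived, assuming the Kaggle Digit Recognizer layout in which `train.csv` has a `label` column followed by 784 pixel columns. The small in-memory frame and the `demo_` names are stand-ins for illustration, not the original code:

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv('./input/train.csv'): 3 fake rows, 1 label + 784 pixels.
pixel_cols = ["pixel{}".format(i) for i in range(784)]
rng = np.random.default_rng(0)
demo_train = pd.DataFrame(rng.integers(0, 256, size=(3, 784)), columns=pixel_cols)
demo_train.insert(0, "label", [5, 0, 4])

# Separate the target column from the pixel features.
demo_Y = demo_train["label"]
demo_X = demo_train.drop(columns=["label"])

print(demo_X.shape, demo_Y.tolist())
```

With the real file, the same two lines applied to `train` would give the feature frame and the label series.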
Overview of the digits in the training set
import seaborn as sns
import matplotlib.pyplot as plt

# Count how often each digit occurs and draw a bar chart
g = sns.countplot(x=Y_train)
plt.show()
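Each digit appears roughly the same number of times, so there is no extreme class imbalance.
Preprocessing the raw data
Each training sample is a 28*28 grayscale matrix of integers from 0 to 255, where a larger value means a darker pixel. Dividing by 255 converts the values to floats between 0 and 1.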
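from keras.utils import to_categorical

# Scale pixel values from 0-255 to 0.0-1.0
X_train = X_train / 255.0
test = test / 255.0
# Reshape each flat row of 784 pixels into a 28x28 image with one channel
X_train = X_train.values.reshape(-1, 28, 28, 1)
test = test.values.reshape(-1, 28, 28, 1)
# One-hot encode the labels, e.g. 2 -> [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
Y_train = to_categorical(Y_train, num_classes=10)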
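The training call in the modeling section passes `validation_data=(X_val, Y_val)`, but the split that produces those arrays is not shown. One way to hold out a validation set is sketched below with plain NumPy on dummy arrays. The 10% fraction, the seed, and the helper name `split_train_val` are assumptions, not the original settings:

```python
import numpy as np

def split_train_val(X, Y, val_fraction=0.1, seed=2):
    """Shuffle the samples and hold out val_fraction of them (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    n_val = int(round(X.shape[0] * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], X[val_idx], Y[train_idx], Y[val_idx]

# Dummy data shaped like the preprocessed arrays: 100 images with one-hot labels.
X_demo = np.zeros((100, 28, 28, 1), dtype="float32")
Y_demo = np.eye(10)[np.arange(100) % 10]

X_tr, X_va, Y_tr, Y_va = split_train_val(X_demo, Y_demo)
print(X_tr.shape, X_va.shape)
```

scikit-learn's `train_test_split` does the same job and can also stratify by label.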
Building the CNN model
from datetime import datetime
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense
from keras.optimizers import RMSprop
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator

# Written for the 2019 Keras API that produced the logs below; newer releases
# use learning_rate=, 'val_accuracy' and model.fit() instead.
model_begin = datetime.now()
print(str(model_begin) + " model begin")

model = Sequential()
# Block 1: two 5x5 convolutions with 32 filters, then 2x2 max-pooling and dropout
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='same', activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# Block 2: two 3x3 convolutions with 64 filters, then 2x2 max-pooling and dropout
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.25))
# Classifier head: flatten, one hidden dense layer, softmax over the 10 digits
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation="softmax"))

optimizer = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])

# Halve the learning rate when val_acc has not improved for 3 epochs
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=1, factor=0.5, min_lr=0.00001)

# epochs=1:  340s - loss: 0.4151 - acc: 0.8693 - val_loss: 0.0748 - val_acc: 0.9779
# epochs=10: 309s - loss: 0.0633 - acc: 0.9823 - val_loss: 0.0222 - val_acc: 0.9945
epochs = 1
batch_size = 86

# Data augmentation: small random rotations, zooms and shifts; no flips,
# because mirrored digits are no longer valid digits
datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    zca_whitening=False,
    rotation_range=10,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=False,
    vertical_flip=False)
datagen.fit(X_train)

history = model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                              epochs=epochs,
                              validation_data=(X_val, Y_val),
                              verbose=2,
                              steps_per_epoch=X_train.shape[0] // batch_size,
                              callbacks=[learning_rate_reduction])
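To see what this layer stack does to the data, the sketch below traces the output shape after each convolution and pooling layer and counts the trainable parameters by hand. It uses no Keras at all and only mirrors the layer settings listed above; the `layers` list and variable names are illustrative. Dropout and Flatten add no parameters, so they are left out:

```python
# Each entry: (layer kind, kernel or pool size, number of filters).
layers = [
    ("conv", 5, 32), ("conv", 5, 32), ("pool", 2, None),
    ("conv", 3, 64), ("conv", 3, 64), ("pool", 2, None),
]

height, width, channels = 28, 28, 1  # input_shape=(28, 28, 1)
total_params = 0

for kind, size, filters in layers:
    if kind == "conv":
        # 'same' padding with stride 1 keeps height and width unchanged.
        total_params += size * size * channels * filters + filters  # weights + biases
        channels = filters
    else:
        # A 2x2 max-pool with stride 2 halves height and width; no parameters.
        height, width = height // size, width // size
    print(kind, (height, width, channels))

flattened = height * width * channels
inputs = flattened
for units in (256, 10):
    # Dense layer: one weight per input-unit pair plus one bias per unit.
    total_params += inputs * units + units
    inputs = units

print("flattened features:", flattened)
print("trainable parameters:", total_params)
```

Most of the parameters sit in the first dense layer (3136 inputs times 256 units), which is one reason the 0.5 dropout is placed right after it.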
Error analysis on the training set
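import itertools
import numpy as np
import matplotlib.pyplot as plt

# The function header is reconstructed from the names used in the body;
# its name and default values are assumptions.
def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
    # Normalize each row first so the image and the cell labels show the same values
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    # Write each cell's value on top of it, in white on dark cells
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j], horizontalalignment="center", color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.savefig('./output_cnn/matrix.png')
    plt.show()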
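The x-axis shows the predicted digit and the y-axis the true digit. The most frequent mistakes are 5 predicted as 6 and 3 predicted as 8, probably because these pairs have similar shapes and are easy to confuse when handwritten.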
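The plotting code takes a precomputed matrix `cm`, which is not built anywhere in this post. Below is a minimal sketch of how such a matrix can be computed from true and predicted labels with NumPy, using made-up labels. Rows are true digits and columns are predicted digits, matching the axes described above. The helper name `confusion_counts` is hypothetical:

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes=10):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when the same cell occurs several times.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Made-up labels: one 5 read as 6 and one 3 read as 8, the rest correct.
y_true_demo = np.array([5, 5, 3, 3, 8, 0])
y_pred_demo = np.array([6, 5, 8, 3, 8, 0])

cm_demo = confusion_counts(y_true_demo, y_pred_demo)
print(cm_demo[5, 6], cm_demo[3, 8], np.trace(cm_demo))
```

With one-hot labels and softmax outputs, the integer labels would typically come from `np.argmax(..., axis=1)` on each array. scikit-learn's `confusion_matrix` computes the same table.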
Viewing the actual images of the misclassified digits
# Show 9 misclassified digits in a 3x3 grid
n = 0
nrows = 3
ncols = 3
fig, ax = plt.subplots(nrows, ncols, sharex=True, sharey=True)
for row in range(nrows):
    for col in range(ncols):
        error = errors_index[n]
        ax[row, col].imshow((img_errors[error]).reshape((28, 28)))
        ax[row, col].set_title("Predicted label :{}\nTrue label :{}".format(pred_errors[error], obs_errors[error]))
        n += 1
plt.savefig('./output_cnn/errors.png')
plt.show()
Some of these handwritten digits are quite sloppy, and a human reader could plausibly misread them as well.
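The display code relies on `errors_index`, `img_errors`, `pred_errors` and `obs_errors`, none of which are defined in this post. One plausible way to build them is sketched here on random stand-in data: keep only the misclassified samples, then rank them by how much more probability the model gave to the wrong digit than to the true one. The ranking criterion and all `_demo` names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 50
probs_demo = rng.dirichlet(np.ones(10), size=n_samples)  # stand-in for model.predict output
true_demo = rng.integers(0, 10, size=n_samples)          # stand-in for the true digits
imgs_demo = rng.random((n_samples, 28, 28, 1))           # stand-in for the images

pred_demo = probs_demo.argmax(axis=1)
wrong = pred_demo != true_demo

# Keep only the misclassified samples.
pred_errors_demo = pred_demo[wrong]
obs_errors_demo = true_demo[wrong]
img_errors_demo = imgs_demo[wrong]
probs_wrong = probs_demo[wrong]

# Gap between the probability of the predicted digit and that of the true digit.
pred_prob = probs_wrong.max(axis=1)
true_prob = probs_wrong[np.arange(len(obs_errors_demo)), obs_errors_demo]
gap = pred_prob - true_prob

# Indices (into the error arrays) of the 9 most confident mistakes.
errors_index_demo = np.argsort(gap)[-9:]
print(len(errors_index_demo), "errors selected out of", int(wrong.sum()))
```

Ranking by this gap surfaces the cases where the model was most sure of a wrong answer, which tend to be the most informative ones to inspect.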
Writing out the predictions
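# Predict class probabilities for the test set, then keep the most likely digit
results = model.predict(test)
results = np.argmax(results, axis=1)
results = pd.Series(results, name="Label")
# The Kaggle test set has 28000 images, numbered from 1
submission = pd.concat([pd.Series(range(1, 28001), name="ImageId"), results], axis=1)
submission.to_csv("./output_cnn/mnist_cnn.csv", index=False)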
Output log
2019-05-12 18:43:05.861004 digit-recongizer begin
2019-05-12 18:43:09.434510 model begin
Epoch 1/1
2019-05-12 18:43:10.537447: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
 - 306s - loss: 0.4166 - acc: 0.8679 - val_loss: 0.0808 - val_acc: 0.9726
2019-05-12 18:48:16.955573 error begin
2019-05-12 18:48:25.481250 matrix begin
2019-05-12 18:48:26.292335 display_errors begin
2019-05-12 18:48:27.402511 predict begin
2019-05-12 18:49:28.578289 digit-recongizer end
Uploading the predictions to Kaggle
Second run, with epochs = 10
Using TensorFlow backend.
2019-05-19 13:10:45.624923 digit-recongizer begin
2019-05-19 13:10:49.557691 model begin
Epoch 1/10
2019-05-19 13:10:51.337148: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
 - 311s - loss: 0.4112 - acc: 0.8695 - val_loss: 0.0765 - val_acc: 0.9771
Epoch 2/10
 - 294s - loss: 0.1281 - acc: 0.9622 - val_loss: 0.0400 - val_acc: 0.9860
Epoch 3/10
 - 298s - loss: 0.0940 - acc: 0.9717 - val_loss: 0.0367 - val_acc: 0.9895
Epoch 4/10
 - 318s - loss: 0.0785 - acc: 0.9765 - val_loss: 0.0317 - val_acc: 0.9895
Epoch 5/10
 - 303s - loss: 0.0701 - acc: 0.9798 - val_loss: 0.0384 - val_acc: 0.9888
Epoch 6/10
 - 301s - loss: 0.0678 - acc: 0.9799 - val_loss: 0.0315 - val_acc: 0.9910
Epoch 7/10
 - 291s - loss: 0.0635 - acc: 0.9811 - val_loss: 0.0342 - val_acc: 0.9898
Epoch 8/10
 - 293s - loss: 0.0585 - acc: 0.9830 - val_loss: 0.0312 - val_acc: 0.9921
Epoch 9/10
 - 292s - loss: 0.0606 - acc: 0.9829 - val_loss: 0.0202 - val_acc: 0.9943
Epoch 10/10
 - 309s - loss: 0.0633 - acc: 0.9823 - val_loss: 0.0222 - val_acc: 0.9945
2019-05-19 14:01:01.464350 error begin
2019-05-19 14:01:09.997218 matrix begin
2019-05-19 14:01:10.969481 display_errors begin
2019-05-19 14:01:13.028658 predict begin
2019-05-19 14:02:23.559788 digit-recongizer end
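Accuracy rises slowly as the number of epochs increases (val_acc goes from 0.9771 after epoch 1 to 0.9945 after epoch 10), but the total training time grows with it, at roughly 300 seconds per epoch.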
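The log contains no learning-rate reduction message, even though a `ReduceLROnPlateau` callback with `patience=3` and `factor=0.5` was attached. The sketch below replays the logged `val_acc` values through a simplified version of that plateau rule to show why. It is a hand-written approximation for illustration, not the Keras implementation, and the helper name is hypothetical:

```python
def simulate_plateau_schedule(val_accs, lr=0.001, factor=0.5, patience=3, min_lr=1e-5):
    """Simplified plateau rule (hypothetical helper, not the Keras implementation)."""
    best, wait, history = float("-inf"), 0, []
    for acc in val_accs:
        if acc > best:
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: shrink the learning rate.
                lr, wait = max(lr * factor, min_lr), 0
        history.append(lr)
    return history

# val_acc values copied from the 10-epoch log above.
logged_val_acc = [0.9771, 0.9860, 0.9895, 0.9895, 0.9888,
                  0.9910, 0.9898, 0.9921, 0.9943, 0.9945]
logged_lrs = simulate_plateau_schedule(logged_val_acc)

# A made-up run that stalls after the first epoch, for contrast.
stalled_lrs = simulate_plateau_schedule([0.98, 0.97, 0.97, 0.97, 0.97])

print(logged_lrs)
print(stalled_lrs)
```

In the logged run, validation accuracy never goes three epochs in a row without a new best, so under this rule the learning rate stays at 0.001 throughout.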
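Checking system resources
The MacBook Pro ran for almost an hour with the CPU close to full load: each epoch took about 5 minutes, so 10 epochs came to nearly an hour.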
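The timing estimate can be checked directly against the 10-epoch log. This short sketch sums the per-epoch durations and compares them with the wall-clock gap between the "model begin" and "error begin" timestamps; all numbers are copied from the log and the variable names are illustrative:

```python
from datetime import datetime

# Per-epoch durations in seconds, copied from the 10-epoch log.
epoch_seconds = [311, 294, 298, 318, 303, 301, 291, 293, 292, 309]
total_seconds = sum(epoch_seconds)
mean_seconds = total_seconds / len(epoch_seconds)

# Wall-clock time between "model begin" and "error begin" in the same log.
fmt = "%Y-%m-%d %H:%M:%S.%f"
start = datetime.strptime("2019-05-19 13:10:49.557691", fmt)
end = datetime.strptime("2019-05-19 14:01:01.464350", fmt)
wall_seconds = (end - start).total_seconds()

print("sum of epochs: {:.1f} min".format(total_seconds / 60))
print("mean per epoch: {:.0f} s".format(mean_seconds))
print("wall clock: {:.1f} min".format(wall_seconds / 60))
```

The two figures agree to within a couple of seconds, so almost all of the run time is spent inside the training loop.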
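Uploading the predictions to Kaggle
The accuracy reached 0.992. I will leave it there for now and look into other hyperparameter tuning options later.
Full code and dataset download
Source: https://www.cnblogs.com/wanli002/p/10888379.html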