Multi-Label Image Classification


Question


I tried this myself but couldn't reach the final point, which is why I am posting here. Please guide me.

  • I am working on multi-label image classification and have a slightly different scenario. I am confused about how to map the labels and their attributes to the Id, etc., so that they can be used for training and testing.
  • Here is the code I am working on:

    import os
    import numpy as np
    import pandas as pd
    from tensorflow.keras.utils import to_categorical
    from collections import Counter
    from tensorflow.keras.callbacks import Callback
    from tensorflow.keras.preprocessing.image import load_img
    from tensorflow.keras.preprocessing.image import img_to_array
    from sklearn.model_selection import train_test_split
    
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.layers import Conv2D, MaxPooling2D
    from matplotlib import pyplot as plt
    from tensorflow.keras import backend
    
    def create_tag_mapping(mapping_csv):
        labels = set()
        for i in range(len(mapping_csv)):
            tags = mapping_csv['Labels'][i].split(' ')
            labels.update(tags)
        labels = list(labels)
        labels.sort()
        labels_map = {labels[i]:i for i in range(len(labels))}
        inv_labels_map = {i:labels[i] for i in range(len(labels))}
        return labels_map, inv_labels_map
    
    # create a mapping of filename to tags
    def create_file_mapping(mapping_csv):
        mapping = dict()
        for i in range(len(mapping_csv)):
            name, tags = mapping_csv['Id'][i], mapping_csv['Labels'][i]
            mapping[name] = tags.split(' ')
        return mapping
    
    # create a one hot encoding for one list of tags
    def one_hot_encode(tags, mapping):
        # create empty vector
        encoding = np.zeros(len(mapping), dtype='uint8')
        # mark 1 for each tag in the vector
        for tag in tags:
            encoding[mapping[tag]] = 1
        return encoding
    
    def load_dataset(path, file_mapping, tag_mapping):
        photos, targets = list(), list()
        # enumerate files in the directory
        for filename in os.listdir(path):
            # load image
            photo = load_img(path + filename, target_size=(760,415))
            # convert to numpy array
            photo = img_to_array(photo, dtype='uint8')
            # get tags
            tags = file_mapping[filename[:-4]]
            # one hot encode tags
            target = one_hot_encode(tags, tag_mapping)
            # store
            photos.append(photo)
            targets.append(target)
        X = np.asarray(photos, dtype='uint8')
        y = np.asarray(targets, dtype='uint8')
        return X, y
    
    trainingLabels = 'labels.csv'
    # load the mapping file
    mapping_csv = pd.read_csv(trainingLabels)
    
    
    # create a mapping of tags to integers
    tag_mapping, _ = create_tag_mapping(mapping_csv)
    
    # create a mapping of filenames to tag lists
    file_mapping = create_file_mapping(mapping_csv)
    
    
    # load the png images
    folder = 'dataset/'
    
    X, y = load_dataset(folder, file_mapping, tag_mapping)
    print(X.shape, y.shape)
    
    trainX, testX, trainY, testY = train_test_split(X, y, test_size=0.3, random_state=1)
    print(trainX.shape, trainY.shape, testX.shape, testY.shape)
    
    img_x,img_y=760,415
    trainX=trainX.reshape(trainX.shape[0], img_x,img_y,3)
    testX=testX.reshape(testX.shape[0], img_x,img_y,3)
    
    trainX=trainX.astype('float32')
    testX=testX.astype('float32')
    
    trainX /= 255
    testX /=255
    
    trainY=to_categorical(trainY,3)
    testY=to_categorical(testY,3)
    print(trainX.shape)
    print(trainY.shape)
    
    model = Sequential()
    model.add(Conv2D(32, (5, 5), strides=(1,1), activation='relu', input_shape=(img_x, img_y,3)))
    model.add(MaxPooling2D((2, 2), strides=(2,2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(3, activation='sigmoid'))
    
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    history=model.fit(trainX, trainY, batch_size=2, epochs=5, verbose=1)
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['loss'])
    plt.title('Accuracy and loss')
    plt.xlabel('epoch')
    plt.ylabel('accuracy/loss')
    plt.legend(['Accuracy','loss'],loc='upper left')
    plt.show()
    
    score=model.evaluate(testX,testY,verbose=0)
    print('test loss',score[0])
    print('test accuracy',score[1])
    

I have attached an image, fileExplaination, that will give a clearer picture of my problem.

This is because, if we follow these:

  1. https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-satellite-photos-of-the-amazon-rainforest/
  2. https://towardsdatascience.com/journey-to-the-center-of-multi-label-classification-384c40229bff
  3. https://www.analyticsvidhya.com/blog/2019/04/predicting-movie-genres-nlp-multi-label-classification/

etc. They all have multiple labels for each image, but in my case I have multiple labels plus their attributes.


Answer 1:


If your goal is to predict whether each of 'L', 'M' and 'H' is present, you are using an incorrect loss function. You should use binary_crossentropy. The shape of your targets will be batch × 3 in this case.

  • categorical_crossentropy assumes the output is a categorical distribution: a vector of values that sum up to one. In other words, you have multiple possibilities, but only one of them can be the correct one.

  • binary_crossentropy assumes that every number in the output vector is a (conditionally) independent binary distribution, so each number is between 0 and 1, but they do not necessarily sum up to one, because it can very well happen that all of them are true.
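
For the first case, a minimal sketch of such an output head might look like the following (the body before the head is only a placeholder standing in for the convolutional network from the question):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# hypothetical minimal body; the real model is the CNN from the question
model = Sequential()
model.add(Flatten(input_shape=(760, 415, 3)))
model.add(Dense(128, activation='relu'))
# one sigmoid unit per tag ('L', 'M', 'H'): each output is an independent
# probability in [0, 1], and the three values do not have to sum to one
model.add(Dense(3, activation='sigmoid'))

# binary_crossentropy scores each output unit as its own binary decision,
# so targets keep the multi-hot shape (batch, 3) produced by one_hot_encode
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])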

If your goal is to predict the value of each of label1, ..., label6, then you should model a categorical distribution for each of the labels. You have six labels, each of which has 3 possible values, so you need 18 numbers (logits). The shape of your targets will be batch × 6 × 3 in this case.

model.add(Dense(18, activation=None))

Because you don't want a single distribution over 18 values, but rather 6 separate distributions over 3 values each, you need to reshape the logits first:

model.add(Reshape((6, 3)))
model.add(Softmax())
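
Put together, a minimal runnable sketch of this second option could look like this (the Reshape and Softmax layers come from tensorflow.keras.layers; the network body and the choice of categorical_crossentropy here are assumptions for illustration, standing in for the CNN from the question):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Reshape, Softmax

model = Sequential()
model.add(Flatten(input_shape=(760, 415, 3)))   # placeholder body
model.add(Dense(128, activation='relu'))

model.add(Dense(18, activation=None))  # 18 raw logits: 6 labels x 3 values
model.add(Reshape((6, 3)))             # group them into 6 rows of 3 logits
model.add(Softmax())                   # softmax over the last axis, per row

# with targets of shape (batch, 6, 3), categorical_crossentropy is applied
# along the last axis, i.e. independently for each of the 6 labels
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.output_shape)  # (None, 6, 3)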



Answer 2:


Based on the above discussion, here is the solution to the problem. As I mentioned, we have a total of 5 labels, and each label has three further tags (L, M, H). We can perform the encoding in this way:

# create a custom encoding for one list of tags (one one-hot vector per tag)
def custom_encode(tags, mapping):
    # build a list with one one-hot vector per tag
    encoding = []
    for tag in tags:
        if tag == 'L':
            encoding.append([1, 0, 0])
        elif tag == 'M':
            encoding.append([0, 1, 0])
        else:
            encoding.append([0, 0, 1])
    return encoding

So the encoded y-vector will look like:

Labels       Tags             Encoded Tags
Label1 ----> [L,L,L,M,H] ---> [ [1,0,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1] ]
Label2 ----> [L,H,L,M,H] ---> [ [1,0,0], [0,0,1], [1,0,0], [0,1,0], [0,0,1] ]
Label3 ----> [L,M,L,M,H] ---> [ [1,0,0], [0,1,0], [1,0,0], [0,1,0], [0,0,1] ]
Label4 ----> [M,M,L,M,H] ---> [ [0,1,0], [0,1,0], [1,0,0], [0,1,0], [0,0,1] ]
Label5 ----> [M,L,L,M,H] ---> [ [0,1,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1] ]


The final layers will look like:

model.add(Dense(15))              # 5 labels x 3 tags each = 15 neurons in the final Dense layer
model.add(Reshape((5, 3)))        # group the 15 outputs into 5 labels with 3 tags each
model.add(Activation('softmax'))  # softmax over the last axis, i.e. per label
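
As a rough end-to-end sketch of how this encoding feeds into training (the tag-to-vector lookup mirrors custom_encode above; the two example rows, the placeholder network body, and the variable names are assumptions for illustration):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Reshape, Activation

# same per-tag encoding as custom_encode, written as a lookup table
tag_to_vec = {'L': [1, 0, 0], 'M': [0, 1, 0], 'H': [0, 0, 1]}

# two hypothetical images, each with 5 tags (one per label)
raw_tags = [['L', 'L', 'L', 'M', 'H'],
            ['M', 'L', 'L', 'M', 'H']]
y = np.array([[tag_to_vec[t] for t in tags] for tags in raw_tags])
print(y.shape)  # (2, 5, 3)

# placeholder body standing in for the CNN from the question
model = Sequential()
model.add(Flatten(input_shape=(760, 415, 3)))
model.add(Dense(128, activation='relu'))
model.add(Dense(15))                # 5 labels x 3 tags
model.add(Reshape((5, 3)))          # one group of 3 logits per label
model.add(Activation('softmax'))    # softmax over the last axis

# the loss is computed per label (over the last axis) and averaged
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Predictions then come back with shape (batch, 5, 3), so np.argmax(predictions, axis=-1) recovers the tag index (0 = L, 1 = M, 2 = H) for each of the 5 labels.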


Source: https://stackoverflow.com/questions/58813194/multi-label-image-classification
