Question
I'm running the typical code found in the GitHub repository tensorflow/models (object_detection): https://github.com/tensorflow/models/tree/master/research/object_detection
Specifically the 'object_detection_tutorial.ipynb' file. The main loop is this section here:
with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    # Definite input and output Tensors for detection_graph
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for each of the objects.
    # Score is shown on the result image, together with the class label.
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    for image_path in TEST_IMAGE_PATHS:
      image = Image.open(image_path)
      # The array-based representation of the image will be used later in order to prepare the
      # result image with boxes and labels on it.
      image_np = load_image_into_numpy_array(image)
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      # Actual detection.
      (boxes, scores, classes, num) = sess.run(
          [detection_boxes, detection_scores, detection_classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Visualization of the results of a detection.
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)
      plt.figure(figsize=IMAGE_SIZE)
      plt.imshow(image_np)
I'm just looking for some advice as to the best way to save what the image has identified into a dataframe, ideally storing the category of each object detected in the image.
Any help would be appreciated :)
Answer 1:
Ok, rather late I guess, but I am working on this now. So I went through the same pain over a few days, and in the end got something to work. Just restricting myself to your code snippet, I added a few pieces and got this:
# Initialize hitlist
hitf = open("hitlist.csv", 'w')
hitf.write('image,class,score,bb0,bb1,bb2,bb3\n')
hitlim = 0.5

with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    # Definite input and output Tensors for detection_graph
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for each of the objects.
    # Score is shown on the result image, together with the class label.
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    # enumerate supplies the image index i used for the file name below
    for i, image_path in enumerate(TEST_IMAGE_PATHS):
      image = Image.open(image_path)
      # The array-based representation of the image will be used later in order to prepare the
      # result image with boxes and labels on it.
      image_np = load_image_into_numpy_array(image)
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      # Actual detection.
      (boxes, scores, classes, num) = sess.run(
          [detection_boxes, detection_scores, detection_classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Write the results to hitlist - one line per hit over the 0.5 limit.
      # The batch dimension is 1, so index 0 selects this image's results.
      nprehit = scores.shape[1]  # 2nd array dimension
      for j in range(nprehit):
        fname = "image" + str(i)
        classid = int(classes[0][j])
        classname = category_index[classid]["name"]
        score = scores[0][j]
        if score >= hitlim:
          sscore = str(score)
          bbox = boxes[0][j]
          b0 = str(bbox[0])
          b1 = str(bbox[1])
          b2 = str(bbox[2])
          b3 = str(bbox[3])
          line = ",".join([fname, classname, sscore, b0, b1, b2, b3])
          hitf.write(line + "\n")
      # Visualization of the results of a detection.
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)
      plt.figure(figsize=IMAGE_SIZE)
      plt.imshow(image_np)

# close hitlist
hitf.flush()
hitf.close()
Notes:
- There are three sections of added code: one initializes the hitlist.csv, one adds a line per "pre-hit" that is above the confidence limit of 0.5, and one closes the file.
- It is intentionally not very "pythonic"; it uses simple and obvious constructs to illustrate what is going on. Except the ",".join(...), which I like so much I could not resist.
- The number of pre-hits can be found by looking at the 2nd dimension of scores, or of the classids.
- The classids that come back are floats, even though you mostly need them as integers. Conversion is easy.
- There may be some small copy-and-paste errors in here since I don't really have an MVE (Minimal Verifiable Example) for it.
- I am working with the rfcn_resnet101_coco_2017_11_08 object detection model instead of the ssd_mobilenet_v1_coco_2017_11_17, so my hitlist and scores are a bit different (worse, actually).
Here is what a csv looks like:
image,class,score,bb0,bb1,bb2,bb3
image0,kite,0.997912,0.086756825,0.43700624,0.1691603,0.4966739
image0,person,0.9968072,0.7714941,0.15771112,0.945292,0.20014654
image0,person,0.9858992,0.67766637,0.08734644,0.8385928,0.12563995
image0,kite,0.9683157,0.26249793,0.20640253,0.31359094,0.2257214
image0,kite,0.8578382,0.3803091,0.42938906,0.40701985,0.4453904
image0,person,0.85244817,0.5692219,0.06266626,0.6282138,0.0788657
image0,kite,0.7622662,0.38192448,0.42580333,0.4104231,0.442965
image0,person,0.6722884,0.578461,0.022049228,0.6197509,0.036917627
image0,kite,0.6671517,0.43708095,0.80048573,0.47312954,0.8156846
image0,person,0.6577289,0.5996533,0.13272598,0.63358027,0.1430584
image0,kite,0.5893124,0.3790631,0.3451705,0.39845183,0.35965574
image0,person,0.51051,0.57377476,0.025907507,0.6221084,0.04294989
For this image (from the ipython notebook - but with a different object detection model).
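Once the hitlist is written, getting it into the dataframe the question asked for is a one-liner with pandas. A minimal sketch, using an in-memory copy of the sample rows above so it is self-contained; to read the real file, pass "hitlist.csv" instead of the StringIO object:

```python
import io
import pandas as pd

# Two sample rows from the hitlist.csv shown above (in-memory stand-in for the file).
sample = io.StringIO(
    "image,class,score,bb0,bb1,bb2,bb3\n"
    "image0,kite,0.997912,0.086756825,0.43700624,0.1691603,0.4966739\n"
    "image0,person,0.9968072,0.7714941,0.15771112,0.945292,0.20014654\n"
)
# Replace `sample` with "hitlist.csv" to load the file written by the loop above.
hits = pd.read_csv(sample)
print(hits[["image", "class", "score"]])
```

From there you can group or filter as usual, e.g. hits["class"].value_counts() for a per-category tally.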
Answer 2:
I think the syntax of the described object_detection tutorial changed a bit, so I adjusted the answer above for the new syntax:
This is the position you should find in the code:
# Actual detection.
output_dict = run_inference_for_single_image(image_np, detection_graph)
And then this can be added:
# Store boxes in a dataframe!
cut_off_scores = len(list(filter(lambda x: x >= 0.1, output_dict['detection_scores'])))
detect_scores = []
detect_classes = []
detect_ymin = []
detect_xmin = []
detect_ymax = []
detect_xmax = []
for j in range(cut_off_scores):
    detect_scores.append(output_dict['detection_scores'][j])
    detect_classes.append(output_dict['detection_classes'][j])
    # Assumption: box order is ymin, xmin, ymax, xmax:
    boxes = output_dict['detection_boxes'][j]
    detect_ymin.append(boxes[0])
    detect_xmin.append(boxes[1])
    detect_ymax.append(boxes[2])
    detect_xmax.append(boxes[3])

# Assumption: your files are named image1, image2, etc., with n the image number
Identifier = "image" + str(n)
Id_list = [Identifier] * cut_off_scores
Detected_objects = pd.DataFrame(
    {'Image': Id_list,
     'Score': detect_scores,
     'Class': detect_classes,
     'Ymin': detect_ymin,
     'Xmin': detect_xmin,
     'Ymax': detect_ymax,
     'Xmax': detect_xmax
    })
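To see the resulting column layout without running the model, the same construction can be exercised with a mocked-up output_dict (the dict keys follow run_inference_for_single_image from the tutorial; the numbers here are made up). This also shows a boolean-mask shortcut for the score cut-off:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for output_dict from run_inference_for_single_image.
output_dict = {
    'detection_scores': np.array([0.9, 0.6, 0.05]),
    'detection_classes': np.array([1, 18, 3]),
    'detection_boxes': np.array([[0.1, 0.2, 0.5, 0.6],
                                 [0.3, 0.1, 0.7, 0.4],
                                 [0.0, 0.0, 1.0, 1.0]]),
}
keep = output_dict['detection_scores'] >= 0.1   # same cut-off as above
kept_boxes = output_dict['detection_boxes'][keep]
df = pd.DataFrame({
    'Image': 'image1',                          # hypothetical identifier
    'Score': output_dict['detection_scores'][keep],
    'Class': output_dict['detection_classes'][keep],
    'Ymin': kept_boxes[:, 0],
    'Xmin': kept_boxes[:, 1],
    'Ymax': kept_boxes[:, 2],
    'Xmax': kept_boxes[:, 3],
})
print(df)
```

Note that each of Ymin/Xmin/Ymax/Xmax must be its own distinct key; a repeated dict key would silently drop a column.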
Answer 3:
I tried both of the above methods but couldn't succeed. Mike Wise's code as originally posted had a small error: the loop index i was never defined. And in User27074's answer, there is a problem in how the xmax, xmin, etc. values are appended to the dataframe columns.
I tried a simpler piece of code, and it works. Note that the model returns the coordinates of the detected objects as fractions of the image size, so they need to be multiplied by the height and width of the image, as below.
detected_boxes = []
h = image_height = 500  # change accordingly
w = image_width = 500   # change accordingly
# Column format: ymin, xmin, ymax, xmax, class, detection score
for i, box in enumerate(np.squeeze(boxes)):
    if np.squeeze(scores)[i] > 0.85:
        box[0] = int(box[0] * h)
        box[1] = int(box[1] * w)
        box[2] = int(box[2] * h)
        box[3] = int(box[3] * w)
        box = np.append(box, np.squeeze(classes)[i])
        box = np.append(box, np.squeeze(scores)[i] * 100)
        detected_boxes.append(box)

np.savetxt('detection_coordinates.csv', detected_boxes, fmt='%i', delimiter=',')
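The same idea can be tried outside the notebook by substituting made-up boxes/scores/classes arrays for the tensors that sess.run would return (batch size 1, two candidate detections, only one above the 0.85 threshold):

```python
import numpy as np

h = w = 500  # image height/width; change accordingly
# Made-up stand-ins for the tensors returned by sess.run (shape [1, N, 4] / [1, N]).
boxes = np.array([[[0.25, 0.5, 0.75, 1.0], [0.1, 0.1, 0.2, 0.2]]])
scores = np.array([[0.9, 0.5]])
classes = np.array([[1.0, 18.0]])

detected_boxes = []
# Column format: ymin, xmin, ymax, xmax, class, detection score (percent)
for i, box in enumerate(np.squeeze(boxes)):
    if np.squeeze(scores)[i] > 0.85:
        row = [int(box[0] * h), int(box[1] * w),   # ymin, xmin in pixels
               int(box[2] * h), int(box[3] * w),   # ymax, xmax in pixels
               int(np.squeeze(classes)[i]),        # class id as int
               np.squeeze(scores)[i] * 100]        # score as a percentage
        detected_boxes.append(row)

np.savetxt('detection_coordinates.csv', detected_boxes, fmt='%i', delimiter=',')
```

With these inputs the CSV contains a single row, 125,250,375,500,1,90, i.e. pixel coordinates plus the class id and score.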
Source: https://stackoverflow.com/questions/47259171/saving-the-objects-detected-in-a-dataframe-tensorflow-object-detection