Question
Background
I'm using source code from TensorFlow's Object Detection API, as well as Firebase's ModelInterpreter. I'm trying to stick closely to the prescribed steps in the documentation. During training, I can see on TensorBoard that the model is training properly, but somehow I am not exporting and wiring things up correctly for inference. Here are the details:
Commands I used, from training through the .tflite file
First, I submit the training job using an ssd_mobilenet_v1 config file. The config file is more or less the same as the one TensorFlow provides by default; I have only modified the class count and the bucket name.
gcloud ml-engine jobs submit training `whoami`_<JOB_NAME>_`date +%m_%d_%Y_%H_%M_%S` \
--runtime-version 1.12 \
--job-dir=gs://<BUCKET_NAME>/model_dir \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
--module-name object_detection.model_main \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
-- \
--model_dir=gs://<BUCKET_NAME>/model_dir \
--pipeline_config_path=gs://<BUCKET_NAME>/data/ssd_mobilenet_v1.config
Then I export the tflite_graph.pb file:
python models/research/object_detection/export_tflite_ssd_graph.py \
--input_type image_tensor \
--pipeline_config_path ssd_mobilenet_v1.config \
--trained_checkpoint_prefix model.ckpt-264012 \
--output_directory exported_tflite
Great, at this point I have tflite_graph.pb and need to get from there to the actual .tflite file:
tflite_convert \
--output_file=model.tflite \
--graph_def_file=exported_tflite/tflite_graph.pb \
--input_arrays=normalized_input_image_tensor \
--output_arrays=TFLite_Detection_PostProcess \
--input_shapes=1,300,300,3 \
--allow_custom_ops
Performing inference with Swift and Firebase
I'd like to eventually use AVFoundation to capture images from the camera, but to make this more readable I'll post just the relevant parts of the code:
Here's where the model is initialized and the ioOptions are set. I found a comment at the top of export_tflite_ssd_graph.py (used above) that I used to determine the ioOptions, but I'm still not convinced I configured them properly:
guard let modelPath = Bundle.main.path(forResource: "model", ofType: "tflite") else {
self.interpreter = nil;
super.init()
return;
}
let localModel = CustomLocalModel(modelPath: modelPath)
self.interpreter = ModelInterpreter.modelInterpreter(localModel: localModel)
do {
try self.ioOptions.setInputFormat(index: 0, type: .float32, dimensions: [1, 300, 300, 3])
// Output order per the comment at the top of export_tflite_ssd_graph.py:
try self.ioOptions.setOutputFormat(index: 0, type: .float32, dimensions: [1, 10, 4]) // box locations
try self.ioOptions.setOutputFormat(index: 1, type: .float32, dimensions: [1, 10])    // class indices
try self.ioOptions.setOutputFormat(index: 2, type: .float32, dimensions: [1, 10])    // class scores
try self.ioOptions.setOutputFormat(index: 3, type: .float32, dimensions: [1])        // number of detections
} catch let error as NSError {
print("Failed to set input or output format with error: \(error.localizedDescription)")
}
After setting things up, I use the following lines to perform inference later on. Basically, I convert the data buffer to a CGImage, resize it to 300x300 (see the sketch below), and then repack the RGB values into a buffer that I can pass to the model for inference:
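The resize itself is just a redraw into a smaller CoreGraphics context, roughly like this (a sketch; the helper name and interpolation setting are mine):
import CoreGraphics

// Sketch: resize a CGImage to the 300x300 input size the model expects.
func resized(_ image: CGImage, to side: Int = 300) -> CGImage? {
    guard let context = CGContext(
        data: nil,
        width: side, height: side,
        bitsPerComponent: 8, bytesPerRow: side * 4,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
    ) else { return nil }
    context.interpolationQuality = .high
    context.draw(image, in: CGRect(x: 0, y: 0, width: side, height: side))
    return context.makeImage()
}
With the image already resized, the repacking looks like this: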
// Draw the resized image into a bitmap context so the raw pixel data can be read back
guard let context = CGContext(
data: nil,
width: image.width, height: image.height,
bitsPerComponent: 8, bytesPerRow: image.width * 4,
space: CGColorSpaceCreateDeviceRGB(),
bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
) else {
return;
}
context.draw(image, in: CGRect(x: 0, y: 0, width: image.width, height: image.height))
guard let imageData = context.data else { return; }
// "image" is now a CGImage
let inputs = ModelInputs()
var inputData = Data()
do {
for row in 0 ..< 300 {
for col in 0 ..< 300 {
let offset = 4 * (col * context.width + row)
// (Ignore offset 0, the unused alpha channel)
let red = imageData.load(fromByteOffset: offset+1, as: UInt8.self)
let green = imageData.load(fromByteOffset: offset+2, as: UInt8.self)
let blue = imageData.load(fromByteOffset: offset+3, as: UInt8.self)
var normalizedRed = Float32(red) / 255.0
var normalizedGreen = Float(green) / 255.0
var normalizedBlue = Float(blue) / 255.0
// Append normalized values to Data object in RGB order.
let elementSize = MemoryLayout.size(ofValue: normalizedRed)
var bytes = [UInt8](repeating: 0, count: elementSize)
memcpy(&bytes, &normalizedRed, elementSize)
inputData.append(&bytes, count: elementSize)
memcpy(&bytes, &normalizedGreen, elementSize)
inputData.append(&bytes, count: elementSize)
memcpy(&bytes, &normalizedBlue, elementSize)
inputData.append(&bytes, count: elementSize)
}
}
try inputs.addInput(inputData)
} catch let error {
print("Failed to add input: \(error)")
}
guard let interpret = self.interpreter else { return; }
print("Running interpreter")
interpret.run(inputs: inputs, options: self.ioOptions) { outputs, error in
guard error == nil, let outputs = outputs else { return; }
do {
try print(outputs.output(index: 1))
try print(outputs.output(index: 2))
...
} catch let error {
print(error)
}
}
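For reference, once the run succeeds I unpack the outputs roughly like this (a sketch; I'm assuming the nested arrays mirror the dimensions declared in ioOptions, and that each box is normalized [ymin, xmin, ymax, xmax] as described in the export_tflite_ssd_graph.py comment):
// Inside the run completion handler, after the guard above:
do {
    guard
        let boxes   = try outputs.output(index: 0) as? [[[NSNumber]]], // [1, 10, 4]
        let classes = try outputs.output(index: 1) as? [[NSNumber]],   // [1, 10]
        let scores  = try outputs.output(index: 2) as? [[NSNumber]],   // [1, 10]
        let count   = try outputs.output(index: 3) as? [NSNumber]      // [1]
    else { return }
    for i in 0 ..< Int(truncating: count[0]) {
        // Box coordinates are normalized [ymin, xmin, ymax, xmax].
        let box = boxes[0][i].map { $0.floatValue }
        print("class \(classes[0][i].intValue), score \(scores[0][i].floatValue), box \(box)")
    }
} catch let error {
    print(error)
}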
Problem / Question
After a few hours of trying to get the data into a format that doesn't throw errors, I finally get output.
The problem is that the output probabilities are really low and the classes are almost never correct. I know my model has better accuracy than this, so I feel like I've done something wrong somewhere between the checkpoint files and actually running inference on the .tflite file.
Can anybody who has worked with object detection see where I may have gone off course?
Source: https://stackoverflow.com/questions/59736600/ssd-mobilenet-v1-with-tflite-giving-bad-output