Question:
I trained a network using the provided ImageReader, and now I'm trying to use the CNTK EvalDll in a C# project to evaluate RGB images.
I've seen examples related to the EvalDll, but the input is always an array of float/double, never images.
How can I use the exposed interface to run the trained network on an RGB image?
Answer 1:
I'll assume that you'll want the equivalent of reading with the ImageReader, where your reader config looks something like
features=[
    width=224
    height=224
    channels=3
    cropType=Center
]
You'll need helper functions to create the crop and to re-size the image to the size accepted by the network.
I'll define two extension methods on System.Drawing.Bitmap, one to crop and one to re-size:
open System.Collections.Generic
open System.Drawing
open System.Drawing.Drawing2D
open System.Drawing.Imaging

type Bitmap with
    /// Crops the image in the present object, starting at the given (column, row),
    /// and retaining the given number of columns and rows.
    member this.Crop(column, row, numCols, numRows) =
        let rect = Rectangle(column, row, numCols, numRows)
        this.Clone(rect, this.PixelFormat)

    /// Creates a resized version of the present image. The returned image
    /// will have the given width and height. This may distort the aspect ratio
    /// of the image.
    member this.ResizeImage(width, height, useHighQuality) =
        // Rather than using image.GetThumbnailImage, use direct image resizing.
        // GetThumbnailImage throws odd out-of-memory exceptions on some
        // images, see also
        // http://stackoverflow.com/questions/27528057/c-sharp-out-of-memory-exception-in-getthumbnailimage-on-a-server
        // Use the interpolation method suggested on
        // http://stackoverflow.com/questions/1922040/resize-an-image-c-sharp
        let rect = Rectangle(0, 0, width, height)
        let destImage = new Bitmap(width, height)
        destImage.SetResolution(this.HorizontalResolution, this.VerticalResolution)
        use graphics = Graphics.FromImage destImage
        graphics.CompositingMode <- CompositingMode.SourceCopy
        if useHighQuality then
            graphics.InterpolationMode <- InterpolationMode.HighQualityBicubic
            graphics.CompositingQuality <- CompositingQuality.HighQuality
            graphics.SmoothingMode <- SmoothingMode.HighQuality
            graphics.PixelOffsetMode <- PixelOffsetMode.HighQuality
        else
            graphics.InterpolationMode <- InterpolationMode.Low
        use wrapMode = new ImageAttributes()
        wrapMode.SetWrapMode WrapMode.TileFlipXY
        graphics.DrawImage(this, rect, 0, 0, this.Width, this.Height, GraphicsUnit.Pixel, wrapMode)
        destImage
Based on that, define a function to do the center crop:
/// Returns a square sub-image from the center of the given image, with
/// a size that is cropRatio times the smallest image dimension. The
/// aspect ratio is preserved.
let CenterCrop cropRatio (image: Bitmap) =
    let cropSize =
        float (min image.Height image.Width) * cropRatio
        |> int
    let startRow = (image.Height - cropSize) / 2
    let startCol = (image.Width - cropSize) / 2
    image.Crop(startCol, startRow, cropSize, cropSize)
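The crop arithmetic can be checked on its own with a quick stand-alone sketch (plain Python, no imaging library; the dimensions are invented for illustration):

```python
def center_crop_rect(width, height, crop_ratio):
    """Return (start_col, start_row, crop_size) for a square center crop
    whose side is crop_ratio times the smaller image dimension."""
    crop_size = int(min(width, height) * crop_ratio)
    start_row = (height - crop_size) // 2
    start_col = (width - crop_size) // 2
    return start_col, start_row, crop_size

# A 640x480 image with cropRatio=1.0 keeps the full 480-pixel square,
# horizontally centered: columns 80..559, rows 0..479.
print(center_crop_rect(640, 480, 1.0))   # (80, 0, 480)
```

With cropRatio below 1.0 the same formulas center the smaller square in both dimensions.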
Then plug it all together: crop, resize, then traverse the image in the plane order that OpenCV uses:
/// Creates a list of CNTK feature values from a given bitmap.
/// The image is first resized to fit into a (targetSize x targetSize) bounding box,
/// then the image planes are converted to a CNTK tensor.
/// Returns a list with targetSize * targetSize * 3 values.
let ImageToFeatures (image: Bitmap, targetSize) =
    // Apply the same image pre-processing that is typically done
    // in CNTK when running it in test or write mode: take a center
    // crop of the image, then re-size it to the network input size.
    let cropped = CenterCrop 1.0 image
    let resized = cropped.ResizeImage(targetSize, targetSize, false)
    // Pass the initial capacity to the list constructor. Creating the
    // list via the default constructor makes the whole operation
    // about 20% slower.
    let features = List (targetSize * targetSize * 3)
    // Traverse the image in the plane order that OpenCV uses:
    // first the B plane, then the G plane, then the R plane.
    for c in 0 .. 2 do
        for h in 0 .. (resized.Height - 1) do
            for w in 0 .. (resized.Width - 1) do
                let pixel = resized.GetPixel(w, h)
                let v =
                    match c with
                    | 0 -> pixel.B
                    | 1 -> pixel.G
                    | 2 -> pixel.R
                    | _ -> failwith "No such channel"
                    |> float32
                features.Add v
    features
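This traversal produces a flat vector in which index c*H*W + h*W + w holds channel c of pixel (h, w), with the whole B plane first, then G, then R. A stand-alone sketch of that ordering (plain Python, a tiny 2x2 image with invented pixel values):

```python
# A 2x2 "image": each pixel is an (R, G, B) tuple.
H, W = 2, 2
img = [[(10, 20, 30), (11, 21, 31)],
       [(12, 22, 32), (13, 23, 33)]]

features = []
for c in range(3):            # 0 = B plane, 1 = G plane, 2 = R plane
    for h in range(H):
        for w in range(W):
            r, g, b = img[h][w]
            features.append((b, g, r)[c])

# All four B values come first, then all G, then all R:
print(features)  # [30, 31, 32, 33, 20, 21, 22, 23, 10, 11, 12, 13]
```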
Call ImageToFeatures with the image in question, feed the result into an instance of IEvaluateModelManagedF, and you're good. I'm assuming your RGB image comes in myImage, and that you're doing binary classification with a network input size of 224 x 224.
let LoadModelOnCpu modelPath =
    let model = new IEvaluateModelManagedF()
    let description = sprintf "deviceId=-1\r\nmodelPath=\"%s\"" modelPath
    model.Init description
    model.CreateNetwork description
    model

let model = LoadModelOnCpu "myModelFile"
let featureDict = Dictionary()
featureDict.["features"] <- ImageToFeatures(myImage, 224)
model.Evaluate(featureDict, "OutputNodes.z", 2)
Answer 2:
I implemented similar code in C#, which loads a model, reads a test image, does the appropriate cropping/scaling/etc., and runs the model. As Anton pointed out, the output does not match CNTK's output 100%, but it is very close.
Code for image reading / cropping / scaling:
private static Bitmap ImCrop(Bitmap img, int col, int row, int numCols, int numRows)
{
    var rect = new Rectangle(col, row, numCols, numRows);
    return img.Clone(rect, System.Drawing.Imaging.PixelFormat.DontCare);
}

/// Returns a square sub-image from the center of the given image, with
/// a size that is cropRatio times the smallest image dimension. The
/// aspect ratio is preserved.
private static Bitmap ImCropToCenter(Bitmap img, double cropRatio)
{
    var cropSize = (int)Math.Round(Math.Min(img.Height, img.Width) * cropRatio);
    var startCol = (img.Width - cropSize) / 2;
    var startRow = (img.Height - cropSize) / 2;
    return ImCrop(img, startCol, startRow, cropSize, cropSize);
}

/// Creates a resized version of the given image. The returned image
/// will have the given width and height. This may distort the aspect
/// ratio of the image.
private static Bitmap ImResize(Bitmap img, int width, int height)
{
    return new Bitmap(img, new Size(width, height));
}
Code for loading the model and the xml file which contains the pixel means:
public static IEvaluateModelManagedF loadModel(string modelPath, string outputLayerName)
{
    var networkConfiguration = String.Format("modelPath=\"{0}\" outputNodeNames=\"{1}\"", modelPath, outputLayerName);
    // Note: the stopwatch must actually be started before timing.
    var stopWatch = Stopwatch.StartNew();
    var model = new IEvaluateModelManagedF();
    model.CreateNetwork(networkConfiguration, deviceId: -1);
    stopWatch.Stop();
    Console.WriteLine("Time to create network: {0} ms.", stopWatch.ElapsedMilliseconds);
    return model;
}
/// Read the xml mean file, i.e. the offsets which are subtracted
/// from each pixel in an image before using it as input to a CNTK model.
public static float[] readXmlMeanFile(string XmlPath, int ImgWidth, int ImgHeight)
{
    // Read and parse the pixel value xml file
    var reader = new XmlTextReader(XmlPath);
    reader.ReadToFollowing("data");
    reader.Read();
    var pixelMeansXml =
        reader.Value.Split(new[] { "\r", "\n", " " }, StringSplitOptions.RemoveEmptyEntries)
              .Select(Single.Parse)
              .ToArray();

    // Re-order the mean pixel values from the interleaved (h, w, c) order
    // of the xml file into the planar (c, h, w) order produced by the
    // ImGetRGBChannels() function below.
    int inputDim = 3 * ImgWidth * ImgHeight;
    Debug.Assert(pixelMeansXml.Length == inputDim);
    var pixelMeans = new float[inputDim];
    int counter = 0;
    for (int c = 0; c < 3; c++)
        for (int h = 0; h < ImgHeight; h++)
            for (int w = 0; w < ImgWidth; w++)
            {
                int xmlIndex = h * ImgWidth * 3 + w * 3 + c;
                pixelMeans[counter++] = pixelMeansXml[xmlIndex];
            }
    return pixelMeans;
}
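The re-indexing above maps the interleaved (h, w, c) layout of the mean file onto the planar (c, h, w) layout of the features. A quick stand-alone check of that index formula (plain Python, a 2x2 image with invented values that encode their own position):

```python
H, W = 2, 2
# Mean-file layout: interleaved (h, w, c), i.e. three channel values per
# pixel, row by row. Each value encodes its position as h*100 + w*10 + c.
hwc = [h * 100 + w * 10 + c for h in range(H) for w in range(W) for c in range(3)]

# Re-order into planar (c, h, w), mirroring the C# loop and its xmlIndex.
chw = []
for c in range(3):
    for h in range(H):
        for w in range(W):
            chw.append(hwc[h * W * 3 + w * 3 + c])

# The first plane (c=0) now lists all pixels in row-major order:
print(chw[:4])  # [0, 10, 100, 110]
```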
Code to load in an image and convert to model input:
/// Creates a list of CNTK feature values from a given bitmap.
/// The image is first resized to fit into a (targetSize x targetSize) bounding box,
/// then the image planes are converted to a CNTK tensor, and the mean
/// pixel values subtracted. Returns a list with targetSize * targetSize * 3 floats.
private static List<float> ImageToFeatures(Bitmap img, int targetSize, float[] pixelMeans)
{
    // Apply the same image pre-processing that is typically done in CNTK:
    // take a center crop of the image, then re-size it to the network input size.
    var imgCropped = ImCropToCenter(img, 1.0);
    var imgResized = ImResize(imgCropped, targetSize, targetSize);

    // Convert pixels to CNTK model input.
    // Fast pixel extraction is ~5x faster while giving identical output.
    var features = new float[3 * imgResized.Height * imgResized.Width];
    var boFastPixelExtraction = true;
    if (boFastPixelExtraction)
    {
        var pixelsRGB = ImGetRGBChannels(imgResized);
        for (int c = 0; c < 3; c++)
        {
            // Reverse the channel order: index 2 - c yields B, G, R planes.
            byte[] pixels = pixelsRGB[2 - c];
            Debug.Assert(pixels.Length == imgResized.Height * imgResized.Width);
            for (int i = 0; i < pixels.Length; i++)
            {
                int featIndex = i + c * pixels.Length;
                features[featIndex] = pixels[i] - pixelMeans[featIndex];
            }
        }
    }
    else
    {
        // Traverse the image in the plane order that OpenCV uses:
        // first the B plane, then the G plane, then the R plane.
        // Note: calling GetPixel(w, h) repeatedly is slow!
        int featIndex = 0;
        for (int c = 0; c < 3; c++)
            for (int h = 0; h < imgResized.Height; h++)
                for (int w = 0; w < imgResized.Width; w++)
                {
                    var pixel = imgResized.GetPixel(w, h);
                    float v;
                    if (c == 0)
                        v = pixel.B;
                    else if (c == 1)
                        v = pixel.G;
                    else
                        v = pixel.R;
                    // Subtract the pixel mean
                    features[featIndex] = v - pixelMeans[featIndex];
                    featIndex++;
                }
    }
    return features.ToList();
}
/// Convert a bitmap image to R, G, B channel byte arrays.
/// See: http://stackoverflow.com/questions/6020406/travel-through-pixels-in-bmp
private static List<byte[]> ImGetRGBChannels(Bitmap bmp)
{
    // Lock the bitmap's bits.
    var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
    BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

    // Copy the raw bitmap bytes. Note that each scan line is padded to
    // bmpData.Stride bytes, which can be larger than 3 * bmp.Width.
    int bytes = bmpData.Stride * bmp.Height;
    byte[] rgbValues = new byte[bytes];
    Marshal.Copy(bmpData.Scan0, rgbValues, 0, bytes);

    // The channel arrays hold exactly one byte per pixel, so size them
    // from the image dimensions, not from the (possibly padded) stride.
    int numPixels = bmp.Width * bmp.Height;
    byte[] r = new byte[numPixels];
    byte[] g = new byte[numPixels];
    byte[] b = new byte[numPixels];

    // Populate the byte arrays. In 24bpp bitmap data, each pixel is
    // stored in the order B, G, R.
    int count = 0;
    int stride = bmpData.Stride;
    for (int row = 0; row < bmpData.Height; row++)
    {
        for (int col = 0; col < bmpData.Width; col++)
        {
            int offset = (row * stride) + (col * 3);
            b[count] = rgbValues[offset];
            g[count] = rgbValues[offset + 1];
            r[count++] = rgbValues[offset + 2];
        }
    }
    bmp.UnlockBits(bmpData);
    return new List<byte[]> { r, g, b };
}
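Stride padding is the subtle part of the function above: each 24bpp scan line occupies a multiple of 4 bytes, so per-pixel offsets must be computed from the stride rather than from 3 * width. A stand-alone sketch of that arithmetic (plain Python; the 3-pixel-wide example is invented for illustration):

```python
def stride_24bpp(width):
    """Bytes per scan line of a 24bpp bitmap: 3*width rounded up
    to the next multiple of 4."""
    return (3 * width + 3) // 4 * 4

def pixel_offset(row, col, stride):
    """Byte offset of pixel (row, col) within the raw bitmap buffer."""
    return row * stride + col * 3

# A 3-pixel-wide line needs 9 data bytes but occupies 12 bytes.
print(stride_24bpp(3))                       # 12

# With padding, pixel (1, 0) starts at byte 12, not at byte 9.
print(pixel_offset(1, 0, stride_24bpp(3)))   # 12
```

This is why the earlier version that sized the channel arrays as stride * height / 3 could over-allocate and trip the Debug.Assert in ImageToFeatures for widths that are not a multiple of 4.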
Source: https://stackoverflow.com/questions/37300946/how-to-use-rgb-image-as-input-for-the-c-sharp-evaldll-wrapper