Image Processing: Algorithm Improvement for "Coca-Cola" Recognition

Question:

One of the most interesting projects I've worked on in the past couple of years was about image processing. The goal was to develop a system able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans'; you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle, with scale and rotation.

Template matching

Some constraints on the project:

The background could be very noisy.
The can could have any scale or rotation or even orientation (within reasonable limits).
The image could have some degree of fuzziness (contours might not be entirely straight).
There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!
The brightness of the image could vary a lot (so you can't rely "too much" on color detection).
The can could be partly hidden on the sides or in the middle, and possibly partly hidden behind a bottle.
There could be no can at all in the image, in which case you had to find nothing and write a message saying so.

So you could end up with tricky things like this (which in this case made my algorithm fail totally):

Total fail

I did this project a while ago, had a lot of fun doing it, and ended up with a decent implementation. Here are some details about my implementation:

Language: Done in C++ using the OpenCV library.

Pre-processing: For the image pre-processing, i.e. transforming the image into a more raw form to give to the algorithm, I used 2 methods:

Changing the color domain from RGB to HSV and filtering on "red" hue, with saturation above a certain threshold to avoid orange-like colors, and filtering out low values to avoid dark tones. The end result was a binary black-and-white image, where all white pixels represent the pixels that pass these thresholds. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with.
Noise filtering using median filtering (taking the median pixel value of all neighbors and replacing the pixel with this value) to reduce noise.
Using the Canny edge detection filter to get the contours of all items after the 2 preceding steps.
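
As a rough illustration of the first two pre-processing steps (not the code from the project), here is a minimal sketch in plain C++ without OpenCV, so the logic is visible. The threshold constants `kHueTol`, `kMinSat` and `kMinVal` and the helper names are assumptions made up for this example and would need tuning on real images. In OpenCV the same steps would typically go through `cv::cvtColor`, `cv::inRange` and `cv::medianBlur`, where 8-bit hue runs from 0 to 179 rather than 0 to 360.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

// Hue in degrees [0, 360), saturation and value in [0, 1].
struct Hsv { double h, s, v; };

// Standard RGB -> HSV conversion for 8-bit channels.
Hsv rgbToHsv(std::uint8_t r8, std::uint8_t g8, std::uint8_t b8) {
    const double r = r8 / 255.0, g = g8 / 255.0, b = b8 / 255.0;
    const double maxc = std::max({r, g, b});
    const double minc = std::min({r, g, b});
    const double delta = maxc - minc;
    double h = 0.0;
    if (delta > 0.0) {
        if (maxc == r)      h = 60.0 * std::fmod((g - b) / delta, 6.0);
        else if (maxc == g) h = 60.0 * ((b - r) / delta + 2.0);
        else                h = 60.0 * ((r - g) / delta + 4.0);
        if (h < 0.0) h += 360.0;
    }
    const double s = (maxc > 0.0) ? delta / maxc : 0.0;
    return {h, s, maxc};
}

// Hypothetical thresholds, chosen only for illustration.
constexpr double kHueTol = 15.0;  // accepted hue distance from pure red (0/360 degrees)
constexpr double kMinSat = 0.6;   // rejects washed-out and orange-ish pixels
constexpr double kMinVal = 0.3;   // rejects dark tones

// True when a pixel passes the "red" hue, saturation and value thresholds.
bool isCanRed(const Hsv& p) {
    const bool redHue = p.h <= kHueTol || p.h >= 360.0 - kHueTol;
    return redHue && p.s >= kMinSat && p.v >= kMinVal;
}

// 3x3 median filter on a row-major single-channel image of size w x h.
// Border pixels are copied unchanged.
std::vector<std::uint8_t> median3x3(const std::vector<std::uint8_t>& img, int w, int h) {
    std::vector<std::uint8_t> out(img);
    for (int y = 1; y + 1 < h; ++y) {
        for (int x = 1; x + 1 < w; ++x) {
            std::array<std::uint8_t, 9> win{};
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    win[k++] = img[(y + dy) * w + (x + dx)];
            // The 5th smallest of 9 values is the median.
            std::nth_element(win.begin(), win.begin() + 4, win.end());
            out[y * w + x] = win[4];
        }
    }
    return out;
}
```

Applying `isCanRed` to every pixel gives the binary mask, and `median3x3` then removes isolated white or black pixels from it before edge detection.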

Algorithm: The algorithm I chose for this task was taken from this awesome book on feature extraction and is called the Generalized Hough Transform (pretty different from the regular Hough Transform). It basically says a few things:

You can describe an object in space without knowing its analytical equation (which is the case here).
It is resistant to image deformations such as scaling and rotation, as it basically tests your image for every combination of scale factor and rotation factor.
It uses a base model (a template) that the algorithm will "learn".
Each pixel remaining in the contour image votes for another pixel, which is supposedly the center (of gravity) of your object, based on what it learned from the model.

In the end, you end up with a heat map of the votes. For example, here all the pixels of the can's contour vote for its center of gravity, so you get a lot of votes in the same pixel corresponding to the center, and you see a peak in the heat map as below:

GHT

Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (the final scale and rotation factors will obviously be relative to your original template). In theory at least...
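
One possible form of such a threshold-based heuristic, sketched in plain C++ under assumptions: the heat map is a row-major `w` x `h` array of vote counts, and `Peak`, `findPeak` and `minVotes` are hypothetical names for this example. A result with `found == false` corresponds to the "no can in the image" case.

```cpp
#include <cstddef>
#include <vector>

struct Peak {
    int x, y;    // location of the strongest cell
    int votes;   // number of votes in that cell
    bool found;  // false when no cell reached the threshold
};

// Returns the cell with the most votes, provided it has at least minVotes.
Peak findPeak(const std::vector<int>& acc, int w, int h, int minVotes) {
    Peak best{-1, -1, 0, false};
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const int v = acc[static_cast<std::size_t>(y) * w + x];
            if (v >= minVotes && (!best.found || v > best.votes))
                best = Peak{x, y, v, true};
        }
    }
    return best;
}
```

Running this over the accumulator of every scale and rotation hypothesis and keeping the overall best peak gives the center together with the scale and rotation that produced it.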

Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas:

It is extremely slow! I'm not stressing this enough. Almost a full day was needed to process the 30 test images, obviously because I had a very high scaling factor for rotation and translation, since some of the cans were very small.
It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, thus had more pixels, thus more votes).
Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, producing a very noisy heat map.
Invariance in translation and rotation was achieved, but not in orientation, meaning that a can that was not directly facing the camera objective wasn't recognized.
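
To get a feel for why an exhaustive search over scale and rotation is slow, the number of votes can be estimated with a one-line product. This is only a back-of-the-envelope sketch: `votesCast` is a hypothetical helper, and it assumes the naive case where every edge pixel votes once per table entry for every hypothesis.

```cpp
#include <cstdint>

// Votes cast by an exhaustive search in which every edge pixel votes once
// per table entry for every (scale, rotation) hypothesis.
std::uint64_t votesCast(std::uint64_t edgePixels, std::uint64_t tableEntries,
                        std::uint64_t numScales, std::uint64_t numRotations) {
    return edgePixels * tableEntries * numScales * numRotations;
}
```

With made-up figures of 1,000 edge pixels, 200 table entries, 10 scales and 36 rotations, `votesCast(1000, 200, 10, 36)` gives 72,000,000 votes for a single image. The real numbers for the project are not stated, so this only shows how the cost multiplies as the scale and rotation ranges get finer.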

Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?

I hope some people will also learn something out of it; after all, I think not only people who ask questions should learn. :)