Face Detection with Camera


There are two ways to go about detecting faces: one is CIDetector (with CIDetectorTypeFace) and the other is AVCaptureMetadataOutput.

Depending on your requirements, choose what is relevant for you.

CIDetector has more features, e.g. it gives you the location of the eyes and mouth, a smile detector, etc.
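
For illustration, here is a minimal sketch of reading those extra features; the function name, input image, and the plain print statements are placeholders of mine, not from the answer:

import CoreImage
import UIKit

// Minimal sketch: run CIDetector on a still image and read eye/mouth/smile info.
func detectFaceFeatures(in uiImage: UIImage) {
    guard let ciImage = CIImage(image: uiImage) else { return }
    let detector = CIDetector(ofType: CIDetectorTypeFace,
                              context: nil,
                              options: [CIDetectorAccuracy: CIDetectorAccuracyHigh])
    let features = detector?.features(in: ciImage,
                                      options: [CIDetectorSmile: true, CIDetectorEyeBlink: true]) ?? []
    for case let face as CIFaceFeature in features {
        print("face bounds: \(face.bounds)")
        if face.hasLeftEyePosition { print("left eye: \(face.leftEyePosition)") }
        if face.hasRightEyePosition { print("right eye: \(face.rightEyePosition)") }
        if face.hasMouthPosition { print("mouth: \(face.mouthPosition)") }
        print("smiling: \(face.hasSmile)")
    }
}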

On the other hand, AVCaptureMetadataOutput is computed on the frames, the detected faces are tracked, and there is no extra code we need to add. I find that, because of the tracking, faces are detected more reliably with this approach. The con is that you will simply detect faces, with no position of the eyes or mouth. Another advantage of this method is that there are fewer orientation issues: you can set videoOrientation whenever the device orientation changes, and the orientation of the faces will be relative to that orientation.

In my case, my application uses YUV420 as the required format, so using CIDetector (which works with RGB) in real time was not viable. Using AVCaptureMetadataOutput saved a lot of effort and performed more reliably thanks to continuous tracking.
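
For reference, a minimal setup sketch for the metadata-output route (Swift 4 style); the function name, queue label, and the assumption of an already configured session and delegate are mine:

import AVFoundation

// Minimal sketch: attach an AVCaptureMetadataOutput to an existing session and ask only for faces.
func addFaceMetadataOutput(to session: AVCaptureSession,
                           delegate: AVCaptureMetadataOutputObjectsDelegate) {
    let metadataOutput = AVCaptureMetadataOutput()
    guard session.canAddOutput(metadataOutput) else { return }
    session.addOutput(metadataOutput)
    metadataOutput.setMetadataObjectsDelegate(delegate, queue: DispatchQueue(label: "face.metadata"))
    // Face metadata only becomes available after the output has been added to the session.
    if metadataOutput.availableMetadataObjectTypes.contains(.face) {
        metadataOutput.metadataObjectTypes = [.face]
    }
}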

Once I had the bounding boxes for the faces, I coded extra features, such as skin detection, and applied them to the still image.

Note: When you capture a still image, the face box information is added along with the metadata, so there are no sync issues.

You can also use a combination of the two to get better results.

Do explore and evaluate the pros and cons as per your application.

UPDATE

The face rectangle is relative to the image origin, so it may be different for the screen. Use the following:

for (AVMetadataFaceObject *faceFeatures in metadataObjects) {
    // The metadata bounds are normalized (0..1) and rotated relative to the preview,
    // so swap x/y and scale by the preview layer's size (previewLayerRect).
    CGRect face = faceFeatures.bounds;
    CGRect facePreviewBounds = CGRectMake(face.origin.y * previewLayerRect.size.width,
                                          face.origin.x * previewLayerRect.size.height,
                                          face.size.width * previewLayerRect.size.height,
                                          face.size.height * previewLayerRect.size.width);

    /* Draw rectangle facePreviewBounds on screen */
}

To perform face detection on iOS, there is either the CIDetector (Apple) or the Mobile Vision (Google) API.

IMO, Google Mobile Vision provides better performance.

If you are interested, here is the project you can play with. (iOS 10.2, Swift 3)


After WWDC 2017, Apple introduced Core ML in iOS 11. The Vision framework makes face detection more accurate :)

I've made a demo project comparing Vision vs. CIDetector. It also contains face landmark detection in real time.
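
If you want a feel for the Vision API, here is a minimal sketch (iOS 11+) of running a face-rectangle request on a still CGImage; the function name and the print statement are placeholders, not part of the demo project:

import Vision
import UIKit

// Minimal sketch: detect face rectangles in a CGImage with Vision.
func detectFacesWithVision(in cgImage: CGImage) {
    let request = VNDetectFaceRectanglesRequest { request, error in
        guard error == nil, let faces = request.results as? [VNFaceObservation] else { return }
        for face in faces {
            // boundingBox is normalized (0...1) with the origin at the bottom-left of the image.
            print("face at \(face.boundingBox)")
        }
    }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}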

extension CameraViewController: AVCaptureMetadataOutputObjectsDelegate {
  // Swift 3 delegate signature (see the Swift 4 note further down).
  func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputMetadataObjects metadataObjects: [Any]!, from connection: AVCaptureConnection!) {
    if findFaceControl {
      findFaceControl = false
      // Convert each face's metadata bounds into the preview layer's coordinate space.
      let faces = metadataObjects
        .flatMap { $0 as? AVMetadataFaceObject }
        .flatMap { (face) -> CGRect? in
          guard let localizedFace = previewLayer?.transformedMetadataObject(for: face) else { return nil }
          return localizedFace.bounds
        }
      for face in faces {
        let temp = UIView(frame: face)
        temp.layer.borderColor = UIColor.white.cgColor
        temp.layer.borderWidth = 2.0
        view.addSubview(temp)
      }
    }
  }
}

Be sure to remove the views created by didOutputMetadataObjects.

Keeping track of the active face IDs is the best way to do this ^

Also, when you're trying to find the location of faces for your preview layer, it is much easier to use the face metadata and transformedMetadataObject(for:). Also, I think CIDetector is junk; the metadata output uses the hardware for face detection, making it really fast.
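
For example, here is a minimal sketch (my own, not from the answer above) that keeps one overlay view per faceID and removes overlays for faces that are no longer reported:

import AVFoundation
import UIKit

// Minimal sketch: track overlay views by faceID and prune stale ones.
final class FaceOverlayTracker {
    private var overlays: [Int: UIView] = [:]

    func update(with metadataObjects: [AVMetadataObject],
                in containerView: UIView,
                previewLayer: AVCaptureVideoPreviewLayer) {
        var activeIDs = Set<Int>()
        for case let face as AVMetadataFaceObject in metadataObjects {
            guard let transformed = previewLayer.transformedMetadataObject(for: face) else { continue }
            activeIDs.insert(face.faceID)
            let overlay = overlays[face.faceID] ?? {
                let view = UIView()
                view.layer.borderColor = UIColor.white.cgColor
                view.layer.borderWidth = 2.0
                containerView.addSubview(view)
                overlays[face.faceID] = view
                return view
            }()
            overlay.frame = transformed.bounds
        }
        // Remove overlays whose faces disappeared.
        for (id, view) in overlays where !activeIDs.contains(id) {
            view.removeFromSuperview()
            overlays[id] = nil
        }
    }
}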


A bit late, but here is the solution for the coordinates problem. There is a method you can call on the preview layer to transform the metadata object to your coordinate system: transformedMetadataObject(for: metadataObject).

guard let transformedObject = previewLayer.transformedMetadataObject(for: metadataObject) else {
     continue
}
let bounds = transformedObject.bounds
showBounds(at: bounds)

Source: https://developer.apple.com/documentation/avfoundation/avcapturevideopreviewlayer/1623501-transformedmetadataobjectformeta

By the way, in case you are using (or upgrading your project to) Swift 4, the delegate method of AVCaptureMetadataOutputObjectsDelegate has changed to:

func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection)

Kind regards

  1. Create an AVCaptureSession.
  2. For AVCaptureVideoDataOutput, create the following settings (a fuller session-setup sketch follows after the image-conversion helper below):

    output.videoSettings = [ kCVPixelBufferPixelFormatTypeKey as String: Int(kCMPixelFormat_32BGRA) ]

  3. When you receive a CMSampleBuffer, create an image:

DispatchQueue.main.async {
    let sampleImg = self.imageFromSampleBuffer(sampleBuffer: sampleBuffer)
    self.imageView.image = sampleImg
}
func imageFromSampleBuffer(sampleBuffer: CMSampleBuffer) -> UIImage? {
    // Get the sample buffer's Core Video image buffer for the media data
    guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return nil }

    // Lock the base address of the pixel buffer
    CVPixelBufferLockBaseAddress(imageBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(imageBuffer, .readOnly) }

    // Get the base address, bytes per row, and dimensions of the pixel buffer
    let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer)
    let width = CVPixelBufferGetWidth(imageBuffer)
    let height = CVPixelBufferGetHeight(imageBuffer)

    // Create a device-dependent RGB color space
    let colorSpace = CGColorSpaceCreateDeviceRGB()

    // Create a bitmap graphics context with the sample buffer data (BGRA layout)
    var bitmapInfo: UInt32 = CGBitmapInfo.byteOrder32Little.rawValue
    bitmapInfo |= CGImageAlphaInfo.premultipliedFirst.rawValue & CGBitmapInfo.alphaInfoMask.rawValue
    let context = CGContext(data: baseAddress, width: width, height: height,
                            bitsPerComponent: 8, bytesPerRow: bytesPerRow,
                            space: colorSpace, bitmapInfo: bitmapInfo)

    // Create a Quartz image from the pixel data in the bitmap graphics context
    guard let quartzImage = context?.makeImage() else { return nil }

    // Create an image object from the Quartz image
    return UIImage(cgImage: quartzImage)
}
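
As a companion to steps 1 and 2 above, here is a minimal session-setup sketch; the camera choice, preset, function name, and queue label are assumptions of mine:

import AVFoundation

// Minimal sketch: a capture session with a BGRA video data output feeding a delegate.
func makeVideoSession(delegate: AVCaptureVideoDataOutputSampleBufferDelegate) -> AVCaptureSession? {
    let session = AVCaptureSession()
    session.sessionPreset = .high

    guard let camera = AVCaptureDevice.default(for: .video),
          let input = try? AVCaptureDeviceInput(device: camera),
          session.canAddInput(input) else { return nil }
    session.addInput(input)

    let output = AVCaptureVideoDataOutput()
    output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCMPixelFormat_32BGRA)]
    output.setSampleBufferDelegate(delegate, queue: DispatchQueue(label: "video.frames"))
    guard session.canAddOutput(output) else { return nil }
    session.addOutput(output)
    return session
}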

By looking at your code, I spotted two things that could lead to wrong or poor face detection.

  1. One of them is the face detector feature options, where you are filtering the results with [CIDetectorSmile: true, CIDetectorEyeBlink: true]. Try setting it to nil: faceDetector?.features(in: faceImage, options: nil)
  2. Another guess I have is the orientation of the result image. I noticed you use the AVCapturePhotoOutput.jpegPhotoDataRepresentation method to generate the source image for the detection, and the system, by default, generates that image with a specific orientation, of type Left/LandscapeLeft, I think. So, basically, you can tell the face detector to take that into account by using the CIDetectorImageOrientation key.

CIDetectorImageOrientation: the value for this key is an integer NSNumber from 1..8, such as those found in kCGImagePropertyOrientation. If present, the detection will be done based on that orientation, but the coordinates in the returned features will still be based on those of the image.

Try setting it like faceDetector?.features(in: faceImage, options: [CIDetectorImageOrientation: 8 /*Left, bottom*/]).
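
Putting that together, a minimal self-contained sketch; the fixed value 8 mirrors the guess above and should be adjusted to match your actual capture orientation:

import CoreImage

// Minimal sketch: CIDetector with an explicit orientation hint.
func detectFaces(in faceImage: CIImage) -> [CIFeature] {
    let faceDetector = CIDetector(ofType: CIDetectorTypeFace,
                                  context: nil,
                                  options: [CIDetectorAccuracy: CIDetectorAccuracyHigh])
    return faceDetector?.features(in: faceImage,
                                  options: [CIDetectorImageOrientation: 8]) ?? []
}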
