How can I do face detection in realtime just as "Camera" does?
I noticed that AVCaptureStillImageOutput is deprecated after iOS 10.0, so I use AVCapturePhotoOutput instead. However, I found that the image I save for face detection is not very satisfactory. Any ideas?
UPDATE
After trying what @Shravya Boggarapu suggested, I now use AVCaptureMetadataOutput to detect the face, without CIFaceDetector. It works as expected. However, when I try to draw the bounds of the face, they seem mislocated. Any idea?
let metaDataOutput = AVCaptureMetadataOutput()

captureSession.sessionPreset = AVCaptureSessionPresetPhoto
let backCamera = AVCaptureDevice.defaultDevice(withDeviceType: .builtInWideAngleCamera,
                                               mediaType: AVMediaTypeVideo,
                                               position: .back)
do {
    let input = try AVCaptureDeviceInput(device: backCamera)
    if captureSession.canAddInput(input) {
        captureSession.addInput(input)
        // MetadataOutput instead
        if captureSession.canAddOutput(metaDataOutput) {
            captureSession.addOutput(metaDataOutput)
            metaDataOutput.setMetadataObjectsDelegate(self, queue: DispatchQueue.main)
            metaDataOutput.metadataObjectTypes = [AVMetadataObjectTypeFace]

            previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
            previewLayer?.frame = cameraView.bounds
            previewLayer?.videoGravity = AVLayerVideoGravityResizeAspectFill
            cameraView.layer.addSublayer(previewLayer!)

            captureSession.startRunning()
        }
    }
} catch {
    print(error.localizedDescription)
}
and
extension CameraViewController: AVCaptureMetadataOutputObjectsDelegate {
    func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputMetadataObjects metadataObjects: [Any]!, from connection: AVCaptureConnection!) {
        if findFaceControl {
            findFaceControl = false
            for metadataObject in metadataObjects {
                if (metadataObject as AnyObject).type == AVMetadataObjectTypeFace {
                    print("😇😍😎")
                    print(metadataObject)
                    let bounds = (metadataObject as! AVMetadataFaceObject).bounds
                    print("origin x: \(bounds.origin.x)")
                    print("origin y: \(bounds.origin.y)")
                    print("size width: \(bounds.size.width)")
                    print("size height: \(bounds.size.height)")
                    print("cameraView width: \(self.cameraView.frame.width)")
                    print("cameraView height: \(self.cameraView.frame.height)")
                    var face = CGRect()
                    face.origin.x = bounds.origin.x * self.cameraView.frame.width
                    face.origin.y = bounds.origin.y * self.cameraView.frame.height
                    face.size.width = bounds.size.width * self.cameraView.frame.width
                    face.size.height = bounds.size.height * self.cameraView.frame.height
                    print(face)
                    showBounds(at: face)
                }
            }
        }
    }
}
Original
var captureSession = AVCaptureSession()
var photoOutput = AVCapturePhotoOutput()
var previewLayer: AVCaptureVideoPreviewLayer?

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(true)

    captureSession.sessionPreset = AVCaptureSessionPresetHigh
    let backCamera = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeVideo)
    do {
        let input = try AVCaptureDeviceInput(device: backCamera)
        if captureSession.canAddInput(input) {
            captureSession.addInput(input)
            if captureSession.canAddOutput(photoOutput) {
                captureSession.addOutput(photoOutput)
                captureSession.startRunning()

                previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
                previewLayer?.videoGravity = AVLayerVideoGravityResizeAspectFill
                previewLayer?.frame = cameraView.bounds
                cameraView.layer.addSublayer(previewLayer!)
            }
        }
    } catch {
        print(error.localizedDescription)
    }
}

func captureImage() {
    let settings = AVCapturePhotoSettings()
    let previewPixelType = settings.availablePreviewPhotoPixelFormatTypes.first!
    let previewFormat = [kCVPixelBufferPixelFormatTypeKey as String: previewPixelType]
    settings.previewPhotoFormat = previewFormat
    photoOutput.capturePhoto(with: settings, delegate: self)
}

func capture(_ captureOutput: AVCapturePhotoOutput, didFinishProcessingPhotoSampleBuffer photoSampleBuffer: CMSampleBuffer?, previewPhotoSampleBuffer: CMSampleBuffer?, resolvedSettings: AVCaptureResolvedPhotoSettings, bracketSettings: AVCaptureBracketedStillImageSettings?, error: Error?) {
    if let error = error {
        print(error.localizedDescription)
    }
    // Not including previewPhotoSampleBuffer
    if let sampleBuffer = photoSampleBuffer,
       let dataImage = AVCapturePhotoOutput.jpegPhotoDataRepresentation(forJPEGSampleBuffer: sampleBuffer, previewPhotoSampleBuffer: nil) {
        self.imageView.image = UIImage(data: dataImage)
        self.imageView.isHidden = false
        self.previewLayer?.isHidden = true
        self.findFace(img: self.imageView.image!)
    }
}
findFace works with a normal image. However, an image captured via the camera either does not work at all or sometimes only recognizes one face.
Normal Image
Capture Image
func findFace(img: UIImage) {
    guard let faceImage = CIImage(image: img) else { return }
    let accuracy = [CIDetectorAccuracy: CIDetectorAccuracyHigh]
    let faceDetector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: accuracy)

    // For converting the Core Image coordinates to UIView coordinates
    let detectedImageSize = faceImage.extent.size
    var transform = CGAffineTransform(scaleX: 1, y: -1)
    transform = transform.translatedBy(x: 0, y: -detectedImageSize.height)

    if let faces = faceDetector?.features(in: faceImage, options: [CIDetectorSmile: true, CIDetectorEyeBlink: true]) {
        for face in faces as! [CIFaceFeature] {
            // Apply the transform to convert the coordinates
            var faceViewBounds = face.bounds.applying(transform)
            // Calculate the actual position and size of the rectangle in the image view
            let viewSize = imageView.bounds.size
            let scale = min(viewSize.width / detectedImageSize.width,
                            viewSize.height / detectedImageSize.height)
            let offsetX = (viewSize.width - detectedImageSize.width * scale) / 2
            let offsetY = (viewSize.height - detectedImageSize.height * scale) / 2

            faceViewBounds = faceViewBounds.applying(CGAffineTransform(scaleX: scale, y: scale))
            print("faceBounds = \(faceViewBounds)")
            faceViewBounds.origin.x += offsetX
            faceViewBounds.origin.y += offsetY

            showBounds(at: faceViewBounds)
        }

        if faces.count != 0 {
            print("Number of faces: \(faces.count)")
        } else {
            print("No faces 😢")
        }
    }
}
func showBounds(at bounds: CGRect) {
    let indicator = UIView(frame: bounds)
    indicator.layer.borderWidth = 3
    indicator.layer.borderColor = UIColor.red.cgColor
    indicator.backgroundColor = .clear

    self.imageView.addSubview(indicator)
    faceBoxes.append(indicator)
}
There are two ways to go about detecting faces: one is CIFaceDetector and the other is AVCaptureMetadataOutput.
Depending on your requirements, choose what is relevant for you.
CIFaceDetector has more features; e.g. it gives you the location of the eyes and mouth, a smile detector, etc.
On the other hand, AVCaptureMetadataOutput is computed on the frames, the detected faces are tracked, and there is no extra code we need to add ourselves. I find that, because of the tracking, faces are detected more reliably with this approach. The downside is that you simply detect faces, with no position of the eyes or mouth. Another advantage of this method is that orientation issues are smaller, because you can set videoOrientation whenever the device orientation changes, and the orientation of the faces will be relative to that orientation.
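As a rough illustration of that last point, here is a minimal sketch (using Swift 4 API names, and assuming the previewLayer and metaDataOutput from the question's setup) that updates both connections on rotation:

// Sketch only: call this from viewWillTransition(to:with:) or an orientation-change notification.
func updateVideoOrientation() {
    let orientation: AVCaptureVideoOrientation
    switch UIDevice.current.orientation {
    case .landscapeLeft:       orientation = .landscapeRight
    case .landscapeRight:      orientation = .landscapeLeft
    case .portraitUpsideDown:  orientation = .portraitUpsideDown
    default:                   orientation = .portrait
    }
    // Keep the preview and the metadata output in the same orientation so the
    // reported face bounds stay consistent with what is on screen.
    if let previewConnection = previewLayer?.connection, previewConnection.isVideoOrientationSupported {
        previewConnection.videoOrientation = orientation
    }
    if let metadataConnection = metaDataOutput.connection(with: .video), metadataConnection.isVideoOrientationSupported {
        metadataConnection.videoOrientation = orientation
    }
}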
In my case, my application uses YUV420 as the required format, so using CIDetector (which works with RGB) in real time was not viable. Using AVCaptureMetadataOutput saved a lot of effort and performed more reliably thanks to continuous tracking.
Once I had the bounding box for the faces, I coded extra features, such as skin detection, and applied them to the still image.
Note: when you capture a still image, the face box information is added along with the metadata, so there are no sync issues.
You can also use a combination of the two to get better results.
Do explore and evaluate the pros and cons as per your application.
UPDATE
The face rectangle is relative to the image origin, so it may differ from the screen coordinates. Use the following:
for (AVMetadataFaceObject *faceFeatures in metadataObjects) {
    CGRect face = faceFeatures.bounds;
    CGRect facePreviewBounds = CGRectMake(face.origin.y * previewLayerRect.size.width,
                                          face.origin.x * previewLayerRect.size.height,
                                          face.size.width * previewLayerRect.size.height,
                                          face.size.height * previewLayerRect.size.width);
    /* Draw rectangle facePreviewBounds on screen */
}
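For anyone working in Swift, a rough equivalent of the snippet above might look like this (assuming previewLayerRect is the preview layer's bounds and showBounds(at:) is the helper from the question):

// Sketch only: manual mapping of the normalized, rotated metadata bounds onto the preview rect.
for case let faceObject as AVMetadataFaceObject in metadataObjects {
    let face = faceObject.bounds   // normalized, with x/y swapped relative to the screen
    let facePreviewBounds = CGRect(x: face.origin.y * previewLayerRect.size.width,
                                   y: face.origin.x * previewLayerRect.size.height,
                                   width: face.size.width * previewLayerRect.size.height,
                                   height: face.size.height * previewLayerRect.size.width)
    // Draw the rectangle facePreviewBounds on screen
    showBounds(at: facePreviewBounds)
}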
To perform face detection on iOS, you can use either CIDetector (Apple) or the Mobile Vision (Google) API.
IMO, Google Mobile Vision provides better performance.
If you are interested, here is a project you can play with. (iOS 10.2, Swift 3)
After WWDC 2017, Apple introduced Core ML in iOS 11. The Vision framework makes face detection more accurate :)
I've made a demo project comparing Vision vs. CIDetector. It also contains real-time face landmark detection.
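If you want to try the Vision route (iOS 11+), a minimal sketch for detecting face rectangles in a still image could look like this; detectFaces(in:) is just an illustrative name:

import UIKit
import Vision

// Minimal Vision sketch: detect face rectangles in a UIImage.
func detectFaces(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }

    let request = VNDetectFaceRectanglesRequest { request, error in
        guard error == nil, let faces = request.results as? [VNFaceObservation] else { return }
        for face in faces {
            // boundingBox is normalized (0...1) with the origin at the bottom-left of the image.
            print("Normalized face bounding box: \(face.boundingBox)")
        }
    }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}

To draw the results on screen you would still need to convert the normalized, bottom-left-origin boundingBox into your view's coordinate space, similar to the transforms used elsewhere in this thread.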
extension CameraViewController: AVCaptureMetadataOutputObjectsDelegate {
    func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputMetadataObjects metadataObjects: [Any]!, from connection: AVCaptureConnection!) {
        if findFaceControl {
            findFaceControl = false
            let faces = metadataObjects
                .flatMap { $0 as? AVMetadataFaceObject }
                .flatMap { (face) -> CGRect? in
                    // Convert the metadata object into preview-layer coordinates.
                    guard let localizedFace = previewLayer?.transformedMetadataObject(for: face) else { return nil }
                    return localizedFace.bounds
                }
            for face in faces {
                let temp = UIView(frame: face)
                temp.layer.borderColor = UIColor.white.cgColor
                temp.layer.borderWidth = 2.0
                view.addSubview(temp)
            }
        }
    }
}
Be sure to remove the views created by didOutputMetadataObjects.
Keeping track of the active face IDs is the best way to do this.
Also, when you're trying to find the location of faces for your preview layer, it is much easier to use the face metadata and transformedMetadataObject(for:). And I think CIDetector is junk; the metadata output uses hardware for face detection, making it really fast.
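A minimal sketch of that ID-tracking idea (assuming it lives in the same view controller as the code above, with an extra property var faceViews = [Int: UIView]()) could be:

// Sketch: keep one overlay per tracked faceID and drop overlays for faces that disappeared.
func updateFaceOverlays(with faces: [AVMetadataFaceObject]) {
    let currentIDs = Set(faces.map { $0.faceID })

    // Remove overlays whose face is no longer reported.
    for (id, overlay) in faceViews where !currentIDs.contains(id) {
        overlay.removeFromSuperview()
        faceViews[id] = nil
    }

    // Add or move overlays for the faces that are still present.
    for face in faces {
        guard let transformed = previewLayer?.transformedMetadataObject(for: face) else { continue }
        let overlay: UIView
        if let existing = faceViews[face.faceID] {
            overlay = existing
        } else {
            overlay = UIView()
            overlay.layer.borderColor = UIColor.white.cgColor
            overlay.layer.borderWidth = 2.0
            view.addSubview(overlay)
            faceViews[face.faceID] = overlay
        }
        overlay.frame = transformed.bounds
    }
}

You would call this from the metadata delegate with the AVMetadataFaceObjects of the current frame.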
A bit late, but here is the solution to the coordinates problem. There is a method you can call on the preview layer to transform the metadata object into your coordinate system: transformedMetadataObject(for: metadataObject).
guard let transformedObject = previewLayer.transformedMetadataObject(for: metadataObject) else {
    continue
}
let bounds = transformedObject.bounds
showBounds(at: bounds)
By the way, in case you are using (or upgrading your project to) Swift 4, the delegate method of AVCaptureMetadataOutputObjectsDelegate has changed to:
func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection)
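Putting the two together, a Swift 4 sketch of the delegate (reusing previewLayer and showBounds(at:) from the question) might look like:

// Sketch: new Swift 4 delegate signature plus transformedMetadataObject(for:).
extension CameraViewController: AVCaptureMetadataOutputObjectsDelegate {
    func metadataOutput(_ output: AVCaptureMetadataOutput,
                        didOutput metadataObjects: [AVMetadataObject],
                        from connection: AVCaptureConnection) {
        for metadataObject in metadataObjects where metadataObject is AVMetadataFaceObject {
            // Convert from metadata coordinates into preview-layer coordinates.
            guard let transformed = previewLayer?.transformedMetadataObject(for: metadataObject) else {
                continue
            }
            showBounds(at: transformed.bounds)
        }
    }
}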
Kind regards
1. Create a capture session.
2. For AVCaptureVideoDataOutput, create the following settings:
output.videoSettings = [ kCVPixelBufferPixelFormatTypeKey as AnyHashable: Int(kCMPixelFormat_32BGRA) ]
3. When you receive the CMSampleBuffer, create an image (a fuller sketch of the whole setup follows after imageFromSampleBuffer below):
DispatchQueue.main.async {
    let sampleImg = self.imageFromSampleBuffer(sampleBuffer: sampleBuffer)
    self.imageView.image = sampleImg
}
func imageFromSampleBuffer(sampleBuffer: CMSampleBuffer) -> UIImage {
    // Get the CMSampleBuffer's Core Video image buffer for the media data
    let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
    // Lock the base address of the pixel buffer
    CVPixelBufferLockBaseAddress(imageBuffer!, CVPixelBufferLockFlags.readOnly)
    // Get the base address of the pixel buffer
    let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer!)
    // Get the number of bytes per row for the pixel buffer
    let bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer!)
    // Get the pixel buffer width and height
    let width = CVPixelBufferGetWidth(imageBuffer!)
    let height = CVPixelBufferGetHeight(imageBuffer!)
    // Create a device-dependent RGB color space
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    // Create a bitmap graphics context with the sample buffer data
    var bitmapInfo: UInt32 = CGBitmapInfo.byteOrder32Little.rawValue
    bitmapInfo |= CGImageAlphaInfo.premultipliedFirst.rawValue & CGBitmapInfo.alphaInfoMask.rawValue
    let context = CGContext(data: baseAddress, width: width, height: height, bitsPerComponent: 8,
                            bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo)
    // Create a Quartz image from the pixel data in the bitmap graphics context
    let quartzImage = context?.makeImage()
    // Unlock the pixel buffer
    CVPixelBufferUnlockBaseAddress(imageBuffer!, CVPixelBufferLockFlags.readOnly)
    // Create an image object from the Quartz image
    let image = UIImage(cgImage: quartzImage!)
    return image
}
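To tie the three steps together, here is a rough end-to-end sketch using Swift 4 API names; FrameCaptureViewController is a made-up class name and imageFromSampleBuffer(sampleBuffer:) is the function shown above:

import UIKit
import AVFoundation

// Sketch of the setup: session with camera input, a BGRA video data output,
// and per-frame delivery to the delegate, which converts each frame to a UIImage.
class FrameCaptureViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    let captureSession = AVCaptureSession()
    let imageView = UIImageView()

    override func viewDidLoad() {
        super.viewDidLoad()
        imageView.frame = view.bounds
        view.addSubview(imageView)

        // 1. Create the capture session with the camera as input.
        guard let camera = AVCaptureDevice.default(for: .video),
              let input = try? AVCaptureDeviceInput(device: camera),
              captureSession.canAddInput(input) else { return }
        captureSession.addInput(input)

        // 2. Video data output configured for 32BGRA frames.
        let output = AVCaptureVideoDataOutput()
        output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCMPixelFormat_32BGRA)]
        output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "video.frames"))
        if captureSession.canAddOutput(output) {
            captureSession.addOutput(output)
        }
        captureSession.startRunning()
    }

    // 3. Convert each CMSampleBuffer to a UIImage and display it.
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        let image = imageFromSampleBuffer(sampleBuffer: sampleBuffer)
        DispatchQueue.main.async {
            self.imageView.image = image
        }
    }
}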
By looking at your code I spotted two things that could lead to wrong or poor face detection.
- One of them is the face detector options, where you are filtering the results by [CIDetectorSmile: true, CIDetectorEyeBlink: true]. Try setting them to nil: faceDetector?.features(in: faceImage, options: nil)
- Another guess I have is the orientation of the result image. I noticed you use the AVCapturePhotoOutput.jpegPhotoDataRepresentation method to generate the source image for the detection, and by default the system generates that image with a specific orientation, of type Left/LandscapeLeft, I think. So, basically, you can tell the face detector to take that into account by using the CIDetectorImageOrientation key.
CIDetectorImageOrientation: the value for this key is an integer NSNumber from 1..8, like the values found in kCGImagePropertyOrientation. If present, the detection will be done based on that orientation, but the coordinates in the returned features will still be based on those of the image.
Try setting it like faceDetector?.features(in: faceImage, options: [CIDetectorImageOrientation: 8 /* Left, bottom */]).
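If you prefer not to hard-code the value, a small helper can derive it from the UIImage itself (exifOrientation(for:) is a made-up name, not an Apple API):

// Maps UIImageOrientation to the 1...8 EXIF / kCGImagePropertyOrientation values
// that CIDetectorImageOrientation expects.
func exifOrientation(for orientation: UIImageOrientation) -> Int {
    switch orientation {
    case .up:            return 1
    case .upMirrored:    return 2
    case .down:          return 3
    case .downMirrored:  return 4
    case .leftMirrored:  return 5
    case .right:         return 6
    case .rightMirrored: return 7
    case .left:          return 8
    }
}

Inside findFace(img:) you could then pass [CIDetectorImageOrientation: exifOrientation(for: img.imageOrientation)] as the options dictionary.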
Source: https://stackoverflow.com/questions/41354698/face-detection-with-camera