I am investigating OpenCV for my aging server as an alternative to TensorFlow for facial recognition and, hopefully, GPU-accelerated image downsampling. TensorFlow is a fine library, but my server doesn't have the AVX or AVX2 instruction sets and the GTX 570 only supports CUDA Compute Capability 2.0, both of which are required by TensorFlow. My approach is to first look at scaling the images, then see how to move that work onto the GPU, and finally start looking at facial recognition.

First step is getting it installed. Although the most recent version is the 4 series, it appears as though most of the material out there is still for the 3 series. Unfortunately, there is no OSX release on the release page. Consulting the general internet, people have installed it via Homebrew, but I am still wary of that after watching machines get bricked by it. So to the source!

The package is built using CMake. There was a rather old version of CMake on my laptop, which was easy to update. I intentionally disabled the Python language bindings as well as the example builds.

cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=$HOME/tools/opencv-4.0.1 -D INSTALL_PYTHON_EXAMPLES=OFF -D INSTALL_C_EXAMPLES=OFF -D OPENCV_ENABLE_NONFREE=ON -D BUILD_EXAMPLES=OFF ..
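For reference, the full out-of-source build cycle around that configure line looked roughly like this (the source directory name, install prefix, and job count are from my setup; adjust to taste):

```shell
# Assumes the 4.0.1 sources are unpacked into opencv-4.0.1/ (illustrative path).
cd opencv-4.0.1
mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D CMAKE_INSTALL_PREFIX=$HOME/tools/opencv-4.0.1 \
      -D INSTALL_PYTHON_EXAMPLES=OFF \
      -D INSTALL_C_EXAMPLES=OFF \
      -D OPENCV_ENABLE_NONFREE=ON \
      -D BUILD_EXAMPLES=OFF ..
make -j4        # adjust -j to your core count
make install    # installs under the prefix above, no sudo needed
```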

While that was compiling I wondered what Ubuntu 18.10 had available for OpenCV. The most recent version available through the package management system is 3.2, which dates back to 2016, so it is fairly old. I will probably elect to compile from source to ensure reasonable parity between my laptop and server. Of course, the server won the compilation race by a long shot, being able to run 12 compilation jobs concurrently.

Building Against the Installed OpenCV Library

I elected to use CLion since I have a license and I was hoping its project templates would reduce the time to implement. The project template produces a CMake-compatible environment with C++17. Out of the box I had the following file:

cmake_minimum_required(VERSION 3.12)
project(opencv_play)

set(CMAKE_CXX_STANDARD 17)

add_executable(opencv_play main.cpp)

The main.cpp file contained a Hello World example in C++. Not bad. The target can be configured with cmake . -B build, which will produce the relative directory build. Running make within the build directory will produce the executable artifact opencv_play.
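Spelled out, that configure-and-build cycle from the command line looks like this (run from the project root):

```shell
# From the opencv_play/ project root; produces build/opencv_play.
cmake . -B build     # generate the build tree in ./build
cd build && make     # compile the target
./opencv_play        # run the resulting binary
```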

Next step was to get the OpenCV project properly linked in. The failing test case should look something like the following example:

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;

int main( int argc, char** argv){
    cout << "Hello World" << endl;
    return 0;
}

This produces an error like the following on OSX.

opencv-play/main.cpp:6:10: fatal error: 'opencv2/opencv.hpp' file not found
#include <opencv2/opencv.hpp>
         ^~~~~~~~~~~~~~~~~~~~

Since I am not familiar with the CMake system I had to do a bit of web searching. It's Time To Do CMake Right was a great article pointing towards the proper way to declare a CMake dependency. I added the following stanzas to the CMakeLists.txt file.

cmake_minimum_required(VERSION 3.12)
project(opencv_play)

set(CMAKE_CXX_STANDARD 17)

find_package( OpenCV REQUIRED HINTS "/home/user/tools/opencv-4.0.1/lib/cmake/opencv4" )


add_executable(opencv_play main.cpp)
target_link_libraries( opencv_play ${OpenCV_LIBS} )

Although I am sure there is a better way to promote discovery of the library, for now I hard-coded the path since I am just exploring. This allows for correct linking against the OpenCV libraries.
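One way to avoid hard-coding the path inside CMakeLists.txt is to pass it at configure time instead, since find_package consults the OpenCV_DIR cache variable. A sketch, using the install location from my setup:

```shell
# Point find_package(OpenCV ...) at the install without editing CMakeLists.txt.
# The path below is from my build; substitute your own install prefix.
cmake . -B build -D OpenCV_DIR=$HOME/tools/opencv-4.0.1/lib/cmake/opencv4
```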

Image Scaling on the CPU

The following code sample will produce a CPU-downsampled image. This uses the LANCZOS4 algorithm since it appears to give the best output quality of the available interpolation methods. The output image will be forced into a 256-pixel square, distorting the image to fit. The waitKey(0) call will block until the window produced by imshow(string, Mat) receives a key press.

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int cpuImageResize( const string fileName ){
    auto image = imread( fileName, IMREAD_COLOR );
    if ( !image.data ){
        cerr << "Unable to load image " << fileName << endl;
        return -1;
    }

    Mat result;
    Size size(256,256);
    resize( image, result, size, 0, 0, INTER_LANCZOS4 );

    namedWindow("Display Image", WINDOW_AUTOSIZE );
    imshow("Display Image", result);
    waitKey(0);

    return 0;
}

int main( int argc, char** argv){
    auto fileName = "test.jpg";
    return cpuImageResize( fileName );
}

To get the test image copied to the build directory the following stanza needs to be added to the CMakeLists.txt file:

file(COPY test.jpg DESTINATION ${CMAKE_BINARY_DIR})

Image Scaling on the GPU?

Many of the examples available are for the OpenCV version 3 branch. Part of the major version change was re-architecting the platform to split the description of a processing pipeline from its application. This feels similar to the limited amount of experience I have with the TensorFlow API. As a result, the tutorials and community posts were not any help in figuring out how to build against the API, and I ran into linking errors.

From what I had read, the changes were intended to prevent arbitrary writes back to the CPU and to reduce the cost of implementing backends to perform the computations. As a result, the application client code is portable between underlying computational platforms as long as you do not create additional operations for a specific backend.

A majority of the functions are under cv::gapi in opencv2/gapi.hpp. To get high-level operations such as resize, the header opencv2/gapi/core.hpp needs to be included. The resize operation takes a Size object during the pipeline description, or optionally scale factors. Since sizes are described during pipeline creation, the pipeline must be tailored to each output size and aspect ratio. Here is the minimal example:

#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/gapi.hpp>
#include <opencv2/gapi/core.hpp>

using namespace std;
using namespace cv;
using namespace cv::gapi;

int gpuImageResize( const string fileName ){
    auto image = imread( fileName, IMREAD_COLOR );
    if ( !image.data ){
        cerr << "Unable to load image " << fileName << endl;
        return -1;
    }

    GMat in;
    Size size(256,256);
    auto dest = resize( in, size, 0, 0, INTER_LANCZOS4 );
    GComputation computation(GIn(in), GOut(dest));

    Mat result;
    computation.apply(gin(image), gout(result));

    namedWindow("Display Image", WINDOW_AUTOSIZE );
    imshow("Display Image", result);
    waitKey(0);

    return 0;
}


int main( int argc, char** argv){
    auto fileName = "test.jpg";
    return gpuImageResize( fileName );
}

Computationally Accelerated Platforms

Despite a performance benefit of roughly 20% when reusing the pipeline with the gapi implementation, I fear this may still be executing on the CPU: approximately 1250 images per second with the non-pipelined implementation versus 1500 images per second with the pipeline. I was unable to verify which backend was performing the processing at the time.

A future project will be building a diagnostic tool to verify the expected backends are being used, such as OpenCL or CUDA.