Haar Cascade and false positives – choose your training data well

Many tutorials can guide an image recognition newbie through training a Haar cascade classifier, particularly the official OpenCV tutorial, Harrison’s Haar Cascade Object Detection Face & Eye OpenCV Python Tutorial, and Creating your own Haar Cascade OpenCV Python Tutorial.

Once you’ve managed to set up the software, gathered the samples and test data, prepared all the applications, images, and scripts required, and trained your first classifier… you may discover that… it doesn’t work that well, actually.

In my case, I had a particularly severe problem with false positives. I wanted to detect a company logo, and the cascade I trained detected it in many frames where it wasn’t actually present. Granted, real occurrences were correctly marked most of the time, but the false positives still made the cascade unusable.

These aren’t the ~~droids~~ logos you’re looking for

I double-checked my process – everything seemed in order. So I tightened the training parameters: I allowed only a very, very low false-positive rate and started training. Over a week later, the cascade still wasn’t done. Stage 13 of 20… And then, a few days later, a Windows update came along, and the days of work were lost.
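For context, opencv_traincascade trains each stage until it meets a per-stage minimum hit rate (-minHitRate) and maximum false-alarm rate (-maxFalseAlarmRate). Since a window is only reported if it passes every stage, the OpenCV documentation estimates the cascade's overall rates as those per-stage values raised to the number of stages. The sketch below is a rough illustration, not the exact setup used here: it computes that estimate and builds an opencv_traincascade argument list. The file names, sample counts, and the stricter 0.3 rate are hypothetical placeholders. It also hints at why training crawls: the lower the per-stage false-alarm rate, the fewer background windows survive the earlier stages, so later stages presumably have to scan far more of them to collect their negatives.

```python
from typing import List, Tuple


def estimated_overall_rates(num_stages: int,
                            max_false_alarm_rate: float,
                            min_hit_rate: float) -> Tuple[float, float]:
    """Rough overall (false-alarm, hit) rates of a cascade.

    A window is reported only if it passes every stage, so the per-stage
    bounds multiply: rate ** num_stages.
    """
    overall_false_alarm = max_false_alarm_rate ** num_stages
    overall_hit = min_hit_rate ** num_stages
    return overall_false_alarm, overall_hit


def traincascade_args(data_dir: str, vec_file: str, bg_file: str,
                      num_pos: int, num_neg: int, num_stages: int,
                      max_false_alarm_rate: float, min_hit_rate: float,
                      width: int, height: int) -> List[str]:
    """Argument list for opencv_traincascade (e.g. to pass to subprocess.run)."""
    return [
        "opencv_traincascade",
        "-data", data_dir,          # output folder for stage*.xml and cascade.xml
        "-vec", vec_file,           # positives packed by opencv_createsamples
        "-bg", bg_file,             # text file listing negative images
        "-numPos", str(num_pos),
        "-numNeg", str(num_neg),
        "-numStages", str(num_stages),
        "-minHitRate", str(min_hit_rate),
        "-maxFalseAlarmRate", str(max_false_alarm_rate),
        "-w", str(width),
        "-h", str(height),
    ]


if __name__ == "__main__":
    # 0.5 is the tool's default per-stage rate; 0.3 is a hypothetical stricter one.
    for rate in (0.5, 0.3):
        fa, hit = estimated_overall_rates(20, rate, 0.995)
        print(f"per-stage FA {rate}: overall FA ~{fa:.2e}, overall hit ~{hit:.3f}")
    # Hypothetical paths and sample counts, 20 stages as in the run above.
    print(" ".join(traincascade_args("cascade_out", "positives.vec", "bg.txt",
                                     900, 1800, 20, 0.3, 0.995, 24, 24)))
```

Note how quickly the exponent works: even the default 0.5 per stage already implies an overall false-alarm estimate below one in a million over 20 stages, so pushing the per-stage value much lower mostly buys training time.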

I found it! … and some more… “Carrefour trademark”

In the meantime, I dived into the API. Where is the certainty of the match? It wasn’t easy to find, but eventually, with the help of Stack Overflow and the OpenCV documentation, I managed to get the value I wanted. Here’s a quick code snippet:

# With outputRejectLevels=True, detectMultiScale3 returns (objects, rejectLevels, levelWeights)
objects, reject_levels, level_weights = cascade.detectMultiScale3(image_gray, outputRejectLevels=True)
# levelWeights holds a confidence value per detection; guard against frames with no detections
detected_max = max(level_weights) if len(level_weights) else None

At least I think (and hope) this is what I was looking for; the images I generated seemed to confirm it. But in my opinion, this is a real disadvantage of the Python OpenCV API. In C++ or Java, each result lands in its own named output parameter (objects, rejectLevels, levelWeights). In Python, you get a tuple of arrays, and all you can do is hope the documentation gets the order right.
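One way to soften the positional-tuple problem is to unpack it once into a small named record. The sketch below assumes the documented return order (objects, rejectLevels, levelWeights); the Detection class, the helper names, and any threshold passed to confident() are hypothetical choices for this example, not part of OpenCV. The level weight is a raw score rather than a calibrated probability, so a sensible threshold would have to be picked by looking at scores on your own frames.

```python
from typing import Any, List, NamedTuple


class Detection(NamedTuple):
    """One cascade hit with named fields instead of positional arrays."""
    x: int
    y: int
    w: int
    h: int
    reject_level: int
    level_weight: float


def _scalar(value: Any) -> float:
    # Depending on the OpenCV version, each entry may be a plain number or a
    # 1-element row of an N x 1 array; unwrap one level if it is indexable.
    try:
        return float(value[0])
    except (TypeError, IndexError):
        return float(value)


def to_detections(result: Any) -> List[Detection]:
    """Convert the (objects, rejectLevels, levelWeights) tuple returned by
    detectMultiScale3(..., outputRejectLevels=True) into named records."""
    objects, reject_levels, level_weights = result
    detections = []
    for box, level, weight in zip(objects, reject_levels, level_weights):
        x, y, w, h = (int(v) for v in box)
        detections.append(
            Detection(x, y, w, h, int(_scalar(level)), _scalar(weight)))
    return detections


def confident(detections: List[Detection], min_weight: float) -> List[Detection]:
    """Keep only detections whose level weight reaches a chosen threshold."""
    return [d for d in detections if d.level_weight >= min_weight]


# Usage, assuming `cascade` is a loaded cv2.CascadeClassifier and `image_gray`
# a grayscale frame, as in the snippet above:
#   result = cascade.detectMultiScale3(image_gray, outputRejectLevels=True)
#   best = max(to_detections(result), key=lambda d: d.level_weight, default=None)
```

Keeping the conversion in one place means the positional indices appear exactly once, and everything downstream reads d.level_weight instead of detected[2].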

But the results weren’t great: even the true positive matches had quite low scores.

So I started thinking – what else could I have done wrong? What did I miss? And then came the realization: the data. I hadn’t thought about it at the time; I wanted to have the Haar cascade up and running ASAP, so I chose some random pictures from my hard drive as negative samples: various images I’d taken over the years of mountains, lakes, streets, people. They had nothing to do with the problem at hand.

So I prepared better data. I reduced the number of random pictures and introduced some carefully chosen images that could actually be mistaken for the logo I wanted to detect. I simply added pictures containing logos of other companies.
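opencv_traincascade reads its negatives from a plain text file, passed via -bg, that lists one image path per line, so the rebalancing described above comes down to changing what goes into that file. A minimal sketch, assuming the generic photos and the look-alike "hard negatives" (here, other companies' logos) sit in two separate folders; the folder names, the cap on generic images, and the helper names are hypothetical. To keep path resolution simple, it writes bg.txt into the current directory with paths relative to that same directory.

```python
import random
from pathlib import Path
from typing import List, Sequence

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(folder: Path) -> List[Path]:
    """All image files under a folder, in a stable order (empty if missing)."""
    return sorted(p for p in folder.rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES)


def build_negatives(generic: Sequence[Path], hard: Sequence[Path],
                    max_generic: int, seed: int = 0) -> List[Path]:
    """Keep every hard negative, plus a capped random sample of generic photos."""
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    sampled = rng.sample(list(generic), min(max_generic, len(generic)))
    result, seen = [], set()
    for path in list(hard) + sampled:
        if path not in seen:  # drop duplicates if a file is in both sets
            seen.add(path)
            result.append(path)
    return result


def write_bg_file(paths: Sequence[Path], bg_file: Path) -> None:
    """Write one path per line, the format expected by -bg."""
    bg_file.write_text("".join(f"{p.as_posix()}\n" for p in paths))


if __name__ == "__main__":
    generic = list_images(Path("negatives/generic"))       # hypothetical folder
    hard = list_images(Path("negatives/other_logos"))      # hypothetical folder
    negatives = build_negatives(generic, hard, max_generic=300)  # hypothetical cap
    if negatives:
        write_bg_file(negatives, Path("bg.txt"))
    print(f"{len(hard)} hard + {len(negatives) - len(hard)} generic negatives")
```

The design choice mirrors the text: every look-alike image is kept, while the unrelated photos are sampled down so they no longer dominate the negative set.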

Eventually, I managed to extend the training data by more than a hundred appropriate images. Training the model with the same parameters as in my first attempt took a while longer, but it still finished within 24 hours.

Now you’re probably expecting to read that the model was excellent and worked flawlessly. It didn’t. But it did work better. Significantly better. The real positive matches had better scores, and there were fewer false positives. And the cascade was deemed usable, at least as one more factor to take into account during further analysis.

tl;dr – build your training data well. Use images that could be mistaken for the object you are training your Haar cascade to detect, so that the negative samples teach it to reject exactly the look-alikes that would otherwise become false positives.

Looking for a software development company?

Work with a team that already helped dozens of market leaders. Book a discovery call to see:

How our products work
How you can save time & costs
How we’re different from other solutions

Przemysław Staniszewski

CEO of Pretius

+48 600 800 881

pstaniszewski@pretius.com



Dariusz Wawer
