Many tutorials can guide an image recognition newbie through training a Haar cascade classifier, particularly the official OpenCV tutorial, Harrison’s Haar Cascade Object Detection Face & Eye OpenCV Python Tutorial, and Creating your own Haar Cascade OpenCV Python Tutorial.
Once you’ve managed to set up the software, gathered the samples and test data, prepared all the required applications, images, and scripts, and trained your first classifier… you may discover that… it doesn’t work that well, actually.
In my case, I had a particularly severe problem with false positives. I wanted to detect a company logo, and the cascade I trained detected the logo in many frames where it wasn’t actually present. Granted, real occurrences were correctly marked most of the time, but the cascade was still unusable because of the false positives.
I double-checked my process, and everything seemed in order. So I lowered the training parameters, allowing only a very, very low false-positive rate, and started training. Over a week later, the cascade still wasn’t done: stage 13 of 20… And then, a few days later, a Windows update came along, and all those days of work were lost.
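The per-stage limits multiply across the cascade. The `opencv_traincascade` documentation estimates the overall hit rate as `minHitRate ** numStages` and the overall false alarm rate as `maxFalseAlarmRate ** numStages`. The sketch below assumes training is done with that tool, as in the tutorials mentioned above. It computes both estimates and assembles (without running) a command line. The file names and sample counts are placeholders; 20 stages, 0.995, 0.5 and a 24x24 window are the tool's defaults.

```python
from typing import List, Tuple


def estimated_overall_rates(
    min_hit_rate: float, max_false_alarm_rate: float, num_stages: int
) -> Tuple[float, float]:
    """Rough overall (hit rate, false alarm rate) of the finished cascade."""
    return min_hit_rate ** num_stages, max_false_alarm_rate ** num_stages


def traincascade_args(
    data_dir: str,
    vec_file: str,
    bg_file: str,
    num_pos: int,
    num_neg: int,
    num_stages: int = 20,
    min_hit_rate: float = 0.995,
    max_false_alarm_rate: float = 0.5,
    width: int = 24,
    height: int = 24,
) -> List[str]:
    """Build an opencv_traincascade argument list; pass it to subprocess.run(..., check=True) to train."""
    return [
        "opencv_traincascade",
        "-data", data_dir,          # output folder for stage files and cascade.xml
        "-vec", vec_file,           # positives packed by opencv_createsamples
        "-bg", bg_file,             # background (negative) description file
        "-numPos", str(num_pos),
        "-numNeg", str(num_neg),
        "-numStages", str(num_stages),
        "-minHitRate", str(min_hit_rate),
        "-maxFalseAlarmRate", str(max_false_alarm_rate),
        "-featureType", "HAAR",
        "-w", str(width),
        "-h", str(height),
    ]
```

With the defaults, the estimated overall false alarm rate is already about 0.5^20 ≈ 9.5e-7, while the overall hit rate drops to roughly 0.995^20 ≈ 0.90. Pushing the per-stage rate much lower may explain why training dragged on. Each new stage has to find negative windows that still slip past all previous stages, and with weak negatives those windows become very scarce.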
In the meantime, I dived into the API. Where was the certainty of the match? It was not easy to find, but eventually, with the help of Stack Overflow and the OpenCV documentation, I managed to get the value I wanted. Here’s a quick code snippet:
objects, reject_levels, level_weights = cascade.detectMultiScale3(image_gray, outputRejectLevels=True)
At least I think (and hope) this is what I was looking for. The images I generated seemed to confirm that. But in my opinion, this is a real disadvantage of the Python OpenCV API. In C, you’d get structs with named fields. In Java, you’d get objects with named fields. In Python, you get an array of arrays, and all you can do is hope the documentation gets it right.
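One way to soften the "array of arrays" problem is to wrap the result in a small record type right after the call. The sketch assumes that, with `outputRejectLevels=True`, the Python binding returns `(objects, rejectLevels, levelWeights)`, and it treats `levelWeights` as the match score, which is how it is commonly used. `Detection`, `to_detections` and `_unwrap` are hypothetical helpers, not part of OpenCV.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """One cascade hit with named fields instead of parallel arrays."""
    x: int
    y: int
    w: int
    h: int
    reject_level: int
    level_weight: float


def _unwrap(value: Any) -> Any:
    # Entries may arrive as scalars or as one-element rows (e.g. an N x 1 array);
    # peel off list/tuple/array wrappers until a scalar remains.
    while isinstance(value, (list, tuple)) or getattr(value, "ndim", 0) > 0:
        value = value[0]
    return value


def to_detections(result: Tuple[Sequence, Sequence, Sequence]) -> List[Detection]:
    """Convert an (objects, rejectLevels, levelWeights) tuple into Detection records."""
    objects, reject_levels, level_weights = result
    detections = []
    for box, level, weight in zip(objects, reject_levels, level_weights):
        x, y, w, h = (int(v) for v in box)  # box is [x, y, width, height]
        detections.append(
            Detection(x, y, w, h, int(_unwrap(level)), float(_unwrap(weight)))
        )
    return detections
```

With a loaded cascade, this would be called as `to_detections(cascade.detectMultiScale3(image_gray, outputRejectLevels=True))`. After that, `d.level_weight` says what it is, instead of relying on the position of an array in a tuple.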
But the results weren’t great: even the true positive matches had quite low scores.
So I started thinking: what else could I have done wrong? What did I miss? And then came the realization: the data. I hadn’t given it much thought at the time. I wanted to have the Haar cascade up and running ASAP, so I chose some random pictures from my hard drive as negative samples: various images I’d taken over the years of mountains, lakes, streets, and people. They had nothing to do with the problem at hand.
So I prepared better data. I reduced the number of random pictures and introduced some carefully chosen images that could actually be mistaken for the logo I wanted to detect. I simply added pictures containing logos of other companies.
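A negative set like this can be assembled with a short script. The sketch assumes the `-bg` background description file that `opencv_traincascade` reads, with one negative image path per line. Paths are written relative to the file's folder, as in the example layout in the OpenCV cascade-training documentation. It also assumes a hypothetical layout with the hard negatives (other companies' logos) in one folder and the unrelated photos in another. The folder names, `max_random`, and the helper names are illustrative only.

```python
import os
import random
from pathlib import Path
from typing import List

# Assumed set of image extensions to pick up; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(folder: Path) -> List[Path]:
    """All image files under folder (recursively), in a stable order."""
    return sorted(
        p for p in folder.rglob("*") if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def write_background_file(
    bg_file: Path,
    hard_negatives: Path,
    random_negatives: Path,
    max_random: int,
    seed: int = 0,
) -> List[str]:
    """Write a -bg file: every hard negative plus a capped, reproducible random sample."""
    hard = list_images(hard_negatives)
    pool = list_images(random_negatives)
    # Keep only a limited subset of the unrelated photos.
    sampled = random.Random(seed).sample(pool, min(max_random, len(pool)))
    base = bg_file.parent
    # One path per line, relative to the folder that holds bg_file.
    lines = [Path(os.path.relpath(p, base)).as_posix() for p in hard + sampled]
    bg_file.write_text("\n".join(lines) + "\n")
    return lines
```

Capping the random photos while keeping every hard negative reflects the shift described above: fewer unrelated pictures, more images that actually resemble the target.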
Eventually, I managed to extend the training data by more than a hundred appropriate images. Training with the same parameters I had used initially took a while longer, but still finished within 24 hours.
Now you’re probably expecting to read that the model was excellent and worked flawlessly. It didn’t. But it did work better. Significantly better. The true positive matches had better scores, and there were fewer false positives. The cascade was deemed usable, at least as one more factor to take into account during further analysis.
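Using the cascade as "one more factor" rather than a final verdict could look roughly like the following. Each frame's detections are reduced to their best `levelWeights` score, and that score is passed on only when it clears a threshold. `logo_signal` and `min_score` are hypothetical. Because these scores are not calibrated probabilities, a sensible threshold would have to be chosen from validation frames.

```python
from typing import Any, Iterable, Optional


def _unwrap(value: Any) -> Any:
    # Peel one-element wrappers (lists, tuples, N x 1 array rows) down to a scalar.
    while isinstance(value, (list, tuple)) or getattr(value, "ndim", 0) > 0:
        value = value[0]
    return value


def logo_signal(level_weights: Iterable[Any], min_score: float) -> Optional[float]:
    """Return the best cascade score in a frame if it reaches min_score, else None.

    The value is meant as one input to later analysis, not a yes/no decision.
    """
    scores = [float(_unwrap(w)) for w in level_weights]
    best = max(scores, default=None)
    if best is None or best < min_score:
        return None
    return best
```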
tl;dr: build your training data well. Use negative samples that could be mistaken for what you are training your Haar cascade to detect, so that the false positives it learns to reject are the meaningful ones.