Computer vision is one of the latest buzzwords in the IT industry – alongside other terms such as AI or machine learning. What is it? What can it do for your company? We’ll tell you all about it and, most importantly, present several computer vision use cases – based on POCs (Proofs of Concept) we’ve created for our clients.
What is computer vision?
Computer vision is a technology that allows computers to perceive visual images similarly to how human eyes do (at least in theory). While it's not a new research field – people have been exploring ways to let machines see for about 60 years – it has recently progressed in tremendous leaps.
This rapid advancement was mostly driven by developments in other areas, such as computing power and AI/machine learning. As a result, computer vision is no longer a distant possibility or a hypothetical scenario – it's a working, applicable technology that businesses already use to their advantage.
How does it work?
Modern computer vision systems typically work in several steps:
- Image acquisition – the machine must first access visual information (images, videos) captured via high-resolution digital cameras
- Data preprocessing – the next step is preparing the data so it's of the best possible quality. That includes noise reduction, edge detection, normalization, grayscale conversion, etc.
- Feature extraction – after that, the system isolates key characteristics (shapes, edges, etc.) to segment the image and identify objects and people
- Analysis and classification – the system analyzes and interprets the image based on context. For example, it might recognize patterns or infer what's happening by analyzing how the image changed over time (if it's a video or a sequence of images)
- Using the information – the machine can then make a decision and carry out predefined, automated actions based on its conclusions (for example, triggering an alarm or sending an alert)
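To make the preprocessing step more concrete, here is a minimal, dependency-free Python sketch. It assumes an image is stored as rows of (R, G, B) tuples, and the function names are hypothetical; a production system would use optimized library calls (for example OpenCV's cv2.cvtColor and cv2.Canny) instead of loops like these. The grayscale weights are the ITU-R BT.601 luma coefficients, and the edge detection is a deliberately crude neighbour difference.

```python
# Minimal, dependency-free sketch of the preprocessing step.
# Assumption: an image is a list of rows, each row a list of (R, G, B) tuples (0-255).

def to_grayscale(rgb_image):
    """Convert RGB pixels to luma using the ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def normalize(gray_image):
    """Min-max normalize pixel values into the 0.0-1.0 range."""
    flat = [p for row in gray_image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: avoid dividing by zero
        return [[0.0 for _ in row] for row in gray_image]
    return [[(p - lo) / (hi - lo) for p in row] for row in gray_image]


def horizontal_edges(image):
    """Crude edge detection: absolute difference between horizontal neighbours."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
            for row in image]


# A 2x3 test image: a black column followed by two white columns.
img = [[(0, 0, 0), (255, 255, 255), (255, 255, 255)],
       [(0, 0, 0), (255, 255, 255), (255, 255, 255)]]

edges = horizontal_edges(normalize(to_grayscale(img)))
print(edges)  # [[1.0, 0.0], [1.0, 0.0]]: strong response only where black meets white
```

Normalizing before edge detection keeps the response on a fixed 0-1 scale regardless of how bright or dark the original photo was.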
The key technologies that allow this “magic” to happen are deep learning and convolutional neural networks (CNNs).
Deep learning models are trained on large amounts of labelled visual data, which teaches them to distinguish between images or recognize objects within them. CNNs are specialized neural networks that make this possible. Instead of looking at each pixel in isolation, they slide small filters (kernels) across the image and compute a mathematical operation called a convolution at each position – hence the name. Early layers respond to simple features such as edges, while deeper layers combine them into more complex patterns such as shapes or faces. The final layers turn these features into a probability for each possible class, and the most probable one becomes the model's prediction.
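A single convolutional layer can be sketched in a few lines of plain Python. This illustrates the technique, not how any particular framework implements it: the kernel below is hand-picked to detect vertical edges, whereas a real CNN learns its kernel weights during training and stacks many such layers. The softmax at the end shows how raw class scores become probabilities; the three scores are made up.

```python
import math

# Minimal sketch of what one convolutional layer computes, in pure Python.
# The kernel is hand-picked for illustration; a real CNN learns its kernel weights.

def conv2d(image, kernel):
    """'Valid' 2D convolution as CNN libraries implement it (i.e. cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def relu(feature_map):
    """Keep positive responses and zero out negative ones."""
    return [[max(0, v) for v in row] for row in feature_map]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# 3x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]

# Kernel that responds to dark-to-bright transitions from left to right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

print(relu(conv2d(image, vertical_edge)))  # [[3, 3, 0]]: the edge lies in the first two windows

# Hypothetical final-layer scores for three classes, e.g. person / car / background.
print(softmax([2.0, 1.0, 0.1]))
```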
Computer vision use cases
So, what are the potential applications and advantages of computer vision? Almost any industry can make use of it. Here are some examples and possibilities:
- Transportation – computer vision is one of the things that make self-driving cars possible. It can also detect pedestrians, help analyze traffic flows, and monitor the condition of road and parking infrastructure
- Security – machines can monitor the premises 24/7, identify people and unwanted activities, and notify the owner or authorities
- Healthcare – computer vision can speed up disease identification and diagnosis by analyzing MRI or X-ray images against existing data
- Construction – the technology can enable predictive maintenance (detecting problems with equipment before they hurt productivity or become a safety risk). It can also be used to verify workers' right to be on a construction site and let them in
- Manufacturing – computers can read barcodes and derive information from them (to allow or enhance context-specific task automation, etc.). It’s also possible to validate products or identify defects to improve QA (quality assurance) testing
Real-world applications of computer vision based on Pretius experience
We created two POCs related to computer vision, each for a different business case. Both were made for actual clients and based on real-life problems and challenges.
A system for a construction company that scans QR codes on helmets
The first POC was a system that scans QR codes on the helmets of workers entering a construction site to identify them and allow or refuse entry. Here's how it works:
- The system can detect people, so it knows that a human is approaching
- It outlines the person (separates their silhouette from the rest of the image) and checks whether there is a QR code somewhere on them (typically on the helmet). Everyone has a unique QR code, which can be easily created using a QR code generator
- If a code is detected, the system scans it
- If the code is recognized, the machine connects to the employee system, downloads the person's data, verifies whether they're assigned to this construction site, and checks whether they can be allowed inside (there may be conditions, e.g., their occupational health and safety training expires in two days). All of this takes just a couple of seconds from the moment of the scan
- A message is displayed on the screen above the entrance, informing the person whether they can go through. In some cases – for example, when the person is not part of the usual crew but is allowed on the premises – it can also display a note explaining their reason for being there, such as “company X, audit”. If the person is refused entry, the message may also say why (though this is optional)
- The system also takes a photo of each entry event and stores it in the database for future reference, along with a log entry (if entry is refused, the log also includes the reason, as well as a link to the photo)
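The entry check described in the list above could be structured roughly like this. It's a hypothetical sketch, not the code from the POC: the Employee record, the check_entry and make_log_entry functions, and the seven-day warning window are all assumptions, and a real system would query the employee system through its own API rather than a dictionary.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

WARN_DAYS = 7  # hypothetical: warn when OHS training expires within a week


@dataclass
class Employee:
    """Hypothetical record returned by the employee system for a QR payload."""
    employee_id: str
    name: str
    assigned_sites: set
    ohs_training_expires: date
    visit_note: Optional[str] = None  # e.g. "company X, audit" for visitors


@dataclass
class EntryDecision:
    allowed: bool
    message: str  # text shown on the screen above the entrance


def check_entry(qr_payload, site_id, employees, today):
    """Decide whether the holder of a scanned QR code may enter the site."""
    employee = employees.get(qr_payload)
    if employee is None:
        return EntryDecision(False, "Entry refused: unknown QR code")
    if site_id not in employee.assigned_sites:
        return EntryDecision(False, "Entry refused: not assigned to this site")
    if employee.ohs_training_expires < today:
        return EntryDecision(False, "Entry refused: OHS training expired")
    message = f"Welcome, {employee.name}"
    days_left = (employee.ohs_training_expires - today).days
    if days_left <= WARN_DAYS:
        message += f" (OHS training expires in {days_left} days)"
    if employee.visit_note:
        message += f" [{employee.visit_note}]"
    return EntryDecision(True, message)


def make_log_entry(qr_payload, decision, photo_url, timestamp):
    """Build a log record with a photo link; refusals also store the reason."""
    entry = {"time": timestamp.isoformat(), "qr": qr_payload,
             "allowed": decision.allowed, "photo": photo_url}
    if not decision.allowed:
        entry["reason"] = decision.message
    return entry


# Usage with made-up data.
employees = {"EMP-001": Employee("EMP-001", "Anna Nowak", {"site-A"}, date(2024, 6, 3))}
decision = check_entry("EMP-001", "site-A", employees, today=date(2024, 6, 1))
print(decision.message)  # Welcome, Anna Nowak (OHS training expires in 2 days)
print(make_log_entry("EMP-999", check_entry("EMP-999", "site-A", employees, date(2024, 6, 1)),
                     "photos/0001.jpg", datetime(2024, 6, 1, 7, 30)))
```

Keeping the decision (check_entry) separate from logging (make_log_entry) makes it easy to show one message on the screen while storing a more detailed record.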
We have also added face scanning. If a user profile with a profile photo exists, then after scanning the code and confirming that the person has access, the system also scans their face to confirm that they're actually who they claim to be, and not a stranger wearing someone else's helmet. It's a good way to improve security. To handle recognition errors, a person whose face isn't recognized can approach security and show their ID. You can also use other measures, such as PIN codes, for a similar purpose (but it's nowhere near as cool!).
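One common way to implement this kind of face check is to compare face embeddings (numeric vectors produced by a face recognition model) using cosine similarity. The article doesn't say which approach the POC uses, so treat the sketch below as an assumption: the embeddings, the 0.8 threshold, and the return values are hypothetical, and the model that produces the embeddings is left out.

```python
import math

MATCH_THRESHOLD = 0.8  # hypothetical; would need tuning on real data


def cosine_similarity(a, b):
    """Similarity of two embedding vectors, from -1 (opposite) to 1 (same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_face(live_embedding, profile_embedding, threshold=MATCH_THRESHOLD):
    """Second check after a valid QR scan."""
    if profile_embedding is None:
        return "no_profile_photo"  # face check skipped; the QR result stands
    if cosine_similarity(live_embedding, profile_embedding) >= threshold:
        return "verified"
    return "refer_to_security"  # the person shows their ID to security staff


# Made-up 3-dimensional embeddings (real ones typically have hundreds of dimensions).
print(verify_face([0.9, 0.1, 0.4], [0.85, 0.15, 0.42]))  # verified
print(verify_face([0.9, 0.1, 0.4], [-0.2, 0.9, 0.1]))    # refer_to_security
```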
Backend:
- Python
- OpenCV (Open Source Computer Vision) library
- MobileNet-SSD object detection model
Frontend:
- React (JavaScript)
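Detecting that a person is approaching (the first step in the list above) is where MobileNet-SSD comes in. In OpenCV, a Caffe version of the model is typically loaded with cv2.dnn.readNetFromCaffe, a frame is converted with cv2.dnn.blobFromImage, and net.forward() returns detections whose rows hold an image ID, a class ID, a confidence, and a normalized bounding box. The sketch below covers only the post-processing of those rows, so it runs without OpenCV. It assumes the widely used 21-class PASCAL VOC model, in which "person" is class 15; the POC's actual weights and threshold aren't stated in the article.

```python
# Post-processing of MobileNet-SSD output, in pure Python so it runs without OpenCV.
# Assumption: each row has the SSD layout
# [image_id, class_id, confidence, x_min, y_min, x_max, y_max], coordinates normalized to 0-1.

PERSON_CLASS_ID = 15        # "person" in the 21-class PASCAL VOC MobileNet-SSD model
CONFIDENCE_THRESHOLD = 0.5  # hypothetical cut-off


def people_in_frame(rows, frame_width, frame_height, threshold=CONFIDENCE_THRESHOLD):
    """Keep confident 'person' detections and convert boxes to pixel coordinates."""
    people = []
    for _image_id, class_id, confidence, x1, y1, x2, y2 in rows:
        if int(class_id) != PERSON_CLASS_ID or confidence < threshold:
            continue
        box = (int(x1 * frame_width), int(y1 * frame_height),
               int(x2 * frame_width), int(y2 * frame_height))
        people.append({"confidence_pct": round(confidence * 100, 1), "box": box})
    return people


# Made-up detections: a confident person, a car (class 7), and a low-confidence person.
rows = [
    (0, 15, 0.92, 0.1, 0.2, 0.4, 0.9),
    (0, 7, 0.88, 0.5, 0.5, 0.9, 0.8),
    (0, 15, 0.30, 0.6, 0.1, 0.7, 0.5),
]
print(people_in_frame(rows, frame_width=300, frame_height=240))
```

The pixel box is what the next step would use to outline the person and search that region for a QR code.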
A platform that detects human presence on premises for a company that rents telecommunication infrastructure
Our second POC was for a platform that detects human presence at a venue to identify potentially worrying events and unwanted guests. Here's how it works:
- The system uses popular Hikvision cameras. The app integrates with them, which means the client doesn't have to replace the cameras at their venue
- The idea is to have ten or more such cameras in different places in the facility (they can be mobile)
- When the system detects something that resembles a person, it tracks it frame by frame, and the ML model estimates, as a percentage, how likely it is to be a person
- The system creates attendance logs for each such event and adds photos to them, so there's a record of what the person did
- The system administrator can define specific rules – for example, they can specify that there should be no one on the premises between 8 p.m. and 8 a.m. If the system detects a presence with a high likelihood of being a person, it can send a notification (e-mail, SMS, push) to the admin or anyone else
- Alternatively, the event can simply be stored in the logs, along with the certainty in % (we use the value from the moment that certainty was highest)
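The tracking and rule logic from the list above can be sketched as follows. This is an illustration under assumptions rather than the POC's implementation: each tracked object is represented as a list of (timestamp, certainty %) pairs, the restricted window defaults to the 8 p.m. to 8 a.m. example, the 80% alert threshold is hypothetical, and actually sending the e-mail, SMS, or push notification is left out.

```python
from datetime import datetime, time

ALERT_THRESHOLD = 80.0  # hypothetical: minimum % certainty that triggers a notification


def in_window(t, start=time(20, 0), end=time(8, 0)):
    """True if time t falls in [start, end), handling windows that wrap past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def summarize_track(track, alert_threshold=ALERT_THRESHOLD):
    """Summarize one tracked object.

    track: list of (timestamp, person_certainty_pct) pairs, one per frame.
    The log keeps the peak certainty; a notification is flagged if any frame
    inside the restricted window reached the alert threshold.
    """
    peak_time, peak_pct = max(track, key=lambda item: item[1])
    notify = any(in_window(ts.time()) and pct >= alert_threshold for ts, pct in track)
    return {
        "peak_time": peak_time.isoformat(),
        "certainty_pct": peak_pct,
        "notify": notify,
    }


# Made-up track: an object followed across three frames at 10:15 p.m.
track = [
    (datetime(2024, 5, 10, 22, 15, 0), 41.0),
    (datetime(2024, 5, 10, 22, 15, 1), 87.5),
    (datetime(2024, 5, 10, 22, 15, 2), 63.0),
]
print(summarize_track(track))
```

The only subtle part is the window that wraps past midnight: when the start time is later than the end time, a time matches if it's after the start or before the end.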
We have also integrated this POC with an MLLM (Multimodal Large Language Model) – an AI model that can understand and analyze both images and text. We send the event data to it and ask it to interpret it. The model creates a verbal description of what's in the photos, which lets the system's users run text searches later (e.g., find a guy wearing a balaclava in the logs simply by typing “balaclava”).
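Since the stack includes Ollama and LLaVA, the description step could look roughly like the sketch below, which calls Ollama's /api/generate REST endpoint (default port 11434) with a base64-encoded photo. The prompt, the function names, and the simple substring search are assumptions; the article doesn't describe how search is implemented, and a real system might use full-text or embedding-based search instead. The example usage only exercises the offline parts, so no Ollama server is needed to run it.

```python
import base64
import json
import urllib.request

# Assumption: Ollama runs locally on its default port with the "llava" model pulled.
OLLAMA_URL = "http://localhost:11434/api/generate"
PROMPT = "Describe the people in this photo, including their clothing and what they are doing."  # hypothetical


def build_describe_request(image_bytes, model="llava"):
    """Build the JSON body for a non-streaming /api/generate call with one image."""
    payload = {
        "model": model,
        "prompt": PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a stream of chunks
    }
    return json.dumps(payload).encode("utf-8")


def describe_photo(image_bytes, model="llava"):
    """Ask the multimodal model for a text description of an event photo."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_describe_request(image_bytes, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def search_logs(logs, query):
    """Naive text search: case-insensitive substring match on stored descriptions."""
    needle = query.lower()
    return [entry for entry in logs if needle in entry["description"].lower()]


# Made-up log entries whose descriptions would have been generated by describe_photo().
logs = [
    {"id": 1, "description": "A man wearing a balaclava near the fence at night."},
    {"id": 2, "description": "A technician in a yellow vest opening a cabinet."},
]
print([entry["id"] for entry in search_logs(logs, "balaclava")])  # [1]
```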
Backend:
- Python
- OpenCV (Open Source Computer Vision) library
- MobileNet-SSD object detection model
- Ollama
- LLaVA (Large Language-and-Vision Assistant)
Frontend:
- React (JavaScript)
Conclusion
As you can see, computer vision offers plenty of exciting opportunities for businesses of various shapes and sizes. The most surprising thing about it is perhaps how advanced it already is – or maybe that it’s really not that hard to set up, as long as you have a team of competent and talented IT specialists. We hope you found this article informative. If you’re interested in AI, we advise you to check out some of our other publications:
- GitHub Copilot tutorial: We’ve tested it with Java and here’s how you can do it too
- AI code review – We’ve tried OpenAI at our company, and here’s what we’ve learned
- Biscuits+ChatGPT: Using AI to generate Oracle APEX Theme Roller Styles
- AI in software testing: Can Pretius OpenAI Reviewer help you with test automation?
Need help with a computer vision project?
As you can see, Pretius has plenty of knowledge and ideas for using computer vision and other machine learning technologies creatively. If you need help with a project, we’ll be happy to assist you. Contact us at hello@pretius.com (or use the contact form below). We’ll respond within 48 hours.