Segmentation: how do we classify images in computer vision?

Segmentation is one of the most important tasks in computer vision. The ability to recognise objects in what we see is automatic in all living beings with eyes. We do not need to learn it: we naturally see shapes and objects, and we can tell them apart instantly. A computer image, by contrast, is a simple matrix of dots (called pixels). The computer cannot recognise any shape in it, nor attach any meaning to it; the image is just a collection of dots, each with its associated colour, and nothing more. If we want to create a system that can interact with what it “sees”, we must teach it how to transform the collection of dots into a collection of objects and how to assign each of them a semantic label.
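To make the “matrix of dots” concrete, here is a minimal Python sketch. The tiny 2×3 image and its colour values are made up for illustration; a real image is simply a much larger grid of the same kind.

```python
# A hypothetical 2x3 image: each pixel is an (R, G, B) tuple with values 0-255.
image = [
    [(90, 90, 90), (200, 30, 30), (90, 90, 90)],
    [(90, 90, 90), (200, 30, 30), (90, 90, 90)],
]

height = len(image)
width = len(image[0])

# All the computer "knows" about the middle pixel of the top row is its colour.
red, green, blue = image[0][1]
print(f"{width}x{height} image, pixel (row 0, col 1) has colour ({red}, {green}, {blue})")
```

Nothing in this structure says “car” or “road”; that meaning has to be added by segmentation.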

It may seem that segmentation merely separates the data contained in an image or a video file, but in reality it gives meaning to the components we are interested in. The segmentation process can be divided into two steps, although it is not always necessary to perform both of them.

If we want to obtain high-quality data, we must pay close attention to each step taken in these processes.

Semantic segmentation

Semantic segmentation is the process in which the matrix of dots is scanned and a label is assigned to each dot to mark its semantic class. For example, in an image of a highway it marks two classes: the cars and the road (for example, as a first step towards counting cars).
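A semantic segmentation result can be pictured as a second matrix of the same size as the image, holding one class label per pixel. The sketch below is illustrative only: the class ids, the tiny label map, and the helper `count_pixels_per_class` are assumptions, not the output of any particular model.

```python
from collections import Counter

# Hypothetical class ids for the highway example.
ROAD, CAR = 0, 1
CLASS_NAMES = {ROAD: "road", CAR: "car"}

# Hypothetical semantic mask: one class id per pixel of a 4x6 image.
semantic_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

def count_pixels_per_class(mask):
    """Return how many pixels were assigned to each class name."""
    counts = Counter(label for row in mask for label in row)
    return {CLASS_NAMES[label]: n for label, n in counts.items()}

pixel_counts = count_pixels_per_class(semantic_mask)
print(pixel_counts)
```

Note that the mask says which pixels are “car”, but not which car each pixel belongs to.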

Instance segmentation

Instance segmentation gives each object a more specific label. In the highway example above, it marks each car individually, for instance with its own number plate, so we know not only how many cars there are but also which ones they are (for example, by recognising the plates).
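As a rough illustration of the idea, the sketch below gives every connected group of “car” pixels its own id. This is a deliberately simplified stand-in, not how a real instance segmentation model works: it assumes the cars do not touch in the mask, and the mask and the helper `label_instances` are hypothetical.

```python
from collections import deque

CAR = 1  # hypothetical class id for "car"

# Hypothetical semantic mask of a highway: 1 = car, 0 = road.
semantic_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

def label_instances(mask, target):
    """Give every 4-connected group of `target` pixels its own id (1, 2, ...)."""
    height, width = len(mask), len(mask[0])
    instances = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != target or instances[r][c]:
                continue  # not a car pixel, or already assigned to a car
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == target and not instances[ny][nx]):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id

instance_mask, car_count = label_instances(semantic_mask, CAR)
print(car_count)
```

With this simple method, a single mislabelled pixel that bridges the gap between the two cars would make it report one car instead of two.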

Depending on the problem at hand, one or both segmentations must be done properly. If pixels are assigned incorrectly during semantic segmentation, they end up in the wrong class, so the objects may be deformed (for example, two cars might be fused together and treated as one).

Let’s consider the example of a security company's system that protects property and objects:

Suppose the system is trained on badly labelled image data, i.e. the people in the pictures are not fully marked (e.g. only part of the torso or the fingers is labelled, which might well resemble the branches of a bush). The “human model” developed in the system will then differ from what it should be. Already at the segmentation stage the system will be unable to determine what it “sees”, because the data it was trained on did not give it a correct picture of a person. Learning from incorrect data will make the system either react too often, disrupting operations, or, even worse, fail to react to a real event because it does not recognise the human figure. This can result in huge financial losses for both the client and the security company.
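One common way to put a number on how incomplete a label is, is the intersection over union (IoU) between the label and the true outline. The article does not prescribe a metric, so treat this as one possible check; both masks below are invented for illustration.

```python
def intersection_over_union(mask_a, mask_b):
    """IoU of two binary masks of equal size (1 = person, 0 = background)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks agree completely, so define their IoU as 1.0.
    return intersection / union if union else 1.0

# Hypothetical full outline of a person.
full_person = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

# Hypothetical bad label: only part of the torso was marked.
partial_label = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

iou = intersection_over_union(partial_label, full_person)
print(f"IoU = {iou:.2f}")
```

A value close to 1 means the label covers the person well; a low value, as here, flags a label that would teach the system a distorted “human model”.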

As the security example shows, correct segmentation plays a key role in correctly determining what is in the image. The more accurately the data is marked, the better the system's results and the greater the potential cost savings for the customer.