Data Labeling Best Practices – WaiTalk 12/15/2020
During the #WaiTALK event, organized jointly by #Women in AI and the #DataFreaks community, representatives of WEimpact.Ai also had the pleasure to speak: Tom Horecki (CEO) – a fan of freestyle dance, skiing and, recently, picking up trash – and Aneta Wróbel (AI Project Manager), who loves discovering new tastes and constantly works on personal development, especially in the field of HR.
The event was devoted entirely to the broad topic of data, which in recent years has had a huge impact on the development of artificial intelligence. The invited guests shared their knowledge and practical solutions for using data across various industries and fields.
Tom and Aneta talked about the experience with labeling image data that they gained while running and developing the DataLabeling.EU brand. Working with teams that build machine-learning solutions, they have gained – and continue to deepen – their knowledge in this field, in line with the WEimpact.Ai idea of lifelong learning.
I am pleased to invite you to watch the recording of Aneta and Tom's talk at the virtual #WaiTALK event, which took place on December 15, 2020. Enjoy!
We warmly welcome you!
Thank you for the opportunity to be here with you and share our experiences. We represent the WEimpact.Ai company. With me is the president – Tom Horecki.
My name is Aneta Wróbel and my main role is to coordinate the projects we have the pleasure to tell you about.
Moderator – Grzegorz Górny:
You can continue, Aneta, there’s another slide.
We will talk about the projects we deal with.
The three main projects we want to present are: a subtitling and transcription project based on an ASR system (a speech recognition model); chatbot testing; and labeling cartographic data for an image recognition model. Here we have the subtitling and ASR transcription project visualized. The process is as follows: we receive sound files from the client.
These files are loaded into our proprietary application. The ASR produces the text, and a machine-learning model inserts the punctuation – but, like any machine, it is not perfect. It cannot handle all the exceptions found in our difficult Polish language, so human correction is needed. After the text is corrected, the client receives the results.

The key is selecting a team based on competences. The desirable competences for this project are an impeccable command of the Polish language – in particular, knowledge of the rules of punctuation – and the ability to accurately pick up the content of a recording, even when it is not of the best quality. The transcriber should be able to spot the smallest errors in a thicket of text, such as incorrect case inflection, and – last but not least – should google any unfamiliar expressions in order to write them correctly. I mean here, for example, foreign borrowings or specialized industry vocabulary, such as business English. The ability to type quickly is also valuable, as it increases the efficiency of the whole process.

A good practice in transcription is software optimization, which allows you, for example, to automatically separate the interlocutors in a conversation. Another thing to keep in mind is omitting repetitions and filler words like "yhym", "ah", "oh" – words that contribute nothing substantive to the text being transcribed. In speech, our statements are not always syntactically correct, and we do not always signal where a sentence ends.
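The rule of omitting fillers and immediate repetitions can be sketched in code. This is a minimal illustration only – the filler list and the simple whitespace tokenization are assumptions, and the team's proprietary application certainly does much more:

```python
# Hypothetical filler list; the real one is project- and language-specific.
FILLERS = {"yhym", "ah", "oh", "mhm", "eee"}

def clean_transcript(raw: str) -> str:
    """Drop filler words and immediate word repetitions from an ASR transcript."""
    cleaned = []
    for w in raw.split():
        bare = w.strip(".,!?").lower()
        if bare in FILLERS:
            continue  # non-substantive filler: skip it
        if cleaned and bare == cleaned[-1].strip(".,!?").lower():
            continue  # immediate repetition of the previous word: skip it
        cleaned.append(w)
    return " ".join(cleaned)

print(clean_transcript("yhym so so the the meeting, ah, starts at noon"))
# → so the meeting, starts at noon
```

A human corrector still reviews the result – as the talk stresses, rules like these only cover the repetitive cases, not the ambiguous ones.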
Therefore, the transcriber's job is also to break long statements into shorter sentences – that is, to place full stops according to sense.

Another project we had the opportunity to work on was chatbot testing. Effective chatbot testing is based on conversations following designed scenarios and on checking how well the bot classifies the phrases entered by the tester. The tester's key task is to conduct a conversation containing as many language nuances as possible, forcing the bot to "think" about how to classify a given phrase. This can mean, for example, paraphrasing, deliberately making a typo, or breaking one word in two by inserting a space. The tester must not forget to check whether the bot understands the context correctly. Imagine typing this phrase to a bot: "Great! I've been waiting for this for six months now, and nothing." As you can guess, the bot will have a hard time classifying this sentence as negative, and there is a high risk of it falling into the wrong conversational context.

Another problem for the bot is jumping from context to context. This happens when the tester asks about more than one service offered by the company. The bot must then be able to switch seamlessly between scenarios; at the same time, this checks the bot's ability to return to a given scenario context after a phrase from a different context or scenario has been entered. Variable values – the various parameters specific to a given service – can also cause the bot trouble. An interesting example of a variable value, specific to our Polish culture, is asking the bot about a store's opening days. While testing a bot for one large retail chain, we diagnosed that it did not understand the question: "Is the store open on Pentecost?"

The last project we would like to talk about is labeling image data – namely, cartographic data.
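Two of the perturbations described above – breaking a word in two with a space, and a deliberate typo – are easy to generate automatically. A minimal sketch with hypothetical helper names (the talk does not describe the actual test harness):

```python
def split_word(phrase: str, index: int) -> str:
    """Break the word at `index` in two by inserting a space, as a tester might."""
    words = phrase.split()
    w = words[index]
    if len(w) > 1:
        mid = len(w) // 2
        words[index] = w[:mid] + " " + w[mid:]
    return " ".join(words)

def swap_typo(phrase: str, index: int) -> str:
    """Introduce a typo by swapping the 2nd and 3rd characters of one word."""
    words = phrase.split()
    w = words[index]
    if len(w) > 2:
        words[index] = w[0] + w[2] + w[1] + w[3:]
    return " ".join(words)

print(split_word("is the store open", 2))  # → is the st ore open
print(swap_typo("is the store open", 2))   # → is the sotre open
```

Feeding both variants to the bot and comparing its intent classifications would reveal whether it is robust to this kind of noise; paraphrasing, the third trick mentioned, still needs a human tester.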
Here, as we can see in the attached screenshots, the project consists in marking architectural objects on maps and satellite photos. We can see that buildings are marked – and not just the roof, but the walls as well.
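For illustration, a building footprint like the ones in the screenshots might be stored as a closed polygon. This is a COCO-style sketch under assumed coordinates – the talk does not specify the tooling or format actually used:

```python
import json

# Hypothetical annotation record: one building footprint on a satellite tile,
# stored as a closed polygon in a COCO-style segmentation field.
annotation = {
    "image_id": 1,
    "category": "building",
    # [x1, y1, x2, y2, ...] pixel coordinates, clockwise from the top-left
    # corner, which is where the talk recommends starting the marking.
    "segmentation": [[120, 80, 220, 80, 220, 160, 120, 160]],
}

print(json.dumps(annotation))
```

Storing the full wall outline rather than just the roof, as described above, is what makes the polygon usable as a ground-truth footprint.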
Hello, hello, can you hear me now?
Yes, we can hear you. I give you the floor.
Finally, I managed to join – and at the right moment, because labeling cartographic data is a very exciting case, or actually several cases. You can build building-infrastructure reports, and, interestingly, the insurance industry also benefits from this. From photos – for example from Street View – you can estimate the condition of a given building's facade. Or, from an address and a map, you can determine whether a person who wants to insure a car lives in the countryside or in a city where the road network is very dense, which correlates strongly with more accidents; the insurance can then be adjusted accordingly. One more impressive case is the prediction of armed conflicts. Thanks to tagging photos of crops, it is possible to estimate how well the crops have grown in a given season in a given region of the world. If a potential famine is detected – a major factor in armed conflicts, which tend to break out a few months later – this is vital information that can be acted on, and many lives saved.
In this project, when selecting a team, we focus on competences related to perceptiveness, efficient mouse control, and spatial-planning skills. It is good practice to start marking an object from its upper-left corner. Interest in areas such as architecture, geography, geodesy, and construction is also welcome.
And finally, we will sum up with the key practices we have built internally over this year of operation; they address all the problems discussed above.

First, we match people to projects by competence. For example, with the cartographic data Anetka mentioned, it is worth someone having a background – or at least a school-time interest – in geography, or experience in construction, architecture, or geodesy. We run the group gathered this way through a test, created for the given project, to check how these people think, and then we admit the qualified group to a trial stage. From that trial stage we can finally select a mini-crowd that works for months or years on improving the ML model and preparing training data.

In the first days, the key is to have the annotators' work checked by the person substantively responsible for the quality of the data going into the model – that is, to verify whether we can treat this data as ground truth or not. The basis for this is a very transparent instruction that addresses the vast majority of highly repetitive problems, but also does not omit the ambiguous cases that even the professional building the model has to think about for a while. This is worth paying attention to, especially if the error rate in the training data has to be very low.

It is also good practice to simply cross-check the annotators' work: several people do the same job, and only after considering how uniform their results were can the model be adjusted accordingly.
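The cross-checking described above – several annotators labeling the same item and comparing how uniform the results are – can be sketched as a simple majority vote. This is a hypothetical helper, not the team's actual method; real projects often use agreement statistics such as Cohen's kappa on top of this:

```python
from collections import Counter

def majority_label(labels):
    """Resolve one item's label by majority vote and report the agreement rate."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(labels)  # fraction of annotators who agreed
    return label, agreement

# Three annotators labeled the same object; two of them said "building".
label, agreement = majority_label(["building", "building", "road"])
print(label, round(agreement, 2))  # → building 0.67
```

Items with low agreement are exactly the ambiguous cases the instruction should cover – they can be routed back to the person responsible for the ground truth.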
Dear friends, what we have presented is only a taste of what has been part of our lives for the past year. Thank you for your attention. If you want to know more, we would love to talk about good data labeling.
We will also share the presentation, because unfortunately it was not fully displayed.
So thank you.
Moderator – Grzegorz Górny:
Thank you. We're here – thank you. There is one question for you in the chat, but perhaps we will ask you to answer it offline, because we have a question of our own for Aneta.
Aneta, tell us: what is it like to start a presentation prepared for two voices at the moment your partner loses his voice to a technical gremlin? Chapeau bas – you really found your feet in that situation.
Yes, a very, very difficult challenge when you were prepared only for your own parts.
Moderator – Grzegorz Górny:
Tomek and Aneta are still with us and will answer your questions in the chat.
Moderator – Grzegorz Górny
CEO WEimpact.Ai – Tom Horecki
AI Project Manager WEimpact.Ai – Aneta Wróbel