The “Balsu talka” project will soon have a year anniversary / Article

To participate in “Balsu talka”, go to the project’s homepage, which can be opened on both a computer and a phone. There you will find a “Speak” section that offers specific phrases to read aloud. Click the microphone, read the phrase, and the recording is donated to the project. The phrases offered for reading are very diverse:

Texts spoken by participants of “Balsu talka” include: “What do we do with such an initiative of the legislators?”, “My child is better at filming with a mobile phone”, “That will be a job”, “For once in your damn life, think with your head!”, “In the Dundaga area, but especially in Ģipka, there have been great magicians”.

The texts have been read by people of different ages, some expressively and some less so, with and without an accent. This is exactly what is needed: the more variety there is, the better digital tools will be able to understand Latvian and communicate in it fully and accurately.

“For these tools, whether a phone, a refrigerator or a car, to be able to talk to us in the near future, we need a dataset in which many people have read many different texts.

If such a dataset exists, a model can be built on it that is able to recognize spoken language and convert it into text. Computers, in turn, can already understand text, process it as commands and act on it,” said Pēteris Jurčenko, board member of the Latvian Open Technologies Association.

May 4 will mark one year since the “Balsu talka” initiative started, and the public response has been broad, digital humanities researcher Sanita Reinsone said:

“We started with 18 hours of recorded speech, and now the public has recorded 205 hours. That’s a lot; we are already approaching the bigger languages that have far more resources and far more speakers. But researchers have still found that we need even more voices. For now, we have set a goal of 300 hours, but of course we hope for more.”

“Balsu talka” is connected to the “Mozilla Common Voice” platform, which collects recordings of texts read aloud by people in many languages, and Latvian ranks quite high among them.

“At the moment, Latvian is in 11th place on the Mozilla Common Voice platform by the absolute number of participants. We really have had a lot of people take part, so we rank quite high. And if we look, for example, at the number of participants relative to the number of Latvian speakers, we are in the top five,” Jurčenko said.

Residents are still encouraged to take an active part in “Balsu talka” so that the total recorded time reaches at least 300 hours. In parallel, the voice recordings already submitted are being analyzed, and the first version of a speech recognition and transcription model is being trained. The results will be published by May 4 and will be freely available on the website “balsutalka.lv” to anyone developing solutions based on speech technologies.

Language researchers, meanwhile, will have the most convenient access to the data through the National Corpus Collection “korpuss.lv”, where anyone will be able to explore it and analyze it from a linguistic perspective.

The original article is in Latvian.
