I wrote this article while I was a software engineer at Bahasa Kita, as Indonesia was preparing for its 2019 presidential election. To educate voters about their choices, the KPU (General Elections Commission) held a series of debates. There were five debates, each covering a different set of topics:
- Law, Human Rights, Corruption, and Terrorism (17 January 2019 at Hotel Bidakara)
- Energy and Food, Natural Resources and the Environment, and Infrastructure (17 February 2019 at Hotel Sultan)
- Education, Health, Employment, and Social and Cultural Affairs (17 March 2019 at Hotel Sultan)
- Ideology, Government, Defense and Security, and International Relations (30 March 2019)
- Social Economy and Welfare, Finance and Investment, and Trade and Industry
As a machine learning startup focused on voice, we at Bahasa Kita wanted to contribute to the election through voice technology. While looking for ideas, we remembered that our founder had once used voice technology to help deaf people, so we decided to do the same for the presidential debates. The need was still real: on YouTube videos that had no captions, we found many comments from people with disabilities hoping for technology like this. To our surprise, the idea also turned out to be useful to hearing viewers, who sometimes miss part of a live debate and want to go back and dig deeper into what a candidate said. In short, we proposed to transcribe the debates with our automatic speech recognition service and publish the transcript quickly, while each debate was still going on.
We did not have much time to implement the idea, because we only started discussing how to contribute eight hours before the first debate. During that discussion another thought came up: showing only the verbatim transcript felt incomplete. We looked for references from the previous election and found an analytics company that had analyzed the debate transcripts. After studying their work, we concluded that we could adopt the same kind of analysis because it is neutral. Neutrality mattered to us, as we were aware that in a political year anything can be assumed to be partisan. Interestingly, just hours or days after our website (debatcapres.bahasakita.co.id) went online, one of the candidates' parties contacted us and asked to take over the website for its own purposes. It would surely have meant a large amount of money. But we chose to decline the offer and to keep providing the presidential debate transcripts independently and for free.
Actually, calling it analytics is too much, so I prefer to call it a summary. The summary only computes word statistics, without any prior knowledge of politics, economics, or other domains. It initially came in three forms: the total word count, the topic-specific word count, and a word cloud. The total word count is simply the number of words uttered by each candidate. The topic-specific word count, in contrast, is tied to the debate topics. Each topic has a set of words associated with it, and we count how often each candidate uses the words in that set. Finally, the word cloud is a self-explanatory visualization in which words are arranged in a cloud-like shape.
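To make the counting concrete, here is a minimal sketch in Python of the first two summaries, plus the per-word frequencies that a word cloud could be drawn from. It is an illustration under assumptions and is not taken from the production website: the `TOPIC_WORDS` table, the `tokenize` and `summarize` functions, and the sample transcript are all hypothetical, and the sample uses English sentences only for readability.

```python
import re
from collections import Counter

# Hypothetical topic vocabulary: each topic maps to the words associated with it.
TOPIC_WORDS = {
    "law": {"law", "court", "corruption"},
    "economy": {"economy", "investment", "trade"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    Assumption: a plain a-z tokenizer is good enough for this illustration.
    """
    return re.findall(r"[a-z]+", text.lower())


def summarize(transcript):
    """Return total, per-topic, and per-word counts for each speaker.

    `transcript` is a list of (speaker, utterance) pairs.
    """
    summary = {}
    for speaker, utterance in transcript:
        stats = summary.setdefault(
            speaker, {"total": 0, "topics": Counter(), "words": Counter()}
        )
        tokens = tokenize(utterance)
        stats["total"] += len(tokens)   # total word count
        stats["words"].update(tokens)   # frequencies a word cloud could use
        for topic, vocabulary in TOPIC_WORDS.items():
            # topic-specific word count: tokens that belong to the topic's word set
            stats["topics"][topic] += sum(1 for t in tokens if t in vocabulary)
    return summary


# Made-up sample data, not real debate utterances.
transcript = [
    ("Candidate A", "Corruption must be handled by the court."),
    ("Candidate B", "Investment and trade drive the economy."),
    ("Candidate A", "The law applies to everyone."),
]
result = summarize(transcript)
```

Nothing in this sketch interprets what a candidate means. It only counts words, which is why we considered this kind of summary safe to publish.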
If we look at the state of the art in Natural Language Processing (NLP) and Natural Language Understanding (NLU), there are many more kinds of analysis that we could do and that we are able to implement. For example, we could determine whether an utterance leans toward one opinion or another. We could check the candidates' statements against news available on the internet and display the relevant facts. At the most extreme, we could infer a personality profile, with a description, from a given speech. All of this can be done with the kind of artificial intelligence we work on every day. But once again, releasing such information to the public is too risky, because judging the quality of that analysis is too subjective. So we stayed with the summary I just described and left further interpretation to the reader.
Later we were told that journalists were using our website to research and verify statements made during the debates. We were quite happy that our website was useful to others.