Amid the growing use of artificial intelligence, a Singapore government-led initiative is developing a new large language model (LLM) trained on Southeast Asian languages and cultural norms.
Southeast Asian Languages in One Network, or SEA-LION for short, is an open-source model trained on 11 Southeast Asian languages. It aims to bridge the gap in the ever-growing global AI economy and make the technology more accessible to Southeast Asians.
However, as more countries and regions develop their own LLMs, digital and human rights experts have expressed concern that these models may represent only the dominant views expressed online. This could pose a problem in nations with authoritarian governments or strict media censorship, and in those with weak civil societies.
The issue already surfaces in SEA-LION's answers to queries about socio-political topics. When asked about Suharto, the former president of Indonesia, GPT-4 highlighted his poor human rights record in its response, while SEA-LION focused on his accomplishments.
On the other hand, relying only on Western LLMs, which are disproportionately influenced by wealthy, liberal Western democracies, can perpetuate biases related to cultural values, political beliefs, and social norms.
“We are not trying to compete with the big LLMs; we are trying to complement them, so there can be a better representation of us,” Leslie Teo, senior director for AI products at AI Singapore, told the Thomson Reuters Foundation.