GoTo Group Builds Custom LLM for Bahasa

💡 Indonesia's GoTo Group develops a proprietary large language model optimized for Bahasa Indonesia, targeting Southeast Asia's largest digital economy.

Indonesia's GoTo Group, the tech giant behind ride-hailing app Gojek and e-commerce platform Tokopedia, is building a custom large language model designed specifically to understand and generate Bahasa Indonesia. The initiative is one of the most ambitious AI projects in Southeast Asia, targeting a market of over 275 million people who remain largely underserved by Western-built foundation models.

The move positions GoTo alongside a growing wave of LLM developers outside the US, from China's Baidu and Alibaba to the UAE's Technology Innovation Institute (maker of Falcon) and France's Mistral AI. But GoTo's approach carries unique strategic significance: it represents the first major attempt by a Southeast Asian super-app to build foundational AI infrastructure tailored to the region's dominant language.

Key Facts at a Glance

  • GoTo Group is developing a proprietary LLM optimized for Bahasa Indonesia comprehension and generation
  • Indonesia is the world's 4th most populous country with 275+ million residents and over 700 regional languages
  • Bahasa Indonesia is spoken by approximately 200 million native and secondary speakers
  • Existing models like GPT-4 and Gemini support Bahasa Indonesia but often perform worse in it than they do on English-language tasks
  • GoTo operates across ride-hailing, e-commerce, payments, and financial services — generating massive volumes of Indonesian-language data
  • The custom LLM could serve GoTo's ecosystem of 100+ million monthly active users

Why Western LLMs Fall Short in Bahasa

Most leading foundation models from OpenAI, Google, Anthropic, and Meta include Bahasa Indonesia in their multilingual training data. However, the language consistently receives a fraction of the attention given to English, Chinese, or European languages.

Benchmark evaluations show that GPT-4 and similar models score 15-25% lower on reasoning tasks posed in Bahasa Indonesia than on equivalent English prompts. This performance gap stems from the disproportionate share of English-language text in these models' training corpora.

Bahasa Indonesia also presents unique linguistic challenges. The language relies heavily on affixation: prefixes, suffixes, infixes, and circumfixes that can substantially change a word's meaning. Informal Indonesian, widely used in digital communication, incorporates slang from Javanese, Sundanese, and dozens of other regional languages, adding a layer of complexity that general-purpose models struggle to capture.
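
To make the affixation point concrete, the sketch below lists several real derivations of the root "ajar" (teach) and shows that a tokenizer treating each surface form as a separate word would see nine unrelated strings. The strip_affixes helper and its affix lists are hypothetical and hand-picked for this one root; real Indonesian stemmers need many more rules and a root dictionary.

```python
# Real derivations of the Indonesian root "ajar" with rough English glosses.
DERIVATIONS = {
    "ajar": "teach (root)",
    "belajar": "to study",
    "mengajar": "to teach",
    "pelajar": "student",
    "pengajar": "teacher",
    "pelajaran": "lesson",
    "pengajaran": "teaching",
    "mempelajari": "to study something",
    "diajarkan": "is taught",
}

# Assumption: affix lists chosen by hand for this root only.
# Longer alternatives come first so they win over shorter ones.
PREFIXES = ("mempel", "meng", "peng", "bel", "pel", "di")
SUFFIXES = ("kan", "an", "i")


def strip_affixes(word: str) -> str:
    """Naively remove at most one prefix and one suffix."""
    for prefix in PREFIXES:
        if word.startswith(prefix):
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        # Keep at least three characters so short roots are not destroyed.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    return word


surface_forms = set(DERIVATIONS)
roots = {strip_affixes(word) for word in DERIVATIONS}
print(len(surface_forms), "surface forms ->", len(roots), "root:", roots)
```

A model trained on little Indonesian text has few chances to learn that all nine forms share one root, which is one plausible reason more language-specific data helps.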

GoTo's Data Advantage Fuels LLM Ambitions

GoTo sits on one of Southeast Asia's richest troves of Indonesian-language data. The company's ecosystem processes millions of daily transactions across Gojek (ride-hailing and food delivery), Tokopedia (e-commerce), and GoPay (digital payments).

This data includes customer service interactions, merchant communications, product listings, user reviews, and conversational exchanges — all predominantly in Bahasa Indonesia and its informal variants. For LLM training purposes, this represents an invaluable corpus of real-world, domain-specific language data that no Western AI lab can easily replicate.

The company reportedly employs over 100 machine learning engineers and data scientists across its AI division. GoTo has been steadily expanding its AI capabilities since its 2022 IPO on the Indonesia Stock Exchange, which raised approximately $1.1 billion.

Strategic Applications Across GoTo's Super-App

A custom Bahasa LLM could transform multiple layers of GoTo's business operations. The potential applications span virtually every vertical the company operates in.

Key use cases include:

  • Customer service automation: Handling millions of support queries in natural, culturally appropriate Bahasa Indonesia
  • Merchant tools: Helping small business owners on Tokopedia generate product descriptions, respond to buyers, and manage listings
  • Search and discovery: Improving product search relevance by understanding informal Indonesian language patterns and local slang
  • Financial services: Powering conversational interfaces for GoPay and GoTo Financial products, making digital banking accessible to less tech-savvy users
  • Driver and partner communication: Streamlining logistics coordination through AI-powered messaging and voice interfaces
  • Content moderation: Detecting harmful content, fraud, and policy violations in Bahasa at scale

Unlike relying on generic API calls to OpenAI or Google, running a proprietary model gives GoTo full control over latency, cost, data privacy, and customization. Running inference on third-party APIs at GoTo's scale, potentially billions of queries per month, would be prohibitively expensive.
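
A back-of-the-envelope calculation shows why per-token pricing matters at this volume. Every input below (query volume, tokens per query, price per million tokens) is a hypothetical placeholder chosen for illustration, not a figure from GoTo or any API provider.

```python
def monthly_api_cost(queries_per_month: int,
                     tokens_per_query: int,
                     usd_per_million_tokens: float) -> float:
    """Return the estimated monthly bill in US dollars."""
    total_tokens = queries_per_month * tokens_per_query
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs: 1 billion queries, 500 tokens each, $5 per million tokens.
cost = monthly_api_cost(1_000_000_000, 500, 5.0)
print(f"${cost:,.0f} per month")  # → $2,500,000 per month
```

The bill grows linearly with every input, so doubling either traffic or prompt length doubles the cost. A self-hosted model instead trades this variable fee for largely fixed hardware and staffing costs.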

The Economics of Building vs. Buying AI

The build-versus-buy decision is one every major tech company faces today. For GoTo, several factors tip the balance toward building in-house.

First, cost efficiency at scale matters enormously. GoTo's gross margins remain under pressure as the company pushes toward sustained profitability. Paying per-token API fees to Western providers for hundreds of millions of user interactions would erode those margins further.

Second, data sovereignty is increasingly important in Indonesia. The country's Personal Data Protection Law (PDP Law), enacted in 2022, imposes strict requirements on how companies handle Indonesian citizens' data. A locally developed and hosted model reduces cross-border data transfer risks.

Third, competitive differentiation drives the decision. If GoTo can deliver superior Bahasa AI experiences compared to rivals like Grab (which partners with OpenAI) or Sea Group's Shopee, it gains a meaningful edge in user retention and merchant satisfaction.

Training a mid-sized LLM (in the range of 7-30 billion parameters) optimized for a specific language and domain is increasingly feasible. Companies like Mistral have demonstrated that focused, efficient models can rival much larger general-purpose ones on targeted tasks. GoTo likely does not need to match GPT-4's scale, which unconfirmed estimates put at about 1.8 trillion parameters. It needs a model that excels specifically at Indonesian-language commercial interactions.
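
The difference in scale is easier to see as memory. The sketch below estimates the space needed just to hold a model's weights at common numeric precisions; it ignores activations, the KV cache, and optimizer state, so real requirements are higher. The 1,800-billion entry uses the unconfirmed estimate cited above, not a published figure.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(params_billions: float, dtype: str = "fp16") -> float:
    """Memory for the weights alone, in decimal gigabytes (10**9 bytes)."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9


for size in (7, 30, 1800):
    print(f"{size}B parameters at fp16: {weight_memory_gb(size):,.0f} GB")
```

At 16-bit precision a 7-billion-parameter model needs about 14 GB for its weights and a 30-billion-parameter model about 60 GB, so both fit on one or a few accelerators. A model of the estimated GPT-4 scale would need thousands of gigabytes.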

Southeast Asia's Emerging AI Race Heats Up

GoTo's LLM initiative reflects a broader trend across Southeast Asia. The region's tech companies are increasingly investing in homegrown AI capabilities rather than relying solely on Western infrastructure.

Grab, GoTo's primary rival, announced an expanded partnership with OpenAI in 2024 to integrate AI across its platform. Singapore's government has invested over $150 million in national AI initiatives, including the SEA-LION open-source multilingual model developed by AI Singapore.

Meanwhile, Indonesia's government has signaled strong support for domestic AI development. President Prabowo Subianto's administration has outlined plans to establish national AI infrastructure, including data centers and compute capacity. The Indonesian government allocated approximately $500 million for digital infrastructure development in its recent budget.

This regional momentum creates both opportunity and urgency for GoTo. Being first to deploy a production-grade Bahasa LLM could establish a significant moat.

Technical Challenges on the Road Ahead

Building a competitive LLM is no small feat, even for a company with GoTo's resources. Several technical hurdles remain.

Compute infrastructure is the most obvious challenge. Indonesia lacks the GPU cluster density available in the US or China. GoTo may need to partner with cloud providers like AWS, Google Cloud, or Azure for training workloads, even if inference runs locally.

Data quality presents another concern. While GoTo has vast data, much of it is noisy transactional data that requires significant cleaning, filtering, and curation before it becomes suitable for LLM training. The company will need robust data pipelines and human annotation workflows.
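
As a rough illustration of what that cleaning and filtering might involve, the sketch below normalizes whitespace, masks phone numbers and email addresses, drops very short lines, and removes case-insensitive duplicates. The regular expressions, the three-word threshold, and the sample reviews are all assumptions made for this example; nothing here describes GoTo's actual pipeline.

```python
import re

# Rough pattern for Indonesian mobile numbers (assumption, not exhaustive).
PHONE = re.compile(r"(?:\+62|0)8\d{7,11}")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def clean_corpus(lines, min_words=3):
    """Return cleaned, de-duplicated lines in their original order."""
    seen = set()
    cleaned = []
    for line in lines:
        text = " ".join(line.split())      # collapse runs of whitespace
        text = PHONE.sub("<PHONE>", text)  # mask personal data
        text = EMAIL.sub("<EMAIL>", text)
        if len(text.split()) < min_words:  # drop fragments such as "ok"
            continue
        key = text.lower()
        if key in seen:                    # skip case-insensitive duplicates
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


raw = [
    "Barangnya   bagus banget, pengiriman cepat!",
    "barangnya bagus banget, pengiriman cepat!",
    "ok",
    "Hubungi saya di 081234567890 ya kak",
]
print(clean_corpus(raw))
```

Only two of the four sample lines survive: the duplicate and the one-word fragment are dropped, and the phone number is masked. A production pipeline would add near-duplicate detection, language identification, and human review on top of rules like these.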

Talent acquisition remains competitive globally. Top ML researchers and LLM engineers can command compensation of $300,000 to $500,000 or more at major US tech companies. GoTo will need to offer compelling packages or build strong university partnerships within Indonesia to develop local talent.

Finally, evaluation and safety frameworks for Bahasa Indonesia remain underdeveloped compared with those for English. GoTo will likely need to build custom benchmarks and red-teaming processes to ensure model quality and safety.
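
A custom benchmark can start as little more than labeled examples and a scoring function. The sketch below scores a stand-in "model" on four hand-written sentiment items. The items, labels, and keyword_baseline function are hypothetical, and a real benchmark would need thousands of reviewed examples covering formal and informal registers.

```python
from typing import Callable

# Hypothetical hand-written items: (review text, expected label).
BENCHMARK = [
    ("Barangnya bagus banget, pengiriman cepat!", "positif"),
    ("Drivernya ramah dan tepat waktu.", "positif"),
    ("Paket rusak, penjual tidak merespons.", "negatif"),
    ("Gak banget deh, nyesel beli di sini.", "negatif"),  # slang-heavy
]


def accuracy(model: Callable[[str], str], items) -> float:
    """Fraction of items where the model's label matches exactly."""
    correct = sum(model(text) == label for text, label in items)
    return correct / len(items)


def keyword_baseline(text: str) -> str:
    """Stand-in for a real model: flags a few formal negative keywords."""
    negative_words = ("rusak", "kecewa", "lambat")
    lowered = text.lower()
    return "negatif" if any(word in lowered for word in negative_words) else "positif"


print(f"baseline accuracy: {accuracy(keyword_baseline, BENCHMARK):.2f}")
```

The baseline misses the last item because its complaint is expressed entirely in slang ("gak banget", "nyesel"). That is the kind of failure a Bahasa-specific benchmark should be designed to surface.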

What This Means for Developers and Businesses

If GoTo succeeds, the implications extend well beyond its own ecosystem. A high-quality Bahasa LLM could catalyze AI adoption across Indonesia's digital economy.

Small and medium businesses, which form the backbone of Tokopedia's merchant base, could gain access to AI-powered tools previously available only in English. Indonesian startups building on top of GoTo's platform could leverage the model for their own applications.

There is also the question of whether GoTo might eventually open-source its model or offer API access. Doing so would position the company as an AI platform provider, not just an app company — a strategic shift that could unlock entirely new revenue streams.

For Western AI companies watching from the sidelines, GoTo's initiative underscores an important reality: the next wave of LLM innovation will not come exclusively from Silicon Valley. Local language, local data, and local context matter enormously in markets that collectively represent billions of potential users.

Looking Ahead: Timeline and Market Impact

GoTo has not disclosed a specific launch timeline for its LLM. Industry observers expect an initial deployment within the company's own products before any external availability, likely within the next 12-18 months.

The success of this initiative could influence similar efforts across emerging markets. Companies in Brazil, India, Nigeria, and other large markets where most people's first language is not English face similar performance gaps with Western LLMs.

If GoTo delivers a model that demonstrably outperforms GPT-4 and Gemini on Bahasa-specific tasks while maintaining lower operational costs, it could become a blueprint for how regional tech champions approach AI development. The era of one-size-fits-all foundation models may be giving way to a more fragmented but ultimately more effective landscape of specialized, language-optimized AI systems.