The spread of artificial intelligence (AI) presents major challenges, not only in terms of technology but also in terms of language, transparency, ethics, and digital sovereignty. In AI language processing, for example, some of the most significant challenges concern ethical considerations, the quality of data collection (which can be biased), and the difficulties of translation[1].
The ALIA project (Artificial Intelligence Language Infrastructure of Spain) responds directly to these issues. It is a pioneering EU public initiative that provides open, multilingual AI infrastructure to promote Spanish and its co-official languages (Catalan, Valencian, Basque, and Galician) in global AI development. By reinforcing Europe’s technological sovereignty and ensuring transparency, linguistic diversity, and cultural representation, ALIA empowers communities to maintain control over their digital presence.

ALIA belongs to the family of Large Language Models (LLMs): advanced AI systems based on the transformer architecture and capable of processing and generating human-like text. Trained on extensive datasets, LLMs excel across a variety of natural language processing tasks, including translation, text generation, and summarization[2]. Unlike most LLMs, which are developed by private U.S. companies (e.g., GPT, Claude, Gemini), ALIA is public, open-source, and led by the Spanish government, making it unique in Europe.
ALIA originated in 2019 under the Language Technologies Plan and is coordinated by the Barcelona Supercomputing Center, known in Spanish as the Centro Nacional de Supercomputación (BSC-CNS). BSC-CNS manages MareNostrum 5[3], one of the world’s most powerful supercomputers, which is used to train the ALIA models[4]. Today, ALIA plays a key role in the development of large-scale language models within the EU framework[5].
ALIA falls within the scope of Spain’s Artificial Intelligence Strategy 2024, which aims to provide the capabilities needed to meet the growing demand for AI products and services across SMEs and the public sector. Moreover, it responds to the EU’s Digital Decade program, which guides Europe’s digital transformation and technological sovereignty[6]. The ALIA family of models is verified by the Spanish Artificial Intelligence Supervisory Agency (AESIA) and complies with the transparency and ethical standards established by the EU AI Act.
This pioneering initiative focuses on generating models and corpora for a public infrastructure of language models. Fully publicly funded[7], in part by the European Union, it provides advanced resources to public administrations, companies, and universities while guaranteeing universal access for society[8]. ALIA can therefore be described as an open and collaborative project, characterized by its commitment to transparency, social responsibility, and inclusiveness.
According to Mateo Valero, Director of BSC, ALIA works with texts in over 35 European languages, with 20% of its linguistic data dedicated to Spain’s co-official languages. On this basis, it is presented as the AI system that best reflects Europe’s multilingual and multicultural identity[9].

Ultimately, ALIA is a concrete example of how Europe can develop AI systems that are open, ethical, and aligned with regional values and linguistic diversity[10], strengthening both digital sovereignty and social inclusion. Its open and multilingual AI infrastructure can also be directly leveraged in vocational education and training (VET), providing access to language models, datasets, and translation tools that support the upskilling of educators and learners, and enabling SMEs and training institutions to adopt AI responsibly and inclusively.
[1] The Limitations of AI Language Processing, https://waywithwords.net/resource/ai-language-processing-key-limitations/.
[2] Romero-Arjona, Miguel, et al. “Red teaming contemporary AI models: Insights from Spanish and Basque perspectives.” arXiv preprint arXiv:2503.10192 (2025).
[3] Barcelona Supercomputing Center (BSC), MareNostrum 5 Technical information, https://www.bsc.es/marenostrum/marenostrum-5.
[4] Inés Modrón Lecue, Así es ALIA, la inteligencia artificial española desarrollada por el Gobierno [This is ALIA, the Spanish artificial intelligence developed by the Government], 20.01.2025, https://www.rtve.es/noticias/20250120/alia-familia-modelos-inteligencia-artificial-pedro-sanchez/16414386.shtml.
[5] Barcelona Supercomputing Center (BSC), ALIA, Europe’s first public, open and multilingual AI infrastructure, 21 January 2025, https://www.bsc.es/news/bsc-news/alia-europes-first-public-open-and-multilingual-ai-infrastructure.
[6] ALIA website, https://alia.gob.es/.
[7] ALIA website, https://alia.gob.es/.
[8] Romero-Arjona, Miguel, et al. “Red teaming contemporary AI models: Insights from Spanish and Basque perspectives.” arXiv preprint arXiv:2503.10192 (2025).
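[9] Barcelona Supercomputing Center (BSC), ALIA, Europe’s first public, open and multilingual AI infrastructure, 21 January 2025, https://www.bsc.es/news/bsc-news/alia-europes-first-public-open-and-multilingual-ai-infrastructure.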
[10] Ibidem.

