Korean-Filipino Parallel Corpus Development

The Korean-Filipino Parallel Corpus Development is part of a project initiated by South Korea’s National Institute of Korean Language (국립국어원) to build a large-scale Korean-Foreign Language Parallel Corpus. Eight languages are included: (1) Vietnamese, (2) Bahasa Indonesia/Malaysia, (3) Thai, (4) Hindi, (5) Khmer, (6) Russian, (7) Uzbek, and (8) Filipino. The corpus will be used to improve AI-based natural language processing (NLP) technology, with the aim of enhancing the quality of machine translation and intercultural communication. The project also seeks to increase cultural and trade exchange between Korea and the countries of Southeast Asia and Eurasia, in line with South Korea’s New Southern and New Northern Policies.

This long-term project began in October 2021.

The parallel corpus will eventually be made available to the public.

Project Members

Publications

Lee, A. P., Jeon, J. Y., & Lee, J. H. (2023). A Parallel Corpus-Based Comparative Analysis of Korean and Tagalog Negation. Journal of Korean Language Education, 34(3), 201-237.

Lee, A. P., Bae, K. M., & Chua, M. L. C. (2023). Instructions for Korean to Filipino/Tagalog Translation [한국어-필리핀 타갈로그어 번역 세부 지침]. In Translation Guidelines for Building a Korean-Foreign Language Parallel Corpus [한국어-외국어 병렬 말뭉치 구축을 위한 번역 지침]. Hawoo Publishing.