The demand for interpretation and translation in fields such as politics, the economy, and culture keeps growing, and artificial intelligence (AI)-based automatic translation technology is developing steadily to keep pace. However, not all languages have the corpus data that is essential for building tools that produce high-quality translations.

To address this issue, the National Institute of the Korean Language (NIKL) initiated a project in August 2021 to build a Korean-foreign language parallel corpus. Eight languages were selected for this project: Vietnamese, Indonesian, Thai, Hindi, Cambodian, Tagalog, Russian, and Uzbek. The first phase of the project produced a corpus of eight million words for each language. The project aims to help improve cultural, political, and trade relations between Korea and other countries by promoting intercultural communication and reducing language barriers.

To disseminate information about the project, Dr. Jung Hee Lee (Kyung Hee University), project convener of the Korean-Foreign Language Parallel Corpus Project, and Dr. Ji Yeon Jeon (Kangwon National University) visited the Philippines from September 3 to 6.

On September 4, the delegation visited the UP Department of Linguistics and gave a lecture as part of the Talks on Asian Languages (TAL) lecture series. For this installment of TAL, Dr. Jung Hee Lee introduced the objectives, potential uses, and processes involved in building the parallel corpus. Associate Professor Aldrin Lee, who serves as a supervisor of the Korean-Filipino parallel corpus project, talked about the findings of their collaborative research, which compared negative constructions in Korean and Filipino.

The team also made courtesy visits to the UP Diliman Office of the Chancellor, the UP Diliman Sentro ng Wikang Filipino, and the Komisyon sa Wikang Filipino. They also met with linguists from various fields and academic institutions to introduce the parallel corpus project and discuss possible future collaborations on corpus-related projects.

The corpus data from phase one is now freely available to the public via the NIKL project website. Interested scholars may also contact Dr. Ji Yeon Jeon via her email (

The talks that were presented by the two Dr. Lees can be accessed on our YouTube channel.

Published by UP Department of Linguistics