Building a Tagalog Universal Dependencies Treebank

This project was commissioned by the Leipzig University Institute of Computer Science and the University of Cologne Department of Linguistics. It aims to create a Tagalog treebank annotated according to the Universal Dependencies (UD) framework. UD aims to develop cross-linguistically consistent morphosyntactic annotation in order to facilitate the development of multilingual parsers and machine translation, as well as research in linguistic typology and comparative linguistic analysis.

The Tagalog corpus contains approximately 15,000 sentences and over 300,000 tokens drawn from various sources, including online Tagalog-language newspapers.
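Sentence and token counts like these can be checked directly against a treebank file. The following is a minimal sketch, assuming the treebank is stored in CoNLL-U, the standard file format for UD treebanks (one token per line with ten tab-separated columns, `#` comment lines, and a blank line between sentences). The function name `count_conllu` and the two sample sentences are hypothetical illustrations and are not taken from the corpus. The annotation columns are left as `_` because the sketch only counts and does not reflect the project's actual annotations.

```python
def count_conllu(text):
    """Return (sentence_count, token_count) for CoNLL-U formatted text."""
    sentences = 0
    tokens = 0
    in_sentence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata line.
            continue
        token_id = line.split("\t")[0]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1"),
        # so that only syntactic words are counted.
        if "-" in token_id or "." in token_id:
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
    return sentences, tokens


# Hypothetical sample: only ID and FORM are filled in; other columns are "_".
SAMPLE = (
    "# text = Kumain ang bata.\n"
    "1\tKumain\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tang\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "3\tbata\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "4\t.\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
    "# text = Natulog siya.\n"
    "1\tNatulog\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tsiya\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "3\t.\t_\t_\t_\t_\t_\t_\t_\t_\n"
)

if __name__ == "__main__":
    n_sentences, n_tokens = count_conllu(SAMPLE)
    print(f"sentences: {n_sentences}, tokens: {n_tokens}")
```

Note that this sketch counts punctuation marks as tokens, following the CoNLL-U convention of giving each one its own line. Whether the figures above were computed the same way is not stated.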

This work is an offshoot of the following project:

  • Project Title: Information distribution and language structure – correlation of grammatical expressions of the noun/verb distinction and lexical information content in Tagalog, Indonesian and German
  • Project Heads: Dr. Gerhard Heyer (Leipzig University) and Dr. Nikolaus Himmelmann (University of Cologne)
  • Fund Source: German Research Foundation (DFG)

Local Project Members

Project Coordinator: Elsie Marie Or

  • Angelina Aquino
  • Paola Ellaine Luzon
  • Patricia Anne Asuncion
  • Jenard Tricano
  • Michael Wilson Rosero
  • Jim Bagano
  • Mary Dianne Jamindang
  • Yeddah Joy Piedad
  • Farah Cunanan
  • Calen Manzano
  • Aien Gengania
  • Prince Heinreich Omang
  • Noah Cruz
  • Leila Ysabelle Suarez
  • Orlyn Joyce Esquivel
  • Andre Magpantay

A link to the published corpus will follow.