Building a Tagalog Universal Dependencies Treebank
This project was commissioned by the Leipzig University Institute of Computer Science and the University of Cologne Department of Linguistics. It aims to create a Tagalog treebank annotated according to the Universal Dependencies (UD) framework. UD aims to provide cross-linguistically consistent morphosyntactic annotation in order to facilitate the development of multilingual parsers and machine translation, as well as research in linguistic typology and comparative linguistic analysis.
The Tagalog corpus contains approximately 15,000 sentences and over 300,000 tokens drawn from various sources, including online Tagalog-language newspapers.
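UD treebanks are normally distributed in the CoNLL-U format, so sentence and token counts like the ones above can be checked with a short script. The following is a minimal sketch under stated assumptions, not the project's own tooling: it assumes the corpus will be released as CoNLL-U, the function name `count_conllu` is hypothetical, and the sample sentence is a hand-written toy example that is not taken from the corpus (its analysis is only one plausible annotation).

```python
def count_conllu(text):
    """Return (sentence_count, word_count) for a CoNLL-U string.

    Sentences are separated by blank lines, comment lines start with '#',
    and each token line has 10 tab-separated fields. Only lines with a
    plain integer ID are counted as words; multiword-token ranges (e.g.
    '1-2') and empty nodes (e.g. '1.1') are skipped.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as '# text = ...'
        fields = line.split("\t")
        if len(fields) != 10:
            raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
        if fields[0].isdigit():
            words += 1
        in_sentence = True
    if in_sentence:
        # Input ended without a trailing blank line.
        sentences += 1
    return sentences, words


# Toy example (assumption: illustrative only, not from the project corpus).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
ROWS = [
    ("1", "Kumain", "kain", "VERB", "_", "_", "0", "root", "_", "_"),
    ("2", "ang", "ang", "ADP", "_", "_", "3", "case", "_", "_"),
    ("3", "bata", "bata", "NOUN", "_", "_", "1", "nsubj", "_", "SpaceAfter=No"),
    ("4", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"),
]
SAMPLE = (
    "# text = Kumain ang bata.\n"
    + "\n".join("\t".join(row) for row in ROWS)
    + "\n\n"
)

if __name__ == "__main__":
    print(count_conllu(SAMPLE))
```

Counting only integer IDs follows the UD convention that syntactic words, rather than surface multiword tokens, are the units of annotation. Whether the token figure quoted above was computed the same way is not stated here.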
This work is an offshoot of the following project:
- Project Title: Information distribution and language structure – correlation of grammatical expressions of the noun/verb distinction and lexical information content in Tagalog, Indonesian and German
- Project Heads: Dr. Gerhard Heyer (Leipzig University) and Dr. Nikolaus Himmelmann (University of Cologne)
- Fund Source: German Research Foundation (DFG)
Local Project Members
Project Coordinator: Elsie Marie Or
Link to the published corpus to follow.