Parallel Reverse Treebanks for the Discovery of Morpho-Syntactic Markings
This paper describes a corpus of syntactic structures and associated sentences that is not a traditional treebank: the syntactic structures are created first and are then associated with sentences in a human language. We therefore call it a reverse treebank (RTB). The RTB was created for eliciting sentences in low-resource languages. First, a corpus of feature structures is created using a tool suite built by the authors. Second, sentences are added in a widely spoken language such as English or Spanish to express the meaning of each feature structure; we call this language the Elicitation Language. Third, a bilingual informant translates the sentences into a low-resource language. Using an elicitation tool, the informant can also graphically align the words of each Elicitation Language sentence with the words of its translation. The result is a high-quality, parallel, word-aligned corpus annotated with feature structures, which we call a parallel RTB.
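To make the three steps concrete, the following Python sketch shows one way a single parallel RTB record could be represented. It is an illustration only: the class name `RTBEntry`, its field names, and the feature names are assumptions, not the format used by the authors' tool suite, and Spanish stands in for the low-resource language.

```python
from dataclasses import dataclass, field


@dataclass
class RTBEntry:
    """One parallel RTB record (hypothetical layout, not the authors' format)."""

    # Step 1: the feature structure is created first.
    features: dict[str, str]
    # Step 2: tokens of a sentence in the Elicitation Language.
    elicitation: list[str]
    # Step 3: tokens of the informant's translation.
    translation: list[str] = field(default_factory=list)
    # Word alignment as (elicitation index, translation index) pairs.
    alignment: set[tuple[int, int]] = field(default_factory=set)

    def aligned_pairs(self) -> list[tuple[str, str]]:
        """Return aligned word pairs, ordered by elicitation position."""
        return [
            (self.elicitation[i], self.translation[j])
            for i, j in sorted(self.alignment)
        ]


# Illustrative record; the feature names are assumed for this example.
entry = RTBEntry(
    features={"subject-number": "plural", "tense": "present"},
    elicitation=["the", "dogs", "sleep"],
    translation=["los", "perros", "duermen"],
    alignment={(0, 0), (1, 1), (2, 2)},
)

pairs = entry.aligned_pairs()
```

Because every record keeps its feature structure next to the aligned words, records that differ in a single feature value can be compared to see which target-language words change with that feature.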