Event Extraction for Portuguese: A QA-driven Approach using ACE-2005
A framework in which two separated BERT-based models were fine-tuned to identify and classify events in Portuguese documents, producing a new corpus for Portuguese event extraction and developing an automatic translation pipeline.
Abstract
Event extraction is an Information Retrieval task that commonly consists of\nidentifying the central word for the event (trigger) and the event's arguments.\nThis task has been extensively studied for English but lags behind for\nPortuguese, partly due to the lack of task-specific annotated corpora. This\npaper proposes a framework in which two separated BERT-based models were\nfine-tuned to identify and classify events in Portuguese documents. We\ndecompose this task into two sub-tasks. Firstly, we use a token classification\nmodel to detect event triggers. To extract event arguments, we train a Question\nAnswering model that queries the triggers about their corresponding event\nargument roles. Given the lack of event annotated corpora in Portuguese, we\ntranslated the original version of the ACE-2005 dataset (a reference in the\nfield) into Portuguese, producing a new corpus for Portuguese event extraction.\nTo accomplish this, we developed an automatic translation pipeline. Our\nframework obtains F1 marks of 64.4 for trigger classification and 46.7 for\nargument classification setting, thus a new state-of-the-art reference for\nthese tasks in Portuguese.\n