This work builds an abstractive text summarizer for German text using the state-of-the-art “Transformer” model and proposes an iterative data augmentation approach that supplements real German summarization data with synthetic data.
Text summarization is considered one of the challenging tasks in the NLP community. Multilingual summarization datasets are rare and difficult to construct. In this work, we build an abstractive text summarizer for German text using the state-of-the-art “Transformer” model. We propose an iterative data augmentation approach that uses synthetic data alongside real German summarization data. To generate the synthetic data, we exploit German Common Crawl data, which cover different domains. The synthetic data were found to be effective under low-resource conditions and particularly helpful in our multilingual scenario, where the availability of summarization data remains a challenge.