Experimental results on PASCAL VOC show that the copy-paste method can improve the performance of deep learning models in vision tasks such as object detection, provided that training examples are generated in a reasonable way by annotating ground truth on free space according to the placement rules.
Data augmentation is an important technique for improving the performance of deep learning models in many vision tasks such as object detection. Recently, several works have proposed the copy-paste method, which augments a training dataset by copying foreground objects and pasting them onto background images. By designing a learning-based context model to predict realistic placement regions, these approaches have proved more effective than traditional data augmentation methods. However, the performance of the existing context model is limited by three problems: (1) the definitions of positive and negative samples generate too much label noise; (2) the examples with masked regions lose a lot of context information; (3) the sizes (i.e., scales and aspect ratios) of predicted regions are sampled from a prior shape distribution, which leads to a coarse estimation. In this work, we first explore the placement rules that generate realistic and effective training examples for detectors. We then propose a trainable context model that finds proper placement regions by classifying and refining dense prior default boxes. We also design a corresponding, reasonable way of generating training examples for this model by annotating ground truth on free space according to the placement rules. Experimental results on PASCAL VOC show that our approach outperforms the state-of-the-art related work.

© 2021 The Authors. Published by Atlantis Press B.V. This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
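For readers unfamiliar with the basic copy-paste operation referred to above, the following minimal sketch may help. It is not the authors' implementation: the function name `paste_object`, the list-based grayscale image, and the hand-picked placement `(x, y)` are all assumptions made for illustration. In the proposed approach the placement region would instead be predicted by the context model.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def paste_object(background: Image, patch: Image, mask: List[List[int]],
                 x: int, y: int) -> Tuple[Image, Box]:
    """Paste `patch` onto a copy of `background` with top-left corner (x, y).

    Hypothetical helper: only pixels where `mask` is non-zero are copied,
    so the background stays visible around the object's silhouette.
    Returns the augmented image and the bounding box of the pasted object,
    which serves as its new ground-truth annotation.
    """
    h, w = len(patch), len(patch[0])
    if y < 0 or x < 0 or y + h > len(background) or x + w > len(background[0]):
        raise ValueError("placement region falls outside the background")

    out = [row[:] for row in background]  # leave the input image untouched
    for dy in range(h):
        for dx in range(w):
            if mask[dy][dx]:
                out[y + dy][x + dx] = patch[dy][dx]
    return out, (x, y, x + w, y + h)


# Toy example: a 2x2 object pasted into an empty 4x6 background.
background = [[0] * 6 for _ in range(4)]
patch = [[9, 9], [9, 9]]
mask = [[1, 0], [1, 1]]  # top-right pixel of the patch is not part of the object
augmented, box = paste_object(background, patch, mask, x=3, y=1)
```

The placement here is chosen by hand; the quality of the augmented example depends on where and at what size the object is pasted, which is the problem the context model addresses.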