• {PL+NL} Comprehension: Code search (SEARCH), sentiment analysis (SENTI), code/repo topic modelling (TOPIC), type inference (SEM), semantic/syntax defect detection (BUGLOC)

Since around 2015, researchers from the NLP and SE communities have started working together closely, producing a line of exploratory work in this area. Two flavors of NLP techniques are being applied to these problems: (1) training-light, or traditional, NLP: semantic parsing (SemPar) and language modelling (LM); (2) training-heavy, or deep-learning, NLP: sequence- and tree-shaped encoders and decoders based on recurrent neural network (RNN) architectures, attention mechanisms, etc. These two flavors are not entirely separate, however. We think the essential difference lies in how much manual feature engineering is applied and how much data is available to train a large yet high-quality set of parameters. Since this is a young field, we naturally wonder what has proven to work, what has not, and what we can learn from that. Specifically, in this report we ask the following two questions:
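To make the "training-heavy" flavor concrete, the sketch below shows the core of the attention mechanism mentioned above: given a decoder state and a set of encoder hidden states, it scores each encoder state, normalizes the scores with softmax, and mixes the encoder states into a context vector. This is a minimal, illustrative dot-product attention in plain Python; the function names and the toy vectors are our own assumptions, not taken from any particular system surveyed here.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(decoder_state, encoder_states):
    # Score each encoder hidden state against the current decoder state,
    # then return the attention weights and the weighted context vector.
    scores = [dot(h, decoder_state) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

# Toy example: three encoder states of dimension 2, one decoder state.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dec = [1.0, 0.0]
context, weights = attention(dec, enc)
```

In a full RNN encoder-decoder, the context vector would be concatenated with the decoder state before predicting the next output token; the attention weights themselves are often inspected as a (rough) alignment between input and output.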