Towards Learning Code Semantics from Big Code

12.06.2017, 14:00 – 15:00

Speaker: Miltos Allamanis, Microsoft Research, Cambridge | Location: Hochschulstraße 10 (S2|02), Piloty Building, Room A126, Darmstadt

Organizer: Software Lab (SoLa) / Prof. Michael Pradel


Analyzing software with deep learning is a new and exciting interdisciplinary area that is yielding promising results. One of the important challenges in the area is representing source code semantics with machine learning.

First, we notice that good variable names are highly indicative of the variable's role and function. Thus, to predict variable names, we need to learn elements of variable semantics. Naturalize, a neural network that learns to predict variable names, tries to achieve this by learning distributed vector representations of variables.

Then, I will introduce SmartPaste, a variant of the program repair problem that requires adapting a given (pasted) snippet of code to the surrounding, existing source code. We design a set of deep neural models that learn to represent the context of each variable location and variable usage in a data-flow-sensitive way. Our evaluation suggests that our models learn to solve the SmartPaste task in many cases, achieving 58.6% accuracy, while learning meaningful representations of variable semantics.


I am a postdoctoral researcher at Microsoft Research, Cambridge, UK, and a member of the Deep Program Understanding program. My research concerns the application of machine learning and natural language processing to software engineering and programming languages, with the goal of creating smart software engineering tools for developers. In the era of “big code”, code is a form of data that can be manipulated by machine learning methods to provide useful software engineering tools, interfaces and insights. I focus on developer tools with a strong machine learning component, while using problems from this area to motivate machine learning research.