Natural Language Processing (NLP) techniques and tools have become very powerful and are applicable in many domains. In Software Engineering (SE), there are many promising opportunities to apply NLP to improve both theory and practice. Recently, investigations have begun to determine the extent to which large code corpora retrieved from GitHub, StackOverflow, etc., are amenable to analysis using statistical NLP models and algorithms, so that the revolutionary advances in speech recognition, translation, comprehension, etc. can be applied in SE.

This workshop will bring together an international group of researchers in Statistical NLP, Programming Languages, Software Engineering and related fields for an intensive period of discussion and presentation of results in the area. We invite a range of researchers with both NLP and SE backgrounds to come together, discuss their research, establish datasets, tasks, and baselines, and generally help the field build momentum.

Workshop program

The workshop will be held on Sunday, November 13th, co-located with FSE 2016.

Program overview

Sunday, November 13

9:00am –  9:15am
9:15am – 10:30am

Keynote: TBD

Chris Quirk, Microsoft Research

10:30am – 11:00am
Coffee break
11:00am – 12:30pm

Paper Presentation

Session 1: Coding Style

- Learning to Name Code Identifiers by Miltiadis Allamanis and Charles Sutton.

- On the Use of Statistical Machine Translation for Code Beautification and Refactoring by Bogdan Vasilescu.

Session 2: Tracing & Translating

- Augmenting Natural Language Analysis with Trace Links to Mine Domain Facts from Software Requirements by Jin Guo and Jane Cleland-Huang.

- Learning to Translate Docstrings to Function Representations in Standard Library Documentation by Kyle Richardson.

- Using Natural Language Processing to Translate Software Project Queries into Structured Form by Jane Cleland-Huang, Jin Guo, Natawut Monaikul, Sugandha Lohar, William Goss and Alexander Rasin.

12:30pm – 2:00pm
2:00pm – 3:30pm

Technical Briefing: Statistical NLP for Software

Miltiadis Allamanis, University of Edinburgh

3:30pm – 4:00pm
Coffee break
4:00pm – 5:30pm

Paper Presentation

Session 3: Language Models and Code Cloning

- Entropy Guided Spectrum Based Testing by Matthew Irvine and Baishakhi Ray.

- A deep language model for software code by Hoa Khanh Dam, Truyen Tran and Trang Pham.

- Can I Copy this Code? Extracting Norms from Software Licenses using Frame Semantics by Sayonnha Mandal, Robin Gandhi and Harvey Siy.

Session 4: Search & Retrieval

- Towards Improving Q&A Forum Search and Mining: Automatic Identification of Developer Goal and Symptom by Zachary R. Senzer, Lori Pollock and K. Vijay-Shanker.

- Finding Similar Projects in GitHub using Word2Vec and WMD by Md Masudur Rahman.


We gratefully acknowledge sponsorship from NSF. A limited amount of funding is available to support participant travel. As per sponsor guidance, priority for travel funding will be given to natural language processing researchers and their students, and secondarily to software engineering students and faculty, with special consideration for people from under-represented groups and those without other funding sources.

Call for participation

We invite short position papers of at most 4 pages. Submissions will be reviewed primarily for relevance, will not appear in the ACM Digital Library, and may subsequently be published elsewhere. A few of the submissions will be invited for presentation.


Please submit your paper here:

Important dates

Aug 8, 2016: 1-4 page paper due
Nov 13, 2016: Workshop date

Program Committee

Program Chairs
Prem Devanbu, University of California, Davis
Baishakhi Ray, University of Virginia
Program Committee
Abram Hindle, University of Alberta
Charles Sutton, University of Edinburgh
Christian Bird, Microsoft Research, Redmond
Dana Movshovitz-Attias, Google Research
Denys Poshyvanyk, College of William & Mary
Earl Barr, University College London
Tien N. Nguyen, Iowa State University
Vladimir Filkov, University of California, Davis
Zhendong Su, University of California, Davis


  • Prem Devanbu (UC Davis)
  • Tien N. Nguyen (Iowa State University)
  • Baishakhi Ray (University of Virginia)
  • Earl Barr (University College London)
  • Christian Bird (Microsoft Research)


For questions or comments about the workshop, please contact Baishakhi Ray.