
Abstract

Little is known about the specific kinds of questions programmers ask when evolving a code base and how well existing tools support those questions. To better support the activity of programming, answers are needed to three broad research questions: (1) what does a programmer need to know about a code base when evolving a software system? (2) how does a programmer go about finding that information? and (3) how well do existing tools support programmers in answering those questions? We undertook two qualitative studies of programmers performing change tasks to provide answers to these questions. In this paper, we report on an analysis of the data from these two user studies. This paper makes three key contributions. The first contribution is a catalog of 44 types of questions programmers ask during software evolution tasks. The second contribution is a description of the observed behavior around answering those questions. The third contribution is a description of how existing deployed and proposed tools do, and do not, support answering programmers' questions.

Comments

Yann-Gaël Guéhéneuc, 2014/01/08

In this paper, the authors seek to provide the typical questions asked by developers during real change tasks. Knowing such questions could help design tool support that is more efficient than today's. It could also help refine models of program comprehension. The paper thus nicely complements previous work on models of program comprehension, programmers' questions, and empirical studies of change. It includes a very interesting list of references but unfortunately does not relate the cited previous work in detail to the 44 questions it discusses. In particular, when it describes other work on programmers' questions, it does not explicitly relate its own questions to the previous work by:

Johnson and Erdem who studied Usenet newsgroups and classified questions as goal oriented, symptom oriented, and system oriented;
Herbsleb and Kuwana who studied design meetings and classified questions according to their targets (evolve, task assignment, interface, realization, and identity), attribute (who, what, how, why, when), and lifecycle stages (requirements, design, implementation, maintenance);
Letovsky who studied programmers and classified their conjectures as why, how, what, whether, and discrepancy;
Erdos and Sneed, based on their personal experience, who reported questions such as “where is a particular subroutine/procedure invoked” and “what are the arguments and results of a given function”;
Erdem et al. who reused their study of Usenet newsgroups and classified questions based on their topic, type, and relation;
Ko et al. who studied co-located software teams and classified questions as about writing code, submitting a change, triaging bugs, reproducing a failure, understanding execution behaviour, reasoning about a design, and maintaining awareness.

It would have been interesting to relate all these different sets of questions and identify their intersections, redundancies, and gaps. In particular, there seems to be little interest in the behaviour… Other interesting questions pertain to the possible errors when mapping (top-down) or grouping (bottom-up) concepts and to the stopping condition. It seems that few authors have studied the errors that developers make when understanding programs (and why they make them). Further interesting questions concern how the different sets of questions relate to comprehension theories and models, such as the theory of cognitive support or distributed cognition.

The authors observed two sets of developers: pair programmers in an artificial environment (pairs of students performing change tasks on an unknown system) and developers in a real environment (professional developers performing changes on a system that their company develops). They use a grounded-theory analysis to code the collected audio recordings (and other data) and, as categories emerge, to perform further selective sampling and gather more variations about the categories. “The aim here is to build rather than test [a] theory and the specific result of this [analysis] is a theoretical understanding of the situation of interest that is grounded in the data collected.”

Although some explanations are provided, I am confused about the choice of pairs of programmers in the artificial environment. For both environments, the limits of the think-aloud method are not discussed, in particular the problem of social desirability that comes with thinking aloud. This limit could explain the lack of “more involved” questions regarding design choices (not the how/what but the why) and the behaviour of the systems.

The authors considered the source code as a graph of entities and categorised the questions as finding focus points, expanding focus points, understanding a subgraph, and questions over groups of subgraphs. Surprisingly, they mentioned neither design patterns as typical cases of “subgraphs” nor Sim's structured transition graphs to explain the “jumps” that developers make between categories. The 44 questions are as follows:

Finding focus points
1. Which type represents this domain concept or this UI element or action?
2. Where in the code is the text in this error message or UI element?
3. Where is there any code involved in the implementation of this behavior?
4. Is there a precedent or exemplar for this?
5. Is there an entity named something like this in that unit (project, package, or class, say)?
Expanding a focus point
1. What are the parts of this type?
2. Which types is this type a part of?
3. Where does this type fit in the type hierarchy?
4. Does this type have any siblings in the type hierarchy?
5. Where is this field declared in the type hierarchy?
6. Who implements this interface or these abstract methods?
7. Where is this method called or type referenced?
8. When during the execution is this method called?
9. Where are instances of this class created?
10. Where is this variable or data structure being accessed?
11. What data can we access from this object?
12. What does the declaration or definition of this look like?
13. What are the arguments to this function?
14. What are the values of these arguments at runtime?
15. What data is being modified in this code?
Understanding a subgraph
1. How are instances of these types created and assembled?
2. How are these types or objects related (whole-part)?
3. How is this feature or concern (object ownership, UI control, etc.) implemented?
4. What in this structure distinguishes these cases?
5. What is the behavior that these types provide together and how is it distributed over the types?
6. What is the “correct” way to use or access this data structure?
7. How does this data structure look at runtime?
8. How can data be passed to (or accessed at) this point in the code?
9. How is control getting (from here to) here?
10. Why is not control reaching this point in the code?
11. Which execution path is being taken in this case?
12. Under what circumstances is this method called or exception thrown?
13. What parts of this data structure are accessed in this code?
Questions over groups of subgraphs
1. How does the system behaviour vary over these types or cases?
2. What are the differences between these files or types?
3. What is the difference between these similar parts of the code (e.g., between sets of methods)?
4. What is the mapping between these UI types and these model types?
5. Where should this branch be inserted or how should this case be handled?
6. Where in the UI should this functionality be added?
7. To move this feature into this code, what else needs to be moved?
8. How can we know that this object has been created and initialized correctly?
9. What will be (or has been) the direct impact of this change?
10. What will the total impact of this change be?
11. Will this completely solve the problem or provide the enhancement?
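
The graph view behind these four categories can be made concrete with a small sketch. It is only an illustration of the idea, not the authors' model or any tool's API: the `CodeGraph` class, the relation labels (`calls`, `declares`, `has_subtype`), and the entity names are all assumptions made up for this example.

```python
from collections import defaultdict


class CodeGraph:
    """Toy graph of code entities: nodes are names, edges carry a relation label."""

    def __init__(self):
        # (source entity, relation) -> set of target entities
        self.edges = defaultdict(set)

    def add(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def find_focus_points(self, text):
        """Finding focus points: entities whose name contains the given text."""
        names = {s for (s, _) in self.edges}
        names |= {t for targets in self.edges.values() for t in targets}
        return sorted(n for n in names if text.lower() in n.lower())

    def expand(self, entity, relation):
        """Expanding a focus point: direct neighbours along one relation."""
        return sorted(self.edges.get((entity, relation), set()))

    def callers_of(self, method):
        """'Where is this method called?': reverse lookup over 'calls' edges."""
        return sorted(
            s for (s, r), targets in self.edges.items()
            if r == "calls" and method in targets
        )

    def subgraph(self, entities):
        """Understanding a subgraph: every edge with both ends in the given set."""
        chosen = set(entities)
        return sorted(
            (s, r, t)
            for (s, r), targets in self.edges.items()
            for t in targets
            if s in chosen and t in chosen
        )


# Hypothetical entities, invented for illustration only.
g = CodeGraph()
g.add("Shape", "has_subtype", "Circle")
g.add("Circle", "declares", "Circle.draw")
g.add("Editor.paint", "calls", "Circle.draw")
g.add("Tool.preview", "calls", "Circle.draw")

print(g.find_focus_points("circle"))
print(g.callers_of("Circle.draw"))
print(g.subgraph(["Circle", "Circle.draw", "Editor.paint"]))
```

In this reading, questions in the first two categories are name searches and single-edge lookups, whereas questions about subgraphs need several entities and the relations among them at once. Questions over groups of subgraphs would compare the results of several such subgraph queries.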

At the end, the authors report the frequencies of the questions, but across groups of developers rather than within sessions: “[i]f a specific question is asked repeatedly in a session, it is counted only once […]”, which undermines the usefulness of the numbers: it would have been much more interesting to know how often each question is asked in general! The authors also identify tool support for the questions but remain quite general; an in-depth study would be required. They conclude that “[b]ecause tools typically [provide] only limited support for defining the scope over which to operate, [developers] end up asking questions more globally than they intend and, so, the result sets presented will include many irrelevant items”, which is “[an opportunity] for tools to make use of the larger context to help [developers] more effectively scope their questions and to determine what is relevant to their higher level questions”.
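
The difference between the two ways of counting can be shown with a minimal sketch. The session identifiers and observations below are invented for illustration and are not data from the paper; only the rule "counted only once per session" comes from the quoted text.

```python
from collections import Counter

# Hypothetical observations: (session id, question type), in the order asked.
observations = [
    ("S1", "Where is this method called or type referenced?"),
    ("S1", "Where is this method called or type referenced?"),
    ("S1", "What are the arguments to this function?"),
    ("S2", "Where is this method called or type referenced?"),
    ("S2", "What are the arguments to this function?"),
    ("S2", "What are the arguments to this function?"),
    ("S2", "What are the arguments to this function?"),
]


def sessions_per_question(obs):
    """Count a question at most once per session (the rule quoted above)."""
    # set() drops repeated (session, question) pairs before counting.
    return Counter(question for _, question in set(obs))


def occurrences_per_question(obs):
    """Count every time a question is asked, repeats included."""
    return Counter(question for _, question in obs)


per_session = sessions_per_question(observations)
per_occurrence = occurrences_per_question(observations)

for question in sorted(per_occurrence):
    print(question, per_session[question], per_occurrence[question])
```

With these made-up observations both questions appear in two sessions, so the per-session measure cannot tell them apart, although one is asked three times and the other four times overall.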