Empirical Evidence of Large-Scale Diversity in API Usage of Object-Oriented Software


Abstract

In this paper, we study how object-oriented classes are used across thousands of software packages. We concentrate on “usage diversity”, defined as the different statically observable combinations of methods called on the same object. We present empirical evidence that there is a significant usage diversity for many classes. For instance, we observe in our dataset that Java's String is used in 2460 manners. We discuss the reasons for this observed diversity and the consequences for software engineering knowledge and research.

Comments

Yann-Gaël Guéhéneuc, 2013/10/14

The authors ask several intriguing questions about the design and implementation of object-oriented classes. Using a biological metaphor, that of biodiversity in ecosystems, they ask whether developers use a given class in the same way and which factors cause API usage diversity. They thus pursue the noble goal of criticising object-oriented design principles, for example the single responsibility principle.

The authors reuse their previous work on “type-usages”, where a type-usage abstractly describes the use of a class by providing the set of method calls made on a variable of that class. “Type-usages abstract over tokens, control flow and variables interplay.” They denote the sets of type-usages with one method, two methods, etc. as {|TU|=1}, {|TU|=2}, etc.
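To make the notion concrete, here is a minimal sketch of how type-usages and the {|TU|=k} sets could be computed. It assumes the method calls per variable have already been extracted; the `observations` data and the helper names `type_usages_by_class` and `partition_by_size` are hypothetical and are not taken from the authors' tooling.

```python
from collections import defaultdict

# Hypothetical input: one entry per variable, as (class name, methods called on it).
observations = [
    ("String", ["length"]),
    ("String", ["length", "charAt"]),
    ("String", ["charAt", "length"]),  # same set as above, so the same type-usage
    ("String", ["trim", "isEmpty"]),
    ("Thread", ["start"]),
    ("Thread", ["start", "join"]),
]


def type_usages_by_class(obs):
    """Map each class to its distinct type-usages (unordered sets of method names)."""
    usages = defaultdict(set)
    for cls, calls in obs:
        usages[cls].add(frozenset(calls))
    return dict(usages)


def partition_by_size(type_usages):
    """Group the type-usages of one class by cardinality, i.e. build {|TU|=k}."""
    by_size = defaultdict(set)
    for tu in type_usages:
        by_size[len(tu)].add(tu)
    return dict(by_size)


usages = type_usages_by_class(observations)
# Usage diversity of a class, read here as its number of distinct type-usages.
diversity = {cls: len(tus) for cls, tus in usages.items()}
string_partition = partition_by_size(usages["String"])
```

Because a type-usage is a set, the second and third `String` observations count once, which is exactly the abstraction over control flow and ordering quoted above.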

Interestingly, the authors propose a measure of abundance that seems to confirm a discussion with Daniel M. German about the fact that “a large majority [of classes] are used a small number of times”, which means that, given a name, it is possible to identify with almost 100% precision the JAR it comes from.

The authors should pursue their study by taking into account the domain objects described by the classes. Indeed, class String is intrinsically low-level and the target of many different, unrelated operations. In contrast, a class such as Thread is probably subject to constraints on the set of applicable operations and on the order of their application. Thus, I would expect String and Thread to have very different API usage and API diversity.

I would like to see the type-usages when {|TU|=2}, {|TU|=3}, etc. because I would expect these type-usages to really tell us about the responsibilities of classes. Also, I would like to study the proportions of classes with some sets of type-usages but not others: what does it mean for a class to have a non-empty set {|TU|=1} but an empty set {|TU|=2}, and vice versa?
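The question about classes that have some {|TU|=k} sets but not others can be phrased as a simple query. The sketch below assumes type-usages are already available per class; the `type_usages` data and the helper `classes_with_gap` are hypothetical illustrations, not results from the paper.

```python
# Hypothetical type-usages per class, each one an unordered set of method names.
type_usages = {
    "String": {frozenset({"length"}), frozenset({"length", "charAt"})},
    "Runnable": {frozenset({"run"})},
    "Iterator": {frozenset({"hasNext", "next"})},
}


def sizes_present(tus):
    """Return the cardinalities k for which {|TU|=k} is non-empty."""
    return {len(tu) for tu in tus}


def classes_with_gap(usages, present, absent):
    """Classes whose {|TU|=present} is non-empty while {|TU|=absent} is empty."""
    return sorted(
        cls
        for cls, tus in usages.items()
        if present in sizes_present(tus) and absent not in sizes_present(tus)
    )


only_single = classes_with_gap(type_usages, present=1, absent=2)
only_pairs = classes_with_gap(type_usages, present=2, absent=1)
```

In this toy data, `Runnable` is only ever used through a single method, while `Iterator` is never used through one method alone; the proportions of such classes in a real corpus are what I would like to see reported.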

One of the main limitations that I see in the authors' work is that type-usages are “sets”, i.e., unordered collections of method calls, while some classes require that their methods be called in very particular orders on their instances. A possible continuation of the authors' study could involve defining a concept of “dynamic type-usages” that could be built from dynamic traces and, hence, that would intrinsically account for the order of the calls.
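The difference between the two views can be illustrated with a small sketch. It assumes a “dynamic type-usage” is simply the ordered sequence of calls observed on one instance; this is only one possible reading of the idea, and the `traces` data is hypothetical.

```python
# Hypothetical call traces, each recorded on one Thread instance.
traces = [
    ["start", "join"],
    ["join", "start"],  # same methods, different order
]

# Set-based type-usages: call order is discarded, so both traces collapse into one.
static_usages = {frozenset(trace) for trace in traces}

# Assumed "dynamic type-usages": ordered tuples, so the two traces stay distinct.
ordered_usages = {tuple(trace) for trace in traces}
```

Here `static_usages` contains one element while `ordered_usages` contains two, so an order-sensitive class such as Thread would show a diversity that the set-based view cannot see.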

empirical_evidence_of_large-scale_diversity_in_api_usage_of_object-oriented_software.1381806270.txt.gz · Last modified: 2019/10/06 20:37 (external edit)