I started digging into data analysis in software development back in 2014. At that time, I was working on my master’s thesis, whose topic translates into English roughly like this:
“Usages of automated analysis of artifacts and metadata from software projects to support the optimization of maintainability of long-lasting software systems”
This was also the time when the hype around Software Analytics was everywhere (at least in academia). I found some very interesting papers back then that I want to share with you, in case you are curious what this “Software Analytics” thing is all about.
Mining Version Histories to Guide Software Changes
Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, and Andreas Zeller. 2004. Mining Version Histories to Guide Software Changes. In Proceedings of the 26th International Conference on Software Engineering (ICSE ’04). IEEE Computer Society, Washington, DC, USA, 563-572.
This was the first paper I ever read on the topic. I came across it by simply typing some random terms like “data, mining, software, version, control” into Google. By now, it is regarded as a classic paper of technical excellence, written by some of the pioneers in the area. It also received the ICSE 10 Years Most Influential Paper Award in 2014. I’m really fortunate that I stumbled across this paper at that time. It’s far from an introduction to Software Analytics, but it shows the depth that a single analysis can reach.
The inductive software engineering manifesto
Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte, and Ekrem Kocaguneli. 2011. The inductive software engineering manifesto: principles for industrial data mining. In Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering (MALETS ’11). ACM, New York, NY, USA, 19-26.
I like this paper because it shows some aspects that you have to be aware of if you want to implement Software Analytics in industry. It lays out several principles that can guide you through the vast amount of information and tools out there.
Analyze This! 145 Questions for Data Scientists in Software Engineering
Andrew Begel and Thomas Zimmermann. 2014. Analyze this! 145 questions for data scientists in software engineering. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 12-23.
This is more of a “meta paper” about the questions that arise during software development. The authors asked several employees, then gathered and clustered a multitude of questions. This paper is important in two ways: First, you see a methodology for collecting and structuring many different questions. Second, if you are a student searching for a research question, just take a look at the variety of existing questions that are still waiting for answers.
Software intelligence: the future of mining software engineering data
Ahmed E. Hassan and Tao Xie. 2010. Software intelligence: the future of mining software engineering data. In Proceedings of the FSE/SDP workshop on Future of software engineering research (FoSER ’10). ACM, New York, NY, USA, 161-166.
This paper discusses (among other topics) which types of software data can be used with existing data mining techniques. I also like this paper because it comes from another school of thought.
Software Analytics: So What?
Tim Menzies and Thomas Zimmermann. 2013. Software Analytics: So What?. IEEE Softw. 30, 4 (July 2013), 31-37.
This paper is more of an introduction to a special two-part series of the IEEE Software magazine. But it also contains a critical discussion of how far Software Analytics has come in recent years. From this article, you can dive very deep into the whole topic.
As of today, there are also some books available that I recommend if you want to go further into this topic. The first two are targeted more at an academic audience, whereas the third one can quickly get developers started with applying first data analyses to their own software data:
- Christian Bird, Tim Menzies, Thomas Zimmermann: The Art and Science of Analyzing Software Data. Morgan Kaufmann, 2015. Link to an early version (GitHub)
- Tim Menzies, Laurie Williams, Thomas Zimmermann: Perspectives on Data Science for Software Engineering. Morgan Kaufmann, 2016. Link to publisher (Elsevier)
- Adam Tornhill: Software Design X-Rays. Pragmatic Programmers, 2018. Link to publisher (PragProg)
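A first analysis of your own software data can be as small as counting how often each file has changed. The following is a minimal sketch and is not taken from any of the books above: the function name `count_file_changes` and the sample log are made up for illustration, and the input is assumed to look like the output of `git log --name-only --pretty=format:` (one changed file path per line, with blank lines between commits).

```python
from collections import Counter


def count_file_changes(log_text: str) -> Counter:
    """Count how often each file path appears in a name-only git log."""
    # Strip whitespace and skip the blank lines that separate commits.
    paths = (line.strip() for line in log_text.splitlines())
    return Counter(path for path in paths if path)


# Hypothetical sample data; on a real repository the text could come from
#   git log --name-only --pretty=format:
sample_log = """
src/app.py
README.md

src/app.py
src/util.py

src/app.py
"""

changes = count_file_changes(sample_log)

# Print the files ordered by number of changes, most changed first.
for path, count in changes.most_common():
    print(path, count)
```

Files that change very often are one possible starting point for a closer look, although a high change count alone does not say whether a file is actually a problem.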
So, that’s it. Do you have any other “must-reads” to quickly get into the topic of Software Analytics? Let me know!