There are many reasons to analyze the data in a version control system such as your Git repository. For plenty of inspiration, see Adam Tornhill’s book “Your Code as a Crime Scene” or his upcoming book “Software Design X-Rays”:
You can analyze knowledge islands, distinguish frequently changing code from stable parts, and identify code that is temporally coupled to other code.
Having the necessary data for those analyses in a Pandas DataFrame gives you many ways to quickly gain insights into the evolution of your software system…
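To make the three analyses named above a bit more concrete, here is a minimal sketch of how each could look once commit data sits in a DataFrame. It assumes a hypothetical table with one row per changed file per commit and the made-up columns `sha`, `author` and `file`; the tiny data set is invented purely for illustration and is not taken from any real repository.

```python
import itertools
import pandas as pd

# Hypothetical input: one row per (commit, changed file)
commits = pd.DataFrame({
    "sha":    ["a1", "a1", "b2", "b2", "c3", "c3", "d4"],
    "author": ["alice", "alice", "bob", "bob", "alice", "alice", "alice"],
    "file":   ["Owner.java", "OwnerController.java",
               "Owner.java", "OwnerController.java",
               "Owner.java", "Vet.java", "Vet.java"],
})

# Change frequency: how often each file was touched (frequently changing vs. stable code)
change_frequency = commits["file"].value_counts()

# Knowledge islands: files that only a single author has ever changed
authors_per_file = commits.groupby("file")["author"].nunique()
knowledge_islands = authors_per_file[authors_per_file == 1].index.tolist()

# Temporal coupling: count how often two files were changed in the same commit
pairs = []
for _, files in commits.groupby("sha")["file"]:
    pairs.extend(itertools.combinations(sorted(set(files)), 2))
coupling = (
    pd.DataFrame(pairs, columns=["file_a", "file_b"])
    .groupby(["file_a", "file_b"])
    .size()
    .sort_values(ascending=False)
)
```

With this toy data, `Owner.java` is the most frequently changed file, `Vet.java` is a knowledge island (only alice touched it), and `Owner.java` and `OwnerController.java` changed together in two commits. Counting co-changes per commit is only the simplest form of temporal coupling; real analyses usually normalize these counts, for example by each file's total number of changes.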
Software version control systems contain a huge amount of evolutionary data. It’s very common to mine these repositories to gain insight into how the development of a software product works. But that data needs some preprocessing to avoid false analyses.
That’s why I show you how to read the commit information of a Git repository into a Pandas DataFrame!
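One possible way to do this is sketched below; it is not necessarily the approach the full post takes. The sketch asks `git log` for a custom header line per commit (`--%h--%an`, abbreviated hash and author name) followed by `--numstat` lines with added/deleted line counts per file, and parses that text into one row per changed file. The function names `parse_git_log` and `read_git_log` are hypothetical. Two small preprocessing steps are included: binary files (reported as `-` by `--numstat`) get 0 changed lines, and `--no-renames` avoids `old => new` paths.

```python
import subprocess
import pandas as pd

def parse_git_log(log_text: str) -> pd.DataFrame:
    """Turn `git log --numstat --pretty=format:--%h--%an` output into one row per changed file."""
    rows = []
    sha = author = None
    for line in log_text.splitlines():
        if line.startswith("--"):
            # Commit header, e.g. "--a1b2c3d--Alice"
            _, sha, author = line.split("--", 2)
        elif line.strip():
            # numstat line: "<added>\t<deleted>\t<path>"; binary files use "-" instead of numbers
            added, deleted, path = line.split("\t", 2)
            rows.append({
                "sha": sha,
                "author": author,
                "added": int(added) if added != "-" else 0,
                "deleted": int(deleted) if deleted != "-" else 0,
                "file": path,
            })
    return pd.DataFrame(rows, columns=["sha", "author", "added", "deleted", "file"])

def read_git_log(repo_path: str) -> pd.DataFrame:
    """Run git log in the given repository and parse its output (requires git on the PATH)."""
    log_text = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--no-renames",
         "--pretty=format:--%h--%an"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_git_log(log_text)
```

Separating parsing from the `subprocess` call keeps the parser testable on plain strings; `read_git_log(".")` would then load the history of the repository in the current directory.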
In preparation for a talk about performance optimization, I needed monstrous amounts of fake data for a system under test. I chose the Spring Pet Clinic project as my “patient” because it makes some typical mistakes. But this application ships with only about 100 database entries. That isn’t nearly enough…
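A minimal sketch of mass-producing fake rows with Pandas follows. The column names are an assumption modeled loosely on an owners table (check the actual Pet Clinic schema before loading anything), and the name pools and the function `fake_owners` are hypothetical. A fixed random seed keeps the generated data reproducible between test runs.

```python
import random
import pandas as pd

# Hypothetical value pools for fake owners; extend as needed
FIRST_NAMES = ["George", "Betty", "Eduardo", "Harold", "Peter", "Jean"]
LAST_NAMES = ["Franklin", "Davis", "Rodriquez", "Black", "McTavish", "Coleman"]
CITIES = ["Madison", "Sun Prairie", "McFarland", "Windsor", "Monona"]

def fake_owners(n: int, seed: int = 42) -> pd.DataFrame:
    """Generate n fake owner rows with ids 1..n (column names are assumed)."""
    rng = random.Random(seed)  # fixed seed -> same data on every run
    return pd.DataFrame({
        "id": range(1, n + 1),
        "first_name": [rng.choice(FIRST_NAMES) for _ in range(n)],
        "last_name": [rng.choice(LAST_NAMES) for _ in range(n)],
        "address": [f"{rng.randint(1, 9999)} Main St." for _ in range(n)],
        "city": [rng.choice(CITIES) for _ in range(n)],
        "telephone": [f"{rng.randint(0, 9_999_999_999):010d}" for _ in range(n)],
    })

owners = fake_owners(100_000)
# Export as CSV, e.g. for a database's bulk import tool
csv_text = owners.to_csv(index=False)
```

Writing a CSV and bulk-loading it is usually much faster than issuing one `INSERT` per row, which matters once the row counts get really large.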
I’m a huge fan of the software analysis framework jQAssistant. It’s a great tool for scanning and validating various software artifacts. But I also love Python Pandas as a powerful tool, in combination with Jupyter Notebook, for reproducible Software Analytics.
Combining these tools is the obvious next step. So I’ve created a quick demonstration for a “first contact” 🙂