Reading a Git repo’s commit history with Pandas efficiently

There are many reasons to analyze a version control system such as your Git repository. See, for example, Adam Tornhill’s book “Your Code as a Crime Scene” or his upcoming book “Software Design X-Rays” for plenty of inspiration:

You can analyze knowledge islands, distinguish frequently changing code from stable parts, and identify code that is temporally coupled to other code.

Having the data for those analyses in a Pandas DataFrame gives you many ways to quickly gain insights into the evolution of your software system…
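Here is a minimal sketch that makes two of these analyses concrete. It assumes a DataFrame with one row per changed file per commit, using the hypothetical columns `sha`, `author` and `file` and invented sample data. Change frequency is counted as the number of distinct commits touching a file. A “knowledge island” is roughly approximated as a file that only one author has ever changed.

```python
import pandas as pd

# Hypothetical sample data: one row per (commit, changed file).
commits = pd.DataFrame({
    "sha": ["a1", "a1", "b2", "c3", "d4", "d4"],
    "author": ["alice", "alice", "bob", "alice", "carol", "carol"],
    "file": ["core.py", "util.py", "core.py", "core.py", "ui.py", "core.py"],
})

# Change frequency: number of distinct commits that touched each file.
change_frequency = (
    commits.groupby("file")["sha"].nunique().sort_values(ascending=False)
)

# Knowledge islands (rough proxy): files changed by exactly one author.
authors_per_file = commits.groupby("file")["author"].nunique()
knowledge_islands = authors_per_file[authors_per_file == 1].index.tolist()

print(change_frequency)
print(knowledge_islands)
```

With real data, you would replace the sample DataFrame with the commit history read from your repository.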

Storing Git commit information into Pandas’ DataFrame

Software version control systems contain a huge amount of evolutionary data. It’s very common to mine these repositories to gain insight into how a software product is developed. But the data needs some preprocessing first to avoid misleading analyses.

That’s why I show you how to read the commit information of a Git repository into a Pandas DataFrame!
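Here is a minimal sketch of how such an import could look. It assumes the log was exported with `git log --numstat --pretty=format:"--%h--%ad--%aN" --date=iso`. The `--` header marker, the helper name `read_git_log` and the column names are assumptions for illustration, not necessarily the format the post itself uses. The parser forward-fills each commit header onto the numstat lines that follow it, giving one row per (commit, file). Binary files, which git reports with `-` instead of line counts, end up as NaN.

```python
import pandas as pd


def read_git_log(log_text: str) -> pd.DataFrame:
    """Parse `git log --numstat --pretty=format:"--%h--%ad--%aN" --date=iso`
    output into one row per (commit, file). Hypothetical helper."""
    lines = pd.Series(log_text.splitlines())
    lines = lines[lines.str.strip() != ""]  # drop blank separator lines
    is_header = lines.str.startswith("--")

    # Header lines look like: --<sha>--<iso date>--<author>
    headers = lines[is_header].str.extract(
        r"^--(?P<sha>[0-9a-f]+)--(?P<date>.+?)--(?P<author>.+)$"
    )
    # Spread each commit's header onto its following numstat lines.
    header_cols = headers.reindex(lines.index).ffill()

    # Numstat lines look like: <added>\t<deleted>\t<path>
    stats = lines[~is_header].str.split("\t", n=2, expand=True)
    stats.columns = ["added", "deleted", "file"]

    df = header_cols.loc[stats.index].join(stats)
    # "-" marks binary files; coerce those to NaN.
    df["added"] = pd.to_numeric(df["added"], errors="coerce")
    df["deleted"] = pd.to_numeric(df["deleted"], errors="coerce")
    df["date"] = pd.to_datetime(
        df["date"], format="%Y-%m-%d %H:%M:%S %z", utc=True
    )
    return df.reset_index(drop=True)


sample_log = """--a1b2c3d--2017-03-01 10:00:00 +0100--Alice
3\t1\tsrc/core.py
10\t0\tREADME.md

--e4f5a6b--2017-03-02 09:30:00 +0100--Bob
-\t-\tdocs/logo.png
"""

git_log = read_git_log(sample_log)
print(git_log)
```

To run this against a real repository, you could capture the output of the `git log` command above, for example with `subprocess.run(..., capture_output=True, text=True)`, and pass its `stdout` to the function. Commits without file changes, such as many merge commits, simply produce no rows.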

A simple demo on how to use Python Pandas with jQAssistant / Neo4j

I’m a huge fan of the software analysis framework jQAssistant. It’s a great tool for scanning and validating various software artifacts. But I also love Python Pandas as a powerful tool, in combination with Jupyter notebooks, for reproducible Software Analytics.

Combining these tools is the obvious next step, so I’ve created a quick demonstration for a “first contact” 🙂