wirespeed

the hypothetical maximum data transmission rate of a telecommunications medium

Subversion – how many times has a file been modified?

Posted by dlandgren on 2011-06-08

Someone at work asked me the other day which file in a project had undergone the most changes. The idea is to see which files are “hot”, as in frequently touched, and which files are “cold”, as in rarely edited.

This information is not available directly, but can be assembled from the commit history.

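The first step is to list the files changed in every commit since the beginning of time. The -q flag suppresses the log messages, -v lists the changed paths, and -r 1:HEAD walks from the first revision to the latest: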

svn log -qvr 1:HEAD

which will produce a rather verbose description of what files were changed in each revision:

r382 | david | 2011-04-07 15:32:57 +0200 (Thu, 07 Apr 2011)
Changed paths:
   M /trunk/Assemble.pm
   M /trunk/Changes
   M /trunk/MANIFEST
   M /trunk/README
   M /trunk/t/03_str.t
   M /trunk/t/09_debug.t
   A /trunk/t/10_perl514.t
------------------------------------------------------------------------
r387 | david | 2011-04-17 16:29:59 +0200 (Sun, 17 Apr 2011)
Changed paths:
   M /trunk
   M /trunk/MANIFEST
   M /trunk/t/03_str.t
   M /trunk/t/09_debug.t
   D /trunk/t/10_perl514.t
------------------------------------------------------------------------

What we want to do is throw away all the fluff in each revision stanza and retain the file paths. A quick Perl one-liner will do that for us, by printing only the lines between “Changed paths:” and a line of dashes. Since that range also includes the two delimiting lines, each line is additionally tested to ensure it starts with whitespace. This is probably overkill, but offers a slightly improved guarantee against surprises.

perl -nle 'print if /^Changed paths:/ ... /^-+$/ and /^\s/'

If the svn log output is piped through this, we obtain

   M /trunk/Assemble.pm
   M /trunk/Changes
   M /trunk/MANIFEST
   M /trunk/README
   M /trunk/t/03_str.t
   M /trunk/t/09_debug.t
   A /trunk/t/10_perl514.t
   M /trunk
   M /trunk/MANIFEST
   M /trunk/t/03_str.t
   M /trunk/t/09_debug.t
   D /trunk/t/10_perl514.t

The next step is to throw away the Subversion action code and discard any paths not under /trunk (such as /branches or /tags). To do this, we’ll attempt a substitution that matches the leading whitespace, the action code (a run of non-space characters) and the whitespace after it, followed by a path that begins with /trunk, and replaces the lot with the captured /trunk. If the substitution succeeds, the line is printed:

perl -nle '/^Changed paths:/ ... /^-+$/ and s/^\s+\S+\s+(\/trunk)/$1/ and print'

Now we’re down to:

/trunk/Assemble.pm
/trunk/Changes
/trunk/MANIFEST
/trunk/README
/trunk/t/03_str.t
/trunk/t/09_debug.t
/trunk/t/10_perl514.t
/trunk
/trunk/MANIFEST
/trunk/t/03_str.t
/trunk/t/09_debug.t
/trunk/t/10_perl514.t

Now it’s a simple matter to either grep for the file we want to look for, or count how many times each file occurs, and sort the files by the number of times they appear. The latter is done trivially with the Unix toolkit: sort, count unique occurrences, and sort by count:

sort | uniq -c | sort -n

When fed the output of the first filter, which leaves the action codes attached, this results in

...
  40    M /trunk/t/00_basic.t
  58    M /trunk/Changes
  59    M /trunk/t/03_str.t
 109    M /trunk/Assemble.pm

So putting it all together, the magic command is

svn log -qvr 1:HEAD|perl -nle 'print if /^Changed paths:/ ... /^-+$/ and /^\s/' \
    | sort | uniq -c | sort -n

And the deed is done.

Posted in perl, programming