
GSoC ends, but work goes on

Well, GSoC has come to an end. Thanks to my supportive mentor, Seth Lemons, for grading my work as satisfactorily meeting the initial requirements, and thus making my GSoC work a success.


But the work doesn't come to an end here. I plan on keeping this blog alive. I got really busy over the past couple of weeks, though, with college coming to an end, project thesis presentations, shifting to a new place, setting up the new place... it was pretty much a mess.

Now, I have plans to work on my fork of figleaf. I made the C and Python report integration work properly, but the tool needs to be made user friendly. I still haven't submitted the patches for the Py3k test suite that I wrote; I need to get feedback from the devs on that, and write more tests in the meanwhile.

GSoC was a great gateway into the world of core Python development, and I plan to make good use of it for the long term.

Keep expecting updates.

os.py test cases

Added 7 new test cases to test_os.py which tests the os module.

Following is a test class to test os.renames(). It starts by creating a temporary directory tree and then renames it. A walk is then performed from the topmost directory, and the result is examined to verify that the renaming has taken place.


class RenamesDirTest(unittest.TestCase):
    def setUp(self):
        if os.path.exists(support.TESTFN):
            os.remove(support.TESTFN)
        os.mkdir(support.TESTFN)

    def test_renames(self):
        base=support.TESTFN
        self.old=os.path.join(base, "dir1", "dir2", "dir3", "dir4")
        os.makedirs(self.old)
        self.new=os.path.join(base, "dir1", "dir2", "test3", "test4")
        os.renames(self.old, self.new)
        #A bottom-up walk yields the deepest directory first
        for (dirpath, dirnames, filenames) in os.walk(base, topdown=False):
            self.assertEqual(self.new, dirpath)
            break

    def tearDown(self):
        if os.path.exists(self.new):
            os.removedirs(self.new)
        elif os.path.exists(self.old):
            os.removedirs(self.old)

A test for os.chdir(). Current directory for the current process is changed using os.chdir(), and later confirmed by calling os.getcwd(). A test case to see that os.chdir() fails when called with a directory argument that doesn't exist is also added.

#Tests for changing directory paths
class ChangePathTests(unittest.TestCase):
    def setUp(self):
        #Remember the starting directory so that tearDown can restore it
        self.origdir=os.getcwd()
        self.tempdir=os.path.abspath(support.TESTFN)
        os.mkdir(self.tempdir)
        self.tempdir2=self.tempdir+"2"

    def test_chdir(self):
        os.chdir(self.tempdir)
        #getcwd() returns an absolute path, so compare resolved paths
        self.assertEqual(os.path.realpath(os.getcwd()),
                         os.path.realpath(self.tempdir))

    #Test to check chdir fails if nonexisting directory passed
    def test_chdir_nonexistent(self):
        if os.path.exists(self.tempdir2):
            os.rmdir(self.tempdir2)
        self.assertRaises(OSError, os.chdir, self.tempdir2)

    def tearDown(self):
        #Leave the temporary directory before removing it
        os.chdir(self.origdir)
        if os.path.exists(self.tempdir):
            os.rmdir(self.tempdir)

A test class to test os.geteuid() and os.getgid(). Both are verified by comparing with values returned by os.stat().

class PosixGetUidGidTests(unittest.TestCase):
    def setUp(self):
        f=os.open(support.TESTFN, os.O_CREAT|os.O_RDWR)
        os.close(f)
        #Stat the file just created by this process
        self.stats=os.stat(support.TESTFN)

    def tearDown(self):
        if os.path.exists(support.TESTFN):
            os.remove(support.TESTFN)

    if hasattr(os, "geteuid"):
        def test_geteuid(self):
            self.assertEqual(os.geteuid(), self.stats.st_uid)

    if hasattr(os, "getgid"):
        def test_getgid(self):
            self.assertEqual(os.getgid(), self.stats.st_gid)
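
As an aside, the "if hasattr(...)" guards above could also be written with unittest's skip decorators, so that a missing function shows up as a skipped test instead of silently disappearing. The following is only a sketch of that alternative: the class name GetUidSketchTests and the helper run_sketch() are made up for this illustration, and it uses tempfile rather than support.TESTFN so that it runs on its own.

```python
import os
import tempfile
import unittest


class GetUidSketchTests(unittest.TestCase):
    def setUp(self):
        # Create a scratch file and record its ownership information.
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        self.stats = os.stat(self.path)

    def tearDown(self):
        os.remove(self.path)

    @unittest.skipUnless(hasattr(os, "geteuid"), "requires os.geteuid()")
    def test_geteuid(self):
        # The file was created by this process, so its owner should be
        # the effective user id.
        self.assertEqual(os.geteuid(), self.stats.st_uid)


def run_sketch():
    """Run the sketch tests and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(GetUidSketchTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

On a platform without os.geteuid() the test is reported as skipped, which keeps the gap visible in the test report.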

A test class for testing os.getenv() and os.putenv().

class GetPutEnvironTests(unittest.TestCase):
    def test_putenv(self):
        try:
            os.putenv("KEY", "VALUE")
        except:
            self.fail("Not able to set environment variable")

    def test_getenv(self):
        keyvalue={"KEY":"VALUE"}
        os.environ["KEY"]=keyvalue["KEY"]
        value=os.getenv("KEY")
        self.assertEqual(value, keyvalue["KEY"])

I am designing a few more tests to test the os.spawn* family of functions.

It's done and it's working. And there isn't much new to tell. If you have read my report on the integration of figleaf for py2.6, the details are very similar, just that the syntax is of course py3kish. I used Titus' port of figleaf to py3k (http://github.com/ctb/figleaf/tree/py3k), with a few minor fixes.

The working is very similar to how the figleaf for py2.6 works. You have to use the -c/--c-coverage switch to give the directories holding C modules built with gcov support, and figleaf will incorporate their coverage report as well.
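
To make the shape of that option concrete, here is a minimal sketch of how a comma-separated -c/--c-coverage value could be parsed. It is not figleaf's actual option handling: the function name parse_c_dirs and the use of argparse are assumptions made for this illustration.

```python
import argparse


def parse_c_dirs(argv):
    """Split a -c/--c-coverage value into a list of directories.

    Returns (directories, remaining_arguments). Hypothetical helper,
    not part of figleaf.
    """
    parser = argparse.ArgumentParser(prog="figleaf-sketch")
    parser.add_argument(
        "-c", "--c-coverage", dest="c_code_dirs", default="",
        help="comma-separated directories holding gcov-built C modules")
    # Anything the parser does not recognise (for example the script to
    # run) is handed back untouched.
    options, remaining = parser.parse_known_args(argv)
    dirs = [d.strip() for d in options.c_code_dirs.split(",") if d.strip()]
    return dirs, remaining
```

For example, parse_c_dirs(["-cModules,Objects", "Lib/test/regrtest.py"]) gives the two directories plus the untouched script path.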

Now it's time to work on writing new test cases for different modules. Many modules aren't 100% covered by their test suites, and there is a lot of room for improvement there. Of course I can't get all of them completely covered, but I will write as many new tests as possible. I hope that once I get started on the process, get used to it, and become more familiar with the code base, I can go on improving the tests well beyond GSoC.

Meanwhile, I am also going to try to make figleaf easier to use with gcov, and to automate the static compilation of C modules for use with gcov.

Short note...

Figleaf is working with py3k now, after a few minor modifications. Time to put its integration with the gcov code in place.

Quick note

This week's report: the pace of work is getting a bit slower again, and I need to pick it up.

Stuck at the following error for the night while porting figleaf to py3k... I hope it doesn't turn out to be a long night :/

test_decimal
Exception in thread Thread-53:
Traceback (most recent call last):
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/decimal.py", line 445, in getcontext
return _local.__decimal_context__
AttributeError: '_thread._local' object has no attribute '__decimal_context__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/threading.py", line 509, in _bootstrap_inner
self.run()
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/threading.py", line 462, in run
self._target(*self._args, **self._kwargs)
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/test/test_decimal.py", line 1065, in thfunc1
test1 = d1/d3
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/decimal.py", line 1273, in __truediv__
context = getcontext()
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/decimal.py", line 447, in getcontext
context = Context()
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/decimal.py", line 3757, in __init__
for name, val in locals().items():
RuntimeError: dictionary changed size during iteration

Exception in thread Thread-54:
Traceback (most recent call last):
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/decimal.py", line 445, in getcontext
return _local.__decimal_context__
AttributeError: '_thread._local' object has no attribute '__decimal_context__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/threading.py", line 509, in _bootstrap_inner
self.run()
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/threading.py", line 462, in run
self._target(*self._args, **self._kwargs)
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/test/test_decimal.py", line 1077, in thfunc2
test1 = d1/d3
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/decimal.py", line 1273, in __truediv__
context = getcontext()
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/decimal.py", line 447, in getcontext
context = Context()
File "/home/shuaib/Tools/Projects/GSoC2009/py3k/Lib/decimal.py", line 3757, in __init__
for name, val in locals().items():
RuntimeError: dictionary changed size during iteration
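
For reference, the final RuntimeError in these tracebacks is the generic error Python raises when a dictionary gains or loses keys while it is being iterated. The sketch below only illustrates that class of error and the usual way around it (iterating over a snapshot). It is not a diagnosis of what happens inside decimal.py here, and both function names are made up.

```python
def iterate_while_growing():
    """Add keys to a dict while iterating it; return the error message."""
    data = {"a": 1}
    try:
        for key in data:
            data[key + "_copy"] = 0  # the dict grows mid-iteration
    except RuntimeError as exc:
        return str(exc)
    return None


def iterate_over_snapshot():
    """Iterate over a list copy of the keys, so mutation is safe."""
    data = {"a": 1}
    for key in list(data):
        data[key + "_copy"] = 0
    return sorted(data)
```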

Actually, it works...

Though I have a few doubts. I need to discuss with the devs whether the solution I provided is indeed the best way to go... Titus, a mail coming your way soon :)

For the time being, let's talk about what has been done.

If you want to get the C coverage now with figleaf for your Python repository, you have to make sure you execute figleaf from within the parent directory where you built Python with make. Execute figleaf using your newly compiled Python interpreter, passing the -c/--c-coverage option on the command line with a comma-separated list of directories to look in for C code compiled for coverage with gcov:

$./python ../figleaf-github/figleaf/bin/figleaf -cModules Lib/test/regrtest.py test_zlib.py

In my case, I had only compiled zlib statically into Python for C coverage, so for now I am running only its test suite. The module itself is located in the Modules subdirectory of the Python source, so that is the directory passed with the -c option. And here is the output:

test_zlib
1 test OK.
File '/usr/include/sys/sysmacros.h'
Lines executed:0.00% of 6
/usr/include/sys/sysmacros.h:creating 'sysmacros.h.gcov'

File '/usr/include/sys/stat.h'
Lines executed:0.00% of 12
/usr/include/sys/stat.h:creating 'stat.h.gcov'

File './Modules/zlibmodule.c'
Lines executed:74.11% of 448
./Modules/zlibmodule.c:creating 'zlibmodule.c.gcov'

gcov generates the coverage report for the module in the current directory. Convert it to HTML using figleaf2html:

$../figleaf-github/figleaf/bin/figleaf2html

And here is the output for me:
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/getopt.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/contextlib.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/__future__.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/posixpath.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/os.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/random.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/test/test_support.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Modules/zlibmodule.c
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/warnings.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/fnmatch.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/test/__init__.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/sre_compile.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/genericpath.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/test/regrtest.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/sre_parse.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/test/test_zlib.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/unittest.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/socket.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/shutil.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/functools.py
reported on /home/shuaib/Tools/Projects/GSoC2009/release26-maint/Lib/re.py
reported on 21 file(s) total

figleaf: HTML output written to html

And here is a screenshot of the report:


And now all about how it works...

You start by compiling your Python with the modules you want C code coverage for built statically into it. This is something I have so far left to the user to do manually, after Seth suggested we move on to the report generation itself first, as that was the priority. Later on, maybe I'll add automatic static linking of the C modules to figleaf's functions.

Once you have compiled Python with the C modules statically linked in and with the gcov options, running it produces data files for them named after the source, as in source.gcda. Passing the directory where these files live to figleaf with the -c/--c-coverage option makes figleaf look for the *.gcda files in that directory and call gcov on them to generate the C report. Here is the function that performs this:

def get_c_coverage():
    cov = {}
    dirs=c_code_dirs.split(",")
    gcov_cmd=os.environ.get("COV", "gcov")
    for d in dirs:
        files=os.listdir(d)
        for f in files:
            if f.split(".")[-1]=="gcda":
                #"zlibmodule.gcda" -> "zlibmodule"
                base=".".join(f.split(".")[0:-1])
                os.system("%s %s -o %s" % (gcov_cmd, base, d))
                try:
                    #gcov writes its report into the current directory
                    source=open(base+".c.gcov", "r")
                except IOError:
                    print("Can't open file: "+base+".c.gcov")
                    continue
                exe_lines=[]
                for line in source:
                    line=line.rstrip("\n")
                    if line.count(":") < 2:
                        continue
                    (count, lineno, code)=line.split(":", 2)
                    #"-" marks a non-executable line, "#####" one never run
                    if count.strip()=="-" or count.strip()=="#####":
                        continue
                    else:
                        exe_lines.append(int(lineno))
                source.close()
                cov[d+"/"+base+".c"]=set(exe_lines)
    return cov

This function has been added to figleaf's __init__ file, so it lives in the same file as figleaf's main function. It is called from write_coverage():

def write_coverage(filename, append=True):
    """
    Write the current coverage info out to the given filename. If
    'append' is false, destroy any previously recorded coverage info.
    """
    if _t is None:
        return

    data = internals.CoverageData(_t)

    d = data.gather_files()

    # sum existing coverage?
    if append:
        old = {}
        fp = None
        try:
            fp = open(filename, 'rb')
        except IOError:
            pass

        if fp:
            old = load(fp)
            fp.close()
            d = combine_coverage(d, old)

    # ok, save.
    if c_code_dirs:
        c_coverage=get_c_coverage()
        if c_coverage:
            d=combine_coverage(c_coverage, d)
    outfp = open(filename, 'wb')
    try:
        dump(d, outfp)
    finally:
        outfp.close()

Here, as you can see, figleaf checks whether C coverage has been enabled and calls get_c_coverage(). The result is combined with the Python coverage data and then written to the output file.
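
To picture what combining means here, a minimal sketch follows. It assumes both coverage results are dictionaries mapping a filename to a set of line numbers, which is what get_c_coverage() returns. The function merge_coverage is a made-up stand-in for illustration, not figleaf's own combine_coverage().

```python
def merge_coverage(first, second):
    """Union two {filename: set(line numbers)} mappings.

    Neither input is modified; files present in both get the union of
    their covered lines.
    """
    merged = {}
    for mapping in (first, second):
        for filename, lines in mapping.items():
            merged.setdefault(filename, set()).update(lines)
    return merged
```

With this shape, the C files simply appear as extra keys next to the Python files, which is why the same report file can hold both.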

Now, to generate the HTML report, I've made a few modifications to a number of functions. Starting with...

def build_python_coverage_info(coverage, exclude_patterns, files_list):
    keys = coverage.keys()

    line_info = {}
    lines=set([])
    for pyfile in filter_files(keys, exclude_patterns, files_list):
        try:
            fp = open(pyfile, 'rU')
            if pyfile.split(".")[-1]=="py":
                lines = figleaf.get_lines(fp)
            else:
                lines = figleaf.get_c_lines(pyfile)
        except KeyboardInterrupt:
            raise
        except IOError:
            logger.error('CANNOT OPEN: %s' % (pyfile,))
            continue
        except Exception as e:
            logger.error('ERROR: file %s, exception %s' % (pyfile, str(e)))
            continue

        # retrieve the coverage and merge into a realpath-based filename.
        covered = coverage.get(pyfile, set())
        realpath = os.path.realpath(pyfile)

        # be careful not to overwrite existing coverage for different
        # filenames that may have been canonicalized.
        (old_l, old_c) = line_info.get(realpath, (set(), set()))
        lines.update(old_l)
        covered.update(old_c)

        line_info[realpath] = (lines, covered)

    return line_info

Here you can see a check for whether the file to generate the line information for is a Python source or a C source. The check isn't too generic, but works for the time being. In case of a C source, it calls a different function I wrote in __init__.py:

def get_c_lines(fp):
    """
    Return the set of interesting lines in the C source code read
    from this file.
    """
    lines=[]
    fp=os.path.basename(fp)
    try:
        fp=open("./"+fp+".gcov", 'r')
    except IOError:
        print("Can't open: "+"./"+fp+".gcov")
        #No gcov report for this file, so no lines are known
        return set()

    for line in fp:
        line=line.rstrip("\n")
        if line.count(":") < 2:
            continue
        (count, lineno, code)=line.split(":", 2)
        if count.strip()=="-":
            continue
        else:
            lines.append(int(lineno))
    fp.close()
    return set(lines)

It looks for all the lines in the source file that gcov marks as executable. This is somewhat similar to what figleaf already does for Python code, but here is where the doubts arise.

I am not sure if this is the best way to check for the executable lines. It does generate an accurate report, consistent with what gcov generates, but I've seen gcov mark lots of lines as not executable that I would think should be marked otherwise. For example, it skips declaration statements. I was wondering whether relying on gcov's interpretation of which lines are executable is the right way to go here. Something to discuss with my supervisor... :|
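
On the file-type check in build_python_coverage_info(), which I said isn't too generic: one possible direction is to classify files by extension in a single place. This is just a sketch under my own assumptions. The function name source_kind and the extension sets are made up, and header files are counted as C only for illustration.

```python
import os

# Hypothetical extension tables; extend as needed.
PYTHON_EXTENSIONS = {".py"}
C_EXTENSIONS = {".c", ".h"}


def source_kind(path):
    """Classify a source path as 'python', 'c', or 'unknown'."""
    ext = os.path.splitext(path)[1].lower()
    if ext in PYTHON_EXTENSIONS:
        return "python"
    if ext in C_EXTENSIONS:
        return "c"
    return "unknown"
```

The "unknown" case lets the caller skip or log files that are neither, rather than treating everything that is not ".py" as C.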

Hi again,

I thought I would update the blog now instead of waiting for my code to work properly so the progress reports keep coming in on time.

Well, I've worked on making sure figleaf now takes the C code coverage into account too. I added a command line option to figleaf, "-c/--c-coverage", which takes a comma-separated list of directories in which figleaf looks for C code built with the gcov options. figleaf makes sure gcov is called for every file it finds with a .gcda extension, thus generating a C code coverage report for it.

This does add the restriction that figleaf must now be called from the directory where make was run for the project, as gcov needs to be called from the compilation directory.
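
One way this restriction might be softened later is to switch into the build directory only for the duration of the gcov call. The sketch below shows the idea with a small context manager. The name working_directory is made up, and figleaf does not currently do this.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current directory, then restore it."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original directory even if the body raised.
        os.chdir(previous)
```

The gcov invocation would then sit inside a "with working_directory(build_dir):" block, where build_dir is whatever directory make was run from.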

I've to make a few modifications to the code achieving the above task, as it is still not very generic. And now I am thinking of getting either my own fork of figleaf, or talking to Titus Brown about where to send the patches to.

Next comes the integration of the reports generated by gcov and figleaf. I made sure the gcov report is stored in the .figleaf file generated by figleaf, but once you try to convert that file into an HTML report, figleaf2html generates a number of indentation errors for the C code. That shows figleaf2html is very Python specific, but I haven't had a detailed look at it yet, and that's what I am supposed to do next.

Once this is achieved, I'll be well on my way to doing all of this again for Py3k.

Seth, my mentor, suggested adding a command line option to figleaf so that it reads the C directories from a file, instead of having them all given on the command line with the "-c/--c-coverage" option. I'll make sure that's incorporated.
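
As a sketch of what reading the directories from a file could look like, assuming one directory per line with "#" comments allowed: the file format and the name read_c_dirs are my own assumptions, since nothing has been decided yet.

```python
def read_c_dirs(path):
    """Read one directory per line, ignoring blanks and '#' comments."""
    dirs = []
    with open(path) as handle:
        for line in handle:
            # Drop anything after a '#' and surrounding whitespace.
            entry = line.split("#", 1)[0].strip()
            if entry:
                dirs.append(entry)
    return dirs
```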

Progress...

I've made little progress this week, and couldn't update my blog last week, due to the final exams of my final semester. But since the exams are about to end, I think I can at least document my progress from the week before last.

I analyzed a good number of Python C modules using gcov manually. The results seem good and encouraging, and gcov seems to be the right tool for the coverage. The fact that it requires modules to be statically linked into Python is a bit of trouble, but after a detailed discussion with my supervisor, Seth, we decided to assume the following two points for my later work on integrating the gcov reports into figleaf:

  1. The user has already compiled Python with modules statically linked in
  2. Proper arguments were used with gcc during compilation so that gcov can track the coverage.

Lots of manual work during the past weeks. It is actually time to start coding all of it, and I have to start with letting figleaf take care of the C code coverage report generation too. I've been looking through the figleaf code to figure out how best to achieve that goal, and will document any changes made here on the blog.

In the meantime, I've also started to look into improving the test coverage by writing new test cases. I found the unittest module interesting to start with, as it has around half coverage, and it would be interesting to write unit tests for the unittest module itself.

Gcov is a very handy tool. I've been using it for a while now to manually trace the C code coverage of different Python 2.6 modules.

Gcov Usage:

To use gcov with a C source file for code coverage, you have to compile your code with GCC and pass it two arguments at the compile time, "-fprofile-arcs -ftest-coverage". For example if you have a source file named "main.c", you would compile it as:

$gcc -o main -fprofile-arcs -ftest-coverage main.c

The compiler will compile your source file and produce another file named "main.gcno". Now run:

$gcov main

You will see that gcov will report 0% code coverage as you haven't run the compiled code yet. Run the compiled program and then use gcov for code coverage report again:

$./main
$gcov main

Depending on how much of the code in your program is executed, gcov will report the code coverage percentage. Running the program produces "main.gcda", and gcov then writes "main.c.gcov", the file that records which lines of your source file were executed and how many times.
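
To show how a ".gcov" file can be read back, here is a small sketch that computes the percentage from its contents. It assumes the plain "count: lineno: source" layout, where a count of "-" means the line is not executable and "#####" means it was never run. Newer gcov versions may add other markers, so treat this as an approximation. The SAMPLE text is hand-written for the example, not real gcov output, and summarize_gcov is a made-up name.

```python
def summarize_gcov(text):
    """Return (executable lines, executed lines, percent executed)."""
    executable = set()
    executed = set()
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue
        count, lineno = parts[0].strip(), parts[1].strip()
        # "-" marks lines gcov does not consider executable.
        if count == "-" or not lineno.isdigit():
            continue
        executable.add(int(lineno))
        # "#####" (and "=====" in some versions) mark unexecuted lines.
        if count not in ("#####", "====="):
            executed.add(int(lineno))
    percent = 100.0 * len(executed) / len(executable) if executable else 0.0
    return executable, executed, percent


# Hand-written sample in the layout described above.
SAMPLE = """\
        -:    0:Source:main.c
        -:    1:#include <stdio.h>
        1:    2:int main(void) {
        1:    3:    if (0)
    #####:    4:        puts("never");
        1:    5:    return 0;
        -:    6:}
"""
```

For the sample, three of the four executable lines were run, so the function reports 75 percent.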

Using Gcov with Python:

There is a bit of a problem with Gcov. You need to statically link all of your code into your program in order to get code coverage analysis on it. Python doesn't do that automatically, and there is no easy configure switch to do it. So you have to manually play with the configuration to get different modules statically linked in. To compile a module statically, one way is to copy its entry from Modules/Setup into Modules/Setup.local, and put it under a "*static*" heading (without the quotes).

Let's say you want to compile the "mmap" module statically. First just start your Python interpreter and import mmap. Now check the __file__ attribute of the module:

>>>import mmap
>>>print mmap.__file__

The interpreter will print the location on the hard drive from which the module was imported. Let's compile this module statically into the Python interpreter now. We copy its entry from the Modules/Setup file in the Python source tree into Modules/Setup.local, and put it under a *static* heading:

*static*
mmap mmapmodule.c -I$(prefix)/include -fprofile-arcs -ftest-coverage -L$(exec_prefix)/lib -lz -lgcov

The part after mmapmodule.c is there to make this module work with gcov so that we can get a coverage report on it. Now compile your Python interpreter and look for mmap's __file__ attribute again. This time you will see no such attribute defined for the module, which means it is built in this time.
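
The same check can be done from code. A minimal sketch, using sys.builtin_module_names and the __file__ attribute described above (both helper names are made up for this example):

```python
import importlib
import sys


def is_statically_linked(name):
    """True if the module is compiled into the running interpreter."""
    return name in sys.builtin_module_names


def module_location(name):
    """Return the module's __file__, or None for built-in modules."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)
```

After the static rebuild described above, is_statically_linked("mmap") should return True and module_location("mmap") should return None.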

Changing current directory into Modules and listing files will show that gcov has produced its data files for the mmapmodule. Run gcov on it to get the current coverage reports.

$gcov mmapmodule

You might get some code coverage for it since the import of the module into your python environment runs some of the initialization code. Had you run gcov on it without importing it first, you would have got 0% code coverage. Here is what I get:

File '/usr/include/sys/sysmacros.h'
Lines executed:0.00% of 6
/usr/include/sys/sysmacros.h:creating 'sysmacros.h.gcov'

File '/usr/include/sys/stat.h'
Lines executed:0.00% of 12
/usr/include/sys/stat.h:creating 'stat.h.gcov'

File './Modules/mmapmodule.c'
Lines executed:0.00% of 476
./Modules/mmapmodule.c:creating 'mmapmodule.c.gcov'
./Modules/mmapmodule.c:cannot open source file

Now let's run the mmap testing code and see how much of the actual code this test suite exercises:

$./python Lib/test/test_mmap.py -v
$cd Modules
$gcov mmapmodule

File '/usr/include/sys/sysmacros.h'
Lines executed:0.00% of 6
/usr/include/sys/sysmacros.h:creating 'sysmacros.h.gcov'

File '/usr/include/sys/stat.h'
Lines executed:0.00% of 12
/usr/include/sys/stat.h:creating 'stat.h.gcov'

File './Modules/mmapmodule.c'
Lines executed:71.64% of 476
./Modules/mmapmodule.c:creating 'mmapmodule.c.gcov'
./Modules/mmapmodule.c:cannot open source file

71.64% of the module is exercised by the test suite.

What's Next:

I have traced the code coverage of a number of modules by now. Next is to look for a way to easily compile a large number of modules statically into the Python interpreter, so as to get an automated code coverage report on them. This can be done by uncommenting the appropriate entries for the modules in the Modules/Setup file, but I've been running into a few problems with it right now. Solving that is the first task, and then comes the part of making figleaf do all of this automatically.
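
One direction for the automation is to generate the "*static*" section of Modules/Setup.local from a list of entries copied out of Modules/Setup. The sketch below simply appends the gcov flags used in the mmap example to each entry. The function name static_setup_section is made up, and whether the same flags are enough for every module is something I haven't verified.

```python
# Flags taken from the mmap example above.
GCOV_FLAGS = "-fprofile-arcs -ftest-coverage -lgcov"


def static_setup_section(entries):
    """Build a '*static*' section with gcov flags added to each entry.

    `entries` are lines copied from Modules/Setup, for example
    "mmap mmapmodule.c".
    """
    lines = ["*static*"]
    for entry in entries:
        lines.append("%s %s" % (entry.strip(), GCOV_FLAGS))
    return "\n".join(lines) + "\n"
```

The returned text would then be written to Modules/Setup.local before running make.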

I've been discovering Gcov for the past week. Gcov is a C code coverage analysis tool developed by GNU. And it's fun to use. :)

Well, I tried to do code coverage for a few modules. I started with zlib because I found out someone had already used Gcov for its C coverage. It seems Gcov isn't able to do code coverage for dynamically linked modules, so recompiling Python with statically linked modules is the first step for C coverage of the modules using Gcov. In the case of zlib, it was easy meat. But then I started searching for ways to build the whole of Python statically, with modules linked in statically as well. There is no supported way in the Python build system for that, but a bit of manual hackery can be used to achieve the goal. You need to specify the modules you want to link statically in Modules/Setup.local. I need to do more research to come up with an efficient way of doing this.

Also found a nice extension for Gcov called Lcov that can be used to generate nicely formatted HTML reports of the coverage analysis.

Project Milestones...

I had a nice discussion with my project supervisor, Seth Lemons, a day back, during which we came up with a clearer strategy for how to go about achieving our goals (also thanks to Titus for his feedback). This also led to a few modifications to the initial plan that I had proposed at the time of proposal submission. I think it's a good idea to document the strategy we came up with so we have it as a reference here to point to later on:

  1. I am going to start with making C coverage analysis work for Python 2.6. The generic C coverage tool I've decided to use for this purpose is Gcov. I think it makes sense to first manually put Gcov to use with Python for the coverage analysis.
  2. Next comes making figleaf generate a combined C + Python code coverage analysis report, by integrating the C coverage using Gcov with figleaf.
  3. The part up till this point is for Python 2.6 version. So to migrate it to Py3k, I'll need to port figleaf to Py3k. A port of figleaf for Py3k is already available in a somewhat unmaintained form, developed by the author of figleaf (Titus Brown) himself. So I hope this task shouldn't be a tough one.
  4. Once the above parts are completed, I can move to making the C code coverage working with Py3k again using Gcov.
  5. A goal of the project was to increase the code coverage by writing new test cases as well. This task can go on in parallel during all of the above four phases of the project.

The decision to start with Python 2.6 before moving to Py3k was motivated by two reasons:
  1. Python 2.6 is still widely used, so the community can gain from the work performed for this version. Plus whatever is done for 2.6 can then be easily migrated to Py3k.
  2. figleaf is currently stable for Python 2.6, so rather than starting by porting figleaf to Py3k, it makes sense to start the actual work with the current figleaf for Python 2.6 and use it for its coverage analysis.

I haven't got much experience with Gcov; in fact this is going to be my first time using it. So it is hard to estimate the time durations for the project milestones. But I hope once I get started with it, I'll be in a better position to judge the time constraints on the individual tasks.

...officially on 23rd May. I haven't had a head start yet though, mainly due to my college activities (final year project, presentations, mids etc), plus also because I was having discussion with my mentor and the community to get a good idea of what would be the right way of going through with my project, before I actually begin to write the code.

I've been testing different code coverage tools for Python in the past couple of weeks. The ones I specifically tested were coverage.py, Pycoco, and figleaf. All are actually pretty easy to use and generate good reports. Though I found Pycoco to be slower, mainly because it downloads its own copy of Python and does code coverage analysis on the downloaded source.

All of the above tools perform code coverage for Python 2.x, though figleaf has been ported to Py3k by its creator (Titus Brown), but I guess that port is pretty much at an early stage. I haven't tried it out yet, but that's something on my priorities list.

Gcov is a code coverage tool that can be used with GCC to perform code coverage analysis on your C code. I'll be using it for C code coverage for Python 2.6. Later I'll extend it to Py3k.

I'll put more details on the blog and update regarding the development work, in the coming days.

Know thyself...

GSoC2009's formal starting date for accepted students to start coding their projects is 23rd May. But to be actually able to start *coding* on that date, it is essential to know the tools and environment that you are going to be using and working on. Also important is to get a clear idea of what actually you will be writing the code for, what features your coded project will have, and how you plan to go about implementing them, making sure that your mentor and mentor organization are also clear on your goals and motives. So that's why the period till 23rd of May is called "The Community Bonding Period".

I'll be working with the Python Software Foundation (PSF) to improve and analyze the code coverage of Python3k. I've already checked out a local copy of Py3k's svn repository to get to know its directory structure and how things are managed. I've been on the Python-dev mailing list for a long time, but it is time to start paying more attention to the discussions carried out on it. I've also subscribed to the Python-3000 and Python-ideas mailing lists.

An important resource for all the students working with PSF this year is the Python Developers Guide. It links to many resources for starting developers and reference material for a quick look up if you are stuck somewhere.

The Python3k test suite is located in the Lib/test directory. You can run all of the tests by running the command "./python Lib/test/regrtest.py". Interestingly, two tests failed for me, test_distutils and test_socket. Running in verbose mode with the -v argument showed that the distutils test was having some permission problem with the directories it created in my machine's /var/tmp directory. Interestingly, if I run the test as the root user, there is no problem, as root doesn't need any special permissions to access a specific directory. But I am not supposed to run the tests as root. I've reported the problem on the Bug Tracker and hopefully it will be solved. The socket test had a weirder problem and I haven't dug into it yet.

Two important tools for Python code coverage are figleaf and coverage.py. I'll be giving both tools a detailed look because my summer project also involves coming up with a way for better report generation for Python code coverage.

Thanks to the folks on #python-dev on irc.freenode.net for listening to my questions regarding the source tree with patience. I hope to bug you guys even more, but only when it is necessary. ;)

My GSoC2009 Proposal

Lots of students who didn't make it into GSoC this year are requesting access to accepted students' proposals so they can have a look at what the accepted proposals looked like. So I thought I would post my proposal here.

Title: Code coverage analysis of and improvements to Python3k Core's testing framework
Student: muhammad shuaib khan

Abstract: This project proposes improvements to and extension of the Python3k testing framework by adding more test cases for increased code coverage, integration of the C code with the Python code coverage, and easy to generate integrated report.

Content:

GSoC Proposal
Name: Muhammad Shuaib Khan
Email: aries.shuaib@gmail.com, shuaib.khan@niit.edu.pk
Phone: +923435286646

Title

Code coverage analysis of and improvements to Python3k Core's testing framework

Synopsis

Testing phase of any big software project is critically essential in order to ensure flawless execution of the tool in real world enterprise environment. Python is no exception, and being a widely used programming language to code programs that are of critical importance to organizations, it is even more essential to have a solid testing framework for the language which is easy to use, improve, extend, and covers all the aspects of the core code. Python3k has a testing infrastructure that needs to be extended and improved. This would prove to be useful for code coverage analysis for both experienced core developers and new developers to ensure that the changes they have made to the code didn't break other pieces of the software.

Benefits to Community:

The need for a solid testing framework for the Python interpreter and its branches has been getting a good amount of attention in recent times. Python3k has a relatively well-defined methodology for developing test cases, as described in the document http://svn.python.org/projects/python/branches/py3k/Lib/test/README. But it is important to note that not all code gets tested by the test cases that are already laid out. Measuring the code coverage of both the Py3k C code and the Python code, and raising the coverage percentage by writing new tests for the standard library and the core itself, will be part of this project. The community will greatly benefit from this, as they'll have an easy-to-use testing framework to ensure that their changes haven't broken anything, and the confidence that all aspects of the code have been tested for successful execution.
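
To make the idea of a "coverage percentage" concrete, here is a minimal sketch of line coverage for a single Python function, built on the standard sys.settrace hook. It is only an illustration under my own assumptions: classify, executed_lines, executable_lines and coverage_percent are hypothetical names made up for this example, and real tools do far more (whole modules, branch data, reports, C code).

```python
import dis
import sys


def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def executed_lines(func, *args):
    """Return the line numbers inside func that run during one call."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Only follow frames that execute the function under study.
        if frame.f_code is not code:
            return None
        if event == "line":
            seen.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        # Always restore whatever trace function was active before.
        sys.settrace(previous)
    return seen


def executable_lines(func):
    """Return the body line numbers of func that carry bytecode."""
    code = func.__code__
    lines = {line for _, line in dis.findlinestarts(code) if line is not None}
    # The "def" line itself is not a body line, so leave it out.
    lines.discard(code.co_firstlineno)
    return lines


def coverage_percent(func, inputs):
    """Percentage of body lines of func reached by the given argument tuples."""
    wanted = executable_lines(func)
    reached = set()
    for args in inputs:
        reached |= executed_lines(func, *args) & wanted
    return 100.0 * len(reached) / len(wanted)
```

Calling coverage_percent(classify, [(5,)]) exercises only one path through the function, while coverage_percent(classify, [(-1,), (0,), (5,)]) reaches every body line. Writing new tests to raise the coverage figure amounts to adding inputs of the second kind.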


Deliverables:

The project aims to improve the code coverage of Py3k by writing new tests, and to make it easy to analyze the code coverage achieved. Keeping this main goal in mind, the following deliverables can be chalked out:

1- Study the existing tests implemented and identify parts/modules that need improved testing.
2- Write new test cases to improve the code coverage.
3- Measure the test coverage on Linux and Windows platforms. (It would be nice to do this for Mac OS X as well, but I do not have access to that platform.)
4- Integrate the C code coverage with the Python code coverage, and generate an integrated report.

Description:

Whenever new code is added to Python, as a module or as functionality in an existing module, it is essential to also add corresponding test code that exercises the added functionality, including situations where the code might be used in ways the programmer didn't intend. But things don't always get done this way, and there is a good amount of code in Py3k that needs test cases. As a GSoC project, I intend to improve the situation by analyzing the code coverage, adding more tests for the C code and the Python code, integrating the tests for both, and providing a mechanism for easily generating an integrated report.

The guideline for adding tests documents three different ways of writing test cases: unittest-based tests, doctest-based tests, and the traditional style, which uses hackish techniques such as comparing the output of the functionality against the expected output and looking for mismatches.
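
As a rough illustration of these three styles, here is a minimal sketch that tests the same small function in each way. The clamp function, the test class, and the run_* helpers are all made up for this example and are not taken from the Py3k test suite. The third style is only approximated here by capturing printed output and comparing it with a stored expected string.

```python
import contextlib
import doctest
import io
import unittest


def clamp(value, low, high):
    """Restrict value to the range [low, high].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))


# Style 1: unittest-based test.
class ClampTest(unittest.TestCase):
    def test_inside_range(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_below_range(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_above_range(self):
        self.assertEqual(clamp(42, 0, 10), 10)


def run_unittest_style():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampTest)
    result = unittest.TestResult()
    suite.run(result)
    return result


# Style 2: doctest-based test, driven by the examples in the docstring.
def run_doctest_style():
    parser = doctest.DocTestParser()
    test = parser.get_doctest(clamp.__doc__, {"clamp": clamp}, "clamp", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults(failed, attempted)


# Style 3: output comparison against a stored expected output.
EXPECTED_OUTPUT = "0\n5\n10\n"


def print_report():
    for value in (-3, 5, 42):
        print(clamp(value, 0, 10))


def run_output_comparison_style():
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_report()
    return buffer.getvalue() == EXPECTED_OUTPUT
```

The unittest version gives a named test per behaviour, the doctest version keeps the examples next to the documentation, and the output comparison only tells you that something differs, which is part of why it feels hackish.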

Related work:

Professor Titus Brown (http://ivory.idyll.org/about.html) was kind enough to point me to some existing documentation related to this project. Brett Cannon has been writing invaluable posts on his blog (http://sayspy.blogspot.com/) about Python test coverage and ways of improving it.

Biographical Information:

- I am an undergraduate Computer Software Engineering student at the National University of Sciences and Technology (NUST), Pakistan.
- I was selected as a Summer Student for CERN Openlab in Geneva, Switzerland (http://tinyurl.com/caapau), where I worked on developing a testing framework in Python for a Linux performance monitoring tool called "Perfmon".
- I have been writing about open source software and have been published numerous times on Linux.com:
* Lguest: A simple virtualization platform for Linux (http://www.linux.com/feature/126293) Feb 20, 2008
* Chess engines for Linux (http://www.linux.com/feature/60859) Mar 22, 2007
* CLI Magic: Linux troubleshooting tools 101 (http://www.linux.com/feature/60136) Feb 19, 2007
* A survey of open source cluster management systems (http://www.linux.com/feature/57073) Sep 21, 2006
* Setting up a Condor cluster (http://www.linux.com/feature/56747) Sep 01, 2006
- I've won a number of national software project and programming competitions.
- I'm also a Teaching Assistant (TA) at a national university, where I conduct lab sessions for undergraduate students in the Fundamentals of Programming and Object Oriented Programming courses.
- At my home institute, I've also been involved with a research group that targeted the development of an operating system with built-in Grid management facilities.

This is the last semester of my bachelor's degree and I have fewer courses to take, with only one and a half hours of lectures on weekdays on average. This gives me the opportunity to dedicate suitable time to my GSoC project. I'm familiar with all the major means of communication on the Internet and can be found on IRC, Skype, IM, and Twitter. I also have a good response time on email.

Having had an eye on GSoC ever since Google initiated it, but applying only this summer and getting accepted, is indeed a happy feeling.


For all those who couldn't make it this year, don't lose morale. Get involved with your favorite open source project, get to know the open source community, make a name for yourself, and try next year. You'll get in.

I'll use this blog to document my progress on my GSoC project, which is with the Python Software Foundation (PSF). The project formally begins on the 23rd of May, but the time until then is a community bonding period during which I get to know my mentor organization, read documentation related to my project, get to know the code and how it is structured, etc.

Keep visiting for updates. Or subscribe to the feed.

Testing...

This is just a test post.