October 18, 2006

Specification

Still working on my specification; I will post it here once it is complete.

Apparently I have nearly 2000 words in my posts (I have two in-progress entries: one following up on extract method, the other discussing some of the issues with rename), so hopefully the writing over the summer will pay off.

Hopefully the other people on Planet3yp will post their specifications.


September 24, 2006

Extract Method

Extract method is one of the core refactorings performed when cleaning up
a large function. A block of code is analysed for references to previously
existing variables, then moved into a new function with those variables passed
in. Any assignments that are later used are then returned from this new
function, which is trivial to do with tuples in Python.
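To make that concrete, here is a minimal sketch using a made-up function (`report_before`, `summarise` and `report_after` are hypothetical names, not from any real tool). The marked block reads `prices` and assigns `total` and `count`, both of which are used afterwards, so the extracted function takes `prices` as a parameter and returns the pair as a tuple.

```python
# Hypothetical example for illustration only.

def report_before(prices):
    # --- block to extract: reads `prices`, assigns `total` and `count` ---
    total = 0
    count = 0
    for p in prices:
        total += p
        count += 1
    # --- end of block ---
    return "%d items, mean %.2f" % (count, total / float(count))


def summarise(prices):
    # The variable the block read becomes a parameter...
    total = 0
    count = 0
    for p in prices:
        total += p
        count += 1
    # ...and the assignments used later are returned as a tuple.
    return total, count


def report_after(prices):
    total, count = summarise(prices)
    return "%d items, mean %.2f" % (count, total / float(count))
```

Tuple unpacking at the call site means the caller keeps its original variable names, so the rest of the function body does not need to change.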

Fowler's Refactoring suggests alternatives for languages like Java: replace
temporary values with query methods, or even create a method object instead
of a new method (i.e. encapsulate the state changes inside a new class which has
a “calculate” or similar method). The absurdity of this style of programming
was commented on in this excellent article by Steve Yegge.

Anyway, here’s a fun problem with performing extract method automatically: how to
propagate early exits?

def early_exit(i):
    i = f(i)  # Begin extract method
    if i == 0:
        return 2  # End extract method
    i = g(i)
    return i/2

In order to propagate the early return, we need to return a boolean indicating
such, but this means we must introduce temporaries, either with ugly automatic
naming to avoid conflicts:

_cond_1, _ret_1_true, _ret_1_false = extracted(i)
if _cond_1:
    return _ret_1_true
else:
    i = _ret_1_false

or asking the user for suitable names.
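A runnable sketch of one way this could look end to end. The bodies of `f` and `g` are stand-ins I made up so the code executes, and the layout of the returned tuple (flag, early return value, updated `i`) is an assumption rather than a fixed convention. On the non-exit path the caller carries on with the rest of the original body.

```python
# f and g are hypothetical stand-ins; the post does not define them.
def f(i):
    return i - 1


def g(i):
    return i * 4


def early_exit(i):
    i = f(i)
    if i == 0:
        return 2
    i = g(i)
    return i / 2


def extracted(i):
    # Returns (exited_early, early_return_value, updated_i).
    i = f(i)
    if i == 0:
        return True, 2, None
    return False, None, i


def early_exit_refactored(i):
    _cond_1, _ret_1_true, _ret_1_false = extracted(i)
    if _cond_1:
        return _ret_1_true      # propagate the early return
    i = _ret_1_false            # otherwise pick up the updated value
    i = g(i)
    return i / 2
```

Comparing `early_exit` and `early_exit_refactored` over a range of inputs is a cheap sanity check that the transformation preserved behaviour.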

This problem also extends to other control flow statements: break, continue,
yield (of course, raise isn’t a problem).

Yield is particularly noteworthy because the yield must remain in the original
function in order for the function to remain a generator, i.e. any refactoring
around a yield needs more than simple code movement.

In Python 2.5, yield was extended to allow it to return a value inside the
generator, so data flow between a generator and its caller became bidirectional.
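A small illustration of that two-way flow, using a made-up `running_total` generator. It is written so that it also runs on current Python, where `next(gen)` replaces the `gen.next()` spelling of 2.5.

```python
def running_total():
    total = 0
    while True:
        # `yield total` hands the current total to the caller and then
        # evaluates to whatever the caller passes to send().
        amount = yield total
        total += amount


gen = running_total()
first = next(gen)       # run to the first yield (gen.next() in Python 2.5)
second = gen.send(5)    # value flows in, new total flows out
third = gen.send(10)
```

Because the value of the `yield` expression is used inside the generator body, moving the surrounding lines into a plain helper function would have to pass that value along explicitly.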

Another fun thing is detecting and rejecting invalid ranges of extract method:

def f():
    g()
    try:
        h()
    except Exception:
        i()
    finally:
        j()

def another_function():
    pass

Attempting to extract between any two of the function calls is invalid (apart
from moving the entire try/except/finally block). Similarly trying to extract
from j() into the new function definition is invalid.
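One possible structural check, sketched with the modern standard library `ast` module (this relies on `end_lineno`, so it assumes Python 3.8 or later). `is_extractable` and `statement_lists` are hypothetical helper names. A line range is accepted only if it covers a run of whole sibling statements within a single block. This only catches ranges that cut across blocks, and says nothing about control flow such as return or yield inside the range.

```python
import ast

SOURCE = '''\
def f():
    g()
    try:
        h()
    except Exception:
        i()
    finally:
        j()

def another_function():
    pass
'''


def statement_lists(tree):
    """Yield every list of statements in the tree (bodies, else/finally blocks)."""
    for node in ast.walk(tree):
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if isinstance(stmts, list) and stmts and isinstance(stmts[0], ast.stmt):
                yield stmts


def is_extractable(source, first, last):
    """True if lines first..last cover whole sibling statements in one block."""
    tree = ast.parse(source)
    for stmts in statement_lists(tree):
        starts = [s.lineno for s in stmts]
        ends = [s.end_lineno for s in stmts]
        # The range must begin where one sibling starts and finish where
        # the same or a later sibling ends.
        if first in starts and last in ends:
            if starts.index(first) <= ends.index(last):
                return True
    return False
```

With the sample above, lines 3 to 8 (the whole try/except/finally) are accepted, while lines 4 to 6 (`h()` through `i()`) and lines 8 to 11 (`j()` into `another_function`) are rejected.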

EDIT: Fixed Python doc URL


September 10, 2006

Administrivia

Quick list of implementation details for my project:

Revision control system: Bazaar-ng

Language: Python 2.5. Final release should be very soon (i.e. next few days).
I like the language and library improvements over Python 2.4, details here.

Only drawback is availability: Ubuntu 6.10 (Edgy Eft) and Debian 4.0 (Etch) will
probably be the first distributions to ship with support for 2.5.

Initial target platform: Vim 7.0. It’s what my desktop runs, and it reportedly has
better support for writing tools that assist the programmer. Emacs support
would go through Pymacs.

I will need to compile both Python 2.5 and Vim 7.0 for use in DCS as RHEL4 ships
with Python 2.3 (although Python 2.4 is locally available) and Vim 6.x.
Bzr will have to be installed to my account’s Python install as well.

I shall be using the py.test tool/library for project-wide testing (and probably some
other features from that library).


Testing

Whilst looking for code coverage utilities for Python, I discovered this
paper by Brian Marick. He makes some interesting points, although
(unsurprisingly) they are similar to what’s in “The Art of Software Testing”
by Myers.

  • Line by line coverage is too simplistic, so you have to test for branches
    being taken.
  • When testing branches, you need to ensure that each component of the
    conditional is tested (short circuited evaluation).
  • Static code coverage can tell you only some things about code that is
    present, due to the halting problem (this point came from another paper).
    Dynamic code coverage can only describe code that gets run.
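A toy illustration of the short-circuit point. All names here are made up for the example. Two test cases take both branches of the `if`, yet neither one makes the second operand decide the outcome, so a version that forgets that operand entirely still passes them.

```python
def can_ship(in_stock, paid):
    # `paid` is only evaluated when `in_stock` is true (short-circuit).
    if in_stock and paid:
        return "ship"
    return "hold"


def can_ship_buggy(in_stock, paid):
    # Hypothetical bug: the `paid` check was forgotten.
    if in_stock:
        return "ship"
    return "hold"


# Both branches of the `if` are taken by these two cases...
branch_cases = [((True, True), "ship"), ((False, True), "hold")]
# ...but only this case makes `paid` decide the result.
condition_case = ((True, False), "hold")


def passes(func, cases):
    return all(func(*args) == expected for args, expected in cases)
```

`passes(can_ship_buggy, branch_cases)` is true, whereas `passes(can_ship_buggy, [condition_case])` is false: the extra case is the one that exercises the second component of the conditional.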

The example given was checking a function’s return value for FATAL_ERROR and
exiting or continuing. What is missing is that the function can also return
RECOVERABLE_ERROR which requires some remedial action before the program can
continue. This is an “error by omission”.
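A sketch of what such code might look like. Only the names `FATAL_ERROR` and `RECOVERABLE_ERROR` come from the description above; the constant values, the `OK` status, the function names and the `log` list are placeholders of mine.

```python
FATAL_ERROR = "fatal"              # placeholder values
RECOVERABLE_ERROR = "recoverable"
OK = "ok"


def handle(status, log):
    # Tests using FATAL_ERROR and OK cover every line and both branches,
    # yet RECOVERABLE_ERROR silently falls through to "continue".
    if status == FATAL_ERROR:
        log.append("exit")
        return False
    log.append("continue")
    return True


def handle_fixed(status, log):
    if status == FATAL_ERROR:
        log.append("exit")
        return False
    if status == RECOVERABLE_ERROR:
        log.append("recover")      # the remedial action that was missing
    log.append("continue")
    return True
```

No coverage report on `handle` can point at the problem, because the code that should deal with `RECOVERABLE_ERROR` does not exist to be marked as unexecuted.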

A more sophisticated tool would determine the dependency between the function
and the code that checks its return values and check that all possible classes
of values are returned and checked.

<ObMissingThePoint>Of course, they should be using exceptions rather than
FATAL_ERROR return values ;-)</>.

  • Don’t expect full coverage or think too much about whole program coverage.

It’s a misleading goal: testing may be clustered, with some modules heavily
tested and some overlooked (Marick uses the term “black holes”, crediting Rich
Conley).

Full coverage doesn’t guarantee correctness anyway: conditions may be missing
and, even more importantly, side effects mean that reordering operations may
give different results. Determining all valid reorderings is NP-complete (I
think I read this in one of the papers I’ve collected, I’ll verify this at some
point…) and then each of those permutations would have to be tested.

  • There exists a temptation to treat messages from the coverage tool(s) as
    commands (“make that statement evaluate true”) rather than hints (“you made
    some mistakes somewhere around there”). Marick advises against using code
    coverage in test design as the “return on your testing dollar (in terms of bugs
    found) is too low”.

Despite these problems, Marick still finds code coverage useful: “I wouldn’t
have written four coverage tools if I didn’t think they’re helpful. But they’re
only helpful if they’re used to enhance thought, not replace it.”

EDIT: Uploaded old version of document, stupid caching.


July 14, 2006

Expression(Const('Hello World'))

If you've ever revisited some code and "tidied it up", you've probably performed a refactoring. If the code continued to work afterwards, then it satisfies some definitions of refactoring.

Don Roberts's PhD thesis was a stimulating read about specifying and implementing refactorings, albeit a bit thin in places – the claim that conservative static checking matched with liberal dynamic checking could produce exact results wasn't explored or justified.

Despite this, the paper was very useful, especially for two things:

  • The concept of extending the syntax of the language to produce a meta-language for pattern matching and specifying program transformations. The language used in the paper was Smalltalk, so I need to think about a suitably Pythonic version of this concept.
  • A formal basis for reasoning about refactorings:
    A refactoring is an ordered triple R = (pre, T, P) where pre is an assertion that must be true on a program for R to be legal, T is the program transformation, and P is a function from assertions to assertions that transforms legal assertions whenever T transforms programs.
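A toy reading of the triple in Python, to get a feel for its shape. Everything here is my own simplification and not the thesis's formalism: a "program" is just a frozenset of defined names, an assertion is a predicate over programs, and `rename` and `apply_refactoring` are hypothetical helpers.

```python
from collections import namedtuple

# R = (pre, T, P), with toy stand-ins for programs and assertions.
Refactoring = namedtuple("Refactoring", "pre T P")


def rename(old, new):
    def pre(program):
        # Legal only if `old` exists and `new` would not clash.
        return old in program and new not in program

    def T(program):
        return (program - {old}) | {new}

    def undo(program):
        return (program - {new}) | {old}

    def P(assertion):
        # Turn an assertion about the old program into one about the new:
        # if assertion(p) holds, then P(assertion)(T(p)) holds.
        return lambda program: assertion(undo(program))

    return Refactoring(pre, T, P)


def apply_refactoring(refactoring, program):
    if not refactoring.pre(program):
        raise ValueError("precondition failed; refactoring is not legal")
    return refactoring.T(program)


program = frozenset(["parse", "emit"])
r = rename("parse", "parse_file")
defines_parse = lambda prog: "parse" in prog    # holds before the rename
new_program = apply_refactoring(r, program)
```

Here `defines_parse(program)` holds before the rename and `r.P(defines_parse)(new_program)` holds after it, without re-analysing the program from scratch. Carrying known facts forward like this seems to be what makes chaining refactorings tractable.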

This is later extended to reason about dependencies between refactorings using a superficially similar sounding method to what Darcs uses to represent changes. I've not properly read up on Darcs's patch theory, so at the moment I consider them similar because they both use commutativity to establish independence.

The title of this post is a Python Abstract Syntax Tree. ASTs appear to be the only sensible way of transforming a program, although converting the changes back into source code whilst preserving formatting and comments is challenging. Ideas gleaned from the thesis include extending the AST to have a Comment node or storing "textual coordinates" on the nodes – something like this is already stored in order to provide sensible error diagnostics.

At the moment, I am playing around with the Python standard library's compiler module to produce ASTs and writing Visitors to traverse them. A brief conversation in #pypy on freenode indicates that it might be worth using PyPy instead; one reason given was that I'll need to produce a flow graph in order to do refactoring properly, and PyPy already does something along these lines.
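For anyone wanting to try the same kind of experiment today: the compiler module was removed in Python 3, so this sketch uses the standard `ast` module as a stand-in. Node names differ from the old module, and the `Constant` node assumes Python 3.8 or later. `NameCollector` is a hypothetical visitor that records each identifier along with the "textual coordinates" the parser already stores on the node.

```python
import ast


class NameCollector(ast.NodeVisitor):
    """Record every identifier together with its (line, column) position."""

    def __init__(self):
        self.found = []

    def visit_Name(self, node):
        # lineno is 1-based, col_offset is 0-based.
        self.found.append((node.id, node.lineno, node.col_offset))
        self.generic_visit(node)


source = "total = price * quantity\nprint(total)\n"
tree = ast.parse(source)
collector = NameCollector()
collector.visit(tree)

# The rough modern equivalent of this post's title: an Expression node
# wrapping a constant.
hello = ast.parse("'Hello World'", mode="eval")
```

After the visit, `collector.found` holds one `(name, line, column)` entry per identifier, which is enough to map an AST node back to a span of the original text.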

Next step, read this thesis and this one (ftp://st.cs.uiuc.edu/pub/papers/refactoring/opdyke-thesis.ps.Z) as well as anything else that looks interesting on Martin Fowler's website on refactoring.

