Recursive-descent in Python, using generators

Writing recursive descent parsers is easy, especially in Python (just like everything is easy in Python). But Python defines a low limit on the number of recursive calls that can be made, and recursive descent parsers are prone to exceed this limit.

We should thus write our parser without real recursion. Fortunately, Python offers us a way out: Coroutines, introduced in Python 2.5 as per PEP 342. Using coroutines and a simple trampoline function, we can convert every mutually recursive set of functions into a set of coroutines that require constant stack space.

Let’s say our parser looks like this (tokens being an iterator over characters):

def parse(tokens):
    token = next(tokens)
    if token.isalnum():
        return token
    elif token == '(':
         result = parse(tokens)
         if next(tokens) != ')': raise ...
         return result
         raise ...

We now apply the following transformations: return X => yield (X) parse() => (yield parse())

That is, we yield the value we want to return, and we yield a generator whenever we want to call (using a yield expression). Our rewritten parser reads:

def parse(tokens):
    token = next(tokens)
    if token.isalnum():
        yield token
    elif token == '(':
         result = yield parse(tokens)
         if next(tokens) != ')': raise ...
         return result
         raise ...

We obviously cannot call that generator like the previous function. What we need to introduce is a trampoline. A trampoline is a function that manages that yielded generators, calls them, and passes their result upwards. It looks like this:

def trampoline(generator):
    """Execute the generator using trampolining. """
    queue = collections.deque()

    def schedule(coroutine, stack=(), value=None):
        def resume():
            if 0:
                global prev
                now = stack_len(stack)
                if now < prev:
                    print("Length", now)
                    prev = -1
                elif prev != -1:
                    prev = now
            result = coroutine.send(value)
            if isinstance(result, types.GeneratorType):     # Normal call
                schedule(result, (coroutine, stack))
            elif isinstance(result, Tail):                  # Tail call (if you want to)
                schedule(result.value, stack)
            elif stack:                                     # Return to parent
                schedule(stack[0], stack[1], result)
            else:                                           # Final Return
                return result



    result = None
    while queue:
        func = queue.popleft()
        result = func()

    return result

This function is based on the code in PEP 342, the difference being that

For a more generic version of that, you might want to re-add exception passing, but the exceptions will then have long tracebacks, so I’m not sure how well they will work if you have deep recursion.

Now, the advantage of that is that our parser now requires constant stack space, because the actual real stack is stored in the heap using tuples which are used like singly-linked lists in scheme here. So, the only recursion limit is available memory.

A disadvantage of that transformation is that the code will run slightly slower for small inputs that could be handled using a normally recursive parser.

comments powered by Disqus