Boxes v0.19

We have spent several lessons (2 3 4 5 6 7) improving our code.

We have turned it from a pile of functions into a working command line tool, we have refactored a few things out of our most complex function, layout() and into a few smaller, testable functions and we have created automated tests for them.

Now the time has come to consider the elephant in the room. We said we needed to add tests and do refactoring because layout() was too complex, and we did. So we have to put our code on the table and fix bugs.

The main bug we have is that the choice of breaking points for rows is not optimal. Often, when I have to make algorithmic changes in a function, it helps to describe what it does in plain English.

If I can't, or the description is too long, that often means I need to refactor further.

Here is the function as it is now:

121def layout(_boxes, pages, separation):
122    """Layout boxes along pages.
124    Keep in mind that this function modifies the boxes themselves, so
125    you should be very careful about trying to call layout() more than once
126    on the same boxes.
128    Specifically, some spaces will become 0-width and not stretchy.
129    """
131    # Because we modify the box list, we will work on a copy
132    boxes = _boxes[:]
133    # We start at page 0
134    page = 0
135    # The 1st box should be placed in the correct page
136    previous = boxes.pop(0)
137    previous.x = pages[page].x
138    previous.y = pages[page].y
139    row = []
140    while boxes:
141        # We take the new 1st box
142        box = boxes.pop(0)
143        # And put it next to the other
144        box.x = previous.x + previous.w + separation
145        # At the same vertical location
146        box.y = previous.y
148        # Put the box in the row
149        row.append(box)
151        if is_breaking(box, pages[page]):
152            h_b = add_hyphen(row, separation)
153            if h_b:
154                _boxes.append(h_b)  # So it's drawn
155            justify_row(row, pages[page], separation)
156            # We start a new row
157            row = []
158            # We go all the way left and a little down
159            box.x = pages[page].x
160            box.y = previous.y + previous.h + separation
162        # But if we go too far down
163        if box.y + box.h > pages[page].y + pages[page].h:
164            # We go to the next page
165            page += 1
166            # And put the box at the top-left
167            box.x = pages[page].x
168            box.y = pages[page].y
170        # Collapse all left-margin space
171        if all(b.letter == " " for b in row):
172            box.w = 0
173            box.stretchy = False
174            box.x = pages[page].x
176        previous = box
178    # Remove leftover boxes
179    del (pages[page + 1:])

And here is my attempt at describing it:

  • We take a list of boxes and one of pages (and a letter separation measure)
  • Starting at the first page's top-left , we lay boxes one next to the other in a row
  • If we need to break a row:
    • We add a hyphen if needed
    • Justify the row
    • Start back from the left edge of the page
  • If we have gone too far down and are off the page:
    • We Move the row to the top-left of the next page
    • Make that page our "current page"
  • We collapse all left-margin space (WAT?)
  • After we run out of boxes, we remove the unused pages

That is not too long. It does seem to point that we could split layout() into an outer function handling pages and an inner one filling a single page, but that's minor.

Also, the collapsing of left-margin space is in a weird place, but we can ignore it.

Keep in mind that what we have there is a description of what layout() does now and not necessarily of what it should do.

In fact, it seems clear that if we want to be able to choose the best breaking point for a row, we need to rethink how this function works and find a better way to do it. If the change is large enough, refactoring the algorithmic change may require a full rewrite of the function.

And that's why we did all the previous refactors extracting code out of layout(). The more we extract, the less we have to replace.

Let's walk this path in the other direction. Let's speak about what layout() does and turn that into code doing it.

On a very high level way, we could say that what we are doing is just breaking text into lines and breaking the list of lines into pages. Therefore:

  • We take a list of boxes and one of pages (and a letter separation measure)
  • Find the best amount of boxes from our list of boxes we can fit in a page width.
    • Put that row in the top-left of the page or below the last row.
    • Repeat until we run out of page (or rows)
  • Use the next page
    • Repeat until we are out of rows (or pages)

If you compare with the previous description, you will see that this code is not in a single loop, so it will look pretty different. I don't think the new layout() will have much in common with the old one... except that it should do more or less the same thing.

In fact, if we made the same decision about how many boxes we put in a row, it should behave the same way as the current version. Probably.

So, let's do it in three stages:

  1. We implement the new code with the same line breaking decisions we have now.
  2. We make sure we are not breaking something else
  3. We improve those decisions.

This may seem very conservative. Why not do both things at once?

Well, if I were doing this on my own, I probably would. But that would make the change a bit hard to follow, so I am trading space for simplicity.

Let's start with code to "Find the best amount of boxes from our list of boxes we can fit in a page width." This is supposed to be the same logic we had before but refactored into a separate function just like we have already done for other things:

  • Pop letters one at a time.
  • Put them in the row next to the last letter.
  • If it's a good place to break, return that row.
115def fill_row(boxes, page, separation):
116    """Fill a row with elements removed from boxes.
118    The elements put in the row should be a good fit for laying out on
119    page considering separation.
120    """
122    row = []
123    x = page.x
124    while boxes:
125        b = boxes.pop(0)
126        row.append(b)
127        b.x = x
128        if is_breaking(b, page):
129            break
131        x = x + b.w + separation
132    return row

BTW, I did write some tests for it, you can see them in

And here is the new layout() in all its smallness:

135def layout(_boxes, _pages, separation):
136    """Layout boxes along pages."""
138    # We modify these lists, so use copies
139    boxes = _boxes[:]
140    pages = _pages[:]
142    # Start in page 0
143    current_page = pages.pop(0)
144    y = current_page.y
146    # If we run out of boxes or pages, stop
147    while boxes and pages:
148        # If this row would end below the page, advance to next page
149        if (y + boxes[0].h) > (current_page.y + current_page.h):
150            current_page = pages.pop(0)
151            y = current_page.y
152        # Put "enough" letters into row and out of boxes
153        row = fill_row(boxes, current_page, separation)
154        # Adjust box positions to fill the page width
155        justify_row(row, current_page, separation)
156        # Put all the letters in the right vertical position
157        y = y + row[0].h + separation
158        for b in row:
159            b.y = y
160    # Remove unused pages
161    del (_pages[-len(pages):])

I took code that added real hyphens at the end of rows broken on soft-hyphens and moved it into justify_row():

74    # If the line ends with a soft-hyphen, replace it with a real hyphen
75    elif row[-1].letter == "\xad":
76        # This looks pretty bad, doesn't it?
77        hyphen = hyphenbox()
78        row[-1].letter = hyphen.letter
79        row[-1].w = hyphen.w

And of course added a test for it called test_add_hyphen() in This means we are not using add_hyphen() anywhere so I removed it.

Does it work?

$ python ./pride-and-prejudice.txt lesson8.svg


Well, not perfectly. The last line is weirdly justified. But it's not bad for such a drastic change!

The problem with the last line is a mystery. Before we improve our line-breaking algorithm, we will have to fix it.

We may end up having some actual tests for layout() now!

results matching ""

    No results matching ""