We have turned it from a pile of functions into a working command line tool,
we have refactored a few things out of our most complex function,
and into a few smaller, testable functions and we have created automated
tests for them.
Now the time has come to consider the elephant in the room. We said we
needed to add tests and do refactoring because
layout() was too complex,
and we did. So we have to put our code on the table and fix bugs.
The main bug we have is that the choice of breaking points for rows is not optimal. Often, when I have to make algorithmic changes in a function, it helps to describe what it does in plain English.
If I can't, or the description is too long, that often means I need to refactor further.
Here is the function as it is now:
121def layout(_boxes, pages, separation): 122 """Layout boxes along pages. 123 124 Keep in mind that this function modifies the boxes themselves, so 125 you should be very careful about trying to call layout() more than once 126 on the same boxes. 127 128 Specifically, some spaces will become 0-width and not stretchy. 129 """ 130 131 # Because we modify the box list, we will work on a copy 132 boxes = _boxes[:] 133 # We start at page 0 134 page = 0 135 # The 1st box should be placed in the correct page 136 previous = boxes.pop(0) 137 previous.x = pages[page].x 138 previous.y = pages[page].y 139 row =  140 while boxes: 141 # We take the new 1st box 142 box = boxes.pop(0) 143 # And put it next to the other 144 box.x = previous.x + previous.w + separation 145 # At the same vertical location 146 box.y = previous.y 147 148 # Put the box in the row 149 row.append(box) 150 151 if is_breaking(box, pages[page]): 152 h_b = add_hyphen(row, separation) 153 if h_b: 154 _boxes.append(h_b) # So it's drawn 155 justify_row(row, pages[page], separation) 156 # We start a new row 157 row =  158 # We go all the way left and a little down 159 box.x = pages[page].x 160 box.y = previous.y + previous.h + separation 161 162 # But if we go too far down 163 if box.y + box.h > pages[page].y + pages[page].h: 164 # We go to the next page 165 page += 1 166 # And put the box at the top-left 167 box.x = pages[page].x 168 box.y = pages[page].y 169 170 # Collapse all left-margin space 171 if all(b.letter == " " for b in row): 172 box.w = 0 173 box.stretchy = False 174 box.x = pages[page].x 175 176 previous = box 177 178 # Remove leftover boxes 179 del (pages[page + 1:])
And here is my attempt at describing it:
- We take a list of boxes and one of pages (and a letter separation measure)
- Starting at the first page's top-left , we lay boxes one next to the other in a row
- If we need to break a row:
- We add a hyphen if needed
- Justify the row
- Start back from the left edge of the page
- If we have gone too far down and are off the page:
- We Move the row to the top-left of the next page
- Make that page our "current page"
- We collapse all left-margin space (WAT?)
- After we run out of boxes, we remove the unused pages
That is not too long. It does seem to point that we could split
layout() into an outer function handling pages and an inner one
filling a single page, but that's minor.
Also, the collapsing of left-margin space is in a weird place, but we can ignore it.
Keep in mind that what we have there is a description of what
does now and not necessarily of what it should do.
In fact, it seems clear that if we want to be able to choose the best breaking point for a row, we need to rethink how this function works and find a better way to do it. If the change is large enough, refactoring the algorithmic change may require a full rewrite of the function.
And that's why we did all the previous refactors extracting code out of
layout(). The more we extract, the less we have to replace.
Let's walk this path in the other direction. Let's speak about what
does and turn that into code doing it.
On a very high level way, we could say that what we are doing is just breaking text into lines and breaking the list of lines into pages. Therefore:
- We take a list of boxes and one of pages (and a letter separation measure)
- Find the best amount of boxes from our list of boxes we can fit in a page
- Put that row in the top-left of the page or below the last row.
- Repeat until we run out of page (or rows)
- Use the next page
- Repeat until we are out of rows (or pages)
If you compare with the previous description, you will see that this code is
not in a single loop, so it will look pretty different. I don't think the new
layout() will have much in common with the old one... except that it should do
more or less the same thing.
In fact, if we made the same decision about how many boxes we put in a row, it should behave the same way as the current version. Probably.
So, let's do it in three stages:
- We implement the new code with the same line breaking decisions we have now.
- We make sure we are not breaking something else
- We improve those decisions.
This may seem very conservative. Why not do both things at once?
Well, if I were doing this on my own, I probably would. But that would make the change a bit hard to follow, so I am trading space for simplicity.
Let's start with code to "Find the best amount of boxes from our list of boxes we can fit in a page width." This is supposed to be the same logic we had before but refactored into a separate function just like we have already done for other things:
- Pop letters one at a time.
- Put them in the row next to the last letter.
- If it's a good place to break, return that row.
115def fill_row(boxes, page, separation): 116 """Fill a row with elements removed from boxes. 117 118 The elements put in the row should be a good fit for laying out on 119 page considering separation. 120 """ 121 122 row =  123 x = page.x 124 while boxes: 125 b = boxes.pop(0) 126 row.append(b) 127 b.x = x 128 if is_breaking(b, page): 129 break 130 131 x = x + b.w + separation 132 return row
BTW, I did write some tests for it, you can see them in test_fill_row.py
And here is the new
layout() in all its smallness:
135def layout(_boxes, _pages, separation): 136 """Layout boxes along pages.""" 137 138 # We modify these lists, so use copies 139 boxes = _boxes[:] 140 pages = _pages[:] 141 142 # Start in page 0 143 current_page = pages.pop(0) 144 y = current_page.y 145 146 # If we run out of boxes or pages, stop 147 while boxes and pages: 148 # If this row would end below the page, advance to next page 149 if (y + boxes.h) > (current_page.y + current_page.h): 150 current_page = pages.pop(0) 151 y = current_page.y 152 # Put "enough" letters into row and out of boxes 153 row = fill_row(boxes, current_page, separation) 154 # Adjust box positions to fill the page width 155 justify_row(row, current_page, separation) 156 # Put all the letters in the right vertical position 157 y = y + row.h + separation 158 for b in row: 159 b.y = y 160 # Remove unused pages 161 del (_pages[-len(pages):])
I took code that added real hyphens at the end of rows broken on soft-hyphens
and moved it into
74 # If the line ends with a soft-hyphen, replace it with a real hyphen 75 elif row[-1].letter == "\xad": 76 # This looks pretty bad, doesn't it? 77 hyphen = hyphenbox() 78 row[-1].letter = hyphen.letter 79 row[-1].w = hyphen.w
And of course added a test for it called
test_add_hyphen() in test_justify_row.py. This means we are not using
add_hyphen() anywhere so I removed it.
Does it work?
$ python boxes.py ./pride-and-prejudice.txt lesson8.svg
Well, not perfectly. The last line is weirdly justified. But it's not bad for such a drastic change!
The problem with the last line is a mystery. Before we improve our line-breaking algorithm, we will have to fix it.
We may end up having some actual tests for