Boxes v0.20
In the previous lesson we finally finished refactoring our
layout()
function, by moving most of the code into separate independent
functions and rewrote what was left to allow for independent improvement of
its parts.
Want to change how we decide line breaks? We hack fill_row()
. Want to change
the algorithm used to correct horizontal positioning of letters? We improve
justify_row()
. Our code has come a long way!
On the other hand, we introduced a bug.
In the third page, in the last line, the text is separated strangely. Those who are about my age probably have seen something like it on newspapers long ago. It's a case of trying to justify a line that really should have been kept left-justified.
There are not enough letters there to fill the whole width of the page, so things get ugly and very separate.
Further, there is no reason to justify that line. It's the last one in the text so it could just as easily be left-aligned like lines that end in newlines do.
Our text does not, however, end in a newline. Therefore it attempts full justification and fails badly.
The simplest solution is to make all text end with a newline. Here is the function that loads text files, create_text_boxes()
:
201def create_text_boxes(input_file):
202 p_and_p = open(input_file).read()
203 p_and_p = insert_soft_hyphens(p_and_p) # Insert invisible hyphens
204 text_boxes = []
205 for letter in p_and_p:
206 text_boxes.append(Box(letter=letter, stretchy=letter == " "))
207 adjust_widths_by_letter(text_boxes)
208 return text_boxes
And the obvious fix:
201def create_text_boxes(input_file):
202 p_and_p = open(input_file).read()
203 p_and_p = insert_soft_hyphens(p_and_p) # Insert invisible hyphens
204 if p_and_p[-1] != "\n":
205 p_and_p += "\n"
206 text_boxes = []
207 for letter in p_and_p:
208 text_boxes.append(Box(letter=letter, stretchy=letter == " "))
209 adjust_widths_by_letter(text_boxes)
210 return text_boxes
And because we are now serious people: the tests.
1import boxes
2
3
4def test_adds_newline(tmpdir):
5 """Test that we add a newline if the input file doesn't end in one."""
6 # Create a temporary file with context
7 hello = tmpdir.mkdir("sub").join("hello.txt")
8 hello.write("hello")
9 # No newline
10 assert hello.read() == "hello"
11
12 text_boxes = boxes.create_text_boxes(hello)
13
14 # And now we have a newline (and a soft hyphen)
15 assert "".join([b.letter for b in text_boxes]) == "hel\xadlo\n"
16
17
18def test_not_add_newline_if_repeated(tmpdir):
19 """Test that we don't add a newline if the input file ends in one."""
20 # Create a temporary file with context
21 hello = tmpdir.mkdir("sub").join("hello.txt")
22 hello.write("hello\n")
23 # No newline
24 assert hello.read() == "hello\n"
25
26 text_boxes = boxes.create_text_boxes(hello)
27
28 # And now we have a newline (and a soft hyphen)
29 assert "".join([b.letter for b in text_boxes]) == "hel\xadlo\n"
If you look carefully at those tests, you will see some ... unusual things.
- What is that
tmpdir
argument? Where is it coming from? - What is
hello
? Why can I write to it like a file? - If I am creating a temporary file, where is it? Does it go away?
In order:
tmpdir
is a pytest fixture. Think
of it as a helper. A fixture lets you easily set something up, use it, and
have it be automatically cleaned up after you are done.
Often, when writing tests, you need to test against a file. Using tmpdir you can create as many files and directories as you want, and use them in your tests.
They are created somewhere in your system where temporary files go, and they are deleted when the test finishes.
Pytest has many other fixtures for common uses and you can even create your own. Learning about the builting fixtures is useful. There are also many other fixtures available on the Internet for things such as working with databases.
Did we fix our problem?
$ python lesson9/boxes.py ./pride-and-prejudice.txt lesson9.svg
Yes! On the other hand, if you look a few lines higher, you can see a line
that says You have no compassion for my poor ner
which in the original text
says for my poor nerves."
with a quote at the end.
Looks like we are actually missing a piece of text!
Can we create a test case that shows the same problem without it being the long text? Usually the smaller the sample that causes the problem, the easier it is to debug it.
1$ echo 'take delight in vexing me. You have no compassion for my poor nerves.”' > line.txt
2$ python boxes.py ./line.txt lesson9.1.svg --page-size=30x3
So, it seems (to me) like we may have a hyphenation problem. Surely there is some point in "nerves" where there could be a hyphen?
1>>> from hyphen import insert_soft_hyphens
2>>> insert_soft_hyphens('nerves', hyphen='-')
3
4'nerves'
Oops, looks like there isn't. But in this case, when we need to justify an
overfull line... we don't. So that means there is a bug in justify_row()
?
Here is what we are going to do. Now that we are somewhat convinced there is a bug, let's create a test that shows the bug. Then we fix the code so the test passes. That is called TDD, or Test Driven Development. You can read many books about it, but the basic gist is:
- Find a bug
- Write a failing test
- Fix the code so the test doesn't fail
- Refactor any ugliness you introduced
- Repeat
We have the bug, here is the test:
161def test_justify_overfull(tmpdir):
162 """If a line is overfull, it still should be justified."""
163
164 separation = 0.05
165 page = boxes.Box(0, 0, 30, 50)
166 # Our failing text
167 inp = tmpdir.mkdir("sub").join("hello.txt")
168 inp.write(
169 "take delight in vexing me. You have no compassion for my poor nerves.\”"
170 )
171 # Adjust widths:
172 row = boxes.create_text_boxes(inp)
173 # Put side by side
174 boxes.fill_row(row[:], page, separation)
175
176 # Should be overwide
177 assert row[-1].x > page.w
178
179 boxes.justify_row(row, page, separation)
180
181 # Should now be justified
182 assert row[-1].x + row[-1].w == pytest.approx(page.w)
And it does indeed fail:
1$ env PYTHONPATH=. pytest tests/test_justify_row.py::test_justify_overfull
2
3============================= test session starts ==============================
4platform linux -- Python 3.6.4, pytest-3.5.0, py-1.5.3, pluggy-0.6.0
5rootdir: part2/code/lesson9, inifile:
6collected 1 item
7tests/test_justify_row.py F [100%]
8
9=================================== FAILURES ===================================
10____________________________ test_justify_overfull _____________________________
11
12[snip]
13
14 boxes.justify_row(row, page, separation)
15
16 # Should now be justified
17> assert row[-1].x + row[-1].w == pytest.approx(page.w)
18E assert (32.68750000000003 + 0.671875) == 30 ± 3.0e-05
19E + where 32.68750000000003 = Box(32.68750000000003, 0, 0.671875, 1, "\n").x
20E + and 0.671875 = Box(32.68750000000003, 0, 0.671875, 1, "\n").w
21E + and 30 ± 3.0e-05 = <function approx at 0x7f500f96d2f0>(30)
22E + where <function approx at 0x7f500f96d2f0> = pytest.approx
23E + and 30 = Box(0, 0, 30, 50, "x").w
24
25tests/test_justify_row.py:182: AssertionError
26=========================== 1 failed in 0.30 seconds ===========================
All that is left is fixing it, right?
One way to do this is to run that test and step through it using a debugger. Here is a session trying to do that. Just put a breakpoint like this in the test:
import pdb; pdb.set_trace()
1$ env PYTHONPATH=. pytest tests/test_justify_row.py::test_justify_overfull
2============================= test session starts ==============================
3platform linux -- Python 3.6.4, pytest-3.5.0, py-1.5.3, pluggy-0.6.0
4rootdir: code/lesson9, inifile:
5collected 1 item
6
7tests/test_justify_row.py
8>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
9> tests/test_justify_row.py(164)test_justify_overfull()
10-> separation = 0.05
11(Pdb) n
Here, n
means "run the next line".
1> tests/test_justify_row.py(165)test_justify_overfull()
2-> page = boxes.Box(0, 0, 30, 50)
3(Pdb) n
4> tests/test_justify_row.py(167)test_justify_overfull()
5-> inp = tmpdir.mkdir("sub").join("hello.txt")
6(Pdb) n
7> tests/test_justify_row.py(168)test_justify_overfull()
8-> inp.write(
9(Pdb) n
10> tests/test_justify_row.py(169)test_justify_overfull()
11-> "take delight in vexing me. You have no compassion for my poor nerves.\”"
12(Pdb) n
13> tests/test_justify_row.py(172)test_justify_overfull()
14-> row = boxes.create_text_boxes(inp)
15(Pdb) n
16> tests/test_justify_row.py(174)test_justify_overfull()
17-> boxes.fill_row(row[:], page, separation)
18(Pdb) n
19> tests/test_justify_row.py(177)test_justify_overfull()
20-> assert row[-1].x > page.w
21(Pdb) n
22> tests/test_justify_row.py(179)test_justify_overfull()
23-> boxes.justify_row(row, page, separation)
24(Pdb) s
However s
means "step into the next line". We are going to go inside
justify_row()
--Call--
> boxes.py(59)justify_row()
-> def justify_row(row, page, separation):
(Pdb) n
> boxes.py(67)justify_row()
-> while row[-1].letter == " ":
(Pdb)
> boxes.py(71)justify_row()
-> if row[-1].letter == "\n":
(Pdb)
> boxes.py(72)justify_row()
-> return
(Pdb)
--Return--
Oh. So, it's not justifying the row because it ends in a newline. Looks like we do have a bug. We want to justify the rows that end in newline if they are wider than the page.
So here is the fix:
59def justify_row(row, page, separation):
60 """Given a row and a page, adjust position of elements in the row
61 so it fits the page width.
62
63 It modifies the contents of the row in place, so returns nothing.
64 """
65
66 # Remove all right-margin spaces
67 while row[-1].letter == " ":
68 row.pop()
69
70 # If the line ends in newline, and is underfull, do nothing.
71 if row[-1].letter == "\n" and (row[-1].x + row[-1].w) < (
72 page.x + page.w
73 ):
74 return
Which makes tests pass, and produces this output:
1$ echo 'take delight in vexing me. You have no compassion for my poor nerves.”' > line.txt
2$ python lesson9.1/boxes.py ./line.txt lesson9.2.svg --page-size=30x3
Which is bad. But in a different way. Clearly, we are breaking the line in a bad spot. We have mentioned before that our choice of breaking points sucks. Overfull lines are bad and justifying them by smushing things together looks bad.
In fact, it's different in two ways.
Besides the smushing, it doesn't look justified. There is space on the right! But our test says it's justified!
Not to keep you guessing, it turns out harfbuzz is giving newlines width. I have no idea if that is correct or wrong technically, but I know it's wrong for us.
One last fix: make sure newlines have zero width.
1import harfbuzz as hb
2import freetype2 as ft
3
4
5def adjust_widths_by_letter(boxes):
6 """Takes a list of boxes as arguments, and uses harfbuzz to
7 adjust the width of each box to match the harfbuzz text shaping."""
8 buf = hb.Buffer.create()
9 buf.add_str("".join(b.letter for b in boxes))
10 buf.guess_segment_properties()
11 font_lib = ft.get_default_lib()
12 face = font_lib.find_face("Arial")
13 face.set_char_size(size=1, resolution=64)
14 font = hb.Font.ft_create(face)
15 hb.shape(font, buf)
16 # at this point buf.glyph_positions has all the data we need
17 for box, position in zip(boxes, buf.glyph_positions):
18 # For us, newlines should have zero width
19 if box.letter != "\n":
20 box.w = position.x_advance
21 else:
22 box.w = 0
And a final visual check:
1$ echo 'take delight in vexing me. You have no compassion for my poor nerves.”' > line.txt
2$ python lesson9.2/boxes.py ./line.txt lesson9.3.svg --page-size=30x3
$ python lesson9.2/boxes.py ./pride-and-prejudice.txt lesson9.4.svg
And next lesson we will choose our breaking points better.
- Full source code for this lesson boxes.py
- Difference with code from last lesson