Boxes v0.20

In the previous lesson we finally finished refactoring our layout() function, by moving most of the code into separate independent functions and rewrote what was left to allow for independent improvement of its parts.

Want to change how we decide line breaks? We hack fill_row(). Want to change the algorithm used to correct horizontal positioning of letters? We improve justify_row(). Our code has come a long way!

On the other hand, we introduced a bug.

lesson8.svg

In the third page, in the last line, the text is separated strangely. Those who are about my age probably have seen something like it on newspapers long ago. It's a case of trying to justify a line that really should have been kept left-justified.

There are not enough letters there to fill the whole width of the page, so things get ugly and very separate.

Further, there is no reason to justify that line. It's the last one in the text so it could just as easily be left-aligned like lines that end in newlines do.

Our text does not, however, end in a newline. Therefore it attempts full justification and fails badly.

The simplest solution is to make all text end with a newline. Here is the function that loads text files, create_text_boxes():

201def create_text_boxes(input_file):
202    p_and_p = open(input_file).read()
203    p_and_p = insert_soft_hyphens(p_and_p)  # Insert invisible hyphens
204    text_boxes = []
205    for letter in p_and_p:
206        text_boxes.append(Box(letter=letter, stretchy=letter == " "))
207    adjust_widths_by_letter(text_boxes)
208    return text_boxes

And the obvious fix:

201def create_text_boxes(input_file):
202    p_and_p = open(input_file).read()
203    p_and_p = insert_soft_hyphens(p_and_p)  # Insert invisible hyphens
204    if p_and_p[-1] != "\n":
205        p_and_p += "\n"
206    text_boxes = []
207    for letter in p_and_p:
208        text_boxes.append(Box(letter=letter, stretchy=letter == " "))
209    adjust_widths_by_letter(text_boxes)
210    return text_boxes

And because we are now serious people: the tests.

 1import boxes
 2
 3
 4def test_adds_newline(tmpdir):
 5    """Test that we add a newline if the input file doesn't end in one."""
 6    # Create a temporary file with context
 7    hello = tmpdir.mkdir("sub").join("hello.txt")
 8    hello.write("hello")
 9    # No newline
10    assert hello.read() == "hello"
11
12    text_boxes = boxes.create_text_boxes(hello)
13
14    # And now we have a newline (and a soft hyphen)
15    assert "".join([b.letter for b in text_boxes]) == "hel\xadlo\n"
16
17
18def test_not_add_newline_if_repeated(tmpdir):
19    """Test that we don't add a newline if the input file ends in one."""
20    # Create a temporary file with context
21    hello = tmpdir.mkdir("sub").join("hello.txt")
22    hello.write("hello\n")
23    # No newline
24    assert hello.read() == "hello\n"
25
26    text_boxes = boxes.create_text_boxes(hello)
27
28    # And now we have a newline (and a soft hyphen)
29    assert "".join([b.letter for b in text_boxes]) == "hel\xadlo\n"

If you look carefully at those tests, you will see some ... unusual things.

  • What is that tmpdir argument? Where is it coming from?
  • What is hello? Why can I write to it like a file?
  • If I am creating a temporary file, where is it? Does it go away?

In order:

It's a pytest fixture. Think of it as a helper. A fixture lets you easily set something up, use it, and have it be automatically cleaned up after you are done.

Often, when writing tests, you need to test against a file. Using tmpdir you can create as many files and directories as you want, and use them in your tests.

They are created somewhere in your system where temporary files go, and they are deleted when the test finishes.

Pytest has many other fixtures for common uses and you can even create your own. Learning about the builting fixtures is useful. There are also many other fixtures available on the Internet for things such as working with databases.

Did we fix our problem?

$ python lesson9/boxes.py ./pride-and-prejudice.txt lesson9.svg

lesson9.svg

Yes! On the other hand, if you look a few lines higher, you can see a line that says You have no compassion for my poor ner which in the original text says for my poor nerves." with a quote at the end.

Looks like we are actually missing a piece of text!

Can we create a test case that shows the same problem without it being the long text? Usually the smaller the sample that causes the problem, the easier it is to debug it.

1$ echo 'take delight in vexing me. You have no compassion for my poor nerves.”' > line.txt
2$ python boxes.py ./line.txt lesson9.1.svg --page-size=30x3

lesson9.1.svg

So, it seems (to me) like we may have a hyphenation problem. Surely there is some point in "nerves" where there could be a hyphen?

1>>> from hyphen import insert_soft_hyphens
2>>> insert_soft_hyphens('nerves', hyphen='-')
3
4'nerves'

Oops, looks like there isn't. But in this case, when we need to justify an overfull line... we don't. So that means there is a bug in justify_row()?

Here is what we are going to do. Now that we are somewhat convinced there is a bug, let's create a test that shows the bug. Then we fix the code so the test passes. That is called TDD, or Test Driven Development. You can read many books about it, but the basic gist is:

  • Find a bug
  • Write a failing test
  • Fix the code so the test doesn't fail
  • Refactor any ugliness you introduced
  • Repeat

We have the bug, here is the test:

161def test_justify_overfull(tmpdir):
162    """If a line is overfull, it still should be justified."""
163
164    separation = 0.05
165    page = boxes.Box(0, 0, 30, 50)
166    # Our failing text
167    inp = tmpdir.mkdir("sub").join("hello.txt")
168    inp.write(
169        "take delight in vexing me. You have no compassion for my poor nerves.\”"
170    )
171    # Adjust widths:
172    row = boxes.create_text_boxes(inp)
173    # Put side by side
174    boxes.fill_row(row[:], page, separation)
175
176    # Should be overwide
177    assert row[-1].x > page.w
178
179    boxes.justify_row(row, page, separation)
180
181    # Should now be justified
182    assert row[-1].x + row[-1].w == pytest.approx(page.w)

And it does indeed fail:

 1$ env PYTHONPATH=. pytest tests/test_justify_row.py::test_justify_overfull
 2
 3============================= test session starts ==============================
 4platform linux -- Python 3.6.4, pytest-3.5.0, py-1.5.3, pluggy-0.6.0
 5rootdir: part2/code/lesson9, inifile:
 6collected 1 item
 7tests/test_justify_row.py F                                              [100%]
 8
 9=================================== FAILURES ===================================
10____________________________ test_justify_overfull _____________________________
11
12[snip]
13
14        boxes.justify_row(row, page, separation)
15
16        # Should now be justified
17>       assert row[-1].x + row[-1].w == pytest.approx(page.w)
18E       assert (32.68750000000003 + 0.671875) == 30 ± 3.0e-05
19E        +  where 32.68750000000003 = Box(32.68750000000003, 0, 0.671875, 1, "\n").x
20E        +  and   0.671875 = Box(32.68750000000003, 0, 0.671875, 1, "\n").w
21E        +  and   30 ± 3.0e-05 = <function approx at 0x7f500f96d2f0>(30)
22E        +    where <function approx at 0x7f500f96d2f0> = pytest.approx
23E        +    and   30 = Box(0, 0, 30, 50, "x").w
24
25tests/test_justify_row.py:182: AssertionError
26=========================== 1 failed in 0.30 seconds ===========================

All that is left is fixing it, right?

One way to do this is to run that test and step through it using a debugger. Here is a session trying to do that. Just put a breakpoint like this in the test:

import pdb; pdb.set_trace()
 1$ env PYTHONPATH=. pytest tests/test_justify_row.py::test_justify_overfull
 2============================= test session starts ==============================
 3platform linux -- Python 3.6.4, pytest-3.5.0, py-1.5.3, pluggy-0.6.0
 4rootdir: code/lesson9, inifile:
 5collected 1 item
 6
 7tests/test_justify_row.py
 8>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
 9> tests/test_justify_row.py(164)test_justify_overfull()
10-> separation = 0.05
11(Pdb) n

Here, n means "run the next line".

 1> tests/test_justify_row.py(165)test_justify_overfull()
 2-> page = boxes.Box(0, 0, 30, 50)
 3(Pdb) n
 4> tests/test_justify_row.py(167)test_justify_overfull()
 5-> inp = tmpdir.mkdir("sub").join("hello.txt")
 6(Pdb) n
 7> tests/test_justify_row.py(168)test_justify_overfull()
 8-> inp.write(
 9(Pdb) n
10> tests/test_justify_row.py(169)test_justify_overfull()
11-> "take delight in vexing me. You have no compassion for my poor nerves.\”"
12(Pdb) n
13> tests/test_justify_row.py(172)test_justify_overfull()
14-> row = boxes.create_text_boxes(inp)
15(Pdb) n
16> tests/test_justify_row.py(174)test_justify_overfull()
17-> boxes.fill_row(row[:], page, separation)
18(Pdb) n
19> tests/test_justify_row.py(177)test_justify_overfull()
20-> assert row[-1].x > page.w
21(Pdb) n
22> tests/test_justify_row.py(179)test_justify_overfull()
23-> boxes.justify_row(row, page, separation)
24(Pdb) s

However s means "step into the next line". We are going to go inside justify_row()

--Call--
> boxes.py(59)justify_row()
-> def justify_row(row, page, separation):
(Pdb) n
> boxes.py(67)justify_row()
-> while row[-1].letter == " ":
(Pdb)
> boxes.py(71)justify_row()
-> if row[-1].letter == "\n":
(Pdb)
> boxes.py(72)justify_row()
-> return
(Pdb)
--Return--

Oh. So, it's not justifying the row because it ends in a newline. Looks like we do have a bug. We want to justify the rows that end in newline if they are wider than the page.

So here is the fix:

59def justify_row(row, page, separation):
60    """Given a row and a page, adjust position of elements in the row
61    so it fits the page width.
62
63    It modifies the contents of the row in place, so returns nothing.
64    """
65
66    # Remove all right-margin spaces
67    while row[-1].letter == " ":
68        row.pop()
69
70    # If the line ends in newline, and is underfull, do nothing.
71    if row[-1].letter == "\n" and (row[-1].x + row[-1].w) < (
72        page.x + page.w
73    ):
74        return

Which makes tests pass, and produces this output:

1$ echo 'take delight in vexing me. You have no compassion for my poor nerves.”' > line.txt
2$ python lesson9.1/boxes.py ./line.txt lesson9.2.svg --page-size=30x3

lesson9.2.svg

Which is bad. But in a different way. Clearly, we are breaking the line in a bad spot. We have mentioned before that our choice of breaking points sucks. Overfull lines are bad and justifying them by smushing things together looks bad.

In fact, it's different in two ways.

Besides the smushing, it doesn't look justified. There is space on the right! But our test says it's justified!

Not to keep you guessing, it turns out harfbuzz is giving newlines width. I have no idea if that is correct or wrong technically, but I know it's wrong for us.

One last fix: make sure newlines have zero width.

 1import harfbuzz as hb
 2import freetype2 as ft
 3
 4
 5def adjust_widths_by_letter(boxes):
 6    """Takes a list of boxes as arguments, and uses harfbuzz to
 7    adjust the width of each box to match the harfbuzz text shaping."""
 8    buf = hb.Buffer.create()
 9    buf.add_str("".join(b.letter for b in boxes))
10    buf.guess_segment_properties()
11    font_lib = ft.get_default_lib()
12    face = font_lib.find_face("Arial")
13    face.set_char_size(size=1, resolution=64)
14    font = hb.Font.ft_create(face)
15    hb.shape(font, buf)
16    # at this point buf.glyph_positions has all the data we need
17    for box, position in zip(boxes, buf.glyph_positions):
18        # For us, newlines should have zero width
19        if box.letter != "\n":
20            box.w = position.x_advance
21        else:
22            box.w = 0

And a final visual check:

1$ echo 'take delight in vexing me. You have no compassion for my poor nerves.”' > line.txt
2$ python lesson9.2/boxes.py ./line.txt lesson9.3.svg --page-size=30x3

lesson9.3.svg

$ python lesson9.2/boxes.py ./pride-and-prejudice.txt lesson9.4.svg

lesson9.4.svg

And next lesson we will choose our breaking points better.


results matching ""

    No results matching ""