Boxes v0.18

In our previous lesson, we moved some functionality from layout() into justify_row() and refactored some code in layout(), in both cases to simplify the code. This is janitorial work: we are not adding anything to the code, we are just making it a nice place to be in.

Let's do that a little more.

We have a fairly rough piece of code in layout() that calls for attention:

if (box.letter == "\n") or (box.x + box.w) > (
    pages[page].x + pages[page].w
) and box.letter in (
    " ", "\xad"
):
    h_b = add_hyphen(row, separation)
    if h_b:

That is problematic for a few reasons, but the most important is: Try to say it in English.

My best try is:

If the letter is "\n" or it ends outside of the page and is a space or soft hyphen, then...

That is a mouthful. One reason to refactor code out of a function is that the code left behind becomes more readable and also more testable.
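Part of what makes the original condition hard to say aloud is that it mixes or and and without explicit grouping. In Python, and binds more tightly than or, so "A or B and C" is read as "A or (B and C)". The small check below illustrates this; the function and argument names are made up for this example and do not appear in the project.

```python
from itertools import product


def as_written(is_newline, overflows, is_break_char):
    # Same shape as the condition in layout(): A or B and C
    return is_newline or overflows and is_break_char


def with_parentheses(is_newline, overflows, is_break_char):
    # How Python actually groups it: A or (B and C)
    return is_newline or (overflows and is_break_char)


def wrong_grouping(is_newline, overflows, is_break_char):
    # A tempting but different reading: (A or B) and C
    return (is_newline or overflows) and is_break_char


combos = list(product([False, True], repeat=3))

# The unparenthesized form agrees with A or (B and C) for every input.
assert all(as_written(*c) == with_parentheses(*c) for c in combos)

# These are the inputs where the other reading would give a different answer.
differing = [c for c in combos if as_written(*c) != wrong_grouping(*c)]
print(differing)
```

The two readings disagree only when the letter is a newline and not a breaking character, which is every newline. Under the wrong grouping, a newline would never break a row.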

We could refactor that out like this:

if is_breaking(box, pages[page]):

That is readable.

And then we would create an is_breaking() function that does the check. If in the future we decide to add new ways in which rows can break (on a tab? on a zero-width space?), we can just modify that function instead of making layout() even uglier.

So let's do that. Here is that function:

def is_breaking(box, page):
    """Decide if 'box' is a good candidate to be the end of a row
    in the page."""
    # If it's a newline
    if box.letter == "\n":
        return True

    # If we are too much to the right
    if (box.x + box.w) > (page.x + page.w):
        # And it's a breaking character:
        if box.letter in BREAKING_CHARS:
            return True

    return False
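To make the "just modify that function" idea concrete, here is a minimal sketch of adding one more breaking character. It is only an illustration under assumptions: the Box dataclass is a hypothetical stand-in for boxes.Box with made-up defaults, the space and soft hyphen in BREAKING_CHARS are taken from the original condition in layout(), and the zero-width space is the hypothetical addition.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for boxes.Box, with assumed defaults."""
    x: float = 0
    y: float = 0
    w: float = 1
    h: float = 1
    letter: str = "x"


# Space and soft hyphen come from the original condition in layout().
# The zero-width space (U+200B) is the hypothetical addition.
BREAKING_CHARS = (" ", "\xad", "\u200b")


def is_breaking(box, page):
    """Same logic as the lesson's function, reading the tuple above."""
    if box.letter == "\n":
        return True
    if (box.x + box.w) > (page.x + page.w):
        if box.letter in BREAKING_CHARS:
            return True
    return False


page = Box(w=30, h=30)
print(is_breaking(Box(letter="\u200b", x=50), page))  # past the right edge
print(is_breaking(Box(letter="\u200b", x=0), page))   # still inside the page
print(is_breaking(Box(letter="a", x=50), page))       # overflows, not a breaking char
```

Only the tuple changed. The caller in layout() would not need to know that a new kind of break exists.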

Remember we are now in the stage in the evolution of this project where we want to have automatic tests for things. So, let's add some.

All the required tests would be of this form:

"Take a box with this letter in this position, and compare it with the page. Based on the position, the letter and the page size, is_breaking() should return True or False"

When you have many tests which would be the same code with different input, you can use a parametrized test and not write each test separately.

You use @pytest.mark.parametrize and pass it two arguments:

  1. A string with a comma-separated list of names, which are the "inputs" of the test.
  2. A list of tuples, where each tuple holds one value for each of those names.

Finally, the test function takes one argument for each name in the string.

Then pytest will turn your single test into many tests, one with each input.

This is how it's done:

import pytest
import boxes


@pytest.mark.parametrize(
    "letter, position, expected",
    [
        ("\n", 0, True),
        ("\n", 50, True),
        (" ", 0, False),
        (" ", 50, True),
        ("\xad", 0, False),
        ("\xad", 50, True),
    ],
)
def test_newline_is_breaking(letter, position, expected):
    """Newlines break even if not too wide."""
    box = boxes.Box(letter=letter, x=position)
    page = boxes.Box(w=30, h=30)

    assert boxes.is_breaking(box, page) == expected
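As a rough mental model of what pytest does with that list of tuples, the sketch below runs each case on its own and records whether it passed, without pytest. This is not how pytest is implemented; it only shows how each tuple maps onto the test's arguments. The Box dataclass and its defaults are hypothetical stand-ins for boxes.Box, and run_cases() is a name invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for boxes.Box, with assumed defaults."""
    x: float = 0
    y: float = 0
    w: float = 1
    h: float = 1
    letter: str = "x"


BREAKING_CHARS = (" ", "\xad")


def is_breaking(box, page):
    if box.letter == "\n":
        return True
    if (box.x + box.w) > (page.x + page.w):
        if box.letter in BREAKING_CHARS:
            return True
    return False


# The same cases as in the parametrized test.
CASES = [
    ("\n", 0, True),
    ("\n", 50, True),
    (" ", 0, False),
    (" ", 50, True),
    ("\xad", 0, False),
    ("\xad", 50, True),
]


def run_cases(cases):
    """Run every case independently and return (case, passed) pairs."""
    results = []
    for letter, position, expected in cases:
        box = Box(letter=letter, x=position)
        page = Box(w=30, h=30)
        passed = is_breaking(box, page) == expected
        results.append(((letter, position, expected), passed))
    return results


results = run_cases(CASES)
for case, passed in results:
    print(repr(case), "PASSED" if passed else "FAILED")
```

The point of the comparison is that one failing case does not hide the others: each tuple is reported as its own test.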

So, we have tests for is_breaking(). Good. But we also changed layout(), which has no tests. We can't slack: we need to check it manually.

$ python pride-and-prejudice.txt lesson7.svg


Yes, still looks good.

Our layout() function, which started out around 100 lines long, is now around 50 lines if you don't count comments. That is an improvement. I think it's short enough that we can try improving it now.

So, in the next lesson, we will finally fix the problem described in Lesson 11 of Part 1:

We are often breaking the line in the first breaking point after it becomes overfull. Many times it would have been better to break in an earlier point where it was underfull instead.

Let's see if we can do it.
