In our previous lesson, we moved some functionality from justify_row() to simplify code, and we refactored some code in layout(), again to simplify it. This is janitorial work. We are not adding anything to the code; we are just making it a nicer place to be in.
Let's do that a little more.
We have a fairly rough piece of code in layout() that calls for attention:
```python
if (box.letter == "\n") or (box.x + box.w) > (
    pages[page].x + pages[page].w
) and box.letter in (
    " ", "\xad"
):
    h_b = add_hyphen(row, separation)
    if h_b:
        ...
```
That is problematic for a few reasons, but the most important one is this: try to say it in English.
My best try is:
If the letter is "\n" or it ends outside of the page and is a space or soft hyphen, then...
That is a mouthful. One of the reasons to refactor code out of a function is so that the code that is left is more readable and more testable.
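Part of what makes the original condition so hard to read is that it silently relies on Python's operator precedence: and binds tighter than or, so "a or b and c" means "a or (b and c)". That is why the English reading above is correct. The small sketch below checks it over every combination of truth values; is_newline, too_wide and is_break_char are hypothetical stand-ins for the three sub-conditions in layout().

```python
from itertools import product

# Hypothetical stand-ins for the three sub-conditions in layout():
#   is_newline    -> box.letter == "\n"
#   too_wide      -> (box.x + box.w) > (pages[page].x + pages[page].w)
#   is_break_char -> box.letter in (" ", "\xad")
for is_newline, too_wide, is_break_char in product([False, True], repeat=3):
    as_written = is_newline or too_wide and is_break_char
    explicit = is_newline or (too_wide and is_break_char)
    # "and" binds tighter than "or", so both forms always agree
    assert as_written == explicit

print("precedence check passed")
```

Adding the explicit parentheses would already help, but moving the whole check into a named function helps more.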
We could refactor that out like this:
```python
if is_breaking(box, pages[page]):
```
That is readable.
And then we would create an is_breaking() function that does the check. If in the future we decide to add new ways in which rows can break (on a tab? on a zero-width space?), we can just modify that function instead of making the layout() function even longer.
So let's do that. Here is that function:
```python
# Characters after which a row may break. Assumed definition: the
# space and soft hyphen that the old condition in layout() checked.
BREAKING_CHARS = (" ", "\xad")


def is_breaking(box, page):
    """Decide if 'box' is a good candidate to be the end of a row
    in the page."""
    # If it's a newline
    if box.letter == "\n":
        return True

    # If we are too far to the right
    if (box.x + box.w) > (page.x + page.w):
        # And it's a breaking character:
        if box.letter in BREAKING_CHARS:
            return True

    return False
```
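To see the function in action without the rest of boxes.py, here is a self-contained sketch. The Box dataclass is a hypothetical stand-in holding only the fields is_breaking() reads (the real Box class has more), and adding a tab and a zero-width space to BREAKING_CHARS is an assumption that illustrates the kind of change mentioned earlier, not something the project currently does.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical minimal stand-in for boxes.Box: only what is_breaking() uses
    x: float = 0
    y: float = 0
    w: float = 1
    h: float = 1
    letter: str = "x"


# Assumption: space and soft hyphen as in layout(), plus a tab and a
# zero-width space to show how new break points would be added
BREAKING_CHARS = (" ", "\xad", "\t", "\u200b")


def is_breaking(box, page):
    """Decide if 'box' is a good candidate to be the end of a row
    in the page."""
    # A newline always ends the row
    if box.letter == "\n":
        return True
    # Past the right edge of the page, and on a character we may break at
    if (box.x + box.w) > (page.x + page.w):
        if box.letter in BREAKING_CHARS:
            return True
    return False


page = Box(w=30, h=30)
print(is_breaking(Box(letter="\t", x=50), page))     # True: too wide, and a tab
print(is_breaking(Box(letter="a", x=50), page))      # False: "a" is not a break point
print(is_breaking(Box(letter="\u200b", x=0), page))  # False: still fits in the page
```

Note that layout() did not have to change at all for the new break characters to take effect; that is the payoff of the refactor.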
Remember, we are now at the stage in this project's evolution where we want automated tests for things. So, let's add some.
All the required tests would have this form:
"Take a box with this letter at this position, and compare it with the page. Based on the position, the letter and the page size, is_breaking() should return True or False."
When you have many tests that would be the same code with different inputs, you can use a parametrized test instead of writing each test separately.
To do that, decorate the test function with @pytest.mark.parametrize and pass it two arguments:
1. A string with a comma-separated list of names, which are the "inputs" of the test.
2. A list of tuples, where each element of a tuple is the value for one of those inputs, in the same order.

Then, in the test function, add one argument for each name from item 1.
Pytest will then turn your single test into many tests, one for each tuple of inputs.
This is how it's done:
```python
import pytest
import boxes


@pytest.mark.parametrize(
    "letter, position, expected",
    [
        ("\n", 0, True),
        ("\n", 50, True),
        (" ", 0, False),
        (" ", 50, True),
        ("\xad", 0, False),
        ("\xad", 50, True),
    ],
)
def test_newline_is_breaking(letter, position, expected):
    """Newlines break even if not too wide."""
    box = boxes.Box(letter=letter, x=position)
    page = boxes.Box(w=30, h=30)

    assert boxes.is_breaking(box, page) == expected
```
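Roughly speaking, the decorator in that test pairs each name in the string with the value at the same position in every tuple, and calls the test once per tuple. The sketch below models only that pairing in plain Python; expand_cases() and check() are hypothetical names, and this is not how pytest is implemented (among other things, pytest reports each case as a separate test with its own id).

```python
def expand_cases(argnames, argvalues):
    """Rough model of @pytest.mark.parametrize: turn a comma-separated
    string of names plus a list of tuples into one set of keyword
    arguments per test case."""
    names = [name.strip() for name in argnames.split(",")]
    cases = []
    for values in argvalues:
        # Each tuple must have exactly one value per name
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} values, got {values!r}")
        cases.append(dict(zip(names, values)))
    return cases


def check(letter, position, expected):
    # Hypothetical test body: one argument per name in the string
    print(f"letter={letter!r} position={position} expected={expected}")


cases = expand_cases(
    "letter, position, expected",
    [("\n", 0, True), (" ", 0, False), (" ", 50, True)],
)
for kwargs in cases:
    check(**kwargs)  # one "test" per tuple
```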
So, we have tests for is_breaking(). Good. But we also changed layout(), and that has no tests. We can't slack off; we need to check it manually.
```
$ python boxes.py pride-and-prejudice.txt lesson7.svg
```
Yes, still looks good.
The layout() function, which started out around 100 lines long, is now around 50 lines if you don't count comments. That is an improvement.
I think it's short enough that we can try improving it now.
So, in the next lesson, we will finally fix the problem described in Lesson 11 of Part 1:
We are often breaking the line at the first breaking point after it becomes overfull. Many times it would have been better to break at an earlier point, where it was still underfull.
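As a rough illustration of that problem (not the fix the next lesson will implement), suppose we know the x coordinate where the row would end if we broke at each candidate point, sorted left to right, and the width of the page. The hypothetical helpers below contrast the current behavior, taking the first break point past the edge, with the last break point that still fits, and naively pick whichever ends closer to the edge.

```python
def first_overfull(break_xs, width):
    """Current behavior: the first break point past the right edge."""
    for x in break_xs:
        if x > width:
            return x
    return None


def last_underfull(break_xs, width):
    """Alternative: the last break point that still fits.
    Assumes break_xs is sorted from left to right."""
    fitting = [x for x in break_xs if x <= width]
    return fitting[-1] if fitting else None


def closest_break(break_xs, width):
    """Naive choice: whichever of the two candidates ends nearer the edge."""
    candidates = [
        x
        for x in (first_overfull(break_xs, width), last_underfull(break_xs, width))
        if x is not None
    ]
    return min(candidates, key=lambda x: abs(x - width), default=None)


# Where the row would end if we broke at each space, for a 30-unit-wide page
break_xs = [8, 17, 28, 41]
print(first_overfull(break_xs, 30))  # 41: overfull by 11
print(last_underfull(break_xs, 30))  # 28: underfull by only 2
print(closest_break(break_xs, 30))   # 28
```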
Let's see if we can do it.