BOXES v0.11
In our previous lesson we promised we will improve the problem of overfull lines in our text layout engine.
The problem: A long string of letters without any hyphenation points or spaces causes a very long row to be fitted into a smaller page, causing letters to overlap. Or, if we are unable to break at all, we just spill to the right forever.
Underfull and Overfull
When a line has too few characters, slack is positive and we have to "spread" them, that is called underfull. The opposite, where a line has too many characters, slack is negative and we have to "smush" them, is called overfull.
As you probably expected... no changes in the Box class.
1from fonts import adjust_widths_by_letter
2from hyphen import insert_soft_hyphens
3
4
5class Box():
6
7 def __init__(self, x=0, y=0, w=1, h=1, stretchy=False, letter="x"):
8 """Accept arguments to define our box, and store them."""
9 self.x = x
10 self.y = y
11 self.w = w
12 self.h = h
13 self.stretchy = stretchy
14 self.letter = letter
15
16 def __repr__(self):
17 return 'Box(%s, %s, %s, %s, "%s")' % (
18 self.x, self.y, self.w, self.h, self.letter
19 )
Also no changes in how we draw things.
170import svgwrite
171
172
173def draw_boxes(boxes, fname, size, hide_boxes=False):
174 dwg = svgwrite.Drawing(fname, profile="full", size=size)
175 # Draw the pages
176 for page in pages:
177 dwg.add(
178 dwg.rect(
179 insert=(f"{page.x}cm", f"{page.y}cm"),
180 size=(f"{page.w}cm", f"{page.h}cm"),
181 fill="lightblue",
182 )
183 )
184 # Draw all the boxes
185 for box in boxes:
186 # The box color depends on its features
187 color = "green" if box.stretchy else "red"
188 # Make the colored boxes optional
189 if not hide_boxes:
190 dwg.add(
191 dwg.rect(
192 insert=(f"{box.x}cm", f"{box.y}cm"),
193 size=(f"{box.w}cm", f"{box.h}cm"),
194 fill=color,
195 )
196 )
197 # Display the letter in the box
198 if box.letter:
199 dwg.add(
200 dwg.text(
201 box.letter,
202 insert=(f"{box.x}cm", f"{box.y + box.h}cm"),
203 font_size=f"{box.h}cm",
204 font_family="Arial",
205 )
206 )
207 dwg.save()
So far, our algorithm to break lines is simple:
- If the next character is a newline: break
- If we are short, add to the line.
If the next character makes the line overfull:
a. If it's a space or a hyphen: break
b. If not, add to the line
If we broke the line, adjust positions and widths so the line is exactly as wide as the page by spreading the slack.
We need a better way to decide what is the best possible breaking point for the line. If it's too overfull, it will look bad. If it's too underfull, it will look bad too. And sometimes there is just no good breaking point and we need to do something.
If we were lucky enough to have a breaking point that exactly fills the line, then we would not have a problem, so usually this will come down to choosing between two breaking points. One makes the line overfull, one keeps it underfull.
How good is a breaking point?
- If the slack is smaller, it's better.
- If it's underfull it's slightly better than overfull, since it's better to have a spread out line than overlapping letters.
- If we have many spaces, then a bigger slack is acceptable.
This is all subjective, of course. How much movement we tolerate on the text is a judgment call, so it involves some trial and error.
Let's start with some made up numbers, because we have to start somewhere.
- The allowed positive slack is 20% of the width of the spaces in the line.
- The allowed negative slack is 10% of the width of the spaces in the line.
And here's a plan to implement it:
- When adding characters, check if it's a possible breaking point.
- If it is, remember the "goodness" it has
- When reaching an overfull breaking point, check if it's better than the last one we saw.
- If the overfull breaking point is better, break.
- If the overfull breaking point is worse, use the underfull breaking point.
The first new thing is a function to calculate how good a breaking point is, in those terms.
40def badness(page_width, row):
41 """Calculate how 'bad' a position to break is.
42
43 bigger is worse.
44 """
45 # Yes, this is suboptimal. It's easier to optimize working code
46 # than fixing fast code.
47 row_width = (row[-1].x + row[-1].w) - row[0].x
48 slack = page_width - row_width
49 stretchies = [b for b in row if b.stretchy]
50 if len(stretchies) > 0:
51 stretchies_width = sum(s.w for s in stretchies)
52 # More stetchy space is good. More slack is bad.
53 badness = slack / stretchies_width
54 else: # Nothing to stretch. Not good.
55 badness = 1000
56 if slack < 0:
57 # Arbitrary fudge factor, negative slack is THIS much worse
58 badness *= -2
59 return badness
And we can see how it works:
1# 10 boxes of width 1, 1 stretchy in the middle
2boxes = [Box(x=i, w=1) for i in range(10)]
3boxes[5].stretchy = True
4
5# On a page that is 8 units wide:
6print(badness(8, boxes))
4.0
Let's see how that works for page widths between 5 and 15 (remember our row is 10 units wide):
1for w in range(5,15):
2 print('page_width:', w, ' -> badness:', badness(w, boxes))
As you can see, if the page was 10 units wide, it would be optimal. The second best option is for the page to be slightly wider, then maybe slightly thinner and so on.
page_width: 5 -> badness: 10.0
page_width: 6 -> badness: 8.0
page_width: 7 -> badness: 6.0
page_width: 8 -> badness: 4.0
page_width: 9 -> badness: 2.0
page_width: 10 -> badness: 0.0
page_width: 11 -> badness: 1.0
page_width: 12 -> badness: 2.0
page_width: 13 -> badness: 3.0
page_width: 14 -> badness: 4.0
We will need to load data that shows the problem. In this case, it's a row of 20 letters 'a' (without hyphens), a space, then 20, and then 30 more 'a's.
Why?
Like before, the page is about wide enough to fix 58 "a"s. That means the first run will not be enough to fill the line. The second run will still not be enough. The third run will, however, badly overfill it. So, we should go all the way to the end, see that it's too long, and then go back to the second space and break there.
22# Three runs of "a" with spaces between them.
23# The ideal breaking point is the second space.
24text_boxes = [Box(letter="a") for a in range(72)]
25for i in (20, 50):
26 text_boxes[i].letter = " "
27 text_boxes[i].stretchy = True
28adjust_widths_by_letter(text_boxes)
29
30# A few pages all the same size
31pages = [Box(i * 35, 0, 30, 50) for i in range(10)]
And now we have a problem. The change to our code needed to implement that is too large. It's so large that it amounts to a rewrite of our layout function and if we rewrite it, how would we be sure that we are not breaking all the things we achieved previously?
It turns out this is as far as just sitting down and writing code will take us. We need to get more serious, and transform this small pile of fragile code into something we can actually hack confidently and transform without fear of destroying it.
Therefore, we leave layout
intact, show the failing test and move on to
Part 2 of the book, where we will reorganize the code into a coherent
software package and then... we will try again.
62# We add a "separation" constant so you can see the boxes individually
63separation = .05
64
65
66def layout(_boxes):
67 """Layout boxes along pages.
68
69 Keep in mind that this function modifies the boxes themselves, so
70 you should be very careful about trying to call layout() more than once
71 on the same boxes.
72
73 Specifically, some spaces will become 0-width and not stretchy.
74 """
75
76 # Because we modify the box list, we will work on a copy
77 boxes = _boxes[:]
78 # We start at page 0
79 page = 0
80 # The 1st box should be placed in the correct page
81 previous = boxes.pop(0)
82 previous.x = pages[page].x
83 previous.y = pages[page].y
84 row = []
85 while boxes:
86 # We take the new 1st box
87 box = boxes.pop(0)
88 # And put it next to the other
89 box.x = previous.x + previous.w + separation
90 # At the same vertical location
91 box.y = previous.y
92
93 # Handle breaking on newlines
94 break_line = False
95 # But if it's a newline
96 if (box.letter == "\n"):
97 break_line = True
98 # Newlines take no horizontal space ever
99 box.w = 0
100 box.stretchy = False
101
102 # Or if it's too far to the right, and is a
103 # good place to break the line...
104 elif (box.x + box.w) > (
105 pages[page].x + pages[page].w
106 ) and box.letter in (
107 " ", "\xad"
108 ):
109 if box.letter == "\xad":
110 # Add a visible hyphen in the row
111 h_b = hyphenbox()
112 h_b.x = previous.x + previous.w + separation
113 h_b.y = previous.y
114 _boxes.append(h_b) # So it's drawn
115 row.append(h_b) # So it's justified
116 break_line = True
117 # We adjust the row
118 # Remove all right-margin spaces
119 while row[-1].letter == " ":
120 row.pop()
121 slack = (pages[page].x + pages[page].w) - (
122 row[-1].x + row[-1].w
123 )
124 # Get a list of all the ones that are stretchy
125 stretchies = [b for b in row if b.stretchy]
126 if not stretchies: # Nothing stretches do as before.
127 bump = slack / len(row)
128 # The 1st box gets 0 bumps, the 2nd gets 1 and so on
129 for i, b in enumerate(row):
130 b.x += bump * i
131 else:
132 bump = slack / len(stretchies)
133 # Each stretchy gets wider
134 for b in stretchies:
135 b.w += bump
136 # And we put each thing next to the previous one
137 for j, b in enumerate(row[1:], 1):
138 b.x = row[j - 1].x + row[j - 1].w + separation
139
140 if break_line:
141 # We start a new row
142 row = []
143 # We go all the way left and a little down
144 box.x = pages[page].x
145 box.y = previous.y + previous.h + separation
146
147 # But if we go too far down
148 if box.y + box.h > pages[page].y + pages[page].h:
149 # We go to the next page
150 page += 1
151 # And put the box at the top-left
152 box.x = pages[page].x
153 box.y = pages[page].y
154
155 # Put the box in the row
156 row.append(box)
157
158 # Collapse all left-margin space
159 if all(b.letter == " " for b in row):
160 box.w = 0
161 box.stretchy = False
162 box.x = pages[page].x
163
164 previous = box
165
166
167layout(text_boxes)
210draw_boxes(text_boxes, "lesson11.svg", ("33cm", "5cm"), hide_boxes=True)
Further references:
- Full source code for this lesson lesson11.py
- Difference with code from last lesson