BOXES v0.11

In our previous lesson we promised we will improve the problem of overfull lines in our text layout engine.

The problem: A long string of letters without any hyphenation points or spaces causes a very long row to be fitted into a smaller page, causing letters to overlap. Or, if we are unable to break at all, we just spill to the right forever.

Underfull and Overfull

When a line has too few characters, slack is positive and we have to "spread" them, that is called underfull. The opposite, where a line has too many characters, slack is negative and we have to "smush" them, is called overfull.

lesson10_one_break.svg

As you probably expected... no changes in the Box class.

 1from fonts import adjust_widths_by_letter
 2from hyphen import insert_soft_hyphens
 3
 4
 5class Box():
 6
 7    def __init__(self, x=0, y=0, w=1, h=1, stretchy=False, letter="x"):
 8        """Accept arguments to define our box, and store them."""
 9        self.x = x
10        self.y = y
11        self.w = w
12        self.h = h
13        self.stretchy = stretchy
14        self.letter = letter
15
16    def __repr__(self):
17        return 'Box(%s, %s, %s, %s, "%s")' % (
18            self.x, self.y, self.w, self.h, self.letter
19        )

Also no changes in how we draw things.

170import svgwrite
171
172
173def draw_boxes(boxes, fname, size, hide_boxes=False):
174    dwg = svgwrite.Drawing(fname, profile="full", size=size)
175    # Draw the pages
176    for page in pages:
177        dwg.add(
178            dwg.rect(
179                insert=(f"{page.x}cm", f"{page.y}cm"),
180                size=(f"{page.w}cm", f"{page.h}cm"),
181                fill="lightblue",
182            )
183        )
184    # Draw all the boxes
185    for box in boxes:
186        # The box color depends on its features
187        color = "green" if box.stretchy else "red"
188        # Make the colored boxes optional
189        if not hide_boxes:
190            dwg.add(
191                dwg.rect(
192                    insert=(f"{box.x}cm", f"{box.y}cm"),
193                    size=(f"{box.w}cm", f"{box.h}cm"),
194                    fill=color,
195                )
196            )
197        # Display the letter in the box
198        if box.letter:
199            dwg.add(
200                dwg.text(
201                    box.letter,
202                    insert=(f"{box.x}cm", f"{box.y + box.h}cm"),
203                    font_size=f"{box.h}cm",
204                    font_family="Arial",
205                )
206            )
207    dwg.save()

So far, our algorithm to break lines is simple:

  1. If the next character is a newline: break
  2. If we are short, add to the line.
  3. If the next character makes the line overfull: a. If it's a space or a hyphen: break b. If not, add to the line
  4. If we broke the line, adjust positions and widths so the line is exactly as wide as the page by spreading the slack.

We need a better way to decide what is the best possible breaking point for the line. If it's too overfull, it will look bad. If it's too underfull, it will look bad too. And sometimes there is just no good breaking point and we need to do something.

If we were lucky enough to have a breaking point that exactly fills the line, then we would not have a problem, so usually this will come down to choosing between two breaking points. One makes the line overfull, one keeps it underfull.

How good is a breaking point?

  • If the slack is smaller, it's better.
  • If it's underfull it's slightly better than overfull, since it's better to have a spread out line than overlapping letters.
  • If we have many spaces, then a bigger slack is acceptable.

This is all subjective, of course. How much movement we tolerate on the text is a judgment call, so it involves some trial and error.

Let's start with some made up numbers, because we have to start somewhere.

  • The allowed positive slack is 20% of the width of the spaces in the line.
  • The allowed negative slack is 10% of the width of the spaces in the line.

And here's a plan to implement it:

  • When adding characters, check if it's a possible breaking point.
  • If it is, remember the "goodness" it has
  • When reaching an overfull breaking point, check if it's better than the last one we saw.
  • If the overfull breaking point is better, break.
  • If the overfull breaking point is worse, use the underfull breaking point.

The first new thing is a function to calculate how good a breaking point is, in those terms.

40def badness(page_width, row):
41    """Calculate how 'bad' a position to break is.
42
43    bigger is worse.
44    """
45    # Yes, this is suboptimal. It's easier to optimize working code
46    # than fixing fast code.
47    row_width = (row[-1].x + row[-1].w) - row[0].x
48    slack = page_width - row_width
49    stretchies = [b for b in row if b.stretchy]
50    if len(stretchies) > 0:
51        stretchies_width = sum(s.w for s in stretchies)
52        # More stetchy space is good. More slack is bad.
53        badness = slack / stretchies_width
54    else:  # Nothing to stretch. Not good.
55        badness = 1000
56    if slack < 0:
57        # Arbitrary fudge factor, negative slack is THIS much worse
58        badness *= -2
59    return badness

And we can see how it works:

1# 10 boxes of width 1, 1 stretchy in the middle
2boxes = [Box(x=i, w=1) for i in range(10)]
3boxes[5].stretchy = True
4
5# On a page that is 8 units wide:
6print(badness(8, boxes))
4.0

Let's see how that works for page widths between 5 and 15 (remember our row is 10 units wide):

1for w in range(5,15):
2  print('page_width:', w, ' -> badness:', badness(w, boxes))

As you can see, if the page was 10 units wide, it would be optimal. The second best option is for the page to be slightly wider, then maybe slightly thinner and so on.

page_width: 5  -> badness: 10.0
page_width: 6  -> badness: 8.0
page_width: 7  -> badness: 6.0
page_width: 8  -> badness: 4.0
page_width: 9  -> badness: 2.0
page_width: 10  -> badness: 0.0
page_width: 11  -> badness: 1.0
page_width: 12  -> badness: 2.0
page_width: 13  -> badness: 3.0
page_width: 14  -> badness: 4.0

We will need to load data that shows the problem. In this case, it's a row of 20 letters 'a' (without hyphens), a space, then 20, and then 30 more 'a's.

Why?

Like before, the page is about wide enough to fix 58 "a"s. That means the first run will not be enough to fill the line. The second run will still not be enough. The third run will, however, badly overfill it. So, we should go all the way to the end, see that it's too long, and then go back to the second space and break there.

22# Three runs of "a" with spaces between them.
23# The ideal breaking point is the second space.
24text_boxes = [Box(letter="a") for a in range(72)]
25for i in (20, 50):
26    text_boxes[i].letter = " "
27    text_boxes[i].stretchy = True
28adjust_widths_by_letter(text_boxes)
29
30# A few pages all the same size
31pages = [Box(i * 35, 0, 30, 50) for i in range(10)]

And now we have a problem. The change to our code needed to implement that is too large. It's so large that it amounts to a rewrite of our layout function and if we rewrite it, how would we be sure that we are not breaking all the things we achieved previously?

It turns out this is as far as just sitting down and writing code will take us. We need to get more serious, and transform this small pile of fragile code into something we can actually hack confidently and transform without fear of destroying it.

Therefore, we leave layout intact, show the failing test and move on to Part 2 of the book, where we will reorganize the code into a coherent software package and then... we will try again.

 62# We add a "separation" constant so you can see the boxes individually
 63separation = .05
 64
 65
 66def layout(_boxes):
 67    """Layout boxes along pages.
 68
 69    Keep in mind that this function modifies the boxes themselves, so
 70    you should be very careful about trying to call layout() more than once
 71    on the same boxes.
 72
 73    Specifically, some spaces will become 0-width and not stretchy.
 74    """
 75
 76    # Because we modify the box list, we will work on a copy
 77    boxes = _boxes[:]
 78    # We start at page 0
 79    page = 0
 80    # The 1st box should be placed in the correct page
 81    previous = boxes.pop(0)
 82    previous.x = pages[page].x
 83    previous.y = pages[page].y
 84    row = []
 85    while boxes:
 86        # We take the new 1st box
 87        box = boxes.pop(0)
 88        # And put it next to the other
 89        box.x = previous.x + previous.w + separation
 90        # At the same vertical location
 91        box.y = previous.y
 92
 93        # Handle breaking on newlines
 94        break_line = False
 95        # But if it's a newline
 96        if (box.letter == "\n"):
 97            break_line = True
 98            # Newlines take no horizontal space ever
 99            box.w = 0
100            box.stretchy = False
101
102        # Or if it's too far to the right, and is a
103        # good place to break the line...
104        elif (box.x + box.w) > (
105            pages[page].x + pages[page].w
106        ) and box.letter in (
107            " ", "\xad"
108        ):
109            if box.letter == "\xad":
110                # Add a visible hyphen in the row
111                h_b = hyphenbox()
112                h_b.x = previous.x + previous.w + separation
113                h_b.y = previous.y
114                _boxes.append(h_b)  # So it's drawn
115                row.append(h_b)  # So it's justified
116            break_line = True
117            # We adjust the row
118            # Remove all right-margin spaces
119            while row[-1].letter == " ":
120                row.pop()
121            slack = (pages[page].x + pages[page].w) - (
122                row[-1].x + row[-1].w
123            )
124            # Get a list of all the ones that are stretchy
125            stretchies = [b for b in row if b.stretchy]
126            if not stretchies:  # Nothing stretches do as before.
127                bump = slack / len(row)
128                # The 1st box gets 0 bumps, the 2nd gets 1 and so on
129                for i, b in enumerate(row):
130                    b.x += bump * i
131            else:
132                bump = slack / len(stretchies)
133                # Each stretchy gets wider
134                for b in stretchies:
135                    b.w += bump
136                # And we put each thing next to the previous one
137                for j, b in enumerate(row[1:], 1):
138                    b.x = row[j - 1].x + row[j - 1].w + separation
139
140        if break_line:
141            # We start a new row
142            row = []
143            # We go all the way left and a little down
144            box.x = pages[page].x
145            box.y = previous.y + previous.h + separation
146
147        # But if we go too far down
148        if box.y + box.h > pages[page].y + pages[page].h:
149            # We go to the next page
150            page += 1
151            # And put the box at the top-left
152            box.x = pages[page].x
153            box.y = pages[page].y
154
155        # Put the box in the row
156        row.append(box)
157
158        # Collapse all left-margin space
159        if all(b.letter == " " for b in row):
160            box.w = 0
161            box.stretchy = False
162            box.x = pages[page].x
163
164        previous = box
165
166
167layout(text_boxes)
210draw_boxes(text_boxes, "lesson11.svg", ("33cm", "5cm"), hide_boxes=True)

lesson11.svg


Further references:

results matching ""

    No results matching ""