At roughly 90 bytes, this routine is relatively long, but it is straightforward once we walk through it piece by piece.
D86C: 34 10 PSHS ,X Push X ($1D00) to stack
D86E: A6 A0 LDA ,Y+ Set A = $46 (value at $C0A9), Y + 1
We immediately push X to the stack and read a byte from the address at Y. This suggests that these registers need to be set before we arrive here, so we can think of this routine as accepting two arguments, both of which are memory pointers.
In our assembly comments, we've assumed that the function has just been called from $C0A4, where X was set to $1D00 and Y was set to $C0A9. In our analysis of the memory at that point in the code, we guessed that the data at Y was ASCII text and the data at X was uninitialized memory (see the post here for more details). We can therefore label these two parameters:
$$
X = \mathtt{destination\_address} \\
Y_1 = \mathtt{ascii\_character\_address}
$$
I've designated $Y_1$ instead of $Y$ because the Y register will be used to store a different variable later on.
Special case for end-of-transmission byte
D870: 81 04 CMPA #$04 Compare A (ascii_character) to 4 (end-of-transmission)
D872: 27 51 BEQ $D8C5 Jump to end if A = end-of-transmission
When the character we read is equal to $04, we jump to the very end of the routine and exit immediately. This indicates that the byte $04, which does correspond to "end of transmission" in ASCII, marks the end of the chunk of data that we're reading.
Special case for spaces (' ')
D874: 81 20 CMPA #$20 Compare A to $20 (' ')
D876: 26 18 BNE $D890 Jump to $D890 if A != ' '
D878: 86 08 LDA #$08 Set A = 8 (counter)
D87A: 34 02 PSHS ,A Push A to stack
D87C: CC 00 00 LDD #$0000 Set A & B to 0
D87F: ED 84 STD ,X Store 0000 at destination_address
D881: 30 88 20 LEAX +$20,X destination_address + $20
D884: 6A E4 DEC ,S counter - 1
D886: 26 F7 BNE $D87F Loop while counter > 0
If the byte loaded is a space (i.e. ' '), we enter a fairly simple loop which zeroes out the memory at destination_address. This makes sense: a space is not visible, so we're probably writing pure zeroes to our destination RAM so that nothing is rendered there.
The only thing that might seem a bit strange (for now) is that we advance the destination address by 32 bytes on each of the 8 iterations, so the writes land at $1D00, $1D20, $1D40 and so on up to $1DE0, leaving gaps in memory between them.
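The zeroing loop can be modelled in a few lines of Python. This is only a sketch: `memory` is a hypothetical bytearray standing in for the machine's RAM, and the function and constant names are mine rather than anything found in the game.

```python
ROW_STRIDE = 0x20   # LEAX +$20,X: step 32 bytes forward on each pass
ROW_COUNT = 8       # the counter pushed to the stack with LDA #$08 / PSHS A


def clear_cell(memory, destination_address):
    """Zero two bytes at each of eight addresses spaced 32 bytes apart."""
    x = destination_address
    for _ in range(ROW_COUNT):
        memory[x] = 0x00        # STD ,X stores A and then B, both zero here
        memory[x + 1] = 0x00
        x += ROW_STRIDE
    return x                    # value of X when the loop exits


# Hypothetical RAM, pre-filled with $FF so the zeroed bytes stand out.
memory = bytearray([0xFF]) * 0x2000
x_after_loop = clear_cell(memory, 0x1D00)
```

After the call, `x_after_loop` is $1E00, which is 256 bytes past the starting address, and the bytes between the zeroed pairs are left untouched.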
D888: 32 61 LEAS +$01,S Pop stack & discard
D88A: 30 89 FF 02 LEAX $FF02,X Reset destination_address to original offset + 2
D88E: 20 DE BRA $D86E Jump
Here we reset the stack to its original position and also reset destination_address. During the loop we advanced it by 256 bytes (8 × 32), so we now move it back 254 bytes, which leaves it 2 bytes past where it started, at the first location we haven't yet written. This is implemented by adding the 2's-complement of $00FE, which is $FF02.
TODO: more explanation of subtraction using 2's-complement
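The reason this works is that X is a 16-bit register, so additions wrap around modulo $10000. Since $FF02 + $00FE = $10000, adding $FF02 and discarding the carry gives the same result as subtracting $00FE. A small Python sketch of the arithmetic follows; the helper names are hypothetical and exist only for illustration.

```python
MASK16 = 0xFFFF  # X holds 16 bits, so every result is reduced modulo $10000


def twos_complement16(value):
    """Return the 16-bit two's complement (the negation) of value."""
    return (-value) & MASK16


def leax(x, offset):
    """Model LEAX offset,X as a 16-bit add that throws away the carry."""
    return (x + offset) & MASK16


offset = twos_complement16(0x00FE)     # the constant encoded in the instruction
x_after_loop = 0x1D00 + 8 * 0x20       # X after eight +$20 steps
x_next = leax(x_after_loop, offset)    # same as x_after_loop - $FE
```

Here `offset` comes out as $FF02 and `x_next` as $1D02, i.e. the original address plus 2.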
Special case for question marks ('?')
D890: 81 3F CMPA #$3F Compare A to 63 ('?')
D892: 26 0C BNE $D8A0 Jump to $D8A0 if A != 63
D894: 34 20 PSHS ,Y Push ascii_character_address to stack
D896: 10 8E D8 C7 LDY #$D8C7 Set Y = $D8C7
D89A: 86 08 LDA #$08 Set A = 8 (counter)
D89C: 34 02 PSHS ,A Push A to stack
D89E: 20 10 BRA $D8B0 Jump
Again, we perform some special logic if the byte is the ASCII code for a question mark.
At this point, we introduce another new variable which is, again, another memory address in the cartridge ROM. Based on all we know so far, we might have suspicions about what lies there -- but let's continue for now.
We store this variable in address register Y. This register already contains ascii_character_address, but since we've already loaded the data from there, we don't need the address right now, so we push it to the stack to make way for our new variable.
We'll call this new variable $Y_2$ until we know what it's for.
$$
Y_2 = \mathtt{unknown}
$$
For a question mark, we hardcode this address to $D8C7. This is a clue. When dealing with sets of ASCII characters, we can usually handle alphanumerics in a generic way because they're coded contiguously, i.e.
| code | character |
|------|-----------|
| 65 | A |
| 66 | B |
| 67 | C |
| 68 | D |
| 69 | E |
| 70 | F |
| 71 | G |
| 72 | H |
or
| code | character |
|------|-----------|
| 48 | 0 |
| 49 | 1 |
| 50 | 2 |
| 51 | 3 |
| 52 | 4 |
| 53 | 5 |
| 54 | 6 |
| 55 | 7 |
| 56 | 8 |
| 57 | 9 |
The question mark, on the other hand, is the only non-alphanumeric character apart from a space that I can recall the game displaying. Whatever data we need for the question mark simply lies at the arbitrary location of $D8C7 (the byte immediately after this routine's final instruction), and we accordingly point directly to it.
General case for alphabetics
D8A0: 34 20 PSHS ,Y Push ascii_character_address to stack
D8A2: 80 41 SUBA #$41 Set A = ascii_character - 65
D8A4: C6 10 LDB #$10 Set B = 16
D8A6: 3D MUL (ascii_character - 65) x 16
D8A7: C3 D9 77 ADDD #$D977 + $D977
D8AA: 1F 02 TFR D,Y Set Y = result
As with the question mark, we first push ascii_character_address to the stack to get it out of the way. We then load $Y_2$ with the seemingly arbitrary address $D977, offset according to where the character appears in the set of all characters A to Z:
$$
Y_2 = \mathtt{\$D977} + ((\mathtt{ascii\_code} - 65)\times16)
$$
The magic number 65 ($41) corresponds to the ASCII code for the letter A. The calculated addresses therefore end up looking like this:
$$
Y_{2A} = \mathtt{\$D977} + ((65 - 65)\times16) = \mathtt{\$D977} \\
Y_{2B} = \mathtt{\$D977} + ((66 - 65)\times16) = \mathtt{\$D987} \\
Y_{2C} = \mathtt{\$D977} + ((67 - 65)\times16) = \mathtt{\$D997} \\
... \\
Y_{2Z} = \mathtt{\$D977} + ((90 - 65)\times16) = \mathtt{\$DB07}
$$
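The same lookup, including the hardcoded question-mark case from earlier, can be written as a short Python function. This is a sketch with names of my own choosing; like the original routine, it does no validation, so anything other than '?' is treated as if it were a letter.

```python
FONT_BASE = 0xD977            # ADDD #$D977: address of the data for 'A'
QUESTION_MARK_DATA = 0xD8C7   # LDY #$D8C7: hardcoded address for '?'
BYTES_PER_CHARACTER = 16      # LDB #$10 followed by MUL


def glyph_address(character):
    """Return the ROM address that Y2 would hold for the given character."""
    if character == "?":
        return QUESTION_MARK_DATA
    # SUBA #$41 turns 'A'..'Z' into 0..25 before the multiply.
    index = ord(character) - ord("A")
    return FONT_BASE + index * BYTES_PER_CHARACTER


addresses = {c: hex(glyph_address(c)) for c in "ABCZ?"}
```

For example, `glyph_address("B")` gives $D987 and `glyph_address("Z")` gives $DB07, matching the table of values above.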
D8AC: 86 08 LDA #$08 Set A = 8 (counter)
D8AE: 34 02 PSHS ,A Push A to stack
D8B0: EC A1 LDD ,Y++ Set D = data from Y2, Y + 2
D8B2: ED 84 STD ,X Store data at destination_address
D8B4: 30 88 20 LEAX +$20,X destination_address + $20
D8B7: 6A E4 DEC ,S counter - 1
D8B9: 26 F5 BNE $D8B0 Loop while counter > 0
Similar to what we did in the special case of the space (' '), we read the 16 bytes starting at $Y_2$ sequentially, two at a time, but place each pair 32 bytes apart at the destination address, e.g. $1D00, $1D20, $1D40 etc.
D8BB: 32 61 LEAS +$01,S Pop stack and discard
D8BD: 35 20 PULS ,Y Pop ascii_character_address into Y
D8BF: 30 89 FF 02 LEAX $FF02,X Reset destination_address to original offset + 2
D8C3: 20 A9 BRA $D86E Loop to beginning
This is mostly cleanup code to reset our stack and restore the variables that we had at the beginning of the function. Then we loop right back to the beginning to start the whole process again with the next ASCII character we're loading.
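The copy loop and the pointer reset can be modelled together in Python. This is a sketch under a few assumptions: `memory` is a hypothetical bytearray standing in for RAM, the function name is mine, and the 16 bytes used for 'F' are read off the RAM dump shown later in this post rather than taken from the ROM itself.

```python
ROW_STRIDE = 0x20   # LEAX +$20,X
ROW_COUNT = 8       # counter pushed to the stack

# Sixteen bytes for 'F', two per row, as they appear in the RAM dump.
GLYPH_F = bytes([0x3F, 0xFC, 0x30, 0x00, 0x30, 0x00, 0x3F, 0xC0,
                 0x30, 0x00, 0x30, 0x00, 0x30, 0x00, 0x30, 0x00])


def copy_character(memory, destination_address, data):
    """Copy 16 sequential bytes to eight 2-byte slots spaced 32 bytes apart."""
    x = destination_address
    for row in range(ROW_COUNT):
        memory[x] = data[2 * row]            # LDD ,Y++ reads two bytes...
        memory[x + 1] = data[2 * row + 1]    # ...and STD ,X writes them
        x += ROW_STRIDE
    # LEAX $FF02,X: step back 254 bytes, ending 2 bytes past the start.
    return (x + 0xFF02) & 0xFFFF


memory = bytearray(0x2000)   # hypothetical RAM, all zero
next_destination = copy_character(memory, 0x1D00, GLYPH_F)
```

The returned address, $1D02, is where the next character in the string will be written.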
D8C5: 35 90 PULS ,X,PC Return
Finally, we return to the caller, additionally popping the X register off the stack to tidy up. These two actions could be written as two separate instructions, a PULS X followed by an RTS. However, since popping from the stack into the program counter does exactly the same thing as an RTS, we can save a byte by combining the two pops into one.
What have we loaded?
We loaded all this data into RAM, but what actually is this data? Let's look at the RAM after the routine has run for the four characters 'F', 'T', 'S', 'R', used as labels on the gauges in the bottom left of the screen.
When we were looping through our data we noticed that we were spacing it 32 bytes apart, so let's look at it in 32-byte rows:
1D00: 3F FC 0F FC 0F F0 3F F0 00 ..
1D20: 30 00 00 C0 30 0C 30 0C 00 ..
1D40: 30 00 00 C0 30 00 30 0C 00 ..
1D60: 3F C0 00 C0 0F F0 3F F0 00 ..
1D80: 30 00 00 C0 00 0C 33 00 00 ..
1DA0: 30 00 00 C0 00 0C 30 C0 00 ..
1DC0: 30 00 00 C0 30 0C 30 30 00 ..
1DE0: 30 00 00 C0 0F F0 30 0C 00 ..
There are only 9 distinct values used ($00, $0C, $0F, $30, $33, $3F, $C0, $F0 and $FC), but more importantly, if you ignore the zeros you get some interesting patterns:
1D00: 3F FC  F FC  F F  3F F     ..
1D20: 3        C  3   C 3   C    ..
1D40: 3        C  3     3   C    ..
1D60: 3F C     C   F F  3F F     ..
1D80: 3        C      C 33       ..
1DA0: 3        C      C 3  C     ..
1DC0: 3        C  3   C 3  3     ..
1DE0: 3        C   F F  3   C    ..
Hel-lo! The shapes of an F, T, S and R are clearly distinct.
It's even more obvious once we convert them to binary. Here's the F by itself:
1D00: 0011 1111 1111 1100 ..
1D20: 0011 0000 0000 0000 ..
1D40: 0011 0000 0000 0000 ..
1D60: 0011 1111 1100 0000 ..
1D80: 0011 0000 0000 0000 ..
1DA0: 0011 0000 0000 0000 ..
1DC0: 0011 0000 0000 0000 ..
1DE0: 0011 0000 0000 0000 ..
and without the zeroes to make it really obvious:
1D00:   11 1111 1111 11   ..
1D20:   11                ..
1D40:   11                ..
1D60:   11 1111 11        ..
1D80:   11                ..
1DA0:   11                ..
1DC0:   11                ..
1DE0:   11                ..
I remember being blown away by this when I first saw it done years ago. We're literally drawing in 1's and 0's. And we can at last name our $Y_2$ variable.
$$
Y_2 = \mathtt{glyph\_data\_address}
$$
This technique is very rarely used today. It's about the most primitive possible way of embedding graphics into a program, and it only really works if you want your shapes to be not only monochromatic but entirely toneless, i.e. only 2 colors.
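To see the shape without squinting at binary, the rows can be printed as characters. Here is a minimal Python sketch, assuming each row is read as one 16-bit word with the first byte as the high byte; the row values for 'F' come from the dump above, and the function name and the '#' and '.' symbols are arbitrary choices of mine.

```python
# Eight 16-bit rows for 'F', taken from the RAM dump ($1D00, $1D20, ...).
GLYPH_F_ROWS = [0x3FFC, 0x3000, 0x3000, 0x3FC0,
                0x3000, 0x3000, 0x3000, 0x3000]


def render_rows(rows, width=16, on="#", off="."):
    """Turn each row into a string, most significant bit first."""
    lines = []
    for row in rows:
        bits = [(row >> (width - 1 - position)) & 1 for position in range(width)]
        lines.append("".join(on if bit else off for bit in bits))
    return lines


for line in render_rows(GLYPH_F_ROWS):
    print(line)
```

The first line printed is `..############..` and the fourth is `..########......`, which are the top bar and the middle bar of the F.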