Draw score ($C131—C157)

The easiest way to understand what this block of code is doing is look at the state of memory before and after. The main difference is some new entries in the video RAM, from $1CF8 to $1DDF. Recalling our earlier trick from "Load glyph data ($D86C—D8C6)" (TODO: link) to peek at the video memory, we end up with a set of bytes looking like this:

1CF8:    11111111        11111111        11111111        11111111
1D18:  11      1111    11      1111    11      1111    11      1111
1D38:  11    11  11    11    11  11    11    11  11    11    11  11
1D58:  11    11  11    11    11  11    11    11  11    11    11  11
1D78:  11  11    11    11  11    11    11  11    11    11  11    11
1D98:  11  11    11    11  11    11    11  11    11    11  11    11
1DB8:  1111      11    1111      11    1111      11    1111      11
1DD8:    11111111        11111111        11111111        11111111

This looks exactly like the player's score, when set to zero. Knowing this, it becomes much easier to guess at what our variables are.

C131: 86 03        LDA #$03    A = 3
C133: 97 51        STA $51     Set $2051 = 3

...except this one. $2051 is likely another global variable but it does not seem to be read or changed during a game.

C135: 8E 1C F8     LDX #$1CF8  X = $1CF8 (video_address, video RAM)
C138: 10 8E 20 4F  LDY #$204F  Y = $204F (score_address)
C13C: BD D8 29     JSR $D829   Jump to $D829

We saw the score start at $1CF8, so X clearly indicates the video address into which data will be written. $204F indicates an area of memory which starts with zeroed data but which, when we run the game, reflects the player's score in real-time. We therefore need no hesitation in labelling these variables video_address and score_address.

In fact byte $204F contains the hundreds and thousands,while byte $2050 contains the tens and ones. Each digit uses four bits (officially, but not popularly, know as a nibble). For example, if the score is 1550, then the individual bits will look like this:

Bits:   0001 0101 0101 0000
Number:    1    5    5    0

D829: 34 30        PSHS ,Y,X   Push video_address, score_address to stack
D82B: 86 02        LDA #$02    A = 2
D82D: 34 02        PSHS ,A     Push A to stack (loop_count_outer)

We push these variables to the stack, including a simple 2 which I know will only be used for looping. I call it loop_count_outer because there will shortly be an inner loop too.

D82F: A6 A4        LDA ,Y      A = score_address->val
D831: 8D 10        BSR $D843

The accumulator is loaded with the byte containing the 2 right digits of the player score.

D843: 34 30        PSHS ,Y,X   Push video_address, score_address to stack again

X and Y, which still contain the video_address and score_address respectively, are pushed to the stack again only for convenience; we want to access these again soon, but if we wanted to do so without making copies, we would have to navigate up the stack beyond loop_count_outer. Fiddling with the stack like that could be considered bad form, as it does make it easier to introduce subtle bugs.

We can call these copies video_address2 and score_address2.

D845: 44           LSRA        A >> 1
D846: 44           LSRA        A >> 1
D847: 44           LSRA        A >> 1
D848: 44           LSRA        A >> 1

We know that A contains a byte of the player score, with each of the two nibbles accounting for a digit each. Right-shifting A four times effectively isolates the high nibble digit, by moving it into the low nibble position and implicitly zeroing the high nibble. The entire byte in A now represents that one digit instead of two.

We then call - with needing to actually call it, as it happens to start from the next instruction - the function draw_digit(A), with the first digit that we want to draw.

# draw_digit(A)
D849: C6 10        LDB #$10    B = 16
D84B: 3D           MUL         digit * B
D84C: C3 D8 D7     ADDD #$D8D7 D = D + $D8D7 (ROM address)
D84F: 1F 02        TFR D,Y     Y = D
D851: AE E4        LDX ,S      X = video_address2

We start by finding where the glyph data for this digit exists on the cartridge ROM, by calculating an offset from $D8D7. From this we can extrapolate exactly where each number's data is stored. As it turns out they are simply store 16 bytes apart.

number	Glyph in ROM
0	`$D8D7`
1	`$D8E7`
2	`$D8F7`
3	`$D907`
4	`$D917`
5	`$D927`
6	`$D937`
7	`$D947`
8	`$D957`
9	`$D967`

D853: 86 08        LDA #$08    A = 8
D855: 34 02        PSHS ,A     Push A to stack (loop_count_inner)
D857: EC A1        LDD ,Y++    D = Y, Y = Y + 1
D859: ED 84        STD ,X      X->val = D
D85B: 30 88 20     LEAX +$20,X X = X + 32
D85E: 6A E4        DEC ,S      loop_count_inner = loop_count_inner - 1
D860: 26 F5        BNE $D857   Loop until loop_count_inner = 0 (8 loops)

This is a simple function; re-written naively in C it would look more like this:

for (int i = 0; i < 8; i++)
{
    memory[X] = memory[Y];
    memory[X+1] = memory[Y+1];
    X += 32;
    Y += 1;
}

This small loop copies the data from cartridge ROM into video memory. Reading the ROM data is easy, as we just loop 8 times, reading 2 bytes each time. When writing it, we separate every 2 bytes with a 32-byte gap; as a complete horizontal line across the screen uses 32 bytes, this displays as 2 bytes per line.

In any case it's clear that we're populating a collection of 8 (still unknown) data structures that are 32 bytes wide.

D862: 32 61        LEAS +$01,S   Clean-up loop_count_inner
D864: 35 B0        PULS ,X,Y,PC  Clean-up video_address2 & score_address2, return to caller

We clean up all the stack variables, simultaneously using the trick of pulling the PC off the stack to simulate an RTS, and continue where we left off.

D833: 30 02        LEAX +$02,X X = X + 2 (video_address + 2)
D835: A6 A0        LDA ,Y+     A = Y->val, Y = Y + 1
D837: 8D 2D        BSR $D866

We've now popped the stack back to the point where X holds the initial video_address of $1CF8, and Y holds the initial score_address of $204F. We offset the video address by 2 bytes - which positions us 8 pixels to the right, at the insertion position for the next digit. The score address will actually remain as $204F, though Y is then incremented. This means that the next time it's accessed it will be pointing to $2050, which contains the two left-most digits of the score.

D866: 34 30        PSHS ,Y,X   Push video_address2 & score_address2 to stack
D868: 84 0F        ANDA #$0F   A = A & 00001111
D86A: 20 DD        BRA $D849   Call draw_number(A) on second digit

We call the draw function again, but this time we isolate the second digit. This is more simply done; recall that to isolate the high nibble we needed 4 right-shifts, but to isolate the low nibble we can simply remove the high nibble. This can be done with a simple AND mask.

A note on unorthodox branching

Note that we use a BRA here instead of a JSR, so the address of the current execution point is not placed on the stack. This means that when copy_rom_to_ram1() finishes and calls RTS (or uses PULS ,PC simulates it), we'll be returning to wherever we last used a BSR or JSR. In this case that will be at $D839. This execution flow is not something that can usually be achieved in a higher-level language, though it's really the same as:

JSR $B849
RTS

This form makes the execution flow explicit, at the cost of one extra byte and one extra instruction.

D839: 30 02        LEAX +$02,X  X = X + 2
D83B: 6A E4        DEC ,S       loop_count_outer = loop_count_outer - 1
D83D: 26 F0        BNE $D82F    Loop until loop_count_outer = 0 (2 loops)

Having now drawn two digits, the entire code from $D82F is repeated, but by now

score_address will be 1 byte higher at $2050 instead of $204F, and
video_address will be 4 bytes higher at $1CFC instead of $1CF8.

This will draw the remaining two digits of the player's score onto the screen.

D83F: 32 61        LEAS +$01,S  Clean-up loop_count_outer
D841: 35 B0        PULS ,X,Y,PC Clean-up video_address, score_address and return to caller

Clean-up, as usual, before continuing. Here is the fruit of our labours:

Player score 0000

Written on September 11, 2015