Thursday, April 24, 2014

NES Hello Sprites 1

For my first demonstration of sprites I decided to go with your typical bouncing sprite. Instead of using a ball, the text message "Hello Sprites!" is used. Due to the NES limitation of only allowing 8 sprites on a line, the message is broken into two lines of text. This version manually manipulates sprite memory instead of using the more common DMA method that will be covered next. To my surprise, this actually led to issues with the emulator when run with the default fast PPU emulation, though it worked fine when run with the slower PPU emulator. This is a good reminder that emulators are not always accurate, though developing with them is much faster than developing on real hardware.


The complete source code for this project is at http://blazinggames.com/books/NES/. Only the most relevant sections of code are included.

The program starts out with pretty standard initialization code. The screen is then filled with a checkerboard pattern instead of blank spaces so it will be clear that the sprites are appearing above the screen image (background). Next we have the task of setting up the sprites. In our case we are just copying this info from a table in ROM into the PPU Sprite Memory. Variables for holding the top corner of the block of sprites and the movement directions are then set up. The main loop simply waits for the VBlank flag to be set and then calls the MoveSprites function, which has all the movement logic.

; initialize the sprites
LDA #0
TAY
STA $2003
; fill sprite memory with data from ROM
InitSprites_dataLoop:
LDA hello_sprite_data,Y
STA $2004
INY
CPY #52
BNE InitSprites_dataLoop

; fill rest of sprite memory with 255 so that it will not be visible
LDA #255 
InitSprites_fillLoop:
STA $2004
INY
BNE InitSprites_fillLoop
  
; set sprite variables 
LDA #0
STA SPRITE_X
STA SPRITE_Y
LDA #1
STA H_ADJUST
STA V_ADJUST

The MoveSprites function is fairly large but also simple. The first chunk of code shows how the coordinates are adjusted. Each frame the block's top corner is adjusted by the current direction variables. The new position is then checked against the screen bounds, with a hit resulting in the direction being changed. What may be confusing, and something I will definitely be writing about in the future, is that 255 is used for -1.

MoveSprites:
; First adjust top corner of sprite block
LDA SPRITE_X
CLC
ADC H_ADJUST
STA SPRITE_X
; Has hit left edge?
BNE MoveSprite_checkRightEdge
; if so set horizontal adjust to +1
LDA #1
STA H_ADJUST
MoveSprite_checkRightEdge:
; Has hit right edge?
CMP #192
BNE MoveSprite_vertical
; if hit right edge, set horizontal adjust to -1
LDA #$FF ; -1
STA H_ADJUST
; vertical tests similar to above and not included here, see source file for full code.
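
The reason 255 works as -1 is that the 6502 only keeps the low 8 bits of a sum, so adding 255 wraps around to the same result as subtracting 1. Here is a rough Python model of the horizontal logic above. The function names add8 and step are my own for illustration and are not part of the NES source; the right edge value of 192 comes from the CMP in the listing.

```python
def add8(a, b):
    """Add two bytes the way ADC does with carry clear, keeping only 8 bits."""
    return (a + b) & 0xFF


def step(x, dx, right_edge=192):
    """Advance one frame: move first, then reverse direction at either edge."""
    x = add8(x, dx)
    if x == 0:
        dx = 1        # hit the left edge, start moving right
    elif x == right_edge:
        dx = 0xFF     # hit the right edge, 255 acts as -1
    return x, dx


if __name__ == "__main__":
    x, dx = 190, 1
    for _ in range(4):
        x, dx = step(x, dx)
        print(x, dx)
```

Running it shows the coordinate climbing to 192, the direction changing to 255, and the coordinate then counting back down.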

Once we know the coordinates for the top of the block of sprites, we can calculate the position of the sprites. As the bottom calculations are the more complex ones, here is the code for setting the bottom line of sprites.

; we reach here when top row done, so prepare bottom row
LDA SPRITE_Y
CLC
ADC #8
STA TEMP_Y ; Y coordinate to use for bottom row of sprites
LDA SPRITE_X
STA TEMP_X ; X coordinate starts at block x coordinate
; Y register already correct so no need to set
LDX #8 ; 8 sprites in bottom row need to be counted
MoveSprite_bottomRowAdjust:
STY $2003
LDA TEMP_Y
STA $2004 ; write adjusted sprite Y coordinate
INY ; Adjust index to point to sprite X data
INY
INY
STY $2003 ; tell PPU we want to write sprite X
LDA TEMP_X
STA $2004 ; Write sprite X coordinate
CLC
ADC #8 ; add 8 to this coordinate for next sprite
STA TEMP_X
INY ; set up index for next sprite
DEX ; and adjust countdown
BNE MoveSprite_bottomRowAdjust
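
To make the address arithmetic in that loop easier to follow, here is a small Python sketch that lists which sprite memory addresses get written and with what values. It assumes the Y register holds 20 on entry (5 top row sprites at 4 bytes each, which is my reading of the 52 byte table and the 8 sprite bottom row), and the function name is hypothetical.

```python
def bottom_row_writes(sprite_x, sprite_y, first_index=20, count=8):
    """Return (sprite memory address, value) pairs in the order the loop writes them.

    first_index=20 is an assumption: 5 top row sprites * 4 bytes each.
    """
    writes = []
    index = first_index
    temp_y = (sprite_y + 8) & 0xFF      # bottom row sits 8 pixels below the top row
    temp_x = sprite_x & 0xFF            # row starts at the block's X coordinate
    for _ in range(count):
        writes.append((index, temp_y))  # byte 0 of the sprite: Y coordinate
        index += 3                      # skip the tile and flag bytes
        writes.append((index, temp_x))  # byte 3 of the sprite: X coordinate
        temp_x = (temp_x + 8) & 0xFF    # next sprite is 8 pixels to the right
        index += 1                      # move on to the next sprite's byte 0
    return writes


if __name__ == "__main__":
    for address, value in bottom_row_writes(10, 20):
        print(address, value)
```

Only bytes 0 and 3 of each sprite are touched, which is why the assembly has to reset $2003 twice per sprite.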

As you can see it is fairly simple. Most NES programs, however, do not manipulate sprites directly as we are doing, so we will rewrite this program to take advantage of interrupts and DMA, and explain what interrupts and DMA are along the way. That will happen shortly, but next week will be a postmortem, possibly followed by something else.

Thursday, April 17, 2014

NES Sprites

When programmers talk about sprites they are not usually talking about the mythical variant. They are referring to a special graphical object that has transparent parts and floats above the display without altering what is behind it. This seemingly magical behaviour is likely why programmers started referring to these objects as sprites. Not everyone used the name: on the 2600 the equivalent objects are called players and missiles, which Atari's later computers grouped under the name Player-Missile Graphics. Sprite graphics clearly rolls off the tongue better.


Not all graphics hardware supports sprites, and even when they do the specifications for the sprite can vary wildly. When emulating sprites in software, the software needs to be able to preserve what is behind the sprites and then draw the sprites over the display. With multiple overlapping sprites this can be complex, especially if speed is important. Thankfully the NES supports hardware sprites so we don't need to get into display lists and dirty rectangles here.

The NES supports 64 sprites on the screen at the same time, though the hardware is limited to displaying 8 sprites on a single scan line. To get around these limitations games that need more sprites or more than 8 sprites on a scan line can shuffle sprites around so the visible sprites change from frame to frame. This has the downside of causing a noticeable flickering effect. Some emulators do not emulate the 8 sprites per scan line limitation so will only flicker if you use more than 64 sprites. I prefer emulators that keep hardware limitations and bugs as that makes it much more likely that whatever I create using the emulator will run fine on real hardware.

Each of the 64 NES sprites takes 4 bytes of memory. The layout of this data may seem rather haphazard but is actually very logical from a hardware perspective. Sprite data is set up as follows:

Byte 0 holds the Y coordinate. To know if it needs to draw a sprite on the current scan line this value is used making it the first piece of sprite information needed.
Byte 1 is the character to draw. Hardware needs to know this to grab the data from the pattern table.
Byte 2 is the display flags that alter how the sprite is displayed so is obviously the next piece of information the hardware needs. The bits are set up as follows:
        Bit 0 and 1 indicate the palette to use
        Bits 2 through 4 are not used
        Bit 5 When set puts the sprite behind the background image.
        Bit 6 Flips sprite horizontally when set
        Bit 7 Flips sprite vertically when set
Byte 3 holds the X coordinate. Once the hardware knows what it is drawing it needs to know where on the line it is going to be drawn.
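
As a quick illustration of that layout, here is a Python sketch that builds the 4 bytes for one sprite. The helper names are made up for this example; only the byte order and bit positions come from the description above.

```python
def sprite_flags(palette, behind=False, flip_h=False, flip_v=False):
    """Build byte 2 of a sprite entry from its individual settings."""
    if not 0 <= palette <= 3:
        raise ValueError("palette must be 0 to 3")
    value = palette                 # bits 0 and 1: palette
    value |= int(behind) << 5       # bit 5: draw behind the background
    value |= int(flip_h) << 6       # bit 6: flip horizontally
    value |= int(flip_v) << 7       # bit 7: flip vertically
    return value


def sprite_entry(x, y, tile, flags):
    """Return the 4 bytes in the order the hardware expects: Y, tile, flags, X."""
    return bytes([y & 0xFF, tile & 0xFF, flags & 0xFF, x & 0xFF])


if __name__ == "__main__":
    entry = sprite_entry(x=16, y=32, tile=0x48, flags=sprite_flags(2, flip_h=True))
    print(entry.hex())
```

Note that the X coordinate is passed first to the helper for readability but lands last in the output, matching the hardware order.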

We now know enough about sprites to create a demo, which will be next.

Friday, April 11, 2014

NES PPU Ports

Over the weekend I finished an Atari 2600 3D maze game which I will be releasing shortly (possibly as part of a large announcement) but if anybody wants early access to the ROM email me.

One of the things that both the NES version of Hello World and the 2600 version had in common is mapping the graphics processor to memory addresses. This is a very common way of communicating with devices when using assembly language. The PPU condenses this to only 8 addresses. The audio unit, Sprite DMA, and IO ports are separate maps, but I don't consider them part of the PPU. As has been pointed out earlier, the PPU has its own memory. The addresses here do not correspond to PPU memory but are simply registers used to control the PPU.

$2000 is the first of two control registers, with bits used to control the PPU settings as follows:
Bits 0 and 1 control the screen memory (name table in NES terminology) with 00 being PPU address $2000, 01 being $2400, 10 being $2800 and 11 being $2C00.
Bit 2 enables vertical writing, meaning that when set writing to PPU memory increments the next write address by 32 bytes. As the screen has 32 columns this has the effect of writing a vertical strip of screen memory, which is exceedingly handy for side-scrolling games.
Bit 3 selects which of the two pattern tables the sprites will use for their images.
Bit 4 selects which of the two pattern tables the screen will use for its character set.
Bit 5 selects the sprite size. 0 for 8x8 sprites, 1 for 8x16 sprites.
Bit 6 not used
Bit 7 VBlank interrupt enable. When set will cause an interrupt to be triggered when a vertical blank is occurring.
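
To show how those bits combine into the single byte that gets written to $2000, here is a small Python sketch. The function and parameter names are my own invention; the bit positions are the ones listed above.

```python
NAME_TABLES = (0x2000, 0x2400, 0x2800, 0x2C00)


def ppu_control(name_table=0x2000, vertical=False, sprite_table=0,
                background_table=0, tall_sprites=False, vblank_interrupt=False):
    """Combine the individual settings into the byte written to $2000."""
    value = NAME_TABLES.index(name_table)     # bits 0 and 1: name table select
    value |= int(vertical) << 2               # bit 2: step by 32 instead of 1
    value |= (sprite_table & 1) << 3          # bit 3: sprite pattern table
    value |= (background_table & 1) << 4      # bit 4: background pattern table
    value |= int(tall_sprites) << 5           # bit 5: 8x16 sprites
    value |= int(vblank_interrupt) << 7       # bit 7: VBlank interrupt enable
    return value


if __name__ == "__main__":
    print(format(ppu_control(name_table=0x2400, vblank_interrupt=True), "08b"))
```

In assembly the same value would simply be loaded as a constant and stored to $2000.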

$2001 is a second control port with bits as follows:
Bit 0 Switches the display to greyscale when set
Bit 1 Hides the background in the leftmost 8 pixels of the screen if 0
Bit 2 Hides sprites in the leftmost 8 pixels of the screen if 0
Bit 3 Shows screen if set, blank screen when 0
Bit 4 Enables sprites when set.
Bit 5 intensifies Red colors
Bit 6 intensifies Green colors
Bit 7 intensifies Blue colors
I have heard that only one of the color intensifier bits should be set.

$2002 PPU Status gives status of PPU
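Bits 0-4 not used
Bit 5 sprite overflow flag, intended to indicate that more than 8 sprites fall on a scan line, though the hardware check is known to be unreliable.
Bit 6 set when a visible bit in sprite 0 intersects a visible background bit. It only becomes set once the scan line with the intersection is being drawn, which means it can be used to trigger events on certain scan lines.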
Bit 7 VBlank flag. Set when a VBlank has occurred. Once read it becomes 0 until the next VBlank, but many emulators do not emulate this behaviour.

$2003 Sprite Memory Address writing to this register will set the memory address that $2004 will point to for reading and writing sprite data. It should be pointed out that there are only 256 bytes of sprite memory so only a single write to this address is needed to access all of sprite memory.

$2004 read or write the sprite memory address that was set by $2003. Auto-increments, so the next read or write will be at the next address.

$2005 Screen Position is written to twice to set the position of the screen, first the horizontal offset and then the vertical offset. This is used to scroll the screen: when the position is non-zero, the portion of the display that falls outside the selected name table comes from one of the other name tables.

$2006 PPU Memory Address writing to this register will set the memory address that $2007 will point to for reading and writing PPU data. It requires two writes to set an address. The first write sets the high byte with the second being the low byte. Yes, this is the opposite of how the 6502 does things.

$2007 read or write the PPU memory address that was set by $2006. Auto-increments, so the next read or write will be at the next address. If vertical writing is enabled (via $2000 bit 2), the next read/write address will be incremented by 32 instead.
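
The two-write address sequence and the auto-increment are easier to see in action, so here is a deliberately simplified Python model of $2006 and $2007 writes. The class and its attribute names are hypothetical, and it ignores reads, mirroring and the rest of the PPU; it only shows the high byte first ordering and the step of 1 or 32.

```python
class PpuPorts:
    """Toy model of writes to $2006 and $2007 only (an illustration, not an emulator)."""

    def __init__(self):
        self.memory = bytearray(0x4000)   # the PPU address space is 14 bits wide
        self.address = 0
        self.high_next = True             # the first write to $2006 is the high byte
        self.vertical = False             # stands in for $2000 bit 2

    def write_2006(self, value):
        if self.high_next:
            self.address = ((value & 0x3F) << 8) | (self.address & 0x00FF)
        else:
            self.address = (self.address & 0xFF00) | (value & 0xFF)
        self.high_next = not self.high_next

    def write_2007(self, value):
        self.memory[self.address] = value & 0xFF
        step = 32 if self.vertical else 1
        self.address = (self.address + step) & 0x3FFF


if __name__ == "__main__":
    ppu = PpuPorts()
    ppu.write_2006(0x20)   # high byte of $2041
    ppu.write_2006(0x41)   # low byte
    ppu.write_2007(7)      # lands at $2041
    ppu.write_2007(8)      # lands at $2042
    print(ppu.memory[0x2041], ppu.memory[0x2042])
```

Setting vertical to True before writing makes consecutive writes land one screen row apart, which is the column-drawing trick mentioned for side-scrolling games.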

Programming the PPU is simply a matter of reading and writing to the appropriate registers. The sprites are a special case as if you are using a lot of sprites there is a much more efficient way of setting them. But before we cover sprite DMA, it would probably be a good idea to discuss sprites.

Thursday, April 3, 2014

Hello World 2600

For comparison purposes, this week I am showing an assembly language version of Hello World, but created for the Atari 2600. This is using DASM as the assembler due to its popularity with the 2600 home-brew crowd. This assembler supports a number of different chips so we start off with a declaration of the processor type. This is followed by setting up the constants that we use. In this case, the constants are the names of the TIA registers which are mapped to memory. While it may seem strange that many of these ports are mapped to precious zero page addresses, the 2600 only has 128 bytes of RAM so all RAM is in zero page anyway.

processor 6502

; Set up TIA registers as constants

VSYNC  = $00
VBLANK = $01
WSYNC  = $02
COLUPF = $08
COLUBK = $09
PF0    = $0D
PF1    = $0E
PF2    = $0F
INTIM  = $284
TIM64T = $296

As with the NES, things start out with basic housekeeping. Memory and TIA registers are zeroed out. The colours for the background and playfield are then set.

org $F000
Start
; Typical starting housekeeping
SEI ; Disable  Interrupts
CLD ; Clear BCD mode.
LDX #$FF ; Set ...
TXS ; ... stack pointer

; let's take advantage of X to wipe memory and TIA registers
; (the loop stops before address 0, VSYNC, which is written every frame anyway)
LDA #0
ClearMemory
STA 0,X
DEX
BNE ClearMemory

; set up background and playfield colors
LDA #$CE ; A light greenish color
STA COLUBK
LDA #$60 ; Purple!
STA COLUPF

Next we start the main loop. This is where things get really different. The 2600 does not have any type of graphics memory. Instead, the display is drawn by the program as the television is actually drawing the display. This means that the program has to manage the television display. The first step is to perform a vertical sync, which synchronizes the television signal with the frame that we are sending. This is done by telling the TIA chip we are syncing and then waiting for 3 scan lines to be drawn. The WSYNC register will halt the processor until the horizontal blank (when the TV's cathode ray is turned off and moved to the left for the next scan line) starts. This is followed by a 37 scan line vertical blank period before we start drawing the display. By setting a timer, we can do some work while we wait. The timer is set to 43 ticks of 64 cycles each, which is 2752 cycles, before it goes off. We have nothing to do, so we will waste this time.

MainLoop
; Vertical sync
; bit D1 of VSYNC needs to be set to turn on vsync
LDA #2
STA VSYNC
; now wait for 3 scanlines
STA WSYNC
STA WSYNC
STA WSYNC

; Vertical Blank
; set timer so we know when vertical blank nearly over
LDA  #43 ;load 43 (decimal) in the accumulator
STA  TIM64T ;and store that in the timer
; end VSync
LDA #0 ; Zero out bit D1 of VSYNC
STA  VSYNC ; to indicate sync time is over

; some game logic can go here while vertical blank happening
; as long as less than 2752 cycles

WaitForEndOfVBlank
LDA INTIM ; load remaining time
BNE WaitForEndOfVBlank ; wait till timer done

TAX ; set x to 0 for holding current scanline
TAY ; set y to 0 for offset data
; end vblank period
STA WSYNC
STA VBLANK  
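
The timer value of 43 is not arbitrary, so here is the arithmetic as a short Python sketch. It assumes 76 processor cycles per scan line and 64 cycles per TIM64T tick, and it ignores the few cycles spent setting up and polling the timer. The function names are mine.

```python
CYCLES_PER_SCANLINE = 76   # processor cycles in one scan line
CYCLES_PER_TICK = 64       # TIM64T counts down once every 64 cycles


def tim64t_value(scanlines):
    """Largest timer value that still fits inside the given number of scan lines."""
    return (scanlines * CYCLES_PER_SCANLINE) // CYCLES_PER_TICK


def timer_cycles(value):
    """Number of processor cycles covered by a TIM64T value."""
    return value * CYCLES_PER_TICK


if __name__ == "__main__":
    value = tim64t_value(37)
    print(value, timer_cycles(value), 37 * CYCLES_PER_SCANLINE)
```

For the 37 scan line vertical blank this gives 43 ticks, or 2752 of the 2812 available cycles, with the final WSYNC absorbing the remainder.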

Now we are ready to start drawing the message. The 2600 does not have any type of text capability so we are going to create the message using playfield graphics. This is only 40 blocks per scan line, so it is rather rough looking, but it is what we have. The catch is that the playfield registers only hold 20 blocks, with the other half of the display either copied or mirrored. In order to have an asymmetrical display like we need, the playfield registers need to be changed in the middle of drawing the line. Each line consists of 22 2/3 cycles in the horizontal blank period followed by 53 1/3 cycles of actual drawing time for a total of 76 cycles per scan line. These are 6502 cycles, as the TIA uses something it calls color-clocks which happen at 3 times the rate of processor cycles. This means that we need to set up the playfield registers by loading in our data from a data segment at the end of the program. Then we do other stuff until the beam is far enough along. As I am repeating the playfield data for 16 lines per data set, the check for whether it is time to move to the next line of data is done in the middle of the scan to let the beam catch up with us. Finally, we set up the other half of the display and then finish our end-of-scan-line logic.

ScanLoop
; fill left playfield data from table
LDA PlayfieldData,Y
STA PF0
LDA PlayfieldData+1,Y
STA PF1
LDA PlayfieldData+2,Y
STA PF2

; end of scanline logic done here so beam is far enough to reload 
; playfields. We are simply checking if on a line evenly divisible by
; 16 since when 16 (32,48,...) the lower bits will be zeros.
INX
TXA
AND #15
PHA

; replace playfield data with right side data from table
LDA PlayfieldData+3,Y
STA PF0
LDA PlayfieldData+4,Y
STA PF1
LDA PlayfieldData+5,Y
STA PF2

; during mid line we calculated if time for next set of playfield bytes
; and pushed it onto stack. Pull results and check
PLA
BNE EndOfLine
; if time for new playfield data, increment y index by 6
TYA
CLC
ADC #6
TAY

EndOfLine
STA WSYNC
; are we finished rendering?
CPX #191
BNE ScanLoop
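
To see which row of playfield data each scan line ends up using, here is a Python model of the X and Y bookkeeping in ScanLoop. It mirrors the INX, AND #15 and ADC #6 steps and the CPX #191 exit test; the function name and its parameters are only for illustration.

```python
def data_offsets(lines=191, lines_per_row=16, bytes_per_row=6):
    """Return the Y offset into PlayfieldData used on each pass through ScanLoop."""
    offsets = []
    x = 0   # scan line counter (X register)
    y = 0   # offset into the data table (Y register)
    while True:
        offsets.append(y)                   # this pass draws with the current offset
        x += 1                              # INX
        if (x & (lines_per_row - 1)) == 0:  # AND #15 gives zero every 16th line
            y += bytes_per_row              # step to the next 6 bytes of data
        if x == lines:                      # CPX #191
            break
    return offsets


if __name__ == "__main__":
    offsets = data_offsets()
    print(len(offsets), offsets[15], offsets[16], offsets[-1])
```

Each group of 6 bytes is held for 16 passes, and the last pass uses offset 66, the twelfth row of the table.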

Once we have finished drawing the display, it is time for the over scan. This lasts 30 scan lines. Again, a timer is set up. Other things can be done while the timer is running, but we don't have anything to do so this will be wasted as well. Obviously, in a game this is where some of the game logic would be handled, and all the buttons handled. Did I mention that the buttons on the console, including the reset button, are the responsibility of the program? Thankfully we really don't need to worry about that with this program.

; Overscan
LDA #2 ; Set D1 bit for the VBLANK...
STA VBLANK ; Make TIA output invisible for the overscan, 
; set timer for overscan
LDA #35
STA TIM64T

; could put more game logic here if we had any
; 2240 cycles set on timer

WaitForOverscan
LDA INTIM ; load remaining time
BNE WaitForOverscan ; wait till timer done
STA WSYNC

JMP  MainLoop      ; Loop forever!

That completes the code, leaving only the playfield data. Notice that the playfield registers are not logically set up. PF0 is only half a byte, with bits 4 through 7 used and drawn in that order (backwards from a human perspective). PF1 is written from bits 7 to 0 so is logical from a human perspective. PF2 is written from bits 0 to 7 so again is backwards from a human perspective. This strangeness was most likely done to make the TIA chip easier and cheaper to produce. Still, it is simply a matter of making sure the data is in the correct order, so here is the display data.

; Game Data
PlayfieldData ; pf0-4..7  pf1-7..0  pf2 0..7   pf0-4..7   pf1-7..0    pf2-0..7
.byte %00000000, %00000000, %00000000, %00000000, %00000000, %00000000 
.byte %01000000, %01011110, %00100001, %10000000, %10000000, %00000000 
.byte %01000000, %01010000, %00100001, %01000000, %01000000, %00000000 
.byte %11000000, %11011100, %00100001, %01000000, %01000000, %00000000 
.byte %01000000, %01010000, %00100001, %01000000, %01000000, %00000000 
.byte %01000000, %01011110, %11101111, %10010000, %10000000, %00000000 
.byte %00000000, %00000000, %00000000, %00000000, %00000000, %00000000 
.byte %00000000, %00000010, %01100100, %11100000, %00100001, %00100011 
.byte %00000000, %00000010, %10010100, %00100000, %10100001, %00100100 
.byte %00000000, %00000010, %10010101, %11100000, %00100001, %00100100 
.byte %00000000, %00000011, %10010110, %00100000, %10100001, %00000100 
.byte %00000000, %00000010, %01100100, %00100000, %10111101, %00100011 
.byte %00000000, %00000000, %00000000, %00000000, %00000000, %00000000 

; Set pointers hardware uses to find start of program
org $FFFC
.word Start
.word Start
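
Working out the playfield bytes by hand is error prone because of the mixed bit orders, so here is a Python sketch that converts 20 blocks, written left to right as a string of 0s and 1s, into the three register values. The function name is hypothetical; the bit ordering is the one described for PF0, PF1 and PF2 above.

```python
def encode_half_row(pixels):
    """Convert 20 left-to-right blocks ('0' or '1') into (PF0, PF1, PF2) values."""
    if len(pixels) != 20 or set(pixels) - {"0", "1"}:
        raise ValueError("expected a string of exactly 20 '0'/'1' characters")
    pf0 = pf1 = pf2 = 0
    for i, block in enumerate(pixels):
        if block != "1":
            continue
        if i < 4:
            pf0 |= 1 << (4 + i)         # PF0: bits 4..7, drawn in that order
        elif i < 12:
            pf1 |= 1 << (7 - (i - 4))   # PF1: bits 7..0
        else:
            pf2 |= 1 << (i - 12)        # PF2: bits 0..7
    return pf0, pf1, pf2


if __name__ == "__main__":
    pf0, pf1, pf2 = encode_half_row("00100101111010000100")
    print(format(pf0, "08b"), format(pf1, "08b"), format(pf2, "08b"))
```

For the sample pattern this produces %01000000, %01011110 and %00100001, and each half of the display gets its own set of three bytes.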

And we are finished. Clearly the NES is a nicer system but it is a lot newer. Considering what is involved in creating a 2600 game, you have to be impressed with many of the games that were created for the platform.