Tuesday 25 February 2014

Bootstrap ROM now works

After much VHDL and SD card protocol wrangling, and the odd spot of FAT file system implementation, I finally have the C65GS able to load the main ROM from SD card when it boots, and then automatically switch to C64 mode.  As can be seen in the video, loading the 128KB ROM file is very fast, somewhere in the hundreds of kilobytes per second on the boring 1x 2GB microSD card I am using.

[Video: the C65GS loading its ROM from the SD card at boot]


Next steps will be to refine this a little, and then get it booting the C65 ROM.  Pretty much all of the 4502 CPU is there and working (the bootstrap ROM uses a lot of 4502 instructions, including the MAP instruction that will be vital for C65 ROMs), so I am hoping that it won't be too hard.

Sunday 23 February 2014

Progress on kickstart after a week of pain

The past week has been a frustrating cycle of making some progress on the kickstart ROM and related SD card controller work, followed by a plethora of bizarre bugs popping up, and in some cases FPGA synthesis totally failing.

Some of the causes are known, such as a long-standing bug relating to code executed from the fastio bus (as compared to chipram or slowram busses).  However, others are more elusive, and it has proven better to backtrack and reattempt scaling the mountain, rather than figuring out how to escape the quagmire.  In the process, hopefully the design of some of the components has improved through repeated implementation.

Anyway, after such a frustrating week, it is nice that I have finally got an FPGA build with a prototype kickstart ROM in it that can correctly initialise the SD card, and then understand enough of the FAT file system to find the C65GS.ROM file:


The FAT code is now fairly well structured, so that the search for the ROM file uses structured routines, not unlike the real-mode calls in the early versions of MS-DOS.  To give an idea of the API, here is the loop of my directory search:

  ; Now load root directory cluster.
  ; (fs_clusternumber was set above)
  jsr fs_cd_rootdir
  jsr fs_opendir
  bcc sdcarderror

  ; iterate through directory entries looking for ordinary file
  ; C65GS.ROM
nextdirectoryentry:
  jsr fs_readdir
  bcc sdcarderror

  ldx #$00
l17b:
  lda fs_direntry,x
  jsr toupper
  cmp txt_c65gsrom,x
  bne nextdirectoryentry
  inx
  cpx #$0b
  bne l17b
  ; this is the entry
  jmp foundromfile
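
As a rough illustration of what the loop above is doing, here is a minimal Python sketch of the same comparison.  It assumes that txt_c65gsrom holds the name in the on-disk 8.3 form (eight name bytes padded with spaces, then three extension bytes, 11 bytes in total, matching the cpx #$0b in the loop).  The function names are made up for this example and are not part of the kickstart source.

```python
def matches_83_name(direntry: bytes, wanted: bytes) -> bool:
    """Compare the 11-byte name field of a FAT directory entry with `wanted`.

    Only the directory entry is upper-cased, mirroring the toupper call
    in the assembly loop; `wanted` is assumed to be upper case already.
    """
    if len(wanted) != 11:
        raise ValueError("8.3 names are stored as exactly 11 bytes")
    return direntry[:11].upper() == wanted


def find_entry(entries, wanted=b"C65GS   ROM"):
    """Return the index of the first matching 32-byte entry, or None."""
    for index, entry in enumerate(entries):
        if matches_83_name(entry, wanted):
            return index
    return None
```

One difference from the assembly: the sketch compares all 11 bytes at once, whereas the loop bails out to the next directory entry at the first mismatching byte.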


The full source is in the kickstart branch at http://github.com/gardners/c65gs.  Also note that there is now a Google Group for this project at https://groups.google.com/forum/#!forum/c65gs-development.

But now it is time for sleep and getting ready for the week ahead.

Sunday 16 February 2014

Creating the C65GS KickStart ROM

The C65 ROM is 128KB, which is more than I can fit in the internal RAM of the FPGA, so I need to load it externally and store it in the 16MB SDRAM.  This means we need a little ROM to load the ROM from the microSD card.  For Amiga users this will immediately remind them of the kickstart ROM.  Linux also has a similarly named facility for automatic installation, but that isn't really relevant for us.

So I have begun implementing a kickstart ROM for the C65GS.

The idea is that on boot, the C65GS will map a special 8KB ROM @ $E000 - $FFFF, that will stay mapped until $01 gets written to, after which the normal memory mapping rules will apply.

This little ROM needs to look for an SD card, and then look for a FAT32 file system on that card, and then on that file system look for C65GS.ROM, and load that into the right place.

Of course we don't want to waste time on this each time we reset if the ROM is already loaded, so this ROM should also do a checksum on the section of the SDRAM where we intend to place the ROM.  If the checksum is valid, then we can just jump straight into the ROM.
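
To make the idea concrete, here is a small Python sketch of the check.  The post does not say which checksum the kickstart ROM uses, so the simple additive sum below is purely an assumption for illustration, as are the function names.

```python
def additive_checksum(data: bytes) -> int:
    """Sum all bytes, truncated to 32 bits (an assumed, illustrative scheme)."""
    return sum(data) & 0xFFFFFFFF


def rom_already_loaded(sdram_region: bytes, expected: int) -> bool:
    """True if the region where the ROM should live has the expected checksum."""
    return additive_checksum(sdram_region) == expected
```

If the check passes, the loader can skip the SD card entirely; if it fails, it falls through to the slow path of finding and loading the ROM file.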

Today I finally got the SD card controller working well enough (although it still has some bugs) that I could begin making sensible progress on the kickstart ROM.

Specifically, I have working:

1. The code that checksums the SDRAM and, if it is OK, switches to C64 mode and starts the C64 kernel.
2. The code that looks for an SD card.
3. The code that reads the master boot record from SD card, finds the first partition, and then checks if that partition is FAT32, and complains if it isn't.
4. The code that can map a FAT32 cluster to an absolute sector, which can be used to load the first cluster of the root directory of the file system.
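
Steps 3 and 4 can be sketched in Python as follows.  This is not the kickstart code, just an illustration using the standard MBR and FAT32 layouts: the first partition entry sits at byte 446 of the MBR, FAT32 partitions have type $0B or $0C, and data clusters are numbered from 2.  The function and parameter names are invented for the example.

```python
import struct

SECTOR_SIZE = 512
FAT32_TYPES = (0x0B, 0x0C)  # FAT32 with CHS and with LBA addressing


def first_partition(mbr: bytes) -> int:
    """Return the starting sector (LBA) of the first partition if it is FAT32."""
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid master boot record")
    entry = mbr[446:462]                      # first 16-byte partition entry
    partition_type = entry[4]
    start_lba = struct.unpack_from("<I", entry, 8)[0]
    if partition_type not in FAT32_TYPES:
        raise ValueError("first partition is not FAT32")
    return start_lba


def cluster_to_sector(cluster, partition_start, reserved_sectors,
                      num_fats, sectors_per_fat, sectors_per_cluster):
    """Map a FAT32 cluster number to an absolute sector on the card."""
    if cluster < 2:
        raise ValueError("data clusters are numbered from 2")
    first_data_sector = (partition_start + reserved_sectors
                         + num_fats * sectors_per_fat)
    return first_data_sector + (cluster - 2) * sectors_per_cluster
```

The values for reserved_sectors, num_fats, sectors_per_fat and sectors_per_cluster come from the FAT32 boot sector at the start of the partition.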

Here is how it looks right now.

This is without any SD card inserted.



If I then insert an SD card (without needing to reset the machine, because kickstart keeps probing the SD card), then we see something like the following:


I was pretty happy that it worked when I hot-inserted the SD card.  The various hex output will probably be trimmed in due course, but for now provides me with some helpful diagnostics as I make sure I am interpreting the filesystem and partition table correctly.

The block of stuff nearer the bottom is the first 256 bytes of the root directory.  The volume name of the file system can be seen at the beginning (FPGA_BOARD rendered as FPGAMBOARD), and a few file names can be seen in there, such as VGA.BIT (VGA@@@@@BIT), which is an older version of the FPGA program itself.

Next step is to actually copy the ROM onto the SD card, and then get the cluster number of the ROM, and start loading the clusters into the SDRAM.

Saturday 8 February 2014

Testing the 4510 CPU

As mentioned previously, I am now working on extending the C64 Emulator Test Suite to create a C65 emulator test suite.  In the first instance, this is all about testing coverage of the extra 4510 operations.

The existing tests in the C64 Emulator Test Suite are good, but the assembly code for them is poorly documented and, having been written using Turbo Assembler on a real C64, each test is about 95% shared code.

I have tried to improve this situation by factoring out a lot of the common parts of the tests, and using the .include directive of Ophis, so that a complete test is <100 lines, and each of the includes and parts of the test itself are fairly well documented.

For example, the test for PLZ is:

  .include "test_top.a65"

         .byte 145,"PLZN"

  .include "test_prepare.a65"

         ; perform one test
next:     ; expect data byte to be in results
         lda db
         sta dr

         ; expect Z and data byte should be identical
         lda db
         sta zr

         ; expect A to be the same in the results
         lda ab
         sta ar

         ; expect x to be the same in the results
         lda xb
         sta xr

         ; expect Y to be the same in the results
         lda yb
         sta yr

         ; expect processor flags to have B flag set and E flag set
         lda pb
         ora #%00110000
         and #%01111101
         tax
         lda dr
         cmp #0
         bne nozero
         txa
         ora #%00000010
         tax
         lda dr
nozero:  asl
         bcc noneg
         txa
         ora #%10000000
         tax
noneg:   stx pr

         ; expect SP to be one more than it started
         ; but in practice, the value will be the same, because we will be pulling
         ; a byte off that we have pushed
         ldx sb
         txs
         stx sr

         ; push data byte onto stack to be pulled off by the instruction
         lda db
         pha

  .include "test_setup.a65"

         ; test instruction
cmd:     plz

  .include "test_record.a65"

         ; for stack pull instructions, data value is the appropriate register
         lda za
         sta da

  .include "test_check.a65"

         ; name of next test
name:    .byte "PHWIL"

  .include "test_common.a65"

I am sure there will be further changes before it is all done and dusted, but I can now fairly rapidly write tests for the simpler instructions.
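
For readers who don't want to trace the flag arithmetic in the listing, here is the same expected-flags calculation as a Python sketch.  The bit positions are taken from the masks in the test itself (B = bit 4, E = bit 5, Z = bit 1, N = bit 7); the function name is invented for this example.

```python
FLAG_N = 0x80
FLAG_E = 0x20
FLAG_B = 0x10
FLAG_Z = 0x02


def expected_flags_after_pull(flags_before: int, value: int) -> int:
    """Expected processor flags after pulling `value` into a register."""
    # B and E are forced on; N and Z are cleared and then recomputed.
    flags = (flags_before | FLAG_B | FLAG_E) & 0x7D
    if value == 0:
        flags |= FLAG_Z
    if value & 0x80:
        flags |= FLAG_N
    return flags
```

For instance, pulling $00 with all flags initially clear gives $32 (B, E and Z set).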

Those interested can check out http://github.com/gardners/4510tests

If anyone with a real C65 is willing to run the tests on their machine in C64 mode, that would be a great help to verify that the behaviour I am implementing is correct.  I can provide a D81 file of the current set of tests if anyone is able to help out.

An assembler for 4502/4510 CPUs

As I have progressed, the need has arisen for an assembler that can assemble 4510 instructions.

I have extended the Ophis 6502/6510/65c02 assembler to include support for the 4510.  This is a nice assembler written in Python that is quite powerful and flexible.  The source code is at http://github.com/gardners/Ophis.  I have issued a pull request to the upstream Ophis distribution so that it will hopefully get included in the main Ophis distribution in time.

Now that I have a 4510 assembler, I can start extending the Commodore 64 Emulator Test Suite to include tests for all the 4510 opcodes, and prune out the ones for undocumented 6510 opcodes.

Once that is in place, and the tests pass on the FPGA, I will know that I have a working 4510, which will remove a significant unknown when testing booting with C65 ROMs.

Sunday 2 February 2014

Disassembling the C65 C64-mode Kernel

Now that I have the machine starting with the stock C64 kernel, I want to get it booting with the C65 ROM, which means the modified C65 C64-mode kernel.

This isn't just for C65 compatibility, but it also provides a nice way to integrate microSD storage, because I can override the internal DOS so that it uses the microSD connector instead of the floppy controller.

To do this, I need to implement some of the extra 4510 opcodes, as well as some of the other C65 memory mapper and other control features.

What I didn't know was which extra instructions and features I would need to implement to make this work.  I went looking for a disassembly of the C65 ROMs, but couldn't find any, so I set about making my own, using Marko Makela's well-known C64 kernel disassembly as the reference and noting all the changes.

I could have done it automatically (although the extra opcodes would have complicated this), but I also wanted to gain a clear understanding of how the C65 modified kernel works, and how it intercepts DOS.

The whole process took a few hours to exhaustively map the differences.

The changes basically consisted of throwing out the cassette routines and putting in sufficient intercepts.  Other smaller changes include making 8 the default device number:

; set parameters for load/verify/save

E1D4   A9 00      LDA #$00
E1D6   20 BD FF   JSR $FFBD
; DIFF: C65: Make device 8 the default.
; C64: E1D9 A2 01 LDX #$01
E1D9   A2 08      LDX #$08
E1DB   A0 00      LDY #$00
E1DD   20 BA FF   JSR $FFBA

...

; get open/close parameters

E219   A9 00      LDA #$00
E21B   20 BD FF   JSR $FFBD
E21E   20 11 E2   JSR $E211
E221   20 9E B7   JSR $B79E
E224   86 49      STX $49
E226   8A         TXA
; DIFF: C65: Make device 8 the default
; C64 E227   A2 01      LDX #$01
E227   A2 08      LDX #$08
E229   A0 00      LDY #$00
E22B   20 BA FF   JSR $FFBA

and changing the shift-RUN/STOP text so that it loads the first file from disk and runs it:

; Fill keyboard buffer with LOAD command when shift-RUNSTOP is pressed
E5EE   A2 0A      LDX #$0A      ; C64: LDX #$09
E5F0   78         SEI
E5F1   86 C6      STX $C6
; C64: E5F3   BD E6 EC   LDA $ECE6,X
; C65: Copy string L"0:*R into keyboard buffer
E5F3   BD 56 FC   LDA $FC56,X   ; C65
E5F6   9D 76 02   STA $0276,X
E5F9   CA         DEX
E5FA   D0 F7      BNE $E5F3

...

; C65: key sequence when shift-RUN/STOP is pressed
;  L"0:*R
FC57  .BY $4C, $CF, $22, $30, $3A, $2A, $0D, $52, $D5, $0D

The DOS routines have been intercepted using an interesting approach.  First, $C0 ceases to indicate the tape motor state, and instead indicates whether the current drive is on the serial IEC bus, or handled by the internal 1581 DOS.  This is checked using one of the new 4510 opcodes, BBS7, which branches on whether bit 7 is set in a zero-page byte, without having to use the accumulator:

; C65: call DOS routine and return: send secondary address (talk)
F7E4  FF C0 09    BBS7 $C0,$F7F0  ; branch based on whether current device is internal 1581
F7E7  20 2C F7    JSR $F72C  ; bank in C65 1581 DOS
F7EA  22 0A 80    JSR ($800A)
F7ED  20 3E F8    JSR $F83E  ; C65: return from DOS context and set $90 status
F7F0  4C C7 ED    JMP $EDC7 ; send secondary address (talk) on serial bus
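
As a sanity check on the listing above, here is a tiny Python model of BBS7.  It assumes the instruction is three bytes long (opcode, zero-page address, signed 8-bit offset) with the branch taken relative to the following instruction, which is consistent with the bytes FF C0 09 at $F7E4 reaching $F7F0.  The function name is made up for the example.

```python
def bbs7(zero_page, zp_addr: int, pc: int, offset: int) -> int:
    """Return the next program counter after a BBS7 located at `pc`."""
    next_pc = (pc + 3) & 0xFFFF               # address of the next instruction
    if zero_page[zp_addr] & 0x80:             # bit 7 set: take the branch
        signed = offset - 0x100 if offset & 0x80 else offset
        return (next_pc + signed) & 0xFFFF
    return next_pc
```

With bit 7 of $C0 set, execution continues at $F7F0 (the JMP to the serial bus routine); with it clear, it falls through to $F7E7 and the internal DOS call.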

As can be seen, if the drive is on the IEC bus, then the normal C64 kernel routine is used.  However, if the drive is the internal one, then the C65 1581 DOS is banked in to $8000 - $BFFF (conveniently leaving the C64 kernel still in view), and then the appropriate vector in that ROM is called.  Notice the use of the new indirect mode of the JSR instruction (opcode $22).

Banking involves the use of the MAP instruction, which sets the memory map.  C65 memory mapping is too complex to cover in this post, so I will cover it in a separate post later.  The interesting thing for now is to see how NOP is no longer really NOP.  The MAP instruction prevents both IRQ and NMI interrupts until a NOP instruction is run.  NOP is consequently also known as End Of Mapping (EOM) on the 4510.

All this and more can be seen in the routine for switching to the C65 1581 DOS memory context.

; C65:  switch to C65 1581 DOS context

F72C   78         SEI
F72D   48         PHA
F72E   A9 A5      LDA #$A5      ; C65: VIC-III enable sequence
F730   8D 2F D0   STA $D02F
F733   A9 96      LDA #$96
F735   8D 2F D0   STA $D02F     ; C65: VIC-III enabled
F738   A9 40      LDA #$40
F73A   0C 31 D0   TSB $D031     ; set bit 6 in $D031 to put CPU at 3.5MHz
F73D   A9 21      LDA #$21
F73F   0C 30 D0   TSB $D030     ; bank in $C000 interface ROM and remove CIAs from IO map
F742   68         PLA           ; store registers
F743   8D F6 DF   STA $DFF6
F746   8E F7 DF   STX $DFF7
F749   8C F8 DF   STY $DFF8
F74C   9C F9 DF   STZ $DFF9
F74F   68         PLA
F750   8D FB DF   STA $DFFB
F753   68         PLA
F754   8D FC DF   STA $DFFC
F757   BA         TSX
F758   8E FF DF   STX $DFFF
; C65: bank in 1581 DOS
F75B   A9 00      LDA #$00
F75D   A2 11      LDX #$11   ; Map $0000-$1FFF to $10000-$11FFF ($0000+$10000)
F75F   A0 80      LDY #$80
F761   A3 31      LDZ #$31   ; Map $8000-$BFFF to $20000-$23FFF ($8000+$18000)
F763   5C         MAP        ; activate new map
F764   A2 FF      LDX #$FF
F766   9A         TXS
F767   AD FC DF   LDA $DFFC
F76A   48         PHA
F76B   AD FB DF   LDA $DFFB
F76E   48         PHA
F76F   AD F6 DF   LDA $DFF6
F772   AE F7 DF   LDX $DFF7
F775   AC F8 DF   LDY $DFF8
F778   AB FA DF   LDZ $DFFA
F77B   60         RTS        ; RTS (notice how the return address was copied from old stack to new stack)
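
Until that separate post on memory mapping exists, here is a rough Python sketch of how the register values in the listing produce the mappings noted in its comments.  It reflects my reading of the MAP register layout (A and the low nibble of X give the offset for the lower 32KB, the high nibble of X enables each 8KB block; Y and Z do the same for the upper 32KB) and should be treated as an illustration rather than a full model of the mapper.

```python
def map_translate(addr: int, a: int, x: int, y: int, z: int) -> int:
    """Translate a 16-bit CPU address to a 20-bit address under a MAP setup."""
    block = addr >> 13                        # which 8KB block, 0 to 7
    if block < 4:                             # lower 32KB: A and X
        enables = x >> 4
        offset = ((x & 0x0F) << 16) | (a << 8)
    else:                                     # upper 32KB: Y and Z
        enables = z >> 4
        offset = ((z & 0x0F) << 16) | (y << 8)
    if enables & (1 << (block & 3)):          # is this 8KB block mapped?
        return (addr + offset) & 0xFFFFF
    return addr                               # unmapped blocks pass through
```

With A=$00, X=$11, Y=$80 and Z=$31 as in the routine, $0000 translates to $10000 and $8000 to $20000, while $E000 is left alone, which is why the C64 kernel stays in view.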

Notice that there is no NOP or EOM in this routine.  This prevents any interrupts occurring while the internal DOS is operating in its special memory map.  The EOM appears in the routine for returning from the C65 1581 DOS context:
; C65: return from DOS call and set status in $90
F83E  68          PLA
F83F  8D FD DF    STA $DFFD
F842  68          PLA
F843  8D FE DF    STA $DFFE
F846  20 7C F7    JSR $F77C ; restore C64 memory map
F849  77 C0       RMB7 $C0  ; clear bit 7 in $C0
F84B  6B          TZA
F84C  10 0C       BPL $F85A
F84E  A9 00       LDA #$00
F850  F7 C0       SMB7 $C0  ; set bit 7 in $C0
F852  AE FE DF    LDX $DFFE
F855  DA          PHX
F856  AE FD DF    LDX $DFFD
F859  DA          PHX
F85A  04 90       TSB $90   ; Set bits in $90 (status) if required
F85C  AE F7 DF    LDX $DFF7
F85F  AC F8 DF    LDY $DFF8
F862  AB F9 DF    LDZ $DFF9
F865  AD F6 DF    LDA $DFF6
F868  48          PHA
F869  A9 21       LDA #$21
F86B  1C 30 D0    TRB $D030 ; bank out $C000 ROM and bank CIAs back in.
F86E  A9 40       LDA #$40
F870  1C 31 D0    TRB $D031 ; return CPU to 1MHz.
F873  8D 2F D0    STA $D02F ; return to VIC-II mode
F876  68          PLA
F877  EA          EOM       ; release IRQ & NMI after MAP change triggered at $F846
F878  58          CLI
F879  18          CLC
F87A  60          RTS

The $DFFx memory accesses are not to the CIAs, but to the end of colour RAM.  Setting bit 0 in $D030 replaces $DC00-$DFFF with an extra 1KB of colour RAM.  The full 2KB of colour RAM is in fact the last 2KB of the 128KB of main RAM of a C65, and hence is 8-bit RAM, unlike the 4-bit colour RAM on the C64.

The $D030 flag is primarily for making the 2KB colour RAM conveniently available to the kernel when working with an 80-column, and hence 2,000 byte screen.  Of course this leaves a few bytes spare at the end that are nicely used here to save and restore registers when the stack cannot be used because memory is being remapped.
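
The arithmetic behind those spare bytes is easy to check.  The sketch below assumes the 2KB of colour RAM occupies $D800-$DFFF (the usual C64 colour RAM at $D800 plus the extra 1KB at $DC00-$DFFF); the names are invented for the example.

```python
COLOUR_RAM_BASE = 0xD800      # assumed start of the 2KB colour RAM window
COLOUR_RAM_SIZE = 2048
SCREEN_CELLS = 80 * 25        # an 80-column, 25-row screen


def spare_colour_ram():
    """Return (first spare address, last address, number of spare bytes)."""
    first_spare = COLOUR_RAM_BASE + SCREEN_CELLS
    last = COLOUR_RAM_BASE + COLOUR_RAM_SIZE - 1
    return first_spare, last, last - first_spare + 1
```

That leaves 48 bytes from $DFD0 to $DFFF, so the $DFF6-$DFFF save area used by the DOS context switch sits right at the end of the spare region.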

The last interesting piece to explore is the reset process.  The reset vector has been changed:

FFFA   .WD $FE43   ; NMI vector
; C64: FFFC   .WD $FCE2   ; RESET vector
; C65 new reset vector
FFFC   .WD $E4B8   ; RESET vector
FFFE   .WD $FF48   ; IRQ/BRK vector

Reset proceeds from $E4B8, instead of $FCE2.   The $E4B8 routine is quite simple, if a little curious:

; C65: CPU reset entry point.
; Check for cartridge, else normal reset sequence.
; (this is a little strange, since $FCE2 routine also calls $FD02)
E4B8   20 02 FD   JSR $FD02  ; check for cartridge
E4BB   D0 03      BNE $E4C0
E4BD   4C E2 FC   JMP $FCE2  ; RESET routine

; C65: Enable VIC-III mode, jump to interface
E4C0   78         SEI
E4C1   A9 A5      LDA #$A5
E4C3   8D 2F D0   STA $D02F
E4C6   A9 96      LDA #$96
E4C8   8D 2F D0   STA $D02F
E4CB   A9 20      LDA #$20
E4CD   8D 30 D0   STA $D030 ; bank interface ROM in @ $C000
E4D0   4C 00 C8   JMP $C800 ; interface ROM entry point

E4D3   85 A9      STA $A9

E4D5   A9 01      LDA #$01
E4D7   85 AB      STA $AB
E4D9   60         RTS

If a C64 cartridge is detected, then the usual C64 reset process is followed.  If not, then the machine switches to C65 mode by banking in the interface ROM at $C000-$CFFF and jumping to the entry point there.  Also, at another point the C65 1581 ROM is mapped into $8000 - $BFFF and the DOS setup routine is called.  This means that I need to disassemble and examine those two ROMs as well if I am to fully understand what is going on.

But for now, the complete C65 C64-mode kernel disassembly is available here.

Saturday 1 February 2014

Raster IRQs now work

Today I implemented a small but important feature that has been in the pipeline for a while: raster interrupts triggered by the VIC-IV.

I already had much of the machinery in place for raster IRQs, I just hadn't finished tying it all together.

So a few lines of VHDL later, I set the FPGA building.  Unfortunately, just adding a few ties for the IRQ pushed the timing out by about 2ns, from the 5.2ns required for the 192MHz pixel clock to around 7ns.  As a result it didn't work.

I scratched my head for a while wondering how about 7 logic gates could ruin the timing so badly, and eventually realised that I needed to pipeline the IRQ line by adding a drive stage, so that the IRQ had time to propagate across the FPGA.  Without it, ISE was rearranging everything else (badly) to make the IRQ line get to the CPU in one cycle.  Net result, the IRQ triggers one pixel clock cycle late on the CPU, which isn't really an issue, since the CPU runs at half that clock speed, so it should still trigger the CPU interrupt on the correct cycle.

So then I set about writing a little raster interrupt routine to test it.  This was an important step, not only to make sure that the raster interrupt line worked, but also that clearing VIC-IV interrupts worked in a C64 compatible way, unlike the C65 where the usual ASL $D019 or INC $D019 doesn't clear VIC interrupts.  This is a big source of incompatibility on the C65.

Interestingly, this incompatibility on the C65 is not the VIC-III's fault, but rather the 4510 CPU's.  This is because the 4510 uses the CMOS 65CE02 core that changed the behaviour of ASL, INC and other read-modify-write instructions.  On the 6510, the instructions read the original value, write the original value and then write the modified value.  This is why ASL $D019 or INC $D019 works to clear interrupts on the C64, because writing the value read from $D019 will clear all triggered interrupts.

But on the 4510, the original value is read, and the modified value is written, saving a cycle, and in the process really breaking compatibility.  The SuperCPU has a similar problem because it uses the 65816 that includes the same "optimisation".  Aware of this problem, I resolved that this problem would not exist in the C65GS, and today it was time to test it.
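
The difference is easy to see in a small Python model.  This is a deliberately simplified sketch: the register below only models the four interrupt latch bits of $D019 and the rule that writing a 1 to a bit clears it, and the class and function names are invented for the example.

```python
class IrqFlagRegister:
    """Simplified $D019: writing a 1 to a bit clears that pending interrupt."""

    def __init__(self, pending: int):
        self.pending = pending & 0x0F

    def read(self) -> int:
        return self.pending

    def write(self, value: int) -> None:
        self.pending &= ~value & 0x0F


def inc_6510(reg: IrqFlagRegister) -> None:
    """6510-style INC: read, write back the original, then write the result."""
    original = reg.read()
    reg.write(original)                       # this write clears the flags
    reg.write((original + 1) & 0xFF)


def inc_65ce02(reg: IrqFlagRegister) -> None:
    """65CE02-style INC: read, then write only the modified value."""
    original = reg.read()
    reg.write((original + 1) & 0xFF)
```

With only the raster interrupt pending (bit 0), the 6510-style INC leaves nothing pending, while the 65CE02-style INC writes $02 and so never clears bit 0.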

The interrupt routine I wrote is:
 sei
 ; CIA IRQ disable
 lda #$7f
 sta $DC0D
 ; clear bit 8 of raster compare
 and $D011
 sta $D011
 ; set raster for split
 lda #$80
 sta $D012
 ; enable raster IRQ
 lda #$01
 sta $D01A
 ; set IRQ vector (low byte in $0314, high byte in $0315)
 lda #<irq
 sta $0314
 lda #>irq
 sta $0315
 ; re-enable interrupts before returning
 cli
 rts

irq: ; border yellow
 lda #$07
 sta $D020
 ; wait for a bit
 ldx #$ff
l1: dex
 bne l1
 ; border back to light blue
 lda #$0e
 sta $D020
 ; acknowledge IRQ
 inc $D019
 ; return from interrupt, via keyboard scan etc.
 jmp $EA31

As can be seen, I am using my usual INC $D019 to clear the raster IRQ.

Now to see if it worked.  Bingo! A nice little raster bar.  There are a few cycles of jitter, as you would expect, but of course with a 96MHz CPU the jitter is only about one character wide.  Since each character is 20 cycles wide, that means a jitter of less than 10 cycles.  That's more than on a real C64 because I still have some wait states in the C65GS CPU memory access that I have yet to work around, and so some instructions can take a dozen or so cycles.  In particular, things like INC, which include six memory accesses, can take 12 or 13 cycles.


For comparison, here is the same routine running on VICE.


The keen observer will notice that not only is the raster bar much narrower on the C65GS, but it is also not in quite the same position.  This is because the 1920x1200 frame has 1248 physical rasters (including flyback), which is 4x PAL's 312 lines.  However, to keep the vertical borders small, in C64 mode the C65GS makes each logical raster equal to five physical rasters.

I'll have to do something about this so that raster splits occur on the correct logical line.  This will most likely consist of having logical rasters spaced 3-lines apart before the display, 5-lines apart during the display, and 3-lines after.  It all gets a bit fun, because 1920x1200@60Hz has only one invisible raster at the beginning of frame, while PAL has more, so the logical raster counter will have to start during flyback in the previous frame.  Entirely possible, just a bit fiddly.
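
To put rough numbers on the mismatch, here is a small Python sketch comparing where raster $80 would land with an even 4:1 spacing against the current five-lines-per-raster spacing.  It assumes, purely for illustration, that logical raster 0 coincides with physical line 0, which ignores where the display actually starts; the names are made up for the example.

```python
PAL_RASTERS = 312
PHYSICAL_RASTERS = 1248       # 1920x1200 frame including flyback


def physical_line_even(logical: int) -> int:
    """Physical line if logical rasters were spread evenly over the frame."""
    return logical * (PHYSICAL_RASTERS // PAL_RASTERS)


def physical_line_c64_mode(logical: int, lines_per_raster: int = 5) -> int:
    """Physical line with the current spacing of five lines per raster."""
    return logical * lines_per_raster
```

Under these assumptions the split at raster $80 would fall on physical line 640 instead of 512, a quarter of the way further down the frame than an even spacing would put it.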