Friday 31 January 2014

256-colour modes now mostly working

The past few days I have been fixing some bugs in the VIC-IV character generator that were causing various glitches.  Today I got the last of the important ones fixed, and so now when the C65GS boots, it looks just like a real C64:


I also had time to finish implementing the palette.  The palette is mostly like on the Commodore 65, with $D1xx holding the red component, $D2xx the green and $D3xx the blue components.  These are 8-bit registers, in principle allowing for 24-bit colour, but for now the simple VGA output I am using is limited to 12-bit colour.  I have plans to make an HDMI output that would carry 24-bit colour digitally.  If you run LDA $D012; STA $D020; JMP *-6, you get nice beautiful solid raster bars with the default palette (which is 128 colours repeated twice).
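As a rough illustration of the palette layout described above, here is a minimal C sketch that models the three register pages as arrays and reduces an 8-bit-per-channel entry to 12-bit colour.  Dropping the low four bits of each component is an assumption, since the post does not say how the VGA output discards the extra precision, and the names palette_t and to_rgb444 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one 256-entry palette bank:
   red at $D100-$D1FF, green at $D200-$D2FF, blue at $D300-$D3FF. */
typedef struct {
    uint8_t red[256];
    uint8_t green[256];
    uint8_t blue[256];
} palette_t;

/* Reduce one palette entry to 12-bit colour (4 bits per channel).
   Assumption: the low four bits of each component are simply dropped. */
uint16_t to_rgb444(const palette_t *p, uint8_t index)
{
    uint16_t r = (uint16_t)(p->red[index] >> 4);
    uint16_t g = (uint16_t)(p->green[index] >> 4);
    uint16_t b = (uint16_t)(p->blue[index] >> 4);
    return (uint16_t)((r << 8) | (g << 4) | b);
}
```

Under that assumption, an entry of red $FF, green $80, blue $0F would come out as $F80 on the 12-bit output.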


$D012 is the VIC-II compatibility raster counter, so it is counting every 5th raster.  If we use $D052 instead, we get the physical raster counter, which means the raster bars get much, much finer:


You really need to see a zoomed detail to see just how fine they are:


With these various other bugs and problems fixed, I was also able to test fullcolour character mode.  This tells the VIC-IV to use 64 bytes for each character instead of 8, and each byte is the colour value for a single pixel of the character.  Just flip bit 1 in $D054 on and it is engaged for all characters.   The display then changed to the following, showing the 1970s carpet stripe pattern from the default contents of RAM, as well as some horizontal gradient I POKEd in, just because I could.
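To make the 64-bytes-per-character layout concrete, the following C sketch computes where a given pixel of a given character would live in character data.  It assumes the 64 bytes are stored row by row, eight bytes per pixel row, which the post does not state explicitly; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { FULLCOLOUR_BYTES_PER_CHAR = 64 };

/* Byte offset, from the start of character data, of pixel (x, y)
   within character char_number in fullcolour mode.
   Assumption: row-major storage, eight bytes per pixel row. */
uint32_t fullcolour_pixel_offset(uint16_t char_number, unsigned x, unsigned y)
{
    assert(x < 8 && y < 8);
    return (uint32_t)char_number * FULLCOLOUR_BYTES_PER_CHAR + y * 8u + x;
}
```

Each byte at such an offset would then hold the palette index for that single pixel.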


Here is the same thing with the resolution increased from 320x200 to 800x500 using the hardware scale registers ($D042, $D043). It is really 960x600 resolution if you use the border area, which is easy to do.

What is not apparent in the above is that I have also implemented four palette banks, so there is really a 1,024 colour palette available.  I will most likely make it possible to use a different palette bank for background and sprites at the same time.

Finally, here is the hardware and the monitor together, before the FPGA board gets put inside a C64 case and uses a real C64 keyboard for interaction.  There will still be plenty more to do before the FPGA design is complete, including of course sound and sprites.


Motivations and goals for the project

The previous posts have mostly consisted of screen shots of progress on the C65GS, but until now I have not gotten around to actually describing what I intend to achieve, and what brought me to this point.

I owned a Commodore 65 prototype from 1994 until 2010, when I sold it to a collector because, among various reasons, I didn't feel that I had the resources to care for what was rapidly becoming a valuable museum piece.  During the time that I owned the C65, I did make regular use of it, wrote a few simple demos and utilities, and modified some existing C64 software to take advantage of the faster CPU.

I have also owned a C128D through that time, which I also enjoyed using.  However, I always found the C128 architecture to be rather strange and unappealing.  It really does, to me at least, feel like a hacked-on C64, rather than the new and enhanced machine that the C65 feels like.

During the 1990s and 2000s I had also repeatedly thought about making a C64 accelerator using a trick I devised and tested that avoids the synchronisation with VIC-II RAM problem faced by accelerators like the SuperCPU, which either limited compatibility or the speed of acceleration possible.

During my PhD studies I learned to program in VHDL, and started thinking about implementing an accelerator in an FPGA.  However, FPGAs at the time were too slow to provide the degree of acceleration that I considered necessary to make the project worth pursuing.

My goals were to make the most powerful 8-bit computer to date by various measures:

  • Better graphics than the Apple IIgs, Atari 800 or Plus/4: 1920x1200 @ 60Hz, 256 colour palette from 4,096 colours (later from 24-bit colour palette once I create an HDMI output) via my VIC-IV video controller.
  • Better sprites than the C64.  Plan is for the 8 compatibility sprites, plus perhaps 32 256-colour Enhanced Sprites with hardware scaling and practically unlimited size.  Maximum number of displayable sprites will depend on the resolution of the display and the sprites on a given raster line.
  • Faster CPU than the SuperCPU or any available 65C816 CPU (20MHz), and ideally with enough headroom to beat a 20MHz 65C816 running in 16-bit mode.  Currently the 65GS10 runs at 96MHz, but with an effective speed more like 48MHz until I work on some planned IPC improvements, like a 16-bit cache of zero-page to make zero-page indirect instructions take as little as 3 cycles.
  • More RAM than a fully expanded Apple IIgs or C65 (~8.125MB).  It will initially have 128KB of chipram like the C65, plus 16MB of slowram, plus "some" ROM.
  • Comparable or better sound capability than the Apple IIgs.  Multiple SIDs plus digital audio channels.  Design to be finalised.

I also wanted to make the machine more backward compatible than the C65 or any 65C816 based machine.  The main issue here is actually quite easy to fix, consisting of restoring the 6502 read-modify-write behaviour of instructions like INC and ASL.  I would also like to make the machine sufficiently C65 compatible to be able to run a stock C65 ROM.
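The compatibility issue with INC and ASL can be sketched in C.  On the original NMOS 6502, a read-modify-write instruction writes the unmodified value back before writing the modified one, and C64 software relies on that when it uses INC or ASL on a register where writing a 1 bit clears a flag, such as $D019.  The toy register model and function names below are hypothetical, and the second function stands in for any CPU that performs only the final write.

```c
#include <stdint.h>

/* Toy model of a write-1-to-clear latch, in the spirit of $D019. */
typedef struct {
    uint8_t flags;    /* pending flags */
    unsigned writes;  /* number of write cycles seen */
} ack_register_t;

static void reg_write(ack_register_t *r, uint8_t value)
{
    r->flags &= (uint8_t)~value;  /* each 1 bit written clears that flag */
    r->writes++;
}

/* INC as the NMOS 6502 does it: read, write old value, write new value. */
void inc_nmos_6502(ack_register_t *r)
{
    uint8_t old = r->flags;
    reg_write(r, old);
    reg_write(r, (uint8_t)(old + 1));
}

/* INC on a hypothetical CPU that writes only the modified value. */
void inc_single_write(ack_register_t *r)
{
    uint8_t old = r->flags;
    reg_write(r, (uint8_t)(old + 1));
}
```

With flag bit 0 pending, the first variant clears it through the write of the old value, while the second writes only $02 and leaves the flag set.  That difference is what breaks software that depends on the double write.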

However, perfect C65 compatibility is not high on my list, given the relative lack of software available for it anyway.  In particular, I have no real intention at this stage of implementing the bit-planar graphics modes, as they were never really a good idea for an 8-bit computer, requiring way too many cycles to edit even a single pixel.

Instead, all new graphics modes (and Enhanced Sprites) are planned to really be character mode, but allowing 16-bit character sets and making characters 8x8 fully addressable pixels, i.e., requiring 64 bytes per character.  This also saves lots of RAM and CPU cycles when most of the screen is blank or repetitive.  Enhanced Sprites will be mapped in the same way, allowing reuse of graphic characters to help save chipram.
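The RAM saving can be estimated with a little arithmetic.  The C sketch below compares a character-mapped screen against a flat one-byte-per-pixel bitmap of the same size.  It assumes two bytes of screen RAM per cell for the 16-bit character numbers and ignores colour RAM; both simplifications and the function names are assumptions, not details from the design.

```c
#include <stdint.h>

/* Bytes needed for a character-mapped screen: one 16-bit character
   number per cell plus 64 bytes for each distinct character used.
   Assumption: colour RAM and other overheads are ignored. */
uint32_t charmode_bytes(uint32_t cols, uint32_t rows, uint32_t unique_chars)
{
    return cols * rows * 2u + unique_chars * 64u;
}

/* Bytes needed if every 8x8 cell had its own 64 bytes of pixel data. */
uint32_t bitmap_bytes(uint32_t cols, uint32_t rows)
{
    return cols * rows * 64u;
}
```

For a 40x25 screen built from 100 distinct characters this gives 8,400 bytes, against 64,000 bytes for the equivalent flat bitmap.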

This graphics architecture helps to keep fun in programming the system by making it non-trivial to have a full 1920x1200 image, as there is only about 10% of the chipram required to support such an image, and the slowram is too slow to supply the data, even if using the DMAagic DMA controller I intend to implement.

In short, I hope to preserve most of the fun elements of an 8-bit computer, while providing some 21st century improvements that will make the machine fun to program and use, and who knows, maybe help foster new life in the demo scene.

From a hardware perspective, I am purposely implementing it using an off-the-shelf FPGA development board designed for university students, as the boards are relatively cheap for their performance and have many built-in peripherals, like ethernet, VGA output and USB keyboard input.  This also has the significant benefit that availability will not be based on small production runs by myself or anyone else.

The design is intended to be able to be installed in a real C64 case with keyboard using either a Keyrah v2, or a custom interface PCB that I have worked on.  The custom interface PCB will likely offer datasette and IEC serial ports, and later may also provide a userport and/or expansion port, depending on some unresolved factors.

Tuesday 28 January 2014

Debugging character display generation

Working on the character generator for the VIC-IV, I have a bug where the left edge of the character display gets two characters of junk, before beginning the real stuff.  Annoyingly, the bug only shows up on the FPGA, and not in simulation.

I don't have fancy gear to probe the internals of the FPGA here at home, nor the knowledge of how to use it, anyway. Also, as I use a Mac, it is a pain to get those tools to work in the first place.

To work around this I have added video generator debug registers that let you specify the exact pixel position on screen at which you want to sample several internal video generator state registers.  These values then get latched at that position of the frame, so that they can be read out from some other registers at one's leisure.

Of course, I want an automated means of setting the position, reading the results (one frame later of course), and then advancing the position along a raster, so that I can capture the entire sequence of events over the entire raster.

So I set about creating this.  When the debug registers are set, the VIC-IV draws a red cross-hair showing the pixel being interrogated, as can be seen below:



I then wrote a C program that talks to the board over my serial monitor interface, setting the registers, waiting a couple of frames (just to be sure), and then reading the results out from the debug registers, and rendering them in a useful way.

This displays information like the log at the end of this post.  It looks mostly like gobbledygook, and there is a bug that causes some wrong columns of output in this example, but the cycles_to_next_card and chargen_active and chargen_active_soon signals tell me that something is indeed going wrong at the left of each raster.  cycles_to_next_card should drop down to 1 eight cycles before chargen_active goes to 1.
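For anyone wanting to post-process a log like this, here is a minimal C sketch that parses one line and finds the first sample where chargen_active becomes non-zero.  The field names come straight from the log format; the struct and function names are hypothetical, and this is not the program described above.

```c
#include <stdio.h>

/* One line of the debug log, using the field names printed in the log. */
typedef struct {
    int display_y;
    int display_x;
    long x_chargen_start_minus16;
    int next_card_number;
    int cycles_to_next_card;
    int chargen_active;
    int chargen_active_soon;
} debug_sample_t;

/* Parse one log line. Returns 1 if all seven fields were read, else 0. */
int parse_debug_line(const char *line, debug_sample_t *s)
{
    int fields = sscanf(line,
        "display_y=%d, display_x=%d, x_chargen_start_minus16=%ld, "
        "next_card_number=%d, cycles_to_next_card=%d, "
        "chargen_active=%d, chargen_active_soon=%d",
        &s->display_y, &s->display_x, &s->x_chargen_start_minus16,
        &s->next_card_number, &s->cycles_to_next_card,
        &s->chargen_active, &s->chargen_active_soon);
    return fields == 7;
}

/* Index of the first sample with chargen_active non-zero, or -1 if none. */
int first_active_index(const debug_sample_t *samples, int count)
{
    for (int i = 0; i < count; i++) {
        if (samples[i].chargen_active != 0) {
            return i;
        }
    }
    return -1;
}
```

Run over the log at the end of this post, first_active_index would land on the display_x=170 line, and the cycles_to_next_card values in the samples before it could then be checked against the expected countdown.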

In the process I realised that I was missing one other rather important signal.  So back to spending an hour or so rebuilding the FPGA program to get more information.  Then hopefully I will have what I need to fix this bug that has been hanging around for a while and spoiling the otherwise very nice looking display.

display_y=100, display_x=145, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=0
display_y=100, display_x=146, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=0
display_y=100, display_x=147, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=0
display_y=100, display_x=148, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=0
display_y=100, display_x=149, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=0
display_y=100, display_x=150, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=0
display_y=100, display_x=151, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=0
display_y=100, display_x=152, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=0
display_y=100, display_x=153, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=0
display_y=100, display_x=154, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=0
display_y=100, display_x=155, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=156, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=254, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=157, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=253, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=158, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=252, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=159, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=251, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=160, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=250, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=161, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=249, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=162, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=248, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=163, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=255, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=164, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=254, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=165, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=253, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=166, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=252, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=167, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=251, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=168, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=250, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=169, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=1, chargen_active=0, chargen_active_soon=1
display_y=100, display_x=170, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=40, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=171, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=39, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=172, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=38, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=173, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=37, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=174, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=36, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=175, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=35, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=176, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=34, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=177, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=33, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=178, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=32, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=179, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=31, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=180, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=30, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=181, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=29, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=182, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=28, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=183, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=27, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=184, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=26, chargen_active=2, chargen_active_soon=0
display_y=100, display_x=185, x_chargen_start_minus16=198890353, next_card_number=154, cycles_to_next_card=25, chargen_active=2, chargen_active_soon=0

Monday 27 January 2014

CPU now passes complete suite of 6502 official opcode tests

The title says it all really.  The CPU now passes all official opcode stress tests from Marko Makela's suite.  It currently runs at 96MHz, but with a wait state on most memory accesses the effective speed is more like 48x the original.  A previous quick and dirty test with a FOR loop in BASIC suggests 42x, assuming that the CIA timers are running at the correct rate.

As a result, the C64 ROMs start up correctly as can be seen in the image below.  The display is shifted two characters to the right, due to a bug in the VIC-IV that I am trying to fix at the moment.


Of course the above image is rather boring, being just a 40x25 display like on a normal C64.  So the following two images show the same with the horizontal and vertical scalers set to 1x instead of 5x, yielding a 200x125 character display.  As mentioned in a previous post, the repetition of the C64 character rows is because the virtual row length is still set to 40 columns, so while it reads 200 bytes to display, on the next row, it goes back to the previous row + 40.  In other words, screen lines start at $0400, $0428 etc, but with overlapping spans.
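The overlapping rows follow directly from the address arithmetic.  A minimal C sketch, with hypothetical function names, of how the start of each screen row and the overlap between consecutive rows could be computed:

```c
#include <stdint.h>

/* Address of the first screen RAM byte for a given character row,
   advancing by the virtual row length rather than the displayed width. */
uint16_t row_start(uint16_t screen_base, uint16_t virtual_row_length, unsigned row)
{
    return (uint16_t)(screen_base + virtual_row_length * row);
}

/* Number of screen RAM bytes that a row shares with the row below it. */
unsigned row_overlap(unsigned displayed_columns, unsigned virtual_row_length)
{
    if (displayed_columns > virtual_row_length) {
        return displayed_columns - virtual_row_length;
    }
    return 0;
}
```

With a base of $0400 and a virtual row length of 40, rows start at $0400, $0428 and so on, and a 200-column row shares 160 of its bytes with the next row.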


A detail of the top left corner of the screen showing that all the issues with not displaying characters properly and repeating character data for multiple rows have been resolved.

Saturday 25 January 2014

BASIC now works

Again, a quick few screen shots to show how things are moving along.

I spent the day fixing ADC and SBC bugs, as well as a few other miscellaneous CPU bugs.  Also implemented $01 CPU port for memory banking.

Things are now working well enough that BASIC works fairly well, as the following screen shots show.

First, raster bars in BASIC, which show that the CPU is MUCH faster than a real C64, even before I do a pile of optimisation work that I know needs to happen, and will give something like 2x to 3x the current figures.


Okay, so the CPU is clearly much faster than on a C64, but just how fast?  Well, let's do a quick comparison using BASIC:


52/60 of a second to count to 25,000 in BASIC.  Quite nice.  Let's see the same on a C64 and work out a back-of-envelope acceleration factor as things currently stand:


Okay, so we are 42 times faster.  That might be the answer to life, the universe and everything, but not for this CPU design.  As mentioned above there are some waitstates that I know I can hide, and also some parallel instruction fetching when running code from chip RAM and other little tricks that will push this to be 2x to 3x the current figure.  Basically I am aiming for 100x C64 speed, and see no real hurdles to achieving it.
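The back-of-envelope arithmetic can be written down as a small C sketch.  The C64 timing appeared only in the screenshot, so the figure of 2184 jiffies used in the note below is back-calculated from the quoted 42x and the 52 jiffies measured on the C65GS; treat it as an assumption.  The function names are hypothetical.

```c
#include <assert.h>

/* Acceleration factor from two jiffy-clock timings of the same loop. */
double speedup(unsigned reference_jiffies, unsigned measured_jiffies)
{
    assert(measured_jiffies > 0);
    return (double)reference_jiffies / (double)measured_jiffies;
}

/* Speedup after a further improvement factor, e.g. 2.0 or 3.0. */
double projected_speedup(double current, double improvement)
{
    return current * improvement;
}
```

With 2184 jiffies on the C64 against 52 on the C65GS the factor is 42, and a further 2x to 3x would give 84x to 126x, which brackets the 100x target.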

Friday 24 January 2014

Some screen shots

This is a hastily prepared post with a few screen shots of the C65GS display just to give you an idea of what I am working on.

The C65GS drives 1920x1200 @ 60Hz natively, with 1248 physical rasters for PAL compatibility.

By default hardware scaling of the character generator is set at 5x to render a normal 320x200 display within the borders ($D042 = $04, $D043 = $04).  Of course, the rasters are still physical, so an INC $D020, JMP *-3 loop produces very fine rasters, as you can see in the following image.
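The relationship between the scale registers and the visible resolution can be sketched in C.  The sketch assumes that a register value of N selects (N + 1)x scaling, which is inferred from the two values quoted in this post ($04 for 5x and $00 for physical resolution) rather than from any documentation; the function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: a scale register value of N selects (N + 1)x scaling,
   inferred from $04 giving 5x and $00 giving physical resolution. */
unsigned scale_factor(uint8_t reg_value)
{
    return (unsigned)reg_value + 1u;
}

/* Logical pixels that fit across a span of physical pixels. */
unsigned logical_pixels(unsigned physical_pixels, uint8_t reg_value)
{
    return physical_pixels / scale_factor(reg_value);
}
```

Under that assumption the default 320x200 display occupies 1600x1000 physical pixels inside the 1920x1200 frame, since logical_pixels(1600, 0x04) is 320 and logical_pixels(1000, 0x04) is 200.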

Sorry for the blur, my phone camera is not the best for this sort of thing, and I don't have a better way to capture the display yet.

You can also see that I still have at least one CPU bug that prevents 38911 from being printed correctly.  Looking into that.


The next image shows the detail near the ready prompt to give an idea of just how fine the rasters are.  You can also see that $D020 can be incremented quite fast in relation to the pixel clock, each colour band being only about 3 characters wide.  The pixel to CPU clock ratio is currently 2:1, instead of 8:1 on the C64.  In fact, it is possible to make them even narrower in future when I improve the IPC of the CPU (removing dead cycles from INC etc when not writing to locations that really matter, like $D019, and using the 64-bit wide chipram bus to fetch entire instructions in a single cycle).


The next two images show the character generator set to physical resolution ($D042 = $00, $D043 = $00). There is a bug that is apparent in these images, where the character generator draws the characters at physical resolution, but doesn't fetch the character number from screen RAM properly, resulting in the same character being repeated three times each.  This is fairly high on my list of things to fix.

Also in this mode you can see that the character generator doesn't naively increment the pointer to screen memory when moving to the next line.  Instead there is a virtual screen width register that decides how much to increment each line.  In this example it is still set to 40 for 40-column mode, hence the repeating.

In the lower part of the screen you can see some odd things, like underlined characters.  These are characters with C65-compatibility VIC-III extended attributes of underline, reverse, bold and blink.


Again, a zoomed-in view of the corner, where you get an idea of how teeny tiny these characters are.  Even at this physical resolution the rasters aren't too wide.


Anyway, that's it for this sneak peek.  I'll explain more about the machine in a future post.