After some time developing for the Jaguar, here are some ideas that I wish Atari had implemented in it.
First of all, all the talk about bitness is complete bullshit. You don’t get a better device just because it has a 64-bit processor. Just look at the Intellivision (Mattel, 1979): it has a 16-bit CPU, so its games must look just like those on a Sega Mega Drive (Genesis) or a SNES, right?
With today’s technology you could build an 8-bit console running at 1GHz, with a GPU that has thousands of cores, each one drawing a single pixel. Everything would use 8-bit ALUs, and it would blow away any other 8-, 16-, or 32-bit console.
In the end, the most important thing is memory bandwidth, not the bits. Note that for 3D games you also need computational power, because you’re going to do a lot of multiplications.
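To put rough numbers on the bandwidth point, here is a minimal sketch in Python. The bus widths, clock speeds and frame buffer size are made-up illustrative values, not measured Jaguar figures, and the function names are hypothetical.

```python
def peak_bandwidth(bus_bits, clock_hz):
    """Peak bytes per second, assuming one full-width transfer per clock."""
    return bus_bits // 8 * clock_hz


def framebuffer_traffic(width, height, bytes_per_pixel, fps):
    """Bytes per second needed just to write every pixel once per frame."""
    return width * height * bytes_per_pixel * fps


# Illustrative values only: a narrow fast bus versus a wide slow bus.
narrow_fast = peak_bandwidth(8, 100_000_000)   # 8-bit bus at 100 MHz
wide_slow = peak_bandwidth(64, 12_500_000)     # 64-bit bus at 12.5 MHz

# Cost of one 320x240, 16-bit frame buffer redrawn 60 times per second.
needed = framebuffer_traffic(320, 240, 2, 60)

print(narrow_fast, wide_slow, needed)
```

Both hypothetical buses move the same number of bytes per second, which is the whole point: the width of the data path tells you nothing by itself.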
68000
It’s too slow to do anything interesting, and the lack of a cache leaves it starving for free bus cycles.
Ideally, it should be on its own bus with something like 256KB of RAM, perhaps with access only to the other custom chips and not to the main RAM. A better option could be a 68020 or a 68030.
GPU/DSP
I would change the instruction set encoding to allow a few more opcodes: all single-operand instructions could share the same opcode and use the reg1 field to select the actual instruction. Bigger jumps are also a must. And of course, include a cache (a real one) to run code from the main RAM without the current headaches.
Some new opcodes and other changes that I would find useful:
- split: Takes a 32-bit register and writes the high word into a second register and the low word into the current one. With and without sign extension.
- join: The inverse of the split opcode, of course.
- pack/unpack for RGB pixels
- load/store with pre-decrement and post-increment
- loadp/storep should work with register pairs, instead of using a separate register for the high word.
- A 32-bit bus on the DSP. Well, it actually has a 32-bit bus, but it’s not fully connected, maybe to make the MMU simpler?
- Include a real sound chip.
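To make the split/join and pack/unpack ideas concrete, here is a minimal Python model of what I have in mind. These opcodes don’t exist, so the semantics below are my assumption, and the 5-6-5 bit layout in the pack/unpack helpers is a generic RGB565 layout chosen only for illustration, not the Jaguar’s own pixel format.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend a 16-bit word to a 32-bit register value."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def split(reg, signed=False):
    """Hypothetical 'split': return (high word, low word) of a 32-bit register."""
    high = (reg >> 16) & 0xFFFF
    low = reg & 0xFFFF
    if signed:
        high, low = sign_extend16(high), sign_extend16(low)
    return high, low


def join(high, low):
    """Hypothetical 'join': the inverse of split, ignoring any sign extension."""
    return (((high & 0xFFFF) << 16) | (low & 0xFFFF)) & MASK32


def pack_rgb565(r, g, b):
    """Pack 5-bit red, 6-bit green and 5-bit blue into one 16-bit pixel."""
    return ((r & 0x1F) << 11) | ((g & 0x3F) << 5) | (b & 0x1F)


def unpack_rgb565(pixel):
    """Unpack a 16-bit pixel into its (r, g, b) components."""
    return (pixel >> 11) & 0x1F, (pixel >> 5) & 0x3F, pixel & 0x1F


high, low = split(0x1234ABCD, signed=True)
print(hex(high), hex(low), hex(join(high, low)))
```

Each of these is a handful of shifts and masks in software, which is exactly why a single-cycle opcode would pay off in inner loops.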
Object Processor
Having to rebuild the Object Processor list on every frame is a waste of time, but I think there are more important things to fix.
- Bigger CLUT: a 256-color palette is not enough. At least 1024 colors, which is enough for four 8-bit sprites, each with its own palette.
- An object type to change the CLUT
- It could be interesting to include an 8-bit direct RGB mode among the color depths.
- More transparency modes and they must also work in RGB.
- Include three color multipliers, one for each color channel, to make fade effects easy.
- Pixel-precise collision detection.
- Remove the link address from all objects except the branch object.
- The Image Width field must be a signed value to allow vertically mirrored sprites.
- The GPU interrupt object must have a y coordinate and a height field, and work without bugs…
- Rearrange the bitmap object and the scaled bitmap object to have the same size. If you remove the link address, both objects fit into 16 bytes.
- Improve the write rate: it must write 4 pixels per cycle.
- A cache, flushed on each VBL interrupt.
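As a rough sketch of two of the items above, the bigger CLUT and the per-channel color multipliers, here is how they could behave, modeled in Python. The bank layout (four banks of 256 entries), the 8-bit channels and the 0..256 fixed-point multiplier range are all my assumptions for illustration; none of this is existing hardware behavior.

```python
def clut_lookup(clut, bank, index):
    """Pick a color from a 1024-entry CLUT seen as four 256-entry banks."""
    return clut[bank * 256 + index]


def apply_multipliers(rgb, multipliers):
    """Scale each channel by its own multiplier (256 = unchanged, 0 = black)."""
    return tuple((channel * m) >> 8 for channel, m in zip(rgb, multipliers))


# A 1024-entry CLUT, all black except one entry in bank 2.
clut = [(0, 0, 0)] * 1024
clut[2 * 256 + 5] = (200, 100, 50)

color = clut_lookup(clut, 2, 5)
half_bright = apply_multipliers(color, (128, 128, 128))  # fade to 50%
no_green = apply_multipliers(color, (256, 0, 256))       # drop one channel

print(color, half_bright, no_green)
```

With the multipliers in the Object Processor, a full-screen fade would be three register writes per frame instead of a pass over every palette entry.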
Blitter
I don’t know why they thought that the blitter was fast enough. If you try to do any interesting effect like scaling, rotation or texture mapping you must work in pixel mode, and that kills the performance. The blitter must be as fast as the Object Processor. It’s sad, but you can’t make a game like After Burner (1987) on the Jaguar without a lot of headaches.
- Allow pixel expansion, so you can use 1-, 2-, 4-, or 8-bit textures and write to a 16-bit destination bitmap.
- Optimize single color/Gouraud horizontal lines. If you are going to draw a horizontal line, always write the pixels in phrases.
- RGB lighting
- Command queue: why do you have to wait for the blitter to be idle before you set any register? This is a waste of time.
- Reorganize the registers: why are the integer and fractional coordinates in different registers? What were they thinking?
- Cache, of course
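The pixel expansion item from the list above can be sketched like this in Python. It is only a model of the idea: the most-significant-bit-first pixel order and the palette lookup are my assumptions, and the function name is hypothetical.

```python
def expand_pixels(data, bpp, palette):
    """Expand packed 1, 2, 4 or 8-bit texels into 16-bit pixels via a palette.

    Assumes the leftmost pixel sits in the most significant bits of each byte.
    """
    if bpp not in (1, 2, 4, 8):
        raise ValueError("bpp must be 1, 2, 4 or 8")
    mask = (1 << bpp) - 1
    out = []
    for byte in data:
        # Walk the byte from its top bits down to its bottom bits.
        for shift in range(8 - bpp, -1, -bpp):
            out.append(palette[(byte >> shift) & mask])
    return out


# One byte of a 1-bit texture becomes eight 16-bit pixels.
mono = expand_pixels(bytes([0b10110000]), 1, [0x0000, 0xFFFF])

# One byte of a 4-bit texture becomes two 16-bit pixels.
palette16 = [i * 0x1111 for i in range(16)]
nibbles = expand_pixels(bytes([0x1F]), 4, palette16)

print(mono, nibbles)
```

The source stays small in memory and on the bus, and only the destination write is 16 bits wide, which ties back to the bandwidth argument.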
RAM
Dual-port RAM would be nice, but it’s expensive; maybe 4MB of RAM would be the better choice.
As an extra, I think that it would be great to include a second GPU to drive the blitter, something like an RPU (Rasterizer Process Unit) that only runs code from its internal RAM. You’d write a polygon list (or sprites with scaling/rotation info) and this RPU would read it and send the corresponding blitter commands while you process the next frame with the CPU/GPU.
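Here is a minimal Python sketch of how the RPU and a blitter command queue could fit together. Everything in it is hypothetical: the display list holds plain rectangles instead of real polygons, the command format is invented, and the queue capacity is arbitrary.

```python
from collections import deque


def rpu_expand(display_list):
    """Turn rectangles (x, y, width, height, color) into one span per scanline."""
    for x, y, width, height, color in display_list:
        for line in range(y, y + height):
            yield {"op": "fill_span", "x": x, "y": line,
                   "width": width, "color": color}


class BlitterQueue:
    """A bounded command queue: the producer only stalls when it is full."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pending = deque()
        self.executed = []

    def submit(self, command):
        """Queue a command; return False if there is no room."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append(command)
        return True

    def run_one(self):
        """Model the blitter finishing one queued command."""
        if self.pending:
            self.executed.append(self.pending.popleft())


def run_frame(display_list, capacity=4):
    """Feed a whole display list through the queue and return what was drawn."""
    queue = BlitterQueue(capacity)
    for command in rpu_expand(display_list):
        while not queue.submit(command):
            queue.run_one()  # queue full: let the blitter catch up
    while queue.pending:
        queue.run_one()
    return queue.executed


drawn = run_frame([(10, 20, 32, 3, 0x7FFF), (0, 0, 8, 2, 0x001F)])
print(len(drawn), drawn[0])
```

In this model the CPU/GPU only ever builds the display list; the waiting loop lives in the RPU, so nobody on the main processors polls the blitter for idle.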
And of course some more MHz: a bus at 13MHz is a bit slow.