Wednesday, June 24, 2020

ST-NICCC 33%

This is a small update. When I tried to compile it, I realized that I have some bugs in my libraries. :(

The software-rendered version (option 1) is the same code as the first version, but now I'm compiling it with gcc instead of vbcc.

For now, only options 1 & 2 are implemented.

Music updated with the original tune.

I've also found that if you draw a CLUT sprite without writing the CLUT first (that is, using uninitialized colors), you get some ugly vertical lines.

Download: st-niccc 33% (skunkboard only)
Download: st-niccc (first version)


Thursday, May 7, 2020

My dream Jaguar

After some time developing for the Jaguar, here are some things I wish Atari had built into it.

First of all, all the talk about bitness is complete bullshit. You don't get a better machine just because it has a 64-bit processor. Just look at the Intellivision (Mattel, 1979): it has a 16-bit CPU, so its games look just like the ones on a Sega Mega Drive (Genesis) or a SNES, right?

With today's technology you could build an 8-bit console running at 1 GHz, with a GPU that has thousands of cores, each one drawing a single pixel. Everything would use 8-bit ALUs, and it would still blow away any other 8-, 16-, or 32-bit console.

In the end, the most important thing is memory bandwidth, not the bits. Note that 3D games also need computational power, because you're going to do a lot of multiplications.

68000

It's too slow to do anything interesting, and the lack of a cache leaves it starving for free bus cycles.
Ideally, it should sit on its own bus with something like 256 KB of RAM, perhaps with access to the other custom chips but not to main RAM. A 68020 or a 68030 would be a better option.

GPU/DSP

I would change the instruction set encoding to allow a few more opcodes: all single-operand instructions could share one opcode and use the reg1 field to select the actual instruction. It's also a must to allow longer jumps. And of course, include a cache (a real one) so that code can run from main RAM without the current headaches.

Some new opcodes that I would find useful:

- split: Takes a 32-bit register, writes the high word into a second register, and keeps the low word in the current one. With and without sign extension.

- join: The inverse of the split opcode, of course.

- pack/unpack with RGB pixels

- load/store with pre-decrement and post-increment

- loadp/storep should work with register pairs, instead of using a separate register for the high word.

- 32-bit bus on the DSP. Well, it actually has a 32-bit bus, but it's not fully connected, maybe to keep the MMU simpler?

- Include a real sound chip.
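
To make the split/join idea more concrete, here is a small C model of what those wished-for opcodes could compute. This is only a sketch: the function names (`split_u`, `split_s`, `join`, `sext16`) and the way the second register is passed are made up for illustration, and nothing here is a real Jaguar instruction.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x to 32 bits (portable, no casts to int16_t). */
static int32_t sext16(uint32_t x)
{
    x &= 0xFFFFu;
    return (int32_t)(x ^ 0x8000u) - 0x8000;
}

/* Hypothetical "split" without sign extension:
   the high word goes to *hi_out, the low word stays in *reg. */
static void split_u(uint32_t *reg, uint32_t *hi_out)
{
    *hi_out = *reg >> 16;
    *reg &= 0xFFFFu;
}

/* Hypothetical "split" with sign extension of both halves. */
static void split_s(int32_t *reg, int32_t *hi_out)
{
    uint32_t v = (uint32_t)*reg;
    *hi_out = sext16(v >> 16);
    *reg = sext16(v);
}

/* Hypothetical "join": the inverse of split, packing two 16-bit halves. */
static uint32_t join(uint32_t hi, uint32_t lo)
{
    return (hi << 16) | (lo & 0xFFFFu);
}
```

On real hardware each of these would be a single instruction; today the same result takes a short sequence of shifts, masks, and moves.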

Object Processor

Having to rebuild the Object Processor list on every frame is a waste of time, but I think there are more important things to fix.

- Bigger CLUT. A 256-color palette is not enough. At least 1024 colors, which is enough for four 8-bit sprites with different palettes.

- An object type that changes the CLUT.

- It could be interesting to include an 8-bit direct RGB mode among the color depths.

- More transparency modes, and they must also work in RGB.

- Include three color multipliers, one for each color channel, to make fade effects easy.

- Pixel-precise collision detection.

- Remove the link address from all objects except the branch object.

- The Image Width field must be a signed value to allow vertically mirrored sprites.

- The GPU interrupt object must have a y coordinate and a height field, and work without bugs…

- Rearrange the bitmap object and the scaled bitmap object so they have the same size. If you remove the link address, both objects fit into 16 bytes.

- Improve the write rate; it must write 4 pixels per cycle.

- Cache, flushed on each VBL interrupt.
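
As an illustration of the color multiplier idea, this is roughly what I have in mind, modelled in C. It is a sketch under my own assumptions: 8-bit channels, multipliers in the range 0..255 where 255 means "unchanged", and invented names (`Rgb`, `scale`, `apply_multipliers`). The real hardware would do this per pixel while the object is being drawn.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } Rgb;

/* Scale one channel by a multiplier in 0..255 (255 = unchanged), rounded. */
static uint8_t scale(uint8_t c, uint8_t mul)
{
    return (uint8_t)((c * mul + 127) / 255);
}

/* Apply one multiplier per channel: fade to black by lowering all three,
   or tint the sprite by lowering only some of them. */
static Rgb apply_multipliers(Rgb in, Rgb mul)
{
    Rgb out = { scale(in.r, mul.r), scale(in.g, mul.g), scale(in.b, mul.b) };
    return out;
}
```

With something like this, a fade would only need three register writes per frame instead of rewriting the whole palette.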

Blitter

I don't know why they thought the blitter was fast enough. If you try to do any interesting effect like scaling, rotation, or texture mapping, you must work in pixel mode, and that kills the performance. The blitter should be as fast as the Object Processor. It's sad, but you can't make a game like After Burner (1987) on the Jaguar without a lot of headaches.

- Allow pixel expansion, so you can use a 1-, 2-, 4-, or 8-bit texture and write the result into a 16-bit bitmap.

- Optimize single color/Gouraud horizontal lines. If you are going to draw a horizontal line, always write the pixels in phrases.

- RGB lighting

- Command queue. Why do you have to wait for the blitter to be idle before you can set any register? It's a waste of time.

- Reorganize the registers. Why are the integer and fractional coordinates in different registers? What were they thinking?

- Cache, of course
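
To show what I mean by pixel expansion, here is a software sketch in C for the 4-bit case. The packing (two texels per byte, high nibble first), the 16-entry palette, and the function name `expand_4bpp_to_16bpp` are assumptions of mine for the example; the point is only that the blitter would do this lookup itself while copying.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand `count` 4-bit texels into 16-bit pixels through a palette.
   Assumed packing: two texels per byte, high nibble first. */
static void expand_4bpp_to_16bpp(const uint8_t *src, size_t count,
                                 const uint16_t palette[16], uint16_t *dst)
{
    for (size_t i = 0; i < count; i++) {
        uint8_t byte = src[i / 2];
        uint8_t index = (i & 1) ? (uint8_t)(byte & 0x0F) : (uint8_t)(byte >> 4);
        dst[i] = palette[index];   /* one table lookup per destination pixel */
    }
}
```

The texture then takes a quarter of the memory (and bus bandwidth) of a 16-bit one, which is the whole reason for wanting it.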

RAM

Dual-port RAM would be nice, but it's expensive; maybe 4 MB of RAM would be a better choice.

As an extra, I think it would be great to include a second GPU to drive the blitter, something like an RPU (Rasterizer Process Unit) that only runs code from its internal RAM. You'd write a polygon list (or sprites with scaling/rotation info), and this RPU would read it and send the corresponding blitter commands while you process the next frame with the CPU/GPU.


And of course some more MHz; a bus at 13 MHz is a bit slow.

Friday, February 7, 2020

Disassembling Supercross 3D

I've been playing a bit with my disassembler, mostly fixing bugs, and I've been using Supercross 3D for testing. Looking at the disassembled code, I can understand why it runs so slowly. OK, the Jaguar is very slow at texture mapping, but the code could be better.

So far I've seen the following things:
  • The code is about 117 KB (120,016 bytes, to be precise), and it's stored at the end of the cartridge.
  • The game is locked to a minimum of 4 VBLs per frame on PAL systems and 5 VBLs on NTSC ones, which means it runs at a maximum of 12.5 fps and 12 fps respectively.
  • There is one block of DSP code; I suppose it's the sound engine.
  • There are eleven blocks of code for the GPU (maybe one or two more, I haven't finished the disassembly)
  • One of the GPU blocks is used just to set the Object Processor List Pointer. This one is never loaded into the GPU's internal RAM; it runs from ROM.
  • There are about 20 KB (21,184 bytes) of dead code or unused data spread around the code. Most of these chunks end with $4E75 (the rts opcode), but they are never referenced or called.
  • Short branches are almost never used.
  • It waits for the blitter to be idle in several places, but IMO if you are using the 68000 you don't need to wait, because the 68000 has lower bus priority (68000 < blitter): while the blitter is busy, the 68000 is stopped anyway. That's the only advantage of not having a cache.
  • There are some link/unlk opcodes. Also, some routines push values onto the stack, jump somewhere, load the values from the stack into registers, and jump again to do the actual work. I think some parts are written in C and others in assembler, and this kind of routine is used to jump from C to ASM.
  • Some parts of the game depend on whether the system is PAL or NTSC, but the code reads the hardware register every time it needs to know instead of using a flag.
  • The game runs in 8-bit mode with colors in CrY format (not 100% sure).
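
The frame rate figures in the list above are just the display refresh rate divided by the number of VBLs each frame is held for. A quick C check, assuming the nominal 50 Hz (PAL) and 60 Hz (NTSC) refresh rates; `max_fps` is a helper name I made up for this post:

```c
/* Maximum frame rate when every frame is held for a fixed number of VBLs. */
static double max_fps(double refresh_hz, int vbls_per_frame)
{
    return refresh_hz / vbls_per_frame;
}
```

So `max_fps(50.0, 4)` gives 12.5 for PAL and `max_fps(60.0, 5)` gives 12.0 for NTSC, and that is the best case: any frame that takes longer drops below it.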

And now some code snippets. All of them are actual code from the game (it's full of things like these).
move.w (a0),d0
addq.w #1,d0
move.w d0,a3
move.w a3,-(sp)
jsr l01e3e0e
At least it uses a quick add. I think this is used to increment the lap count and print it.

move.w #0,l01b72d8
move.w #0,l01b72da
move.w #0,l01b72dc
move.w #0,l01b72de
move.w #0,l01b72e0
...

What about using a data register and post-increment addressing?
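
For comparison, this is the same idea written in C, as a rough sketch. It assumes the variables are consecutive 16-bit words, which is what the addresses in the snippet suggest, and `clear_words` is a name of my own. The `*p++ = 0` idiom is the C counterpart of a store with post-increment addressing.

```c
#include <stddef.h>
#include <stdint.h>

/* Clear `count` consecutive 16-bit words starting at `p`. */
static void clear_words(uint16_t *p, size_t count)
{
    while (count--)
        *p++ = 0;   /* store a zero, then advance the pointer */
}
```

One pointer and one loop (or a short unrolled run) replace a long list of stores that each carry a full 32-bit absolute address.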

move.l a1,-(sp)
move.l #l01ece80,d3
move.l d3,a1
jsr (a1)
Because jsr l01ece80 is too easy.


By the way, I found two bugs in my assembler while looking at the disassembled code to write this post.