I've spent a few days looking at the OpenLara source code to better understand how it works.
For now, it works like this: everything is executed on the 68000. Although all the boxes are the same size, they take different amounts of processing time.
Now I'm building a command list and using it to drive all the transformations and rendering. Although this list has to be rebuilt for each frame, that shouldn't hurt performance.
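To make the idea concrete, here is a minimal sketch of what such a per-frame command list could look like. It is plain C and only an illustration: the opcodes, the `CmdList` and `DrawCall` types and the `cmd_*` functions are all hypothetical names, not taken from OpenLara or from my actual code, and the consumer is an ordinary function standing in for whatever would walk the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes, for illustration only (not from OpenLara). */
enum { CMD_END = 0, CMD_SET_MATRIX = 1, CMD_DRAW_MESH = 2 };

#define CMD_LIST_MAX 256

typedef struct {
    uint32_t words[CMD_LIST_MAX]; /* opcode/argument pairs, then CMD_END */
    size_t   count;               /* number of words in use */
} CmdList;

typedef struct {
    uint32_t matrix; /* matrix id active when the mesh was queued */
    uint32_t mesh;   /* mesh id to draw */
} DrawCall;

/* Start a new frame: the list is rebuilt from scratch every time. */
void cmd_begin(CmdList *cl) { cl->count = 0; }

/* Append one opcode/argument pair, leaving room for the final CMD_END. */
void cmd_push(CmdList *cl, uint32_t op, uint32_t arg)
{
    assert(cl->count + 3 <= CMD_LIST_MAX);
    cl->words[cl->count++] = op;
    cl->words[cl->count++] = arg;
}

/* Terminate the list so the consumer knows where to stop. */
void cmd_end(CmdList *cl)
{
    assert(cl->count + 1 <= CMD_LIST_MAX);
    cl->words[cl->count++] = CMD_END;
}

/* Consumer side: walk the list and resolve each draw against the
   matrix that was current at that point. Returns the number of draw
   calls written to 'out' (at most out_max). */
size_t cmd_execute(const CmdList *cl, DrawCall *out, size_t out_max)
{
    uint32_t matrix = 0; /* no matrix selected yet */
    size_t n = 0;
    size_t i = 0;

    while (i < cl->count && cl->words[i] != CMD_END) {
        uint32_t op  = cl->words[i];
        uint32_t arg = cl->words[i + 1];

        if (op == CMD_SET_MATRIX) {
            matrix = arg;
        } else if (op == CMD_DRAW_MESH && n < out_max) {
            out[n].matrix = matrix;
            out[n].mesh   = arg;
            n++;
        }
        i += 2; /* every command is exactly two words in this sketch */
    }
    return n;
}
```

Building the list is just a handful of word writes per object (`cmd_begin`, a few `cmd_push` calls, `cmd_end`), which is why I expect rebuilding it every frame to be cheap compared with the transformations and rendering it describes.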
Running these steps in parallel means the frame time is set by the XForm + Render part alone (as long as Logic takes no longer than that), instead of the sum Logic + XForm + Render.
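As a rough model of that claim, the sketch below compares the two cases. It assumes that Logic for the next frame can fully overlap XForm + Render for the current one, with no stalls or synchronization cost, which is optimistic. The function names and the tick values are made up for illustration and are not measurements.

```c
/* Frame cost when everything runs back to back on one processor. */
unsigned frame_ticks_serial(unsigned logic, unsigned xform, unsigned render)
{
    return logic + xform + render;
}

/* Frame cost when Logic for frame N+1 overlaps XForm + Render for
   frame N. In steady state the slower of the two stages sets the pace. */
unsigned frame_ticks_pipelined(unsigned logic, unsigned xform, unsigned render)
{
    unsigned draw = xform + render;
    return logic > draw ? logic : draw;
}
```

With made-up costs of 4, 5 and 6 ticks, the serial frame takes 15 ticks and the pipelined one 11. If Logic ever grew to 12 ticks, it would become the bottleneck and the pipelined frame would take 12.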
I hope that all the 3D transformation and rasterizer code fits into GPU RAM, so the 68000 doesn't need to step in and do any code swapping.