GPU Optimization Plan for NeuroCHIMERA

Current Issues (Only ~10% GPU Utilization)

Framebuffer Recreation (Lines 804-809 in engine.py)
- A new framebuffer is created on every iteration
- Each creation adds allocation and validation cost inside the hot loop
- Fix: Pre-allocate two framebuffers once and ping-pong between them
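
The pre-allocation fix can be sketched independently of the graphics API. The example below is a minimal illustration, not the code in engine.py: `PingPong` and `make_fake_target` are hypothetical names, and the factory stands in for whatever call the engine uses to build a framebuffer with its texture attachment. The point it shows is that both targets are created once, and the loop only swaps their roles.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class PingPong(Generic[T]):
    """Holds two render targets allocated once and swaps their roles."""

    def __init__(self, make_target: Callable[[], T]) -> None:
        # Both targets are allocated up front; nothing is created in the loop.
        self._targets: List[T] = [make_target(), make_target()]
        self._read = 0

    @property
    def source(self) -> T:
        """Target holding the previous iteration's result (sampled from)."""
        return self._targets[self._read]

    @property
    def destination(self) -> T:
        """Target the current iteration renders into."""
        return self._targets[1 - self._read]

    def swap(self) -> None:
        """Exchange roles so the freshly written target becomes the source."""
        self._read = 1 - self._read


# Stand-in factory (hypothetical) that records every allocation.
allocations: List[dict] = []


def make_fake_target() -> dict:
    target = {"id": len(allocations)}
    allocations.append(target)
    return target


buffers = PingPong(make_fake_target)
for _ in range(100):
    # A real loop would bind buffers.destination, sample buffers.source, draw.
    buffers.swap()
```

After 100 iterations the allocation list still holds exactly two entries, which is the property the fix is after.
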
CPU-GPU Data Transfers (Lines 852, 860, 868)
- The convergence check reads results back from the GPU to the CPU
- The entire frame is downloaded when only a single convergence value is needed
- Fix: Keep all operations on the GPU and compute convergence there
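
One common way to do a GPU-based convergence check is a hierarchical reduction: compute the per-cell absolute change, then repeatedly collapse 2x2 blocks until one value remains, so that at most a single scalar ever needs to leave the GPU. The sketch below models that logic on the CPU in plain Python to make the passes explicit. It is an assumption about how the fix could work, not the project's implementation; the function names and the power-of-two square grid are illustrative choices.

```python
from typing import List

Grid = List[List[float]]


def reduce_max_2x2(grid: Grid) -> Grid:
    """One reduction pass: each output cell is the max of a 2x2 input block."""
    height, width = len(grid), len(grid[0])
    return [
        [
            max(grid[y][x], grid[y][x + 1], grid[y + 1][x], grid[y + 1][x + 1])
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def max_abs_delta(prev: Grid, curr: Grid) -> float:
    """Reduce |curr - prev| to a single scalar with repeated 2x2 passes.

    Assumes square grids whose side length is a power of two.
    """
    delta = [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]
    while len(delta) > 1:
        # On the GPU, each pass would render into a target half the size.
        delta = reduce_max_2x2(delta)
    return delta[0][0]


prev_state = [[0.0] * 4 for _ in range(4)]
curr_state = [[0.0] * 4 for _ in range(4)]
curr_state[2][1] = 0.5  # a single cell changed

largest_change = max_abs_delta(prev_state, curr_state)
converged = largest_change < 1e-3
```

A 4x4 grid needs two passes here; in general the number of passes grows with the logarithm of the side length, and each pass touches a quarter of the data of the previous one.
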
Single Render Pass
- Each pass writes to a single render target
- Multiple render targets are not used to write several outputs in the same pass
- Fix: Use multiple render targets so one pass produces several outputs
No Compute Shaders
- Using fragment shaders only
- Not using compute shaders for better parallelism
- Fix: Implement compute shader version
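
A compute shader version needs a dispatch size that covers the whole grid even when its dimensions are not a multiple of the work-group size. The helper below sketches that calculation with ceiling division. The 16x16 local size and the name `dispatch_groups` are assumptions for illustration; the actual values would come from the shader's declared local size.

```python
from typing import Tuple


def dispatch_groups(
    width: int, height: int, local_x: int = 16, local_y: int = 16
) -> Tuple[int, int]:
    """Number of work groups needed to cover a width x height grid."""
    groups_x = (width + local_x - 1) // local_x  # ceiling division
    groups_y = (height + local_y - 1) // local_y
    return groups_x, groups_y


full_cover = dispatch_groups(1024, 1024)
ragged_cover = dispatch_groups(1000, 30)
```

When the grid is not an exact multiple, the last group in each dimension runs past the edge, so the shader has to bounds-check its invocation ID before writing.
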
Sequential Operations
- Evolution, learning, metrics run sequentially
- Fix: Pipeline operations, use async execution
Texture Memory Management
- Not using texture arrays efficiently
- Fix: Use texture arrays, optimize memory layout

High Priority:
- Eliminate framebuffer recreation
- Remove CPU-GPU transfers for convergence
- Pre-allocate all resources
Medium Priority:
- Implement compute shader version
- Parallel render targets
- Batch operations
Low Priority:
- Memory optimization
- Texture compression
- Advanced pipelining