Critical Review: Experiment 5 (Conservation Laws)
Issues Identified and Tested
1. Output Scaling Test
Hypothesis: Large output ranges (-121 to +128, std ≈ 35) might be causing learning difficulties.
Test: Applied StandardScaler to outputs before training.
Result: R² = 0.2799 (identical to without scaling)
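Conclusion: Output scaling does NOT help. The problem is not about output scale. Identical scores are expected if the readout is the Ridge regression discussed under Root Cause Analysis (with an unpenalized intercept): the Ridge solution is linear in the targets, so standardizing the outputs and mapping the predictions back leaves R² unchanged.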
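The invariance of R² under output standardization can be checked on a toy problem. The sketch below is not the experiment's code: it uses a hypothetical one-feature Ridge fit in closed form, synthetic data, and an arbitrary alpha=1.0, and it scores on the training data only.

```python
import random
from statistics import fmean, pstdev

def ridge_fit_1d(x, y, alpha):
    """Closed-form Ridge fit with one feature and an unpenalized intercept."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    w = sxy / (sxx + alpha)
    return w, y_bar - w * x_bar

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): a non-linear target with a spread of tens of units
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y = [35.0 * xi ** 3 + rng.gauss(0.0, 5.0) for xi in x]

# Fit on raw outputs
w, b = ridge_fit_1d(x, y, alpha=1.0)
r2_raw = r2(y, [w * xi + b for xi in x])

# Fit on standardized outputs, then map predictions back to the original scale
mu, sigma = fmean(y), pstdev(y)
y_std = [(yi - mu) / sigma for yi in y]
w_s, b_s = ridge_fit_1d(x, y_std, alpha=1.0)
r2_scaled = r2(y, [(w_s * xi + b_s) * sigma + mu for xi in x])

print(r2_raw, r2_scaled)  # the two scores agree up to round-off
```

If the real readout behaves the same way, the scaling test cannot distinguish between hypotheses, which is consistent with the identical R² reported above.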
2. Brightness Hyperparameter Test
Hypothesis: brightness=0.001 might not be optimal for this problem.
Test: Tested brightness values [0.0001, 0.001, 0.01, 0.1, 1.0]
Results:
- brightness=0.0001: R² = 0.0255 ❌
- brightness=0.001: R² = 0.2799 ✅ (best)
- brightness=0.01: R² = -0.3846 ❌
- brightness=0.1: R² = -1.7143 ❌
- brightness=1.0: R² = -0.0484 ❌
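Conclusion: brightness=0.001 is the best of the five values tested, so the value used is not the cause of the low score. The grid is coarse (one value per decade), so a better value between 0.0001 and 0.01 cannot be ruled out. Negative R² values mean the model predicts worse than a constant equal to the mean output.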
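A follow-up sweep could sample more densely around the best value. The sketch below only builds the candidate list from the R² values reported above; `log_grid` is a hypothetical helper, and the choice of nine points is an assumption.

```python
import math

def log_grid(low, high, num):
    """Return `num` log-spaced values from `low` to `high`, inclusive."""
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]

# R² values reported in this review, keyed by brightness
reported = {0.0001: 0.0255, 0.001: 0.2799, 0.01: -0.3846, 0.1: -1.7143, 1.0: -0.0484}
best = max(reported, key=reported.get)

# Quarter-decade steps between the two neighbours of the best value
candidates = log_grid(best / 10, best * 10, num=9)
print(best, candidates)
```

Each candidate would then be passed to the same training and evaluation routine used for the original five values.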
3. Baseline Comparison
Test: Compared with polynomial baseline (degree 4)
Results:
- Darwinian (Polynomial): R² = 0.9949 ✅
- Chaos Model: R² = 0.2799 ❌
Conclusion: The problem IS learnable (baseline succeeds), but the chaos model fails. This is a genuine model limitation, not a problem design flaw.
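For scale, a degree-4 polynomial baseline is a small model. The sketch below enumerates its monomial features, assuming four inputs (m1, m2, v1, v2); the input count and the `polynomial_features` helper are assumptions, not taken from the experiment code.

```python
from itertools import combinations_with_replacement
from math import prod

def polynomial_features(row, degree):
    """All monomials of the inputs up to `degree`, including the constant term."""
    feats = []
    for d in range(degree + 1):
        # Each multiset of d inputs corresponds to one monomial of total degree d
        for combo in combinations_with_replacement(row, d):
            feats.append(prod(combo))  # prod(()) == 1 gives the constant term
    return feats

sample = (2.0, 1.0, 3.0, -1.0)  # hypothetical (m1, m2, v1, v2)
feats = polynomial_features(sample, degree=4)
print(len(feats))
```

Under this assumption the baseline works with 70 features, compared with the 4096 features of the chaos model.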
4. Data Generation Validation
Test: Verified physics correctness
Results:
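- Momentum conservation error: mean = 1.67e-14 (floating-point round-off level)
- Energy conservation error: mean = 6.29e-13 (floating-point round-off level)
Conclusion: Data generation is physically correct. Both conservation laws hold to round-off, so the simulator is not the source of the low score.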
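A conservation check of this kind can be sketched as follows. The simulator itself is not shown in this review, so the code assumes a 1D elastic two-body collision with the standard textbook formulas; the input ranges are also assumptions, and the sample count of 3000 follows the figure quoted under Root Cause Analysis.

```python
import random

def elastic_collision_1d(m1, m2, v1, v2):
    """Post-collision velocities for a 1D elastic collision (assumed setup)."""
    total = m1 + m2
    v1_out = ((m1 - m2) * v1 + 2.0 * m2 * v2) / total
    v2_out = ((m2 - m1) * v2 + 2.0 * m1 * v1) / total
    return v1_out, v2_out

def conservation_errors(m1, m2, v1, v2):
    """Absolute momentum and kinetic-energy mismatch before vs. after."""
    v1_out, v2_out = elastic_collision_1d(m1, m2, v1, v2)
    p_err = abs((m1 * v1 + m2 * v2) - (m1 * v1_out + m2 * v2_out))
    e_before = 0.5 * (m1 * v1 ** 2 + m2 * v2 ** 2)
    e_after = 0.5 * (m1 * v1_out ** 2 + m2 * v2_out ** 2)
    return p_err, abs(e_before - e_after)

# Hypothetical input ranges: masses in [0.1, 10], velocities in [-10, 10]
rng = random.Random(0)
samples = [
    (rng.uniform(0.1, 10.0), rng.uniform(0.1, 10.0),
     rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))
    for _ in range(3000)
]
errors = [conservation_errors(*s) for s in samples]
mean_p_err = sum(p for p, _ in errors) / len(errors)
mean_e_err = sum(e for _, e in errors) / len(errors)
print(mean_p_err, mean_e_err)
```

Mean errors many orders of magnitude below the typical momentum and energy values indicate that only rounding, not a modelling bug, separates the two sides.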
5. Relationship Complexity Analysis
Formula:
Characteristics:
- Non-linear (division by sum)
- Involves interactions between all inputs
- Output range depends on input combinations
Question: Is this too complex for the chaos model?
Answer: The polynomial baseline (degree 4) can learn it, so complexity alone is not the issue.
Root Cause Analysis
Why Does the Chaos Model Fail?
- Low Feature Variance: Features lie in the [0, 0.5] range (tanh output) but cluster near zero (mean ≈ 0.02, std ≈ 0.03). With so little spread, they may not carry enough information about the inputs.
- Limited Expressiveness: The FFT transformation may not naturally encode the division operations that are central to the collision formula.
- Ridge Regression Limitation: With 4096 features but only 3000 samples, there are more features than samples, so the fit is controlled by the regularization strength. Ridge regression may be underfitting, or the features may not be informative enough.
- Comparison with Successful Experiments:
- Experiment 1 (Ballistics): R² = 0.9999 ✅
- Experiment 2 (Relativity): R² = 1.0000 ✅
- Experiment 5 (Conservation): R² = 0.2799 ❌
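Key Difference: Experiments 1-2 involve products, powers, and smooth functions (v², sin, sqrt), grouped here as multiplicative relationships, while Experiment 5 involves division by a sum of inputs. The chaos model may be better at multiplicative than divisive relationships. This remains a hypothesis, since none of the tests above isolates division as the cause.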
Validated Conclusions
What We Know for Certain:
- ✅ Data is correct: Physics simulator conserves momentum and energy to round-off
- ✅ Hyperparameter is well chosen: brightness=0.001 is the best of the five values tested
- ✅ Output scaling doesn't help: Not a scaling issue
- ✅ Problem is learnable: Baseline achieves R² = 0.99
- ✅ Chaos model genuinely fails: This is a real limitation, not a bug
What This Means:
The low R² = 0.28 is a genuine model limitation, not an experimental design flaw. The leading hypothesis is that the chaos model cannot learn division-based relationships as well as it learns multiplicative ones, though this has not been tested in isolation.
Recommendations
For Documentation:
- Acknowledge the limitation: The chaos model appears to struggle with division operations
- Note the baseline success: The problem is learnable, just not by this architecture
- Compare with other experiments: Division vs. multiplication may be a key factor
For Future Work:
- Test with different architectures: A different chaos transformation might encode division better
- Explicit feature engineering: Could add ratio features (m1/m2, etc.) to help
- Hybrid approaches: Combine chaos with explicit division features
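The ratio-feature idea can be sketched as below. The feature names and the `ratio_features` and `augmented_inputs` helpers are hypothetical, and the final check assumes the target is the first post-collision velocity of a 1D elastic collision, which this review does not state explicitly.

```python
def ratio_features(m1, m2):
    """Explicit division-based features built from the two masses."""
    total = m1 + m2
    return {
        "mass_ratio": m1 / m2,
        "mass_diff_frac": (m1 - m2) / total,
        "m2_frac": 2.0 * m2 / total,
    }

def augmented_inputs(m1, m2, v1, v2):
    """Raw inputs followed by the ratio features, ready for a feature map."""
    r = ratio_features(m1, m2)
    return [m1, m2, v1, v2, r["mass_ratio"], r["mass_diff_frac"], r["m2_frac"]]

# Under the assumed elastic formula, the target is bilinear in the new features
m1, m2, v1, v2 = 2.0, 1.0, 3.0, -1.0
r = ratio_features(m1, m2)
v1_from_features = r["mass_diff_frac"] * v1 + r["m2_frac"] * v2
v1_direct = ((m1 - m2) * v1 + 2.0 * m2 * v2) / (m1 + m2)
print(augmented_inputs(m1, m2, v1, v2), v1_from_features, v1_direct)
```

If the division is the obstacle, feeding these features to the chaos model should close most of the gap to the baseline; if R² stays low, the division hypothesis is weakened.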
Final Verdict
Status: ✅ EXPERIMENT IS VALID
The low performance (R² = 0.28) is a genuine finding about model limitations, not an experimental artifact. The experiment correctly identifies that:
- The problem is learnable (baseline succeeds)
- The chaos model fails (genuine limitation)
- This may be due to difficulty with division operations
The experiment design is sound. The results are honest and meaningful.