Peg Solitaire RL

Output

Training Data + Round 1

I ran a depth-first search (DFS) on each board to extract solutions to use as training data. The full command and its output:

(kits) z5362216@k201:~/minesweeper $   python3 train_peg_katana.py \
>       --boards english european 6x6 triangle5 triangle6 star \
>       --diamond41-solutions diamond41_solutions.txt \
>       --epochs 400 \
>       --batch-size 128 \
>       --lr 0.001 \
>       --channels 128 \
>       --res-blocks 8 \
>       --attention-layers 3 \
>       --drop-path 0.2 \
>       --label-smoothing 0.1 \
>       --mixup-alpha 0.2 \
>       --ema-decay 0.9995 \
>       --temperature 0.5 \
>       --eval-games 200 \
>       --solutions-per-start 5 \
>       --out models

============================================================
Universal Peg Solitaire Training
============================================================
Started: 2026-01-15 13:27:09

GPU: NVIDIA H200

============================================================
Step 1: Collecting Training Data
============================================================
Boards: ['english', 'european', '6x6', 'triangle5', 'triangle6', 'star']
Solutions per start: 5

Processing english...
Solving english from (3, 3) (32 pegs)...
  Solved in 0.37s, 31 moves
Solving english from (0, 3) (32 pegs)...
  Solved in 0.38s, 31 moves
Solving english from (2, 0) (32 pegs)...
  Solved in 0.15s, 31 moves
Solving english from (1, 3) (32 pegs)...
  Solved in 0.47s, 31 moves
  Total: 2480 training samples from 4 starting position(s)

Processing european...
Solving european from (1, 3) (36 pegs)...
  Solved in 66.04s, 35 moves
  Total: 700 training samples from 1 starting position(s)

Processing 6x6...
Solving 6x6 from (2, 2) (35 pegs)...
  Solved in 4.91s, 34 moves
Solving 6x6 from (0, 0) (35 pegs)...
  Solved in 10.63s, 34 moves
Solving 6x6 from (0, 2) (35 pegs)...
  Solved in 0.02s, 34 moves
Solving 6x6 from (1, 1) (35 pegs)...
  Solved in 54.10s, 34 moves
  Total: 2720 training samples from 4 starting position(s)

Processing triangle5...
Solving triangle5 from (0,) (14 pegs)...
  Solved in 0.00s, 13 moves
Solving triangle5 from (3,) (14 pegs)...
  Solved in 0.00s, 13 moves
Solving triangle5 from (10,) (14 pegs)...
  Solved in 0.00s, 13 moves
Solving triangle5 from (12,) (14 pegs)...
  Solved in 0.00s, 13 moves
  Total: 260 training samples from 4 starting position(s)

Processing triangle6...
Solving triangle6 from (0,) (20 pegs)...
  Solved in 0.00s, 19 moves
Solving triangle6 from (3,) (20 pegs)...
  Solved in 0.02s, 19 moves
Solving triangle6 from (6,) (20 pegs)...
  Solved in 0.00s, 19 moves
Solving triangle6 from (15,) (20 pegs)...
  Solved in 0.02s, 19 moves
  Total: 380 training samples from 4 starting position(s)

Processing star...
Solving star from (0,) (9 pegs)...
  Solved in 0.00s, 8 moves
Solving star from (5,) (9 pegs)...
  Solved in 0.00s, 8 moves
  Total: 80 training samples from 2 starting position(s)

Loading diamond41 solutions from diamond41_solutions.txt...
Loaded 248 solutions for diamond41
Added 1560 diamond41 samples
Saved to models/training_data.json

Total samples: 8180
  6x6: 2720
  diamond41: 1560
  english: 2480
  european: 700
  star: 80
  triangle5: 260
  triangle6: 380

============================================================
Step 2: Training
============================================================
Parameters: 2,653,620

Training for 400 epochs...
Epoch   1/400 | Loss: 4.3549 | Acc: 18.5% | LR: 0.000994
Epoch  10/400 | Loss: 3.1665 | Acc: 35.0% | LR: 0.000501
Epoch  20/400 | Loss: 2.4748 | Acc: 57.4% | LR: 0.001000
Epoch  30/400 | Loss: 2.7241 | Acc: 49.0% | LR: 0.000854
Epoch  40/400 | Loss: 2.6360 | Acc: 52.6% | LR: 0.000501
Epoch  50/400 | Loss: 2.5210 | Acc: 56.0% | LR: 0.000147
Epoch  60/400 | Loss: 2.4930 | Acc: 57.6% | LR: 0.001000
Epoch  70/400 | Loss: 2.3036 | Acc: 61.8% | LR: 0.000962
Epoch  80/400 | Loss: 2.6344 | Acc: 53.1% | LR: 0.000854
Epoch  90/400 | Loss: 2.5161 | Acc: 55.6% | LR: 0.000692
Epoch 100/400 | Loss: 2.1946 | Acc: 64.6% | LR: 0.000501
Epoch 110/400 | Loss: 2.1105 | Acc: 65.4% | LR: 0.000309
Epoch 120/400 | Loss: 2.4915 | Acc: 55.9% | LR: 0.000147
Epoch 130/400 | Loss: 2.2627 | Acc: 62.9% | LR: 0.000039
Epoch 140/400 | Loss: 2.2844 | Acc: 60.1% | LR: 0.001000
Epoch 150/400 | Loss: 2.4334 | Acc: 56.1% | LR: 0.000990
Epoch 160/400 | Loss: 2.4256 | Acc: 58.2% | LR: 0.000962
Epoch 170/400 | Loss: 2.2687 | Acc: 62.9% | LR: 0.000916
Epoch 180/400 | Loss: 2.3684 | Acc: 60.5% | LR: 0.000854
Epoch 190/400 | Loss: 2.2158 | Acc: 61.9% | LR: 0.000778
Epoch 200/400 | Loss: 2.4369 | Acc: 58.2% | LR: 0.000692
Epoch 210/400 | Loss: 2.2957 | Acc: 58.6% | LR: 0.000598
Epoch 220/400 | Loss: 2.1747 | Acc: 65.2% | LR: 0.000501
Epoch 230/400 | Loss: 2.1394 | Acc: 66.1% | LR: 0.000403
Epoch 240/400 | Loss: 2.2170 | Acc: 62.1% | LR: 0.000309
Epoch 250/400 | Loss: 2.5308 | Acc: 54.3% | LR: 0.000223
Epoch 260/400 | Loss: 1.9152 | Acc: 69.7% | LR: 0.000147
Epoch 270/400 | Loss: 2.3018 | Acc: 62.2% | LR: 0.000085
Epoch 280/400 | Loss: 2.1810 | Acc: 62.9% | LR: 0.000039
Epoch 290/400 | Loss: 2.3095 | Acc: 60.3% | LR: 0.000011
Epoch 300/400 | Loss: 2.2366 | Acc: 61.0% | LR: 0.001000
Epoch 310/400 | Loss: 2.5112 | Acc: 55.8% | LR: 0.000998
Epoch 320/400 | Loss: 2.2704 | Acc: 62.5% | LR: 0.000990
Epoch 330/400 | Loss: 2.3118 | Acc: 60.1% | LR: 0.000978
Epoch 340/400 | Loss: 2.2709 | Acc: 61.0% | LR: 0.000962
Epoch 350/400 | Loss: 2.3365 | Acc: 61.4% | LR: 0.000941
Epoch 360/400 | Loss: 2.2527 | Acc: 62.7% | LR: 0.000916
Epoch 370/400 | Loss: 2.2394 | Acc: 57.2% | LR: 0.000887
Epoch 380/400 | Loss: 2.3419 | Acc: 59.7% | LR: 0.000854
Epoch 390/400 | Loss: 2.2350 | Acc: 63.5% | LR: 0.000817
Epoch 400/400 | Loss: 2.3246 | Acc: 55.9% | LR: 0.000778

Training completed in 7.0 minutes
Best accuracy: 74.6%

============================================================
Step 3: Evaluation
============================================================

Temperature = 0.5 (200 games each):
  6x6         :  18.5%
  english     :   0.0%
  european    :  14.5%
  star        :  11.5%
  triangle5   :   0.0%
  triangle6   :   0.0%
  Average     :   7.4%

Temperature = 0 (greedy):
  6x6         :   0.0%
  english     :   0.0%
  european    :   0.0%
  star        :  17.5%
  triangle5   :   0.0%
  triangle6   :   0.0%
  Average     :   2.9%

============================================================
Step 4: Saving
============================================================
Saved PyTorch model to models/peg_universal.pth
Saved ONNX model to models/peg_universal.onnx (10.19 MB)

============================================================
Done!
============================================================
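
For reference, the DFS solve reported under Step 1 can be sketched roughly as below. This is not the code from train_peg_katana.py. It is a minimal sketch that assumes triangle holes are numbered row by row from the top (so hole 0 is the apex), and the names triangle_moves, solve and solve_triangle are made up for illustration. It also returns only the first solution it finds, whereas the run above collected 5 per start.

```python
from typing import List, Optional, Set, Tuple

Move = Tuple[int, int, int]  # (from, over, to) hole indices


def triangle_moves(rows: int) -> List[Move]:
    """Enumerate every geometrically possible jump on a triangular board.

    Assumption: holes are indexed row by row, row r holding r + 1 holes.
    """
    index = {}
    for r in range(rows):
        for c in range(r + 1):
            index[(r, c)] = len(index)
    # The six jump directions on a triangular grid, as (row, col) steps.
    directions = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1)]
    moves = []
    for (r, c), src in index.items():
        for dr, dc in directions:
            over = index.get((r + dr, c + dc))
            dst = index.get((r + 2 * dr, c + 2 * dc))
            if over is not None and dst is not None:
                moves.append((src, over, dst))
    return moves


def solve(pegs: int, moves: List[Move], dead: Set[int]) -> Optional[List[Move]]:
    """Depth-first search over board states stored as bitmasks (bit i = peg in hole i)."""
    if bin(pegs).count("1") == 1:
        return []  # one peg left: solved
    if pegs in dead:
        return None  # already explored without success
    dead.add(pegs)
    for src, over, dst in moves:
        has_src = (pegs >> src) & 1
        has_over = (pegs >> over) & 1
        has_dst = (pegs >> dst) & 1
        if has_src and has_over and not has_dst:
            # Jump: src and over lose their pegs, dst gains one.
            nxt = pegs ^ (1 << src) ^ (1 << over) ^ (1 << dst)
            rest = solve(nxt, moves, dead)
            if rest is not None:
                return [(src, over, dst)] + rest
    return None


def solve_triangle(rows: int, empty: int) -> Optional[List[Move]]:
    """Solve a full triangular board with a single empty hole at index `empty`."""
    n = rows * (rows + 1) // 2
    start = ((1 << n) - 1) & ~(1 << empty)
    return solve(start, triangle_moves(rows), set())


if __name__ == "__main__":
    solution = solve_triangle(5, 0)
    print(len(solution), "moves:", solution)
```

Each jump removes exactly one peg, so any solution from 14 pegs has 13 moves, which is why every triangle5 line in the log reports 13. The `dead` set is what keeps the search fast: a state that failed once is never expanded again.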

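This first run was underwhelming, though: training accuracy peaked at 74.6%, but the model solved only 7.4% of games on average at temperature 0.5 and 2.9% with greedy play, and it scored 0.0% on english, triangle5 and triangle6 in both settings.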
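
One thing worth noting from the training log is the LR column: it returns to 0.001000 at epochs 20, 60, 140 and 300, i.e. cycles of 20, 40, 80 and 160 epochs. That is the pattern produced by cosine annealing with warm restarts. The sketch below reproduces the logged values under three assumptions inferred from the log rather than read from the script: a first cycle of 20 epochs, the cycle length doubling each time, and an LR floor of 1e-6. The function name is made up. In PyTorch this would correspond to CosineAnnealingWarmRestarts with T_0=20, T_mult=2, eta_min=1e-6, but the log does not say which scheduler was actually used.

```python
import math


def cosine_warm_restart_lr(
    epoch: int,
    base_lr: float = 1e-3,   # from --lr 0.001
    eta_min: float = 1e-6,   # assumption: inferred from the logged minima
    t_0: int = 20,           # assumption: first restart appears at epoch 20
    t_mult: int = 2,         # assumption: cycles of 20, 40, 80, 160 epochs
) -> float:
    """LR after `epoch` completed epochs under cosine annealing with warm restarts."""
    t_cur, t_i = epoch, t_0
    # Walk through the completed cycles to find the position in the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


if __name__ == "__main__":
    for epoch in (1, 10, 20, 50, 130, 290, 300, 400):
        print(f"Epoch {epoch:3d} | LR: {cosine_warm_restart_lr(epoch):.6f}")
```

If that reading is right, the run stops at epoch 400 with LR 0.000778, partway through the 160-epoch cycle that began at epoch 300, rather than at a low-LR point of the schedule.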

Round 2

So we ran it all again, this time passing the data flag to the Python script: