A language model built from scratch — no pretrained weights, no transformers.
Architecture: Time Mix + Token Shift + GroupNorm + Channel Mix + Squared ReLU
TERA V2 by Vedaco • ~929K parameters • Trained from scratch
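The architecture line names Token Shift, Channel Mix and Squared ReLU without showing how they fit together. The sketch below is an illustration, not the TERA V2 implementation. It assumes these components follow the RWKV-style channel-mix formulation: each token is linearly interpolated with the previous token (token shift), passed through a squared-ReLU feed-forward path, and gated by a sigmoid "receptance". The parameter names (mix_k, mix_r, W_k, W_v, W_r) and the hidden size are hypothetical. Time Mix and GroupNorm are not shown because the source gives no details about them.

```python
import numpy as np

def token_shift(x):
    """Shift a (T, C) sequence one step back in time.
    Row t receives x[t-1]; row 0 receives zeros (no previous token)."""
    shifted = np.zeros_like(x)
    shifted[1:] = x[:-1]
    return shifted

def squared_relu(z):
    """Squared ReLU activation: max(z, 0) ** 2."""
    return np.square(np.maximum(z, 0.0))

def channel_mix(x, mix_k, mix_r, W_k, W_v, W_r):
    """RWKV-style channel mix (assumed form, hypothetical parameter names).
    x: (T, C) input sequence
    mix_k, mix_r: (C,) per-channel interpolation weights in [0, 1]
    W_k: (C, H), W_v: (H, C), W_r: (C, C) projection matrices"""
    prev = token_shift(x)
    # Token shift: blend each token with its predecessor, per channel.
    xk = x * mix_k + prev * (1.0 - mix_k)
    xr = x * mix_r + prev * (1.0 - mix_r)
    # Feed-forward path with squared ReLU.
    k = squared_relu(xk @ W_k)
    # Sigmoid receptance gate controls how much of the value passes through.
    r = 1.0 / (1.0 + np.exp(-(xr @ W_r)))
    return r * (k @ W_v)

# Usage with small random weights (sizes are illustrative only).
rng = np.random.default_rng(0)
T, C, H = 5, 8, 32
x = rng.standard_normal((T, C))
out = channel_mix(
    x,
    np.full(C, 0.5),
    np.full(C, 0.5),
    rng.standard_normal((C, H)) * 0.1,
    rng.standard_normal((H, C)) * 0.1,
    rng.standard_normal((C, C)) * 0.1,
)
print(out.shape)  # (5, 8)
```

Because token shift only looks one step back, each position can be computed from the current and previous token alone. In RWKV-style models, this is what lets the channel-mix path run as a recurrence at inference time instead of attending over the whole context.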