Efficient linear recurrences on device
Post date: January 14, 2025 | Post author: Gating | Post categories: ai-research, custom-kernel, deep-learning, efficient-training, hawk-model, rg-lru-layer, scalable-ai, tpu-optimization

Efficient Training: Scaling Griffin Models for Large-Scale AI on TPUs
Post date: January 14, 2025 | Post author: Gating | Post categories: ai-model-scaling, ai-research, deep-learning, efficient-training, griffin-model, model-parallelism, scalable-ai, tpu-optimization