- Strategic LLM Training: Multi-Token Prediction’s Data Efficiency in Mathematical Reasoning (July 23, 2025, by Cosmological thinking: time, space and universal causation). Categories: ai-evaluation, ai-optimization, llm-performance, llm-training, multi-token-llm, multi-token-prediction, natural-language-math, transformer-models
- Exploring Alternative Architectures for Multi-Token LLM Prediction (July 20, 2025, by Cosmological thinking: time, space and universal causation). Categories: computational-viability, large-scale-training, linear-heads, llm-architecture, multi-token-prediction, neural-network-design, replicated-unembeddings, transformer-models
- Unlocking Generative Power: Multi-Token Prediction for Next-Gen LLMs (July 19, 2025, by Cosmological thinking: time, space and universal causation). Categories: code-generation, generative-ai, inference-speed, llm-efficiency, multi-token-prediction, next-token-prediction, reasoning-tasks, transformer-models
- How Idefics2 Answers the Unasked Questions in Vision-Language Modeling (July 15, 2025, by Pierluigi Vinciguerra). Categories: ai-model-designs, efficient-ai, idefics-2, ml-benchmarks, multimodal-ai, open-source-ai, transformer-models, vision-language-models
- GPT-2 Architecture and Training Details: Parameters & Cross-Entropy Loss (June 24, 2025, by Reinforcement Technology Advancements). Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
- Theoretical Derivations: Cross-Entropy Loss and Energy Functions in LLMs (June 24, 2025, by Reinforcement Technology Advancements). Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
- LogSumExp Function Properties: Lemmas for Energy Functions (June 24, 2025, by Reinforcement Technology Advancements). Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
- Transformer Performance: Hopfield Theory & Cross-Entropy Loss Data (June 24, 2025, by Reinforcement Technology Advancements). Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
- New Regularization-Free Energy Function for Transformer Analysis (June 22, 2025, by Reinforcement Technology Advancements). Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
- Validating Theoretical Loss Bound: Vanilla Transformer Experiments (June 22, 2025, by Reinforcement Technology Advancements). Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
- The Impact of Data Size on Transformer Training: Overfitting & Loss Dynamics (June 21, 2025, by Reinforcement Technology Advancements). Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
- Empirical Results: GPT-2 Analysis of Transformer Memorization & Loss (June 21, 2025, by Reinforcement Technology Advancements). Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
- Related Work: Scaling Laws and Hopfield Models in LLM Research (June 18, 2025, by Reinforcement Technology Advancements). Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
- Theoretical Framework: Transformer Memorization & Performance Dynamics (June 18, 2025, by Reinforcement Technology Advancements). Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
- Experiments (April 8, 2025, by Machine Ethics). Categories: computational-efficiency, computer-vision-(cv), early-bird-ticket-hypothesis, language-models, model-optimization, natural-language-processing, transformer-models, vision-transformers
- How We Found Early-Bird Subnetworks in Transformers Without Retraining Everything (April 8, 2025, by Machine Ethics). Categories: computational-efficiency, computer-vision-(cv), early-bird-ticket-hypothesis, language-models, model-optimization, natural-language-processing, transformer-models, vision-transformers
- Transformer Training Optimization via Early-Bird Ticket Analysis (April 8, 2025, by Machine Ethics). Categories: computational-efficiency, computer-vision-(cv), early-bird-ticket-hypothesis, language-models, model-optimization, natural-language-processing, transformer-models, vision-transformers