Posts by Writings, Papers and Blogs on Text Models.

Mixture-of-Depths posts (February 22, 2025). Shared categories: artificial-intelligence, compute-allocation, conditional-computation, dynamic-token-level-routing, mixture-of-depths, multi-head-attention, static-computation-graphs, what-is-flops
- AI Models Are Learning to Prioritize Their Thoughts—And It’s Wildly Effective
- What If AI Could Skip the Boring Parts? Google Researchers Just Made It Happen
- This Clever AI Hack Could Cut Processing Costs in Half
- New AI Method Lets Models Decide What to Think About (tagged mixture-of-depths-(mod) in place of mixture-of-depths)
- Google Researchers Develop New AI Tech That Doesn’t Waste Brainpower on Useless Words (tagged hackernoon-top-story and mixture-of-depths-(mod) in place of mixture-of-depths and what-is-flops)
PagedAttention and vLLM Explained: What Are They? (January 4, 2025). Categories: attention-algorithm, copy-on-write, decoding-algorithm, llm-serving-system, llms, pagedattention, virtual-memory, vllm
General Model Serving Systems and Memory Optimizations Explained (January 4, 2025). Categories: alpa-serve, general-model-serving, gpu-kernel, llms, memory-optimization, orca, transformers, vllm
Applying the Virtual Memory and Paging Technique: A Discussion (January 4, 2025). Categories: gpu-kernels, gpu-memory, gpu-workload, kv-cache, llms, paging-technique, virtual-memory, vllm
Evaluating vLLM’s Design Choices With Ablation Experiments (January 4, 2025). Categories: evaluating-vllm, GPU, llms, microbenchmark, pagedattention, sharegpt, vllm, vllm-design
How We Implemented a Chatbot Into Our LLM (January 4, 2025). Categories: chatbot-implementation, chatbots, llms, opt-13b, orca, pagedattention, sharegpt, vllm
How Effective is vLLM When a Prefix Is Thrown Into the Mix? (January 4, 2025). Categories: llama-13b, llms, multilingual-llm, orca, prefix, vllm, vllm-effectiveness, woosuk-kwon
How Good Is PagedAttention at Memory Sharing? (December 31, 2024). Categories: beam-sharing, llms, memory-sharing, orca, orca-baselines, pagedattention, parallel-sampling, parallel-sequences
Mixtral 8x7B series (October 18, 2024). Shared categories: ai-benchmarks, direct-preference-optimization, gpt-3.5-benchmark-analysis, mixtral-8x7b, multilingual-language-models, open-source-language-models, sparse-mixture-of-experts, transformer-architecture
- How Mixtral 8x7B Sets New Standards in Open-Source AI with Innovative Design
- Routing Analysis Reveals Expert Selection Patterns in Mixtral
- How Instruction Fine-Tuning Elevates Mixtral-Instruct Above Competitors
- Mixtral’s Multilingual Benchmarks, Long Range Performance, and Bias Benchmarks
- Mixtral Outperforms Llama and GPT-3.5 Across Multiple Benchmarks
- Understanding the Mixture of Experts Layer in Mixtral
- Mixtral—a Multilingual Language Model Trained with a Context Size of 32k Tokens (tagged hackernoon-top-story in place of ai-benchmarks)
Apparate: Early-Exit Models for ML Latency and Throughput Optimization series (October 2–3, 2024). Shared categories: adaptive-machine-learning, apparate-system, early-exit-models, efficient-neural-networks, latency-reduction, ml-inference-optimization, real-time-ai-processing, throughput-optimization
- Conclusion, References (October 3, 2024)
- Additional Related Work (October 3, 2024)
- Microbenchmarks (October 2, 2024)
- Comparisons (October 2, 2024)
- Overall Results (October 2, 2024)
- Evaluation and Methodology (October 2, 2024)
- Implementation (October 2, 2024)
- Latency-Focused Adjustments (October 2, 2024)
- Accurate Threshold Tuning (October 2, 2024)
- Preparing Models (October 2, 2024)
- Design (October 2, 2024)
- Challenges (October 2, 2024)
- Early-Exit Models (October 2, 2024)
- Background and Platforms (October 2, 2024)
- Abstract and Introduction (October 2, 2024)
NEO-KD series (September 30, 2024). Shared categories: adversarial-robustness, adversarial-test-accuracy, knowledge-distillation, multi-exit-neural-networks, neo-kd, neural-network-robustness, neural-network-security, neural-networks
- Comparison with SKD and ARD and Implementations of Stronger Attacker Algorithms
- Evaluating NEO-KD Against Single-Exit Defense Methods in Multi-Exit Networks
- Examining the Adversarial Test Accuracy of Later Exits in NEO-KD Networks
- The Impact of Hyperparameters on Adversarial Training Performance
- Clean Test Accuracy and Adversarial Training via Average Attack
- Fine-Tuning NEO-KD for Robust Multi-Exit Networks
- How NEO-KD Reduces Adversarial Transferability and Improves Accuracy
- How Ensemble Strategies Impact Adversarial Robustness in Multi-Exit Networks
- How NEO-KD Saves Up to 81% of Computing Power While Maximizing Adversarial Accuracy
OPRO and prompt-optimization series (September 24–25, 2024). Shared categories: ai, big-bench-hard-tasks, derivative-free-optimization, llm-optimization, llms-for-prompt-engineering, opro-algorithm, prompt-engineering, prompt-optimization-techniques
- Comparative Analysis of Prompt Optimization on BBH Tasks (September 25, 2024)
- Prompt Optimization Curves on BBH Tasks (September 25, 2024)
- Large Language Models as Optimizers: Meta-Prompt for Prompt Optimization (September 25, 2024)
- Large Language Models as Optimizers: Meta-Prompt for Math Optimization (September 25, 2024)
- Common Pitfalls in LLM Optimization (September 25, 2024)
- Optimizing Scoring Models: Effective Prompting Formats (September 25, 2024)
- Optimizing Prompts with LLMs: Key Findings and Future Directions (September 25, 2024)
- Everything We Know About Prompt Optimization Today (September 24, 2024; tagged hackernoon-top-story in place of prompt-optimization-techniques)
- How Overfitting Affects Prompt Optimization (September 24, 2024)
- Key Takeaways from Our Ablation Studies on LLMs (September 24, 2024)
- Better Instructions, Better Results: A Look at Prompt Optimization (September 24, 2024)
- How OPRO Elevates LLM Accuracy in GSM8K and BBH Benchmarks (September 24, 2024)
- How Meta-Prompt Design Boosts LLM Performance (September 24, 2024)
- How OPRO Improves Task Accuracy in Prompt Optimization (September 24, 2024)
- LLMs vs. Heuristics: Tackling the Traveling Salesman Problem (TSP) (September 24, 2024)
- Case Studies in Mathematical Optimization Using LLMs (September 24, 2024)