Flash MLA curated references


This content originally appeared on DEV Community and was authored by Andy

Flash MLA Official GitHub Repo: FlashMLA - deepseek-ai - GitHub (https://github.com/deepseek-ai/FlashMLA)

DeepSeek Official Announcement of Flash MLA on X: post ID 1893836827574030466 (quoted in full under Day 1)

Hacker News Discussion: DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs | Hacker News

DeepSeek Open Source Week series

Day 1: Flash MLA

🚀 Day 1 of #OpenSourceWeek: FlashMLA

Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.

✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800

🔗 Explore on GitHub: https://github.com/deepseek-ai/FlashMLA
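
To make the "paged KV cache (block size 64)" item concrete, here is a minimal sketch of the bookkeeping a paged cache implies for variable-length sequences. It is not FlashMLA's API or implementation: the function names (`blocks_needed`, `build_block_table`, `locate`) and the simple sequential allocator are assumptions chosen for illustration. Only the block size of 64 comes from the announcement.

```python
BLOCK_SIZE = 64  # tokens per KV-cache block, as stated in the announcement


def blocks_needed(seq_len, block_size=BLOCK_SIZE):
    """Number of fixed-size blocks required to hold seq_len tokens."""
    return (seq_len + block_size - 1) // block_size


def build_block_table(seq_lens, block_size=BLOCK_SIZE):
    """Give each sequence a list of physical block ids from one shared pool.

    Hypothetical allocator: blocks are handed out in order, so sequences of
    different lengths share the pool without padding to a common length.
    """
    table = []
    next_free = 0
    for n in seq_lens:
        k = blocks_needed(n, block_size)
        table.append(list(range(next_free, next_free + k)))
        next_free += k
    return table


def locate(table, seq_idx, token_pos, block_size=BLOCK_SIZE):
    """Map a logical token position to (physical block id, offset in block)."""
    return table[seq_idx][token_pos // block_size], token_pos % block_size


seq_lens = [100, 64, 1]              # three sequences of different lengths
table = build_block_table(seq_lens)  # [[0, 1], [2], [3]]
print(table)
print(locate(table, 0, 70))          # token 70 of sequence 0 -> (1, 6)
```

The point of the indirection is that a kernel reads keys and values through the block table, so a 100-token sequence and a 1-token sequence each waste less than one block of memory.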


Day 2: DeepEP

🚀 Day 2 of #OpenSourceWeek: DeepEP

Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.

✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅ High-throughput kernels for training and inference prefilling
✅ Low-latency kernels for inference decoding
✅ Native FP8 dispatch support
✅ Flexible GPU resource control for computation-communication overlapping

🔗 GitHub: https://github.com/deepseek-ai/DeepEP
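
For readers new to expert parallelism (EP), the all-to-all pattern mentioned above can be sketched as a single-process simulation. This does not use DeepEP or any real communication library: `dispatch` and `combine` are hypothetical names, routing is simplified to one expert per token, and experts are assumed to be laid out contiguously across ranks.

```python
def dispatch(tokens_per_rank, expert_of, experts_per_rank):
    """Simulated all-to-all: send each token to the rank hosting its expert.

    tokens_per_rank[r] lists the token ids held by rank r.
    expert_of[token] is the expert picked by the router (top-1 here).
    Returns inbox, where inbox[r] holds (source_rank, token) pairs.
    """
    num_ranks = len(tokens_per_rank)
    inbox = [[] for _ in range(num_ranks)]
    for src, tokens in enumerate(tokens_per_rank):
        for tok in tokens:
            dst = expert_of[tok] // experts_per_rank  # rank that owns the expert
            inbox[dst].append((src, tok))
    return inbox


def combine(inbox, num_ranks):
    """Reverse all-to-all: return each processed token to its source rank."""
    outbox = [[] for _ in range(num_ranks)]
    for received in inbox:
        for src, tok in received:
            outbox[src].append(tok)
    return [sorted(tokens) for tokens in outbox]


# Two ranks, two experts per rank: experts 0-1 on rank 0, experts 2-3 on rank 1.
tokens_per_rank = [[0, 1], [2, 3]]
expert_of = {0: 3, 1: 0, 2: 1, 3: 2}

inbox = dispatch(tokens_per_rank, expert_of, experts_per_rank=2)
returned = combine(inbox, num_ranks=2)
print(inbox)     # [[(0, 1), (1, 2)], [(0, 0), (1, 3)]]
print(returned)  # [[0, 1], [2, 3]]
```

In a real deployment the two loops become network transfers over NVLink or RDMA, which is why the library's throughput and latency matter.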


Day 3: DeepGEMM

🚀 Day 3 of #OpenSourceWeek: DeepGEMM

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts

🔗 GitHub: https://github.com/deepseek-ai/DeepGEMM


Day 4: Optimized Parallelism Strategies

🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies

✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
🔗 https://github.com/deepseek-ai/DualPipe

✅ EPLB - an expert-parallel load balancer for V3/R1.
🔗 https://github.com/deepseek-ai/eplb

📊 Analyze computation-communication overlap in V3/R1.
🔗 https://github.com/deepseek-ai/profile-data
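
As a rough illustration of what an expert-parallel load balancer has to solve, the sketch below places experts on GPUs so that per-GPU load stays even. It is a generic greedy heuristic (heaviest expert first, onto the least-loaded GPU) and is not the algorithm in the eplb repository. The function name `balance_experts` and the load numbers are made up for the example.

```python
import heapq


def balance_experts(expert_loads, num_gpus):
    """Greedy placement: heaviest expert goes to the least-loaded GPU.

    expert_loads[e] is an assumed per-expert load estimate (e.g. token count).
    Returns (placement, totals) keyed by GPU index.
    """
    heap = [(0.0, gpu) for gpu in range(num_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}
    order = sorted(range(len(expert_loads)), key=lambda e: -expert_loads[e])
    for expert in order:
        load, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (load + expert_loads[expert], gpu))
    totals = {
        gpu: sum(expert_loads[e] for e in experts)
        for gpu, experts in placement.items()
    }
    return placement, totals


loads = [90, 10, 40, 60, 30, 70]  # hypothetical loads for six experts
placement, totals = balance_experts(loads, num_gpus=2)
print(placement)  # {0: [0, 2, 4], 1: [5, 3, 1]}
print(totals)     # {0: 160, 1: 140}
```

A naive split of experts 0-2 and 3-5 would give loads of 140 and 160 as well in this toy case, but with skewed loads the greedy order typically narrows the gap considerably compared with a fixed layout.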





Andy | Sciencx (2025-02-27T14:55:58+00:00) Flash MLA curated references. Retrieved from https://www.scien.cx/2025/02/27/flash-mla-curated-references/
