This content originally appeared on DEV Community and was authored by Andy
FlashMLA official GitHub repo: FlashMLA - deepseek-ai - GitHub
DeepSeek's official announcement of FlashMLA on X: https://x.com/deepseek_ai/status/1893836827574030466
Hacker News Discussion: DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs | Hacker News
DeepSeek Open Source Week series
Day 1: FlashMLA
🚀 Day 1 of #OpenSourceWeek: FlashMLA
Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.
✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800
🔗 Explore on GitHub: https://github.com/deepseek-ai/FlashMLA
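For orientation, here is a usage sketch adapted from the FlashMLA repository's README: `get_mla_metadata` precomputes a tile-scheduling plan from the per-request cache lengths, and `flash_mla_with_kvcache` runs MLA decode attention against the paged KV cache. The tensor inputs (`q_i`, `kvcache_i`, `block_table`, `cache_seqlens`) come from the serving engine and are only described in comments here, and the scalar values are illustrative assumptions; check the repo for the exact, current signatures.

```python
# Usage sketch adapted from the FlashMLA README (requires a Hopper GPU and the
# flash_mla package built from the repo). Exact signatures may change upstream.
from flash_mla import get_mla_metadata, flash_mla_with_kvcache

# Illustrative decode-time parameters (assumed values, not prescriptive):
# s_q = query tokens per decoding step, h_q / h_kv = query / KV head counts,
# dv = value head dimension.
s_q, h_q, h_kv, dv = 1, 128, 1, 512

# cache_seqlens: int32 tensor of per-request KV-cache lengths (variable-length
# sequences are the case FlashMLA is optimized for).
tile_scheduler_metadata, num_splits = get_mla_metadata(
    cache_seqlens, s_q * h_q // h_kv, h_kv
)

for i in range(num_layers):
    # q_i: queries for layer i; kvcache_i: paged KV cache (block size 64);
    # block_table maps each request to its cache blocks.
    o_i, lse_i = flash_mla_with_kvcache(
        q_i, kvcache_i, block_table, cache_seqlens, dv,
        tile_scheduler_metadata, num_splits, causal=True,
    )
```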
Day 2: DeepEP
🚀 Day 2 of #OpenSourceWeek: DeepEP
Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.
✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅ High-throughput kernels for training and inference prefilling
✅ Low-latency kernels for inference decoding
✅ Native FP8 dispatch support
✅ Flexible GPU resource control for computation-communication overlapping
🔗 GitHub: https://github.com/deepseek-ai/DeepEP
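DeepEP's own Python API is not reproduced here; instead, the sketch below illustrates with plain `torch.distributed` the dispatch → local experts → combine pattern that DeepEP's NVLink/RDMA all-to-all kernels (and FP8 dispatch) accelerate. The function name `ep_dispatch_combine`, the top-1 routing assumption, and `expert_fn` are invented for the illustration.

```python
# Conceptual sketch (NOT DeepEP's API): the expert-parallel dispatch/combine
# pattern that DeepEP optimizes, written with plain torch.distributed.
import torch
import torch.distributed as dist

def ep_dispatch_combine(tokens, dest_rank, world_size, expert_fn):
    """tokens: [n, d] hidden states; dest_rank: [n] rank owning each token's expert."""
    d = tokens.size(1)
    order = torch.argsort(dest_rank)            # group tokens by destination rank
    send = tokens[order].contiguous()
    send_counts = torch.bincount(dest_rank, minlength=world_size)

    # 1) exchange per-rank token counts so each peer knows how much to receive
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)

    # 2) dispatch: all-to-all on the token payload
    recv = send.new_empty(int(recv_counts.sum()), d)
    dist.all_to_all_single(
        recv, send,
        output_split_sizes=recv_counts.tolist(),
        input_split_sizes=send_counts.tolist(),
    )

    # 3) run the local experts, then combine by reversing the exchange
    out_local = expert_fn(recv)
    combined = send.new_empty(send.size())
    dist.all_to_all_single(
        combined, out_local,
        output_split_sizes=send_counts.tolist(),
        input_split_sizes=recv_counts.tolist(),
    )

    # undo the initial sort so outputs line up with the original token order
    out = torch.empty_like(combined)
    out[order] = combined
    return out
```

In a real MoE layer this exchange happens twice per layer and sits on the critical path, which is why DeepEP replaces the generic collective with dedicated intranode (NVLink) and internode (RDMA) kernels plus low-latency decode variants.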
Day 3: DeepGEMM
🚀 Day 3 of #OpenSourceWeek: DeepGEMM
Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts
🔗 GitHub: https://github.com/deepseek-ai/DeepGEMM
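To make the "FP8 GEMM" idea concrete, here is a plain-PyTorch reference of one plausible per-128-element block-scaling scheme; it emulates in software what DeepGEMM's JIT-compiled Hopper kernels fuse on-chip, and it does not use DeepGEMM's API. `quantize_fp8_blockwise` and `fp8_gemm_reference` are illustrative names, and the scaling granularity is an assumption for the example.

```python
# Reference sketch (not DeepGEMM's API): what a block-scaled FP8 GEMM computes,
# emulated in plain PyTorch (requires a PyTorch build with float8_e4m3fn).
import torch

FP8_MAX = 448.0  # max representable magnitude of float8_e4m3fn

def quantize_fp8_blockwise(x, block=128):
    """Per-(row, 128-column) scaling: returns FP8 payload + float32 scales."""
    m, k = x.shape
    xb = x.view(m, k // block, block)
    scales = xb.abs().amax(dim=-1, keepdim=True).clamp(min=1e-4) / FP8_MAX
    q = (xb / scales).to(torch.float8_e4m3fn)
    return q.view(m, k), scales.view(m, k // block)

def fp8_gemm_reference(a, b, block=128):
    """C = A @ B.T with both operands quantized block-wise to FP8."""
    qa, sa = quantize_fp8_blockwise(a, block)
    qb, sb = quantize_fp8_blockwise(b, block)
    # dequantize per scale block, then matmul (real kernels fuse this on-chip)
    da = (qa.to(torch.float32).view(*sa.shape, block) * sa.unsqueeze(-1)).view_as(a)
    db = (qb.to(torch.float32).view(*sb.shape, block) * sb.unsqueeze(-1)).view_as(b)
    return (da @ db.T).to(torch.bfloat16)

a = torch.randn(256, 7168)
b = torch.randn(512, 7168)
c = fp8_gemm_reference(a, b)   # [256, 512] bfloat16 output
```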
Day 4: Optimized Parallelism Strategies
🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies
✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
🔗 https://github.com/deepseek-ai/DualPipe
✅ EPLB - an expert-parallel load balancer for V3/R1.
🔗 https://github.com/deepseek-ai/eplb
📊 Analyze computation-communication overlap in V3/R1.
🔗 https://github.com/deepseek-ai/profile-data
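As a rough illustration of the problem EPLB addresses (not its actual algorithm), the sketch below replicates the hottest experts and greedily packs replicas onto GPUs so per-GPU load evens out; `balance_experts` and the example loads are invented for the illustration.

```python
# Illustrative sketch of expert-parallel load balancing (not EPLB's algorithm).
import heapq

def balance_experts(expert_load, num_gpus, total_slots):
    """expert_load: measured token counts per expert.
    total_slots: number of expert replicas to place (>= number of experts)."""
    # Phase 1: give extra replicas to whichever expert currently has the
    # largest per-replica load.
    counts = [1] * len(expert_load)
    for _ in range(total_slots - len(expert_load)):
        e = max(range(len(expert_load)), key=lambda i: expert_load[i] / counts[i])
        counts[e] += 1

    # Phase 2: longest-processing-time greedy packing of replicas onto GPUs.
    replicas = []
    for e, c in enumerate(counts):
        replicas += [(expert_load[e] / c, e)] * c
    replicas.sort(reverse=True)

    gpu_heap = [(0.0, g, []) for g in range(num_gpus)]  # (load, gpu_id, experts)
    heapq.heapify(gpu_heap)
    for load, e in replicas:
        total, g, placed = heapq.heappop(gpu_heap)
        heapq.heappush(gpu_heap, (total + load, g, placed + [e]))
    return sorted(gpu_heap, key=lambda t: t[1])

# Example: 8 experts with skewed load, 16 replica slots across 4 GPUs
print(balance_experts([900, 500, 300, 120, 80, 50, 30, 20], num_gpus=4, total_slots=16))
```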
