MIT Sparse Attention Cuts Transformer Memory 80%
3 articles about 'LLM efficiency'
MIT researchers introduce a sparse attention mechanism that slashes Transformer memory usage by 80% while preserving mod…
Japan-based Sakana AI develops evolutionary algorithms to merge existing LLMs, creating powerful new models without expe…
Microsoft Research introduces BitNet b2, pushing extreme quantization to slash LLM memory and compute costs while preser…