δ-mem: Cutting LLM Memory Costs by 90%
4 articles about 'Memory Optimization'
New δ-mem framework slashes GPU memory usage for LLMs by 90%, enabling efficient online inference on consumer hardware.
MIT researchers introduce a sparse attention mechanism that slashes Transformer memory usage by 80% while preserving mod…
UC Berkeley researchers unveil a novel attention mechanism that dramatically reduces memory consumption in Transformer m…
Seoul National University team develops novel memory optimization techniques enabling large AI model training on consume…