δ-mem: Cutting LLM Memory Costs by 90%
New δ-mem framework slashes GPU memory usage for LLMs by 90%, enabling efficient online inference on consumer hardware.
1 articles about 'efficiency'
New δ-mem framework slashes GPU memory usage for LLMs by 90%, enabling efficient online inference on consumer hardware.