ZAYA1-8B Matches DeepSeek-R1 on Math With Just 760M Active Params
5 articles about 'efficient ai'
A new 8B MoE model called ZAYA1-8B achieves DeepSeek-R1-level math performance while activating only 760M of its 8B para…
UC Berkeley researchers unveil a new Transformer architecture that cuts compute costs by up to 60% while maintaining ben…
Vietnam's VinAI Research publishes cutting-edge work on making Vision Transformers faster and lighter for real-world dep…
UC Berkeley researchers unveil a novel attention mechanism that dramatically reduces memory consumption in Transformer m…
Microsoft Research releases Phi-5, a small language model that rivals GPT-4 performance while running on consumer hardwa…