ZAYA1-8B Matches DeepSeek-R1 on Math With Just 760M Active Params
A new 8B MoE model called ZAYA1-8B achieves DeepSeek-R1-level math performance while activating only 760M of its 8B para…