ZAYA1-8B Matches DeepSeek-R1 on Math With Just 760M Active Params
A new 8B MoE model called ZAYA1-8B achieves DeepSeek-R1-level math performance while activating only 760M of its 8B para…
2 articles about 'DeepSeek-R1'
DeepSeek users report widespread service issues, raising questions about the Chinese AI startup's infrastructure reliabi…