JD.com Unveils xLLM Speculative Inference Architecture
JD.com reveals its xLLM speculative inference system at AICon Shanghai, promising order-of-magnitude speedups for LLM infer…
1 article about 'speculative-decoding'