AI Weekly: LLM Societies, Huawei's AI-Generated Compute Kernels, and Quantifying Creativity

📁 Research
💡 Import AI Issue 444 spotlights three topics: large language models forming virtual societies, Huawei using AI to auto-generate high-performance computing kernels, and ChipBench, a benchmark for evaluating AI chip design capabilities. Together they raise the question of how to quantify AI creativity.

Introduction: When AI Starts Building Its Own 'Society'

Import AI Issue 444 covers three frontier topics: LLM social simulation, Huawei's use of AI to generate high-performance computing kernels, and ChipBench, a benchmark built specifically for chip design. Together, these developments point to a central question: how do we quantify AI creativity? As large language models move beyond answering questions and begin collaborating, designing, and even constructing virtual societies, the boundaries of AI capability are being redefined.

LLM Societies: When Language Models Learn to 'Live in Groups'

Researchers have recently begun exploring a new direction: having multiple large language models (LLMs) form virtual societies and observing the collective behavioral patterns that emerge from their interactions.

The core idea behind this line of research is straightforward: place multiple LLM agents in a simulated environment, assign them different roles, goals, and memory systems, then observe how they spontaneously form social structures, establish norms, and engage in cooperation and competition. The results reported so far suggest that LLM societies can give rise to surprisingly complex social dynamics, including information dissemination, opinion evolution, and even cultural formation.
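The setup described above can be illustrated with a toy simulation. This is a minimal sketch, not the method of any particular study: the `Agent` class, its `speak` and `listen` methods, the `trust` weight, and the three example agents are all hypothetical, and a simple numeric "opinion" stands in for what a real system would obtain from a language model call.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy stand-in for an LLM agent: a role, a goal, an opinion, and a memory."""
    name: str
    role: str
    goal: str
    opinion: float                      # position on some issue, in [0, 1]
    memory: list = field(default_factory=list)

    def speak(self) -> str:
        # A real system would call a language model here (assumption);
        # this stub only reports the agent's current opinion.
        return f"{self.name} ({self.role}) holds opinion {self.opinion:.2f}"

    def listen(self, speaker: "Agent", trust: float = 0.3) -> None:
        # Store the message, then move part of the way toward the speaker.
        self.memory.append(speaker.speak())
        self.opinion += trust * (speaker.opinion - self.opinion)


def simulate(agents, rounds, seed=0):
    """Run random pairwise conversations; return the opinion spread after each round."""
    rng = random.Random(seed)
    spreads = []
    for _ in range(rounds):
        speaker, listener = rng.sample(agents, 2)
        listener.listen(speaker)
        opinions = [a.opinion for a in agents]
        spreads.append(max(opinions) - min(opinions))
    return spreads


# Hypothetical agents, invented for illustration only.
agents = [
    Agent("A", "farmer", "sell grain", 0.1),
    Agent("B", "merchant", "buy low", 0.5),
    Agent("C", "mayor", "keep the peace", 0.9),
]
spreads = simulate(agents, rounds=50)
```

Because each listener moves toward the speaker, the spread of opinions can only shrink over time. That is a crude analogue of the "opinion evolution" mentioned above, and each agent's `memory` records who told it what.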

The significance of this research extends far beyond academic curiosity. LLM social simulation provides an entirely new experimental platform for social science research, enabling researchers to test various social theory hypotheses in controlled environments. More importantly, understanding interaction patterns among multiple AI agents is crucial for building safe and reliable multi-agent systems in the future. As more and more AI agents are deployed in our world, how they will coordinate and whether they will produce unexpected collective behaviors are questions that urgently need answers.

Huawei Uses AI to Build Kernels: From 'Writing Code' to 'Tuning for Chips'

Huawei has taken a significant step forward in AI-assisted low-level software development. As covered in Import AI Issue 444, Huawei is using AI to automatically generate high-performance computing kernels, extending AI from application-level software development into hardware-specific, system-level optimization.

Computing kernels are the low-level routines, such as matrix multiplication, that do the heavy numerical work in high-performance computing, and they traditionally require experienced engineers to spend considerable time on manual optimization. Huawei's approach uses large language models and specialized AI tools to automatically generate kernel code optimized for specific hardware architectures. In certain scenarios, the performance reportedly approaches or even matches versions handwritten by human experts.
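One common way to structure automated kernel generation is a generate, verify, and benchmark loop. The sketch below illustrates that general pattern only and makes no claim about Huawei's actual pipeline. The candidate functions are written by hand to stand in for model output, and every name here (`reference`, `select_best`, the `candidate_*` functions) is hypothetical.

```python
import timeit


def reference(xs):
    """Slow but trusted implementation: sum of squares."""
    total = 0
    for x in xs:
        total += x * x
    return total


# Hypothetical candidates. In an AI-driven pipeline these would be produced
# by a model; here they are hand-written so the example is self-contained.
def candidate_genexpr(xs):
    return sum(x * x for x in xs)


def candidate_map(xs):
    return sum(map(lambda x: x * x, xs))


def candidate_wrong(xs):
    return sum(xs) ** 2  # plausible-looking but incorrect


def select_best(candidates, test_input, repeats=5):
    """Discard candidates that disagree with the reference, then time the rest."""
    expected = reference(test_input)
    timings = {}
    for fn in candidates:
        if fn(test_input) != expected:
            continue  # correctness gate comes before any timing
        timings[fn.__name__] = min(
            timeit.repeat(lambda: fn(test_input), number=20, repeat=repeats)
        )
    best = min(timings, key=timings.get)
    return best, timings


data = list(range(1, 2001))
best, timings = select_best(
    [candidate_genexpr, candidate_map, candidate_wrong], data
)
```

The key design choice is that correctness is checked before speed: a fast kernel that returns the wrong answer never reaches the timing step. Which correct candidate wins depends on the machine, so no expected winner is stated here.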

This advancement carries multiple strategic implications. First, it dramatically lowers the barrier and shortens the cycle for developing high-performance computing kernels, so that unlocking hardware performance no longer depends entirely on scarce expert talent. Second, in the current global chip competition landscape, Huawei's strategy of using AI to compensate for toolchain and ecosystem shortcomings deserves attention. When external supply is constrained, using AI to accelerate the completion of a proprietary technology stack is a pragmatic and creative path.

ChipBench: Scoring AI's Chip Design Capabilities

The launch of ChipBench provides a standardized measurement tool for evaluating AI's capabilities in chip design. As a benchmark specifically targeting chip design tasks, ChipBench covers multiple stages from RTL code generation and functional verification to performance optimization, aiming to systematically assess the practical performance of large language models in the semiconductor design workflow.
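To make the idea of stage-by-stage scoring concrete, here is a minimal sketch of how a code-generation benchmark can report results per stage. It uses the pass@k estimator that is widely used for code-generation benchmarks. Whether ChipBench uses this metric is an assumption, not something stated in the source, and the stage names and sample counts below are invented placeholders, not ChipBench results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes,
    given that c of n generated samples passed their checks."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so one must pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# Made-up (samples generated, samples passed) per stage, for illustration only.
samples = {
    "rtl_generation": (10, 6),
    "functional_verification": (10, 3),
    "performance_optimization": (10, 1),
}
report = {stage: pass_at_k(n, c, k=1) for stage, (n, c) in samples.items()}
```

Reporting one number per stage, rather than a single aggregate, is what lets a benchmark show where models are strong and where they fall short.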

Early test results reveal an interesting reality: while current LLMs perform reasonably well on simple code generation tasks, a significant gap remains between them and human experts on tasks involving complex architectural decisions and deep optimization. This finding provides the industry with a clear capability map — we should neither underestimate AI's potential in chip design nor blindly overestimate its current level.

ChipBench's value also lies in the deeper question it raises: Which stages of chip design are best suited for AI involvement? Is it repetitive verification work, or architectural innovation that requires creative thinking? This directly relates to the fundamental challenge of how to quantify creativity.

Deep Analysis: Can Creativity Be Quantified?

The intersection of these three topics touches on one of the most philosophically profound questions in current AI research — how to quantify creativity.

In LLM society experiments, do the emergent behaviors exhibited by models count as 'creation'? Do the optimization strategies in Huawei's AI-generated high-performance kernels contain genuine 'innovation'? The AI chip design capability that ChipBench attempts to measure is essentially an evaluation of a form of 'engineering creativity.'

Traditionally, creativity is considered to encompass three dimensions: novelty, usefulness, and surprise. Under this framework, AI already performs impressively on the usefulness dimension: the code it generates compiles, runs, and in some cases improves performance. In terms of novelty, AI occasionally produces solutions humans have not thought of, especially in areas like combinatorial optimization. However, on the dimension of surprise, which is the most difficult to quantify, AI's performance remains ambiguous.

The industry is currently experimenting with multiple quantification methods: relative scoring against human benchmarks, blind testing in a peer-review style, and statistical analysis of output diversity. But each method has its limitations, because the judgment of creativity itself is inherently subjective and context-dependent.
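Two of these methods reduce to simple arithmetic, sketched below under stated assumptions. `relative_score` divides a model's score by a human baseline on the same task, and `distinct_n` is one possible diversity statistic: the share of word n-grams across a set of outputs that are unique. Both function names are hypothetical, and neither is presented as a standard the industry has settled on.

```python
def relative_score(model_score: float, human_score: float) -> float:
    """Model score as a fraction of the human baseline on the same task."""
    if human_score <= 0:
        raise ValueError("human baseline must be positive")
    return model_score / human_score


def distinct_n(outputs, n: int = 2) -> float:
    """Fraction of word n-grams across all outputs that are unique.

    Values near 1.0 mean the outputs rarely repeat each other;
    values near 0.0 mean they are highly repetitive.
    """
    total = 0
    unique = set()
    for text in outputs:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


# Two short outputs sharing the bigram ("a", "b"): 3 unique bigrams out of 4.
diversity = distinct_n(["a b c", "a b d"], n=2)
```

Neither number captures surprise. A high diversity score only says the outputs differ from each other, not that any of them is a good or unexpected idea, which is the limitation noted above.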

Outlook: A Paradigm Shift from Tool to Partner

Taken together, the three major trends presented in Import AI Issue 444 trace a clear trajectory in AI development: AI is evolving from a tool for single tasks into a collaborative partner capable of participating in complex system design and social simulation.

Looking ahead, we can anticipate several key directions. First, multi-agent collaboration will become a mainstream paradigm, and research into LLM societies will directly drive the creation of more efficient AI collaboration frameworks. Second, AI-assisted hardware design will accelerate the closed loop of 'software-hardware co-evolution' — AI designs better chips, and better chips in turn support more powerful AI. Third, as benchmarks like ChipBench mature, the industry will establish more scientific AI capability evaluation systems.

Ultimately, the question of 'quantifying creativity' may never have a perfect answer, but the very act of asking it is driving us toward a deeper understanding of the nature of intelligence. In an era when the boundaries of AI capability are constantly expanding, this understanding is more important than ever.