
Doubao's Early Access to Civil Service Exam Scores Sparks Debate, Renewing Focus on AI Data Scraping Boundaries

📁 Industry · ⏱️ 8 min read
💡 ByteDance's AI product Doubao allegedly surfaced 2026 Shandong civil service exam scores before their official release. Authorities responded that a score query portal still under testing had been accessed by accident. The incident has triggered widespread public discussion about AI data scraping capabilities and data security boundaries.

Introduction: A Single Post Ignites Public Debate, AI Capabilities Back in the Spotlight

On the evening of April 23, a post claiming "Doubao found the 2026 Shandong civil service exam scores" went viral on social media. The accompanying screenshot showed a score report (with the scores redacted), and the post quickly drew attention and discussion from examinees and other netizens. How did Doubao, ByteDance's AI product, manage to obtain written exam scores before they were officially released? The questions this raises about AI data scraping capabilities and information security deserve serious reflection.

Meanwhile, the tech world received another major piece of news: DeepSeek V4 officially became the default model for OpenClaw, signaling the expanding influence of Chinese-developed large language models. With several hot topics converging, tech discourse at the end of April was exceptionally lively.

The Core Story: The Truth Behind Doubao's "Early Score Retrieval"

The incident originated from a user posting on social media, claiming to have retrieved 2026 Shandong provincial civil service exam scores through Doubao AI, specifically noting that "it seems only Jinan Huaiyin District scores are accessible." The post quickly went viral, with many users marveling at AI's powerful information-gathering capabilities. Some speculated that "the link was ready but not yet published, and was crawled by AI."

On April 24, reporters contacted the original poster and learned that the content had actually been reposted from another user whose original post had already been deleted. Reporters then called the registration policy hotline, where staff provided an official response.

According to the staff member: "Because we were scheduled to release the written exam scores today, staff were testing the score query portal last night, and a netizen accidentally accessed it." The representative further explained: "Once we discovered this, we promptly shut down the test portal, and this morning we officially released the score announcement to the public. No adverse effects were caused."

In other words, Doubao AI did not actively "crack" or "hack" the score system. Rather, a query portal that was briefly exposed during testing was reached either by a user directly or by the AI's web scraping function. Officials urged the public to be understanding and to avoid unnecessary speculation.

In-Depth Analysis: AI Crawler Capabilities Raise New Data Security Concerns

Although the "truth" behind the incident is not particularly complex, the issues it exposes deserve serious scrutiny across the entire industry.

First, AI search tools' information scraping capabilities far exceed those of traditional search engines. Current Chinese AI products such as Doubao, DeepSeek, and Kimi commonly feature real-time internet search and information aggregation. Their crawlers run at higher frequencies and cover broader ground, so they can discover and index newly appearing web pages in a very short time, even when a page is still in a testing state and has not been officially published. This incident was an unintentional demonstration of precisely this capability.

Second, government systems urgently need upgraded information security protections. The test portal had no access restrictions, so unpublished information could be reached from outside, which is a basic yet common technical oversight. Now that AI crawlers are ubiquitous, any link exposed on the public internet can be scraped and cached within minutes. When conducting functional testing, government systems should use intranet environments, IP whitelisting, or authentication measures to prevent information leaks at the source.
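To make the whitelisting and authentication measures mentioned above concrete, here is a minimal Python sketch of a gate that a test portal could apply before serving any page. It is an illustration under stated assumptions, not a description of how the Shandong system works: the network ranges, the token value, and the function name `is_request_allowed` are all hypothetical, and the sketch requires both checks to pass even though either one alone would already have narrowed the exposure.

```python
import hmac
import ipaddress
from typing import Optional

# Hypothetical values for illustration only; a real deployment would load
# these from configuration rather than hard-coding them.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.10.0/24"),
]
EXPECTED_TOKEN = "replace-with-a-secret-from-config"


def is_request_allowed(client_ip: str, token: Optional[str]) -> bool:
    """Allow a request only if it comes from an allowlisted network
    and also carries the expected access token."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: deny by default

    in_allowlist = any(addr in net for net in ALLOWED_NETWORKS)
    # compare_digest avoids leaking the token through timing differences.
    token_ok = token is not None and hmac.compare_digest(
        token.encode(), EXPECTED_TOKEN.encode()
    )
    return in_allowlist and token_ok
```

With a gate like this in front of the test portal, a request from a public address, or one without the token, would be refused before any score data is rendered, whether it comes from a curious user or from a crawler.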

Third, the boundaries of AI products' responsibility in presenting information remain unclear. When an AI tool directly presents information to users that has not been officially confirmed, should it include a disclaimer such as "this information has not yet been officially released"? If AI scrapes data involving personal privacy, such as the exam scores in this case, does the platform have an obligation to filter and anonymize it? These questions currently lack clear answers at the regulatory level.
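As a rough illustration of what a disclaimer plus anonymization step could look like, the Python sketch below masks number strings that resemble ID or exam ticket numbers and prepends a notice when the content is not yet official. Everything here is an assumption made for the example: the function name `prepare_snippet`, the `officially_released` flag (how a platform would actually determine it is exactly the open question), and the rule that any run of ten or more digits is treated as an identifier.

```python
import re

# Assumption: a run of 10 or more digits (optionally ending in X) looks like
# an ID number or exam ticket number and should not be shown in full.
ID_LIKE = re.compile(r"\d{10,}[Xx]?")
NOTICE = "[Notice] This information has not yet been officially released."


def _mask(match: "re.Match[str]") -> str:
    """Keep the first three characters and replace the rest with asterisks."""
    value = match.group()
    return value[:3] + "*" * (len(value) - 3)


def prepare_snippet(text: str, officially_released: bool) -> str:
    """Mask identifier-like numbers, and add a notice for unofficial content."""
    masked = ID_LIKE.sub(_mask, text)
    if officially_released:
        return masked
    return NOTICE + "\n" + masked
```

A pattern-based mask like this is deliberately crude. It would miss names and the scores themselves, so it shows only the shape of such a filter rather than a sufficient privacy control.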

Industry Developments: The Rise of DeepSeek V4 and Accelerating AI Ecosystem Evolution

Notably, within the same timeframe, DeepSeek V4 was selected as the default model for OpenClaw, further raising the standing of Chinese large language models in the international open-source ecosystem. From DeepSeek's explosive global debut at the beginning of the year to V4 now becoming the preferred choice on mainstream platforms, Chinese AI companies are rapidly closing the gap with international leaders, and even surpassing them in certain scenarios.

At the same time, OpenAI's CEO publicly apologized for failing to promptly report information related to a shooting suspect, once again thrusting AI ethics and social responsibility into the spotlight. Whether it concerns the boundaries of information scraping, the application of model capabilities, or the social responsibilities of AI companies, the entire industry continues to encounter new challenges and tough questions amid rapid development.

Outlook: Finding Balance Between Capability and Responsibility

Although the Doubao "early score retrieval" incident ultimately proved to be a misunderstanding caused by an exposed test portal, it served as a mirror reflecting both the fragility of information security in the AI era and the public's complex attitudes toward AI capabilities: people marvel at its power while worrying about its pervasiveness.

In the future, as AI search and intelligent agent technologies continue to evolve, similar incidents will very likely occur again. This demands that government and enterprise systems incorporate "AI visibility" into their security assessment frameworks during digital transformation, while also requiring AI product developers to establish more robust information filtering and ethical review mechanisms.
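One way to read "AI visibility" in a security assessment is as a simple question: what would an anonymous client, such as a crawler, receive from a pre-release URL? The Python sketch below classifies a single unauthenticated response that a reviewer has already collected. The function name `audit_exposure` and the wording of the findings are hypothetical. Note also that `X-Robots-Tag: noindex` is only an indexing hint honored by compliant crawlers, not an access control, so its presence does not make an open page safe.

```python
from typing import List, Mapping


def audit_exposure(status_code: int, headers: Mapping[str, str]) -> List[str]:
    """Return findings for one unauthenticated response from a pre-release URL."""
    if status_code != 200:
        # For example 401, 403, or 404: no page was served to an anonymous client.
        return []

    findings = ["content served without authentication"]
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    if "noindex" not in normalized.get("x-robots-tag", ""):
        findings.append("no X-Robots-Tag: noindex header")
    return findings
```

Running a check of this kind against every staging or test URL before a scheduled release would flag a portal like the one in this incident as reachable from outside while it was still being tested.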

Technology itself is neither good nor evil, but its application requires rules. As AI capabilities advance at breakneck speed, finding the balance between efficiency and security, openness and protection, will be a long-term challenge facing all practitioners in the field.