Study Finds: LLMs Prefer Resumes They Generated Themselves
When AI Becomes the Recruiter, Does It Favor Its Own "Work"?
According to a new study, the answer is yes. Large language models (LLMs) used as resume screeners consistently favor resumes they generated themselves and rank those written by humans or produced by other models lower. The finding is a warning for the growing use of AI in recruitment.
Core Finding: A "Self-Preference" Too Significant to Ignore
Researchers designed a series of controlled experiments that mixed authentic human-written resumes, resumes generated by the model doing the screening, and resumes generated by other models, then submitted them to multiple mainstream LLMs for screening and scoring. Nearly all tested models showed the same pattern: they preferred resumes they had generated themselves.
This preference was no fluke. It held steady regardless of the industry or seniority level the resumes covered. More notably, the bias did not arise because AI-generated resumes were objectively better: when human recruitment experts evaluated the resumes blind, they found no significant quality difference between AI-generated and human-written ones.
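One way to quantify such a preference is to compare the average score a judge gives its own resumes with the average it gives everything else. The sketch below is only an illustration under stated assumptions: the model names, the 0-10 scale, the scores, and the `self_preference_gap` helper are all invented here and are not the study's data or method.

```python
from statistics import mean

# Hypothetical scores on a 0-10 scale, invented for illustration only.
# Each record is (judge model, resume source, score).
scores = [
    ("model_a", "model_a", 8.5),
    ("model_a", "model_b", 7.0),
    ("model_a", "human", 7.0),
    ("model_b", "model_a", 7.0),
    ("model_b", "model_b", 8.0),
    ("model_b", "human", 6.0),
]


def self_preference_gap(records, judge):
    """Mean score a judge gives its own resumes minus the mean it gives all others."""
    own = [s for j, src, s in records if j == judge and src == judge]
    others = [s for j, src, s in records if j == judge and src != judge]
    if not own or not others:
        raise ValueError(f"need both own and other resumes for judge {judge!r}")
    return mean(own) - mean(others)


for judge in ("model_a", "model_b"):
    print(judge, self_preference_gap(scores, judge))
```

A positive gap on its own would not prove bias, since the judge's own resumes could simply be better. That is why the blind human evaluation matters: it serves as the control showing no significant quality difference.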
Where Does the Bias Come From? Style Recognition or Training Inertia?
In community discussions, many technical professionals offered insights into the root cause. A prevailing view holds that LLMs are essentially recognizing their own "stylistic fingerprint." Each model has specific phrasing habits, sentence structures, and information organization patterns when generating text. When the same model acts as the evaluator, it naturally assigns higher scores to text that aligns with its own generative distribution.
It's akin to a writer unconsciously favoring works similar to their own style during anonymous peer review — except that LLMs exhibit this tendency in a far more systematic and consistent manner.
Others pointed out that this phenomenon may be linked to the RLHF (Reinforcement Learning from Human Feedback) training process. The standards for "good text" that are repeatedly reinforced during training happen to be the very type of text the model itself tends to generate, creating a self-reinforcing loop.
Far-Reaching Implications for AI Recruitment Practices
The real-world significance of this study should not be underestimated. An increasing number of companies are integrating LLMs into their hiring pipelines for initial resume screening, candidate ranking, and even interview evaluation. At the same time, job seekers are extensively using AI tools to polish or even directly generate their resumes. This creates a rather ironic scenario:
- Job seekers who use ChatGPT to generate resumes may rank higher in GPT-powered recruitment systems
- Candidates who use Claude to refine their resumes may gain an advantage in Claude-powered screening systems
- Meanwhile, candidates who insist on writing their resumes without AI assistance may actually be disadvantaged in AI screening
The unfairness introduced by this "model-tool matching" clearly runs counter to the merit-based principles that recruitment should uphold. The deeper issue is that if companies do not disclose which AI screening models they use, job seekers are thrust into a game of information asymmetry.
Community Reflection: Evaluation Benchmarks Themselves Need Scrutiny
This finding has also sparked broader reflection. Commentators noted that the self-preference problem in LLMs extends beyond resume screening and warrants equal vigilance in AI evaluation. Many current model evaluations adopt the "LLM-as-a-Judge" paradigm, where one LLM assesses the output quality of other models. If the judging model itself harbors self-preference, the objectivity of evaluation results becomes questionable.
This also partially explains why, in certain benchmarks, model rankings shift significantly depending on which model serves as the judge.
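A simple check for this kind of judge dependence is to rank the same models under two different judges and see how far each one moves. The following is a minimal sketch with made-up scores; the judge names, model names, and the `ranking` and `rank_shifts` helpers are hypothetical and do not come from any real benchmark.

```python
# Hypothetical benchmark scores, invented for illustration only.
# Outer key: judge model. Inner dict: evaluated model -> score from that judge.
judge_scores = {
    "judge_a": {"model_a": 9.0, "model_b": 7.5, "model_c": 7.0},
    "judge_b": {"model_a": 7.0, "model_b": 9.0, "model_c": 7.5},
}


def ranking(scores):
    """Return model names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def rank_shifts(scores_by_judge, judge_x, judge_y):
    """For each model, its rank position under judge_y minus its position under judge_x.

    Positions start at 0 for the top model, so a positive value means the
    model ranks lower under judge_y than under judge_x.
    """
    rank_x = ranking(scores_by_judge[judge_x])
    rank_y = ranking(scores_by_judge[judge_y])
    return {model: rank_y.index(model) - rank_x.index(model) for model in rank_x}


print(ranking(judge_scores["judge_a"]))
print(ranking(judge_scores["judge_b"]))
print(rank_shifts(judge_scores, "judge_a", "judge_b"))
```

In this toy data the top-ranked model changes when the judge changes, which is the pattern one would expect to see if judges favored output resembling their own.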
Looking Ahead: Transparency and Diversity Are Key
To address the self-preference problem in LLMs, the industry needs to pursue solutions on multiple fronts. First, companies deploying AI recruitment tools should maintain transparency by disclosing the models and screening logic they use. Second, multi-model cross-review mechanisms could be introduced, averaging scores from different models to offset any single model's bias. Finally, human review should not be eliminated entirely: at critical decision points in particular, human judgment remains irreplaceable.
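The cross-review idea can be sketched as a plain average over several judges. This is a minimal illustration, assuming every judge scores on the same scale; the judge names, the scores, and the `cross_review_score` helper are hypothetical, and a real system would also need to handle judges that use different scales.

```python
from statistics import mean


def cross_review_score(scores_by_judge):
    """Average one resume's scores across all judging models."""
    if not scores_by_judge:
        raise ValueError("at least one judge score is required")
    return mean(scores_by_judge.values())


# Hypothetical scores for a single resume, invented for illustration only.
# Suppose the resume was written with model_a, which rates it highest.
resume_scores = {"model_a": 9.0, "model_b": 7.0, "model_c": 7.0}

print("model_a alone:", resume_scores["model_a"])
print("three-judge average:", cross_review_score(resume_scores))
```

Averaging dilutes one judge's inflated score rather than removing it, so the more independent judges in the panel, the smaller the effect any single model's preference can have on the result.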
This study serves as yet another reminder: the "neutrality" of AI tools cannot be taken for granted. Before deploying LLMs in any scenario involving fairness, thoroughly understanding their inherent biases is a prerequisite for responsible deployment.
📌 Source: GogoAI News (www.gogoai.xin)
🔗 Original: https://www.gogoai.xin/article/study-finds-llms-prefer-their-own-generated-resumes