A recent study of more than 300,000 domains found that llms.txt, a file format intended to guide AI models on how a site's content may be used, has no consistent effect on AI-generated citations. Although the file was introduced as a way for website owners to signal whether their content should be used in AI training, the data suggest that its presence has little measurable impact on how AI systems cite or reuse content.
This finding highlights the complexities of AI content consumption and the challenges of controlling content usage through simple directives.
What Is llms.txt?
llms.txt is a proposed standard that lets web publishers communicate their preferences about AI usage of their content. Conceptually similar to robots.txt for search crawlers, it is a plain file served from the root of a site that can indicate whether content may be used for training, should be ignored, or should be handled in a specific way. The intent is to give publishers more control over their intellectual property in the age of AI.
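In practice the file's contents vary. The llmstxt.org draft, the most widely circulated version of the proposal, structures the file as a markdown index of a site's key pages, while some sites publish robots.txt-style usage directives instead. A minimal sketch following the draft, with placeholder names and links:

```
# Example Site
> One-line summary of what the site covers.

## Docs
- [Getting started](https://example.com/docs/start): installation and setup
- [API reference](https://example.com/docs/api): endpoint details

## Optional
- [Changelog](https://example.com/changelog): release history
```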
The Study
Researchers examined over 300,000 domains to determine whether including an llms.txt file affected AI citation behavior. They assessed:
- The presence of the file on websites
- The type of directives included (allow, disallow, or partial use)
- Patterns in AI citations of content from these domains
The goal was to measure whether llms.txt meaningfully influenced how AI systems reference content.
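The first of these checks is easy to reproduce independently. Below is a minimal Python sketch, assuming the file is served at the conventional /llms.txt path; the has_llms_txt helper and the domain list are illustrative, not the study's code.

```python
# Minimal sketch of a presence check, assuming the file lives at /llms.txt.
# The domain list is a placeholder, not the study's sample.
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def has_llms_txt(domain: str, timeout: float = 5.0) -> bool:
    """Return True if https://<domain>/llms.txt answers with HTTP 200."""
    req = Request(f"https://{domain}/llms.txt",
                  headers={"User-Agent": "llms-txt-survey/0.1"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (HTTPError, URLError, TimeoutError):
        return False

if __name__ == "__main__":
    for domain in ("example.com", "example.org"):
        print(domain, has_llms_txt(domain))
```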
Key Findings
The analysis revealed:
- No clear correlation between llms.txt directives and AI citation patterns.
- Many domains with explicit "disallow" statements still had content cited by AI.
- Domains without any llms.txt file showed citation behavior similar to those with one.
- Variability in AI usage suggests that current systems may not consistently respect llms.txt instructions.
These results indicate that while llms.txt is well-intentioned, its practical impact on AI behavior is limited under current conditions.
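The article does not report the study's statistical method, so the following is an illustration only: a chi-square test of independence between file presence and citation status, run on invented counts chosen to mimic a null result.

```python
# Illustrative 2x2 test of independence between llms.txt presence and
# whether a domain was cited by an AI system. The counts are invented
# placeholders; the study's actual figures and method are not given here.
from scipy.stats import chi2_contingency

#             cited   not cited
table = [[   1_200,      8_800],   # domains with llms.txt
         [  34_900,    255_100]]   # domains without llms.txt
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.3f}")
# A large p-value would be consistent with "no clear correlation".
```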
Implications for Website Owners
For web publishers concerned about AI usage:
- llms.txt may not prevent citation of your content or its use in AI training.
- Monitoring and legal safeguards remain important for protecting intellectual property (a log-scanning sketch follows this list).
- AI companies may need additional standards or enforcement mechanisms for consistent compliance.
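On the monitoring point, one low-effort option is scanning server access logs for crawlers that identify themselves. A minimal Python sketch, assuming a standard access-log format; the crawler names listed are examples of published user-agent tokens and should be verified against each vendor's documentation.

```python
# Count requests from self-identifying AI crawlers in a web access log.
# The user-agent substrings below are examples; check vendors' docs for
# their current identifiers.
import sys
from collections import Counter

AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

def count_ai_hits(log_path: str) -> Counter:
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            for bot in AI_CRAWLERS:
                if bot in line:
                    hits[bot] += 1
    return hits

if __name__ == "__main__":
    print(count_ai_hits(sys.argv[1]))  # e.g. python monitor.py access.log
```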
The study underscores that technological solutions alone may not fully address content control in AI ecosystems.
Why This Matters
As AI-generated content becomes more prevalent, website owners are increasingly attentive to how their content may be consumed, referenced, or reproduced. Tools like llms.txt aim to offer transparency and control, but this research shows that implementation and adoption challenges remain.
Without consistent adherence by AI systems, relying solely on llms.txt may not deliver the outcome publishers intend. Awareness and continued monitoring of AI usage practices are crucial.
Moving Forward
The study suggests several next steps for improving AI content usage policies:
- Broader adoption and standardization of llms.txt across websites
- Better compliance mechanisms for AI developers to respect publisher instructions (one existing reference point is sketched after this list)
- Ongoing research to measure and track AI citation behavior over time
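On the compliance point, robots.txt is a useful reference: it already has first-class tooling, including a parser in Python's standard library, while nothing comparable ships for llms.txt today. A minimal sketch with placeholder URLs and user agent:

```python
# Existing compliance primitive: consult robots.txt before fetching,
# using Python's standard-library parser. No equivalent stdlib parser
# exists for llms.txt at the time of writing.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the live file
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))
```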
By combining these approaches, publishers and AI developers can work toward more predictable and respectful content use.
Conclusion
While llms.txt offers a promising framework for expressing content usage preferences to AI systems, current research shows no clear impact on AI citations across a large sample of websites. Website owners should understand its limitations, continue to monitor AI usage of their content, and consider additional strategies to protect their intellectual property.
The study highlights the evolving challenges of content control in an AI-driven digital landscape and the need for both technological and policy solutions to ensure publisher rights are respected.
