Our Methodology
Transparent, auditable, and built on established legal frameworks. Here's exactly how SiteProof AI works — because if we ask you to be transparent about your AI, we should be transparent about ours.
How Our Scanner Works
Every scan follows a four-phase process designed for accuracy, transparency, and respect for your data.
Website Fetching
We retrieve the publicly accessible HTML of your website pages, respecting standard web protocols and your site's own rules.
- We respect your robots.txt — if you block our crawler, we won't scan those pages
- Our User-Agent is clearly identifiable: SiteProofAI/1.0 (+https://siteproof.ai/scanner)
- We do NOT store the raw HTML after analysis — only a content hash for change detection
- Free scans analyze up to 10 pages; paid plans scan all discoverable pages
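The robots.txt check and the content hash described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not SiteProof AI's actual implementation: the function names `is_allowed` and `content_hash` are hypothetical, and SHA-256 is an assumed choice of hash, since the page does not say which algorithm is used.

```python
import hashlib
from urllib.robotparser import RobotFileParser

# User-Agent string as published on this page.
USER_AGENT = "SiteProofAI/1.0 (+https://siteproof.ai/scanner)"


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt rules permit fetching `url`."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines,
    # so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def content_hash(html: str) -> str:
    """Hash the page so changes can be detected without keeping the HTML.

    SHA-256 is an assumption made for this sketch.
    """
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    robots = "User-agent: SiteProofAI\nDisallow: /private/\n"
    print(is_allowed(robots, "https://example.com/about"))
    print(is_allowed(robots, "https://example.com/private/page"))
```

Under this scheme, change detection amounts to comparing the stored hash of a page with the hash computed on the next scan; the raw HTML itself never needs to be retained.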
AI-Powered Analysis
Our AI engine, powered by Anthropic's Claude, analyzes the extracted text content for potential compliance issues across multiple regulatory frameworks.
- We use Claude by Anthropic — not OpenAI, not Google, not open-source models
- Only extracted text content is sent to the AI — never your personal data or account information
- Each finding includes a specific legal reference (e.g., EU AI Act Article 50, GDPR Article 22)
- AI analysis is probabilistic — findings are labeled with confidence levels
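To illustrate what "only extracted text content" can mean in practice, here is a rough sketch that strips markup, scripts, and styles from a page before anything is passed to a model, along with a simple record for a finding. The class and field names (`TextExtractor`, `Finding`, `summary`) and the confidence labels are hypothetical, chosen for this example only; the call to the AI model itself is deliberately left out.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


@dataclass(frozen=True)
class Finding:
    summary: str
    legal_reference: str  # e.g. "EU AI Act Article 50"
    confidence: str       # hypothetical labels such as "high", "medium", "low"
```

A finding built from this sketch might look like `Finding("Chatbot present without AI disclosure", "EU AI Act Article 50", "medium")`; the wording and label here are invented for illustration.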
Rule-Based Detection
Complementing AI analysis, our deterministic rule engine scans the page source code for known patterns that may indicate compliance issues.
- Detects chatbot scripts (Intercom, Drift, Tidio, custom implementations) in the page source
- Identifies AI-related cookies and third-party tracking scripts
- Verifies the presence of disclosure pages, privacy policies, and consent mechanisms
- Analyzes HTTP headers for AI-related signals. Like the rest of the rule engine, this step is fully deterministic: the same input always produces the same output
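A deterministic pattern check of this kind can be sketched in a few lines. The hostnames below are illustrative assumptions about where these vendors typically serve their widgets, not SiteProof AI's actual rule set, and `detect_chatbots` is a hypothetical name.

```python
import re

# Illustrative patterns only; a real rule set would be larger and
# maintained as vendors change their embed snippets.
CHATBOT_PATTERNS = {
    "Intercom": re.compile(r"widget\.intercom\.io", re.IGNORECASE),
    "Drift": re.compile(r"js\.driftt\.com", re.IGNORECASE),
    "Tidio": re.compile(r"code\.tidio\.co", re.IGNORECASE),
}


def detect_chatbots(html: str) -> list[str]:
    """Return the names of chatbot platforms whose pattern appears in the source.

    The result is sorted so that the same input always yields the same output.
    """
    return sorted(
        name for name, pattern in CHATBOT_PATTERNS.items() if pattern.search(html)
    )


if __name__ == "__main__":
    page = '<script src="https://widget.intercom.io/widget/abc123"></script>'
    print(detect_chatbots(page))
```

Because the check is a plain text search over the static HTML, it involves no model call and no randomness, which is what makes the rule engine's results reproducible.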
Compliance Scoring
All findings from AI analysis and rule-based detection are combined into your SiteProof Score — a weighted composite score from 0 to 100.
- Each of the four modules contributes a weighted portion to the total score
- Scores are calculated using a transparent, documented formula
- The score is indicative: it points to potential risk areas and is not a statement of legal compliance status
- Scores may change between scans as our methodology evolves and your website changes
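As a rough illustration of a weighted composite, the sketch below combines per-module scores into a single 0 to 100 value. The module names and weights are placeholders invented for this example; the actual modules, weights, and formula are whatever SiteProof AI documents, and `composite_score` is a hypothetical function name.

```python
def composite_score(module_scores: dict[str, float], weights: dict[str, float]) -> int:
    """Weighted average of per-module scores, clamped to the 0-100 range."""
    if set(module_scores) != set(weights):
        raise ValueError("every module needs exactly one weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(module_scores[m] * weights[m] for m in weights) / total_weight
    return round(max(0.0, min(100.0, weighted)))


if __name__ == "__main__":
    # Placeholder module names and weights, not the real ones.
    scores = {"module_a": 80, "module_b": 60, "module_c": 100, "module_d": 40}
    weights = {"module_a": 40, "module_b": 30, "module_c": 20, "module_d": 10}
    print(composite_score(scores, weights))  # prints 74
```

Dividing by the total weight means the weights do not have to sum to exactly 1 or 100, so they can be adjusted as the methodology evolves without rescaling the others.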
The Four Modules
Each module focuses on a specific area of AI compliance. Together, they provide a comprehensive view of your website's compliance posture.
Legal Frameworks We Reference
Every finding in your report cites a specific legal article. Here are the frameworks our analysis is built upon.
Our Scoring System
Your SiteProof Score (0–100) is a weighted composite of all four modules. It provides a quick overview of your website's potential compliance posture.
Score Levels
Significant potential compliance issues detected. Immediate attention recommended.
Several potential issues found. Review and remediation advisable.
Some issues detected but foundational elements are in place. Targeted improvements suggested.
Few or minor issues detected. Website appears to address major compliance requirements.
Excellent compliance posture detected across all modules. Eligible for the SiteProof Certified seal.
Important: The SiteProof Score is an indicative measure based on automated analysis. It does not constitute a legal compliance certification. Always consult a qualified professional for compliance decisions.
Confidence Levels
Every finding in your report includes a confidence level so you know how it was detected and how to prioritize it.
What We Don't Do
Transparency means being honest about our limitations. Here's what SiteProof AI does NOT do.
- We do NOT claim content is "X% AI-generated"; we identify characteristics that may warrant review
- We do NOT guarantee compliance; we detect potential issues for your review
- We do NOT replace legal advice; always consult a qualified professional for compliance decisions
- We do NOT store the HTML of scanned websites; we keep only URLs, findings, scores, and content hashes
- We do NOT access password-protected areas; our analysis is limited to publicly accessible content
- We do NOT execute JavaScript; our analysis is based on the static HTML source of your pages
Data Privacy & Security
We take the security of your data seriously. Here's how we protect the information involved in every scan.
Continuous Improvement
AI regulation is evolving rapidly. Our methodology evolves with it.
Regulatory Monitoring
We continuously monitor changes to the EU AI Act, GDPR, CCPA, and FTC guidelines. When regulations change, our scanning rules are updated accordingly.
AI Model Updates
As Anthropic releases improved versions of Claude, we evaluate and integrate updates to improve the accuracy and depth of our AI-powered analysis.
Rule Engine Expansion
Our rule-based detection engine is regularly expanded with new patterns as we identify additional chatbot platforms, AI tools, and compliance indicators in the wild.
Subscriber Notifications
Users with active Starter or Pro subscriptions receive notifications when significant methodology changes could affect their compliance scores or findings.
Ready to scan your website?
See our methodology in action. Get your compliance report in 60 seconds — free, no signup required.
SiteProof AI is an automated analysis tool. Results are informational and do NOT constitute legal advice. Consult a qualified legal professional for compliance decisions.