Transparent & Auditable

Our Methodology

Transparent, auditable, and built on established legal frameworks. Here's exactly how SiteProof AI works — because if we ask you to be transparent about your AI, we should be transparent about ours.

How Our Scanner Works

Every scan follows a four-phase process designed for accuracy, transparency, and respect for your data.

Phase 01

Website Fetching

We retrieve the publicly accessible HTML of your website pages, respecting standard web protocols and your site's own rules.

We respect your robots.txt — if you block our crawler, we won't scan those pages
Our User-Agent is clearly identifiable: SiteProofAI/1.0 (+https://siteproof.ai/scanner)
We do NOT store the raw HTML after analysis — only a content hash for change detection
Free scans analyze up to 10 pages; paid plans scan all discoverable pages
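
The fetching rules above can be sketched in Python using only the standard library. This is an illustrative sketch, not SiteProof AI's actual crawler: the function names `is_allowed` and `content_hash` and the choice of SHA-256 are assumptions. Only the User-Agent string is taken from this page.

```python
import hashlib
from urllib.robotparser import RobotFileParser

# User-Agent string as published on this page.
USER_AGENT = "SiteProofAI/1.0 (+https://siteproof.ai/scanner)"


def is_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text permits our crawler to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


def content_hash(html: str) -> str:
    """Hash a page so later scans can detect changes without keeping the HTML.

    SHA-256 is an assumption; the page does not say which hash is used.
    """
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


robots = "User-agent: SiteProofAI\nDisallow: /private/\n"
print(is_allowed(robots, "https://example.com/private/page"))  # disallowed path
print(is_allowed(robots, "https://example.com/about"))         # path with no rule
```

Comparing the stored hash with the hash of a freshly fetched page is enough to tell whether the page changed between scans, which is why the raw HTML does not need to be kept.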

Phase 02

AI-Powered Analysis

Our AI engine, powered by Anthropic's Claude, analyzes the extracted text content for potential compliance issues across multiple regulatory frameworks.

We use Claude by Anthropic — not OpenAI, not Google, not open-source models
Only extracted text content is sent to the AI — never your personal data or account information
Each finding includes a specific legal reference (e.g., EU AI Act Article 50, GDPR Article 22)
AI analysis is probabilistic — findings are labeled with confidence levels
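
As an illustration of sending only extracted text to the model, the sketch below strips markup, scripts, and styles from a page with Python's standard `html.parser`. The class name `TextExtractor` and the function `extract_text` are hypothetical. The actual extraction pipeline and the request made to Claude are not described on this page and are not shown here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Chat with our assistant.</p><script>track()</script></body></html>"
)
print(extract_text(sample))
```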

Phase 03

Rule-Based Detection

Complementing AI analysis, our deterministic rule engine scans the page source code for known patterns that may indicate compliance issues.

Detects chatbot scripts (Intercom, Drift, Tidio, custom implementations) in the page source
Identifies AI-related cookies and third-party tracking scripts
Verifies the presence of disclosure pages, privacy policies, and consent mechanisms
Analyzes HTTP headers for AI-related signals
All rule-based checks are fully deterministic: the same input always produces the same output
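
A deterministic rule check of this kind can be sketched as plain pattern matching over the static HTML. The rule labels and regular expressions below are illustrative assumptions, not SiteProof AI's actual rule set. The script patterns in particular are simplified stand-ins for how a vendor embed might appear in page source.

```python
import re

# Hypothetical rules: each maps a label to a pattern searched in the page source.
RULES = {
    "chatbot:intercom": re.compile(r"<script[^>]*\bsrc=[\"'][^\"']*intercom", re.I),
    "chatbot:drift": re.compile(r"<script[^>]*\bsrc=[\"'][^\"']*drift", re.I),
    "chatbot:tidio": re.compile(r"<script[^>]*\bsrc=[\"'][^\"']*tidio", re.I),
    "link:privacy-policy": re.compile(r"<a[^>]*\bhref=[\"'][^\"']*privacy", re.I),
}


def run_rules(html: str) -> list[str]:
    """Return the sorted labels of every rule whose pattern occurs in html."""
    return sorted(label for label, pattern in RULES.items() if pattern.search(html))


page = (
    '<a href="/privacy-policy">Privacy</a>'
    '<script src="https://widget.example/tidio.js"></script>'
)
print(run_rules(page))
print(run_rules(page) == run_rules(page))  # same input, same output
```

Because the rules are fixed patterns with no randomness or model call involved, running them twice on the same page source always yields the same list.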

Phase 04

Compliance Scoring

All findings from AI analysis and rule-based detection are combined into your SiteProof Score — a weighted composite score from 0 to 100.

Each of the four modules contributes a weighted portion to the total score
Scores are calculated using a transparent, documented formula
The score is indicative: it points to potential risk areas and does not establish legal compliance status
Scores may change between scans as our methodology evolves and your website changes

The Four Modules

Each module focuses on a specific area of AI compliance. Together, they provide a comprehensive view of your website's compliance posture.

AI Disclosure Scanner

Detects whether your website properly discloses its use of AI systems to users, as increasingly required by regulation.

What it detects:

Chatbots and virtual assistants not identified as AI-powered
AI-generated content published without transparency disclosures
Recommendation systems operating without adequate transparency
Automated decision-making without required explanations
Missing AI usage disclosures in terms of service or user-facing pages

AI Privacy Scanner

Verifies that AI-related data processing on your website respects privacy regulations and user consent requirements.

What it detects:

AI-related cookies deployed without proper consent mechanisms
User data potentially sent to third-party AI APIs without notice
Privacy policies that may not adequately address AI data processing
International data transfers to AI model providers lacking safeguards
AI systems processing personal data beyond disclosed purposes

AI Content Quality Scanner

Evaluates content characteristics that search engines and regulators may associate with low-quality or undisclosed AI generation.

What it detects:

Content exhibiting patterns commonly associated with AI generation
Pages lacking E-E-A-T signals (Experience, Expertise, Authoritativeness, Trustworthiness)
Thin or repetitive content that could trigger search engine quality filters
Missing author attribution or editorial oversight indicators
Content quality issues that could undermine regulatory credibility

AI Risk Assessment

A smart questionnaire that uncovers hidden compliance risks which automated scanning from outside your organization cannot detect.

What it detects:

Internal AI tools and systems not visible from outside your website
AI-powered HR, recruitment, or employee monitoring systems
Third-party AI vendor relationships creating shared compliance obligations
Data processing activities that may require a DPIA
AI governance gaps — missing policies, training, or oversight structures

Legal Frameworks We Reference

Every finding in your report cites a specific legal article. Here are the frameworks our analysis is built upon.

EU AI Act

Regulation (EU) 2024/1689

The world's first comprehensive AI regulation, establishing obligations for AI system providers and deployers based on risk levels.

Key articles:

Article 50 — Transparency obligations for providers and deployers of certain AI systems, including chatbots, deepfakes, and emotion recognition
Articles 9 & 10 — Risk management and data governance for high-risk AI
Article 26 — Obligations of deployers of high-risk AI systems

Maximum penalty: €35M or 7% of global annual turnover

GDPR

Regulation (EU) 2016/679

The General Data Protection Regulation governing personal data processing, with specific provisions relevant to AI systems.

Key articles:

Articles 5 & 6 — Principles and lawfulness of data processing
Articles 13 & 14 — Transparency and information obligations
Article 22 — Automated individual decision-making, including profiling
Article 25 — Data protection by design and by default
Article 35 — Data Protection Impact Assessment (DPIA)
Articles 44–49 — International data transfers

Maximum penalty: €20M or 4% of global annual turnover

CCPA/CPRA

California Civil Code §1798.100–1798.199.100

California's consumer privacy laws granting residents rights over their personal information, including in AI contexts.

Key provisions:

Right to know about personal information collected and shared
Right to delete personal information
Right to opt out of automated decision-making technology
Right to non-discrimination for exercising privacy rights

Maximum penalty: $7,500 per intentional violation

FTC Guidelines

FTC Act Section 5 & AI Guidance (2023–2026)

Federal Trade Commission guidelines on AI transparency, fairness, and consumer protection in AI-powered services.

Key provisions:

Prohibition of deceptive AI practices under Section 5
Requirements for clear disclosure of AI use in consumer-facing applications
Guidelines on AI-generated content and endorsements
Enforcement actions against unfair or deceptive AI business practices

Maximum penalty: Varies — injunctions, penalties, and consent orders

Our Scoring System

Your SiteProof Score (0–100) is a weighted composite of all four modules. It provides a quick overview of your website's potential compliance posture.

Module Weights

Each module contributes a specific portion to the total score

AI Disclosure: up to 30 points (30%)
AI Privacy: up to 35 points (35%)
Content Quality: up to 25 points (25%)
Risk Assessment: up to 10 points (10%)
Total: up to 100 points (100%)
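
The weighting can be illustrated with a short calculation. The exact formula is not reproduced on this page, so the sketch below assumes the simplest reading of the weights: each module is first rated from 0.0 to 1.0 and then scaled to its maximum points. The function `siteproof_score` and the example ratings are hypothetical.

```python
# Maximum points per module, taken from the module weights above.
MODULE_MAX_POINTS = {
    "ai_disclosure": 30,
    "ai_privacy": 35,
    "content_quality": 25,
    "risk_assessment": 10,
}


def siteproof_score(ratings: dict[str, float]) -> float:
    """Combine per-module ratings (0.0 to 1.0) into a 0-100 composite.

    Assumption: a module's contribution is its rating times its maximum points.
    """
    total = 0.0
    for module, max_points in MODULE_MAX_POINTS.items():
        rating = ratings[module]
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"rating for {module} must be between 0 and 1")
        total += rating * max_points
    return round(total, 1)


example = {
    "ai_disclosure": 0.5,
    "ai_privacy": 1.0,
    "content_quality": 0.8,
    "risk_assessment": 0.0,
}
print(siteproof_score(example))  # 15 + 35 + 20 + 0 = 70.0
```

Under this reading, a weak AI Privacy result costs more than an equally weak Risk Assessment result, because the privacy module carries 35 of the 100 points and the questionnaire only 10.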

Score Levels

0–40: Non-Compliant

Significant potential compliance issues detected. Immediate attention recommended.

41–60: At Risk

Several potential issues found. Review and remediation advisable.

61–80: Partially Compliant

Some issues detected but foundational elements are in place. Targeted improvements suggested.

81–95: Compliant

Few or minor issues detected. Website appears to address major compliance requirements.

96–100: SiteProof Certified

Excellent compliance posture detected across all modules. Eligible for the SiteProof Certified seal.
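
The score bands above amount to a simple lookup. A minimal sketch, assuming integer scores and the hypothetical function name `score_level`. How fractional scores are rounded before banding is not stated on this page.

```python
# (upper bound, label) pairs in ascending order, from the bands listed above.
LEVELS = [
    (40, "Non-Compliant"),
    (60, "At Risk"),
    (80, "Partially Compliant"),
    (95, "Compliant"),
    (100, "SiteProof Certified"),
]


def score_level(score: int) -> str:
    """Return the label of the band that contains score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for upper, label in LEVELS:
        if score <= upper:
            return label
    raise AssertionError("unreachable")  # every valid score is covered above


print(score_level(40))  # Non-Compliant
print(score_level(41))  # At Risk
print(score_level(96))  # SiteProof Certified
```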

Important: The SiteProof Score is an indicative guide based on automated analysis. It does not constitute a legal compliance certification. Always consult a qualified professional for compliance decisions.

Confidence Levels

Every finding in your report includes a confidence level so you know how it was detected and how to prioritize it.

High Confidence

Code-based detection

These findings are based on deterministic pattern matching in your website's source code. The same input will always produce the same result.

Examples:

• Chatbot script detected without AI disclosure
• AI cookie set without consent mechanism
• Missing privacy policy page

Medium Confidence

AI-powered analysis

These findings are identified by our AI engine based on content analysis. They are probabilistic — the AI has identified a potential issue that warrants human review.

Examples:

• Content may exhibit AI-generated characteristics
• Privacy policy may not adequately address AI usage
• Disclosure language may be insufficient
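
One way to model this is to derive the confidence level from how a finding was detected. The `Finding` structure, its field names, the pairing of each example with a legal reference, and the sort order below are illustrative assumptions rather than the report's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    RULE_ENGINE = "rule_engine"  # deterministic pattern match in page source
    AI_ANALYSIS = "ai_analysis"  # probabilistic content analysis


@dataclass(frozen=True)
class Finding:
    description: str
    legal_reference: str
    source: Source

    @property
    def confidence(self) -> str:
        """High for code-based detection, medium for AI-powered analysis."""
        return "high" if self.source is Source.RULE_ENGINE else "medium"


findings = [
    Finding("Disclosure language may be insufficient",
            "EU AI Act Article 50", Source.AI_ANALYSIS),
    Finding("Chatbot script detected without AI disclosure",
            "EU AI Act Article 50", Source.RULE_ENGINE),
]

# Review high-confidence findings first (False sorts before True).
ordered = sorted(findings, key=lambda f: f.confidence != "high")
for finding in ordered:
    print(finding.confidence, "-", finding.description)
```

Tying confidence to the detection source keeps the label mechanical: a reader can tell from the level alone whether a finding came from pattern matching or from the AI engine.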

What We Don't Do

Transparency means being honest about our limitations. Here's what SiteProof AI does NOT do.

We do NOT claim content is "X% AI-generated" — we identify characteristics that may warrant review

We do NOT guarantee compliance — we detect potential issues for your review

We do NOT replace legal advice — always consult a qualified professional for compliance decisions

We do NOT store the HTML of scanned websites — only URLs, findings, scores, and content hashes

We do NOT access password-protected areas — our analysis is limited to publicly accessible content

We do NOT execute JavaScript — our analysis is based on the static HTML source of your pages

Data Privacy & Security

We take the security of your data seriously. Here's how we protect the information involved in every scan.

No HTML Storage

Raw HTML is discarded after analysis. We retain only a content hash for change detection between scans.

Robots.txt Respected

We honor your robots.txt directives. If you block our crawler, those pages will not be scanned.

Identifiable User-Agent

Our crawler identifies itself as "SiteProofAI/1.0 (+https://siteproof.ai/scanner)" in every request.

Minimal Data Retention

We store only URLs, findings, compliance scores, and content hashes. Free scan data is deleted after 24 hours.
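
The 24-hour rule for free scans can be expressed as a simple expiry check. This is a sketch under assumptions: the function name `is_expired` and the `plan` values are hypothetical, and paid-plan data is treated as not expiring only because retention for paid plans is not stated here.

```python
from datetime import datetime, timedelta, timezone

FREE_SCAN_RETENTION = timedelta(hours=24)  # from the retention statement above


def is_expired(created_at: datetime, plan: str, now: datetime) -> bool:
    """Return True if a stored scan record is due for deletion.

    Only the free-plan window is documented; other plans are treated as
    not expiring here purely for illustration.
    """
    if plan != "free":
        return False
    return now - created_at >= FREE_SCAN_RETENTION


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
old_scan = datetime(2025, 1, 1, 11, 0, tzinfo=timezone.utc)    # 25 hours old
recent_scan = datetime(2025, 1, 2, 9, 0, tzinfo=timezone.utc)  # 3 hours old
print(is_expired(old_scan, "free", now))
print(is_expired(recent_scan, "free", now))
```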

Continuous Improvement

AI regulation is evolving rapidly. Our methodology evolves with it.

Regulatory Monitoring

We continuously monitor changes to the EU AI Act, GDPR, CCPA, and FTC guidelines. When regulations change, our scanning rules are updated accordingly.

AI Model Updates

As Anthropic releases improved versions of Claude, we evaluate and integrate updates to improve the accuracy and depth of our AI-powered analysis.

Rule Engine Expansion

Our rule-based detection engine is regularly expanded with new patterns as we identify additional chatbot platforms, AI tools, and compliance indicators in the wild.

Subscriber Notifications

Users with active Starter or Pro subscriptions receive notifications when significant methodology changes could affect their compliance scores or findings.

Ready to scan your website?

See our methodology in action. Get your compliance report in 60 seconds — free, no signup required.


SiteProof AI is an automated analysis tool. Results are informational and do NOT constitute legal advice. Consult a qualified legal professional for compliance decisions.