Transparent & Auditable

Our Methodology

Transparent, auditable, and built on established legal frameworks. Here's exactly how SiteProof AI works — because if we ask you to be transparent about your AI, we should be transparent about ours.

How Our Scanner Works

Every scan follows a four-phase process designed for accuracy, transparency, and respect for your data.

Phase 01

Website Fetching

We retrieve the publicly accessible HTML of your website pages, respecting standard web protocols and your site's own rules.

  • We respect your robots.txt — if you block our crawler, we won't scan those pages
  • Our User-Agent is clearly identifiable: SiteProofAI/1.0 (+https://siteproof.ai/scanner)
  • We do NOT store the raw HTML after analysis — only a content hash for change detection
  • Free scans analyze up to 10 pages; paid plans scan all discoverable pages
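As a rough illustration of the fetching rules above, the sketch below filters a list of candidate URLs against a robots.txt body and applies the free-scan page cap. It uses Python's standard urllib.robotparser. The function name allowed_pages and the constant FREE_PAGE_LIMIT are hypothetical, and this is not SiteProof AI's actual crawler code.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# User-Agent string quoted in the list above.
USER_AGENT = "SiteProofAI/1.0 (+https://siteproof.ai/scanner)"
# Free scans analyze up to 10 pages (hypothetical constant name).
FREE_PAGE_LIMIT = 10


def allowed_pages(robots_txt: str, urls: list[str],
                  limit: Optional[int] = FREE_PAGE_LIMIT) -> list[str]:
    """Return the URLs robots.txt permits for our crawler, capped at `limit`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    permitted = [url for url in urls if parser.can_fetch(USER_AGENT, url)]
    # limit=None models a paid plan that scans every discoverable page.
    return permitted if limit is None else permitted[:limit]
```

A site owner who adds "User-agent: SiteProofAI" followed by "Disallow: /private/" would, under this sketch, see every URL under /private/ dropped before any request is made.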

Phase 02

AI-Powered Analysis

Our AI engine, powered by Anthropic's Claude, analyzes the extracted text content for potential compliance issues across multiple regulatory frameworks.

  • We use Claude by Anthropic — not OpenAI, not Google, not open-source models
  • Only extracted text content is sent to the AI — never your personal data or account information
  • Each finding includes a specific legal reference (e.g., EU AI Act Article 50, GDPR Article 22)
  • AI analysis is probabilistic — findings are labeled with confidence levels
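To make "only extracted text content is sent to the AI" concrete, here is a minimal sketch of text extraction from static HTML using Python's standard html.parser. It keeps text nodes and skips the contents of script and style elements. The class and function names are hypothetical, and the real extraction step may differ.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the text content of a static HTML document."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Only the string returned by extract_text would be passed on for analysis in this sketch; markup, scripts, and styles never leave the extraction step.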

Phase 03

Rule-Based Detection

Complementing AI analysis, our deterministic rule engine scans the page source code for known patterns that may indicate compliance issues.

  • Detects chatbot scripts (Intercom, Drift, Tidio, custom implementations) in the page source
  • Identifies AI-related cookies and third-party tracking scripts
  • Verifies the presence of disclosure pages, privacy policies, and consent mechanisms
  • Analyzes HTTP headers for AI-related signals — this analysis is fully deterministic (same input = same output)
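The deterministic nature of rule-based detection can be sketched with a few regular expressions over the page source. The three signatures below are illustrative guesses at script hosts commonly associated with these vendors. They are assumptions, not SiteProof AI's actual rule set, which this page does not list.

```python
import re

# Illustrative signatures only (assumed script hosts, not the real rule set).
CHATBOT_SIGNATURES = {
    "Intercom": re.compile(r"widget\.intercom\.io", re.IGNORECASE),
    "Drift": re.compile(r"js\.driftt\.com", re.IGNORECASE),
    "Tidio": re.compile(r"code\.tidio\.co", re.IGNORECASE),
}


def detect_chatbots(page_source: str) -> list[str]:
    """Return the sorted vendor names whose signature appears in the source."""
    return sorted(
        name
        for name, pattern in CHATBOT_SIGNATURES.items()
        if pattern.search(page_source)
    )
```

Because the function depends only on its input string, running it twice on the same page source returns the same list, which is the "same input = same output" property described above.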

Phase 04

Compliance Scoring

All findings from AI analysis and rule-based detection are combined into your SiteProof Score — a weighted composite score from 0 to 100.

  • Each of the four modules contributes a weighted portion to the total score
  • Scores are calculated using a transparent, documented formula
  • The score is indicative: it highlights potential risk areas and does not establish legal compliance status
  • Scores may change between scans as our methodology evolves and your website changes

The Four Modules

Each module focuses on a specific area of AI compliance. Together, they provide a comprehensive view of your website's compliance posture.

AI Disclosure Scanner
Detects whether your website properly discloses its use of AI systems to users, as increasingly required by regulation.

What it detects:

  • Chatbots and virtual assistants not identified as AI-powered
  • AI-generated content published without transparency disclosures
  • Recommendation systems operating without adequate transparency
  • Automated decision-making without required explanations
  • Missing AI usage disclosures in terms of service or user-facing pages

AI Privacy Scanner
Verifies that AI-related data processing on your website respects privacy regulations and user consent requirements.

What it detects:

  • AI-related cookies deployed without proper consent mechanisms
  • User data potentially sent to third-party AI APIs without notice
  • Privacy policies that may not adequately address AI data processing
  • International data transfers to AI model providers lacking safeguards
  • AI systems processing personal data beyond disclosed purposes

AI Content Quality Scanner
Evaluates content characteristics that search engines and regulators may associate with low-quality or undisclosed AI generation.

What it detects:

  • Content exhibiting patterns commonly associated with AI generation
  • Pages lacking E-E-A-T signals (Experience, Expertise, Authoritativeness, Trustworthiness)
  • Thin or repetitive content that could trigger search engine quality filters
  • Missing author attribution or editorial oversight indicators
  • Content quality issues that could undermine regulatory credibility

AI Risk Assessment
A smart questionnaire that uncovers hidden compliance risks that automated scanning alone cannot detect from outside your organization.

What it detects:

  • Internal AI tools and systems not visible from outside your website
  • AI-powered HR, recruitment, or employee monitoring systems
  • Third-party AI vendor relationships creating shared compliance obligations
  • Data processing activities that may require a DPIA
  • AI governance gaps — missing policies, training, or oversight structures

Legal Frameworks We Reference

Every finding in your report cites a specific legal article. Here are the frameworks our analysis is built upon.

EU AI Act: Regulation (EU) 2024/1689
The world's first comprehensive AI regulation, establishing obligations for AI system providers and deployers based on risk levels.

Key articles:

  • Article 50 — Transparency obligations for providers and deployers of certain AI systems, covering chatbots, deepfakes, and emotion recognition
  • Articles 9 & 10 — Risk management and data governance for high-risk AI
  • Article 26 — Obligations of deployers of high-risk AI systems
Maximum penalty: up to €35M or 7% of worldwide annual turnover, whichever is higher

GDPR: Regulation (EU) 2016/679
The General Data Protection Regulation governing personal data processing, with specific provisions relevant to AI systems.

Key articles:

  • Articles 5 & 6 — Principles and lawfulness of data processing
  • Articles 13 & 14 — Transparency and information obligations
  • Article 22 — Automated individual decision-making, including profiling
  • Article 25 — Data protection by design and by default
  • Article 35 — Data Protection Impact Assessment (DPIA)
  • Articles 44–49 — International data transfers
Maximum penalty: up to €20M or 4% of worldwide annual turnover, whichever is higher

CCPA/CPRA: California Civil Code §§ 1798.100–1798.199.100
California's consumer privacy laws granting residents rights over their personal information, including in AI contexts.

Key provisions:

  • Right to know about personal information collected and shared
  • Right to delete personal information
  • Right to opt out of automated decision-making technology
  • Right to non-discrimination for exercising privacy rights
Maximum penalty: $7,500 per intentional violation

FTC Guidelines: FTC Act Section 5 & AI Guidance (2023–2026)
Federal Trade Commission guidelines on AI transparency, fairness, and consumer protection in AI-powered services.

Key provisions:

  • Prohibition of deceptive AI practices under Section 5
  • Requirements for clear disclosure of AI use in consumer-facing applications
  • Guidelines on AI-generated content and endorsements
  • Enforcement actions against unfair or deceptive AI business practices
Maximum penalty: Varies — injunctions, penalties, and consent orders

Our Scoring System

Your SiteProof Score (0–100) is a weighted composite of all four modules. It provides a quick overview of your website's potential compliance posture.

Module Weights
Each module contributes a specific portion to the total score
  • AI Disclosure: up to 30 points (30%)
  • AI Privacy: up to 35 points (35%)
  • Content Quality: up to 25 points (25%)
  • Risk Assessment: up to 10 points (10%)
  • Total: 100 points (100%)
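The module weights can be read as a simple weighted sum. The sketch below assumes each module reports a ratio between 0.0 and 1.0 (share of its points earned) and that the total is rounded to a whole number. Both are assumptions made for illustration; the documented formula itself is not reproduced on this page.

```python
# Maximum points per module, taken from the weights listed above.
MODULE_WEIGHTS = {
    "AI Disclosure": 30,
    "AI Privacy": 35,
    "Content Quality": 25,
    "Risk Assessment": 10,
}


def siteproof_score(module_ratios: dict[str, float]) -> int:
    """Combine per-module ratios (0.0 to 1.0) into a 0-100 composite score."""
    if set(module_ratios) != set(MODULE_WEIGHTS):
        raise ValueError("expected exactly one ratio per module")
    total = 0.0
    for name, weight in MODULE_WEIGHTS.items():
        ratio = module_ratios[name]
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"ratio for {name} must be between 0 and 1")
        total += weight * ratio
    # Rounding to a whole number is an assumption of this sketch.
    return round(total)
```

For example, full marks on AI Disclosure and Risk Assessment with half marks on AI Privacy and Content Quality gives 30 + 17.5 + 12.5 + 10 = 70.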

Score Levels

0–40: Non-Compliant

Significant potential compliance issues detected. Immediate attention recommended.

41–60: At Risk

Several potential issues found. Review and remediation advisable.

61–80: Partially Compliant

Some issues detected but foundational elements are in place. Targeted improvements suggested.

81–95: Compliant

Few or minor issues detected. Website appears to address major compliance requirements.

96–100: SiteProof Certified

Excellent compliance posture detected across all modules. Eligible for the SiteProof Certified seal.

Important: The SiteProof Score is an indicative measure based on automated analysis. It does not constitute a legal compliance certification. Always consult a qualified professional for compliance decisions.
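The score bands above translate directly into a lookup. This sketch assumes whole-number scores, since the bands are given as integer ranges. The function name score_level is hypothetical.

```python
# (upper bound, label) pairs in ascending order, from the bands listed above.
SCORE_LEVELS = [
    (40, "Non-Compliant"),
    (60, "At Risk"),
    (80, "Partially Compliant"),
    (95, "Compliant"),
    (100, "SiteProof Certified"),
]


def score_level(score: int) -> str:
    """Map a whole-number score from 0 to 100 to its level label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for upper_bound, label in SCORE_LEVELS:
        if score <= upper_bound:
            return label
    raise AssertionError("unreachable: 100 is the last upper bound")
```

As with the score itself, the returned label is an indicator for review and not a legal determination.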

Confidence Levels

Every finding in your report includes a confidence level so you know how it was detected and how to prioritize it.

High Confidence
Code-based detection

These findings are based on deterministic pattern matching in your website's source code. The same input will always produce the same result.

Examples:

  • Chatbot script detected without AI disclosure
  • AI cookie set without consent mechanism
  • Missing privacy policy page
Medium Confidence
AI-powered analysis

These findings are identified by our AI engine based on content analysis. They are probabilistic — the AI has identified a potential issue that warrants human review.

Examples:

  • Content may exhibit AI-generated characteristics
  • Privacy policy may not adequately address AI usage
  • Disclosure language may be insufficient
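One way to picture the two confidence levels is as a property derived from how a finding was detected. The data model below is a hypothetical sketch; the field names and the prioritize helper are assumptions and do not describe the actual report format.

```python
from dataclasses import dataclass
from enum import Enum


class DetectionMethod(Enum):
    RULE_ENGINE = "rule_engine"   # deterministic, code-based detection
    AI_ANALYSIS = "ai_analysis"   # probabilistic content analysis


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"


# Confidence follows from the detection method, as described above.
CONFIDENCE_BY_METHOD = {
    DetectionMethod.RULE_ENGINE: Confidence.HIGH,
    DetectionMethod.AI_ANALYSIS: Confidence.MEDIUM,
}


@dataclass(frozen=True)
class Finding:
    description: str
    legal_reference: str
    method: DetectionMethod

    @property
    def confidence(self) -> Confidence:
        return CONFIDENCE_BY_METHOD[self.method]


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Order findings with high-confidence items first, keeping input order otherwise."""
    return sorted(findings, key=lambda f: f.confidence is not Confidence.HIGH)
```

Sorting this way puts code-based findings at the top of a review queue, while AI-identified findings stay flagged for human review.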

What We Don't Do

Transparency means being honest about our limitations. Here's what SiteProof AI does NOT do.

We do NOT claim content is "X% AI-generated" — we identify characteristics that may warrant review

We do NOT guarantee compliance — we detect potential issues for your review

We do NOT replace legal advice — always consult a qualified professional for compliance decisions

We do NOT store the HTML of scanned websites — only URLs, findings, scores, and content hashes

We do NOT access password-protected areas — our analysis is limited to publicly accessible content

We do NOT execute JavaScript — our analysis is based on the static HTML source of your pages

Data Privacy & Security

We take the security of your data seriously. Here's how we protect the information involved in every scan.

No HTML Storage

Raw HTML is discarded after analysis. We retain only a content hash for change detection between scans.

Robots.txt Respected

We honor your robots.txt directives. If you block our crawler, those pages will not be scanned.

Identifiable User-Agent

Our crawler identifies itself as "SiteProofAI/1.0 (+https://siteproof.ai/scanner)" in every request.

Minimal Data Retention

We store only URLs, findings, compliance scores, and content hashes. Free scan data is deleted after 24 hours.
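The "content hash for change detection" and the 24-hour retention window can be sketched as follows. SHA-256 is an assumption, since this page does not name the hash algorithm, and the function names are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Free scan data is deleted after 24 hours, per the text above.
FREE_SCAN_RETENTION = timedelta(hours=24)


def content_hash(html: str) -> str:
    """Return a hex digest of the page; the raw HTML itself is not kept.

    SHA-256 is assumed here for illustration.
    """
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def has_changed(previous_hash: str, html: str) -> bool:
    """Compare a stored hash against the hash of freshly fetched HTML."""
    return content_hash(html) != previous_hash


def is_expired(scanned_at: datetime, now: datetime) -> bool:
    """True once a free scan record has reached the retention limit."""
    return now - scanned_at >= FREE_SCAN_RETENTION
```

Storing only the digest lets a later scan tell whether a page changed without retaining a copy of the page itself.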

Continuous Improvement

AI regulation is evolving rapidly. Our methodology evolves with it.

Regulatory Monitoring

We continuously monitor changes to the EU AI Act, GDPR, CCPA, and FTC guidelines. When regulations change, our scanning rules are updated accordingly.

AI Model Updates

As Anthropic releases improved versions of Claude, we evaluate and integrate updates to improve the accuracy and depth of our AI-powered analysis.

Rule Engine Expansion

Our rule-based detection engine is regularly expanded with new patterns as we identify additional chatbot platforms, AI tools, and compliance indicators in the wild.

Subscriber Notifications

Users with active Starter or Pro subscriptions receive notifications when significant methodology changes could affect their compliance scores or findings.

Ready to scan your website?

See our methodology in action. Get your compliance report in 60 seconds — free, no signup required.

SiteProof AI is an automated analysis tool. Results are informational and do NOT constitute legal advice. Consult a qualified legal professional for compliance decisions.