How we test dating sites — the 5-axis rubric in detail

This is the long-form version of the methodology. The methodology page is the short statement of intent; this page is the working protocol an editor follows for every review.

The 5-axis rubric

Every review carries five numbers. Each is on a 0–10 scale, rounded to one decimal. The overall score is a weighted blend of the four sub-scores.

| Axis | What it captures |
|------|------------------|
| Overall | Weighted blend of the four below |
| UX | Onboarding, defaults, accessibility, mobile behaviour |
| Value | Free-tier reality vs paid wall; price-to-pool ratio |
| Audience | Pool quality, profile depth, automation signals |
| Safety | Verification rigor, moderation, public incidents |

The rubric is fixed. It does not vary by product, by category, or by partnership status.
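This page does not publish the weights behind the Overall blend, so the sketch below uses placeholder equal weights (`HYPOTHETICAL_WEIGHTS` is an illustrative name, not our production configuration). It shows the shape of the calculation: four sub-scores on the 0–10 scale, a weighted mean, and rounding to one decimal. Half-up rounding is also an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# HYPOTHETICAL weights: the real blend weights are not stated on this page.
HYPOTHETICAL_WEIGHTS = {"ux": 0.25, "value": 0.25, "audience": 0.25, "safety": 0.25}


def round_one_decimal(x: float) -> float:
    """Round to one decimal place, ties rounded up (an assumption)."""
    return float(Decimal(str(x)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


def overall_score(sub_scores: dict, weights: dict = HYPOTHETICAL_WEIGHTS) -> float:
    """Weighted mean of the four sub-scores, on the 0-10 scale."""
    if set(sub_scores) != set(weights):
        raise ValueError("expected exactly the axes: " + ", ".join(sorted(weights)))
    for axis, score in sub_scores.items():
        if not 0 <= score <= 10:
            raise ValueError(f"{axis} score {score} is outside 0-10")
    total_weight = sum(weights.values())
    blended = sum(sub_scores[axis] * w for axis, w in weights.items()) / total_weight
    return round_one_decimal(blended)


print(overall_score({"ux": 8.0, "value": 7.0, "audience": 8.5, "safety": 5.0}))  # 7.1
```

With equal weights, those sub-scores blend to 7.125, which rounds to 7.1. The rounding goes through `Decimal` because Python's built-in `round` works on the float's binary value and breaks ties to even, which can surprise readers checking the arithmetic by hand.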

Scoring system

A score of 7 means "we would recommend this to a friend in the right situation". A score of 5 means "this exists; whether you should use it depends on circumstances that don't apply to the average reader". Below 5 is "actively unrecommended": those products are still reviewed because the reader deserves to see the negative case, but they do not feature in category rankings.

The numbers behind the score are stored in the review's frontmatter and visible on the live page. We do not hide sub-scores. If a product earns a 7.6 overall but a 5.0 on Safety, the reader sees both.
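A minimal sketch of how the anchor points above could translate into bands and into the category-ranking filter. The page only fixes the anchors (7, 5, below 5); treating them as lower bounds of bands, and the `ReviewScores` record itself, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewScores:
    # Hypothetical record: overall plus the four published sub-scores.
    name: str
    overall: float
    ux: float
    value: float
    audience: float
    safety: float


def band(overall: float) -> str:
    """Assumption: the anchors 7 and 5 act as lower bounds of bands."""
    if overall >= 7:
        return "recommended"
    if overall >= 5:
        return "situational"
    return "unrecommended"


def category_ranking(reviews: list) -> list:
    """Products below 5 stay reviewed and published but are left out of rankings."""
    eligible = [r for r in reviews if r.overall >= 5]
    return sorted(eligible, key=lambda r: r.overall, reverse=True)
```

Because the sub-scores live on the same record as the overall score, a product at 7.6 overall with a 5.0 on Safety still exposes both numbers to whatever renders the page.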

Audience scoring

The Audience axis is the most ambiguous to measure and the most useful to the reader. Our protocol:

- Profile depth: log 50 profiles from the test account's feed or recommended-matches surface, and score depth per profile (bio length, photo count, prompt answers).
- Automation signals: flag accounts that match obvious templates (one-line bio, single photo, generic prompts) or that respond with template-shaped opening messages.
- Pool fit vs claim: compare the audience the product markets to (serious, casual, over-50, LGBTQ+, religious, professional) with the audience the editor actually observed.
- Geographic density: note when the product is meaningfully thinner outside major metros. When this materially affects the experience, it becomes a friction line on the review page.

We do not buy member-count data from third-party data brokers. The audience score is what an editor actually observed, not what the operator's press kit claims.
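The Audience protocol can be sketched as a per-profile log plus two summaries. Everything here is hypothetical: the `ProfileObservation` fields, the depth caps (300 characters, 6 photos, 3 prompts), and the 80-character cut-off for a "one-line" bio are illustrative choices, not published thresholds. The two judgement fields (`generic_prompts`, `templated_opener`) are recorded by the editor, not detected automatically.

```python
from dataclasses import dataclass
from statistics import mean

SAMPLE_SIZE = 50  # the protocol logs 50 profiles per test account


@dataclass
class ProfileObservation:
    bio: str
    photo_count: int
    prompts_answered: int
    generic_prompts: bool   # editor judgement: prompt answers read as boilerplate
    templated_opener: bool  # editor judgement: the opener looked template-shaped


def depth_score(p: ProfileObservation) -> float:
    """Hypothetical 0-10 depth score from bio length, photo count and prompt answers."""
    bio_part = min(len(p.bio.strip()), 300) / 300   # 300-char cap is illustrative
    photo_part = min(p.photo_count, 6) / 6          # 6-photo cap is illustrative
    prompt_part = min(p.prompts_answered, 3) / 3    # 3-prompt cap is illustrative
    return round(10 * (bio_part + photo_part + prompt_part) / 3, 1)


def looks_automated(p: ProfileObservation) -> bool:
    """Flag the obvious template (one-line bio, single photo, generic prompts) or a template opener."""
    bio = p.bio.strip()
    one_line_bio = "\n" not in bio and len(bio) < 80  # 80-char cut-off is an assumption
    template_profile = one_line_bio and p.photo_count <= 1 and p.generic_prompts
    return template_profile or p.templated_opener


def audience_summary(sample: list) -> dict:
    if len(sample) != SAMPLE_SIZE:
        raise ValueError(f"protocol expects {SAMPLE_SIZE} logged profiles, got {len(sample)}")
    return {
        "mean_depth": round(mean(depth_score(p) for p in sample), 1),
        "automation_share": sum(looks_automated(p) for p in sample) / len(sample),
    }
```

The summary reports an automation share rather than removing flagged profiles, so the audience score still reflects what the editor actually saw in the feed.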

Pricing analysis

Pricing feeds the Value axis and is the input most likely to mislead readers, because the list price is rarely what people pay. Our protocol:

- Record the list price for each plan available in the editor's region.
- Record the promotional price for each plan after at least one navigation cycle (close → revisit); many operators show a lower "return" price after abandonment.
- Test the cancellation flow before cancelling the test account.
- Document any retention discount that appears. These are real prices, and the reader should know about them.
- Note any paywalled action that materially blocks the core interaction (sending a first message, seeing who liked you, filtering matches). These are called out in the Pricing reality check panel on the review page.

We do not republish vendor pricing copy. The price on the review page is the price the editor saw at the test viewport on the `reviewed_at` date.
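One way to hold the pricing observations from a single test session. The field names (`return_price` for the post-abandonment price, `retention_price` for an offer inside the cancel flow) are hypothetical; the point is that every price the editor saw is kept, and the lowest observed price is reported next to the list price rather than replacing it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PlanPriceObservation:
    plan: str                                # plan label as the operator shows it
    currency: str
    list_price: float                        # first price shown in the editor's region
    return_price: Optional[float] = None     # shown after close -> revisit
    retention_price: Optional[float] = None  # offered inside the cancel flow

    def lowest_observed(self) -> float:
        seen = [p for p in (self.list_price, self.return_price, self.retention_price) if p is not None]
        return min(seen)

    def max_discount_pct(self) -> float:
        """Largest observed discount off the list price, as a percentage."""
        return round(100 * (1 - self.lowest_observed() / self.list_price), 1)


@dataclass
class PricingRecord:
    reviewed_at: date
    region: str
    plans: list = field(default_factory=list)
    # Paywalled core actions, surfaced in the Pricing reality check panel.
    blocking_paywalls: list = field(default_factory=list)
```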

Onboarding friction

Onboarding is rated by the experience, not by step count. A 20-question matching questionnaire that takes 10 minutes can earn a high UX score if every question is well-written and the defaults are sensible; a 4-step signup that asks for a credit card before showing a single profile earns a low UX score even though the step count is small.

Our protocol:

- Time the full signup, from landing page to first usable surface.
- Note every required field that surprises the editor (mandatory phone number, mandatory photo, mandatory profile prompt).
- Note every dark pattern: pre-checked email opt-ins, hidden cancel buttons, "are you sure?" friction on legitimate exits.
- Note accessibility regressions: missing skip-links, focus traps, contrast failures in critical CTAs.

These observations land in the review's pros, cons, and the Real-world friction panel.
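A hypothetical onboarding log that mirrors the four protocol steps. It deliberately has no step-count field, since onboarding is rated by experience rather than step count; the `friction_lines` helper only drafts candidate lines for the Real-world friction panel, which the editor then rewrites.

```python
from dataclasses import dataclass, field


@dataclass
class OnboardingLog:
    seconds_to_first_usable_surface: float  # landing page -> first usable surface
    surprising_required_fields: list = field(default_factory=list)
    dark_patterns: list = field(default_factory=list)
    accessibility_regressions: list = field(default_factory=list)

    def signup_minutes(self) -> float:
        return round(self.seconds_to_first_usable_surface / 60, 1)

    def friction_lines(self) -> list:
        """Draft lines for the Real-world friction panel; the editor edits them."""
        lines = [f"Required at signup: {f}" for f in self.surprising_required_fields]
        lines += [f"Dark pattern: {d}" for d in self.dark_patterns]
        lines += [f"Accessibility: {a}" for a in self.accessibility_regressions]
        return lines
```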

Free-tier evaluation

For freemium products, what the free tier actually unlocks matters more than how the operator markets it. Our protocol:

- Use the free tier for the minimum testing window without upgrading, and document the surfaces that lock out or throttle.
- Note when the free tier is functionally a 7-day trial in disguise (limits ratchet down after a fixed window).
- Note when the free tier is functionally usable indefinitely for a specific audience (e.g. women on women-first products, active users who initiate often).
- Compare what the free tier provides against the median price of the paid tier; this comparison is an input to the Value axis.

A free tier that's a marketing pretext earns a low Value score. A free tier that's genuinely usable earns a high one.
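Two of the free-tier checks lend themselves to a small calculation. The sketch below assumes the editor logs the free-tier cap (likes or messages allowed) for each day of the test, and that "median price of the paid tier" is taken per month across plan lengths; both the input shapes and that per-month normalization are assumptions.

```python
from statistics import median


def is_trial_in_disguise(daily_limits: list, window_days: int = 7) -> bool:
    """True when every later daily cap is below every cap in the first window.

    daily_limits[i] is the observed free-tier cap on day i of the test.
    The 7-day window follows the example in the text; the comparison rule is an assumption.
    """
    if len(daily_limits) <= window_days:
        return False  # not enough days observed to see a ratchet
    early, later = daily_limits[:window_days], daily_limits[window_days:]
    return max(later) < min(early)


def median_paid_monthly(plan_prices: dict) -> float:
    """Median per-month price across paid plans, given {plan: (price, months)}."""
    per_month = [price / months for price, months in plan_prices.values()]
    return round(median(per_month), 2)
```

For plans at 30.00 for one month, 60.00 for three months and 90.00 for six months, the per-month prices are 30, 20 and 15, so the median is 20.0.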

Safety analysis

The Safety axis weighs verification rigor, moderation transparency, and the operator's public response to incidents. Our protocol:

- Document the verification flow: what's required (photo, ID, phone), what's optional, and what's bypassable.
- Document the reporting flow: how a user reports another user, what the confirmation looks like, and what (if anything) the reporter is told afterwards.
- Note public moderation transparency: does the operator publish a transparency report, a community guideline change log, or an incident postmortem?
- Note known public incidents and how the operator responded.

We weight responsiveness highly. An operator that ships a fix in days when something goes wrong scores better than one with no public incidents but no public process either.
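The responsiveness rule can be made concrete. In this sketch, `SafetyObservations`, `Incident`, and the 14-day "fast fix" threshold are hypothetical; the tiering only encodes the stated rule that an operator who ships fixes quickly ranks above one with no incidents and no public process.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Incident:
    disclosed: date
    fix_shipped: Optional[date] = None  # None: no public fix observed


@dataclass
class SafetyObservations:
    verification_required: list = field(default_factory=list)   # e.g. ["photo", "phone"]
    verification_bypassable: list = field(default_factory=list)
    publishes_transparency_report: bool = False
    publishes_guideline_changelog: bool = False
    publishes_postmortems: bool = False
    incidents: list = field(default_factory=list)

    def has_public_process(self) -> bool:
        return (self.publishes_transparency_report
                or self.publishes_guideline_changelog
                or self.publishes_postmortems)

    def median_days_to_fix(self) -> Optional[float]:
        days = [(i.fix_shipped - i.disclosed).days for i in self.incidents if i.fix_shipped is not None]
        return median(days) if days else None

    def responsiveness_tier(self, fast_days: int = 14) -> int:
        """Hypothetical tiers: 2 = fixes shipped fast, 1 = public process only, 0 = neither."""
        days = self.median_days_to_fix()
        if days is not None and days <= fast_days:
            return 2
        if self.has_public_process():
            return 1
        return 0
```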

Messaging quality

Messaging is where a dating product earns or loses trust, and where the editorial test session spends the most time. Our protocol:

- First-message throughput. From the test account, send a small set of openers across a representative slice of the discover feed. Record how many reach the recipient surface and how many are blocked by paywalls, throttles, or shadow filters.
- Reply rate and reply shape. Log the share of openers that receive a reply inside the testing window. Note whether replies have a conversational shape (questions, follow-ups) or a template shape (one-liners, link drops, off-platform redirects).
- Off-platform pull. Note when the product's messaging surface pushes the conversation off-platform (WhatsApp, Telegram, SMS) faster than the editor would expect from a genuine fit.
- Spam and bot pressure. Count low-effort openers, image attachments without context, and obvious sales pitches. Flag products where the volume materially degrades the editorial test experience.
- Paywall placement. Document where the product walls off messaging (before send, before read, or before reply); this shows up in the Pricing reality check panel.

Messaging observations feed the Audience and UX axes; egregious spam or moderation gaps also affect Safety.
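The throughput and reply-rate steps reduce to simple ratios over a log of openers. The outcome and reply-shape labels below are hypothetical names for the categories the protocol lists, and measuring reply rate against delivered openers (rather than all openers sent) is a design assumption: a paywalled opener never had a chance to be answered.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels mirroring the categories named in the protocol.
DELIVERED = "delivered"
BLOCKED = {"paywall", "throttle", "shadow_filter"}
REPLY_SHAPES = {"conversation", "template", "off_platform"}


@dataclass
class Opener:
    outcome: str                       # DELIVERED or one of BLOCKED
    reply_shape: Optional[str] = None  # None: no reply inside the testing window


def messaging_metrics(openers: list) -> dict:
    if not openers:
        raise ValueError("no openers logged")
    for o in openers:
        if o.outcome != DELIVERED and o.outcome not in BLOCKED:
            raise ValueError(f"unknown outcome: {o.outcome}")
        if o.reply_shape is not None and o.reply_shape not in REPLY_SHAPES:
            raise ValueError(f"unknown reply shape: {o.reply_shape}")
    delivered = [o for o in openers if o.outcome == DELIVERED]
    replied = [o for o in delivered if o.reply_shape is not None]
    shapes = Counter(o.reply_shape for o in replied)
    return {
        "throughput": len(delivered) / len(openers),
        # Reply rate uses delivered openers as the denominator (a design assumption).
        "reply_rate": len(replied) / len(delivered) if delivered else 0.0,
        "conversation_share": shapes["conversation"] / len(replied) if replied else 0.0,
        "paywall_share": sum(o.outcome == "paywall" for o in openers) / len(openers),
    }
```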

Comparison methodology

The compare pages (`/compare/<a>-vs-<b>/`) are not independent reviews. They are deterministic re-cuts of the two underlying reviews. The comparison applies the same rubric to both products side by side. Anything stated on a compare page can be traced back to one of the two source reviews.

Specifically, every compare page shows:

- the 5-axis scores of both products
- the editorial pros and cons of both products
- the "best for" line for both products
- a deterministic best-for-you panel routed by the reader's prompted situation
- the same human-signal and friction blocks that appear on the individual review pages

We do not invent comparative claims that are not present in the underlying reviews. If a product was not reviewed against a specific axis, the compare page reflects that gap rather than filling it.
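Because compare pages are deterministic re-cuts, they can be sketched as a pure function of the two review records. The dict layout and the rule of ordering the pair by slug (so `a-vs-b` and `b-vs-a` resolve to one page) are assumptions; what the sketch does take from the text is that every value is copied from a source review and a missing axis stays empty instead of being estimated.

```python
AXES = ("overall", "ux", "value", "audience", "safety")


def _side(review: dict) -> dict:
    """Copy one review's published fields; nothing is computed or estimated."""
    scores = review.get("scores", {})
    return {
        "name": review["name"],
        # A missing axis becomes None, which the page renders as a visible gap.
        "scores": {axis: scores.get(axis) for axis in AXES},
        "pros": list(review.get("pros", [])),
        "cons": list(review.get("cons", [])),
        "best_for": review.get("best_for"),
    }


def compare(review_a: dict, review_b: dict) -> dict:
    """Deterministically re-cut two reviews into a compare-page payload."""
    first, second = sorted((review_a, review_b), key=lambda r: r["slug"])
    return {
        "path": f"/compare/{first['slug']}-vs-{second['slug']}/",
        "a": _side(first),
        "b": _side(second),
    }
```

Argument order does not change the output, which is what makes the page reproducible from its two sources.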

Update cadence

A review is re-tested on a fixed minimum cadence and earlier on event triggers. The cadence:

- Every twelve months minimum. Every active review is re-tested at least once a year. The `reviewed_at` date on the page is the day the most recent re-test was completed.
- Earlier on pricing changes. A material price change (the list price moves, a tier disappears, or a paid action moves behind a new wall) triggers an out-of-cycle re-test.
- Earlier on matching or moderation changes. A new onboarding questionnaire, a new feed algorithm, or a public moderation incident triggers an out-of-cycle re-test.
- Earlier on safety incidents. A reported safety failure (data breach, mass-impersonation event, abuse-handling failure) forces a same-week re-test if the editor can reach the affected surface.
- Earlier on screenshot drift. When the product redesigns the surface shown in our hero screenshot, the hero is re-shot or archived, and the review is re-tested in the same sprint.

We do not publish a score on a stale test. If we cannot re-test within the cadence above, the page eyebrow shows the last-tested date and the review is queued for re-test rather than left to age out silently.
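The cadence rules can be expressed as a single check. Trigger names here are hypothetical labels for the events listed above, and 365 days stands in for "twelve months" (it ignores leap years).

```python
from datetime import date, timedelta
from typing import Optional

RETEST_INTERVAL = timedelta(days=365)  # approximates "every twelve months minimum"

# Hypothetical labels for the event triggers listed above.
TRIGGERS = {"pricing_change", "matching_or_moderation_change", "safety_incident", "screenshot_drift"}


def retest_due(reviewed_at: date, today: date, open_triggers: set) -> Optional[str]:
    """Return why a re-test is due, or None if the review is current."""
    unknown = open_triggers - TRIGGERS
    if unknown:
        raise ValueError(f"unknown triggers: {sorted(unknown)}")
    if "safety_incident" in open_triggers:
        return "same-week re-test (safety incident)"
    if open_triggers:
        return "out-of-cycle re-test: " + ", ".join(sorted(open_triggers))
    if today - reviewed_at >= RETEST_INTERVAL:
        return "annual re-test overdue"
    return None
```

Safety incidents are checked first because they carry the tightest deadline; any other open trigger pre-empts the annual clock.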

What we do not claim

Trust depends as much on what we don't say as on what we do. The list below is the discipline:

- We do not claim a user count, traffic number, member-base size, match volume, or success rate that we cannot verify against a primary source the editor has tested. Vendor-supplied numbers go through our verification inbox before they appear on the page.
- We do not claim a brand has won an award unless a public, dated citation exists.
- We do not call a brand "the best" without naming who it is best for and what it trades off.
- We do not claim a product is safe, private, or secure beyond what the Safety axis observed in testing, and we do not attach industry-wide claims to a single review.
- We do not publish a star rating or score from a third-party reviewer or aggregator on this site. Every score here is ours.
- We do not invent author credentials. The reviewer's role, disciplines, and focus areas are listed in `src/content/authors/`.

These rules are enforced by automated checks on every update. When we drift, the change doesn't publish.
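The automated checks themselves are not published here. As a hypothetical illustration of the kind of rule involved, the sketch flags "the best" when the rest of the sentence never says who it is best for, and surfaces numeric user or member counts for primary-source verification. The regular expressions are illustrative and would miss many phrasings; whether a claim names its trade-off still needs an editor.

```python
import re

# Illustrative patterns only; not the site's real rule set.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
THE_BEST = re.compile(r"\bthe best\b", re.IGNORECASE)
FOR_WORD = re.compile(r"\bfor\b", re.IGNORECASE)
COUNT_CLAIM = re.compile(
    r"\b\d[\d,.]*\s*(?:million|thousand|k|m)?\s+(?:users|members|matches)\b", re.IGNORECASE
)


def lint_claims(text: str) -> list:
    """Return findings; an empty list means nothing was flagged."""
    findings = []
    for sentence in SENTENCE_SPLIT.split(text.strip()):
        best = THE_BEST.search(sentence)
        # Look for "for" only after the phrase: "the best for over-50 daters" passes.
        if best and not FOR_WORD.search(sentence, best.end()):
            findings.append(f'"the best" without saying who it is best for: {sentence!r}')
        if COUNT_CLAIM.search(sentence):
            findings.append(f"count claim needs a primary-source check: {sentence!r}")
    return findings
```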

Where the protocol lives

The protocol described above is not aspirational. It is enforced by code, content schema, and verifiers:

- Every review must carry the full 5-axis score; a review without it never publishes.
- The pros and cons arrays must be non-empty.
- The `reviewed_at` date must be ISO 8601-formatted.
- Author credentials are validated so that no author claims a discipline the editor hasn't documented.
- Screenshot metadata is gated by `verify-screenshot-metadata` and `verify-evidence-freshness`.
- Reviewer language is gated by `verify-editorial-trust`, which blocks claims of fake credentials, awards, employers, or year counts.

If we drift, those checks fail and the change doesn't publish.
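A minimal sketch of the schema rules in the first three bullets, assuming frontmatter arrives as a parsed dict with `scores`, `pros`, `cons`, and `reviewed_at` keys. This is not the site's actual schema or the `verify-*` tooling, only an illustration of the fail-closed behaviour: any error blocks publishing.

```python
from datetime import date

REQUIRED_AXES = ("overall", "ux", "value", "audience", "safety")


def validate_frontmatter(fm: dict) -> list:
    """Return schema errors; an empty list means the review may publish."""
    errors = []
    scores = fm.get("scores") or {}
    for axis in REQUIRED_AXES:
        value = scores.get(axis)
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if not isinstance(value, (int, float)) or isinstance(value, bool) or not 0 <= value <= 10:
            errors.append(f"scores.{axis} must be a number in 0-10")
    for key in ("pros", "cons"):
        items = fm.get(key)
        if not isinstance(items, list) or not items:
            errors.append(f"{key} must be a non-empty list")
    try:
        # str() also accepts a date object produced by a YAML parser.
        date.fromisoformat(str(fm.get("reviewed_at")))
    except ValueError:
        errors.append("reviewed_at must be an ISO 8601 date (YYYY-MM-DD)")
    return errors
```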

Methodology summary — short version
Editorial policy — what we will and won't do
Affiliate disclosure — how we get paid
About — who we are