Phishing Simulations Work: When They’re Designed to Build Skill, Not Shame

Updated September 18, 2025
Written by Maxime Cartier
Every few months, a study lands in the headlines declaring: “Phishing simulations don’t work.” It’s easy to see why these stories go viral: nearly every organization runs a phishing simulation program, and the critique feels refreshing. And the truth is, many traditional approaches don’t move the needle much.

But the real question isn’t whether phishing simulations work. It’s which designs reliably build skill, confidence, and measurable risk reduction.

The latest research, covered in the Wall Street Journal, gives us a clear answer. Poorly designed programs - annual compliance modules, generic “click-and-train” popups, or punitive gotchas - don’t change behavior. But well-designed programs - frequent, relevant, interactive, encouraging - do. They can transform people from passive recipients into active defenders who report, discuss, and even enjoy learning about phishing. That’s not just opinion. It’s what the strongest data from academia, industry, and real-world programs shows.

I’ve spent a decade helping people get better at spotting and reporting phish. Along the way I’ve learned two truths that happily coexist: (1) a single research paper is never the truth; and (2) phishing simulations absolutely work, when they’re designed to build skill, not shame. The debate usually stalls at the wrong question (“Do simulations work?”) instead of the better one: what kind of program reliably changes behavior and culture?

This piece is my practical answer, grounded in recent research and what we see across many enterprises. I’ll show where the skeptical headlines are right (about bad training) and how to design programs that produce the outcomes CISOs actually want: higher reporting, faster time-to-report, and fewer risky actions, all without trashing morale.

What the evidence really says (and why the headlines feel so loud)

Let’s acknowledge the elephant. Several studies have challenged the impact of common training patterns:

  • Ho et al. (2025): In a large healthcare organization, annual one-off training had no impact on simulated-phish failure. Optional “click-to-train” popups barely budged outcomes, since most users closed the page in under 10 seconds.
  • ETH Zurich (Lain et al., 2022): Over 15 months and 14k+ employees, they observed that given enough attempts, many people eventually click (human fallibility is real), and optional post-click training didn’t help: sometimes it backfired.
  • Marshall et al. (2024): In this meta-analysis of 42 studies, researchers confirmed that annualized programs are unlikely to provide sustained protection, whereas active engagement and repeated practice improved outcomes.

It’s no wonder some papers conclude “training doesn’t work.” When the program is a long video once a year, plus a sporadic gotcha email and a scolding, the skeptical view is mostly right. That’s not a behavior change program. That’s broccoli boiled for an hour: technically nutritious, basically inedible.

But zoom out and the picture changes.

  • Verizon DBIR 2025: This is the largest industry dataset about phishing simulations, and it found that across tens of thousands of organizations, employees trained within the last 30 days were 4× more likely to report phishing emails than those trained earlier. That’s population-scale evidence of the power of recency. The data also shows that organizations running sustained programs drive simulation failure down to ~1.5%, a dramatic hardening of the human attack surface.
  • Case studies: Our own data at Hoxhunt, drawn from 2.5M+ users, shows that well-designed, gamified programs make employees far more likely to report phishing. Customers such as AES and Qualcomm have shared their results, turning even their weakest cohorts into top reporters (“worst to first”).
The message is clear: phishing simulations can absolutely work, when they’re designed right.

Five principles to make phishing simulations succeed

These are the five design principles I’ve seen consistently succeed, aligned with what the literature suggests.

1. Personalize and progressively challenge

One-size-fits-all is a recipe for disengagement: people learn when examples look like their world. Developers see Git prompts, finance sees invoice fraud, and exec admins see travel changes. Start at the right difficulty and make it a little tougher over time. A small-scale study by Schöni et al. (2024) showed that when low-proficiency participants received training tailored to their skill level, something remarkable happened: the performance gaps disappeared.

In our programs at Hoxhunt we lean on adaptive difficulty and localized content. Simulations adapt automatically: clickers get easier challenges to rebuild self-efficacy, while skilled reporters get harder ones. Over time, people grow instead of feeling tricked.

Measure it: reporting rates by role and region, plus how many users graduate to more advanced levels while maintaining high reporting.
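The adaptive-difficulty idea above can be sketched as a simple rule. This is a minimal, hypothetical illustration (not Hoxhunt’s actual engine; all names and levels are invented for the example): clickers step down a level to rebuild self-efficacy, successful reporters step up toward harder lures.

```python
# Hypothetical sketch of adaptive simulation difficulty.
# Clickers get an easier next simulation; reporters get a harder one.

MIN_LEVEL, MAX_LEVEL = 1, 5  # illustrative difficulty scale

def next_difficulty(current_level: int, outcome: str) -> int:
    """Return the difficulty level for a user's next simulation.

    outcome: "reported" (spotted and reported the phish),
             "clicked"  (fell for it), or
             "ignored"  (no interaction).
    """
    if outcome == "reported":
        return min(current_level + 1, MAX_LEVEL)  # earned a harder challenge
    if outcome == "clicked":
        return max(current_level - 1, MIN_LEVEL)  # easier, rebuild confidence
    return current_level                          # no signal, hold steady

print(next_difficulty(3, "reported"))  # → 4
print(next_difficulty(1, "clicked"))   # → 1 (already at the floor)
```

The point of the sketch is the asymmetry: a miss never ends the program, it just lowers the bar until the person succeeds again.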

2. Right frequency, right duration

Marshall et al. (2024) showed that training effects decay without reinforcement; the DBIR 2025 confirmed it at scale: reporting likelihood peaks in the first month after training. I think of it as gym logic: one epic workout each December doesn’t make you fit. That’s why simulations should be frequent: at least once a month.

Ho et al. (2025) note that “the majority of the users in our study spent less than 30 seconds looking at embedded training content”. The fix is micro-training: 30-second interactive lessons delivered regularly.

Even strong reporters should get micro-training after they report, because frequency matters for everyone. Regular micro-drills build “muscle memory”, because vigilance is perishable.

Tactic: Replace the annual video with monthly micro-drills and quick refreshers. Keep total time investment low, cadence high.

3. Positive reinforcement & psychological safety

The ETH Zurich study found that optional, half-hearted post-click training sometimes made things worse. In practice, punitive responses crush reporting culture, and people try to hide mistakes. I’d rather turn a click into a teachable moment and reward reporting. When teams celebrate quick reporting, people lean in.

What works instead: encouragement. Thank-you notes, small rewards, or even just a leaderboard can make reporting feel fun. Plus, by training everyone (whether they click or report, as mentioned in the point above), you remove the feeling of punishment from the training moment coming after a click.  

The best programs equip managers with simple scripts and dashboards: “Here’s how our team’s doing; let’s talk without blame.”

Tactic: No public shaming. Private, supportive coaching after a miss. Publicly recognise excellent reporters and teams with fast time-to-report.

Measure it: repeat reporters per month, survey sentiment, completion of optional micro-coaching tasks.

4. Gamification and interactivity

Ho et al. (2025) hinted at the obvious: “Users who complete static training sessions have worse phishing failure rates. In contrast, users who complete interactive training sessions have a lower likelihood of failing future simulations”. The counter is interactive, just-in-time coaching: still about 30 seconds, but hands-on, because we learn best by doing, not by skimming.

Combined with gamification techniques, training can also feel fun, and contribute to a positive culture of security.  

5. Make reporting frictionless, and wire it to response

Click rate alone is a dangerous vanity metric: it’s sensitive to email difficulty and tells you nothing about detection. Programs often optimize the wrong KPI; focus on defence, not embarrassment. Not clicking is good, but reporting quickly is what reduces risk.

Reporting must be one-click, integrated with SOC workflows, and rewarded with feedback: “Thanks, this helped block a campaign.” At Hoxhunt, reported phishing emails feed straight into our analysis engine. Every month, our users worldwide report over half a million real threats. That data strengthens defenses and keeps training relevant, since people practice on real attacker lures.

Measure it: number of real threats reported, and time-to-report (dwell time).
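The two metrics above fall straight out of a simple event log. Here is a minimal sketch under invented data; the field names, users, and timestamps are all hypothetical, and real programs would pull these from the reporting tool’s API.

```python
# Hypothetical sketch: reporting rate and median time-to-report
# from a toy event log. All names and timestamps are illustrative.
from datetime import datetime
from statistics import median

events = [
    # (user, delivered_at, reported_at or None if never reported)
    ("ana",  datetime(2025, 9, 1, 9, 0), datetime(2025, 9, 1, 9, 4)),
    ("ben",  datetime(2025, 9, 1, 9, 0), datetime(2025, 9, 1, 10, 30)),
    ("cruz", datetime(2025, 9, 1, 9, 0), None),  # simulation ignored
]

reported = [e for e in events if e[2] is not None]
reporting_rate = len(reported) / len(events)          # share who reported
median_ttr_min = median(                              # typical dwell time
    (reported_at - delivered_at).total_seconds() / 60
    for _, delivered_at, reported_at in reported
)

print(f"reporting rate: {reporting_rate:.0%}")          # → 67%
print(f"median time-to-report: {median_ttr_min} min")   # → 47.0 min
```

Tracking the median rather than the mean keeps one slow outlier from masking a broadly fast-reporting team.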

What this looks like in the real world

Peer-reviewed studies tell us what to avoid and where to focus; case evidence, such as the Elisa benchmark study, shows what “good” can look like when you apply those lessons. None of these are randomised controlled trials, but evidence compounds when different sources rhyme: together with the DBIR and the academic studies above, they show that when design is right, behaviour shifts massively.

Where research and practice meet (and why I’m optimistic)

The most cited skeptical papers don’t say “give up.” They say: don’t do bad training and expect good outcomes. Ho et al. (2025) asked us to rethink stale patterns. ETH Zurich (2022) warned that optional, bland micro-pages can backfire and that some number of clicks will happen over time. Marshall et al. (2024) challenged us to mind the decay curve. Schöni et al (2024) told us to personalize.

When we follow those hints - short, frequent, interactive, role-relevant, psychologically safe, easy to report, measured on the right outcomes - we see the behaviours that matter rise: more eyes on target, faster escalation, fewer harmful actions. That’s not hype; it’s the composite picture you get when peer-reviewed findings and multi-company outcomes rhyme.

At Hoxhunt, our defaults were built around exactly that: ~30-second interactive micro-lessons, frequent practice, adaptive difficulty, and training for everyone (including your best reporters). We reward reporting, never punish clicking, and we measure behaviour change rather than vanity. I’m proud of those choices not because they’re ours, but because the research increasingly validates the approach.

Build confidence, not fear

So do phishing simulations “work”? The evidence says: they can, when they’re designed to build skill, not shame.

Annual compliance modules and punitive click-to-train popups won’t save you. But adaptive, frequent, interactive, positive programs will: they raise reporting, cut dwell time, and strengthen culture.

People want to do the right thing. Give them training that respects their time and intelligence, make the secure action the easy one, and celebrate the behaviours that protect the company. When people are supported instead of punished, they become your strongest line of defence. That’s something worth celebrating.

References (selected)

  • Ho, G., Mirian, A., Luo, E., et al. (2025). Understanding the Efficacy of Phishing Training in Practice. IEEE Symposium on Security & Privacy. DOI: 10.1109/SP61157.2025.00076.
  • Lain, D., et al. (2022). Phishing in Organizations: Findings from a Large-Scale and Long-Term Study. ETH Zurich / IEEE S&P.
  • Marshall et al. (2024). Exploring the Evidence for Email Phishing Training: A Scoping Review. Computers & Security, Volume 139.
  • Schöni et al. (2024). You Know What? - Evaluation of a Personalised Phishing Training Based on Users’ Phishing Knowledge and Detection Skills. EuroUSEC ’24: Proceedings of the 2024 European Symposium on Usable Security.
  • Case evidence:
      • Creating a Company Culture for Security (Hoxhunt, 2025): ~7× higher reporting among trained vs. untrained groups.
      • AES Case Study (Hoxhunt): reporting up 526% (11.5% → 60.5%); simulation failure down from 7.6% to 1.6%.
      • Qualcomm Case Study (Hoxhunt): “worst-to-first” cohort becomes top reporters; org-wide reporting increases while clicks fall.

Addressing the critiques

  • “Studies show no effect.” True for the outdated model of annual, static, passive-learner training. Not true for frequent, adaptive, interactive programs. Context matters.
  • “Click rate is the golden metric.” Clicks alone mislead. Better metrics include reporting rate, time-to-report, and recovery after mistakes.
  • “Employees feel tricked.” Only if simulations are unfair or punitive. Start easy, ramp up, and frame mistakes as learning.
  • “It’s just compliance.” Not if you align to real incidents and measure impact on detection and dwell time.