Deepfake Phishing Simulation Software: How to Train Teams

Train teams for AI voice/video scams. Our Hoxhunt guide to deepfake phishing simulation software shows safe rollout, realistic workflows, KPIs, and vendor comparisons.



Updated September 30, 2025 · Written by Hoxhunt

What is deepfake phishing simulation software?

Deepfake phishing simulation software is training tech that safely rehearses AI-generated voice and video social-engineering attacks. Employees receive a phishing email that leads to a fake Teams/Meet/Zoom call, where an avatar (likeness + voice) urges an action; safe fails trigger instant micro-training, building resilience against advanced fraud.

Platforms like Hoxhunt support multi-channel environments and can deliver either targeted benchmarks or broad awareness modules.

Key capabilities you should expect

  • Multi-step, multi-channel flow (email + live “call” simulation) to reflect real attack paths.
  • Realistic avatars (likeness + voice) with consent-first production using vetted providers.
  • Instant, in-context micro-training when a risky action is taken - no public shaming, fast feedback.
  • Responsible realism (e.g., light glitches/lag to support a “bad connection” pretext) without crossing psychological-safety lines.

How it differs from traditional phishing simulations

  • Trains voice/video-based persuasion (not just text/email).
  • Emphasizes reflexes over pixel-peeping (assume fakes can look real; scrutinize urgency and request).
  • Aligns to high-value processes (payments, access, approvals) where deepfakes deliver real ROI for attackers.

Inside Hoxhunt’s deepfake simulation workflow

Hoxhunt's deepfake phishing simulation mirrors real attacks: a phishing email routes the user to a fake Teams page, a cloned-voice avatar urges an urgent action, and clicking the chat link triggers a safe fail with instant micro-training.

Step-by-step flow (multi-channel simulation)

  1. Phishing email (entry point): Users receive a realistic phishing simulation email (Outlook/Gmail) prompting a “quick call.” Hoxhunt's reporting button can capture reports directly from the inbox.
  2. Attacker-controlled “meeting” (landing page): Clicking opens a browser page that convincingly mimics Microsoft Teams / Google Meet / Zoom - mic and camera can be enabled so it feels live. Links can display as legit while redirecting under the attacker’s control.
  3. AI-generated voices + avatar (deepfake voice & video): A scripted avatar (likeness + voice cloning) urges an urgent action (e.g., open a “SharePoint” link). Light lag/glitch effects support the “bad connection” pretext without overstepping psychological safety.
  4. User action → safe fail (deepfake fraud simulation): Clicking the chat link triggers a simulation fail and instant micro-training. This is how Hoxhunt's security awareness training coaches away risky behavior - no public shaming, just fast, contextual feedback.
  5. Threat reporting path (before failure): If the user suspects AI-generated voice phishing and backs out, they can report the original email via the Hoxhunt button; it still counts as a successful report if done before failing.
  6. Admin visibility & outcomes: Admins view campaign metrics in real time and can trigger behavior-based nudges or training packages for users who exhibit risky behavior.
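
To make the flow concrete, here is a small Python sketch modeling the simulation as a state machine. The states and transitions are our own illustrative abstraction of the flow above, not Hoxhunt's internal implementation:

```python
from enum import Enum, auto

class State(Enum):
    EMAIL_RECEIVED = auto()   # 1. phishing email lands in the inbox
    IN_FAKE_MEETING = auto()  # 2-3. joined the attacker-controlled "call"
    REPORTED = auto()         # 5. reported via the inbox button (success)
    SAFE_EXIT = auto()        # left the call without clicking (good outcome)
    FAILED = auto()           # 4. clicked the chat link -> micro-training

# Allowed transitions: reporting counts as success at any point before failing.
TRANSITIONS = {
    State.EMAIL_RECEIVED: {State.IN_FAKE_MEETING, State.REPORTED},
    State.IN_FAKE_MEETING: {State.REPORTED, State.SAFE_EXIT, State.FAILED},
}

def step(current: State, nxt: State) -> State:
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {nxt}")
    return nxt

# Example: a user joins the fake call, then reports the original email.
s = step(State.EMAIL_RECEIVED, State.IN_FAKE_MEETING)
s = step(s, State.REPORTED)  # still counts as a successful report
print(s)  # State.REPORTED
```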

Watch our live demo of Hoxhunt's deepfake simulation.

KPIs to track (beyond “click rate”)

📊 Metrics That Matter (Deepfake Phishing Simulations)

Use these three KPIs to quantify behavior change and reduce human cyber risk in multi-channel deepfake phishing simulation exercises.

  • 📨 Reporting rate (from inbox)
    Definition: % of recipients who report the training phish via the Hoxhunt button before failing.
    Why it matters: Reinforces the “see it → report it” reflex your security awareness program is aiming for.
    Target: Trend up over time (overall and by role/region).
  • 🧭 Drop-off before action
    Definition: % who join the fake Teams/Meet/Zoom call but exit or refuse the chat-link request.
    Why it matters: Signals premise-based detection under pressure (behavioral resilience).
    Target: Trend up as training packages roll out.
  • 🎯 Final fail rate
    Definition: % who click the chat link in the attacker-controlled meeting (simulation fail).
    Why it matters: Core indicator for deepfake scams and credential-capture scenarios.
    Target: Trend down overall; highlight improvements in high-risk business processes.
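
If you want to compute these KPIs yourself from raw campaign data, here is a minimal Python sketch. The `Outcome` record and its fields are hypothetical stand-ins for whatever per-recipient events your platform exports; the formulas simply follow the definitions above:

```python
from dataclasses import dataclass

# Hypothetical per-recipient record for one deepfake simulation campaign.
@dataclass
class Outcome:
    reported_before_fail: bool  # reported via the inbox button before any fail
    joined_call: bool           # opened the fake Teams/Meet/Zoom page
    clicked_chat_link: bool     # clicked the chat link (the defined "fail")

def kpis(outcomes: list[Outcome]) -> dict[str, float]:
    n = len(outcomes)
    reported = sum(o.reported_before_fail for o in outcomes)
    # Drop-off: joined the fake call but exited without clicking or reporting.
    dropped = sum(
        o.joined_call and not o.clicked_chat_link and not o.reported_before_fail
        for o in outcomes
    )
    failed = sum(o.clicked_chat_link for o in outcomes)
    return {
        "reporting_rate": reported / n,
        "drop_off_before_action": dropped / n,
        "final_fail_rate": failed / n,
    }

print(kpis([
    Outcome(True, False, False),   # reported straight from the inbox
    Outcome(False, True, False),   # joined the call, then safely exited
    Outcome(False, True, True),    # clicked the chat link (simulation fail)
]))  # each KPI comes out to 1/3 for this toy cohort
```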

Why the realism works but stays responsible

  • Web environments closely mirror Teams/Meet/Zoom (familiar communication channels), including camera/mic prompts and domain styling.
  • The experience is “real but responsible” - brief, targeted, and followed by micro-training to build behavioral resilience against deepfake scams.

Where this fits your security awareness program: run as targeted phishing simulation exercises (benchmark high-risk business processes) or as part of SAT training packages to reduce human cyber risk across the org.

Below you can walk through our deepfake simulation training at your own pace and see how it works under the hood.

Realism without risk: consent, ethics, and safety

Security teams need realistic simulations without “gotchas.” That's why Hoxhunt’s deepfake training is consent-only, pre-scripted, brief, and delivered in an attacker-style fake meeting - ending with micro-training instead of shame. Reporting happens from the original email, and rollouts are targeted one-offs with spacing, followed by additional training for risky users.

1) Consent & legal guardrails (non-negotiable)

  • Contractual cover: deepfake amendment for customers (or deepfake language in the main contract).
  • Customer-obtained consent: the customer confirms the likeness/voice owner has consented (one-line confirmation is sufficient).
  • Providers & data use: voice via ElevenLabs; video via Synthesia/HeyGen; agreements specify no storage or model training on customer content.

2) “Real but responsible” employee experience

  • No non-consensual cloning; scenarios are pre-scripted, not open-ended chats.
  • Short, respectful interactions (no prolonged cat-and-mouse); follow with micro-training instead of blame.
  • Clicking the link leads to an “oops - that could have been dangerous” learning moment.

3) Designed limits that keep risk low

  • Environment, not apps: simulations run in browser-based look-alikes of Teams/Meet/Zoom; we can add light lag/glitch to sell the pretext, but we don’t escalate intensity.
  • Clear end state: the chat-link click is the defined “fail,” which triggers instant coaching; users can exit safely before clicking.
  • Reporting path: employees report from the original email (not from the browser); a report still counts as success if made before failing.

4) Cadence and content strategy (avoid training plateaus)

  • Run as targeted one-offs (e.g., finance, EA, approvers), let the buzz settle, then re-test after regular training refreshers.
  • Tie additional training to user behavior (fails, risky clicks) rather than pushing long-form training modules; keep ongoing training lightweight with ready-made training modules.

Use cases & rollout: where deepfakes fit in your phishing campaigns

Use deepfake simulations where real-world threats hurt most: executive/BEC approvals, vendor fraud, and IT support pretexts. Run a targeted one-off benchmark, space it, then assign additional training and re-test. Measure reporting rate, drop-off before action, and final fail rate to prove behavior change across your phishing campaigns and simulation programs.

High-impact use cases (pair with realistic scenarios)

  • Executive impersonation / BEC approvals. Cloned “CEO/CFO” on a quick call pushes an urgent wire or bank-detail change - classic business email compromise pattern. Outcome can be a chat-link to a credential harvester.
  • Vendor fraud in finance. Pretexted “updated remit info” or “invoice about to bounce” delivered via email + call; train approvers and AP analysts.
  • IT support scams. “Reset now / verify device” pushed in a fake Teams/Meet/Zoom room; payload is a “SharePoint/SSO” chat link.
  • Multi-channel pretexting. Email ↔ call sequencing boosts credibility; include voice messages in your narrative to mirror AI-powered voice phishing.

Who to target first (and why)

  • Finance, executive assistants, approvers → highest monetary risk from fraud attempts/BEC.
  • IT / service desk → credential theft and policy bypass pressure.
  • Broader org for awareness → short training modules while building baseline recognition.

Rollout pattern we recommend

  1. Targeted benchmark (one-off). Launch to a defined cohort.
  2. Space it. Let buzz cool, deliver micro-training or ready-made training modules to those who struggled.
  3. Re-test. New scenario, same process; show improvement over time in measurable behavior (↑ reports, ↑ safe drop-offs, ↓ fails).
  4. Close the loop. Use analytics to trigger additional training and keep ongoing training lightweight - avoid long-form defaults that cause training plateaus.

Getting started: deployment, consent & customization

Set up a deepfake phishing simulation program in weeks, not months: pick a high-risk cohort and scenario, secure consent, capture a short voice sample, generate the avatar, and launch a realistic simulation (email ➝ fake video meeting).

Step-by-step rollout

Choose the scenario & cohort

Start where business email compromise or approvals hurt most (finance/AP, EAs, approvers; IT for credential resets). Keep the narrative anchored to real-world threats.

Consent & contract

You confirm likeness/voice consent. No model training/storage on your content.

Capture assets

Record 1–2 minutes of the subject’s voice (or use approved samples) and supply a still/video + script. Two avatar options: fully consented capture (most realistic) or photo + short voice (lower effort).

Production window

Standard turnaround is ~5 business days after the voice sample is received; avatar generation takes ~1 day.

Launch the drill (realistic but safe)

Send a simulated phishing email that routes to a browser-based Teams/Meet/Zoom look-alike. The cloned-voice avatar applies urgency and drops a “SharePoint/SSO” chat link (the safe fail).

Define success before go-live

  • Reporting rate (from inbox): report via the original email counts as success.
  • Drop-off before action: users who exit the fake call without clicking.
  • Final fail rate: chat-link clicks (endpoint).

Close the loop with training

Trigger micro-training/training modules for risky user behavior.

Re-test for proof

Space out simulations, swap in a fresh scenario, and re-run to show reductions in phishing click rates and behavior change across phishing campaigns.

Deepfake phishing simulation vendor comparison table

🔎 Deepfake training solutions - compact comparison

Hoxhunt - Custom Deepfake Attack (platform · multi-channel · bespoke)

  • Realism & attack: Email lure → fake Teams/Meet/Zoom video call with cloned exec voice/avatar → chat-link payload.
  • Reporting & analytics: Behavior KPIs (reporting rate, drop-off, final fail) in the same awareness dashboard.
  • Training modules: Micro-training + SAT; avoids plateaus; easy follow-ups for high-risk cohorts.

Proofpoint Phishing Simulations (platform · templates)

  • Realism & attack: Email-centric simulations inside the security suite; verify live-call/fake-meeting support.
  • Reporting & analytics: Suite dashboards; confirm behavior-level metrics beyond click rate.
  • Training modules: Ready-made lessons; ensure micro-lesson depth to avoid plateaus.

Phished (platform · templates)

  • Realism & attack: AI-generated emails; validate multi-channel rehearsal depth.
  • Reporting & analytics: Trends & campaign views; check behavior KPIs.
  • Training modules: Just-in-time lessons; avoid long-form defaults.

Living Security (platform · multi-channel · templates)

  • Realism & attack: Coordinated email/SMS/phone drills; confirm fake-meeting realism.
  • Reporting & analytics: Human-risk views; check board-ready depth.
  • Training modules: Short modules; target refreshers to avoid plateaus.

Abnormal Security (AI Phishing Coach) (platform · templates)

  • Realism & attack: Strong on BEC/email using sanitized real threats; validate meeting/vishing support.
  • Reporting & analytics: Coaching & trends; verify behavior KPIs.
  • Training modules: Auto nudges + micro-lessons recommended.

Breacher.ai (standalone · multi-channel · bespoke)

  • Realism & attack: Managed email/phone/video/social engagements with org-specific narratives.
  • Reporting & analytics: Engagement reports; plan continuity for behavior KPIs.
  • Training modules: Consultative teaching moments; schedule internal follow-ups.

Adaptive Security (standalone · multi-channel · bespoke + templates)

  • Realism & attack: Email/SMS/voice/video orchestration; confirm exec cloning support.
  • Reporting & analytics: Risk scoring & compliance; check board-level views.
  • Training modules: Gamified modules; keep them micro.

Revel8 (platform · multi-channel · bespoke)

  • Realism & attack: Real-time personalization across email/SMS/voice/video.
  • Reporting & analytics: Human-risk views; verify KPI coverage.
  • Training modules: On-demand, role-relevant lessons.

OSS / DIY (Phishing Frenzy + voice sim) (standalone · multi-channel · bespoke, you build)

  • Realism & attack: Script your own fake meetings & voice cloning.
  • Reporting & analytics: Stitch behavior metrics together yourself.
  • Training modules: Ongoing content ops needed to avoid stale templates.

How attackers use deepfakes in phishing today

Attackers combine voice cloning and avatar video with multi-channel pretexting to impersonate trusted people. They scrape org intel (LinkedIn, email signatures), spoof caller ID or domains, lure targets into a fake Teams/Meet/Zoom page, then push urgent actions - credential capture or wire/vendor fraud.

Encoder-decoder architecture

The process starts with the encoder, which is a neural network that compresses the input data (e.g., a face image) into a smaller, more manageable representation known as a latent space.

This compressed representation contains the most critical features of the face, such as its shape, structure, and texture, without the original high-dimensional details.

The latent space is a lower-dimensional representation of the input data, capturing essential features like facial expressions, angles, and lighting in a compressed form.

This space is crucial because it allows the model to efficiently manipulate and transform these features to generate new outputs.

After encoding, the decoder takes the latent space representation and reconstructs it into a full image.

For deepfakes, the decoder is trained to take the latent representation of one face and generate the face of another person, often swapping features between the two.
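
As a concrete illustration of that pipeline, here is a minimal PyTorch sketch (framework choice and layer sizes are our own illustrative assumptions, not any production deepfake model) of an encoder that compresses a face image into a latent vector and a decoder that reconstructs an image from it:

```python
import torch
import torch.nn as nn

# Encoder: compresses a 64x64 RGB face image into a small latent vector.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64x64 -> 32x32
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 256),  # 256-dim latent space
)

# Decoder: reconstructs a full image from the latent vector.
decoder = nn.Sequential(
    nn.Linear(256, 64 * 16 * 16),
    nn.Unflatten(1, (64, 16, 16)),
    nn.ReLU(),
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 16 -> 32
    nn.ReLU(),
    nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # 32 -> 64
    nn.Sigmoid(),  # pixel values in [0, 1]
)

# Face-swap idea: train one shared encoder with two decoders (one per person),
# then push person A's latent code through person B's decoder.
x = torch.rand(1, 3, 64, 64)              # dummy face-image batch
latent = encoder(x)                       # compressed representation
reconstruction = decoder(latent)          # back to image space
print(latent.shape, reconstruction.shape) # [1, 256] and [1, 3, 64, 64]
```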

Generative adversarial networks (GANs)

Deepfake technology primarily relies on artificial intelligence.

More specifically, a type of deep learning called Generative Adversarial Networks (GANs).

GANs consist of two neural networks:

  • Generator: This network generates fake data (such as images or video frames) that resembles real data. Its goal is to create content so realistic that the Discriminator cannot tell it apart from actual data.
  • Discriminator: This network evaluates the data and tries to distinguish between real and fake content. It essentially acts as a critic, judging whether the content produced by the Generator is authentic or fake.

During training, the Generator and Discriminator are essentially pitted against each other.

The Generator attempts to create more convincing fakes...

And the Discriminator gets better at detecting them.

Over time, this adversarial process improves the quality of the generated content: the Generator becomes increasingly adept at producing images and videos that the Discriminator struggles to distinguish from real content, resulting in highly realistic deepfakes.
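
That training loop can be sketched in a few lines. Below is a toy, fully connected GAN in PyTorch (an assumed framework; real deepfake pipelines use convolutional networks and far larger datasets), showing the adversarial dynamic: the Discriminator is rewarded for separating real from fake, the Generator for fooling it.

```python
import torch
import torch.nn as nn

# Toy GAN on flattened 64x64 RGB images.
latent_dim, img_dim = 100, 64 * 64 * 3

generator = nn.Sequential(
    nn.Linear(latent_dim, 512), nn.ReLU(),
    nn.Linear(512, img_dim), nn.Tanh(),  # fake "image" in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(img_dim, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1),                   # real-vs-fake logit
)

loss_fn = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.rand(32, img_dim) * 2 - 1  # placeholder for a real-image batch
    noise = torch.randn(32, latent_dim)
    fake = generator(noise)

    # Discriminator: label real images 1, generated images 0.
    d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) \
           + loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: try to make the discriminator call fakes real.
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```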

The AI is trained using datasets

The GAN is trained on vast datasets of real images, audio, or video, depending on the type of deepfake being created.

If you wanted to create a deepfake video of a person, for example, the GAN would be trained on thousands of images or videos of that person's face, capturing various angles, expressions, and lighting conditions.

The quality of a deepfake will generally depend on the amount and quality of the training data.

The more diverse and comprehensive the dataset is, the more convincing the deepfake will be.

As the GAN trains, it iterates on the deepfakes being produced over and over.

This process will go on until the Generator produces content that is virtually indistinguishable from real content.

The trained model is then used to generate deepfakes

Once the GAN is trained, it can generate deepfake content.

In a deepfake video, the trained Generator can manipulate a video by altering the subject’s facial expressions, syncing their lips to different audio, or even replacing their face entirely with someone else's.

The result is a video that looks authentic but has been artificially created or manipulated.

GANs are typically used in a couple of different ways:

  • Face swapping: The Generator replaces the face of a person in a video with the face of another person, maintaining natural expressions and movements.
  • Voice cloning: GANs can also be used to create deepfake audio by cloning a person’s voice. The model is trained on recordings of the person speaking, learning to mimic their tone, pitch, and speech patterns.

What signals they exploit (why people fall for it)

  • Authority + urgency from a familiar voice (“I’m boarding - do this now”).
  • Familiar communication channels (Microsoft Teams, Google Meet, Zoom) that lower suspicion.
  • Caller ID & domain trust - still widely believed in 2025.
  • Remote/hybrid workflows where quick approvals happen off-email trails.

Deepfake statistics

  • 70% of people say they aren’t confident that they can tell the difference between a real and a cloned voice.
  • 53% of people share their voices online or via recorded notes at least once a week.
  • Searches for “free voice cloning software” rose 120% over the past year.
  • Three seconds of audio is sometimes all that’s needed to produce an 85% voice match.
  • CEO fraud targets at least 400 companies per day.

What this means for defenses & training

  • Train reflexes and 'the pause' over pixel-peeping (assume visuals can look perfect).
  • Rehearse multi-channel scenarios inside your security awareness program (email → meeting → request).
  • Measure what matters in deepfake phishing simulations: reporting rate (from inbox), drop-off before action, and final fail rate - and use behavior-based nudges for users who exhibit risky behavior.

What to teach employees about deepfake phishing

Coach employees to verify the premise, not the pixels. Teach them to pause on urgent, out-of-process requests over familiar communication channels, distrust caller ID, exit dubious calls, and report suspicious interactions.

1) Teach the reflexes (deepfakes can get around typical red flags)

  • Warning signs: authority + urgency, out-of-process asks (payments, credential verification), unusual channel for the task.
  • Coach the reflex: ask “Why me, why now, why this channel?” then verify out-of-band.
  • Rationale: quality is rising; glitch-spotting is unreliable - prioritize the request context.

2) Teach voice/call protocols (AI-generated voice phishing)

  • Treat caller ID as untrusted; spoofing internal names/extensions still works in 2025.
  • Verification playbook: hang up, call back via the directory, or message on Slack/Teams; agree passphrases for frequent contacts.

3) Teach meeting/link hygiene (fake meeting rooms)

  • Pattern to expect: email lure → browser-based Teams/Meet/Zoom clone with mic/camera prompts to feel “live.”
  • Golden rule: chat link = payload (SharePoint/SSO look-alikes can redirect). Exiting before clicking is success.

4) Teach the reporting flow (maps to KPIs)

  • Report from inbox: in Hoxhunt, users can report via the Hoxhunt button on the original email - it still counts as success even if they already joined the call (↑ reporting rate).
  • Safe exits: leaving the fake call without clicking is a good outcome (↑ drop-off before action).
  • Avoid the click: do not open the chat link (↓ final fail rate).

5) Teach process boundaries for high-risk workflows

  • Finance & vendor fraud: never approve urgent wire/bank-change on a live call - re-route to documented process.
  • IT support scams: no credentials over voice; use official reset flows.
  • Executive impersonation: unexpected “deepfake CEO messages” get verified out-of-band.

Why deepfake phishing needs specialized simulation tools

Most phishing simulation tools teach link-spotting in email. Deepfake attacks weaponize artificial intelligence across channels - email lure → fake Teams/Meet/Zoom → urgent request from a cloned voice/avatar. To build strong cybersecurity habits, programs must rehearse these realistic scenarios, measure behavior (report, drop-off, fail), and refresh training content beyond generic templates.

What basic programs miss

  • Email-only focus. Real attackers chain channels; simulations must mirror multi-vector realism (inbox → meeting clone → chat link).
  • Pixel-peeping bias. As quality rises, “spot the glitch” fails - teach premise checks under pressure instead.
  • Training plateaus. Stale, generic templates and long-form training modules stall engagement and learning. Rotate a library of phishing templates and keep lessons micro.

Deepfake phishing simulation software FAQ

Does this integrate with Outlook/Office 365?

Yes - Hoxhunt’s single phish alert button works across Outlook (Office 365) and Gmail, reducing confusion from “too many buttons” and supporting a simple admin experience. Use it to capture reports from simulated phishing emails and real threats.

How “deep” is the simulation (email-only vs multi-channel)?

For real phishing simulation depth, Hoxhunt pairs a realistic email lure with an attacker-controlled fake meeting (Teams/Meet/Zoom): camera/mic prompts + believable phishing domains/phishing websites, ending in a “chat-link” payload. That’s multi-vector realism - not just basic phishing scenarios.

What KPIs and dashboards should we expect?

Go beyond phishing click rates. Track behavior signals - reporting rate, drop-off before action, final fail rate - and present ready-made risk visuals in executive dashboards for stakeholder reporting (security leadership, compliance teams, board). Tie outcomes to Human Risk Management and business risk metrics.

Can we run templates and deepfakes side-by-side?

Yes. Rotate a library of phishing templates (email-based threats) alongside deepfake phishing attacks to represent the broader threat posture. Keep lessons micro (avoid default training bloat) to prevent training plateaus and speed ramp time to measurable improvement.

How is this “realistic but safe” for employees?

Scenarios are pre-scripted, short, and end in micro-training - no shaming. Users can report from the original email (counts as success) or leave the call without clicking (good outcome). That builds strong cybersecurity habits in security awareness training.

What about forensic reporting on the payload?

The safe fail is the chat-link (often a SharePoint/SSO look-alike). You get clean, comparable endpoints for forensic reporting and trendlines across phishing campaigns (who clicked, who exited, who reported).

We’re considering Proofpoint Phishing Simulations / OSS (Phishing Frenzy). Pros/cons?

Email-first suites (Proofpoint Phishing Simulations) and OSS (Phishing Frenzy + voice phishing simulator) can cover training content and a library of phishing templates. For realism & attack depth (email ➝ fake meeting) and org-specific deepfakes, validate multi-channel support and consent workflows, or consider Hoxhunt’s integrated approach.

How do we map analytics to user training?

Use reporting and analytics to trigger micro-training for risky user behavior (fails, hesitant drop-offs). This drives risk scoring, targeted user training, and improvement over time.
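
In practice this can be a simple event-driven rule. A minimal Python sketch, where `get_campaign_events` and `assign_training` are hypothetical stand-ins for a real platform API or export:

```python
from typing import Iterable

# Hypothetical stand-ins for a real platform API - replace with actual calls.
def get_campaign_events(campaign_id: str) -> Iterable[dict]:
    return [
        {"user_id": "u1", "type": "simulation_fail"},
        {"user_id": "u2", "type": "reported_from_inbox"},
    ]

def assign_training(user_id: str, module: str) -> None:
    print(f"assign {module} to {user_id}")

RISKY_EVENTS = {"simulation_fail", "risky_click", "hesitant_drop_off"}

def close_the_loop(campaign_id: str) -> None:
    # One short, behavior-triggered module per risky user - kept micro
    # to avoid the training plateaus long-form defaults tend to cause.
    for event in get_campaign_events(campaign_id):
        if event["type"] in RISKY_EVENTS:
            assign_training(event["user_id"], module="deepfake-micro-refresher")

close_the_loop("q3-deepfake-benchmark")  # assigns training only to u1
```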

Any tips for social-engineering verification (voice phishing)?

Security teams should teach employees to distrust caller ID, drop the call, and direct-message the person on Slack/Teams - or call back via the directory. Passphrases for frequent contacts help reduce voice phishing impact.

Sources

  • Contextualizing Deepfake Threats to Organizations - NSA, FBI & CISA
  • Threat actors misusing Quick Assist in social engineering attacks leading to ransomware - Microsoft Threat Intelligence
  • Business Email Compromise: The $55 Billion Scam - FBI IC3
  • Arup lost $25mn in Hong Kong deepfake video conference scam - Financial Times
  • Number spoofing scams - Ofcom
  • Guide for Preparing and Responding to Deepfake Events - OWASP GenAI Security Project
