PhishPond · Phishing Tradecraft Intelligence

Attack · Detection · Validation


Research Desk

PhishPond

Phishing tradecraft research desk covering campaign analysis, adversary infrastructure, detection engineering, and validation workflows.

High signal for security teams who need tradecraft, not recycled filler.

© 2026 PhishPond. Authorized security research use only.

GitHub Radar · Blue team tool

Himanshu49Gaur/PhishDefender-PhishingResponseSystem

An end-to-end AI-powered phishing detection and response platform built as a Chrome browser extension. Combines machine learning, threat intelligence APIs, and rule-based automation to detect, analyze, and respond to phishing emails in real time directly inside Gmail. Primary language: Jupyter Notebook. 12 stars.

Jupyter Notebook · 12 stars · 10 forks · pushed Jun 10, 2026 · MIT


README Preview

Fetched from GitHub

PhishDefender-PhishingResponseSystem


An end-to-end AI-powered phishing detection and response platform built as a Chrome browser extension. Combines machine learning, threat intelligence APIs, and rule-based automation to detect, analyze, and respond to phishing emails in real time directly inside Gmail.

---

Table of Contents

  1. Problem Statement
  2. Proposed Solution
  3. How It Works — Architecture Overview
  4. Project Structure
  5. Tech Stack
  6. Datasets Used
  7. Machine Learning Pipeline
  8. Backend Modules — Deep Dive
  9. Browser Extension — Deep Dive
  10. API Endpoints
  11. Threat Scoring System
  12. Installation and Setup
  13. Running the Project
  14. Loading the Extension
  15. How to Use
  16. Free Deployment Alternatives
  17. Resume Highlights
  18. Known Limitations
  19. Future Improvements
  20. License

---

1. Problem Statement

Phishing attacks remain one of the most prevalent and damaging forms of cybercrime worldwide. Widely cited industry estimates put the volume at roughly 3.4 billion phishing emails per day, and phishing is frequently cited as the starting point for more than 90% of successful cyberattacks.

Despite the existence of spam filters and basic email security tools, modern phishing attacks have become increasingly sophisticated. Attackers now craft emails that:

  • Pass SPF, DKIM, and DMARC authentication checks by abusing legitimate email infrastructure
  • Use legitimate URL shorteners and redirect chains to hide malicious destinations from static filters
  • Employ social engineering tactics that exploit urgency, fear, and authority to bypass human judgment
  • Target individuals specifically using information harvested from social media (spear phishing)
  • Deploy payloads with advanced capabilities including keylogging, screen capture, memory inspection, and GUI spoofing

The core problem is that existing solutions operate reactively: they either block known bad domains using static blocklists, or they rely entirely on human judgment, which is prone to error under social pressure. Neither approach scales well against zero-day phishing campaigns that use freshly registered domains and never-before-seen payloads.

Additionally, most enterprise-grade phishing detection tools are expensive, require complex IT infrastructure, and are inaccessible to individual users, small organizations, and security students who need to learn these concepts hands-on.

There is a clear need for an intelligent, automated, real-time phishing detection system that:

  • Operates at the point of attack (the inbox itself)
  • Combines multiple detection signals rather than relying on a single method
  • Is accessible and free to deploy
  • Produces actionable, human-readable output including identified IOCs and recommended responses
  • Can be extended and improved as new attack patterns emerge

---

2. Proposed Solution

This project proposes a full-stack AI-powered phishing detection platform delivered as a Chrome browser extension with a Python backend.

The core philosophy of the solution is defence in depth — rather than relying on any single detection method, the platform layers five independent detection signals and combines them into a unified threat score:

| Layer | Method | Signal Type |
|-------|--------|-------------|
| 1 | Email header analysis | SPF, DKIM, DMARC, Reply-To |
| 2 | LightGBM email classifier | NLP + text features |
| 3 | LightGBM URL classifier | Structural URL features |
| 4 | VirusTotal API | External threat intelligence |
| 5 | AbuseIPDB API | IP reputation intelligence |

The five signals are then fed into a rule-based triage engine that assigns weighted scores to each signal and produces a final verdict of Malicious, Suspicious, or Benign with a score from 0 to 100.
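The weighting step can be sketched as follows. This is a minimal illustration, not the project's actual rules engine: the README does not list the real weights or thresholds, so every number below is an assumption, as are the signal names and the `triage` function itself. Each layer is assumed to be normalized to a value between 0 and 1 before scoring.

```python
# Hypothetical weights (they sum to 100); the real values are not given in the README.
WEIGHTS = {
    "header_auth": 20,  # SPF/DKIM/DMARC failures, Reply-To mismatch
    "email_model": 30,  # LightGBM email classifier probability
    "url_model": 20,    # LightGBM URL classifier probability
    "virustotal": 20,   # share of engines flagging a URL
    "abuseipdb": 10,    # AbuseIPDB abuse confidence, scaled to 0..1
}
MALICIOUS_AT = 70   # assumed threshold
SUSPICIOUS_AT = 40  # assumed threshold


def triage(signals: dict[str, float]) -> tuple[int, str]:
    """Combine per-layer signals (each in [0, 1]) into a 0-100 score and a verdict."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        # Clamp to [0, 1]; a layer that produced no signal contributes nothing.
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        score += weight * value
    score = round(score)
    if score >= MALICIOUS_AT:
        verdict = "malicious"
    elif score >= SUSPICIOUS_AT:
        verdict = "suspicious"
    else:
        verdict = "benign"
    return score, verdict
```

With these illustrative weights, an email flagged only by the two models at 0.9 and 0.8 scores 43 and is treated as suspicious, while corroboration from the header checks or the threat intelligence APIs is needed to push it into the malicious band.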

For malicious and suspicious emails the platform automatically generates a professional PDF incident report containing all identified Indicators of Compromise (IOCs), the full rule trace, ML model confidence scores, and recommended response actions — exactly the kind of output a real SOC analyst would produce after investigating a phishing alert.

The architecture is deliberately split into two layers:

  • Browser Extension (JavaScript): a thin layer that reads the open email from the Gmail DOM and displays results. It contains no ML logic and no API keys.
  • Python Backend (Flask): handles all intelligence processing including ML inference, API calls, rule evaluation, and report generation. This is where the real work happens and where all sensitive credentials are stored.

This split architecture means the extension itself is lightweight, fast, and secure — it simply reads and displays. All the heavy computation runs on the Python side which can be improved, extended, or replaced without touching the extension code.
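One common way to keep credentials on the backend only is to read them from environment variables at startup. The sketch below assumes that approach; the README does not show how `config.py` is actually written, and the variable names (`VIRUSTOTAL_API_KEY`, `ABUSEIPDB_API_KEY`, `REPORTS_DIR`), the `Settings` class, and the threshold values are all hypothetical.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    virustotal_api_key: str
    abuseipdb_api_key: str
    reports_dir: Path
    malicious_threshold: int = 70   # assumed value
    suspicious_threshold: int = 40  # assumed value


def load_settings(env=None) -> Settings:
    """Read secrets from the environment so they never ship inside the extension."""
    env = os.environ if env is None else env
    required = ("VIRUSTOTAL_API_KEY", "ABUSEIPDB_API_KEY")  # hypothetical names
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail fast at startup instead of on the first API call.
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        virustotal_api_key=env["VIRUSTOTAL_API_KEY"],
        abuseipdb_api_key=env["ABUSEIPDB_API_KEY"],
        reports_dir=Path(env.get("REPORTS_DIR", "reports")),
    )
```

Passing a plain dictionary as `env` makes the loader easy to exercise in tests without touching the real environment.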

---

3. How It Works — Architecture Overview

┌─────────────────────────────────────────────────────┐
│                  GMAIL (Browser)                     │
│                                                     │
│  User opens email → content.js reads the Gmail DOM  │
│  Extracts: subject, sender, body, URLs, headers     │
└────────────────────┬────────────────────────────────┘
                     │ chrome.runtime.sendMessage
                     ▼
┌─────────────────────────────────────────────────────┐
│              background.js (Service Worker)          │
│                                                     │
│  Receives extracted email data                      │
│  POSTs to Python backend via fetch()                │
│  Handles timeout, retry, notifications, storage     │
└────────────────────┬────────────────────────────────┘
                     │ POST /analyze (JSON)
                     ▼
┌─────────────────────────────────────────────────────┐
│              app.py (Flask Server)                   │
│                                                     │
│  Step 1 → email_parser.py                           │
│           Parse headers, extract URLs, clean text   │
│                                                     │
│  Step 2 → ml_classifier.py                         │
│           LightGBM email model (TF-IDF + features)  │
│           LightGBM URL model (structural features)  │
│           Combined probability score                │
│                                                     │
│  Step 3 → threat_intel.py                          │
│           VirusTotal URL scanning                   │
│           VirusTotal IP reputation                  │
│           AbuseIPDB IP abuse confidence             │
│                                                     │
│  Step 4 → rules_engine.py                          │
│           Weight all signals → 0-100 threat score   │
│           Determine verdict (malicious/suspicious)  │
│           Build IOC list and recommended actions    │
│                                                     │
│  Step 5 → report_generator.py (if threat detected)  │
│           Generate professional PDF incident report  │
│                                                     │
│  Returns unified JSON result                        │
└────────────────────┬────────────────────────────────┘
                     │ JSON response
                     ▼
┌─────────────────────────────────────────────────────┐
│              popup.js + popup.html                   │
│                                                     │
│  Renders verdict banner (red/orange/green)          │
│  Displays ML scores, IOCs, triggered rules          │
│  Shows URL scan results and header auth status      │
│  Provides PDF report download button                │
└─────────────────────────────────────────────────────┘
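The backend portion of the diagram can be read as a straight pipeline in which only the report stage is conditional. The sketch below shows that control flow with stand-in stages; it is not the project's `app.py`. The `analyze` function, the `steps` dictionary, the result keys, and the stub return values are assumptions made for illustration, and in the real project this logic would sit behind the Flask `/analyze` route.

```python
from typing import Callable


def analyze(payload: dict, steps: dict[str, Callable]) -> dict:
    """Run the five stages in order; the report stage runs only when a threat is found."""
    parsed = steps["parse"](payload)                       # Step 1: email_parser
    ml = steps["classify"](parsed)                         # Step 2: ml_classifier
    intel = steps["threat_intel"](parsed)                  # Step 3: threat_intel
    triage = steps["rules"](parsed, ml, intel)             # Step 4: rules_engine
    result = {"parsed": parsed, "ml": ml, "intel": intel, **triage}
    if triage["verdict"] in ("malicious", "suspicious"):   # Step 5: report_generator
        result["report_path"] = steps["report"](result)
    return result


# Stand-in stages with made-up outputs, used only to exercise the control flow.
demo_steps = {
    "parse": lambda p: {"subject": p.get("subject", ""), "urls": p.get("urls", [])},
    "classify": lambda parsed: {"email_prob": 0.93, "url_prob": 0.88},
    "threat_intel": lambda parsed: {"vt_flagged": 1, "abuse_confidence": 75},
    "rules": lambda parsed, ml, intel: {"score": 82, "verdict": "malicious"},
    "report": lambda result: "reports/incident.pdf",
}

result = analyze(
    {"subject": "Verify your account", "urls": ["http://192.0.2.7/login"]},
    demo_steps,
)
print(result["verdict"], result.get("report_path"))
```

Passing the stages in as callables mirrors the point made earlier about the split architecture: any one stage can be replaced or improved without touching the others or the extension.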

---

4. Project Structure

phishing-detector/
│
├── icons/                      # Extension icons
│   ├── icon16.png
│   ├── icon48.png
│   └── icon128.png
│
├── reports/                    # Auto-generated PDF reports
│
├── Model files (from Kaggle notebook)
│   ├── email_model.pkl         # Trained LightGBM email classifier
│   ├── url_model.pkl           # Trained LightGBM URL classifier
│   ├── tfidf_vectorizer.pkl    # Fitted TF-IDF vectorizer
│   ├── scaler.pkl              # Fitted StandardScaler
│   ├── url_feature_names.pkl   # URL feature column names
│   └── model_metadata.json     # Model performance metrics
│
├── Config / Support
│   ├── config.py               # API keys, paths, thresholds
│   └── requirements.txt        # Python dependencies
│
├── Python Backend
│   ├── app.py                  # Flask server, main entry point
│   ├── email_parser.py         # Email parsing and feature extraction
│   ├── threat_intel.py         # VirusTotal and AbuseIPDB API calls
│   ├── rules_engine.py         # Rule-based triage and scoring
│   ├── ml_classifier.py        # ML model loading and inference
│   └── report_generator.py     # PDF incident report generation
│
├── Chrome Extension
│   ├── manifest.json           # Extension configuration (MV3)
│   ├── popup.html              # Extension popup UI structure
│   ├── popup.css               # Dark cybersecurity theme styles
│   ├── popup.js                # Popup UI logic and rendering
│   ├── content.js              # Gmail DOM extraction script
│   └── background.js           # Service worker, API bridge
│
└── ML Notebook (Kaggle)
    └── phishing_model.ipynb    # Full ML pipeline notebook
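
As an illustration of the kind of work the tree assigns to email_parser.py, the sketch below parses a raw message with the Python standard library, reads the SPF, DKIM, and DMARC results from the `Authentication-Results` header, checks for a Reply-To domain mismatch, and extracts URLs from the body. It is a simplified stand-in under stated assumptions: the function name, the regular expressions, and the returned field names are hypothetical, and the real module receives data scraped from the Gmail DOM and may work quite differently.

```python
import re
from email import message_from_string
from email.utils import parseaddr

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
AUTH_RE = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.IGNORECASE)


def _domain(header_value: str) -> str:
    """Return the lowercased domain of an address header, or '' if absent."""
    return parseaddr(header_value)[1].rpartition("@")[2].lower()


def _body_text(msg) -> str:
    """Collect the text/plain content of a message as a single string."""
    if msg.is_multipart():
        parts = [p for p in msg.walk() if p.get_content_type() == "text/plain"]
    else:
        parts = [msg]
    chunks = [(p.get_payload(decode=True) or b"") for p in parts]
    return "\n".join(c.decode("utf-8", errors="replace") for c in chunks)


def parse_email(raw: str) -> dict:
    msg = message_from_string(raw)
    auth = {"spf": "none", "dkim": "none", "dmarc": "none"}
    for header in msg.get_all("Authentication-Results", []):
        for mechanism, outcome in AUTH_RE.findall(str(header)):
            auth[mechanism.lower()] = outcome.lower()
    from_domain = _domain(msg.get("From", ""))
    reply_domain = _domain(msg.get("Reply-To", ""))
    return {
        "subject": msg.get("Subject", ""),
        "auth": auth,
        # A Reply-To on a different domain than From is a classic phishing tell.
        "reply_to_mismatch": bool(reply_domain) and reply_domain != from_domain,
        "urls": URL_RE.findall(_body_text(msg)),
    }
```

The URL pattern here is deliberately naive (it will, for example, keep trailing punctuation attached to a link), so a production parser would also need to walk the HTML parts and normalize what it finds.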

---

5. Tech Stack

Python Backend

| Technology | Version | Purpose |
|------------|---------|---------|
| Python | 3.10+ | Core backend language |
| Flask | 3.0.3 | REST API server |
| Flask-CORS | 5.0.0 | Cross-origin requests from extension |
| LightGBM | 4.5.0 | Primary ML classifier (GPU accelerated) |
| XGBoost | Latest | Secondary ML classifier (GPU accelerated) |
| scikit-learn | 1.5.1 | TF-IDF vectorizer, StandardScaler |
| pandas | 2.2.2 | Data manipulation |
| numpy | 1.26.4 | Numerical operations |
| scipy | 1.13.1 | Sparse matrix operations |
| joblib | 1.4.2 | Model serialization |
| requests | 2.32.3 | External API calls |
| BeautifulSoup4 | 4.12.3 | HTML parsing in email bodies |
| ReportLab | 4.2.2 | PDF incident report generation |
| python-whois | 0.9.4 | Domain age lookup |
| dnspython | 2.6.1 | DNS resolution utilities |

Chrome Extension

| Technology | Purpose |
|------------|---------|
| JavaScript (ES6+) | Extension logic |
| Chrome Extension Manifest V3 | Latest extension standard |
| Chrome Storage API | Persisting results and history |
| Chrome Notifications API | Phishing alert notifications |
| Chrome Tabs API | Gmail tab interaction |
| MutationObserver | Gmail DOM change detection |

ML and Data

| Technology | Purpose |
|------------|---------|
| Kaggle Notebooks | Training environment |
| CUDA / GPU T4 x2 | GPU-accelerated model training |
| TF-IDF (5000 features) | Email text vectorization |
| LightGBM (GPU mode) | Both email and URL classi