PhishPond · Phishing Tradecraft Intelligence

Attack · Detection · Validation


Research Desk

PhishPond

Phishing tradecraft research desk covering campaign analysis, adversary infrastructure, detection engineering, and validation workflows.

High signal for security teams who need tradecraft, not recycled filler.

© 2026 PhishPond. Authorized security research use only.

GitHub Radar · Blue team tool

lindsey98/Phishpedia

Official implementation of "Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages" (USENIX'21). Primary language: Python. 352 stars.

Python · 352 stars · 55 forks · pushed Jun 5, 2026 · CC0-1.0


README Preview

Fetched from GitHub

Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages

  • Paper: https://www.usenix.org/conference/usenixsecurity21/presentation/lin
  • Website: https://sites.google.com/view/phishpedia-site/
  • Video: https://www.youtube.com/watch?v=ZQOH1RW5DmY
  • Dataset: https://drive.google.com/file/d/12ypEMPRQ43zGRqHGut0Esq2z5en0DH4g/view?usp=drive_link
  • Citation: see the Citation section

  • This is the official implementation of "Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages" (USENIX'21). The paper, website, and dataset are linked above.
  • Existing reference-based phishing detectors:
      • ❌ Lack interpretability: they give only a binary decision (legitimate or phishing)
      • ❌ Are not robust against distribution shift, because the classifier is biased towards the phishing training set
      • ❌ Lack a large-scale phishing benchmark dataset
  • The contributions of our paper:
      • ✅ We propose a phishing identification system, Phishpedia, which has high identification accuracy and low runtime overhead, outperforming the relevant state-of-the-art identification approaches.
      • ✅ We are the first to propose a consistency-based method for phishing detection in place of the traditional classification-based method. We investigate the consistency between the webpage domain and its brand intention. The detected brand intention provides a visual explanation for the phishing decision.
      • ✅ Phishpedia is NOT trained on any phishing dataset, which addresses the potential test-time distribution shift problem.
      • ✅ We release a 30k phishing benchmark dataset in which each website is annotated with its URL, HTML, screenshot, and target brand: https://drive.google.com/file/d/12ypEMPRQ43zGRqHGut0Esq2z5en0DH4g/view?usp=drive_link.
      • ✅ We set up a phishing monitoring system that investigates emerging domains fed from CertStream. It has discovered 1,704 real phishing websites, of which 1,133 are zero-days not reported by an industrial antivirus engine (VirusTotal).
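
The consistency idea in the contributions above can be illustrated with a minimal sketch. It assumes the brand intention has already been recognised from the screenshot, and it uses a hypothetical hand-written `BRAND_DOMAINS` table rather than the repository's `domain_map.pkl`, whose format is not described here. The `registered_domain` helper is a naive last-two-labels heuristic and is only an assumption for illustration; it mishandles suffixes such as `.co.uk`.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical brand -> legitimate domains table (illustrative only,
# not the contents of the repository's domain_map.pkl).
BRAND_DOMAINS = {
    "Amazon": {"amazon.com"},
    "Adobe": {"adobe.com"},
}


def registered_domain(url: str) -> str:
    """Naive heuristic: keep the last two labels of the hostname."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def check_consistency(url: str, brand: Optional[str]) -> Tuple[str, Optional[str]]:
    """Compare the page's domain with the domains of the recognised brand.

    Returns ("Benign", None) when no brand was recognised or the domain
    belongs to that brand, and ("Phish", brand) otherwise.
    """
    if brand is None:
        return "Benign", None
    # A brand missing from the table has no known domains, so it is
    # treated as inconsistent in this sketch.
    if registered_domain(url) in BRAND_DOMAINS.get(brand, set()):
        return "Benign", None
    return "Phish", brand
```

For example, `check_consistency("http://amazon.account-verify.example/login", "Amazon")` flags the page because the page shows Amazon branding while the domain is not one of Amazon's.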

Framework

Figure: Phishpedia framework overview (./datasets/overview.png)

Input: A URL and its screenshot
Output: Phish/Benign, Phishing target

  • Step 1: Run the Deep Object Detection Model to get the predicted logos and inputs (the inputs are used only for explanation, not for the later prediction).
  • Step 2: Run the Deep Siamese Model on the predicted logos.
      • If the Siamese model reports no target, return Benign, None.
      • Otherwise, the Siamese model reports a target: return Phish, Phishing target.
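
The control flow of the two steps can be sketched as follows. This is not the repository's code: it assumes the detected logos have already been turned into embedding vectors (plain Python lists here), that matching is done by cosine similarity against reference embeddings, and that `0.8` is an acceptable threshold. The names `match_brand` and `identify` and the threshold value are all hypothetical, and the sketch leaves out any comparison with the page's domain.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_brand(
    logo_vec: Sequence[float],
    references: Dict[str, List[Sequence[float]]],
    threshold: float = 0.8,  # assumed value, not taken from configs.yaml
) -> Optional[str]:
    """Return the best-matching brand at or above the threshold, else None."""
    best_brand, best_sim = None, threshold
    for brand, vecs in references.items():
        for ref in vecs:
            sim = cosine(logo_vec, ref)
            if sim >= best_sim:
                best_brand, best_sim = brand, sim
    return best_brand


def identify(
    logo_vecs: List[Sequence[float]],
    references: Dict[str, List[Sequence[float]]],
    threshold: float = 0.8,
) -> Tuple[str, Optional[str]]:
    """Step 2 of the flow: report the first detected logo that matches a brand."""
    for vec in logo_vecs:
        brand = match_brand(vec, references, threshold)
        if brand is not None:
            return "Phish", brand
    return "Benign", None
```

A page with no detected logos, or with logos that match no reference closely enough, falls through to `("Benign", None)`, mirroring the "no target" branch.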

Setup

Prerequisite: Pixi installed

For Linux/Mac,

  export KMP_DUPLICATE_LIB_OK=TRUE
  git clone https://github.com/lindsey98/Phishpedia.git
  cd Phishpedia
  pixi install
  chmod +x setup.sh
  ./setup.sh

For Windows, in PowerShell,

  git clone https://github.com/lindsey98/Phishpedia.git
  cd Phishpedia
  pixi install
  setup.bat

Running Phishpedia from Command Line

pixi run python phishpedia.py --folder <folder you want to test e.g. ./datasets/test_sites>

The testing folder should be in the structure of:

test_site_1
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
test_site_2
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
......
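
Before running the command, it can help to check that a folder follows this layout. The helper below is a small sketch written for this purpose, not part of the repository; the function name `collect_test_sites` is hypothetical. It returns the URL and screenshot path of every subfolder that contains both `info.txt` and `shot.png`, and skips anything else.

```python
from pathlib import Path
from typing import List, Tuple


def collect_test_sites(folder: str) -> List[Tuple[str, Path]]:
    """Return (url, screenshot_path) for each well-formed site subfolder."""
    sites = []
    for site_dir in sorted(Path(folder).iterdir()):
        info = site_dir / "info.txt"
        shot = site_dir / "shot.png"
        # Skip stray files and subfolders missing either required file.
        if not (site_dir.is_dir() and info.is_file() and shot.is_file()):
            continue
        url = info.read_text(encoding="utf-8").strip()
        if url:  # ignore folders whose info.txt is empty
            sites.append((url, shot))
    return sites
```

Calling `collect_test_sites("./datasets/test_sites")` and comparing the number of results with the number of subfolders shows which sites would be incomplete.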

Running Phishpedia as a GUI tool (web-browser-based)

See WEBtool/

Install Phishpedia as a Chrome plugin

See Plugin_for_Chrome/

Project structure

- models/
|___ rcnn_bet365.pth
|___ faster_rcnn.yaml
|___ resnetv2_rgb_new.pth.tar
|___ expand_targetlist/
  |___ Adobe/
  |___ Amazon/
  |___ ......
|___ domain_map.pkl
- logo_recog.py: Deep Object Detection Model
- logo_matching.py: Deep Siamese Model
- configs.yaml: Configuration file
- phishpedia.py: Main script

Miscellaneous

  • In our paper, we also implement several phishing detection and identification baselines; they are linked from the repository README.
  • The logo targetlist described in our paper includes 181 brands. We have further expanded the targetlist to 277 brands in this code repository.
  • For the phish discovery experiment, we obtain the feed from Certstream phish_catcher and lower its score threshold to 40 so that more suspicious websites are processed. Readers can refer to that repository for details.
  • We use Scrapy for website crawling
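
The effect of lowering the score threshold can be shown with a tiny sketch. It assumes a feed of `(domain, score)` pairs and an inclusive comparison; both are assumptions for illustration, as are the example domains and scores. Only the value 40 comes from the note above.

```python
from typing import Iterable, List, Tuple

SCORE_THRESHOLD = 40  # threshold stated in the note above


def select_suspicious(
    scored_domains: Iterable[Tuple[str, int]],
    threshold: int = SCORE_THRESHOLD,
) -> List[str]:
    """Keep domains whose score reaches the threshold (inclusive, by assumption)."""
    return [domain for domain, score in scored_domains if score >= threshold]


# Hypothetical feed entries; the scores are made up for the example.
feed = [("login-secure.example", 95), ("shop.example", 45), ("blog.example", 10)]
queued = select_suspicious(feed)
```

With these made-up scores, a threshold of 40 queues the first two domains for crawling, while a stricter threshold such as 90 would queue only the first.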

Citation

If you find our work useful in your research, please consider citing our paper by:

@inproceedings{lin2021phishpedia,
  title={Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages},
  author={Lin, Yun and Liu, Ruofan and Divakaran, Dinil Mon and Ng, Jun Yang and Chan, Qing Zhou and Lu, Yiwen and Si, Yuxuan and Zhang, Fan and Dong, Jin Song},
  booktitle={30th {USENIX} Security Symposium ({USENIX} Security 21)},
  year={2021}
}

Contacts

If you have any issues running our code, you can raise an issue or send an email to liu.ruofan16@u.nus.edu, lin_yun@sjtu.edu.cn, and dcsdjs@nus.edu.sg