Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach
Purpose
Recognize text-based verification codes (captchas).
Abstract
Although several attacks have been proposed, text-based CAPTCHAs are still widely used as a security mechanism. One reason for the pervasive use of text captchas is that many of the prior attacks are scheme-specific and require a labor-intensive and time-consuming process to construct. This means that a change in the captcha security features, such as a noisier background, can simply invalidate an earlier attack. This paper presents a generic, yet effective text captcha solver based on the generative adversarial network. Unlike prior machine-learning-based approaches that need a large volume of manually labeled real captchas to learn an effective solver, our approach requires significantly fewer real captchas but yields much better performance. This is achieved by first learning a captcha synthesizer to automatically generate synthetic captchas to learn a base solver, and then fine-tuning the base solver on a small set of real captchas using transfer learning. We evaluate our approach by applying it to 33 captcha schemes, including 11 schemes that are currently being used by 32 of the top-50 popular websites including Microsoft, Wikipedia, eBay and Google. Our approach is the most capable attack on text captchas seen to date. It outperforms four state-of-the-art text-captcha solvers by not only delivering a significantly higher accuracy on all testing schemes, but also successfully attacking schemes where others have zero chance. We show that our approach is highly efficient as it can solve a captcha within 0.05 second using a desktop GPU. We demonstrate that our attack is generally applicable because it can bypass the advanced security features employed by most modern text captcha schemes. We hope the results of our work can encourage the community to revisit the design and practical use of text captchas.
Background
- Character overlapping
- Occluding line
- Solid and hollow fonts
- Character rotating, distortion or waving
- Different font sizes and colors
- Noisy background
2.3 million unique training images
Method
- GAN
- Transfer learning
- Our attack is based on the recently proposed GAN architecture [22]. A GAN consists of two models: a generative network for creating synthetic examples and a discriminative network to distinguish the synthesized examples from the real ones. We use backpropagation [28] to train both networks, so that over the training iterations, the generator produces better synthetic samples, while the discriminator becomes more skilled at flagging synthetic samples.
- If the discriminator can successfully distinguish a large number of synthetic captchas from the real ones, the grid search method is employed to adjust the parameter values for synthesizing another batch of captchas.
- This process continues until the discriminator can distinguish fewer than 5% of the synthetic captchas from the real ones.
- Specifically, we adapt the Pix2Pix image-to-image translation framework [14]. This algorithm was developed to transform an image from one style to another.
- Captcha solver: a CNN based on the classical LeNet-5, with five convolutional layers and five pooling layers followed by two fully-connected layers (a 3 × 3 filter for the convolutional layers and a max-pooling filter for the pooling layers).
- We use a Bayesian-based parameter tuner [20] to automatically choose the hyperparameters for training the base solver.
- Overall, applying transfer learning to the second or third convolutional layer (CL) onward leads to the best performance.
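The synthesizer tuning loop described above (grid search over synthesis parameters until the discriminator flags fewer than 5% of the synthetic batch) can be sketched roughly as follows. This is a toy illustration only, not the paper's implementation: captchas are replaced by plain numbers, the discriminator is a fixed statistical test rather than a trained network, and the names `make_discriminator`, `synthesize`, `search_synthesizer_params` and the parameters `mean` and `std` are all hypothetical.

```python
import itertools
import random
import statistics


def make_discriminator(real_samples, k=2.5):
    """Toy discriminator (assumption): flags a sample as synthetic when it
    lies more than k standard deviations from the mean of the real samples."""
    mu = statistics.mean(real_samples)
    sigma = statistics.stdev(real_samples)

    def is_flagged(x):
        return abs(x - mu) > k * sigma

    return is_flagged


def synthesize(params, n, rng):
    """Toy synthesizer (assumption): draws n numbers instead of rendering images."""
    return [rng.gauss(params["mean"], params["std"]) for _ in range(n)]


def search_synthesizer_params(grid, real_samples, n=1000, threshold=0.05, seed=0):
    """Grid search: try each parameter combination in order and stop at the
    first one whose synthetic batch is flagged less often than `threshold`."""
    rng = random.Random(seed)
    is_flagged = make_discriminator(real_samples)
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        batch = synthesize(params, n, rng)
        rate = sum(is_flagged(x) for x in batch) / n
        if rate < threshold:
            return params, rate
    return None, None


# Stand-in for the small set of real captchas.
real_rng = random.Random(42)
real = [real_rng.gauss(0.5, 0.1) for _ in range(2000)]

grid = {"mean": [0.2, 0.5, 0.8], "std": [0.3, 0.1]}
best_params, detect_rate = search_synthesizer_params(grid, real)
print(best_params, detect_rate)
```

In the actual approach the discriminator is trained jointly with the generator, so the stopping test would be re-evaluated against an improving discriminator rather than a fixed one as in this sketch.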
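Model
Dataset
1,500 captchas were collected and labeled from major websites: 500 for training and 1,000 for testing.
The synthesizer generates 200,000 captchas as the dataset for the solver.
The preprocessing model is trained on 20,000 captchas.
The solver is trained on 200,000 captchas.
The solver is implemented with Keras.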
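As a rough illustration of the solver structure noted under Method (five convolutional layers with 3 × 3 filters, five max-pooling layers, two fully-connected layers) and of fine-tuning from the second convolutional layer onward, a Keras sketch might look like the following. The input size, filter counts, hidden width, pooling size, single softmax output, optimizer and loss are all assumptions made for the example, and `build_solver` and `freeze_before` are hypothetical helper names; none of these come from the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_solver(input_shape=(64, 192, 1), num_classes=36):
    """Five conv + max-pool pairs followed by two fully-connected layers.
    Shapes, filter counts and the output layer are illustrative assumptions."""
    model = keras.Sequential(name="solver")
    model.add(keras.Input(shape=input_shape))
    for i, filters in enumerate([32, 32, 64, 64, 128], start=1):
        model.add(layers.Conv2D(filters, (3, 3), padding="same",
                                activation="relu", name=f"conv{i}"))
        model.add(layers.MaxPooling2D((2, 2), name=f"pool{i}"))
    model.add(layers.Flatten(name="flatten"))
    model.add(layers.Dense(256, activation="relu", name="fc1"))
    model.add(layers.Dense(num_classes, activation="softmax", name="fc2"))
    return model


def freeze_before(model, first_trainable="conv2"):
    """Freeze every layer that comes before `first_trainable`, so that
    fine-tuning only updates that layer and the ones after it."""
    trainable = False
    for layer in model.layers:
        if layer.name == first_trainable:
            trainable = True
        layer.trainable = trainable


solver = build_solver()
# The base solver would be trained on synthetic captchas at this point.
freeze_before(solver, "conv2")
# Recompile after changing `trainable`, then fine-tune on the real captchas.
solver.compile(optimizer="adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])
```

Passing `"conv3"` to `freeze_before` would correspond to fine-tuning from the third convolutional layer onward instead.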
Training environment
- Training: cloud server with a 2.4GHz Intel Xeon CPU, four NVIDIA Tesla P40 GPUs and 256GB of RAM, running the CentOS 7 operating system with Linux kernel 3.10.
- Training time: five hours
- Testing: the trained solver is then run and tested on a workstation with a 3.2GHz Intel Xeon CPU, an NVIDIA Titan GPU and 64GB of RAM.
Results
Limitation
Tips
- CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart): a fully automated Turing test for telling computers and humans apart, i.e. the general-purpose verification code.