Extracting text from a PDF
Download the iTextSharp DLL from:
http://cdnetworks-kr-2.dl.sourceforge.net/project/itextsharp/itextsharp/iTextSharp-5.0.4/itextsharp-5.0.4-dll.zip
Sample code:

using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace ReadPdfDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            // Path to the PDF to read; change it to a file on your machine.
            string str = GetAllText(@"C:\Users\dc\Desktop\20101098504717.pdf");
            Console.WriteLine(str);
        }

        // Returns the text of every page in the PDF, concatenated in page order.
        public static string GetAllText(string filePath)
        {
            StringBuilder text = new StringBuilder();
            PdfReader reader = new PdfReader(filePath);
            try
            {
                // iTextSharp page numbers start at 1.
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    text.Append(GetTextFromPage(reader, i));
                }
            }
            finally
            {
                // Release the resources held by the reader.
                reader.Close();
            }
            return text.ToString();
        }

        // Extracts the text of a single page.
        public static string GetTextFromPage(PdfReader reader, int pageNum)
        {
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            return PdfTextExtractor.GetTextFromPage(reader, pageNum, strategy);
        }
    }
}
Reposted from: https://www.cnblogs.com/dc10101/archive/2010/10/11/1847810.html