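Java xpdf to HTML: Converting Word/Excel/PDF Files to HTML in Java
During project development, the requirements called for converting various document types to HTML or another format that displays easily on the web. The approaches I used are summarized below: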
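I. Converting Word and Excel to HTML with Jacob
"JACOB is a Java-COM bridge. With this component you can call COM components and Win32 libraries from Java applications."
First download the Jacob package. JDK 1.5 and later need Jacob 1.9 (not yet tested on JDK 1.6); it differs little from the earlier Jacob 1.7.
1. After unpacking the archive, add Jacob.jar to your Libraries;
2. Copy Jacob.dll into "WINDOWS\SYSTEM32".
Note:
[When the web server is started from an IDE, the system may not find Jacob.dll. For example, when Tomcat is started from MyEclipse, the DLL has to be copied into "jre\bin" under the MyEclipse installation directory.
When Jacob.dll cannot be loaded, the error message is typically: "java.lang.UnsatisfiedLinkError: no jacob in java.library.path"]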
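The UnsatisfiedLinkError above generally means that no directory on java.library.path contains the DLL. The following is a minimal diagnostic sketch using only the standard library. The class name LibraryPathCheck and the helper findOnLibraryPath are hypothetical names introduced here for illustration, and "jacob.dll" is the file name used in this article (your Jacob release may ship the DLL under a different name).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LibraryPathCheck {

    /** Returns every directory in the given path list that contains fileName. */
    public static List<File> findOnLibraryPath(String libraryPath, String fileName) {
        List<File> hits = new ArrayList<>();
        if (libraryPath == null) {
            return hits;
        }
        // java.library.path uses the platform path separator (";" on Windows)
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            if (new File(dir, fileName).isFile()) {
                hits.add(new File(dir));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String libraryPath = System.getProperty("java.library.path");
        List<File> hits = findOnLibraryPath(libraryPath, "jacob.dll");
        if (hits.isEmpty()) {
            System.out.println("jacob.dll was not found in any of: " + libraryPath);
        } else {
            System.out.println("jacob.dll found in: " + hits);
        }
    }
}
```

Running this inside the same JVM that hosts the web application (for example from a test servlet) shows which directories that JVM actually searches, which can differ from what a command-line JVM sees.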
Create a new class:
import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

public class JacobUtil
{
    // Word WdSaveFormat value for HTML
    public static final int WORD_HTML = 8;

    // Word WdSaveFormat value 7 (Unicode/encoded text)
    public static final int WORD_TXT = 7;

    // Excel XlFileFormat value for HTML
    public static final int EXCEL_HTML = 44;

    /**
     * Convert Word to HTML
     * @param docfile  full path of the Word file
     * @param htmlfile path where the converted HTML is stored
     */
    public static void wordToHtml(String docfile, String htmlfile)
    {
        ActiveXComponent app = new ActiveXComponent("Word.Application"); // start Word
        try
        {
            app.setProperty("Visible", new Variant(false));
            Dispatch docs = app.getProperty("Documents").toDispatch();
            // Open(FileName, ConfirmConversions = false, ReadOnly = true)
            Dispatch doc = Dispatch.invoke(
                    docs,
                    "Open",
                    Dispatch.Method,
                    new Object[] { docfile, new Variant(false),
                            new Variant(true) }, new int[1]).toDispatch();
            Dispatch.invoke(doc, "SaveAs", Dispatch.Method, new Object[] {
                    htmlfile, new Variant(WORD_HTML) }, new int[1]);
            // Close without saving changes to the source document
            Variant f = new Variant(false);
            Dispatch.call(doc, "Close", f);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            app.invoke("Quit", new Variant[] {});
        }
    }

    /**
     * Convert Excel to HTML
     * @param xlsfile  full path of the Excel file
     * @param htmlfile path where the converted HTML is stored
     */
    public static void excelToHtml(String xlsfile, String htmlfile)
    {
        ActiveXComponent app = new ActiveXComponent("Excel.Application"); // start Excel
        try
        {
            app.setProperty("Visible", new Variant(false));
            Dispatch excels = app.getProperty("Workbooks").toDispatch();
            // Open(Filename, UpdateLinks = false, ReadOnly = true)
            Dispatch excel = Dispatch.invoke(
                    excels,
                    "Open",
                    Dispatch.Method,
                    new Object[] { xlsfile, new Variant(false),
                            new Variant(true) }, new int[1]).toDispatch();
            Dispatch.invoke(excel, "SaveAs", Dispatch.Method, new Object[] {
                    htmlfile, new Variant(EXCEL_HTML) }, new int[1]);
            // Close without saving changes to the source workbook
            Variant f = new Variant(false);
            Dispatch.call(excel, "Close", f);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            app.invoke("Quit", new Variant[] {});
        }
    }
}
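While I was looking for a conversion component, I found that NetEase had also reposted a Jacob usage guide, but it contained a fairly serious error: String htmlfile = "C:\\AA";
That path only points to a folder. The correct form names the output file: String htmlfile = "C:\\AA\\xxx.html";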
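One way to avoid passing a bare folder is to derive the output file name from the source file. The sketch below uses only the standard library; HtmlTarget and htmlTargetFor are hypothetical names introduced here, not part of Jacob.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlTarget {

    /** Builds outputDir/<source base name>.html, e.g. report.doc becomes report.html. */
    public static Path htmlTargetFor(Path source, Path outputDir) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Keep the whole name when there is no extension (or only a leading dot)
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".html");
    }

    public static void main(String[] args) {
        Path target = htmlTargetFor(Paths.get("report.doc"), Paths.get("out"));
        // Print the absolute form, since the path is consumed by another process
        System.out.println(target.toAbsolutePath());
    }
}
```

The resulting path, converted to an absolute string, could then be passed as the htmlfile argument of the conversion methods.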
That more or less covers converting Word/Excel to HTML; it should be clear by now :)
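II. Converting PDF to HTML with XPDF
1. Download xpdf (xpdf-3.02pl2-win32.zip)
2. Download the Chinese support package (xpdf-chinese-simplified.tar.gz)
3. Download the pdftohtml package (pdftohtml-0.39-win32.tar.gz)
4. Unpack and set up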
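1) Unpack xpdf-3.02pl2-win32.zip first. The extracted contents can be trimmed as needed: if you only need conversion to txt, the other exe files can be deleted and only pdftotext.exe kept, and likewise for other formats;
2) Then unpack xpdf-chinese-simplified.tar.gz into the directory where xpdf-3.02pl2-win32.zip was extracted;
3) Unpack pdftohtml-0.39-win32.tar.gz and place pdftohtml.exe in the directory where xpdf-3.02pl2-win32.zip was extracted;
4) Directory structure:
+---[X:\xpdf]
    |-------exe files used for the various conversions
    |
    |-------xpdfrc
    |
    +------[X:\xpdf\xpdf-chinese-simplified]
           |
           +-------the many character-mapping files needed during conversion
xpdfrc: this file declares the paths of the character-set mappings used for conversion
5) Edit the xpdfrc file (originally named sample-xpdfrc)
Change the file content to the following (text):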
#----- begin Chinese Simplified support package
cidToUnicode    Adobe-GB1       xpdf-chinese-simplified\Adobe-GB1.cidToUnicode
unicodeMap      ISO-2022-CN     xpdf-chinese-simplified\ISO-2022-CN.unicodeMap
unicodeMap      EUC-CN          xpdf-chinese-simplified\EUC-CN.unicodeMap
unicodeMap      GBK             xpdf-chinese-simplified\GBK.unicodeMap
cMapDir         Adobe-GB1       xpdf-chinese-simplified\CMap
toUnicodeDir                    xpdf-chinese-simplified\CMap
fontDir         C:\WINDOWS\Fonts
displayCIDFontTT Adobe-GB1 C:\WINDOWS\Fonts\simhei.ttf
#----- end Chinese Simplified support package
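A wrong path in xpdfrc is easy to miss, so it can help to check that every referenced file or directory exists before running a conversion. The following is a minimal sketch, not part of xpdf: XpdfrcCheck and missingPaths are hypothetical names, and the code assumes that the last token on each directive line is the path and that paths contain no spaces. Relative paths are resolved against the xpdf directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class XpdfrcCheck {

    // Directives used in the fragment above whose last token is a file or directory path
    private static final List<String> PATH_DIRECTIVES = Arrays.asList(
            "cidToUnicode", "unicodeMap", "cMapDir", "toUnicodeDir", "fontDir", "displayCIDFontTT");

    /** Returns the referenced paths that do not exist; relative paths are resolved against baseDir. */
    public static List<Path> missingPaths(Path xpdfrc, Path baseDir) throws IOException {
        List<Path> missing = new ArrayList<>();
        // ISO-8859-1 accepts any byte sequence, so a non-UTF-8 file cannot break the read
        for (String line : Files.readAllLines(xpdfrc, StandardCharsets.ISO_8859_1)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] tokens = trimmed.split("\\s+");
            if (tokens.length < 2 || !PATH_DIRECTIVES.contains(tokens[0])) {
                continue; // not a directive this sketch knows about
            }
            Path referenced = baseDir.resolve(tokens[tokens.length - 1]);
            if (!Files.exists(referenced)) {
                missing.add(referenced);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java XpdfrcCheck <xpdf directory>");
            return;
        }
        Path xpdfDir = Paths.get(args[0]);
        List<Path> missing = missingPaths(xpdfDir.resolve("xpdfrc"), xpdfDir);
        System.out.println(missing.isEmpty() ? "all referenced paths exist" : "missing: " + missing);
    }
}
```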
6) Create the batch file pdftohtml.bat (its path must not contain spaces)
Content (text):
@echo off
set folderPath=%1
set filePath=%2
cd /d %folderPath%
pdftohtml -enc GBK %filePath%
exit
7) Create the class
Java code:
import java.io.File;
import java.io.IOException;

public class ConvertPdf
{
    private static String INPUT_PATH;
    private static String PROJECT_PATH;

    /**
     * @param file    full path of the PDF file
     * @param project directory the batch file switches to before running pdftohtml
     */
    public static void convertToHtml(String file, String project)
    {
        INPUT_PATH = file;
        PROJECT_PATH = project;
        if (checkContentType() == 0)
        {
            toHtml();
        }
    }

    // Returns 0 for a .pdf file and 9 for anything else
    private static int checkContentType()
    {
        String type = INPUT_PATH.substring(INPUT_PATH.lastIndexOf(".") + 1, INPUT_PATH.length())
                .toLowerCase();
        if (type.equals("pdf"))
            return 0;
        else
            return 9;
    }

    private static void toHtml()
    {
        if (new File(INPUT_PATH).isFile())
        {
            try
            {
                // A space must separate the two quoted arguments (%1 and %2 in the batch file)
                String cmd = "cmd /c start X:\\pdftohtml.bat \"" + PROJECT_PATH + "\" \"" + INPUT_PATH + "\"";
                Runtime.getRuntime().exec(cmd);
            }
            catch (IOException e)
            {
                e.printStackTrace();
            }
        }
    }
}
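The class above launches the batch file through "cmd /c start" and returns immediately, so the caller cannot tell when the conversion has finished or whether it succeeded. As an alternative sketch (not the approach used in this article), pdftohtml can be started directly with ProcessBuilder and waited on. PdfToHtmlRunner, buildCommand and convert are hypothetical names; the "-enc GBK" option is the one used in the batch file, and pdftohtml.exe is assumed to sit in the xpdf directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class PdfToHtmlRunner {

    /** Builds the same command the batch file runs: pdftohtml -enc GBK <pdf>. */
    public static List<String> buildCommand(File xpdfDir, File pdf) {
        String exe = new File(xpdfDir, "pdftohtml.exe").getPath();
        // Absolute PDF path, because the working directory is changed to xpdfDir
        return Arrays.asList(exe, "-enc", "GBK", pdf.getAbsolutePath());
    }

    /** Runs pdftohtml inside xpdfDir, waits for it to finish and returns its exit code. */
    public static int convert(File xpdfDir, File pdf) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(xpdfDir, pdf));
        builder.directory(xpdfDir); // same effect as "cd /d %folderPath%" in the batch file
        builder.inheritIO();        // show the tool's output on this process's console
        return builder.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: java PdfToHtmlRunner <xpdf directory> <pdf file>");
            return;
        }
        int exitCode = convert(new File(args[0]), new File(args[1]));
        System.out.println("pdftohtml exited with code " + exitCode);
    }
}
```

Passing the arguments as a list also avoids the manual quoting that the command string needs when paths contain spaces.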