PHP curl_setopt函数用法介绍中篇
此篇已實(shí)例為主。
一.一般的實(shí)例
demo1.php
<?php$user = "admin123";$pass = "admin456";// $curlPost = "user=$user&pass=$pass"; #### 測(cè)試一######測(cè)試二
$curlPost = array( 'a'=>123,'b'=>456,'c'=>789);$ch = curl_init(); //初始化一個(gè)CURL對(duì)象curl_setopt($ch, CURLOPT_URL, "demo2.php");//設(shè)置你所需要抓取的URLcurl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);//設(shè)置curl參數(shù),要求結(jié)果是否輸出到屏幕上,為true的時(shí)候是不返回到網(wǎng)頁中// 假設(shè)上面的0換成1的話,那么接下來的$data就需要echo一下。curl_setopt($ch, CURLOPT_POST, 1);//post提交 curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);$data = curl_exec($ch);// echo $data;//運(yùn)行curl,請(qǐng)求網(wǎng)頁。 curl_close($ch);?>
查看結(jié)果:
<?php // echo "aaaaa"; echo "<pre>"; var_dump($_POST); echo "<br/>"; echo $_POST['b'];echo "<br/>";$a = array('1'=>1234,'2'=>567); var_dump($a);?> <?php // 初始化一個(gè) cURL 對(duì)象 $curl = curl_init();// 設(shè)置你需要抓取的URL curl_setopt($curl, CURLOPT_URL, 'http://www.baidu.com');// 設(shè)置header curl_setopt($curl, CURLOPT_HEADER, 1);// 設(shè)置cURL 參數(shù),要求結(jié)果保存到字符串中還是輸出到屏幕上。 curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);// 運(yùn)行cURL,請(qǐng)求網(wǎng)頁 $data = curl_exec($curl);// 關(guān)閉URL請(qǐng)求 curl_close($curl);// 顯示獲得的數(shù)據(jù) var_dump($data); ?>?
?二.采集數(shù)據(jù)信息
在實(shí)踐的項(xiàng)目中,要求采集某網(wǎng)站產(chǎn)品列表的信息。右擊網(wǎng)站,查看源碼,發(fā)現(xiàn)并無產(chǎn)品信息。通過分析該網(wǎng)站,產(chǎn)品信息是通過ajax調(diào)用的。
利用跑數(shù)據(jù)的形式,獲取產(chǎn)品信息,分析url地址,得到的json形式的產(chǎn)品數(shù)據(jù),通過轉(zhuǎn)換,將產(chǎn)品屬性“對(duì)坐入號(hào)”獲取,我采用的是插入到數(shù)據(jù)庫當(dāng)中。
代碼如下:
?
<?php header('Content-Type:text/html;charset=UTF-8');set_time_limit(0);$id = isset($_GET['id']) ? intval($_GET['id']) : 1;$listurl = "http://www.ptgcn.com/Handler/PublicationHandler.ashx?r=0.38488502931227453&func=GetDataByPCD&index={$id}&size=20&sort=&dir=asc¶m="; // echo $listurl; exit; ####默認(rèn)結(jié)束狀態(tài) $ch = curl_init(); // var_dump($ch); // 2. 設(shè)置選項(xiàng),包括URL curl_setopt($ch, CURLOPT_URL, $listurl); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch, CURLOPT_HEADER, 0); curl_setopt($ch,CURLOPT_HTTPHEADER,array ("Content-Type: text/xml; charset=utf-8","Expect: 100-continue"));// 3. 執(zhí)行并獲取HTML文檔內(nèi)容 $output = curl_exec($ch);$httpCode = curl_getinfo($ch,CURLINFO_HTTP_CODE);curl_close($ch); // var_dump($httpCode); if($httpCode == '200'){ // echo "123456";// var_dump($output);$json_b = json_decode($output,true);// var_dump($json_b);$new = $json_b['rows'];// var_dump($new);foreach($new as $row){$catalog ='';$catalog = addslashes(trim($row['CatalogNo']));$catalog = "'". $catalog."'";$url = '';$url = "http://www.ptgcn.com/Products/".trim($row['PermaLink']).".html";$catalog_addr = $url;$catalog_addr = "'".$catalog_addr."'";$pubmed_id ='';$pubmed_id = addslashes(trim($row['PMID']));$pubmed_id = "'".$pubmed_id."'";$author = '';$author = addslashes(trim($row['Author']));$author = "'".$author."'";$journal_a = '';$journal_b = '';$journal = '';$journal_a = addslashes(trim($row['Journal']));$journal_b = addslashes(trim($row['PubDate']));$journal = $journal_a.",".$journal_b;$journal = "'".$journal."'";// $journal = addslashes(trim($row['CatalogNo']));$application = '';$application = addslashes(trim($row['App']));$application = "'".$application."'";$species = '';$species = addslashes(trim($row['Species']));$species = "'".$species."'";$title = '';$title = addslashes(trim($row['Subject']));$title = "'".$title."'";var_dump($row);exit;$con = mysql_connect('localhost',"root","root");mysql_set_charset("utf8");$select = mysql_select_db("wh");$sql = '';$sql = "insert into ptg(catalog,catalog_addr,pubmed_id,author,journal,application,species,title)values({$catalog},{$catalog_addr},{$pubmed_id},{$author},{$journal},{$application},{$species},{$title})";echo $sql;$query_insert = mysql_query($sql);if($query_insert){echo "success!";}else{$html = '';$html = $html.$catalog."\r\n";file_put_contents('error.txt',$html,FILE_APPEND); //用于裝載執(zhí)行失敗的數(shù)據(jù) }mysql_close($con);// var_dump($con,$select);// exit; } }else{echo 'finished !';exit; }?><script> function JumpUrl() {location.href='?id=<?php echo ($id+1);?>'; } setTimeout('JumpUrl()',1); </script>
說到采集再介紹一種方法:
三.利用phpQuery.采集數(shù)據(jù)信息
下載地址:http://www.jb51.net/article/59522.htm
?????? git:https://github.com/TobiaszCudnik/phpquery
1.使用場(chǎng)合,右擊某網(wǎng)站,查看源碼,如果源碼信息與網(wǎng)站本身一樣,可以考慮使用phpQuery進(jìn)行采集
2.phpQuery,顧名思義:及將源碼的編譯方式轉(zhuǎn)換為jQuery的形式。然后一切調(diào)用都使用jQuery風(fēng)格即可
3.說明:先要引用? include 'phpQuery/phpQuery.php';
如:
?
簡(jiǎn)單的三行代碼,就可以獲取頭條內(nèi)容。首先在程序中包含phpQuery.php核心程序,然后調(diào)用讀取目標(biāo)網(wǎng)頁,最后輸出對(duì)應(yīng)標(biāo)簽下的內(nèi)容。
?
pq()是一個(gè)功能強(qiáng)大的方法,跟jQuery的$()如出一轍,jQuery的選擇器基本上都能使用在phpQuery上,只要把“.”變成 “->”。
如上例中,pq(".blkTop h1:eq(0)")抓取了頁面class屬性為blkTop的DIV元素,并找到該DIV內(nèi)部的第一個(gè)h1標(biāo)簽,
然后用html()方法獲取h1標(biāo)簽里 的內(nèi)容(帶html標(biāo)簽),也就是我們要獲取的頭條信息,如果使用text()方法,則只獲取頭條的文本內(nèi)容。
當(dāng)然要使用好phpQuery,關(guān)鍵是要找 對(duì)文檔中對(duì)應(yīng)內(nèi)容的節(jié)點(diǎn)。
4.具體的采集代碼如下:
<?php header('Content-Type:text/html;charset=UTF-8'); include 'phpQuery/phpQuery.php';set_time_limit(0);$id = isset($_GET['id']) ? intval($_GET['id']) : 1;if($id > 14172){echo "finish!";exit; }$con = mysql_connect('localhost',"root","root"); mysql_set_charset("utf8"); $select = mysql_select_db("wh");$sql = "select catalog_addr from ptg where id = {$id}"; $query = mysql_query($sql); $row = mysql_fetch_assoc($query); // var_dump($row); $url = stripslashes(trim($row['catalog_addr'])); $url = str_replace("html","htm",$url); // echo $url; // exit;// echo "aaaaa"; // phpQuery::newDocumentFile('http://helloweba.com/blog.html');####$url,就是單純的url地址
phpQuery::newDocumentFile($url); // echo "aaaaa"; // exit; $artList = pq(".proTitle");// var_dump($artList);#### No.1 foreach($artList as $li){$one = pq($li)->find('h1')->html();$one = strip_tags($one);$one = trim($one);echo $one.'<br/>'; }#### No.2$artList_a = pq("#dvProApp");foreach($artList_a as $li){$two_a = pq($li) -> find('tr td')->eq(1) ->html();$two_b = pq($li) -> find('tr td')->eq(1) ->find('a') ->html();$two_a = strip_tags($two_a);$two_a = trim($two_a);// var_dump($two_a);// echo '<br/>';$two_b = strip_tags($two_b);$two_b = trim($two_b);// var_dump($two_b);// echo '<br/>';$two = str_replace($two_b,"",$two_a);$two = strip_tags($two);$two = str_replace(" ","",$two);// $two = str_replace(" ","",$two);$two = trim($two);echo $two.'<br/>';// $two = preg_replace('/<(\/?a.*?)>/si','',$two);// echo $two;// exit; }##### N0.3 $artList_b = pq("#dvProApp");foreach($artList_b as $li){$three = pq($li) -> find('tr')->eq(1)->find('td')->eq(1)->html();$three = trim(strip_tags($three));echo $three.'<br/>'; }##### N0.4 $artList_c = pq("#dvProApp");foreach($artList_c as $li){$four = pq($li) -> find('tr')->eq(2)->find('td')->eq(1)->html();$four = trim(strip_tags($four));echo $four.'<br/>'; }##### N0.5 $artList_d = pq("#dvProApp");foreach($artList_d as $li){$five = pq($li) -> find('tr')->eq(3)->find('td')->eq(1)->html();$five = trim(strip_tags($five));echo $five.'<br/>'; }##### N0.6 $artList_e = pq("#dvProImm");foreach($artList_e as $li){$six = pq($li) -> find('tr')->eq(2)->find('td')->eq(1)->html();$six = trim(strip_tags($six));echo $six.'<br/>'; }##### N0.7 $artList_f = pq("#dvProImm");foreach($artList_f as $li){$seven = pq($li) -> find('tr')->eq(2)->find('td')->eq(3)->html();$seven = trim(strip_tags($seven));echo $seven.'<br/>'; }##### N0.8 $artList_g = pq("#dvProImm");foreach($artList_g as $li){$ba = pq($li) -> find('tr')->eq(3)->find('td')->eq(1)->html();$ba = trim(strip_tags($ba));echo $ba.'<br/>'; }##### N0.9 $artList_h = pq("#dvProImm");foreach($artList_h as $li){$nine = pq($li) -> find('tr')->eq(3)->find('td')->eq(3)->html();$nine = trim(strip_tags($nine));echo $nine.'<br/>'; }####執(zhí)行插入操作 $one = "'".addslashes(trim($one))."'"; $two = "'".addslashes(trim($two))."'"; $three = "'".addslashes(trim($three))."'"; $four = "'".addslashes(trim($four))."'"; $five = "'".addslashes(trim($five))."'"; $six = "'".addslashes(trim($six))."'"; $seven = "'".addslashes(trim($seven))."'"; $ba = "'".addslashes(trim($ba))."'"; $nine = "'".addslashes(trim($nine))."'";$sql_in = "insert into ptg_page(title,house_app,pub_app,spe_spe,pub_spe,gen_num,gen_id,gene_sym,syn)values({$one},{$two},{$three},{$four},{$five},{$six},{$seven},{$ba},{$nine})"; echo $sql_in.'<br/>';$query_in = mysql_query($sql_in);if($query_in){echo "success!"; }else{$html = '';$html = $html.$url."\r\n";file_put_contents('error_url.txt',$html,FILE_APPEND); //用于裝載執(zhí)行失敗的數(shù)據(jù) }mysql_close($con);?><script> function JumpUrl(){location.href='?id=<?php echo ($id+1);?>'; } setTimeout('JumpUrl()',1); </script>
?
總結(jié)
以上是生活随笔為你收集整理的PHP curl_setopt函数用法介绍中篇的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 高效的沟通方式-会议
- 下一篇: Linux 无法使用su