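DotText Source Code Reading (7): Pingback/TrackBack

What sets blogs apart from forums and so-called article-collection sites is, in my view, largely the existence of pingback/trackback, which lets the blog, as a form of self-published media, grow toward social networking (SNS). So to analyze a blog engine we need to understand these protocols and the details of how they are implemented.

In the Dottext source code, the path that publishes a post includes pingback support, and the web services implementation includes the trackback protocol. As for what the pingback/trackback protocols are, a quick Google search will turn them up, so I won't spend words on them here.

Through the mapping

    <HttpHandler pattern="/(?:admin)" type="Dottext.Web.UI.Handlers.BlogExistingPageHandler, Dottext.Web" handlerType="Factory"/>

every request to a blog's admin directory is URL-rewritten to the corresponding aspx file under the DottextWeb\admin directory (see the earlier parts of this series). When a post is published, the call chain looks like this: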

    private void UpdatePost()
    {
        if (Page.IsValid)
        {
            string successMessage = Constants.RES_SUCCESSNEW;
            try
            {
                Entry entry = new Entry(EntryType);
                entry.Title = txbTitle.Text;
                entry.Body = Globals.StripRTB(ftbBody.Text, Request.Url.Host);
                …
                entry.BlogID = Config.CurrentBlog(Context).BlogID;

                if (PostID > 0)
                {
                    // update an existing post
                    successMessage = Constants.RES_SUCCESSEDIT;
                    entry.DateUpdated = DateTime.Now; // BlogTime.CurrentBloggerTime;
                    entry.EntryID = PostID;
                    …
                    Entries.Update(entry);
                    …
                }
                else
                {
                    // create a new post
                    entry.DateCreated = DateTime.Now; // BlogTime.CurrentBloggerTime;
                    PostID = Entries.Create(entry);
                }
                …
            }
            catch (Exception ex)
            { … }
            finally
            { … }
        }
    }

Entries.Create(entry) looks like this:

    public static int Create(Entry entry, int[] CategoryIDs)
    {
        HandlerManager.PreCommit(entry, ProcessAction.Insert);

        int result = DTOProvider.Instance().Create(entry, CategoryIDs);
        if (result > 0)
        {
            HandlerManager.PostCommit(entry, ProcessAction.Insert);
        }
        return result;
    }

The actual storage is done by calling DTOProvider, that is DataDTOProvider, which ultimately delegates to SqlDataProvider for the database operations. But note the call HandlerManager.PostCommit(entry, ProcessAction.Insert). Let's look at it more closely.

HandlerManager is a wrapper class around operations on Entry. PreCommit is defined like this:

    Process(ProcessState.PreCommit, e, pa);

and Process reads the handlers configured in web.config like this:

    public static void Process(ProcessState ps, Entry e, ProcessAction pa)
    {
        //Do we have factories? (the author seems to be wondering whether to use the factory pattern)
        EntryHandler[] hanlers = Config.Settings.EntryHandlers; // deserialized from configuration; Config here is Dottext.Framework.Configuration.Config
        if (e != null && hanlers != null)
        {
            //walk the entries (iterate over every handler)
            for (int i = 0; i < hanlers.Length; i++)
            {
                EntryHandler handler = hanlers[i];
                if (ShouldProcess(ps, handler, e.PostType, pa))
                {
                    IEntryFactoryHandler ihandler = handler.IEntryFactoryHandlerInstance;

                    //Call the IEntryFactoryHandler configure method. This gives async items a chance to "ready" themselves
                    //before leaving the main thread and entering the managed queue.
                    ihandler.Configure();
                    if (handler.IsAsync)
                    {
                        //Add factory to managed queue.
                        EntryHanlderQueue.Enqueue(ihandler, e);
                    }
                    else
                    {
                        ihandler.Process(e);
                    }
                }
            }
        }
    }

ShouldProcess looks at whether the post is at the pre-commit or the post-commit stage and decides whether the handler should be instantiated at all. For a post that has already been committed, we go on to evaluate handler.IEntryFactoryHandlerInstance.

IEntryFactoryHandlerInstance ultimately instantiates each array element through

    ihandler = (IEntryFactoryHandler)Activator.CreateInstance(Type.GetType(this.ItemType));

Once instantiated, the handler can run. Depending on handler.IsAsync, it is either added to the queue with EntryHanlderQueue.Enqueue(ihandler, e) or processed immediately with ihandler.Process(e). For handlers that can run asynchronously, the static Enqueue method does this:

    public static void Enqueue(IEntryFactoryHandler factory, Entry e)
    {
        EntryHanlderQueue ehq = new EntryHanlderQueue(factory, e);
        ManagedThreadPool.QueueUserWorkItem(new WaitCallback(ehq.Enqueue));
    }

It builds an instance and queues it as a work item on the thread pool. Thread management is not discussed here. Next, let's look at the EntryHandlers themselves, beginning with the TrackBack handler.
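As an aside, the dispatch path just described can be tried out in isolation. The following is a minimal sketch, not Dottext's actual code: IEntryHandler, LoggingHandler, HandlerConfig and Dispatcher are hypothetical names, a plain string stands in for Entry, the ShouldProcess filter is left out, and the standard ThreadPool replaces Dottext's ManagedThreadPool.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for Dottext's IEntryFactoryHandler.
public interface IEntryHandler
{
    void Configure();
    void Process(string entryTitle);
}

// Hypothetical handler that only records the titles it has processed.
public class LoggingHandler : IEntryHandler
{
    public static readonly List<string> Log = new List<string>();

    public void Configure() { }

    public void Process(string entryTitle)
    {
        lock (Log) { Log.Add(entryTitle); }
    }
}

// Hypothetical equivalent of one configured EntryHandler element.
public class HandlerConfig
{
    public string ItemType;  // type name, as it would appear in configuration
    public bool IsAsync;

    // Create the handler by reflection from its configured type name.
    public IEntryHandler CreateInstance()
    {
        Type type = Type.GetType(ItemType, true);
        return (IEntryHandler)Activator.CreateInstance(type);
    }
}

public static class Dispatcher
{
    // Instantiate each handler, let it configure itself on the calling thread,
    // then either queue it on the thread pool or run it inline.
    public static void Process(HandlerConfig[] handlers, string entryTitle)
    {
        if (handlers == null || entryTitle == null) return;

        foreach (HandlerConfig config in handlers)
        {
            IEntryHandler handler = config.CreateInstance();
            handler.Configure();
            if (config.IsAsync)
            {
                ThreadPool.QueueUserWorkItem(state => handler.Process(entryTitle));
            }
            else
            {
                handler.Process(entryTitle);
            }
        }
    }
}
```

With IsAsync set to false the entry has been processed by the time Dispatcher.Process returns; with IsAsync set to true the call returns immediately and the work finishes later on a pool thread, which is why Configure is invoked before the hand-off.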
The TrackBack handler's Process method looks like this:

    public void Process(Dottext.Framework.Components.Entry e)
    {
        //Get a list of links from the current post
        StringCollection links = TrackHelpers.GetLinks(e.Body);
        if (links != null && links.Count > 0)
        {
            //Instantiate our proxy
            TrackBackNotificationProxy proxy = new TrackBackNotificationProxy();

            //Walk the links
            for (int i = 0; i < links.Count; i++)
            {
                string link = links[i];
                //get the page text
                string pageText = BlogRequest.GetPageText(link, e.Link);
                if (pageText != null)
                {
                    try
                    {
                        string desc = null;
                        if (e.HasDescription)
                        {
                            desc = e.Description;
                        }
                        else
                        {
                            // immediately replaced by the HTML-stripped excerpt below
                            desc = string.Format("TrackBack From:{0}", e.Link);
                            desc = regexStripHTML.Replace(e.Body, string.Empty);
                            if (desc.Length > 100)
                            {
                                int place = 100;
                                int len = desc.Length - 1;
                                // the listing tests "i < len" here; "place" is presumably what was meant
                                while (place < len && !Char.IsWhiteSpace(desc[place]))
                                {
                                    place++;
                                }
                                desc = string.Format("{0}...", desc.Substring(0, place));
                            }
                        }

                        //attempt a trackback.
                        proxy.TrackBackPing(pageText, link, e.Title, e.Link, e.Author, desc);
                    }
                    catch (Exception ex)
                    {
                        Logger.LogManager.CreateExceptionLog(ex, string.Format("Trackback Failure: {0}", link));
                    }
                }
            }
        }
    }

TrackHelpers.GetLinks parses the Entry.Body string and collects every href link in the post, that is, its outbound references. The TrackBack handler then calls proxy.TrackBackPing(pageText, link, e.Title, e.Link, e.Author, desc) to notify each of those addresses that this post references it.

Before TrackBackPing runs, string pageText = BlogRequest.GetPageText(link, e.Link); uses BlogRequest's HTTP support to download the source of the referenced address. Here link is the address on the other blog and e.Link is sent as the referrer, to tell the other side which page references link. Once the source of link has been safely decoded, TrackBackPing analyzes it, looks for the part that matches string sPattern = @"<rdf:\w+\s[^>]*?>(</rdf:rdf>)?"; and extracts the trackback ping address from it. The next step is SendPing(string trackBackItem, string parameters), which POSTs application/x-www-form-urlencoded data to the target address. That completes one trackback.

The other EntryHandlers are likewise either synchronous or asynchronous; you can read them in the same way.

An aside: so-called blogs that do not have the courtesy to implement pingback/TrackBack should not presume to call themselves blog service providers (BSPs).

In CNBlogsDottext10Beta2 the TRACKBACK feature was disabled. The likely reason is that many people, after a successful installation, hit this error when submitting POSTS that contain reference links:

    String or binary data would be truncated

This actually happens because the key method for sending a TRACKBACK, SendPing(string trackBackItem, string parameters), sends the byte stream using the ASCII-encoded length, so it fails whenever PARAMETERS contains Chinese. The fix is to convert to UTF-8 before sending. Here is my modified code:

    private void SendPing(string trackBackItem, string parameters)
    {
        HttpWebRequest request = BlogRequest.CreateRequest(trackBackItem);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.KeepAlive = false;
        // encode as UTF-8 so that ContentLength matches the bytes actually written
        byte[] buff = Encoding.GetEncoding("UTF-8").GetBytes(parameters);
        request.ContentLength = buff.Length;
        Stream reqStream = null;
        try
        {
            reqStream = request.GetRequestStream();
            reqStream.Write(buff, 0, buff.Length);
        }
        catch (Exception e)
        {
            Logger.LogManager.CreateExceptionLog(e, "SendPing Exception");
        }
        finally
        {
            // GetRequestStream may have thrown, in which case reqStream is still null
            if (reqStream != null)
            {
                reqStream.Close();
            }
        }
    }

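First, let's look at how a TRACKBACK is sent. The entry point is Dottext.Framework.EntryHandling.Process:

1. Check whether the post body contains links to remote pages; continue only if it does.
2. Download the HTML of the linked page. If nothing comes back, the link is not valid, so return.
3. Check whether the downloaded HTML already contains a link to this post. If it does, the page has already been pinged, so return.
4. Following the TRACKBACK standard, extract the TRACKBACK link from the downloaded HTML (it sits inside commented-out XHTML keyed by RDF). This turns the page link into a TRACKBACK link.
5. Send (PING) the TRACKBACK.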
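Step 4 is the least obvious one, so here is a rough sketch of it. It is not Dottext's implementation: TrackbackDiscovery and FindPingUrl are hypothetical names, and the code assumes the page embeds the rdf:Description element described in the TrackBack specification, with dc:identifier holding the permalink and trackback:ping holding the ping address.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrackbackDiscovery
{
    // Matches the opening tag of every rdf:Description element, attributes included.
    private static readonly Regex DescriptionPattern =
        new Regex(@"<rdf:Description\s[^>]*>", RegexOptions.IgnoreCase);

    // Returns the trackback:ping address advertised for the given permalink,
    // or null when the page advertises none for it.
    public static string FindPingUrl(string pageText, string permalink)
    {
        if (string.IsNullOrEmpty(pageText)) return null;

        foreach (Match description in DescriptionPattern.Matches(pageText))
        {
            string identifier = GetAttribute(description.Value, "dc:identifier");
            string ping = GetAttribute(description.Value, "trackback:ping");
            if (ping != null &&
                string.Equals(identifier, permalink, StringComparison.OrdinalIgnoreCase))
            {
                return ping;
            }
        }
        return null;
    }

    // Reads one double-quoted attribute value out of a tag, or null if absent.
    private static string GetAttribute(string tag, string name)
    {
        Match m = Regex.Match(tag, Regex.Escape(name) + @"\s*=\s*""([^""]*)""",
                              RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Matching on dc:identifier matters when one page, such as a blog's index, carries several RDF blocks: only the block that describes the linked permalink should be pinged.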
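Now the flow for receiving a TRACKBACK. The entry point is Dottext.Framework.Tracking.TrackBackHandler.ProcessRequest:

1. Derive the ID of the local post from the TRACKBACK link that was pinged. If no ID can be found, the link is not valid, so return.
2. Check that the REQUEST method is POST and return if it is not; the TRACKBACK standard requires this.
3. Load the data from the database by ID and build an ENTRY object.
4. Download the HTML of the remote page from the URL that was passed in. If nothing comes back, or the HTML does not contain a link to the local post, the ping is not valid, so return.
5. Parse the title of the remote page out of the downloaded HTML; if there is none, return.
6. Create a new ENTRY object, assign its properties, and save it to the database.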
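The checks in this flow, together with the reply the TrackBack specification expects, can be sketched as follows. This is an illustration rather than Dottext's ProcessRequest: TrackbackReceiver, Validate and BuildResponse are hypothetical names, the remote page is assumed to have been downloaded already, and the title parsing of step 5 is omitted.

```csharp
using System;
using System.Security;

public static class TrackbackReceiver
{
    // Returns null when the ping passes the checks, otherwise a short reason.
    public static string Validate(string httpMethod, int entryId, string sourceUrl,
                                  string sourcePageText, string localPermalink)
    {
        if (entryId <= 0)
            return "Unknown entry";
        if (!string.Equals(httpMethod, "POST", StringComparison.OrdinalIgnoreCase))
            return "TrackBack pings must use POST";
        if (string.IsNullOrEmpty(sourceUrl))
            return "Missing url parameter";
        if (string.IsNullOrEmpty(sourcePageText) || string.IsNullOrEmpty(localPermalink) ||
            sourcePageText.IndexOf(localPermalink, StringComparison.OrdinalIgnoreCase) < 0)
            return "Source page does not link to this entry";
        return null;
    }

    // Builds the XML reply defined by the TrackBack specification:
    // error 0 on success, error 1 plus a message on failure.
    public static string BuildResponse(string errorMessage)
    {
        const string header = "<?xml version=\"1.0\" encoding=\"utf-8\"?>";
        if (errorMessage == null)
            return header + "<response><error>0</error></response>";
        return header + "<response><error>1</error><message>"
             + SecurityElement.Escape(errorMessage) + "</message></response>";
    }
}
```

A handler would call Validate first and write BuildResponse of the result back to the sender, so that a rejected ping receives a reason instead of an empty reply.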
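From this we can see that sending a TRACKBACK in DOTTEXT is fairly inefficient. The reason is that it has to download the remote HTML, which is very time-consuming, not to mention extracting the TRACKBACK link from HTML that may well be huge.

In addition, no blocking mechanism exists on the receiving side, so there is no defense against junk advertising, what we call SPAM COMMENT.

To solve these problems, I think the way TRACKBACKs are sent needs to change: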
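
1. Stop discovering the TRACKBACK link automatically according to the standard. Doing so is very inefficient, and it fails for the many sites that do not support the standard (www.blogchinese.com, for example, shows the trackback link directly instead of hiding it in the page, haha). Instead, treat whatever the user enters as a valid TRACKBACK link and send to it directly.
2. So that users can obtain a valid TRACKBACK address, display the post's TRACKBACK link after the content of every post.
3. Provide one more page where entering a page link displays that page's TRACKBACK link, in order to keep supporting the sites that do follow the standard.
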
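On the receiving side, we make the following corresponding changes:

1. Validate the sender's URL against the database to see whether it has already PINGed. Because this happens locally, it will be very fast.
2. Create a BLACKIP table in the database and check the sender's IP against it, which gives us the ability to ban an IP.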
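
The two proposed checks could look roughly like this. It is only a sketch of the idea: PingFilter is a hypothetical class, and in-memory sets stand in for the database lookups and the BLACKIP table; a real implementation would query the database and would need to be thread-safe.

```csharp
using System;
using System.Collections.Generic;

public class PingFilter
{
    // Stand-in for the proposed BLACKIP table.
    private readonly HashSet<string> blockedIps = new HashSet<string>();

    // Stand-in for looking up stored trackbacks by entry ID and source URL.
    private readonly HashSet<string> seenPings =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public void BlockIp(string ip)
    {
        blockedIps.Add(ip);
    }

    // Returns true and records the ping when it is new and the sender is not
    // banned; returns false for banned senders and for repeated pings.
    public bool TryAccept(int entryId, string sourceUrl, string remoteIp)
    {
        if (blockedIps.Contains(remoteIp)) return false;

        string key = entryId + "|" + sourceUrl;
        return seenPings.Add(key);  // Add returns false if the key already exists
    }
}
```

Both checks are local lookups, so they can run before any remote page is downloaded.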

The above is only my proposal; for lack of time I have not started implementing it. If you have better suggestions, let's discuss them together.

posted on 2007-07-13 15:29 by 方正
Reposted from: https://www.cnblogs.com/linckle/archive/2007/07/13/817321.html