User-Agent Strings (Or, Don’t Make Me Come After You)
A very long time ago (read: ten years ago), we were in between the so-called First and Second Browser Wars. Microsoft had killed Netscape Navigator by using its desktop monopoly and Scrooge McDuck-like financial reserves to install a free copy of Internet Explorer on (basically) every single computer in the world. Internet Explorer 6 was the dominant browser, and Netscape as a company was over.
Netscape, before their demise, had embarked on a project to totally rewrite their web browser. Their new code was open-sourced and given to the Mozilla Foundation. In hindsight, this was a stunningly successful move, with the ever-awesome Mozilla Foundation still going from strength to strength nearly ten years after it was founded.
The second browser war was initially a festering cold war between the reborn Netscape Navigator (now called Mozilla Firefox) and the dormant Internet Explorer 6 (eventually updated to IE 7 after a development freeze of around five years). Later, other players like Google Chrome joined the party. Oh, and Safari and Opera were kinda floating around in this war too, but honestly they’re not that important to the story I’m trying to tell.
Anyway, long story still kinda long: as part of these two browser wars, browsers felt the need to compete with each other on features. However, to use these features you needed to get web developers to build web sites that used them. The problem is that your new feature would only work on your browser. This meant that when some poor soul came along trying to view your super-awesome ActiveX-powered web page, and they had the misfortune to be using Netscape Navigator, your website would at best look awful, and at worst explode in several mysterious ways.
These people would then go away and tell their friends about your crappy website that wouldn’t even render properly! And they’d say that their friends should use your competitor’s website, even though your competitor can’t even spell ActiveX! And you’d go out of business and your children would have to go to a state school, and it would just be horrible.
So you needed some way to tell what features a browser had. There was a way to do that, of course: JavaScript. Unfortunately, some features couldn’t be easily detected in JavaScript, writing JavaScript was, well, weird, and JavaScript was slow, so lots of websites didn’t want to do that (or didn’t know they should). What would they do instead?
Well, RFC 1945 and RFC 2616 (the HTTP 1.0 and HTTP 1.1 specifications) stated that browsers, web crawlers and other tools that interact with web servers should identify themselves using a special header in the HTTP requests they send: the User-Agent header. This header should be (as much as possible) unique to a specific type of agent. This means that Internet Explorer should send a User-Agent header that is different from those of all other browsers and of all other versions of IE.
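For a concrete picture, here is a minimal sketch of the raw text an HTTP/1.1 client sends for a simple GET, with the User-Agent header carrying the agent’s description of itself. The helper name build_get_request, the host and the path are hypothetical. The User-Agent value is the Requests string quoted later in this post.

```python
def build_get_request(host: str, path: str, user_agent: str) -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",              # mandatory in HTTP/1.1
        f"User-Agent: {user_agent}",  # the agent identifies itself here
        "Accept: */*",
        "",                           # an empty line ends the header block
        "",
    ]
    # Every header line is terminated by CRLF; headers are ASCII/Latin-1 text.
    return "\r\n".join(lines).encode("latin-1")


if __name__ == "__main__":
    request = build_get_request(
        "example.com", "/", "python-requests/1.2.0 CPython/2.7.2 Darwin/12.2.0"
    )
    print(request.decode("latin-1"))
```

The server sees nothing but this text. Whatever a site "knows" about your browser from the User-Agent is only what the client chose to write on that one line.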
“Perfect!” cry the web developers. “Our servers can check for this string!” And so begins the trouble.
The Trouble
You see, the problem with using the User-Agent string to check for features is that the User-Agent string tells you nothing about what features a given User-Agent has. After all, that’s not what it’s for! So you, naïve late-1990s web programmer, might write your site at a time when only Mozilla Firefox supports the hot new Twiddlor feature (note: not a real feature). So you only serve Twiddlor-enabled pages to people whose User-Agent strings identify them as using a version of Firefox.
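To make the anti-pattern concrete, here is a minimal sketch of the server-side check described above. The function choose_page, the page names and the Firefox-style UA used in the demo are all hypothetical. The IE 10 string is the one quoted later in this post. Because the check keys on the browser’s name rather than on any feature, a browser that gains Twiddlor support later is still sent the fallback page.

```python
def choose_page(user_agent: str) -> str:
    """Naive User-Agent sniffing (the anti-pattern): pick a page by browser name."""
    # Only Firefox supports Twiddlor "today", so only Firefox gets the fancy page.
    if "Firefox" in user_agent:
        return "twiddlor.html"
    # Everyone else gets the fallback, even browsers that add Twiddlor later.
    return "best-viewed-in-firefox.html"


if __name__ == "__main__":
    firefox_like = "Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20100101 Firefox/20.0"
    ie10 = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)"
    print(choose_page(firefox_like))  # twiddlor.html
    print(choose_page(ie10))          # best-viewed-in-firefox.html
```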
The problem is, six months later the guys in Redmond get around to adding Twiddlor support to Internet Explorer. But all their users are still complaining that none of their favourite websites will let them use Twiddlor, instead claiming that the website is “Best used in Mozilla Firefox” or some such nonsense.
How does Microsoft get you to show them the Twiddlor-enabled page? Simple: they change their User-Agent string! Sadly, I’m not even joking: this is actually what happened. To prove it, I’m going to show you a few modern browser UA strings.
Here’s the UA string sent by Google Chrome version 27.0.1453.47 beta (yeah), running on my Mac:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.47 Safari/537.36
“What is all that crap?”, I hear you ask, quite rightly. Why does it say it’s Mozilla? It’s not Mozilla! You’re quite right. But enough people have tested for Firefox by just checking that the word ‘Mozilla’ is in the UA string that everyone puts it there. And I mean everyone. Check out Safari, also on my Mac:
Notice that both Safari and Chrome claim to be versions of Safari. That’s pretty damn weird.
What about Internet Explorer 10, on my Windows machine?
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)
At least it’s not claiming to be Safari! In fact, this is the best UA string I’ve seen, being a fairly honest representation of the browser.
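A quick sketch shows why every one of these strings starts with Mozilla/5.0. The naive ‘is it Mozilla?’ substring test described above accepts both strings quoted so far. Chrome also passes an equally naive ‘is it Safari?’ test. The function names are hypothetical. The UA strings are copied from this post.

```python
CHROME_27_MAC = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.47 Safari/537.36"
)
IE_10_WINDOWS = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)"


def looks_like_mozilla(user_agent: str) -> bool:
    """The kind of naive check old sites used to detect Netscape/Firefox."""
    return "Mozilla" in user_agent


def looks_like_safari(user_agent: str) -> bool:
    """An equally naive check for Safari."""
    return "Safari" in user_agent


if __name__ == "__main__":
    for name, ua in [("Chrome 27", CHROME_27_MAC), ("IE 10", IE_10_WINDOWS)]:
        print(name, looks_like_mozilla(ua), looks_like_safari(ua))
    # Chrome 27 True True
    # IE 10 True False
```

Once every browser carries the token, the check identifies nothing. That is exactly why browsers keep adding tokens: it gets them past the checks.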
Finally, let’s check Firefox, also on my Windows box.
What Should A User-Agent String Look Like?
To see an example of how these were supposed to look when the standard was originally proposed, we can look at what Requests, the Python HTTP library, sends.
python-requests/1.2.0 CPython/2.7.2 Darwin/12.2.0
Short and to the point: the ‘browser’ and its version, the ‘platform’ and its version, and the OS (sort of) and its version.
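That layout follows the grammar in RFC 2616: a sequence of product tokens of the form name/version, optionally mixed with parenthesised comments. A minimal parser sketch for that shape makes the difference in legibility obvious. parse_user_agent is a hypothetical name, and the sketch assumes comments are not nested (the RFC does allow nesting).

```python
import re
from typing import List, Optional, Tuple

# Each match is either a parenthesised comment or a product with an optional /version.
# Simplifying assumption: comments are not nested.
_UA_ELEMENT = re.compile(r"\(([^)]*)\)|([^\s/()]+)(?:/([^\s()]+))?")


def parse_user_agent(value: str) -> List[Tuple[str, str, Optional[str]]]:
    """Split a User-Agent value into ("product", name, version) and ("comment", text, None)."""
    elements: List[Tuple[str, str, Optional[str]]] = []
    for comment, name, version in _UA_ELEMENT.findall(value):
        if name:
            elements.append(("product", name, version or None))
        else:
            elements.append(("comment", comment, None))
    return elements


if __name__ == "__main__":
    for element in parse_user_agent("python-requests/1.2.0 CPython/2.7.2 Darwin/12.2.0"):
        print(element)
    # ('product', 'python-requests', '1.2.0')
    # ('product', 'CPython', '2.7.2')
    # ('product', 'Darwin', '12.2.0')
```

Run against the Chrome string quoted earlier, the same function returns four products (Mozilla, AppleWebKit, Chrome and Safari) plus two comments. Only the Chrome product names the browser that is actually sending the string.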
Why Does This Matter?
In principle, the new JavaScript-heavy world should have cured us of this problem. People should write JS that tests for features before using them, and serves a less interesting version of the web page if the browser doesn’t support them. And, mostly, this is what happens! Libraries like jQuery have taken a lot of the hard work out of doing this, so most websites you’ll encounter nowadays do the right thing.
The problem is, sometimes they don’t. And when they don’t, you can encounter strange and confusing bugs. These bugs then tie up developer time and generally make everyone’s life worse. To provide an example, I’m going to briefly walk you through a bug that appeared on the Requests GitHub page a few days ago.
An Example
A user reported that when he accessed a specific web page with a simple GET request, nothing complicated going on, he was getting an httplib.IncompleteRead exception thrown in his face.
This was odd in itself. httplib raises this exception when it can’t read the response body it was promised, and in this case that pointed at chunked encoding, but the user reported that he didn’t think either party was using it. He also kindly provided the URL, so that I could reproduce the bug locally. (This is excellent practice, by the way: I’m far more likely to help out if I can easily reproduce your bug on my machine.)
When I made the same request, I also got the IncompleteRead exception thrown in my face. Further investigation showed that the web server claimed to be serving using chunked encoding, but in fact was just sending the page as normal. This is pretty bad, and there’s not much Requests can do about this: the web server is simply doing the wrong thing. First note for website developers: do NOT claim to be using chunked encoding when you are not!
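The failure is easy to reproduce offline. This sketch feeds Python 3’s http.client.HTTPResponse (the successor to the Python 2 httplib module in the bug report) two canned responses. Both announce Transfer-Encoding: chunked, but only the first one sends a chunked body. The FakeSocket class and the response bytes are made up for illustration. On the second response, the parser expects a hexadecimal chunk-size line, finds HTML instead, and raises IncompleteRead.

```python
import http.client
import io


class FakeSocket:
    """Just enough of a socket for HTTPResponse, which only calls makefile()."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def makefile(self, mode: str, *args, **kwargs) -> io.BytesIO:
        return io.BytesIO(self._data)


def read_body(raw_response: bytes) -> bytes:
    """Parse a raw HTTP/1.1 response and return its decoded body."""
    response = http.client.HTTPResponse(FakeSocket(raw_response))
    response.begin()  # reads the status line and headers
    return response.read()


# Correctly chunked: a size line ("5"), 5 bytes of data, then the zero-length last chunk.
GOOD = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n0\r\n\r\n"
)

# Claims chunked encoding, but the body is just the page.
BAD = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)

if __name__ == "__main__":
    print(read_body(GOOD))  # b'hello'
    try:
        read_body(BAD)
    except http.client.IncompleteRead as exc:
        print("IncompleteRead, partial data:", exc.partial)
```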
I was interested to see if we could get the page data anyway, so I patched my local copy of the standard library to see what we got when I returned the data instead of throwing an exception. What I saw was the second unpleasant thing this web site had done. The HTML for this page was about 20 lines long. All it did was embed, at full size, a frame containing another page, or a warning if your browser doesn’t support frames.
This is pretty obnoxious: why not just serve the other page? Why require frames? You aren’t even doing anything with them, you’re just using them for the sake of using them! Second note for website developers: do not use frames when you don’t need them! They are awkward for anything that isn’t a browser.
In an attempt to be helpful, I pulled the URL being framed out of the HTML and suggested the user hit that instead. Out of sheer curiosity, I then did a Requests GET on the URL.
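Pulling the framed URL out of a page like that is a job for an HTML parser rather than a regular expression. This is a minimal sketch using the standard library’s html.parser. The FrameSourceCollector class, the framed_urls helper and the sample page are hypothetical, because the real page from the bug report isn’t reproduced here.

```python
from html.parser import HTMLParser
from typing import List


class FrameSourceCollector(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def framed_urls(html: str) -> List[str]:
    collector = FrameSourceCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


SAMPLE = """<html>
<frameset rows="100%">
  <frame src="http://example.com/real-page.html">
  <noframes>Your browser does not support frames.</noframes>
</frameset>
</html>"""

if __name__ == "__main__":
    print(framed_urls(SAMPLE))  # ['http://example.com/real-page.html']
```

If a src attribute is relative, urllib.parse.urljoin(page_url, src) turns it into an absolute URL before you request it.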
Requests threw an exception again.
I was pretty surprised here: the page rendered fine in my browser. So I looked at the exception. “Connection Reset By Peer”, read the socket error text. For those who don’t know their network protocols, this indicates that the web server abruptly reset the TCP connection while we were still expecting data on it.
This is very odd. Requests sent a totally compliant, basic HTTP GET request, and the remote server was shutting the connection in response to it. Doing this flies in the face of the HTTP specification, which says that servers should always respond to at least one request per connection. A compliant server that wants to refuse a request should respond with an HTTP error code, plus a Connection: close header if it also wants to tear the connection down. Additionally, why did it work fine in Chrome but fail in Requests?
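For contrast, here is what a refusal that follows the specification looks like on the wire, and what a client gets out of it. The response bytes and the FakeSocket helper are made up for illustration. The parsing is done by Python 3’s http.client.HTTPResponse. Unlike a reset connection, the client ends up with a status code, a reason and a body to act on, and the Connection header tells it the server is about to hang up.

```python
import http.client
import io


class FakeSocket:
    """Minimal stand-in for a socket: HTTPResponse only calls makefile()."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def makefile(self, mode: str, *args, **kwargs) -> io.BytesIO:
        return io.BytesIO(self._data)


BODY = b"Sorry, this client is not supported.\n"

# A polite refusal: a real status code, Connection: close, and an explanatory body.
REFUSAL = (
    b"HTTP/1.1 403 Forbidden\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: %d\r\n"
    b"\r\n" % len(BODY)
) + BODY


def parse_response(raw: bytes):
    """Parse a raw HTTP/1.1 response; return (status, reason, Connection header, body)."""
    response = http.client.HTTPResponse(FakeSocket(raw))
    response.begin()
    body = response.read()
    return response.status, response.reason, response.getheader("Connection"), body


if __name__ == "__main__":
    print(parse_response(REFUSAL))
    # (403, 'Forbidden', 'close', b'Sorry, this client is not supported.\n')
```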
There’s really only one obvious thing to do. I grabbed Chrome’s User-Agent string and got Requests to send that instead of its own UA string. (For those who want to spoof their UA string, Requests allows you to pass it as a header. We only set one ourselves if you don’t provide one for us.)
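In Requests, overriding the header is a matter of passing headers={"User-Agent": ...} to requests.get(). So that this sketch runs with only the standard library and never touches the network, it does the equivalent with urllib.request.Request and inspects the request object instead of sending it. The URL is a placeholder, and spoofed_request is a hypothetical helper name.

```python
import urllib.request

CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.47 Safari/537.36"
)


def spoofed_request(url: str, user_agent: str = CHROME_UA) -> urllib.request.Request:
    """Build (but do not send) a GET request that claims to be Chrome 27."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = spoofed_request("http://example.com/")
    # urllib normalises header names with str.capitalize(), hence "User-agent".
    print(req.get_method(), req.full_url)
    print(req.get_header("User-agent"))
    # urllib.request.urlopen(req) would actually send it; omitted to stay offline.
```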
Success! The page rendered and returned to us.
For those who want a summary, what was happening here is that the remote site was sniffing the User-Agent header. Instead of checking for features, however, what it was doing was using the header as a gatekeeper! If you don’t have the right User-Agent, you don’t just get a less feature-filled site: you get nothing. Not even an HTTP error page.
This is probably the worst example of User-Agent sniffing I’ve ever seen. This was a website developer using a bad practice to violate the HTTP specification. In addition to simply being rude, this is also a genuine cost for many developers. And crap like this leads to stupid UA strings like the ones I showed above.
This is also the third note for website developers: always send HTTP error codes, don’t just close connections.
The Moral Of The Story
The most important lesson, however, is this.
Ignore the User-Agent string unless you absolutely have to.
Detecting browser features is not what the User-Agent string is for, so please don’t use it for that. And if you do (which I’m sure you will, because no-one listens to me anyway), make sure that you don’t refuse service based on the User-Agent. If you want to render a slightly different page, fine, I get that. But don’t refuse to render it at all. It’s obnoxious, it’s brittle, and it’s so 1990s. And besides, as I showed above, all modern User-Agents can lie in their User-Agent string! You can set it in Firefox, and in Chrome, and (probably) in IE, Safari and Opera as well. So not only are you misusing it, you’re not even getting accurate information!
Source: https://www.pybloggers.com/2013/04/user-agent-strings-or-dont-make-me-come-after-you/
用戶代理字符串
總結
以上是生活随笔為你收集整理的用户代理字符串_用户代理字符串(或者,不要让我追随您)的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: STEPS to Success – D
- 下一篇: otrs安装mysql_CentOS6.