php自动报价,使用PHP转换所有类型的智能报价
小編典典
您需要這樣的內容(假設輸入UTF-8,而忽略CJK(中文,日文,韓文)):
$chr_map = array(
// Windows codepage 1252
"\xC2\x82" => "'", // U+0082?U+201A single low-9 quotation mark
"\xC2\x84" => '"', // U+0084?U+201E double low-9 quotation mark
"\xC2\x8B" => "'", // U+008B?U+2039 single left-pointing angle quotation mark
"\xC2\x91" => "'", // U+0091?U+2018 left single quotation mark
"\xC2\x92" => "'", // U+0092?U+2019 right single quotation mark
"\xC2\x93" => '"', // U+0093?U+201C left double quotation mark
"\xC2\x94" => '"', // U+0094?U+201D right double quotation mark
"\xC2\x9B" => "'", // U+009B?U+203A single right-pointing angle quotation mark
// Regular Unicode // U+0022 quotation mark (")
// U+0027 apostrophe (')
"\xC2\xAB" => '"', // U+00AB left-pointing double angle quotation mark
"\xC2\xBB" => '"', // U+00BB right-pointing double angle quotation mark
"\xE2\x80\x98" => "'", // U+2018 left single quotation mark
"\xE2\x80\x99" => "'", // U+2019 right single quotation mark
"\xE2\x80\x9A" => "'", // U+201A single low-9 quotation mark
"\xE2\x80\x9B" => "'", // U+201B single high-reversed-9 quotation mark
"\xE2\x80\x9C" => '"', // U+201C left double quotation mark
"\xE2\x80\x9D" => '"', // U+201D right double quotation mark
"\xE2\x80\x9E" => '"', // U+201E double low-9 quotation mark
"\xE2\x80\x9F" => '"', // U+201F double high-reversed-9 quotation mark
"\xE2\x80\xB9" => "'", // U+2039 single left-pointing angle quotation mark
"\xE2\x80\xBA" => "'", // U+203A single right-pointing angle quotation mark
);
$chr = array_keys ($chr_map); // but: for efficiency you should
$rpl = array_values($chr_map); // pre-calculate these two arrays
$str = str_replace($chr, $rpl, html_entity_decode($str, ENT_QUOTES, "UTF-8"));
這里是背景:
每個Unicode字符都完全屬于一個“常規類別”,其中可以包含引號字符的字符如下:
(這些頁面可方便地檢查您是否沒有錯過任何內容-
類別索引)
有時在啟用Unicode的正則表達式中匹配這些類別很有用。
此外,Unicode字符具有“屬性”,您感興趣的是Quotation_Mark。不幸的是,這些不能在正則表達式中訪問。
在Wikipedia中,您可以找到具有Quotation_Mark屬性的字符組。最終參考是unicode.org上的PropList.txt,但這是一個ASCII文本文件。
如果您還需要翻譯CJK字符,則只需獲取它們的代碼點,確定其翻譯,并找到其UTF-8編碼,例如,通過在fileformat.info中查找(例如,對于U +
301E:http
://www.fileformat.info/info/unicode/char/301e/index.htm)。
關于Windows代碼頁1252:Unicode定義了前256個代碼點,以表示與ISO-8859-1完全相同的字符,但是ISO-8859-1通常與Windows代碼頁1252混淆,因此所有瀏覽器都呈現0x80-0x9F范圍,在ISO-8859-1中為“空”(更確切地說:它包含控制字符),就像Windows代碼頁1252一樣。Wikipedia頁中的表列出了Unicode等效項。
注意:strtr()通常比慢str_replace()。使用您的輸入和PHP版本為其計時。如果速度足夠快,您可以直接使用地圖,例如my $chr_map。
如果您不確定您的輸入是UTF-8編碼的,并且愿意假設輸入不是UTF-8編碼的,那么它是ISO-8859-1或Windows代碼頁1252,那么您可以在執行其他操作之前執行此操作:
if ( !preg_match('/^\\X*$/u', $str)) {
$str = utf8_encode($str);
}
警告:不過,這種正則表達式在極少數情況下可能無法檢測到非UTF-8編碼。例如:"Gru?…"/*CP-1252*/=="Gru\xDF\x85"看起來像此正則表達式的UTF-8(U
+ 07C5是N’ko數字5)。該正則表達式可以稍作增強,但是不幸的是,可以證明,對于編碼檢測問題,不存在完全可靠的解決方案。
如果要將Windows代碼頁1252產生的范圍0x80-0x9F標準化為常規Unicode代碼點,則可以執行此操作(并刪除$chr_map上面的第一部分):
$normalization_map = array(
"\xC2\x80" => "\xE2\x82\xAC", // U+20AC Euro sign
"\xC2\x82" => "\xE2\x80\x9A", // U+201A single low-9 quotation mark
"\xC2\x83" => "\xC6\x92", // U+0192 latin small letter f with hook
"\xC2\x84" => "\xE2\x80\x9E", // U+201E double low-9 quotation mark
"\xC2\x85" => "\xE2\x80\xA6", // U+2026 horizontal ellipsis
"\xC2\x86" => "\xE2\x80\xA0", // U+2020 dagger
"\xC2\x87" => "\xE2\x80\xA1", // U+2021 double dagger
"\xC2\x88" => "\xCB\x86", // U+02C6 modifier letter circumflex accent
"\xC2\x89" => "\xE2\x80\xB0", // U+2030 per mille sign
"\xC2\x8A" => "\xC5\xA0", // U+0160 latin capital letter s with caron
"\xC2\x8B" => "\xE2\x80\xB9", // U+2039 single left-pointing angle quotation mark
"\xC2\x8C" => "\xC5\x92", // U+0152 latin capital ligature oe
"\xC2\x8E" => "\xC5\xBD", // U+017D latin capital letter z with caron
"\xC2\x91" => "\xE2\x80\x98", // U+2018 left single quotation mark
"\xC2\x92" => "\xE2\x80\x99", // U+2019 right single quotation mark
"\xC2\x93" => "\xE2\x80\x9C", // U+201C left double quotation mark
"\xC2\x94" => "\xE2\x80\x9D", // U+201D right double quotation mark
"\xC2\x95" => "\xE2\x80\xA2", // U+2022 bullet
"\xC2\x96" => "\xE2\x80\x93", // U+2013 en dash
"\xC2\x97" => "\xE2\x80\x94", // U+2014 em dash
"\xC2\x98" => "\xCB\x9C", // U+02DC small tilde
"\xC2\x99" => "\xE2\x84\xA2", // U+2122 trade mark sign
"\xC2\x9A" => "\xC5\xA1", // U+0161 latin small letter s with caron
"\xC2\x9B" => "\xE2\x80\xBA", // U+203A single right-pointing angle quotation mark
"\xC2\x9C" => "\xC5\x93", // U+0153 latin small ligature oe
"\xC2\x9E" => "\xC5\xBE", // U+017E latin small letter z with caron
"\xC2\x9F" => "\xC5\xB8", // U+0178 latin capital letter y with diaeresis
);
$chr = array_keys ($normalization_map); // but: for efficiency you should
$rpl = array_values($normalization_map); // pre-calculate these two arrays
$str = str_replace($chr, $rpl, $str);
2020-05-29
總結
以上是生活随笔為你收集整理的php自动报价,使用PHP转换所有类型的智能报价的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: java 无法加载资源,JavaScri
- 下一篇: 耳机电声测试仪软件,杭州爱华 AWA61