为什么我们需要使用Pandas新字符串Dtype代替文本数据对象
We have to represent every bit of data in numerical values to be processed and analyzed by machine learning and deep learning models. However, strings do not usually come in a nice and clean format and require a lot preprocessing.
我們必須以數(shù)值表示數(shù)據(jù)的每一位,以便通過機(jī)器學(xué)習(xí)和深度學(xué)習(xí)模型進(jìn)行處理和分析。 但是,字符串通常不會采用簡潔的格式,并且需要大量預(yù)處理。
Pandas provides numerous functions and methods to process textual data. In this post, we will focus on data types for strings rather than string operations. Using appropriate data types is the first step to make most out of Pandas. There are currently two data types for textual data, object and StringDtype.
熊貓?zhí)峁┝硕喾N功能和方法來處理文本數(shù)據(jù)。 在本文中,我們將重點(diǎn)介紹字符串的數(shù)據(jù)類型,而不是字符串操作。 使用適當(dāng)?shù)臄?shù)據(jù)類型是充分利用Pandas的第一步。 當(dāng)前,文本數(shù)據(jù)有兩種數(shù)據(jù)類型: object和StringDtype。
Before pandas 1.0, only “object” datatype was used to store strings which cause some drawbacks because non-string data can also be stored using “object” datatype. Pandas 1.0 introduces a new datatype specific to string data which is StringDtype. As of now, we can still use object or StringDtype to store strings but in the future, we may be required to only use StringDtype.
在pandas 1.0之前,僅使用“對象”數(shù)據(jù)類型來存儲字符串,這會導(dǎo)致一些缺點(diǎn),因?yàn)榉亲址當(dāng)?shù)據(jù)也可以使用“對象”數(shù)據(jù)類型來存儲。 Pandas 1.0引入了特定于字符串?dāng)?shù)據(jù)的新數(shù)據(jù)類型StringDtype 。 到目前為止,我們?nèi)匀豢梢允褂胦bject或StringDtype來存儲字符串,但是將來,可能會要求我們僅使用StringDtype。
One important thing to note here is that object datatype is still the default datatype for strings. To use StringDtype, we need to explicitly state it.
這里要注意的一件事是對象數(shù)據(jù)類型仍然是字符串的默認(rèn)數(shù)據(jù)類型。 要使用StringDtype,我們需要明確聲明它。
We can pass “string” or pd.StringDtype() argument to dtype parameter to select string datatype.
我們可以將“ string ”或pd.StringDtype()參數(shù)傳遞給dtype參數(shù)以選擇字符串?dāng)?shù)據(jù)類型。
We can also convert from “object” to “string” data type using astype function:
我們還可以使用astype函數(shù)將“ object”數(shù)據(jù)類型轉(zhuǎn)換為“ string”數(shù)據(jù)類型:
Although the default type is “object”, it is recommended to use “string” for a few reasons.
盡管默認(rèn)類型為“對象”,但出于一些原因,建議使用“字符串”。
- Object data type has a broader scope and allows to store pretty much anything. Thus, even if we have non-strings in a place that is supposed to be a string, we don’t get any error. 對象數(shù)據(jù)類型的范圍更廣,可以存儲幾乎所有內(nèi)容。 因此,即使我們在應(yīng)該是字符串的地方放置了非字符串,也不會出現(xiàn)任何錯誤。
- It is always better to have a dedicated data type. For instance, if we try to the example above with “string” data type, we get a TypeError. 最好使用專用的數(shù)據(jù)類型。 例如,如果我們嘗試上面的示例使用“字符串”數(shù)據(jù)類型,則會得到TypeError。
Having a dedicated data type allows for data type specific operations. For instance, we cannot use select_dtypes to choose only text columns if “object” data type is used. Select_dtypes(include=”object”) will return any column with object data type. On the other hand, if we use “string” data type for textual data, select_dtypes(include=”string”) will give just what we need.
具有專用數(shù)據(jù)類型允許進(jìn)行特定于數(shù)據(jù)類型的操作。 例如,如果使用“對象”數(shù)據(jù)類型,則不能使用select_dtypes僅選擇文本列。 Select_dtypes(include =“ object”)將返回任何具有對象數(shù)據(jù)類型的列。 另一方面,如果我們對文本數(shù)據(jù)使用“字符串”數(shù)據(jù)類型,則select_dtypes(include =“ string”)會滿足我們的需求。
“String” data type is not superior to “object” in terms of performance as of now. However, it is expected, with future enhancements, the performance of “string” data type will be increased and the memory consumption will be decreased. Thus, we should already be using “string” instead of “object” for textual data.
到目前為止,就性能而言,“字符串”數(shù)據(jù)類型并不優(yōu)于“對象”。 但是,可以預(yù)料,隨著將來的增強(qiáng),“字符串”數(shù)據(jù)類型的性能將得到提高,內(nèi)存消耗將減少。 因此,我們應(yīng)該已經(jīng)在文本數(shù)據(jù)中使用“字符串”而不是“對象”。
Thank you for reading. Please let me know if you have any feedback.
感謝您的閱讀。 如果您有任何反饋意見,請告訴我。
翻譯自: https://towardsdatascience.com/why-we-need-to-use-pandas-new-string-dtype-instead-of-object-for-textual-data-6fd419842e24
總結(jié)
以上是生活随笔為你收集整理的为什么我们需要使用Pandas新字符串Dtype代替文本数据对象的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 女人梦到吃饼是什么意思
- 下一篇: 梦到用水灭火什么预兆