Python out of memory, dict alternative: a 2D array representing a huge Python dict, a COOrdinate-like solution to save memory...
I am trying to update dict_with_tuples_key with the data from an array:
import numpy as np

myarray = np.array([[0, 0],  # 0, 1
                    [0, 1],
                    [1, 1],  # 1, 2
                    [1, 2],  # 1, 3
                    [2, 2],
                    [1, 3]])  # in practice, many rows like these: shape ~ (10e6, 2)
dict_with_tuples_key = {(0, 1): 1,
                        (3, 7): 1}  # ~10e6 keys
Using an array to store the dict values (thanks to @MSeifert), we got this:
from numba import njit

def convert_dict_to_darray(dict_with_tuples_key, myarray):
    # Largest row and column index found in the array and in the dict keys
    idx_max_array = np.max(myarray, axis=0)
    # list(...) is needed on Python 3, where keys() returns a view object
    idx_max_dict = np.max(list(dict_with_tuples_key.keys()), axis=0)
    lens = np.max([list(idx_max_array), list(idx_max_dict)], axis=0)
    xlen, ylen = lens[0] + 1, lens[1] + 1
    darray = np.zeros((xlen, ylen))  # dense array large enough to hold every index
    for key, value in dict_with_tuples_key.items():
        darray[key] = value
    return darray

@njit
def update_darray(darray, myarray):
    elements = myarray.shape[0]
    for i in range(elements):
        # Count one occurrence of the pair (row, column)
        darray[myarray[i, 0], myarray[i, 1]] += 1
    return darray

def darray_to_dict(darray):
    updated_dict = {}
    # np.nonzero returns the row indices and column indices of the non-zero cells
    keys = zip(*map(list, np.nonzero(darray)))
    for x, y in keys:
        updated_dict[(x, y)] = darray[x, y]
    return updated_dict

darray = convert_dict_to_darray(dict_with_tuples_key, myarray)
darray = update_darray(darray, myarray)
I get exactly the result I need:
# print(darray_to_dict(darray))
# {(0, 1): 2.0,
# (0, 0): 1.0,
# (1, 1): 1.0,
# (2, 2): 1.0,
# (1, 2): 1.0,
# (1, 3): 1.0,
# (3, 7): 1.0, }
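For small matrices this works well, and @njit can be applied to it, so it is very fast.
But...
Creating the huge empty darray = np.zeros((xlen, ylen)) does not fit in memory. How can we avoid allocating a very sparse array and store only the non-empty values, in COOrdinate format (as a sparse matrix does)?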
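To make the memory problem concrete, here is a rough back-of-the-envelope sketch. np.zeros defaults to float64, which takes 8 bytes per cell, so the dense array costs xlen * ylen * 8 bytes no matter how few cells are non-zero. The helper name dense_array_bytes and the 1,000,000 x 1,000,000 index range are assumptions chosen for illustration; the question does not state the actual index range.

```python
def dense_array_bytes(xlen, ylen, itemsize=8):
    """Bytes needed by a dense (xlen, ylen) array; itemsize=8 matches float64."""
    return xlen * ylen * itemsize

# Hypothetical index range: indices up to about one million on each axis.
needed = dense_array_bytes(1_000_000, 1_000_000)
print(needed / 1e12, "TB")  # 8.0 TB, far more than the ~10e6 stored keys would need
```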
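Best answer: Use dok_matrix from scipy. A dok_matrix is a dictionary-of-keys based sparse matrix. It lets you build a sparse matrix incrementally, and it does not allocate the huge empty darray = np.zeros((xlen, ylen)) that does not fit in your computer's memory.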
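As a small illustration of that incremental construction, the sketch below fills a dok_matrix entry by entry and then reads the stored entries back in COOrdinate form. The variable names, the tiny (4, 8) shape, and the int64 dtype are assumptions made for the example, not part of the original answer.

```python
import numpy as np
from scipy.sparse import dok_matrix

# Hypothetical tiny shape; only the entries assigned below are actually stored.
counts = dok_matrix((4, 8), dtype=np.int64)
counts[0, 1] = 1      # value copied from the dict
counts[3, 7] = 1
counts[0, 1] += 1     # one more occurrence of the pair (0, 1)

# COOrdinate view of the same data: parallel row, col and data arrays.
coo = counts.tocoo()
stored = {(int(r), int(c)): int(v) for r, c, v in zip(coo.row, coo.col, coo.data)}
print(counts.nnz, stored)
```

Only two entries are kept in memory here, regardless of the shape passed to dok_matrix.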
The only changes needed are to import the right module from scipy and to change the definition of darray in the function convert_dict_to_darray.
It looks like this:
import numpy as np
from scipy.sparse import dok_matrix

def convert_dict_to_darray(dict_with_tuples_key, myarray):
    idx_max_array = np.max(myarray, axis=0)
    # list(...) is needed on Python 3, where keys() returns a view object
    idx_max_dict = np.max(list(dict_with_tuples_key.keys()), axis=0)
    lens = np.max([list(idx_max_array), list(idx_max_dict)], axis=0)
    xlen, ylen = lens[0] + 1, lens[1] + 1
    darray = dok_matrix((xlen, ylen))  # sparse: only assigned entries are stored
    for key, value in dict_with_tuples_key.items():
        darray[key[0], key[1]] = value
    return darray
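One caveat: as far as I know, numba's @njit does not accept scipy sparse matrices, so the update step would have to run without the decorator or be done another way. The sketch below is one possible alternative, not the method from the answer above. It collapses myarray into COO-style triplets (row, column, count) with np.unique and merges them into the dict directly, so no 2D array is allocated at all. The function name update_dict_from_pairs is hypothetical.

```python
import numpy as np

def update_dict_from_pairs(dict_with_tuples_key, myarray):
    """Add 1 to dict[(i, j)] for every row (i, j) of myarray, updating the dict in place."""
    # pairs: each distinct (i, j) row; counts: how many times that row occurs.
    pairs, counts = np.unique(myarray, axis=0, return_counts=True)
    for (i, j), n in zip(pairs.tolist(), counts.tolist()):
        dict_with_tuples_key[(i, j)] = dict_with_tuples_key.get((i, j), 0) + n
    return dict_with_tuples_key

myarray = np.array([[0, 0], [0, 1], [1, 1], [1, 2], [2, 2], [1, 3]])
result = update_dict_from_pairs({(0, 1): 1, (3, 7): 1}, myarray)
print(result)
```

The dict is modified in place so that a second copy of ~10e6 keys is not created. The values stay plain Python ints rather than the floats produced by the dense version.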