python - Populate a Pandas SparseDataFrame from a SciPy Sparse Matrix
I noticed Pandas has support for sparse matrices and arrays. Currently, I create DataFrame()s like this:
return DataFrame(matrix.toarray(), columns=features, index=observations)
Is there a way to create a SparseDataFrame() from a scipy.sparse.csc_matrix() or csr_matrix()? Converting to dense format kills RAM badly. Thanks!
A direct conversion is not supported ATM. Contributions welcome!
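A hedged note for later pandas versions: SparseDataFrame was deprecated in pandas 0.25 and removed in 1.0, and pandas 0.25 added `DataFrame.sparse.from_spmatrix` for a direct conversion that builds one sparse column per matrix column without densifying. The sketch below assumes pandas >= 0.25 and SciPy; `features` and `observations` are hypothetical placeholder labels standing in for the question's variables.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix

# Small 3x3 example matrix built from (data, (row, col)) triplets.
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype="float64")
m = csc_matrix((data, (row, col)), shape=(3, 3))

# Hypothetical labels standing in for `features` / `observations` in the question.
features = ["f0", "f1", "f2"]
observations = ["obs0", "obs1", "obs2"]

# Direct conversion (pandas >= 0.25): every column becomes a SparseArray
# holding only the stored values, so the full dense matrix is never created.
df = pd.DataFrame.sparse.from_spmatrix(m, index=observations, columns=features)

print(df.dtypes)           # each column has a Sparse[float64, 0] dtype
print(df.sparse.density)   # fraction of values actually stored
```

`df.sparse.to_dense()` converts back when a dense frame is really needed.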
Try this. It should be OK on memory, because a SparseSeries is much like a csc_matrix (for one column) and is pretty space-efficient:
In [35]: import numpy as np; import pandas as pd; from scipy.sparse import csc_matrix

In [36]: row = np.array([0,2,2,0,1,2])  # row indices implied by Out[46]

In [37]: col = np.array([0,0,1,2,2,2])

In [38]: data = np.array([1,2,3,4,5,6],dtype='float64')

In [39]: m = csc_matrix( (data,(row,col)), shape=(3,3) )

In [40]: m
Out[40]:
<3x3 sparse matrix of type '<type 'numpy.float64'>'
        with 6 stored elements in Compressed Sparse Column format>

In [46]: pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel())
                              for i in np.arange(m.shape[0]) ])
Out[46]:
   0  1  2
0  1  0  4
1  0  0  5
2  2  3  6

In [47]: df = pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel())
                                   for i in np.arange(m.shape[0]) ])

In [48]: type(df)
Out[48]: pandas.sparse.frame.SparseDataFrame
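SparseSeries and SparseDataFrame no longer exist in pandas >= 1.0, so here is a sketch of the same slice-at-a-time idea for newer versions, assuming pandas >= 1.0: build one `pd.arrays.SparseArray` per CSC column, so at most one dense column is in memory at a time. This is an illustration under those assumptions, not the answer's original code; where available, `pd.DataFrame.sparse.from_spmatrix` (pandas >= 0.25) skips even the per-column densification.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix

# Same 3x3 example matrix as in the session above.
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype="float64")
m = csc_matrix((data, (row, col)), shape=(3, 3))

columns = {}
for j in range(m.shape[1]):
    # Slicing a csc_matrix by column is cheap; only this one column is densified.
    dense_col = m[:, [j]].toarray().ravel()
    # SparseArray keeps just the non-fill values plus their positions.
    columns[j] = pd.arrays.SparseArray(dense_col, fill_value=0.0)

df = pd.DataFrame(columns)
print(df)
print(df.dtypes)
```

Each column's stored-value count (`df[j].array.npoints`) matches that column's nonzero count in the CSC matrix, which is the space-efficiency argument made above.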