parallel processing - Is the multiprocessing module of Python the right way to speed up large numeric calculations?


I have a strong background in numeric computation using Fortran, where parallelization with OpenMP was easy enough to use on many problems. I switched to Python since it is more fun (at least for me) to develop with, but parallelizing numeric tasks seems much more tedious than with OpenMP. I am interested in loading large (tens of GB) data sets into main memory and manipulating them in parallel while keeping only a single copy of the data in main memory (shared data). I started to use the Python module multiprocessing and came up with this generic example:

#test cases
#python parallel_python_example.py 1000 1000
#python parallel_python_example.py 10000 50

import sys
import numpy as np
import time
import multiprocessing
import operator

n_dim = int(sys.argv[1])
n_vec = int(sys.argv[2])

#class contains the large dataset and the computationally heavy routine
class compute:
    def __init__(self, n_dim, n_vec):
        self.large_matrix = np.random.rand(n_dim, n_dim)  # define a large random matrix
        self.many_vectors = np.random.rand(n_vec, n_dim)  # define many random vectors, organized in a matrix
    def dot(self, a, b):  # don't use numpy, it would run on a single core only!!
        return sum(p * q for p, q in zip(a, b))
    def __call__(self, ii):  # use __call__ so the computation can be handled by multiprocessing (pickle)
        vector = self.dot(self.large_matrix, self.many_vectors[ii, :])  # compute the product of one of the vectors and the matrix
        return self.dot(vector, vector)  # return the "length" of the resulting vector

#initialize data
comp = compute(n_dim, n_vec)

#single core
tt = time.time()
result = [comp(ii) for ii in range(n_vec)]
time_single = time.time() - tt
print "time:", time_single

#multi core
for prc in [1, 2, 4, 10]:  # the high process count is there to check that large_matrix is held only once in main memory
    tt = time.time()
    pool = multiprocessing.Pool(processes=prc)
    result = pool.map(comp, range(n_vec))
    pool.terminate()
    time_multi = time.time() - tt
    print "time using %2i processes. time: %10.5f, speedup:%10.5f" % (prc, time_multi, time_single / time_multi)

I ran two test cases on my machine (64-bit Linux, Fedora 18) with the following results:

andre@lot:python>python parallel_python_example.py 10000 50
time: 10.3667809963
time using  1 processes. time:   15.75869, speedup:   0.65785
time using  2 processes. time:   11.62338, speedup:   0.89189
time using  4 processes. time:   15.13109, speedup:   0.68513
time using 10 processes. time:   31.31193, speedup:   0.33108
andre@lot:python>python parallel_python_example.py 1000 1000
time: 4.9363951683
time using  1 processes. time:    5.14456, speedup:   0.95954
time using  2 processes. time:    2.81755, speedup:   1.75201
time using  4 processes. time:    1.64475, speedup:   3.00131
time using 10 processes. time:    1.60147, speedup:   3.08242

My question is: am I misusing the multiprocessing module here? Or is this just the way it goes with Python (i.e. don't parallelize within Python and rely entirely on numpy's optimizations)?

While there is no general answer to the question in your title, I think it is valid to say that multiprocessing alone is not the key to great number-crunching performance in Python.

In principle, however, Python (plus third-party modules) is awesome for number crunching. Find the right tools and you will be amazed. Most of the time, I am pretty sure, you will get better performance while writing (much!) less code than you needed before when doing it manually in Fortran. You just have to use the right tools and approaches. This is a broad topic, but here are a few random things that might interest you:

  • You can compile numpy and scipy against Intel MKL and OpenMP (or maybe a sysadmin in your facility already did so). That way, many linear algebra operations automatically use multiple threads and get the best out of your machine. This is really awesome and probably underestimated so far. Get your hands on a properly compiled numpy and scipy! (See the snippet after this list for how to check what your numpy is linked against.)

  • multiprocessing should be understood as a useful tool for managing multiple more or less independent processes. Communication among these processes has to be explicitly programmed, and it happens through pipes. Processes that talk a lot to each other spend most of their time talking instead of number crunching. Hence, multiprocessing is best used when the transmission time for input and output data is small compared to the computing time. There are tricks, though: you can for instance make use of Linux' fork() behavior and share large amounts of memory (read-only!) among multiple multiprocessing processes without having to pass the data around through pipes (a minimal sketch is shown after this list). You might want to have a look at https://stackoverflow.com/a/17786444/145400.

  • Cython has already been mentioned; you can use it in special situations and replace performance-critical parts of your Python program with compiled code.
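Regarding the first point: a quick way to see whether your numpy build is linked against a threaded BLAS/LAPACK (MKL, OpenBLAS, ...) is numpy.show_config(). The thread count is then usually steered via environment variables set before Python starts; which variable applies (OMP_NUM_THREADS, MKL_NUM_THREADS, ...) depends on the BLAS your numpy was built with, so treat the ones below as the common candidates rather than a guarantee:

import numpy as np

# check which BLAS/LAPACK numpy was compiled against
np.show_config()          # look for mkl_info / openblas_info sections

# a threaded BLAS is typically controlled via environment variables,
# set *before* starting Python, e.g. in the shell:
#   export OMP_NUM_THREADS=4      # OpenMP-based builds
#   export MKL_NUM_THREADS=4      # MKL builds

# with such a build, a plain matrix product already runs on multiple cores:
a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
c = np.dot(a, b)          # dispatched to the compiled, multithreaded BLAS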

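Regarding the fork() trick from the second point: on Linux, data that already exists when the Pool is created is inherited by the worker processes via copy-on-write, so a large read-only array does not have to be pickled and pushed through pipes. A minimal sketch, under that assumption (the names shared_matrix and row_norm are made up for illustration; see the linked answer for details and caveats):

import numpy as np
import multiprocessing

# module-level data, created *before* the pool is forked;
# on Linux the workers inherit it via copy-on-write, so it is
# not duplicated as long as it is only read
shared_matrix = np.random.rand(5000, 5000)

def row_norm(ii):
    # only the integer index travels through the pipe;
    # the big matrix is read from the inherited memory
    row = shared_matrix[ii, :]
    return np.dot(row, row)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    result = pool.map(row_norm, range(shared_matrix.shape[0]))
    pool.close()
    pool.join()
    print(result[:3])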
I did not comment on the details of your code, because (a) it is not very readable (please get used to PEP8 when writing Python code :-)) and (b) I think, especially regarding number crunching, it depends on the problem what the right solution is. You have already observed in your benchmark what I have outlined above: in the context of multiprocessing, it is especially important to keep an eye on the communication overhead.

Generally spoken, you should try to find a way from within Python to control compiled code to do the heavy work for you. Numpy and scipy provide great interfaces for that.
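As an illustration of that last point, and if I read your benchmark correctly, the whole per-vector loop in your example can be pushed into numpy itself. A sketch of what that could look like (assuming your custom dot on a 2-D first argument is meant as a matrix-vector product, as it effectively is):

import numpy as np

n_dim, n_vec = 10000, 50
large_matrix = np.random.rand(n_dim, n_dim)
many_vectors = np.random.rand(n_vec, n_dim)

# one matrix-matrix product instead of a Python-level loop over vectors;
# with a threaded BLAS this already runs on multiple cores
products = many_vectors.dot(large_matrix)   # shape (n_vec, n_dim)

# squared "length" of each resulting vector, as in the original benchmark
result = (products ** 2).sum(axis=1)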

