python - Is it worth using IPython parallel with scipy's eig?


I'm writing code that has to compute a large number of eigenvalue problems (typical matrix dimension is a few hundred). I was wondering whether it is possible to speed up the process using the IPython.parallel module. As a former MATLAB user and a Python newbie I was looking for something similar to MATLAB's parfor...

Following some tutorials online I wrote a simple piece of code to check if it speeds up the computation at all, and found out that it doesn't, and often actually slows it down (it is case dependent). I think I might be missing a point here: maybe scipy.linalg.eig is implemented in such a way that it already uses all the cores available, so that by trying to parallelise it I only interrupt the engine management.

Here is the 'parallel' code:

import numpy as np
from scipy.linalg import eig
from IPython import parallel

# Create matrices
matrix_size = 300
matrices = {}

for i in range(100):
    matrices[i] = np.random.rand(matrix_size, matrix_size)

rc = parallel.Client()
lview = rc.load_balanced_view()
results = {}

# Compute eigenvalues asynchronously on the engines
for i in range(len(matrices)):
    asyncresult = lview.apply(eig, matrices[i], right=False)
    results[i] = asyncresult

# Collect the results
for i, asyncresult in results.iteritems():
    results[i] = asyncresult.get()

The non-parallelised variant:

# No parallel
for i in range(len(matrices)):
    results[i] = eig(matrices[i], right=False)

The difference in CPU time between the two is subtle. If on top of the eigenvalue problem the parallelised function has to do a few more matrix operations it starts to take forever, i.e. at least 5 times longer than the non-parallelised variant.

Am I right that eigenvalue problems are not suited to this kind of parallelisation, or am I missing the whole point?

Many thanks!

Edited 29 Jul 2013; 12:20 BST

Following moarningsun's suggestion I tried running eig while fixing the number of threads with mkl.set_num_threads. For a 500-by-500 matrix the minimum times of 50 repetitions came out as follows:

No. of threads    Minimum time (timeit)    CPU usage (Task Manager)
====================================================================
1                 0.4513775764796151       12-13%
2                 0.36869288559927327      25-27%
3                 0.34014644287680085      38-41%
4                 0.3380558903450037       49-53%
5                 0.33508234276183657      49-53%
6                 0.3379019065051807       49-53%
7                 0.33858615048501406      49-53%
8                 0.34488405094054997      49-53%
9                 0.33380300334101776      49-53%
10                0.3288481198342197       49-53%
11                0.3512653110685733       49-53%
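For reference, the timing loop could look something like this (a minimal sketch of my setup, not the exact script; it assumes the mkl service module from an MKL-linked Python distribution is importable):

import numpy as np
import timeit
from scipy.linalg import eig
from mkl import set_num_threads  # assumption: MKL-linked distribution

a = np.random.rand(500, 500)  # one 500-by-500 test matrix

for n_threads in range(1, 12):
    set_num_threads(n_threads)
    # minimum over 50 single-run repetitions, as in the table above
    t = min(timeit.repeat(lambda: eig(a, right=False), repeat=50, number=1))
    print(n_threads, t)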

Apart from the one-thread case there is no substantial difference (maybe a sample of 50 is a bit small...). I still think I'm missing a point and a lot could be done to improve the performance, but I'm not really sure how. These were run on a 4-core machine with hyperthreading enabled, giving 4 virtual cores.
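As a side note, a quick way to check how many virtual cores Python sees is multiprocessing.cpu_count, which counts logical cores, hyperthreaded ones included:

import multiprocessing
print(multiprocessing.cpu_count())  # number of logical (virtual) cores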

Thanks for your input!

Interesting problem. Because I think it should be possible to achieve better scaling, I investigated the performance with a small "benchmark". With this test I compared the performance of single- and multi-threaded eig (multi-threading being delivered through MKL's LAPACK/BLAS routines) against IPython-parallelised eig. To see what difference they make, I varied the view type, the number of engines and MKL threading, as well as the method of distributing the matrices over the engines.

Here are the results on an old AMD dual-core system:

m_size=300, n_mat=64, repeat=3
+------------------------------------+----------------------+
| settings                           | speedup factor       |
+--------+------+------+-------------+-----------+----------+
| func   | neng | nmkl | view type   | vs single | vs multi |
+--------+------+------+-------------+-----------+----------+
| ip_map |    2 |    1 | direct_view |      1.67 |     1.62 |
| ip_map |    2 |    1 |  loadb_view |      1.60 |     1.55 |
| ip_map |    2 |    2 | direct_view |      1.59 |     1.54 |
| ip_map |    2 |    2 |  loadb_view |      0.94 |     0.91 |
| ip_map |    4 |    1 | direct_view |      1.69 |     1.64 |
| ip_map |    4 |    1 |  loadb_view |      1.61 |     1.57 |
| ip_map |    4 |    2 | direct_view |      1.15 |     1.12 |
| ip_map |    4 |    2 |  loadb_view |      0.88 |     0.85 |
| parfor |    2 |    1 | direct_view |      0.81 |     0.79 |
| parfor |    2 |    1 |  loadb_view |      1.61 |     1.56 |
| parfor |    2 |    2 | direct_view |      0.71 |     0.69 |
| parfor |    2 |    2 |  loadb_view |      0.94 |     0.92 |
| parfor |    4 |    1 | direct_view |      0.41 |     0.40 |
| parfor |    4 |    1 |  loadb_view |      1.62 |     1.58 |
| parfor |    4 |    2 | direct_view |      0.34 |     0.33 |
| parfor |    4 |    2 |  loadb_view |      0.90 |     0.88 |
+--------+------+------+-------------+-----------+----------+

As you can see the performance gain varies with the settings used, up to a maximum of 1.64 times that of the regular multi-threaded eig. In these results the parfor function you used performs badly unless MKL threading is disabled on the engines (using view.apply_sync(mkl.set_num_threads, 1)).
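For clarity, disabling MKL threading on all engines before distributing work amounts to something like this (a minimal sketch, assuming the mkl module is importable on every engine):

from IPython.parallel import Client
from mkl import set_num_threads

rc = Client()
view = rc.direct_view()              # a view on all engines
view.apply_sync(set_num_threads, 1)  # one MKL thread per engine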

Varying the matrix size also gives noteworthy differences. Here is the speedup of using ip_map on a direct_view with 4 engines and MKL threading disabled, vs the regular multi-threaded eig:

n_mat=32, repeat=3
+--------+----------+
| m_size | vs multi |
+--------+----------+
|     50 |     0.78 |
|    100 |     1.44 |
|    150 |     1.71 |
|    200 |     1.75 |
|    300 |     1.68 |
|    400 |     1.60 |
|    500 |     1.57 |
+--------+----------+

Apparently for relatively small matrices there is a performance penalty, for intermediate sizes the speedup is largest, and for larger matrices the speedup decreases again. In my opinion a performance gain of 1.75 would make using IPython.parallel worthwhile.

I did some tests earlier on an Intel dual-core laptop as well, but I got some funny results; apparently the laptop was overheating. On that system the speedups were generally a little lower, around 1.5-1.6 max.

So now I think the answer to your question should be: it depends. The performance gain depends on the hardware, the BLAS/LAPACK library, the problem size and the way IPython.parallel is deployed, among other things perhaps that I'm not aware of. And, last but not least, whether it's worth it depends on how much of a performance gain you think is worthwhile.

The code I used:

from __future__ import print_function
from numpy.random import rand
from IPython.parallel import Client
from mkl import set_num_threads
from timeit import default_timer as clock
from scipy.linalg import eig
from functools import partial
from itertools import product

eig = partial(eig, right=False)  # desired keyword arg as standard

class Bench(object):
    def __init__(self, m_size, n_mat, repeat=3):
        self.n_mat = n_mat
        self.matrix = rand(n_mat, m_size, m_size)
        self.repeat = repeat
        self.rc = Client()

    def map(self):
        # single-process baseline: plain map over the matrices
        results = map(eig, self.matrix)

    def ip_map(self):
        # distribute via the view's built-in map
        results = self.view.map_sync(eig, self.matrix)

    def parfor(self):
        # distribute via individual async applies, then collect
        results = {}
        for i in range(self.n_mat):
            results[i] = self.view.apply_async(eig, self.matrix[i,:,:])
        for i in range(self.n_mat):
            results[i] = results[i].get()

    def timer(self, func):
        t = clock()
        func()
        return clock() - t

    def run(self, func, n_engines, n_mkl, view_method):
        self.view = view_method(range(n_engines))
        self.view.apply_sync(set_num_threads, n_mkl)
        set_num_threads(n_mkl)
        return min(self.timer(func) for _ in range(self.repeat))

    def run_all(self):
        funcs = self.ip_map, self.parfor
        n_engines = 2, 4
        n_mkls = 1, 2
        views = self.rc.direct_view, self.rc.load_balanced_view
        times = []
        for n_mkl in n_mkls:
            args = self.map, 0, n_mkl, views[0]
            times.append(self.run(*args))
        for args in product(funcs, n_engines, n_mkls, views):
            times.append(self.run(*args))
        return times
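The post-processing isn't shown above; here is a hypothetical usage sketch, assuming the speedup factors in the tables are computed against the two map baselines that run_all times first:

# Hypothetical driver (not part of the benchmark class itself)
bench = Bench(m_size=300, n_mat=64, repeat=3)
times = bench.run_all()
t_single, t_multi = times[0], times[1]  # map with 1 and 2 MKL threads
for t in times[2:]:
    print('vs single: %.2f  vs multi: %.2f' % (t_single / t, t_multi / t))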

I don't know if it matters, but to start the 4 IPython parallel engines I typed at the command line:

ipcluster start -n 4 

Hope this helps :)

