cloud - Why is HBase performance with Hadoop MapReduce slow?

- February 15, 2010

I have configured Hadoop 1.0.3 on 3 machines in distributed mode. The following processes are running on the machines:

1) 4316 SecondaryNameNode, 4006 NameNode, 4159 DataNode, 4619 TaskTracker, 4425 JobTracker

2) 2794 TaskTracker, 2672 DataNode

3) 3338 DataNode, 3447 TaskTracker

When I run a simple MapReduce job on this cluster, it takes a long time to execute, so I installed HBase as a layer on top of Hadoop. The following HBase processes now run on the 3 machines:

1) 5115 HQuorumPeer, 5198 HMaster, 5408 HRegionServer

2) 3719 HRegionServer, 3617 HQuorumPeer

3) 2937 HQuorumPeer, 2719 HRegionServer

When I run a MapReduce job on HBase, 1,00,000 records take 1 minute, and it is the same for 1,00,00,000 records. I want the result in milliseconds. What steps should I take to improve this?

I am a newbie, so please help me out, or suggest a layer on top of HBase or Hadoop that can return results in milliseconds.

I am summarizing the records below:

hbase(main):007:0> describe 'weblog'
DESCRIPTION                                                      ENABLED
 'weblog', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE',       true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
 VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0',
 TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}

The weblog table has two columns, info:category and info:hits.

info:category   info:hits
web             2
mail            10
ftp             1
web             3
mail            11
ftp             2

The data is summarized with MapReduce and stored in the table.

hbase(main):004:0> put 'weblog', 'row1', 'info:category', 'web'
0 row(s) in 0.0560 seconds

hbase(main):004:0> put 'weblog', 'row1', 'info:hits', '2'
0 row(s) in 0.0560 seconds

Please help me with this. I have googled a lot but was not able to find anything that helps.

Hadoop, or any other batch processing system for that matter, is not a suitable choice if you have real-time needs or need performance in the ~ms range. No matter how good your h/w is or how efficient your MR job is, there will always be some initial delay when you run an MR job, and it is unavoidable. The reason is that when you submit an MR job, a lot of things happen before the processing actually starts, such as checking the input path, creating the splits, creating the map tasks, etc.
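To make the fixed-delay argument concrete, here is a minimal sketch that fits the simple model elapsed = overhead + perRecord * records to two timings. It assumes the figures in the question mean exactly 60 seconds for both 1,00,000 and 1,00,00,000 records, which is only a reading of "1 minute"; the class and method names are made up for illustration.

```java
// Illustrative only: fits elapsed = overhead + perRecord * records
// to two timing measurements. The 60-second values are an assumption
// based on the "1 minute" reported in the question.
class JobTimeModel {
    /** Per-record cost in seconds, from two (records, seconds) measurements. */
    static double perRecordSeconds(long n1, double t1, long n2, double t2) {
        return (t2 - t1) / (double) (n2 - n1);
    }

    /** Fixed overhead in seconds: what remains after removing per-record work. */
    static double overheadSeconds(long n1, double t1, double perRecord) {
        return t1 - perRecord * n1;
    }

    public static void main(String[] args) {
        long small = 100_000L;     // 1,00,000 records
        long large = 10_000_000L;  // 1,00,00,000 records
        double tSmall = 60.0;      // assumed: "1 minute"
        double tLarge = 60.0;      // assumed: reported as the same

        double perRecord = perRecordSeconds(small, tSmall, large, tLarge);
        double overhead = overheadSeconds(small, tSmall, perRecord);
        System.out.println("per-record seconds: " + perRecord);
        System.out.println("fixed overhead seconds: " + overhead);
    }
}
```

Under these assumed numbers the per-record term comes out at zero and the whole minute is attributed to overhead. Real measurements would be noisier, but the shape of the result is the point: tuning the per-record work cannot bring such a job into the millisecond range.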

It is correct that HBase provides real-time data access, but that doesn't hold if you are accessing HBase through MR. If you need ~ms access, you are better off writing normal Java + HBase API programs. You won't be able to leverage the parallelism provided by MR then, so you need to think it through before you arrive at a decision.

Tools like Impala and Phoenix could be of help if you have real-time needs. They have their own pluses and minuses.

I would like to point out one thing here. If you plan to access not-so-big data at a time, you can use HBase from sequential Java programs. But remember, random reads/writes come at a greater cost compared to sequential access. So, think well before you act.
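As a rough sketch of what such a sequential program computes, the following plain Java sums info:hits per info:category for the sample rows from the question. The rows are hard-coded here as an assumption; in a real program they would come from reading the weblog table through the HBase client API. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CategoryHits {
    /** Sums hits per category; each row is {category, hits}. */
    static Map<String, Long> sumHits(String[][] rows) {
        // LinkedHashMap keeps categories in first-seen order
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String[] row : rows) {
            // The shell example stores hits as a string, so parse it
            long hits = Long.parseLong(row[1]);
            totals.merge(row[0], hits, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Sample rows from the question (stand-in for rows read from HBase)
        String[][] rows = {
            {"web", "2"}, {"mail", "10"}, {"ftp", "1"},
            {"web", "3"}, {"mail", "11"}, {"ftp", "2"}
        };
        System.out.println(sumHits(rows)); // prints {web=5, mail=21, ftp=3}
    }
}
```

A single loop like this has none of the job-submission delay of MR, but it also runs on one machine, so it only makes sense while the data read per request stays small.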
