cloud - Why is HBase with Hadoop MapReduce performance slow?
I have configured Hadoop 1.0.3 on 3 machines in distributed mode. The following processes are running on each machine:
1) 4316 SecondaryNameNode
   4006 NameNode
   4159 DataNode
   4619 TaskTracker
   4425 JobTracker
2) 2794 TaskTracker
   2672 DataNode
3) 3338 DataNode
   3447 TaskTracker
Now, when I run a simple MapReduce job on it, the job takes a long time to execute. So I installed HBase as a layer on top of Hadoop. I now have the following HBase processes on the 3 machines:
1) 5115 HQuorumPeer
   5198 HMaster
   5408 HRegionServer
2) 3719 HRegionServer
   3617 HQuorumPeer
3) 2937 HQuorumPeer
   2719 HRegionServer
When I run a MapReduce job on HBase, 100,000 records take 1 minute, and 10,000,000 records take the same. Now I want the result in milliseconds. What steps should I take to improve this?
I am a newbie, so please help me out, or suggest a layer on top of HBase or Hadoop that can return results in milliseconds.
I am summarizing the records below:
hbase(main):007:0> describe 'weblog'
DESCRIPTION                                                                  ENABLED
 'weblog', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}   true
The weblog table has the columns info:category and info:hits:
info:category   info:hits
web             2
mail            10
ftp             1
web             3
mail            11
ftp             2
The data is summarized in a MapReduce job and stored in a table.
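To make the summarization concrete, here is a minimal plain-Java sketch of the per-category totals that such a job would produce for the six sample rows above. It does not use Hadoop or HBase at all; the class name `CategoryTotals` and the method `totals` are made up for illustration, and the assumption is that "summarized" means summing info:hits per info:category.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop or HBase.
class CategoryTotals {
    // Sums the hits per category; each row is a {category, hits} pair.
    static Map<String, Integer> totals(String[][] rows) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (String[] row : rows) {
            // merge() inserts the value or adds it to the existing sum
            sums.merge(row[0], Integer.parseInt(row[1]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // The six sample rows from the weblog table
        String[][] rows = {
            {"web", "2"}, {"mail", "10"}, {"ftp", "1"},
            {"web", "3"}, {"mail", "11"}, {"ftp", "2"}
        };
        System.out.println(totals(rows)); // prints {web=5, mail=21, ftp=3}
    }
}
```

For data this small a single loop is enough; MapReduce only pays off when the input is too large for one machine.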
hbase(main):004:0> put 'weblog', 'row1', 'info:category', 'web'
0 row(s) in 0.0560 seconds
hbase(main):004:0> put 'weblog', 'row1', 'info:hits', '2'
0 row(s) in 0.0560 seconds
Please help me with this. I have googled a lot but have not been able to find anything that helps.
Hadoop, or any other batch processing system for that matter, is not a suitable choice if you have real-time needs or need responses in the ~ms range. No matter how good your hardware is or how efficient your MR job is, there will always be some initial delay when you run an MR job, and it is unavoidable. The reason is that when you submit an MR job, a lot of things happen before the actual processing starts: checking the input path, creating the splits, creating the map tasks, and so on.
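The two timings in the question already hint at this fixed startup cost. As a rough sketch, assume the job time follows T(n) = overhead + n * perRow and fit it through the two measurements, taking both as 60 s (the question only says "1 minute" and "the same", so the exact figures are an assumption). The class `JobTimeModel` and its method `fit` are hypothetical names for illustration.

```java
// Hypothetical two-point fit of T(n) = overhead + n * perRow.
class JobTimeModel {
    // Returns {overhead, perRow} for the line through (n1, t1) and (n2, t2).
    static double[] fit(long n1, double t1, long n2, double t2) {
        double perRow = (t2 - t1) / (double) (n2 - n1);
        double overhead = t1 - perRow * n1;
        return new double[] { overhead, perRow };
    }

    public static void main(String[] args) {
        // Assumed measurements: 100,000 rows and 10,000,000 rows, 60 s each
        double[] m = fit(100_000L, 60.0, 10_000_000L, 60.0);
        System.out.printf("fixed overhead ~ %.1f s, per-row cost ~ %.9f s%n", m[0], m[1]);
    }
}
```

With these inputs almost all of the minute is fixed overhead, so tuning the map or reduce code cannot bring the job anywhere near milliseconds.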
It is correct that HBase provides real-time data access, but that doesn't hold if you are accessing HBase through MR. If you need ~ms access, you are better off writing normal Java + HBase API programs. You won't be able to leverage the parallelism provided by MR then, so you need to think carefully before you arrive at a decision.
Tools like Impala and Phoenix could be of help if you have real-time needs. They have their own pluses and minuses.
I would like to point out one thing here. If you plan to access not-so-big data at a time, you can use HBase with sequential Java programs. But remember, random reads/writes come at a greater cost compared to sequential access. So, think well before you act.