hadoop - Access a file from a Hive streaming script in C# (HDInsight)
I'm using a Hive streaming job to process data in C# on HDInsight. In order to process the data, the script has to read an XML file stored as a blob on Azure, like so:
OperationContext oc = new OperationContext();
CloudStorageAccount account = new CloudStorageAccount(new StorageCredentials(asvaccount, asvkey), true);
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("mycontainer");
CloudBlockBlob blob = container.GetBlockBlobReference("file/f.xml");
MemoryStream stream;
using (stream = new MemoryStream())
{
    // Download the blob into memory, then rewind before reading it back.
    blob.DownloadToStream(stream);
    stream.Seek(0, SeekOrigin.Begin);
    string reader = new StreamReader(stream).ReadToEnd();
    elem = XElement.Parse(reader);
    return elem;
}
The code works on my local machine: it reads the file from the storage account and returns elem correctly. But when I try to run it on the cluster, it has a problem finding Microsoft.WindowsAzure.Storage.dll, even though I add it via fs.put() to /hive/resources/ and then use "add file" in the Hive portal.
If I try accessing the file like this:
XElement.Load("hdinsighttesting.blob.core.windows.net/repexdeema/pr/productguidmapping.xml");
or
XElement.Load("asv://mycontainer@mycluster.blob.core.windows.net/file/f.xml");
I get the following error: Could not find a part of the path 'c:\hdfs\mapred\local\tasktracker\admin\jobcache\job_201307200714_0079\attempt_201307200714_0079_m_000000_0\work\storageaccount.blob.core.windows.net\mycontainer\pr\productguidmapping.xml'.
I don't understand why it's insisting on looking in that directory instead of going directly to blob storage. I tried going to that directory, and it doesn't exist.
I thought of using LocalResource, but that's not possible in my case because Hive is refusing to find the DLL files I'm uploading to HDFS.
Thanks
"add file" should be sufficient. It copies the DLL to the working folder, and it should then be available to your script. Note that Microsoft.WindowsAzure.Storage.dll has dependencies of its own. I believe it depends on odata.dll and edm.dll. Make sure you "add file" all of the dependencies, otherwise it will fail to load.
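To check which dependencies are actually missing, one option is a small diagnostic called at the start of the streaming script. The following is only a sketch: the class name DependencyCheck and the method Report are hypothetical, and it assumes the DLLs added with "add file" end up in the task's current working directory, as described above. It writes to stderr so the output does not mix with the rows the script emits on stdout.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper, not part of any SDK: lists the DLLs that reached the
// task's working folder and the assemblies the storage library references.
static class DependencyCheck
{
    public static void Report()
    {
        // Assumption: "add file" places resources in the current working directory.
        string workDir = Directory.GetCurrentDirectory();
        Console.Error.WriteLine("working folder: " + workDir);

        foreach (string dll in Directory.GetFiles(workDir, "*.dll"))
        {
            Console.Error.WriteLine("found: " + Path.GetFileName(dll));
        }

        string storageDll = Path.Combine(workDir, "Microsoft.WindowsAzure.Storage.dll");
        try
        {
            // GetReferencedAssemblies reads the reference list from metadata;
            // it does not load the referenced assemblies themselves.
            Assembly asm = Assembly.LoadFrom(storageDll);
            foreach (AssemblyName dep in asm.GetReferencedAssemblies())
            {
                Console.Error.WriteLine("references: " + dep.FullName);
            }
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine("load failed: " + ex.Message);
        }
    }
}
```

Calling DependencyCheck.Report() before any storage code runs should show, in the task's stderr log, which referenced assemblies have no matching "found:" line. Framework assemblies such as mscorlib will also appear in the reference list and do not need to be added.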