C# - Match Multiline & IgnoreSome
I'm trying to extract information from JCL source using a regex in C#. Basically, the string can look like this:
//jobname0 job (blablabla),'some text',msgclass=yes,ilike=potatoes, grmbl
// ialsolike=tomatoes, garbage
// finally=bye
//other stuff
So I need to extract the job name jobname0, the info (blablabla), the description 'some text', and the other parms msgclass=yes, ilike=potatoes, ialsolike=tomatoes, and finally=bye.
I must ignore everything after a space, such as grmbl or any other garbage.
I must continue on the next line if the last valid char is a comma, and stop if there is none.
So far, I have managed the job name, info, and description, which was pretty easy. For the other parms, I'm able to get the parms and split them, but I don't know how to get rid of the garbage.
Here is my code:
var regex = "//([^\\s]*) job (\\([^)]*\\))?,?(\\'[^']*\\')?,?([^,]*[,|\\s|$])*";
Match match2 = Regex.Match(test5, regex, RegexOptions.Singleline);
string cartejob2 = match2.Groups[0].Value;
string jobname2 = match2.Groups[1].Value;
string jobinfo2 = match2.Groups[2].Value;
string jobdesc2 = match2.Groups[3].Value;
IEnumerable<string> parms = match2.Groups[4].Captures.OfType<Capture>().Select(x => x.Value);
string jobparms2 = string.Join("|", parms);
Console.WriteLine(cartejob2 + "|");
Console.WriteLine(jobname2 + "|");
Console.WriteLine(jobinfo2 + "|");
Console.WriteLine(jobdesc2 + "|");
Console.WriteLine(jobparms2 + "|");
The output I get is:
//jobname0 job (blablabla),'some text',msgclass=yes,ilike=potatoes, grmbl
// ialsolike=tomatoes, garbage
// finally=bye
//other |
jobname0|
(blablabla)|
'some text'|
msgclass=yes,|ilike=potatoes,| grmbl
// ialsolike=tomatoes,| garbage
// finally=bye
//other |
The output I would like to see is:
//jobname0 job (blablabla),'some text',msgclass=yes,ilike=potatoes, grmbl
// ialsolike=tomatoes, garbage
// finally=bye|
jobname0|
(blablabla)|
'some text'|
msgclass=yes|ilike=potatoes|ialsolike=tomatoes|finally=bye|
Is there a way to do what I want?
I think I'd try this with two regular expressions.
The first one grabs the information at the beginning of the string: job name, info, and description.
The second one handles the parameters, which seem to follow a simple pattern of <param name>=<param value>.
The first regex might look like this:
^//(?<job>[\d\w]+)[ ]+job[ ]+\((?<info>[\d\w]+)\),'(?<description>[\d\w ]+)'
I don't know if the rules permit whitespace to appear in the job name, info, or description; adjust as needed. Also, I'm assuming the job card is at the start of the file by using the ^ char. Finally, the regex has named groups defined, so getting the values should be easier in C#.
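To make the named groups concrete, here is a minimal sketch of how the first regex could be used. The class name JobCardHeader and the method Parse are hypothetical names made up for this illustration, and the input is the first line of the sample from the question. Note that with this pattern the info group does not include the parentheses and the description group does not include the quotes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
static class JobCardHeader
{
    // The first regex from the answer, written as a verbatim string.
    static readonly Regex Header = new Regex(
        @"^//(?<job>[\d\w]+)[ ]+job[ ]+\((?<info>[\d\w]+)\),'(?<description>[\d\w ]+)'");

    // Returns the Match so the caller can read the named groups.
    public static Match Parse(string card)
    {
        return Header.Match(card);
    }

    static void Main()
    {
        // First line of the sample job card from the question.
        string card = "//jobname0 job (blablabla),'some text',msgclass=yes,ilike=potatoes, grmbl";

        Match m = Parse(card);
        if (m.Success)
        {
            Console.WriteLine(m.Groups["job"].Value);         // jobname0
            Console.WriteLine(m.Groups["info"].Value);        // blablabla
            Console.WriteLine(m.Groups["description"].Value); // some text
        }
    }
}
```

If the parentheses and quotes are needed in the output, as in the question, they can be added back when printing or moved inside the groups.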
The second regex might look like this:
(?<param>[\w\d]+)=(?<value>[\w\d]+)
Again, named groups are added for the parameter names and values.
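As a rough sketch of the second step, the parameter regex can be run over the whole job card with Regex.Matches. The class name JobCardParams and the method Extract are hypothetical, and the multi-line input assumes the sample in the question was split at each //. Garbage words such as grmbl are skipped only because they contain no = sign; this sketch does not implement the "continue only after a trailing comma" rule, and a comment that happens to contain name=value would also be picked up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
static class JobCardParams
{
    // The second regex from the answer.
    static readonly Regex Param = new Regex(@"(?<param>[\w\d]+)=(?<value>[\w\d]+)");

    // Collects every name=value pair found in the text, in order of appearance.
    public static List<KeyValuePair<string, string>> Extract(string text)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Param.Matches(text))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["param"].Value, m.Groups["value"].Value));
        }
        return result;
    }

    static void Main()
    {
        // Sample job card from the question, assumed to span four lines.
        string card =
            "//jobname0 job (blablabla),'some text',msgclass=yes,ilike=potatoes, grmbl\n" +
            "// ialsolike=tomatoes, garbage\n" +
            "// finally=bye\n" +
            "//other stuff";

        foreach (var p in Extract(card))
        {
            Console.WriteLine(p.Key + "=" + p.Value);
        }
    }
}
```

For the sample above this prints the four pairs msgclass=yes, ilike=potatoes, ialsolike=tomatoes, and finally=bye, one per line.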
Hope this helps.
Edit:
A small tip: you can use the @ sign before a string in C# to make it easier to write such regex patterns, since backslashes no longer need to be doubled. For example:
Regex reg = new Regex(@"(?<param>[\w\d]+)=(?<value>[\w\d]+)");