bash - sed statement to change/modify CSV separators and delimiters -


i have csv files contains comma seperated values , of column values can contain characters ,.<>!/\;&

i trying convert csv comma separated, quote enclosed csv

example data:

datecreated,datemodified,sku,name,category,description,url,originalurl,image,image50,image100,image120,image200,image300,image400,price,brand,modelnumber 2012-10-19 10:52:50,2013-06-11 02:07:16,34,austral foldaway 45 rotary clothesline,home & garden > household supplies > laundry supplies > drying racks & hangers,"watch product video            plenty of space hang family wash  austral's foldaway 45 rotary clothesline folding head rotary clothes hoist beautifully finished in either beige or heritage green.  though foldaway 45 compact, still large 45 metres of line space, big enough full family wash.  if want advantage of rotary hoist, dont want lose yard, austral foldaway 45 clothesline you.&nbsp;  installation note:&nbsp;a core hole required when installing existing concrete, e.g. pathway. not required in ground(grass/soil).  watch video on youtube, click following link:&nbsp;austral foldaway 45 rotary clothesline      &nbsp;            //           customer video reviews  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;",https://track.commissionfactory.com.au/p/10604/1718695,http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/,http://content.commissionfactory.com.au/products/7228/1718695.jpg,http://content.commissionfactory.com.au/products/7228/1718695@50x50.jpg,http://content.commissionfactory.com.au/products/7228/1718695@100x100.jpg,http://content.commissionfactory.com.au/products/7228/1718695@120x120.jpg,http://content.commissionfactory.com.au/products/7228/1718695@200x200.jpg,http://content.commissionfactory.com.au/products/7228/1718695@300x300.jpg,http://content.commissionfactory.com.au/products/7228/1718695@400x400.jpg,309.9000 aud,austral,fa45gr 

and output i'm trying achieve is

"datecreated","datemodified","sku","name","category","description","url","originalurl","image","image50","image100","image120","image200","image300","image400","price","brand","modelnumber" "2012-10-19 10:52:50","2013-06-11 02:07:16","34","austral foldaway 45 rotary clothesline","home & garden > household supplies > laundry supplies > drying racks & hangers","watch product video            plenty of space hang family wash  austral's foldaway 45 rotary clothesline folding head rotary clothes hoist beautifully finished in either beige or heritage green.  though foldaway 45 compact, still large 45 metres of line space, big enough full family wash.  if want advantage of rotary hoist, dont want lose yard, austral foldaway 45 clothesline you.&nbsp;  installation note:&nbsp;a core hole required when installing existing concrete, e.g. pathway. not required in ground(grass/soil).  watch video on youtube, click following link:&nbsp;austral foldaway 45 rotary clothesline      &nbsp;            //           customer video reviews  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;","https://track.commissionfactory.com.au/p/10604/1718695","http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/","http://content.commissionfactory.com.au/products/7228/1718695.jpg","http://content.commissionfactory.com.au/products/7228/1718695@50x50.jpg","http://content.commissionfactory.com.au/products/7228/1718695@100x100.jpg","http://content.commissionfactory.com.au/products/7228/1718695@120x120.jpg","http://content.commissionfactory.com.au/products/7228/1718695@200x200.jpg","http://content.commissionfactory.com.au/products/7228/1718695@300x300.jpg","http://content.commissionfactory.com.au/products/7228/1718695@400x400.jpg","309.9000 aud","austral","fa45gr" 

any assistance appreciated.

first, lets try trivial (and "not enough") solution adds double quote each field (including have double quotes! isn't want)

sed -r 's/([^,]*)/"\1"/g' 

great, first part looks sequences no commas in them, second part adds double quotes around them, final 'g' means doing more once per line

this turn

abc,345, words ,"some text","text,with,commas" 

into "abc","345"," words ",""some text"",""text","with","commas""

a few things note:

  • it correctly surrounds "some words" space between them, surrounds initial , final spaces. assume that's ok if not can fixed

  • if field had quotes, quoted again, bad. needs fixed

  • if field had quotes , inner text had commas (which shouldn't considered field separators) these commas quoted. needs fixed

so want match 2 different regexps - either there quoted string or field no commas:

sed -r 's/([^,"]*|"[^"]*")/"\1"/g' 

the result be

"abc","345"," words ",""some text"",""text,with,commas"" 

as can see, have double quote on quoted text. have remove second sed command:

sed -r 's/([^,"]*|"[^"]*")/"\1"/g' | sed 's/""/"/g' 

which results in

"abc","345"," words ","some text","text,with,commas" 

yay!


Comments

Popular posts from this blog

javascript - DIV "hiding" when changing dropdown value -

Does Firefox offer AppleScript support to get URL of windows? -

android - How to install packaged app on Firefox for mobile? -