Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm trying to parse a CSV file with VB.NET.

The CSV file contains values like 0,"1,2,3",4, which get split into 5 fields instead of 3. There are many examples in other languages on Stack Overflow, but I can't get any of them working in VB.NET. Here is my code so far, but it doesn't work...

 Dim t As String() = Regex.Split(str(i), ",(?=([^""]*""[^""]*"")*[^""]*$)")
1 Answer

Assuming your CSV is well-formed (i.e. no " other than those used to delimit string fields, or those escaped like \"), you can split on a comma that's followed by an even number of non-escaped " marks. (If the comma is inside a quoted field, an odd number of " marks remains in the rest of the line.)

The regex you've tried is almost there.

The following looks for a comma followed by an even number of any sort of quote marks:

,(?=([^"]*"[^"]*")*[^"]*$)
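
As a minimal sketch of how this pattern might be used from VB.NET, the console program below splits the sample line from the question. One assumption beyond the pattern as written: the group is made non-capturing with (?: ... ), because .NET's Regex.Split adds any captured text to the returned array, which is a likely reason for getting more pieces than expected. The module name and variable names are illustrative only.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitQuotedCsvLine
    Sub Main()
        ' Sample line from the question; "" inside a VB string literal is one quote mark.
        Dim line As String = "0,""1,2,3"",4"

        ' Comma followed by an even number of quote marks up to the end of the line.
        ' The group is non-capturing so Regex.Split does not add captured text
        ' to the result array.
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*[^""]*$)"

        Dim fields As String() = Regex.Split(line, pattern)

        ' Writes one field per line; quoted fields keep their surrounding quote marks.
        For Each field As String In fields
            Console.WriteLine(field)
        Next
    End Sub
End Module
```

For this input the split yields three fields (0, "1,2,3" and 4). The quote marks around the middle field are kept, so removing them is a separate step if you need the bare value.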

To modify it to look for an even number of non-escaped quote marks (assuming quote marks are escaped with a backslash, like \"), I replace each [^"] with ([^"\\]|\\.). This means "match a character that isn't a " and isn't a backslash, OR match a backslash and the character immediately following it".

,(?=(([^"\\]|\\.)*"([^"\\]|\\.)*")*([^"\\]|\\.)*$)

See it in action here. (The reason the backslash is doubled is I want to match a literal backslash).

Now to get it into vb.net you just need to double all your quote marks:

splitRegex = ",(?=(([^""\\]|\\.)*""([^""\\]|\\.)*"")*([^""\\]|\\.)*$)"
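
A hedged sketch of the escaped-quote variant in a complete VB.NET program follows. It assumes backslash-escaped quote marks as described above, and it differs from the pattern as given in one respect: every group is written as non-capturing, (?: ... ), since .NET's Regex.Split includes captured text in its result. The sample line and all names are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitEscapedCsvLine
    Sub Main()
        ' Hypothetical sample: the middle field contains \" escapes and a comma.
        ' As plain text the line is: 0,"a \"quoted\", value",4
        Dim line As String = "0,""a \""quoted\"", value"",4"

        ' Comma followed by an even number of non-escaped quote marks.
        ' (?:[^""\\]|\\.) matches either a character that is neither a quote mark
        ' nor a backslash, or a backslash plus the character after it.
        Dim splitRegex As String =
            ",(?=(?:(?:[^""\\]|\\.)*""(?:[^""\\]|\\.)*"")*(?:[^""\\]|\\.)*$)"

        Dim fields As String() = Regex.Split(line, splitRegex)

        ' Writes one field per line; escapes and surrounding quote marks are left as-is.
        For Each field As String In fields
            Console.WriteLine(field)
        Next
    End Sub
End Module
```

With this input the line splits at the commas after 0 and after the closing quote mark only, so the comma inside the quoted field stays part of that field.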
