

Split a string ignoring quoted sections


Question

Given a string like this:

a,"string, with",various,"values, and some",quoted

What is a good algorithm to split this based on commas while ignoring the commas inside the quoted sections?

The output should be an array:

[ "a", "string, with", "various", "values, and some", "quoted" ]

2008/08/24
8/24/2008 3:22:08 PM

Accepted Answer

If my language of choice didn't offer a way to do this without thinking then I would initially consider two options as the easy way out:

  1. Pre-parse the string and replace the commas inside the quoted sections with a control character that cannot otherwise occur, split on the remaining commas, then post-parse the array to turn each control character back into a comma.

  2. Alternatively, split on every comma, then post-parse the resulting array into a new one: check each entry for a leading quote and concatenate entries until one ends with the terminating quote.

These are hacks, however, and if this is a purely 'mental' exercise then I suspect they will prove unhelpful. If this is a real-world problem then it would help to know the language so that we could offer some specific advice.
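
For illustration, the second option could be sketched in Python roughly as follows. The function name `split_and_rejoin` and its parameters are made up for this example, and it assumes quotes only ever wrap a whole field and are never escaped:

```python
def split_and_rejoin(line, sep=",", quote='"'):
    """Split on every separator, then glue quoted pieces back together."""
    result = []
    pending = None  # pieces of a quoted field that is still open
    for piece in line.split(sep):
        if pending is not None:
            pending.append(piece)
            if piece.endswith(quote):
                # Closing quote reached: rejoin and strip the outer quotes.
                result.append(sep.join(pending)[1:-1])
                pending = None
        elif piece.startswith(quote):
            if len(piece) > 1 and piece.endswith(quote):
                # The whole quoted field fit in one piece.
                result.append(piece[1:-1])
            else:
                pending = [piece]
        else:
            result.append(piece)
    if pending is not None:
        # Unterminated quote: keep the raw text rather than guess.
        result.append(sep.join(pending))
    return result


print(split_and_rejoin('a,"string, with",various,"values, and some",quoted'))
# ['a', 'string, with', 'various', 'values, and some', 'quoted']
```

Because the pieces are rejoined with the same separator they were split on, the original spacing inside the quoted sections survives.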

2016/12/20
12/20/2016 9:31:59 AM


Python:

import csv

# newline="" lets the csv module handle line endings itself
with open("some.csv", newline="") as f:
    for row in csv.reader(f):
        print(row)
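
If the data is already in a string rather than a file, the same module can probably be used directly, since `csv.reader` accepts any iterable of lines. A small sketch using the string from the question (the variable names are only for illustration):

```python
import csv

line = 'a,"string, with",various,"values, and some",quoted'

# csv.reader takes any iterable of strings, so wrap the single line in a list.
fields = next(csv.reader([line]))
print(fields)
# ['a', 'string, with', 'various', 'values, and some', 'quoted']
```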
2009/10/01

Of course, using a CSV parser is better, but just for the fun of it you could:

Loop over the string letter by letter.
    If current_letter == quote:
        toggle the inside_quote variable.
    Else if current_letter == comma and not inside_quote:
        push current_word into the array and clear current_word.
    Else:
        append current_letter to current_word.
When the loop is done, push current_word into the array.
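
A minimal Python sketch of the loop above, applied to the double-quoted string from the question. The name `split_outside_quotes` is hypothetical, and the sketch assumes quotes are never escaped or nested:

```python
def split_outside_quotes(text, delimiter=",", quote='"'):
    """Split text on delimiter, skipping delimiters inside quoted sections."""
    fields = []
    current = []           # characters of the field being built
    inside_quote = False
    for ch in text:
        if ch == quote:
            # Toggle the state; the quote character itself is dropped.
            inside_quote = not inside_quote
        elif ch == delimiter and not inside_quote:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    # Push the last field once the loop is done.
    fields.append("".join(current))
    return fields


print(split_outside_quotes('a,"string, with",various,"values, and some",quoted'))
# ['a', 'string, with', 'various', 'values, and some', 'quoted']
```

With an odd number of quotes, everything after the last quote ends up in a single field, which may or may not be what you want.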
2009/04/08

The author here dropped in a blob of C# code that handles the scenario you're having a problem with:

CSV File Imports in .Net

Shouldn't be too difficult to translate.

2017/05/23

What if an odd number of quotes appears in the original string?

This looks uncannily like CSV parsing, which has some peculiarities in how it handles quoted fields. A field is only treated as escaped if the whole field is delimited with double quotation marks, so:

field1, "field2, field3", field4, "field5, field6" field7

becomes

field1

field2, field3

field4

"field5

field6" field7

Notice that if a field doesn't both start and end with a quotation mark, then it's not a quoted field and the double quotes are simply treated as literal characters.

Incidentally, my code that someone linked to doesn't actually handle this correctly, if I recall correctly.
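
One way the rule described above might be sketched in Python is shown here. The function name `split_csv_like` is invented for the example, and it assumes spaces around a field are insignificant and that quotes are never escaped by doubling:

```python
def split_csv_like(line):
    """Split on commas; a field counts as quoted only if it starts AND ends with a quote."""
    fields = []
    i, n = 0, len(line)
    while i <= n:
        # Skip leading spaces to see whether the field opens with a quote.
        j = i
        while j < n and line[j] == " ":
            j += 1
        if j < n and line[j] == '"':
            close = line.find('"', j + 1)
            if close != -1:
                k = close + 1
                while k < n and line[k] == " ":
                    k += 1
                # Quoted only if the closing quote is followed by a comma or the end.
                if k == n or line[k] == ",":
                    fields.append(line[j + 1:close])
                    i = k + 1
                    continue
        # Not a properly quoted field: quotes are literal, split at the next comma.
        comma = line.find(",", i)
        if comma == -1:
            comma = n
        fields.append(line[i:comma].strip())
        i = comma + 1
    return fields


print(split_csv_like('field1, "field2, field3", field4, "field5, field6" field7'))
# ['field1', 'field2, field3', 'field4', '"field5', 'field6" field7']
```

This reproduces the five fields listed above; a real CSV parser may well treat the malformed last field differently.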

2008/08/08

Here's a simple Python implementation based on Pat's pseudocode:

def splitIgnoringSingleQuote(string, split_char, remove_quotes=False):
    """Split string on split_char, ignoring occurrences inside single quotes."""
    string_split = []
    current_word = ""
    inside_quote = False
    for letter in string:
        if letter == "'":
            # Keep the quote character unless the caller asked to drop it.
            if not remove_quotes:
                current_word += letter
            inside_quote = not inside_quote
        elif letter == split_char and not inside_quote:
            string_split.append(current_word)
            current_word = ""
        else:
            current_word += letter
    # Flush the final word after the loop ends.
    string_split.append(current_word)
    return string_split
2010/10/05

Source: https://stackoverflow.com/questions/6209
Licensed under CC-BY-SA with attribution
Not affiliated with Stack Overflow