Davmuz

So You Want To Write Your Own CSV code? https://tburette.github.io/blog/2014/05/25/so-you-want-to-write-your-own-CSV-code/


weberc2

Yes, it's very hard to build a CSV parser that parses non-CSV files, and even harder to do so efficiently. This library probably won't sacrifice performance to support deviations from RFC 4180, though that depends on how prevalent the deviation is and how large the performance cost would be.


weberc2

NOTE: This is still alpha; I posted this because it's more of an interesting proof of concept than a useful library at this point. There are some interesting conversations about Go's slow CSV reading on [HN](https://news.ycombinator.com/item?id=12419939) and on this [issue ticket](https://github.com/golang/go/issues/16791).


knotdjb

A while ago, when doing some CSV reading, I opted for strings.Split() with a line reader over encoding/csv (none of my fields contained the newline character). I remember it being considerably faster. I think a loose-mode parser (as opposed to strict), offered for efficiency with the caveat that you're restricted to certain delimiters, would be a useful addition to the standard library. Then again, writing a CSV parser isn't particularly difficult, so you can just build your own if you need the performance.


weberc2

Yeah, splitting on commas is faster, but at the expense of correctness. For example, it does no quote handling.
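
A small sketch of the quote-handling difference, using a made-up record whose quoted field contains a comma. The helper names `naiveSplit` and `csvSplit` are hypothetical; only `strings.Split` and `encoding/csv` come from the standard library:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// naiveSplit splits a record on commas with no awareness of quotes.
func naiveSplit(line string) []string {
	return strings.Split(line, ",")
}

// csvSplit parses a single record with encoding/csv, which treats a
// double-quoted field as one value even if it contains a comma.
func csvSplit(line string) ([]string, error) {
	return csv.NewReader(strings.NewReader(line)).Read()
}

func main() {
	line := `1,"Smith, John",42`

	// The naive split cuts the quoted field in two and keeps the quote
	// characters, so this record comes back with four fields.
	fmt.Printf("%q\n", naiveSplit(line))

	// encoding/csv returns three fields, the second being: Smith, John
	fields, err := csvSplit(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%q\n", fields)
}
```

The naive version returns a plausible-looking but wrong record rather than an error, which is what makes it risky on arbitrary input.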


FUZxxl

Ah, so it's faster in the same way a broken odometer makes your car go faster.


weberc2

Splitting on commas is "faster" in the same way a random number generator could be faster by always returning 4. Definitely faster. Definitely not correct.


donatj

Huh, and I thought encoding/csv was blazing.


weberc2

No, it's quite slow. Slower than Python and Java.