CSV.jl Documentation

High-level interface

# CSV.read (Function)

CSV.read(fullpath::Union{AbstractString,IO}, sink=DataFrame, args...; kwargs...) => typeof(sink)

Parses a delimited file into a Julia structure (a DataFrame by default, but any Data.Sink may be given).

Positional arguments:

  • fullpath; a file name (String) or any other IO instance
  • sink; a DataFrame by default, but may be any other Data.Sink type that supports streaming via the Data.Field interface
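Because fullpath may be either a file name or an IO instance, a Julia entry point can simply dispatch on the argument type. The sketch below uses a hypothetical helper, readrawlines, which is not part of CSV.jl; it only illustrates the String-or-IO pattern using Base functions.

```julia
# Hypothetical helper, not part of CSV.jl: one entry point that accepts a
# file name or any IO, choosing the method by the argument's type.
readrawlines(io::IO) = readlines(io)
readrawlines(path::AbstractString) = open(readrawlines, path)  # opens, reads, closes

# Reading from an in-memory IO...
from_io = readrawlines(IOBuffer("a,b\n1,2\n"))

# ...and from a file on disk holding the same bytes.
path, tmp = mktemp()
write(tmp, "a,b\n1,2\n")
close(tmp)
from_file = readrawlines(path)
```

Both calls return the same vector of lines; only the method that obtains the IO differs.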

Keyword Arguments:

  • delim::Union{Char,UInt8}; how fields in the file are delimited
  • quotechar::Union{Char,UInt8}; the character that indicates a quoted field that may contain the delim or newlines
  • escapechar::Union{Char,UInt8}; the character that escapes a quotechar in a quoted field
  • null::String; an ASCII string indicating how NULL values are represented in the dataset
  • header; column names, given either as a complete Vector{String} or as an Int/Range indicating the row(s) that contain the column names
  • datarow::Int; the row on which the actual data starts; by default, data is expected on the row immediately after the header row(s)
  • types; column types, given either as a complete Vector{DataType} or as a Dict that references columns by name or number
  • nullable::Bool; whether column values may be null; true by default. If set to false and a missing value is encountered, a NullException is thrown
  • dateformat::Union{AbstractString,Dates.DateFormat}; how all dates/datetimes are represented in the dataset
  • footerskip::Int; indicates the number of rows to skip at the end of the file
  • rows_for_type_detect::Int=100; indicates how many rows should be read to infer the types of columns
  • rows::Int; the total number of rows to read from the file; by default, the file is pre-parsed to count the number of rows
  • use_mmap::Bool=true; whether the underlying file will be mmapped or not while parsing
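To show how several of these keywords interact, here is a minimal plain-Julia sketch. It is not CSV.jl's parser: the function names (splitfields, selectrows, detecttype) are hypothetical, nulls are represented with `nothing` rather than Nullable, dates are not inferred, input is handled one line at a time (so newlines inside quoted fields are not supported), and the default argument values are assumptions that may differ from the library's.

```julia
# Hypothetical sketch, not CSV.jl's implementation.

# Split one line on `delim`, honoring `quotechar`/`escapechar`; fields equal
# to the `null` marker become `nothing`.
function splitfields(line::AbstractString; delim::Char=',', quotechar::Char='"',
                     escapechar::Char='\\', null::String="")
    fields = Union{String,Nothing}[]
    buf = IOBuffer()
    chars = collect(line)
    inquotes = false
    i = 1
    while i <= length(chars)
        c = chars[i]
        if inquotes
            if c == escapechar && i < length(chars) && chars[i+1] == quotechar
                print(buf, quotechar)   # escaped quotechar: keep one literal quote
                i += 1                  # skip the quotechar just consumed
            elseif c == quotechar
                inquotes = false        # closing quote
            else
                print(buf, c)           # delim is literal inside quotes
            end
        elseif c == quotechar
            inquotes = true             # opening quote
        elseif c == delim
            push!(fields, tofield(buf, null))
        else
            print(buf, c)
        end
        i += 1
    end
    push!(fields, tofield(buf, null))
    return fields
end

# Turn the buffered characters into a field, mapping the null marker to `nothing`.
function tofield(buf::IOBuffer, null::String)
    s = String(take!(buf))
    return s == null ? nothing : s
end

# Column names come from row `header`; data runs from `datarow` up to, but
# excluding, the last `footerskip` lines.
function selectrows(lines::Vector{String}; header::Int=1, datarow::Int=header + 1,
                    footerskip::Int=0, kw...)
    colnames = splitfields(lines[header]; kw...)
    rows = [splitfields(l; kw...) for l in lines[datarow:end-footerskip]]
    return colnames, rows
end

# Infer a column type from at most `rows_for_type_detect` values: Int, then
# Float64, else String; any null in the sample makes the type a Union with Nothing.
function detecttype(col::AbstractVector; rows_for_type_detect::Int=100)
    sample = col[1:min(end, rows_for_type_detect)]
    hasnull = any(isnothing, sample)
    vals = [v for v in sample if v !== nothing]
    T = if !isempty(vals) && all(v -> tryparse(Int, v) !== nothing, vals)
        Int
    elseif !isempty(vals) && all(v -> tryparse(Float64, v) !== nothing, vals)
        Float64
    else
        String
    end
    return hasnull ? Union{T,Nothing} : T
end

# Usage: header on row 1, data from row 2, one footer line skipped.
lines = ["id,name,score", "1,\"Smith, J\",9.5", "2,,7", "# footer"]
colnames, rows = selectrows(lines; footerskip=1)
cols = [[r[j] for r in rows] for j in eachindex(colnames)]
types = [detecttype(c) for c in cols]
```

Note that detecttype only looks at the first rows_for_type_detect values, so a later value that does not fit the inferred type would go unnoticed until parsing reaches it.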

Note that by default, "string" or text columns are parsed as the WeakRefString type. This is a custom type that stores only a pointer to the underlying byte data plus the number of bytes. To convert a WeakRefString to a standard Julia String, call string(::WeakRefString); this also works on an entire column, string(::NullableVector{WeakRefString}). Depending on the ultimate use, however, it is often convenient to keep working with WeakRefStrings: for example, when transferring the data directly to another system, they avoid all the intermediate byte copying.
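The pointer-plus-length idea can be illustrated in plain Julia. The type below, PtrString, is a hypothetical stand-in and not the real WeakRefString, whose definition and constructors may differ; it only shows why no bytes are copied until an owned String is requested, and why the underlying buffer must stay alive while such references are used.

```julia
# Hypothetical stand-in for the idea behind WeakRefString; not its real definition.
struct PtrString
    ptr::Ptr{UInt8}  # address of the first byte, inside a larger buffer
    len::Int         # number of bytes in the string
end

# Converting copies the bytes into an owned, standard Julia String.
Base.String(s::PtrString) = unsafe_string(s.ptr, s.len)

buf = Vector{UInt8}("jewelry,phone0")   # stands in for the parsed file's bytes

s1, s2 = GC.@preserve buf begin         # keep buf alive while raw pointers are in use
    p = pointer(buf)
    field1 = PtrString(p, 7)            # "jewelry": no bytes copied yet
    field2 = PtrString(p + 8, 6)        # "phone0": starts after the comma at offset 7
    (String(field1), String(field2))    # the copies happen only here
end
```

The design trade-off is the one described above: referencing is cheap, but a reference is only valid as long as the buffer it points into.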

Example usage:

julia> dt = CSV.read("bids.csv")
7656334×9 DataFrames.DataFrame
│ Row     │ bid_id  │ bidder_id                               │ auction │ merchandise      │ device      │
├─────────┼─────────┼─────────────────────────────────────────┼─────────┼──────────────────┼─────────────┤
│ 1       │ 0       │ "8dac2b259fd1c6d1120e519fb1ac14fbqvax8" │ "ewmzr" │ "jewelry"        │ "phone0"    │
│ 2       │ 1       │ "668d393e858e8126275433046bbd35c6tywop" │ "aeqok" │ "furniture"      │ "phone1"    │
│ 3       │ 2       │ "aa5f360084278b35d746fa6af3a7a1a5ra3xe" │ "wa00e" │ "home goods"     │ "phone2"    │
...


# CSV.write (Function)

Writes a source::Data.Source out to a CSV.Sink.

  • io::Union{String,IO}; a filename (String) or IO type to write the source to
  • source; a Data.Source type
  • delim::Union{Char,UInt8}; how fields in the file will be delimited
  • quotechar::Union{Char,UInt8}; the character that indicates a quoted field that may contain the delim or newlines
  • escapechar::Union{Char,UInt8}; the character that escapes a quotechar in a quoted field
  • null::String; an ASCII string indicating how NULL values will be represented in the dataset
  • dateformat; how dates/datetimes will be represented in the dataset
  • quotefields::Bool; whether all fields should be quoted or not
  • header::Bool; whether to write out the column names from source
  • append::Bool; whether to start writing data at the end of io; by default, io is reset to its beginning before writing
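How the quoting-related options could combine when a single field is written is sketched below. formatfield and formatrow are hypothetical helpers, not CSV.jl functions; the rule that a field is quoted when quotefields is true or when it contains delim, quotechar, or a newline is an assumption, as are the default values, and nulls are represented with `nothing`.

```julia
using Dates

# Hypothetical sketch, not CSV.jl's writer: format one value as a CSV field.
function formatfield(x; delim::Char=',', quotechar::Char='"', escapechar::Char='\\',
                     null::String="", quotefields::Bool=false,
                     dateformat::Union{Nothing,Dates.DateFormat}=nothing)
    x === nothing && return null                    # nulls become the `null` string
    s = (x isa Dates.TimeType && dateformat !== nothing) ?
        Dates.format(x, dateformat) : string(x)
    # quote if forced, or if the text would otherwise be ambiguous
    needsquote = quotefields ||
                 any(c -> c == delim || c == quotechar || c == '\n' || c == '\r', s)
    needsquote || return s
    escaped = replace(s, quotechar => string(escapechar, quotechar))  # escape embedded quotes
    return string(quotechar, escaped, quotechar)
end

# Join formatted fields with `delim`; other keywords pass through to formatfield.
formatrow(row; delim::Char=',', kw...) =
    join((formatfield(x; delim=delim, kw...) for x in row), delim)

header = formatrow(["bid_id", "merchandise"])   # column names, as with header=true
line = formatrow([0, "home, garden"])           # second field contains the delim
```

Setting escapechar equal to quotechar gives the doubled-quote convention (`""`) instead of a backslash escape.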


Lower-level utilities

CSV.Source
CSV.Sink
CSV.Options
CSV.parsefield
CSV.readline(::CSV.Source)
CSV.readsplitline
CSV.countlines(::CSV.Source)
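The rows keyword above notes that, by default, the file is pre-parsed to count its rows, and CSV.countlines appears in this list of utilities. Its exact behavior is not described here, so the following is only a hypothetical countrecords sketch in plain Julia, not CSV.countlines itself. It assumes quotes are escaped by doubling (a backslash escapechar is not handled) and treats newlines inside quoted fields as data rather than row breaks.

```julia
# Hypothetical sketch, not CSV.countlines: count records in a byte stream,
# ignoring newlines that fall inside quoted fields.
function countrecords(io::IO; quotechar::UInt8=UInt8('"'))
    n = 0
    inquotes = false
    prev = 0x00                        # last byte seen (0x00 means none yet)
    while !eof(io)
        b = read(io, UInt8)
        if b == quotechar
            inquotes = !inquotes       # a doubled "" toggles twice: no net change
        elseif b == UInt8('\n') && !inquotes
            n += 1                     # record boundary
        end
        prev = b
    end
    # a non-empty final line without a trailing newline is still a record
    if prev != 0x00 && prev != UInt8('\n')
        n += 1
    end
    return n
end
```

For example, a three-line stream whose middle record contains a quoted newline counts as three records, not four.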