CSV.jl Documentation
High-level interface
CSV.read — Function.
CSV.read(fullpath::Union{AbstractString,IO}, sink::Type{T}=DataFrame, args...; kwargs...) => typeof(sink)
CSV.read(fullpath::Union{AbstractString,IO}, sink::Data.Sink; kwargs...) => Data.Sink
Parses a delimited file into a Julia structure (a DataFrame by default, but any Data.Sink may be given).
Positional arguments:
- fullpath; can be a file name (string) or other IO instance
- sink::Type{T}; DataFrame by default, but may also be other Data.Sink types that support streaming via the Data.Field interface; note that the method argument can be the type of Data.Sink, plus any required arguments the sink may need (args...), or an already constructed sink may be passed (2nd method above)
Keyword Arguments:
- delim::Union{Char,UInt8}; a single character or ASCII-compatible byte that indicates how fields in the file are delimited; default is UInt8(',')
- quotechar::Union{Char,UInt8}; the character that indicates a quoted field that may contain the delim or newlines; default is UInt8('"')
- escapechar::Union{Char,UInt8}; the character that escapes a quotechar in a quoted field; default is UInt8('\')
- null::String; an ASCII string that indicates how NULL values are represented in the dataset; default is the empty string, ""
- header; column names can be provided manually as a complete Vector{String}, or as an Int/Range which indicates the row/rows that contain the column names
- datarow::Int; specifies the row on which the actual data starts in the file; by default, the data is expected on the next row after the header row(s); for a file without column names (header), specify datarow=1
- types; column types can be provided manually as a complete Vector{DataType}, or in a Dict to reference individual columns by name or number
- nullable::Bool; indicates whether values can be nullable or not; true by default. If set to false and missing values are encountered, a NullException will be thrown
- dateformat::Union{AbstractString,Dates.DateFormat}; how all dates/datetimes in the dataset are formatted
- footerskip::Int; indicates the number of rows to skip at the end of the file
- rows_for_type_detect::Int=100; indicates how many rows should be read to infer the types of columns
- rows::Int; indicates the total number of rows to read from the file; by default the file is pre-parsed to count the number of rows; -1 can be passed to skip a full-file scan, but the Data.Sink must be set up to account for a potentially unknown number of rows
- use_mmap::Bool=true; whether the underlying file will be mmapped or not while parsing
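As an illustrative sketch of how a few of these keyword arguments combine (the file path and contents below are invented for this example):

```julia
using CSV, DataFrames

# Create a small tab-delimited file that uses "NA" for missing values.
path = joinpath(tempdir(), "scores.tsv")
open(path, "w") do io
    write(io, "id\tscore\n1\t3.5\n2\tNA\n")
end

# Parse it: tab delimiter, "NA" as the null string, and an explicit
# column type for "score" given via a Dict, as documented above.
df = CSV.read(path; delim='\t', null="NA", types=Dict("score" => Float64))
```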
Note that by default, "string" or text columns will be parsed as the WeakRefString type. This is a custom type that only stores a pointer to the actual byte data plus the number of bytes. To convert a WeakRefString to a standard Julia String, just call string(::WeakRefString); this also works on an entire column. Oftentimes, however, it can be convenient to work with WeakRefStrings depending on the ultimate use, such as transferring the data directly to another system and avoiding all the intermediate copying.
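For example (a minimal sketch using a temporary file; the file contents and column name are invented for illustration):

```julia
using CSV, DataFrames

# Write and read back a tiny file with one text column.
path = joinpath(tempdir(), "names.csv")
open(path, "w") do io
    write(io, "name\nalice\nbob\n")
end
df = CSV.read(path)

# Per the note above, string() converts a WeakRefString; mapping it
# over the column yields standard Julia Strings.
df[:name] = map(string, df[:name])
```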
Example usage:
julia> dt = CSV.read("bids.csv")
7656334×9 DataFrames.DataFrame
│ Row │ bid_id │ bidder_id │ auction │ merchandise │ device │
├─────────┼─────────┼─────────────────────────────────────────┼─────────┼──────────────────┼─────────────┤
│ 1 │ 0 │ "8dac2b259fd1c6d1120e519fb1ac14fbqvax8" │ "ewmzr" │ "jewelry" │ "phone0" │
│ 2 │ 1 │ "668d393e858e8126275433046bbd35c6tywop" │ "aeqok" │ "furniture" │ "phone1" │
│ 3 │ 2 │ "aa5f360084278b35d746fa6af3a7a1a5ra3xe" │ "wa00e" │ "home goods" │ "phone2" │
...
CSV.write — Function.
CSV.write(fullpath::Union{AbstractString,IO}, source::Type{T}, args...; kwargs...) => CSV.Sink
CSV.write(fullpath::Union{AbstractString,IO}, source::Data.Source; kwargs...) => CSV.Sink
Writes a Data.Source out to a CSV.Sink.
Positional Arguments:
- fullpath; can be a file name (string) or other IO instance
- source; can be the type of Data.Source, plus any required args..., or an already constructed Data.Source can be passed in directly (2nd method)
Keyword Arguments:
- delim::Union{Char,UInt8}; how fields in the file will be delimited; default is UInt8(',')
- quotechar::Union{Char,UInt8}; the character that indicates a quoted field that may contain the delim or newlines; default is UInt8('"')
- escapechar::Union{Char,UInt8}; the character that escapes a quotechar in a quoted field; default is UInt8('\')
- null::String; the ASCII string that indicates how NULL values will be represented in the dataset; default is the empty string ""
- dateformat; how dates/datetimes will be represented in the dataset; default is ISO-8601 yyyy-mm-ddTHH:MM:SS.s
- header::Bool; whether to write out the column names from source
- append::Bool; start writing data at the end of io; by default, io will be reset to its beginning before writing
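A minimal round-trip sketch (the path is invented; header and append behave as documented above):

```julia
using CSV, DataFrames

df = DataFrame(a = [1, 2], b = ["x", "y"])
path = joinpath(tempdir(), "out.csv")

# Write the DataFrame (any Data.Source) out to a CSV file.
CSV.write(path, df)

# Append the same rows again, skipping the header this time.
CSV.write(path, df; append=true, header=false)
```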
Lower-level utilities
CSV.Source — Type.
Constructs a CSV.Source ready to start parsing data from; implements the Data.Source interface for providing convenient Data.stream! methods for various Data.Sink types.
CSV.Sink — Type.
Constructs a CSV.Sink ready to start writing data to; implements the Data.Sink interface for providing convenient Data.stream! methods for various Data.Source types.
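A sketch of how the two types plug into the Data.stream! machinery (file names and contents invented; exact DataStreams signatures varied across versions, so treat this as a shape, not a guarantee):

```julia
using CSV, DataFrames, DataStreams

# Create an input file for the example.
inpath = joinpath(tempdir(), "in.csv")
open(inpath, "w") do io
    write(io, "a,b\n1,x\n2,y\n")
end

# Construct a Source over the file and stream it to a DataFrame:
source = CSV.Source(inpath)
df = Data.stream!(source, DataFrame)

# Construct a Sink and stream a Data.Source into it:
sink = CSV.Sink(joinpath(tempdir(), "copy.csv"))
Data.stream!(df, sink)
```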
CSV.Options — Type.
Represents the various configuration settings for csv file parsing.
Keyword Arguments:
- delim::Union{Char,UInt8} = how fields in the file are delimited
- quotechar::Union{Char,UInt8} = the character that indicates a quoted field that may contain the delim or newlines
- escapechar::Union{Char,UInt8} = the character that escapes a quotechar in a quoted field
- null::String = indicates how NULL values are represented in the dataset
- dateformat::Union{AbstractString,Dates.DateFormat} = how dates/datetimes are represented in the dataset
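For example, a reusable options bundle built from the keyword arguments above (the values are illustrative):

```julia
using CSV

# Bundle parsing settings once; the same opt can be passed to the
# lower-level parsing functions such as CSV.parsefield below.
opt = CSV.Options(delim=UInt8('\t'),
                  quotechar=UInt8('"'),
                  escapechar=UInt8('\\'),
                  null="NA")
```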
CSV.parsefield — Function.
CSV.parsefield{T}(io::IO, ::Type{T}, opt::CSV.Options=CSV.Options(), row=0, col=0) => Nullable{T}
io is an IO type that is positioned at the first byte/character of a delimited-file field (i.e. a single cell). Leading whitespace is ignored for Integer and Float types. Returns a Nullable{T} indicating whether the field contains a null value (empty field, missing value); a field is null if the next delimiter or newline is encountered before any other characters. Specialized methods exist for Integer, Float, String, Date, and DateTime; for other types T, a generic fallback requires parse(T, str::String) to be defined. The field value may also be wrapped in opt.quotechar; two consecutive opt.quotechar characters result in a null field. opt.null is also checked if a custom value is provided (i.e. "NA", "\N", etc.). For numeric fields, if the field is non-null and non-digit characters are encountered at any point before a delimiter or newline, an error is thrown.
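A sketch against an in-memory buffer, following the rules above (exact Nullable printing is version-dependent, so only the null/non-null behavior is annotated):

```julia
using CSV

io = IOBuffer("1,,3\n")
v1 = CSV.parsefield(io, Int)   # a Nullable{Int} holding 1
v2 = CSV.parsefield(io, Int)   # null: the delimiter comes before any digits
v3 = CSV.parsefield(io, Int)   # a Nullable{Int} holding 3
```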
CSV.readline — Function.
CSV.readline(io::IO, q='"', e='\', buf::IOBuffer=IOBuffer()) => String
CSV.readline(source::CSV.Source) => String
Reads a single line from io (any IO type) or a CSV.Source as a String, accounting for potentially embedded newlines in quoted fields (e.g. value1, value2, "value3 with embedded newlines"). Can optionally provide a buf::IOBuffer type for buffer reuse.
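For instance (an in-memory sketch; the newline inside the quotes does not terminate the line):

```julia
using CSV

io = IOBuffer("a,b,\"c\nstill c\"\nnext,row\n")
line = CSV.readline(io)   # spans the embedded newline inside the quotes
```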
CSV.readsplitline — Function.
CSV.readsplitline(io, d=',', q='"', e='\', buf::IOBuffer=IOBuffer()) => Vector{String}
CSV.readsplitline(source::CSV.Source) => Vector{String}
Reads a single line from io (any IO type) as a Vector{String}, with elements being the delimited fields (separated by the delimiter d). Can optionally provide a buf::IOBuffer type for buffer reuse.
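For instance (an in-memory sketch; the quoted comma does not split the field):

```julia
using CSV

io = IOBuffer("a,\"b,with comma\",c\n")
fields = CSV.readsplitline(io)   # three fields, not four
```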
CSV.countlines — Function.
CSV.countlines(io::IO, quotechar, escapechar) => Int
CSV.countlines(source::CSV.Source) => Int
Counts the number of lines in a file, accounting for potentially embedded newlines in quoted fields.
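A quick sketch (quote and escape characters passed explicitly, matching the signature above; the embedded newline in the quoted field is not counted as a line break):

```julia
using CSV

io = IOBuffer("h1,h2\n1,\"two\nlines\"\n3,4\n")
n = CSV.countlines(io, UInt8('"'), UInt8('\\'))
```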