CSV.jl Documentation
CSV.jl is built to be a fast and flexible pure-Julia library for handling delimited text files.
High-level interface
CSV.File — Type.
CSV.File(source::Union{String, IO}; kwargs...) => CSV.File
Read a csv input (a filename given as a String, or any other IO source), returning a CSV.File object. Opens the file and uses the passed arguments to detect the number of columns and the column types. The returned CSV.File object supports the Tables.jl interface and can iterate CSV.Rows. CSV.Row supports propertynames and getproperty to access individual row values. Note that duplicate column names will be detected and adjusted to ensure uniqueness (a duplicate column name a will become a_1). For example, one could iterate over a csv file with column names a, b, and c by doing:
for row in CSV.File(file)
    println("a=$(row.a), b=$(row.b), c=$(row.c)")
end
Because it supports the Tables.jl interface, a CSV.File can also be used as the table input to any other table sink function, for example:
# materialize a csv file as a DataFrame
df = CSV.File(file) |> DataFrame
# load a csv file directly into an sqlite database table
db = SQLite.DB()
tbl = CSV.File(file) |> SQLite.load!(db, "sqlite_table")
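The two sinks above need DataFrames and SQLite. As a smaller sketch of the same idea, the generic Tables.jl sinks can consume a CSV.File directly. The inline data here is made up for illustration, and the exact element types of the resulting columns may differ between CSV.jl versions.

```julia
using CSV, Tables

# Parse a small in-memory csv; any IO source is accepted, not only file names.
f = CSV.File(IOBuffer("a,b,c\n1,2,3\n4,5,6\n"))

# Column-oriented sink: a NamedTuple with one vector per column.
cols = Tables.columntable(f)

# Row-oriented sink: a Vector with one NamedTuple per row.
rows = Tables.rowtable(f)

println(sum(cols.a))
println(rows[2].c)
```

Any function that accepts a Tables.jl source can be used in place of these two in the same way.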
Supported keyword arguments include:
- File layout options:
  - header=1: the header argument can be an Int, indicating the row to parse for column names; or a Range, indicating a span of rows to be combined together as column names; or an entire Vector of Symbols or Strings to use as column names
  - normalizenames=false: whether column names should be "normalized" into valid Julia identifier symbols
  - datarow: an Int argument to specify the row where the data starts in the csv file; by default, the next row after the header row is used
  - skipto::Int: similar to datarow, specifies the number of rows to skip before starting to read data
  - footerskip::Int: number of rows at the end of a file to skip parsing
  - limit: an Int to indicate a limited number of rows to parse in a csv file
  - transpose::Bool: read a csv file "transposed", i.e. each column is parsed as a row
  - comment: a String that occurs at the beginning of a line to signal parsing that row should be skipped
  - use_mmap::Bool=!Sys.iswindows(): whether the file should be mmapped for reading, which in some cases can be faster
- Parsing options:
  - missingstrings, missingstring: either a String or a Vector{String} to use as sentinel values that will be parsed as missing; by default, only an empty field (two consecutive delimiters) is considered missing
  - delim=',': a Char or String that indicates how columns are delimited in a file
  - ignorerepeated::Bool=false: whether repeated (consecutive) delimiters should be ignored while parsing; useful for fixed-width files with delimiter padding between cells
  - quotechar='"', openquotechar, closequotechar: a Char (or different start and end characters) that indicates a quoted field, which may contain textual delimiters or newline characters
  - escapechar='"': the Char used to escape quote characters in a text field
  - dateformat::Union{String, Dates.DateFormat, Nothing}: a date format string to indicate how Date/DateTime columns are formatted in a delimited file
  - decimal: a Char indicating how decimals are separated in floats, e.g. 3.14 uses '.', while 3,14 uses ','
  - truestrings, falsestrings: Vectors of Strings that indicate how true or false values are represented
- Column type options:
  - types: a Vector or Dict of types to be used for column types; a Dict can map a column index (Int) or a column name (Symbol or String) to the type for that column, e.g. Dict(1=>Float64) sets the first column to Float64, while Dict(:column1=>Float64) and Dict("column1"=>Float64) both set the column named column1 to Float64
  - typemap::Dict{Type, Type}: a mapping of a type that should be replaced in every instance with another type, e.g. Dict(Float64=>String) would change every detected Float64 column to be parsed as String
  - allowmissing=:all: indicates how missing values are allowed in columns; possible values are :all (all columns may contain missings), :auto (auto-detect columns that contain missings), or :none (no columns may contain missings)
  - categorical::Union{Bool, Real}=false: if true, columns detected as String are returned as a CategoricalArray; alternatively, the proportion of unique values below which String columns should be treated as categorical (for example 0.1 for 10%)
  - strict::Bool=false: whether invalid values should throw a parsing error or be replaced with missing values
  - silencewarnings::Bool=false: whether invalid value warnings should be silenced (requires strict=false)
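To show how several of these keyword arguments combine, here is a minimal sketch that reads a semicolon-delimited input with comma decimals, an "NA" sentinel, and a commented row. The data is invented for illustration; the keyword names are the ones listed above, but defaults and the detected element types may vary between CSV.jl versions.

```julia
using CSV, Dates

# Hypothetical sample input: ';' delimiter, ',' decimal mark, "NA" for missing.
raw = """
id;price;day
# this row is skipped because it starts with the comment string
1;3,50;2019-01-02
2;NA;2019-01-03
3;7,25;2019-01-04
"""

f = CSV.File(IOBuffer(raw);
    delim = ';',                 # parsing option: column delimiter
    decimal = ',',               # parsing option: decimal separator in floats
    missingstring = "NA",        # parsing option: sentinel parsed as missing
    comment = "#",               # layout option: skip rows starting with "#"
    dateformat = "yyyy-mm-dd",   # parsing option: format of the day column
    types = Dict(:id => Int),    # type option: force the id column to Int
)

prices = [row.price for row in f]
firstday = first(f).day

println(prices)
println(firstday)
```

Passing types as a Dict leaves the remaining columns to type detection, which is usually less brittle than supplying a full Vector of types.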
CSV.validate — Function.
CSV.validate(fullpath::Union{AbstractString,IO}, sink::Type{T}=DataFrame, args...; kwargs...) => typeof(sink)
CSV.validate(fullpath::Union{AbstractString,IO}, sink::Data.Sink; kwargs...) => Data.Sink
Takes the same positional and keyword arguments as CSV.read, but provides detailed information as to why reading a csv file failed. Useful when reading fails and it is not clear whether the cause is a row with too many columns, a wrong column type, or something else.
CSV.write — Function.
CSV.write(file::Union{String, IO}, table; kwargs...) => file
table |> CSV.write(file::Union{String, IO}; kwargs...) => file
Write a Tables.jl interface input to a csv file, given as an IO argument or a String representing the file name to write to.
Keyword arguments include:
- delim::Union{Char, String}=',': a character or string to print out as the file's delimiter
- quotechar::Char='"': character to use for quoting text fields that may contain delimiters or newlines
- openquotechar::Char: instead of quotechar, use openquotechar and closequotechar to support different starting and ending quote characters
- escapechar::Char='\\': character used to escape quote characters in a text field
- missingstring::String="": string to print for missing values
- dateformat=Dates.default_format(T): the date format string to use for printing out Date & DateTime columns
- append=false: whether to append to an existing file/IO; if true, column names are not written by default
- writeheader=!append: whether to write an initial row of delimited column names; not written by default if appending
- header: pass a list of column names (Symbols or Strings) to use instead of the column names of the input table
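As a rough sketch of both call forms, the example below writes a small table and then appends a second one to the same file. The table contents and the temporary path are made up for illustration; a NamedTuple of vectors is used as the input because it already satisfies the Tables.jl interface.

```julia
using CSV

# A NamedTuple of equal-length vectors is a valid Tables.jl table.
table = (name = ["x", "y"], value = [1.5, missing])
extra = (name = ["z"], value = [2.0])

path = tempname()

# Function form: write the header and rows, printing missing values as "NA".
CSV.write(path, table; delim = ';', missingstring = "NA")

# Piped form with append=true: rows are added and no header row is written.
extra |> CSV.write(path; delim = ';', append = true)

lines = readlines(path)
foreach(println, lines)
```

When appending, the same delim should be passed again, since the keyword arguments of the first call are not stored in the file.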