Dataset

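A dataset represents a collection of data. Most dataset types are backed by files (Parquet, CSV, or JSON), but a dataset can also wrap an in-memory Arrow table or pandas DataFrame, combine several datasets as partitions, or represent the result of a SQL query.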

To create a dataset, for example one made up of all Parquet files matching a glob pattern:

dataset = ParquetDataSet("path/to/dataset/*.parquet")
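The paths argument in the example above is a glob pattern. The sketch below shows one plausible way such a pattern, together with the root_dir and recursive options listed in the signatures, could be expanded into a concrete, sorted list of files using only the standard library. resolve_paths is a hypothetical helper, and the meanings it gives root_dir and recursive are assumptions, not the library's documented behavior.

```python
import glob
import os
from typing import Iterable, List, Optional, Union


def resolve_paths(
    paths: Union[str, Iterable[str]],
    root_dir: Optional[str] = None,
    recursive: bool = False,
) -> List[str]:
    """Hypothetical helper: expand glob patterns into a sorted list of files.

    Assumptions (not the library's documented behavior):
    - relative patterns are resolved against root_dir when it is given;
    - recursive=True lets "**" match any number of nested directories.
    """
    if isinstance(paths, str):
        paths = [paths]
    matches = set()
    for pattern in paths:
        if root_dir is not None and not os.path.isabs(pattern):
            pattern = os.path.join(root_dir, pattern)
        for match in glob.glob(pattern, recursive=recursive):
            # Keep regular files only; directories matched by the pattern are skipped.
            if os.path.isfile(match):
                matches.add(match)
    # Sorting gives a deterministic file order regardless of filesystem listing order.
    return sorted(matches)
```

For instance, resolve_paths("*.parquet", root_dir="path/to/dataset") would return the Parquet files directly inside that directory, while resolve_paths("**/*.parquet", root_dir="path/to/dataset", recursive=True) would also pick up files in subdirectories.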

DataSets

DataSet(paths[, root_dir, recursive, ...])

The base class for all datasets.

FileSet(paths[, root_dir, recursive, ...])

A set of files.

ParquetDataSet(paths[, root_dir, recursive, ...])

A set of Parquet files.

CsvDataSet(paths, schema[, delim, ...])

A set of CSV files; a schema must be supplied.

JsonDataSet(paths, schema[, format, ...])

A set of JSON files; a schema must be supplied.

ArrowTableDataSet(table)

A dataset wrapping an in-memory Apache Arrow table.

PandasDataSet(df)

A dataset wrapping a pandas DataFrame.

PartitionedDataSet(datasets)

A dataset composed of multiple datasets, each acting as one partition.

SqlQueryDataSet(sql_query[, query_builder])

A dataset defined by the result of a SQL query.
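CsvDataSet requires a schema, while ParquetDataSet does not. A likely reason is that CSV stores every field as plain text, whereas Parquet files embed their own column types. The sketch below illustrates the idea with the standard csv module: a schema maps column names to Python types used to convert each field, and delim is assumed to be the field delimiter. read_csv_with_schema and the dict-based schema are hypothetical, not the library's API.

```python
import csv
import io
from typing import Callable, Dict, List, TextIO


def read_csv_with_schema(
    source: TextIO,
    schema: Dict[str, Callable[[str], object]],
    delim: str = ",",
) -> List[Dict[str, object]]:
    """Hypothetical sketch: parse CSV text and convert each column with its schema type."""
    reader = csv.DictReader(source, delimiter=delim)
    # Accessing fieldnames reads the header row; fail early if a schema column is absent.
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"columns missing from CSV header: {sorted(missing)}")
    rows = []
    for record in reader:
        # CSV yields strings only; the schema supplies each column's real type.
        # Columns that are not in the schema are dropped.
        rows.append({name: cast(record[name]) for name, cast in schema.items()})
    return rows


data = io.StringIO("id;price;name\n1;9.5;apple\n2;3.25;pear\n")
rows = read_csv_with_schema(data, {"id": int, "price": float, "name": str}, delim=";")
print(rows)  # [{'id': 1, 'price': 9.5, 'name': 'apple'}, {'id': 2, 'price': 3.25, 'name': 'pear'}]
```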
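JsonDataSet also accepts a format argument whose values are not documented on this page. One common distinction for JSON data files is between a single JSON array of records and newline-delimited JSON (JSON Lines, one object per line). The sketch below assumes format selects between those two layouts, using the hypothetical values "lines" and "array", and applies a schema that maps column names to types, mirroring the required schema parameter. read_json_records is illustrative only.

```python
import io
import json
from typing import Callable, Dict, List, TextIO


def read_json_records(
    source: TextIO,
    schema: Dict[str, Callable[[object], object]],
    format: str = "lines",
) -> List[Dict[str, object]]:
    """Hypothetical sketch: read JSON records in one of two assumed layouts.

    format="lines": newline-delimited JSON, one object per line (assumed value).
    format="array": a single JSON array of objects (assumed value).
    """
    if format == "lines":
        # Skip blank lines, e.g. a trailing newline at the end of the file.
        records = [json.loads(line) for line in source if line.strip()]
    elif format == "array":
        records = json.load(source)
    else:
        raise ValueError(f"unsupported format: {format!r}")
    # Keep only schema columns and coerce each value to its declared type.
    return [{name: cast(rec[name]) for name, cast in schema.items()} for rec in records]


schema = {"id": int, "tag": str}
lines = io.StringIO('{"id": 1, "tag": "a"}\n{"id": 2, "tag": "b"}\n')
array = io.StringIO('[{"id": 1, "tag": "a"}, {"id": 2, "tag": "b"}]')
from_lines = read_json_records(lines, schema, format="lines")
from_array = read_json_records(array, schema, format="array")
print(from_lines == from_array)  # True
```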
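PartitionedDataSet is built from a collection of other datasets. A natural way to model this is the composite pattern: the partitioned dataset holds its child datasets and yields their records partition by partition, so callers can treat it like any single dataset. The sketch below assumes a minimal, hypothetical interface in which every dataset exposes an iter_records() method; that method and the InMemoryDataSet class are illustrative only, and the real classes on this page may work differently.

```python
from abc import ABC, abstractmethod
from itertools import chain
from typing import Dict, Iterator, List, Sequence


class DataSet(ABC):
    """Hypothetical minimal interface; the real base class may differ."""

    @abstractmethod
    def iter_records(self) -> Iterator[Dict[str, object]]:
        ...


class InMemoryDataSet(DataSet):
    """Hypothetical leaf dataset backed by a list of records."""

    def __init__(self, records: List[Dict[str, object]]):
        self._records = records

    def iter_records(self) -> Iterator[Dict[str, object]]:
        return iter(self._records)


class PartitionedDataSet(DataSet):
    """Sketch of a composite dataset: yields each partition's records in order."""

    def __init__(self, datasets: Sequence[DataSet]):
        self.datasets = list(datasets)

    def iter_records(self) -> Iterator[Dict[str, object]]:
        # Each partition's iter_records() is called only when iteration reaches it.
        return chain.from_iterable(ds.iter_records() for ds in self.datasets)


jan = InMemoryDataSet([{"month": 1, "sales": 10}])
feb = InMemoryDataSet([{"month": 2, "sales": 7}, {"month": 2, "sales": 4}])
combined = PartitionedDataSet([jan, feb])
print(sum(r["sales"] for r in combined.iter_records()))  # 21
```

Because the composite is itself a DataSet in this sketch, partitions can be nested: PartitionedDataSet([combined, jan]) is valid and yields four records.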
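SqlQueryDataSet describes data by the query that produces it rather than by where the data is stored. The sketch below illustrates that idea with Python's built-in sqlite3 module: the dataset keeps the SQL text and runs it against a connection only when records are requested. The SqliteQueryDataSet class and its iter_records() method are hypothetical; this page does not say which database engine the library targets or what query_builder does, so the sketch leaves query_builder out.

```python
import sqlite3
from typing import Dict, Iterator


class SqliteQueryDataSet:
    """Hypothetical sketch: a dataset whose contents are the result of a SQL query."""

    def __init__(self, sql_query: str, connection: sqlite3.Connection):
        self.sql_query = sql_query
        self.connection = connection

    def iter_records(self) -> Iterator[Dict[str, object]]:
        # The query runs when iteration starts, so each pass sees the current table contents.
        cursor = self.connection.execute(self.sql_query)
        columns = [desc[0] for desc in cursor.description]
        for row in cursor:
            yield dict(zip(columns, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 5), ("south", 3), ("north", 2)])
totals = SqliteQueryDataSet(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region",
    conn,
)
print(list(totals.iter_records()))  # [{'region': 'north', 'total': 7}, {'region': 'south', 'total': 3}]
```

Deferring execution this way keeps the dataset cheap to construct; the trade-off is that repeated reads re-run the query.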