smallpond.logical.node.EvenlyDistributedPartitionNode

class smallpond.logical.node.EvenlyDistributedPartitionNode(ctx: Context, input_deps: Tuple[Node, ...], npartitions: int, dimension: str | None = None, nested: bool = False, *, partition_by_rows=False, random_shuffle=False, output_name: str | None = None, output_path: str | None = None, cpu_limit: int = 1, memory_limit: int | None = None)

Evenly distribute the output files or rows of input_deps into npartitions partitions.

__init__(ctx: Context, input_deps: Tuple[Node, ...], npartitions: int, dimension: str | None = None, nested: bool = False, *, partition_by_rows=False, random_shuffle=False, output_name: str | None = None, output_path: str | None = None, cpu_limit: int = 1, memory_limit: int | None = None) → None

Evenly distribute the output files or rows of input_deps into npartitions partitions.

Parameters

partition_by_rows, optional

If True, evenly distribute rows instead of input files into npartitions partitions; by default, distribution is by whole files.

random_shuffle, optional

If True, randomly shuffle the list of file paths (or of Parquet row groups when partition_by_rows=True) in the input datasets before distributing them.
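
To make the parameters concrete, below is a minimal sketch of placing this node in a logical plan. It assumes Context, DataSourceNode, and LogicalPlan from smallpond.logical.node and ParquetDataSet from smallpond.logical.dataset; the input path and partition count are illustrative.

    from smallpond.logical.dataset import ParquetDataSet
    from smallpond.logical.node import (
        Context,
        DataSourceNode,
        EvenlyDistributedPartitionNode,
        LogicalPlan,
    )

    ctx = Context()

    # Upstream node producing the files to distribute (path is illustrative).
    source = DataSourceNode(ctx, ParquetDataSet(["data/*.parquet"]))

    # Distribute rows (rather than whole files) evenly into 8 partitions,
    # shuffling the Parquet row groups first since partition_by_rows=True.
    partitioned = EvenlyDistributedPartitionNode(
        ctx,
        (source,),
        npartitions=8,
        partition_by_rows=True,
        random_shuffle=True,
    )

    plan = LogicalPlan(ctx, partitioned)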

Methods

__init__(ctx, input_deps, npartitions[, ...])

Evenly distribute the output files or rows of input_deps into npartitions partitions.

add_perf_metrics(name, value)

create_consumer_task(*args, **kwargs)

create_merge_task(*args, **kwargs)

create_producer_task(*args, **kwargs)

create_split_task(*args, **kwargs)

create_task(runtime_ctx, input_deps, ...)

get_perf_stats(name)

slim_copy()

task_factory(task_builder)

Attributes

enable_resource_boost

max_card_of_producers_x_consumers

max_num_producer_tasks

num_partitions