smallpond.logical.node.EvenlyDistributedPartitionNode#
- class smallpond.logical.node.EvenlyDistributedPartitionNode(ctx: Context, input_deps: Tuple[Node, ...], npartitions: int, dimension: str | None = None, nested: bool = False, *, partition_by_rows=False, random_shuffle=False, output_name: str | None = None, output_path: str | None = None, cpu_limit: int = 1, memory_limit: int | None = None)#
Evenly distribute the output files or rows of input_deps into n partitions.
- __init__(ctx: Context, input_deps: Tuple[Node, ...], npartitions: int, dimension: str | None = None, nested: bool = False, *, partition_by_rows=False, random_shuffle=False, output_name: str | None = None, output_path: str | None = None, cpu_limit: int = 1, memory_limit: int | None = None) None#
Evenly distribute the output files or rows of input_deps into n partitions.
Parameters#
- partition_by_rows, optional
If True, evenly distribute rows instead of input files into npartitions partitions; by default, distribution is by files.
- random_shuffle, optional
If True, randomly shuffle the list of paths (or parquet row groups, if partition_by_rows=True) in the input datasets.
Methods#
- __init__(ctx, input_deps, npartitions[, ...]) — Evenly distribute the output files or rows of input_deps into n partitions.
- add_perf_metrics(name, value)
- create_consumer_task(*args, **kwargs)
- create_merge_task(*args, **kwargs)
- create_producer_task(*args, **kwargs)
- create_split_task(*args, **kwargs)
- create_task(runtime_ctx, input_deps, ...)
- get_perf_stats(name)
- slim_copy()
- task_factory(task_builder)

Attributes#
- enable_resource_boost
- max_card_of_producers_x_consumers
- max_num_producer_tasks
- num_partitions
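The even distribution described above can be sketched as round-robin assignment. This is a simplified illustration of the semantics, not the library's actual implementation; `evenly_distribute` is a hypothetical helper name:

```python
import random

def evenly_distribute(items, npartitions, random_shuffle=False, seed=0):
    """Round-robin a list of items (file paths, or parquet row groups when
    partitioning by rows) into npartitions near-equal partitions.

    With random_shuffle=True, the item order is shuffled first, mirroring
    the random_shuffle parameter documented above.
    """
    items = list(items)
    if random_shuffle:
        # Deterministic shuffle for reproducibility in this sketch.
        random.Random(seed).shuffle(items)
    partitions = [[] for _ in range(npartitions)]
    for i, item in enumerate(items):
        partitions[i % npartitions].append(item)
    return partitions

# Example: 5 input files spread across 2 partitions; partition sizes
# differ by at most one item.
parts = evenly_distribute([f"part-{i}.parquet" for i in range(5)], 2)
```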