smallpond.logical.node.UserPartitionedDataSourceNode#
- class smallpond.logical.node.UserPartitionedDataSourceNode(ctx: Context, partitioned_datasets: List[DataSet], dimension: str | None = None)#
- __init__(ctx: Context, partitioned_datasets: List[DataSet], dimension: str | None = None) None #
Partition the outputs of input_deps into n partitions.
Parameters#
- npartitions
The dataset is split and distributed across npartitions partitions.
- dimension
The unique partition dimension. Required if this is a nested partition.
- nested, optional
If true, npartitions subpartitions are created in each existing partition of input_deps.
Examples#
See unit tests in test/test_partition.py. For nested partitioning, see test_nested_partition. For the motivation behind nested partitioning, see Section 5.1 (Partial Partitioning) of [Advanced partitioning techniques for massively distributed computation](https://dl.acm.org/doi/10.1145/2213836.2213839).
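Nested (partial) partitioning can be illustrated without the smallpond API. The sketch below is a conceptual assumption, not smallpond code: `hash_partition` and `nested_partition` are hypothetical helpers that show how each partition along one dimension is subpartitioned along a second dimension.

```python
# Conceptual sketch of nested (partial) partitioning.
# NOTE: hash_partition and nested_partition are illustrative helpers,
# not part of the smallpond API.
from typing import Any, Dict, List


def hash_partition(rows: List[Dict[str, Any]], key: str, n: int) -> List[List[Dict[str, Any]]]:
    """Split rows into n partitions by hashing the value of `key`."""
    parts: List[List[Dict[str, Any]]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def nested_partition(
    rows: List[Dict[str, Any]],
    outer_key: str,
    inner_key: str,
    n_outer: int,
    n_inner: int,
) -> List[List[List[Dict[str, Any]]]]:
    """Partition by outer_key, then subpartition each resulting partition
    by inner_key, yielding n_outer * n_inner subpartitions in total."""
    return [
        hash_partition(part, inner_key, n_inner)
        for part in hash_partition(rows, outer_key, n_outer)
    ]


rows = [{"user": u, "day": d} for u in range(8) for d in range(4)]
nested = nested_partition(rows, "user", "day", n_outer=4, n_inner=2)

# Every row lands in exactly one subpartition.
assert len(nested) == 4
assert all(len(part) == 2 for part in nested)
assert sum(len(sub) for part in nested for sub in part) == len(rows)
```

This is the shape test_nested_partition exercises: the outer partitioning already exists, and the nested flag only subdivides each existing partition rather than reshuffling the whole dataset.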
Methods#
- __init__(ctx, partitioned_datasets[, dimension])
Partition the outputs of input_deps into n partitions.
- add_perf_metrics(name, value)
- create_consumer_task(*args, **kwargs)
- create_merge_task(*args, **kwargs)
- create_producer_task(*args, **kwargs)
- create_split_task(*args, **kwargs)
- create_task(runtime_ctx, input_deps, ...)
- get_perf_stats(name)
- partition(runtime_ctx, dataset)
- slim_copy()
- task_factory(task_builder)
Attributes#
- enable_resource_boost
- max_card_of_producers_x_consumers
- max_num_producer_tasks
- num_partitions