smallpond.logical.node.Node

class smallpond.logical.node.Node(ctx: Context, input_deps: Tuple[Node, ...], output_name: str | None = None, output_path: str | None = None, cpu_limit: int = 1, gpu_limit: float = 0, memory_limit: int | None = None)

The base class for all nodes.

__init__(ctx: Context, input_deps: Tuple[Node, ...], output_name: str | None = None, output_path: str | None = None, cpu_limit: int = 1, gpu_limit: float = 0, memory_limit: int | None = None) → None

The base class for all nodes in a logical plan.

Parameters

ctx

The context of the logical plan.

input_deps

The upstream nodes that provide the inputs of this node.

output_name, optional

The prefix of output directories and filenames for tasks generated from this node. The default output_name is the class name of the task created for this node, e.g. HashPartitionTask, SqlEngineTask, PythonScriptTask, etc.

The output_name string may contain only alphanumeric characters and underscores. In other words, it must match the regular expression [a-zA-Z0-9_]+.
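The naming rule above can be checked before a node is constructed. The following is a minimal sketch that uses only the Python standard library; the helper name is_valid_output_name is hypothetical and is not part of smallpond, and whether smallpond performs an equivalent check internally is not stated here.

```python
import re

# Pattern quoted from the output_name description above.
OUTPUT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_]+")


def is_valid_output_name(name: str) -> bool:
    """Return True if `name` consists solely of letters, digits and underscores.

    Hypothetical helper for illustration; not a smallpond API.
    """
    # fullmatch requires the whole string to match, so empty strings and
    # names containing '-', '.', '/' or whitespace are rejected.
    return OUTPUT_NAME_PATTERN.fullmatch(name) is not None
```

For example, is_valid_output_name("hash_partition_01") holds, while a name such as "my-output" is rejected because of the hyphen.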

If output_name is set and output_path is None, the path format of output files is: {job_root_path}/output/{output_name}/{task_runtime_id}/{output_name}-{task_runtime_id}-{seqnum}.parquet where {task_runtime_id} is defined as {job_id}.{task_id}.{sched_epoch}.{task_retry_count}.

output_path, optional

The absolute path of a customized output folder for tasks generated from this node. Any shared folder accessible to both the executor and the scheduler is allowed, although I/O performance varies across filesystems.

If both output_name and output_path are specified, the path format of output files is: {output_path}/{output_name}/{task_runtime_id}/{output_name}-{task_runtime_id}-{seqnum}.parquet where {task_runtime_id} is defined as {job_id}.{task_id}.{sched_epoch}.{task_retry_count}.
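To make the two path formats concrete, the sketch below assembles them with plain string formatting. The function names are hypothetical and not part of smallpond. The argument types and the rendering of {seqnum} without padding are assumptions, since only the template itself is documented. The case where output_path is set but output_name is not is not covered, because its format is not described above.

```python
from typing import Optional


def task_runtime_id(job_id: str, task_id: int, sched_epoch: int, task_retry_count: int) -> str:
    """Build {job_id}.{task_id}.{sched_epoch}.{task_retry_count} (argument types are assumed)."""
    return f"{job_id}.{task_id}.{sched_epoch}.{task_retry_count}"


def output_file_path(
    job_root_path: str,
    output_name: str,
    runtime_id: str,
    seqnum: int,
    output_path: Optional[str] = None,
) -> str:
    """Fill in the documented path template (hypothetical helper, not a smallpond API)."""
    # Without output_path, files go under {job_root_path}/output;
    # with output_path, that folder replaces the default base directory.
    base = f"{job_root_path}/output" if output_path is None else output_path
    # seqnum is rendered as a plain integer here; any zero padding is unknown.
    return f"{base}/{output_name}/{runtime_id}/{output_name}-{runtime_id}-{seqnum}.parquet"
```

Only the base directory differs between the two cases; the {output_name}/{task_runtime_id}/ subdirectories and the file name are the same in both templates.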

cpu_limit, optional

The maximum number of CPUs that tasks generated from this node are expected to use. This is a resource requirement specified by the user and used to guide task scheduling. smallpond does NOT enforce this limit.

gpu_limit, optional

The maximum number of GPUs that tasks generated from this node are expected to use. This is a resource requirement specified by the user and used to guide task scheduling. smallpond does NOT enforce this limit.

memory_limit, optional

The maximum amount of memory that tasks generated from this node are expected to use. If not specified, the memory limit is calculated automatically from the memory-to-CPU ratio of the executor machine. This is a resource requirement specified by the user and used to guide task scheduling. smallpond does NOT enforce this limit.
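The exact formula behind the automatic memory limit is not given here. One plausible reading, shown purely as an illustration, is that a task receives memory in proportion to its share of the executor's CPUs. The function name, its parameters and the use of bytes as the unit are all assumptions, not smallpond's actual implementation.

```python
def default_memory_limit(cpu_limit: int, executor_cpus: int, executor_memory_bytes: int) -> int:
    """Illustrative default: scale memory by the executor's memory-to-CPU ratio.

    Assumption: memory is measured in bytes and divided evenly per CPU.
    This is a sketch, not the formula smallpond uses.
    """
    if executor_cpus <= 0:
        raise ValueError("executor_cpus must be positive")
    # Memory available per CPU on the executor machine.
    memory_per_cpu = executor_memory_bytes // executor_cpus
    # A task asking for cpu_limit CPUs gets the matching share of memory.
    return cpu_limit * memory_per_cpu
```

Under this reading, a task with cpu_limit=2 on an executor with 8 CPUs and 64 GiB of memory would default to a 16 GiB limit.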

Methods

__init__(ctx, input_deps[, output_name, ...])

The base class for all nodes in a logical plan.

add_perf_metrics(name, value)

create_task(runtime_ctx, input_deps, ...)

get_perf_stats(name)

slim_copy()

task_factory(task_builder)

Attributes

enable_resource_boost

num_partitions