smallpond.logical.node.DataSinkNode

class smallpond.logical.node.DataSinkNode(ctx: Context, input_deps: Tuple[Node, ...], output_path: str, type: Literal['link', 'copy', 'link_or_copy', 'manifest'] = 'link', manifest_only=False, is_final_node=False)

Collects the output files of input_deps into output_path. Depending on type, it creates hard links, symbolic links, or a manifest file, or copies the files.

__init__(ctx: Context, input_deps: Tuple[Node, ...], output_path: str, type: Literal['link', 'copy', 'link_or_copy', 'manifest'] = 'link', manifest_only=False, is_final_node=False) -> None

Construct a DataSinkNode. See Node.__init__() for descriptions of the other parameters.

Parameters

output_path

The absolute path of a custom output folder. If set to None, an output folder is created under the default output root. Any shared folder accessible to both the executor and the scheduler is allowed, although I/O performance varies across filesystems.

type, optional

The operation type of the sink node.

“link” (default): If an output file is under the same mount point as output_path, creates a hard link; otherwise creates a symbolic link.

“copy”: Copies files to the output path.

“link_or_copy”: If an output file is under the same mount point as output_path, creates a hard link; otherwise copies the file.

“manifest”: Creates a manifest file under output_path. Every line of the manifest file is a path string.
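The placement rules for each type can be illustrated with a small standalone sketch. This is not smallpond's implementation: `same_mount_point`, `place_file`, `write_manifest`, and the manifest file name `manifest.txt` are hypothetical names chosen for illustration, and comparing `st_dev` values is only an assumed approximation of "under the same mount point".

```python
import os
import shutil
from typing import Iterable


def same_mount_point(path_a: str, path_b: str) -> bool:
    """Assumption: two paths share a mount point if their device IDs match."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def place_file(src: str, output_dir: str, sink_type: str = "link") -> str:
    """Place one output file into output_dir according to the sink type."""
    dst = os.path.join(output_dir, os.path.basename(src))
    if sink_type == "copy":
        shutil.copy2(src, dst)
    elif sink_type == "link":
        if same_mount_point(src, output_dir):
            os.link(src, dst)      # hard link on the same mount point
        else:
            os.symlink(src, dst)   # symbolic link across mount points
    elif sink_type == "link_or_copy":
        if same_mount_point(src, output_dir):
            os.link(src, dst)      # hard link on the same mount point
        else:
            shutil.copy2(src, dst) # fall back to a copy across mount points
    else:
        raise ValueError(f"unsupported sink type: {sink_type!r}")
    return dst


def write_manifest(srcs: Iterable[str], output_dir: str,
                   name: str = "manifest.txt") -> str:
    """Write one path string per line; the file name is a hypothetical choice."""
    manifest_path = os.path.join(output_dir, name)
    with open(manifest_path, "w") as f:
        for src in srcs:
            f.write(os.path.abspath(src) + "\n")
    return manifest_path
```

In this sketch, “link” and “link_or_copy” differ only in the cross-mount fallback: a symbolic link keeps pointing at the original file, whereas a copy is independent of it.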

manifest_only, optional, deprecated

If True, sets type to “manifest”. Deprecated; pass type=“manifest” instead.
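One way to read the interaction between type and the deprecated manifest_only flag is sketched below. `resolve_sink_type` is a hypothetical helper, not part of smallpond, and both the precedence of manifest_only over type and the deprecation warning are assumptions made for illustration.

```python
import warnings

VALID_SINK_TYPES = ("link", "copy", "link_or_copy", "manifest")


def resolve_sink_type(type: str = "link", manifest_only: bool = False) -> str:
    """Return the effective sink type for the given arguments."""
    if type not in VALID_SINK_TYPES:
        raise ValueError(f"invalid sink type: {type!r}")
    if manifest_only:
        # Assumption: the deprecated flag overrides whatever type was passed.
        warnings.warn(
            "manifest_only is deprecated; pass type='manifest' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return "manifest"
    return type
```

Under these assumptions, new code would pass type="manifest" directly and leave manifest_only at its default.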

Methods

__init__(ctx, input_deps, output_path[, ...])

Construct a DataSinkNode.

add_perf_metrics(name, value)

create_task(*args, **kwargs)

get_perf_stats(name)

slim_copy()

task_factory(task_builder)

Attributes

enable_resource_boost

num_partitions