smallpond.execution.task.PartitionConsumerTask#

class smallpond.execution.task.PartitionConsumerTask(ctx: RuntimeContext, input_deps: List[PartitionProducerTask], partition_infos: List[PartitionInfo])#
__init__(ctx: RuntimeContext, input_deps: List[PartitionProducerTask], partition_infos: List[PartitionInfo]) None#

Methods

__init__(ctx, input_deps, partition_infos)

add_elapsed_time([metric_name])

Start or stop the timer.

adjust_row_group_size(nbytes, num_rows[, ...])

clean_complex_attrs()

clean_output([force])

cleanup()

Called after run() even if there is an exception.

compute_avg_row_size(nbytes, num_rows)

dump()

exec([cq])

finalize()

Called after run() to finalize the processing.

get_partition_info(dimension)

initialize()

Called before run() to prepare for running the task.

inject_fault()

merge_metrics(metrics)

oom([nonzero_exitcode_as_oom])

parquet_kv_metadata_bytes([extra_partitions])

parquet_kv_metadata_str([extra_partitions])

random_float()

random_uint32()

run()

run_on_ray()

Run the task on Ray.

set_memory_limit(soft_limit, hard_limit)

Attributes

last_partition

allow_speculative_exec

Whether the task is allowed to be executed by speculative execution.

any_input_empty

cpu_limit

ctx

dataset

default_output_name

elapsed_time

exception

exec_cq

exec_id

exec_on_scheduler

fail_count

final_output_abspath

finish_time

gpu_limit

id

input_datasets

input_deps

key

local_gpu

Return the first GPU granted to this task.

local_gpu_ranks

Return all GPU ranks granted to this task.

local_rank

Return the first GPU rank granted to this task.

location

memory_limit

node_id

numa_node

numpy_random_gen

output

output_deps

output_dirname

output_filename

output_name

output_root

partition_dims

partition_infos

partition_infos_as_dict

perf_metrics

perf_profile

python_random_gen

random_seed_bytes

ray_dataset_path

The path of a pickle file that contains the output dataset of the task.

ray_marker_path

The path of an empty file that is used to determine if the task has been started.

retry_count

runtime_id

runtime_output_abspath

Output data will be produced in this directory at runtime.

runtime_state

sched_epoch

self_contained_output

Whether the output of this node is not dependent on any input nodes.

skip_when_any_input_empty

staging_root

If the task has a special output directory, its runtime output directory will be under it.

start_time

status

temp_abspath

temp_output

Whether the output of this node is stored in a temporary directory.

uniform_failure_prob