smallpond.dataframe.DataFrame.map

DataFrame.map(sql_or_func: str | Callable[[Dict[str, Any]], Dict[str, Any]], *, schema: Schema | None = None, **kwargs) DataFrame

Apply a SQL expression or a Python function to each row.

Parameters

sql_or_func

A SQL expression or a function to apply to each row. A function should take a dictionary of columns as input and return a dictionary of columns. A SQL expression is preferred because it is more efficient.

schema, optional

The schema of the output DataFrame. If not passed, it is inferred from the first row of the mapped values.

udfs, optional

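A list of user-defined functions that the SQL expression can reference. As the examples show, an entry can be a Python function decorated with @udf or a path to a DuckDB extension file.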
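To make the row-function contract and the first-row schema inference concrete, here is a minimal sketch in plain Python. It does not call smallpond. The helper `infer_schema_from_first_row` is hypothetical and only imitates the documented behavior by recording Python type names; smallpond's actual inference logic and its `Schema` type may differ.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def infer_schema_from_first_row(
    func: Callable[[Row], Row], rows: List[Row]
) -> Dict[str, str]:
    """Hypothetical helper (not part of smallpond): apply func to the first
    row and record the Python type name of each output column."""
    first_output = func(rows[0])
    return {name: type(value).__name__ for name, value in first_output.items()}


# A row function takes a dict of columns and returns a dict of columns.
def sum_and_ratio(row: Row) -> Row:
    return {"c": row["a"] + row["b"], "ratio": row["a"] / row["b"]}


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
inferred = infer_schema_from_first_row(sum_and_ratio, rows)
print(inferred)
```

Under this assumption, an unrepresentative first row (for example one that yields a null value) would give a misleading result, which is one reason to pass `schema` explicitly.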

Examples

df = df.map('a, b')
df = df.map('a + b as c')
df = df.map(lambda row: {'c': row['a'] + row['b']})
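The SQL form `'a + b as c'` and the lambda form above are meant to produce the same single column `c`. The sketch below emulates both on two sample rows without smallpond. It assumes the SQL string is used as a select list over the input rows, and it uses the standard-library `sqlite3` module purely as a stand-in for the SQL engine; the table name `t` and the sample data are illustrative.

```python
import sqlite3

# Sample rows standing in for a DataFrame with integer columns a and b.
rows = [{"a": 1, "b": 2}, {"a": 10, "b": 5}]


# Function form: one dict of columns in, one dict of columns out.
def add_columns(row):
    return {"c": row["a"] + row["b"]}


func_result = [add_columns(r) for r in rows]

# SQL form, emulated with sqlite3 (assumption: the expression acts as a
# select list evaluated once per input row).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (:a, :b)", rows)
sql_result = [
    dict(r) for r in conn.execute("SELECT a + b AS c FROM t ORDER BY rowid")
]
conn.close()

print(func_result)
print(sql_result)
```

Both lists contain only the column `c`; the input columns `a` and `b` are not carried over unless they are selected or returned explicitly.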

Use user-defined functions in a SQL expression:

@udf(params=[UDFType.INT, UDFType.INT], return_type=UDFType.INT)
def gcd(a: int, b: int) -> int:
    while b:
        a, b = b, a % b
    return a

# Load a Python UDF.
df = df.map('gcd(a, b)', udfs=[gcd])

# Load a UDF from a DuckDB extension.
df = df.map('gcd(a, b)', udfs=['path/to/udf.duckdb_extension'])
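Before registering a Python UDF, it can help to check its body as an ordinary function. The sketch below repeats the `gcd` logic without the `@udf` decorator and compares it with the standard library's `math.gcd`; the sample pairs are illustrative and no smallpond call is involved.

```python
import math


def gcd(a: int, b: int) -> int:
    # Euclid's algorithm, same body as the UDF above, minus the decorator.
    while b:
        a, b = b, a % b
    return a


pairs = [(12, 18), (17, 5), (0, 7), (42, 0)]
results = [gcd(a, b) for a, b in pairs]
expected = [math.gcd(a, b) for a, b in pairs]
print(results)
```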