smallpond.dataframe.DataFrame.map
- DataFrame.map(sql_or_func: str | Callable[[Dict[str, Any]], Dict[str, Any]], *, schema: Schema | None = None, **kwargs) → DataFrame
Apply a function to each row.
Parameters
- sql_or_func
A SQL expression or a function to apply to each row. A function should take a dictionary of columns as input and return a dictionary of columns. A SQL expression is preferred because it is more efficient.
- schema, optional
The schema of the output DataFrame. If omitted, the schema is inferred from the first row of the mapped values.
- udfs, optional
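A list of user-defined functions that can be referenced in the SQL expression, passed as a keyword argument. As the examples show, an entry can be a Python function decorated with @udf or a path to a DuckDB extension.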
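The function form and the first-row schema inference described above can be pictured with a small plain-Python sketch. It does not call smallpond; `map_rows` and `infer_schema` are hypothetical helpers introduced only for illustration, and the real implementation may differ.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def map_rows(rows: List[Row], func: Callable[[Row], Row]) -> List[Row]:
    """Apply func to every row (a stand-in for the function form of map)."""
    return [func(row) for row in rows]


def infer_schema(rows: List[Row]) -> Dict[str, str]:
    """Guess column names and types from the first row only (illustrative)."""
    if not rows:
        return {}
    return {name: type(value).__name__ for name, value in rows[0].items()}


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]

# Each row goes in as a dict of columns and comes out as a dict of columns.
mapped = map_rows(rows, lambda row: {"c": row["a"] + row["b"]})
schema = infer_schema(mapped)

print(mapped)  # [{'c': 3}, {'c': 7}]
print(schema)  # {'c': 'int'}
```

In this sketch, a first row whose value is None or whose type differs from later rows would produce a misleading schema, which is one reason to pass schema explicitly when the output types are known.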
Examples
df = df.map('a, b')
df = df.map('a + b as c')
df = df.map(lambda row: {'c': row['a'] + row['b']})
Use user-defined functions in a SQL expression:
@udf(params=[UDFType.INT, UDFType.INT], return_type=UDFType.INT)
def gcd(a: int, b: int) -> int:
    while b:
        a, b = b, a % b
    return a

# load python udf
df = df.map('gcd(a, b)', udfs=[gcd])

# load udf from duckdb extension
df = df.map('gcd(a, b)', udfs=['path/to/udf.duckdb_extension'])
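Because the body of a Python UDF is ordinary Python, it can be checked on its own before it is registered. The sketch below repeats the gcd body without the @udf decorator (so it runs without smallpond) and compares it against the standard library's math.gcd. The sample pairs are arbitrary values chosen for illustration.

```python
import math


def gcd(a: int, b: int) -> int:
    # Euclidean algorithm: replace (a, b) with (b, a mod b) until b is 0.
    while b:
        a, b = b, a % b
    return a


# Arbitrary sample inputs, including a zero operand.
pairs = [(12, 18), (17, 5), (0, 9), (48, 36)]
results = [gcd(a, b) for a, b in pairs]

# Cross-check against the standard library implementation.
assert results == [math.gcd(a, b) for a, b in pairs]
print(results)  # [6, 1, 9, 12]
```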