smallpond.dataframe.Session.partial_sql
- Session.partial_sql(query: str, *inputs: DataFrame, **kwargs) → DataFrame
Execute a SQL query on each partition of the input DataFrames.
The query can contain placeholders such as {0} and {1}, which refer to the input DataFrames by position. If multiple DataFrames are provided, they must all have the same number of partitions.
Examples
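Join two datasets. Because the query runs on each partition separately, both inputs must be partitioned by the join key so that matching rows land in the same partition.
    # `sp` is assumed to be an existing Session.
    a = sp.read_parquet("a/*.parquet").repartition(10, hash_by="id")
    b = sp.read_parquet("b/*.parquet").repartition(10, hash_by="id")
    # Alias the placeholders so that a.id and b.id resolve in the join condition.
    c = sp.partial_sql("select * from {0} as a join {1} as b on a.id = b.id", a, b)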
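To illustrate why the join key has to be partitioned consistently, here is a minimal pure-Python sketch of a partition-wise join. It does not use smallpond and is not how smallpond is implemented; the helper names (`hash_partition`, `round_robin_partition`, `partitionwise_join`) and the sample rows are hypothetical. It only models the idea that partition i of one input is joined with partition i of the other.

```python
from collections import defaultdict


def hash_partition(rows, npartitions, key):
    """Split rows into npartitions buckets by hashing the key column."""
    parts = [[] for _ in range(npartitions)]
    for row in rows:
        parts[hash(row[key]) % npartitions].append(row)
    return parts


def round_robin_partition(rows, npartitions):
    """Split rows into npartitions buckets by position, ignoring the key."""
    parts = [[] for _ in range(npartitions)]
    for i, row in enumerate(rows):
        parts[i % npartitions].append(row)
    return parts


def partitionwise_join(left_parts, right_parts, key):
    """Join partition i of the left input only with partition i of the right."""
    if len(left_parts) != len(right_parts):
        raise ValueError("inputs must have the same number of partitions")
    out = []
    for lpart, rpart in zip(left_parts, right_parts):
        # Build a lookup of right-side rows for this partition only.
        index = defaultdict(list)
        for r in rpart:
            index[r[key]].append(r)
        for l in lpart:
            for r in index.get(l[key], []):
                out.append({**l, **r})
    return out


a_rows = [{"id": i, "x": i * 10} for i in range(8)]
b_rows = [{"id": i, "y": i * 100} for i in range(8)]

# Both sides hashed by "id": equal keys always share a partition index.
good = partitionwise_join(
    hash_partition(a_rows, 4, "id"), hash_partition(b_rows, 4, "id"), "id"
)

# Partitioned by position instead: equal keys can end up in different partitions.
bad = partitionwise_join(
    round_robin_partition(a_rows, 4),
    round_robin_partition(list(reversed(b_rows)), 4),
    "id",
)

print(len(good), len(bad))
```

With hash partitioning on both sides, every `id` finds its match. With position-based partitioning, matching rows can sit in different partitions, and the per-partition join silently drops them.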