smallpond.dataframe.Session.partial_sql

Session.partial_sql(query: str, *inputs: DataFrame, **kwargs) → DataFrame

Execute a SQL query on each partition of the input DataFrames.

The query can contain placeholders {0}, {1}, etc., which refer to the input DataFrames in the order they are passed. If multiple DataFrames are provided, they must have the same number of partitions.
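The per-partition behavior described above can be illustrated without smallpond. The following is a minimal sketch under stated assumptions, not smallpond's implementation: it uses Python's built-in sqlite3 as a stand-in for the real SQL engine, models a DataFrame as a list of partitions (each a list of (id, value) rows), and the names partial_sql_sketch, input_0, and input_1 are hypothetical.

```python
import sqlite3

def partial_sql_sketch(query, *inputs):
    """Run `query` once per partition index; return one result list per partition.

    Each input is a list of partitions; each partition is a list of (id, value) rows.
    Hypothetical stand-in for illustration only, not smallpond's implementation.
    """
    counts = {len(parts) for parts in inputs}
    if len(counts) != 1:
        raise ValueError("all inputs must have the same number of partitions")
    results = []
    for i in range(counts.pop()):
        con = sqlite3.connect(":memory:")
        names = []
        for k, parts in enumerate(inputs):
            name = f"input_{k}"  # hypothetical table name substituted for {k}
            con.execute(f"CREATE TABLE {name} (id INTEGER, value INTEGER)")
            con.executemany(f"INSERT INTO {name} VALUES (?, ?)", parts[i])
            names.append(name)
        # Placeholders {0}, {1}, ... are filled positionally with the table names.
        results.append(con.execute(query.format(*names)).fetchall())
        con.close()
    return results

# One input with two partitions; the query sees only one partition at a time.
df = [[(1, 10), (1, 20)], [(2, 5)]]
out = partial_sql_sketch("SELECT id, SUM(value) FROM {0} GROUP BY id", df)
```

In this sketch the aggregate is correct only because all rows for a given id sit in the same partition; the query never sees rows from other partitions.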

Examples

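Join two datasets. Because the query runs independently on each partition, both inputs must be hash-partitioned on the join key with the same number of partitions, so that rows with matching keys land in the same partition.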

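import smallpond
sp = smallpond.init()
a = sp.read_parquet("a/*.parquet").repartition(10, hash_by="id")
b = sp.read_parquet("b/*.parquet").repartition(10, hash_by="id")
# Alias the placeholders so that a.id and b.id resolve inside the query.
c = sp.partial_sql("select * from {0} as a join {1} as b on a.id = b.id", a, b)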
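Why the join key has to be partitioned consistently can be seen in a small pure-Python sketch. This is an illustration under assumptions, not smallpond code: rows are (id, value) tuples, id % n stands in for whatever hash function the library actually uses, and the helper names are hypothetical.

```python
def hash_partition(rows, n):
    """Split (id, value) rows into n partitions by id modulo n (stand-in hash)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[0] % n].append(row)
    return parts

def round_robin_partition(rows, n):
    """Split rows into n partitions by arrival order, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def join_per_partition(a_parts, b_parts):
    """Join partition i of a only with partition i of b, as a per-partition query would."""
    out = []
    for pa, pb in zip(a_parts, b_parts):
        out.extend((ida, va, vb) for ida, va in pa for idb, vb in pb if ida == idb)
    return sorted(out)

a_rows = [(1, "a1"), (2, "a2"), (3, "a3"), (4, "a4")]
b_rows = [(4, "b4"), (3, "b3"), (2, "b2"), (1, "b1")]

# Reference result: a join over the whole, unpartitioned data.
full = sorted((ida, va, vb) for ida, va in a_rows for idb, vb in b_rows if ida == idb)

# Partitioned on the join key: matching ids always share a partition index.
co_partitioned = join_per_partition(hash_partition(a_rows, 2), hash_partition(b_rows, 2))

# Partitioned without regard to the key: matching ids can be separated.
mismatched = join_per_partition(round_robin_partition(a_rows, 2), round_robin_partition(b_rows, 2))
```

Here co_partitioned equals the full join, while mismatched silently drops rows whose partners ended up in a different partition.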