smallpond.dataframe.DataFrame.repartition
- DataFrame.repartition(npartitions: int, hash_by: str | List[str] | None = None, by: str | None = None, by_rows: bool = False, **kwargs) → DataFrame
Repartition the data into the given number of partitions.
Parameters
- npartitions
The number of partitions to split and distribute the dataset into. The signature gives this parameter no default value, so it must always be passed.
- hash_by, optional
If specified, rows are assigned to partitions by hashing the value(s) of the given column or list of columns.
- by, optional
If specified, rows are assigned to partitions according to the values of the given column.
- by_rows, optional
If True, the dataset is distributed evenly by rows instead of by files.
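To make the hash_by mode concrete, here is a conceptual sketch in plain Python over a list of dict rows. The function name hash_partition and the choice of CRC32 as the hash are assumptions for illustration only; this is not smallpond's implementation or its hash function. What it shows is the property hash partitioning provides: every row with the same key lands in the same partition.

```python
import zlib
from typing import Dict, List, Sequence, Union

Row = Dict[str, object]


def hash_partition(
    rows: Sequence[Row], npartitions: int, hash_by: Union[str, List[str]]
) -> List[List[Row]]:
    """Hypothetical helper: bucket rows by a hash of the hash_by column values.

    Conceptual sketch only, not smallpond's implementation.
    """
    # Accept a single column name or a list of names, mirroring hash_by's type.
    columns = [hash_by] if isinstance(hash_by, str) else list(hash_by)
    partitions: List[List[Row]] = [[] for _ in range(npartitions)]
    for row in rows:
        # Build a deterministic byte key from the selected column values.
        # Python's built-in hash() of str is salted per process, so CRC32 is
        # used here to keep the assignment stable across runs.
        key = "\x1f".join(repr(row[c]) for c in columns).encode("utf-8")
        partitions[zlib.crc32(key) % npartitions].append(row)
    return partitions


if __name__ == "__main__":
    rows = [{"host": h, "bytes": i} for i, h in enumerate(["a", "b", "a", "c", "b", "a"])]
    for idx, part in enumerate(hash_partition(rows, 4, "host")):
        print(idx, [r["host"] for r in part])
```

Because the partition index depends only on the key, partitions may end up unequal in size (skewed keys produce skewed partitions); that is the trade-off for co-locating equal keys.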
Examples
    df = df.repartition(10)                  # evenly distributed
    df = df.repartition(10, by_rows=True)    # evenly distributed by rows
    df = df.repartition(10, hash_by='host')  # hash partitioned
    df = df.repartition(10, by='bucket')     # partitioned by column
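The difference between the first two examples matters when input files vary in size. The sketch below, in plain Python, contrasts distributing whole files with distributing rows. The helper names are hypothetical, and the round-robin file assignment is an assumption chosen for illustration; smallpond may assign files to partitions differently.

```python
from typing import List, Sequence


def distribute_files(file_row_counts: Sequence[int], npartitions: int) -> List[List[int]]:
    """Hypothetical: assign whole files round-robin; returns file indices per partition."""
    partitions: List[List[int]] = [[] for _ in range(npartitions)]
    for i in range(len(file_row_counts)):
        partitions[i % npartitions].append(i)
    return partitions


def distribute_rows(total_rows: int, npartitions: int) -> List[range]:
    """Hypothetical: split rows [0, total_rows) into contiguous ranges whose sizes differ by at most one."""
    base, extra = divmod(total_rows, npartitions)
    ranges: List[range] = []
    start = 0
    for p in range(npartitions):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    counts = [1000, 10, 10, 10]  # rows per input file (made-up sizes)
    by_files = distribute_files(counts, 2)
    print([sum(counts[i] for i in p) for p in by_files])  # [1010, 20]
    print([len(r) for r in distribute_rows(sum(counts), 2)])  # [515, 515]
```

Distributing by files balances the number of files per partition but not the number of rows, whereas distributing by rows balances row counts at the cost of splitting files across partitions.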