
DataFrame.repartition(npartitions: int, hash_by: str | List[str] | None = None, by: str | None = None, by_rows: bool = False, **kwargs) → DataFrame

Repartition the data into the given number of partitions.



Parameters

npartitions

The dataset is split and distributed across npartitions partitions. If not specified, the number of partitions is the default partition size of the context.

hash_by, optional

If specified, the dataset is repartitioned by the hash of the given column or columns.

by, optional

If specified, the dataset is repartitioned by the given column.

by_rows, optional

If True, the dataset is repartitioned by rows instead of by files.


Examples

df = df.repartition(10)                 # evenly distributed
df = df.repartition(10, by_rows=True)   # evenly distributed by rows
df = df.repartition(10, hash_by='host') # hash partitioned
df = df.repartition(10, by='bucket')    # partitioned by column
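
To make the difference between the by_rows and hash_by modes concrete, the following is a minimal, library-independent sketch in plain Python. It is not this library's implementation: the helper names partition_by_rows and partition_by_hash, the use of zlib.crc32 as the hash function, and the sample rows are all assumptions made for illustration only.

```python
import zlib
from typing import Dict, List

Row = Dict[str, object]


def partition_by_rows(rows: List[Row], npartitions: int) -> List[List[Row]]:
    """Hypothetical helper: split rows into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), npartitions)
    parts: List[List[Row]] = []
    start = 0
    for i in range(npartitions):
        # The first `extra` partitions each take one additional row.
        size = base + (1 if i < extra else 0)
        parts.append(rows[start:start + size])
        start += size
    return parts


def partition_by_hash(rows: List[Row], npartitions: int, key: str) -> List[List[Row]]:
    """Hypothetical helper: route each row to the partition chosen by a hash of row[key]."""
    parts: List[List[Row]] = [[] for _ in range(npartitions)]
    for row in rows:
        # crc32 is an assumed stand-in hash; it is deterministic across runs,
        # unlike Python's built-in hash() on strings.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % npartitions
        parts[index].append(row)
    return parts


# Made-up sample data: 10 rows spread over 3 distinct hosts.
rows = [{"host": f"host-{i % 3}", "value": i} for i in range(10)]

even = partition_by_rows(rows, 4)
hashed = partition_by_hash(rows, 4, "host")

print([len(p) for p in even])    # → [3, 3, 2, 2]
print([len(p) for p in hashed])  # sizes depend on how the host values hash
```

In this sketch, row-based splitting balances partition sizes, while hash-based splitting guarantees that all rows sharing the same host value end up in the same partition, possibly at the cost of uneven partition sizes.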