Python Polars Tutorial
last modified March 1, 2025
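Polars is a fast DataFrame library for Python, designed for efficient data manipulation and analysis. Its core is written in Rust, and it runs many operations in parallel across CPU cores. This tutorial introduces Polars with practical examples.
Polars supports both eager and lazy execution. In eager mode, each operation runs immediately; in lazy mode, Polars first builds a query plan and optimizes it as a whole, which makes it well suited to large datasets. Its API will look familiar to Pandas users, but it is built around expressions and has no row index.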
Creating a DataFrame
This example shows how to create a Polars DataFrame from a dictionary.
import polars as pl
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pl.DataFrame(data)
print(df)
The pl.DataFrame constructor creates a DataFrame from a dictionary: each key becomes a column name and each list becomes that column's values.
This is the simplest way to initialize a Polars DataFrame.
Reading a CSV File
This example demonstrates reading a CSV file into a Polars DataFrame.
import polars as pl
df = pl.read_csv('data.csv')
print(df)
The pl.read_csv function reads a CSV file into a DataFrame and infers the column types from the data. Polars can read other formats as well, including Parquet (pl.read_parquet) and JSON (pl.read_json).
Filtering Rows
This example shows how to filter rows based on a condition.
import polars as pl
df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})
filtered_df = df.filter(pl.col('Age') > 30)
print(filtered_df)
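The filter method keeps only the rows where the 'Age' column is greater than 30, so only Charlie's row remains. The condition pl.col('Age') > 30 is a Polars expression, which is evaluated on the whole column at once rather than row by row in Python.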
Selecting Columns
This example demonstrates selecting specific columns from a DataFrame.
import polars as pl
df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
selected_df = df.select(['Name', 'City'])
print(selected_df)
The select method returns a new DataFrame containing only the listed columns, in the order given.
This is useful for focusing on relevant data.
Adding a New Column
This example shows how to add a new column to a DataFrame.
import polars as pl
df = pl.DataFrame({
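    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})
df = df.with_columns((pl.col('Age') * 2).alias('DoubleAge'))
print(df)
The with_columns method returns a new DataFrame with an added 'DoubleAge' column, whose values are twice those of the 'Age' column; alias sets the name of the new column. Older Polars releases offered a with_column method, which current versions no longer provide.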
Grouping and Aggregating
This example demonstrates grouping data and calculating aggregate statistics.
import polars as pl
df = pl.DataFrame({
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago'],
    'Sales': [100, 200, 150, 300]
})
grouped_df = df.group_by('City').agg([
    pl.col('Sales').sum().alias('TotalSales')
])
print(grouped_df)
The group_by method groups the rows by 'City', and agg computes the total sales for each city. Older Polars releases named this method groupby. The order of the groups in the result is not guaranteed; pass maintain_order=True to group_by to keep the groups in the order they first appear.
Sorting Data
This example shows how to sort a DataFrame by a specific column.
import polars as pl
df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})
sorted_df = df.sort('Age', descending=True)
print(sorted_df)
The sort method sorts the DataFrame by the 'Age' column; descending=True puts the oldest person first. Older Polars releases used a reverse parameter for this.
Lazy Execution
This example demonstrates lazy execution for optimizing performance.
import polars as pl
df = pl.DataFrame({
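    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})
lazy_query = df.lazy().filter(pl.col('Age') > 30)  # builds a query plan; nothing runs yet
result = lazy_query.collect()  # optimizes and runs the plan, returning a DataFrame
print(result)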
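The lazy method turns the DataFrame into a LazyFrame. Operations on a LazyFrame are recorded in a query plan, which Polars optimizes (for example, by pushing filters down and skipping unused columns) before running it. Calling collect triggers the computation and returns a regular DataFrame.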
Best Practices for Using Polars
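- Use Lazy Execution: For large datasets, build queries on a LazyFrame (via df.lazy() or scan functions such as pl.scan_csv) so Polars can optimize the whole query.
- Leverage Expressions: Prefer built-in expressions such as pl.col(...) over Python loops or per-row functions like map_elements, which run Python code for each value and are much slower.
- Choose Appropriate Data Types: Smaller numeric types and categorical columns reduce memory use and can speed up processing.
- Profile Queries: Inspect the optimized plan with LazyFrame.explain and measure per-step timings with LazyFrame.profile to find bottlenecks.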
In this article, we have explored the basics of Polars with practical examples.