Python Polars Tutorial

last modified March 1, 2025

Polars is a fast DataFrame library in Python designed for efficient data manipulation and analysis. It is built for performance, leveraging Rust under the hood. This tutorial introduces Polars with practical examples.

Polars supports lazy and eager execution modes, making it ideal for large datasets. It provides a Pandas-like API with additional optimizations.

Creating a DataFrame

This example shows how to create a Polars DataFrame from a dictionary.

create_df.py

import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pl.DataFrame(data)
print(df)

The pl.DataFrame function creates a DataFrame from a dictionary. This is the simplest way to initialize a Polars DataFrame.

Reading a CSV File

This example demonstrates reading a CSV file into a Polars DataFrame.

read_csv.py

import polars as pl

df = pl.read_csv('data.csv')
print(df)

The pl.read_csv function reads a CSV file into a DataFrame. Polars supports various file formats, including Parquet and JSON.

Filtering Rows

This example shows how to filter rows based on a condition.

filter_rows.py

import polars as pl

df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

filtered_df = df.filter(pl.col('Age') > 30)
print(filtered_df)

The filter method filters rows where the 'Age' column is greater than 30. Polars uses expressions for efficient filtering.

Selecting Columns

This example demonstrates selecting specific columns from a DataFrame.

select_columns.py

import polars as pl

df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

selected_df = df.select(['Name', 'City'])
print(selected_df)

The select method selects specific columns from the DataFrame. This is useful for focusing on relevant data.

Adding a New Column

This example shows how to add a new column to a DataFrame.

add_column.py

import polars as pl

df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

df = df.with_column((pl.col('Age') * 2).alias('DoubleAge'))
print(df)

The with_column method adds a new column 'DoubleAge', which is twice the 'Age' column. Polars supports column-wise operations.

Grouping and Aggregating

This example demonstrates grouping data and calculating aggregate statistics.

group_agg.py

import polars as pl

df = pl.DataFrame({
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago'],
    'Sales': [100, 200, 150, 300]
})

grouped_df = df.groupby('City').agg([
    pl.col('Sales').sum().alias('TotalSales')
])
print(grouped_df)

The groupby and agg methods group data by 'City' and calculate the total sales for each city. Polars supports efficient grouping.

Sorting Data

This example shows how to sort a DataFrame by a specific column.

sort_data.py

import polars as pl

df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

sorted_df = df.sort('Age', reverse=True)
print(sorted_df)

The sort method sorts the DataFrame by the 'Age' column in descending order. Polars provides efficient sorting algorithms.

Lazy Execution

This example demonstrates lazy execution for optimizing performance.

lazy_execution.py

import polars as pl

df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

lazy_df = df.lazy().filter(pl.col('Age') > 30).collect()
print(lazy_df)

The lazy method enables lazy execution, which optimizes queries before execution. Use collect to trigger computation.

Best Practices for Using Polars

Use Lazy Execution: Optimize queries with lazy execution for large datasets.
Leverage Expressions: Use Polars expressions for efficient data manipulation.
Choose Appropriate Data Types: Use correct data types to improve performance.
Profile Queries: Profile queries to identify bottlenecks.

Source

Polars Documentation

In this article, we have explored the basics of Polars with practical examples.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.

List all Polars tutorials.

Ebooks