Polars Select Function
last modified March 1, 2025
Polars is a fast DataFrame library for Python, implemented in Rust. The select
function is used to choose specific columns from a DataFrame. This tutorial
covers how to use the select function with practical examples.
The select function accepts column names and expressions and returns a new
DataFrame. This makes it the basis for data manipulation tasks like picking
columns, renaming them, and applying transformations.
Basic Column Selection
This example shows how to select specific columns from a DataFrame.
import polars as pl
df = pl.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
selected = df.select(['A', 'B'])
print(selected)
The select(['A', 'B']) call returns a new DataFrame that contains only
columns 'A' and 'B'; column 'C' is dropped and the original DataFrame is
left unchanged. This is useful for focusing on specific data.
Select with Renaming
This example demonstrates renaming columns during selection.
import polars as pl
df = pl.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
selected = df.select([
    pl.col('A').alias('Column1'),
    pl.col('B').alias('Column2')
])
print(selected)
The alias function sets the output name of an expression, so the selected
columns appear as 'Column1' and 'Column2'. This is useful
for creating more readable column names.
Select with Expression
This example shows how to apply an expression during column selection.
import polars as pl
df = pl.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
selected = df.select([
    (pl.col('A') + pl.col('B')).alias('Sum')
])
print(selected)
The pl.col('A') + pl.col('B') expression calculates the sum of
columns 'A' and 'B'. This is useful for creating derived columns.
Select with Filter
This example demonstrates filtering rows during column selection.
import polars as pl
df = pl.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
selected = df.select([
    pl.col('A').filter(pl.col('A') > 1)
])
print(selected)
The filter(pl.col('A') > 1) expression keeps only the values of column 'A'
that are greater than 1, so the result has two rows instead of three. This
is useful for conditional data selection.
Select with Aggregation
This example shows how to aggregate data during column selection.
import polars as pl
df = pl.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
selected = df.select([
    pl.col('A').sum().alias('Total')
])
print(selected)
The sum function calculates the total of column 'A' and returns it as a
single-row DataFrame with one column named 'Total'. This is useful
for summarizing data.
Select with Multiple Expressions
This example demonstrates selecting multiple columns with different expressions.
import polars as pl
df = pl.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
selected = df.select([
    pl.col('A').alias('Column1'),
    (pl.col('B') * 2).alias('DoubleB')
])
print(selected)
The pl.col('B') * 2 expression doubles the values in column 'B'. This
is useful for applying multiple transformations.
Select with Conditional Logic
This example shows how to use conditional logic during column selection.
import polars as pl
df = pl.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
selected = df.select([
    pl.when(pl.col('A') > 1).then(pl.col('B')).otherwise(0).alias('Result')
])
print(selected)
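The when, then, and otherwise functions apply conditional logic: rows where
column 'A' is greater than 1 take their value from column 'B', and all other
rows get 0. This is useful for creating dynamic columns.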
Select with String Operations
This example demonstrates string operations during column selection.
import polars as pl
df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie']
})
selected = df.select([
    pl.col('Name').str.to_uppercase().alias('UppercaseName')
])
print(selected)
The str.to_uppercase function converts names to uppercase. This is
useful for text manipulation.
Select with Date Operations
This example shows how to perform date operations during column selection.
import polars as pl
df = pl.DataFrame({
    'Date': ['2023-01-01', '2023-02-01', '2023-03-01']
})
selected = df.select([
    pl.col('Date').str.strptime(pl.Date, '%Y-%m-%d').dt.month().alias('Month')
])
print(selected)
The str.strptime function first parses the strings into the Date type, and
the dt.month function then extracts the month from each date. This is
useful for time-based analysis.
Select with Nested Data
This example demonstrates selecting nested data from a DataFrame.
import polars as pl
df = pl.DataFrame({
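    'A': [1, 2, 3],
    'B': [[4, 5], [6, 7], [8, 9]]
})
selected = df.select([
    # Column 'B' has the List dtype, so it uses the list namespace
    pl.col('B').list.get(0).alias('FirstElement')
])
print(selected)
The list.get(0) function extracts the first element from each nested list.
Nested Python lists are stored with the List dtype, which uses the list
namespace; in recent Polars releases the arr namespace is reserved for
fixed-size Array columns. This is useful for working with complex data
structures.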
Best Practices for Using Select
- Plan Column Selection: Identify required columns before applying select.
- Use Aliases: Rename columns for better readability.
- Combine Expressions: Apply multiple transformations in a single select call.
- Optimize Performance: Avoid unnecessary column selections to improve speed.
In this article, we have explored how to use the select function in Polars.