Python glob
last modified May 28, 2026
In this article, we show how to use the glob module in Python. The
glob module finds all pathnames matching a specified pattern
according to the rules used by the Unix shell. It supports the wildcards
*, ?, and [...], as well as recursive
matching with **.
The glob module is part of Python's standard library and requires
no additional installation. It is particularly useful in scripts that need to
process batches of files selected by name pattern rather than enumerating them
explicitly.
Basic Pattern Matching
The glob.glob function returns a list of pathnames that match the
given pattern. The * wildcard matches any number of characters
within a single directory component (but not a path separator).
import glob
from pathlib import Path
# Build a small directory of sample files
Path('data').mkdir(exist_ok=True)
for name in ['report.txt', 'summary.txt', 'notes.md', 'data.csv', 'backup.txt']:
    Path(f'data/{name}').touch()
# Match all .txt files in data/
txt_files = glob.glob('data/*.txt')
print("Text files:")
for f in sorted(txt_files):
    print(f" {f}")
# Match all files in data/ regardless of extension
all_files = glob.glob('data/*')
print(f"\nAll files in data/: {len(all_files)}")
glob.glob returns an unsorted list of matching paths. The order
depends on the filesystem, so it is good practice to call sorted
when a predictable order is required. File names that begin with a dot are not
matched by wildcards unless the pattern component itself starts with a literal
dot; in Python 3.11 and later, passing include_hidden=True lifts this
restriction.
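The dot-file rule can be checked with a short sketch. The temporary directory and the two file names are assumptions made only for this illustration.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    for name in ['visible.txt', '.hidden.txt']:
        Path(tmp, name).touch()

    # '*' skips names that start with a dot
    star = sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(tmp, '*')))
    # A pattern component that itself starts with '.' matches them
    dotted = sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(tmp, '.*')))

print(f"Matched by *  : {star}")
print(f"Matched by .* : {dotted}")
```

The first list contains only visible.txt, while the second contains only .hidden.txt.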
Single-Character Wildcard
The ? wildcard matches exactly one character within a single
directory component. It can appear multiple times in a pattern, and each
occurrence matches one arbitrary character.
import glob
from pathlib import Path
Path('logs').mkdir(exist_ok=True)
for name in ['log1.txt', 'log2.txt', 'log3.txt', 'log10.txt', 'error.txt']:
    Path(f'logs/{name}').touch()
# Match files whose name is exactly log?.txt (one character after log)
single = glob.glob('logs/log?.txt')
print("Single-digit log files:")
for f in sorted(single):
    print(f" {f}")
# Match files whose name is exactly log??.txt (two characters after log)
double = glob.glob('logs/log??.txt')
print("\nTwo-character suffix log files:")
for f in sorted(double):
    print(f" {f}")
Because ? matches exactly one character, log?.txt
matches log1.txt but not log10.txt. Use *
when the number of characters is variable, and ? when an exact
length is required.
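The glob module delegates name matching to the standard fnmatch module, so the difference between ? and * can be explored without touching the filesystem. The following sketch uses fnmatch.fnmatchcase on an assumed list of names.

```python
import fnmatch

names = ['log1.txt', 'log2.txt', 'log10.txt', 'error.txt']

# '?' stands for exactly one character
one_char = [n for n in names if fnmatch.fnmatchcase(n, 'log?.txt')]
# Two '?' require exactly two characters
two_chars = [n for n in names if fnmatch.fnmatchcase(n, 'log??.txt')]
# '*' accepts any number of characters, including none
any_len = [n for n in names if fnmatch.fnmatchcase(n, 'log*.txt')]

print(one_char)
print(two_chars)
print(any_len)
```

Testing patterns this way is convenient when a pattern must be validated before it is run against a real directory.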
Character Ranges
Square bracket notation [...] matches any single character listed
inside the brackets. A range such as [a-z] or [0-9]
matches any character in that range. Prefixing with ! negates the
set.
import glob
from pathlib import Path
Path('src').mkdir(exist_ok=True)
for name in ['moduleA.py', 'moduleB.py', 'moduleC.py',
             'module1.py', 'module2.py', 'helper.py']:
    Path(f'src/{name}').touch()
# Match only modules ending with an uppercase letter
upper = glob.glob('src/module[A-Z].py')
print("Modules with uppercase suffix:")
for f in sorted(upper):
    print(f" {f}")
# Match only modules ending with a digit
digit = glob.glob('src/module[0-9].py')
print("\nModules with digit suffix:")
for f in sorted(digit):
    print(f" {f}")
# Match modules whose single-character suffix is NOT a digit
non_digit = glob.glob('src/module[!0-9].py')
print("\nModules without digit suffix:")
for f in sorted(non_digit):
    print(f" {f}")
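Character ranges follow the rules of the fnmatch module, which are modeled on
POSIX shell globbing. Wildcard matching is case-sensitive on POSIX systems such
as Linux and macOS and case-insensitive on Windows. glob.glob has no parameter
to override this; the case_sensitive parameter introduced in Python 3.12
belongs to pathlib.Path.glob and Path.rglob.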
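Bracket expressions can be repeated to require a fixed number of characters from a set. The sketch below assumes a temporary directory with a handful of image names and uses a small hypothetical helper, match, to keep the output readable.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    for name in ['img1.png', 'img01.png', 'img42.png', 'img001.png', 'imgAB.png']:
        Path(tmp, name).touch()

    def match(pattern):
        # Hypothetical helper: return the sorted base names matching pattern
        return sorted(os.path.basename(p)
                      for p in glob.glob(os.path.join(tmp, pattern)))

    # Exactly two digits after 'img'
    two_digits = match('img[0-9][0-9].png')
    # Exactly two characters after 'img', neither of them a digit
    no_digits = match('img[!0-9][!0-9].png')

print(two_digits)
print(no_digits)
```

Each bracket expression consumes one character, so img1.png and img001.png are excluded from both results.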
Recursive Directory Search
When recursive=True is passed to glob.glob, the
pattern ** matches zero or more directories and subdirectories.
This allows a single pattern to search an entire directory tree.
import glob
import os
# Build a nested directory tree
for path in ['project/src', 'project/src/utils', 'project/tests']:
    os.makedirs(path, exist_ok=True)
files = {
    'project/src/main.py': '',
    'project/src/utils/helpers.py': '',
    'project/src/utils/validators.py': '',
    'project/tests/test_main.py': '',
    'project/README.md': '',
}
for path, content in files.items():
    with open(path, 'w') as f:
        f.write(content)
# Find all Python files anywhere in the project tree
py_files = glob.glob('project/**/*.py', recursive=True)
print("All Python files:")
for f in sorted(py_files):
    print(f" {f}")
# Find all files at any depth
all_files = glob.glob('project/**/*', recursive=True)
all_files = [f for f in all_files if os.path.isfile(f)]
print(f"\nTotal files found: {len(all_files)}")
Without recursive=True, ** behaves like a single * and matches exactly one
directory level, so deeper subdirectories are not traversed. For large
directory trees, consider using glob.iglob with
recursive=True to avoid building the entire list in memory.
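The effect of the recursive flag is easiest to see by running the same pattern twice. This sketch assumes a small temporary tree with one Python file at each of three depths; rel_posix is a hypothetical helper that shortens the paths for display.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    for rel in ['top.py', 'pkg/mod.py', 'pkg/sub/deep.py']:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    pattern = os.path.join(tmp, '**', '*.py')

    def rel_posix(paths):
        # Hypothetical helper: paths relative to tmp, with forward slashes
        return sorted(Path(p).relative_to(tmp).as_posix() for p in paths)

    # '**' acts like '*': exactly one directory level
    flat = rel_posix(glob.glob(pattern))
    # '**' spans zero or more directory levels
    deep = rel_posix(glob.glob(pattern, recursive=True))

print(f"recursive=False: {flat}")
print(f"recursive=True : {deep}")
```

Only the recursive call finds top.py (zero levels below the root) and pkg/sub/deep.py (two levels below).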
Iterator-Based Matching
The glob.iglob function works exactly like glob.glob
but returns an iterator instead of a list. This avoids storing all matching
paths in memory at once, which is beneficial when searching large directory
trees or when only the first few results are needed.
import glob
import os
# Create sample files
os.makedirs('archive', exist_ok=True)
for i in range(1, 6):
    with open(f'archive/file{i:03d}.log', 'w') as f:
        f.write(f'log entry {i}\n')
# Iterate without building a full list
it = glob.iglob('archive/*.log')
print("Log files (via iterator):")
for path in it:
    print(f" {path}")
# Process only the first matching result
first = next(glob.iglob('archive/*.log'), None)
if first:
    print(f"\nFirst log file: {first}")
else:
    print("\nNo log files found.")
# Count matches without storing them all
count = sum(1 for _ in glob.iglob('archive/*.log'))
print(f"Total log files: {count}")
Because iglob is lazy, it is safe to use even when the number of
matching files is unknown and potentially very large. Once the iterator is
exhausted it cannot be restarted; create a new one with another call to
iglob if you need to iterate again.
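When only a fixed number of results is needed, the iterator can be combined with itertools.islice. The sketch below assumes ten sample log files in a temporary directory; it also shows that a partly consumed iterator continues where it stopped.

```python
import glob
import itertools
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    for i in range(10):
        Path(tmp, f'file{i:02d}.log').touch()

    matches = glob.iglob(os.path.join(tmp, '*.log'))

    # Take at most three results; their order is filesystem-dependent
    first_three = [os.path.basename(p) for p in itertools.islice(matches, 3)]
    # The same iterator yields only what has not been consumed yet
    remaining = sum(1 for _ in matches)

print(f"First three: {first_three}")
print(f"Remaining  : {remaining}")
```

Because the order is not guaranteed, this approach suits checks such as "is there at least one match" better than tasks that need a specific file.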
Searching in a Specific Root Directory
The root_dir parameter, added in Python 3.10, sets the directory
from which glob starts searching. When supplied, the returned paths are relative
to that root, making it easy to work with portable patterns independently of the
current working directory.
import glob
import os
# Build sample structure
base = '/tmp/myproject'
for sub in ['src', 'tests', 'docs']:
    os.makedirs(f'{base}/{sub}', exist_ok=True)
for path in ['src/app.py', 'src/config.py', 'tests/test_app.py', 'docs/index.md']:
    with open(f'{base}/{path}', 'w') as f:
        f.write('')
# Search relative to root_dir; results are relative paths
py_files = glob.glob('**/*.py', root_dir=base, recursive=True)
print("Python files relative to project root:")
for f in sorted(py_files):
    print(f" {f}")
# Combine with root_dir to get full paths
full_paths = [os.path.join(base, f) for f in py_files]
print("\nFull paths:")
for f in sorted(full_paths):
    print(f" {f}")
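The root_dir parameter does not change the process working
directory; it only affects where the glob search begins. The related
dir_fd parameter serves the same purpose but takes an open directory file
descriptor instead of a path, which is useful when working with low-level OS
interfaces. glob.glob has no public parameter that restricts results to
directories; to match directories only, end the pattern with a path separator,
as in */.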
Escaping Special Characters
The glob.escape function escapes all special glob characters
(*, ?, and [) in a string so it is
treated as a literal path component. This is essential when a filename or
directory name contains characters that would otherwise be interpreted as
wildcards.
import glob
import os
# Create files whose names contain glob special characters
# (run this on a POSIX system: ? and * are not valid in Windows file names)
os.makedirs('special', exist_ok=True)
tricky_names = ['report[2024].txt', 'data?.csv', 'summary*.md', 'normal.txt']
for name in tricky_names:
    with open(f'special/{name}', 'w') as f:
        f.write('')
# Without escaping, brackets are interpreted as a character class
unescaped = glob.glob('special/report[2024].txt')
print(f"Unescaped result (may be wrong): {unescaped}")
# With escaping, the literal filename is matched
escaped_name = glob.escape('report[2024].txt')
escaped = glob.glob(f'special/{escaped_name}')
print(f"Escaped result (correct) : {escaped}")
# Build a safe pattern from a user-supplied filename
user_input = 'data?.csv'
safe_pattern = os.path.join('special', glob.escape(user_input))
result = glob.glob(safe_pattern)
print(f"Safe lookup for '{user_input}' : {result}")
Without escaping, report[2024].txt would be treated as a character
class matching report2.txt, report0.txt, and so on.
Always use glob.escape when constructing patterns from
user-supplied or externally sourced strings to avoid unintended matches.
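It can help to look at what glob.escape actually produces. The sketch below prints the escaped form of a few assumed names and uses fnmatch.fnmatchcase to confirm how the escaped and unescaped patterns behave, without creating any files.

```python
import fnmatch
import glob

names = ['report[2024].txt', 'data?.csv', 'summary*.md', 'normal.txt']

# Each special character is wrapped in brackets so it matches itself
escaped = {name: glob.escape(name) for name in names}
for name, pattern in escaped.items():
    print(f"{name!r:22} -> {pattern!r}")

# The unescaped pattern matches report2.txt but not the literal name
matches_digit = fnmatch.fnmatchcase('report2.txt', 'report[2024].txt')
matches_self = fnmatch.fnmatchcase('report[2024].txt', 'report[2024].txt')
# The escaped pattern matches the literal name
matches_escaped = fnmatch.fnmatchcase('report[2024].txt',
                                      escaped['report[2024].txt'])

print(matches_digit, matches_self, matches_escaped)
```

Only the opening bracket needs escaping, because a closing bracket has no special meaning outside a character class.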
Filtering by Multiple Extensions
The glob module does not support alternation like
{*.py,*.js} directly. The standard approach is to call
glob.glob once per pattern and combine the results using
itertools.chain, or to post-filter a broad match by extension.
import glob
import itertools
import os
from pathlib import Path
Path('webapp').mkdir(exist_ok=True)
for name in ['index.html', 'style.css', 'app.js', 'utils.js',
             'main.py', 'config.yaml', 'README.md']:
    Path(f'webapp/{name}').touch()
# Approach 1: chain multiple glob calls
extensions = ['*.py', '*.js', '*.html']
matches = list(itertools.chain.from_iterable(
    glob.glob(f'webapp/{ext}') for ext in extensions
))
print("Source files (chained globs):")
for f in sorted(matches):
    print(f" {f}")
# Approach 2: broad match then filter by suffix
wanted = {'.py', '.js', '.html'}
filtered = [f for f in glob.glob('webapp/*')
            if os.path.splitext(f)[1] in wanted]
print("\nSource files (post-filtered):")
for f in sorted(filtered):
    print(f" {f}")
Both approaches produce the same result. The chained approach is better when
each pattern is complex; post-filtering is simpler when you only need to check
the file extension. Deduplication with set is recommended when
patterns overlap and the same file could be matched more than once.
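The chained approach and the deduplication step can be wrapped in one small function. glob_many below is a hypothetical helper written for this article, not part of the glob module; the sample files are likewise assumptions.

```python
import glob
import os
import tempfile
from pathlib import Path

def glob_many(directory, patterns):
    """Hypothetical helper: sorted, duplicate-free matches for several patterns."""
    found = set()
    for pattern in patterns:
        # Escape the directory so special characters in it are taken literally
        found.update(glob.glob(os.path.join(glob.escape(directory), pattern)))
    return sorted(found)

with tempfile.TemporaryDirectory() as tmp:
    for name in ['index.html', 'style.css', 'app.js', 'main.py']:
        Path(tmp, name).touch()

    # 'app.*' overlaps with '*.js', so app.js would otherwise appear twice
    paths = glob_many(tmp, ['*.py', '*.js', '*.html', 'app.*'])
    result = [os.path.basename(p) for p in paths]

print(result)
```

Collecting into a set keeps the function correct even when the caller passes patterns that overlap.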
Case-Sensitive Matching
Python 3.12 introduced the case_sensitive parameter for
pathlib.Path.glob and Path.rglob. The functions glob.glob and
glob.iglob do not accept it and always follow the platform default. Setting
case_sensitive to True forces case-sensitive matching even on Windows, while
False forces case-insensitive matching on Linux and macOS.
import glob
import sys
from pathlib import Path
Path('assets').mkdir(exist_ok=True)
# Names differ by more than case, so they coexist on case-insensitive filesystems
for name in ['Logo.PNG', 'icon.png', 'Banner.Png', 'background.jpg']:
    Path(f'assets/{name}').touch()
# glob.glob always uses the platform-native behaviour
native = glob.glob('assets/*.png')
print(f"Native (platform default): {sorted(native)}")
# Path.glob accepts case_sensitive (requires Python 3.12+)
if sys.version_info >= (3, 12):
    assets = Path('assets')
    sensitive = [str(p) for p in assets.glob('*.png', case_sensitive=True)]
    print(f"Case-sensitive : {sorted(sensitive)}")
    insensitive = [str(p) for p in assets.glob('*.png', case_sensitive=False)]
    print(f"Case-insensitive : {sorted(insensitive)}")
else:
    # Fallback for older Python: filter with str.lower()
    all_files = glob.glob('assets/*')
    insensitive = [f for f in all_files if f.lower().endswith('.png')]
    print(f"Case-insensitive fallback: {sorted(insensitive)}")
By default, wildcard matching is case-sensitive on POSIX systems such as Linux
and macOS and case-insensitive on Windows. The explicit case_sensitive
parameter of Path.glob overrides the platform default, so a script behaves the
same on every operating system without needing to branch on the host platform.
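Where the same behaviour is wanted on every platform and Python version, case-insensitive matching within one directory can be approximated by lowercasing both the name and the pattern. iglob_nocase is a hypothetical helper for illustration only; it handles a single directory and does not support path separators in the pattern.

```python
import fnmatch
import os
import tempfile
from pathlib import Path

def iglob_nocase(directory, pattern):
    """Hypothetical helper: yield paths in directory matching pattern, ignoring case."""
    lowered = pattern.lower()
    for name in os.listdir(directory):
        # fnmatchcase never applies platform-specific case folding
        if fnmatch.fnmatchcase(name.lower(), lowered):
            yield os.path.join(directory, name)

with tempfile.TemporaryDirectory() as tmp:
    for name in ['Logo.PNG', 'icon.png', 'photo.Jpg']:
        Path(tmp, name).touch()

    png_names = sorted(os.path.basename(p) for p in iglob_nocase(tmp, '*.png'))

print(png_names)
```

Unlike glob, this sketch does not hide names that start with a dot, so add such a check if dot files should be skipped.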
Sorting and Deduplicating Results
glob.glob does not guarantee a specific ordering of its results.
When scripts must process files in a deterministic sequence — for example, when
combining numbered log files — it is important to sort the list explicitly.
Similarly, combining multiple glob calls can produce duplicates that should be
removed.
import glob
import os
import re
from pathlib import Path
Path('releases').mkdir(exist_ok=True)
for name in ['v1.0.0.tar.gz', 'v1.2.0.tar.gz', 'v1.10.0.tar.gz',
             'v2.0.0.tar.gz', 'v1.9.0.tar.gz']:
    Path(f'releases/{name}').touch()
# Lexicographic sort (v1.10 sorts before v1.9, which is wrong for versions)
lex_sorted = sorted(glob.glob('releases/*.tar.gz'))
print("Lexicographic order:")
for f in lex_sorted:
    print(f" {f}")
# Natural / version-aware sort
def version_key(path):
    parts = re.findall(r'\d+', os.path.basename(path))
    return tuple(int(p) for p in parts)
nat_sorted = sorted(glob.glob('releases/*.tar.gz'), key=version_key)
print("\nVersion-aware order:")
for f in nat_sorted:
    print(f" {f}")
# Deduplicate results from two overlapping patterns
combined = glob.glob('releases/v1*.tar.gz') + glob.glob('releases/v1.0*.tar.gz')
unique = sorted(set(combined))
print(f"\nDeduplicated: {unique}")
Natural sorting is essential for version strings and numbered filenames where
lexicographic order gives incorrect results — for example, v1.10
sorts before v1.9 lexicographically. Using a set to
remove duplicates before further processing is a simple and efficient approach
when patterns overlap.
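For numbered filenames that are not version strings, a more general key can split each name into text and number chunks. natural_key is a hypothetical helper sketched here under the assumption that names mix ASCII text with decimal digits.

```python
import re

def natural_key(path):
    """Hypothetical helper: compare digit runs as integers, text case-insensitively."""
    # re.split with a capturing group alternates text and digit chunks
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r'(\d+)', path)]

names = ['file10.log', 'file2.log', 'file1.log', 'File3.log']

lexicographic = sorted(names)
natural = sorted(names, key=natural_key)

print(f"Lexicographic: {lexicographic}")
print(f"Natural      : {natural}")
```

The key can be passed directly to sorted together with the result of glob.glob, in the same way as version_key.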
In this article, we have shown how to use the glob module in
Python for file pattern matching. We covered the *, ?,
and [...] wildcards, recursive search with **,
memory-efficient iteration with iglob, scoped searching with
root_dir, safe pattern construction with glob.escape,
matching multiple extensions, controlling case sensitivity, and sorting results
correctly.