Iterators

Iteration is a fundamental aspect of data processing in which programs apply computations to data series. When dealing with data that doesn't fit into memory, the need arises to fetch items lazily—one at a time and on demand. This is precisely the role of an iterator.

In Python, every standard collection is iterable. An iterable is an object that offers an iterator, a mechanism Python utilizes to facilitate operations such as:

  1. for loops
  2. List, dict, and set comprehensions
  3. Unpacking assignments
  4. Construction of collection instances
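The four contexts listed above can be sketched with a single iterable. The `teams` tuple and the other variable names are illustrative assumptions, not part of the original examples.

```python
# Hypothetical sample data, used only for illustration.
teams = ("Corinthians", "Lakers", "Liverpool")

# 1. for loop
seen = []
for team in teams:
    seen.append(team)

# 2. List, dict, and set comprehensions
upper = [team.upper() for team in teams]
lengths = {team: len(team) for team in teams}
initials = {team[0] for team in teams}

# 3. Unpacking assignment
first, *rest = teams

# 4. Construction of a collection instance
as_list = list(teams)
```

In every case Python obtains an iterator from `teams` behind the scenes and pulls items from it one at a time.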

The iter Function

When Python needs to iterate over an object, it automatically invokes iter(object). The built-in iter function checks whether the object implements __iter__ and calls it to obtain an iterator. If __iter__ is not implemented but __getitem__ is present, iter() creates an iterator that tries to fetch items by index, starting from 0. If neither method is available, Python raises a TypeError, usually with a message stating that the object is not iterable.
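A minimal sketch of both outcomes follows. The `Squares` and `Opaque` classes are hypothetical and exist only to illustrate the __getitem__ fallback and the TypeError case.

```python
class Squares:
    """Hypothetical class with __getitem__ only (no __iter__)."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, index):
        # IndexError tells the fallback iterator that the items ran out.
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index * index


class Opaque:
    """Hypothetical class with neither __iter__ nor __getitem__."""


# iter() falls back to __getitem__, asking for indexes 0, 1, 2, ...
squares = list(Squares(4))

try:
    iter(Opaque())
except TypeError as exc:
    error_message = str(exc)
```

Here `squares` ends up as `[0, 1, 4, 9]`, and `error_message` reports that the `Opaque` object is not iterable.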

This is why all Python sequences are iterable: by definition, they all implement __getitem__. The standard sequences also implement __iter__, and your custom sequences should do the same, because iteration via __getitem__ is kept for backward compatibility and may be deprecated in the future.

isinstance check and goose-typing

In the goose-typing approach, defining an iterable is more straightforward but less flexible: an object is deemed iterable if it implements the __iter__ method.

Check issubclass and isinstance

src/design_patterns/iterators/isisntancecheck.py
from collections import abc


class Test:
    # Defining __iter__ is enough to pass the Iterable check;
    # the method body is never executed in this example.
    def __iter__(self):
        pass


print(f"Class Test is an Iterable subclass? {issubclass(Test, abc.Iterable)}")
test_var = Test()
print(f"test_var instance Test is an Iterable instance? {isinstance(test_var, abc.Iterable)}")
output
Class Test is an Iterable subclass? True
test_var instance Test is an Iterable instance? True
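The "less flexible" side of the isinstance check can be sketched with an object that only implements __getitem__: iter() accepts it, but abc.Iterable does not recognize it. The `Countdown` class is a hypothetical example.

```python
from collections import abc


class Countdown:
    """Hypothetical sequence-like class that only implements __getitem__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return 3 - index


countdown = Countdown()

# The ABC check only looks for __iter__, so it reports False here.
passes_abc_check = isinstance(countdown, abc.Iterable)

# Calling iter() and handling TypeError is the more accurate test.
try:
    iter(countdown)
    works_with_iter = True
except TypeError:
    works_with_iter = False

items = list(countdown)
```

When the goal is simply to iterate, it is usually enough to attempt the iteration and let the TypeError surface if the object does not support it.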

Iterables vs Iterators

Iterable

An iterable refers to any object from which the iter built-in function can obtain an iterator. Objects that implement an __iter__ method returning an iterator are considered iterable. Sequences, by definition, are always iterable. Objects implementing a __getitem__ method that accepts 0-based indexes are also iterable.

Relationship Between Iterables and Iterators

Python acquires iterators from iterables.

Python’s standard interface for an iterator

Two methods:

  1. __next__: Returns the next item in the series, raising StopIteration if there are no more.
  2. __iter__: Returns self, which allows iterators to be used where an iterable is expected, such as in a for loop.

The StopIteration exception signals that the iterator is exhausted. The for loop machinery handles this exception internally, as do other iteration contexts such as list comprehensions and iterable unpacking, so client code rarely sees it.
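A for loop can be approximated by hand with a while loop that calls next() and stops on StopIteration. This is a simplified sketch of the mechanism, not the interpreter's actual implementation; the variable names are illustrative.

```python
text = "ABC"

collected = []
it = iter(text)                      # obtain an iterator from the iterable
while True:
    try:
        collected.append(next(it))   # ask for the next item
    except StopIteration:
        del it                       # discard the exhausted iterator
        break                        # leave the loop, as a for loop would
```

After the loop, `collected` holds the same items that `for char in text` would have produced.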

This interface is formalized in the collections.abc.Iterator ABC (Abstract Base Class), which declares the abstract __next__ method and subclasses Iterable—where the abstract __iter__ method is declared.

Example

Figure: The Iterable and Iterator ABCs. Methods in italics are abstract. Source: Fluent Python, 2nd Edition.

Due to the minimal methods required for an iterator (__next__ and __iter__), checking for remaining items involves calling next() and catching StopIteration. Additionally, it is not possible to reset an iterator. If you need to start over, you must call iter() on the iterable that created the iterator initially. This minimal interface is sensible, as not all iterators are resettable. For instance, if an iterator is reading packets from the network, there's no way to rewind it.
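The following sketch shows an exhausted iterator staying exhausted, and a fresh one being obtained from the original iterable. The list and variable names are assumptions made for illustration.

```python
numbers = [10, 20, 30]

it = iter(numbers)
first_pass = list(it)       # consumes every item
second_pass = list(it)      # the iterator is exhausted, so this is empty

# next() accepts a default, which avoids catching StopIteration by hand.
sentinel = object()
leftover = next(it, sentinel)

# Starting over means asking the iterable for a new iterator.
fresh = iter(numbers)
third_pass = list(fresh)
```

The iterable `numbers` is untouched throughout; only the iterator objects carry position state.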

Avoid Making the Iterable an Iterator for Itself

A common mistake when creating iterables and iterators is mixing up the two. Iterables have an __iter__ method that crafts a new iterator every time it's called. Iterators, on the flip side, implement a __next__ method to provide individual items and an __iter__ method that returns self.

Tip

In essence, iterators are naturally iterable, but iterables are not iterators.

While it might seem like a good idea to give a class both __next__ and __iter__ methods, turning each instance into an iterable and an iterator at the same time, it's generally not a wise move.

Following the Iterator pattern, it's important that we can get multiple independent iterators from the same iterable. Each iterator should maintain its own internal state, meaning a proper implementation should create a new, independent iterator every time iter(my_iterable) is called.
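One way to keep the two roles separate is sketched below, in the style of the classic Iterator pattern. The `SentenceIterator` class and the variable names are illustrative assumptions; in everyday code a generator usually replaces the hand-written iterator class.

```python
import re

RE_WORD = re.compile(r"\w+")


class Sentence:
    """Iterable: builds a new iterator on every __iter__ call."""

    def __init__(self, text):
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        return SentenceIterator(self.words)


class SentenceIterator:
    """Iterator: keeps its own position and returns self from __iter__."""

    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration() from None
        self.index += 1
        return word

    def __iter__(self):
        return self


sentence = Sentence("Corinthians Lakers Liverpool")
it1 = iter(sentence)
it2 = iter(sentence)

a = next(it1)   # first word, from it1
b = next(it1)   # second word, from it1
c = next(it2)   # first word again: it2 has its own state
```

Because `Sentence.__iter__` returns a new `SentenceIterator` each time, advancing `it1` has no effect on `it2`.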

Generators

Any Python function that has the yield keyword in its body is a generator function. When called, it returns a generator object; in other words, a generator function is a generator factory. The presence of yield is the only syntactic difference between a regular function and a generator function.

A generator function builds a generator object that wraps the body of the function. When next() is invoked on the generator object, execution advances to the next yield in the function body, and the next() call evaluates to the value yielded when the body is suspended. Finally, when the function body returns, the enclosing generator object raises StopIteration, in accordance with the Iterator protocol.

Example

src/design_patterns/iterators/generator_example.py
def test_gen():
    print("Start test gen")
    yield 1
    print("After yield 1")
    yield 2
    print("Finish test gen")

for t in test_gen():
    print(f"---> {t}")
output
Start test gen
---> 1
After yield 1
---> 2
Finish test gen
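The same stepping behavior can be observed without a for loop by calling next() directly. This sketch uses a hypothetical `two_values` generator function, a trimmed-down variant of the example above without the print calls.

```python
def two_values():
    """Hypothetical generator function that yields two items."""
    yield 1
    yield 2


gen = two_values()      # calling the function runs none of its body yet
first = next(gen)       # runs the body up to the first yield
second = next(gen)      # resumes and runs up to the second yield

try:
    next(gen)           # the body finishes, so StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True
```

A for loop performs these next() calls and the StopIteration handling automatically.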

Using a Generator as the Iterator

src/design_patterns/iterators/using_generators.py
import re

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text: str):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        for word in self.words:
            yield word

sentence = Sentence("Corinthians Lakers Liverpool")

for word in sentence:
    print(word)
output
Corinthians
Lakers
Liverpool

Iterator vs Generator

Iterator: A general term for any object that implements a __next__ method. Iterators are designed to produce data that is consumed by client code, that is, the code that drives the iterator through a for loop or another iterative feature, or by explicitly calling next(it) on the iterator. In practice, most iterators we use in Python are generators.

Generator: An iterator built by the Python compiler. To create a generator, we don't implement __next__. Instead, we use the yield keyword to write a generator function, which is a factory of generator objects. A generator expression is another way to build a generator object. Generator objects provide __next__, so they are iterators.
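Both ways of building a generator object can be compared side by side. The `gen_squares` function and the variable names are hypothetical; `types.GeneratorType` and `collections.abc.Iterator` are standard library names.

```python
import types
from collections import abc


def gen_squares(n):
    """Hypothetical generator function."""
    for i in range(n):
        yield i * i


from_function = gen_squares(3)                  # generator function call
from_expression = (i * i for i in range(3))     # generator expression

both_are_generators = (
    isinstance(from_function, types.GeneratorType)
    and isinstance(from_expression, types.GeneratorType)
)
both_are_iterators = (
    isinstance(from_function, abc.Iterator)
    and isinstance(from_expression, abc.Iterator)
)

function_values = list(from_function)
expression_values = list(from_expression)
```

Either object can be passed to next() or used in a for loop, since both satisfy the iterator interface.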

Generator Functions in the Standard Library

See the itertools module examples.

Examples of Generators

Using itertools.count

src/design_patterns/iterators/generator_standar_lib.py
import itertools

int_gen = itertools.count(1, .5)

print(f"1º CALL: {next(int_gen)}")
print(f"2º CALL: {next(int_gen)}")
print(f"3º CALL: {next(int_gen)}")
print(f"4º CALL: {next(int_gen)}")
output
1º CALL: 1
2º CALL: 1.5
3º CALL: 2.0
4º CALL: 2.5
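Because itertools.count never ends, it is usually combined with another function that limits it. The sketch below uses itertools.islice and itertools.takewhile for that purpose; the variable names are illustrative.

```python
import itertools

# Take a fixed number of items from an endless generator.
first_four = list(itertools.islice(itertools.count(1, 0.5), 4))

# Stop as soon as a condition is no longer true.
below_limit = list(
    itertools.takewhile(lambda n: n < 2.5, itertools.count(1, 0.5))
)
```

Both calls consume the counter lazily, so only the items actually needed are ever produced.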