Built-In Sequences
Python provides various built-in sequence types implemented in C, offering a rich set of functionalities through their APIs. These sequences include lists, tuples, strings, and range objects, each with its own unique features and use cases.
Classifications/Types of Sequences
Container x Flat
Container sequences, such as list, tuple, and collections.deque, can hold items of different types, including nested containers. They hold references to the objects they contain, which may be of any type.
Flat sequences, like str, bytes, and array.array, hold items of one simple type and store the values of their contents in their own memory space, not as separate Python objects. Refer to the image below for more details.
Container x Flat. Image from Fluent Python, 2nd Edition
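The difference can be observed from Python code. The following is a minimal sketch, not taken from the original text; the variable names are chosen only for illustration.

```python
from array import array

# A container sequence stores references, so its items may be of any type,
# including other containers.
inner = [4, 5]
container = [1, "two", 3.0, inner]

# Because the list holds a reference to `inner`, a change made through
# `inner` is visible through the container as well.
inner.append(6)

# A flat sequence stores raw values of one simple type.
# The typecode "d" means C double.
flat = array("d", [1.0, 2.0, 3.0])

# A value of the wrong type is rejected instead of being stored by reference.
try:
    flat.append("four")
    rejected = False
except TypeError:
    rejected = True

print([type(item).__name__ for item in container])
print(container[3])
print(rejected)
```

The list accepts items of four different types, while the array only accepts values that fit its typecode.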
Mutable x Immutable
Mutable sequences include list, bytearray, array.array, and collections.deque. These sequences can be modified after creation, allowing you to add, remove, or replace elements.
Immutable sequences include tuple, str, and bytes. Once created, these sequences cannot be changed: their elements cannot be replaced, added, or removed.
Mutable sequences inherit all methods from immutable sequences and also implement several additional methods. Refer to the image below for more details.
Mutable Inherit. Image from Fluent Python, 2nd Edition
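One way to check this relationship at runtime is through the abstract base classes in collections.abc. The sketch below is an illustration added here, not part of the original notes.

```python
from collections import abc, deque

# Every built-in sequence is a Sequence; only the mutable ones are also
# a MutableSequence, which extends Sequence with extra methods.
for cls in (tuple, str, list, deque):
    print(
        cls.__name__,
        issubclass(cls, abc.Sequence),
        issubclass(cls, abc.MutableSequence),
    )

# A mutable sequence can be changed in place.
numbers = [1, 2, 3]
numbers[0] = 10
numbers.append(4)

# An immutable sequence rejects the same operation with a TypeError.
frozen = (1, 2, 3)
try:
    frozen[0] = 10
except TypeError as exc:
    error = str(exc)

print(numbers)
print(error)
```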
Unpacking
Sequence unpacking in Python allows you to extract elements from a sequence without using indexes, which avoids unnecessary and error-prone index-based extraction. This feature works with any iterable object, including iterators that don't support index notation (using []). Instead of accessing elements by index, you assign them directly to variables.
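As a small sketch of unpacking from objects that cannot be indexed, the example below uses a generator and a plain iterator. The function name `coordinates` is hypothetical. It also shows that the number of targets must match the number of items, unless a starred target is used.

```python
def coordinates():
    # A generator: it yields items one by one and does not support [] indexing.
    yield 35.689722
    yield 139.691667


# Unpacking consumes the generator and binds one item per variable.
lat, lon = coordinates()

# The same works for any other iterator.
key, value = iter(["color", "blue"])

# If the iterable yields more items than there are targets, Python raises ValueError.
try:
    first, second = iter([1, 2, 3])
except ValueError as exc:
    mismatch = str(exc)

print(lat, lon)
print(key, value)
print(mismatch)
```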
Parallel assignment
The most visible form of unpacking is parallel assignment: assigning items from an iterable to a tuple of variables.
>>> point = (123, 456)
>>> x, y = point
>>> print(f"X: {x}")
>>> print(f"Y: {y}")
X: 123
Y: 456
>>> a = 10
>>> b = 15
>>> b, a = a, b
>>> print(f"A: {a}, B: {b}")
A: 15, B: 10
Using *
Use the * prefix on an argument when calling a function to unpack a sequence, so that each of its items is passed to the function as a separate positional argument.
>>> data = (20, 8)
>>> quotient, remainder = divmod(*data)
>>> print(quotient, remainder)
2 4
Using * to grab excess items
>>> a, b, *rest = range(5)
>>> print("REST ITEMS")
>>> print(f"WITH 5 => A: {a}, B: {b}, REST: {rest}")
>>> a, b, *rest = range(3)
>>> print(f"WITH 3 => A: {a}, B: {b}, REST: {rest}")
>>> a, b, *rest = range(2)
>>> print(f"WITH 2 => A: {a}, B: {b}, REST: {rest}")
>>> print()
>>> a, *body, c, d = range(5)
>>> print("MIDDLE ITEMS")
>>> print(f"A: {a}, BODY: {body}, C: {c}, D: {d}")
>>> *head, b, c, d = range(5)
>>> print(f"HEAD: {head}, B: {b}, C: {c}, D: {d}")
>>> print()
>>> print("In Function Calls")
>>> def fun(a, b, c, d, *rest):
...     return a, b, c, d, rest
>>> print(fun(*[1, 2], 3, *range(4, 7)))
>>> print()
>>> print("When defining list, tuple or set")
>>> print(*range(4), 4)
>>> print([*range(4), 4])
>>> print({*range(4), 4, *(5, 6, 7)})
REST ITEMS
WITH 5 => A: 0, B: 1, REST: [2, 3, 4]
WITH 3 => A: 0, B: 1, REST: [2]
WITH 2 => A: 0, B: 1, REST: []
MIDDLE ITEMS
A: 0, BODY: [1, 2], C: 3, D: 4
HEAD: [0, 1], B: 2, C: 3, D: 4
In Function Calls
(1, 2, 3, 4, (5, 6))
When defining list, tuple or set
0 1 2 3 4
[0, 1, 2, 3, 4]
{0, 1, 2, 3, 4, 5, 6, 7}
With function's return
Unpacking is a convenient way for functions to return multiple values: the function returns a tuple, and the caller unpacks it into separate variables.
>>> import os
>>> # os.path.split returns a (head, tail) tuple
>>> raw_return = os.path.split('/home/aws_nice_cluster/.ssh/id_rsa.pub')
>>> print(f"raw_return: {raw_return}, type: {type(raw_return)}")
>>> _, filename = raw_return
>>> print(filename)
raw_return: ('/home/aws_nice_cluster/.ssh', 'id_rsa.pub'), type: <class 'tuple'>
id_rsa.pub
Nested Unpacking
>>> metro_areas = [
...     ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
...     ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
...     ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
...     ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
...     ('São Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
... ]
>>> print(f'{"":15} | {"latitude":>9} | {"longitude":>9}')
>>> for name, _, _, (lat, lon) in metro_areas:
...     if lon <= 0:
...         print(f'{name:15} | {lat:9.4f} | {lon:9.4f}')
                |  latitude | longitude
Mexico City     |   19.4333 |  -99.1333
New York-Newark |   40.8086 |  -74.0204
São Paulo       |  -23.5478 |  -46.6358
Pattern Matching
Available in Python 3.10 and above!
In Python's pattern matching, the subject is the data following the match keyword, which Python tries to match against the pattern in each case clause. One key improvement of match over the switch statement found in other languages is destructuring, a more advanced form of unpacking the subject. A case clause has two parts: a pattern and an optional guard introduced by the if keyword.
For a sequence pattern to match the subject, all of the following are necessary:
- The subject is a sequence;
- The subject and the pattern have the same number of items;
- Each corresponding item matches, including nested items.
>>> def demonstration(message_type: list[str]) -> str:
...     match message_type:  # message_type is the SUBJECT
...         case ['AAAA', 'BBB', 'CCC']:
...             return 'ABC'
...         case ['BBB', 'CCC']:
...             return 'BC'
...         case ['CCC']:
...             return 'C'
...         case _:
...             return ''
>>> metro_areas = [
...     ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
...     ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
...     ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
...     ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
...     ('São Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
... ]
>>> print(f'{"":15} | {"latitude":>9} | {"longitude":>9}')
>>> for record in metro_areas:
...     match record:
...         case [name, _, _, (lat, lon)] if lon <= 0:  # using IF on case clause
...             print(f'{name:15} | {lat:9.4f} | {lon:9.4f}')
                |  latitude | longitude
Mexico City     |   19.4333 |  -99.1333
New York-Newark |   40.8086 |  -74.0204
São Paulo       |  -23.5478 |  -46.6358
Special Treatment
- In sequence patterns, both square brackets and parentheses have the same significance;
- A sequence pattern cannot match subjects of type str, bytes, and bytearray:
  - A match subject of those types is treated as an atomic value;
  - To treat it as a sequence, convert it in the match clause;
- The _ symbol matches any single item in that position, but it is never bound to the value of the matched item:
  - It is also the only variable that can appear more than once in a pattern.
Examples
First case:
- The first item must be an instance of str;
- Item 3 must be a pair of floats.
Second case:
- Matches any subject sequence starting with a str;
- Ending with a nested sequence of two floats;
- The *_ matches any number of items, without binding them to a variable;
- Using *extra instead of *_ would bind the items to extra as a list with 0 or more items.
Generator Expressions
Generator expressions (gen-expr) can be used to initialize tuples, arrays, and other kinds of sequences. They save memory by yielding items one by one via the iterator protocol, unlike listcomps, which build an entire list just to feed another constructor. Generator expressions share the same syntax as listcomps but use parentheses instead of brackets.
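As a rough sketch of this one-by-one behavior, the example below feeds a generator expression directly to a constructor and then steps through another one by hand. The variable names are illustrative only.

```python
from array import array

symbols = "$¢£¥€¤"

# The generator expression feeds the array constructor item by item;
# no intermediate list is built. The typecode "I" means unsigned int.
code_points = array("I", (ord(symbol) for symbol in symbols))

# A generator produces values on demand and can be consumed only once.
squares = (n * n for n in range(3))
first = next(squares)       # computes only the first item
remaining = list(squares)   # consumes the rest
exhausted = list(squares)   # nothing is left

print(list(code_points))
print(first, remaining, exhausted)
```

When a generator expression is the only argument in a function call, as in sum(n * n for n in range(3)), the extra pair of parentheses can be omitted.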
Variables assigned using the "walrus operator" := remain accessible after the comprehension or generator expression in which they were assigned returns, unlike local variables in a function. The scope of the target of := is the enclosing function, unless there is a global or nonlocal declaration for that target.
>>> symbols = '$¢£¥€¤'
>>> order_symbols = tuple(ord(symbol) for symbol in symbols)
>>> print(order_symbols)
>>> string_int = "12345"
>>> raw_gen = (value for value in string_int)
>>> print(type(raw_gen))
>>> print(set(raw_gen))
>>> codes = [last := ord(c) for c in string_int]
>>> print(last)
(36, 162, 163, 165, 8364, 164)
<class 'generator'>
{'2', '4', '1', '5', '3'}
53
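The scope rule for := can be seen more clearly inside a function. This is a minimal sketch with a hypothetical function name, `last_code`; it assumes a non-empty input, since `last` is never assigned when the text is empty.

```python
def last_code(text):
    # `char` is local to the comprehension; `last` is assigned with := and
    # therefore belongs to the enclosing function, last_code.
    codes = [last := ord(char) for char in text]

    # The loop variable is not visible outside the comprehension.
    try:
        char
    except NameError:
        char_visible = False
    else:
        char_visible = True

    return codes, last, char_visible


codes, last_value, char_visible = last_code("12345")
print(codes)
print(last_value)
print(char_visible)
```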