{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Iterables and Iterators\n",
"### _or_\n",
"## Round and Round the Mulberry Bush"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a Python programmer you may have heard the terms _iterable_ and _iterator_ and wondered\n",
"what the difference between them is.\n",
"I started asking people that question in telephone interviews a while back, thinking\n",
"that it was a good way to discriminate between more- and less-experienced Python users.\n",
"Then I discovered that people I regard as extremely competent Pythonistas still\n",
"had some confusion about the issue, so I wrote this to try and help clear up that confusion.\n",
"\n",
"The shortest way I can think of to describe the essentials of an iterable is \"something you\n",
"can iterate over any number of times,\" whereas an iterator is \"something you can iterate\n",
"over once.\" Many objects in Python are iterables - lists, dicts, strings, and so on.\n",
"But iterables aren't iterators!\n",
"\n",
"The two are closely related: each time you iterate over an iterable, the interpreter\n",
"actually creates a new iterator for the iteration, and loops over that.\n",
"The mechanism is quite simple, and understandig the details helps you write\n",
"better code.\n",
"\n",
"Let's begin by defining a simple function that can be used as a loop body, and\n",
"a sample iterable (all Python containers are iterables - remember this if you\n",
"have to write a container type)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"def do_something_with(o):\n",
" print(f\"--- {o} ---\")\n",
"\n",
"test_list = [\"Roberta\", \"Tom\", \"Alice\"]"
]
},
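{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to see the difference between iterating \"any number of times\" and iterating \"once\" is to\n",
"loop over the same object twice. This sketch uses its own `names` list so that it stands alone,\n",
"and borrows the built-in `iter` function (discussed later) to get an iterator from the list.\n",
"\n",
"```python\n",
"names = ['Roberta', 'Tom', 'Alice']\n",
"\n",
"# A list is an iterable: every loop over it starts afresh.\n",
"first_pass = [n for n in names]\n",
"second_pass = [n for n in names]\n",
"\n",
"# An iterator can only be consumed once.\n",
"it = iter(names)\n",
"once = [n for n in it]\n",
"again = [n for n in it]  # already exhausted, so nothing is left\n",
"\n",
"print(first_pass == second_pass)  # True\n",
"print(once, again)                # ['Roberta', 'Tom', 'Alice'] []\n",
"```"
]
},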
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How Iteration Started Out"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It helps to remember that\n",
"\n",
" a = x[s]\n",
"\n",
"is in Python merely a (very welcome and higly comprehensible) shorthand for\n",
"\n",
" a = x.__getitem(s)\n",
"\n",
"Originally (well, certainly in Python 1.5), `for` loop iterations over an object `x` were quite simplistic.\n",
"\n",
"The interpeter would internally initialise a hidden integer variable to zero,\n",
"then repeatedly index `x` using the hidden variable as an index\n",
"(by calling `x`'s `__getitem__` method with the hidden variable as an argument).\n",
"\n",
"The hidden variable was incremented to produce successive values\n",
"until the `__getitem__` call produced an `IndexError` exception,\n",
"causing the loop to terminate normally.\n",
"\n",
"Internally, then, an iteration like\n",
"\n",
" for i in test_list:\n",
" do_something_with(i)\n",
"\n",
"would be handled by something like the C equivalent of the following code.\n",
"(This article is for Python users so it's in Python to explain the logic.\n",
"Python is open source, so if you want to read the C source code of the actual interpreter you can)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--- Roberta ---\n",
"--- Tom ---\n",
"--- Alice ---\n"
]
}
],
"source": [
"_private_var = 0\n",
"while True:\n",
" try:\n",
" i = test_list.__getitem__(_private_var)\n",
" except IndexError:\n",
" break\n",
" do_something_with(i)\n",
" _private_var += 1 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This was Python's original _iteration protocol_/.\n",
"\n",
"It was easy to understand, but worked only for objects that could be\n",
"numerically subscripted, making it possible to iterate over tuples,\n",
"lists, and other sequence types.\n",
"\n",
"To iterate over a dictionary, however, required you to extract a list\n",
"of its keys and then iterate over that instead."
]
},
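{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why the old protocol couldn't cope with dictionaries, consider a hypothetical `DictLike` class\n",
"that supports `d['key']` lookups through `__getitem__` but has no `__iter__`.\n",
"Because modern Python still falls back to the old protocol, a `for` loop over it asks for item `0`.\n",
"That lookup fails with `KeyError` rather than `IndexError`, so instead of ending normally\n",
"the loop lets the exception escape.\n",
"\n",
"```python\n",
"class DictLike:\n",
"    # Hypothetical mapping wrapper: __getitem__ only, no __iter__.\n",
"    def __init__(self, data):\n",
"        self.data = data\n",
"    def __getitem__(self, key):\n",
"        return self.data[key]\n",
"\n",
"d = DictLike({'a': 1, 'b': 2})\n",
"caught = None\n",
"try:\n",
"    for key in d:  # asks for d[0], d[1], ... just as the old protocol did\n",
"        print(key)\n",
"except KeyError as exc:\n",
"    caught = exc  # 0 isn't a key, and only IndexError ends the loop quietly\n",
"\n",
"print(repr(caught))  # KeyError(0)\n",
"```"
]
},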
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Writing an Original-Style Iterable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python's emphasis on backwards compatibility means that this protocol is still supported.\n",
"\n",
"You can verify this by writing your own class whose instances obey the old protocol."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Getting item: 0\n",
"--- ---\n",
"Getting item: 1\n",
"--- * ---\n",
"Getting item: 2\n",
"--- ** ---\n",
"Getting item: 3\n",
"--- *** ---\n",
"Getting item: 4\n",
"--- **** ---\n",
"Getting item: 5\n"
]
}
],
"source": [
"class Stars():\n",
" \"Class with only __init__ and __getitem__.\"\n",
" def __init__(self, N):\n",
" self.N = N\n",
" def __getitem__(self, index):\n",
" print(\"Getting item:\", index)\n",
" if index > self.N:\n",
" raise IndexError\n",
" return \"*\" * index\n",
"\n",
"s = Stars(4)\n",
"\n",
"for v in s:\n",
" do_something_with(v)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, even a modern interpreter is perfectly happy to iterate over your `Stars` instances,\n",
"old-fashioned or not."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Enter the Iterable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To overcome the limitations of this old protocol, and specifically to allow iteration\n",
"over objects that can't be numerically indexed, a newer protocol was\n",
"introduced and objects that obeyed it were classified as _iterable_.\n",
"\n",
"To use the new-style object (_i.e._ to be an an iterable) an object _must_ have an `__iter__` method.\n",
"\n",
"As the next cell shows, lists have been updated to use the new protocol."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hasattr([], '__iter__')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A natural question is \"what type of object does the `__iter__` call return?\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<list_iterator at 0x110f984e0>"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = [3, 1, 2].__iter__()\n",
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the newer-style iterables the `__iter__` method returns an _iterator_\n",
"for use by this particular iteration. To execute the code\n",
"\n",
" for i in test_list: # or some other iterable\n",
" do_something_with(i)\n",
"\n",
"the interpreter uses the iterator in the following way."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--- Roberta ---\n",
"--- Tom ---\n",
"--- Alice ---\n"
]
}
],
"source": [
"_iterator = test_list.__iter__()\n",
"try:\n",
" while True:\n",
" i = _iterator.__next__()\n",
" do_something_with(i)\n",
"except StopIteration:\n",
" pass"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Originally, the interpreter made repeated calls to `__getitem__` until `IndexError` was raised,\n",
"In the new protocol, the interpreter instead repeatedly calls an iterator's `__next__` method\n",
"(in Python 2, its `next` method) until `StopIteration` is raised.\n",
"\n",
"This removes any reliance on numerical subscripting, allowing dictionaries\n",
"and sets to become iterable (which they duly did in Python 2.4, if memory\n",
"serves correctly)."
]
},
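{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dictionaries and sets both provide `__iter__`, so they work with the new protocol even though\n",
"they can't be indexed by position. The iterator type names in this sketch are CPython's;\n",
"other Python implementations may name them differently.\n",
"\n",
"```python\n",
"d = {'x': 1, 'y': 2}\n",
"s = {3}\n",
"\n",
"# Iterating over a dict yields its keys (in insertion order since Python 3.7).\n",
"print([k for k in d])               # ['x', 'y']\n",
"\n",
"# Each container hands out its own kind of iterator.\n",
"print(type(d.__iter__()).__name__)  # dict_keyiterator\n",
"print(type(s.__iter__()).__name__)  # set_iterator\n",
"```"
]
},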
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the object has no `__iter__` method, the interpreter\n",
"simply falls back to the old protocol.\n",
"That's why the `Stars` class above (for which `__iter__` is not implemented),\n",
"functions as expected.\n",
"\n",
"If there's no `__getitem__` method either, the interpreter just raises a TypeError exception,\n",
"on the not unreasonable grounds that there's no way to iterate over the given value."
]
},
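{
"cell_type": "markdown",
"metadata": {},
"source": [
"The fallback isn't limited to `for` loops. In this sketch a hypothetical `Squares` class defines\n",
"only `__getitem__`. Even so, `list()`, the `in` operator and the built-in `iter` function (covered below)\n",
"all work, because the interpreter wraps the object in a generic sequence iterator\n",
"(whose type CPython simply calls `iterator`).\n",
"\n",
"```python\n",
"class Squares:\n",
"    # Hypothetical old-style sequence: __getitem__ only, no __iter__.\n",
"    def __init__(self, n):\n",
"        self.n = n\n",
"    def __getitem__(self, index):\n",
"        if index >= self.n:\n",
"            raise IndexError(index)  # signals the end of the sequence\n",
"        return index * index\n",
"\n",
"sq = Squares(5)\n",
"print(list(sq))                 # [0, 1, 4, 9, 16]\n",
"print(9 in sq)                  # True: `in` also falls back to __getitem__\n",
"print(type(iter(sq)).__name__)  # iterator\n",
"```"
]
},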
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False False\n"
]
}
],
"source": [
"print(hasattr(None, \"__iter__\"), hasattr(None, \"__getitem__\"))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "'NoneType' object is not iterable",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-15-0b97d9158d64>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mdo_something_with\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: 'NoneType' object is not iterable"
]
}
],
"source": [
"for i in None:\n",
" do_something_with(i)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each time through the loop, the interpreter extracts the next value from the iterator\n",
"by calling its `__next__` method (Python 2 contained a design\n",
"flaw and the method is called `next`, failing to denote it as a special method.\n",
"It was renamed in Python 3).\n",
"In the case above, the results of the `__next__` call are\n",
"successively bound to `i`, until `__next__` raises a `StopIteration` exception,\n",
"which is used to terminate the loop normally - the exception is caught internally by\n",
"the interpreter's `for` implementation, and not passed to the user's code.\n",
"\n",
"What actually happens is the equivalent of the following code, although no\n",
"new variable is introduced into the Python namespace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"_ = test_list.__iter__() # creates an iterator\n",
"while True:\n",
" try:\n",
" i = _.__next__() # Python 2: _.next()\n",
" except StopIteration: # iterator is exhausted\n",
" break\n",
" do_something_with(i)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The easiest way to determine if the new iteration protocol will work on an object is simply\n",
"to see whether it has an `__iter__` method. If it does, then it's an iterable.\n",
"Lists are iterables, for example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hasattr(test_list, \"__iter__\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So what kind of an object does a call to that method return?\n",
"A specific kind of iterator called a _list iterator_."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"li = test_list.__iter__()\n",
"print(li, type(li))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recognizing Iterators"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The list iterator object's attributes verify that it is indeed an iterator,\n",
"which is simply to say that it provides both the `__iter__` and `__next__`\n",
"(Python 2, remember, `next`) methods."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(hasattr(li, '__iter__'), hasattr(li, '__next__'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the list itself, while it _is_ an iterable\n",
"(_i.e._ it implements the `__iter__` method, which returns an iterator),\n",
"is not itself an iterator because it has no `__next__` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(hasattr(test_list, '__iter__'), hasattr(test_list, '__next__'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When iterating over an iterable the interpreter calls the iterable's `__iter__()` method,\n",
"which returns an _iterator_ whose `__next__` method returns the iterator's successive\n",
"values and raises `StopIteration` when there are no more values.\n",
"\n",
"Note carefully that the iterator is not the same object as the iterable,\n",
"and that each call to an iterable's `__iter__` method creates a brand-new iterator."
]
},
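{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can check both claims directly: two calls to `__iter__` on the same list produce two distinct\n",
"iterator objects, and advancing one leaves the other where it was.\n",
"\n",
"```python\n",
"colours = ['red', 'green', 'blue']\n",
"a = colours.__iter__()\n",
"b = colours.__iter__()\n",
"\n",
"print(a is b)        # False: a brand-new iterator each time\n",
"print(a is colours)  # False: the iterator isn't the list itself\n",
"\n",
"print(a.__next__(), a.__next__())  # red green\n",
"print(b.__next__())                # red: b keeps its own position\n",
"```"
]
},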
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iterating over Iterators"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using iterators rather than iterables in nested `for`s gives odd results:\n",
"the inner loop exhausts `iterator_2` the first time through the outer loop,\n",
"so further attempts to extract a value from it cause immediate termination of the inner loop."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iterator_1 = iter(test_list) # same as test_list.__iter__()\n",
"iterator_2 = iter(test_list)\n",
"print(id(test_list), id(iterator_1), id(iterator_2), sep=\"\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you loop over an iterable, the call to the iterable's `__iter__()` method is made\n",
"automatically.\n",
"Because each call to an iterable's `__iter__` creates a new iterator, nested looping works as we\n",
"expect it to."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for i in test_list:\n",
" for j in test_list:\n",
" do_something_with(i + \":\" + j)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using Iterators in the iteration gives us a rather different result, however.\n",
"This is because the first iteration of the outer loop exhausts the inner iterator,\n",
"which thereafter refuses to provide further values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iterator_1 = iter(test_list)\n",
"iterator_2 = iter(test_list)\n",
"for i in iterator_1:\n",
" print(\"outer loop\")\n",
" for j in iterator_2:\n",
" print(\"inner loop\")\n",
" do_something_with(i + \":\" + j)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the same iterator in both loops gives even odder results.\n",
"In this case, `iterator_1` is partially exhausted even the first time through the\n",
"inner loop, which then exhausts it completely. The outer loop then terminates after only\n",
"a single iteration becausse the iterator is exhausted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iterator_1 = iter(test_list)\n",
"for i in iterator_1:\n",
" print(\"outer loop\")\n",
" for j in iterator_1:\n",
" print(\"inner loop\")\n",
" do_something_with(i + \":\" + j)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why do these looping constructs fail to perform as you might expect?\n",
"Whereas the iterable's `__iter__()` returns a new iterator on each call,\n",
"an iterator's `__iter__()` method simply returns `self`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"it_3 = iter(test_list)\n",
"print(id(test_list),\n",
" id(it_3),\n",
" id(iter(it_3)),\n",
" id(it_3) is id(iter(it_3)), sep=\"\\n\")"
]
},
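{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are handed an iterator but need its values more than once (as in the nested loops above),\n",
"a common remedy is to copy the values into a list, which is a genuine iterable.\n",
"When you only need a fixed number of independent passes, the standard library's `itertools.tee`\n",
"is an alternative. It buffers any values that one copy has consumed but the others haven't yet.\n",
"\n",
"```python\n",
"import itertools\n",
"\n",
"source = iter(['a', 'b'])\n",
"\n",
"# Remedy 1: materialise the values in a list, then nest loops over the list.\n",
"values = list(source)\n",
"pairs = [x + ':' + y for x in values for y in values]\n",
"print(pairs)  # ['a:a', 'a:b', 'b:a', 'b:b']\n",
"\n",
"# Remedy 2: split one iterator into two independent iterators.\n",
"left, right = itertools.tee(iter(['a', 'b']), 2)\n",
"print(list(left), list(right))  # ['a', 'b'] ['a', 'b']\n",
"```"
]
},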
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Why We Need Iterables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Technically, iterators _are_ iterables because they too have\n",
"an `__iter__` method.\n",
"Most iterators' `__iter__` methods, however, don't create a new object,\n",
"but simply return the iterator itself, which can lead to some unexpected\n",
"consequences."
]
},
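{
"cell_type": "markdown",
"metadata": {},
"source": [
"One such consequence: a function that loops over its argument twice works with a list but\n",
"silently misbehaves when passed an iterator, because the first pass exhausts it.\n",
"The `normalise` function here is a purely illustrative example.\n",
"\n",
"```python\n",
"def normalise(values):\n",
"    # Illustrative helper that iterates over its argument twice.\n",
"    total = sum(values)                 # first pass\n",
"    return [v / total for v in values]  # second pass\n",
"\n",
"print(normalise([1, 2, 3, 4]))        # [0.1, 0.2, 0.3, 0.4]\n",
"print(normalise(iter([1, 2, 3, 4])))  # []: sum() already used up the iterator\n",
"```"
]
},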
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use the built-in `iter` function which, when called with a single argument\\*,\n",
"is equivalent to calling that argument's `__iter__` method.\n",
"Knowing equivalences like this is the start of understanding how the interpreter works."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember that calling the `iter` function (with a single argument) is\n",
"equivalent to calling the argument's `__iter_` method."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `iter(o) == o.__iter__()`"
]
},
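{
"cell_type": "markdown",
"metadata": {},
"source": [
"The equivalence holds for well-behaved objects, with one useful extra: `iter` checks that `__iter__`\n",
"really returned an iterator. In this sketch a deliberately broken, hypothetical `Broken` class returns\n",
"a list (an iterable, not an iterator) from `__iter__`. Calling the method directly succeeds,\n",
"but `iter` raises `TypeError`.\n",
"\n",
"```python\n",
"class Broken:\n",
"    # Hypothetical class whose __iter__ wrongly returns a list.\n",
"    def __iter__(self):\n",
"        return [1, 2, 3]\n",
"\n",
"b = Broken()\n",
"print(b.__iter__())  # [1, 2, 3]: the method itself returns the list\n",
"\n",
"try:\n",
"    iter(b)\n",
"except TypeError as exc:\n",
"    print('TypeError:', exc)  # iter() insists on getting an iterator back\n",
"```"
]
},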
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you have created an iterator, remember, you can always use the `next` function to\n",
"extract a value from the iteration sequence.\n",
"Similarly to `iter` the `next` function simply calls its argument's `__next__` method.\n",
"Which is to say"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `next(o) == o.__next__()` \n",
"or, in Python 2:\n",
"### `next(o) == o.next()`"
]
},
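{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `next` function also accepts an optional second argument. This is a default value that `next`\n",
"returns once the iterator is exhausted, instead of raising `StopIteration`.\n",
"\n",
"```python\n",
"it = iter(['only'])\n",
"print(next(it))          # only\n",
"print(next(it, 'done'))  # done: the default replaces StopIteration\n",
"print(next(it, None))    # None\n",
"```"
]
},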
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is completely in line with Python's usual practice of providing utility functions to all standard methods,\n",
"allowing you to provide tailored responses in your own objects by writing the methods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iterators Aren't a Silver Bullet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's possible to use iterators directly using explicit calls to `next`\n",
"rather than using the interpreter's `for`-handling logic.\n",
"But it's important to be aware that this can raise a `StopIteration` exception\n",
"that won't be handled by the iteration logic, because the exception\n",
"doesn't occur during the operation of the `for` logic but inside the loop body."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"it_4 = iter([\"one\", \"two\", \"three\", \"four\"])\n",
"it_5 = iter([\"one\", \"two\", \"three\", \"four\", \"five\"])\n",
"for iterator in it_4, it_5:\n",
" print(\"++ New iterator ++\")\n",
" for item_1 in iterator:\n",
" item_2 = next(iterator)\n",
" do_something_with(item_1+\":\"+item_2)"
]
},
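{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to pair up items without risking an unhandled `StopIteration` is to give `next` a default,\n",
"so a missing partner becomes a placeholder rather than an exception. The `pair_up` function and its\n",
"`fill` parameter are hypothetical names for this sketch. `zip(it, it)` is a common shorthand when\n",
"you're happy to drop a trailing unpaired item.\n",
"\n",
"```python\n",
"def pair_up(iterable, fill=None):\n",
"    # Return (item, partner) pairs, using `fill` when the partner is missing.\n",
"    it = iter(iterable)\n",
"    pairs = []\n",
"    for first in it:\n",
"        second = next(it, fill)  # with a default, no StopIteration can escape\n",
"        pairs.append((first, second))\n",
"    return pairs\n",
"\n",
"print(pair_up(['one', 'two', 'three']))  # [('one', 'two'), ('three', None)]\n",
"\n",
"it = iter(['one', 'two', 'three'])\n",
"print(list(zip(it, it)))  # [('one', 'two')]: the odd item is dropped\n",
"```"
]
},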
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Writing Your Own Iterators and Iterables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We've observed that iterators have both an `__iter__` and a `__next__` method,\n",
"while iterables only have the former.\n",
"This is why iterables aren't iterators.\n",
"It also allows us to write functions to identify both types of object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def is_iterable(o):\n",
" \"Return True if o is an iterable.\"\n",
" return hasattr(o, \"__iter__\") and not hasattr(o, \"__next__\")\n",
"\n",
"def is_iterator(o):\n",
" \"Return True if o is an iterator.\"\n",
" return hasattr(o, \"__iter__\") and hasattr(o, \"__next__\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_it = iter(test_list)\n",
"is_iterable(test_list), is_iterator(test_list), is_iterable(test_it), is_iterator(test_it)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This lets us investigate various object types to determine their iteration properties.\n",
"As you can see, though iterators are iterable, iterables aren't iterators.\n",
"Now you understand the iteration protocols, you might be wondering how to write your\n",
"own iterables and iterators.\n"
]
},
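{
"cell_type": "markdown",
"metadata": {},
"source": [
"The standard library offers ready-made versions of these checks in `collections.abc`.\n",
"Their definitions differ slightly from the functions above. First, `Iterable` counts iterators as iterable too.\n",
"Second, it only recognises classes that define `__iter__`, so an old-style class with only `__getitem__`\n",
"(like `Stars`, or the hypothetical `OldStyle` below) isn't reported as iterable even though you can loop over it.\n",
"\n",
"```python\n",
"from collections.abc import Iterable, Iterator\n",
"\n",
"class OldStyle:\n",
"    # Hypothetical class using only the original __getitem__ protocol.\n",
"    def __getitem__(self, index):\n",
"        if index >= 2:\n",
"            raise IndexError(index)\n",
"        return index\n",
"\n",
"lst = [1, 2]\n",
"print(isinstance(lst, Iterable), isinstance(lst, Iterator))              # True False\n",
"print(isinstance(iter(lst), Iterable), isinstance(iter(lst), Iterator))  # True True\n",
"print(isinstance(OldStyle(), Iterable), list(OldStyle()))                # False [0, 1]\n",
"```"
]
},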
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Basic Iterator Pattern"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's begin with iterators: despite the slightly more complex interface it's easier\n",
"to start there, because the objects you produce only have to handle a single\n",
"iteration, whereas iterables have to handle multiple, possibly simultaneous iterations.\n",
"\n",
"The first example is not intended to be complex: iterate over a string, producing each character N times.\n",
"I chose this example because it isn't particularly convenient to write: essentially\n",
"a nested loop is required, but the \"position in the loop\" has to be maintained in\n",
"insstance variables between calls to `__next__`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class MyIterator:\n",
" \"An iterator to produce each character of a string N times.\"\n",
" def __init__(self, s, N):\n",
" self.s = s\n",
" self.N = N\n",
" self.pos = self.count = 0\n",
" def __iter__(self):\n",
" return self\n",
" def __next__(self):\n",
" if self.pos == len(self.s):\n",
" raise StopIteration\n",
" result = self.s[self.pos]\n",
" self.count += 1\n",
" if self.count == self.N:\n",
" self.pos += 1\n",
" self.count = 0\n",
" return result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `__iter__` method seems so simple you might wonder why it's there at all.\n",
"The answer is obvious once you understand the iteration protocol: to make it iterable.\n",
"\n",
"If iterators weren't iterable there'd be no way to iterate over them using the `for`\n",
"construct, since the first thing the interpreter does to iterate over an iterable\n",
"is to generate an iterator from it, by calling its `__iter__` method."
]
},
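{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can confirm this with a hypothetical `NextOnly` class that has a `__next__` method but no `__iter__`.\n",
"`next` is happy to drive it, but a `for` loop can't even get started.\n",
"\n",
"```python\n",
"class NextOnly:\n",
"    # Hypothetical counter with __next__ but no __iter__ (so not a proper iterator).\n",
"    def __init__(self, n):\n",
"        self.i, self.n = 0, n\n",
"    def __next__(self):\n",
"        if self.i >= self.n:\n",
"            raise StopIteration\n",
"        self.i += 1\n",
"        return self.i\n",
"\n",
"c = NextOnly(3)\n",
"print(next(c))  # 1: next() only needs __next__\n",
"\n",
"try:\n",
"    for x in c:\n",
"        pass\n",
"except TypeError as exc:\n",
"    print('TypeError:', exc)  # for needs __iter__ (or __getitem__)\n",
"```"
]
},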
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"for s in MyIterator(\"abc\", 2):\n",
" do_something_with(s)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you might expect, if you use these iterators in a nested loop you\n",
"get the usual funky results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"it_6 = MyIterator(\"*+\", 3)\n",
"it_7 = MyIterator(\"=-\", 3)\n",
"for c1 in it_6:\n",
" print(\"iterating over c1:\", c1)\n",
" for c2 in it_7:\n",
" do_something_with(c1+\":\"+c2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In that case we just used the two iterators in nested loops.\n",
"The great thing about iterators is that they allow you to compute\n",
"the next value in a sequence _when it's required_, rather than having\n",
"not only to precompute all the values in advance _and_ create a container\n",
"to store them all in.\n",
"\n",
"That's why the new iteration protocol was such a win for Python.\n",
"Any function that takes an iterable argument can be passed either\n",
"an iterable _or_ an iterator.\n",
"Pretty much any function that iterates over a container can be part of\n",
"a program structure where values are computed on demand."
]
},
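{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because values are produced only when asked for, an iterator can even be infinite.\n",
"In this sketch `itertools.count` supplies an endless stream of integers, `map` transforms it lazily,\n",
"and `itertools.islice` takes just the first few values. Ordinary functions such as `list` and `sum`\n",
"accept these iterators just as they would a container.\n",
"\n",
"```python\n",
"import itertools\n",
"\n",
"evens = map(lambda n: 2 * n, itertools.count())  # an infinite iterator\n",
"first_five = list(itertools.islice(evens, 5))    # only five values are computed\n",
"print(first_five)                                # [0, 2, 4, 6, 8]\n",
"\n",
"print(sum(map(len, ['ab', 'cde', 'f'])))  # 6, with no intermediate list\n",
"```"
]
},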
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Advantages of Generators"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's often inconvenient to write a class to create iterators.\n",
"The next snippet creates a so-called _generator function_, because its body\n",
"contains at least one `yield` expression."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def rangedown(n):\n",
" for i in reversed(range(n)):\n",
" yield i"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A call to the generator function returns a generator.\n",
"You'll notice that this is an iterator, but not an iterable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"generator = rangedown(5)\n",
"\n",
"type(generator), is_iterable(generator), is_iterator(generator)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generator functions are often a more convenient way to create iterators\n",
"than class definitions, since the iteration protocol conformance is built-in.\n",
"Just call the function and it returns a generator.\n",
"Since generators are a special case of iterators,\n",
"the job is done without any need for a class definition."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for x in generator:\n",
" print(x)"
]
},
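{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, here is a generator function that produces the same values as the `MyIterator`\n",
"class above (the name `repeat_chars` is just for illustration). The loop position that the class\n",
"had to keep in `self.pos` and `self.count` now lives in ordinary local variables,\n",
"which the generator preserves between calls to `__next__`.\n",
"\n",
"```python\n",
"def repeat_chars(s, N):\n",
"    # Yield each character of s N times, like MyIterator(s, N).\n",
"    for char in s:\n",
"        for _ in range(N):\n",
"            yield char\n",
"\n",
"print(list(repeat_chars('abc', 2)))  # ['a', 'a', 'b', 'b', 'c', 'c']\n",
"```"
]
},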
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell creates a _generator expression_, another object you may not\n",
"be familiar with.\n",
"Its value is also a generator, and therefore an iterator too.\n",
"Generator expressions are an even more powerful way of producing iterators."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"genexp = (i*2 for i in range(5))\n",
"type(rangedown), type(generator), type(genexp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So we can iterate (once) over `genexp` just like any other iterator."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for o in genexp:\n",
" print(o)"
]
},
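{
"cell_type": "markdown",
"metadata": {},
"source": [
"Like any other iterator, a generator expression is used up after one pass, so a second loop\n",
"over it quietly produces nothing. When a generator expression is the only argument to a function\n",
"call, its surrounding parentheses can be omitted.\n",
"\n",
"```python\n",
"doubled = (i * 2 for i in range(5))\n",
"print(sum(doubled))  # 20\n",
"print(sum(doubled))  # 0: already exhausted\n",
"\n",
"print(max(len(w) for w in ['a', 'abc', 'ab']))  # 3\n",
"```"
]
},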
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Basic Iterable Pattern"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So the next question is, how do we produce an _iterable_?\n",
"The answer is to have its `__iter__` method produce a new iterator each time it's called.\n",
"Suppose you wanted to create _multi-iterable strings_,\n",
"a type of object I have just invented, which behave pretty much like strings except\n",
"that when you iterate over them they produce each character a number of times?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"class MIString(str):\n",
" def __new__(cls, value, N):\n",
" return str.__new__(cls, value)\n",