skip to content
Honwhy Wang

学习Java Collection Framework的Iterator实现

/ 13 min read

继续研读JDK的源码,在比较HashMapConcurrentHashMap的不同之处发现了一个细节——关于Iterator的实现的不同,其实HashMapConcurrentHashMap还有更多不同的地方,这也是面试经常问到的问题,有一篇文章我觉得讲的很好了,Java进阶(六)从ConcurrentHashMap的演进看Java多线程核心技术。 Iterator是一种设计模式,在Java Collection Framework中经常作为容器的视图(view),大多数时候只支持删除、不支持增加,提供统一的接口方法等特点。在Java Collection FrameworkIterator实现中大多数是fast-fail方式的,而支持并发的容器数据结构则没有这个限制。

非并发数据结构的情况

常见的使用方法

1)使用Iterator遍历字符串列表

List<String> lists = Arrays.asList("a","b","c");
Iterator<String> iterator = lists.iterator();
while (iterator.hasNext()) {
String val = iterator.next();
System.out.println(val);
}

这种做法是for..each的语法的展开形式

for(String val: lists){
//sout
}

2)使用Iterator遍历LinkedList

LinkedList<String> linkedList = new LinkedList<>(lists);
iterator = linkedList.iterator();
while (iterator.hasNext()) {
String val = iterator.next();
System.out.println(val);
}

3) 使用Iterator遍历HashMap

Map<String,Integer> hmap = new HashMap<>(3);
hmap.put("a",1);
hmap.put("b",2);
hmap.put("c",3);
Iterator<Map.Entry<String,Integer>> mapIterator = hmap.entrySet().iterator();
while (mapIterator.hasNext()) {
Map.Entry<String,Integer> entry = mapIterator.next();
System.out.println(entry.getKey() + ":" + entry.getValue());
}

非并发数据结构Iterator的实现

1)ArrayList中的Iterator

list中的结构是顺序的,Iterator既然是List的视图,那它也表现了相同的顺序。 ArrayList获得Iterator,

/**
* Returns an iterator over the elements in this list in proper sequence.
*
* <p>The returned iterator is <a href="#fail-fast"><i>fail-fast</i></a>.
*
* @return an iterator over the elements in this list in proper sequence
*/
public Iterator<E> iterator() {
return new Itr();
}

源码,

/**
* An optimized version of AbstractList.Itr
*/
private class Itr implements Iterator<E> {
int cursor; // index of next element to return
int lastRet = -1; // index of last element returned; -1 if no such
int expectedModCount = modCount;
public boolean hasNext() {
return cursor != size;
}
@SuppressWarnings("unchecked")
public E next() {
checkForComodification();
int i = cursor;
if (i >= size)
throw new NoSuchElementException();
Object[] elementData = ArrayList.this.elementData;
if (i >= elementData.length)
throw new ConcurrentModificationException();
cursor = i + 1;
return (E) elementData[lastRet = i];
}
public void remove() {
if (lastRet < 0)
throw new IllegalStateException();
checkForComodification();
try {
ArrayList.this.remove(lastRet);
cursor = lastRet;
lastRet = -1;
expectedModCount = modCount;
} catch (IndexOutOfBoundsException ex) {
throw new ConcurrentModificationException();
}
}
final void checkForComodification() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
}
}

ItrArrayList的一个内部类,它能使用宿主类的成员变量,事实上Itr反映了ArrayList的内部情况,使用了sizeexpectedModCountelementData等属性。通过游标cursor的方式不断往前递进,只要游标小于size就说明依然还有元素可以访问。 应该看到的是,在调用了new Iterator()之后,可以看做ItrArrayList做了快照,这里的快照并不是很严格,是基于modCount比较来实现的。它在初始化时备份了modCount的值,保存为私有的变量expectedModCount

首先Iterator接口并没有诸如add的方法,即不能通过Iterator来为容器增加元素; 其次,如果有其他线程变化了容器的结构(structural modification),那么ArrayList.this.modCount的值会发生改变,那么在Itr执行next或者remove方法时会判断出来modCount != expectedModCount的情况,从而抛出异常fast-fail。 再次,如果执行了Itr的remove方法,它能够调用ArrayList.this.remove的方法,然后修正游标和expectedModCount等。

ArrayList.this.remove(lastRet);
cursor = lastRet;
lastRet = -1;
expectedModCount = modCount;

2)LinkedList中的Iterator

LinkedListIteratorArrayList中的有一些类似的地方。 首先,LinkedList的iterator入口方法其实是AbstractSequentialList抽象类中,

/**
* Returns an iterator over the elements in this list (in proper
* sequence).<p>
*
* This implementation merely returns a list iterator over the list.
*
* @return an iterator over the elements in this list (in proper sequence)
*/
public Iterator<E> iterator() {
return listIterator();
}
/**
* Returns a list iterator over the elements in this list (in proper
* sequence).
*
* @param index index of first element to be returned from the list
* iterator (by a call to the <code>next</code> method)
* @return a list iterator over the elements in this list (in proper
* sequence)
* @throws IndexOutOfBoundsException {@inheritDoc}
*/
public abstract ListIterator<E> listIterator(int index);

而这个ListIterator是一个接口,它被LinkedList$ListItr实现,

private class ListItr implements ListIterator<E> {
private Node<E> lastReturned = null;
private Node<E> next;
private int nextIndex;
private int expectedModCount = modCount;
ListItr(int index) {
// assert isPositionIndex(index);
next = (index == size) ? null : node(index);
nextIndex = index;
}
public boolean hasNext() {
return nextIndex < size;
}
public E next() {
checkForComodification();
if (!hasNext())
throw new NoSuchElementException();
lastReturned = next;
next = next.next;
nextIndex++;
return lastReturned.item;
}
public boolean hasPrevious() {
return nextIndex > 0;
}
public E previous() {
checkForComodification();
if (!hasPrevious())
throw new NoSuchElementException();
lastReturned = next = (next == null) ? last : next.prev;
nextIndex--;
return lastReturned.item;
}
public int nextIndex() {
return nextIndex;
}
public int previousIndex() {
return nextIndex - 1;
}
public void remove() {
checkForComodification();
if (lastReturned == null)
throw new IllegalStateException();
Node<E> lastNext = lastReturned.next;
unlink(lastReturned);
if (next == lastReturned)
next = lastNext;
else
nextIndex--;
lastReturned = null;
expectedModCount++;
}
public void set(E e) {
if (lastReturned == null)
throw new IllegalStateException();
checkForComodification();
lastReturned.item = e;
}
public void add(E e) {
checkForComodification();
lastReturned = null;
if (next == null)
linkLast(e);
else
linkBefore(e, next);
nextIndex++;
expectedModCount++;
}
final void checkForComodification() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
}
}

LinkedListIterator要比ArrayList中的复杂一些,它更支持了add等方法; 类似原来游标的遍历方式,基于sizeexpectedModCount等比较逻辑依然存在,只不过遍历的方式不是原来的下标增进,而是节点之间的next指针来实现。

3)HashMap中的Iterator

HashMap有多个view视图,keySet, values, entrySet,这里分析下entrySet这个视图,另外两个原理和entrySet视图的差不多。

private final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
public Iterator<Map.Entry<K,V>> iterator() {
return newEntryIterator();
}
public boolean contains(Object o) {
if (!(o instanceof Map.Entry))
return false;
Map.Entry<K,V> e = (Map.Entry<K,V>) o;
Entry<K,V> candidate = getEntry(e.getKey());
return candidate != null && candidate.equals(e);
}
public boolean remove(Object o) {
return removeMapping(o) != null;
}
public int size() {
return size;
}
public void clear() {
HashMap.this.clear();
}
}

EntrySet的iterator方法中调用了newEntryIterator,将构造EntryIterator实例, EntryIterator源码

private final class EntryIterator extends HashIterator<Map.Entry<K,V>> {
public Map.Entry<K,V> next() {
return nextEntry();
}
}

EntryIterator继承了HashIterator类,复用了父类的大部分方法,只是覆盖了next方法。 HashIterator源码,

private abstract class HashIterator<E> implements Iterator<E> {
Entry<K,V> next; // next entry to return
int expectedModCount; // For fast-fail
int index; // current slot
Entry<K,V> current; // current entry
HashIterator() {
expectedModCount = modCount;
if (size > 0) { // advance to first entry
Entry[] t = table;
while (index < t.length && (next = t[index++]) == null)
;
}
}
public final boolean hasNext() {
return next != null;
}
final Entry<K,V> nextEntry() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
Entry<K,V> e = next;
if (e == null)
throw new NoSuchElementException();
if ((next = e.next) == null) {
Entry[] t = table;
while (index < t.length && (next = t[index++]) == null)
;
}
current = e;
return e;
}
public void remove() {
if (current == null)
throw new IllegalStateException();
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
Object k = current.key;
current = null;
HashMap.this.removeEntryForKey(k);
expectedModCount = modCount;
}
}

由于HashMap的结构并不是顺序的,在执行Iterator.next方法时不能通过next指针或下标的方式直接找到下一个元素,HashIterator为了能达到这个目的,在构造函数和nextEntry方法中预先做了advance处理。

//构造函数中
if (size > 0) { // advance to first entry
Entry[] t = table;
while (index < t.length && (next = t[index++]) == null)
;
}
//nextEntry中
if ((next = e.next) == null) {
Entry[] t = table;
while (index < t.length && (next = t[index++]) == null)
;
}

构造函数中预先在HashMap的table数组找到第一个头结点不为null的元素; (next = t[index++]) == null的写法有点迷惑性,不考虑HashMap为空的情况,index自增停在next != null的情况,即 next = t[index-1], index已经往前一步了;

在nextEntry中如果发现e.next是null,此时表示table这个数组元素的链表遍历结束了,需要跳到下一个头节点不为空的元素继续遍历,而index刚好往前一步了,此时继续执行

next = t[index++]

假设next[index]不为空,那么下一个遍历的数组元素头节点找到,并且index已经自增了。

并发数据结构的情况

ConcurrentHashMap为例,看ConcurrentHashMap$HashInteraotr的实现

abstract class HashIterator {
int nextSegmentIndex;
int nextTableIndex;
HashEntry<K,V>[] currentTable;
HashEntry<K, V> nextEntry;
HashEntry<K, V> lastReturned;
HashIterator() {
nextSegmentIndex = segments.length - 1;
nextTableIndex = -1;
advance();
}
/**
* Set nextEntry to first node of next non-empty table
* (in backwards order, to simplify checks).
*/
final void advance() {
for (;;) {
if (nextTableIndex >= 0) {
if ((nextEntry = entryAt(currentTable,
nextTableIndex--)) != null)
break;
}
else if (nextSegmentIndex >= 0) {
Segment<K,V> seg = segmentAt(segments, nextSegmentIndex--);
if (seg != null && (currentTable = seg.table) != null)
nextTableIndex = currentTable.length - 1;
}
else
break;
}
}
final HashEntry<K,V> nextEntry() {
HashEntry<K,V> e = nextEntry;
if (e == null)
throw new NoSuchElementException();
lastReturned = e; // cannot assign until after null check
if ((nextEntry = e.next) == null)
advance();
return e;
}
public final boolean hasNext() { return nextEntry != null; }
public final boolean hasMoreElements() { return nextEntry != null; }
public final void remove() {
if (lastReturned == null)
throw new IllegalStateException();
ConcurrentHashMap.this.remove(lastReturned.key);
lastReturned = null;
}
}

这里能看到ConcurrentHashMap的segment分段因素所在,在构造函数中指定了最后一个segment数组元素,然后做advance处理,也是从后往前处理的。首先找到不为null的分段segment,然后才是在segment的table数组中找到不为null的元素,这都是从后往前“前进”的。

而与HashMap不同的地方,ConcurrentHashMap的Iterator并不是fast-fail的,它并没有判断modCount;除此之外还应该看到它对nextEntry的处理,在advance的方法调用以下两个方法,

/**
* Gets the jth element of given segment array (if nonnull) with
* volatile element access semantics via Unsafe. (The null check
* can trigger harmlessly only during deserialization.) Note:
* because each element of segments array is set only once (using
* fully ordered writes), some performance-sensitive methods rely
* on this method only as a recheck upon null reads.
*/
@SuppressWarnings("unchecked")
static final <K,V> Segment<K,V> segmentAt(Segment<K,V>[] ss, int j) {
long u = (j << SSHIFT) + SBASE;
return ss == null ? null :
(Segment<K,V>) UNSAFE.getObjectVolatile(ss, u);
}
/**
* Gets the ith element of given table (if nonnull) with volatile
* read semantics. Note: This is manually integrated into a few
* performance-sensitive methods to reduce call overhead.
*/
@SuppressWarnings("unchecked")
static final <K,V> HashEntry<K,V> entryAt(HashEntry<K,V>[] tab, int i) {
return (tab == null) ? null :
(HashEntry<K,V>) UNSAFE.getObjectVolatile
(tab, ((long)i << TSHIFT) + TBASE);
}

它们都是调用了UNSAFE.getObjectVolatile方法,利用了volatile access的方式,相较于上锁的方式性能更好。

番外篇

JavaScript实现的Iterator的例子

这个例子来自MDN的文档,做法比较简洁,迭代器

function makeIterator(array){
var nextIndex = 0;
return {
next: function(){
return nextIndex < array.length ?
{value: array[nextIndex++], done: false} :
{done: true};
}
};
}
var it = makeIterator(['yo', 'ya']);
console.log(it.next().value); // 'yo'
console.log(it.next().value); // 'ya'
console.log(it.next().done); // true

可以考虑给这个makeIterator的返回值加上hasNext属性,

return {
next: ...,
hasNext: function() {
return nextIndex < array.length;
}
}

JavaScript利用了闭包实现了Iterator和Java利用内部类实现有相似的地方。

总结

Iterator的主要目的还是为了表现底层数据结构的所有元素,提供一种统一的遍历方式。在不同的数据结构需要针对不同语义做出改动,像LinkedList的支持add方法,像ConcurrentHashMapHashMapadvance处理,像ConcurrentHashMap那样不判断modeCount而使用volatile access等。