java - Collections emptyList/singleton/singletonList/List/Set toArray

Question

asked Oct 24, 2021 in Technique by 深蓝

Suppose I have this code:

String[] left = { "1", "2" };
String[] leftNew = Collections.emptyList().toArray(left);
System.out.println(Arrays.toString(leftNew));

This will print [null, 2]. This sort of makes sense: since we have an empty list, it is somehow supposed to cope with the fact that we are passing a bigger array, so it sets the first element to null. This is probably saying that the first element does not exist in the empty list, and thus it is set to null.

But this is still confusing, since we pass an array of a certain type only to help infer the type of the returned array; still, this at least follows a certain logic. But what if I do:

String[] right = { "nonA", "b", "c" };
// or Collections.singletonList("a");
// or a plain List or Set; does not matter
String[] rightNew = Collections.singleton("a").toArray(right);
System.out.println(Arrays.toString(rightNew));

Taking the previous example as a reference, I would expect this one to show:

["a", "b", "c"]

But, somewhat unexpectedly for me, it prints:

[a, null, c]

And, of course, I go to the documentation that explicitly says this is expected:

If this set fits in the specified array with room to spare (i.e., the array has more elements than this set), the element in the array immediately following the end of the set is set to null.

OK, good, this is at least documented. But it later says:

This is useful in determining the length of this set only if the caller knows that this set does not contain any null elements.

This is the part in the documentation that confuses me the most :|

And an even stranger example that makes little sense to me:

String[] middle = { "nonZ", "y", "u", "m" };
List<String> list = new ArrayList<>();
list.add("z");
list.add(null);
list.add("z1");
System.out.println(list.size()); // 3

String[] middleNew = list.toArray(middle);
System.out.println(Arrays.toString(middleNew));

This will print:

[z, null, z1, null]

So it clears the last element of the array, but why would it not do that in the first example?

Can someone shed some light here?

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:36:27+0000

The <T> T[] toArray(T[] a) method on Collection is weird, because it's trying to fulfill two purposes at once.

First, let's look at toArray(). This takes the elements from the collection and returns them in an Object[]. That is, the component type of the returned array is always Object. That's useful, but it doesn't satisfy a couple of other use cases:
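To make the Object[] point concrete, here is a small sketch (the class and method names are illustrative, not from the answer) showing that the no-arg toArray() on an ArrayList yields an array whose runtime component type is Object, so it cannot be cast to String[] even though every element is a String:

```java
import java.util.ArrayList;
import java.util.List;

class NoArgToArrayDemo {
    // Returns the result of the no-arg toArray() on a list of Strings.
    static Object[] plainArray() {
        List<String> list = new ArrayList<>(List.of("a", "b"));
        return list.toArray();
    }

    public static void main(String[] args) {
        Object[] arr = plainArray();
        // The runtime component type is Object, not String.
        System.out.println(arr.getClass().getComponentType()); // class java.lang.Object
        try {
            String[] strings = (String[]) arr; // compiles, but fails at run time
            System.out.println(strings.length);
        } catch (ClassCastException e) {
            System.out.println("cannot cast Object[] to String[]");
        }
    }
}
```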

1) The caller wants to re-use an existing array, if possible; and

2) The caller wants to specify the component type of the returned array.

Handling case (1) turns out to be a fairly subtle API problem. The caller wants to re-use an array, so it clearly needs to be passed in. Unlike the no-arg toArray() method, which returns an array of the right size, if the caller's array is re-used, we need a way to return the number of elements copied. OK, let's have an API that looks like this:

int toArray(T[] a)

The caller passes in an array, which is reused, and the return value is the number of elements copied into it. The array doesn't need to be returned, because the caller already has a reference to it. But what if the array is too small? Well, maybe throw an exception. In fact, that's what Vector.copyInto does.
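A minimal sketch of that hypothetical int-returning design, written as a static helper because no such method exists on Collection (copyIntoCounting is a made-up name). It assumes the collection is not modified while the copy runs; if it grew between size() and the copy loop, the write could run off the end of the array, which is exactly the problem discussed next:

```java
import java.util.Collection;
import java.util.List;

class CountingCopy {
    // Hypothetical "int toArray(T[] a)" style API: copies into the caller's
    // array, returns the number of elements copied, and throws if the array
    // is too small. Not a real JDK method.
    static <T> int copyIntoCounting(Collection<? extends T> c, T[] a) {
        if (a.length < c.size()) {
            throw new IndexOutOfBoundsException(
                "array length " + a.length + " < collection size " + c.size());
        }
        int i = 0;
        for (T element : c) {
            a[i++] = element;
        }
        return i; // number of elements copied
    }

    public static void main(String[] args) {
        String[] buffer = new String[5];
        int n = copyIntoCounting(List.of("x", "y"), buffer);
        System.out.println(n); // 2
    }
}
```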

void copyInto(Object[] anArray)

This is a terrible API. Not only does it not return the number of elements copied, it throws IndexOutOfBoundsException if the destination array is too short. Since Vector is a synchronized collection that may be shared among threads, its size might change at any time before the call, so the caller can neither guarantee that the destination array is large enough nor know the number of elements copied. The only thing the caller can do is to lock the Vector around the entire sequence:

synchronized (vec) {
    Object[] a = new Object[vec.size()];
    vec.copyInto(a);
}

Ugh!
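The following sketch (class and method names are illustrative) shows both halves of that complaint: copyInto throws when the destination is shorter than the Vector, and the snapshot helper uses the locking idiom above so that size() and copyInto() observe the same element count. The lock works because Vector's own methods synchronize on the Vector instance:

```java
import java.util.List;
import java.util.Vector;

class CopyIntoDemo {
    // Returns true if copyInto rejects a destination that is too short.
    static boolean throwsWhenTooShort() {
        Vector<String> vec = new Vector<>();
        vec.add("a");
        vec.add("b");
        try {
            vec.copyInto(new Object[1]); // room for only one element
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // Holding the Vector's own lock blocks other synchronized Vector
    // operations, so size() and copyInto() see the same number of elements.
    static Object[] snapshot(Vector<?> vec) {
        synchronized (vec) {
            Object[] a = new Object[vec.size()];
            vec.copyInto(a);
            return a;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsWhenTooShort()); // true
        Vector<Integer> vec = new Vector<>(List.of(1, 2, 3));
        System.out.println(snapshot(vec).length); // 3
    }
}
```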

The Collection.toArray(T[]) API avoids this problem by behaving differently if the destination array is too small. Instead of throwing an exception like Vector.copyInto(), it allocates a new array of the right size. This trades away the array-reuse case for more reliable operation. The problem now is that the caller can't tell whether its array was reused or a new one was allocated. Thus, toArray(T[]) needs to return an array: the argument array, if it was large enough, or the newly allocated array.
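A quick way to see the reuse-or-allocate rule with an ArrayList (the reused helper is illustrative): the returned array is the caller's array only when it is large enough, so code should always work with the return value rather than the array it passed in:

```java
import java.util.ArrayList;
import java.util.List;

class ReuseOrAllocate {
    // Returns true if toArray handed back the very array that was passed in.
    static boolean reused(List<String> list, String[] dest) {
        return list.toArray(dest) == dest;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        System.out.println(reused(list, new String[5])); // true: big enough, reused
        System.out.println(reused(list, new String[3])); // true: exact fit, reused
        System.out.println(reused(list, new String[1])); // false: too small, new array
    }
}
```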

But now we have another problem. We no longer have a way to tell the caller the number of elements that were copied from the collection into the array. If the destination array was newly allocated, or the array happens to be exactly the right size, then the length of the array is the number of elements copied. If the destination array is larger than the number of elements copied, the method attempts to communicate to the caller the number of elements copied, by writing a null to the array location one beyond the last element copied from the collection. If it's known that the source collection has no null values, this enables the caller to determine the number of elements copied. After the call, the caller can search for the first null value in the array. If there is one, its position determines the number of elements copied. If there is no null in the array, it knows that the number of elements copied equals the length of the array.
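The null-sentinel convention can be sketched as follows, using a hypothetical countCopied helper (not a JDK method). It also ties back to the question: the null is always written at index size(), which is index 0 for the empty list, index 1 for the one-element set, and index 3 for the three-element list, and a null element inside the collection makes the count wrong:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NullSentinel {
    // Hypothetical helper: the first null marks one past the last copied
    // element; with no null, the whole array was filled.
    static <T> int countCopied(T[] result) {
        for (int i = 0; i < result.length; i++) {
            if (result[i] == null) {
                return i;
            }
        }
        return result.length;
    }

    public static void main(String[] args) {
        // No nulls in the collection: the sentinel gives the right count.
        List<String> clean = new ArrayList<>(List.of("a", "b"));
        String[] r1 = clean.toArray(new String[] {"x", "x", "x", "x"});
        System.out.println(Arrays.toString(r1)); // [a, b, null, x]
        System.out.println(countCopied(r1));     // 2

        // A null element (as in the question's example) fools the sentinel.
        List<String> withNull = new ArrayList<>(Arrays.asList("z", null, "z1"));
        String[] r2 = withNull.toArray(new String[] {"nonZ", "y", "u", "m"});
        System.out.println(Arrays.toString(r2)); // [z, null, z1, null]
        System.out.println(countCopied(r2));     // 1, although 3 were copied
    }
}
```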

Quite frankly, this is pretty lame. However, given the constraints on the language at the time, I admit I don't have a better alternative.

I don't think I've ever seen any code that reuses arrays or that checks for nulls this way. This is probably a holdover from the early days when memory allocation and garbage collection were expensive, so people wanted to reuse memory as much as possible. More recently, the accepted idiom for using this method has been the second use case described above, that is, to establish the desired component type of the array as follows:

MyType[] a = coll.toArray(new MyType[0]);
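Here is that idiom in runnable form, with String standing in for the placeholder MyType. The zero-length array only supplies the component type; because it is too small for a non-empty list, the collection allocates a new array of exactly the right length:

```java
import java.util.ArrayList;
import java.util.List;

class ZeroLengthIdiom {
    // The zero-length array fixes the component type; the collection sizes the result.
    static String[] toStringArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] a = toStringArray(new ArrayList<>(List.of("p", "q", "r")));
        System.out.println(a.getClass().getComponentType()); // class java.lang.String
        System.out.println(a.length);                        // 3
    }
}
```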

(It seems wasteful to allocate a zero-length array for this purpose, but it turns out that this allocation can be optimized away by the JIT compiler, and the obvious alternative toArray(new MyType[coll.size()]) is actually slower. This is because of the need to initialize the array to nulls, and then to fill it in with the collection's contents. See Alexey Shipilev's article on this topic, Arrays of Wisdom of the Ancients.)

However, many people find the zero-length array counterintuitive. In JDK 11, there is a new API that allows one to use an array constructor reference instead:

MyType[] a = coll.toArray(MyType[]::new);

This lets the caller specify the component type of the array, but it lets the collection provide the size information.
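For comparison, the same conversion with the Java 11 overload, again with String standing in for MyType (this requires JDK 11 or later):

```java
import java.util.ArrayList;
import java.util.List;

class GeneratorToArray {
    // Java 11+: Collection.toArray(IntFunction<T[]>) takes an array constructor reference.
    static String[] toStringArray(List<String> list) {
        return list.toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] a = toStringArray(new ArrayList<>(List.of("p", "q", "r")));
        System.out.println(String.join(",", a)); // p,q,r
    }
}
```

Per its Javadoc, the default implementation of Collection.toArray(IntFunction) calls the generator with zero and passes the resulting array to toArray(T[]), so for collections that do not override it, this is a more readable spelling of the zero-length idiom rather than a different mechanism.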
