Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I've got two methods to read in a string, and create Character objects:

static void newChar(String string) {
    int len = string.length();
    System.out.println("Reading " + len + " characters");
    for (int i = 0; i < len; i++) {
        Character cur = new Character(string.charAt(i));

    }       
}

and

static void justChar(String string) {
    int len = string.length();
    for (int i = 0; i < len; i++) {
        Character cur = string.charAt(i);

    }
}

When I run the methods using an 18,554,760 character string, I'm getting wildly different run times. The output I'm getting is:

newChar took: 20 ms
justChar took: 41 ms

With smaller input (4,638,690 characters) the time isn't as varied.

newChar took: 12 ms
justChar took: 13 ms

Why is new so much more efficient in this case?

EDIT:

My benchmark code is pretty hacky.

start = System.currentTimeMillis();
newChar(largeString);
end = System.currentTimeMillis();
diff = end-start;
System.out.println("New char took: " + diff + " ms");

start = System.currentTimeMillis();
justChar(largeString);
end = System.currentTimeMillis();
diff = end-start;
System.out.println("just char took: " + diff+ " ms");
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
187 views
Welcome To Ask or Share your Answers For Others

1 Answer

Well, I'm not sure if Marko was intentional in replicating the original mistake. TL;DR; new instance is not used, gets eliminated. Adjusting the benchmark reverses the result. Don't trust faulty benchmarks, learn from them.

Here's the JMH benchmark:

@OutputTimeUnit(TimeUnit.MICROSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 3, time = 1)
@Fork(3)
@State(Scope.Thread)
public class Chars {

    // Source needs to be @State field to avoid constant optimizations
    // on sources. Results need to be sinked into the Blackhole to
    // avoid dead-code elimination
    private String string;

    @Setup
    public void setup() {
        string = "12345678901234567890";
        for (int i = 0; i < 10; i++) {
            string += string;
        }
    }

    @GenerateMicroBenchmark
    public void newChar_DCE(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            Character c = new Character(string.charAt(i));
        }
    }

    @GenerateMicroBenchmark
    public void justChar_DCE(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            Character c = Character.valueOf(string.charAt(i));
        }
    }

    @GenerateMicroBenchmark
    public void newChar(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            Character c = new Character(string.charAt(i));
            bh.consume(c);
        }
    }

    @GenerateMicroBenchmark
    public void justChar(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            Character c = Character.valueOf(string.charAt(i));
            bh.consume(c);
        }
    }

    @GenerateMicroBenchmark
    public void newChar_prim(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            char c = new Character(string.charAt(i));
            bh.consume(c);
        }
    }

    @GenerateMicroBenchmark
    public void justChar_prim(BlackHole bh) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            char c = Character.valueOf(string.charAt(i));
            bh.consume(c);
        }
    }
}

...and this is the result:

Benchmark                   Mode   Samples         Mean   Mean error    Units
o.s.Chars.justChar          avgt         9       93.051        0.365    us/op
o.s.Chars.justChar_DCE      avgt         9       62.018        0.092    us/op
o.s.Chars.justChar_prim     avgt         9       82.897        0.440    us/op
o.s.Chars.newChar           avgt         9      117.962        4.679    us/op
o.s.Chars.newChar_DCE       avgt         9       25.861        0.102    us/op
o.s.Chars.newChar_prim      avgt         9       41.334        0.183    us/op

DCE stands for "Dead Code Elimination", and that is what the original benchmark is suffering from. If we eliminate that effect, in JMH's way it requires us to sink the values into the Blackhole, the score reverses. So, in retrospect, that seems to indicate the new Character() in the original code has major improvement with DCE, while the Character.valueOf is not that successful. I'm not sure we should discuss why, because this has no bearing on the real world use cases, where produced Characters are actually used.

You can go further on two fronts from here:

  • Get the assembly for the benchmark methods to confirm the conjecture above. See PrintAssembly.
  • Run with more threads. The difference between returning cached Character and instantiating the new one would diminish as we increase the number of threads, and consequently hit the "allocation wall".

UPD: Following up on Marko's question, it does seem the major impact is about eliminating the allocation itself, whether via the EA or DCE, see *_prim tests.

UPD2: Looked into the assembly. The same run with -XX:-DoEscapeAnalysis confirms the major effect is due to eliminating the allocation, as the effect of escape analysis:

Benchmark                   Mode   Samples         Mean   Mean error    Units
o.s.Chars.justChar          avgt         9       94.318        4.525    us/op
o.s.Chars.justChar_DCE      avgt         9       61.993        0.227    us/op
o.s.Chars.justChar_prim     avgt         9       82.824        0.634    us/op
o.s.Chars.newChar           avgt         9      118.862        1.096    us/op
o.s.Chars.newChar_DCE       avgt         9       97.530        2.485    us/op
o.s.Chars.newChar_prim      avgt         9      101.905        1.871    us/op

This proves the original DCE conjecture is incorrect. EA is the major contributor. DCE results are still faster because we do not pay the costs of unboxing, and generally treating the returned value with any respect. Benchmark is faulty in that regard nevertheless.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...