Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I am writing a multi-threaded application in Java in order to improve performance over the sequential version. It is a parallel version of the dynamic programming solution to the 0/1 knapsack problem. I have an Intel Core 2 Duo with both Ubuntu and Windows 7 Professional on different partitions. I am running in Ubuntu.

My problem is that the parallel version actually takes longer than the sequential version. I am thinking this may be because the threads are all being mapped to the same kernel thread or that they are being allocated to the same core. Is there a way I could ensure that each Java thread maps to a separate core?

I have read other posts about this problem but nothing seems to help.

Here is the end of main() and all of run() for the KnapsackThread class (which extends Thread). Notice that they way I use slice and extra to calculate myLowBound and myHiBound ensure that each thread will not overlap in domain of the dynProgMatrix. Therefore there will be no race conditions.

    dynProgMatrix = new int[totalItems+1][capacity+1];
    for (int w = 0; w<= capacity; w++)
        dynProgMatrix[0][w] = 0;
    for(int i=0; i<=totalItems; i++)
        dynProgMatrix[i][0] = 0;
    slice = Math.max(1,
            (int) Math.floor((double)(dynProgMatrix[0].length)/threads.length));
    extra = (dynProgMatrix[0].length) % threads.length;

    barrier = new CyclicBarrier(threads.length);
    for (int i = 0; i <  threads.length; i++){
        threads[i] = new KnapsackThread(Integer.toString(i));
    }
    for (int i = 0; i < threads.length; i++){
        threads[i].start();
    }

    for (int i = 0; i < threads.length; i++){
        try {
            threads[i].join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

public void run(){
    int myRank = Integer.parseInt(this.getName());

    int myLowBound;
    int myHiBound;

    if (myRank < extra){
        myLowBound = myRank * (slice + 1);
        myHiBound = myLowBound + slice;
    }
    else{
        myLowBound = myRank * slice + extra;
        myHiBound = myLowBound + slice - 1;
    }

    if(myHiBound > capacity){
        myHiBound = capacity;
    }

    for(int i = 1; i <= totalItems; i++){
        for (int w = myLowBound; w <= myHiBound; w++){

            if (allItems[i].weight <= w){
               if (allItems[i].profit + dynProgMatrix[i-1][w-allItems[i].weight]
                        > dynProgMatrix[i-1][w])
                {
                    dynProgMatrix[i][w] = allItems[i].profit +
                                      dynProgMatrix[i-1][w- allItems[i].weight];
                }
                else{
                    dynProgMatrix[i][w] = dynProgMatrix[i-1][w];
                }
            }
            else{
                dynProgMatrix[i][w] = dynProgMatrix[i-1][w];
            }
        }
        // now place a barrier to sync up the threads
        try {
            barrier.await(); 
        } catch (InterruptedException ex) { 
            ex.printStackTrace();
            return;
        } catch (BrokenBarrierException ex) { 
            ex.printStackTrace(); 
            return;
        }
    }
}

Update:

I have written another version of the knapsack that uses brute force. This version has very little synchronization because I only need to update a bestSoFar variable at the end of a single thread's execution. Therefore, each thread pretty much should execute completely in parallel except for that small critical section at the end.

I ran this versus the sequential brute force and still it takes longer. I don't see any other explanation than that my threads are being run sequentially, either because they are being mapped to the same core or to the same native thread.

Does anybody have any insight?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
790 views
Welcome To Ask or Share your Answers For Others

1 Answer

I doubt that it will be due to using the same core for all threads. The scheduling is up to the OS, but you should be able to see what's going on if you bring up the performance manager for the OS - it will typically show how busy each core is.

Possible reasons for it taking longer:

  • Lots of synchronization (either necessary or unnecessary)
  • The tasks taking such a short time that thread creation is taking a significant proportion of the time
  • Context switching, if you're creating too many threads - for CPU intensive tasks, create as many as threads as you have cores.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...