Parallelized Matrix Multiplication

Submitted by 安稳与你 on 2021-02-08 06:19:15

Question


I am trying to parallelize the multiplication of two matrices, A and B.

Unfortunately, the parallel implementation is either slower than the serial one or gives only a small speedup (about 1.3 for a matrix dimension of 512). Something is probably fundamentally wrong. Can anyone give me a tip?

double[][] matParallel2(final double[][] matrixA,
                        final double[][] matrixB,
                        final boolean parallel) {
    int rows = matrixA.length;
    int columnsA = matrixA[0].length;
    int columnsB = matrixB[0].length;

    Runnable task;
    List<Thread> pool = new ArrayList<>();

    double[][] returnMatrix = new double[rows][columnsB];
    for (int i = 0; i < rows; i++) {
        int finalI = i;
        task = () -> {
            for (int j = 0; j < columnsB; j++) {
                //  returnMatrix[finalI][j] = 0;
                for (int k = 0; k < columnsA; k++) {
                    returnMatrix[finalI][j] +=
                            matrixA[finalI][k] * matrixB[k][j];
                }
            }
        };
        pool.add(new Thread(task));
    }
    if (parallel) {
        for (Thread trd : pool) {
            trd.start();
        }
    } else {
        for (Thread trd : pool) {
            trd.run();
        }
    }
    try {
        for (Thread trd : pool) {
            trd.join();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return returnMatrix;
}

Answer 1:


There's nothing fundamentally wrong.

Creating a thread carries a large overhead compared to a few multiplications. For 512×512 matrices you currently create 512 threads, one per row. Your CPU almost certainly has fewer than 512 cores, so only as many threads as there are cores (e.g. 8 or 16) actually run in parallel; the other ~500 still pay the creation overhead without adding any parallelism.

Try to limit the number of threads to something close to the number of CPU cores, either with your own logic or with a framework such as the java.util.concurrent package.
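
As a rough illustration of that advice, here is a minimal sketch using a fixed-size pool from java.util.concurrent. The class name PooledMatMul, the multiply helper and the one-band-of-rows-per-thread split are assumptions made for this example, not part of the question's code; the pool size would typically come from Runtime.getRuntime().availableProcessors().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledMatMul {

    // Hypothetical helper: multiplies a (m×n) by b (n×p) on a pool of `threads` workers.
    static double[][] multiply(double[][] a, double[][] b, int threads) {
        int rows = a.length;
        int inner = b.length;
        int cols = b[0].length;
        double[][] result = new double[rows][cols];

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            // One task per contiguous band of rows, so at most `threads` tasks are created.
            int band = (rows + threads - 1) / threads;
            for (int start = 0; start < rows; start += band) {
                final int from = start;
                final int to = Math.min(start + band, rows);
                futures.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        for (int j = 0; j < cols; j++) {
                            double sum = 0.0; // local accumulator, stored once per cell
                            for (int k = 0; k < inner; k++) {
                                sum += a[i][k] * b[k][j];
                            }
                            result[i][j] = sum;
                        }
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for the band and surface any exception it threw
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(Arrays.deepToString(multiply(a, b, threads)));
        // prints [[19.0, 22.0], [43.0, 50.0]]
    }
}
```

Each band writes to its own rows of result, so the tasks need no locking; whether this actually beats the serial loop still depends on the matrix size and the machine.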




Answer 2:


You can use a single parallel stream over the rows to improve the speedup (perhaps by a factor of two or more). Don't nest parallel streams, because nested parallelism reduces the speedup.

import java.util.stream.IntStream;

/**
 * Parallel Matrix multiplication
 *
 * @param m rows of 'a' matrix
 * @param n columns of 'a' matrix
 *          and rows of 'b' matrix
 * @param p columns of 'b' matrix
 * @param a first matrix 'm×n'
 * @param b second matrix 'n×p'
 * @return result matrix 'm×p'
 */
static double[][] parallelMatrixMultiplication(
        int m, int n, int p, double[][] a, double[][] b) {
    return IntStream.range(0, m)
            .parallel() // comment this line to check the sequential stream
            .mapToObj(i -> IntStream.range(0, p)
                    .mapToDouble(j -> IntStream.range(0, n)
                            .mapToDouble(k -> a[i][k] * b[k][j])
                            .sum())
                    .toArray())
            .toArray(double[][]::new);
}

// test
public static void main(String[] args) {
    // dimensions
    int m = 512;
    int n = 1024;
    int p = 512;

    // matrices
    double[][] a = randomMatrix(m, n);
    double[][] b = randomMatrix(n, p);

    long time = System.currentTimeMillis();

    // multiplication
    double[][] c = parallelMatrixMultiplication(m, n, p, a, b);

    System.out.println(System.currentTimeMillis() - time);
    // with    .parallel() the time is 1495 ms
    // without .parallel() the time is 5823 ms
}

// helper: d1×d2 matrix filled with random values in [0, 10)
static double[][] randomMatrix(int d1, int d2) {
    return IntStream.range(0, d1)
            .mapToObj(i -> IntStream.range(0, d2)
                    .mapToDouble(j -> Math.random() * 10)
                    .toArray())
            .toArray(double[][]::new);
}
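
A single timed run like the one in main can be skewed by JIT warm-up, so the following sketch repeats the measurement and keeps the best time. The class name MatMulTiming, the bestMillis helper, the 256×256 size and the run counts are assumptions chosen for illustration; the multiplication itself is the same row-wise stream as above, with a flag to toggle the outer .parallel().

```java
import java.util.stream.IntStream;

public class MatMulTiming {

    // Row-wise stream multiplication; `parallel` toggles only the outer stream.
    static double[][] multiply(double[][] a, double[][] b, boolean parallel) {
        int m = a.length, n = b.length, p = b[0].length;
        IntStream rows = IntStream.range(0, m);
        if (parallel) {
            rows = rows.parallel();
        }
        return rows
                .mapToObj(i -> IntStream.range(0, p)
                        .mapToDouble(j -> IntStream.range(0, n)
                                .mapToDouble(k -> a[i][k] * b[k][j])
                                .sum())
                        .toArray())
                .toArray(double[][]::new);
    }

    // Hypothetical helper: best wall-clock time over `runs` repetitions, in milliseconds.
    static long bestMillis(double[][] a, double[][] b, boolean parallel, int runs) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long t0 = System.nanoTime();
            multiply(a, b, parallel);
            best = Math.min(best, (System.nanoTime() - t0) / 1_000_000);
        }
        return best;
    }

    static double[][] randomMatrix(int d1, int d2) {
        return IntStream.range(0, d1)
                .mapToObj(i -> IntStream.range(0, d2)
                        .mapToDouble(j -> Math.random() * 10)
                        .toArray())
                .toArray(double[][]::new);
    }

    public static void main(String[] args) {
        double[][] a = randomMatrix(256, 256);
        double[][] b = randomMatrix(256, 256);
        // Untimed warm-up runs so the JIT has a chance to compile the hot loops.
        bestMillis(a, b, false, 3);
        bestMillis(a, b, true, 3);
        System.out.println("sequential: " + bestMillis(a, b, false, 5) + " ms");
        System.out.println("parallel:   " + bestMillis(a, b, true, 5) + " ms");
    }
}
```

The numbers it prints vary with hardware and JVM, so they are only a rough guide to the speedup; a benchmarking harness would give more trustworthy figures.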


Source: https://stackoverflow.com/questions/65007123/parallelized-matrix-multiplication
