Cassandra slowed down with more nodes

Submitted by 我与影子孤独终老i on 2019-12-06 11:33:20

Even though this is an old post, I think it's worth posting a solution for this (common) kind of problem.

As you've already discovered, loading data with a serial procedure is slow. What was suggested to you is the right thing to do.

However, issuing a lot of queries without applying some sort of back pressure is asking for trouble, and you're going to lose data due to excessive load on the server (and, to some extent, on the driver).

This solution loads the data with async calls and tries to apply some back pressure on the client side to avoid data loss.

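import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

// DataStax Java driver with the Cluster/Session API (com.datastax.driver.core)
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class WordGenCAS {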
    static final String KEYSPACE = "text_ks";
    static final String COLUMN_FAMILY = "text_table";
    static final String COLUMN_NAME = "text_col";

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: WordGenCAS <input file> <host1,host2,...>");
            System.exit(-1);
        }

        String[] contactPts = args[1].split(",");

        Cluster cluster = Cluster.builder()
                .addContactPoints(contactPts)
                .build();
        Session session = cluster.connect(KEYSPACE);

        InputStream fis = new FileInputStream(args[0]);
        InputStreamReader in = new InputStreamReader(fis, "UTF-8");
        BufferedReader br = new BufferedReader(in);

        String line;
        int lineCount = 0;

        // Futures of the queries that are still in flight, oldest first
        List<Future<ResultSet>> futures = new ArrayList<>();

        // Read the file line by line and issue one async insert per non-empty line
        while ( (line = br.readLine()) != null) {
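            // A single quote would terminate the CQL string literal built below,
            // so replace it with a space (note that this alters the stored text)
            line = line.replaceAll("'", " ");
            line = line.trim();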
            if (line.isEmpty())
                continue;
            System.out.println("[" + line + "]");
            String cqlStatement2 = String.format("insert into %s (id, %s) values (%d, '%s');",
                    COLUMN_FAMILY,
                    COLUMN_NAME,
                    lineCount,
                    line);
            lineCount++;

            // Add the future returned by the async call to the list
            futures.add(session.executeAsync(cqlStatement2));

            // Apply some back pressure once more than 1000 queries are in flight:
            // block on the oldest ones until the list is back under the limit.
            // Change 1000 to a value suitable for your cluster
            while (futures.size() > 1000) {
                Future<ResultSet> future = futures.remove(0);
                try {
                    future.get();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

        System.out.println("Total lines submitted: " + lineCount);
        System.out.println("Waiting for writes to complete...");

        // Wait until all writes are done.
        while (futures.size() > 0) {
            Future<ResultSet> future = futures.remove(0);
            try {
                future.get();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

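        // Release the file handle and the driver's connections and threads;
        // without cluster.close() the JVM may not exit
        br.close();
        cluster.close();

        System.out.println("Done!");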
    }
}
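
As a variation on the listing above, here is a minimal sketch of the same back-pressure idea using a java.util.concurrent.Semaphore instead of a list of futures. It is not part of the original solution: BoundedAsyncWriter, writeAll and demo are hypothetical names, and the write is simulated with a CompletableFuture on a thread pool so the example runs without a cluster. With the real driver you would presumably release the permit from a listener attached to the future returned by executeAsync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class BoundedAsyncWriter {

    /**
     * Passes every item to asyncWrite while keeping at most maxInFlight
     * writes outstanding. Returns the number of writes that failed.
     */
    static <T> int writeAll(Iterable<T> items,
                            Function<T, CompletableFuture<Void>> asyncWrite,
                            int maxInFlight) throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger failures = new AtomicInteger();

        for (T item : items) {
            // Blocks the producer while maxInFlight writes are outstanding
            permits.acquire();
            CompletableFuture<Void> write;
            try {
                write = asyncWrite.apply(item);
            } catch (RuntimeException e) {
                // The write could not even be submitted: count it, free the slot
                failures.incrementAndGet();
                permits.release();
                continue;
            }
            // Free the slot as soon as this write finishes, in any order
            write.whenComplete((ignored, error) -> {
                if (error != null) {
                    failures.incrementAndGet();
                }
                permits.release();
            });
        }

        // All permits are available again only when nothing is in flight
        permits.acquire(maxInFlight);
        return failures.get();
    }

    /** Runs simulated writes; returns {failures, peak number of in-flight writes}. */
    static int[] demo(int total, int maxInFlight) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            ids.add(i);
        }

        try {
            int failures = writeAll(ids, id -> {
                // Record how many simulated writes are outstanding right now
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                return CompletableFuture.runAsync(() -> {
                    try {
                        Thread.sleep(1); // stand-in for a network round trip
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                    }
                }, pool);
            }, maxInFlight);
            return new int[] {failures, peak.get()};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] result = demo(1000, 32);
        System.out.println("failures=" + result[0] + ", peak in flight=" + result[1]);
    }
}
```

Compared with waiting on the oldest future, a semaphore frees a slot as soon as any write completes, so one slow query does not stall the whole window. It also counts failed writes instead of only printing their stack traces, which makes it possible to retry or abort rather than lose rows silently.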