shuffle

Need a repeatable random array shuffle using a key

≯℡__Kan透↙ submitted on 2020-01-06 06:39:06
Question: I am looking to randomly shuffle a list/array using a key, and I want to be able to reproduce the same random order from that key. So I will randomly generate a numeric key from, say, 1 to 20, then use that key to shuffle the list. I first tried using the key to iterate repeatedly through my list, decrementing the key until it reached 0, then taking whatever element I was on, removing it, and adding it to my shuffled array. The result is kind of random, but when the arrays are small (which most
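A common way to get a repeatable order is to seed a private pseudo-random generator with the key and then run a standard shuffle. This is a swapped-in technique, not the asker's decrement scheme. The sketch below assumes Python's `random.Random`, and the helper name `keyed_shuffle` is hypothetical. The same key reproduces the same order within a given Python version. The exact sequence is not guaranteed to stay identical across Python versions.

```python
import random


def keyed_shuffle(items, key):
    """Return a new list whose order is fully determined by `key` (hypothetical helper)."""
    rng = random.Random(key)  # private generator seeded by the key; global state untouched
    shuffled = list(items)    # work on a copy so the caller's list is not modified
    rng.shuffle(shuffled)     # Fisher-Yates shuffle driven by the seeded generator
    return shuffled


data = list(range(10))
first = keyed_shuffle(data, 7)
second = keyed_shuffle(data, 7)
print(first == second)  # True: the same key reproduces the same order
```

Keys from 1 to 20 yield at most 20 distinct orders. That is fine for repeatability, but most possible permutations will never be produced.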

Shuffle independently within column of numpy array [duplicate]

ⅰ亾dé卋堺 submitted on 2020-01-05 08:23:09
Question: This question already has answers here: numpy random shuffle by row independently (5 answers). Closed last year. I have a numpy array of the format
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
Each column represents a data channel, and I need to shuffle the contents of each column within that column, independently of the other channels. I understand that numpy.random.shuffle only shuffles
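Here is a minimal sketch of one way to do this, assuming NumPy 1.17 or later for `np.random.default_rng`:

1. Draw an independent random number for every cell.
2. Argsort those numbers down each column.
3. Use the resulting indices to reorder each column separately.

The helper name `shuffle_columns` is hypothetical. Recent NumPy versions also provide `Generator.permuted(x, axis=0)`, which shuffles each column independently in a single call.

```python
import numpy as np


def shuffle_columns(arr, seed=None):
    """Return a copy of a 2-D array with each column shuffled independently (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # One random key per cell; argsort along axis 0 yields a separate permutation per column
    order = np.argsort(rng.random(arr.shape), axis=0)
    # Reorder every column by its own permutation
    return np.take_along_axis(arr, order, axis=0)


channels = np.arange(20).reshape(5, 4)  # 5 samples x 4 channels
shuffled = shuffle_columns(channels, seed=0)
print(shuffled)
```

Each column keeps exactly its own values. Only their order within the column changes.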

c++ async : how to shuffle a vector in multithread context?

五迷三道 submitted on 2020-01-05 08:08:39
Question: Running a multithreaded program, I noticed that it ran faster with 1 thread than with 4 threads, even though the CPU has 4 cores. After investigating, I found that the issue appears only when shuffling something. Below is the minimal program I created to reproduce the problem:
#include <math.h>
#include <future>
#include <ctime>
#include <vector>
#include <iostream>
#include <algorithm>
#define NB_JOBS 5000.0
#define MAX_CORES 8
static bool _no_shuffle(int nb_jobs){ bool b
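The excerpt stops before the shuffle code, so the cause is not shown. A frequent culprit in this kind of slowdown (an assumption here) is that every thread draws from one shared random number generator, such as the global state behind rand(). The threads then contend on that shared state. The usual remedy is to give each thread its own generator. In C++ that means a per-thread engine such as std::mt19937 passed to std::shuffle.

The Python sketch below only illustrates that structure; because of Python's GIL it will not demonstrate a speed-up. `shuffle_job` and `BASE_SEED` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor

BASE_SEED = 12345  # hypothetical base seed; each job derives its own seed from it


def shuffle_job(job_id):
    # A private generator per job: no random state is shared between threads
    rng = random.Random(BASE_SEED + job_id)
    values = list(range(100))
    rng.shuffle(values)
    return values


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(shuffle_job, range(8)))

print(len(results))  # 8
```

Deriving each seed from the job id also makes every job's result reproducible, regardless of which thread happens to run it.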

Shuffle a list and return a copy

安稳与你 submitted on 2020-01-05 03:25:33
Question: I want to shuffle an array, but all I found were methods like random.shuffle(x), from Best way to randomize a list of strings in Python. Can I do something like this?
import random
rectangle = [(0,0),(0,1),(1,1),(1,0)]
# I want something like
# disorderd_rectangle = rectangle.shuffle
For now I can only get away with:
disorderd_rectangle = rectangle
random.shuffle(disorderd_rectangle)
print(disorderd_rectangle)
print(rectangle)
But it returns
[(1, 1), (1, 0), (0, 1), (0, 0)]
[(1, 1), (1, 0), (0, 1), (0, 0)]
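The two printed lists are identical because `disorderd_rectangle = rectangle` does not copy anything. Both names refer to the same list, which `random.shuffle` then reorders in place. The Python `random` documentation points to `random.sample(x, k=len(x))` for getting a new shuffled list. Copying first and then shuffling the copy also works. A minimal sketch of both:

```python
import random

rectangle = [(0, 0), (0, 1), (1, 1), (1, 0)]

# sample() with k=len(...) builds a NEW list in random order; rectangle is untouched
disordered_rectangle = random.sample(rectangle, k=len(rectangle))

# Alternative: make an explicit copy, then shuffle the copy in place
shuffled_copy = rectangle[:]
random.shuffle(shuffled_copy)

print(rectangle)  # [(0, 0), (0, 1), (1, 1), (1, 0)]
print(disordered_rectangle)
print(shuffled_copy)
```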

BASH - Shuffle characters in strings from file

只谈情不闲聊 submitted on 2020-01-04 04:11:11
Question: I have a file (filename.txt) with the following structure:
>line1
ABC
>line2
DEF
>line3
GHI
>line4
JKL
I would like to shuffle the characters in the strings that do not start with >. The output would (for example) look like the following:
>line1
BCA
>line2
DFE
>line3
IHG
>line4
KLJ
This is what I tried to shuffle the characters in a string: sed 's/./&\n/' | shuf | tr -d "\n". It seems to work, but it does not take newlines into account. Moreover, it executes the command on all data and
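The task reduces to two rules:

- Leave header lines (those starting with >) unchanged.
- Shuffle the characters of every other line separately.

Below is a Python sketch of that logic, offered in place of a pure Bash pipeline. `shuffle_sequence_lines` is a hypothetical name, and the inline sample mirrors the file shown in the question.

```python
import random


def shuffle_sequence_lines(lines, rng=random):
    """Shuffle the characters of each line that does not start with '>' (hypothetical helper)."""
    result = []
    for line in lines:
        if line.startswith(">"):
            result.append(line)  # header lines pass through unchanged
        else:
            chars = list(line)
            rng.shuffle(chars)  # permute only this line's characters
            result.append("".join(chars))
    return result


sample = ">line1\nABC\n>line2\nDEF\n>line3\nGHI\n>line4\nJKL"
shuffled_lines = shuffle_sequence_lines(sample.splitlines())
print("\n".join(shuffled_lines))
```

To process the real file, pass the lines of filename.txt (for example `open("filename.txt").read().splitlines()`) instead of the inline sample.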

31_spark9: Data Skew and Shuffle Tuning

柔情痞子 submitted on 2020-01-03 08:53:08
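Spark Data Skew and Shuffle Tuning
1. Principles and symptoms of data skew
1.1 Overview of data skew
Sometimes we run into one of the thorniest problems in big-data computing: data skew. When it occurs, a Spark job performs much worse than expected. Data skew tuning means applying various techniques to resolve the different kinds of data skew so that Spark jobs keep performing well.
1.2 Symptoms of data skew
(1) The vast majority of tasks run very fast, but a few individual tasks run extremely slowly.
Most of your tasks execute quickly and finish soon, but a few remaining tasks run very, very slowly. The earlier tasks typically finish at a rate of about 5 every 10 s, and then a single task takes 1 or 2 hours to complete. That is data skew. This case is still relatively benign. The job crawls along like an old ox pulling a broken cart, but at least it runs.
(2) Most tasks run fast, but some task throws an OOM (JVM Out Of Memory) exception.
While the job runs, all the other tasks finish quickly without any particular problem. Then one task suddenly reports an OOM (JVM Out Of Memory) error, and the logs show messages such as task failed, task lost and resubmitting task. After several retries, the job always fails at that same task and eventually dies. When a task fails outright with OOM, that is also basically due to data skew: the amount of data assigned to that task is simply too large.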
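Spark's default hash partitioning sends every record with the same key to the same partition. A single very frequent key therefore lands entirely in one task, which explains why that one task gets far more data than the others. The simulation below is plain Python, not Spark code. It uses crc32 as a stand-in hash and a made-up dataset in which one hypothetical key, `hot`, accounts for 90% of the records.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4


def partition_of(key, num_partitions=NUM_PARTITIONS):
    # Stand-in for hash partitioning: the same key always maps to the same partition
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Made-up dataset: 9000 records share one key, 1000 records have distinct keys
records = ["hot"] * 9000 + [f"user{i}" for i in range(1000)]

load = Counter(partition_of(key) for key in records)
for partition in sorted(load):
    print(partition, load[partition])
```

The partition that receives `hot` holds at least 9000 of the 10000 records. That task is the one that runs for hours or runs out of memory.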

Spark Tuning Guide

情到浓时终转凉″ submitted on 2020-01-03 08:13:03
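Spark-related questions
Why is Spark faster than MR (MapReduce)?
1) Spark can keep computation results in memory and supports in-memory iteration; MR does not.
2) Spark builds a DAG (directed acyclic graph), which enables a pipelined computation model.
3) Resource scheduling: Spark uses coarse-grained resource scheduling, while MR uses fine-grained resource scheduling. Resource reuse: tasks in Spark can reuse the resources of the same set of Executors. In MR, every map task runs in its own JVM, so resources cannot be reused.
What do Spark's main processes do?
Driver process: distributes tasks and collects the results.
Executor process: executes the actual tasks.
Master process: the master process of Spark's resource management, responsible for resource scheduling.
Worker process: the worker process of Spark's resource management; Worker nodes mainly run Executors.
Spark tuning
1. Resource tuning
1) When building the Spark cluster, give it enough resources (cores, memory). Set SPARK_WORKER_CORES, SPARK_WORKER_MEMORY and SPARK_WORKER_INSTANCES in spark-env.sh under the conf directory of the Spark installation.
2) When submitting an Application, allocate more resources to it. Submit command options (used when submitting the Application): -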

Shuffle an array of int in C with - without while loop

大城市里の小女人 submitted on 2020-01-03 05:29:07
Question: I want to shuffle an array of ints. The array is sorted, its size is n, and its values are 1 to n. I just want to avoid using a while loop to make sure rand() doesn't give me the same index. The code looks something like this:
void shuffleArr(int* arr, size_t n)
{
    int newIndx = 0;
    int i = 0;
    for(; i < n - 1; ++i)
    {
        while((newIndx = i + rand() % (n - i)) == i);
        swap(i, newIndx, arr);
    }
}
The for loop runs until n-1, so, for example, in the last iteration there is a 50/50 chance of newIndx being equal to i. I
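The while loop is not needed for a correct shuffle. In the standard Fisher-Yates (Knuth) shuffle, the swap partner j is drawn from i to n-1 inclusive. The outcome j == i is legitimate and simply leaves that element in place.

Forcing j != i, as the loop above does, turns the algorithm into a variant of Sattolo's algorithm. Sattolo's algorithm only produces single-cycle permutations, so no element can ever stay in its original position.

The Python sketch below shows the standard version; the name `fisher_yates` is hypothetical. In C the loop body would be `newIndx = i + rand() % (n - i);` followed by the swap, with no while loop. Note that `rand() % k` carries a small modulo bias.

```python
import random


def fisher_yates(arr, rng=random):
    """Shuffle arr in place so that every permutation is equally likely."""
    n = len(arr)
    for i in range(n - 1):
        # j is uniform over [i, n-1]; j == i is allowed and leaves arr[i] where it is
        j = rng.randrange(i, n)
        arr[i], arr[j] = arr[j], arr[i]
    return arr


values = list(range(1, 11))  # sorted values 1..n, as in the question
print(fisher_yates(values, random.Random(42)))
```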