shuffle

Shuffle two lists in the same order in python

こ雲淡風輕ζ submitted on 2019-12-10 11:19:28
Question: I have a question about shuffling, but first, here is my code:

    from psychopy import visual, event, gui
    import random, os
    from random import shuffle
    from PIL import Image
    import glob

    a = glob.glob("DDtest/targetimagelist1/*")
    b = glob.glob("DDtest/distractorimagelist1/*")
    target = a
    distractor = b
    pos1 = [-.05,-.05]
    pos2 = [.05, .05]
    shuffle(a)
    shuffle(b)

    def loop_function_bro():
        win = visual.Window(size=(1280, 800), fullscr=True, screen=0, monitor='testMonitor', color=[-1,-1,-1], colorSpace=
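The title asks for two lists shuffled in the same order, whereas calling `shuffle` on each list separately gives two unrelated orders. A minimal sketch of one common approach is to shuffle a single list of indices and apply it to both lists. The file names are made-up stand-ins for the `glob` results, and `shuffle_together` is a hypothetical helper name, not something from the question.

```python
import random

def shuffle_together(first, second, rng=random):
    """Return shuffled copies of two equal-length lists, keeping pairs aligned."""
    if len(first) != len(second):
        raise ValueError("both lists must have the same length")
    order = list(range(len(first)))
    rng.shuffle(order)  # one permutation, applied to both lists
    return [first[i] for i in order], [second[i] for i in order]

# Made-up stand-ins for the target and distractor image lists.
targets = ["t1.png", "t2.png", "t3.png", "t4.png"]
distractors = ["d1.png", "d2.png", "d3.png", "d4.png"]

shuffled_targets, shuffled_distractors = shuffle_together(targets, distractors)
```

Because the function returns copies, the original lists stay untouched, which also avoids the aliasing that `target = a` followed by `shuffle(a)` would cause.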

non-repeating random numbers

旧城冷巷雨未停 submitted on 2019-12-10 10:38:37
Question: I need to generate around 9-100 million non-repeating random numbers, ranging from zero to the amount of numbers generated, and I need them to be generated very quickly. Several answers to similar questions proposed simply shuffling an array in order to get the random numbers, and others proposed using a bloom filter. The question is, which one is more efficient, and in case of it being the bloom filter, how do I use it? Answer 1: You don't want random numbers at all. You want exactly the numbers
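The shuffling approach mentioned in the question can be sketched as follows. This is only an illustration under the assumption that the values are exactly the integers 0 to count-1; `unique_random_numbers` is a hypothetical name, and a plain Python list would be memory-hungry at 100 million entries, so a real implementation would likely use a compact array type.

```python
import random

def unique_random_numbers(count, seed=None):
    """Return the integers 0..count-1 in random order, each exactly once."""
    rng = random.Random(seed)
    numbers = list(range(count))
    rng.shuffle(numbers)  # in-place shuffle, linear in count
    return numbers

sample = unique_random_numbers(1000, seed=42)
```

Since every value in the range appears exactly once by construction, no duplicate check (and hence no bloom filter) is needed in this variant.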

Disk Spill during MapReduce

女生的网名这么多〃 submitted on 2019-12-10 10:29:36
Question: I have a pretty basic question that I am trying to find an answer for. I was looking through the documentation to understand where the data is spilled to during the map phase, shuffle phase, and reduce phase. For example, if Mapper A has 16 GB of RAM but the memory allocated to a mapper is exceeded, the data is spilled. Is the data spilled to HDFS, or will it be spilled to a tmp folder on the disk? During the shuffle phase, is the data streamed from one node to another node and is

Question about RDD, partitions, stages, parallel computation, clusters, pipelined computation, shuffle (join??), task, executor

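£可爱£侵袭症+ submitted on 2019-12-10 05:07:15
The RDD is the most basic data abstraction in Spark, and the task is Spark's smallest unit of code execution? Isn't data a resource for the code??? Then why is an RDD stored in partitions? And why do nodes work on partitions (pipelined computation over parent partitions)? An RDD only supports transformations, yet it can be split into multiple partitions, those partitions can be stored on different nodes of the cluster, and they can be computed in parallel on different nodes, so is an RDD still highly restricted? Within a single node, the parents in narrow dependencies are computed in pipelined fashion, so is an RDD still highly restricted? What is the point of dividing an RDD into stages? Allocating resources? Improving efficiency? Hash partitioning and range partitioning? And what is shuffle??? And what is a task??? Pipelined computation? Is that transformation?? So it is just filtering data?? No, what are machine learning algorithms and interactive data mining used for? Understanding that helps in understanding pipelined computation over parent partitions! The reduce tasks in a shuffle operation need to pull data across nodes (why across nodes? Because the different partitions of an RDD are stored on different nodes, and with a wide dependency one partition of the child RDD needs all partitions of the parent RDD, so the pull must cross nodes. With a narrow dependency, one partition of the child RDD needs only one partition of the parent RDD, so no cross-node pull is required, but join????? Provided that the parent partitions making up the child RDD's partition are all on the same node??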

Does this simple shuffling algorithm return a randomly shuffled deck of playing cards? [closed]

喜你入骨 submitted on 2019-12-10 04:17:15
Question: Closed. This question is off-topic. It is not currently accepting answers. Closed 8 years ago. You have a list of 52 cards where the position of the cards in that list does not move. You have a second list of card positions. At first, the position list is the same as the first list. Iterate through the first list. For each card in the first list, generate a number from 1 to 52. Swap its position in the
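The description is cut off, so the following sketch rests on an assumption: that each position is swapped with a position drawn from the full range, which is the usual reading of "generate a number from 1 to 52 and swap". Under that assumption, enumerating every possible sequence of random choices for a tiny deck shows whether all orderings are equally likely. `naive_swap_outcomes` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product

def naive_swap_outcomes(n):
    """Count how often each ordering arises when every position i is
    swapped with a position j chosen from the full range 0..n-1."""
    counts = Counter()
    for choices in product(range(n), repeat=n):
        deck = list(range(n))
        for i, j in enumerate(choices):
            deck[i], deck[j] = deck[j], deck[i]
        counts[tuple(deck)] += 1
    return counts

counts = naive_swap_outcomes(3)
for ordering, hits in sorted(counts.items()):
    print(ordering, hits)
```

With three cards there are 27 equally likely choice sequences but only 6 orderings, and 27 is not divisible by 6, so the orderings cannot all be equally likely. The same counting argument applies to 52 cards, since 52^52 is not a multiple of 52!.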

Fisher Yates variation

爷,独闯天下 submitted on 2019-12-10 01:45:54
Question: The classic Fisher-Yates looks something like this:

    void shuffle1(std::vector<int>& vec)
    {
        int n = vec.size();
        for (int i = n - 1; i > 0; --i)
        {
            std::swap(vec[i], vec[rand() % (i + 1)]);
        }
    }

Yesterday, I implemented the iteration "backwards" by mistake:

    void shuffle2(std::vector<int>& vec)
    {
        int n = vec.size();
        for (int i = 1; i < n; ++i)
        {
            std::swap(vec[i], vec[rand() % (i + 1)]);
        }
    }

Is this version in any way worse (or better) than the first? Does it skew the resulting probabilities?

Answer 1:
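One way to check the two loop directions is to enumerate every possible sequence of random indices for a small vector and count the resulting orderings. The sketch below is a Python model of the two C++ functions, not the answer from the original thread. It assumes the index at step `i` is uniform over 0..i, so it does not model the slight modulo bias of `rand() % (i + 1)`. `enumerate_shuffle` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product

def enumerate_shuffle(n, forward):
    """Run the shuffle for every possible sequence of random indices and
    count how many sequences lead to each final ordering."""
    steps = list(range(1, n)) if forward else list(range(n - 1, 0, -1))
    counts = Counter()
    # At step i the index j ranges over 0..i, mirroring rand() % (i + 1).
    for choices in product(*(range(i + 1) for i in steps)):
        vec = list(range(n))
        for i, j in zip(steps, choices):
            vec[i], vec[j] = vec[j], vec[i]
        counts[tuple(vec)] += 1
    return counts

backward = enumerate_shuffle(4, forward=False)  # models shuffle1
forward = enumerate_shuffle(4, forward=True)    # models shuffle2
```

For four elements both directions have 24 choice sequences, and each of the 24 orderings is reached by exactly one of them, so under the uniform-index assumption neither variant skews the probabilities.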

Spark data skew solutions and shuffle principles

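99封情书 submitted on 2019-12-09 21:30:03
Data skew tuning and shuffle tuning. Symptoms of data skew: 1) a few tasks run noticeably slower than the vast majority of tasks (the common case); 2) the Spark job suddenly fails with an OOM exception (the rare case). How data skew arises: during a shuffle, identical keys from all nodes must be pulled to a single task on one node for processing. If one key has an especially large amount of data, data skew occurs: most tasks finish in a few minutes while a few take hours, so the whole job needs hours to complete. If a single task receives a very large amount of data, it can even run out of memory. Locating where data skew occurs: data skew only happens during a shuffle, so first determine which stage it occurs in. The Web UI shows which stage is currently running and how much data each task in that stage was assigned, which tells us whether the skew is caused by uneven data distribution. Once uneven distribution is confirmed, the next step is to determine after which stage the skew appears: use the shuffle operators in the code to work out how stages map to code, and from that pinpoint where the skew occurs. Solutions for data skew: 1) Preprocess the data with Hive ETL. Applicable scenario: the source data in Hive is itself unevenly distributed, and the Hive table needs frequent shuffle operations. Solution: in Hive, aggregate the data by key in advance, or join it with other tables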
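The mechanism described above, where all records for one key land in the same task, can be illustrated with a small pure-Python simulation. This is a sketch of simple hash partitioning under made-up data, not Spark's API or its actual partitioner; `partition_sizes`, the key names, and the record counts are all assumptions chosen for illustration.

```python
import zlib
from collections import Counter

def partition_sizes(keys, num_partitions):
    """Count records per partition under simple hash partitioning."""
    sizes = Counter()
    for key in keys:
        # crc32 is used only because it is stable across runs (illustrative choice).
        sizes[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return [sizes[p] for p in range(num_partitions)]

# Made-up data: one hot key with 9,000 records plus 1,000 distinct keys.
records = ["hot_key"] * 9000 + [f"user_{i}" for i in range(1000)]
sizes = partition_sizes(records, num_partitions=8)
print(sizes)
```

Whatever the hash function, every `hot_key` record maps to the same partition, so one partition holds at least 9,000 of the 10,000 records while the others share the rest. That imbalance is what shows up in the Web UI as a single task with far more data than its peers.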

Shuffle even and odd values in SSE register

天大地大妈咪最大 submitted on 2019-12-09 18:36:06
Question: I load two 128-bit SSE registers with 16-bit values. The values are in the following order:

    src[0] = [E_3, O_3, E_2, O_2, E_1, O_1, E_0, O_0]
    src[1] = [E_7, O_7, E_6, O_6, E_5, O_5, E_4, O_4]

What I want to achieve is an order like this:

    src[0] = [E_7, E_6, E_5, E_4, E_3, E_2, E_1, E_0]
    src[1] = [O_7, O_6, O_5, O_4, O_3, O_2, O_1, O_0]

Do you know if there is a good way to do this (by using SSE intrinsics up to SSE 4.2)? I'm stuck at the moment, because I can't shuffle 16 bit values between

Shuffling Letters in an NSString in Objective-C

送分小仙女□ submitted on 2019-12-09 11:33:21
Question: I have written this function which shuffles the contents of an NSString, and it seems to work, but every now and then it crashes. This may be a roundabout way, but I put the characters into an array, swap the elements in the array randomly, and then turn the array back into a string. I'm not sure what I am doing that is unsafe which makes it crash. I thought it was possibly that I am setting finalLettersString = result , but I also tried finalLettersString = [NSString stringWithString:result]

Card Game: Randomly pick 1 number out of array of 52 without duplicates

倾然丶 夕夏残阳落幕 submitted on 2019-12-09 08:23:27
Question: I have a simple card game (using 52 cards - no jokers) in which I want to randomly pick 1 card at a time until the winning card is chosen. I have the following array:

    $cards = array(
        'diamond' => array( 'A', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K' ),
        'heart' => array( 'A', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K' ),
        'club' => array( 'A', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K' ),
        'spades' => array( 'A', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K' ),
    );

As you can see, this array is sorted. I
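A minimal sketch of one way to draw cards without duplicates: flatten the suits and ranks into a single deck, shuffle it once, and pop cards until the winning card appears. It is written in Python rather than the question's PHP. The suit and rank values mirror the array above, while `draw_until` and the example winning card are hypothetical.

```python
import random

SUITS = ["diamond", "heart", "club", "spades"]
RANKS = ["A", 2, 3, 4, 5, 6, 7, 8, 9, 10, "J", "Q", "K"]

def draw_until(winning_card, rng=random):
    """Shuffle a fresh 52-card deck and draw until winning_card comes up.

    Returns the drawn cards in order; the last one is the winning card.
    """
    deck = [(suit, rank) for suit in SUITS for rank in RANKS]
    rng.shuffle(deck)
    drawn = []
    while deck:
        card = deck.pop()  # each card leaves the deck, so it cannot repeat
        drawn.append(card)
        if card == winning_card:
            break
    return drawn

# Hypothetical winning card chosen for illustration.
drawn = draw_until(("heart", "Q"))
```

Shuffling once up front means each draw is a constant-time pop, and no bookkeeping of already-picked cards is needed.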