mmap

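Why Kafka is fast

人走茶凉 submitted on 2019-12-04 03:22:40
It is well known that Kafka is very fast, faster than most other message brokers on the market. This post looks at why (and no, it is not because it is written in Scala). Kafka stores, or caches, its messages on disk, and reading and writing on disk is generally assumed to hurt performance because seeking is expensive. Yet high throughput is one of Kafka's defining features: even on commodity servers it can handle on the order of a million write requests per second, more than most message brokers. That is why it is widely used for log processing and other high-volume workloads. The reasons can be examined from two sides: writing data and reading data. Writing data in Kafka (the producer): the producer is responsible for submitting data to Kafka, and Kafka writes every message it receives to disk, which is why it is commonly regarded as not losing data. To speed up writes, Kafka relies on two techniques: sequential writes and memory-mapped files (MMFile). Sequential writes: how fast a disk reads and writes depends on how you use it, and access is broadly either sequential or random. Because a hard disk is a mechanical device, every access goes through a [seek -> write] cycle, and the seek is a very slow mechanical movement, so hard disks are at their worst with random I/O and at their best with sequential I/O. To make disk access faster, Kafka uses sequential I/O. Linux also optimizes disk reads and writes heavily, with read-ahead, write-behind and the disk cache. Beyond that, there are optimizations relating to Java memory management and garbage collection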
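The combination described above (records laid down back to back in a memory-mapped file) can be sketched in a few lines. This is a minimal illustration using Python's standard `mmap` module, not Kafka's actual code; the function name `append_records`, the file name `segment.log` and the 4096-byte capacity are assumptions made up for the example.

```python
import mmap
import os
import tempfile


def append_records(path, records, capacity=4096):
    """Write records back to back into a pre-sized memory-mapped file.

    Hypothetical helper for illustration only. A slice assignment that
    runs past `capacity` raises an error rather than growing the file.
    """
    with open(path, "wb") as f:
        f.truncate(capacity)  # reserve space so the file can be mapped
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), capacity) as mm:
        pos = 0
        for rec in records:
            # Each record starts where the previous one ended: sequential I/O.
            mm[pos:pos + len(rec)] = rec
            pos += len(rec)
        mm.flush()  # ask the OS to write the dirty pages back to the file
    return pos


path = os.path.join(tempfile.mkdtemp(), "segment.log")
written = append_records(path, [b"msg-1\n", b"msg-2\n", b"msg-3\n"])
with open(path, "rb") as f:
    data = f.read(written)
```

Until `flush()` is called, the writes only reach the page cache, so a mapping like this trades durability for speed unless it is flushed explicitly.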

How do I choose a fixed address for mmap?

时间秒杀一切 submitted on 2019-12-04 02:20:06
mmap() can be optionally supplied with a fixed location to place the map. I would like to mmap a file and then have it available to a few different programs at the same virtual address in each program. I don't care what the address is, just as long as they all use the same address. If need be, the address can be chosen by one of them at run time (and communicated with the others via some other means). Is there an area of memory that Linux guarantees to be unused (by the application and by the kernel) that I can map to? How can I find one address that is available in several running

What is the fastest way to read 10 GB file from the disk?

∥☆過路亽.° submitted on 2019-12-04 01:28:00
We need to read and count different types of messages/run some statistics on a 10 GB text file, e.g. a FIX engine log. We use Linux, 32-bit, 4 CPUs, Intel, coding in Perl but the language doesn't really matter. I have found some interesting tips in Tim Bray's WideFinder project. However, we've found that using memory mapping is inherently limited by the 32-bit architecture. We tried using multiple processes, which seems to work faster if we process the file in parallel using 4 processes on 4 CPUs. Adding multi-threading slows it down, maybe because of the cost of context switching. We tried
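One common way around the address-space limit mentioned in the question is to map the file one window at a time instead of all at once. The sketch below is in Python rather than the asker's Perl, and only counts newline-terminated lines. The function name `count_lines_windowed`, the window size and the sample FIX-like line are assumptions for illustration.

```python
import mmap
import os
import tempfile


def count_lines_windowed(path, window=mmap.ALLOCATIONGRANULARITY * 16):
    """Count newlines by mapping the file one fixed-size window at a time.

    `window` must be a multiple of mmap.ALLOCATIONGRANULARITY so that every
    mapping offset stays aligned. Only one window is mapped at any moment,
    so address-space use is bounded no matter how large the file is.
    """
    size = os.path.getsize(path)
    total = 0
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(window, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as mm:
                pos = mm.find(b"\n")
                while pos != -1:
                    total += 1
                    pos = mm.find(b"\n", pos + 1)
            offset += length
    return total


# Small stand-in for the real log: 100000 identical lines.
path = os.path.join(tempfile.mkdtemp(), "fix.log")
with open(path, "wb") as f:
    f.write(b"8=FIX.4.2|35=D\n" * 100000)
line_count = count_lines_windowed(path)
```

Counting newlines is safe across window boundaries because each newline is a single byte. Parsing whole messages would need extra handling for a record that straddles two windows.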

Why does mmap() use MAP_FAILED instead of NULL?

江枫思渺然 submitted on 2019-12-04 00:40:21
Does anybody know why mmap() returns MAP_FAILED instead of NULL? It seems that MAP_FAILED is (void*)-1 on most systems. Why doesn't mmap() just use NULL instead? I know that address 0x0 is technically a valid memory page, whereas (void*)-1 will never be a valid page. Yet my guess is that mmap() will never actually return page 0x0 in practice. On Windows, for example, VirtualAlloc() returns NULL on error. Is it safe to assume that mmap() will never return 0x0? Presumably a successful call to mmap() ought to return usable memory to the caller. Address 0x0 is never usable, so it should never be

Mapping non-contiguous blocks from a file into contiguous memory addresses

ε祈祈猫儿з submitted on 2019-12-04 00:22:46
I am interested in the prospect of using memory mapped IO, preferably exploiting the facilities in boost::interprocess for cross-platform support, to map non-contiguous system-page-size blocks in a file into a contiguous address space in memory. A simplified concrete scenario: I've a number of 'plain-old-data' structures, each of a fixed length (less than the system page size). These structures are concatenated into a (very long) stream with the type & location of structures determined by the values of those structures that precede them in the stream. I'm aiming to minimize latency and

MAP_ANONYMOUS with C99 standard

我的梦境 submitted on 2019-12-04 00:01:03
I have an application that uses the mmap system call. I spent hours trying to get it to compile and working out why MAP_ANON and MAP_ANONYMOUS were reported as undeclared. I had a smaller section of code that compiled just fine, so I tried a basic compile and that worked; I saw that it fails when you add -std=c99. Is there a specific reason that MAP_ANON and MAP_ANONYMOUS are not valid in the C99 standard? I know that they aren't defined by POSIX but are defined by BSD SOURCE so I just want to know why that is. You probably want -std=gnu99 instead of

Anonymous mmap zero-filled?

主宰稳场 submitted on 2019-12-03 17:50:25
In Linux, the mmap(2) man page explains that an anonymous mapping . . . is not backed by any file; its contents are initialized to zero. The FreeBSD mmap(2) man page does not make a similar guarantee about zero-filling, though it does promise that bytes after the end of a file in a non-anonymous mapping are zero-filled. Which flavors of Unix promise to return zero
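The behaviour in question is easy to observe on a given system, although observing zeros on one machine does not prove that a platform guarantees them. A minimal sketch using Python's `mmap` module, where passing `-1` as the file descriptor requests an anonymous mapping; the four-page length is an arbitrary choice for the example.

```python
import mmap

LENGTH = 4 * mmap.PAGESIZE  # arbitrary size chosen for the example

# fileno=-1 asks for an anonymous mapping, not backed by any file.
with mmap.mmap(-1, LENGTH) as mm:
    # Compare the fresh mapping against a same-sized block of zero bytes.
    all_zero = mm[:] == bytes(LENGTH)
```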

Is it possible to “punch holes” through mmap'ed anonymous memory?

戏子无情 submitted on 2019-12-03 16:51:49
Consider a program which uses a large number of roughly page-sized memory regions (say 64 kB or so), each of which is rather short-lived. (In my particular case, these are alternate stacks for green threads.) How would one best allocate these regions, such that their pages can be returned to the kernel once the region isn't in use anymore? The naïve solution would clearly be to simply mmap each of the regions individually, and munmap them again as soon as I'm done with them. I feel this
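One alternative to mapping and unmapping each region is to keep the mapping and hand its pages back with `madvise`. The sketch below assumes Linux and Python 3.8 or later (for `mmap.madvise` and `mmap.MADV_DONTNEED`). It relies on the Linux behaviour that a private anonymous mapping reads back as zero-filled pages after `MADV_DONTNEED`, which other systems may not share. The 16-page region size is an assumption for the example.

```python
import mmap

REGION = 16 * mmap.PAGESIZE  # stand-in for one short-lived region

# Private anonymous mapping; Python's default for fileno=-1 is MAP_SHARED,
# so the flags are given explicitly.
mm = mmap.mmap(-1, REGION, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
mm[:REGION] = b"\xAA" * REGION  # touch every page so it is really backed

# Tell the kernel the contents are no longer needed. The address range
# stays mapped, but the physical pages can be reclaimed.
mm.madvise(mmap.MADV_DONTNEED, 0, REGION)

# On Linux the range now reads as fresh zero-filled pages...
reads_zero_after = mm[:REGION] == bytes(REGION)

# ...and can be reused without another mmap call.
mm[:5] = b"reuse"
reusable = mm[:5] == b"reuse"
mm.close()
```

This keeps the virtual address range reserved, so it avoids repeated mmap/munmap calls at the cost of holding on to address space.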

what is the size limit for mmap

ぃ、小莉子 submitted on 2019-12-03 16:00:02
I am using mmap() to map a shared memory object to a process. My question has two parts: 1) What is the size limit for mmap() in a Linux process? (Is there such a limit?) 2) After the process has been running for a while, I think its virtual address space will be somewhat fragmented. Will this impact the maximum size I can mmap() in this process? The Linux kernel used is 2.6.27. The size of the shared memory object is around 32MB. I am trying to assess how likely it is that mmap() fails with such a shared memory object because there is not enough virtual address space. prabhakar palanivel There is no

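Introduction to and usage of iozone-20191105

让人想犯罪 __ submitted on 2019-12-03 13:25:01
https://www.jianshu.com/p/faf82e400aa6 Introduction to and usage of iozone (originally posted 2017.05.10 07:40:41) About iozone: iozone ( www.iozone.org ) is a file system benchmark tool that can measure file system read and write performance on different operating systems. It can test disk performance in modes such as Read, write, re-read, re-write, read backwards, read strided, fread, fwrite, random read, pread, mmap, aio_read, aio_write and so on. When testing, make sure the test file is larger than your memory (ideally twice its size); otherwise Linux caches what you read and write, and the numbers become very unrealistic. Commonly used iozone options: -a runs the full automatic test, for example stepping through block sizes automatically. -i N selects which test to run, such as Read/Write/Random; the most commonly used are 0, 1 and 2, which can be given as -i 0 -i 1 -i 2. For further details, see the man page. 0=write/rewrite 1=read/re-read 2=random-read/write 3=Read-backwards 4=Re-write-record 5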