assembly

Optimizing an incrementing ASCII decimal counter in video RAM on 7th gen Intel Core

无人久伴 submitted on 2020-05-08 09:40:48
Question: I'm trying to optimize the following subroutine for a specific Kaby Lake CPU (i5-7300HQ), ideally to make the code at least 10 times faster than its original form. The code runs as a floppy-style bootloader in 16-bit real mode. It displays a ten-digit decimal counter on screen, counting from 0 to 9999999999 and then halting. I have taken a look at Agner's Optimization Guides for Microarchitecture and Assembly, Instruction Performance Table and Intel's Optimization Reference Manual. Only
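The counter described above is stored as ASCII digits rather than as a binary number, so each step is a ripple-carry increment over the digit characters. The excerpt is cut off before the original subroutine, so the following is only a minimal C++ sketch of that idea and not the asker's 16-bit code; the names `Counter` and `increment` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Ten ASCII digits, most significant first, as they would appear on screen.
// (Hypothetical type name, used only for this illustration.)
using Counter = std::array<char, 10>;

// Adds one to the counter in place.
// Returns false when the counter was 9999999999 and wrapped to 0000000000.
bool increment(Counter& digits) {
    for (std::size_t i = digits.size(); i-- > 0;) {
        if (digits[i] != '9') {
            ++digits[i];   // no carry out of this digit, so we are done
            return true;
        }
        digits[i] = '0';   // '9' becomes '0', carry moves one digit to the left
    }
    return false;          // carry out of the most significant digit
}
```

Nine times out of ten only the last digit changes, which is why this kind of loop usually exits after touching a single character.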

JMP to absolute address (op codes)

风格不统一 submitted on 2020-05-07 12:19:50
Question: I'm trying to code an exe packer/protector as a way of learning more about assembler, C++, and how PE files work. I've currently got it working so that the section containing the EP is XORed with a key and a new section is created that contains my decryption code. Everything works out great except when I try to JMP to the original EP after decryption. Basically I do this: DWORD originalEntryPoint = optionalHeader->AddressOfEntryPoint; // -- snip -- // crypted.put(0xE9); crypted.write((char*)
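The excerpt stops right after the `0xE9` byte is emitted, so what the asker writes as its operand is not visible. For reference, in 32-bit code `0xE9` is a near JMP whose 4-byte operand is a displacement relative to the address of the next instruction, not an absolute address. A minimal sketch of that encoding, assuming both addresses are virtual addresses in the same address space; the function name `encodeNearJmp` is hypothetical:

```cpp
#include <array>
#include <cstdint>

// Builds the 5 bytes of a near JMP: opcode 0xE9 followed by a little-endian
// 32-bit displacement.
//   jmpAddress: virtual address where the 0xE9 byte will be placed
//   target:     virtual address the jump should land on
// (Hypothetical helper, for illustration only.)
std::array<std::uint8_t, 5> encodeNearJmp(std::uint32_t jmpAddress,
                                          std::uint32_t target) {
    // The displacement is measured from the end of the 5-byte instruction.
    // Unsigned arithmetic wraps modulo 2^32, which also covers backward jumps.
    const std::uint32_t rel = target - (jmpAddress + 5);
    return { 0xE9,
             static_cast<std::uint8_t>(rel),
             static_cast<std::uint8_t>(rel >> 8),
             static_cast<std::uint8_t>(rel >> 16),
             static_cast<std::uint8_t>(rel >> 24) };
}
```

For example, a jump placed at 0x1000 that should land on 0x2000 gets the displacement 0x2000 - 0x1005 = 0x0FFB.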

ENTER and LEAVE in Assembly?

北战南征 submitted on 2020-05-07 11:23:10
Question: I was reading The Art of Assembly Language (Randall Hyde, link to Amazon) and I tried out a console application in that book. It was a program that created a new console for itself using Win32 API functions. The program contains a procedure called LENSTR, which stores the length of a string in the EBP register. The code for this function is as follows: LENSTR PROC ENTER 0, 0 PUSH EAX ;---------------------- CLD MOV EDI, DWORD PTR [EBP+08H] MOV EBX, EDI MOV ECX, 100 ; Limit the string length

32-byte aligned routine does not fit the uops cache

て烟熏妆下的殇ゞ submitted on 2020-05-07 05:27:10
Question: KbL i7-8550U. I'm researching the behavior of the uops cache and ran into something I don't understand about it. As specified in the Intel Optimization Manual 2.5.2.2 (emphasis mine): The Decoded ICache consists of 32 sets. Each set contains eight Ways. Each Way can hold up to six micro-ops. - All micro-ops in a Way represent instructions which are statically contiguous in the code and have their EIPs within the same aligned 32-byte region. - Up to three Ways may be dedicated to the same 32-byte aligned

32-byte aligned routine does not fit the uops cache

北城余情 submitted on 2020-05-07 05:26:06
Question: KbL i7-8550U. I'm researching the behavior of the uops cache and ran into something I don't understand about it. As specified in the Intel Optimization Manual 2.5.2.2 (emphasis mine): The Decoded ICache consists of 32 sets. Each set contains eight Ways. Each Way can hold up to six micro-ops. - All micro-ops in a Way represent instructions which are statically contiguous in the code and have their EIPs within the same aligned 32-byte region. - Up to three Ways may be dedicated to the same 32-byte aligned

Relative jump out of range by

随声附和 submitted on 2020-05-02 08:53:33
Question: I get the messages "(76)Relative jump out of range by 000Eh bytes" and "(79)Relative jump out of range by 0007h bytes" whenever I add CMP octal, '3'. I'm supposed to handle octal digits up to 7, but I always get an error when I try to add the third one. I can only do 0, 1, and 2 before it gives me an error. I don't know what I'm supposed to do. I've tried everything I can with what I've been taught, but I still can't get any results. Please help me. I'm new to assembly. P.S.

Relative jump out of range by

淺唱寂寞╮ submitted on 2020-05-02 08:53:12
Question: I get the messages "(76)Relative jump out of range by 000Eh bytes" and "(79)Relative jump out of range by 0007h bytes" whenever I add CMP octal, '3'. I'm supposed to handle octal digits up to 7, but I always get an error when I try to add the third one. I can only do 0, 1, and 2 before it gives me an error. I don't know what I'm supposed to do. I've tried everything I can with what I've been taught, but I still can't get any results. Please help me. I'm new to assembly. P.S.

Assembly Array data storing

泄露秘密 submitted on 2020-05-02 04:24:12
Question: Here is a new update on what I'm currently doing. I'm confused about how to use the data I stored in S2 to search for the same word across the whole screen and, if it is found, highlight it. DOSBOX - compiler : A86 org 100h ;----------------------------------------------------- lea bp, S1 mov cx, 35 mov al, 1 mov ah, 13h mov bh, 0 mov dl, 0 mov dh, 25 mov bl, 7 int 10h ;---------------------------------------------------------- ; Asks input' mov di,1 start: mov ah, 0 int 16h mov dx,ax mov ah, 0eh cmp dx

Windows C++ fast RGBA32 DX texture to RGB24 buffer

馋奶兔 submitted on 2020-04-30 08:20:19
Question: If I already have a DirectX texture in hand, what is the fastest (lowest CPU utilization) way to get an RGB24 buffer in main RAM from it (skipping the A channel)? Strictly speaking, the format is DXGI_FORMAT_B8G8R8A8_UNORM; does this mean ARGB? I'm using https://github.com/bmharper/WindowsDesktopDuplicationSample to capture the Windows desktop, and the result is to be converted into a lossless RGB24 format in main RAM. The existing code copies the GPU texture to a CPU texture, then uses memcpy to copy each line of the RGBA32 data to main
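On the byte-order part of the question: DXGI_FORMAT_B8G8R8A8_UNORM stores each pixel in memory as the bytes B, G, R, A, in that order. Dropping the alpha channel therefore means reading 4 bytes and writing 3 per pixel, row by row, because a mapped texture's row pitch can be larger than width * 4. The following is a plain scalar sketch of that repacking, not a tuned or vectorized version; the function name `bgraToRgb24` is hypothetical, and the output byte order R, G, B is an assumption about what "RGB24" means to the consumer.

```cpp
#include <cstddef>
#include <cstdint>

// Repacks BGRA pixels (4 bytes each) into tightly packed RGB pixels (3 bytes each).
//   src:      first byte of the source image
//   srcPitch: bytes per source row (may be larger than width * 4)
//   dst:      destination buffer of at least width * height * 3 bytes
// (Hypothetical helper; output order R, G, B is an assumption.)
void bgraToRgb24(const std::uint8_t* src, std::size_t srcPitch,
                 std::uint8_t* dst, std::size_t width, std::size_t height) {
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* s = src + y * srcPitch;  // start of source row
        std::uint8_t* d = dst + y * width * 3;       // start of destination row
        for (std::size_t x = 0; x < width; ++x) {
            d[0] = s[2];  // R
            d[1] = s[1];  // G
            d[2] = s[0];  // B
            s += 4;       // skip past B, G, R and the unused A byte
            d += 3;
        }
    }
}
```

If the consumer expects B, G, R order instead, the three assignments become a straight copy of the first three source bytes.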

disassembling, changing and assembling DLL file

人盡茶涼 submitted on 2020-04-30 07:22:06
Question: I have a DLL which I have disassembled and, just to test that my project is going to work, I have tried assembling it again, but without luck. I'm getting different kinds of errors. I did the disassembly with IDA Pro freeware and then exported the file as an .asm. To assemble it again, I have tried the A86 assembler and flat assembler. Maybe I'm disassembling the DLL the wrong way or using the wrong assembler, but could somebody maybe point me to some tools and/or resources about