So I somehow got sucked into reading about register read stalls in my previous post, where I mentioned one possible use of
mov eax,eax to optimise assembly code, but that is not what it is usually for. The main reason compilers emit
mov eax,eax is actually code alignment, as I briefly mentioned in that post.
So I hit Agner's guide for optimising assembly again for a quick read, and here is what I gathered.
Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an important subroutine entry or jump label happens to be near the end of a 16-byte block then the microprocessor will only get a few useful bytes of code when fetching that block of code. It may have to fetch the next 16 bytes too before it can decode the first instructions after the label. This can be avoided by aligning important subroutine entries and loop entries by 16.
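To put numbers on that, here is a small sketch of the arithmetic. The function names and the example address are made up for illustration, and the 16-byte block size is simply the one from the paragraph above.

```python
def useful_bytes_in_block(address: int, block: int = 16) -> int:
    """Bytes from `address` up to the end of the aligned block that contains it."""
    return block - (address % block)


def padding_to_align(address: int, alignment: int = 16) -> int:
    """Bytes of padding needed before `address` so it lands on an aligned boundary."""
    return -address % alignment


if __name__ == "__main__":
    label = 0x40100D  # hypothetical address of a loop entry
    print(useful_bytes_in_block(label))  # bytes the first fetch delivers after the label
    print(padding_to_align(label))       # filler bytes needed to reach the next boundary
```

A label sitting 13 bytes into a block only gets 3 useful bytes out of the first fetch, and those same 3 bytes are what you would have to pad to push the label to the start of the next block.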
So we learnt that aligning subroutine entries and jump targets can speed up our code, especially when those subroutines and jump targets sit inside loops. The obvious way to achieve this is to pad the assembly code with NOPs, i.e. instructions that do nothing. The list of these is:
1-byte: xchg EAX, EAX (equivalent of NOP or 0x90 in x86)
2-byte: mov reg, reg
3-byte: lea reg, 0 (reg) (8-bit displacement)
6-byte: lea reg, 0 (reg) (32-bit displacement)
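Here is a rough sketch of how a gap of a given size could be filled from that list. The byte encodings assume 32-bit code with eax as the register. The largest-first strategy is just my guess at a reasonable approach, not how any particular compiler does it, and FILLERS and pad_bytes are names I made up.

```python
# Filler instructions keyed by length in bytes (32-bit mode, eax as the register).
FILLERS = {
    6: bytes.fromhex("8d8000000000"),  # lea eax, [eax+00000000] (32-bit displacement)
    3: bytes.fromhex("8d4000"),        # lea eax, [eax+00]       (8-bit displacement)
    2: bytes.fromhex("89c0"),          # mov eax, eax
    1: bytes.fromhex("90"),            # nop, i.e. xchg eax, eax
}


def pad_bytes(count: int) -> bytes:
    """Fill `count` bytes with do-nothing instructions, longest fillers first."""
    out = bytearray()
    for size in sorted(FILLERS, reverse=True):
        while count >= size:
            out += FILLERS[size]
            count -= size
    return bytes(out)


if __name__ == "__main__":
    print(pad_bytes(5).hex())  # one 3-byte lea followed by one mov eax, eax
```

Using a few long fillers rather than a run of single-byte NOPs means fewer instructions for the CPU to decode on its way to the aligned label.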
mov eax,eax saved the day! :)