c++ - Why is strcmp not SIMD optimized?



I have tried to compile this program on an x64 computer:

    #include <cstring>

    int main(int argc, char* argv[])
    {
      return ::std::strcmp(argv[0],
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really very, very long string"
      );
    }

I compiled it like this:

    g++ -std=c++11 -msse2 -O3 -g a.cpp -o a

but the resulting disassembly is like this:

    0x0000000000400480 <+0>:   mov    (%rsi),%rsi
    0x0000000000400483 <+3>:   mov    $0x400628,%edi
    0x0000000000400488 <+8>:   mov    $0x22d,%ecx
    0x000000000040048d <+13>:  repz cmpsb %es:(%rdi),%ds:(%rsi)
    0x000000000040048f <+15>:  seta   %al
    0x0000000000400492 <+18>:  setb   %dl
    0x0000000000400495 <+21>:  sub    %edx,%eax
    0x0000000000400497 <+23>:  movsbl %al,%eax
    0x000000000040049a <+26>:  retq

Why is no SIMD used? I suppose it could be used to compare, say, 16 characters at a time. Should I write my own SIMD strcmp, or is that a nonsensical idea for some reason?

In an SSE2 implementation, how should the compiler make sure that no memory is accessed past the end of the string? It has to know the length first, and that requires scanning the string for the terminating zero byte.

If you scan the string for its length, you have already done most of the work of a strcmp function. Therefore there is no benefit in using SSE2.
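To make the argument concrete, here is a minimal sketch contrasting a one-pass comparison with a "find the length first" version. The names strcmp_one_pass and strcmp_two_pass are made up for illustration and are not how any particular libc implements strcmp; both return -1, 0 or 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// One pass: compare bytes and look for the terminator at the same time.
int strcmp_one_pass(const char* a, const char* b) {
    const unsigned char* pa = reinterpret_cast<const unsigned char*>(a);
    const unsigned char* pb = reinterpret_cast<const unsigned char*>(b);
    while (*pa != 0 && *pa == *pb) {
        ++pa;
        ++pb;
    }
    return (*pa > *pb) - (*pa < *pb);
}

// Two passes: strlen walks both strings up to their terminators, and only
// then can a fixed number of bytes be compared without overrunning either one.
int strcmp_two_pass(const char* a, const char* b) {
    const std::size_t len_a = std::strlen(a);
    const std::size_t len_b = std::strlen(b);
    // Include the terminator of the shorter string so that a prefix sorts first.
    const std::size_t n = std::min(len_a, len_b) + 1;
    const int r = std::memcmp(a, b, n);
    return (r > 0) - (r < 0);
}
```

In the two-pass version the strlen calls have to read each string all the way to its terminator before the comparison starts, which is the extra scan described above.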

However, Intel added instructions for string handling in the SSE4.2 instruction set. These handle the terminating zero byte problem. There is a blog post with a good write-up of them.
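As a rough sketch of how those instructions can be used, the function below compares 16 bytes per step with the _mm_cmpistri intrinsic (PCMPISTRI), which treats a zero byte as the end of the string inside the instruction itself. The name strcmp_sse42 is hypothetical. The sketch assumes that both buffers are readable up to the next multiple of 16 bytes from their start, because the 16-byte loads can read past the terminator. It uses the intrinsics only when built with -msse4.2 and otherwise falls back to std::strcmp.

```cpp
#include <cstddef>
#include <cstring>

#if defined(__SSE4_2__)
#include <nmmintrin.h>  // SSE4.2 string intrinsics
#endif

// Hypothetical helper, returns -1, 0 or 1.
// ASSUMPTION: a and b point into buffers that are readable up to the next
// multiple of 16 bytes from their start (for example, padded arrays).
int strcmp_sse42(const char* a, const char* b) {
#if defined(__SSE4_2__)
    // Byte-wise equality, result inverted so that set bits mark mismatches,
    // and the index of the first set bit is returned.
    constexpr int mode = _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                         _SIDD_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT;
    for (std::size_t i = 0;; i += 16) {
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // Index of the first differing byte, or 16 if this block has none.
        // Bytes after a terminator are ignored by the instruction.
        const int idx = _mm_cmpistri(va, vb, mode);
        if (idx < 16) {
            const unsigned char ca = static_cast<unsigned char>(a[i + idx]);
            const unsigned char cb = static_cast<unsigned char>(b[i + idx]);
            return (ca > cb) - (ca < cb);
        }
        // No difference in this block: if it contained a terminator,
        // the strings are equal; otherwise continue with the next block.
        if (_mm_cmpistrz(va, vb, mode)) {
            return 0;
        }
    }
#else
    // Built without -msse4.2: plain library call, sign normalised.
    const int r = std::strcmp(a, b);
    return (r > 0) - (r < 0);
#endif
}
```

With GCC or Clang this would be built with something like g++ -std=c++11 -msse4.2 -O3, and the inputs would be kept in padded storage such as alignas(16) char buf[32]. Arbitrary pointers, such as argv[0] in the question, do not come with that padding guarantee.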

