Linux Assembly collection of fast libraries

Single Instruction Multiple Data (SIMD) instruction set

Beginning with the Pentium II and Pentium with Intel MMX technology processor families, many extensions have been introduced into the Intel 64 and IA-32 architectures to perform single-instruction multiple-data (SIMD) operations. These extensions include the MMX technology, SSE, SSE2, SSE3, SSE4, AVX, AVX2 and AVX-512 extensions. Each of these extensions provides a group of instructions that perform SIMD operations on packed integer and/or packed floating-point data elements.

Tip: For detailed information about each instruction, see the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 2: Instruction Set Reference, A-Z.

AVX Initialization Instructions

Instruction | Meaning
VZEROALL | Zero all YMM registers
VZEROUPPER | Zero the upper 128 bits of all YMM registers

Data Transfer Instructions

Instruction | Meaning
Integer Operands
VMOVD | Move double word
VMOVQ | Move quad word
VMOVDQA | Move aligned double quad words
VMOVDQA32 | Move aligned packed double word integer values using writemask
VMOVDQA64 | Move aligned packed quad word integer values using writemask
VMOVDQU | Move unaligned double quad words
VMOVDQU8 | Move unaligned packed byte integer values using writemask
VMOVDQU16 | Move unaligned packed word integer values using writemask
VMOVDQU32 | Move unaligned packed double word integer values using writemask
VMOVDQU64 | Move unaligned packed quad word integer values using writemask
VMOVSLDUP | Loads/moves 128 bits duplicating the first and third 32-bit data elements
VMOVSHDUP | Loads/moves 128 bits duplicating the second and fourth 32-bit data elements
VMOVDDUP | Loads/moves 64 bits and duplicates them in both halves of a 128-bit result
VPMASKMOVD | Conditional SIMD integer packed loads and stores of double word values
VPMASKMOVQ | Conditional SIMD integer packed loads and stores of quad word values
VPMOVMSKB | Move byte mask
VPALIGNR | Concatenate destination and source operands, then extract the byte-aligned result shifted to the right by a constant value
VALIGND | Shift right and merge vectors with double word granularity using immediate shift value
VALIGNQ | Shift right and merge vectors with quad word granularity using immediate shift value
Single Precision Floating-point Operands
VMOVSS | Move a scalar single-precision floating-point value between XMM registers or between an XMM register and memory
VMOVAPS | Move aligned packed single-precision floating-point values between YMM registers or between a YMM register and memory
VMOVUPS | Move unaligned packed single-precision floating-point values between YMM registers or between a YMM register and memory
VMOVLPS | Move two packed single-precision floating-point values between the low quad word of an XMM register and memory
VMOVHPS | Move two packed single-precision floating-point values between the high quad word of an XMM register and memory
VMOVLHPS | Move two packed single-precision floating-point values from the low quad word of an XMM register to the high quad word of another XMM register
VMOVHLPS | Move two packed single-precision floating-point values from the high quad word of an XMM register to the low quad word of another XMM register
VMASKMOVPS | Conditional SIMD packed loads and stores of single-precision floating-point values
VMOVMSKPS | Extract sign mask from packed single-precision floating-point values
Double Precision Floating-point Operands
VMOVSD | Move a scalar double-precision floating-point value between XMM registers or between an XMM register and memory
VMOVAPD | Move aligned packed double-precision floating-point values between YMM registers or between a YMM register and memory
VMOVUPD | Move unaligned packed double-precision floating-point values between YMM registers or between a YMM register and memory
VMOVLPD | Move low packed double-precision floating-point value between the low quad word of an XMM register and memory
VMOVHPD | Move high packed double-precision floating-point value between the high quad word of an XMM register and memory
VMASKMOVPD | Conditional SIMD packed loads and stores of double-precision floating-point values
VMOVMSKPD | Extract sign mask from packed double-precision floating-point values
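
The VPALIGNR entry above is terse, so here is a scalar C sketch of what it computes for one 128-bit lane. It is only a model of the documented behaviour, not the instruction itself, and the function name `align_right_bytes` and its parameter names are made up for this illustration.

```c
#include <stdint.h>

/* Scalar model of one 128-bit lane of a byte-wise "align right":
   treat hi:lo as a 32-byte value (lo holds the least significant bytes),
   shift it right by `shift` bytes and keep the low 16 bytes.
   `out` must not overlap `hi` or `lo`. */
void align_right_bytes(uint8_t out[16], const uint8_t hi[16],
                       const uint8_t lo[16], unsigned shift)
{
    for (unsigned i = 0; i < 16; i++) {
        unsigned pos = i + shift;      /* byte index in the 32-byte value */
        if (pos < 16)
            out[i] = lo[pos];
        else if (pos < 32)
            out[i] = hi[pos - 16];
        else
            out[i] = 0;                /* shifted past both inputs */
    }
}
```

With `lo` holding bytes 0..15 and `hi` holding bytes 16..31, a shift of 4 yields bytes 4..19, which is the usual way to read a window that straddles two adjacent vectors.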

Broadcast Instructions

Instruction | Meaning
Byte Operands
VPBROADCASTB | Broadcast a byte integer value to all elements of a register
VPBROADCASTMB2Q | Broadcast byte size mask to all elements of a register
Word Operands
VPBROADCASTW | Broadcast a word integer value to all elements of a register
VPBROADCASTMW2D | Broadcast word size mask to all elements of a register
Double Word Operands
VPBROADCASTD | Broadcast a double word integer value to all elements of a register
VBROADCASTI32X2 | Broadcast two double word values to all elements of a register
VBROADCASTI32X4 | Broadcast four double word values to all elements of a register
VBROADCASTI32X8 | Broadcast eight double word values to all elements of a register
Quad Word Operands
VPBROADCASTQ | Broadcast a quad word integer value to all elements of a register
VBROADCASTI64X2 | Broadcast two quad word values to all elements of a register
VBROADCASTI64X4 | Broadcast four quad word values to all elements of a register
Single Precision Floating-point Operands
VBROADCASTSS | Broadcast a single-precision floating-point value to all elements of a register
VBROADCASTF32X2 | Broadcast two single-precision floating-point values to all elements of a register
VBROADCASTF32X4 | Broadcast four single-precision floating-point values to all elements of a register
VBROADCASTF32X8 | Broadcast eight single-precision floating-point values to all elements of a register
Double Precision Floating-point Operands
VBROADCASTSD | Broadcast a double-precision floating-point value to all elements of a register
VBROADCASTF64X2 | Broadcast two double-precision floating-point values to all elements of a register
VBROADCASTF64X4 | Broadcast four double-precision floating-point values to all elements of a register

Expand Instructions

Instruction | Meaning
VPEXPANDD | Load sparse packed double word integer values from dense memory
VPEXPANDQ | Load sparse packed quad word integer values from dense memory
VEXPANDPS | Load sparse packed single-precision floating-point values from dense memory
VEXPANDPD | Load sparse packed double-precision floating-point values from dense memory
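
"Sparse from dense" can be read as follows: consecutive source elements are dealt out, in order, to the destination lanes selected by a mask. The scalar C sketch below models that for double word elements with merge-masking; `expand_u32` is a hypothetical helper name, and the 16-lane limit is an assumption matching a 512-bit register.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked expand: consecutive elements of `src` are
   written to the lanes of `dst` whose mask bit is set. Lanes with a clear
   mask bit keep their previous value (merge-masking).
   Returns the number of source elements consumed. */
size_t expand_u32(uint32_t *dst, const uint32_t *src,
                  uint16_t mask, size_t lanes)
{
    size_t next = 0;                       /* next dense source element */
    for (size_t i = 0; i < lanes && i < 16; i++)
        if (mask & (1u << i))
            dst[i] = src[next++];
    return next;
}
```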

Compress Instructions

Instruction | Meaning
VPCOMPRESSD | Store sparse packed double word integer values into dense memory
VPCOMPRESSQ | Store sparse packed quad word integer values into dense memory
VCOMPRESSPS | Store sparse packed single-precision floating-point values into dense memory
VCOMPRESSPD | Store sparse packed double-precision floating-point values into dense memory
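
Compress is the inverse idea: the lanes selected by a mask are written out contiguously. A minimal scalar C sketch, assuming double word elements and at most 16 lanes; `compress_u32` is a name invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked compress: the lanes of `src` whose mask bit is
   set are stored contiguously at the start of `dst`.
   Returns the number of elements written. */
size_t compress_u32(uint32_t *dst, const uint32_t *src,
                    uint16_t mask, size_t lanes)
{
    size_t next = 0;                       /* next dense destination slot */
    for (size_t i = 0; i < lanes && i < 16; i++)
        if (mask & (1u << i))
            dst[next++] = src[i];
    return next;
}
```

The return value is what a caller would typically add to an output pointer when filtering a stream chunk by chunk.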

Insert Instructions

Instruction | Meaning
Integer Operands
VPINSRB | Insert a byte value from a register or memory into an XMM register
VPINSRW | Insert a word value from a register or memory into an XMM register
VPINSRD | Insert a double word value from a register or memory into an XMM register
VPINSRQ | Insert a quad word value from a register or memory into an XMM register
VINSERTI128 | Insert 128 bits of packed integer values from the source into the destination operand
VINSERTI32X4 | Insert 128 bits of packed integer values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTI64X2 | Insert 128 bits of packed integer values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTI32X8 | Insert 256 bits of packed integer values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTI64X4 | Insert 256 bits of packed integer values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
Floating-point Operands
VINSERTPS | Insert a single-precision floating-point value, taken either from a 32-bit memory location or from a specified offset in an XMM register, at a specified offset in the destination XMM register. In addition, VINSERTPS can zero selected data elements in the destination using a mask
VINSERTF128 | Insert 128 bits of packed floating-point values from the source into the destination operand
VINSERTF32X4 | Insert 128 bits of packed floating-point values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTF64X2 | Insert 128 bits of packed floating-point values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTF32X8 | Insert 256 bits of packed floating-point values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTF64X4 | Insert 256 bits of packed floating-point values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand

Extract Instructions

Instruction | Meaning
Integer Operands
VPEXTRB | Extract a byte from an XMM register and store the value in a general-purpose register or memory
VPEXTRW | Extract a word from an XMM register and store the value in a general-purpose register or memory
VPEXTRD | Extract a double word from an XMM register and store the value in a general-purpose register or memory
VPEXTRQ | Extract a quad word from an XMM register and store the value in a general-purpose register or memory
VEXTRACTI128 | Extract 128 bits of packed integer values from the source operand and store them in the low 128 bits of the destination operand
VEXTRACTI32X4 | Extract 128 bits of packed integer values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand
VEXTRACTI64X2 | Extract 128 bits of packed integer values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand
VEXTRACTI32X8 | Extract 256 bits of packed integer values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand
VEXTRACTI64X4 | Extract 256 bits of packed integer values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand
Floating-point Operands
VEXTRACTPS | Extract a single-precision floating-point value from a specified offset in an XMM register and store the result in memory or a general-purpose register
VEXTRACTF128 | Extract 128 bits of packed floating-point values from the source operand and store them in the low 128 bits of the destination operand
VEXTRACTF32X4 | Extract 128 bits of packed floating-point values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand
VEXTRACTF64X2 | Extract 128 bits of packed floating-point values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand
VEXTRACTF32X8 | Extract 256 bits of packed floating-point values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand
VEXTRACTF64X4 | Extract 256 bits of packed floating-point values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand

Gather Instructions

Instruction | Meaning
Double Word Operands
VPGATHERDD | Gather packed double word values using signed double word indices
VPGATHERQD | Gather packed double word values using signed quad word indices
Quad Word Operands
VPGATHERDQ | Gather packed quad word values using signed double word indices
VPGATHERQQ | Gather packed quad word values using signed quad word indices
Single Precision Floating-point Operands
VGATHERDPS | Gather packed single-precision floating-point values using signed double word indices
VGATHERQPS | Gather packed single-precision floating-point values using signed quad word indices
VGATHERPF0DPS | Sparse prefetch of packed single-precision floating-point values with signed double word indices using T0 hint
VGATHERPF1DPS | Sparse prefetch of packed single-precision floating-point values with signed double word indices using T1 hint
VGATHERPF0QPS | Sparse prefetch of packed single-precision floating-point values with signed quad word indices using T0 hint
VGATHERPF1QPS | Sparse prefetch of packed single-precision floating-point values with signed quad word indices using T1 hint
Double Precision Floating-point Operands
VGATHERDPD | Gather packed double-precision floating-point values using signed double word indices
VGATHERQPD | Gather packed double-precision floating-point values using signed quad word indices
VGATHERPF0DPD | Sparse prefetch of packed double-precision floating-point values with signed double word indices using T0 hint
VGATHERPF1DPD | Sparse prefetch of packed double-precision floating-point values with signed double word indices using T1 hint
VGATHERPF0QPD | Sparse prefetch of packed double-precision floating-point values with signed quad word indices using T0 hint
VGATHERPF1QPD | Sparse prefetch of packed double-precision floating-point values with signed quad word indices using T1 hint
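
A gather is an indexed load per lane. The scalar C sketch below models a masked gather of single-precision values with signed double word indices. It simplifies in one respect, which is an assumption of this example: indices count elements, whereas the real instructions apply a byte scale factor. The name `gather_f32` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked gather: for every lane whose mask bit is set,
   load base[idx[lane]]. Indices are signed, so they may point below `base`.
   Lanes with a clear mask bit keep their previous value. */
void gather_f32(float *dst, const float *base, const int32_t *idx,
                uint8_t mask, size_t lanes)
{
    for (size_t i = 0; i < lanes && i < 8; i++)
        if (mask & (1u << i))
            dst[i] = base[idx[i]];
}
```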

Scatter Instructions

Instruction | Meaning
Double Word Operands
VPSCATTERDD | Using signed double word indices, scatter double word values to memory using writemask
VPSCATTERQD | Using signed quad word indices, scatter double word values to memory using writemask
Quad Word Operands
VPSCATTERDQ | Using signed double word indices, scatter quad word values to memory using writemask
VPSCATTERQQ | Using signed quad word indices, scatter quad word values to memory using writemask
Single Precision Floating-point Operands
VSCATTERDPS | Using signed double word indices, scatter single-precision floating-point values to memory using writemask
VSCATTERQPS | Using signed quad word indices, scatter single-precision floating-point values to memory using writemask
VSCATTERPF0DPS | Using signed double word indices, prefetch sparse single-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1DPS | Using signed double word indices, prefetch sparse single-precision floating-point values using writemask and T1 hint with intent to write
VSCATTERPF0QPS | Using signed quad word indices, prefetch sparse single-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1QPS | Using signed quad word indices, prefetch sparse single-precision floating-point values using writemask and T1 hint with intent to write
Double Precision Floating-point Operands
VSCATTERDPD | Using signed double word indices, scatter double-precision floating-point values to memory using writemask
VSCATTERQPD | Using signed quad word indices, scatter double-precision floating-point values to memory using writemask
VSCATTERPF0DPD | Using signed double word indices, prefetch sparse double-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1DPD | Using signed double word indices, prefetch sparse double-precision floating-point values using writemask and T1 hint with intent to write
VSCATTERPF0QPD | Using signed quad word indices, prefetch sparse double-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1QPD | Using signed quad word indices, prefetch sparse double-precision floating-point values using writemask and T1 hint with intent to write
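
A scatter is the store-side counterpart of a gather. The scalar C sketch below models a masked scatter of double word values, again assuming element-sized indices rather than a byte scale. Processing lanes in ascending order means that when two lanes target the same address the higher lane's value remains, which is how overlapping scatter writes are described in the Intel manual. `scatter_u32` is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked scatter: for every lane whose mask bit is set,
   store src[lane] to base[idx[lane]]. Lanes are handled from lowest to
   highest, so on a duplicate index the highest lane's value remains. */
void scatter_u32(uint32_t *base, const int32_t *idx, const uint32_t *src,
                 uint16_t mask, size_t lanes)
{
    for (size_t i = 0; i < lanes && i < 16; i++)
        if (mask & (1u << i))
            base[idx[i]] = src[i];
}
```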

Blending Instructions

Instruction | Meaning
Byte Operands
VPBLENDVB | Conditionally copies specified byte elements in the source operand to the destination, using a mask held in a vector register
VPBLENDMB | Blends byte elements of the first and the second operand (register or memory), using the opmask register as selector
Word Operands
VPBLENDW | Conditionally copies specified word elements in the source operand to the destination, using an immediate byte control
VPBLENDMW | Blends word elements of the first and the second operand (register or memory), using the opmask register as selector
Double Word Operands
VPBLENDD | Conditionally copies specified double word elements in the source operand to the destination, using an immediate byte control
VPBLENDMD | Blends double word elements of the first and the second operand (register or memory), using the opmask register as selector
Quad Word Operands
VPBLENDMQ | Blends quad word elements of the first and the second operand (register or memory), using the opmask register as selector
Single Precision Floating-point Operands
VBLENDPS | Conditionally copies specified data elements in the source operand to the destination, using an immediate byte control
VBLENDVPS | Conditionally copies specified data elements in the source operand to the destination, using a mask held in a vector register
VBLENDMPS | Blends single-precision elements of the first operand with the elements of the second operand, using an opmask register as select control
Double Precision Floating-point Operands
VBLENDPD | Conditionally copies specified data elements in the source operand to the destination, using an immediate byte control
VBLENDVPD | Conditionally copies specified data elements in the source operand to the destination, using a mask held in a vector register
VBLENDMPD | Blends double-precision elements of the first operand with the elements of the second operand, using an opmask register as select control
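
All blends in this table reduce to a per-lane select; they differ only in where the selector bits come from (immediate, vector register, or opmask). A scalar C sketch of the opmask form for double words, with the hypothetical name `blend_u32`:

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a mask-controlled blend: lane i of the result comes from
   `b` when mask bit i is set and from `a` otherwise. */
void blend_u32(uint32_t *dst, const uint32_t *a, const uint32_t *b,
               uint16_t mask, size_t lanes)
{
    for (size_t i = 0; i < lanes && i < 16; i++)
        dst[i] = (mask & (1u << i)) ? b[i] : a[i];
}
```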

Shuffle Instructions

Instruction | Meaning
Byte Operands
VPSHUFB | Shuffle packed byte values
Word Operands
VPSHUFLW | Shuffle packed low words
VPSHUFHW | Shuffle packed high words
Double Word Operands
VPSHUFD | Shuffle packed double words
VSHUFI32X4 | Shuffle packed double word values at 128-bit granularity
Quad Word Operands
VSHUFI64X2 | Shuffle packed quad word values at 128-bit granularity
Single Precision Floating-point Operands
VSHUFPS | Shuffles values in packed single-precision floating-point operands
VSHUFF32X4 | Shuffle packed single-precision floating-point values at 128-bit granularity
Double Precision Floating-point Operands
VSHUFPD | Shuffles values in packed double-precision floating-point operands
VSHUFF64X2 | Shuffle packed double-precision floating-point values at 128-bit granularity

Permute Instructions

Instruction | Meaning
Word Operands
VPERMW | Permute packed word elements
VPERMI2W | Permute packed word elements from two tables using indexes
Double Word Operands
VPERMD | Permute packed double word elements
VPERMI2D | Permute packed double word elements from two tables using indexes
Quad Word Operands
VPERMQ | Permute packed quad word elements
VPERMI2Q | Permute packed quad word elements from two tables using indexes
128-bit Integer Operands
VPERM2I128 | Permute 128-bit integer fields using controls
Single Precision Floating-point Operands
VPERMPS | Permute packed single-precision floating-point elements
VPERMILPS | Permute packed single-precision floating-point elements using controls
VPERMI2PS | Permute packed single-precision elements from two tables using indexes
Double Precision Floating-point Operands
VPERMPD | Permute packed double-precision floating-point elements
VPERMILPD | Permute packed double-precision floating-point elements using controls
VPERMI2PD | Permute packed double-precision elements from two tables using indexes
128-bit Floating-point Operands
VPERM2F128 | Permute 128-bit floating-point fields using controls
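
The "two tables using indexes" entries can be pictured as a lookup into the concatenation of two source vectors. In the scalar C sketch below, the low bits of each index pick an element and the next bit picks the table. `permute2_u32` is a hypothetical name, and the sketch writes to a separate output array, whereas the VPERMI2 forms overwrite the index register.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a two-table permute. `lanes` must be a power of two.
   For each lane, the low log2(lanes) bits of the index select an element
   and the next bit selects table `a` (0) or table `b` (1).
   Higher index bits are ignored. `out` must not overlap the inputs. */
void permute2_u32(uint32_t *out, const uint32_t *idx,
                  const uint32_t *a, const uint32_t *b, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        size_t sel = idx[i] & (lanes - 1);     /* element within a table */
        int second = (idx[i] & lanes) != 0;    /* which table to read    */
        out[i] = second ? b[sel] : a[sel];
    }
}
```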

Unpack Instructions

Instruction | Meaning
Byte Operands
VPUNPCKLBW | Unpack low-order bytes
VPUNPCKHBW | Unpack high-order bytes
Word Operands
VPUNPCKLWD | Unpack low-order words
VPUNPCKHWD | Unpack high-order words
Double Word Operands
VPUNPCKLDQ | Unpack low-order double words
VPUNPCKHDQ | Unpack high-order double words
Quad Word Operands
VPUNPCKLQDQ | Unpack low quad words
VPUNPCKHQDQ | Unpack high quad words
Single Precision Floating-point Operands
VUNPCKLPS | Unpacks and interleaves the two low-order values from two single-precision floating-point operands
VUNPCKHPS | Unpacks and interleaves the two high-order values from two single-precision floating-point operands
Double Precision Floating-point Operands
VUNPCKLPD | Unpacks and interleaves the low values from two packed double-precision floating-point operands
VUNPCKHPD | Unpacks and interleaves the high values from two packed double-precision floating-point operands
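
"Unpack" here means interleave: elements from the low (or high) half of two sources are alternated in the result. A scalar C sketch for bytes within one 128-bit lane, with function names invented for this illustration:

```c
#include <stdint.h>

/* Scalar model of unpacking low-order bytes within one 128-bit lane:
   the low 8 bytes of `a` and `b` are interleaved as a0, b0, a1, b1, ... */
void unpack_low_bytes(uint8_t out[16], const uint8_t a[16],
                      const uint8_t b[16])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Same for the high-order 8 bytes: a8, b8, a9, b9, ... */
void unpack_high_bytes(uint8_t out[16], const uint8_t a[16],
                       const uint8_t b[16])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[8 + i];
        out[2 * i + 1] = b[8 + i];
    }
}
```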

Pack Instructions

Instruction | Meaning
Words into Bytes
VPACKSSWB | Pack words into bytes with signed saturation
VPACKUSWB | Pack words into bytes with unsigned saturation
Double Words into Words
VPACKSSDW | Pack double words into words with signed saturation
VPACKUSDW | Pack double words into words with unsigned saturation
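
Saturation means that a value outside the target range is clamped to the nearest representable bound instead of wrapping. The scalar C sketch below models packing two groups of eight signed words into sixteen bytes (one 128-bit lane), once with signed and once with unsigned saturation. All names are hypothetical.

```c
#include <stdint.h>

/* Clamp a signed word to the signed byte range [-128, 127]. */
static int8_t sat_s8(int16_t v)
{
    return (int8_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}

/* Clamp a signed word to the unsigned byte range [0, 255]. */
static uint8_t sat_u8(int16_t v)
{
    return (uint8_t)(v > 255 ? 255 : (v < 0 ? 0 : v));
}

/* Scalar model of packing within one 128-bit lane: the words of `a` fill
   the low 8 bytes of the result, the words of `b` the high 8 bytes. */
void pack_words_signed(int8_t out[16], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        out[i]     = sat_s8(a[i]);
        out[8 + i] = sat_s8(b[i]);
    }
}

void pack_words_unsigned(uint8_t out[16], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        out[i]     = sat_u8(a[i]);
        out[8 + i] = sat_u8(b[i]);
    }
}
```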

Conversion Instructions

Instruction | Meaning
Byte to Word
VPMOVSXBW | Sign extend the lower 8-bit integer of each packed word element into packed signed word integers
VPMOVZXBW | Zero extend the lower 8-bit integer of each packed word element into packed signed word integers
Byte to Double Word
VPMOVSXBD | Sign extend the lower 8-bit integer of each packed double word element into packed signed double word integers
VPMOVZXBD | Zero extend the lower 8-bit integer of each packed double word element into packed signed double word integers
Byte to Quad Word
VPMOVSXBQ | Sign extend the lower 8-bit integer of each packed quad word element into packed signed quad word integers
VPMOVZXBQ | Zero extend the lower 8-bit integer of each packed quad word element into packed signed quad word integers
Word to Byte
VPMOVWB | Converts packed word integers into packed bytes with truncation
VPMOVSWB | Converts packed signed word integers into packed signed bytes using signed saturation
VPMOVUSWB | Converts packed unsigned word integers into packed unsigned bytes using unsigned saturation
Word to Double Word
VPMOVSXWD | Sign extend the lower 16-bit integer of each packed double word element into packed signed double word integers
VPMOVZXWD | Zero extend the lower 16-bit integer of each packed double word element into packed signed double word integers
Word to Quad Word
VPMOVSXWQ | Sign extend the lower 16-bit integer of each packed quad word element into packed signed quad word integers
VPMOVZXWQ | Zero extend the lower 16-bit integer of each packed quad word element into packed signed quad word integers
Double Word to Byte
VPMOVDB | Converts packed double word integers into packed bytes with truncation
VPMOVSDB | Converts packed signed double word integers into packed signed bytes using signed saturation
VPMOVUSDB | Converts packed unsigned double word integers into packed unsigned bytes using unsigned saturation
Double Word to Word
VPMOVDW | Converts packed double word integers into packed words with truncation
VPMOVSDW | Converts packed signed double word integers into packed signed words using signed saturation
VPMOVUSDW | Converts packed unsigned double word integers into packed unsigned words using unsigned saturation
Double Word to Quad Word
VPMOVSXDQ | Sign extend the lower 32-bit integer of each packed quad word element into packed signed quad word integers
VPMOVZXDQ | Zero extend the lower 32-bit integer of each packed quad word element into packed signed quad word integers
Quad Word to Byte
VPMOVQB | Converts packed quad word integers into packed bytes with truncation
VPMOVSQB | Converts packed signed quad word integers into packed signed bytes using signed saturation
VPMOVUSQB | Converts packed unsigned quad word integers into packed unsigned bytes using unsigned saturation
Quad Word to Word
VPMOVQW | Converts packed quad word integers into packed words with truncation
VPMOVSQW | Converts packed signed quad word integers into packed signed words using signed saturation
VPMOVUSQW | Converts packed unsigned quad word integers into packed unsigned words using unsigned saturation
Quad Word to Double Word
VPMOVQD | Converts packed quad word integers into packed double words with truncation
VPMOVSQD | Converts packed signed quad word integers into packed signed double words using signed saturation
VPMOVUSQD | Converts packed unsigned quad word integers into packed unsigned double words using unsigned saturation
Double Word to Single Precision Floating-point
VCVTSI2SS | Convert scalar signed double word integer to scalar single-precision floating-point value
VCVTUSI2SS | Convert scalar unsigned double word integer to scalar single-precision floating-point value
VCVTDQ2PS | Convert packed signed double word integers to packed single-precision floating-point values
VCVTUDQ2PS | Convert packed unsigned double word integers to packed single-precision floating-point values
Double Word to Double Precision Floating-point
VCVTSI2SD | Convert scalar signed double word integer to scalar double-precision floating-point value
VCVTUSI2SD | Convert scalar unsigned double word integer to scalar double-precision floating-point value
VCVTDQ2PD | Convert packed signed double word integers to packed double-precision floating-point values
VCVTUDQ2PD | Convert packed unsigned double word integers to packed double-precision floating-point values
Quad Word to Single Precision Floating-point
VCVTSI2SS | Convert scalar signed quad word integer to scalar single-precision floating-point value
VCVTUSI2SS | Convert scalar unsigned quad word integer to scalar single-precision floating-point value
VCVTQQ2PS | Convert packed signed quad word integers to packed single-precision floating-point values
VCVTUQQ2PS | Convert packed unsigned quad word integers to packed single-precision floating-point values
Quad Word to Double Precision Floating-point
VCVTSI2SD | Convert scalar signed quad word integer to scalar double-precision floating-point value
VCVTUSI2SD | Convert scalar unsigned quad word integer to scalar double-precision floating-point value
VCVTQQ2PD | Convert packed signed quad word integers to packed double-precision floating-point values
VCVTUQQ2PD | Convert packed unsigned quad word integers to packed double-precision floating-point values
Half Precision Floating-point to Single Precision Floating-point
VCVTPH2PS | Convert eight/four data elements containing 16-bit floating-point data into eight/four single-precision floating-point values
Single Precision Floating-point to Double Word
VCVTSS2SI | Convert scalar single-precision floating-point value to scalar signed double word integer
VCVTSS2USI | Convert scalar single-precision floating-point value to scalar unsigned double word integer
VCVTPS2DQ | Convert packed single-precision floating-point values to packed signed double word integers
VCVTPS2UDQ | Convert packed single-precision floating-point values to packed unsigned double word integers
VCVTTSS2SI | Convert with truncation scalar single-precision floating-point value to scalar signed double word integer
VCVTTSS2USI | Convert with truncation scalar single-precision floating-point value to scalar unsigned double word integer
VCVTTPS2DQ | Convert with truncation packed single-precision floating-point values to packed signed double word integers
VCVTTPS2UDQ | Convert with truncation packed single-precision floating-point values to packed unsigned double word integers
Single Precision Floating-point to Quad Word
VCVTSS2SI | Convert scalar single-precision floating-point value to scalar signed quad word integer
VCVTSS2USI | Convert scalar single-precision floating-point value to scalar unsigned quad word integer
VCVTPS2QQ | Convert packed single-precision floating-point values to packed signed quad word integers
VCVTPS2UQQ | Convert packed single-precision floating-point values to packed unsigned quad word integers
VCVTTSS2SI | Convert with truncation scalar single-precision floating-point value to scalar signed quad word integer
VCVTTSS2USI | Convert with truncation scalar single-precision floating-point value to scalar unsigned quad word integer
VCVTTPS2QQ | Convert with truncation packed single-precision floating-point values to packed signed quad word integers
VCVTTPS2UQQ | Convert with truncation packed single-precision floating-point values to packed unsigned quad word integers
Single Precision Floating-point to Half Precision Floating-point
VCVTPS2PH | Convert eight/four data elements containing single-precision floating-point data into eight/four 16-bit floating-point values
Single Precision Floating-point to Double Precision Floating-point
VCVTSS2SD | Convert scalar single-precision floating-point value to scalar double-precision floating-point value
VCVTPS2PD | Convert packed single-precision floating-point values to packed double-precision floating-point values
Double Precision Floating-point to Double Word
VCVTSD2SI | Convert scalar double-precision floating-point value to scalar signed double word integer
VCVTSD2USI | Convert scalar double-precision floating-point value to scalar unsigned double word integer
VCVTPD2DQ | Convert packed double-precision floating-point values to packed signed double word integers
VCVTPD2UDQ | Convert packed double-precision floating-point values to packed unsigned double word integers
VCVTTSD2SI | Convert with truncation scalar double-precision floating-point value to scalar signed double word integer
VCVTTSD2USI | Convert with truncation scalar double-precision floating-point value to scalar unsigned double word integer
VCVTTPD2DQ | Convert with truncation packed double-precision floating-point values to packed signed double word integers
VCVTTPD2UDQ | Convert with truncation packed double-precision floating-point values to packed unsigned double word integers
Double Precision Floating-point to Quad Word
VCVTSD2SI | Convert scalar double-precision floating-point value to scalar signed quad word integer
VCVTSD2USI | Convert scalar double-precision floating-point value to scalar unsigned quad word integer
VCVTPD2QQ | Convert packed double-precision floating-point values to packed signed quad word integers
VCVTPD2UQQ | Convert packed double-precision floating-point values to packed unsigned quad word integers
VCVTTSD2SI | Convert with truncation scalar double-precision floating-point value to scalar signed quad word integer
VCVTTSD2USI | Convert with truncation scalar double-precision floating-point value to scalar unsigned quad word integer
VCVTTPD2QQ | Convert with truncation packed double-precision floating-point values to packed signed quad word integers
VCVTTPD2UQQ | Convert with truncation packed double-precision floating-point values to packed unsigned quad word integers
Double Precision Floating-point to Single Precision Floating-point
VCVTSD2SS | Convert scalar double-precision floating-point value to scalar single-precision floating-point value
VCVTPD2PS | Convert packed double-precision floating-point values to packed single-precision floating-point values
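
The widening integer conversions at the top of this table differ only in how the new upper bits are filled. A scalar C sketch of byte-to-word sign extension versus zero extension; the function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of sign extension: the upper byte of each word is filled
   with copies of the source byte's sign bit. */
void sign_extend_bytes(int16_t *out, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int16_t v = src[i];
        if (v & 0x80)          /* sign bit set: value is negative */
            v -= 256;
        out[i] = v;
    }
}

/* Scalar model of zero extension: the upper byte of each word is zero. */
void zero_extend_bytes(uint16_t *out, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = src[i];
}
```

For the byte 0xF0 the first function produces -16 and the second 240, so the choice depends on whether the source bytes are meant as signed or unsigned values.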

Logical Instructions

InstructionMeaning
Byte Operands
VPTESTMBPerforms a bitwise logical AND of packed byte integers and set mask
VPTESTNMBPerforms a bitwise logical NOT AND of packed byte integers and set mask
Word Operands
VPTESTMWPerforms a bitwise logical AND of packed word integers and set mask
VPTESTNMWPerforms a bitwise logical NOT AND of packed word integers and set mask
Double Word Operands
VPTESTMDPerforms a bitwise logical AND of packed double word integers and set mask
VPTESTNMDPerforms a bitwise logical NOT AND of packed double word integers and set mask
VPANDDBitwise logical AND of packed double word integers
VPANDNDBitwise logical AND NOT of packed double word integers
VPORDBitwise logical OR of packed double word integers
VPXORDBitwise logical exclusive XOR of packed double word integers
VPTERNLOGDBitwise ternary logic with double word granularity. The immediate value determines the specific binary function being implemented
Quad Word Operands
VPTESTMQPerforms a bitwise logical AND of packed quad word integers and set mask
VPTESTNMQPerforms a bitwise logical NOT AND of packed quad word integers and set mask
VPANDQBitwise logical AND of packed quad word integers
VPANDNQBitwise logical AND NOT of packed quad word integers
VPORQBitwise logical OR of packed quad word integers
VPXORQBitwise logical exclusive XOR of packed quad word integers
VPTERNLOGQBitwise ternary logic with quad word granularity. The immediate value determines the specific binary function being implemented
Integer Operands
VPTESTPerforms a bitwise logical AND of the two operands and sets the ZF flag if the result is all zeros. The CF flag is set if the bitwise AND of the second operand with the complement of the first operand is all zeros
VPANDBitwise logical AND
VPANDNBitwise logical AND NOT
VPORBitwise logical OR
VPXORBitwise logical exclusive OR
Single Precision Floating-point Operands
VTESTPSPacked bit test of single-precision floating-point elements
VANDPSPerform bitwise logical AND of packed single-precision floating-point values
VANDNPSPerform bitwise logical AND NOT of packed single-precision floating-point values
VORPSPerform bitwise logical OR of packed single-precision floating-point values
VXORPSPerform bitwise logical XOR of packed single-precision floating-point values
Double Precision Floating-point Operands
VTESTPDPacked bit test of double-precision floating-point elements
VANDPDPerform bitwise logical AND of packed double-precision floating-point values
VANDNPDPerform bitwise logical AND NOT of packed double-precision floating-point values
VORPDPerform bitwise logical OR of packed double-precision floating-point values
VXORPDPerform bitwise logical XOR of packed double-precision floating-point values
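
The immediate operand of VPTERNLOGD and VPTERNLOGQ can be read as an 8-entry truth table: at every bit position, the bits of the destination, first source and second source form a 3-bit index (destination bit most significant), and the result bit is the immediate bit at that index. The following is a minimal Python sketch of one 32-bit element, intended only to illustrate the idea; the function name `ternlog32` is hypothetical and the code models the semantics rather than executing the instruction.

```python
def ternlog32(dest, src1, src2, imm8):
    """Model one 32-bit element of VPTERNLOGD (illustrative, not the real instruction)."""
    result = 0
    for bit in range(32):
        # Build the truth-table index from the three input bits.
        index = (((dest >> bit) & 1) << 2) | (((src1 >> bit) & 1) << 1) | ((src2 >> bit) & 1)
        # Select the result bit from the immediate.
        result |= ((imm8 >> index) & 1) << bit
    return result


# 0x96 encodes a three-way XOR; 0xCA encodes a bitwise select (dest ? src1 : src2).
xor3 = ternlog32(0x12345678, 0x0F0F0F0F, 0xFFFF0000, 0x96)
select = ternlog32(0xFFFF0000, 0xAAAAAAAA, 0x55555555, 0xCA)
```

A handy way to derive an immediate is to evaluate the desired function on the constants 0xF0, 0xCC and 0xAA, which stand for the destination, first source and second source columns of the truth table.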

Shift and Rotate Instructions

InstructionMeaning
Word Operands
VPSLLWShift packed words left logical
VPSRLWShift packed words right logical
VPSRAWShift packed words right arithmetic
VPSLLVWShift packed words left logical by per-element variable counts
VPSRLVWShift packed words right logical by per-element variable counts
VPSRAVWShift packed words right arithmetic by per-element variable counts
Double Word Operands
VPSLLDShift packed double words left logical
VPSRLDShift packed double words right logical
VPSRADShift packed double words right arithmetic
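VPSLLVDShift packed double words left logical by per-element variable counts
VPSRLVDShift packed double words right logical by per-element variable counts
VPSRAVDShift packed double words right arithmetic by per-element variable counts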
VPROLDRotate double words left using immediate bits count
VPRORDRotate double words right using immediate bits count
VPROLVDRotate double words left using variable bits count
VPRORVDRotate double words right using variable bits count
Quad Word Operands
VPSLLQShift packed quad words left logical
VPSRLQShift packed quad words right logical
VPSRAQShift packed quad words right arithmetic
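VPSLLVQShift packed quad words left logical by per-element variable counts
VPSRLVQShift packed quad words right logical by per-element variable counts
VPSRAVQShift packed quad words right arithmetic by per-element variable counts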
VPROLQRotate quad words left using immediate bits count
VPRORQRotate quad words right using immediate bits count
VPROLVQRotate quad words left using variable bits count
VPRORVQRotate quad words right using variable bits count
Double Quad Word Operands
VPSLLDQShift double quad word left logical by a number of bytes
VPSRLDQShift double quad word right logical by a number of bytes
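
The difference between logical shifts, arithmetic shifts and rotates is easiest to see on a single element. The sketch below models one 32-bit element in Python under the assumption that values are held as unsigned integers; the helper names are hypothetical. It also reflects how out-of-range counts behave: a logical shift by more than 31 yields zero, an arithmetic shift by more than 31 fills the element with the sign bit, and a rotate count is taken modulo 32.

```python
MASK32 = 0xFFFFFFFF


def srl32(x, count):
    """Logical right shift of one double word (models a VPSRLD element)."""
    return (x & MASK32) >> count if count < 32 else 0


def sra32(x, count):
    """Arithmetic right shift of one double word (models a VPSRAD element)."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    # Counts above 31 behave like 31: the element becomes all sign bits.
    return (signed >> min(count, 31)) & MASK32


def rol32(x, count):
    """Rotate one double word left (models a VPROLD element)."""
    count %= 32
    x &= MASK32
    return ((x << count) | (x >> (32 - count))) & MASK32
```

For example, `srl32(0x80000000, 4)` gives `0x08000000`, while `sra32(0x80000000, 4)` gives `0xF8000000` because the sign bit is replicated.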

Comparison Instructions

InstructionMeaning
Byte Operands
VPCMPEQBCompare packed bytes for equal
VPCMPGTBCompare packed signed byte integers for greater than
VPCMPBCompare packed signed byte values into mask
VPCMPUBCompare packed unsigned byte values into mask
Word Operands
VPCMPEQWCompare packed words for equal
VPCMPGTWCompare packed signed word integers for greater than
VPCMPWCompare packed signed word values into mask
VPCMPUWCompare packed unsigned word values into mask
Double Word Operands
VPCMPEQDCompare packed double words for equal
VPCMPGTDCompare packed signed double word integers for greater than
VPCMPDCompare packed signed double word values into mask
VPCMPUDCompare packed unsigned double word values into mask
Quad Word Operands
VPCMPEQQCompare packed quad words for equal
VPCMPGTQCompare packed signed quad word integers for greater than
VPCMPQCompare packed signed quad word values into mask
VPCMPUQCompare packed unsigned quad word values into mask
Single Precision Floating-point Operands
VCMPEQPSCompare packed single-precision floating-point values and set mask if destination value is equal to source value
VCMPLTPSCompare packed single-precision floating-point values and set mask if destination value is less than source value
VCMPLEPSCompare packed single-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTPSCompare packed single-precision floating-point values and set mask if destination value is greater than source value
VCMPGEPSCompare packed single-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDPSCompare packed single-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQPSCompare packed single-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTPSCompare packed single-precision floating-point values and set mask if destination value is not less than source value
VCMPNLEPSCompare packed single-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTPSCompare packed single-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGEPSCompare packed single-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDPSCompare packed single-precision floating-point values and set mask if neither source operand is a NaN
VCMPEQSSCompare scalar single-precision floating-point values and set mask if destination value is equal to source value
VCMPLTSSCompare scalar single-precision floating-point values and set mask if destination value is less than source value
VCMPLESSCompare scalar single-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTSSCompare scalar single-precision floating-point values and set mask if destination value is greater than source value
VCMPGESSCompare scalar single-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDSSCompare scalar single-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQSSCompare scalar single-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTSSCompare scalar single-precision floating-point values and set mask if destination value is not less than source value
VCMPNLESSCompare scalar single-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTSSCompare scalar single-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGESSCompare scalar single-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDSSCompare scalar single-precision floating-point values and set mask if neither source operand is a NaN
VCOMISSPerform ordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register. Signals an invalid-operation exception for QNaN and SNaN operands
VUCOMISSPerform unordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register. Signals an invalid-operation exception only for SNaN operands
Double Precision Floating-point Operands
VCMPEQPDCompare packed double-precision floating-point values and set mask if destination value is equal to source value
VCMPLTPDCompare packed double-precision floating-point values and set mask if destination value is less than source value
VCMPLEPDCompare packed double-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTPDCompare packed double-precision floating-point values and set mask if destination value is greater than source value
VCMPGEPDCompare packed double-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDPDCompare packed double-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQPDCompare packed double-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTPDCompare packed double-precision floating-point values and set mask if destination value is not less than source value
VCMPNLEPDCompare packed double-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTPDCompare packed double-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGEPDCompare packed double-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDPDCompare packed double-precision floating-point values and set mask if neither source operand is a NaN
VCMPEQSDCompare scalar double-precision floating-point values and set mask if destination value is equal to source value
VCMPLTSDCompare scalar double-precision floating-point values and set mask if destination value is less than source value
VCMPLESDCompare scalar double-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTSDCompare scalar double-precision floating-point values and set mask if destination value is greater than source value
VCMPGESDCompare scalar double-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDSDCompare scalar double-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQSDCompare scalar double-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTSDCompare scalar double-precision floating-point values and set mask if destination value is not less than source value
VCMPNLESDCompare scalar double-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTSDCompare scalar double-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGESDCompare scalar double-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDSDCompare scalar double-precision floating-point values and set mask if neither source operand is a NaN
VCOMISDPerform ordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register. Signals an invalid-operation exception for QNaN and SNaN operands
VUCOMISDPerform unordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register. Signals an invalid-operation exception only for SNaN operands
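
The floating-point compare predicates differ mainly in how they treat NaN operands: the plain predicates (EQ, LT, LE, GT, GE, ORD) are false when either operand is a NaN, while the negated ones (NEQ, NLT, NLE, NGT, NGE) and UNORD are true. The following Python sketch models the result for one element pair under that reading; the function name `vcmp_lane` is hypothetical, and exception signaling is deliberately left out.

```python
import math


def vcmp_lane(predicate, a, b):
    """Return True where the compare would set the mask for one element pair."""
    unordered = math.isnan(a) or math.isnan(b)
    results = {
        # Ordered predicates: false as soon as a NaN is involved.
        "EQ": not unordered and a == b,
        "LT": not unordered and a < b,
        "LE": not unordered and a <= b,
        "GT": not unordered and a > b,
        "GE": not unordered and a >= b,
        "ORD": not unordered,
        # Unordered predicates: true as soon as a NaN is involved.
        "UNORD": unordered,
        "NEQ": unordered or a != b,
        "NLT": unordered or not a < b,
        "NLE": unordered or not a <= b,
        "NGT": unordered or not a > b,
        "NGE": unordered or not a >= b,
    }
    return results[predicate]
```

Note that NLT is therefore not the same as GE: for a NaN operand, `vcmp_lane("NLT", x, y)` is true while `vcmp_lane("GE", x, y)` is false.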

Packed Arithmetic Instructions

InstructionMeaning
Byte Operands
VPADDBAdd packed byte integers
VPADDUSBAdd packed unsigned byte integers with unsigned saturation
VPADDSBAdd packed signed byte integers with signed saturation
VPSUBBSubtract packed byte integers
VPSUBUSBSubtract packed unsigned byte integers with unsigned saturation
VPSUBSBSubtract packed signed byte integers with signed saturation
Word Operands
VPADDWAdd packed word integers
VPADDUSWAdd packed unsigned word integers with unsigned saturation
VPADDSWAdd packed signed word integers with signed saturation
VPHADDWAdds two adjacent, signed 16-bit integers horizontally from the source and destination operands and packs the signed 16-bit results to the destination operand
VPHADDSWAdds two adjacent, signed 16-bit integers horizontally from the source and destination operands and packs the signed, saturated 16-bit results to the destination operand
VPSUBWSubtract packed word integers
VPSUBUSWSubtract packed unsigned word integers with unsigned saturation
VPSUBSWSubtract packed signed word integers with signed saturation
VPHSUBWPerforms horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the source and destination operands. The signed 16-bit results are packed and written to the destination operand
VPHSUBSWPerforms horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the source and destination operands. The signed, saturated 16-bit results are packed and written to the destination operand
VPMULHUWMultiply packed unsigned word integers and store high result
VPMULLWMultiply packed signed word integers and store low result
VPMULHWMultiply packed signed word integers and store high result
VPMULHRSWMultiplies vertically each signed 16-bit integer from the destination operand with the corresponding signed 16-bit integer of the source operand, producing intermediate, signed 32-bit integers. Each intermediate 32-bit integer is truncated to the 18 most significant bits. Rounding is always performed by adding 1 to the least significant bit of the 18-bit intermediate result. The final result is obtained by selecting the 16 bits immediately to the right of the most significant bit of each 18-bit intermediate result and packed to the destination operand
VPMADDUBSWMultiplies each unsigned byte value with the corresponding signed byte value to produce an intermediate, 16-bit signed integer. Each adjacent pair of 16-bit signed values are added horizontally. The signed, saturated 16-bit results are packed to the destination operand
Double Word Operands
VPADDDAdd packed double word integers
VPHADDDAdds two adjacent, signed 32-bit integers horizontally from the source and destination operands and packs the signed 32-bit results to the destination operand
VPSUBDSubtract packed double word integers
VPHSUBDPerforms horizontal subtraction on each adjacent pair of 32-bit signed integers by subtracting the most significant double word from the least significant double word of each pair in the source and destination operands. The signed 32-bit results are packed and written to the destination operand
VPMULLDMultiply packed signed double word integers and store the low 32 bits of each 64-bit result
VPMADDWDMultiply and add packed word integers
Quad Word Operands
VPADDQAdd packed quad word integers
VPSUBQSubtract packed quad word integers
VPMULUDQMultiply packed unsigned double word integers
VPMULDQMultiply the low signed double word of each quad word element and store the signed 64-bit results
VPMULLQMultiply packed signed quad word integers and store the low 64 bits of each 128-bit result
Single Precision Floating-point Operands
VADDSSAdd scalar single-precision floating-point value
VADDPSAdd packed single-precision floating-point values
VHADDPSPerforms a single-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the third and fourth elements of the first operand; the third by adding the first and second elements of the second operand; and the fourth by adding the third and fourth elements of the second operand
VSUBSSSubtract scalar single-precision floating-point value
VSUBPSSubtract packed single-precision floating-point values
VHSUBPSPerforms a single-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the fourth element of the first operand from the third element of the first operand; the third by subtracting the second element of the second operand from the first element of the second operand; and the fourth by subtracting the fourth element of the second operand from the third element of the second operand
VADDSUBPSPerforms single-precision addition on the second and fourth pairs of 32-bit data elements within the operands; single-precision subtraction on the first and third pairs
VMULSSMultiply scalar single-precision floating-point value
VMULPSMultiply packed single-precision floating-point values
VDIVSSDivide scalar single-precision floating-point value
VDIVPSDivide packed single-precision floating-point values
Double Precision Floating-point Operands
VADDSDAdd scalar double-precision floating-point value
VADDPDAdd packed double-precision floating-point values
VHADDPDPerforms a double-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the first and second elements of the second operand
VSUBSDSubtract scalar double-precision floating-point value
VSUBPDSubtract packed double-precision floating-point values
VHSUBPDPerforms a double-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the second element of the second operand from the first element of the second operand
VADDSUBPDPerforms double-precision addition on the second pair of quad words, and double-precision subtraction on the first pair
VMULSDMultiply scalar double-precision floating-point value
VMULPDMultiply packed double-precision floating-point values
VDIVSDDivide scalar double-precision floating-point value
VDIVPDDivide packed double-precision floating-point values
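
The word-sized integer instructions above differ in how they handle results that do not fit in 16 bits. As a rough illustration, the Python sketch below models one element of VPADDW (wraparound), VPADDSW (signed saturation), VPADDUSW (unsigned saturation) and the rounding multiply VPMULHRSW. The function names are hypothetical, and the VPMULHRSW model uses the equivalent formulation "shift right by 14, add 1, shift right by 1" of the description in the table.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed word."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def paddw(a, b):
    """Wraparound addition of two signed words."""
    return to_s16(a + b)


def paddsw(a, b):
    """Addition of two signed words with signed saturation."""
    return max(-32768, min(32767, a + b))


def paddusw(a, b):
    """Addition of two unsigned words with unsigned saturation."""
    return min(0xFFFF, a + b)


def pmulhrsw(a, b):
    """Rounded high multiply of two signed words (Q15 fixed-point product)."""
    return to_s16((((a * b) >> 14) + 1) >> 1)
```

With these helpers, `paddw(32767, 1)` wraps to `-32768` whereas `paddsw(32767, 1)` stays at `32767`, and `pmulhrsw(16384, 16384)` returns `8192`, which is 0.5 times 0.5 in Q15 fixed-point notation.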

Fused Arithmetic Instructions

InstructionMeaning
Single Precision Floating-point Operands
VFMADD132SSFused multiply-add of scalar single-precision floating-point values
VFMADD213SSFused multiply-add of scalar single-precision floating-point values
VFMADD231SSFused multiply-add of scalar single-precision floating-point values
VFMADD132PSFused multiply-add of packed single-precision floating-point values
VFMADD213PSFused multiply-add of packed single-precision floating-point values
VFMADD231PSFused multiply-add of packed single-precision floating-point values
VFNMADD132SSFused negative multiply-add of scalar single-precision floating-point values
VFNMADD213SSFused negative multiply-add of scalar single-precision floating-point values
VFNMADD231SSFused negative multiply-add of scalar single-precision floating-point values
VFNMADD132PSFused negative multiply-add of packed single-precision floating-point values
VFNMADD213PSFused negative multiply-add of packed single-precision floating-point values
VFNMADD231PSFused negative multiply-add of packed single-precision floating-point values
VFMSUB132SSFused multiply-subtract of scalar single-precision floating-point values
VFMSUB213SSFused multiply-subtract of scalar single-precision floating-point values
VFMSUB231SSFused multiply-subtract of scalar single-precision floating-point values
VFMSUB132PSFused multiply-subtract of packed single-precision floating-point values
VFMSUB213PSFused multiply-subtract of packed single-precision floating-point values
VFMSUB231PSFused multiply-subtract of packed single-precision floating-point values
VFNMSUB132SSFused negative multiply-subtract of scalar single-precision floating-point values
VFNMSUB213SSFused negative multiply-subtract of scalar single-precision floating-point values
VFNMSUB231SSFused negative multiply-subtract of scalar single-precision floating-point values
VFNMSUB132PSFused negative multiply-subtract of packed single-precision floating-point values
VFNMSUB213PSFused negative multiply-subtract of packed single-precision floating-point values
VFNMSUB231PSFused negative multiply-subtract of packed single-precision floating-point values
VFMADDSUB132PSFused multiply-alternating add/subtract of packed single-precision floating-point values
VFMADDSUB213PSFused multiply-alternating add/subtract of packed single-precision floating-point values
VFMADDSUB231PSFused multiply-alternating add/subtract of packed single-precision floating-point values
VFMSUBADD132PSFused multiply-alternating subtract/add of packed single-precision floating-point values
VFMSUBADD213PSFused multiply-alternating subtract/add of packed single-precision floating-point values
VFMSUBADD231PSFused multiply-alternating subtract/add of packed single-precision floating-point values
Double Precision Floating-point Operands
VFMADD132SDFused multiply-add of scalar double-precision floating-point values
VFMADD213SDFused multiply-add of scalar double-precision floating-point values
VFMADD231SDFused multiply-add of scalar double-precision floating-point values
VFMADD132PDFused multiply-add of packed double-precision floating-point values
VFMADD213PDFused multiply-add of packed double-precision floating-point values
VFMADD231PDFused multiply-add of packed double-precision floating-point values
VFNMADD132SDFused negative multiply-add of scalar double-precision floating-point values
VFNMADD213SDFused negative multiply-add of scalar double-precision floating-point values
VFNMADD231SDFused negative multiply-add of scalar double-precision floating-point values
VFNMADD132PDFused negative multiply-add of packed double-precision floating-point values
VFNMADD213PDFused negative multiply-add of packed double-precision floating-point values
VFNMADD231PDFused negative multiply-add of packed double-precision floating-point values
VFMSUB132SDFused multiply-subtract of scalar double-precision floating-point values
VFMSUB213SDFused multiply-subtract of scalar double-precision floating-point values
VFMSUB231SDFused multiply-subtract of scalar double-precision floating-point values
VFMSUB132PDFused multiply-subtract of packed double-precision floating-point values
VFMSUB213PDFused multiply-subtract of packed double-precision floating-point values
VFMSUB231PDFused multiply-subtract of packed double-precision floating-point values
VFNMSUB132SDFused negative multiply-subtract of scalar double-precision floating-point values
VFNMSUB213SDFused negative multiply-subtract of scalar double-precision floating-point values
VFNMSUB231SDFused negative multiply-subtract of scalar double-precision floating-point values
VFNMSUB132PDFused negative multiply-subtract of packed double-precision floating-point values
VFNMSUB213PDFused negative multiply-subtract of packed double-precision floating-point values
VFNMSUB231PDFused negative multiply-subtract of packed double-precision floating-point values
VFMADDSUB132PDFused multiply-alternating add/subtract of packed double-precision floating-point values
VFMADDSUB213PDFused multiply-alternating add/subtract of packed double-precision floating-point values
VFMADDSUB231PDFused multiply-alternating add/subtract of packed double-precision floating-point values
VFMSUBADD132PDFused multiply-alternating subtract/add of packed double-precision floating-point values
VFMSUBADD213PDFused multiply-alternating subtract/add of packed double-precision floating-point values
VFMSUBADD231PDFused multiply-alternating subtract/add of packed double-precision floating-point values
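
The three digits in each mnemonic name the operand roles: the first two digits select the operands that are multiplied and the third selects the operand that is added, with operand 1 also receiving the result. "Fused" means the product is not rounded before the addition, so only one rounding takes place. The Python sketch below illustrates both points for finite double-precision values by computing the exact result with `fractions.Fraction` and rounding once; the function names are hypothetical and this is a model, not the instruction itself.

```python
from fractions import Fraction


def fused_multiply_add(x, y, z):
    """Compute x*y + z exactly, then round once to double precision."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def vfmadd(order, op1, op2, op3):
    """Model the operand ordering of VFMADD132/213/231 for one element."""
    operands = {"1": op1, "2": op2, "3": op3}
    # First two digits: multiplicands. Third digit: addend.
    x, y, z = (operands[digit] for digit in order)
    return fused_multiply_add(x, y, z)


# Same inputs, three different results depending on the ordering.
r132 = vfmadd("132", 2.0, 3.0, 4.0)  # 2*4 + 3
r213 = vfmadd("213", 2.0, 3.0, 4.0)  # 3*2 + 4
r231 = vfmadd("231", 2.0, 3.0, 4.0)  # 3*4 + 2

# Single rounding: the unfused product rounds to 1.0 and the difference is lost.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
unfused = a * b - 1.0
fused = fused_multiply_add(a, b, -1.0)
```

The negated and subtracting variants follow the same ordering: VFNMADD computes -(x*y) + z, VFMSUB computes x*y - z, and VFNMSUB computes -(x*y) - z.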

Primitives of Functions

InstructionMeaning
Byte Operands
VPABSBComputes the absolute value of each signed byte data element
VPSIGNBNegates each signed integer element of the destination operand if the corresponding element in the source operand is less than zero, and zeroes it if the source element is zero
VPAVGBCompute average of packed unsigned byte integers
VPMINUBMinimum of packed unsigned byte integers
VPMINSBMinimum of packed signed byte integers
VPMAXUBMaximum of packed unsigned byte integers
VPMAXSBMaximum of packed signed byte integers
VPSADBWCompute sum of absolute differences
VMPSADBWPerforms eight 4-byte wide Sum of Absolute Differences operations to produce eight word integers
VDBPSADBWDouble block packed Sum of Absolute Differences on unsigned bytes
Word Operands
VPABSWComputes the absolute value of each signed 16-bit data element
VPSIGNWNegates each signed integer element of the destination operand if the corresponding element in the source operand is less than zero, and zeroes it if the source element is zero
VPAVGWCompute average of packed unsigned word integers
VPMINUWMinimum of packed unsigned word integers
VPMINSWMinimum of packed signed word integers
VPMAXUWMaximum of packed unsigned word integers
VPMAXSWMaximum of packed signed word integers
VPHMINPOSUWFinds the value and location of the minimum unsigned word from one of 8 horizontally packed unsigned words. The resulting value and location (offset within the source) are packed into the low double word of the destination XMM register
Double Word Operands
VPABSDComputes the absolute value of each signed 32-bit data element
VPSIGNDNegates each signed integer element of the destination operand if the corresponding element in the source operand is less than zero, and zeroes it if the source element is zero
VPMINUDMinimum of packed unsigned double word integers
VPMINSDMinimum of packed signed double word integers
VPMAXUDMaximum of packed unsigned double word integers
VPMAXSDMaximum of packed signed double word integers
VPLZCNTDCount the number of leading zero bits in each packed double word element
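VPCONFLICTDDetect conflicts within a vector of packed double word values: each element is compared for equality with all preceding elements and the results are stored as a bit vector in the corresponding destination element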
Quad Word Operands
VPABSQComputes the absolute value of each signed 64-bit data element
VPMINUQMinimum of packed unsigned quad word integers
VPMINSQMinimum of packed signed quad word integers
VPMAXUQMaximum of packed unsigned quad word integers
VPMAXSQMaximum of packed signed quad word integers
VPLZCNTQCount the number of leading zero bits in each packed quad word element
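VPCONFLICTQDetect conflicts within a vector of packed quad word values: each element is compared for equality with all preceding elements and the results are stored as a bit vector in the corresponding destination element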
Single Precision Floating-point Operands
VSQRTSSCompute square root of scalar single-precision floating-point value
VSQRTPSCompute square roots of packed single-precision floating-point values
VMINSSReturn minimum scalar single-precision floating-point value
VMINPSReturn minimum packed single-precision floating-point values
VMAXSSReturn maximum scalar single-precision floating-point value
VMAXPSReturn maximum packed single-precision floating-point values
VROUNDSSRound the low single-precision floating-point value into an integer value and return a rounded floating-point value
VROUNDPSRound packed single-precision floating-point values into integer values and return rounded floating-point values
VRNDSCALESSRound scalar single-precision floating-point value to include a given number of fraction bits
VRNDSCALEPSRound packed single-precision floating-point values to include a given number of fraction bits
VDPPSPerform single-precision dot products for up to 4 elements and broadcast
VRANGESSRange restriction calculation for pairs of scalar single-precision floating-point values
VRANGEPSRange restriction calculation for packed pairs of single-precision floating-point values
VREDUCESSPerform a reduction transformation on a scalar single-precision floating-point value by subtracting a number of fraction bits
VREDUCEPSPerform reduction transformation on packed single-precision floating-point values by subtracting a number of fraction bits
VGETEXPSSConvert the biased exponent of scalar single-precision floating-point value to floating-point value representing unbiased integer exponent
VGETEXPPSConvert the biased exponent of packed single-precision floating-point values to floating-point values representing unbiased integer exponent
VGETMANTSSExtract the normalized mantissa from scalar single-precision floating-point value
VGETMANTPSExtract the normalized mantissa from packed single-precision floating-point values
VSCALEFSSScale scalar single-precision floating-point value
VSCALEFPSScale packed single-precision floating-point values
VEXP2PSApproximation to the exponential 2^x of packed single-precision floating-point values with less than 2^-23 relative error
VFPCLASSSSTests scalar single-precision floating-point value for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite negative
VFPCLASSPSTests packed single-precision floating-point values for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite negative
VFIXUPIMMSSFix up special scalar single-precision floating-point value
VFIXUPIMMPSFix up special packed single-precision floating-point values
VRCP14SSComputes the approximate reciprocal of the scalar single-precision floating-point value. The max relative error is less than 2^-14
VRCP14PSComputes the approximate reciprocals of the packed single-precision floating-point values. The max relative error is less than 2^-14
VRCP28SSComputes the approximate reciprocal of the scalar single-precision floating-point value. The max relative error is less than 2^-28
VRCP28PSComputes the approximate reciprocals of the packed single-precision floating-point values. The max relative error is less than 2^-28
VRSQRT14SSComputes the approximate reciprocal square root of the scalar single-precision floating-point value. The max relative error is less than 2^-14
VRSQRT14PSComputes the approximate reciprocal square roots of the packed single-precision floating-point values. The max relative error is less than 2^-14
VRSQRT28SSComputes the approximate reciprocal square root of the scalar single-precision floating-point value. The max relative error is less than 2^-28
VRSQRT28PSComputes the approximate reciprocal square roots of the packed single-precision floating-point values. The max relative error is less than 2^-28
VRCPPSCompute reciprocals of packed single-precision floating-point values
VRCPSSCompute reciprocal of scalar single-precision floating-point value
VRSQRTPSCompute reciprocals of square roots of packed single-precision floating-point values
VRSQRTSSCompute reciprocal of square root of scalar single-precision floating-point value
Double Precision Floating-point Operands
VSQRTSDCompute square root of scalar double-precision floating-point value
VSQRTPDCompute square roots of packed double-precision floating-point values
VMINSDReturn minimum scalar double-precision floating-point value
VMINPDReturn minimum packed double-precision floating-point values
VMAXSDReturn maximum scalar double-precision floating-point value
VMAXPDReturn maximum packed double-precision floating-point values
VROUNDSDRound the low double-precision floating-point value into an integer value and return a rounded floating-point value
VROUNDPDRound packed double-precision floating-point values into integer values and return rounded floating-point values
VRNDSCALESDRound scalar double-precision floating-point value to include a given number of fraction bits
VRNDSCALEPDRound packed double-precision floating-point values to include a given number of fraction bits
VDPPDPerform double-precision dot product for up to 2 elements and broadcast
VRANGESDRange restriction calculation for pairs of scalar double-precision floating-point values
VRANGEPDRange restriction calculation for packed pairs of double-precision floating-point values
VREDUCESDPerform a reduction transformation on a scalar double-precision floating-point value by subtracting a number of fraction bits
VREDUCEPDPerform reduction transformation on packed double-precision floating-point values by subtracting a number of fraction bits
VGETEXPSDConvert the biased exponent of scalar double-precision floating-point value to floating-point value representing unbiased integer exponent
VGETEXPPDConvert the biased exponent of packed double-precision floating-point values to floating-point values representing unbiased integer exponent
VGETMANTSDExtract the normalized mantissa from scalar double-precision floating-point value
VGETMANTPDExtract the normalized mantissa from packed double-precision floating-point values
VSCALEFSDScale scalar double-precision floating-point value
VSCALEFPDScale packed double-precision floating-point values
VEXP2PDApproximation to the exponential 2^x of packed double-precision floating-point values with less than 2^-23 relative error
VFPCLASSSDTests scalar double-precision floating-point value for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite negative
VFPCLASSPDTests packed double-precision floating-point values for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite negative
VFIXUPIMMSDFix up special scalar double-precision floating-point value
VFIXUPIMMPDFix up special packed double-precision floating-point values
VRCP14SDComputes the approximate reciprocal of the scalar double-precision floating-point value. The max relative error is less than 2^-14
VRCP14PDComputes the approximate reciprocals of the packed double-precision floating-point values. The max relative error is less than 2^-14
VRCP28SDComputes the approximate reciprocal of the scalar double-precision floating-point value. The max relative error is less than 2^-28
VRCP28PDComputes the approximate reciprocals of the packed double-precision floating-point values. The max relative error is less than 2^-28
VRSQRT14SDComputes the approximate reciprocal square root of the scalar double-precision floating-point value. The max relative error is less than 2^-14
VRSQRT14PDComputes the approximate reciprocal square roots of the packed double-precision floating-point values. The max relative error is less than 2^-14
VRSQRT28SDComputes the approximate reciprocal square root of the scalar double-precision floating-point value. The max relative error is less than 2^-28
VRSQRT28PDComputes the approximate reciprocal square roots of the packed double-precision floating-point values. The max relative error is less than 2^-28
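
The exponent, mantissa, scale and round-to-fraction-bits rows above are easier to follow with numbers. The following is a minimal sketch in Python that models one double-precision element; it is not an implementation of the instructions. The function names are hypothetical, and the model assumes normal, finite, non-zero inputs, the [1, 2) normalization interval with the source sign kept for VGETMANT, and round-to-nearest-even for VRNDSCALE. Special cases (zero, infinities, NaN, denormals) are deliberately left out.

```python
import math

def getexp(x):
    """VGETEXPSD/PD model: unbiased exponent floor(log2(|x|)), returned as a float."""
    _, e = math.frexp(abs(x))        # abs(x) == m * 2**e with 0.5 <= m < 1
    return float(e - 1)

def getmant(x):
    """VGETMANTSD/PD model, assuming the [1, 2) interval and the source sign."""
    m, _ = math.frexp(x)             # 0.5 <= |m| < 1
    return m * 2.0

def scalef(x, y):
    """VSCALEFSD/PD model: x * 2**floor(y)."""
    return math.ldexp(x, math.floor(y))

def rndscale(x, fraction_bits):
    """VRNDSCALESD/PD model: keep fraction_bits fraction bits, round to nearest even."""
    return math.ldexp(round(math.ldexp(x, fraction_bits)), -fraction_bits)

def reduce_(x, fraction_bits):
    """VREDUCESD/PD model: what is left after removing the rounded part."""
    return x - rndscale(x, fraction_bits)

if __name__ == "__main__":
    x = 10.0
    print(getexp(x), getmant(x))              # 10 == 1.25 * 2**3
    print(scalef(getmant(x), getexp(x)))      # rebuilds the original value
    print(rndscale(2.75, 1), reduce_(2.75, 1))
```

In this model, VGETEXP and VGETMANT split a value into exponent and mantissa, and VSCALEF puts the two parts back together.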

Opmask Instructions

InstructionMeaning
8-bit Operands
KMOVBMove 8-bit from and to mask registers
KTESTBSet ZF if the bitwise AND of two 8-bit masks is zero, and CF if the bitwise AND NOT is zero
KORTESTBBitwise logical OR of two 8-bit masks, setting ZF and CF accordingly
KNOTBBitwise NOT of 8-bit mask
KANDBBitwise logical AND of two 8-bit masks
KANDNBBitwise logical AND NOT of two 8-bit masks
KORBBitwise logical OR of two 8-bit masks
KXORBBitwise logical XOR of two 8-bit masks
KXNORBBitwise logical XNOR of two 8-bit masks
KADDBAdd two 8-bit masks
KSHIFTLBShift left 8-bit mask register
KSHIFTRBShift right 8-bit mask register
KUNPCKBWUnpack and interleave 8-bit masks
VPMOVM2BConvert a mask register to a vector register, setting each byte to all ones or all zeros
VPMOVB2MConvert a vector register to a mask register, taking the most significant bit of each byte
16-bit Operands
KMOVWMove 16-bit from and to mask registers
KTESTWSet ZF if the bitwise AND of two 16-bit masks is zero, and CF if the bitwise AND NOT is zero
KORTESTWBitwise logical OR of two 16-bit masks, setting ZF and CF accordingly
KNOTWBitwise NOT of 16-bit mask
KANDWBitwise logical AND of two 16-bit masks
KANDNWBitwise logical AND NOT of two 16-bit masks
KORWBitwise logical OR of two 16-bit masks
KXORWBitwise logical XOR of two 16-bit masks
KXNORWBitwise logical XNOR of two 16-bit masks
KADDWAdd two 16-bit masks
KSHIFTLWShift left 16-bit mask register
KSHIFTRWShift right 16-bit mask register
KUNPCKWDUnpack and interleave 16-bit masks
VPMOVM2WConvert a mask register to a vector register, setting each word to all ones or all zeros
VPMOVW2MConvert a vector register to a mask register, taking the most significant bit of each word
32-bit Operands
KMOVDMove 32-bit from and to mask registers
KTESTDSet ZF if the bitwise AND of two 32-bit masks is zero, and CF if the bitwise AND NOT is zero
KORTESTDBitwise logical OR of two 32-bit masks, setting ZF and CF accordingly
KNOTDBitwise NOT of 32-bit mask
KANDDBitwise logical AND of two 32-bit masks
KANDNDBitwise logical AND NOT of two 32-bit masks
KORDBitwise logical OR of two 32-bit masks
KXORDBitwise logical XOR of two 32-bit masks
KXNORDBitwise logical XNOR of two 32-bit masks
KADDDAdd two 32-bit masks
KSHIFTLDShift left 32-bit mask register
KSHIFTRDShift right 32-bit mask register
KUNPCKDQUnpack and interleave 32-bit masks
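VPMOVM2DConvert a mask register to a vector register, setting each double word to all ones or all zeros
VPMOVD2MConvert a vector register to a mask register, taking the most significant bit of each double word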
64-bit Operands
KMOVQMove 64-bit from and to mask registers
KTESTQSet ZF if the bitwise AND of two 64-bit masks is zero, and CF if the bitwise AND NOT is zero
KORTESTQBitwise logical OR of two 64-bit masks, setting ZF and CF accordingly
KNOTQBitwise NOT of 64-bit mask
KANDQBitwise logical AND of two 64-bit masks
KANDNQBitwise logical AND NOT of two 64-bit masks
KORQBitwise logical OR of two 64-bit masks
KXORQBitwise logical XOR of two 64-bit masks
KXNORQBitwise logical XNOR of two 64-bit masks
KADDQAdd two 64-bit masks
KSHIFTLQShift left 64-bit mask register
KSHIFTRQShift right 64-bit mask register
VPMOVM2QConvert a mask register to a vector register, setting each quad word to all ones or all zeros
VPMOVQ2MConvert a vector register to a mask register, taking the most significant bit of each quad word
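
The flag-setting and unpack rows in the opmask table are the ones most often misread. The sketch below models a mask register as a Python integer of a given width. All function names are hypothetical, flags are returned as a `(ZF, CF)` tuple, and the vector side of VPMOVM2B/VPMOVB2M is reduced to a list of byte values. It is a behavioural model, not assembler syntax.

```python
def _ones(width):
    return (1 << width) - 1

def kandn(a, b, width):
    """KANDN model: (NOT a) AND b."""
    return ~a & b & _ones(width)

def kxnor(a, b, width):
    """KXNOR model: NOT (a XOR b)."""
    return ~(a ^ b) & _ones(width)

def kadd(a, b, width):
    """KADD model: addition that wraps at the mask width."""
    return (a + b) & _ones(width)

def kshiftl(a, count, width):
    """KSHIFTL model: counts of width or more clear the mask."""
    return (a << count) & _ones(width) if count < width else 0

def kortest(a, b, width):
    """KORTEST model: ZF = OR is all zeros, CF = OR is all ones."""
    r = (a | b) & _ones(width)
    return int(r == 0), int(r == _ones(width))

def ktest(a, b, width):
    """KTEST model: ZF = (a AND b) is zero, CF = ((NOT a) AND b) is zero."""
    m = _ones(width)
    return int((a & b & m) == 0), int((~a & b & m) == 0)

def kunpckbw(a, b):
    """KUNPCKBW model: low byte of a becomes the high byte, low byte of b the low byte."""
    return ((a & 0xFF) << 8) | (b & 0xFF)

def vpmovm2b(k, lanes):
    """VPMOVM2B model: each mask bit becomes a byte of all ones or all zeros."""
    return [0xFF if (k >> i) & 1 else 0x00 for i in range(lanes)]

def vpmovb2m(lane_bytes):
    """VPMOVB2M model: the most significant bit of each byte becomes a mask bit."""
    return sum(((b >> 7) & 1) << i for i, b in enumerate(lane_bytes))

if __name__ == "__main__":
    print(kortest(0x0F, 0xF0, 8))        # OR is 0xFF: ZF clear, CF set
    print(hex(kunpckbw(0x12, 0x34)))
    print(vpmovm2b(0b1010, 4))
```

A common use of KORTEST is loop termination: after a masked compare, ZF tells whether no lane matched and CF whether every lane matched.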

String and Text Processing Instructions

InstructionMeaning
VPCMPESTRIPacked compare explicit-length strings, return index in ECX/RCX
VPCMPESTRMPacked compare explicit-length strings, return mask in XMM0
VPCMPISTRIPacked compare implicit-length strings, return index in ECX/RCX
VPCMPISTRMPacked compare implicit-length strings, return mask in XMM0
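
To make "implicit-length" and "return index" concrete, here is a minimal Python sketch of one configuration only: VPCMPISTRI with control byte 0x00 (unsigned bytes, "equal any" aggregation, positive polarity, least significant index). The function name is hypothetical. The first operand is treated as a set of characters and the second as the 16-byte block being searched; both end at their first zero byte, and the value 16 stands for "no match in this block".

```python
def pcmpistri_equal_any(char_set, text):
    """Index of the first byte of `text` that occurs in `char_set`, else 16."""
    def valid_part(block):
        # Operands are 16 bytes wide; an implicit-length string stops at the first zero byte.
        block = block[:16].ljust(16, b"\x00")
        end = block.find(b"\x00")
        return block if end < 0 else block[:end]

    chars = valid_part(char_set)
    for index, byte in enumerate(valid_part(text)):
        if byte in chars:
            return index
    return 16

if __name__ == "__main__":
    print(pcmpistri_equal_any(b",;", b"key,value"))   # position of the first separator
    print(pcmpistri_equal_any(b"xyz", b"hello"))      # no match in this block
```

The explicit-length variants differ in that the two lengths come from EAX/RAX and EDX/RDX instead of a terminating zero byte.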

Secure Hash Algorithm Instructions

InstructionMeaning
SHA1RNDS4Perform four rounds of SHA1 operation
SHA1MSG1Perform an intermediate calculation for the next four SHA1 message double words
SHA1MSG2Perform a final calculation for the next four SHA1 message double words
SHA1NEXTECalculate SHA1 state variable E after four rounds
SHA256RNDS2Perform two rounds of SHA256 operation
SHA256MSG1Perform an intermediate calculation for the next four SHA256 message double words
SHA256MSG2Perform a final calculation for the next four SHA256 message double words
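
The phrases "four rounds" and "next four message double words" refer to the structure of SHA-1 itself. The sketch below is a plain Python SHA-1 arranged in the same granularity: one helper produces four schedule words at a time (the work that SHA1MSG1 and SHA1MSG2 split between them) and another runs four rounds with a selectable function and constant (the role of SHA1RNDS4 and its immediate). The helper names are hypothetical and the operand layout of the real instructions, including the way SHA1NEXTE folds E into the message words, is not modelled.

```python
import struct

ROUND_CONSTANTS = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)

def rol32(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def next_four_words(w):
    """Append W[t]..W[t+3]; the last one depends on W[t], so they are built in order."""
    for _ in range(4):
        t = len(w)
        w.append(rol32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

def four_rounds(state, words, func):
    """Run four SHA-1 rounds; func (0..3) selects the logic function and constant."""
    a, b, c, d, e = state
    for wt in words:
        if func == 0:
            f = (b & c) | (~b & d)
        elif func == 2:
            f = (b & c) | (b & d) | (c & d)
        else:
            f = b ^ c ^ d
        t = (rol32(a, 5) + f + e + ROUND_CONSTANTS[func] + wt) & 0xFFFFFFFF
        a, b, c, d, e = t, a, rol32(b, 30), c, d
    return a, b, c, d, e

def sha1(message):
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    for offset in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        state = tuple(h)
        for group in range(20):              # 20 groups of four rounds = 80 rounds
            if group >= 4:
                next_four_words(w)
            state = four_rounds(state, w[4 * group:4 * group + 4], group // 5)
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, state)]
    return struct.pack(">5I", *h)

if __name__ == "__main__":
    print(sha1(b"abc").hex())
```

The SHA256 instructions follow the same division of labour, with two rounds per SHA256RNDS2 instead of four.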

State Management Instructions

InstructionMeaning
VLDMXCSRLoad MXCSR register
VSTMXCSRSave MXCSR register state
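
The 32-bit value moved by VLDMXCSR and VSTMXCSR packs several fields. As a rough aid, the following Python sketch decodes them. The function and field names are hypothetical; the bit positions follow the MXCSR layout (exception flags in bits 0-5, DAZ in bit 6, exception masks in bits 7-12, rounding control in bits 13-14, FTZ in bit 15), and 0x1F80 is the power-on default.

```python
EXCEPTIONS = ("invalid", "denormal", "divide-by-zero",
              "overflow", "underflow", "precision")
ROUNDING_MODES = ("nearest", "down", "up", "toward zero")

def decode_mxcsr(value):
    """Split an MXCSR value into its fields."""
    return {
        "flags": [n for bit, n in enumerate(EXCEPTIONS) if (value >> bit) & 1],
        "daz": bool((value >> 6) & 1),
        "masked": [n for bit, n in enumerate(EXCEPTIONS) if (value >> (bit + 7)) & 1],
        "rounding": ROUNDING_MODES[(value >> 13) & 3],
        "ftz": bool((value >> 15) & 1),
    }

if __name__ == "__main__":
    print(decode_mxcsr(0x1F80))   # default: all exceptions masked, round to nearest
```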

Agent Synchronization Instructions

InstructionMeaning
MONITORSets up an address range used to monitor write-back stores
MWAITEnables a processor to enter into an optimized state while waiting for a write-back store to the address range set up by the MONITOR instruction

Cacheability Control, Prefetch and Ordering Instructions

InstructionMeaning
VLDDQUSpecial 128-bit unaligned load designed to avoid cache line splits
PREFETCHNTALoad 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using NTA hint
PREFETCHT0Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T0 hint
PREFETCHT1Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T1 hint
PREFETCHT2Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T2 hint
PREFETCHWPrefetch data into caches in anticipation of a write
PREFETCHWT1Prefetch data into caches with intent to write and T1 hint
CLFLUSHFlushes and invalidates a memory operand and its associated cache line from all levels of the processor’s cache hierarchy
CLFLUSHOPTFlushes and invalidates a memory operand and its associated cache line from all levels of the processor’s cache hierarchy with optimized memory system throughput
SFENCESerializes store operations
LFENCESerializes load operations
MFENCESerializes load and store operations
VMASKMOVDQUNon-temporal store of selected bytes from an XMM register into memory
VMOVNTPSNon-temporal store of packed single-precision floating-point values from a vector register (XMM, YMM or ZMM) into memory
VMOVNTPDNon-temporal store of packed double-precision floating-point values from a vector register (XMM, YMM or ZMM) into memory
VMOVNTDQNon-temporal store of packed integer data from a vector register (XMM, YMM or ZMM) into memory
VMOVNTDQAProvides a non-temporal hint that can cause adjacent 16-byte items within an aligned 64-byte region (a streaming line) to be fetched and held in a small set of temporary buffers ("streaming load buffers"). Subsequent streaming loads to other aligned 16-byte items in the same streaming line may be supplied from the streaming load buffer and can improve throughput
MOVNTINon-temporal store of a double word from a general-purpose register into memory
PAUSEImproves the performance of "spin-wait loops"
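
CLFLUSH, CLFLUSHOPT and the prefetch instructions act on whole cache lines, and VMOVNTDQA on an aligned 64-byte streaming line, so code that covers a buffer normally steps through it one line at a time. The Python sketch below only does the address arithmetic. The function names are hypothetical, and the 64-byte line size is an assumption; real code should take it from CPUID.

```python
CACHE_LINE_SIZE = 64   # assumption: query CPUID for the real value

def cache_lines(address, length, line_size=CACHE_LINE_SIZE):
    """Start addresses of every cache line touched by [address, address + length)."""
    if length <= 0:
        return []
    first = address & ~(line_size - 1)
    last = (address + length - 1) & ~(line_size - 1)
    return list(range(first, last + 1, line_size))

def streaming_line_items(address):
    """The four aligned 16-byte items that share a 64-byte streaming line."""
    base = address & ~63
    return [base + 16 * i for i in range(4)]

if __name__ == "__main__":
    # One CLFLUSH (or prefetch) per returned address covers the whole buffer.
    print([hex(a) for a in cache_lines(0x1003, 130)])
    print([hex(a) for a in streaming_line_items(0x2030)])
```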
Copyright 2012-2016 Jack Black. All rights reserved.