Linux Assembly: collection of fast libraries

Single Instruction Multiple Data (SIMD) instruction set

Beginning with the Pentium II and Pentium with Intel MMX technology processor families, many extensions have been introduced into the Intel 64 and IA-32 architectures to perform single-instruction multiple-data (SIMD) operations. These extensions include the MMX technology, SSE, SSE2, SSE3, SSE4, AVX, AVX2 and AVX512 extensions. Each of these extensions provides a group of instructions that perform SIMD operations on packed integer and/or packed floating-point data elements.

Tip: For detailed information about each instruction, please read the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 2: Instruction Set Reference, A-Z.

AVX Initialization Instructions

Instruction📄Meaning
VZEROALLℹ️Zero all YMM registers
VZEROUPPERℹ️Zero the upper 128 bits of all YMM registers
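
As a rough illustration of how the two instructions differ, the following Python sketch models each YMM register as a 256-bit integer. The function names and the list used as a register file are assumptions made for this example only; they are not part of any library.

```python
# Reference model only: each YMM register is a 256-bit Python int.
LOW_128_MASK = (1 << 128) - 1

def vzeroupper(ymm_regs):
    """Clear bits 255:128 of every register and keep bits 127:0."""
    return [reg & LOW_128_MASK for reg in ymm_regs]

def vzeroall(ymm_regs):
    """Clear every register completely."""
    return [0 for _ in ymm_regs]

regs = [(0xAAAA << 128) | 0x1234, 0x5678]
print([hex(reg) for reg in vzeroupper(regs)])  # ['0x1234', '0x5678']
print(vzeroall(regs))                          # [0, 0]
```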

Data Transfer Instructions

The data transfer instructions move integer and floating-point operands between SIMD registers and between SIMD registers and memory.

Instruction📄Meaning
Integer Operands
VMOVWℹ️Move word
VMOVDℹ️Move double word
VMOVQℹ️Move quad word
VMOVDQAℹ️Move aligned double quad words
VMOVDQA32ℹ️Move aligned packed double word integer values using writemask
VMOVDQA64ℹ️Move aligned packed quad word integer values using writemask
VMOVDQUℹ️Move unaligned double quad words
VMOVDQU8ℹ️Move unaligned packed byte integer values using writemask
VMOVDQU16ℹ️Move unaligned packed word integer values using writemask
VMOVDQU32ℹ️Move unaligned packed double word integer values using writemask
VMOVDQU64ℹ️Move unaligned packed quad word integer values using writemask
VMOVSLDUPℹ️Loads/moves 128 bits duplicating the first and third 32-bit data elements
VMOVSHDUPℹ️Loads/moves 128 bits duplicating the second and fourth 32-bit data elements
VMOVDDUPℹ️Loads/moves 128 bits duplicating the lower 64-bit data elements
VPMASKMOVDℹ️Conditional SIMD integer packed loads and stores of double word values
VPMASKMOVQℹ️Conditional SIMD integer packed loads and stores of quad word values
VPMOVMSKBℹ️Move byte mask
Single Precision Floating-point Operands
VMOVSSℹ️Move scalar single-precision floating-point value between XMM registers or between an XMM register and memory
VMOVAPSℹ️Move aligned packed single-precision floating-point values between YMM registers or between a YMM register and memory
VMOVUPSℹ️Move unaligned packed single-precision floating-point values between YMM registers or between a YMM register and memory
VMOVLPSℹ️Move two packed single-precision floating-point values between the low quad word of an XMM register and memory
VMOVHPSℹ️Move two packed single-precision floating-point values between the high quad word of an XMM register and memory
VMOVLHPSℹ️Move two packed single-precision floating-point values from the low quad word of an XMM register to the high quad word of another XMM register
VMOVHLPSℹ️Move two packed single-precision floating-point values from the high quad word of an XMM register to the low quad word of another XMM register
VMASKMOVPSℹ️Conditional SIMD packed loads and stores of single-precision floating-point values
VMOVMSKPSℹ️Extract sign mask from packed single-precision floating-point values
Double Precision Floating-point Operands
VMOVSDℹ️Move scalar double-precision floating-point value between XMM registers or between an XMM register and memory
VMOVAPDℹ️Move aligned packed double-precision floating-point values between YMM registers or between a YMM register and memory
VMOVUPDℹ️Move unaligned packed double-precision floating-point values between YMM registers or between a YMM register and memory
VMOVLPDℹ️Move low packed double-precision floating-point value between the low quad word of an XMM register and memory
VMOVHPDℹ️Move high packed double-precision floating-point value between the high quad word of an XMM register and memory
VMASKMOVPDℹ️Conditional SIMD packed loads and stores of double-precision floating-point values
VMOVMSKPDℹ️Extract sign mask from packed double-precision floating-point values
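
The sign-mask and conditional-load entries above can be sketched in Python as follows. This is a minimal model, assuming a register is a plain list of elements; `movmskps` and `maskmov_load` are hypothetical helper names, and the mask elements of the conditional load are given as 32-bit integers whose top bit selects the element.

```python
import struct

def movmskps(floats):
    """Bit i of the result is the sign bit of element i (VMOVMSKPS style)."""
    mask = 0
    for i, value in enumerate(floats):
        bits = struct.unpack("<I", struct.pack("<f", value))[0]
        mask |= (bits >> 31) << i
    return mask

def maskmov_load(memory, mask_elements):
    """Conditional load: an element is read only if the top bit of its
    32-bit mask element is set; otherwise the result element is zero."""
    return [value if (sel >> 31) & 1 else 0.0
            for value, sel in zip(memory, mask_elements)]

print(bin(movmskps([1.0, -2.0, 3.0, -0.0])))  # 0b1010
print(maskmov_load([1.0, 2.0, 3.0, 4.0],
                   [0x80000000, 0, 0xFFFFFFFF, 0x7FFFFFFF]))  # [1.0, 0.0, 3.0, 0.0]
```

Note that negative zero counts as negative here, because only the sign bit is inspected.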

Broadcast Instructions

Instruction📄Meaning
Byte Operands
VPBROADCASTBℹ️Broadcast a byte integer value to all elements of a register
VPBROADCASTMB2Qℹ️Broadcast byte size mask to all elements of a register
Word Operands
VPBROADCASTWℹ️Broadcast a word integer value to all elements of a register
VPBROADCASTMW2Dℹ️Broadcast word size mask to all elements of a register
Double Word Operands
VPBROADCASTDℹ️Broadcast a double word integer value to all elements of a register
VBROADCASTI32X2ℹ️Broadcast two double word values to all elements of a register
VBROADCASTI32X4ℹ️Broadcast four double word values to all elements of a register
VBROADCASTI32X8ℹ️Broadcast eight double word values to all elements of a register
Quad Word Operands
VPBROADCASTQℹ️Broadcast a quad word integer value to all elements of a register
VBROADCASTI64X2ℹ️Broadcast two quad word values to all elements of a register
VBROADCASTI64X4ℹ️Broadcast four quad word values to all elements of a register
Single Precision Floating-point Operands
VBROADCASTSSℹ️Broadcast a single-precision floating-point value to all elements of a register
VBROADCASTF32X2ℹ️Broadcast two single-precision floating-point values to all elements of a register
VBROADCASTF32X4ℹ️Broadcast four single-precision floating-point values to all elements of a register
VBROADCASTF32X8ℹ️Broadcast eight single-precision floating-point values to all elements of a register
Double Precision Floating-point Operands
VBROADCASTSDℹ️Broadcast a double-precision floating-point value to all elements of a register
VBROADCASTF64X2ℹ️Broadcast two double-precision floating-point values to all elements of a register
VBROADCASTF64X4ℹ️Broadcast four double-precision floating-point values to all elements of a register
128-bit Integer Operands
VBROADCASTI128ℹ️Broadcast 128 bits of integer data in memory to the low and high 128 bits of a YMM register
128-bit Floating-point Operands
VBROADCASTF128ℹ️Broadcast 128 bits of floating-point data in memory to the low and high 128 bits of a YMM register
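
A broadcast repeats a source group (one element, or a tuple of two, four or eight elements) until the destination is full. The sketch below models that with Python lists; the name `broadcast` and the choice to give the destination size as an element count are assumptions for illustration.

```python
def broadcast(group, dest_elements):
    """Repeat `group` until the destination holds `dest_elements` elements."""
    if dest_elements % len(group) != 0:
        raise ValueError("destination size must be a multiple of the group size")
    return list(group) * (dest_elements // len(group))

# One double word to all 8 elements of a 256-bit register (VPBROADCASTD style).
print(broadcast([7], 8))            # [7, 7, 7, 7, 7, 7, 7, 7]
# Four double words to a 512-bit register (VBROADCASTI32X4 style).
print(broadcast([1, 2, 3, 4], 16))
```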

Expand Instructions

Instruction📄Meaning
Integer Operands
VPEXPANDBℹ️Load sparse packed byte integer values from dense memory/register
VPEXPANDWℹ️Load sparse packed word integer values from dense memory/register
VPEXPANDDℹ️Load sparse packed double word integer values from dense memory/register
VPEXPANDQℹ️Load sparse packed quad word integer values from dense memory/register
Floating-point Operands
VEXPANDPSℹ️Load sparse packed single-precision floating-point values from dense memory
VEXPANDPDℹ️Load sparse packed double-precision floating-point values from dense memory

Compress Instructions

Instruction📄Meaning
Integer Operands
VPCOMPRESSBℹ️Store sparse packed byte integer values into dense memory/register
VPCOMPRESSWℹ️Store sparse packed word integer values into dense memory/register
VPCOMPRESSDℹ️Store sparse packed double word integer values into dense memory/register
VPCOMPRESSQℹ️Store sparse packed quad word integer values into dense memory/register
Floating-point Operands
VCOMPRESSPSℹ️Store sparse packed single-precision floating-point values into dense memory
VCOMPRESSPDℹ️Store sparse packed double-precision floating-point values into dense memory
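
Expand and compress are inverse operations controlled by a writemask. The following Python sketch is a minimal model of both, assuming zeroing behaviour for the elements that are not written (the real instructions can also merge with the old destination). The function names are hypothetical.

```python
def compress(src, mask):
    """Pack the elements of src whose mask bit is 1 into the low positions."""
    picked = [value for i, value in enumerate(src) if (mask >> i) & 1]
    return picked + [0] * (len(src) - len(picked))

def expand(src, mask):
    """Spread consecutive low elements of src into the positions whose mask bit is 1."""
    out = []
    next_src = 0
    for i in range(len(src)):
        if (mask >> i) & 1:
            out.append(src[next_src])
            next_src += 1
        else:
            out.append(0)
    return out

data = [10, 11, 12, 13, 14, 15, 16, 17]
mask = 0b10100110                     # elements 1, 2, 5 and 7 are selected
dense = compress(data, mask)
print(dense)                          # [11, 12, 15, 17, 0, 0, 0, 0]
print(expand(dense, mask))            # [0, 11, 12, 0, 0, 15, 0, 17]
```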

Insert Instructions

Instruction📄Meaning
Integer Operands
VPINSRBℹ️Insert a byte value from a register or memory into an XMM register
VPINSRWℹ️Insert a word value from a register or memory into an XMM register
VPINSRDℹ️Insert a double word value from a register or memory into an XMM register
VPINSRQℹ️Insert a quad word value from a register or memory into an XMM register
VINSERTI128ℹ️Insert 128 bits of packed integer values from the source into the destination operand
VINSERTI32X4ℹ️Insert 128 bits of packed integer values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTI64X2ℹ️Insert 128 bits of packed integer values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTI32X8ℹ️Insert 256 bits of packed integer values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTI64X4ℹ️Insert 256 bits of packed integer values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
Floating-point Operands
VINSERTPSℹ️Insert a single-precision floating-point value, taken from either a 32-bit memory location or a specified offset in an XMM register, at a specified offset in the destination XMM register. In addition, VINSERTPS allows zeroing out selected data elements in the destination, using a mask
VINSERTF128ℹ️Insert 128 bits of packed floating-point values from the source into the destination operand
VINSERTF32X4ℹ️Insert 128 bits of packed floating-point values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTF64X2ℹ️Insert 128 bits of packed floating-point values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTF32X8ℹ️Insert 256 bits of packed floating-point values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTF64X4ℹ️Insert 256 bits of packed floating-point values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand

Extract Instructions

Instruction📄Meaning
Integer Operands
VPEXTRBℹ️Extract a byte from an XMM register and store the value in a general-purpose register or memory
VPEXTRWℹ️Extract a word from an XMM register and store the value in a general-purpose register or memory
VPEXTRDℹ️Extract a double word from an XMM register and store the value in a general-purpose register or memory
VPEXTRQℹ️Extract a quad word from an XMM register and store the value in a general-purpose register or memory
VEXTRACTI128ℹ️Extract 128 bits of packed integer values from the source operand and store them to the low 128 bits of the destination operand
VEXTRACTI32X4ℹ️Extract 128 bits of packed integer values from the source operand at a 128-bit granular offset and store them to the low 128 bits of the destination operand
VEXTRACTI64X2ℹ️Extract 128 bits of packed integer values from the source operand at a 128-bit granular offset and store them to the low 128 bits of the destination operand
VEXTRACTI32X8ℹ️Extract 256 bits of packed integer values from the source operand at a 256-bit granular offset and store them to the low 256 bits of the destination operand
VEXTRACTI64X4ℹ️Extract 256 bits of packed integer values from the source operand at a 256-bit granular offset and store them to the low 256 bits of the destination operand
Floating-point Operands
VEXTRACTPSℹ️Extract a single-precision floating-point value from a specified offset in an XMM register and store the result to memory or a general-purpose register
VEXTRACTF128ℹ️Extract 128 bits of packed floating-point values from the source operand and store them to the low 128 bits of the destination operand
VEXTRACTF32X4ℹ️Extract 128 bits of packed floating-point values from the source operand at a 128-bit granular offset and store them to the low 128 bits of the destination operand
VEXTRACTF64X2ℹ️Extract 128 bits of packed floating-point values from the source operand at a 128-bit granular offset and store them to the low 128 bits of the destination operand
VEXTRACTF32X8ℹ️Extract 256 bits of packed floating-point values from the source operand at a 256-bit granular offset and store them to the low 256 bits of the destination operand
VEXTRACTF64X4ℹ️Extract 256 bits of packed floating-point values from the source operand at a 256-bit granular offset and store them to the low 256 bits of the destination operand
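
The 128-bit insert and extract forms select one 128-bit lane of a wider register by a small immediate. The sketch below models a register as a list of 32-bit elements, so one lane is four consecutive list entries; `extract128` and `insert128` are hypothetical names used only for this illustration.

```python
ELEMENTS_PER_LANE = 4  # four 32-bit elements make one 128-bit lane

def extract128(src, lane_index):
    """Return lane `lane_index` of src as a new 4-element list."""
    start = ELEMENTS_PER_LANE * lane_index
    return src[start:start + ELEMENTS_PER_LANE]

def insert128(first_src, lane, lane_index):
    """Copy first_src and overwrite lane `lane_index` with `lane`."""
    out = list(first_src)
    start = ELEMENTS_PER_LANE * lane_index
    out[start:start + ELEMENTS_PER_LANE] = lane
    return out

ymm = list(range(8))
print(extract128(ymm, 1))               # [4, 5, 6, 7]
print(insert128(ymm, [9, 9, 9, 9], 0))  # [9, 9, 9, 9, 4, 5, 6, 7]
```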

Gather Instructions

Instruction📄Meaning
Double Word Operands
VPGATHERDDℹ️Gather packed double word values using signed double word indices
VPGATHERQDℹ️Gather packed double word values using signed quad word indices
Quad Word Operands
VPGATHERDQℹ️Gather packed quad word values using signed double word indices
VPGATHERQQℹ️Gather packed quad word values using signed quad word indices
Single Precision Floating-point Operands
VGATHERDPSℹ️Gather packed single-precision floating-point values using signed double word indices
VGATHERQPSℹ️Gather packed single-precision floating-point values using signed quad word indices
VGATHERPF0DPSℹ️Sparse prefetch of packed single-precision floating-point values with signed double word indices using T0 hint
VGATHERPF1DPSℹ️Sparse prefetch of packed single-precision floating-point values with signed double word indices using T1 hint
VGATHERPF0QPSℹ️Sparse prefetch of packed single-precision floating-point values with signed quad word indices using T0 hint
VGATHERPF1QPSℹ️Sparse prefetch of packed single-precision floating-point values with signed quad word indices using T1 hint
Double Precision Floating-point Operands
VGATHERDPDℹ️Gather packed double-precision floating-point values using signed double word indices
VGATHERQPDℹ️Gather packed double-precision floating-point values using signed quad word indices
VGATHERPF0DPDℹ️Sparse prefetch of packed double-precision floating-point values with signed double word indices using T0 hint
VGATHERPF1DPDℹ️Sparse prefetch of packed double-precision floating-point values with signed double word indices using T1 hint
VGATHERPF0QPDℹ️Sparse prefetch of packed double-precision floating-point values with signed quad word indices using T0 hint
VGATHERPF1QPDℹ️Sparse prefetch of packed double-precision floating-point values with signed quad word indices using T1 hint
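
A gather loads each destination element from its own memory location, chosen by the matching index element. The Python sketch below is a simplified model: memory is a list indexed in whole elements (so base address and scale are folded into the index), the mask is an integer with one bit per element, and indices are assumed non-negative. The name `gather` is hypothetical.

```python
def gather(memory, indices, mask, dest):
    """dest[i] = memory[indices[i]] where mask bit i is set;
    other elements keep their previous value."""
    return [memory[index] if (mask >> i) & 1 else old
            for i, (index, old) in enumerate(zip(indices, dest))]

memory = list(range(100, 108))
print(gather(memory, [7, 0, 3, 3], 0b1011, [0, 0, 0, 0]))  # [107, 100, 0, 103]
```

The model leaves out side effects of the real instructions, such as the mask being cleared as elements complete.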

Scatter Instructions

Instruction📄Meaning
Double Word Operands
VPSCATTERDDℹ️Using signed double word indices, scatter double word values to memory using writemask
VPSCATTERQDℹ️Using signed quad word indices, scatter double word values to memory using writemask
Quad Word Operands
VPSCATTERDQℹ️Using signed double word indices, scatter quad word values to memory using writemask
VPSCATTERQQℹ️Using signed quad word indices, scatter quad word values to memory using writemask
Single Precision Floating-point Operands
VSCATTERDPSℹ️Using signed double word indices, scatter single-precision floating-point values to memory using writemask
VSCATTERQPSℹ️Using signed quad word indices, scatter single-precision floating-point values to memory using writemask
VSCATTERPF0DPSℹ️Using signed double word indices, prefetch sparse single-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1DPSℹ️Using signed double word indices, prefetch sparse single-precision floating-point values using writemask and T1 hint with intent to write
VSCATTERPF0QPSℹ️Using signed quad word indices, prefetch sparse single-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1QPSℹ️Using signed quad word indices, prefetch sparse single-precision floating-point values using writemask and T1 hint with intent to write
Double Precision Floating-point Operands
VSCATTERDPDℹ️Using signed double word indices, scatter double-precision floating-point values to memory using writemask
VSCATTERQPDℹ️Using signed quad word indices, scatter double-precision floating-point values to memory using writemask
VSCATTERPF0DPDℹ️Using signed double word indices, prefetch sparse double-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1DPDℹ️Using signed double word indices, prefetch sparse double-precision floating-point values using writemask and T1 hint with intent to write
VSCATTERPF0QPDℹ️Using signed quad word indices, prefetch sparse double-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1QPDℹ️Using signed quad word indices, prefetch sparse double-precision floating-point values using writemask and T1 hint with intent to write
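
A scatter is the store counterpart of a gather: each selected source element is written to the memory location named by its index. The sketch below uses the same simplifications as a list-based memory model (indices count whole elements and are assumed non-negative); the name `scatter` is hypothetical. Elements are written from the lowest to the highest position, so when two indices collide the higher-numbered element ends up in memory.

```python
def scatter(memory, indices, values, mask):
    """Return a copy of memory with values[i] stored at indices[i]
    for every element whose mask bit is set."""
    out = list(memory)
    for i, (index, value) in enumerate(zip(indices, values)):
        if (mask >> i) & 1:
            out[index] = value
    return out

# Elements 1 and 2 both target index 1; the later one (30) wins.
print(scatter([0] * 6, [5, 1, 1, 2], [10, 20, 30, 40], 0b1111))  # [0, 30, 40, 0, 0, 10]
```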

Blending Instructions

Instruction📄Meaning
Byte Operands
VPBLENDVBℹ️Conditionally copies specified byte elements in the source operand to the destination, using a mask held in a register operand
VPBLENDMBℹ️Performs blending of byte elements between the first and the second operand (register or memory), using the instruction mask selector
Word Operands
VPBLENDWℹ️Conditionally copies specified word elements in the source operand to the destination, using an immediate byte control
VPBLENDMWℹ️Performs blending of word elements between the first and the second operand (register or memory), using the instruction mask selector
Double Word Operands
VPBLENDDℹ️Conditionally copies specified double word elements in the source operand to the destination, using an immediate byte control
VPBLENDMDℹ️Performs blending of double word elements between the first and the second operand (register or memory), using the instruction mask selector
Quad Word Operands
VPBLENDMQℹ️Performs blending of quad word elements between the first and the second operand (register or memory), using the instruction mask selector
Single Precision Floating-point Operands
VBLENDPSℹ️Conditionally copies specified data elements in the source operand to the destination, using an immediate byte control
VBLENDVPSℹ️Conditionally copies specified data elements in the source operand to the destination, using a mask held in a register operand
VBLENDMPSℹ️Performs blending between single-precision elements in the first operand with the elements in the second operand using an opmask register as select control
Double Precision Floating-point Operands
VBLENDPDℹ️Conditionally copies specified data elements in the source operand to the destination, using an immediate byte control
VBLENDVPDℹ️Conditionally copies specified data elements in the source operand to the destination, using a mask held in a register operand
VBLENDMPDℹ️Performs blending between double-precision elements in the first operand with the elements in the second operand using an opmask register as select control
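
All blend forms make a per-element choice between two sources; they differ only in where the selector bits come from (an immediate byte, a mask register operand, or an opmask register). The following Python sketch models the immediate-controlled case with one selector bit per element. The name `blend` is an assumption for this example.

```python
def blend(src1, src2, selector):
    """Element i comes from src2 when bit i of selector is 1, else from src1."""
    return [b if (selector >> i) & 1 else a
            for i, (a, b) in enumerate(zip(src1, src2))]

low = [0, 1, 2, 3, 4, 5, 6, 7]
high = [10, 11, 12, 13, 14, 15, 16, 17]
print(blend(low, high, 0b11110000))  # [0, 1, 2, 3, 14, 15, 16, 17]
```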

Shuffle Instructions

Shuffle instructions shuffle values in packed SIMD operands.

Instruction📄Meaning
Bit Operands
VPSHUFBITQMBℹ️Shuffle bits from quad word elements using byte indexes into mask
Byte Operands
VPSHUFBℹ️Shuffle packed byte values
Word Operands
VPSHUFLWℹ️Shuffle packed low words
VPSHUFHWℹ️Shuffle packed high words
Double Word Operands
VPSHUFDℹ️Shuffle packed double words
VSHUFI32X4ℹ️Shuffle 128-bit packed double word values
Quad Word Operands
VSHUFI64X2ℹ️Shuffle 128-bit packed quad word values
Single Precision Floating-point Operands
VSHUFPSℹ️Shuffles values in packed single-precision floating-point operands
VSHUFF32X4ℹ️Shuffle 128-bit packed single-precision floating-point operands
Double Precision Floating-point Operands
VSHUFPDℹ️Shuffles values in packed double-precision floating-point operands
VSHUFF64X2ℹ️Shuffle 128-bit packed double-precision floating-point operands
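
As an example of how an immediate byte drives a shuffle, the sketch below models the double word shuffle: each pair of bits in the immediate picks one of the four double words of the same 128-bit lane, and the same selection is applied to every lane. The function name `pshufd` mirrors the mnemonic but is only a Python model.

```python
def pshufd(src, imm8):
    """Shuffle 32-bit elements within each 128-bit lane (4 elements per lane).
    Bits 2*i+1:2*i of imm8 select the source element for position i."""
    out = []
    for lane_start in range(0, len(src), 4):
        lane = src[lane_start:lane_start + 4]
        out.extend(lane[(imm8 >> (2 * i)) & 3] for i in range(4))
    return out

# 0x1B = 0b00_01_10_11 reverses the elements of each lane.
print(pshufd([0, 1, 2, 3, 4, 5, 6, 7], 0x1B))  # [3, 2, 1, 0, 7, 6, 5, 4]
```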

Permute Instructions

Instruction📄Meaning
Byte Operands
VPERMBℹ️Permute packed byte elements
VPERMI2Bℹ️Permute packed byte elements from two tables using indexes
VPERMT2Bℹ️Full permute of two tables of byte elements overwriting one source table
Word Operands
VPERMWℹ️Permute packed word elements
VPERMI2Wℹ️Permute packed word elements from two tables using indexes
VPERMT2Wℹ️Full permute of two tables of word elements overwriting one source table
Double Word Operands
VPERMDℹ️Permute packed double word elements
VPERMI2Dℹ️Permute packed double word elements from two tables using indexes
VPERMT2Dℹ️Full permute of two tables of double word elements overwriting one source table
Quad Word Operands
VPERMQℹ️Permute packed quad word elements
VPERMI2Qℹ️Permute packed quad word elements from two tables using indexes
VPERMT2Qℹ️Full permute of two tables of quad word elements overwriting one source table
128-bit Integer Operands
VPERM2I128ℹ️Permute 128-bit integer fields using controls
Single Precision Floating-point Operands
VPERMPSℹ️Permute packed single-precision floating-point elements
VPERMILPSℹ️Permute packed single-precision floating-point elements using controls
VPERMI2PSℹ️Permute packed single-precision elements from two tables using indexes
VPERMT2PSℹ️Full permute of two tables of single-precision floating-point elements overwriting one source table
Double Precision Floating-point Operands
VPERMPDℹ️Permute packed double-precision floating-point elements
VPERMILPDℹ️Permute packed double-precision floating-point elements using controls
VPERMI2PDℹ️Permute packed double-precision elements from two tables using indexes
VPERMT2PDℹ️Full permute of two tables of double-precision floating-point elements overwriting one source table
128-bit Floating-point Operands
VPERM2F128ℹ️Permute 128-bit floating-point fields using controls
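
Unlike the in-lane shuffles, the variable permutes can move any element to any position, and the two-table forms pick from two source registers at once. The Python sketch below is a minimal model, assuming the element count is a power of two: the low bits of each index select the element and, in the two-table form, the next bit selects the table. The names `permute` and `permute2` are hypothetical.

```python
def permute(src, indexes):
    """dest[i] = src[indexes[i]], using only the low index bits."""
    n = len(src)
    return [src[index & (n - 1)] for index in indexes]

def permute2(table_a, table_b, indexes):
    """Two-table permute: the bit just above the element index selects table_b."""
    n = len(table_a)
    return [(table_b if index & n else table_a)[index & (n - 1)]
            for index in indexes]

print(permute([10, 11, 12, 13, 14, 15, 16, 17], [7, 7, 0, 1, 2, 3, 4, 5]))
# [17, 17, 10, 11, 12, 13, 14, 15]
print(permute2([0, 1, 2, 3], [10, 11, 12, 13], [0, 4, 1, 5]))
# [0, 10, 1, 11]
```

The I2 and T2 variants compute the same result; they differ in which operand (the index register or one of the tables) is overwritten by it.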

Unpack Instructions

Unpack instructions interleave values in packed SIMD operands.

Instruction📄Meaning
Byte Operands
VPUNPCKLBWℹ️Unpack low-order bytes
VPUNPCKHBWℹ️Unpack high-order bytes
Word Operands
VPUNPCKLWDℹ️Unpack low-order words
VPUNPCKHWDℹ️Unpack high-order words
Double Word Operands
VPUNPCKLDQℹ️Unpack low-order double words
VPUNPCKHDQℹ️Unpack high-order double words
Quad Word Operands
VPUNPCKLQDQℹ️Unpack low quad words
VPUNPCKHQDQℹ️Unpack high quad words
Single Precision Floating-point Operands
VUNPCKLPSℹ️Unpacks and interleaves the two low-order values from two single-precision floating-point operands
VUNPCKHPSℹ️Unpacks and interleaves the two high-order values from two single-precision floating-point operands
Double Precision Floating-point Operands
VUNPCKLPDℹ️Unpacks and interleaves the low values from two packed double-precision floating-point operands
VUNPCKHPDℹ️Unpacks and interleaves the high values from two packed double-precision floating-point operands
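
The interleaving done by the unpack instructions can be sketched as follows. The model treats registers as lists, works on one 128-bit lane at a time, and defaults to four 32-bit elements per lane; the name `unpack` and its parameters are assumptions for this illustration.

```python
def unpack(a, b, high, per_lane=4):
    """Interleave the low (or high) half of each lane of a with the same half of b."""
    half = per_lane // 2
    out = []
    for lane_start in range(0, len(a), per_lane):
        first = lane_start + (half if high else 0)
        for i in range(first, first + half):
            out += [a[i], b[i]]
    return out

a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10, 11, 12, 13, 14, 15, 16, 17]
print(unpack(a, b, high=False))  # [0, 10, 1, 11, 4, 14, 5, 15]
print(unpack(a, b, high=True))   # [2, 12, 3, 13, 6, 16, 7, 17]
```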

Pack Instructions

The pack instructions narrow words into bytes and double words into words, saturating values that do not fit.

Instruction📄Meaning
Words into Bytes
VPACKSSWBℹ️Pack words into bytes with signed saturation
VPACKUSWBℹ️Pack words into bytes with unsigned saturation
Double Words into Words
VPACKSSDWℹ️Pack double words into words with signed saturation
VPACKUSDWℹ️Pack double words into words with unsigned saturation
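
Saturation means that a value outside the range of the narrower type is clamped to the nearest representable value instead of wrapping around. A minimal Python model of the word-to-byte forms, assuming a single 128-bit lane (the wider forms repeat the operation per lane) and using hypothetical function names:

```python
def saturate(value, lowest, highest):
    """Clamp value into the inclusive range [lowest, highest]."""
    return max(lowest, min(highest, value))

def pack_signed_words_to_bytes(a, b):
    """Signed saturation: results lie in -128..127 (VPACKSSWB style)."""
    return [saturate(word, -128, 127) for word in list(a) + list(b)]

def pack_unsigned_words_to_bytes(a, b):
    """Unsigned saturation of signed words: results lie in 0..255 (VPACKUSWB style)."""
    return [saturate(word, 0, 255) for word in list(a) + list(b)]

print(pack_signed_words_to_bytes([300, -300], [5, 127]))    # [127, -128, 5, 127]
print(pack_unsigned_words_to_bytes([300, -300], [5, 200]))  # [255, 0, 5, 200]
```

The elements of the first source fill the low half of the result and those of the second source fill the high half.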

Conversion Instructions

These instructions perform conversion operations on operands of different types.

Instruction📄Meaning
Byte to Word
VPMOVSXBWℹ️Sign extend the lower 8-bit integer of each packed word element into packed signed word integers
VPMOVZXBWℹ️Zero extend the lower 8-bit integer of each packed word element into packed signed word integers
Byte to Double Word
VPMOVSXBDℹ️Sign extend the lower 8-bit integer of each packed double word element into packed signed double word integers
VPMOVZXBDℹ️Zero extend the lower 8-bit integer of each packed double word element into packed signed double word integers
Byte to Quad Word
VPMOVSXBQℹ️Sign extend the lower 8-bit integer of each packed quad word element into packed signed quad word integers
VPMOVZXBQℹ️Zero extend the lower 8-bit integer of each packed quad word element into packed signed quad word integers
Word to Byte
VPMOVWBℹ️Converts packed word integers into packed bytes with truncation
VPMOVSWBℹ️Converts packed signed word integers into packed signed bytes using signed saturation
VPMOVUSWBℹ️Converts packed unsigned word integers into packed unsigned bytes using unsigned saturation
Word to Double Word
VPMOVSXWDℹ️Sign extend the lower 16-bit integer of each packed double word element into packed signed double word integers
VPMOVZXWDℹ️Zero extend the lower 16-bit integer of each packed double word element into packed signed double word integers
Word to Quad Word
VPMOVSXWQℹ️Sign extend the lower 16-bit integer of each packed quad word element into packed signed quad word integers
VPMOVZXWQℹ️Zero extend the lower 16-bit integer of each packed quad word element into packed signed quad word integers
Double Word to Byte
VPMOVDBℹ️Converts packed double word integers into packed bytes with truncation
VPMOVSDBℹ️Converts packed signed double word integers into packed signed bytes using signed saturation
VPMOVUSDBℹ️Converts packed unsigned double word integers into packed unsigned bytes using unsigned saturation
Double Word to Word
VPMOVDWℹ️Converts packed double word integers into packed words with truncation
VPMOVSDWℹ️Converts packed signed double word integers into packed signed words using signed saturation
VPMOVUSDWℹ️Converts packed unsigned double word integers into packed unsigned words using unsigned saturation
Double Word to Quad Word
VPMOVSXDQℹ️Sign extend the lower 32-bit integer of each packed quad word element into packed signed quad word integers
VPMOVZXDQℹ️Zero extend the lower 32-bit integer of each packed quad word element into packed signed quad word integers
Quad Word to Byte
VPMOVQBℹ️Converts packed quad word integers into packed bytes with truncation
VPMOVSQBℹ️Converts packed signed quad word integers into packed signed bytes using signed saturation
VPMOVUSQBℹ️Converts packed unsigned quad word integers into packed unsigned bytes using unsigned saturation
Quad Word to Word
VPMOVQWℹ️Converts packed quad word integers into packed words with truncation
VPMOVSQWℹ️Converts packed signed quad word integers into packed signed words using signed saturation
VPMOVUSQWℹ️Converts packed unsigned quad word integers into packed unsigned words using unsigned saturation
Quad Word to Double Word
VPMOVQDℹ️Converts packed quad word integers into packed double words with truncation
VPMOVSQDℹ️Converts packed signed quad word integers into packed signed double words using signed saturation
VPMOVUSQDℹ️Converts packed unsigned quad word integers into packed unsigned double words using unsigned saturation
Double Word to Single Precision Floating-point
VCVTSI2SSℹ️Convert scalar signed double word integer to scalar single-precision floating-point value
VCVTUSI2SSℹ️Convert scalar unsigned double word integer to scalar single-precision floating-point value
VCVTDQ2PSℹ️Convert packed signed double word integers to packed single-precision floating-point values
VCVTUDQ2PSℹ️Convert packed unsigned double word integers to packed single-precision floating-point values
Double Word to Double Precision Floating-point
VCVTSI2SDℹ️Convert scalar signed double word integer to scalar double-precision floating-point value
VCVTUSI2SDℹ️Convert scalar unsigned double word integer to scalar double-precision floating-point value
VCVTDQ2PDℹ️Convert packed signed double word integers to packed double-precision floating-point values
VCVTUDQ2PDℹ️Convert packed unsigned double word integers to packed double-precision floating-point values
Quad Word to Single Precision Floating-point
VCVTSI2SSℹ️Convert scalar signed quad word integer to scalar single-precision floating-point value
VCVTUSI2SSℹ️Convert scalar unsigned quad word integer to scalar single-precision floating-point value
VCVTQQ2PSℹ️Convert packed signed quad word integers to packed single-precision floating-point values
VCVTUQQ2PSℹ️Convert packed unsigned quad word integers to packed single-precision floating-point values
Quad Word to Double Precision Floating-point
VCVTSI2SDℹ️Convert scalar signed quad word integer to scalar double-precision floating-point value
VCVTUSI2SDℹ️Convert scalar unsigned quad word integer to scalar double-precision floating-point value
VCVTQQ2PDℹ️Convert packed signed quad word integers to packed double-precision floating-point values
VCVTUQQ2PDℹ️Convert packed unsigned quad word integers to packed double-precision floating-point values
Half Precision Floating-point to Single Precision Floating-point
VCVTPH2PSℹ️Convert eight/four data elements containing 16-bit floating-point values into eight/four single-precision floating-point values
Single Precision Floating-point to Double Word
VCVTSS2SIℹ️Convert scalar single-precision floating-point value to scalar signed double word integer
VCVTSS2USIℹ️Convert scalar single-precision floating-point value to scalar unsigned double word integer
VCVTPS2DQℹ️Convert packed single-precision floating-point values to packed signed double word integers
VCVTPS2UDQℹ️Convert packed single-precision floating-point values to packed unsigned double word integers
VCVTTSS2SIℹ️Convert with truncation scalar single-precision floating-point value to scalar signed double word integer
VCVTTSS2USIℹ️Convert with truncation scalar single-precision floating-point value to scalar unsigned double word integer
VCVTTPS2DQℹ️Convert with truncation packed single-precision floating-point values to packed signed double word integers
VCVTTPS2UDQℹ️Convert with truncation packed single-precision floating-point values to packed unsigned double word integers
Single Precision Floating-point to Quad Word
VCVTSS2SIℹ️Convert scalar single-precision floating-point value to scalar signed quad word integer
VCVTSS2USIℹ️Convert scalar single-precision floating-point value to scalar unsigned quad word integer
VCVTPS2QQℹ️Convert packed single-precision floating-point values to packed signed quad word integers
VCVTPS2UQQℹ️Convert packed single-precision floating-point values to packed unsigned quad word integers
VCVTTSS2SIℹ️Convert with truncation scalar single-precision floating-point value to scalar signed quad word integer
VCVTTSS2USIℹ️Convert with truncation scalar single-precision floating-point value to scalar unsigned quad word integer
VCVTTPS2QQℹ️Convert with truncation packed single-precision floating-point values to packed signed quad word integers
VCVTTPS2UQQℹ️Convert with truncation packed single-precision floating-point values to packed unsigned quad word integers
Single Precision Floating-point to Half Precision Floating-point
VCVTPS2PHℹ️Convert eight/four data elements containing single-precision floating-point values into eight/four 16-bit floating-point values
Single Precision Floating-point to Double Precision Floating-point
VCVTSS2SDℹ️Convert scalar single-precision floating-point value to scalar double-precision floating-point value
VCVTPS2PDℹ️Convert packed single-precision floating-point values to packed double-precision floating-point values
Double Precision Floating-point to Double Word
VCVTSD2SIℹ️Convert scalar double-precision floating-point value to scalar signed double word integer
VCVTSD2USIℹ️Convert scalar double-precision floating-point value to scalar unsigned double word integer
VCVTPD2DQℹ️Convert packed double-precision floating-point values to packed signed double word integers
VCVTPD2UDQℹ️Convert packed double-precision floating-point values to packed unsigned double word integers
VCVTTSD2SIℹ️Convert with truncation scalar double-precision floating-point value to scalar signed double word integer
VCVTTSD2USIℹ️Convert with truncation scalar double-precision floating-point value to scalar unsigned double word integer
VCVTTPD2DQℹ️Convert with truncation packed double-precision floating-point values to packed signed double word integers
VCVTTPD2UDQℹ️Convert with truncation packed double-precision floating-point values to packed unsigned double word integers
Double Precision Floating-point to Quad Word
VCVTSD2SIℹ️Convert scalar double-precision floating-point value to scalar signed quad word integer
VCVTSD2USIℹ️Convert scalar double-precision floating-point value to scalar unsigned quad word integer
VCVTPD2QQℹ️Convert packed double-precision floating-point values to packed signed quad word integers
VCVTPD2UQQℹ️Convert packed double-precision floating-point values to packed unsigned quad word integers
VCVTTSD2SIℹ️Convert with truncation scalar double-precision floating-point value to scalar signed quad word integer
VCVTTSD2USIℹ️Convert with truncation scalar double-precision floating-point value to scalar unsigned quad word integer
VCVTTPD2QQℹ️Convert with truncation packed double-precision floating-point values to packed signed quad word integers
VCVTTPD2UQQℹ️Convert with truncation packed double-precision floating-point values to packed unsigned quad word integers
Double Precision Floating-point to Single Precision Floating-point
VCVTSD2SSℹ️Convert scalar double-precision floating-point value to scalar single-precision floating-point value
VCVTPD2PSℹ️Convert packed double-precision floating-point values to packed single-precision floating-point values
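
The difference between the plain and the "with truncation" conversions above can be illustrated with a small scalar model in C. This is only a sketch: the function names `cvt_model` and `cvtt_model` are hypothetical, the rounding variant assumes the default MXCSR mode (round to nearest, ties to even), and out-of-range or NaN inputs are not modeled.

```c
#include <stdint.h>

/* Truncating conversion (VCVTT* style): the fraction is dropped,
   i.e. the value is rounded toward zero. Input assumed in int32 range. */
static int32_t cvtt_model(double x) {
    return (int32_t)x;
}

/* Rounding conversion (VCVT* style), assuming round to nearest with
   ties to even. Input assumed in int32 range. */
static int32_t cvt_model(double x) {
    int32_t t = (int32_t)x;          /* rounded toward zero */
    double frac = x - (double)t;     /* same sign as x, magnitude below 1 */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        return t - 1;
    return t;
}
```

For example, `cvtt_model(2.7)` gives 2 while `cvt_model(2.7)` gives 3, and the tie cases 2.5 and 3.5 round to 2 and 4 respectively.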

Logical Instructions

The logical instructions perform AND, AND NOT, OR, and XOR operations on packed SIMD values.

Instruction📄Meaning
Byte Operands
VPTESTMBℹ️Performs a bitwise logical AND of packed byte integers and sets the mask bit of each element whose result is non-zero
VPTESTNMBℹ️Performs a bitwise logical AND of packed byte integers and sets the mask bit of each element whose result is zero
Word Operands
VPTESTMWℹ️Performs a bitwise logical AND of packed word integers and sets the mask bit of each element whose result is non-zero
VPTESTNMWℹ️Performs a bitwise logical AND of packed word integers and sets the mask bit of each element whose result is zero
Double Word Operands
VPTESTMDℹ️Performs a bitwise logical AND of packed double word integers and sets the mask bit of each element whose result is non-zero
VPTESTNMDℹ️Performs a bitwise logical AND of packed double word integers and sets the mask bit of each element whose result is zero
VPANDDℹ️Bitwise logical AND of packed double word integers
VPANDNDℹ️Bitwise logical AND NOT of packed double word integers
VPORDℹ️Bitwise logical OR of packed double word integers
VPXORDℹ️Bitwise logical exclusive OR of packed double word integers
VPTERNLOGDℹ️Bitwise ternary logic with double word granularity. The immediate value determines the specific binary function being implemented
Quad Word Operands
VPTESTMQℹ️Performs a bitwise logical AND of packed quad word integers and sets the mask bit of each element whose result is non-zero
VPTESTNMQℹ️Performs a bitwise logical AND of packed quad word integers and sets the mask bit of each element whose result is zero
VPANDQℹ️Bitwise logical AND of packed quad word integers
VPANDNQℹ️Bitwise logical AND NOT of packed quad word integers
VPORQℹ️Bitwise logical OR of packed quad word integers
VPXORQℹ️Bitwise logical exclusive OR of packed quad word integers
VPTERNLOGQℹ️Bitwise ternary logic with quad word granularity. The immediate value determines the specific binary function being implemented
Integer Operands
VPTESTℹ️Performs a bitwise logical AND of the two operands and sets the ZF flag if the result is all zeros. The CF flag is set if the bitwise AND of the second operand with the inverted first operand is all zeros
VPANDℹ️Bitwise logical AND
VPANDNℹ️Bitwise logical AND NOT
VPORℹ️Bitwise logical OR
VPXORℹ️Bitwise logical exclusive OR
Single Precision Floating-point Operands
VTESTPSℹ️Packed bit test of single-precision floating-point elements
VANDPSℹ️Perform bitwise logical AND of packed single-precision floating-point values
VANDNPSℹ️Perform bitwise logical AND NOT of packed single-precision floating-point values
VORPSℹ️Perform bitwise logical OR of packed single-precision floating-point values
VXORPSℹ️Perform bitwise logical XOR of packed single-precision floating-point values
Double Precision Floating-point Operands
VTESTPDℹ️Packed bit test of double-precision floating-point elements
VANDPDℹ️Perform bitwise logical AND of packed double-precision floating-point values
VANDNPDℹ️Perform bitwise logical AND NOT of packed double-precision floating-point values
VORPDℹ️Perform bitwise logical OR of packed double-precision floating-point values
VXORPDℹ️Perform bitwise logical XOR of packed double-precision floating-point values
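
How the immediate of VPTERNLOGD/VPTERNLOGQ selects a function can be shown with a scalar model in C. The sketch below treats the immediate as an 8-entry truth table indexed, for every bit position, by the three input bits, with the first operand supplying the most significant index bit. The function name `ternlog32` is hypothetical and models a single double word element only; masking and broadcasting are not covered.

```c
#include <stdint.h>

/* Scalar model of one 32-bit element of a ternary logic operation.
   For each bit position, bits of a, b and c form a 3-bit index
   (a is the most significant), and the result bit is imm8[index]. */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8) {
    uint32_t r = 0;
    for (int bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << bit;
    }
    return r;
}
```

Under this model the immediate 0x96 yields a three-way XOR, 0xCA yields a bitwise select (`a ? b : c`), 0x80 a three-way AND, and 0xFE a three-way OR.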

Shift and Rotate Instructions

The shift and rotate instructions shift and rotate packed words, double words, and quad words in SIMD registers; the double quad word forms shift a whole 128-bit lane by a number of bytes.

Instruction📄Meaning
Word Operands
VPSLLWℹ️Shift packed words left logical
VPSRLWℹ️Shift packed words right logical
VPSRAWℹ️Shift packed words right arithmetic
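VPSLLVWℹ️Shift packed words left logical by per-element variable bit counts
VPSRLVWℹ️Shift packed words right logical by per-element variable bit counts
VPSRAVWℹ️Shift packed words right arithmetic by per-element variable bit counts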
VPSHLDWℹ️Concatenate and shift packed words left logical
VPSHRDWℹ️Concatenate and shift packed words right logical
VPSHLDVWℹ️Concatenate and variable shift packed words left logical
VPSHRDVWℹ️Concatenate and variable shift packed words right logical
Double Word Operands
VPSLLDℹ️Shift packed double words left logical
VPSRLDℹ️Shift packed double words right logical
VPSRADℹ️Shift packed double words right arithmetic
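VPSLLVDℹ️Shift packed double words left logical by per-element variable bit counts
VPSRLVDℹ️Shift packed double words right logical by per-element variable bit counts
VPSRAVDℹ️Shift packed double words right arithmetic by per-element variable bit counts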
VPSHLDDℹ️Concatenate and shift packed double words left logical
VPSHRDDℹ️Concatenate and shift packed double words right logical
VPSHLDVDℹ️Concatenate and variable shift packed double words left logical
VPSHRDVDℹ️Concatenate and variable shift packed double words right logical
VPROLDℹ️Rotate double words left using immediate bits count
VPRORDℹ️Rotate double words right using immediate bits count
VPROLVDℹ️Rotate double words left using variable bits count
VPRORVDℹ️Rotate double words right using variable bits count
VALIGNDℹ️Shift right and merge vectors with double word granularity using immediate shift value
Quad Word Operands
VPSLLQℹ️Shift packed quad words left logical
VPSRLQℹ️Shift packed quad words right logical
VPSRAQℹ️Shift packed quad words right arithmetic
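VPSLLVQℹ️Shift packed quad words left logical by per-element variable bit counts
VPSRLVQℹ️Shift packed quad words right logical by per-element variable bit counts
VPSRAVQℹ️Shift packed quad words right arithmetic by per-element variable bit counts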
VPSHLDQℹ️Concatenate and shift packed quad words left logical
VPSHRDQℹ️Concatenate and shift packed quad words right logical
VPSHLDVQℹ️Concatenate and variable shift packed quad words left logical
VPSHRDVQℹ️Concatenate and variable shift packed quad words right logical
VPROLQℹ️Rotate quad words left using immediate bits count
VPRORQℹ️Rotate quad words right using immediate bits count
VPROLVQℹ️Rotate quad words left using variable bits count
VPRORVQℹ️Rotate quad words right using variable bits count
VALIGNQℹ️Shift right and merge vectors with quad word granularity using immediate shift value
Double Quad Word Operands
VPSLLDQℹ️Shift double quad word left logical by a number of bytes
VPSRLDQℹ️Shift double quad word right logical by a number of bytes
VPALIGNRℹ️Concatenate destination and source operands, extract byte aligned result shifted to the right by constant value
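
The rotate and "concatenate and shift" entries above are closely related, which a scalar C model of a single double word element makes visible. This is a sketch under assumptions: the names `rotl32_model`, `shld32_model` and `shrd32_model` are hypothetical, the shift count is taken modulo 32, and which instruction operand supplies the high or low half of the concatenation should be checked against the instruction reference.

```c
#include <stdint.h>

/* Rotate left (VPROLD style): bits shifted out on the left
   re-enter on the right. Count is taken modulo 32. */
static uint32_t rotl32_model(uint32_t x, unsigned n) {
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Concatenate and shift left (VPSHLDD style): form hi:lo as a
   64-bit value, shift left, keep the upper 32 bits. */
static uint32_t shld32_model(uint32_t hi, uint32_t lo, unsigned n) {
    uint64_t cat = ((uint64_t)hi << 32) | lo;
    return (uint32_t)((cat << (n & 31)) >> 32);
}

/* Concatenate and shift right (VPSHRDD style): form hi:lo as a
   64-bit value, shift right, keep the lower 32 bits. */
static uint32_t shrd32_model(uint32_t lo, uint32_t hi, unsigned n) {
    uint64_t cat = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(cat >> (n & 31));
}
```

When both halves of the concatenation are the same value, the concatenate-and-shift model reduces to a rotate: `shld32_model(x, x, n)` equals `rotl32_model(x, n)`.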

Comparison Instructions

The compare instructions compare packed and scalar SIMD values and return the results of the comparison either to the destination operand or to the EFLAGS register.

Instruction📄Meaning
Byte Operands
VPCMPEQBℹ️Compare packed bytes for equal
VPCMPGTBℹ️Compare packed signed byte integers for greater than
VPCMPBℹ️Compare packed signed byte values into mask
VPCMPUBℹ️Compare packed unsigned byte values into mask
Word Operands
VPCMPEQWℹ️Compare packed words for equal
VPCMPGTWℹ️Compare packed signed word integers for greater than
VPCMPWℹ️Compare packed signed word values into mask
VPCMPUWℹ️Compare packed unsigned word values into mask
Double Word Operands
VPCMPEQDℹ️Compare packed double words for equal
VPCMPGTDℹ️Compare packed signed double word integers for greater than
VPCMPDℹ️Compare packed signed double word values into mask
VPCMPUDℹ️Compare packed unsigned double word values into mask
VP2INTERSECTDℹ️Compute intersection between double words to a pair of mask registers
Quad Word Operands
VPCMPEQQℹ️Compare packed quad words for equal
VPCMPGTQℹ️Compare packed signed quad word integers for greater than
VPCMPQℹ️Compare packed signed quad word values into mask
VPCMPUQℹ️Compare packed unsigned quad word values into mask
VP2INTERSECTQℹ️Compute intersection between quad words to a pair of mask registers
Single Precision Floating-point Operands
VCMPEQPSℹ️Compare packed single-precision floating-point values and set mask if destination value is equal to source value
VCMPLTPSℹ️Compare packed single-precision floating-point values and set mask if destination value is less than source value
VCMPLEPSℹ️Compare packed single-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTPSℹ️Compare packed single-precision floating-point values and set mask if destination value is greater than source value
VCMPGEPSℹ️Compare packed single-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDPSℹ️Compare packed single-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQPSℹ️Compare packed single-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTPSℹ️Compare packed single-precision floating-point values and set mask if destination value is not less than source value
VCMPNLEPSℹ️Compare packed single-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTPSℹ️Compare packed single-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGEPSℹ️Compare packed single-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDPSℹ️Compare packed single-precision floating-point values and set mask if neither source operand is a NaN
VCMPEQSSℹ️Compare scalar single-precision floating-point values and set mask if destination value is equal to source value
VCMPLTSSℹ️Compare scalar single-precision floating-point values and set mask if destination value is less than source value
VCMPLESSℹ️Compare scalar single-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTSSℹ️Compare scalar single-precision floating-point values and set mask if destination value is greater than source value
VCMPGESSℹ️Compare scalar single-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDSSℹ️Compare scalar single-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQSSℹ️Compare scalar single-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTSSℹ️Compare scalar single-precision floating-point values and set mask if destination value is not less than source value
VCMPNLESSℹ️Compare scalar single-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTSSℹ️Compare scalar single-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGESSℹ️Compare scalar single-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDSSℹ️Compare scalar single-precision floating-point values and set mask if neither source operand is a NaN
VCOMISSℹ️Perform ordered comparison of scalar single-precision floating-point value and set flags in EFLAGS register
VUCOMISSℹ️Perform unordered comparison of scalar single-precision floating-point value and set flags in EFLAGS register
Double Precision Floating-point Operands
VCMPEQPDℹ️Compare packed double-precision floating-point values and set mask if destination value is equal to source value
VCMPLTPDℹ️Compare packed double-precision floating-point values and set mask if destination value is less than source value
VCMPLEPDℹ️Compare packed double-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTPDℹ️Compare packed double-precision floating-point values and set mask if destination value is greater than source value
VCMPGEPDℹ️Compare packed double-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDPDℹ️Compare packed double-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQPDℹ️Compare packed double-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTPDℹ️Compare packed double-precision floating-point values and set mask if destination value is not less than source value
VCMPNLEPDℹ️Compare packed double-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTPDℹ️Compare packed double-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGEPDℹ️Compare packed double-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDPDℹ️Compare packed double-precision floating-point values and set mask if neither source operand is a NaN
VCMPEQSDℹ️Compare scalar double-precision floating-point values and set mask if destination value is equal to source value
VCMPLTSDℹ️Compare scalar double-precision floating-point values and set mask if destination value is less than source value
VCMPLESDℹ️Compare scalar double-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTSDℹ️Compare scalar double-precision floating-point values and set mask if destination value is greater than source value
VCMPGESDℹ️Compare scalar double-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDSDℹ️Compare scalar double-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQSDℹ️Compare scalar double-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTSDℹ️Compare scalar double-precision floating-point values and set mask if destination value is not less than source value
VCMPNLESDℹ️Compare scalar double-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTSDℹ️Compare scalar double-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGESDℹ️Compare scalar double-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDSDℹ️Compare scalar double-precision floating-point values and set mask if neither source operand is a NaN
VCOMISDℹ️Perform ordered comparison of scalar double-precision floating-point value and set flags in EFLAGS register
VUCOMISDℹ️Perform unordered comparison of scalar double-precision floating-point value and set flags in EFLAGS register
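
The "not less than" style predicates listed above are not the same as "greater than or equal" once a NaN is involved, because a negated predicate is true for unordered operands. The following scalar C sketch models the eight basic predicates for one pair of single-precision values. The names `cmp_pred`, `cmp_model` and `quiet_nan` are hypothetical; the enum values follow the usual 0 to 7 immediate encoding, and signaling behavior (exceptions) is not modeled.

```c
/* Basic compare predicates, numbered as in the 3-bit immediate. */
enum cmp_pred {
    CMP_EQ = 0, CMP_LT = 1, CMP_LE = 2, CMP_UNORD = 3,
    CMP_NEQ = 4, CMP_NLT = 5, CMP_NLE = 6, CMP_ORD = 7
};

/* Helper that produces a NaN at run time without needing <math.h>. */
static float quiet_nan(void) {
    volatile float zero = 0.0f;
    return zero / zero;
}

/* Returns 1 if the predicate holds for (a, b), else 0.
   A SIMD compare would write all-ones or a mask bit instead. */
static int cmp_model(float a, float b, enum cmp_pred p) {
    int unord = (a != a) || (b != b);   /* true if either is NaN */
    switch (p) {
    case CMP_EQ:    return a == b;
    case CMP_LT:    return a < b;
    case CMP_LE:    return a <= b;
    case CMP_UNORD: return unord;
    case CMP_NEQ:   return a != b;      /* true for NaN operands */
    case CMP_NLT:   return !(a < b);    /* true for NaN operands */
    case CMP_NLE:   return !(a <= b);   /* true for NaN operands */
    case CMP_ORD:   return !unord;
    }
    return 0;
}
```

With a NaN as one operand, `CMP_LT` is false while `CMP_NLT` is true, so the two predicates always disagree, NaN or not. This model assumes the compiler uses IEEE comparison semantics (no fast-math style options).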

Packed Arithmetic Instructions

The arithmetic instructions perform addition, subtraction, multiplication, and division on packed and scalar SIMD operands.

Instruction📄Meaning
Byte Operands
VPADDBℹ️Add packed byte integers
VPADDUSBℹ️Add packed unsigned byte integers with unsigned saturation
VPADDSBℹ️Add packed signed byte integers with signed saturation
VPSUBBℹ️Subtract packed byte integers
VPSUBUSBℹ️Subtract packed unsigned byte integers with unsigned saturation
VPSUBSBℹ️Subtract packed signed byte integers with signed saturation
VPDPBUSDℹ️Multiply and add unsigned and signed byte integers
VPDPBUSDSℹ️Multiply and add unsigned and signed byte integers with saturation
VPDPBUUDℹ️Multiply groups of 4 pairs of corresponding unsigned bytes, summing products and adding them to the result
VPDPBUUDSℹ️Multiply groups of 4 pairs of corresponding unsigned bytes, summing products and adding them to the result, with unsigned saturation
VPDPBSSDℹ️Multiply groups of 4 pairs of corresponding signed bytes, summing products and adding them to the result
VPDPBSSDSℹ️Multiply groups of 4 pairs of corresponding signed bytes, summing products and adding them to the result, with signed saturation
VPDPBSUDℹ️Multiply groups of 4 pairs of corresponding unsigned and signed bytes, summing products and adding them to the result
VPDPBSUDSℹ️Multiply groups of 4 pairs of corresponding unsigned and signed bytes, summing products and adding them to the result, with signed saturation
Word Operands
VPADDWℹ️Add packed word integers
VPADDUSWℹ️Add packed unsigned word integers with unsigned saturation
VPADDSWℹ️Add packed signed word integers with signed saturation
VPHADDWℹ️Adds two adjacent, signed 16-bit integers horizontally from the source and destination operands and packs the signed 16-bit results to the destination operand
VPHADDSWℹ️Adds two adjacent, signed 16-bit integers horizontally from the source and destination operands and packs the signed, saturated 16-bit results to the destination operand
VPSUBWℹ️Subtract packed word integers
VPSUBUSWℹ️Subtract packed unsigned word integers with unsigned saturation
VPSUBSWℹ️Subtract packed signed word integers with signed saturation
VPHSUBWℹ️Performs horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the source and destination operands. The signed 16-bit results are packed and written to the destination operand
VPHSUBSWℹ️Performs horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the source and destination operands. The signed, saturated 16-bit results are packed and written to the destination operand
VPDPWSSDℹ️Multiply and add signed word integers
VPDPWSSDSℹ️Multiply and add signed word integers with saturation
VPDPWUUDℹ️Multiply groups of 2 pairs of corresponding unsigned words, summing products and adding them to the result
VPDPWUUDSℹ️Multiply groups of 2 pairs of corresponding unsigned words, summing products and adding them to the result, with unsigned saturation
VPDPWSUDℹ️Multiply groups of 2 pairs of corresponding unsigned and signed words, summing products and adding them to the result
VPDPWSUDSℹ️Multiply groups of 2 pairs of corresponding unsigned and signed words, summing products and adding them to the result, with signed saturation
VPDPWUSDℹ️Multiply groups of 2 pairs of corresponding signed and unsigned words, summing products and adding them to the result
VPDPWUSDSℹ️Multiply groups of 2 pairs of corresponding signed and unsigned words, summing products and adding them to the result, with signed saturation
VPMULHUWℹ️Multiply packed unsigned word integers and store high result
VPMULLWℹ️Multiply packed signed word integers and store low result
VPMULHWℹ️Multiply packed signed word integers and store high result
VPMULHRSWℹ️Multiplies vertically each signed 16-bit integer from the destination operand with the corresponding signed 16-bit integer of the source operand, producing intermediate, signed 32-bit integers. Each intermediate 32-bit integer is truncated to the 18 most significant bits. Rounding is always performed by adding 1 to the least significant bit of the 18-bit intermediate result. The final result is obtained by selecting the 16 bits immediately to the right of the most significant bit of each 18-bit intermediate result and packed to the destination operand
VPMADDUBSWℹ️Multiplies each unsigned byte value with the corresponding signed byte value to produce an intermediate, 16-bit signed integer. Each adjacent pair of 16-bit signed values are added horizontally. The signed, saturated 16-bit results are packed to the destination operand
Double Word Operands
VPADDDℹ️Add packed double word integers
VPHADDDℹ️Adds two adjacent, signed 32-bit integers horizontally from the source and destination operands and packs the signed 32-bit results to the destination operand
VPSUBDℹ️Subtract packed double word integers
VPHSUBDℹ️Performs horizontal subtraction on each adjacent pair of 32-bit signed integers by subtracting the most significant double word from the least significant double word of each pair in the source and destination operands. The signed 32-bit results are packed and written to the destination operand
VPMULLDℹ️Multiply packed signed double word integers and store the lower 32 bits of each 64-bit result
VPMADDWDℹ️Multiply and add packed word integers
Quad Word Operands
VPADDQℹ️Add packed quad word integers
VPSUBQℹ️Subtract packed quad word integers
VPMULUDQℹ️Multiply packed unsigned double word integers
VPMULDQℹ️Returns two 64-bit signed results of signed 32-bit integer multiplies
VPMULLQℹ️Returns the lower 64 bits of each 128-bit result of signed 64-bit integer multiplies
Single Precision Floating-point Operands
VADDSSℹ️Add scalar single-precision floating-point value
VADDPSℹ️Add packed single-precision floating-point values
VHADDPSℹ️Performs a single-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the third and fourth elements of the first operand; the third by adding the first and second elements of the second operand; and the fourth by adding the third and fourth elements of the second operand
VSUBSSℹ️Subtract scalar single-precision floating-point value
VSUBPSℹ️Subtract packed single-precision floating-point values
VHSUBPSℹ️Performs a single-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the fourth element of the first operand from the third element of the first operand; the third by subtracting the second element of the second operand from the first element of the second operand; and the fourth by subtracting the fourth element of the second operand from the third element of the second operand
VADDSUBPSℹ️Performs single-precision addition on the second and fourth pairs of 32-bit data elements within the operands; single-precision subtraction on the first and third pairs
VMULSSℹ️Multiply scalar single-precision floating-point value
VMULPSℹ️Multiply packed single-precision floating-point values
VDIVSSℹ️Divide scalar single-precision floating-point value
VDIVPSℹ️Divide packed single-precision floating-point values
Double Precision Floating-point Operands
VADDSDℹ️Add scalar double-precision floating-point value
VADDPDℹ️Add packed double-precision floating-point values
VHADDPDℹ️Performs a double-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the first and second elements of the second operand
VSUBSDℹ️Subtract scalar double-precision floating-point value
VSUBPDℹ️Subtract packed double-precision floating-point values
VHSUBPDℹ️Performs a double-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the second element of the second operand from the first element of the second operand
VADDSUBPDℹ️Performs double-precision addition on the second pair of quad words, and double-precision subtraction on the first pair
VMULSDℹ️Multiply scalar double-precision floating-point value
VMULPDℹ️Multiply packed double-precision floating-point values
VDIVSDℹ️Divide scalar double-precision floating-point value
VDIVPDℹ️Divide packed double-precision floating-point values
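
Two of the integer behaviors described above, saturation and the rounding performed by VPMULHRSW, are easier to follow per element. The scalar C sketch below models one element of each; the function names are hypothetical. It assumes that right-shifting a negative `int32_t` is an arithmetic shift and that narrowing to `int16_t` wraps, which holds on mainstream compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Signed saturating byte add (VPADDSB style): clamp to [-128, 127]. */
static int8_t paddsb_model(int8_t a, int8_t b) {
    int s = (int)a + (int)b;
    if (s > 127)  s = 127;
    if (s < -128) s = -128;
    return (int8_t)s;
}

/* Unsigned saturating byte add (VPADDUSB style): clamp to [0, 255]. */
static uint8_t paddusb_model(uint8_t a, uint8_t b) {
    unsigned s = (unsigned)a + (unsigned)b;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* Rounded high multiply (VPMULHRSW style): keep the 18 most
   significant bits of the 32-bit product, add 1 to round, then
   drop the lowest bit and keep 16 bits. */
static int16_t pmulhrsw_model(int16_t a, int16_t b) {
    int32_t prod = (int32_t)a * (int32_t)b;
    int32_t t = (prod >> 14) + 1;   /* assumes arithmetic shift */
    return (int16_t)(t >> 1);       /* assumes wrapping conversion */
}
```

Read as Q15 fixed point, `pmulhrsw_model(16384, 16384)` is 0.5 times 0.5 and yields 8192, which is 0.25. The saturating adds return 127 and 255 where a wrapping add such as VPADDB would overflow.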

Fused Arithmetic Instructions

Instruction📄Meaning
Single Precision Floating-point Operands
VFMADD132SSℹ️Fused multiply-add of scalar single-precision floating-point values: s1 * s3 + s2
VFMADD213SSℹ️Fused multiply-add of scalar single-precision floating-point values: s2 * s1 + s3
VFMADD231SSℹ️Fused multiply-add of scalar single-precision floating-point values: s2 * s3 + s1
VFMADD132PSℹ️Fused multiply-add of packed single-precision floating-point values: v1 * v3 + v2
VFMADD213PSℹ️Fused multiply-add of packed single-precision floating-point values: v2 * v1 + v3
VFMADD231PSℹ️Fused multiply-add of packed single-precision floating-point values: v2 * v3 + v1
VFNMADD132SSℹ️Fused negative multiply-add of scalar single-precision floating-point values: -s1 * s3 + s2
VFNMADD213SSℹ️Fused negative multiply-add of scalar single-precision floating-point values: -s2 * s1 + s3
VFNMADD231SSℹ️Fused negative multiply-add of scalar single-precision floating-point values: -s2 * s3 + s1
VFNMADD132PSℹ️Fused negative multiply-add of packed single-precision floating-point values: -v1 * v3 + v2
VFNMADD213PSℹ️Fused negative multiply-add of packed single-precision floating-point values: -v2 * v1 + v3
VFNMADD231PSℹ️Fused negative multiply-add of packed single-precision floating-point values: -v2 * v3 + v1
VFMSUB132SSℹ️Fused multiply-subtract of scalar single-precision floating-point values: s1 * s3 - s2
VFMSUB213SSℹ️Fused multiply-subtract of scalar single-precision floating-point values: s2 * s1 - s3
VFMSUB231SSℹ️Fused multiply-subtract of scalar single-precision floating-point values: s2 * s3 - s1
VFMSUB132PSℹ️Fused multiply-subtract of packed single-precision floating-point values: v1 * v3 - v2
VFMSUB213PSℹ️Fused multiply-subtract of packed single-precision floating-point values: v2 * v1 - v3
VFMSUB231PSℹ️Fused multiply-subtract of packed single-precision floating-point values: v2 * v3 - v1
VFNMSUB132SSℹ️Fused negative multiply-subtract of scalar single-precision floating-point values: -s1 * s3 - s2
VFNMSUB213SSℹ️Fused negative multiply-subtract of scalar single-precision floating-point values: -s2 * s1 - s3
VFNMSUB231SSℹ️Fused negative multiply-subtract of scalar single-precision floating-point values: -s2 * s3 - s1
VFNMSUB132PSℹ️Fused negative multiply-subtract of packed single-precision floating-point values: -v1 * v3 - v2
VFNMSUB213PSℹ️Fused negative multiply-subtract of packed single-precision floating-point values: -v2 * v1 - v3
VFNMSUB231PSℹ️Fused negative multiply-subtract of packed single-precision floating-point values: -v2 * v3 - v1
VFMADDSUB132PSℹ️Fused multiply-alternating add/subtract of packed single-precision floating-point values: v1 * v3 ± v2
VFMADDSUB213PSℹ️Fused multiply-alternating add/subtract of packed single-precision floating-point values: v2 * v1 ± v3
VFMADDSUB231PSℹ️Fused multiply-alternating add/subtract of packed single-precision floating-point values: v2 * v3 ± v1
VFMSUBADD132PSℹ️Fused multiply-alternating subtract/add of packed single-precision floating-point values: v1 * v3 ∓ v2
VFMSUBADD213PSℹ️Fused multiply-alternating subtract/add of packed single-precision floating-point values: v2 * v1 ∓ v3
VFMSUBADD231PSℹ️Fused multiply-alternating subtract/add of packed single-precision floating-point values: v2 * v3 ∓ v1
Double Precision Floating-point Operands
VFMADD132SDℹ️Fused multiply-add of scalar double-precision floating-point values: s1 * s3 + s2
VFMADD213SDℹ️Fused multiply-add of scalar double-precision floating-point values: s2 * s1 + s3
VFMADD231SDℹ️Fused multiply-add of scalar double-precision floating-point values: s2 * s3 + s1
VFMADD132PDℹ️Fused multiply-add of packed double-precision floating-point values: v1 * v3 + v2
VFMADD213PDℹ️Fused multiply-add of packed double-precision floating-point values: v2 * v1 + v3
VFMADD231PDℹ️Fused multiply-add of packed double-precision floating-point values: v2 * v3 + v1
VFNMADD132SDℹ️Fused negative multiply-add of scalar double-precision floating-point values: -s1 * s3 + s2
VFNMADD213SDℹ️Fused negative multiply-add of scalar double-precision floating-point values: -s2 * s1 + s3
VFNMADD231SDℹ️Fused negative multiply-add of scalar double-precision floating-point values: -s2 * s3 + s1
VFNMADD132PDℹ️Fused negative multiply-add of packed double-precision floating-point values: -v1 * v3 + v2
VFNMADD213PDℹ️Fused negative multiply-add of packed double-precision floating-point values: -v2 * v1 + v3
VFNMADD231PDℹ️Fused negative multiply-add of packed double-precision floating-point values: -v2 * v3 + v1
VFMSUB132SDℹ️Fused multiply-subtract of scalar double-precision floating-point values: s1 * s3 - s2
VFMSUB213SDℹ️Fused multiply-subtract of scalar double-precision floating-point values: s2 * s1 - s3
VFMSUB231SDℹ️Fused multiply-subtract of scalar double-precision floating-point values: s2 * s3 - s1
VFMSUB132PDℹ️Fused multiply-subtract of packed double-precision floating-point values: v1 * v3 - v2
VFMSUB213PDℹ️Fused multiply-subtract of packed double-precision floating-point values: v2 * v1 - v3
VFMSUB231PDℹ️Fused multiply-subtract of packed double-precision floating-point values: v2 * v3 - v1
VFNMSUB132SDℹ️Fused negative multiply-subtract of scalar double-precision floating-point values: -s1 * s3 - s2
VFNMSUB213SDℹ️Fused negative multiply-subtract of scalar double-precision floating-point values: -s2 * s1 - s3
VFNMSUB231SDℹ️Fused negative multiply-subtract of scalar double-precision floating-point values: -s2 * s3 - s1
VFNMSUB132PDℹ️Fused negative multiply-subtract of packed double-precision floating-point values: -v1 * v3 - v2
VFNMSUB213PDℹ️Fused negative multiply-subtract of packed double-precision floating-point values: -v2 * v1 - v3
VFNMSUB231PDℹ️Fused negative multiply-subtract of packed double-precision floating-point values: -v2 * v3 - v1
VFMADDSUB132PDℹ️Fused multiply-alternating add/subtract of packed double-precision floating-point values: v1 * v3 ± v2
VFMADDSUB213PDℹ️Fused multiply-alternating add/subtract of packed double-precision floating-point values: v2 * v1 ± v3
VFMADDSUB231PDℹ️Fused multiply-alternating add/subtract of packed double-precision floating-point values: v2 * v3 ± v1
VFMSUBADD132PDℹ️Fused multiply-alternating subtract/add of packed double-precision floating-point values: v1 * v3 ∓ v2
VFMSUBADD213PDℹ️Fused multiply-alternating subtract/add of packed double-precision floating-point values: v2 * v1 ∓ v3
VFMSUBADD231PDℹ️Fused multiply-alternating subtract/add of packed double-precision floating-point values: v2 * v3 ∓ v1
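
The three digits in each mnemonic above state which operands are multiplied and which is added. A scalar C sketch of that operand selection follows; the names `fma_order` and `fmaddsub_elem` are hypothetical, and `op1`, `op2`, `op3` stand for the first (destination), second and third operands. The model uses a separate multiply and add, so it does not reproduce the single rounding of a true fused operation; it only shows which value goes where.

```c
/* Operand selection for the 132 / 213 / 231 forms.
   Not fused: rounding behavior is deliberately not modeled. */
static double fma_order(double op1, double op2, double op3, int order) {
    switch (order) {
    case 132: return op1 * op3 + op2;
    case 213: return op2 * op1 + op3;
    case 231: return op2 * op3 + op1;
    }
    return 0.0; /* unknown order */
}

/* Alternating add/subtract (VFMADDSUB style) for one element:
   odd-indexed elements add, even-indexed elements subtract. */
static double fmaddsub_elem(double a, double b, double c, unsigned index) {
    return (index & 1u) ? a * b + c : a * b - c;
}
```

With operands 2, 3 and 5, the 132 form computes 2 * 5 + 3, the 213 form 3 * 2 + 5, and the 231 form 3 * 5 + 2. The 231 form is the natural choice for accumulating into the destination, since the destination is the addend.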

Function Primitives

These instructions perform square root, absolute value, rounding and maximum/minimum operations on packed and scalar SIMD operands.

Instruction📄Meaning
Byte Operands
VPOPCNTBℹ️Compute the number of bits set to 1 in each byte
VPABSBℹ️Computes the absolute value of each signed byte data element
VPSIGNBℹ️Negates each signed byte element of the destination operand if the corresponding element in the source operand is less than zero, and zeroes it if the source element is zero
VPAVGBℹ️Compute average of packed unsigned byte integers
VPMINUBℹ️Minimum of packed unsigned byte integers
VPMINSBℹ️Minimum of packed signed byte integers
VPMAXUBℹ️Maximum of packed unsigned byte integers
VPMAXSBℹ️Maximum of packed signed byte integers
VPSADBWℹ️Compute sum of absolute differences
VMPSADBWℹ️Performs eight 4-byte wide sum of absolute differences operations to produce eight word integers
VDBPSADBWℹ️Double block packed Sum of Absolute Differences on unsigned bytes
Word Operands
VPOPCNTWℹ️Compute the number of bits set to 1 in each word
VPABSWℹ️Computes the absolute value of each signed word data element
VPSIGNWℹ️Negates each signed word element of the destination operand if the corresponding element in the source operand is less than zero, and zeroes it if the source element is zero
VPAVGWℹ️Compute average of packed unsigned word integers
VPMINUWℹ️Minimum of packed unsigned word integers
VPMINSWℹ️Minimum of packed signed word integers
VPMAXUWℹ️Maximum of packed unsigned word integers
VPMAXSWℹ️Maximum of packed signed word integers
VPHMINPOSUWℹ️Finds the value and location of the minimum unsigned word from one of 8 horizontally packed unsigned words. The resulting value and location (offset within the source) are packed into the low double word of the destination XMM register
Double Word Operands
VPOPCNTDℹ️Compute the number of bits set to 1 in each double word
VPABSDℹ️Computes the absolute value of each signed double word data element
VPSIGNDℹ️Negates each signed double word element of the destination operand if the corresponding element in the source operand is less than zero, and zeroes it if the source element is zero
VPMINUDℹ️Minimum of packed unsigned double word integers
VPMINSDℹ️Minimum of packed signed double word integers
VPMAXUDℹ️Maximum of packed unsigned double word integers
VPMAXSDℹ️Maximum of packed signed double word integers
VPLZCNTDℹ️Count the number of leading zero bits in each packed double word element
VPCONFLICTDℹ️Detect conflicts within a vector of packed double word values into dense memory
Quad Word Operands
VPOPCNTQℹ️Compute the number of bits set to 1 in each quad word
VPABSQℹ️Computes the absolute value of each signed quad word data element
VPMINUQℹ️Minimum of packed unsigned quad word integers
VPMINSQℹ️Minimum of packed signed quad word integers
VPMAXUQℹ️Maximum of packed unsigned quad word integers
VPMAXSQℹ️Maximum of packed signed quad word integers
VPLZCNTQℹ️Count the number of leading zero bits in each packed quad word element
VPCONFLICTQℹ️Detect conflicts within a vector of packed quad word values into dense memory
Single Precision Floating-point Operands
VSQRTSSℹ️Compute square root of scalar single-precision floating-point value
VSQRTPSℹ️Compute square roots of packed single-precision floating-point values
VMINSSℹ️Return minimum scalar single-precision floating-point value
VMINPSℹ️Return minimum packed single-precision floating-point values
VMAXSSℹ️Return maximum scalar single-precision floating-point value
VMAXPSℹ️Return maximum packed single-precision floating-point values
VROUNDSSℹ️Round the low packed single precision floating-point value into an integer value and return a rounded floating-point value
VROUNDPSℹ️Round packed single precision floating-point values into integer values and return rounded floating-point values
VRNDSCALESSℹ️Round scalar single-precision floating-point value to include a given number of fraction bits
VRNDSCALEPSℹ️Round packed single-precision floating-point values to include a given number of fraction bits
VDPPSℹ️Perform single-precision dot products for up to 4 elements and broadcast
VRANGESSℹ️Range restriction calculation for pairs of scalar single-precision floating-point values
VRANGEPSℹ️Range restriction calculation for packed pairs of single-precision floating-point values
VREDUCESSℹ️Perform a reduction transformation on a scalar single-precision floating-point value by subtracting a number of fraction bits
VREDUCEPSℹ️Perform reduction transformation on packed single-precision floating-point values by subtracting a number of fraction bits
VGETEXPSSℹ️Convert the biased exponent of scalar single-precision floating-point value to floating-point value representing unbiased integer exponent
VGETEXPPSℹ️Convert the biased exponent of packed single-precision floating-point values to floating-point values representing unbiased integer exponent
VGETMANTSSℹ️Extract the normalized mantissa from scalar single-precision floating-point value
VGETMANTPSℹ️Extract the normalized mantissa from packed single-precision floating-point values
VSCALEFSSℹ️Scale scalar single-precision floating-point value
VSCALEFPSℹ️Scale packed single-precision floating-point values
VEXP2PS Approximation to the exponential 2^x of packed single-precision floating-point values with less than 2^-23 relative error
VFPCLASSSSℹ️Tests scalar single-precision floating-point value for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite, negative
VFPCLASSPSℹ️Tests packed single-precision floating-point values for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite, negative
VFIXUPIMMSSℹ️Fix up special scalar single-precision floating-point value
VFIXUPIMMPSℹ️Fix up special packed single-precision floating-point values
VRCP14SSℹ️Computes the approximate reciprocal of the scalar single-precision floating-point value. The max relative error < 2^-14
VRCP14PSℹ️Computes the approximate reciprocals of the packed single-precision floating-point values. The max relative error < 2^-14
VRCP28SS Computes the approximate reciprocal of the scalar single-precision floating-point value. The max relative error < 2^-28
VRCP28PS Computes the approximate reciprocals of the packed single-precision floating-point values. The max relative error < 2^-28
VRSQRT14SSℹ️Computes the approximate reciprocal square root of the scalar single-precision floating-point value. The max relative error < 2^-14
VRSQRT14PSℹ️Computes the approximate reciprocal square roots of the packed single-precision floating-point values. The max relative error < 2^-14
VRSQRT28SS Computes the approximate reciprocal square root of the scalar single-precision floating-point value. The max relative error < 2^-28
VRSQRT28PS Computes the approximate reciprocal square roots of the packed single-precision floating-point values. The max relative error < 2^-28
VRCPPSℹ️Compute reciprocals of packed single-precision floating-point values
VRCPSSℹ️Compute reciprocal of scalar single-precision floating-point value
VRSQRTPSℹ️Compute reciprocals of square roots of packed single-precision floating-point values
VRSQRTSSℹ️Compute reciprocal of square root of scalar single-precision floating-point value
Double Precision Floating-point Operands
VSQRTSDℹ️Compute square root of scalar double-precision floating-point value
VSQRTPDℹ️Compute square roots of packed double-precision floating-point values
VMINSDℹ️Return minimum scalar double-precision floating-point value
VMINPDℹ️Return minimum packed double-precision floating-point values
VMAXSDℹ️Return maximum scalar double-precision floating-point value
VMAXPDℹ️Return maximum packed double-precision floating-point values
VROUNDSDℹ️Round the low packed double precision floating-point value into an integer value and return a rounded floating-point value
VROUNDPDℹ️Round packed double precision floating-point values into integer values and return rounded floating-point values
VRNDSCALESDℹ️Round scalar double-precision floating-point value to include a given number of fraction bits
VRNDSCALEPDℹ️Round packed double-precision floating-point values to include a given number of fraction bits
VDPPDℹ️Perform double-precision dot product for up to 2 elements and broadcast
VRANGESDℹ️Range restriction calculation for pairs of scalar double-precision floating-point values
VRANGEPDℹ️Range restriction calculation for packed pairs of double-precision floating-point values
VREDUCESDℹ️Perform a reduction transformation on a scalar double-precision floating-point value by subtracting a number of fraction bits
VREDUCEPDℹ️Perform reduction transformation on packed double-precision floating-point values by subtracting a number of fraction bits
VGETEXPSDℹ️Convert the biased exponent of scalar double-precision floating-point value to floating-point value representing unbiased integer exponent
VGETEXPPDℹ️Convert the biased exponent of packed double-precision floating-point values to floating-point values representing unbiased integer exponent
VGETMANTSDℹ️Extract the normalized mantissa from scalar double-precision floating-point value
VGETMANTPDℹ️Extract the normalized mantissa from packed double-precision floating-point values
VSCALEFSDℹ️Scale scalar double-precision floating-point value
VSCALEFPDℹ️Scale packed double-precision floating-point values
VEXP2PD Approximation to the exponential 2^x of packed double-precision floating-point values with less than 2^-23 relative error
VFPCLASSSDℹ️Tests scalar double-precision floating-point value for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite, negative
VFPCLASSPDℹ️Tests packed double-precision floating-point values for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite, negative
VFIXUPIMMSDℹ️Fix up special scalar double-precision floating-point value
VFIXUPIMMPDℹ️Fix up special packed double-precision floating-point values
VRCP14SDℹ️Computes the approximate reciprocal of the scalar double-precision floating-point value. The max relative error < 2^-14
VRCP14PDℹ️Computes the approximate reciprocals of the packed double-precision floating-point values. The max relative error < 2^-14
VRCP28SD Computes the approximate reciprocal of the scalar double-precision floating-point value. The max relative error < 2^-28
VRCP28PD Computes the approximate reciprocals of the packed double-precision floating-point values. The max relative error < 2^-28
VRSQRT14SDℹ️Computes the approximate reciprocal square root of the scalar double-precision floating-point value. The max relative error < 2^-14
VRSQRT14PDℹ️Computes the approximate reciprocal square roots of the packed double-precision floating-point values. The max relative error < 2^-14
VRSQRT28SD Computes the approximate reciprocal square root of the scalar double-precision floating-point value. The max relative error < 2^-28
VRSQRT28PD Computes the approximate reciprocal square roots of the packed double-precision floating-point values. The max relative error < 2^-28
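
The VGETEXP and VGETMANT entries above split a floating-point value into an exponent and a normalized mantissa such that value = mantissa * 2^exponent. The sketch below is a scalar Python model of that relationship, not the instructions themselves. It assumes the VGETMANT immediate selects the [1, 2) interval and keeps the sign of the source, and it deliberately does not model zero, infinity or NaN inputs. The function names `getexp` and `getmant` are illustrative.

```python
import math


def _check_ordinary(x: float) -> None:
    # Assumption of this sketch: special inputs are out of scope.
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        raise ValueError("zero, infinity and NaN are not modelled in this sketch")


def getexp(x: float) -> float:
    """Unbiased exponent of x, floor(log2(|x|)), returned as a float."""
    _check_ordinary(x)
    # math.frexp returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1,
    # so the exponent that goes with a [1, 2) mantissa is e - 1.
    _, e = math.frexp(x)
    return float(e - 1)


def getmant(x: float) -> float:
    """Mantissa of x normalized to [1, 2), carrying the sign of x."""
    _check_ordinary(x)
    m, _ = math.frexp(x)
    return m * 2.0
```

For example, 10.0 decomposes into a mantissa of 1.25 and an exponent of 3.0, since 1.25 * 2^3 = 10.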

Opmask Instructions

Instruction📄Meaning
8-bit Operands
KMOVBℹ️Move 8-bit from and to mask registers
KTESTBℹ️Set ZF and CF depending on sign bit AND and ANDN of 8-bit masks
KORTESTBℹ️Bitwise logical OR of two 8-bit masks, setting ZF and CF accordingly
KNOTBℹ️Bitwise NOT of an 8-bit mask
KANDBℹ️Bitwise logical AND of two 8-bit masks
KANDNBℹ️Bitwise logical AND NOT of two 8-bit masks
KORBℹ️Bitwise logical OR of two 8-bit masks
KXORBℹ️Bitwise logical XOR of two 8-bit masks
KXNORBℹ️Bitwise logical XNOR of two 8-bit masks
KADDBℹ️Add two 8-bit masks
KSHIFTLBℹ️Shift left 8-bit mask register
KSHIFTRBℹ️Shift right 8-bit mask register
KUNPCKBWℹ️Unpack and interleave 8-bit masks
VPMOVM2Bℹ️Convert a mask register to a vector register
VPMOVB2Mℹ️Converts a vector register to a mask register
16-bit Operands
KMOVWℹ️Move 16-bit from and to mask registers
KTESTWℹ️Set ZF and CF depending on sign bit AND and ANDN of 16-bit masks
KORTESTWℹ️Bitwise logical OR of two 16-bit masks, setting ZF and CF accordingly
KNOTWℹ️Bitwise NOT of a 16-bit mask
KANDWℹ️Bitwise logical AND of two 16-bit masks
KANDNWℹ️Bitwise logical AND NOT of two 16-bit masks
KORWℹ️Bitwise logical OR of two 16-bit masks
KXORWℹ️Bitwise logical XOR of two 16-bit masks
KXNORWℹ️Bitwise logical XNOR of two 16-bit masks
KADDWℹ️Add two 16-bit masks
KSHIFTLWℹ️Shift left 16-bit mask register
KSHIFTRWℹ️Shift right 16-bit mask register
KUNPCKWDℹ️Unpack and interleave 16-bit masks
VPMOVM2Wℹ️Convert a mask register to a vector register
VPMOVW2Mℹ️Converts a vector register to a mask register
32-bit Operands
KMOVDℹ️Move 32-bit from and to mask registers
KTESTDℹ️Set ZF and CF depending on sign bit AND and ANDN of 32-bit masks
KORTESTDℹ️Bitwise logical OR of two 32-bit masks, setting ZF and CF accordingly
KNOTDℹ️Bitwise NOT of a 32-bit mask
KANDDℹ️Bitwise logical AND of two 32-bit masks
KANDNDℹ️Bitwise logical AND NOT of two 32-bit masks
KORDℹ️Bitwise logical OR of two 32-bit masks
KXORDℹ️Bitwise logical XOR of two 32-bit masks
KXNORDℹ️Bitwise logical XNOR of two 32-bit masks
KADDDℹ️Add two 32-bit masks
KSHIFTLDℹ️Shift left 32-bit mask register
KSHIFTRDℹ️Shift right 32-bit mask register
KUNPCKDQℹ️Unpack and interleave 32-bit masks
VPMOVM2Dℹ️Convert a mask register to a vector register
VPMOVD2Mℹ️Converts a vector register to a mask register
64-bit Operands
KMOVQℹ️Move 64-bit from and to mask registers
KTESTQℹ️Set ZF and CF depending on sign bit AND and ANDN of 64-bit masks
KORTESTQℹ️Bitwise logical OR of two 64-bit masks, setting ZF and CF accordingly
KNOTQℹ️Bitwise NOT of a 64-bit mask
KANDQℹ️Bitwise logical AND of two 64-bit masks
KANDNQℹ️Bitwise logical AND NOT of two 64-bit masks
KORQℹ️Bitwise logical OR of two 64-bit masks
KXORQℹ️Bitwise logical XOR of two 64-bit masks
KXNORQℹ️Bitwise logical XNOR of two 64-bit masks
KADDQℹ️Add two 64-bit masks
KSHIFTLQℹ️Shift left 64-bit mask register
KSHIFTRQℹ️Shift right 64-bit mask register
VPMOVM2Qℹ️Convert a mask register to a vector register
VPMOVQ2Mℹ️Converts a vector register to a mask register
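
The flag behaviour of the KORTEST family and the layout produced by KUNPCKBW are easier to see in a small model. The sketch below is a scalar Python illustration of the semantics, not the instructions themselves: ZF is set when the OR of the two masks is all zeros, CF is set when it is all ones, and KUNPCKBW places the low byte of its first source in the upper half of the result. The function names and the `width` parameter are illustrative.

```python
from typing import Tuple


def kortest(k1: int, k2: int, width: int = 16) -> Tuple[bool, bool]:
    """Return (ZF, CF) for an OR-test of two masks that are `width` bits wide."""
    all_ones = (1 << width) - 1
    combined = (k1 | k2) & all_ones
    zf = combined == 0          # no bit set in either mask
    cf = combined == all_ones   # every bit set in at least one mask
    return zf, cf


def kunpckbw(src1: int, src2: int) -> int:
    """Concatenate two 8-bit masks into a 16-bit mask (src1 high, src2 low)."""
    return ((src1 & 0xFF) << 8) | (src2 & 0xFF)
```

A common use of the CF result is a loop exit test: if a compare writes one mask bit per element, CF set after KORTEST of that mask with itself means every element passed.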

String and Text Processing Instructions

Instruction📄Meaning
VPCMPESTRIℹ️Packed compare explicit-length strings, return index in ECX/RCX
VPCMPESTRMℹ️Packed compare explicit-length strings, return mask in XMM0
VPCMPISTRIℹ️Packed compare implicit-length strings, return index in ECX/RCX
VPCMPISTRMℹ️Packed compare implicit-length strings, return mask in XMM0
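
The string compare instructions take an immediate that selects the element type, the comparison mode and the form of the result. As a rough illustration, the sketch below models only one configuration of VPCMPISTRI in Python: unsigned bytes, the "equal any" mode, and the least significant matching index. The function names are illustrative, and inputs shorter than 16 bytes are assumed to be zero-padded.

```python
def implicit_length(block: bytes) -> int:
    """Number of valid bytes: up to the first zero byte, at most 16."""
    block = block[:16]
    zero = block.find(b"\x00")
    return len(block) if zero < 0 else zero


def pcmpistri_equal_any(char_set: bytes, text: bytes) -> int:
    """Index of the first byte of `text` that occurs in `char_set`, or 16.

    `char_set` plays the role of the first operand (a set of characters) and
    `text` the second operand (the 16-byte block being searched).
    """
    wanted = set(char_set[:implicit_length(char_set)])
    for index in range(implicit_length(text)):
        if text[index] in wanted:
            return index
    # No match: the result is the number of elements in a block.
    return 16
```

Returning 16 for "no match" lets a caller advance by a whole block and try again, which is how a strpbrk-style scan is usually built on top of this instruction.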

Secure Hash Algorithm Instructions

SHA extensions provide a set of instructions that target the acceleration of the Secure Hash Algorithm (SHA), specifically the SHA-1 and SHA-256 variants.

Instruction📄Meaning
SHA-1
SHA1NEXTEℹ️Calculate SHA1 state variable E after four rounds
SHA1RNDS4ℹ️Perform four rounds of SHA1 operation
SHA1MSG1ℹ️Perform an intermediate calculation for the next four SHA1 message double words
SHA1MSG2ℹ️Perform a final calculation for the next four SHA1 message double words
SHA-256
SHA256RNDS2ℹ️Perform two rounds of SHA256 operation
SHA256MSG1ℹ️Perform an intermediate calculation for the next four SHA256 message double words
SHA256MSG2ℹ️Perform a final calculation for the next four SHA256 message double words
SHA-512
VSHA512RNDS2ℹ️Perform two rounds of SHA-512 operation
VSHA512MSG1ℹ️Perform an intermediate calculation for the next four SHA-512 message quad words
VSHA512MSG2ℹ️Perform a final calculation for the next four SHA-512 message quad words
SM3
VSM3RNDS2ℹ️Perform two rounds of SM3 operation
VSM3MSG1ℹ️Perform initial calculation for the next four SM3 message words
VSM3MSG2ℹ️Perform a final calculation for the next four SM3 message words
SM4
VSM4RNDS4ℹ️Performs four rounds of SM4 encryption
VSM4KEY4ℹ️Perform four rounds of SM4 key expansion
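
To make "intermediate calculation for the next four SHA256 message double words" concrete, the sketch below is a scalar Python model of what SHA256MSG1 computes: each of the four result words is W[i] + sigma0(W[i+1]), where sigma0 is the small sigma function of the SHA-256 message schedule. Registers are represented as lists of four 32-bit integers with index 0 as the lowest double word, which is a convention of this sketch.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def ror32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sigma0(x: int) -> int:
    """SHA-256 message schedule function: ROR7 xor ROR18 xor SHR3."""
    return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3)


def sha256msg1(src1: List[int], src2: List[int]) -> List[int]:
    """Model of SHA256MSG1: src1 holds W0..W3, src2[0] supplies W4."""
    w = list(src1) + [src2[0]]
    # Additions wrap modulo 2^32, as in the SHA-256 specification.
    return [(w[i] + sigma0(w[i + 1])) & MASK32 for i in range(4)]
```

SHA256MSG2 then adds the sigma1 terms to finish the four schedule words; that second step is not modelled here.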

Advanced Encryption Standard (AES) instructions

AES instructions operate on XMM registers to provide accelerated primitives for block encryption/decryption using Advanced Encryption Standard (AES).

Instruction📄Meaning
AESKEYGENASSISTℹ️Assist the creation of round keys with a key expansion schedule
AESIMCℹ️Perform an inverse mix column transformation primitive
Encryption
AESENCℹ️Perform an AES encryption round using a 128-bit state and a round key
AESENCLASTℹ️Perform the last AES encryption round using a 128-bit state and a round key
AESENC128KLℹ️Perform 10 rounds of AES encryption flow with key locker using 128-bit key
AESENC256KLℹ️Perform 14 rounds of AES encryption flow with key locker using 256-bit key
AESENCWIDE128KLℹ️Perform 10 rounds of AES encryption flow with key locker on 8 blocks using 128-bit key
AESENCWIDE256KLℹ️Perform 14 rounds of AES encryption flow with key locker on 8 blocks using 256-bit key
Decryption
AESDECℹ️Perform an AES decryption round using a 128-bit state and a round key
AESDECLASTℹ️Perform the last AES decryption round using a 128-bit state and a round key
AESDEC128KLℹ️Perform 10 rounds of AES decryption flow with key locker using 128-bit key
AESDEC256KLℹ️Perform 14 rounds of AES decryption flow with key locker using 256-bit key
AESDECWIDE128KLℹ️Perform 10 rounds of AES decryption flow with key locker on 8 blocks using 128-bit key
AESDECWIDE256KLℹ️Perform 14 rounds of AES decryption flow with key locker on 8 blocks using 256-bit key
Galois Field
VPCLMULQDQℹ️Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2^k)
GF2P8MULBℹ️Galois Field multiply bytes
GF2P8AFFINEQBℹ️Galois Field affine transformation
GF2P8AFFINEINVQBℹ️Galois Field affine transformation inverse
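
Carry-less multiplication is ordinary long multiplication with XOR in place of addition, so no carries propagate between bit positions. The sketch below is a scalar Python model of the arithmetic behind VPCLMULQDQ (one 64 x 64 bit product) and GF2P8MULB (one byte product, reduced by the AES polynomial x^8 + x^4 + x^3 + x + 1, written 0x11B). The function names are illustrative.

```python
def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit polynomials over GF(2) (up to 127 bits)."""
    a &= (1 << 64) - 1
    b &= (1 << 64) - 1
    result = 0
    shift = 0
    while b:
        if b & 1:
            # XOR instead of ADD: partial products never generate carries.
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def gf2p8mul(a: int, b: int) -> int:
    """Product of two bytes in GF(2^8), reduced modulo 0x11B."""
    product = clmul64(a & 0xFF, b & 0xFF)   # at most 15 bits wide
    for bit in range(14, 7, -1):
        if product & (1 << bit):
            # Cancel the high bit by XORing in a shifted copy of the modulus.
            product ^= 0x11B << (bit - 8)
    return product
```

The byte product can be checked against the worked example in the AES specification, where {57} multiplied by {83} gives {c1}.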

Key Locker Instructions

These instructions are designed to enable encryption/decryption with an AES key without having access to any unencrypted copies of the key during the actual encryption/decryption process.

Instruction📄Meaning
LOADIWKEYℹ️Load internal wrapping key with key locker
ENCODEKEY128ℹ️Encode 128-bit key with key locker
ENCODEKEY256ℹ️Encode 256-bit key with key locker

State Management Instructions

MXCSR state management instructions allow saving and restoring the state of the MXCSR control and status register.

Instruction📄Meaning
VLDMXCSRℹ️Load MXCSR register
VSTMXCSRℹ️Save MXCSR register state
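
A value stored by VSTMXCSR packs exception flags, exception masks and control bits into one 32-bit word. The sketch below decodes such a value in Python, assuming the usual layout: flags in bits 0 to 5, DAZ in bit 6, masks in bits 7 to 12, rounding control in bits 13 and 14, and FTZ in bit 15. The function name and the dictionary keys are illustrative.

```python
ROUNDING_MODES = ("nearest", "down", "up", "toward zero")
EXCEPTIONS = ("invalid", "denormal", "divide-by-zero",
              "overflow", "underflow", "precision")


def decode_mxcsr(value: int) -> dict:
    """Split an MXCSR value into its flag, mask and control fields."""
    return {
        # Sticky exception flags, bits 0..5.
        "raised": [name for bit, name in enumerate(EXCEPTIONS)
                   if (value >> bit) & 1],
        # Exception mask bits, bits 7..12 (same order as the flags).
        "masked": [name for bit, name in enumerate(EXCEPTIONS)
                   if (value >> (bit + 7)) & 1],
        "daz": bool((value >> 6) & 1),       # denormals-are-zero
        "rounding": ROUNDING_MODES[(value >> 13) & 3],
        "ftz": bool((value >> 15) & 1),      # flush-to-zero
    }
```

Decoding the power-on default 0x1F80 gives round-to-nearest with all six exceptions masked and no flags raised.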

Agent Synchronization Instructions

Instruction📄Meaning
MONITORℹ️Sets up an address range used to monitor write-back stores
MWAITℹ️Enables a processor to enter into an optimized state while waiting for a write-back store to the address range set up by the MONITOR instruction

Cacheability Control, Prefetch and Ordering Instructions

Cacheability control instructions provide additional operations for caching of non-temporal data when storing data from SIMD registers to memory. They provide additional control of instruction ordering on store operations.

Instruction📄Meaning
Read Prefetch
PREFETCHT0ℹ️Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T0 hint
PREFETCHT1ℹ️Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T1 hint
PREFETCHT2ℹ️Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T2 hint
PREFETCHNTAℹ️Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using NTA hint
Write Prefetch
PREFETCHWℹ️Prefetch data into caches in anticipation of a write
PREFETCHWT1 Prefetch data into caches with intent to write and T1 hint
Cache Line Maintenance
CLFLUSHℹ️Flushes and invalidates a memory operand and its associated cache line from all levels of the processor’s cache hierarchy
CLFLUSHOPTℹ️Flushes and invalidates a memory operand and its associated cache line from all levels of the processor’s cache hierarchy with optimized memory system throughput
CLWBℹ️Cache line write back
CLDEMOTEℹ️Cache line demote
Non-Temporal Stores
MOVNTIℹ️Non-temporal store of a double word from a general-purpose register into memory
VMOVNTPSℹ️Non-temporal store of packed single-precision floating-point values from a YMM register into memory
VMOVNTPDℹ️Non-temporal store of packed double-precision floating-point values from a YMM register into memory
VMOVNTDQℹ️Non-temporal store of packed integer values from a YMM register into memory
VMASKMOVDQUℹ️Non-temporal store of selected bytes from an XMM register into memory
Direct Loads/Stores
MOVDIRIℹ️Move double word as direct store
MOVDIR64Bℹ️Move 64 bytes as direct store
VMOVNTDQAℹ️Provides a non-temporal hint that can cause adjacent 16-byte items within an aligned 64-byte region (a streaming line) to be fetched and held in a small set of temporary buffers ("streaming load buffers"). Subsequent streaming loads to other aligned 16-byte items in the same streaming line may be supplied from the streaming load buffer and can improve throughput
VLDDQUℹ️Special 128-bit unaligned load designed to avoid cache line splits
Memory Barriers (Fences)
LFENCEℹ️Serializes load operations
SFENCEℹ️Serializes store operations
MFENCEℹ️Serializes load and store operations
Instruction Serialization
SERIALIZEℹ️Serialize instruction execution
Spin-Wait Optimization
PAUSEℹ️Improves the performance of "spin-wait loops"
TPAUSEℹ️Instructs the processor to enter an implementation-dependent optimized state
Copyright 2012-2026 Eugene Zamlinsky. All rights reserved.