# Single Instruction Multiple Data (SIMD) Instruction Set

Beginning with the Pentium II and Pentium with Intel MMX technology processor families, many extensions have been introduced into the Intel 64 and IA-32 architectures to perform single-instruction multiple-data (SIMD) operations. These extensions include MMX technology, SSE, SSE2, SSE3, SSE4, AVX, AVX2 and AVX-512. Each of these extensions provides a group of instructions that perform SIMD operations on packed integer and/or packed floating-point data elements.

## Contents

- AVX Initialization Instructions
- Data Transfer Instructions
  - Integer Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Broadcast Instructions
  - Byte Operands
  - Word Operands
  - Double Word Operands
  - Quad Word Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Expand Instructions
- Compress Instructions
- Insert Instructions
  - Integer Operands
  - Floating-point Operands
- Extract Instructions
  - Integer Operands
  - Floating-point Operands
- Gather Instructions
  - Double Word Operands
  - Quad Word Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Scatter Instructions
  - Double Word Operands
  - Quad Word Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Blending Instructions
  - Byte Operands
  - Word Operands
  - Double Word Operands
  - Quad Word Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Shuffle Instructions
  - Byte Operands
  - Word Operands
  - Double Word Operands
  - Quad Word Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Permute Instructions
  - Word Operands
  - Double Word Operands
  - Quad Word Operands
  - 128-bit Integer Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
  - 128-bit Floating-point Operands
- Unpack Instructions
  - Byte Operands
  - Word Operands
  - Double Word Operands
  - Quad Word Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Pack Instructions
  - Words into Bytes
  - Double Words into Words
- Conversion Instructions
  - Byte to Word
  - Byte to Double Word
  - Byte to Quad Word
  - Word to Byte
  - Word to Double Word
  - Word to Quad Word
  - Double Word to Byte
  - Double Word to Word
  - Double Word to Quad Word
  - Quad Word to Byte
  - Quad Word to Word
  - Quad Word to Double Word
  - Double Word to Single Precision Floating-point
  - Double Word to Double Precision Floating-point
  - Quad Word to Single Precision Floating-point
  - Quad Word to Double Precision Floating-point
  - Half Precision Floating-point to Single Precision Floating-point
  - Single Precision Floating-point to Double Word
  - Single Precision Floating-point to Quad Word
  - Single Precision Floating-point to Half Precision Floating-point
  - Single Precision Floating-point to Double Precision Floating-point
  - Double Precision Floating-point to Double Word
  - Double Precision Floating-point to Quad Word
  - Double Precision Floating-point to Single Precision Floating-point
- Logical Instructions
  - Byte Operands
  - Word Operands
  - Double Word Operands
  - Quad Word Operands
  - Integer Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Shift and Rotate Instructions
  - Word Operands
  - Double Word Operands
  - Quad Word Operands
  - Double Quad Word Operands
- Comparison Instructions
  - Byte Operands
  - Word Operands
  - Double Word Operands
  - Quad Word Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Packed Arithmetic Instructions
  - Byte Operands
  - Word Operands
  - Double Word Operands
  - Quad Word Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Fused Arithmetic Instructions
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Primitives of Functions
  - Byte Operands
  - Word Operands
  - Double Word Operands
  - Quad Word Operands
  - Single Precision Floating-point Operands
  - Double Precision Floating-point Operands
- Opmask Instructions
  - 8-bit Operands
  - 16-bit Operands
  - 32-bit Operands
  - 64-bit Operands
- String and Text Processing Instructions
- Secure Hash Algorithm Instructions
- State Management Instructions
- Agent Synchronization Instructions
- Cacheability Control, Prefetch and Ordering Instructions

**Tip:** For detailed information about each instruction, see the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 2: Instruction Set Reference, A-Z.

# AVX Initialization Instructions

Instruction | Meaning |
---|---|
VZEROALL | Zero all YMM registers |
VZEROUPPER | Zero the upper 128 bits of all YMM registers |

# Data Transfer Instructions

Instruction | Meaning |
---|---|
Integer Operands | |
VMOVD | Move double word |
VMOVQ | Move quad word |
VMOVDQA | Move aligned double quad words |
VMOVDQA32 | Move aligned packed double word integer values using writemask |
VMOVDQA64 | Move aligned packed quad word integer values using writemask |
VMOVDQU | Move unaligned double quad words |
VMOVDQU8 | Move unaligned packed byte integer values using writemask |
VMOVDQU16 | Move unaligned packed word integer values using writemask |
VMOVDQU32 | Move unaligned packed double word integer values using writemask |
VMOVDQU64 | Move unaligned packed quad word integer values using writemask |
VMOVSLDUP | Loads/moves 128 bits duplicating the first and third 32-bit data elements |
VMOVSHDUP | Loads/moves 128 bits duplicating the second and fourth 32-bit data elements |
VMOVDDUP | Loads/moves 128 bits duplicating the lower 64-bit data elements |
VPMASKMOVD | Conditional SIMD integer packed loads and stores of double word values |
VPMASKMOVQ | Conditional SIMD integer packed loads and stores of quad word values |
VPMOVMSKB | Move byte mask |
VPALIGNR | Concatenate destination and source operands, extract a byte-aligned result shifted to the right by a constant value |
VALIGND | Shift right and merge vectors with double word granularity using an immediate shift value |
VALIGNQ | Shift right and merge vectors with quad word granularity using an immediate shift value |
Single Precision Floating-point Operands | |
VMOVSS | Move scalar single-precision floating-point value between XMM registers or between an XMM register and memory |
VMOVAPS | Move aligned packed single-precision floating-point values between YMM registers or between a YMM register and memory |
VMOVUPS | Move unaligned packed single-precision floating-point values between YMM registers or between a YMM register and memory |
VMOVLPS | Move two packed single-precision floating-point values between the low quad word of an XMM register and memory |
VMOVHPS | Move two packed single-precision floating-point values between the high quad word of an XMM register and memory |
VMOVLHPS | Move two packed single-precision floating-point values from the low quad word of an XMM register to the high quad word of another XMM register |
VMOVHLPS | Move two packed single-precision floating-point values from the high quad word of an XMM register to the low quad word of another XMM register |
VMASKMOVPS | Conditional SIMD packed loads and stores of single-precision floating-point values |
VMOVMSKPS | Extract sign mask from packed single-precision floating-point values |
Double Precision Floating-point Operands | |
VMOVSD | Move scalar double-precision floating-point value between XMM registers or between an XMM register and memory |
VMOVAPD | Move aligned packed double-precision floating-point values between YMM registers or between a YMM register and memory |
VMOVUPD | Move unaligned packed double-precision floating-point values between YMM registers or between a YMM register and memory |
VMOVLPD | Move a double-precision floating-point value between the low quad word of an XMM register and memory |
VMOVHPD | Move a double-precision floating-point value between the high quad word of an XMM register and memory |
VMASKMOVPD | Conditional SIMD packed loads and stores of double-precision floating-point values |
VMOVMSKPD | Extract sign mask from packed double-precision floating-point values |
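
The sign-mask moves (VMOVMSKPS, VMOVMSKPD) collect one bit per element. The block below is a minimal scalar sketch of that behaviour in Python, not the hardware definition. The helper name `movmskps` is hypothetical, and each input is assumed to be rounded to IEEE 754 binary32 before its sign bit is read.

```python
import struct


def movmskps(values):
    """Scalar model of VMOVMSKPS: bit i of the result is the sign bit of values[i]."""
    mask = 0
    for i, value in enumerate(values):
        # Reinterpret the binary32 encoding as an unsigned 32-bit integer.
        bits = struct.unpack("<I", struct.pack("<f", value))[0]
        mask |= (bits >> 31) << i
    return mask


# Elements 1 and 3 have their sign bit set (note that -0.0 counts).
print(bin(movmskps([1.0, -2.0, 0.0, -0.0])))  # prints 0b1010
```

Only the sign bit is inspected, so negative zero contributes a set bit just like any other negative value.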

# Broadcast Instructions

Instruction | Meaning |
---|---|
Byte Operands | |
VPBROADCASTB | Broadcast a byte integer value to all elements of a register |
VPBROADCASTMB2Q | Broadcast the low byte of an opmask register to all quad word elements of a register |
Word Operands | |
VPBROADCASTW | Broadcast a word integer value to all elements of a register |
VPBROADCASTMW2D | Broadcast the low word of an opmask register to all double word elements of a register |
Double Word Operands | |
VPBROADCASTD | Broadcast a double word integer value to all elements of a register |
VBROADCASTI32X2 | Broadcast two double word values to all elements of a register |
VBROADCASTI32X4 | Broadcast four double word values to all elements of a register |
VBROADCASTI32X8 | Broadcast eight double word values to all elements of a register |
Quad Word Operands | |
VPBROADCASTQ | Broadcast a quad word integer value to all elements of a register |
VBROADCASTI64X2 | Broadcast two quad word values to all elements of a register |
VBROADCASTI64X4 | Broadcast four quad word values to all elements of a register |
Single Precision Floating-point Operands | |
VBROADCASTSS | Broadcast a single-precision floating-point value to all elements of a register |
VBROADCASTF32X2 | Broadcast two single-precision floating-point values to all elements of a register |
VBROADCASTF32X4 | Broadcast four single-precision floating-point values to all elements of a register |
VBROADCASTF32X8 | Broadcast eight single-precision floating-point values to all elements of a register |
Double Precision Floating-point Operands | |
VBROADCASTSD | Broadcast a double-precision floating-point value to all elements of a register |
VBROADCASTF64X2 | Broadcast two double-precision floating-point values to all elements of a register |
VBROADCASTF64X4 | Broadcast four double-precision floating-point values to all elements of a register |
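
The scalar and tuple broadcasts differ only in how many source elements are repeated. As a rough illustration, the Python sketch below models a register as a list of elements. The function name `broadcast` and its parameters are hypothetical, and writemasking is left out.

```python
def broadcast(group, dest_elems):
    """Repeat the source group until dest_elems destination elements are filled."""
    if dest_elems % len(group) != 0:
        raise ValueError("destination must hold a whole number of groups")
    return list(group) * (dest_elems // len(group))


# Like VPBROADCASTD into a 256-bit register: one double word, eight copies.
print(broadcast([7], 8))
# Like VBROADCASTI32X4 into a 512-bit register: a four double word tuple, four copies.
print(broadcast([1, 2, 3, 4], 16))
```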

# Expand Instructions

Instruction | Meaning |
---|---|
VPEXPANDD | Load sparse packed double word integer values from dense memory |
VPEXPANDQ | Load sparse packed quad word integer values from dense memory |
VEXPANDPS | Load sparse packed single-precision floating-point values from dense memory |
VEXPANDPD | Load sparse packed double-precision floating-point values from dense memory |
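
An expand reads contiguous source elements and places them at the destination positions selected by the writemask. The Python sketch below is a scalar model under simplifying assumptions: registers are lists, the mask is an integer with bit j controlling element j, and the names `expand`, `dense`, and `zeroing` are hypothetical.

```python
def expand(dense, mask, dest, zeroing=False):
    """Place consecutive elements of `dense` at the positions whose mask bit is set."""
    out = []
    next_src = 0
    for j, old in enumerate(dest):
        if (mask >> j) & 1:
            out.append(dense[next_src])  # take the next contiguous source element
            next_src += 1
        else:
            # Unselected elements keep the old value (merging) or become zero (zeroing).
            out.append(0 if zeroing else old)
    return out


print(expand([10, 20, 30], 0b00101001, [-1] * 8))
# prints [10, -1, -1, 20, -1, 30, -1, -1]
print(expand([10, 20, 30], 0b00101001, [-1] * 8, zeroing=True))
# prints [10, 0, 0, 20, 0, 30, 0, 0]
```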

# Compress Instructions

Instruction | Meaning |
---|---|
VPCOMPRESSD | Store sparse packed double word integer values into dense memory |
VPCOMPRESSQ | Store sparse packed quad word integer values into dense memory |
VCOMPRESSPS | Store sparse packed single-precision floating-point values into dense memory |
VCOMPRESSPD | Store sparse packed double-precision floating-point values into dense memory |
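
A compress does the opposite of an expand: the elements selected by the writemask are written out contiguously. The following Python sketch models only the memory-destination case, where nothing beyond the selected elements is written. The helper name `compress` is hypothetical.

```python
def compress(src, mask):
    """Return the elements of src whose mask bit is set, packed together in order."""
    return [value for j, value in enumerate(src) if (mask >> j) & 1]


# Mask bits 0, 3 and 5 are set, so elements 0, 3 and 5 are stored back to back.
print(compress([0, 11, 22, 33, 44, 55, 66, 77], 0b00101001))  # prints [0, 33, 55]
```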

# Insert Instructions

Instruction | Meaning |
---|---|
Integer Operands | |
VPINSRB | Insert a byte value from a register or memory into an XMM register |
VPINSRW | Insert a word value from a register or memory into an XMM register |
VPINSRD | Insert a double word value from a register or memory into an XMM register |
VPINSRQ | Insert a quad word value from a register or memory into an XMM register |
VINSERTI128 | Insert 128 bits of packed integer values from the source into the destination operand |
VINSERTI32X4 | Insert 128 bits of packed integer values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand |
VINSERTI64X2 | Insert 128 bits of packed integer values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand |
VINSERTI32X8 | Insert 256 bits of packed integer values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand |
VINSERTI64X4 | Insert 256 bits of packed integer values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand |
Floating-point Operands | |
VINSERTPS | Insert a single-precision floating-point value, taken either from a 32-bit memory location or from a specified offset in an XMM register, at a specified offset in the destination XMM register. In addition, VINSERTPS can zero selected data elements in the destination using a mask |
VINSERTF128 | Insert 128 bits of packed floating-point values from the source into the destination operand |
VINSERTF32X4 | Insert 128 bits of packed floating-point values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand |
VINSERTF64X2 | Insert 128 bits of packed floating-point values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand |
VINSERTF32X8 | Insert 256 bits of packed floating-point values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand |
VINSERTF64X4 | Insert 256 bits of packed floating-point values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand |
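
The lane inserts copy the first source operand and overwrite one aligned block with the inserted data. The Python sketch below models this with lists. The name `insert_lane` and its parameters are hypothetical, the lane width is taken from the length of the inserted list, and writemasking is not modelled.

```python
def insert_lane(src1, lane, index):
    """Copy src1 and replace the block at lane-granular position `index` with `lane`."""
    width = len(lane)
    if len(src1) % width != 0 or not 0 <= index < len(src1) // width:
        raise ValueError("lane does not fit at the requested offset")
    out = list(src1)  # remaining portions come from the first source operand
    out[index * width:(index + 1) * width] = lane
    return out


# Like VINSERTI128 on eight double words: replace the upper 128-bit half.
print(insert_lane([0, 1, 2, 3, 4, 5, 6, 7], [90, 91, 92, 93], 1))
# prints [0, 1, 2, 3, 90, 91, 92, 93]
```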

# Extract Instructions

Instruction | Meaning |
---|---|
Integer Operands | |
VPEXTRB | Extract a byte from an XMM register and store the value in a general-purpose register or memory |
VPEXTRW | Extract a word from an XMM register and store the value in a general-purpose register or memory |
VPEXTRD | Extract a double word from an XMM register and store the value in a general-purpose register or memory |
VPEXTRQ | Extract a quad word from an XMM register and store the value in a general-purpose register or memory |
VEXTRACTI128 | Extract 128 bits of packed integer values from the source operand and store them in the low 128 bits of the destination operand |
VEXTRACTI32X4 | Extract 128 bits of packed integer values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand |
VEXTRACTI64X2 | Extract 128 bits of packed integer values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand |
VEXTRACTI32X8 | Extract 256 bits of packed integer values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand |
VEXTRACTI64X4 | Extract 256 bits of packed integer values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand |
Floating-point Operands | |
VEXTRACTPS | Extract a single-precision floating-point value from a specified offset in an XMM register and store the result in memory or a general-purpose register |
VEXTRACTF128 | Extract 128 bits of packed floating-point values from the source operand and store them in the low 128 bits of the destination operand |
VEXTRACTF32X4 | Extract 128 bits of packed floating-point values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand |
VEXTRACTF64X2 | Extract 128 bits of packed floating-point values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand |
VEXTRACTF32X8 | Extract 256 bits of packed floating-point values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand |
VEXTRACTF64X4 | Extract 256 bits of packed floating-point values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand |
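
The lane extracts select one aligned block of the source, with the offset given in units of the block size. A minimal Python sketch, assuming a register is a list of elements and using the hypothetical helper name `extract_lane`:

```python
def extract_lane(src, width, index):
    """Return the block of `width` elements at lane-granular position `index`."""
    if len(src) % width != 0 or not 0 <= index < len(src) // width:
        raise ValueError("no such lane in the source")
    return src[index * width:(index + 1) * width]


# Like VEXTRACTI32X4 on sixteen double words: lane 2 holds elements 8 to 11.
print(extract_lane(list(range(16)), 4, 2))  # prints [8, 9, 10, 11]
```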

# Gather Instructions

Instruction | Meaning |
---|---|
Double Word Operands | |
VPGATHERDD | Gather packed double word values using signed double word indices |
VPGATHERQD | Gather packed double word values using signed quad word indices |
Quad Word Operands | |
VPGATHERDQ | Gather packed quad word values using signed double word indices |
VPGATHERQQ | Gather packed quad word values using signed quad word indices |
Single Precision Floating-point Operands | |
VGATHERDPS | Gather packed single-precision floating-point values using signed double word indices |
VGATHERQPS | Gather packed single-precision floating-point values using signed quad word indices |
VGATHERPF0DPS | Sparse prefetch of packed single-precision floating-point values with signed double word indices using T0 hint |
VGATHERPF1DPS | Sparse prefetch of packed single-precision floating-point values with signed double word indices using T1 hint |
VGATHERPF0QPS | Sparse prefetch of packed single-precision floating-point values with signed quad word indices using T0 hint |
VGATHERPF1QPS | Sparse prefetch of packed single-precision floating-point values with signed quad word indices using T1 hint |
Double Precision Floating-point Operands | |
VGATHERDPD | Gather packed double-precision floating-point values using signed double word indices |
VGATHERQPD | Gather packed double-precision floating-point values using signed quad word indices |
VGATHERPF0DPD | Sparse prefetch of packed double-precision floating-point values with signed double word indices using T0 hint |
VGATHERPF1DPD | Sparse prefetch of packed double-precision floating-point values with signed double word indices using T1 hint |
VGATHERPF0QPD | Sparse prefetch of packed double-precision floating-point values with signed quad word indices using T0 hint |
VGATHERPF1QPD | Sparse prefetch of packed double-precision floating-point values with signed quad word indices using T1 hint |
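
A gather loads each selected element from its own address, computed from a base and a per-element signed index. The Python sketch below is a simplified model: memory is a list of elements, the scale factor is folded into one element per index step, and the names `gather`, `memory`, and `base` are hypothetical.

```python
def gather(memory, base, indices, mask, dest):
    """Load memory[base + indices[i]] into element i wherever mask bit i is set."""
    out = list(dest)
    for i, index in enumerate(indices):
        if (mask >> i) & 1:
            address = base + index  # indices are signed, so they may point below base
            if not 0 <= address < len(memory):
                raise IndexError("gather address outside the modelled memory")
            out[i] = memory[address]
    return out


memory = list(range(100, 116))
# Element 2 is masked off and keeps its old value.
print(gather(memory, 4, [0, -2, 5, 1], 0b1011, [-1] * 4))
# prints [104, 102, -1, 105]
```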

# Scatter Instructions

Instruction | Meaning |
---|---|
Double Word Operands | |
VPSCATTERDD | Using signed double word indices, scatter double word values to memory using writemask |
VPSCATTERQD | Using signed quad word indices, scatter double word values to memory using writemask |
Quad Word Operands | |
VPSCATTERDQ | Using signed double word indices, scatter quad word values to memory using writemask |
VPSCATTERQQ | Using signed quad word indices, scatter quad word values to memory using writemask |
Single Precision Floating-point Operands | |
VSCATTERDPS | Using signed double word indices, scatter single-precision floating-point values to memory using writemask |
VSCATTERQPS | Using signed quad word indices, scatter single-precision floating-point values to memory using writemask |
VSCATTERPF0DPS | Using signed double word indices, prefetch sparse single-precision floating-point values using writemask and T0 hint with intent to write |
VSCATTERPF1DPS | Using signed double word indices, prefetch sparse single-precision floating-point values using writemask and T1 hint with intent to write |
VSCATTERPF0QPS | Using signed quad word indices, prefetch sparse single-precision floating-point values using writemask and T0 hint with intent to write |
VSCATTERPF1QPS | Using signed quad word indices, prefetch sparse single-precision floating-point values using writemask and T1 hint with intent to write |
Double Precision Floating-point Operands | |
VSCATTERDPD | Using signed double word indices, scatter double-precision floating-point values to memory using writemask |
VSCATTERQPD | Using signed quad word indices, scatter double-precision floating-point values to memory using writemask |
VSCATTERPF0DPD | Using signed double word indices, prefetch sparse double-precision floating-point values using writemask and T0 hint with intent to write |
VSCATTERPF1DPD | Using signed double word indices, prefetch sparse double-precision floating-point values using writemask and T1 hint with intent to write |
VSCATTERPF0QPD | Using signed quad word indices, prefetch sparse double-precision floating-point values using writemask and T0 hint with intent to write |
VSCATTERPF1QPD | Using signed quad word indices, prefetch sparse double-precision floating-point values using writemask and T1 hint with intent to write |
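
A scatter stores each selected element to its own address. The Python sketch below uses the same simplifications as a list-based memory model: one element per index step, an integer writemask, and hypothetical names (`scatter`, `memory`, `base`). Elements are processed from lowest to highest, so when two indices coincide in this model the higher-numbered element is the one that remains.

```python
def scatter(memory, base, indices, values, mask):
    """Store values[i] to memory[base + indices[i]] wherever mask bit i is set."""
    for i, index in enumerate(indices):
        if (mask >> i) & 1:
            address = base + index
            if not 0 <= address < len(memory):
                raise IndexError("scatter address outside the modelled memory")
            memory[address] = values[i]
    return memory


# Elements 0 and 2 both target address 1; the later element (30) remains.
print(scatter([0] * 8, 0, [1, 3, 1, 6], [10, 20, 30, 40], 0b1111))
# prints [0, 30, 0, 20, 0, 0, 40, 0]
```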

# Blending Instructions

Instruction | Meaning |
---|---|
Byte Operands | |
VPBLENDVB | Conditionally copies specified byte elements in the source operand to the destination, using an implied mask |
VPBLENDMB | Performs blending of byte elements between the first and the second operand (register or memory), using the instruction mask selector |
Word Operands | |
VPBLENDW | Conditionally copies specified word elements in the source operand to the destination, using an immediate byte control |
VPBLENDMW | Performs blending of word elements between the first and the second operand (register or memory), using the instruction mask selector |
Double Word Operands | |
VPBLENDD | Conditionally copies specified double word elements in the source operand to the destination, using an immediate byte control |
VPBLENDMD | Performs blending of double word elements between the first and the second operand (register or memory), using the instruction mask selector |
Quad Word Operands | |
VPBLENDMQ | Performs blending of quad word elements between the first and the second operand (register or memory), using the instruction mask selector |
Single Precision Floating-point Operands | |
VBLENDPS | Conditionally copies specified data elements in the source operand to the destination, using an immediate byte control |
VBLENDVPS | Conditionally copies specified data elements in the source operand to the destination, using an implied mask |
VBLENDMPS | Performs blending between single-precision elements in the first operand with the elements in the second operand using an opmask register as select control |
Double Precision Floating-point Operands | |
VBLENDPD | Conditionally copies specified data elements in the source operand to the destination, using an immediate byte control |
VBLENDVPD | Conditionally copies specified data elements in the source operand to the destination, using an implied mask |
VBLENDMPD | Performs blending between double-precision elements in the first operand with the elements in the second operand using an opmask register as select control |
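
All blends make a per-element choice between two sources; they differ in where the selector comes from (an immediate, an opmask register, or the top bit of each element of a mask operand). The Python sketch below models both styles on lists of signed integers. The names `blend` and `blend_variable` are hypothetical.

```python
def blend(first, second, mask):
    """Take element i from `second` when mask bit i is set, otherwise from `first`."""
    return [b if (mask >> i) & 1 else a
            for i, (a, b) in enumerate(zip(first, second))]


def blend_variable(first, second, selector):
    """Variable blend: the sign (top) bit of each selector element drives the choice."""
    mask = sum(1 << i for i, s in enumerate(selector) if s < 0)
    return blend(first, second, mask)


print(blend([0, 1, 2, 3], [10, 11, 12, 13], 0b0101))
# prints [10, 1, 12, 3]
print(blend_variable([0, 1, 2, 3], [10, 11, 12, 13], [-1, 0, 5, -7]))
# prints [10, 1, 2, 13]
```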

# Shuffle Instructions

Instruction | Meaning |
---|---|
Byte Operands | |
VPSHUFB | Shuffle packed byte values |
Word Operands | |
VPSHUFLW | Shuffle packed low words |
VPSHUFHW | Shuffle packed high words |
Double Word Operands | |
VPSHUFD | Shuffle packed double words |
VSHUFI32X4 | Shuffle packed double word values at 128-bit granularity |
Quad Word Operands | |
VSHUFI64X2 | Shuffle packed quad word values at 128-bit granularity |
Single Precision Floating-point Operands | |
VSHUFPS | Shuffles values in packed single-precision floating-point operands |
VSHUFF32X4 | Shuffle packed single-precision floating-point values at 128-bit granularity |
Double Precision Floating-point Operands | |
VSHUFPD | Shuffles values in packed double-precision floating-point operands |
VSHUFF64X2 | Shuffle packed double-precision floating-point values at 128-bit granularity |
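
As one concrete case, VPSHUFD uses an 8-bit immediate in which each 2-bit field picks a source double word, and the same selection is applied within every 128-bit lane. The Python sketch below models that on a list of double words. The helper name `pshufd` is hypothetical and writemasking is omitted.

```python
def pshufd(src, imm8):
    """Within each group of four double words, element i takes src[field i of imm8]."""
    if len(src) % 4 != 0:
        raise ValueError("source must hold whole 128-bit lanes (4 double words each)")
    out = []
    for lane_start in range(0, len(src), 4):
        for i in range(4):
            selector = (imm8 >> (2 * i)) & 0b11  # 2-bit field for element i
            out.append(src[lane_start + selector])
    return out


# 0x1B = 0b00_01_10_11 reverses the double words inside each lane.
print(pshufd([0, 1, 2, 3, 4, 5, 6, 7], 0x1B))  # prints [3, 2, 1, 0, 7, 6, 5, 4]
```

The shuffle never moves data across a 128-bit lane boundary in this model, which is the main practical difference from the permute instructions.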

# Permute Instructions

Instruction | Meaning |
---|---|
Word Operands | |
VPERMW | Permute packed word elements |
VPERMI2W | Permute packed word elements from two tables using indexes |
Double Word Operands | |
VPERMD | Permute packed double word elements |
VPERMI2D | Permute packed double word elements from two tables using indexes |
Quad Word Operands | |
VPERMQ | Permute packed quad word elements |
VPERMI2Q | Permute packed quad word elements from two tables using indexes |
128-bit Integer Operands | |
VPERM2I128 | Permute 128-bit integer fields using controls |
Single Precision Floating-point Operands | |
VPERMPS | Permute packed single-precision floating-point elements |
VPERMILPS | Permute packed single-precision floating-point elements using controls |
VPERMI2PS | Permute packed single-precision floating-point elements from two tables using indexes |
Double Precision Floating-point Operands | |
VPERMPD | Permute packed double-precision floating-point elements |
VPERMILPD | Permute packed double-precision floating-point elements using controls |
VPERMI2PD | Permute packed double-precision floating-point elements from two tables using indexes |
128-bit Floating-point Operands | |
VPERM2F128 | Permute 128-bit floating-point fields using controls |
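
The difference between a single-table permute such as VPERMD and a two-table permute such as VPERMI2D can be sketched in a few lines of Python. This is a simplified model with hypothetical names (`permute`, `permute2`): the element count is assumed to be a power of two, only the low index bits are used, and in the two-table form the next higher index bit selects the second table.

```python
def permute(src, indices):
    """Each result element i is src[indices[i]], using only the low index bits."""
    n = len(src)
    return [src[index & (n - 1)] for index in indices]


def permute2(table_a, table_b, indices):
    """Two-table form: one extra index bit chooses between table_a and table_b."""
    n = len(table_a)
    both = list(table_a) + list(table_b)
    return [both[index & (2 * n - 1)] for index in indices]


print(permute([10, 11, 12, 13, 14, 15, 16, 17], [7, 6, 5, 4, 3, 2, 1, 0]))
# prints [17, 16, 15, 14, 13, 12, 11, 10]
print(permute2(list(range(8)), list(range(100, 108)), [0, 8, 1, 9, 15, 7, 3, 11]))
# prints [0, 100, 1, 101, 107, 7, 3, 103]
```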

# Unpack Instructions

Instruction | Meaning |
---|---|
Byte Operands | |
VPUNPCKLBW | Unpack low-order bytes |
VPUNPCKHBW | Unpack high-order bytes |
Word Operands | |
VPUNPCKLWD | Unpack low-order words |
VPUNPCKHWD | Unpack high-order words |
Double Word Operands | |
VPUNPCKLDQ | Unpack low-order double words |
VPUNPCKHDQ | Unpack high-order double words |
Quad Word Operands | |
VPUNPCKLQDQ | Unpack low quad words |
VPUNPCKHQDQ | Unpack high quad words |
Single Precision Floating-point Operands | |
VUNPCKLPS | Unpacks and interleaves the two low-order values from two single-precision floating-point operands |
VUNPCKHPS | Unpacks and interleaves the two high-order values from two single-precision floating-point operands |
Double Precision Floating-point Operands | |
VUNPCKLPD | Unpacks and interleaves the low values from two packed double-precision floating-point operands |
VUNPCKHPD | Unpacks and interleaves the high values from two packed double-precision floating-point operands |
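
Unpacking interleaves elements from two sources, taking either the low or the high half of each 128-bit lane. The Python sketch below models this on lists. The names `unpack_low`, `unpack_high`, and `lane_elems` (elements per 128-bit lane, for example 4 for double words) are hypothetical.

```python
def _interleave(first, second, lane_elems, high):
    out = []
    half = lane_elems // 2
    offset = half if high else 0
    for lane_start in range(0, len(first), lane_elems):
        for i in range(offset, offset + half):
            # The first source supplies the even positions, the second the odd ones.
            out += [first[lane_start + i], second[lane_start + i]]
    return out


def unpack_low(first, second, lane_elems):
    """Interleave the low half of each lane of the two sources."""
    return _interleave(first, second, lane_elems, high=False)


def unpack_high(first, second, lane_elems):
    """Interleave the high half of each lane of the two sources."""
    return _interleave(first, second, lane_elems, high=True)


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10, 11, 12, 13, 14, 15, 16, 17]
print(unpack_low(a, b, 4))   # prints [0, 10, 1, 11, 4, 14, 5, 15]
print(unpack_high(a, b, 4))  # prints [2, 12, 3, 13, 6, 16, 7, 17]
```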

# Pack Instructions

Instruction | Meaning |
---|---|
Words into Bytes | |
VPACKSSWB | Pack words into bytes with signed saturation |
VPACKUSWB | Pack words into bytes with unsigned saturation |
Double Words into Words | |
VPACKSSDW | Pack double words into words with signed saturation |
VPACKUSDW | Pack double words into words with unsigned saturation |
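
Packing narrows signed source elements and clamps out-of-range values to the limits of the destination type. The Python sketch below illustrates signed and unsigned saturation for the word-to-byte case. The names `pack`, `packsswb`, and `packuswb` are hypothetical helpers, and `lane_elems` is the number of source elements per 128-bit lane (8 for words).

```python
def pack(first, second, low, high, lane_elems):
    """Per 128-bit lane: saturated elements of `first`, then those of `second`."""
    out = []
    for lane_start in range(0, len(first), lane_elems):
        for source in (first, second):
            lane = source[lane_start:lane_start + lane_elems]
            out += [max(low, min(high, value)) for value in lane]
    return out


def packsswb(first, second):
    """Signed words to signed bytes, clamped to [-128, 127]."""
    return pack(first, second, -128, 127, 8)


def packuswb(first, second):
    """Signed words to unsigned bytes, clamped to [0, 255]."""
    return pack(first, second, 0, 255, 8)


words = [300, -300, 5, -5, 127, 128, -128, -129]
print(packsswb(words, [0] * 8)[:8])  # prints [127, -128, 5, -5, 127, 127, -128, -128]
print(packuswb(words, [0] * 8)[:8])  # prints [255, 0, 5, 0, 127, 128, 0, 0]
```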

# Conversion Instructions

Instruction | Meaning |
---|---|

Byte to Word | |

VPMOVSXBW | Sign extend the lower 8-bit integer of each packed word element into packed signed word integers |

VPMOVZXBW | Zero extend the lower 8-bit integer of each packed word element into packed signed word integers |

Byte to Double Word | |

VPMOVSXBD | Sign extend the lower 8-bit integer of each packed double word element into packed signed double word integers |

VPMOVZXBD | Zero extend the lower 8-bit integer of each packed double word element into packed signed double word integers |

Byte to Quad Word | |

VPMOVSXBQ | Sign extend the lower 8-bit integer of each packed quad word element into packed signed quad word integers |

VPMOVZXBQ | Zero extend the lower 8-bit integer of each packed quad word element into packed signed quad word integers |

Word to Byte | |

VPMOVWB | Converts packed word integers into packed bytes with truncation |

VPMOVSWB | Converts packed signed word integers into packed signed bytes using signed saturation |

VPMOVUSWB | Converts packed unsigned word integers into packed unsigned bytes using unsigned saturation |

Word to Double Word | |

VPMOVSXWD | Sign extend the lower 16-bit integer of each packed double word element into packed signed double word integers |

VPMOVZXWD | Zero extend the lower 16-bit integer of each packed double word element into packed signed double word integers |

Word to Quad Word | |

VPMOVSXWQ | Sign extend the lower 16-bit integer of each packed quad word element into packed signed quad word integers |

VPMOVZXWQ | Zero extend the lower 16-bit integer of each packed quad word element into packed signed quad word integers |

Double Word to Byte | |

VPMOVDB | Converts packed double word integers into packed bytes with truncation |

VPMOVSDB | Converts packed signed double word integers into packed signed bytes using signed saturation |

VPMOVUSDB | Converts packed unsigned double word integers into packed unsigned bytes using unsigned saturation |

Double Word to Word | |

VPMOVDW | Converts packed double word integers into packed words with truncation |

VPMOVSDW | Converts packed signed double word integers into packed signed words using signed saturation |

VPMOVUSDW | Converts packed unsigned double word integers into packed unsigned words using unsigned saturation |

Double Word to Quad Word | |

VPMOVSXDQ | Sign extend the lower 32-bit integer of each packed quad word element into packed signed quad word integers |

VPMOVZXDQ | Zero extend the lower 32-bit integer of each packed quad word element into packed signed quad word integers |

Quad Word to Byte | |

VPMOVQB | Converts packed quad word integers into packed bytes with truncation |

VPMOVSQB | Converts packed signed quad word integers into packed signed bytes using signed saturation |

VPMOVUSQB | Converts packed unsigned quad word integers into packed unsigned bytes using unsigned saturation |

Quad Word to Word | |

VPMOVQW | Converts packed quad word integers into packed words with truncation |

VPMOVSQW | Converts packed signed quad word integers into packed signed words using signed saturation |

VPMOVUSQW | Converts packed unsigned quad word integers into packed unsigned words using unsigned saturation |

Quad Word to Double Word | |

VPMOVQD | Converts packed quad word integers into packed double words with truncation |

VPMOVSQD | Converts packed signed quad word integers into packed signed double words using signed saturation |

VPMOVUSQD | Converts packed unsigned quad word integers into packed unsigned double words using unsigned saturation |

Double Word to Single Precision Floating-point | |

VCVTSI2SS | Convert scalar signed double word integer to scalar single-precision floating-point value |

VCVTUSI2SS | Convert scalar unsigned double word integer to scalar single-precision floating-point value |

VCVTDQ2PS | Convert packed signed double word integers to packed single-precision floating-point values |

VCVTUDQ2PS | Convert packed unsigned double word integers to packed single-precision floating-point values |

Double Word to Double Precision Floating-point | |

VCVTSI2SD | Convert scalar signed double word integer to scalar double-precision floating-point value |

VCVTUSI2SD | Convert scalar unsigned double word integer to scalar double-precision floating-point value |

VCVTDQ2PD | Convert packed signed double word integers to packed double-precision floating-point values |

VCVTUDQ2PD | Convert packed unsigned double word integers to packed double-precision floating-point values |

Quad Word to Single Precision Floating-point | |

VCVTSI2SS | Convert scalar signed quad word integer to scalar single-precision floating-point value |

VCVTUSI2SS | Convert scalar unsigned quad word integer to scalar single-precision floating-point value |

VCVTQQ2PS | Convert packed signed quad word integers to packed single-precision floating-point values |

VCVTUQQ2PS | Convert packed unsigned quad word integers to packed single-precision floating-point values |

Quad Word to Double Precision Floating-point | |

VCVTSI2SD | Convert scalar signed quad word integer to scalar double-precision floating-point value |

VCVTUSI2SD | Convert scalar unsigned quad word integer to scalar double-precision floating-point value |

VCVTQQ2PD | Convert packed signed quad word integers to packed double-precision floating-point values |

VCVTUQQ2PD | Convert packed unsigned quad word integers to packed double-precision floating-point values |

Half Precision Floating-point to Single Precision Floating-point | |

VCVTPH2PS | Convert eight/four data element containing 16-bit floating-point data into eight/four single-precision floating-point data |

Single Precision Floating-point to Double Word | |

VCVTSS2SI | Convert scalar single-precision floating-point value to scalar signed double word integer |

VCVTSS2USI | Convert scalar single-precision floating-point value to scalar unsigned double word integer |

VCVTPS2DQ | Convert packed single-precision floating-point values to packed signed double word integers |

VCVTPS2UDQ | Convert packed single-precision floating-point values to packed unsigned double word integers |

VCVTTSS2SI | Convert with truncation scalar single-precision floating-point value to scalar signed double word integer |

VCVTTSS2USI | Convert with truncation scalar single-precision floating-point value to scalar unsigned double word integer |

VCVTTPS2DQ | Convert with truncation packed single-precision floating-point values to packed signed double word integers |

VCVTTPS2UDQ | Convert with truncation packed single-precision floating-point values to packed unsigned double word integers |

Single Precision Floating-point to Quad Word | |

VCVTSS2SI | Convert scalar single-precision floating-point value to scalar signed quad word integer |

VCVTSS2USI | Convert scalar single-precision floating-point value to scalar unsigned quad word integer |

VCVTPS2QQ | Convert packed single-precision floating-point values to packed signed quad word integers |

VCVTPS2UQQ | Convert packed single-precision floating-point values to packed unsigned quad word integers |

VCVTTSS2SI | Convert with truncation scalar single-precision floating-point value to scalar signed quad word integer |

VCVTTSS2USI | Convert with truncation scalar single-precision floating-point value to scalar unsigned quad word integer |

VCVTTPS2QQ | Convert with truncation packed single-precision floating-point values to packed signed quad word integers |

VCVTTPS2UQQ | Convert with truncation packed single-precision floating-point values to packed unsigned quad word integers |

Single Precision Floating-point to Half Precision Floating-point | |

VCVTPS2PH | Convert eight/four packed single-precision floating-point values to eight/four packed half-precision (16-bit) floating-point values |

Single Precision Floating-point to Double Precision Floating-point | |

VCVTSS2SD | Convert scalar single-precision floating-point value to scalar double-precision floating-point value |

VCVTPS2PD | Convert packed single-precision floating-point values to packed double-precision floating-point values |

Double Precision Floating-point to Double Word | |

VCVTSD2SI | Convert scalar double-precision floating-point value to scalar signed double word integer |

VCVTSD2USI | Convert scalar double-precision floating-point value to scalar unsigned double word integer |

VCVTPD2DQ | Convert packed double-precision floating-point values to packed signed double word integers |

VCVTPD2UDQ | Convert packed double-precision floating-point values to packed unsigned double word integers |

VCVTTSD2SI | Convert with truncation scalar double-precision floating-point value to scalar signed double word integer |

VCVTTSD2USI | Convert with truncation scalar double-precision floating-point value to scalar unsigned double word integer |

VCVTTPD2DQ | Convert with truncation packed double-precision floating-point values to packed signed double word integers |

VCVTTPD2UDQ | Convert with truncation packed double-precision floating-point values to packed unsigned double word integers |

Double Precision Floating-point to Quad Word | |

VCVTSD2SI | Convert scalar double-precision floating-point value to scalar signed quad word integer |

VCVTSD2USI | Convert scalar double-precision floating-point value to scalar unsigned quad word integer |

VCVTPD2QQ | Convert packed double-precision floating-point values to packed signed quad word integers |

VCVTPD2UQQ | Convert packed double-precision floating-point values to packed unsigned quad word integers |

VCVTTSD2SI | Convert with truncation scalar double-precision floating-point value to scalar signed quad word integer |

VCVTTSD2USI | Convert with truncation scalar double-precision floating-point value to scalar unsigned quad word integer |

VCVTTPD2QQ | Convert with truncation packed double-precision floating-point values to packed signed quad word integers |

VCVTTPD2UQQ | Convert with truncation packed double-precision floating-point values to packed unsigned quad word integers |

Double Precision Floating-point to Single Precision Floating-point | |

VCVTSD2SS | Convert scalar double-precision floating-point value to scalar single-precision floating-point value |

VCVTPD2PS | Convert packed double-precision floating-point values to packed single-precision floating-point values |
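
The table above lists each floating-point-to-integer conversion twice: a plain form (for example VCVTSS2SI) and a "with truncation" form (VCVTTSS2SI). The following is a minimal scalar sketch of the difference, not a description of the hardware. It assumes the default MXCSR rounding mode (round to nearest, ties to even) for the plain form and models only the signed double word case; the function name `cvt_f2i32` is hypothetical.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
# 0x80000000, the "integer indefinite" value returned by the signed
# conversions when the input is NaN or out of range.
INT32_INDEFINITE = -2**31


def cvt_f2i32(x: float, truncate: bool = False) -> int:
    """Emulate a scalar float -> signed 32-bit integer conversion.

    truncate=False models VCVTSS2SI / VCVTSD2SI under the assumed default
    rounding mode (round to nearest, ties to even).
    truncate=True models VCVTTSS2SI / VCVTTSD2SI (round toward zero).
    """
    if math.isnan(x) or math.isinf(x):
        return INT32_INDEFINITE
    # Python's round() on a float uses round-half-to-even.
    r = math.trunc(x) if truncate else round(x)
    if r < INT32_MIN or r > INT32_MAX:
        return INT32_INDEFINITE
    return r


print(cvt_f2i32(2.5), cvt_f2i32(3.5))          # ties go to the even integer
print(cvt_f2i32(-2.9), cvt_f2i32(-2.9, True))  # nearest vs. toward zero
```

The truncating forms ignore the rounding mode, which is why compilers typically use them for C-style float-to-integer casts.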

# Logical Instructions

Instruction | Meaning |
---|---|

Byte Operands | |

VPTESTMB | Performs a bitwise logical AND of packed byte integers and sets the mask bit of each element whose result is non-zero |

VPTESTNMB | Performs a bitwise logical AND of packed byte integers and sets the mask bit of each element whose result is zero |

Word Operands | |

VPTESTMW | Performs a bitwise logical AND of packed word integers and sets the mask bit of each element whose result is non-zero |

VPTESTNMW | Performs a bitwise logical AND of packed word integers and sets the mask bit of each element whose result is zero |

Double Word Operands | |

VPTESTMD | Performs a bitwise logical AND of packed double word integers and sets the mask bit of each element whose result is non-zero |

VPTESTNMD | Performs a bitwise logical AND of packed double word integers and sets the mask bit of each element whose result is zero |

VPANDD | Bitwise logical AND of packed double word integers |

VPANDND | Bitwise logical AND NOT of packed double word integers |

VPORD | Bitwise logical OR of packed double word integers |

VPXORD | Bitwise logical exclusive OR of packed double word integers |

VPTERNLOGD | Bitwise ternary logic with double word granularity. The immediate value determines the specific binary function being implemented |

Quad Word Operands | |

VPTESTMQ | Performs a bitwise logical AND of packed quad word integers and sets the mask bit of each element whose result is non-zero |

VPTESTNMQ | Performs a bitwise logical AND of packed quad word integers and sets the mask bit of each element whose result is zero |

VPANDQ | Bitwise logical AND of packed quad word integers |

VPANDNQ | Bitwise logical AND NOT of packed quad word integers |

VPORQ | Bitwise logical OR of packed quad word integers |

VPXORQ | Bitwise logical exclusive OR of packed quad word integers |

VPTERNLOGQ | Bitwise ternary logic with quad word granularity. The immediate value determines the specific binary function being implemented |

Integer Operands | |

VPTEST | Performs a bitwise logical AND of the destination operand with the source operand (the mask) and sets the ZF flag if the result is all zeros. The CF flag (zero for TEST) is set if the bitwise AND of the source operand with the inverted destination operand is all zeros |

VPAND | Bitwise logical AND |

VPANDN | Bitwise logical AND NOT |

VPOR | Bitwise logical OR |

VPXOR | Bitwise logical exclusive OR |

Single Precision Floating-point Operands | |

VTESTPS | Packed bit test of single-precision floating-point elements |

VANDPS | Perform bitwise logical AND of packed single-precision floating-point values |

VANDNPS | Perform bitwise logical AND NOT of packed single-precision floating-point values |

VORPS | Perform bitwise logical OR of packed single-precision floating-point values |

VXORPS | Perform bitwise logical XOR of packed single-precision floating-point values |

Double Precision Floating-point Operands | |

VTESTPD | Packed bit test of double-precision floating-point elements |

VANDPD | Perform bitwise logical AND of packed double-precision floating-point values |

VANDNPD | Perform bitwise logical AND NOT of packed double-precision floating-point values |

VORPD | Perform bitwise logical OR of packed double-precision floating-point values |

VXORPD | Perform bitwise logical XOR of packed double-precision floating-point values |

# Shift and Rotate Instructions

Instruction | Meaning |
---|---|

Word Operands | |

VPSLLW | Shift packed words left logical |

VPSRLW | Shift packed words right logical |

VPSRAW | Shift packed words right arithmetic |

VPSLLVW | Variable bit shift left logical |

VPSRLVW | Variable bit shift right logical |

VPSRAVW | Variable bit shift right arithmetic |

Double Word Operands | |

VPSLLD | Shift packed double words left logical |

VPSRLD | Shift packed double words right logical |

VPSRAD | Shift packed double words right arithmetic |

VPSLLVD | Variable bit shift left logical |

VPSRLVD | Variable bit shift right logical |

VPSRAVD | Variable bit shift right arithmetic |

VPROLD | Rotate double words left using immediate bits count |

VPRORD | Rotate double words right using immediate bits count |

VPROLVD | Rotate double words left using variable bits count |

VPRORVD | Rotate double words right using variable bits count |

Quad Word Operands | |

VPSLLQ | Shift packed quad words left logical |

VPSRLQ | Shift packed quad words right logical |

VPSRAQ | Shift packed quad words right arithmetic |

VPSLLVQ | Variable bit shift left logical |

VPSRLVQ | Variable bit shift right logical |

VPSRAVQ | Variable bit shift right arithmetic |

VPROLQ | Rotate quad words left using immediate bits count |

VPRORQ | Rotate quad words right using immediate bits count |

VPROLVQ | Rotate quad words left using variable bits count |

VPRORVQ | Rotate quad words right using variable bits count |

Double Quad Word Operands | |

VPSLLDQ | Shift double quad word left logical |

VPSRLDQ | Shift double quad word right logical |
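
The table above distinguishes logical shifts, arithmetic shifts and rotates. As a rough scalar illustration for one double word element (function names are hypothetical), the sketch below shows how they differ, including what is assumed to happen for counts of 32 or more: logical shifts produce zero, arithmetic shifts fill with the sign bit, and rotates use the count modulo the element width.

```python
MASK32 = 0xFFFFFFFF


def srl32(x: int, n: int) -> int:
    """Logical right shift (VPSRLD-style): vacated bits are filled with 0."""
    return (x & MASK32) >> n if n < 32 else 0


def sll32(x: int, n: int) -> int:
    """Logical left shift (VPSLLD-style): bits shifted out are discarded."""
    return (x << n) & MASK32 if n < 32 else 0


def sra32(x: int, n: int) -> int:
    """Arithmetic right shift (VPSRAD-style): vacated bits copy the sign bit."""
    n = min(n, 31)  # counts above 31 behave like a shift by 31
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32


def rol32(x: int, n: int) -> int:
    """Rotate left (VPROLD-style): bits leaving the top re-enter at the bottom."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def ror32(x: int, n: int) -> int:
    """Rotate right (VPRORD-style), expressed through rol32."""
    return rol32(x, (32 - n % 32) % 32)


print(hex(srl32(0x80000000, 4)))  # 0x8000000
print(hex(sra32(0x80000000, 4)))  # 0xf8000000
print(hex(rol32(0x80000001, 1)))  # 0x3
```

The variable forms (VPSLLVD, VPSRAVD, VPROLVD and so on) apply the same per-element operation, but take a separate count for each element.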

# Comparison Instructions

Instruction | Meaning |
---|---|

Byte Operands | |

VPCMPEQB | Compare packed bytes for equal |

VPCMPGTB | Compare packed signed byte integers for greater than |

VPCMPB | Compare packed signed byte values into mask |

VPCMPUB | Compare packed unsigned byte values into mask |

Word Operands | |

VPCMPEQW | Compare packed words for equal |

VPCMPGTW | Compare packed signed word integers for greater than |

VPCMPW | Compare packed signed word values into mask |

VPCMPUW | Compare packed unsigned word values into mask |

Double Word Operands | |

VPCMPEQD | Compare packed double words for equal |

VPCMPGTD | Compare packed signed double word integers for greater than |

VPCMPD | Compare packed signed double word values into mask |

VPCMPUD | Compare packed unsigned double word values into mask |

Quad Word Operands | |

VPCMPEQQ | Compare packed quad words for equal |

VPCMPGTQ | Compare packed signed quad word integers for greater than |

VPCMPQ | Compare packed signed quad word values into mask |

VPCMPUQ | Compare packed unsigned quad word values into mask |

Single Precision Floating-point Operands | |

VCMPEQPS | Compare packed single-precision floating-point values and set mask if destination value is equal to source value |

VCMPLTPS | Compare packed single-precision floating-point values and set mask if destination value is less than source value |

VCMPLEPS | Compare packed single-precision floating-point values and set mask if destination value is less than or equal to source value |

VCMPGTPS | Compare packed single-precision floating-point values and set mask if destination value is greater than source value |

VCMPGEPS | Compare packed single-precision floating-point values and set mask if destination value is greater than or equal to source value |

VCMPUNORDPS | Compare packed single-precision floating-point values and set mask if at least one of the two source operands is a NaN |

VCMPNEQPS | Compare packed single-precision floating-point values and set mask if destination value is not equal to source value |

VCMPNLTPS | Compare packed single-precision floating-point values and set mask if destination value is not less than source value |

VCMPNLEPS | Compare packed single-precision floating-point values and set mask if destination value is not less than or equal to source value |

VCMPNGTPS | Compare packed single-precision floating-point values and set mask if destination value is not greater than source value |

VCMPNGEPS | Compare packed single-precision floating-point values and set mask if destination value is not greater than or equal to source value |

VCMPORDPS | Compare packed single-precision floating-point values and set mask if neither source operand is a NaN |

VCMPEQSS | Compare scalar single-precision floating-point values and set mask if destination value is equal to source value |

VCMPLTSS | Compare scalar single-precision floating-point values and set mask if destination value is less than source value |

VCMPLESS | Compare scalar single-precision floating-point values and set mask if destination value is less than or equal to source value |

VCMPGTSS | Compare scalar single-precision floating-point values and set mask if destination value is greater than source value |

VCMPGESS | Compare scalar single-precision floating-point values and set mask if destination value is greater than or equal to source value |

VCMPUNORDSS | Compare scalar single-precision floating-point values and set mask if at least one of the two source operands is a NaN |

VCMPNEQSS | Compare scalar single-precision floating-point values and set mask if destination value is not equal to source value |

VCMPNLTSS | Compare scalar single-precision floating-point values and set mask if destination value is not less than source value |

VCMPNLESS | Compare scalar single-precision floating-point values and set mask if destination value is not less than or equal to source value |

VCMPNGTSS | Compare scalar single-precision floating-point values and set mask if destination value is not greater than source value |

VCMPNGESS | Compare scalar single-precision floating-point values and set mask if destination value is not greater than or equal to source value |

VCMPORDSS | Compare scalar single-precision floating-point values and set mask if neither source operand is a NaN |

VCOMISS | Perform ordered comparison of scalar single-precision floating-point value and set flags in EFLAGS register |

VUCOMISS | Perform unordered comparison of scalar single-precision floating-point value and set flags in EFLAGS register |

Double Precision Floating-point Operands | |

VCMPEQPD | Compare packed double-precision floating-point values and set mask if destination value is equal to source value |

VCMPLTPD | Compare packed double-precision floating-point values and set mask if destination value is less than source value |

VCMPLEPD | Compare packed double-precision floating-point values and set mask if destination value is less than or equal to source value |

VCMPGTPD | Compare packed double-precision floating-point values and set mask if destination value is greater than source value |

VCMPGEPD | Compare packed double-precision floating-point values and set mask if destination value is greater than or equal to source value |

VCMPUNORDPD | Compare packed double-precision floating-point values and set mask if at least one of the two source operands is a NaN |

VCMPNEQPD | Compare packed double-precision floating-point values and set mask if destination value is not equal to source value |

VCMPNLTPD | Compare packed double-precision floating-point values and set mask if destination value is not less than source value |

VCMPNLEPD | Compare packed double-precision floating-point values and set mask if destination value is not less than or equal to source value |

VCMPNGTPD | Compare packed double-precision floating-point values and set mask if destination value is not greater than source value |

VCMPNGEPD | Compare packed double-precision floating-point values and set mask if destination value is not greater than or equal to source value |

VCMPORDPD | Compare packed double-precision floating-point values and set mask if neither source operand is a NaN |

VCMPEQSD | Compare scalar double-precision floating-point values and set mask if destination value is equal to source value |

VCMPLTSD | Compare scalar double-precision floating-point values and set mask if destination value is less than source value |

VCMPLESD | Compare scalar double-precision floating-point values and set mask if destination value is less than or equal to source value |

VCMPGTSD | Compare scalar double-precision floating-point values and set mask if destination value is greater than source value |

VCMPGESD | Compare scalar double-precision floating-point values and set mask if destination value is greater than or equal to source value |

VCMPUNORDSD | Compare scalar double-precision floating-point values and set mask if at least one of the two source operands is a NaN |

VCMPNEQSD | Compare scalar double-precision floating-point values and set mask if destination value is not equal to source value |

VCMPNLTSD | Compare scalar double-precision floating-point values and set mask if destination value is not less than source value |

VCMPNLESD | Compare scalar double-precision floating-point values and set mask if destination value is not less than or equal to source value |

VCMPNGTSD | Compare scalar double-precision floating-point values and set mask if destination value is not greater than source value |

VCMPNGESD | Compare scalar double-precision floating-point values and set mask if destination value is not greater than or equal to source value |

VCMPORDSD | Compare scalar double-precision floating-point values and set mask if neither source operand is a NaN |

VCOMISD | Perform ordered comparison of scalar double-precision floating-point value and set flags in EFLAGS register |

VUCOMISD | Perform unordered comparison of scalar double-precision floating-point value and set flags in EFLAGS register |
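
The floating-point predicates above come in pairs that look redundant, such as GE and NLT. They differ only when an operand is a NaN: the positive forms (EQ, LT, LE, GT, GE) are false for unordered inputs, while the negated forms (NEQ, NLT, NLE, NGT, NGE) are true. The following scalar sketch builds the result as a bit mask, one bit per element, in the AVX-512 style; the names `PREDICATES` and `cmp_mask` are hypothetical and exception signaling is not modeled.

```python
import math


def _unordered(a: float, b: float) -> bool:
    return math.isnan(a) or math.isnan(b)


# Python comparisons involving NaN are False (and != is True), which
# matches the ordered/unordered behavior described in the lead-in.
PREDICATES = {
    "EQ": lambda a, b: a == b,
    "LT": lambda a, b: a < b,
    "LE": lambda a, b: a <= b,
    "GT": lambda a, b: a > b,
    "GE": lambda a, b: a >= b,
    "UNORD": _unordered,
    "NEQ": lambda a, b: a != b,
    "NLT": lambda a, b: not (a < b),
    "NLE": lambda a, b: not (a <= b),
    "NGT": lambda a, b: not (a > b),
    "NGE": lambda a, b: not (a >= b),
    "ORD": lambda a, b: not _unordered(a, b),
}


def cmp_mask(pred: str, xs, ys) -> int:
    """Return a mask with bit i set when PREDICATES[pred](xs[i], ys[i]) holds."""
    test = PREDICATES[pred]
    mask = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if test(x, y):
            mask |= 1 << i
    return mask


nan = float("nan")
xs = [1.0, nan, 3.0, 2.0]
ys = [2.0, 2.0, 3.0, nan]
print(bin(cmp_mask("GE", xs, ys)))   # NaN lanes are excluded
print(bin(cmp_mask("NLT", xs, ys)))  # NaN lanes are included
```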

# Packed Arithmetic Instructions

Instruction | Meaning |
---|---|

Byte Operands | |

VPADDB | Add packed byte integers |

VPADDUSB | Add packed unsigned byte integers with unsigned saturation |

VPADDSB | Add packed signed byte integers with signed saturation |

VPSUBB | Subtract packed byte integers |

VPSUBUSB | Subtract packed unsigned byte integers with unsigned saturation |

VPSUBSB | Subtract packed signed byte integers with signed saturation |

Word Operands | |

VPADDW | Add packed word integers |

VPADDUSW | Add packed unsigned word integers with unsigned saturation |

VPADDSW | Add packed signed word integers with signed saturation |

VPHADDW | Adds two adjacent, signed 16-bit integers horizontally from the source and destination operands and packs the signed 16-bit results to the destination operand |

VPHADDSW | Adds two adjacent, signed 16-bit integers horizontally from the source and destination operands and packs the signed, saturated 16-bit results to the destination operand |

VPSUBW | Subtract packed word integers |

VPSUBUSW | Subtract packed unsigned word integers with unsigned saturation |

VPSUBSW | Subtract packed signed word integers with signed saturation |

VPHSUBW | Performs horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the source and destination operands. The signed 16-bit results are packed and written to the destination operand |

VPHSUBSW | Performs horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the source and destination operands. The signed, saturated 16-bit results are packed and written to the destination operand |

VPMULHUW | Multiply packed unsigned word integers and store high result |

VPMULLW | Multiply packed signed word integers and store low result |

VPMULHW | Multiply packed signed word integers and store high result |

VPMULHRSW | Multiplies vertically each signed 16-bit integer from the destination operand with the corresponding signed 16-bit integer of the source operand, producing intermediate, signed 32-bit integers. Each intermediate 32-bit integer is truncated to the 18 most significant bits. Rounding is always performed by adding 1 to the least significant bit of the 18-bit intermediate result. The final result is obtained by selecting the 16 bits immediately to the right of the most significant bit of each 18-bit intermediate result and packed to the destination operand |

VPMADDUBSW | Multiplies each unsigned byte value with the corresponding signed byte value to produce an intermediate, 16-bit signed integer. Each adjacent pair of 16-bit signed values is added horizontally. The signed, saturated 16-bit results are packed to the destination operand |

Double Word Operands | |

VPADDD | Add packed double word integers |

VPHADDD | Adds two adjacent, signed 32-bit integers horizontally from the source and destination operands and packs the signed 32-bit results to the destination operand |

VPSUBD | Subtract packed double word integers |

VPHSUBD | Performs horizontal subtraction on each adjacent pair of 32-bit signed integers by subtracting the most significant double word from the least significant double word of each pair in the source and destination operands. The signed 32-bit results are packed and written to the destination operand |

VPMULLD | Multiplies packed signed double word integers and returns the lower 32 bits of each 64-bit result |

VPMADDWD | Multiply and add packed word integers |

Quad Word Operands | |

VPADDQ | Add packed quad word integers |

VPSUBQ | Subtract packed quad word integers |

VPMULUDQ | Multiply packed unsigned double word integers |

VPMULDQ | Returns two 64-bit signed results of signed 32-bit integer multiplies |

VPMULLQ | Returns the two lower 64 bits of the 128-bit results of signed 64-bit integer multiplies |

Single Precision Floating-point Operands | |

VADDSS | Add scalar single-precision floating-point value |

VADDPS | Add packed single-precision floating-point values |

VHADDPS | Performs a single-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the third and fourth elements of the first operand; the third by adding the first and second elements of the second operand; and the fourth by adding the third and fourth elements of the second operand |

VSUBSS | Subtract scalar single-precision floating-point value |

VSUBPS | Subtract packed single-precision floating-point values |

VHSUBPS | Performs a single-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the fourth element of the first operand from the third element of the first operand; the third by subtracting the second element of the second operand from the first element of the second operand; and the fourth by subtracting the fourth element of the second operand from the third element of the second operand |

VADDSUBPS | Performs single-precision addition on the second and fourth pairs of 32-bit data elements within the operands; single-precision subtraction on the first and third pairs |

VMULSS | Multiply scalar single-precision floating-point value |

VMULPS | Multiply packed single-precision floating-point values |

VDIVSS | Divide scalar single-precision floating-point value |

VDIVPS | Divide packed single-precision floating-point values |

Double Precision Floating-point Operands | |

VADDSD | Add scalar double-precision floating-point value |

VADDPD | Add packed double-precision floating-point values |

VHADDPD | Performs a double-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the first and second elements of the second operand |

VSUBSD | Subtract scalar double-precision floating-point value |

VSUBPD | Subtract packed double-precision floating-point values |

VHSUBPD | Performs a double-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the second element of the second operand from the first element of the second operand |

VADDSUBPD | Performs double-precision addition on the second pair of quad words, and double-precision subtraction on the first pair |

VMULSD | Multiply scalar double-precision floating-point value |

VMULPD | Multiply packed double-precision floating-point values |

VDIVSD | Divide scalar double-precision floating-point value |

VDIVPD | Divide packed double-precision floating-point values |
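
Several rows above describe behavior that is easier to follow on concrete numbers: saturation, the rounding step of VPMULHRSW, and the element order of the horizontal and alternating add/subtract forms. The sketch below emulates one element (or one 128-bit lane of four elements) in scalar Python. All function names are hypothetical, and Python floats are double precision, so the floating-point helpers only illustrate element ordering, not single-precision rounding.

```python
def clamp(value: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, value))


def add_wrap_u8(a: int, b: int) -> int:
    """VPADDB-style: plain addition wraps around modulo 256."""
    return (a + b) & 0xFF


def adds_i8(a: int, b: int) -> int:
    """VPADDSB-style: signed saturation clamps to [-128, 127]."""
    return clamp(a + b, -128, 127)


def addus_u8(a: int, b: int) -> int:
    """VPADDUSB-style: unsigned saturation clamps to [0, 255]."""
    return clamp(a + b, 0, 255)


def mulhrs_i16(a: int, b: int) -> int:
    """VPMULHRSW-style: keep the top 18 bits of the 32-bit product,
    add 1 to round, then drop the lowest bit and keep 16 bits."""
    r = (((a * b) >> 14) + 1) >> 1
    return ((r + 0x8000) & 0xFFFF) - 0x8000  # reinterpret as signed 16-bit


def hadd_ps(a, b):
    """VHADDPS-style on one 128-bit lane: sums of adjacent pairs,
    first operand's pairs first, then the second operand's."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def addsub_ps(a, b):
    """VADDSUBPS-style: subtract in even positions, add in odd positions."""
    return [a[i] + b[i] if i % 2 else a[i] - b[i] for i in range(len(a))]


print(add_wrap_u8(200, 100), addus_u8(200, 100))  # wrapping vs. saturating
print(mulhrs_i16(16384, 16384))                   # 0.5 * 0.5 in Q15 format
print(hadd_ps([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))
```

Read as Q15 fixed point, `mulhrs_i16` multiplies two fractions in [-1, 1) and rounds the product back to Q15, which is the usual reason to reach for VPMULHRSW.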

# Fused Arithmetic Instructions

Instruction | Meaning |
---|---|

Single Precision Floating-point Operands | |

VFMADD132SS | Fused multiply-add of scalar single-precision floating-point values |

VFMADD213SS | Fused multiply-add of scalar single-precision floating-point values |

VFMADD231SS | Fused multiply-add of scalar single-precision floating-point values |

VFMADD132PS | Fused multiply-add of packed single-precision floating-point values |

VFMADD213PS | Fused multiply-add of packed single-precision floating-point values |

VFMADD231PS | Fused multiply-add of packed single-precision floating-point values |

VFNMADD132SS | Fused negative multiply-add of scalar single-precision floating-point values |

VFNMADD213SS | Fused negative multiply-add of scalar single-precision floating-point values |

VFNMADD231SS | Fused negative multiply-add of scalar single-precision floating-point values |

VFNMADD132PS | Fused negative multiply-add of packed single-precision floating-point values |

VFNMADD213PS | Fused negative multiply-add of packed single-precision floating-point values |

VFNMADD231PS | Fused negative multiply-add of packed single-precision floating-point values |

VFMSUB132SS | Fused multiply-subtract of scalar single-precision floating-point values |

VFMSUB213SS | Fused multiply-subtract of scalar single-precision floating-point values |

VFMSUB231SS | Fused multiply-subtract of scalar single-precision floating-point values |

VFMSUB132PS | Fused multiply-subtract of packed single-precision floating-point values |

VFMSUB213PS | Fused multiply-subtract of packed single-precision floating-point values |

VFMSUB231PS | Fused multiply-subtract of packed single-precision floating-point values |

VFNMSUB132SS | Fused negative multiply-subtract of scalar single-precision floating-point values |

VFNMSUB213SS | Fused negative multiply-subtract of scalar single-precision floating-point values |

VFNMSUB231SS | Fused negative multiply-subtract of scalar single-precision floating-point values |

VFNMSUB132PS | Fused negative multiply-subtract of packed single-precision floating-point values |

VFNMSUB213PS | Fused negative multiply-subtract of packed single-precision floating-point values |

VFNMSUB231PS | Fused negative multiply-subtract of packed single-precision floating-point values |

VFMADDSUB132PS | Fused multiply-alternating add/subtract of packed single-precision floating-point values |

VFMADDSUB213PS | Fused multiply-alternating add/subtract of packed single-precision floating-point values |

VFMADDSUB231PS | Fused multiply-alternating add/subtract of packed single-precision floating-point values |

VFMSUBADD132PS | Fused multiply-alternating subtract/add of packed single-precision floating-point values |

VFMSUBADD213PS | Fused multiply-alternating subtract/add of packed single-precision floating-point values |

VFMSUBADD231PS | Fused multiply-alternating subtract/add of packed single-precision floating-point values |

Double Precision Floating-point Operands | |

VFMADD132SD | Fused multiply-add of scalar double-precision floating-point values |

VFMADD213SD | Fused multiply-add of scalar double-precision floating-point values |

VFMADD231SD | Fused multiply-add of scalar double-precision floating-point values |

VFMADD132PD | Fused multiply-add of packed double-precision floating-point values |

VFMADD213PD | Fused multiply-add of packed double-precision floating-point values |

VFMADD231PD | Fused multiply-add of packed double-precision floating-point values |

VFNMADD132SD | Fused negative multiply-add of scalar double-precision floating-point values |

VFNMADD213SD | Fused negative multiply-add of scalar double-precision floating-point values |

VFNMADD231SD | Fused negative multiply-add of scalar double-precision floating-point values |

VFNMADD132PD | Fused negative multiply-add of packed double-precision floating-point values |

VFNMADD213PD | Fused negative multiply-add of packed double-precision floating-point values |

VFNMADD231PD | Fused negative multiply-add of packed double-precision floating-point values |

VFMSUB132SD | Fused multiply-subtract of scalar double-precision floating-point values |

VFMSUB213SD | Fused multiply-subtract of scalar double-precision floating-point values |

VFMSUB231SD | Fused multiply-subtract of scalar double-precision floating-point values |

VFMSUB132PD | Fused multiply-subtract of packed double-precision floating-point values |

VFMSUB213PD | Fused multiply-subtract of packed double-precision floating-point values |

VFMSUB231PD | Fused multiply-subtract of packed double-precision floating-point values |

VFNMSUB132SD | Fused negative multiply-subtract of scalar double-precision floating-point values |

VFNMSUB213SD | Fused negative multiply-subtract of scalar double-precision floating-point values |

VFNMSUB231SD | Fused negative multiply-subtract of scalar double-precision floating-point values |

VFNMSUB132PD | Fused negative multiply-subtract of packed double-precision floating-point values |

VFNMSUB213PD | Fused negative multiply-subtract of packed double-precision floating-point values |

VFNMSUB231PD | Fused negative multiply-subtract of packed double-precision floating-point values |

VFMADDSUB132PD | Fused multiply-alternating add/subtract of packed double-precision floating-point values |

VFMADDSUB213PD | Fused multiply-alternating add/subtract of packed double-precision floating-point values |

VFMADDSUB231PD | Fused multiply-alternating add/subtract of packed double-precision floating-point values |

VFMSUBADD132PD | Fused multiply-alternating subtract/add of packed double-precision floating-point values |

VFMSUBADD213PD | Fused multiply-alternating subtract/add of packed double-precision floating-point values |

VFMSUBADD231PD | Fused multiply-alternating subtract/add of packed double-precision floating-point values |
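
The 132, 213 and 231 suffixes in the table above share one description each, so the difference is easy to miss: the digits name which operands are multiplied and which is added, with operand 1 being the destination. "Fused" means the product is not rounded before the addition. The sketch below is a scalar emulation under stated assumptions: it uses exact rational arithmetic from the standard `fractions` module to obtain a single rounding, works in double precision, handles only finite inputs, and does not model the sign of a zero result. The names `fused` and `fma_form` are hypothetical.

```python
from fractions import Fraction


def fused(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fma_form(form: str, op1: float, op2: float, op3: float,
             negate_product: bool = False, subtract: bool = False) -> float:
    """Emulate the operand ordering of the FMA mnemonics.

    form "132": op1 * op3 + op2
    form "213": op2 * op1 + op3
    form "231": op2 * op3 + op1
    negate_product=True models the VFNM* forms, subtract=True the *SUB forms.
    """
    x, y, z = {
        "132": (op1, op3, op2),
        "213": (op2, op1, op3),
        "231": (op2, op3, op1),
    }[form]
    if negate_product:
        x = -x
    if subtract:
        z = -z
    return fused(x, y, z)


# Operand ordering: op1 = 2, op2 = 3, op3 = 4.
print(fma_form("132", 2.0, 3.0, 4.0))  # 2*4 + 3
print(fma_form("213", 2.0, 3.0, 4.0))  # 3*2 + 4
print(fma_form("231", 2.0, 3.0, 4.0))  # 3*4 + 2

# Single rounding: the exact product 1 - 2**-60 is lost if rounded first.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
print(a * b + c, fused(a, b, c))
```

The VFMADDSUB and VFMSUBADD forms apply the same operand ordering per element, alternating between adding and subtracting the third term.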

# Primitives of Functions

Instruction | Meaning |
---|---|

Byte Operands | |

VPABSB | Computes the absolute value of each signed byte data element |

VPSIGNB | Negates each signed byte element of the destination operand if the corresponding element in the source operand is negative, and zeroes it if the source element is zero |

VPAVGB | Compute average of packed unsigned byte integers |

VPMINUB | Minimum of packed unsigned byte integers |

VPMINSB | Minimum of packed signed byte integers |

VPMAXUB | Maximum of packed unsigned byte integers |

VPMAXSB | Maximum of packed signed byte integers |

VPSADBW | Compute sum of absolute differences |

VMPSADBW | Performs eight 4-byte wide Sum of Absolute Differences operations to produce eight word integers |

VDBPSADBW | Double block packed Sum of Absolute Differences on unsigned bytes |

Word Operands | |

VPABSW | Computes the absolute value of each signed 16-bit data element |

VPSIGNW | Negates each signed word element of the destination operand if the corresponding element in the source operand is negative, and zeroes it if the source element is zero |

VPAVGW | Compute average of packed unsigned word integers |

VPMINUW | Minimum of packed unsigned word integers |

VPMINSW | Minimum of packed signed word integers |

VPMAXUW | Maximum of packed unsigned word integers |

VPMAXSW | Maximum of packed signed word integers |

VPHMINPOSUW | Finds the value and location of the minimum unsigned word from one of 8 horizontally packed unsigned words. The resulting value and location (offset within the source) are packed into the low double word of the destination XMM register |

Double Word Operands | |

VPABSD | Computes the absolute value of each signed 32-bit data element |

VPSIGND | Negates each signed integer element of the destination operand if the sign of the corresponding element in the source operand is less than zero |

VPMINUD | Minimum of packed unsigned double word integers |

VPMINSD | Minimum of packed signed double word integers |

VPMAXUD | Maximum of packed unsigned double word integers |

VPMAXSD | Maximum of packed signed double word integers |

VPLZCNTD | Count the number of leading zero bits in each packed double word element |

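VPCONFLICTD | Detect conflicts within a vector of packed double word values: each element is compared with all preceding elements, and the destination element receives a bit vector marking the equal ones |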

Quad Word Operands | |

VPABSQ | Computes the absolute value of each signed 64-bit data element |

VPMINUQ | Minimum of packed unsigned quad word integers |

VPMINSQ | Minimum of packed signed quad word integers |

VPMAXUQ | Maximum of packed unsigned quad word integers |

VPMAXSQ | Maximum of packed signed quad word integers |

VPLZCNTQ | Count the number of leading zero bits in each packed quad word element |

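VPCONFLICTQ | Detect conflicts within a vector of packed quad word values: each element is compared with all preceding elements, and the destination element receives a bit vector marking the equal ones |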

Single Precision Floating-point Operands | |

VSQRTSS | Compute square root of scalar single-precision floating-point value |

VSQRTPS | Compute square roots of packed single-precision floating-point values |

VMINSS | Return minimum scalar single-precision floating-point value |

VMINPS | Return minimum packed single-precision floating-point values |

VMAXSS | Return maximum scalar single-precision floating-point value |

VMAXPS | Return maximum packed single-precision floating-point values |

VROUNDSS | Round the low single-precision floating-point value to an integral value and return the result as a floating-point value |

VROUNDPS | Round packed single-precision floating-point values to integral values and return the results as floating-point values |

VRNDSCALESS | Round scalar single-precision floating-point value to include a given number of fraction bits |

VRNDSCALEPS | Round packed single-precision floating-point values to include a given number of fraction bits |

VDPPS | Perform single-precision dot products for up to 4 elements and broadcast |

VRANGESS | Range restriction calculation for pairs of scalar single-precision floating-point values |

VRANGEPS | Range restriction calculation for packed pairs of single-precision floating-point values |

VREDUCESS | Perform a reduction transformation on a scalar single-precision floating-point value by subtracting a number of fraction bits |

VREDUCEPS | Perform reduction transformation on packed single-precision floating-point values by subtracting a number of fraction bits |

VGETEXPSS | Convert the biased exponent of scalar single-precision floating-point value to floating-point value representing unbiased integer exponent |

VGETEXPPS | Convert the biased exponent of packed single-precision floating-point values to floating-point values representing unbiased integer exponent |

VGETMANTSS | Extract the normalized mantissa from scalar single-precision floating-point value |

VGETMANTPS | Extract the normalized mantissa from packed single-precision floating-point values |

VSCALEFSS | Scale scalar single-precision floating-point value |

VSCALEFPS | Scale packed single-precision floating-point values |

VEXP2PS | Approximation to the exponential 2^x of packed single-precision floating-point values with less than 2^-23 relative error |

VFPCLASSSS | Tests scalar single-precision floating-point value for the following categories: QNaN, SNaN, +0, -0, +Inf, -Inf, denormal, finite negative |

VFPCLASSPS | Tests packed single-precision floating-point values for the following categories: QNaN, SNaN, +0, -0, +Inf, -Inf, denormal, finite negative |

VFIXUPIMMSS | Fix up special scalar single-precision floating-point value |

VFIXUPIMMPS | Fix up special packed single-precision floating-point values |

VRCP14SS | Computes the approximate reciprocal of the scalar single-precision floating-point value. The max relative error is less than 2^-14 |

VRCP14PS | Computes the approximate reciprocals of the packed single-precision floating-point values. The max relative error is less than 2^-14 |

VRCP28SS | Computes the approximate reciprocal of the scalar single-precision floating-point value. The max relative error is less than 2^-28 |

VRCP28PS | Computes the approximate reciprocals of the packed single-precision floating-point values. The max relative error is less than 2^-28 |

VRSQRT14SS | Computes the approximate reciprocal square root of the scalar single-precision floating-point value. The max relative error is less than 2^-14 |

VRSQRT14PS | Computes the approximate reciprocal square roots of the packed single-precision floating-point values. The max relative error is less than 2^-14 |

VRSQRT28SS | Computes the approximate reciprocal square root of the scalar single-precision floating-point value. The max relative error is less than 2^-28 |

VRSQRT28PS | Computes the approximate reciprocal square roots of the packed single-precision floating-point values. The max relative error is less than 2^-28 |

VRCPPS | Compute reciprocals of packed single-precision floating-point values |

VRCPSS | Compute reciprocal of scalar single-precision floating-point value |

VRSQRTPS | Compute reciprocals of square roots of packed single-precision floating-point values |

VRSQRTSS | Compute reciprocal of square root of scalar single-precision floating-point value |

Double Precision Floating-point Operands | |

VSQRTSD | Compute square root of scalar double-precision floating-point value |

VSQRTPD | Compute square roots of packed double-precision floating-point values |

VMINSD | Return minimum scalar double-precision floating-point value |

VMINPD | Return minimum packed double-precision floating-point values |

VMAXSD | Return maximum scalar double-precision floating-point value |

VMAXPD | Return maximum packed double-precision floating-point values |

VROUNDSD | Round the low double-precision floating-point value to an integral value and return the result as a floating-point value |

VROUNDPD | Round packed double-precision floating-point values to integral values and return the results as floating-point values |

VRNDSCALESD | Round scalar double-precision floating-point value to include a given number of fraction bits |

VRNDSCALEPD | Round packed double-precision floating-point values to include a given number of fraction bits |

VDPPD | Perform double-precision dot product for up to 2 elements and broadcast |

VRANGESD | Range restriction calculation for pairs of scalar double-precision floating-point values |

VRANGEPD | Range restriction calculation for packed pairs of double-precision floating-point values |

VREDUCESD | Perform a reduction transformation on a scalar double-precision floating-point value by subtracting a number of fraction bits |

VREDUCEPD | Perform reduction transformation on packed double-precision floating-point values by subtracting a number of fraction bits |

VGETEXPSD | Convert the biased exponent of scalar double-precision floating-point value to floating-point value representing unbiased integer exponent |

VGETEXPPD | Convert the biased exponent of packed double-precision floating-point values to floating-point values representing unbiased integer exponent |

VGETMANTSD | Extract the normalized mantissa from scalar double-precision floating-point value |

VGETMANTPD | Extract the normalized mantissa from packed double-precision floating-point values |

VSCALEFSD | Scale scalar double-precision floating-point value |

VSCALEFPD | Scale packed double-precision floating-point values |

VEXP2PD | Approximation to the exponential 2^x of packed double-precision floating-point values with less than 2^-23 relative error |

VFPCLASSSD | Tests scalar double-precision floating-point value for the following categories: QNaN, SNaN, +0, -0, +Inf, -Inf, denormal, finite negative |

VFPCLASSPD | Tests packed double-precision floating-point values for the following categories: QNaN, SNaN, +0, -0, +Inf, -Inf, denormal, finite negative |

VFIXUPIMMSD | Fix up special scalar double-precision floating-point value |

VFIXUPIMMPD | Fix up special packed double-precision floating-point values |

VRCP14SD | Computes the approximate reciprocal of the scalar double-precision floating-point value. The max relative error is less than 2^-14 |

VRCP14PD | Computes the approximate reciprocals of the packed double-precision floating-point values. The max relative error is less than 2^-14 |

VRCP28SD | Computes the approximate reciprocal of the scalar double-precision floating-point value. The max relative error is less than 2^-28 |

VRCP28PD | Computes the approximate reciprocals of the packed double-precision floating-point values. The max relative error is less than 2^-28 |

VRSQRT14SD | Computes the approximate reciprocal square root of the scalar double-precision floating-point value. The max relative error is less than 2^-14 |

VRSQRT14PD | Computes the approximate reciprocal square roots of the packed double-precision floating-point values. The max relative error is less than 2^-14 |

VRSQRT28SD | Computes the approximate reciprocal square root of the scalar double-precision floating-point value. The max relative error is less than 2^-28 |

VRSQRT28PD | Computes the approximate reciprocal square roots of the packed double-precision floating-point values. The max relative error is less than 2^-28 |
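
Some of the integer primitives above have terse descriptions, so here is a small scalar sketch of three of them: the per-group sum of absolute differences (as in VPSADBW), the leading-zero count (as in VPLZCNTD), and conflict detection (as in VPCONFLICTD). The function names are hypothetical, and the models ignore register widths, masking, and operand encoding; they only illustrate the per-element arithmetic.

```python
def sad_per_group(a, b):
    """Sum of absolute differences of unsigned bytes, one sum per 8-byte group."""
    if len(a) != len(b) or len(a) % 8 != 0:
        raise ValueError("inputs must have equal length, a multiple of 8 bytes")
    return [sum(abs(x - y) for x, y in zip(a[i:i + 8], b[i:i + 8]))
            for i in range(0, len(a), 8)]


def lzcnt32(x):
    """Number of leading zero bits in a 32-bit element (32 when the element is 0)."""
    return 32 - (x & 0xFFFFFFFF).bit_length()


def conflict(values):
    """For each element, a bit vector with bit j set when an earlier element j is equal."""
    out = []
    for i, v in enumerate(values):
        bits = 0
        for j in range(i):
            if values[j] == v:
                bits |= 1 << j
        out.append(bits)
    return out
```

For example, `conflict([3, 7, 3, 3])` marks element 2 as conflicting with element 0, and element 3 as conflicting with elements 0 and 2. This is the information a vectorized scatter loop needs in order to detect duplicate indices.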

# Opmask Instructions

Instruction | Meaning |
---|---|

8-bit Operands | |

KMOVB | Move 8-bit from and to mask registers |

KTESTB | Set ZF and CF according to the bitwise AND and AND NOT of two 8-bit masks |

KORTESTB | Bitwise logical OR of two 8-bit masks, setting ZF and CF accordingly |

KNOTB | Bitwise NOT of an 8-bit mask |

KANDB | Bitwise logical AND of two 8-bit masks |

KANDNB | Bitwise logical AND NOT of two 8-bit masks |

KORB | Bitwise logical OR of two 8-bit masks |

KXORB | Bitwise logical XOR of two 8-bit masks |

KXNORB | Bitwise logical XNOR of two 8-bit masks |

KADDB | Add two 8-bit masks |

KSHIFTLB | Shift left 8-bit mask register |

KSHIFTRB | Shift right 8-bit mask register |

KUNPCKBW | Unpack and interleave 8-bit masks |

VPMOVM2B | Convert a mask register to a vector register |

VPMOVB2M | Converts a vector register to a mask register |

16-bit Operands | |

KMOVW | Move 16-bit from and to mask registers |

KTESTW | Set ZF and CF according to the bitwise AND and AND NOT of two 16-bit masks |

KORTESTW | Bitwise logical OR of two 16-bit masks, setting ZF and CF accordingly |

KNOTW | Bitwise NOT of a 16-bit mask |

KANDW | Bitwise logical AND of two 16-bit masks |

KANDNW | Bitwise logical AND NOT of two 16-bit masks |

KORW | Bitwise logical OR of two 16-bit masks |

KXORW | Bitwise logical XOR of two 16-bit masks |

KXNORW | Bitwise logical XNOR of two 16-bit masks |

KADDW | Add two 16-bit masks |

KSHIFTLW | Shift left 16-bit mask register |

KSHIFTRW | Shift right 16-bit mask register |

KUNPCKWD | Unpack and interleave 16-bit masks |

VPMOVM2W | Convert a mask register to a vector register |

VPMOVW2M | Converts a vector register to a mask register |

32-bit Operands | |

KMOVD | Move 32-bit from and to mask registers |

KTESTD | Set ZF and CF according to the bitwise AND and AND NOT of two 32-bit masks |

KORTESTD | Bitwise logical OR of two 32-bit masks, setting ZF and CF accordingly |

KNOTD | Bitwise NOT of a 32-bit mask |

KANDD | Bitwise logical AND of two 32-bit masks |

KANDND | Bitwise logical AND NOT of two 32-bit masks |

KORD | Bitwise logical OR of two 32-bit masks |

KXORD | Bitwise logical XOR of two 32-bit masks |

KXNORD | Bitwise logical XNOR of two 32-bit masks |

KADDD | Add two 32-bit masks |

KSHIFTLD | Shift left 32-bit mask register |

KSHIFTRD | Shift right 32-bit mask register |

KUNPCKDQ | Unpack and interleave 32-bit masks |

VPMOVM2D | Convert a mask register to a vector register |

VPMOVD2M | Converts a vector register to a mask register |

64-bit Operands | |

KMOVQ | Move 64-bit from and to mask registers |

KTESTQ | Set ZF and CF according to the bitwise AND and AND NOT of two 64-bit masks |

KORTESTQ | Bitwise logical OR of two 64-bit masks, setting ZF and CF accordingly |

KNOTQ | Bitwise NOT of a 64-bit mask |

KANDQ | Bitwise logical AND of two 64-bit masks |

KANDNQ | Bitwise logical AND NOT of two 64-bit masks |

KORQ | Bitwise logical OR of two 64-bit masks |

KXORQ | Bitwise logical XOR of two 64-bit masks |

KXNORQ | Bitwise logical XNOR of two 64-bit masks |

KADDQ | Add two 64-bit masks |

KSHIFTLQ | Shift left 64-bit mask register |

KSHIFTRQ | Shift right 64-bit mask register |

VPMOVM2Q | Convert a mask register to a vector register |

VPMOVQ2M | Converts a vector register to a mask register |
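
The flag-setting and conversion rows above are compact, so the following scalar sketch spells out how the flags and conversions might be modeled. It is an illustration under stated assumptions, not an emulator: masks are plain Python integers, the helper names are hypothetical, and `width` stands for the 8, 16, 32, or 64 bits selected by the B/W/D/Q suffix.

```python
def _ones(width):
    """All-ones value of the given bit width."""
    return (1 << width) - 1


def kortest(a, b, width):
    """Return (ZF, CF): ZF when a|b is all zeros, CF when a|b is all ones."""
    r = (a | b) & _ones(width)
    return (r == 0, r == _ones(width))


def ktest(a, b, width):
    """Return (ZF, CF): ZF when a AND b is zero, CF when (NOT a) AND b is zero."""
    m = _ones(width)
    return ((a & b) & m == 0, (~a & b) & m == 0)


def kunpck(high, low, half_width):
    """Concatenate two half-width masks into one mask (as in KUNPCKBW/WD/DQ)."""
    m = _ones(half_width)
    return ((high & m) << half_width) | (low & m)


def mask_to_bytes(k, count):
    """VPMOVM2B-style: each byte becomes 0xFF if its mask bit is set, else 0x00."""
    return bytes(0xFF if (k >> i) & 1 else 0x00 for i in range(count))


def bytes_to_mask(v):
    """VPMOVB2M-style: mask bit i is the most significant bit of byte i."""
    return sum(((b >> 7) & 1) << i for i, b in enumerate(v))
```

A typical use of the KORTEST flags is loop control: ZF answers "did no lane match?", and CF answers "did every lane match?".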

# String and Text Processing Instructions

Instruction | Meaning |
---|---|

VPCMPESTRI | Packed compare explicit-length strings, return index in ECX/RCX |

VPCMPESTRM | Packed compare explicit-length strings, return mask in XMM0 |

VPCMPISTRI | Packed compare implicit-length strings, return index in ECX/RCX |

VPCMPISTRM | Packed compare implicit-length strings, return mask in XMM0 |
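
The string instructions are controlled by an immediate byte that selects the element size, the comparison mode, and the result format. As a rough illustration only, the sketch below models a single configuration of VPCMPISTRI: unsigned bytes, the "equal any" comparison, and the least significant index as the result. The function name is hypothetical, and the other modes (ranges, equal each, equal ordered, polarity options) are deliberately left out.

```python
def pcmpistri_equal_any(charset, text):
    """Index of the first byte of `text` found in `charset`, or 16 if there is none.

    Both operands model 16-byte registers holding implicit-length strings:
    a string ends at its first zero byte, or after 16 bytes.
    """
    def valid_part(operand):
        padded = operand[:16].ljust(16, b"\0")
        end = padded.find(b"\0")
        return padded if end < 0 else padded[:end]

    allowed = valid_part(charset)
    for index, byte in enumerate(valid_part(text)):
        if byte in allowed:
            return index
    return 16  # no match within the valid part of `text`
```

For instance, `pcmpistri_equal_any(b"aeiou", b"rhythm and")` returns 7, the position of the first vowel. A result of 16 signals that the caller should move on to the next 16-byte chunk.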

# Secure Hash Algorithm Instructions

Instruction | Meaning |
---|---|

SHA1RNDS4 | Perform four rounds of SHA1 operation |

SHA1MSG1 | Perform an intermediate calculation for the next four SHA1 message double words |

SHA1MSG2 | Perform a final calculation for the next four SHA1 message double words |

SHA1NEXTE | Calculate SHA1 state variable E after four rounds |

SHA256RNDS2 | Perform two rounds of SHA256 operation |

SHA256MSG1 | Perform an intermediate calculation for the next four SHA256 message double words |

SHA256MSG2 | Perform a final calculation for the next four SHA256 message double words |
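
The "message double words" mentioned above are the entries of the SHA message schedule. The sketch below computes the full schedules in plain scalar Python, following the recurrences of the SHA-1 and SHA-256 standards; it does not model the instructions' operand layout. The helper names and the `ABC_BLOCK` constant (the padded single block for the message "abc") are introduced here for illustration. The MSG1 and MSG2 instruction pairs compute four words of such a recurrence at a time.

```python
MASK32 = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr(x, n):
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def block_words(block):
    """Split a 64-byte block into sixteen big-endian 32-bit words."""
    if len(block) != 64:
        raise ValueError("a SHA-1/SHA-256 block is 64 bytes")
    return [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]


def sha1_schedule(block):
    """80-word SHA-1 schedule: W[t] = ROTL1(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16])."""
    w = block_words(block)
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def sha256_schedule(block):
    """64-word SHA-256 schedule: W[t] = s1(W[t-2]) + W[t-7] + s0(W[t-15]) + W[t-16]."""
    w = block_words(block)
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK32)
    return w


# Padded single block for the 3-byte message "abc" (bit length 24).
ABC_BLOCK = b"abc\x80" + bytes(52) + (24).to_bytes(8, "big")
```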

# State Management Instructions

Instruction | Meaning |
---|---|

VLDMXCSR | Load MXCSR register |

VSTMXCSR | Save MXCSR register state |
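
A value saved by VSTMXCSR is a 32-bit word of control and status bits. As a reading aid, the sketch below decodes the commonly used fields: the six exception flags (bits 0 to 5), denormals-are-zeros (bit 6), the six exception masks (bits 7 to 12), the rounding control (bits 13 and 14), and flush-to-zero (bit 15). The function and field names are hypothetical labels chosen for this example.

```python
ROUNDING_MODES = ("nearest", "down", "up", "toward zero")
EXCEPTIONS = ("invalid", "denormal", "divide-by-zero",
              "overflow", "underflow", "precision")


def decode_mxcsr(value):
    """Split an MXCSR value into its flag, mask, and control fields."""
    return {
        # Sticky exception flags, bits 0..5.
        "flags": [name for i, name in enumerate(EXCEPTIONS)
                  if (value >> i) & 1],
        # Exception mask bits, bits 7..12 (set means the exception is masked).
        "masked": [name for i, name in enumerate(EXCEPTIONS)
                   if (value >> (7 + i)) & 1],
        "daz": bool((value >> 6) & 1),
        "rounding": ROUNDING_MODES[(value >> 13) & 0b11],
        "ftz": bool((value >> 15) & 1),
    }
```

Decoding the power-on default `0x1F80` yields no flags, all six exceptions masked, and round-to-nearest.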

# Agent Synchronization Instructions

Instruction | Meaning |
---|---|

MONITOR | Sets up an address range used to monitor write-back stores |

MWAIT | Enables a processor to enter into an optimized state while waiting for a write-back store to the address range set up by the MONITOR instruction |

# Cacheability Control, Prefetch and Ordering Instructions

Instruction | Meaning |
---|---|

VLDDQU | Special 128-bit unaligned load designed to avoid cache line splits |

PREFETCHNTA | Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using NTA hint |

PREFETCHT0 | Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T0 hint |

PREFETCHT1 | Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T1 hint |

PREFETCHT2 | Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T2 hint |

PREFETCHW | Prefetch data into caches in anticipation of a write |

PREFETCHWT1 | Prefetch data into caches with intent to write and T1 hint |

CLFLUSH | Flushes and invalidates a memory operand and its associated cache line from all levels of the processor’s cache hierarchy |

CLFLUSHOPT | Flushes and invalidates a memory operand and its associated cache line from all levels of the processor’s cache hierarchy with optimized memory system throughput |

SFENCE | Serializes store operations |

LFENCE | Serializes load operations |

MFENCE | Serializes load and store operations |

VMASKMOVDQU | Non-temporal store of selected bytes from an XMM register into memory |

VMOVNTPS | Non-temporal store of packed single-precision floating-point values from a vector register into memory |

VMOVNTPD | Non-temporal store of packed double-precision floating-point values from a vector register into memory |

VMOVNTDQ | Non-temporal store of packed integer data from a vector register into memory |

VMOVNTDQA | Provides a non-temporal hint that can cause adjacent 16-byte items within an aligned 64-byte region (a streaming line) to be fetched and held in a small set of temporary buffers ("streaming load buffers"). Subsequent streaming loads to other aligned 16-byte items in the same streaming line may be supplied from the streaming load buffer and can improve throughput |

MOVNTI | Non-temporal store of a double word from a general-purpose register into memory |

PAUSE | Improves the performance of "spin-wait loops" |