AVX Initialization Instructions

Instruction	📄	Meaning
VZEROALL	ℹ️	Zero all YMM registers
VZEROUPPER	ℹ️	Zero the upper 128 bits of all YMM registers
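
As a rough illustration of the difference between the two instructions, the following sketch models each YMM register as a 256-bit Python integer. The function names and the list-based register file are assumptions made for this example, not part of any real API.

```python
# Simplified model: each YMM register is a 256-bit unsigned integer.
LOW128_MASK = (1 << 128) - 1

def vzeroupper(regs):
    """Clear bits 255:128 of every register, keeping the low 128 bits."""
    return [r & LOW128_MASK for r in regs]

def vzeroall(regs):
    """Clear every register completely."""
    return [0 for _ in regs]

regs = [(0xABCD << 128) | 0x1234, 7]
print([hex(r) for r in vzeroupper(regs)])
print(vzeroall(regs))
```

VZEROUPPER is typically issued before calling code that uses legacy SSE encodings, since it keeps the XMM contents intact while clearing the upper halves.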

Data Transfer Instructions

The data transfer instructions move integer and floating-point operands between SIMD registers and between SIMD registers and memory.

Instruction	📄	Meaning
Integer Operands
VMOVW	ℹ️	Move word
VMOVD	ℹ️	Move double word
VMOVQ	ℹ️	Move quad word
VMOVDQA	ℹ️	Move aligned double quad words
VMOVDQA32	ℹ️	Move aligned packed double word integer values using writemask
VMOVDQA64	ℹ️	Move aligned packed quad word integer values using writemask
VMOVDQU	ℹ️	Move unaligned double quad words
VMOVDQU8	ℹ️	Move unaligned packed byte integer values using writemask
VMOVDQU16	ℹ️	Move unaligned packed word integer values using writemask
VMOVDQU32	ℹ️	Move unaligned packed double word integer values using writemask
VMOVDQU64	ℹ️	Move unaligned packed quad word integer values using writemask
VMOVSLDUP	ℹ️	Loads/moves 128 bits duplicating the first and third 32-bit data elements
VMOVSHDUP	ℹ️	Loads/moves 128 bits duplicating the second and fourth 32-bit data elements
VMOVDDUP	ℹ️	Loads/moves 128 bits duplicating the lower 64-bit data elements
VPMASKMOVD	ℹ️	Conditional SIMD integer packed loads and stores of double word values
VPMASKMOVQ	ℹ️	Conditional SIMD integer packed loads and stores of quad word values
VPMOVMSKB	ℹ️	Move byte mask
Single Precision Floating-point Operands
VMOVSS	ℹ️	Move scalar single-precision floating-point value between YMM registers or between a YMM register and memory
VMOVAPS	ℹ️	Move aligned packed single-precision floating-point values between YMM registers or between a YMM register and memory
VMOVUPS	ℹ️	Move unaligned packed single-precision floating-point values between YMM registers or between a YMM register and memory
VMOVLPS	ℹ️	Move two packed single-precision floating-point values between the low quad word of a YMM register and memory
VMOVHPS	ℹ️	Move two packed single-precision floating-point values between the high quad word of a YMM register and memory
VMOVLHPS	ℹ️	Move two packed single-precision floating-point values from the low quad word of one YMM register to the high quad word of another YMM register
VMOVHLPS	ℹ️	Move two packed single-precision floating-point values from the high quad word of one YMM register to the low quad word of another YMM register
VMASKMOVPS	ℹ️	Conditional SIMD packed loads and stores of single-precision floating-point values
VMOVMSKPS	ℹ️	Extract the sign mask from packed single-precision floating-point values
Double Precision Floating-point Operands
VMOVSD	ℹ️	Move scalar double-precision floating-point value between YMM registers or between a YMM register and memory
VMOVAPD	ℹ️	Move aligned packed double-precision floating-point values between YMM registers or between a YMM register and memory
VMOVUPD	ℹ️	Move unaligned packed double-precision floating-point values between YMM registers or between a YMM register and memory
VMOVLPD	ℹ️	Move a double-precision floating-point value between the low quad word of a YMM register and memory
VMOVHPD	ℹ️	Move a double-precision floating-point value between the high quad word of a YMM register and memory
VMASKMOVPD	ℹ️	Conditional SIMD packed loads and stores of double-precision floating-point values
VMOVMSKPD	ℹ️	Extract the sign mask from packed double-precision floating-point values
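
The duplicating moves and the sign-mask extraction are easier to picture with a small scalar model. The sketch below treats a register as a Python list of elements; the function names are hypothetical and mirror the mnemonics only for readability.

```python
import struct

def movsldup(v):
    """Duplicate the even-indexed elements: [a0, a1, a2, a3] -> [a0, a0, a2, a2]."""
    return [v[i - (i % 2)] for i in range(len(v))]

def movshdup(v):
    """Duplicate the odd-indexed elements: [a0, a1, a2, a3] -> [a1, a1, a3, a3]."""
    return [v[i - (i % 2) + 1] for i in range(len(v))]

def movmskps(v):
    """Collect the sign bit of each float32 element; element 0 lands in bit 0."""
    mask = 0
    for i, x in enumerate(v):
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        mask |= (bits >> 31) << i
    return mask

print(movsldup([1.0, 2.0, 3.0, 4.0]))
print(movshdup([1.0, 2.0, 3.0, 4.0]))
print(bin(movmskps([-1.0, 2.0, -0.0, 3.0])))
```

Note that negative zero contributes a set bit, because the mask is built from the raw sign bit rather than from a comparison with zero.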

Broadcast Instructions

Instruction	📄	Meaning
Byte Operands
VPBROADCASTB	ℹ️	Broadcast a byte integer value to all elements of a register
VPBROADCASTMB2Q	ℹ️	Broadcast byte size mask to all elements of a register
Word Operands
VPBROADCASTW	ℹ️	Broadcast a word integer value to all elements of a register
VPBROADCASTMW2D	ℹ️	Broadcast word size mask to all elements of a register
Double Word Operands
VPBROADCASTD	ℹ️	Broadcast a double word integer value to all elements of a register
VBROADCASTI32X2	ℹ️	Broadcast two double word values to all elements of a register
VBROADCASTI32X4	ℹ️	Broadcast four double word values to all elements of a register
VBROADCASTI32X8	ℹ️	Broadcast eight double word values to all elements of a register
Quad Word Operands
VPBROADCASTQ	ℹ️	Broadcast a quad word integer value to all elements of a register
VBROADCASTI64X2	ℹ️	Broadcast two quad word values to all elements of a register
VBROADCASTI64X4	ℹ️	Broadcast four quad word values to all elements of a register
Single Precision Floating-point Operands
VBROADCASTSS	ℹ️	Broadcast a single-precision floating-point value to all elements of a register
VBROADCASTF32X2	ℹ️	Broadcast two single-precision floating-point values to all elements of a register
VBROADCASTF32X4	ℹ️	Broadcast four single-precision floating-point values to all elements of a register
VBROADCASTF32X8	ℹ️	Broadcast eight single-precision floating-point values to all elements of a register
Double Precision Floating-point Operands
VBROADCASTSD	ℹ️	Broadcast a double-precision floating-point value to all elements of a register
VBROADCASTF64X2	ℹ️	Broadcast two double-precision floating-point values to all elements of a register
VBROADCASTF64X4	ℹ️	Broadcast four double-precision floating-point values to all elements of a register
128-bit Integer Operands
VBROADCASTI128	ℹ️	Broadcast 128 bits of integer data in memory to the low and high 128 bits of a YMM register
128-bit Floating-point Operands
VBROADCASTF128	ℹ️	Broadcast 128 bits of floating-point data in memory to the low and high 128 bits of a YMM register
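
The table distinguishes single-element broadcasts from group broadcasts (the X2, X4, X8 and 128-bit forms). A minimal sketch of both, using lists as registers and hypothetical helper names:

```python
def broadcast(value, lanes):
    """Single-element broadcast (e.g. VPBROADCASTD): copy one value to every lane."""
    return [value] * lanes

def broadcast_group(group, total_lanes):
    """Group broadcast (e.g. VBROADCASTI32X4): repeat a tuple of elements until full."""
    if total_lanes % len(group) != 0:
        raise ValueError("group size must divide the destination lane count")
    return list(group) * (total_lanes // len(group))

print(broadcast(7, 8))
print(broadcast_group([1, 2, 3, 4], 16))
```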

Expand Instructions

Instruction	📄	Meaning
Byte Operands
VPEXPANDB	ℹ️	Load sparse packed byte integer values from dense memory/register
Word Operands
VPEXPANDW	ℹ️	Load sparse packed word integer values from dense memory/register
Double Word Operands
VPEXPANDD	ℹ️	Load sparse packed double word integer values from dense memory/register
Quad Word Operands
VPEXPANDQ	ℹ️	Load sparse packed quad word integer values from dense memory/register
Single Precision Floating-point Operands
VEXPANDPS	ℹ️	Load sparse packed single-precision floating-point values from dense memory
Double Precision Floating-point Operands
VEXPANDPD	ℹ️	Load sparse packed double-precision floating-point values from dense memory
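
The "sparse from dense" wording can be sketched as follows: consecutive source elements are placed into the destination lanes whose mask bit is set. This is a simplified scalar model with a hypothetical `expand` function; the mask is a list of 0/1 flags, and `dst` stands in for merge-masking (when omitted, unselected lanes are zeroed).

```python
def expand(src, mask, dst=None):
    """Place consecutive elements of src into the lanes selected by mask."""
    out = []
    k = 0  # index of the next unread element in the dense source
    for i, bit in enumerate(mask):
        if bit:
            out.append(src[k])
            k += 1
        elif dst is not None:
            out.append(dst[i])  # merge-masking: keep the old destination lane
        else:
            out.append(0)       # zero-masking: clear the lane
    return out

print(expand([10, 20, 30], [1, 0, 1, 1]))
print(expand([10, 20, 30], [1, 0, 1, 1], dst=[9, 9, 9, 9]))
```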

Compress Instructions

Instruction	📄	Meaning
Byte Operands
VPCOMPRESSB	ℹ️	Store sparse packed byte integer values into dense memory/register
Word Operands
VPCOMPRESSW	ℹ️	Store sparse packed word integer values into dense memory/register
Double Word Operands
VPCOMPRESSD	ℹ️	Store sparse packed double word integer values into dense memory/register
Quad Word Operands
VPCOMPRESSQ	ℹ️	Store sparse packed quad word integer values into dense memory/register
Single Precision Floating-point Operands
VCOMPRESSPS	ℹ️	Store sparse packed single-precision floating-point values into dense memory
Double Precision Floating-point Operands
VCOMPRESSPD	ℹ️	Store sparse packed double-precision floating-point values into dense memory
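
Compress is the inverse direction: elements whose mask bit is set are packed contiguously at the low end of the destination. The sketch below models the register-destination form only; `compress` is a hypothetical name, and `dst` again stands in for merge-masking of the lanes that are not written.

```python
def compress(src, mask, dst=None):
    """Pack the mask-selected elements of src into the low lanes of the result."""
    packed = [x for x, bit in zip(src, mask) if bit]
    if dst is not None:
        rest = list(dst[len(packed):len(src)])   # untouched upper lanes are kept
    else:
        rest = [0] * (len(src) - len(packed))    # or zeroed with zero-masking
    return packed + rest

print(compress([10, 0, 20, 30], [1, 0, 1, 1]))
print(compress([10, 0, 20, 30], [1, 0, 1, 1], dst=[9, 9, 9, 9]))
```

With a memory destination only the packed elements would be stored, which is what makes the instruction useful for filtering a stream.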

Insert Instructions

Instruction	📄	Meaning
Byte Operands
VPINSRB	ℹ️	Insert a byte value from a register or memory into a YMM register
Word Operands
VPINSRW	ℹ️	Insert a word value from a register or memory into a YMM register
Double Word Operands
VPINSRD	ℹ️	Insert a double word value from a register or memory into a YMM register
VINSERTI32X4	ℹ️	Insert 128 bits of packed double word integer values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTI32X8	ℹ️	Insert 256 bits of packed double word integer values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
Quad Word Operands
VPINSRQ	ℹ️	Insert a quad word value from a register or memory into a YMM register
VINSERTI64X2	ℹ️	Insert 128 bits of packed quad word integer values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTI64X4	ℹ️	Insert 256 bits of packed quad word integer values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
Single Precision Floating-point Operands
VINSERTPS	ℹ️	Insert a single-precision floating-point value, taken either from a 32-bit memory location or from a specified offset in a YMM register, at a specified offset in the destination YMM register. In addition, VINSERTPS can zero selected data elements in the destination, using a mask
VINSERTF32X4	ℹ️	Insert 128 bits of packed single-precision floating-point values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTF32X8	ℹ️	Insert 256 bits of packed single-precision floating-point values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
Double Precision Floating-point Operands
VINSERTF64X2	ℹ️	Insert 128 bits of packed double-precision floating-point values from the source into the destination operand at a 128-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
VINSERTF64X4	ℹ️	Insert 256 bits of packed double-precision floating-point values from the source into the destination operand at a 256-bit granular offset. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand
128-bit Integer Operands
VINSERTI128	ℹ️	Insert 128 bits of packed integer values from the source into the destination operand
128-bit Floating-point Operands
VINSERTF128	ℹ️	Insert 128 bits of packed floating-point values from the source into the destination operand
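
The "granular offset" forms replace one whole lane of the first source and copy everything else through. A minimal sketch, assuming registers are lists of elements and using a hypothetical `insert_lane` helper in which the immediate selects the lane:

```python
def insert_lane(src1, lane_values, imm, lane_elems=4):
    """Copy src1, then overwrite the lane selected by imm with lane_values."""
    lanes = len(src1) // lane_elems
    idx = imm % lanes  # models the hardware using only the low bits of imm8
    out = list(src1)
    out[idx * lane_elems:(idx + 1) * lane_elems] = lane_values
    return out

ymm = [0, 1, 2, 3, 4, 5, 6, 7]       # eight double words = 256 bits
xmm = [90, 91, 92, 93]               # four double words = 128 bits
print(insert_lane(ymm, xmm, 1))      # replaces the upper 128-bit lane
```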

Extract Instructions

Instruction	📄	Meaning
Byte Operands
VPEXTRB	ℹ️	Extract a byte from a YMM register and insert the value into a general-purpose register or memory
Word Operands
VPEXTRW	ℹ️	Extract a word from a YMM register and insert the value into a general-purpose register or memory
Double Word Operands
VPEXTRD	ℹ️	Extract a double word from a YMM register and insert the value into a general-purpose register or memory
VEXTRACTI32X4	ℹ️	Extract 128 bits of packed double word integer values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand
VEXTRACTI32X8	ℹ️	Extract 256 bits of packed double word integer values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand
Quad Word Operands
VPEXTRQ	ℹ️	Extract a quad word from a YMM register and insert the value into a general-purpose register or memory
VEXTRACTI64X2	ℹ️	Extract 128 bits of packed quad word integer values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand
VEXTRACTI64X4	ℹ️	Extract 256 bits of packed quad word integer values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand
Single Precision Floating-point Operands
VEXTRACTPS	ℹ️	Extracts a single-precision floating-point value from a specified offset in a YMM register and stores the result to memory or a general-purpose register
VEXTRACTF32X4	ℹ️	Extract 128 bits of packed single-precision floating-point values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand
VEXTRACTF32X8	ℹ️	Extract 256 bits of packed single-precision floating-point values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand
Double Precision Floating-point Operands
VEXTRACTF64X2	ℹ️	Extract 128 bits of packed double-precision floating-point values from the source operand at a 128-bit granular offset and store them in the low 128 bits of the destination operand
VEXTRACTF64X4	ℹ️	Extract 256 bits of packed double-precision floating-point values from the source operand at a 256-bit granular offset and store them in the low 256 bits of the destination operand
128-bit Integer Operands
VEXTRACTI128	ℹ️	Extract 128 bits of packed integer values from the source operand and store them in the low 128 bits of the destination operand
128-bit Floating-point Operands
VEXTRACTF128	ℹ️	Extract 128 bits of packed floating-point values from the source operand and store them in the low 128 bits of the destination operand
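
Lane extraction can be modelled as slicing one aligned lane out of the source. The helper name `extract_lane` and the list representation are assumptions for this sketch.

```python
def extract_lane(src, imm, lane_elems=4):
    """Return the lane of src selected by imm (one 128-bit lane by default)."""
    lanes = len(src) // lane_elems
    idx = imm % lanes  # models the hardware using only the low bits of imm8
    return src[idx * lane_elems:(idx + 1) * lane_elems]

zmm = list(range(16))                # sixteen double words = 512 bits
print(extract_lane(zmm, 2))          # third 128-bit lane
print(extract_lane(zmm, 1, 8))       # upper 256-bit half
```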

Gather Instructions

Instruction	📄	Meaning
Double Word Operands
VPGATHERDD	ℹ️	Gather packed double word values using signed double word indices
VPGATHERQD	ℹ️	Gather packed double word values using signed quad word indices
Quad Word Operands
VPGATHERDQ	ℹ️	Gather packed quad word values using signed double word indices
VPGATHERQQ	ℹ️	Gather packed quad word values using signed quad word indices
Single Precision Floating-point Operands
VGATHERDPS	ℹ️	Gather packed single-precision floating-point values using signed double word indices
VGATHERQPS	ℹ️	Gather packed single-precision floating-point values using signed quad word indices
VGATHERPF0DPS		Sparse prefetch of packed single-precision floating-point values with signed double word indices using T0 hint
VGATHERPF1DPS		Sparse prefetch of packed single-precision floating-point values with signed double word indices using T1 hint
VGATHERPF0QPS		Sparse prefetch of packed single-precision floating-point values with signed quad word indices using T0 hint
VGATHERPF1QPS		Sparse prefetch of packed single-precision floating-point values with signed quad word indices using T1 hint
Double Precision Floating-point Operands
VGATHERDPD	ℹ️	Gather packed double-precision floating-point values using signed double word indices
VGATHERQPD	ℹ️	Gather packed double-precision floating-point values using signed quad word indices
VGATHERPF0DPD		Sparse prefetch of packed double-precision floating-point values with signed double word indices using T0 hint
VGATHERPF1DPD		Sparse prefetch of packed double-precision floating-point values with signed double word indices using T1 hint
VGATHERPF0QPD		Sparse prefetch of packed double-precision floating-point values with signed quad word indices using T0 hint
VGATHERPF1QPD		Sparse prefetch of packed double-precision floating-point values with signed quad word indices using T1 hint
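
A gather is an indexed load performed per lane. The sketch below is a deliberately simplified model: memory is a Python list addressed by element index, whereas the real instructions compute a byte address from a base, a scaled index and a displacement. The `gather` name and the list-of-flags mask are assumptions.

```python
def gather(memory, indices, mask, dst):
    """Load memory[indices[i]] into lane i where mask[i] is set; else keep dst[i]."""
    return [memory[i] if m else d for i, m, d in zip(indices, mask, dst)]

table = [100, 101, 102, 103, 104, 105, 106, 107]
print(gather(table, [7, 0, 3, 3], [1, 1, 0, 1], [-1, -1, -1, -1]))
```

Lanes whose mask bit is clear are never read from memory, so their index may be out of range without consequence.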

Scatter Instructions

Instruction	📄	Meaning
Double Word Operands
VPSCATTERDD	ℹ️	Using signed double word indices, scatter double word values to memory using writemask
VPSCATTERQD	ℹ️	Using signed quad word indices, scatter double word values to memory using writemask
Quad Word Operands
VPSCATTERDQ	ℹ️	Using signed double word indices, scatter quad word values to memory using writemask
VPSCATTERQQ	ℹ️	Using signed quad word indices, scatter quad word values to memory using writemask
Single Precision Floating-point Operands
VSCATTERDPS	ℹ️	Using signed double word indices, scatter single-precision floating-point values to memory using writemask
VSCATTERQPS	ℹ️	Using signed quad word indices, scatter single-precision floating-point values to memory using writemask
VSCATTERPF0DPS		Using signed double word indices, prefetch sparse single-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1DPS		Using signed double word indices, prefetch sparse single-precision floating-point values using writemask and T1 hint with intent to write
VSCATTERPF0QPS		Using signed quad word indices, prefetch sparse single-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1QPS		Using signed quad word indices, prefetch sparse single-precision floating-point values using writemask and T1 hint with intent to write
Double Precision Floating-point Operands
VSCATTERDPD	ℹ️	Using signed double word indices, scatter double-precision floating-point values to memory using writemask
VSCATTERQPD	ℹ️	Using signed quad word indices, scatter double-precision floating-point values to memory using writemask
VSCATTERPF0DPD		Using signed double word indices, prefetch sparse double-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1DPD		Using signed double word indices, prefetch sparse double-precision floating-point values using writemask and T1 hint with intent to write
VSCATTERPF0QPD		Using signed quad word indices, prefetch sparse double-precision floating-point values using writemask and T0 hint with intent to write
VSCATTERPF1QPD		Using signed quad word indices, prefetch sparse double-precision floating-point values using writemask and T1 hint with intent to write
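
A scatter is the store counterpart of a gather. As above, this is a simplified sketch in which memory is a list addressed by element index and the writemask is a list of flags; `scatter` is a hypothetical name. Lanes are processed from lowest to highest, so when two enabled lanes target the same location the higher lane's value is the one that remains in this model.

```python
def scatter(memory, indices, values, mask):
    """Store values[i] to memory[indices[i]] for every lane whose mask bit is set."""
    out = list(memory)
    for i, v, m in zip(indices, values, mask):
        if m:
            out[i] = v
    return out

print(scatter([0, 0, 0, 0], [1, 1, 3], [7, 8, 9], [1, 1, 1]))
print(scatter([0, 0, 0, 0], [1, 2, 3], [7, 8, 9], [1, 0, 1]))
```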

Blending Instructions

Instruction	📄	Meaning
Byte Operands
VPBLENDVB	ℹ️	Conditionally copies specified byte elements in the source operand to the destination, using an implied mask
VPBLENDMB	ℹ️	Performs blending of byte elements between the first and the second operand (register or memory), using the instruction mask selector
Word Operands
VPBLENDW	ℹ️	Conditionally copies specified word elements in the source operand to the destination, using an immediate byte control
VPBLENDMW	ℹ️	Performs blending of word elements between the first and the second operand (register or memory), using the instruction mask selector
Double Word Operands
VPBLENDD	ℹ️	Conditionally copies specified double word elements in the source operand to the destination, using an immediate byte control
VPBLENDMD	ℹ️	Performs blending of double word elements between the first and the second operand (register or memory), using the instruction mask selector
Quad Word Operands
VPBLENDMQ	ℹ️	Performs blending of quad word elements between the first and the second operand (register or memory), using the instruction mask selector
Single Precision Floating-point Operands
VBLENDPS	ℹ️	Conditionally copies specified data elements in the source operand to the destination, using an immediate byte control
VBLENDVPS	ℹ️	Conditionally copies specified data elements in the source operand to the destination, using an implied mask
VBLENDMPS	ℹ️	Performs blending between single-precision elements in the first operand with the elements in the second operand using an opmask register as select control
Double Precision Floating-point Operands
VBLENDPD	ℹ️	Conditionally copies specified data elements in the source operand to the destination, using an immediate byte control
VBLENDVPD	ℹ️	Conditionally copies specified data elements in the source operand to the destination, using an implied mask
VBLENDMPD	ℹ️	Performs blending between double-precision elements in the first operand with the elements in the second operand using an opmask register as select control
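
The blend variants differ mainly in where the per-element selector comes from: an immediate byte, an implied or explicit vector mask, or an opmask register. A minimal sketch of the immediate and opmask forms follows, with hypothetical function names; in both, a set selector bit picks the element from the second operand.

```python
def blend_imm(a, b, imm8):
    """Immediate-controlled blend: bit (i mod 8) of imm8 selects b[i], else a[i].

    The modulo models the 8-bit control being reused when there are more than
    eight elements (for example sixteen words in a 256-bit VPBLENDW).
    """
    return [b[i] if (imm8 >> (i % 8)) & 1 else a[i] for i in range(len(a))]

def blend_opmask(a, b, k):
    """Opmask-controlled blend: bit i of integer k selects b[i], else a[i]."""
    return [b[i] if (k >> i) & 1 else a[i] for i in range(len(a))]

print(blend_imm([1, 2, 3, 4], [10, 20, 30, 40], 0b0101))
print(blend_opmask([1, 2, 3, 4], [10, 20, 30, 40], 0b1000))
```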

Shuffle Instructions

Shuffle instructions shuffle values in packed SIMD operands.

Instruction	📄	Meaning
Bit Operands
VPSHUFBITQMB	ℹ️	Shuffle bits from quad word elements using byte indexes into mask
Byte Operands
VPSHUFB	ℹ️	Shuffle packed byte values
Word Operands
VPSHUFLW	ℹ️	Shuffle packed low words
VPSHUFHW	ℹ️	Shuffle packed high words
Double Word Operands
VPSHUFD	ℹ️	Shuffle packed double words
VSHUFI32X4	ℹ️	Shuffle 128-bit packed double word values
Quad Word Operands
VSHUFI64X2	ℹ️	Shuffle 128-bit packed quad word values
Single Precision Floating-point Operands
VSHUFPS	ℹ️	Shuffles values in packed single-precision floating-point operands
VSHUFF32X4	ℹ️	Shuffle 128-bit packed single-precision floating-point operands
Double Precision Floating-point Operands
VSHUFPD	ℹ️	Shuffles values in packed double-precision floating-point operands
VSHUFF64X2	ℹ️	Shuffle 128-bit packed double-precision floating-point operands
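
Two of the most common shuffles can be sketched in a few lines. Both operate independently inside each 128-bit lane. The function names follow the mnemonics for readability only, and registers are modelled as lists.

```python
def pshufd(src, imm8):
    """VPSHUFD model: four 2-bit selectors in imm8 pick double words within each lane."""
    out = []
    for start in range(0, len(src), 4):
        lane = src[start:start + 4]
        out.extend(lane[(imm8 >> (2 * j)) & 3] for j in range(4))
    return out

def pshufb(src, ctrl):
    """VPSHUFB model: each control byte picks a byte from the same 16-byte lane.

    A control byte with bit 7 set produces zero instead.
    """
    out = []
    for i, c in enumerate(ctrl):
        lane_start = i - (i % 16)
        out.append(0 if c & 0x80 else src[lane_start + (c & 0x0F)])
    return out

print(pshufd([0, 1, 2, 3], 0x1B))                       # reverses the four elements
print(pshufb(list(range(16)), [15] + [0x80] * 15))
```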

Permute Instructions

Instruction	📄	Meaning
Byte Operands
VPERMB	ℹ️	Permute packed byte elements
VPERMI2B	ℹ️	Permute packed byte elements from two tables using indexes
VPERMT2B	ℹ️	Full permute of two tables of byte elements overwriting one source table
Word Operands
VPERMW	ℹ️	Permute packed word elements
VPERMI2W	ℹ️	Permute packed word elements from two tables using indexes
VPERMT2W	ℹ️	Full permute of two tables of word elements overwriting one source table
Double Word Operands
VPERMD	ℹ️	Permute packed double word elements
VPERMI2D	ℹ️	Permute packed double word elements from two tables using indexes
VPERMT2D	ℹ️	Full permute of two tables of double word elements overwriting one source table
Quad Word Operands
VPERMQ	ℹ️	Permute packed quad word elements
VPERMI2Q	ℹ️	Permute packed quad word elements from two tables using indexes
VPERMT2Q	ℹ️	Full permute of two tables of quad word elements overwriting one source table
Single Precision Floating-point Operands
VPERMPS	ℹ️	Permute packed single-precision floating-point elements
VPERMILPS	ℹ️	Permute packed single-precision floating-point elements using controls
VPERMI2PS	ℹ️	Permute packed single-precision elements from two tables using indexes
VPERMT2PS	ℹ️	Full permute of two tables of single-precision floating-point elements overwriting one source table
Double Precision Floating-point Operands
VPERMPD	ℹ️	Permute packed double-precision floating-point elements
VPERMILPD	ℹ️	Permute packed double-precision floating-point elements using controls
VPERMI2PD	ℹ️	Permute packed double-precision elements from two tables using indexes
VPERMT2PD	ℹ️	Full permute of two tables of double-precision floating-point elements overwriting one source table
128-bit Integer Operands
VPERM2I128	ℹ️	Permute 128-bit integer fields using controls
128-bit Floating-point Operands
VPERM2F128	ℹ️	Permute 128-bit floating-point fields using controls
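
Unlike the in-lane shuffles, the element permutes can move data across the whole register, and the two-table forms select from the concatenation of two sources. The sketch below uses hypothetical `permute` and `permute2` helpers; the modulo stands in for the hardware ignoring the unused high bits of each index.

```python
def permute(src, idx):
    """Single-table permute (e.g. VPERMD): out[i] = src[idx[i]]."""
    n = len(src)
    return [src[i % n] for i in idx]

def permute2(a, b, idx):
    """Two-table permute (VPERMI2*/VPERMT2*): indexes 0..n-1 read a, n..2n-1 read b."""
    table = list(a) + list(b)
    return [table[i % len(table)] for i in idx]

print(permute([10, 11, 12, 13], [3, 3, 0, 1]))
print(permute2([10, 11, 12, 13], [20, 21, 22, 23], [0, 4, 1, 5]))
```

The I2 and T2 forms compute the same result and differ only in which operand is overwritten: the index register or one of the tables.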

Unpack Instructions

Unpack instructions interleave values in packed SIMD operands.

Instruction	📄	Meaning
Byte Operands
VPUNPCKLBW	ℹ️	Unpack low-order bytes
VPUNPCKHBW	ℹ️	Unpack high-order bytes
Word Operands
VPUNPCKLWD	ℹ️	Unpack low-order words
VPUNPCKHWD	ℹ️	Unpack high-order words
Double Word Operands
VPUNPCKLDQ	ℹ️	Unpack low-order double words
VPUNPCKHDQ	ℹ️	Unpack high-order double words
Quad Word Operands
VPUNPCKLQDQ	ℹ️	Unpack low quad words
VPUNPCKHQDQ	ℹ️	Unpack high quad words
Single Precision Floating-point Operands
VUNPCKLPS	ℹ️	Unpacks and interleaves the two low-order values from two single-precision floating-point operands
VUNPCKHPS	ℹ️	Unpacks and interleaves the two high-order values from two single-precision floating-point operands
Double Precision Floating-point Operands
VUNPCKLPD	ℹ️	Unpacks and interleaves the low values from two packed double-precision floating-point operands
VUNPCKHPD	ℹ️	Unpacks and interleaves the high values from two packed double-precision floating-point operands
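
Interleaving is done per 128-bit lane: the low forms take the lower half of each lane from both operands, the high forms the upper half. A minimal sketch with a hypothetical `unpack` helper, where `lane_elems` is the number of elements per 128-bit lane:

```python
def unpack(a, b, lane_elems, high=False):
    """Interleave the low (or high) half of each lane of a with that of b."""
    half = lane_elems // 2
    out = []
    for start in range(0, len(a), lane_elems):
        first = start + (half if high else 0)
        for j in range(first, first + half):
            out += [a[j], b[j]]
    return out

a = [0, 1, 2, 3, 4, 5, 6, 7]           # eight double words, two lanes
b = [10, 11, 12, 13, 14, 15, 16, 17]
print(unpack(a, b, 4))                 # VPUNPCKLDQ-style
print(unpack(a, b, 4, high=True))      # VPUNPCKHDQ-style
```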

Pack Instructions

The pack instructions narrow words into bytes and double words into words, saturating values that do not fit.

Instruction	📄	Meaning
Words into Bytes
VPACKSSWB	ℹ️	Pack words into bytes with signed saturation
VPACKUSWB	ℹ️	Pack words into bytes with unsigned saturation
Double Words into Words
VPACKSSDW	ℹ️	Pack double words into words with signed saturation
VPACKUSDW	ℹ️	Pack double words into words with unsigned saturation
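
Saturation is what separates packing from plain truncation: out-of-range inputs are clamped to the nearest representable value. The sketch below models the word-to-byte forms per 128-bit lane (eight words from each operand); the helper names are assumptions. The inputs are treated as signed words in both variants.

```python
def pack(a, b, lo, hi, lane_elems=8):
    """Per lane, emit the clamped elements of a followed by those of b."""
    out = []
    for start in range(0, len(a), lane_elems):
        for src in (a, b):
            out.extend(max(lo, min(hi, x)) for x in src[start:start + lane_elems])
    return out

def packsswb(a, b):
    """Signed saturation to the range -128..127."""
    return pack(a, b, -128, 127)

def packuswb(a, b):
    """Unsigned saturation to the range 0..255."""
    return pack(a, b, 0, 255)

words = [300, -300, 5, 0, 0, 0, 0, 0]
print(packsswb(words, [1] * 8))
print(packuswb(words, [1] * 8))
```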

Conversion Instructions

The conversion instructions widen or narrow packed integers and convert between integer and floating-point formats of different widths.

Instruction	📄	Meaning
Byte to Word
VPMOVSXBW	ℹ️	Sign extend packed byte integers into packed word integers
VPMOVZXBW	ℹ️	Zero extend packed byte integers into packed word integers
Byte to Double Word
VPMOVSXBD	ℹ️	Sign extend packed byte integers into packed double word integers
VPMOVZXBD	ℹ️	Zero extend packed byte integers into packed double word integers
Byte to Quad Word
VPMOVSXBQ	ℹ️	Sign extend packed byte integers into packed quad word integers
VPMOVZXBQ	ℹ️	Zero extend packed byte integers into packed quad word integers
Word to Byte
VPMOVWB	ℹ️	Converts packed word integers into packed bytes with truncation
VPMOVSWB	ℹ️	Converts packed signed word integers into packed signed bytes using signed saturation
VPMOVUSWB	ℹ️	Converts packed unsigned word integers into packed unsigned bytes using unsigned saturation
Word to Double Word
VPMOVSXWD	ℹ️	Sign extend packed word integers into packed double word integers
VPMOVZXWD	ℹ️	Zero extend packed word integers into packed double word integers
Word to Quad Word
VPMOVSXWQ	ℹ️	Sign extend packed word integers into packed quad word integers
VPMOVZXWQ	ℹ️	Zero extend packed word integers into packed quad word integers
Double Word to Byte
VPMOVDB	ℹ️	Converts packed double word integers into packed bytes with truncation
VPMOVSDB	ℹ️	Converts packed signed double word integers into packed signed bytes using signed saturation
VPMOVUSDB	ℹ️	Converts packed unsigned double word integers into packed unsigned bytes using unsigned saturation
Double Word to Word
VPMOVDW	ℹ️	Converts packed double word integers into packed words with truncation
VPMOVSDW	ℹ️	Converts packed signed double word integers into packed signed words using signed saturation
VPMOVUSDW	ℹ️	Converts packed unsigned double word integers into packed unsigned words using unsigned saturation
Double Word to Quad Word
VPMOVSXDQ	ℹ️	Sign extend packed double word integers into packed quad word integers
VPMOVZXDQ	ℹ️	Zero extend packed double word integers into packed quad word integers
Quad Word to Byte
VPMOVQB	ℹ️	Converts packed quad word integers into packed bytes with truncation
VPMOVSQB	ℹ️	Converts packed signed quad word integers into packed signed bytes using signed saturation
VPMOVUSQB	ℹ️	Converts packed unsigned quad word integers into packed unsigned bytes using unsigned saturation
Quad Word to Word
VPMOVQW	ℹ️	Converts packed quad word integers into packed words with truncation
VPMOVSQW	ℹ️	Converts packed signed quad word integers into packed signed words using signed saturation
VPMOVUSQW	ℹ️	Converts packed unsigned quad word integers into packed unsigned words using unsigned saturation
Quad Word to Double Word
VPMOVQD	ℹ️	Converts packed quad word integers into packed double words with truncation
VPMOVSQD	ℹ️	Converts packed signed quad word integers into packed signed double words using signed saturation
VPMOVUSQD	ℹ️	Converts packed unsigned quad word integers into packed unsigned double words using unsigned saturation
Double Word to Single Precision Floating-point
VCVTSI2SS	ℹ️	Convert scalar signed double word integer to scalar single-precision floating-point value
VCVTUSI2SS	ℹ️	Convert scalar unsigned double word integer to scalar single-precision floating-point value
VCVTDQ2PS	ℹ️	Convert packed signed double word integers to packed single-precision floating-point values
VCVTUDQ2PS	ℹ️	Convert packed unsigned double word integers to packed single-precision floating-point values
Double Word to Double Precision Floating-point
VCVTSI2SD	ℹ️	Convert scalar signed double word integer to scalar double-precision floating-point value
VCVTUSI2SD	ℹ️	Convert scalar unsigned double word integer to scalar double-precision floating-point value
VCVTDQ2PD	ℹ️	Convert packed signed double word integers to packed double-precision floating-point values
VCVTUDQ2PD	ℹ️	Convert packed unsigned double word integers to packed double-precision floating-point values
Quad Word to Single Precision Floating-point
VCVTSI2SS	ℹ️	Convert scalar signed quad word integer to scalar single-precision floating-point value
VCVTUSI2SS	ℹ️	Convert scalar unsigned quad word integer to scalar single-precision floating-point value
VCVTQQ2PS	ℹ️	Convert packed signed quad word integers to packed single-precision floating-point values
VCVTUQQ2PS	ℹ️	Convert packed unsigned quad word integers to packed single-precision floating-point values
Quad Word to Double Precision Floating-point
VCVTSI2SD	ℹ️	Convert scalar signed quad word integer to scalar double-precision floating-point value
VCVTUSI2SD	ℹ️	Convert scalar unsigned quad word integer to scalar double-precision floating-point value
VCVTQQ2PD	ℹ️	Convert packed signed quad word integers to packed double-precision floating-point values
VCVTUQQ2PD	ℹ️	Convert packed unsigned quad word integers to packed double-precision floating-point values
Half Precision Floating-point to Single Precision Floating-point
VCVTPH2PS	ℹ️	Convert eight/four data elements containing 16-bit floating-point data into eight/four single-precision floating-point data
Single Precision Floating-point to Double Word
VCVTSS2SI	ℹ️	Convert scalar single-precision floating-point value to scalar signed double word integer
VCVTSS2USI	ℹ️	Convert scalar single-precision floating-point value to scalar unsigned double word integer
VCVTPS2DQ	ℹ️	Convert packed single-precision floating-point values to packed signed double word integers
VCVTPS2UDQ	ℹ️	Convert packed single-precision floating-point values to packed unsigned double word integers
VCVTTSS2SI	ℹ️	Convert with truncation scalar single-precision floating-point value to scalar signed double word integer
VCVTTSS2USI	ℹ️	Convert with truncation scalar single-precision floating-point value to scalar unsigned double word integer
VCVTTPS2DQ	ℹ️	Convert with truncation packed single-precision floating-point values to packed signed double word integers
VCVTTPS2UDQ	ℹ️	Convert with truncation packed single-precision floating-point values to packed unsigned double word integers
Single Precision Floating-point to Quad Word
VCVTSS2SI	ℹ️	Convert scalar single-precision floating-point value to scalar signed quad word integer
VCVTSS2USI	ℹ️	Convert scalar single-precision floating-point value to scalar unsigned quad word integer
VCVTPS2QQ	ℹ️	Convert packed single-precision floating-point values to packed signed quad word integers
VCVTPS2UQQ	ℹ️	Convert packed single-precision floating-point values to packed unsigned quad word integers
VCVTTSS2SI	ℹ️	Convert with truncation scalar single-precision floating-point value to scalar signed quad word integer
VCVTTSS2USI	ℹ️	Convert with truncation scalar single-precision floating-point value to scalar unsigned quad word integer
VCVTTPS2QQ	ℹ️	Convert with truncation packed single-precision floating-point values to packed signed quad word integers
VCVTTPS2UQQ	ℹ️	Convert with truncation packed single-precision floating-point values to packed unsigned quad word integers
Single Precision Floating-point to Half Precision Floating-point
VCVTPS2PH	ℹ️	Convert eight/four data elements containing single-precision floating-point data into eight/four 16-bit floating-point data
Single Precision Floating-point to Double Precision Floating-point
VCVTSS2SD	ℹ️	Convert scalar single-precision floating-point value to scalar double-precision floating-point value
VCVTPS2PD	ℹ️	Convert packed single-precision floating-point values to packed double-precision floating-point values
Double Precision Floating-point to Double Word
VCVTSD2SI	ℹ️	Convert scalar double-precision floating-point value to scalar signed double word integer
VCVTSD2USI	ℹ️	Convert scalar double-precision floating-point value to scalar unsigned double word integer
VCVTPD2DQ	ℹ️	Convert packed double-precision floating-point values to packed signed double word integers
VCVTPD2UDQ	ℹ️	Convert packed double-precision floating-point values to packed unsigned double word integers
VCVTTSD2SI	ℹ️	Convert with truncation scalar double-precision floating-point value to scalar signed double word integer
VCVTTSD2USI	ℹ️	Convert with truncation scalar double-precision floating-point value to scalar unsigned double word integer
VCVTTPD2DQ	ℹ️	Convert with truncation packed double-precision floating-point values to packed signed double word integers
VCVTTPD2UDQ	ℹ️	Convert with truncation packed double-precision floating-point values to packed unsigned double word integers
Double Precision Floating-point to Quad Word
VCVTSD2SI	ℹ️	Convert scalar double-precision floating-point value to scalar signed quad word integer
VCVTSD2USI	ℹ️	Convert scalar double-precision floating-point value to scalar unsigned quad word integer
VCVTPD2QQ	ℹ️	Convert packed double-precision floating-point values to packed signed quad word integers
VCVTPD2UQQ	ℹ️	Convert packed double-precision floating-point values to packed unsigned quad word integers
VCVTTSD2SI	ℹ️	Convert with truncation scalar double-precision floating-point value to scalar signed quad word integer
VCVTTSD2USI	ℹ️	Convert with truncation scalar double-precision floating-point value to scalar unsigned quad word integer
VCVTTPD2QQ	ℹ️	Convert with truncation packed double-precision floating-point values to packed signed quad word integers
VCVTTPD2UQQ	ℹ️	Convert with truncation packed double-precision floating-point values to packed unsigned quad word integers
Double Precision Floating-point to Single Precision Floating-point
VCVTSD2SS	ℹ️	Convert scalar double-precision floating-point value to scalar single-precision floating-point value
VCVTPD2PS	ℹ️	Convert packed double-precision floating-point values to packed single-precision floating-point values
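
The difference between the plain conversions and the "with truncation" forms listed above can be illustrated with a scalar model. The sketch below is plain Python rather than the instructions themselves. It assumes the default MXCSR rounding mode (round to nearest, ties to even) for the non-truncating forms and ignores NaN and out-of-range inputs. The function names are hypothetical.

```python
import math


def cvt_round_nearest_even(x: float) -> int:
    """Scalar model of VCVTSS2SI / VCVTSD2SI under the default rounding mode."""
    # Python's round() rounds halfway cases to the nearest even integer.
    return round(x)


def cvt_truncate(x: float) -> int:
    """Scalar model of VCVTTSS2SI / VCVTTSD2SI: always round toward zero."""
    return math.trunc(x)


if __name__ == "__main__":
    for value in (2.5, 3.5, -1.5, 2.7):
        print(value, cvt_round_nearest_even(value), cvt_truncate(value))
```

The two models only disagree when the fractional part matters, for example on 3.5 or -1.5.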

Logical Instructions

The logical instructions perform AND, AND NOT, OR, and XOR operations on packed SIMD values.

Instruction	📄	Meaning
Byte Operands
VPTESTMB	ℹ️	Performs a bitwise logical AND of packed byte integers and sets mask
VPTESTNMB	ℹ️	Performs a bitwise logical NAND of packed byte integers and sets mask
Word Operands
VPTESTMW	ℹ️	Performs a bitwise logical AND of packed word integers and sets mask
VPTESTNMW	ℹ️	Performs a bitwise logical NAND of packed word integers and sets mask
Double Word Operands
VPTESTMD	ℹ️	Performs a bitwise logical AND of packed double word integers and sets mask
VPTESTNMD	ℹ️	Performs a bitwise logical NAND of packed double word integers and sets mask
VPANDD	ℹ️	Bitwise logical AND of packed double word integers
VPANDND	ℹ️	Bitwise logical AND NOT of packed double word integers
VPORD	ℹ️	Bitwise logical OR of packed double word integers
VPXORD	ℹ️	Bitwise logical exclusive OR of packed double word integers
VPTERNLOGD	ℹ️	Bitwise ternary logic with double word granularity. The immediate value determines the specific three-input Boolean function being implemented
Quad Word Operands
VPTESTMQ	ℹ️	Performs a bitwise logical AND of packed quad word integers and sets mask
VPTESTNMQ	ℹ️	Performs a bitwise logical NAND of packed quad word integers and sets mask
VPANDQ	ℹ️	Bitwise logical AND of packed quad word integers
VPANDNQ	ℹ️	Bitwise logical AND NOT of packed quad word integers
VPORQ	ℹ️	Bitwise logical OR of packed quad word integers
VPXORQ	ℹ️	Bitwise logical exclusive OR of packed quad word integers
VPTERNLOGQ	ℹ️	Bitwise ternary logic with quad word granularity. The immediate value determines the specific three-input Boolean function being implemented
Integer Operands
VPTEST	ℹ️	Performs a bitwise logical AND of the destination (first) and source (second) operands and sets the ZF flag if the result is all zeros. Sets the CF flag if the AND of the source operand with the inverted destination operand is all zeros
VPAND	ℹ️	Bitwise logical AND
VPANDN	ℹ️	Bitwise logical AND NOT
VPOR	ℹ️	Bitwise logical OR
VPXOR	ℹ️	Bitwise logical exclusive OR
Single Precision Floating-point Operands
VTESTPS	ℹ️	Packed bit test of single-precision floating-point elements
VANDPS	ℹ️	Perform bitwise logical AND of packed single-precision floating-point values
VANDNPS	ℹ️	Perform bitwise logical AND NOT of packed single-precision floating-point values
VORPS	ℹ️	Perform bitwise logical OR of packed single-precision floating-point values
VXORPS	ℹ️	Perform bitwise logical XOR of packed single-precision floating-point values
Double Precision Floating-point Operands
VTESTPD	ℹ️	Packed bit test of double-precision floating-point elements
VANDPD	ℹ️	Perform bitwise logical AND of packed double-precision floating-point values
VANDNPD	ℹ️	Perform bitwise logical AND NOT of packed double-precision floating-point values
VORPD	ℹ️	Perform bitwise logical OR of packed double-precision floating-point values
VXORPD	ℹ️	Perform bitwise logical XOR of packed double-precision floating-point values
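
The way the immediate of VPTERNLOGD/VPTERNLOGQ selects a Boolean function can be sketched with a bit-by-bit model. The code below is plain Python, not the instruction. It assumes the usual indexing, where the bit from the first (destination) operand is the most significant bit of a 3-bit index into the immediate. The function name `ternlog` is hypothetical.

```python
def ternlog(a: int, b: int, c: int, imm8: int, bits: int = 32) -> int:
    """Bitwise model of a ternary-logic operation on one element.

    For every bit position, the three input bits form an index
    (a is bit 2, b is bit 1, c is bit 0) and the result bit is
    the bit of imm8 at that index.
    """
    result = 0
    for i in range(bits):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> index) & 1) << i
    return result


if __name__ == "__main__":
    a, b, c = 0x12345678, 0x0F0F0F0F, 0xDEADBEEF
    print(hex(ternlog(a, b, c, 0x96)))  # three-way XOR
    print(hex(ternlog(a, b, c, 0xCA)))  # bitwise select: a ? b : c
```

A common way to derive an immediate is to evaluate the wanted function on the constants 0xF0, 0xCC and 0xAA: in this model `ternlog(0xF0, 0xCC, 0xAA, imm8, bits=8)` returns `imm8` itself.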

Shift and Rotate Instructions

The shift and rotate instructions shift and rotate packed words, double words, and quad words in SIMD operands; the double quad word shifts move data by whole bytes.

Instruction	📄	Meaning
Word Operands
VPSLLW	ℹ️	Shift packed words left logical
VPSRLW	ℹ️	Shift packed words right logical
VPSRAW	ℹ️	Shift packed words right arithmetic
VPSLLVW	ℹ️	Variable bit shift packed words left logical
VPSRLVW	ℹ️	Variable bit shift packed words right logical
VPSRAVW	ℹ️	Variable bit shift packed words right arithmetic
VPSHLDW	ℹ️	Concatenate and shift packed words left logical
VPSHRDW	ℹ️	Concatenate and shift packed words right logical
VPSHLDVW	ℹ️	Concatenate and variable shift packed words left logical
VPSHRDVW	ℹ️	Concatenate and variable shift packed words right logical
Double Word Operands
VPSLLD	ℹ️	Shift packed double words left logical
VPSRLD	ℹ️	Shift packed double words right logical
VPSRAD	ℹ️	Shift packed double words right arithmetic
VPSLLVD	ℹ️	Variable bit shift packed double words left logical
VPSRLVD	ℹ️	Variable bit shift packed double words right logical
VPSRAVD	ℹ️	Variable bit shift packed double words right arithmetic
VPSHLDD	ℹ️	Concatenate and shift packed double words left logical
VPSHRDD	ℹ️	Concatenate and shift packed double words right logical
VPSHLDVD	ℹ️	Concatenate and variable shift packed double words left logical
VPSHRDVD	ℹ️	Concatenate and variable shift packed double words right logical
VPROLD	ℹ️	Rotate double words left using immediate bits count
VPRORD	ℹ️	Rotate double words right using immediate bits count
VPROLVD	ℹ️	Rotate double words left using variable bits count
VPRORVD	ℹ️	Rotate double words right using variable bits count
VALIGND	ℹ️	Shift right and merge vectors with double word granularity using immediate shift value
Quad Word Operands
VPSLLQ	ℹ️	Shift packed quad words left logical
VPSRLQ	ℹ️	Shift packed quad words right logical
VPSRAQ	ℹ️	Shift packed quad words right arithmetic
VPSLLVQ	ℹ️	Variable bit shift packed quad words left logical
VPSRLVQ	ℹ️	Variable bit shift packed quad words right logical
VPSRAVQ	ℹ️	Variable bit shift packed quad words right arithmetic
VPSHLDQ	ℹ️	Concatenate and shift packed quad words left logical
VPSHRDQ	ℹ️	Concatenate and shift packed quad words right logical
VPSHLDVQ	ℹ️	Concatenate and variable shift packed quad words left logical
VPSHRDVQ	ℹ️	Concatenate and variable shift packed quad words right logical
VPROLQ	ℹ️	Rotate quad words left using immediate bits count
VPRORQ	ℹ️	Rotate quad words right using immediate bits count
VPROLVQ	ℹ️	Rotate quad words left using variable bits count
VPRORVQ	ℹ️	Rotate quad words right using variable bits count
VALIGNQ	ℹ️	Shift right and merge vectors with quad word granularity using immediate shift value
Double Quad Word Operands
VPSLLDQ	ℹ️	Shift double quad word left logical
VPSRLDQ	ℹ️	Shift double quad word right logical
VPALIGNR	ℹ️	Concatenate destination and source operands, extract byte aligned result shifted to the right by constant value
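
The differences between logical shift, arithmetic shift, rotate, and concatenate-and-shift can be illustrated on a single 32-bit element. The sketch below is plain Python with hypothetical function names. It models one double word lane only and assumes the usual behaviour: logical shifts by 32 or more give zero, arithmetic shifts by 32 or more fill with the sign bit, and rotate and concatenate-shift counts are taken modulo 32.

```python
MASK32 = 0xFFFFFFFF


def srl32(x: int, n: int) -> int:
    """Logical right shift of one double word (model of a VPSRLD lane)."""
    return (x & MASK32) >> n if n < 32 else 0


def sra32(x: int, n: int) -> int:
    """Arithmetic right shift of one double word (model of a VPSRAD lane)."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> min(n, 31)) & MASK32


def rol32(x: int, n: int) -> int:
    """Rotate left of one double word (model of a VPROLD lane)."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def shld32(a: int, b: int, n: int) -> int:
    """Concatenate a:b, shift left, keep the upper double word (VPSHLDD lane)."""
    n &= 31
    concat = ((a & MASK32) << 32) | (b & MASK32)
    return ((concat << n) >> 32) & MASK32


if __name__ == "__main__":
    print(hex(srl32(0x80000000, 4)))
    print(hex(sra32(0x80000000, 4)))
    print(hex(rol32(0x80000001, 1)))
    print(hex(shld32(0x12345678, 0x9ABCDEF0, 8)))
```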

Comparison Instructions

The compare instructions compare packed and scalar SIMD values and return the results of the comparison either to the destination operand or to the EFLAGS register.

Instruction	📄	Meaning
Byte Operands
VPCMPEQB	ℹ️	Compare packed bytes for equal
VPCMPGTB	ℹ️	Compare packed signed byte integers for greater than
VPCMPB	ℹ️	Compare packed signed byte values into mask
VPCMPUB	ℹ️	Compare packed unsigned byte values into mask
Word Operands
VPCMPEQW	ℹ️	Compare packed words for equal
VPCMPGTW	ℹ️	Compare packed signed word integers for greater than
VPCMPW	ℹ️	Compare packed signed word values into mask
VPCMPUW	ℹ️	Compare packed unsigned word values into mask
Double Word Operands
VPCMPEQD	ℹ️	Compare packed double words for equal
VPCMPGTD	ℹ️	Compare packed signed double word integers for greater than
VPCMPD	ℹ️	Compare packed signed double word values into mask
VPCMPUD	ℹ️	Compare packed unsigned double word values into mask
VP2INTERSECTD	ℹ️	Compute intersection between double words to a pair of mask registers
Quad Word Operands
VPCMPEQQ	ℹ️	Compare packed quad words for equal
VPCMPGTQ	ℹ️	Compare packed signed quad word integers for greater than
VPCMPQ	ℹ️	Compare packed signed quad word values into mask
VPCMPUQ	ℹ️	Compare packed unsigned quad word values into mask
VP2INTERSECTQ	ℹ️	Compute intersection between quad words to a pair of mask registers
Single Precision Floating-point Operands
VCMPEQPS	ℹ️	Compare packed single-precision floating-point values and set mask if destination value is equal to source value
VCMPLTPS	ℹ️	Compare packed single-precision floating-point values and set mask if destination value is less than source value
VCMPLEPS	ℹ️	Compare packed single-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTPS	ℹ️	Compare packed single-precision floating-point values and set mask if destination value is greater than source value
VCMPGEPS	ℹ️	Compare packed single-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDPS	ℹ️	Compare packed single-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQPS	ℹ️	Compare packed single-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTPS	ℹ️	Compare packed single-precision floating-point values and set mask if destination value is not less than source value
VCMPNLEPS	ℹ️	Compare packed single-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTPS	ℹ️	Compare packed single-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGEPS	ℹ️	Compare packed single-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDPS	ℹ️	Compare packed single-precision floating-point values and set mask if neither of the two source operands is a NaN
VCMPEQSS	ℹ️	Compare scalar single-precision floating-point values and set mask if destination value is equal to source value
VCMPLTSS	ℹ️	Compare scalar single-precision floating-point values and set mask if destination value is less than source value
VCMPLESS	ℹ️	Compare scalar single-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTSS	ℹ️	Compare scalar single-precision floating-point values and set mask if destination value is greater than source value
VCMPGESS	ℹ️	Compare scalar single-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDSS	ℹ️	Compare scalar single-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQSS	ℹ️	Compare scalar single-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTSS	ℹ️	Compare scalar single-precision floating-point values and set mask if destination value is not less than source value
VCMPNLESS	ℹ️	Compare scalar single-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTSS	ℹ️	Compare scalar single-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGESS	ℹ️	Compare scalar single-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDSS	ℹ️	Compare scalar single-precision floating-point values and set mask if neither of the two source operands is a NaN
VCOMISS	ℹ️	Perform ordered comparison of scalar single-precision floating-point value and set flags in EFLAGS register
VUCOMISS	ℹ️	Perform unordered comparison of scalar single-precision floating-point value and set flags in EFLAGS register
Double Precision Floating-point Operands
VCMPEQPD	ℹ️	Compare packed double-precision floating-point values and set mask if destination value is equal to source value
VCMPLTPD	ℹ️	Compare packed double-precision floating-point values and set mask if destination value is less than source value
VCMPLEPD	ℹ️	Compare packed double-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTPD	ℹ️	Compare packed double-precision floating-point values and set mask if destination value is greater than source value
VCMPGEPD	ℹ️	Compare packed double-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDPD	ℹ️	Compare packed double-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQPD	ℹ️	Compare packed double-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTPD	ℹ️	Compare packed double-precision floating-point values and set mask if destination value is not less than source value
VCMPNLEPD	ℹ️	Compare packed double-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTPD	ℹ️	Compare packed double-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGEPD	ℹ️	Compare packed double-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDPD	ℹ️	Compare packed double-precision floating-point values and set mask if neither of the two source operands is a NaN
VCMPEQSD	ℹ️	Compare scalar double-precision floating-point values and set mask if destination value is equal to source value
VCMPLTSD	ℹ️	Compare scalar double-precision floating-point values and set mask if destination value is less than source value
VCMPLESD	ℹ️	Compare scalar double-precision floating-point values and set mask if destination value is less than or equal to source value
VCMPGTSD	ℹ️	Compare scalar double-precision floating-point values and set mask if destination value is greater than source value
VCMPGESD	ℹ️	Compare scalar double-precision floating-point values and set mask if destination value is greater than or equal to source value
VCMPUNORDSD	ℹ️	Compare scalar double-precision floating-point values and set mask if at least one of the two source operands is a NaN
VCMPNEQSD	ℹ️	Compare scalar double-precision floating-point values and set mask if destination value is not equal to source value
VCMPNLTSD	ℹ️	Compare scalar double-precision floating-point values and set mask if destination value is not less than source value
VCMPNLESD	ℹ️	Compare scalar double-precision floating-point values and set mask if destination value is not less than or equal to source value
VCMPNGTSD	ℹ️	Compare scalar double-precision floating-point values and set mask if destination value is not greater than source value
VCMPNGESD	ℹ️	Compare scalar double-precision floating-point values and set mask if destination value is not greater than or equal to source value
VCMPORDSD	ℹ️	Compare scalar double-precision floating-point values and set mask if neither of the two source operands is a NaN
VCOMISD	ℹ️	Perform ordered comparison of scalar double-precision floating-point value and set flags in EFLAGS register
VUCOMISD	ℹ️	Perform unordered comparison of scalar double-precision floating-point value and set flags in EFLAGS register
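
The "not less than" style predicates above are not the same as their positive counterparts once a NaN is involved: the negated forms are true for unordered operands, while forms such as "greater than or equal" are false. The sketch below models this per lane in plain Python. It produces one mask bit per lane, which mirrors the mask-register style of result, whereas the vector-destination forms write all-ones or all-zeros elements instead. The names `PREDICATES` and `cmp_mask` are hypothetical.

```python
import math

NAN = float("nan")

# Per-lane predicates; Python's float comparisons are false for NaN,
# so the negated forms come out true for unordered operands.
PREDICATES = {
    "EQ": lambda x, y: x == y,
    "LT": lambda x, y: x < y,
    "GE": lambda x, y: x >= y,
    "NEQ": lambda x, y: not (x == y),
    "NLT": lambda x, y: not (x < y),
    "UNORD": lambda x, y: math.isnan(x) or math.isnan(y),
    "ORD": lambda x, y: not (math.isnan(x) or math.isnan(y)),
}


def cmp_mask(name: str, a: list, b: list) -> int:
    """Return an integer with bit i set when the predicate holds for lane i."""
    pred = PREDICATES[name]
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if pred(x, y):
            mask |= 1 << i
    return mask


if __name__ == "__main__":
    a = [1.0, 2.0, NAN, 4.0]
    b = [1.0, 3.0, 3.0, NAN]
    for name in PREDICATES:
        print(name, format(cmp_mask(name, a, b), "04b"))
```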

Packed Arithmetic Instructions

The arithmetic instructions perform addition, subtraction, multiplication, and division on packed and scalar SIMD operands.

Instruction	📄	Meaning
Byte Operands
VPADDB	ℹ️	Add packed byte integers
VPADDUSB	ℹ️	Add packed unsigned byte integers with unsigned saturation
VPADDSB	ℹ️	Add packed signed byte integers with signed saturation
VPSUBB	ℹ️	Subtract packed byte integers
VPSUBUSB	ℹ️	Subtract packed unsigned byte integers with unsigned saturation
VPSUBSB	ℹ️	Subtract packed signed byte integers with signed saturation
VPDPBUSD	ℹ️	Multiply and add unsigned and signed byte integers
VPDPBUSDS	ℹ️	Multiply and add unsigned and signed byte integers with saturation
VPDPBUUD	ℹ️	Multiply groups of 4 pairs of corresponding unsigned bytes, summing products and adding them to the result
VPDPBUUDS	ℹ️	Multiply groups of 4 pairs of corresponding unsigned bytes, summing products and adding them to the result, with unsigned saturation
VPDPBSSD	ℹ️	Multiply groups of 4 pairs of corresponding signed bytes, summing products and adding them to the result
VPDPBSSDS	ℹ️	Multiply groups of 4 pairs of corresponding signed bytes, summing products and adding them to the result, with signed saturation
VPDPBSUD	ℹ️	Multiply groups of 4 pairs of corresponding unsigned and signed bytes, summing products and adding them to the result
VPDPBSUDS	ℹ️	Multiply groups of 4 pairs of corresponding unsigned and signed bytes, summing products and adding them to the result, with signed saturation
Word Operands
VPADDW	ℹ️	Add packed word integers
VPADDUSW	ℹ️	Add packed unsigned word integers with unsigned saturation
VPADDSW	ℹ️	Add packed signed word integers with signed saturation
VPHADDW	ℹ️	Adds two adjacent, signed 16-bit integers horizontally from the source and destination operands and packs the signed 16-bit results to the destination operand
VPHADDSW	ℹ️	Adds two adjacent, signed 16-bit integers horizontally from the source and destination operands and packs the signed, saturated 16-bit results to the destination operand
VPSUBW	ℹ️	Subtract packed word integers
VPSUBUSW	ℹ️	Subtract packed unsigned word integers with unsigned saturation
VPSUBSW	ℹ️	Subtract packed signed word integers with signed saturation
VPHSUBW	ℹ️	Performs horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the source and destination operands. The signed 16-bit results are packed and written to the destination operand
VPHSUBSW	ℹ️	Performs horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the source and destination operands. The signed, saturated 16-bit results are packed and written to the destination operand
VPDPWSSD	ℹ️	Multiply and add signed word integers
VPDPWSSDS	ℹ️	Multiply and add signed word integers with saturation
VPDPWUUD	ℹ️	Multiply groups of 2 pairs of corresponding unsigned words, summing products and adding them to the result
VPDPWUUDS	ℹ️	Multiply groups of 2 pairs of corresponding unsigned words, summing products and adding them to the result, with unsigned saturation
VPDPWSUD	ℹ️	Multiply groups of 2 pairs of corresponding unsigned and signed words, summing products and adding them to the result
VPDPWSUDS	ℹ️	Multiply groups of 2 pairs of corresponding unsigned and signed words, summing products and adding them to the result, with signed saturation
VPDPWUSD	ℹ️	Multiply groups of 2 pairs of corresponding signed and unsigned words, summing products and adding them to the result
VPDPWUSDS	ℹ️	Multiply groups of 2 pairs of corresponding signed and unsigned words, summing products and adding them to the result, with signed saturation
VPMULHUW	ℹ️	Multiply packed unsigned word integers and store high result
VPMULLW	ℹ️	Multiply packed signed word integers and store low result
VPMULHW	ℹ️	Multiply packed signed word integers and store high result
VPMULHRSW	ℹ️	Multiplies vertically each signed 16-bit integer from the destination operand with the corresponding signed 16-bit integer of the source operand, producing intermediate, signed 32-bit integers. Each intermediate 32-bit integer is truncated to the 18 most significant bits. Rounding is always performed by adding 1 to the least significant bit of the 18-bit intermediate result. The final result is obtained by selecting the 16 bits immediately to the right of the most significant bit of each 18-bit intermediate result and packed to the destination operand
VPMADDUBSW	ℹ️	Multiplies each unsigned byte value with the corresponding signed byte value to produce an intermediate, 16-bit signed integer. Each adjacent pair of 16-bit signed values are added horizontally. The signed, saturated 16-bit results are packed to the destination operand
Double Word Operands
VPADDD	ℹ️	Add packed double word integers
VPHADDD	ℹ️	Adds two adjacent, signed 32-bit integers horizontally from the source and destination operands and packs the signed 32-bit results to the destination operand
VPSUBD	ℹ️	Subtract packed double word integers
VPHSUBD	ℹ️	Performs horizontal subtraction on each adjacent pair of 32-bit signed integers by subtracting the most significant double word from the least significant double word of each pair in the source and destination operands. The signed 32-bit results are packed and written to the destination operand
VPMULLD	ℹ️	Returns the lower 32 bits of each 64-bit result of signed 32-bit integer multiplies
VPMADDWD	ℹ️	Multiply and add packed word integers
Quad Word Operands
VPADDQ	ℹ️	Add packed quad word integers
VPSUBQ	ℹ️	Subtract packed quad word integers
VPMULUDQ	ℹ️	Multiply packed unsigned double word integers
VPMULDQ	ℹ️	Returns the 64-bit signed results of signed 32-bit integer multiplies
VPMULLQ	ℹ️	Returns the lower 64 bits of each 128-bit result of signed 64-bit integer multiplies
Single Precision Floating-point Operands
VADDSS	ℹ️	Add scalar single-precision floating-point value
VADDPS	ℹ️	Add packed single-precision floating-point values
VHADDPS	ℹ️	Performs a single-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the third and fourth elements of the first operand; the third by adding the first and second elements of the second operand; and the fourth by adding the third and fourth elements of the second operand
VSUBSS	ℹ️	Subtract scalar single-precision floating-point value
VSUBPS	ℹ️	Subtract packed single-precision floating-point values
VHSUBPS	ℹ️	Performs a single-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the fourth element of the first operand from the third element of the first operand; the third by subtracting the second element of the second operand from the first element of the second operand; and the fourth by subtracting the fourth element of the second operand from the third element of the second operand
VADDSUBPS	ℹ️	Performs single-precision addition on the second and fourth pairs of 32-bit data elements within the operands; single-precision subtraction on the first and third pairs
VMULSS	ℹ️	Multiply scalar single-precision floating-point value
VMULPS	ℹ️	Multiply packed single-precision floating-point values
VDIVSS	ℹ️	Divide scalar single-precision floating-point value
VDIVPS	ℹ️	Divide packed single-precision floating-point values
Double Precision Floating-point Operands
VADDSD	ℹ️	Add scalar double-precision floating-point value
VADDPD	ℹ️	Add packed double-precision floating-point values
VHADDPD	ℹ️	Performs a double-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the first and second elements of the second operand
VSUBSD	ℹ️	Subtract scalar double-precision floating-point value
VSUBPD	ℹ️	Subtract packed double-precision floating-point values
VHSUBPD	ℹ️	Performs a double-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the second element of the second operand from the first element of the second operand
VADDSUBPD	ℹ️	Performs double-precision addition on the second pair of quad words, and double-precision subtraction on the first pair
VMULSD	ℹ️	Multiply scalar double-precision floating-point value
VMULPD	ℹ️	Multiply packed double-precision floating-point values
VDIVSD	ℹ️	Divide scalar double-precision floating-point value
VDIVPD	ℹ️	Divide packed double-precision floating-point values
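
The wraparound and saturating additions, and the rounding high multiply described for VPMULHRSW, can be checked against a small scalar model. The sketch below is plain Python operating on one element at a time. The function names are hypothetical and simply mirror the mnemonics; operands and results are raw 8-bit or 16-bit patterns.

```python
def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def paddb(a: int, b: int) -> int:
    """Wraparound byte add (one VPADDB lane)."""
    return (a + b) & 0xFF


def paddusb(a: int, b: int) -> int:
    """Unsigned saturating byte add (one VPADDUSB lane)."""
    return min((a & 0xFF) + (b & 0xFF), 0xFF)


def paddsb(a: int, b: int) -> int:
    """Signed saturating byte add (one VPADDSB lane)."""
    total = to_signed(a, 8) + to_signed(b, 8)
    return max(-128, min(127, total)) & 0xFF


def pmulhrsw(a: int, b: int) -> int:
    """Rounded, scaled high multiply of signed words (one VPMULHRSW lane)."""
    product = to_signed(a, 16) * to_signed(b, 16)
    # Keep the 18 most significant bits, add 1 for rounding, drop the low bit.
    return (((product >> 14) + 1) >> 1) & 0xFFFF


if __name__ == "__main__":
    print(paddb(200, 100), paddusb(200, 100), hex(paddsb(100, 100)))
    print(hex(pmulhrsw(0x4000, 0x4000)))
```

Read as Q15 fixed point, `pmulhrsw(0x4000, 0x4000)` multiplies 0.5 by 0.5 and yields 0x2000, that is 0.25.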

Fused Arithmetic Instructions

Instruction	📄	Meaning
Single Precision Floating-point Operands
VFMADD132SS	ℹ️	Fused multiply-add of scalar single-precision floating-point values: s₁ * s₃ + s₂
VFMADD213SS	ℹ️	Fused multiply-add of scalar single-precision floating-point values: s₂ * s₁ + s₃
VFMADD231SS	ℹ️	Fused multiply-add of scalar single-precision floating-point values: s₂ * s₃ + s₁
VFMADD132PS	ℹ️	Fused multiply-add of packed single-precision floating-point values: v₁ * v₃ + v₂
VFMADD213PS	ℹ️	Fused multiply-add of packed single-precision floating-point values: v₂ * v₁ + v₃
VFMADD231PS	ℹ️	Fused multiply-add of packed single-precision floating-point values: v₂ * v₃ + v₁
VFNMADD132SS	ℹ️	Fused negative multiply-add of scalar single-precision floating-point values: -s₁ * s₃ + s₂
VFNMADD213SS	ℹ️	Fused negative multiply-add of scalar single-precision floating-point values: -s₂ * s₁ + s₃
VFNMADD231SS	ℹ️	Fused negative multiply-add of scalar single-precision floating-point values: -s₂ * s₃ + s₁
VFNMADD132PS	ℹ️	Fused negative multiply-add of packed single-precision floating-point values: -v₁ * v₃ + v₂
VFNMADD213PS	ℹ️	Fused negative multiply-add of packed single-precision floating-point values: -v₂ * v₁ + v₃
VFNMADD231PS	ℹ️	Fused negative multiply-add of packed single-precision floating-point values: -v₂ * v₃ + v₁
VFMSUB132SS	ℹ️	Fused multiply-subtract of scalar single-precision floating-point values: s₁ * s₃ - s₂
VFMSUB213SS	ℹ️	Fused multiply-subtract of scalar single-precision floating-point values: s₂ * s₁ - s₃
VFMSUB231SS	ℹ️	Fused multiply-subtract of scalar single-precision floating-point values: s₂ * s₃ - s₁
VFMSUB132PS	ℹ️	Fused multiply-subtract of packed single-precision floating-point values: v₁ * v₃ - v₂
VFMSUB213PS	ℹ️	Fused multiply-subtract of packed single-precision floating-point values: v₂ * v₁ - v₃
VFMSUB231PS	ℹ️	Fused multiply-subtract of packed single-precision floating-point values: v₂ * v₃ - v₁
VFNMSUB132SS	ℹ️	Fused negative multiply-subtract of scalar single-precision floating-point values: -s₁ * s₃ - s₂
VFNMSUB213SS	ℹ️	Fused negative multiply-subtract of scalar single-precision floating-point values: -s₂ * s₁ - s₃
VFNMSUB231SS	ℹ️	Fused negative multiply-subtract of scalar single-precision floating-point values: -s₂ * s₃ - s₁
VFNMSUB132PS	ℹ️	Fused negative multiply-subtract of packed single-precision floating-point values: -v₁ * v₃ - v₂
VFNMSUB213PS	ℹ️	Fused negative multiply-subtract of packed single-precision floating-point values: -v₂ * v₁ - v₃
VFNMSUB231PS	ℹ️	Fused negative multiply-subtract of packed single-precision floating-point values: -v₂ * v₃ - v₁
VFMADDSUB132PS	ℹ️	Fused multiply-alternating add/subtract of packed single-precision floating-point values: v₁ * v₃ ± v₂
VFMADDSUB213PS	ℹ️	Fused multiply-alternating add/subtract of packed single-precision floating-point values: v₂ * v₁ ± v₃
VFMADDSUB231PS	ℹ️	Fused multiply-alternating add/subtract of packed single-precision floating-point values: v₂ * v₃ ± v₁
VFMSUBADD132PS	ℹ️	Fused multiply-alternating subtract/add of packed single-precision floating-point values: v₁ * v₃ ∓ v₂
VFMSUBADD213PS	ℹ️	Fused multiply-alternating subtract/add of packed single-precision floating-point values: v₂ * v₁ ∓ v₃
VFMSUBADD231PS	ℹ️	Fused multiply-alternating subtract/add of packed single-precision floating-point values: v₂ * v₃ ∓ v₁
Double Precision Floating-point Operands
VFMADD132SD	ℹ️	Fused multiply-add of scalar double-precision floating-point values: s₁ * s₃ + s₂
VFMADD213SD	ℹ️	Fused multiply-add of scalar double-precision floating-point values: s₂ * s₁ + s₃
VFMADD231SD	ℹ️	Fused multiply-add of scalar double-precision floating-point values: s₂ * s₃ + s₁
VFMADD132PD	ℹ️	Fused multiply-add of packed double-precision floating-point values: v₁ * v₃ + v₂
VFMADD213PD	ℹ️	Fused multiply-add of packed double-precision floating-point values: v₂ * v₁ + v₃
VFMADD231PD	ℹ️	Fused multiply-add of packed double-precision floating-point values: v₂ * v₃ + v₁
VFNMADD132SD	ℹ️	Fused negative multiply-add of scalar double-precision floating-point values: -s₁ * s₃ + s₂
VFNMADD213SD	ℹ️	Fused negative multiply-add of scalar double-precision floating-point values: -s₂ * s₁ + s₃
VFNMADD231SD	ℹ️	Fused negative multiply-add of scalar double-precision floating-point values: -s₂ * s₃ + s₁
VFNMADD132PD	ℹ️	Fused negative multiply-add of packed double-precision floating-point values: -v₁ * v₃ + v₂
VFNMADD213PD	ℹ️	Fused negative multiply-add of packed double-precision floating-point values: -v₂ * v₁ + v₃
VFNMADD231PD	ℹ️	Fused negative multiply-add of packed double-precision floating-point values: -v₂ * v₃ + v₁
VFMSUB132SD	ℹ️	Fused multiply-subtract of scalar double-precision floating-point values: s₁ * s₃ - s₂
VFMSUB213SD	ℹ️	Fused multiply-subtract of scalar double-precision floating-point values: s₂ * s₁ - s₃
VFMSUB231SD	ℹ️	Fused multiply-subtract of scalar double-precision floating-point values: s₂ * s₃ - s₁
VFMSUB132PD	ℹ️	Fused multiply-subtract of packed double-precision floating-point values: v₁ * v₃ - v₂
VFMSUB213PD	ℹ️	Fused multiply-subtract of packed double-precision floating-point values: v₂ * v₁ - v₃
VFMSUB231PD	ℹ️	Fused multiply-subtract of packed double-precision floating-point values: v₂ * v₃ - v₁
VFNMSUB132SD	ℹ️	Fused negative multiply-subtract of scalar double-precision floating-point values: -s₁ * s₃ - s₂
VFNMSUB213SD	ℹ️	Fused negative multiply-subtract of scalar double-precision floating-point values: -s₂ * s₁ - s₃
VFNMSUB231SD	ℹ️	Fused negative multiply-subtract of scalar double-precision floating-point values: -s₂ * s₃ - s₁
VFNMSUB132PD	ℹ️	Fused negative multiply-subtract of packed double-precision floating-point values: -v₁ * v₃ - v₂
VFNMSUB213PD	ℹ️	Fused negative multiply-subtract of packed double-precision floating-point values: -v₂ * v₁ - v₃
VFNMSUB231PD	ℹ️	Fused negative multiply-subtract of packed double-precision floating-point values: -v₂ * v₃ - v₁
VFMADDSUB132PD	ℹ️	Fused multiply-alternating add/subtract of packed double-precision floating-point values: v₁ * v₃ ± v₂
VFMADDSUB213PD	ℹ️	Fused multiply-alternating add/subtract of packed double-precision floating-point values: v₂ * v₁ ± v₃
VFMADDSUB231PD	ℹ️	Fused multiply-alternating add/subtract of packed double-precision floating-point values: v₂ * v₃ ± v₁
VFMSUBADD132PD	ℹ️	Fused multiply-alternating subtract/add of packed double-precision floating-point values: v₁ * v₃ ∓ v₂
VFMSUBADD213PD	ℹ️	Fused multiply-alternating subtract/add of packed double-precision floating-point values: v₂ * v₁ ∓ v₃
VFMSUBADD231PD	ℹ️	Fused multiply-alternating subtract/add of packed double-precision floating-point values: v₂ * v₃ ∓ v₁
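
The three digits in each fused mnemonic name the operand order used in the formulas above: the first two digits are the factors and the third is the addend. The sketch below models only that ordering in plain Python, with hypothetical function names. It is not fused: Python evaluates `x * y + z` with two roundings, so it matches the real instructions only when the intermediate product is exactly representable. For the alternating forms it assumes that even-indexed elements are subtracted and odd-indexed elements are added for FMADDSUB.

```python
def select_operands(form: str, op1, op2, op3):
    """Map a form such as "132" to (factor, factor, addend)."""
    operands = {"1": op1, "2": op2, "3": op3}
    x, y, z = (operands[digit] for digit in form)
    return x, y, z


def fmadd(form: str, s1: float, s2: float, s3: float) -> float:
    """Scalar multiply-add with the operand order of VFMADD<form>SS/SD."""
    x, y, z = select_operands(form, s1, s2, s3)
    return x * y + z


def fmaddsub(form: str, v1: list, v2: list, v3: list) -> list:
    """Packed model of VFMADDSUB<form>PS/PD: subtract in even lanes, add in odd."""
    xs, ys, zs = select_operands(form, v1, v2, v3)
    return [
        x * y - z if i % 2 == 0 else x * y + z
        for i, (x, y, z) in enumerate(zip(xs, ys, zs))
    ]


if __name__ == "__main__":
    for form in ("132", "213", "231"):
        print(form, fmadd(form, 2.0, 3.0, 4.0))
    print(fmaddsub("132", [1.0, 1.0], [10.0, 10.0], [5.0, 5.0]))
```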

Function Primitives

These instructions perform square root, absolute value, rounding and maximum/minimum operations on packed and scalar SIMD operands.

Instruction	📄	Meaning
Byte Operands
VPOPCNTB	ℹ️	Compute the number of bits set to 1 in each byte
VPABSB	ℹ️	Computes the absolute value of each signed byte data element
VPSIGNB	ℹ️	Negates each signed byte element of the destination operand if the corresponding element in the source operand is less than zero, and zeroes it if the source element is zero
VPAVGB	ℹ️	Compute average of packed unsigned byte integers
VPMINUB	ℹ️	Minimum of packed unsigned byte integers
VPMINSB	ℹ️	Minimum of packed signed byte integers
VPMAXUB	ℹ️	Maximum of packed unsigned byte integers
VPMAXSB	ℹ️	Maximum of packed signed byte integers
VPSADBW	ℹ️	Compute sum of absolute differences
VMPSADBW	ℹ️	Performs eight 4-byte wide sum of absolute differences operations to produce eight word integers
VDBPSADBW	ℹ️	Double block packed Sum of Absolute Differences on unsigned bytes
Word Operands
VPOPCNTW	ℹ️	Compute the number of bits set to 1 in each word
VPABSW	ℹ️	Computes the absolute value of each signed word data element
VPSIGNW	ℹ️	Negates each signed word element of the destination operand if the corresponding element of the source operand is less than zero, and zeroes it if that element is zero
VPAVGW	ℹ️	Compute average of packed unsigned word integers
VPMINUW	ℹ️	Minimum of packed unsigned word integers
VPMINSW	ℹ️	Minimum of packed signed word integers
VPMAXUW	ℹ️	Maximum of packed unsigned word integers
VPMAXSW	ℹ️	Maximum of packed signed word integers
VPHMINPOSUW	ℹ️	Finds the value and location of the minimum among 8 horizontally packed unsigned words. The resulting value and location (offset within the source) are packed into the low double word of the destination XMM register
Double Word Operands
VPOPCNTD	ℹ️	Compute the number of bits set to 1 in each double word
VPABSD	ℹ️	Computes the absolute value of each signed double word data element
VPSIGND	ℹ️	Negates each signed double word element of the destination operand if the corresponding element of the source operand is less than zero, and zeroes it if that element is zero
VPMINUD	ℹ️	Minimum of packed unsigned double word integers
VPMINSD	ℹ️	Minimum of packed signed double word integers
VPMAXUD	ℹ️	Maximum of packed unsigned double word integers
VPMAXSD	ℹ️	Maximum of packed signed double word integers
VPLZCNTD	ℹ️	Count the number of leading zero bits in each packed double word element
VPCONFLICTD	ℹ️	Detect conflicts within a vector of packed double word values into dense memory
Quad Word Operands
VPOPCNTQ	ℹ️	Compute the number of bits set to 1 in each quad word
VPABSQ	ℹ️	Computes the absolute value of each signed quad word data element
VPMINUQ	ℹ️	Minimum of packed unsigned quad word integers
VPMINSQ	ℹ️	Minimum of packed signed quad word integers
VPMAXUQ	ℹ️	Maximum of packed unsigned quad word integers
VPMAXSQ	ℹ️	Maximum of packed signed quad word integers
VPLZCNTQ	ℹ️	Count the number of leading zero bits in each packed quad word element
VPCONFLICTQ	ℹ️	Detect conflicts within a vector of packed quad word values into dense memory
Single Precision Floating-point Operands
VSQRTSS	ℹ️	Compute square root of scalar single-precision floating-point value
VSQRTPS	ℹ️	Compute square roots of packed single-precision floating-point values
VMINSS	ℹ️	Return minimum scalar single-precision floating-point value
VMINPS	ℹ️	Return minimum packed single-precision floating-point values
VMAXSS	ℹ️	Return maximum scalar single-precision floating-point value
VMAXPS	ℹ️	Return maximum packed single-precision floating-point values
VROUNDSS	ℹ️	Round scalar single precision floating-point value into an integer value and return a rounded floating-point value
VROUNDPS	ℹ️	Round packed single precision floating-point values into integer values and return rounded floating-point values
VRNDSCALESS	ℹ️	Round scalar single-precision floating-point value to include a given number of fraction bits
VRNDSCALEPS	ℹ️	Round packed single-precision floating-point values to include a given number of fraction bits
VDPPS	ℹ️	Perform single-precision dot products for up to 4 elements and broadcast
VRANGESS	ℹ️	Range restriction calculation for pairs of scalar single-precision floating-point values
VRANGEPS	ℹ️	Range restriction calculation for packed pairs of single-precision floating-point values
VREDUCESS	ℹ️	Perform a reduction transformation on a scalar single-precision floating-point value by subtracting a number of fraction bits
VREDUCEPS	ℹ️	Perform reduction transformation on packed single-precision floating-point values by subtracting a number of fraction bits
VGETEXPSS	ℹ️	Convert the biased exponent of scalar single-precision floating-point value to floating-point value representing unbiased integer exponent
VGETEXPPS	ℹ️	Convert the biased exponent of packed single-precision floating-point values to floating-point values representing unbiased integer exponent
VGETMANTSS	ℹ️	Extract the normalized mantissa from scalar single-precision floating-point value
VGETMANTPS	ℹ️	Extract the normalized mantissa from packed single-precision floating-point values
VSCALEFSS	ℹ️	Scale scalar single-precision floating-point value
VSCALEFPS	ℹ️	Scale packed single-precision floating-point values
VEXP2PS		Approximation to the exponential 2^x of packed single-precision floating-point values with less than 2^-23 relative error
VFPCLASSSS	ℹ️	Tests scalar single-precision floating-point value for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite, negative
VFPCLASSPS	ℹ️	Tests packed single-precision floating-point values for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite, negative
VFIXUPIMMSS	ℹ️	Fix up special scalar single-precision floating-point value
VFIXUPIMMPS	ℹ️	Fix up special packed single-precision floating-point values
VRCP14SS	ℹ️	Computes the approximate reciprocal of the scalar single-precision floating-point value. The max relative error < 2^-14
VRCP14PS	ℹ️	Computes the approximate reciprocals of the packed single-precision floating-point values. The max relative error < 2^-14
VRCP28SS		Computes the approximate reciprocal of the scalar single-precision floating-point value. The max relative error < 2^-28
VRCP28PS		Computes the approximate reciprocals of the packed single-precision floating-point values. The max relative error < 2^-28
VRSQRT14SS	ℹ️	Computes the approximate reciprocal square root of the scalar single-precision floating-point value. The max relative error < 2^-14
VRSQRT14PS	ℹ️	Computes the approximate reciprocal square roots of the packed single-precision floating-point values. The max relative error < 2^-14
VRSQRT28SS		Computes the approximate reciprocal square root of the scalar single-precision floating-point value. The max relative error < 2^-28
VRSQRT28PS		Computes the approximate reciprocal square roots of the packed single-precision floating-point values. The max relative error < 2^-28
VRCPPS	ℹ️	Compute reciprocals of packed single-precision floating-point values
VRCPSS	ℹ️	Compute reciprocal of scalar single-precision floating-point value
VRSQRTPS	ℹ️	Compute reciprocals of square roots of packed single-precision floating-point values
VRSQRTSS	ℹ️	Compute reciprocal of square root of scalar single-precision floating-point value
Double Precision Floating-point Operands
VSQRTSD	ℹ️	Compute square root of scalar double-precision floating-point value
VSQRTPD	ℹ️	Compute square roots of packed double-precision floating-point values
VMINSD	ℹ️	Return minimum scalar double-precision floating-point value
VMINPD	ℹ️	Return minimum packed double-precision floating-point values
VMAXSD	ℹ️	Return maximum scalar double-precision floating-point value
VMAXPD	ℹ️	Return maximum packed double-precision floating-point values
VROUNDSD	ℹ️	Round scalar double precision floating-point value into an integer value and return a rounded floating-point value
VROUNDPD	ℹ️	Round packed double precision floating-point values into integer values and return rounded floating-point values
VRNDSCALESD	ℹ️	Round scalar double-precision floating-point value to include a given number of fraction bits
VRNDSCALEPD	ℹ️	Round packed double-precision floating-point values to include a given number of fraction bits
VDPPD	ℹ️	Perform double-precision dot product for up to 2 elements and broadcast
VRANGESD	ℹ️	Range restriction calculation for pairs of scalar double-precision floating-point values
VRANGEPD	ℹ️	Range restriction calculation for packed pairs of double-precision floating-point values
VREDUCESD	ℹ️	Perform a reduction transformation on a scalar double-precision floating-point value by subtracting a number of fraction bits
VREDUCEPD	ℹ️	Perform reduction transformation on packed double-precision floating-point values by subtracting a number of fraction bits
VGETEXPSD	ℹ️	Convert the biased exponent of scalar double-precision floating-point value to floating-point value representing unbiased integer exponent
VGETEXPPD	ℹ️	Convert the biased exponent of packed double-precision floating-point values to floating-point values representing unbiased integer exponent
VGETMANTSD	ℹ️	Extract the normalized mantissa from scalar double-precision floating-point value
VGETMANTPD	ℹ️	Extract the normalized mantissa from packed double-precision floating-point values
VSCALEFSD	ℹ️	Scale scalar double-precision floating-point value
VSCALEFPD	ℹ️	Scale packed double-precision floating-point values
VEXP2PD		Approximation to the exponential 2^x of packed double-precision floating-point values with less than 2^-23 relative error
VFPCLASSSD	ℹ️	Tests scalar double-precision floating-point value for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite, negative
VFPCLASSPD	ℹ️	Tests packed double-precision floating-point values for the following categories: NaN, +0, -0, +Inf, -Inf, denormal, finite, negative
VFIXUPIMMSD	ℹ️	Fix up special scalar double-precision floating-point value
VFIXUPIMMPD	ℹ️	Fix up special packed double-precision floating-point values
VRCP14SD	ℹ️	Computes the approximate reciprocal of the scalar double-precision floating-point value. The max relative error < 2^-14
VRCP14PD	ℹ️	Computes the approximate reciprocals of the packed double-precision floating-point values. The max relative error < 2^-14
VRCP28SD		Computes the approximate reciprocal of the scalar double-precision floating-point value. The max relative error < 2^-28
VRCP28PD		Computes the approximate reciprocals of the packed double-precision floating-point values. The max relative error < 2^-28
VRSQRT14SD	ℹ️	Computes the approximate reciprocal square root of the scalar double-precision floating-point value. The max relative error < 2^-14
VRSQRT14PD	ℹ️	Computes the approximate reciprocal square roots of the packed double-precision floating-point values. The max relative error < 2^-14
VRSQRT28SD		Computes the approximate reciprocal square root of the scalar double-precision floating-point value. The max relative error < 2^-28
VRSQRT28PD		Computes the approximate reciprocal square roots of the packed double-precision floating-point values. The max relative error < 2^-28
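
Several of the one-line descriptions above hide details that are easier to see in code. The following Python sketch gives scalar reference models for a handful of them. It is an illustration under stated assumptions rather than a substitute for the instruction reference: all function names are hypothetical, vectors are plain lists with index 0 as the lowest element, and the floating-point helpers cover only finite, non-zero inputs with default control settings.

```python
import math


def pavg(a, b):
    """VPAVGB/VPAVGW model: unsigned average with rounding, (x + y + 1) >> 1."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]


def psign(dst, src, bits=8):
    """VPSIGNB/W/D model: negate, zero or keep each dst element by the sign of src."""
    lo, span = -(1 << (bits - 1)), 1 << bits
    out = []
    for d, s in zip(dst, src):
        r = -d if s < 0 else (0 if s == 0 else d)
        out.append((r - lo) % span + lo)  # wrap, so negating -128 stays -128 for bytes
    return out


def phminposuw(words):
    """VPHMINPOSUW model: (minimum, index) over 8 unsigned words; lowest index wins ties."""
    idx = min(range(8), key=lambda i: words[i])
    return words[idx], idx


def plzcnt(values, bits=32):
    """VPLZCNTD/Q model: leading zero count per element; a zero element yields `bits`."""
    return [bits - v.bit_length() for v in values]


def pconflict(values):
    """VPCONFLICTD/Q model: bit j of result i is set when j < i and values[j] == values[i]."""
    return [sum(1 << j for j in range(i) if values[j] == values[i])
            for i in range(len(values))]


def getexp(x):
    """VGETEXP model for finite non-zero x: floor(log2(|x|)) returned as a float."""
    return float(math.frexp(x)[1] - 1)


def getmant(x):
    """VGETMANT model assuming the [1, 2) interval with the source sign kept."""
    return math.frexp(x)[0] * 2.0


def rndscale(x, fraction_bits):
    """VRNDSCALE model with round-to-nearest-even: keep `fraction_bits` binary fraction bits."""
    scale = 2.0 ** fraction_bits
    return round(x * scale) / scale


print(pconflict([7, 7, 3, 7]))       # [0, 1, 0, 3]
print(getexp(10.0), getmant(10.0))   # 3.0 1.25, since 10 = 1.25 * 2**3
```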


Opmask Instructions

Instruction	📄	Meaning
8-bit Operands
KMOVB	ℹ️	Move 8-bit from and to mask registers
KTESTB	ℹ️	Set ZF and CF according to the bitwise AND and ANDN of two 8-bit masks
KORTESTB	ℹ️	Bitwise logical OR of two 8-bit masks, setting ZF and CF accordingly
KNOTB	ℹ️	Bitwise NOT of 8-bit mask
KANDB	ℹ️	Bitwise logical AND of two 8-bit masks
KANDNB	ℹ️	Bitwise logical AND NOT of two 8-bit masks
KORB	ℹ️	Bitwise logical OR of two 8-bit masks
KXORB	ℹ️	Bitwise logical XOR of two 8-bit masks
KXNORB	ℹ️	Bitwise logical XNOR of two 8-bit masks
KADDB	ℹ️	Add two 8-bit masks
KSHIFTLB	ℹ️	Shift left 8-bit mask register
KSHIFTRB	ℹ️	Shift right 8-bit mask register
KUNPCKBW	ℹ️	Unpack and interleave 8-bit masks
VPMOVM2B	ℹ️	Convert a mask register to a vector register
VPMOVB2M	ℹ️	Converts a vector register to a mask register
16-bit Operands
KMOVW	ℹ️	Move 16-bit from and to mask registers
KTESTW	ℹ️	Set ZF and CF according to the bitwise AND and ANDN of two 16-bit masks
KORTESTW	ℹ️	Bitwise logical OR of two 16-bit masks, setting ZF and CF accordingly
KNOTW	ℹ️	Bitwise NOT of 16-bit mask
KANDW	ℹ️	Bitwise logical AND of two 16-bit masks
KANDNW	ℹ️	Bitwise logical AND NOT of two 16-bit masks
KORW	ℹ️	Bitwise logical OR of two 16-bit masks
KXORW	ℹ️	Bitwise logical XOR of two 16-bit masks
KXNORW	ℹ️	Bitwise logical XNOR of two 16-bit masks
KADDW	ℹ️	Add two 16-bit masks
KSHIFTLW	ℹ️	Shift left 16-bit mask register
KSHIFTRW	ℹ️	Shift right 16-bit mask register
KUNPCKWD	ℹ️	Unpack and interleave 16-bit masks
VPMOVM2W	ℹ️	Convert a mask register to a vector register
VPMOVW2M	ℹ️	Converts a vector register to a mask register
32-bit Operands
KMOVD	ℹ️	Move 32-bit from and to mask registers
KTESTD	ℹ️	Set ZF and CF according to the bitwise AND and ANDN of two 32-bit masks
KORTESTD	ℹ️	Bitwise logical OR of two 32-bit masks, setting ZF and CF accordingly
KNOTD	ℹ️	Bitwise NOT of 32-bit mask
KANDD	ℹ️	Bitwise logical AND of two 32-bit masks
KANDND	ℹ️	Bitwise logical AND NOT of two 32-bit masks
KORD	ℹ️	Bitwise logical OR of two 32-bit masks
KXORD	ℹ️	Bitwise logical XOR of two 32-bit masks
KXNORD	ℹ️	Bitwise logical XNOR of two 32-bit masks
KADDD	ℹ️	Add two 32-bit masks
KSHIFTLD	ℹ️	Shift left 32-bit mask register
KSHIFTRD	ℹ️	Shift right 32-bit mask register
KUNPCKDQ	ℹ️	Unpack and interleave 32-bit masks
VPMOVM2D	ℹ️	Convert a mask register to a vector register
VPMOVD2M	ℹ️	Converts a vector register to a mask register
64-bit Operands
KMOVQ	ℹ️	Move 64-bit from and to mask registers
KTESTQ	ℹ️	Set ZF and CF according to the bitwise AND and ANDN of two 64-bit masks
KORTESTQ	ℹ️	Bitwise logical OR of two 64-bit masks, setting ZF and CF accordingly
KNOTQ	ℹ️	Bitwise NOT of 64-bit mask
KANDQ	ℹ️	Bitwise logical AND of two 64-bit masks
KANDNQ	ℹ️	Bitwise logical AND NOT of two 64-bit masks
KORQ	ℹ️	Bitwise logical OR of two 64-bit masks
KXORQ	ℹ️	Bitwise logical XOR of two 64-bit masks
KXNORQ	ℹ️	Bitwise logical XNOR of two 64-bit masks
KADDQ	ℹ️	Add two 64-bit masks
KSHIFTLQ	ℹ️	Shift left 64-bit mask register
KSHIFTRQ	ℹ️	Shift right 64-bit mask register
VPMOVM2Q	ℹ️	Convert a mask register to a vector register
VPMOVQ2M	ℹ️	Converts a vector register to a mask register
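
The opmask operations are ordinary bit manipulation on 8-, 16-, 32- or 64-bit values, so they can be modelled with Python integers. The sketch below is such a model, assuming the operand order documented by Intel (ANDN inverts the first source, KUNPCKBW puts the first source in the high byte). The function names mirror the mnemonics but are illustrative helpers, not a real API, and the flag-setting functions return a `(ZF, CF)` tuple instead of touching EFLAGS.

```python
def mask(width):
    """All-ones value for a mask register of the given width."""
    return (1 << width) - 1


def kandn(a, b, width=16):
    """KANDN model: (NOT a) AND b."""
    return ~a & b & mask(width)


def kxnor(a, b, width=16):
    """KXNOR model: NOT (a XOR b)."""
    return ~(a ^ b) & mask(width)


def kortest(a, b, width=16):
    """KORTEST model: ZF = 1 if a OR b is all zeros, CF = 1 if it is all ones."""
    r = (a | b) & mask(width)
    return int(r == 0), int(r == mask(width))


def ktest(a, b, width=16):
    """KTEST model: ZF = 1 if a AND b is zero, CF = 1 if (NOT a) AND b is zero."""
    zf = int((a & b) & mask(width) == 0)
    cf = int((~a & b) & mask(width) == 0)
    return zf, cf


def kshiftl(a, count, width=16):
    """KSHIFTL model: logical left shift; counts of `width` or more give zero."""
    return (a << count) & mask(width) if count < width else 0


def kunpckbw(a, b):
    """KUNPCKBW model: low byte of a becomes bits 15:8, low byte of b bits 7:0."""
    return ((a & 0xFF) << 8) | (b & 0xFF)


def vpmovm2b(k, lanes):
    """VPMOVM2B model: each byte lane becomes all ones or all zeros per mask bit."""
    return [0xFF if (k >> i) & 1 else 0x00 for i in range(lanes)]


def vpmovb2m(lanes):
    """VPMOVB2M model: mask bit i is the most significant bit of byte lane i."""
    return sum(((b >> 7) & 1) << i for i, b in enumerate(lanes))


print(kortest(0xFF00, 0x00FF))        # (0, 1): the OR is all ones
print(hex(kunpckbw(0x12, 0x34)))      # 0x1234
```

A common use of KORTEST is loop control: ZF tells whether no lane is active, CF whether every lane is.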


String and Text Processing Instructions

Instruction	📄	Meaning
VPCMPESTRI	ℹ️	Packed compare explicit-length strings, return index in ECX/RCX
VPCMPESTRM	ℹ️	Packed compare explicit-length strings, return mask in XMM0
VPCMPISTRI	ℹ️	Packed compare implicit-length strings, return index in ECX/RCX
VPCMPISTRM	ℹ️	Packed compare implicit-length strings, return mask in XMM0
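
The string compare instructions are controlled by an immediate byte that selects element size, comparison mode, polarity and output form, so a single model cannot cover them all. As a rough illustration, the Python sketch below models just one configuration of VPCMPISTRI: unsigned bytes, "equal any" aggregation, positive polarity and least significant index. The function name is hypothetical, and the behaviour assumed here (the first operand acts as a character set, the second as the text, both end at the first zero byte, and 16 is returned when nothing matches) follows the Intel description of that mode.

```python
def pcmpistri_equal_any(charset: bytes, text: bytes) -> int:
    """Index of the first byte of `text` that occurs in `charset`, or 16 if none.

    Both operands model 16-byte registers holding implicit-length strings,
    so only the bytes before the first zero byte take part in the comparison.
    """
    def valid(block: bytes) -> bytes:
        block = block[:16]                 # one 128-bit register holds 16 bytes
        end = block.find(b"\0")            # implicit length: stop at the terminator
        return block if end < 0 else block[:end]

    chars = valid(charset)
    for index, byte in enumerate(valid(text)):
        if byte in chars:
            return index
    return 16                              # no match among the valid bytes


print(pcmpistri_equal_any(b",;", b"key;value"))  # 3, the position of ';'
```

The explicit-length variants differ in taking the two lengths from EAX/RAX and EDX/RDX instead of scanning for a terminator.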


Secure Hash Algorithm Instructions

SHA extensions provide a set of instructions that target the acceleration of the Secure Hash Algorithm (SHA), specifically the SHA-1 and SHA-256 variants.

Instruction	📄	Meaning
SHA-1
SHA1NEXTE	ℹ️	Calculate SHA1 state variable E after four rounds
SHA1RNDS4	ℹ️	Perform four rounds of SHA1 operation
SHA1MSG1	ℹ️	Perform an intermediate calculation for the next four SHA1 message double words
SHA1MSG2	ℹ️	Perform a final calculation for the next four SHA1 message double words
SHA-256
SHA256RNDS2	ℹ️	Perform two rounds of SHA256 operation
SHA256MSG1	ℹ️	Perform an intermediate calculation for the next four SHA256 message double words
SHA256MSG2	ℹ️	Perform a final calculation for the next four SHA256 message double words
SHA-512
VSHA512RNDS2	ℹ️	Perform two rounds of SHA-512 operation
VSHA512MSG1	ℹ️	Perform an intermediate calculation for the next four SHA-512 message quad words
VSHA512MSG2	ℹ️	Perform a final calculation for the next four SHA-512 message quad words
SM3
VSM3RNDS2	ℹ️	Perform two rounds of SM3 operation
VSM3MSG1	ℹ️	Perform initial calculation for the next four SM3 message words
VSM3MSG2	ℹ️	Perform a final calculation for the next four SM3 message words
SM4
VSM4RNDS4	ℹ️	Performs four rounds of SM4 encryption
VSM4KEY4	ℹ️	Perform four rounds of SM4 key expansion


Advanced Encryption Standard (AES) instructions

AES instructions operate on XMM registers to provide accelerated primitives for block encryption/decryption using Advanced Encryption Standard (AES).

Instruction	📄	Meaning
AESKEYGENASSIST	ℹ️	Assist the creation of round keys with a key expansion schedule
AESIMC	ℹ️	Perform an inverse mix column transformation primitive
Encryption
AESENC	ℹ️	Perform an AES encryption round using a 128-bit state and a round key
AESENCLAST	ℹ️	Perform the last AES encryption round using a 128-bit state and a round key
AESENC128KL	ℹ️	Perform 10 rounds of AES encryption flow with key locker using 128-bit key
AESENC256KL	ℹ️	Perform 14 rounds of AES encryption flow with key locker using 256-bit key
AESENCWIDE128KL	ℹ️	Perform 10 rounds of AES encryption flow with key locker on 8 blocks using 128-bit key
AESENCWIDE256KL	ℹ️	Perform 14 rounds of AES encryption flow with key locker on 8 blocks using 256-bit key
Decryption
AESDEC	ℹ️	Perform an AES decryption round using a 128-bit state and a round key
AESDECLAST	ℹ️	Perform the last AES decryption round using a 128-bit state and a round key
AESDEC128KL	ℹ️	Perform 10 rounds of AES decryption flow with key locker using 128-bit key
AESDEC256KL	ℹ️	Perform 14 rounds of AES decryption flow with key locker using 256-bit key
AESDECWIDE128KL	ℹ️	Perform 10 rounds of AES decryption flow with key locker on 8 blocks using 128-bit key
AESDECWIDE256KL	ℹ️	Perform 14 rounds of AES decryption flow with key locker on 8 blocks using 256-bit key
Galois Field
VPCLMULQDQ	ℹ️	Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2^k)
GF2P8MULB	ℹ️	Galois Field multiply bytes
GF2P8AFFINEQB	ℹ️	Galois Field affine transformation
GF2P8AFFINEINVQB	ℹ️	Galois Field affine transformation inverse
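
Carry-less multiplication and the GF(2^8) byte multiply are closely related: the second is the first followed by reduction modulo a fixed polynomial. The Python sketch below models both on plain integers. The function names are illustrative, `clmul` mirrors what VPCLMULQDQ does for one pair of 64-bit inputs, and `gf2p8mul` assumes the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), which is the one AES and GF2P8MULB use.

```python
def clmul(a, b):
    """Carry-less (polynomial) product of two non-negative integers over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2) is XOR, so no carries propagate
        a <<= 1
        b >>= 1
    return result


def gf2p8mul(a, b):
    """Product of two bytes in GF(2^8), reduced modulo 0x11B."""
    product = clmul(a & 0xFF, b & 0xFF)        # degree at most 14
    for bit in range(14, 7, -1):               # clear bits 14..8 from the top down
        if product & (1 << bit):
            product ^= 0x11B << (bit - 8)
    return product


print(bin(clmul(0b101, 0b11)))     # 0b1111: (x^2 + 1)(x + 1) = x^3 + x^2 + x + 1
print(hex(gf2p8mul(0x57, 0x83)))   # 0xc1, the worked example in FIPS 197
```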


Key Locker Instructions

These instructions are designed to enable encryption/decryption with an AES key without having access to any unencrypted copies of the key during the actual encryption/decryption process.

Instruction	📄	Meaning
LOADIWKEY	ℹ️	Load internal wrapping key with key locker
ENCODEKEY128	ℹ️	Encode 128-bit key with key locker
ENCODEKEY256	ℹ️	Encode 256-bit key with key locker


State Management Instructions

MXCSR state management instructions allow saving and restoring the state of the MXCSR control and status register.

Instruction	📄	Meaning
VLDMXCSR	ℹ️	Load MXCSR register
VSTMXCSR	ℹ️	Save MXCSR register state
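
VSTMXCSR writes a 32-bit image of MXCSR to memory and VLDMXCSR reads one back. As a reading aid, the Python sketch below decodes such an image. The helper name `decode_mxcsr` and the field labels in the returned dictionary are hypothetical; the bit layout is the one documented by Intel: exception flags in bits 0-5, DAZ in bit 6, exception masks in bits 7-12, rounding control in bits 13-14 and FTZ in bit 15.

```python
FLAG_NAMES = ["IE", "DE", "ZE", "OE", "UE", "PE"]         # bits 0-5: sticky exception flags
MASK_NAMES = ["IM", "DM", "ZM", "OM", "UM", "PM"]         # bits 7-12: exception masks
ROUNDING = ["nearest-even", "down", "up", "toward-zero"]  # bits 13-14: rounding control


def decode_mxcsr(value):
    """Split a 32-bit MXCSR image into its documented fields."""
    return {
        "flags": [name for i, name in enumerate(FLAG_NAMES) if (value >> i) & 1],
        "daz": bool((value >> 6) & 1),
        "masks": [name for i, name in enumerate(MASK_NAMES) if (value >> (7 + i)) & 1],
        "rounding": ROUNDING[(value >> 13) & 3],
        "ftz": bool((value >> 15) & 1),
    }


# 0x1F80 is the power-on default: all exceptions masked, round to nearest even.
print(decode_mxcsr(0x1F80))
```

Code that changes the rounding mode or enables FTZ/DAZ typically saves the old image first and restores it afterwards, so that callers see an unchanged MXCSR.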


Agent Synchronization Instructions

Instruction	📄	Meaning
MONITOR	ℹ️	Sets up an address range used to monitor write-back stores
MWAIT	ℹ️	Enables a processor to enter into an optimized state while waiting for a write-back store to the address range set up by the MONITOR instruction


Cacheability Control, Prefetch and Ordering Instructions

Cacheability control instructions provide additional operations for caching of non-temporal data when storing data from SIMD registers to memory. They provide additional control of instruction ordering on store operations.

Instruction	📄	Meaning
Read Prefetch
PREFETCHT0	ℹ️	Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T0 hint
PREFETCHT1	ℹ️	Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T1 hint
PREFETCHT2	ℹ️	Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using T2 hint
PREFETCHNTA	ℹ️	Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy using NTA hint
Write Prefetch
PREFETCHW	ℹ️	Prefetch data into caches in anticipation of a write
PREFETCHWT1		Prefetch data into caches with intent to write and T1 hint
Cache Line Maintenance
CLFLUSH	ℹ️	Flushes and invalidates a memory operand and its associated cache line from all levels of the processor’s cache hierarchy
CLFLUSHOPT	ℹ️	Flushes and invalidates a memory operand and its associated cache line from all levels of the processor’s cache hierarchy with optimized memory system throughput
CLWB	ℹ️	Cache line write back
CLDEMOTE	ℹ️	Cache line demote
Non-Temporal Stores
MOVNTI	ℹ️	Non-temporal store of a double word from a general-purpose register into memory
VMOVNTPS	ℹ️	Non-temporal store of packed single-precision floating-point values from a YMM register into memory
VMOVNTPD	ℹ️	Non-temporal store of packed double-precision floating-point values from a YMM register into memory
VMOVNTDQ	ℹ️	Non-temporal store of double quad words from a YMM register into memory
VMASKMOVDQU	ℹ️	Non-temporal store of selected bytes from an XMM register into memory
Direct Loads/Stores
MOVDIRI	ℹ️	Move double word as direct store
MOVDIR64B	ℹ️	Move 64 bytes as direct store
VMOVNTDQA	ℹ️	Provides a non-temporal hint that can cause adjacent 16-byte items within an aligned 64-byte region (a streaming line) to be fetched and held in a small set of temporary buffers ("streaming load buffers"). Subsequent streaming loads to other aligned 16-byte items in the same streaming line may be supplied from the streaming load buffer and can improve throughput
VLDDQU	ℹ️	Special 128-bit unaligned load designed to avoid cache line splits
Memory Barriers (Fences)
LFENCE	ℹ️	Serializes load operations
SFENCE	ℹ️	Serializes store operations
MFENCE	ℹ️	Serializes load and store operations
Instruction Serialization
SERIALIZE	ℹ️	Serialize instruction execution
Spin-Wait Optimization
PAUSE	ℹ️	Improves the performance of "spin-wait loops"
TPAUSE	ℹ️	Instructs the processor to enter an implementation-dependent optimized state

