
<!---======================= begin_copyright_notice ============================
Copyright (C) 2019-2021 Intel Corporation
SPDX-License-Identifier: MIT
============================= end_copyright_notice ==========================-->

# Intel&reg; Graphics Compiler for OpenCL&trade;
## Configuration flags for Linux Release

### Overview
The Linux release build allows user-selected configuration flags to be enabled. The flags are available after installing a release build according to the [build instructions](https://github.com/intel/intel-graphics-compiler/blob/master/documentation/build_ubuntu.md). This file is autogenerated from [`igc_flags.h`](../IGC/common/igc_flags.h).


##### Important notice
Configuration flags are generally used either for debug purposes or to experimentally change the compiler's behavior. Intel does not guarantee full performance and conformance when using configuration flags.

### How to enable a flag
A flag is enabled by setting an environment variable named after the flag with the `IGC_` prefix.

The syntax is as follows:
```shell
IGC_<flag>=<value>
```
For example, to enable the `ShaderDumpEnable` flag in a shell:
```shell
$ export IGC_ShaderDumpEnable=1
```
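
The following is a minimal sketch of the two usual ways to set a flag from a POSIX shell: exporting it for the whole session, or prefixing a single command. `sh -c` stands in for the OpenCL application you would actually run, and `DumpToCurrentDir` is one of the flags listed in the Shader dumping table. Nothing here invokes the compiler itself; it only shows how the variables reach a child process.

```shell
# Export the flag so that every process started from this shell inherits it.
export IGC_ShaderDumpEnable=1

# Confirm that a child process sees the exported variable.
child_value=$(sh -c 'printf "%s" "$IGC_ShaderDumpEnable"')
echo "child process sees IGC_ShaderDumpEnable=$child_value"

# Alternatively, set a flag for one command only, without exporting it.
# `sh -c` is a stand-in for the application that triggers compilation.
one_off_value=$(IGC_DumpToCurrentDir=1 sh -c 'printf "%s" "$IGC_DumpToCurrentDir"')
echo "one-off command sees IGC_DumpToCurrentDir=$one_off_value"

# List the IGC_ variables currently exported, then remove the flag again.
env | grep '^IGC_'
unset IGC_ShaderDumpEnable
```

The per-command form is convenient when only one run should be affected, because the variable does not remain set in the shell afterwards.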

## VISA optimization
| Flag  | Description | Release builds |
|:---- | :---- | :----: |
| `AssumeUniformIndirectCall` | Assume indirect call is uniform to avoid looping code | - |
| `AvoidDstSrcGRFOverlap` | avoid GRF overlap for destination and source operands of an SIMD16/SIMD32 instruction | - |
| `AvoidSrc1Src2Overlap` | avoid src1 and src2 GRF overlap to avoid the conflict without read suppression | - |
| `CSSIMD16_SpillThreshold` | Percentage of instructions allowed for spilling on CS SIMD16 | - |
| `CSSIMD32_SpillThreshold` | Percentage of instructions allowed for spilling on CS SIMD32 | - |
| `DPASTokenReduction` | optimization to reduce the tokens used for DPAS instruction. | Available |
| `DisableCSEL` | disable csel peep-hole | - |
| `DisableFlagOpt` | Disable optimization cmp with logic op | - |
| `DisableHFMath` | Disables HF math instructions. | - |
| `DisableIfCvt` | Disable ifcvt | - |
| `DisableLoopUnroll` | Setting this to 1/true adds a compiler switch to disable loop unrolling | Available |
| `DisableMixMode` | Disables mix mode in vISA BE. | - |
| `DisableRegDistDep` | Disable regDist dependence | Available |
| `DisableSendS` | Setting this to 1/true adds a compiler switch to not generate sends commands, default is to enable sends | - |
| `DisableThreeALUPipes` | Disable three ALU Pipelines. XeHP only | Available |
| `DisableWriteCombine` | Disable write combine. PVC+ only | - |
| `DumpPromoteI8` | Dump useful info during promoting i8 to i16 | Available |
| `Enable16DWURBWrite` | Enable 16 Dword URB Write messages | Available |
| `Enable16OWSLMBlockRW` | Enable 16 OWord (8 GRF) SLM block read/write message | Available |
| `Enable64BMediaBlockRW` | Enable 64 byte wide media block read/write message | Available |
| `EnableAdd3` | Enable Add3. XeHP+ only | Available |
| `EnableAtomicFusion` | To enable/disable atomic send fusion (simd8 shaders). Valid if EnableSendFusion is on. | - |
| `EnableBCR` | Enable bank conflict reduction. | Available |
| `EnableBfn` | Enable Bfn. XeHP+ only | Available |
| `EnableCallWA` | Control call WA when EU fusion is on. 0: off; 1: on | Available |
| `EnableCoalesceScalarMoves` | Enable scalar moves to be coalesced into fewer moves | Available |
| `EnableForceDebugSWSB` | Enable force debugging functionality for software scoreboard generation | Available |
| `EnableGatherWithImmPostRA` | enable gather send with immediate | Available |
| `EnableGatherWithImmPreRA` | enable gather send with immediate | Available |
| `EnableGroupScheduleForBC` | Enable bank conflict reduction in scheduling. | Available |
| `EnableHWGenerateThreadID` | Enable new behavior of HW generating threadID for GPGPU pipe. XeHP and non-OCL only. | Available |
| `EnableHWGenerateThreadIDForTileY` | Enable HW generating threadID for GPGPU pipe for TileY mode. XeHP and non-OCL only. | Available |
| `EnableIGAEncoder` | Enable VISA IGA encoder | - |
| `EnableIGASWSB` | Use IGA for SWSB | Available |
| `EnableMathDPASWA` | PVC math instruction running with DPAS issue | - |
| `EnableNonOCLWalkOrderSel` | Enable WalkOrder selection for HW generating threadID for GPGPU pipe. XeHP and non-OCL only. | Available |
| `EnablePassInlineData` | 1: Force pass 1st GRF of cross-thread payload as inline data; -1: Force disable passing inline data | Available |
| `EnablePreemption` | Enable generating preemptable code (SKL+) | - |
| `EnablePromoteI8` | Enable promoting i8 (char) to i16 on all ALU insts that does support i8. It's only for XeHPC+ for now. | Available |
| `EnablePromoteI8Vec` | Control if a certain i8 vector needs to be promoted (detail in code) | Available |
| `EnablePvtMemHalfToFloat` | Enable conversion from half to float for private memory. | Available |
| `EnableQWRotateInstructions` | Enable QW type support for rotate instructions. PVC only. | Available |
| `EnableQuickTokenAlloc` | Insert dependence resolve for kernel stitching | Available |
| `EnableSWSBInstStall` | Enable force stall to specific(start) instruction start for software scoreboard generation | Available |
| `EnableSWSBInstStallEnd` | Enable force stall to end instruction for software scoreboard generation | Available |
| `EnableSWSBStitch` | Insert dependence resolve for kernel stitching | Available |
| `EnableSWSBTokenBarrier` | Enable force specific instruction as a barrier for software scoreboard generation | Available |
| `EnableSendFusion` | Enable(!=0)/disable(0)/force(2) send fusion. Valid for simd8 shader/kernel only. | - |
| `EnableUntypedSurfRWofSS` | Enable untyped surface RW to scratch space. XeHP A0 only. | Available |
| `EnableVISABinary` | Enable VISA Binary | Available |
| `EnableVISABoundsChecking` | Enable VISA bounds checking. | - |
| `EnableVISADebug` | Runs VISA in debug mode, all optimizations disabled | - |
| `EnableVISADotAll` | Enable VISA DotAll. Dumps dot files for intermediate stages | - |
| `EnableVISADumpCommonISA` | Enable VISA Dump Common ISA | Available |
| `EnableVISAJmpi` | Enable/Disable VISA generating jmpi (scalar jump). | - |
| `EnableVISANoBXMLEncoder` | Enable VISA No-BXML encoder | - |
| `EnableVISANoSchedule` | Enable VISA No-Schedule | Available |
| `EnableVISAOutput` | Enable VISA GenISA output | Available |
| `EnableVISAPreSched` | Enable VISA Pre-RA Scheduler | Available |
| `EnableVISASlowpath` | Enable VISA Slowpath. Needed to dump .visaasm | Available |
| `EnableVISAStructurizer` | Enable/Disable VISA structurizer. See value defs in igc_flags.hpp. | - |
| `ExpandPlane` | Enable pln to mad macro expansion. | - |
| `Force32bitConstantGEPLowering` | Go back to old version of GEP lowering for constant address space. PVC only | - |
| `ForceBCR` | Force bank conflict reduction, no matter spill or not. | Available |
| `ForceHWThreadNumberPerEU` | Total HW thread number per-EU. | - |
| `ForceNoMaskWA` | [tmp, testing] Force NoMaskWA on any platforms | - |
| `ForcePreemptionWA` | Force generating preemptable code across platforms | Available |
| `ForcePreserveR0` | Setting this to true makes VISA preserve r0 in r0 | Available |
| `ForcePromoteI8` | Force promoting i8 (char) to i16 on all ALU insts (for testing). | Available |
| `ForceSubReturn` | If a subroutine does not have a return, generate a dummy return if this key is set (to meet visa requirement) | - |
| `ForceTexelMaskClear` | If set to 1 or 2, forces evaluate messages to clear the texel mask to 0 or 1, respectively. | Available |
| `ForceVISAPreSched` | Force enabling of VISA Pre-RA Scheduler | - |
| `ForceVISAStructurizer` | Force VISA structurizer for testing. Used on platforms in which we turn off SCF and use UCF by default | - |
| `GlobalSendVarSplit` | Enable global send variable splitting when we are about to spill | - |
| `NewSpillCostFunction` | Use new spill cost function in VISA RA | - |
| `NoMaskWA` | Enable NoMask WA by using software-computed emask flag | - |
| `ReplaceIndirectCallWithJmpi` | Replace indirect call with jmpi instruction (HW WA) | Available |
| `ReservedRegisterNum` | Reserve register number for spill cost testing. | - |
| `SIMD16_SpillThreshold` | Percentage of instructions allowed for spilling on SIMD16 | - |
| `SIMD32_SpillThreshold` | Percentage of instructions allowed for spilling on SIMD32 | - |
| `SIMD8_SpillThreshold` | Percentage of instructions allowed for spilling on SIMD8 | - |
| `SWSBTokenNum` | Total tokens used for SWSB. | Available |
| `ScratchSpaceSizeLimit` | Size limit of scratch space. XeHP and above only. Test only. Remove it once stabilized. | Available |
| `ScratchSpaceSizeReserved` | Reserved size of scratch space. XeHP and above only. Test only. Remove it once stabilized. | Available |
| `SeparateSpillPvtScratchSpace` | Separate scratch spaces for spill/fill and private memory. XeHP and above only. Test only. Remove it once stabilized. | Available |
| `SetA0toTdrForSendc` | Set A0 to tdr0 before each sendc/sendsc | Available |
| `TotalGRFNum` | Total GRF setting for both IGC-LLVM and vISA | - |
| `TotalGRFNum4CS` | Total GRF setting for both IGC-LLVM and vISA, for ComputeShader-only experiment. | - |
| `UnifiedSendCycle` | Using unified send cycle. | - |
| `Use16ByteBindlessSampler` | True if 16-byte aligned bindless sampler state is used | - |
| `UseLinearScanRA` | use Linear Scan as default register allocation algorithm | - |
| `UseMathWithLUT` | Use the implementations of cos, cospi, log, sin, sincos, and sinpi with Look-Up Tables (LUT). | - |
| `VISALTO` | vISA LTO optimization flags. check LINKER_TYPE for more details | - |
| `VISAOptions` | Options to vISA. Space-separated options. | Available |
| `VISAPostScheduleEndBBID` | The ID of BB which will be last scheduled | - |
| `VISAPostScheduleStartBBID` | The ID of BB which will be first scheduled | - |
| `VISAPreSchedCtrl` | Configure Pre-RA Scheduler, default(0), logging(1), latency(2), pressure(4) | - |
| `VISAPreSchedExtraGRF` | Bump up GRF number to make pre-RA Scheduling more greedy, 0 for the default | - |
| `VISAPreSchedRPThreshold` | Threshold to commit a pre-RA Scheduling without spills, 0 for the default | - |
| `VISAScheduleEndBBID` | The ID of BB which will be last scheduled | - |
| `VISAScheduleStartBBID` | The ID of BB which will be first scheduled | - |
| `disableCompaction` | Disables compaction. | Available |
| `disableIGASyntax` | Disables GEN isa text output using IGA and new syntax. | - |
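
As a hedged illustration of how numeric and multi-word values from the table above might be passed, the sketch below builds a `VISAPreSchedCtrl` value and a quoted `VISAOptions` string. Two points are assumptions rather than documented behavior: that the `VISAPreSchedCtrl` values (1, 2, 4) are bit flags that can be OR-ed together, and the two option names, which are placeholders and not real vISA options. Note also that `VISAPreSchedCtrl` is marked `-` in the Release builds column, so it may have no effect in a release build.

```shell
# Values taken from the VISAPreSchedCtrl row above.
PRESCHED_LOGGING=1
PRESCHED_LATENCY=2

# Assumption: the values are bit flags and may be combined with bitwise OR.
presched_ctrl=$(( PRESCHED_LOGGING | PRESCHED_LATENCY ))
export IGC_VISAPreSchedCtrl="$presched_ctrl"
echo "IGC_VISAPreSchedCtrl=$IGC_VISAPreSchedCtrl"

# VISAOptions takes space-separated options, so the whole value must be quoted.
# The option names below are placeholders, NOT real vISA options.
export IGC_VISAOptions="-placeholderOptionA -placeholderOptionB"
echo "IGC_VISAOptions=$IGC_VISAOptions"
```

Without the quotes, the shell would treat the second option as a separate word instead of part of the variable's value.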
## IGC Optimization
| Flag  | Description | Release builds |
|:---- | :---- | :----: |
| `AllowMem2Reg` | Setting this to true makes IGC run mem2reg even when optimizations are disabled | Available |
| `BlockPushConstantGRFThreshold` | Set the maximum limit for block push constants i.e. UBO data pushed.<br/>                                                                Set to 0xFFFFFFFF to use the default threshold for the platform.<br/>                                                                Note that for small pixel shaders the PayloadSizeThreshold may be the limiting factor. | - |
| `DisableAttributePush` | Bit mask to disable push Attribute per shader stages. bit0 = All, Bit 1 = VS, Bit 2 = HS, Bit 3 = DS, Bit 4 = GS | - |
| `DisableBranchSwaping` | Setting this to 1/true adds a compiler switch to disable branch swapping. | - |
| `DisableCodeHoisting` | Setting this to 1/true adds a compiler switch to disable code-hoisting | - |
| `DisableCodeSinking` | Setting this to 1/true adds a compiler switch to disable code-sinking | - |
| `DisableCodeSinkingInputVec` | Setting this to 1/true disable sinking inputVec inst (test) | - |
| `DisableConstantCoalescing` | Setting this to 1/true adds a compiler switch to disable constant coalescing | - |
| `DisableConstantCoalescingOfStatefulNonUniformLoads` | Disable merging non-uniform loads from stateful buffers. Note: does not affect merging to sampler loads | - |
| `DisableConstantCoalescingOutOfBoundsCheck` | Setting this to 1/true adds a compiler switch to disable constant coalescing out of bounds check | - |
| `DisableCustomUnsafeOpt` | Disable IGC to run custom unsafe optimizations | - |
| `DisableDX9LowPrecision` | Disables HF in DX9. | - |
| `DisableDynamicResInfoFolding` | Disable Dynamic ResInfo Instruction Folding | - |
| `DisableDynamicTextureFolding` | Disable Dynamic Texture Folding | - |
| `DisableEmptyBlockRemoval` | Setting this to 1/true adds a compiler switch to disable empty block optimization | - |
| `DisableFDivReassociation` | Disable reassociation for Fdiv operations to avoid precision difference | - |
| `DisableFlattenSmallSwitch` | Disable the flatten small switch pass | - |
| `DisableGatingSimilarSamples` | Disable Gating of similar sample instructions | - |
| `DisableIGCOptimizations` | Setting this to 1/true adds a compiler switch to disable all the above IGC optimizations | - |
| `DisableIPConstantPropagation` | Disable inter-procedural constant propagation | - |
| `DisableIRVerification` | Setting this to 1/true adds a compiler switch to disable IGC IR verification. | - |
| `DisableImmConstantOpt` | Disable IGC IndirectICBPropagaion optimization | - |
| `DisableLLVMGenericOptimizations` | Disable LLVM generic optimization passes | - |
| `DisableLoadSinking` | Setting this to 1/true adds a compiler switch to disable load sinking during retry | - |
| `DisableLoopUnroll` | Setting this to 1/true adds a compiler switch to disable loop unrolling. | - |
| `DisableMCSOpt` | Disable IGC to run MCS optimization | - |
| `DisableMatchFloor` | Setting this to 1/true adds a compiler switch to disable sub-frc = floor optimization | - |
| `DisableMatchMad` | Setting this to 1/true adds a compiler switch to disable mul+add = mad optimization | - |
| `DisableMatchPow` | Setting this to 1/true adds a compiler switch to disable log2/mul/exp2 = pow optimization | - |
| `DisableMatchPredAdd` | Setting this to 1/true adds a compiler switch to disable pred+add = predAdd optimization | - |
| `DisableMatchSimpleAdd` | Setting this to 1/true adds a compiler switch to disable simple cmp+and+add optimization | - |
| `DisableMovingInstanceIDIndexOfVS` | Disable moving index of InstanceID in VS to last location. | - |
| `DisablePayloadCoalescing` | Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for all types | - |
| `DisablePayloadCoalescing_RT` | Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for RT only | - |
| `DisablePayloadCoalescing_Sample` | Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for Samplers only | - |
| `DisablePayloadCoalescing_URB` | Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for URB writes only | - |
| `DisablePromotePrivMem` | Setting this to 1/true adds a compiler switch to disable IGC private array promotion | - |
| `DisablePullConstantHeuristics` | Disable the heuristics to determine the no. push constants based on payload size. | - |
| `DisablePushConstant` | Bit mask to disable push constant per shader stages. bit0 = All, Bit 1 = VS, Bit 2 = HS, Bit 3 = DS, Bit 4 = GS, Bit 5 = PS | - |
| `DisableRectListOpt` | Disable Rect List optimization | - |
| `DisableRuntimeLoopUnrolling` | Setting this to 1/true adds a compiler switch to disable runtime loop unrolling. | - |
| `DisableSIMD32Slicing` | Setting this to 1/true adds a compiler switch to disable emitting SIMD32 VISA code in slices | - |
| `DisableSimplePushWithDynamicUniformBuffers` | Disable Simple Push Constants Optimization for dynamic uniform buffers. | - |
| `DisableStaticCheck` | Disable static check to push constants. | - |
| `DisableStaticCheckForConstantFolding` | Disable static check to fold constants. | - |
| `DisableSynchronizationObjectCoalescingPass` | Disable SynchronizationObjectCoalescing pass | - |
| `DisableURBPartialWritesPass` | Disable IGC pass that converts URB partial writes to full-mask writes. | - |
| `DisableURBReadMerge` | Disable IGC pass that merges URB Read instructions. | - |
| `DisableURBWriteMerge` | Setting this to 1/true adds a compiler switch to disable URB write merge | - |
| `DisableUniformAnalysis` | Setting this to 1/true adds a compiler switch to disable uniform_analysis | - |
| `DisableUniformTypedAccess` | Setting this will disable uniform typed access handling | - |
| `DisableUniformURBWrite` | Disables generation of uniform URB write messages | - |
| `DispatchOCLWGInLinearOrder` | If set, dispatch HW threads based on the linearized order of WI in a WG;<br/>                                                                ie, let localSize=(lx, ly, lz), localId=(ix, iy, iz). And<br/>                                                                   linearLocalId = ix + lx * (iy + ly * iz);<br/>                                                                And linear order means that<br/>                                                                   linearLocalId(lane[x+1])=linearLocalId(lane[x])+1 | - |
| `EnableAtomicBranch` | Enable Atomic branch optimization, which breaks an atomic into if/else with atomic and read based on the operation | - |
| `EnableBitcastedLoadNarrowing` | Enable narrowing of vector loads in bitcasts patterns. | - |
| `EnableBitcastedLoadNarrowingToScalar` | Enable narrowing of vector loads to scalar ones in bitcasts patterns. | - |
| `EnableBlendToDiscard` | Enable blend to discard based on blend state. | - |
| `EnableBlendToFill` | Enable blend to fill based on blend state. | - |
| `EnableCodeAssumption` | If set (> 0), generate llvm.assume to help certain optimizations. It is OCL only for now.<br/>Only 1 and 2 are valid. 2 will be 1 plus additional assumption. It also does other minor changes. | - |
| `EnableCustomLoopVersioning` | Enable IGC to do custom loop versioning | - |
| `EnableDeSSA` | Setting this to 0/false adds a compiler switch to disable De-SSA | - |
| `EnableDeSSAWA` | [tmp]Keep some piece of code to avoid perf regression | - |
| `EnableExtractCommonMultiplier` | Enable ExtractCommonMultiplier optimization in CustomUnsafeOptPass. | - |
| `EnableFastMath` | Enable fast math optimizations in IGC | - |
| `EnableGVN` | Enable LLVM global value numbering | - |
| `EnableGenUpdateCB` | Enable derived constant optimization. | - |
| `EnableGenUpdateCBResInfo` | Enable derived constant optimization with resinfo. | - |
| `EnableHighestSIMDForNoSpill` | When there is no spill choose highest SIMD (compute shader only). | - |
| `EnableHoistDp3` | Enable dp3 Hoisting. | - |
| `EnableHoistMulInLoop` | Hoist multiply with loop invariant out of loop, FP unsafe | - |
| `EnableIntegerMad` | Setting this to 1/true adds a compiler switch to enable integer mul+add = mad optimization | - |
| `EnableLogicalAndToBranch` | Enable convert logical AND to conditional branch | - |
| `EnableLoopHoistConstant` | Enables pass to check for specific loop patterns where variables are constant across all but the last iteration, and hoist them out of the loop. | - |
| `EnableNewTileYCheck` | Enable new TileY check on DG2 | - |
| `EnableOptReportLoadNarrowing` | Generate opt report for narrowing of vector loads. | - |
| `EnablePingPongTextureOpt` | Enables the Ping Pong texture optimization which is used only for Compute Shaders for back to back dispatches | - |
| `EnablePlatformFenceOpt` | Force DG2 only fence optimization | - |
| `EnablePowToLogMulExp` | Enable pow to exp(log(x)*y) optimization in CustomUnsafeOptPass. | - |
| `EnableSLMConstProp` | Enable SLM constant propagation (compute shader only). | - |
| `EnableSamplerChannelReturn` | Setting this to 1/true adds a compiler switch to enable using header to return selective channels from sampler | - |
| `EnableSimplePushSizeBasedOpimization` | Enable the simplepush optimization to do push based on size | - |
| `EnableSimplifyGEP` | Enable IGC to simplify indices expr of GEP. | - |
| `EnableSoftwareStencil` | Enable software stencil for PS. | - |
| `EnableSoftwareVertexFetch` | Enable software vertex fetch for VS. | - |
| `EnableSplitIndirectEEtoSel` | Enable the split indirect extractelement to icmp+sel pass | - |
| `EnableSplitUnalignedVector` | Enable Splitting of unaligned vectors for loads and stores | - |
| `EnableStatefulAtomic` | Enable promoting stateless atomic to stateful atomic. | - |
| `EnableStatefulToken` | Enable generating patch token to indicate a ptr argument is fully converted to stateful (temporary) | - |
| `EnableStatelessToStateful` | Enable Stateless To Stateful transformation for global and constant address space in OpenCL kernels | - |
| `EnableSumFractions` | Enable SumFractions optimization in CustomUnsafeOptPass. | - |
| `EnableTextureLoadCoalescing` | Enable merging non-uniform loads from bindless textures | - |
| `EnableThreadCombiningOpt` | Enables the thread combining optimization which is used only for Compute Shaders for combining a number of software threads to dispatch smaller number of hardware threads | - |
| `EnableThreeWayLoadSpiltOpt` | Enable three-way load split optimization. | - |
| `EnableTrigFuncRangeReduction` | Reduce the sine and cosine function domain range | Available |
| `EnableUnmaskedFunctions` | Enable unmasked functions SYCL feature. | Available |
| `EnableWaveForce32` | Force Wave to use simd32 | - |
| `EnableWorkGroupUniformGoto` | Setting to 1 enables generating uniform goto for work group uniform [eu fusion only] | - |
| `ForceAddressArithSinking` | Force sinking address arithmetic closer to the usage | - |
| `ForceHoistDp3` | force dp3 Hoisting. | - |
| `ForceLinearWalkOnLinearUAV` | Force linear walk on linear UAV buffer | - |
| `ForceSupportsAutoGRFSelection` | ForceSupportsAutoGRFSelection | Available |
| `ForceSupportsStaticRegSharing` | ForceSupportsStaticRegSharing | Available |
| `ForceTileY` | Force TileY mode on DG2 | - |
| `KeepTileYForFlattened` | Keep TileY for FlattenedThreadIdInGroup on DG2 | - |
| `LLVMCommandLine` | applies LLVM command line | - |
| `LoopSinkMinSave` | If loop sinking can save more than this minimum, do it; otherwise, skip | - |
| `LoopSinkThresholdDelta` | Do loop sink if the estimated register pressure is higher than this + #available registers | - |
| `MaxImmConstantSizePushed` | Set the max size of immediate constant buffer pushed | - |
| `PSSIMD32HeuristicFP16` | enable PS SIMD32 heuristic based on fp16 characteristic | - |
| `PSSIMD32HeuristicLoopAndDiscard` | enable PS SIMD32 heuristic based on loop info and discard | - |
| `PayloadSizeThreshold` | Set the max payload size threshold for short shaders that have PSD bottleneck. | - |
| `RovOpt` | Bitmask for ROV optimizations. 0 for all off, 1 for force fence flush none, 2 for setting LSC_L1UC_L3C_WB, 3 for both opt on | - |
| `SelectiveHashOptions` | applies options to hash range via string | - |
| `SetBranchSwapThreshold` | Set the branch swapping threshold. | - |
| `SetDefaultTileYWalk` | Use TileY walk as default for HW generating threadID | Available |
| `SetLoopUnrollThreshold` | Set the loop unroll threshold. Value 0 will use the default threshold. | - |
| `SetLoopUnrollThresholdForHighRegPressure` | Set the loop unroll threshold for shaders with high reg pressure. Value 0 will use the default threshold. | - |
| `SetRegisterPressureThresholdForLoopUnroll` | Set the register pressure threshold for limiting the loop unroll to smaller loops | - |
| `SetURBFullWriteGranularity` | Overrides the minimum access granularity for URB full writes.<br/>                                                            Valid values are 0, 16 and 32, value 0 means use default for the platform. | Available |
| `SplitIndirectEEtoSelThreshold` | Split indirect extractelement cost threshold | - |
| `SynchronizationObjectCoalescingConfig` | Modify the default behavior of SynchronizationObjectCoalescing. The value is a bitmask: bit0 = remove fences in read barrier write scenario | Available |
| `UseHDCTypedReadForAllTextures` | Set this to use HDC message rather than sampler ld for texture read | - |
| `UseHDCTypedReadForAllTypedBuffers` | Set this to use HDC message rather than sampler ld for buffer read | - |
| `UseTiledCSThreadOrder` | Use 4x4 dispatch for CS order when it seems beneficial | - |
| `WaAllowMatchMadOptimizationforVS` | Setting this to 1/true adds a compiler switch to enable mul+add = mad optimization for VS | - |
| `forceFullUrbWriteMask` | Set Full URB write mask. | - |
| `forcePushConstantMode` | set the push constant mode, 0 is default behavior, 1 is simple push, 2 is gather constant, 3 is none/pull constants | - |
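
Two rows in the table above describe values that are easier to follow with worked arithmetic: the `DisablePushConstant` bit mask and the `linearLocalId` formula from `DispatchOCLWGInLinearOrder`. The sketch below computes both with shell arithmetic. The local size and local ID are made-up sample numbers, and passing the mask in decimal is an assumption, since the accepted number format is not stated here. `DisablePushConstant` is marked `-` in the Release builds column.

```shell
# Bit positions from the DisablePushConstant row:
# bit 0 = all stages, bit 1 = VS, bit 2 = HS, bit 3 = DS, bit 4 = GS, bit 5 = PS.
VS_BIT=1
PS_BIT=5

# Build a mask that disables push constants for VS and PS only.
mask=$(( (1 << VS_BIT) | (1 << PS_BIT) ))
printf 'mask: decimal %d, hex 0x%X\n' "$mask" "$mask"

# Assumption: the value is accepted in decimal form.
export IGC_DisablePushConstant="$mask"

# linearLocalId = ix + lx * (iy + ly * iz), from the DispatchOCLWGInLinearOrder row.
lx=4; ly=2              # sample local size in x and y (lz is not used by the formula)
ix=1; iy=1; iz=1        # sample local ID of one work-item
linear_id=$(( ix + lx * (iy + ly * iz) ))
echo "linearLocalId=$linear_id"
```

With these sample numbers the mask is 34 (0x22) and the linearized local ID is 1 + 4 * (1 + 2 * 1) = 13.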
## Shader debugging
| Flag  | Description | Release builds |
|:---- | :---- | :----: |
| `CheckCSSLMLimit` | Check SLM limit on compute shader on DG2 | - |
| `CompileOneAtTime` | Compile only one kernel (out of many in llvm::module) at a time. Prints compiled kernel names to stdout. Useful to debug compilation time and crashes; it does not produce a valid binary. | - |
| `DPASReadSuppressionWA` | Enable read suppression WA for the send and indirect access | - |
| `DebugInternalSwitch` | Code pass selection, debug only | - |
| `DisablePassToggles` | Disable each IGC pass by setting the bit. HEXADECIMAL ONLY! Ex: C0 disables pass 6 and pass 7. | - |
| `DisableSendSrcDstOverlapWA` | Disable Send Source/destination overlap WA which is enabled for GEN10/GEN11 and whenever Wddm2Svm is set in WATable | - |
| `DumpPayloadToScratch` | Setting this to 1/true dumps thread payload to scratch space. Used for workloads which don't use scratch space for other purposes | - |
| `EnableBitcastExtractInsertPattern` | Enable BitcastExtractInsertPattern in CustomSafeOptPass. | Available |
| `EnableCSSIMD32` | Enable compute shader SIMD32 mode, and fall back to lower SIMD when spill | - |
| `EnableDivergentBarrierCheck` | Uses WIAnalysis to find barriers in divergent flow control. May have false positives. | - |
| `EnableHashMovsAtPrologue` | Rather than after EOT, insert hash code movs at shader entry | - |
| `EnableLSCFenceUGMBeforeEOT` | Enable inserting fence.ugm.06.tile before EOT if a kernel has any write to UGM [XeHPC, PVC]. | Available |
| `EnableOptionalBufferOffset` | For StatelessToStateful optimization [OCL], if true, make buffer offset optional. Valid only if buffer offset is supported. | Available |
| `EnableRTLSCFenceUGMBeforeEOT` | [tmp]Enable inserting fence.ugm.06.tile before EOT for RT shader [XeHPC, PVC]. | - |
| `EnableRTmaskPso` | Enable render target mask optimization in PSO opt | - |
| `EnableSIPOverride` | This key forces load of SIP from a local file. | - |
| `EnableSupportBufferOffset` | [debugging]For StatelessToStateful optimization [OCL], support implicit buffer offset argument (same as -cl-intel-has-buffer-offset-arg). | - |
| `EnableTestIGCBuiltin` | Enable testing igc builtin (precompiled kernels) using OCL. | - |
| `EnableTrivialEmulateSinCos` | Enable Emulation for Sine and Cosine instructions | - |
| `EnableZeroSomeARF` | If set, insert mov inst to zero a0, acc, etc to assist HW debugging. | - |
| `EnablerReadSuppressionWA` | Enable read suppression WA for the send and indirect access | - |
| `ForceCSLeastSIMD` | Force compute shader to the lowest allowed SIMD mode | - |
| `ForceCSSIMD16` | Force compute shader SIMD16 mode if allowed, otherwise it will use SIMD32 | - |
| `ForceCSSIMD32` | Force compute shader SIMD32 mode | - |
| `ForceDisableShaderDebugHashCodeInKernel` | Disable hash code addition to the binary after EOT | - |
| `ForceMemoryFenceBeforeEOT` | Forces inserting SLM or global memory fence before EOT if shader writes to SLM or global memory respectively. | - |
| `ForcePerThreadPrivateMemorySize` | Useful for ensuring a certain amount of private memory when doing a shader override. | - |
| `ForceStatelessForQueueT` | In OCL, force to use stateless memory to hold queue_t*. This is a legacy feature to be removed. | - |
| `MSAAClearedKernel` | Insert the discard code for MSAA_MSC_Cleared kernels. 2/4/8/16 | - |
| `PrintVerboseGenericControlFlowLog` | Forces compiler to print detailed log about additional control flow generated due to a presence of generic memory operations | Available |
| `RetryManagerFirstStateId` | For debugging purposes, it can be useful to start on a particular id rather than id 0. | - |
| `RouteByLodHint` | An integer offset addon to route the resource to HDC on DG2 | - |
| `SIPOverrideFilePath` | When enabled together with EnableSIPOverride, this key loads SIP from the specified path. | - |
| `SToSProducesPositivePointer` | This key is for StatelessToStateful optimization if the user knows the pointer offset is positive to the kernel argument. | - |
| `ShaderDebugHashCode` | The driver will set a breakpoint in the first instruction of the shader which has the provided hash code.<br/>It works only when the value is different than 0 and SystemThreadEnable is set to TRUE.<br/>Ex: for VS_asm2df26246434553ad_nos0000000000000000, only the low part needs to be entered in the registry, e.g. 0x434553ad, i.e. the lower 8 hex digits of the 16-digit hash code, for compatibility reasons. | - |
| `ShaderDebugHashCodeInKernel` | Add hash code to the binary | - |
| `ShaderDisableOptPassesAfter` | Will only run first N optimization passes, any further passes will be ignored. This flag can be used to bisect optimization passes. | - |
| `ShaderDisplayAllPassesNames` | Display to console all passes name with their ID and occurrence number. | - |
| `ShaderOverride` | Will override any LLVM shader with matching name in c:\\Intel\\IGC\\ShaderOverride | - |
| `ShaderPassDisable` | Disable specific passes eg. '9;17-19;239-;Error Check;ResolveOCLAtomics:2;Dead Code Elimination:3-5;BreakConstantExprPass:7-'<br/>disable pass 9, disable passes from 17 to 19, disable all passes after 238, disable all occurrences of pass Error Check,<br/>disable second occurrence of ResolveOCLAtomics, disable pass Dead Code Elimination occurrences from 3 to 5,<br/>disable all BreakConstantExprPass occurrences after its 6th<br/>To show a list of pass names and their occurrence set ShaderDisplayAllPassesNames.<br/>Must be used with ShaderDumpEnableAll flag. | - |
| `SystemThreadEnable` | This key forces software to create a system thread. The system thread may still be created by software even<br/>if this control is set to false. The system thread is invoked if either the software requires<br/>exception handling or if kernel debugging is active and a breakpoint is hit. | - |
| `UseSubDWAlignedPtrArg` | [OCL]If set, for kernel pointer arg such as ptr to char or short, the arg is not necessarily DW aligned | - |
| `ld2dmsInstsClubbingThreshold` | Do not club more than these ld2dms insts into the new BB during MCSOpt | - |
| `manualEnableRSWA` | Enable read suppression WA for the send and indirect access | - |
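
The `DisablePassToggles` and `ShaderDebugHashCode` rows above both expect hexadecimal values derived from other numbers. The sketch below reproduces the two examples given in the table: `C0` for passes 6 and 7, and `0x434553ad` as the low part of the 16-digit hash. Passing `ShaderDebugHashCode` as an environment variable is an assumption, since the table describes a registry entry. Both flags are marked `-` in the Release builds column.

```shell
# DisablePassToggles: one bit per pass, written in hexadecimal.
# Passes 6 and 7 reproduce the C0 example from the table.
toggles=$(( (1 << 6) | (1 << 7) ))
toggles_hex=$(printf '%X' "$toggles")
export IGC_DisablePassToggles="$toggles_hex"
echo "IGC_DisablePassToggles=$IGC_DisablePassToggles"

# ShaderDebugHashCode: keep only the lower 8 of the 16 hex digits.
# The hash comes from the table's VS_asm2df26246434553ad_nos0000000000000000 example.
full_hash=2df26246434553ad
low_part=$(printf '%s' "$full_hash" | cut -c9-16)

# Assumption: the value can be supplied through the environment like other flags.
export IGC_ShaderDebugHashCode="0x$low_part"
echo "IGC_ShaderDebugHashCode=$IGC_ShaderDebugHashCode"
```

According to the table, the hash code only takes effect when `SystemThreadEnable` is also set.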
## Shader dumping
| Flag  | Description | Release builds |
|:---- | :---- | :----: |
| `AddExtraIntfInfo` | Will add extra interference info from .extraintf files from c:\\Intel\\IGC\\ShaderOverride | - |
| `DumpDeSSA` | dump DeSSA info into file. | Available |
| `DumpHasNonKernelArgLdSt` | Print if hasNonKernelArg load/store to stderr | Available |
| `DumpLLVMIR` | dump LLVM IR | Available |
| `DumpOCLProgramInfo` | dump OpenCL Patch Tokens, Kernel/Program Binary Header | Available |
| `DumpPatchTokens` | Enable dumping of patch tokens. | Available |
| `DumpTimeStats` | Timing of translation, code generation, finalizer, etc | Available |
| `DumpTimeStatsCoarse` | Only collect/dump coarse level time stats, i.e. skip opt detail timer for now | Available |
| `DumpTimeStatsPerPass` | Collect Timing of IGC/LLVM passes | Available |
| `DumpToCurrentDir` | dump shaders to the current directory | Available |
| `DumpToCustomDir` | Dump shaders to custom directory. Parent directory must exist. | Available |
| `DumpUseShorterName` | If set, use an internal shader name(_entry_id) in dump file name | Available |
| `DumpVariableAlias` | Dump variable alias info (valid if EnableVariableAlias is on) | Available |
| `DumpWIA` | dump WI (uniform) information into files in dump directory if set to true | - |
| `ElfDumpEnable` | dump ELF file | Available |
| `ElfTempDumpEnable` | dump temporary ELF files | Available |
| `EnableCapsDump` | Enable hardware caps dump | Available |
| `EnableCisDump` | Enable cis dump | Available |
| `EnableCosDump` | Enable cos dump | Available |
| `EnableLivenessDump` | Enable dumping out liveness info on stderr. | Available |
| `EnableScalarizerDebugLog` | print step by step scalarizer debug info. | Available |
| `EnableShaderNumbering` | Number shaders in the order they are dumped based on their hashes | Available |
| `ForceRPE` | Force RPE (RegisterEstimator) computation if > 0. If 2, force RPE per inst. | Available |
| `InterleaveSourceShader` | Interleave the source shader in asm dump | Available |
| `PrintAfter` | Takes either `all` or a comma/semicolon-separated list of pass names. If set, prints LLVM IR after the given pass is done (mimics LLVM print-after) | Available |
| `PrintBefore` | Takes either `all` or a comma/semicolon-separated list of pass names. If set, prints LLVM IR before the given pass runs (mimics LLVM print-before) | Available |
| `PrintHexFloatInShaderDumpAsm` | print floats in hex in asm dump | Available |
| `PrintPsoDdiHash` | Print psoDDIHash in TimeStats_Shaders.csv file | Available |
| `PrintToConsole` | dump to console | Available |
| `QualityMetricsEnable` | Enable Quality Metrics for IGC | Available |
| `RPEDumpLevel` | > 0 : dump info of register pressure estimate on stderr. See igc_flags.hpp level defs. | - |
| `ShaderDataBaseStats` | Enable gathering sends' sizes for shader statistics | - |
| `ShaderDataBaseStatsFilePath` | Path to a file with dumped shader stats additional data e.g. data available during compilation only | - |
| `ShaderDumpEnable` | dump LLVM IR, visaasm, and GenISA | Available |
| `ShaderDumpEnableAll` | dump all LLVM IR passes, visaasm, and GenISA | Available |
| `ShaderDumpEnableG4` | same as ShaderDumpEnable but adds G4 dumps (0 = off, 1 = some, 2 = all) | - |
| `ShaderDumpEnableIGAJSON` | adds IGA JSON output to shader dumps (0 = off, 1 = enabled, 2 = include def/use info but causes longer compile times) | - |
| `ShaderDumpFilter` | Only dump files matching the given regex | Available |
| `ShaderDumpInstNamer` | dump all unnamed LLVM IR instructions with variable names 'tmp', which makes shader overriding easier | Available |
| `ShaderDumpPidDisable` | disable adding PID to the name of shader dump directory | Available |
| `ShowFullVectorsInShaderDumps` | print all elements of vectors in ShaderDumps, can dramatically increase ShaderDumps size | Available |
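
As a rough sketch of how the dump flags combine, the snippet below enables shader dumps and redirects them to a custom directory. The directory location is hypothetical, and the assumption that `DumpToCustomDir` takes a plain path string is not confirmed by the table; only the requirement that the parent directory exists comes from the flag description.

```shell
# Hypothetical parent directory; it must exist before the compiler runs.
dump_parent="${TMPDIR:-/tmp}/igc_dumps"
mkdir -p "$dump_parent"

# Dump LLVM IR, visaasm, and GenISA.
export IGC_ShaderDumpEnable=1
# Assumed to accept a path; "run1" is an illustrative directory name.
export IGC_DumpToCustomDir="$dump_parent/run1"
```
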
## Debugging features
| Flag  | Description | Release builds |
|:---- | :---- | :----: |
| `AvoidUsingR0R1` | Do not use r0 and r1 as generic usage registers | - |
| `DebugInfoEnforceAmd64EM` | Enforces elf file with the debug information to have eMachine set to AMD64 | - |
| `DebugInfoValidation` | Enable optional (strict) checks to detect debug information inconsistencies | - |
| `EnableGTLocationDebugging` | Setting this to 1 (true) enables GT location expression emission for GPU debugger | Available |
| `EnableRelocations` | Setting this to 1 (true) makes IGC emit relocatable ELF with debug info | Available |
| `EnableWriteOldFPToStack` | Setting this to 1 (true) writes the caller frame's frame-pointer to the start of callee's frame on stack, to support stack walk | - |
| `ExtraOCLInternalOptions` | Extra internal options for OpenCL | Available |
| `ExtraOCLOptions` | Extra options for OpenCL | Available |
| `ForceAssignRhysicalReg` | Force assigning dclId to physical reg. | Available |
| `InitializeAddressRegistersBeforeUse` | Setting this to 1 (true) initializes address register to 0 before each use | - |
| `InitializeRegistersEnable` | Setting this to 1/true initializes all GRFs, Flag and address registers to 0 at the beginning of the shader | - |
| `InitializeUndefValueEnable` | Setting this to 1/true initializes all undefs in URB payload to 0 | - |
| `MetricsDumpEnable` | Dump IGC Metrics to file *.optrpt in current working directory.<br/>Values: 0 - disabled, 1 - binary format, 2 - plain-text format. | Available |
| `PrintDebugSettings` | Prints all non-default debug settings | - |
| `UseMTInLLD` | Use multi-threading when linking multiple elf files | Available |
| `UseOffsetInLocation` | Setting this to 1 (true) preserves private base and per thread offset and removes preservation of any other debug variables | Available |
| `UseVISAVarNames` | Make VISA generate names for virtual variables so they match with dbg file | Available |
| `ZeBinCompatibleDebugging` | Setting this to 1 (true) enables embedding debug info in zeBinary | Available |
| `deadLoopForFloatException` | enable a dead loop if float exception happened | - |
## IGC Features
| Flag  | Description | Release builds |
|:---- | :---- | :----: |
| `AdvCodeMotionControl` | Control bits to fine-tune advanced code motion | - |
| `AdvRuntimeUnrollCount` | Advanced runtime unroll count | - |
| `AllowedSpillRegCount` | Max allowed spill size without recompile | - |
| `CSSpillThresholdNoSLM` | Spill Threshold for CS SIMD16 without SLM | - |
| `CSSpillThresholdSLM` | Spill Threshold for CS SIMD16 with SLM | - |
| `DPEmuNeedI64Emu` | Double Emulation needs I64 emulation. Unsetting it to disable I64 Emulation for testing. | - |
| `DisableDSDualPatch` | Setting it to true will enable Single and Dual Patch dispatch mode for Domain Shader | - |
| `DisableEarlyOutPatterns` | Disable optimization trying to create an early out after sampleC messages | - |
| `DisableGPGPUIndirectPayload` | Disable OCL indirect GPGPU payload | - |
| `DisableMemOpt` | Disable MemOpt, merging load/store | - |
| `DisableMemOpt2` | Disable MemOpt2 | - |
| `DisablePreRAScheduler` | Disable Pre RA Scheduling | - |
| `DisablePromoteToDirectAS` | This key disables the PromoteResourceToDirectAS pass | - |
| `DisableRecompilation` | Disable recompilation | Available |
| `DisableScalarAtomics` | Disable the Scalar Atomics optimization | - |
| `DisableWaSampleLZ` | Disable The Sample Lz workaround and generate Sample LZ | - |
| `Enable16BitLDMCS` | Enable 16-bit ld_mcs on supported platforms | Available |
| `Enable64BitEmulation` | Enable 64-bit emulation | - |
| `Enable64BitEmulationOnSelectedPlatform` | Enable 64-bit emulation on selected platforms | - |
| `EnableAIParameterCombiningWithLODBias` | Enable AI parameter combining With LOD Bias parameter. XeHP | Available |
| `EnableAdvCodeMotion` | Enable advanced code motion | - |
| `EnableAdvMemOpt` | Enable advanced memory optimization | - |
| `EnableAdvRuntimeUnroll` | Enable advanced runtime unroll | - |
| `EnableCPSMSAAOMaskWA` | Enable WA which forces rt writes to happen at pixel rate when cps, msaa, and omask are present. | Available |
| `EnableCPSOmaskWA` | Enable workaround for oMask with CPS | - |
| `EnableConstIntDivReduction` | Enables strength reduction on integer division/remainder with constant divisors/moduli | Available |
| `EnableDG2LSCSIMD8WA` | Enables WA for DG2 LSC simd8 d32-v8/d64-v3/d64-v4. [temp, should be replaced with WA id] | - |
| `EnableDPEmulation` | Enforce double precision floating point operations emulation on platforms that do not support it natively | Available |
| `EnableDivergentBarrierWA` | Generate continuation code to handle shaders that place barriers in divergent control flow | - |
| `EnableDualSIMD8` | enable dual SIMD8 on supported platforms | Available |
| `EnableFallbackToBindless` | This key enables fallback to bindless mode on all shaders | - |
| `EnableFallbackToStateless` | This key enables fallback to stateless mode on all shaders | - |
| `EnableFunctionPointer` | Enables support for function pointers and indirect calls | - |
| `EnableGASResolver` | Enable GAS Resolver | - |
| `EnableGEPSimplification` | Enable GEP simplification | Available |
| `EnableGen11TwoStackTSG` | Enable Two stack TSG gen11 feature | - |
| `EnableGlobalStateBuffer` | This key allows stack calls to read implicit args from side buffer. It also emits a relocatable add in VISA. | Available |
| `EnableHFpacking` | Enable HF packing | - |
| `EnableHSSinglePatchDispatch` | Setting this to 1/true enables SIMD8 single-patch dispatch in HullShader. Default is either SIMD8 single patch/dual patch dispatch based on control point count | - |
| `EnableImplicitArgAsIntrinsic` | Use GenISAIntrinsic instructions for supported implicit args instead of passing them as function arguments | Available |
| `EnableIndirectCallOptimization` | Enables inlining indirect calls by comparing function addresses | - |
| `EnableIntDivRemCombine` | Given div/rem pairs with same operands merged; replace rem with mul+sub on quotient; 0x3 (set bit[1]) forces this on constant power of two divisors as well | Available |
| `EnableL3FlushForGlobal` | Enable/disable flushing L3 cache for globals | - |
| `EnableLSC` | Enables the new dataport encoding for LSC messages. | Available |
| `EnableLowerGPCallArg` | Enable pass to lower generic pointers in function arguments | - |
| `EnableMadLoopSlice` | Enables the slicing of mad loops. | Available |
| `EnableMaxWGSizeCalculation` | Enable max work group size calculation [OCL only] | Available |
| `EnableMeshSLMCache` | Enables caching Mesh shader outputs in SLM,<br/>                                                                bitmask:<br/>                                                                bit0 - cache AND flush mode, enable caching of Primitive Count and Primitive Indices,<br/>                                                                bit1 - cache AND flush mode, enable caching of per-vertex outputs,<br/>                                                                bit2 - cache AND flush mode, enable caching of per-primitive outputs,<br/>                                                                bit3 - mirror mode, if this bit is set bits 0, 1 and 2 are ignored,<br/>                                                                       enable caching of outputs that are read in the shader<br/>                                                                       data is only mirrored in SLM | Available |
| `EnableMeshShaderSimdSize` | Set allowed simd sizes for mesh shader compilation,<br/>bitmask bit0 - simd8, bit1 - simd16, bit2 - simd32,<br/>e.g. 0x7 enables all simd sizes and 0x2 enables only simd16,<br/>valid values are from 0 to 7<br/>ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size,<br/>ignored if ForceMeshShaderSimdSize is set | Available |
| `EnableOCLSIMD16` | Enable OCL SIMD16 mode | Available |
| `EnableOCLSIMD32` | Enable OCL SIMD32 mode | Available |
| `EnableOCLScratchPrivateMemory` | Enable the use of scratch space for private memory [OCL only] | Available |
| `EnablePartialEmuI64` | Enable the partial I64 emulation for PVC-B | Available |
| `EnablePostCullPatchFIFOHP` | Enable Post-Cull Patch Decoupling FIFO. XeHP. | Available |
| `EnablePostCullPatchFIFOLP` | Enable Post-Cull Patch Decoupling FIFO. GEN12LP. | Available |
| `EnablePreRARematFlag` | Enable PreRA Rematerialization of Flag | - |
| `EnableReadGTPinInput` | Enables setting GTPin context flags by reading the input to the compiler adapters | - |
| `EnableRecursionOpenCL` | Enable recursion with OpenCL user functions | - |
| `EnableRuntimeFuncAttributePatching` | Creates a relocation entry to let runtime calculate the max call depth and patch required scratch space usage | Available |
| `EnableSMRescheduling` | Change instruction order to enable extra Sample Multiversioning cases | - |
| `EnableSampleBMLODWA` | Enable workaround for sample_b messages that use the mlod parameter | - |
| `EnableSampleDEmulation` | Enable emulation of sample_d. | Available |
| `EnableSampleDEmulationForTesting` | Enable emulation of sample_d on pre-XeHP platforms. | Available |
| `EnableSamplerSupport` | Enables sampler messages generation for PVC. | Available |
| `EnableScalarTypedAtomics` | Enable the Scalar Typed Atomics optimization | - |
| `EnableScratchMessageD64WA` | Enables WA to legalize D64 scratch messages to D32 | - |
| `EnableSelectiveScalarizer` | enable selective scalarizer on GPGPU path | Available |
| `EnableSingleVertexDispatch` | Vertex Shader Single Patch Dispatch Regkey | - |
| `EnableTaskShaderSimdSize` | Set allowed simd sizes for task shader compilation,<br/>bitmask bit0 - simd8, bit1 - simd16, bit2 - simd32,<br/>e.g. 0x7 enables all simd sizes and 0x2 enables only simd16,<br/>valid values are from 0 to 7<br/>ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size,<br/>ignored if ForceMeshShaderSimdSize is set | Available |
| `EnableTileYForExperiments` | Enable TileY heuristics for experiments | - |
| `EnableTypeDemotion` | Enable Type Demotion | - |
| `EnableWideMulMad` | Enable wide (64-bit) mul and mad instructions | - |
| `Enable_Wa14010017096` | Enable Wa_14010017096 regardless of the platform stepping | Available |
| `Enable_Wa1507979211` | Enable Wa_1507979211 regardless of the platform stepping | Available |
| `Enable_Wa1807084924` | Enable Wa_1807084924 regardless of the platform stepping | Available |
| `Enable_Wa22010487853` | Enable Wa_22010487853 regardless of the platform stepping | Available |
| `Enable_Wa22010493955` | Enable Wa_22010493955 regardless of the platform stepping | Available |
| `Force32BitIntDivRemEmu` | Force 32-bit Int Div/Rem emulation using fp64, ignored if no native fp64 support | Available |
| `Force32BitIntDivRemEmuSP` | Force 32-bit Int Div/Rem emulation using fp32, ignored if Force32BitIntDivRemEmu is set and actually used | Available |
| `ForceDPEmulation` | Force double emulation for testing purpose | - |
| `ForceFFIDOverwrite` | Force overwriting ffid in sr0.0 | - |
| `ForceFormatConversionDG2Plus` | Forces SW image format conversion for R10G10B10A2_UNORM, R11G11B10_FLOAT, R10G10B10A2_UINT image formats on DG2+ platforms | Available |
| `ForceMeshShaderSimdSize` | Force mesh shader simd size,<br/>valid values are 0 (not set), 8, 16 and 32<br/>ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size | Available |
| `ForceNoLSC` | Disables the new dataport encoding for LSC messages. | Available |
| `ForceOCLSIMDWidth` | Force using SIMD width specified. 0 : no forcing. This overrides driver forced SIMD value(if any) and runtime behaviour could be different if driver expects something fixed | Available |
| `ForcePrefetchToL1Cache` | Forces standard builtin prefetch to use L1 cache | - |
| `ForceSPDivEmulation` | Force SP Div emulation for testing purpose | - |
| `ForceStaticToDynamic` | Force write of vertex count in GS | - |
| `ForceTaskShaderSimdSize` | Force task shader simd size,<br/>valid values are 0 (not set), 8, 16 and 32<br/>ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size | Available |
| `HoistPSConstBufferValues` | Hoists up down converts for constant buffer accesses, so they can be vectorized more easily. | - |
| `LICMStatThreshold` | LICM stat threshold to avoid retry SIMD16 for CS | - |
| `LateInlineUnmaskedFunc` | Postpone inlining of Unmasked functions till end of CG to avoid code movement inside/outside of unmasked region | - |
| `LscForceSpillNonStackcall` | Non-stack call kernels that spill will use LSC on DG2+ | Available |
| `LscImmOffsMatch` | <br/>    Match address patterns that have an immediate offset for the vISA LSC API<br/>    (0 means off/no matching,<br/>     2 means force on for all platforms (vISA will emulate the addition);<br/>     also see LscImmOffsVisaOpts | Available |
| `LscImmOffsVisaOpts` | <br/>    This maps to vISA_lscEnableImmOffsFor<br/>    (enables/disables immediate offsets for various address types; <br/>    see that option for semantics) | Available |
| `MaxLiveOutThreshold` | Max LiveOut Threshold in MemOpt2 | - |
| `OCLEnableReassociate` | Enable reassociation | Available |
| `OCLSIMD16SelectionMask` | Select SIMD 16 heuristics. Valid values are 0, 1, 2 and 3 | - |
| `OverrideDeviceIdForWA` | Enable this to override DeviceId | - |
| `OverrideProductFamilyForWA` | Enable this to override the product family, get the correct enum from igfxfmid.h | - |
| `OverrideRevIdForWA` | Enable this to override the stepping/RevId, default is a0 = 0, b0 = 1, c0 = 2, so on... | - |
| `PixelSampleHoistingLimit` | A sub-option of AdvMemOpt: hoist sample instruction in pixel shader | - |
| `RemoveLegacyOCLStatelessPrivateMemoryCases` | Remove cases where OCL uses stateless private memory. XeHP and above only! [OCL only] | Available |
| `SampleMultiversioning` | Create branches around samplers which can be redundant with some values | - |
| `SelectiveLoopUnrollForDPEmu` | Setting this to 0/false disables selective loop unrolling for DP emu | Available |
| `SendMultipleSIMDModesCS` | Send multiple SIMD modes for CS | - |
| `SkipPsSimdWithDualSimd` | Setting it to values def in igc.h will force SIMD mode to skip if the dual-SIMD8 kernel exists | Available |
| `UniformMemOpt4OW` | increase uniform memory optimization from 2 owords to 4 owords | Available |
| `allowLICM` | Enable LICM in IGC. | Available |
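
The bitmask encoding used by `EnableMeshShaderSimdSize` and `EnableTaskShaderSimdSize` can be worked through with shell arithmetic. The sketch below builds the mask for "SIMD16 and SIMD32 only" from the bit positions given in the flag description. The variable names are illustrative, and passing the value in hexadecimal form is an assumption based on the `0x7` notation in the description.

```shell
# Bit positions as listed in the flag description.
simd8=$((1 << 0))    # 0x1
simd16=$((1 << 1))   # 0x2
simd32=$((1 << 2))   # 0x4

# Allow SIMD16 and SIMD32 only: 0x2 | 0x4 = 0x6.
mask=$((simd16 | simd32))

# Assumption: the flag accepts the same hex notation the description uses.
export IGC_EnableMeshShaderSimdSize="$(printf '0x%x' "$mask")"
echo "$IGC_EnableMeshShaderSimdSize"   # prints 0x6
```

Per the description, the setting is ignored when `ForceMeshShaderSimdSize` is set or when the resulting configuration is invalid.
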
## Performance experiments
| Flag  | Description | Release builds |
|:---- | :---- | :----: |
| `AddNoInlineToTrimmedFunctions` | Tell late passes not to inline trimmed functions | - |
| `AllocaRAPressureThreshold` | The threshold for the register pressure potential | - |
| `AllocateZeroInitializedVarsInBss` | Allocate zero initialized global variables in .bss section in ZEBinary | Available |
| `AllowNonLoopConstantPromotion` | Allows promotion for constants not in loop (e.g. used once) | - |
| `ByPassAllocaSizeHeuristic` | Force some Alloca to pass the pressure heuristic until the given size | Available |
| `CodePatch` | Enable Pixel Shader code patching to directly emit code after stitching | - |
| `CodePatchExperiments` | Experiment with code patching when != 0 | - |
| `CodePatchFilter` | Filter out unsupported patterns | - |
| `CodePatchLimit` | Debug CodePatch by limiting the number of shaders being patched | - |
| `ConstantPromotionCmpSelSize` | Array size threshold for cmp-sel transform | - |
| `ConstantPromotionSize` | Threshold in number of GRFs | - |
| `ControlInlineImplicitArgs` | Avoid trimming functions with implicit args | Available |
| `ControlInlineTinySize` | Tiny function size for controlling kernel total size | Available |
| `ControlKernelTotalSize` | Control kernel total size | Available |
| `ControlUnitSize` | Control compilation unit size by unit trimming | Available |
| `DetectCastToGAS` | Check if the module contains local/private to GAS (Generic Address Space) casts; it also checks internal flags | Available |
| `DiableWaSamplerNoMask` | Disable WA DiableWaSamplerNoMask | - |
| `DisableAddingAlwaysAttribute` | Disable adding always attribute | Available |
| `DisableDualBlendSource` | Force the compiler to never use dual blend source messages | - |
| `DisableFDIV` | Disable fdiv support | - |
| `DisableFastRAWA` | Disable Fast RA for hanging issues on large workloads | - |
| `DisableFastestGopt` | Disable global optimizations for stage 1 shaders. | - |
| `DisableFastestLinearScan` | Disable LinearScanRA in FastestSIMD. | - |
| `DisableUndefAlphaOutputAsRed` | Disable output red for undefined alpha output | - |
| `DisableWaDisableSIMD16On3SrcInstr` | Disable C0 WA WaDisableSIMD16On3SrcInstr, may be unsafe | - |
| `DisableWaSendSEnableIndirectMsgDesc` | Disable a C0 WA WaSendSEnableIndirectMsgDesc, may be unsafe | - |
| `DisbleLocalFences` | On CNL+ we need to emit local fences. Setting this to true removes those. The result may not be functionally correct. | - |
| `DispatchAlongY_XY_ratio` | min threshold for thread group size x / y for dispatchAlongY | - |
| `DispatchAlongY_X_threshold` | min threshold for thread group size x for dispatchAlongY | - |
| `DispatchGPGPUWalkerAlongYFirst` | 0 = No SW Y-walk, 1 = Dispatch GPGPU walker along Y first | - |
| `EmitDebugLoc` | Enable generation of .debug_loc section | - |
| `EmitOffsetInDbgLoc` | Emit offset of private memory in DW_AT_location when available | - |
| `EmitPreDefinedForAllFunctions` | When enabled, pre-defined variables for gid, grid, lid are emitted for all functions. This causes those functions to be inlined even when stack calls is enabled. | Available |
| `EmulateFDIV` | Emulate fdiv instructions | - |
| `EnableA64WA` | Guarantee A64 load/store address-hi is uniform | Available |
| `EnableAccSub` | Enable accumulator substitution | - |
| `EnableByValStructArgPromotion` | If enabled, byval/sret struct arguments are promoted to pass-by-value if possible. | Available |
| `EnableConstantPromotion` | Enable global constant data to register promotion | - |
| `EnableDisableMidThreadPreemptionOpt` | Disable mid thread preemption | - |
| `EnableEvaluateSamplerSplit` | Split evaluate messages to sampler into either SIMD8 or SIMD1 messages | - |
| `EnableExtractMask` | When enabled, it is mostly for reducing response size of send messages. | - |
| `EnableFastestSingleCSSIMD` | Enable selecting single CS SIMD in staged compilation. | - |
| `EnableForceGroupSize` | Enable forcing thread Group Size ForceGroupSizeX and ForceGroupSizeY | - |
| `EnableForceThreadCombining` | Enable forcing Thread Combining with thread Group Size ForceGroupSizeX and ForceGroupSizeY | - |
| `EnableGPUFenceScopeOnSingleTileGPUs` | Allow the use of `GPU` fence scope on single-tile GPUs. By default the `TILE` scope is used instead of `GPU` scope on single-tile GPUs. | Available |
| `EnableGSURBEntryPadding` | Enable padding of GS URB Entry by adding extra portions of Control Data Header. | - |
| `EnableGSVtxCountMsgHalfCLSize` | Enable the Vertex Count msg of half CL size, instead of 1DW size. | - |
| `EnableGather4cpoWA` | Enable WA transforming gather4cpo/gather4po into gather4c/gather4 | - |
| `EnableHalfPromotion` | Enable pass that replaces instructions using halfs with corresponding float counterparts for pre-SKL | - |
| `EnableInsertElementScalarCoalescing` | Enable coalescing on the scalar operand of insertelement | - |
| `EnableIntelFast` | Enable intel fast, experimental flag. | - |
| `EnableLTO` | Enable link time optimization | - |
| `EnableLTODebug` | Enable debug information for LTO | Available |
| `EnableLocalIdCalculationInShader` | Enables calculation of local thread IDs in shader. Valid only in compute<br/>shaders on XeHP+. IDs are calculated only if HW generated IDs cannot be<br/>used. | Available |
| `EnableMixIntOperands` | Enable generating mix-sized operands for int ALU | - |
| `EnableOptReportPrivateMemoryToSLM` | [POC] Generate opt report file for moving private memory allocations to SLM. | - |
| `EnablePreRAAccSchedAndSub` | Enable accumulator substitution | - |
| `EnablePreRASampleCluster` | Enabling helps cluster sample instructions with identical texture index which are ready to be scheduled, to be scheduled together | - |
| `EnableSamplerSplit` | Split Sampler 3d message to odd and even | - |
| `EnableStackCallFuncCall` | If enabled, the default function call mode will be set to stack call. Otherwise, subroutine call is used. | - |
| `EnableSubroutineForEmulation` | Enable subroutine call support when emulation(double) is on. Heuristic decides which use subroutine calls. | - |
| `EnableTCSHWBarriers` | Enable TCS pass with HW barriers support. Default TCS pass is TCS pass with multiple continuation functions. | - |
| `EnableTEFactorsClear` | Enable clearing of tessellation factors. | - |
| `EnableTEFactorsPadding` | Enable padding of the TE factors. | - |
| `EnableThreadCombiningWithNoSLM` | Enable thread combining opt for shader without SLM | - |
| `EnableTrackPtr` | Track Staging Context alloc/dealloc | - |
| `EnableVariableAlias` | Enable variable aliases (part of VariableReuse Pass, but separate functionality) | - |
| `EnableVariableReuse` | Enable local variable reuse | - |
| `EnableVector8LoadStore` | Enable Vectorizer to generate 8x32i and 4x64i loads and stores | Available |
| `EnableZEBinary` | Force-enable output in ZE binary format. Leave unset for compiler to choose based on current platform's support for ZE binary | Available |
| `ExcludeIRFromZEBinary` | Exclude IR sections from ZE binary | Available |
| `ExpandedUnitSizeThreshold` | Trimming target of compilation unit size | Available |
| `ExtraRetrySIMD16` | Enable extra simd16 with retry for STAGE1_BEST_PREF | - |
| `FastCompileRA` | Provide the fast compilation path for RA, fail safe at first iteration | - |
| `FastSpill` | fast spill code gen. This may produce worse quality code for the spilling shader | - |
| `FastestS1Experiments` | Select configs for fastest compilation by bits. | - |
| `FirstStagedSIMD` | Force Pixel shader to be 1: FastSIMD (SIMD8), 2: BestSIMD (SIMD16 or SIMD8), 3: FastestSIMD (SIMD8 opt off) | - |
| `ForceAddingStackcallKernelPrerequisites` | Force adding static overhead for stackcall to the kernel entry such as HWTID instructions for experiments | - |
| `ForceAllPrivateMemoryToSLM` | [POC] Force moving all private memory allocations to SLM. | - |
| `ForceBestSIMD` | Force pixel shader to return the best SIMD, either SIMD16 or SIMD8. | - |
| `ForceDisableSrc0Alpha` | Force the compiler to skip sending src0 alpha. Only works if we are sure alpha to coverage and alpha test is off | - |
| `ForceFastestSIMD` | Force pixel shader to return SIMD8 as fast as possible. | - |
| `ForceGroupSizeShaderHash` | Shader hash for forcing thread group size or thread combining (lower 8 hex digits) | - |
| `ForceGroupSizeX` | force group size along X | - |
| `ForceGroupSizeY` | force group size along Y | - |
| `ForceHalfPromotion` | Force enable pass that replaces instructions using halfs with corresponding float counterparts | - |
| `ForceInlineExternalFunctions` | not to trim functions called from multiple kernels | Available |
| `ForceInlineStackCallWithImplArg` | If enabled, stack calls that uses implicit args will be force inlined. | Available |
| `ForceLowestSIMDForStackCalls` | If enabled, compile to the lowest allowed SIMD mode when stack calls or indirect calls are present | Available |
| `ForceMCFBarriers` | Force TCS pass with MCF (SW) barriers support. Default TCS pass is TCS pass with multiple continuation functions. | - |
| `ForceMixMode` | force enable mix mode even on platforms that do not support it | - |
| `ForceNoFP64bRegioning` | force regioning rules for FP and 64b FPU instructions | - |
| `ForceNonCoherentStatelessBTI` | Enable generation of non cache coherent stateless messages | - |
| `ForcePixelShaderSIMDMode` | Setting it to values def in igc.h will force SIMD mode compilation for pixel shaders. Note that only SIMD8 is compiled unless other ForcePixelShaderSIMD* are also selected | - |
| `ForcePrivateMemoryToGlobalOnGeneric` | Force moving private memory allocations to global buffer when generic pointer is present | Available |
| `ForcePrivateMemoryToSLMOnBuffers` | [POC] Force moving private memory allocations to SLM, semicolon-separated list of buffers. | - |
| `ForceSWCoalescingOfAtomicCounter` | Force software coalescing of atomic counter | - |
| `ForceSendsSupportOnSKLA0` | Allow sends on SKL A0, may be unsafe | - |
| `ForceSubroutineForEmulation` | Force subroutine call for all emulation functions if emulation(double) is on. | - |
| `FunctionCloningThreshold` | Limits the number of cloned functions when called from multiple function groups.<br/>    If number of cloned functions exceeds the threshold, compile the function only once and use address relocation instead.<br/>    Setting this to '0' allows IGC to choose the default value. | Available |
| `FunctionControl` | Control function inlining/subroutine/stackcall. See value defs in igc_flags.hpp. | Available |
| `FuseTypedWrite` | Enable fusing of simd8 typed write | - |
| `HPCFastCompilation` | Force to do fast compilation for HPC kernel | - |
| `HPCGlobalInstNumThreshold` | The threshold for the register pressure potential | - |
| `HPCInstNumThreshold` | The threshold for the register pressure potential | - |
| `HasDoubleAcc` | has doubled accumulators | - |
| `HybridRAWithSpill` | Did Hybrid RA with Spill | - |
| `InlinedEmulationThreshold` | Inlined instruction threshold for enabling subroutines | - |
| `KernelTotalSizeThreshold` | Trimming target of kernel total size | Available |
| `LTOForStage1Compilation` | LTO for stage 1 compilation | - |
| `LimitConstantBuffersPushed` | Limit max number of CBs pushed when SupportIndirectConstantBuffer is true | - |
| `MSAA16BitPayloadEnable` | Enable support for MSAA 16 bit payload, a hardware DCN supporting this from ICL+ to improve perf on MSAA workloads | - |
| `MaxPreRASchedulerRegPressureThreshold` | Max PreRA Scheduler Threshold | - |
| `MemOptWindowSize` | Size of the window in unit of instructions in which load/stores are allowed to be coalesced. Keep it limited in order to avoid creating long liveranges. Default value is 150 | - |
| `MidThreadPreemptionDisableThreshold` | Threshold to disable mid thread preemption | - |
| `NumGeneralAcc` | set the number [1-8] of general acc for accumulator substitution. 0 means using the platform-default value | - |
| `OCLInlineThreshold` | Setting OCL inline threshold | Available |
| `OverrideCsTileLayout` | Override compute walker tile layout. False is linear. True is TileY | Available |
| `OverrideCsTileLayoutEnable` | Enable overriding compute walker tile layout | Available |
| `OverrideCsWalkOrder` | Override compute walker walk order | Available |
| `OverrideCsWalkOrderEnable` | Enable overriding compute walker walk order | Available |
| `OverrideOCLMaxParamSize` | Override the value imposed on the kernel by CL_DEVICE_MAX_PARAMETER_SIZE. Value in bytes, if value==0 no override happens. | Available |
| `PartitionUnit` | Partition compilation unit | Available |
| `PartitionWithFastHybridRA` | Enable FastRA and HybridRA when partition is enabled | Available |
| `PixelShaderDoNotAbortOnSpill` | Do not abort on a spill | - |
| `PrintControlKernelTotalSize` | Print Control kernel total size | Available |
| `PrintControlUnitSize` | Print information about unit trimming | Available |
| `PrintFunctionSizeAnalysis` | Print analysis data of function sizes | Available |
| `PrintPartitionUnit` | Print information about compilation unit partitioning | Available |
| `RequestStage2` | Enable staged compilation via requesting stage 2 | - |
| `SaveRestoreIR` | Save/Restore IR for staged compilation to avoid duplicated compilations | - |
| `SelectiveFunctionControl` | Selectively enables FunctionControl for a list of line-separated function names in 'FunctionDebug.txt' in the IGC output dir.<br/>    When set by this flag, the functions in the FunctionDebug list will override the default FunctionControl mode. | - |
| `SetMaxPreRASchedulerRegPressureThreshold` | set Max PreRA Scheduler Threshold | - |
| `SkipTREarlyExitCheck` | Skip SIMD16 early exit check in ShaderCodeGen | - |
| `StagedCompilationExperiments` | Experiment with staged compilation when != 0 | - |
| `StripDebugInfo` | Strip debug info from LLVM IR lowered from input to IGC.<br/>    Possible values: 0 - don't strip, 1 - strip all, 2 - strip non-line info | - |
| `SubroutineInlinerThreshold` | Subroutine inliner threshold | - |
| `SubroutineThreshold` | Minimal kernel size to enable subroutines | - |
| `UnitSizeThreshold` | Compilation unit size threshold | Available |
| `UpConvertF16Sampler` | up-convert fp16 sampler message to return fp32 | - |
| `UseOldSubRoutineAugIntf` | Use the old subroutine augmentation code which is slower | - |
| `VATemp` | [temp]New code to replace code under EnableVATemp (removed already). Once stable, remove this. | - |
| `VFPackingDisablePartialElements` | disable packing for partial vertex element as it causes performance drops | - |
| `VariableReuseByteSize` | The byte size threshold for variable reuse | - |
| `cl_khr_srgb_image_writes` | Enable cl_khr_srgb_image_writes extension | - |
| `disableRemat` | disable re-materialization | - |
| `disableUnormTypedReadWA` | disable software conversion for UNORM surface in Dx10 | - |
| `disableVarSplit` | disable variable splitting | - |
| `forceGlobalRA` | force global register allocator | - |
| `forceSamplerHeader` | force sampler messages to use header | - |
| `samplerHeaderWA` | enable sampler header to solve HW WA | - |
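
Flags marked `Available` in the table above can be set in a release build with the `IGC_<flag>=<value>` syntax. The following is a minimal sketch, assuming that `PrintFunctionSizeAnalysis` is enabled with `1` and that `OCLInlineThreshold` takes an integer. The value `512` is chosen arbitrarily for illustration and is not a documented default.

```shell
# Illustrative values only; check igc_flags.h for each flag's type and default.
export IGC_PrintFunctionSizeAnalysis=1   # assumed boolean: print analysis data of function sizes
export IGC_OCLInlineThreshold=512        # assumed integer; 512 is an arbitrary example value
```

Flags marked `-` in the last column are not available in release builds, so setting them this way is not expected to have an effect there.
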
## Generating precompiled headers
| Flag  | Description | Release builds |
|:---- | :---- | :----: |
| `ApplyConservativeRastWAHeader` | Apply WaConservativeRasterization for the platforms enabled | - |
## Raytracing Options
| Flag  | Description | Release builds |
|:---- | :---- | :----: |
| `ContinuationInlineThreshold` | If number of continuations is greater than threshold, default to indirect | Available |
| `DeferCollectionStateObjectCompilation` | Wait to compile till the RTPSO stage | Available |
| `DisableCanonizationWA` | WA for A0 to inject shifts to canonize global and local pointers | Available |
| `DisableCompactifySpills` | Just emit spill/fill at the point of def/use | Available |
| `DisableCrossFillRemat` | Rematerialize values if they use already spilled values | Available |
| `DisableDPSE` | Disable Dead PayloadStore Elimination. | Available |
| `DisableEarlyRemat` | Disable quick remats to avoid some spills | Available |
| `DisableEntryFences` | Don't emit the evict and invalidate fences for A0 WA | - |
| `DisableExamineRayFlag` | Don't do IPO to see if we can fold control flow given knowledge of possible rayflag values | - |
| `DisableFuseContinuations` | If set, we will look for small duplicated continuations to merge into one. | Available |
| `DisableInvariantLoad` | Disabled !invariant_load metadata for raytracing shaders | Available |
| `DisableLSCControlsForRayTracing` | Disable different LSC Controls for HW and SW portions of the RTStack | Available |
| `DisableLateRemat` | Disable quick remats to avoid some spills | Available |
| `DisableMatchRegisterRegion` | Disable matching for debug purposes | Available |
| `DisableNullBVHCheck` | Disables default null check of the BVH ptr | Available |
| `DisablePayloadSinking` | sink stores to payload into inlined continuations | Available |
| `DisablePreSplitOpts` | Disable last minute optimizations before shader splitting | Available |
| `DisablePredicatedStackIDRelease` | Emit a single stack ID release at the end of the shader | Available |
| `DisablePrepareLoadsStores` | Disable preparation for MemOpt | Available |
| `DisablePromoteContinuation` | BTD-able continuations in the raygen may be moved to the shader identifier | - |
| `DisablePromoteToScratch` | Use scratch space rather than SWStack when possible. | Available |
| `DisableRTAliasAnalysis` | Disable Raytracing Alias Analysis | - |
| `DisableRTBindlessAccess` | do bindful rather than bindless accesses to raytracing memory | Available |
| `DisableRTFenceElision` | Disable optimization to remove unneeded fences | - |
| `DisableRTGlobalsKnownValues` | load MaxBVHLevels from RTGlobals rather than assuming = 2 | Available |
| `DisableRTMemDSE` | Analyze stores to SWStack, etc. that aren't read before Stack ID Release | - |
| `DisableRTStackOpts` | Disable some optimizations that minimize reads/writes to the RTStack | Available |
| `DisableRayTracingConstantCoalescing` | Disable coalescing | Available |
| `DisableRayTracingCustomTile` | Disables X,Y regkeys to pick a particular tile size (i.e., workgroup dimensions) | Available |
| `DisableRayTracingOptimizations` | Disable RayTracing Optimizations for debugging | Available |
| `DisableRaytracingIntrinsicAttributes` | Turn off noalias and dereferenceable attributes | Available |
| `DisableSWStackOffsetElision` | Avoid loading offsets when known at compile time | - |
| `DisableShaderFusion` | Don't check for duplicate, renamed shaders | - |
| `DisableSpillReorder` | Disables reordering of spills to try to minimize spills in a loop | - |
| `DisableStatefulRTStackAccess` | do stateless rather than stateful accesses to the HW portion of the async stack | Available |
| `DisableStatefulRTSyncStackAccess` | do stateless rather than stateful accesses to the HW portion of the sync stack | Available |
| `DisableStatefulRTSyncStackAccess4RTShader` | do stateless rather than stateful accesses to the HW portion of the sync stack. RT Shader only. | Available |
| `DisableStatefulRTSyncStackAccess4nonRTShader` | do stateless rather than stateful accesses to the HW portion of the sync stack. nonRT Shader only. | Available |
| `DisableStatefulSWHotZoneAccess` | do stateless rather than stateful accesses to the SW HotZone | Available |
| `DisableStatefulSWStackAccess` | do stateless rather than stateful accesses to the SW Stack | Available |
| `DisableWideTraceRay` | Disable SIMD16 style message payloads for send.rta | Available |
| `EnableCompressedRayIndices` | Use an alternate form with bit twiddling to pack stack pointer and indices into two DWORDs | Available |
| `EnableFillScheduling` | Schedule fills for reduced register pressure | - |
| `EnableHoistRemat` | Hoist rematerialized instructions to shader entry. Longer live ranges but common values fused. | Available |
| `EnableIndirectContinuations` | Enable BTD for continuation shaders (regardless of inline threshold). | Available |
| `EnableInlinedContinuations` | Forcibly inline all continuations | Available |
| `EnableKnownBTIBase` | For testing, assume that we know what baseBTI is in RTGlobals | Available |
| `EnableLSCCacheOptimization` | Optimize store instructions for utilizing the LSC-L1 cache | - |
| `EnableRQHideLatency` | Hide RayQuery Proceed latency. | - |
| `EnableRTDispatchAlongY` | Dispatch Compute Walker along Y first | Available |
| `EnableRTPrintf` | Enable printf for ray tracing. | Available |
| `EnableRayTracingCustomSubtile` | Enables X,Y regkeys to pick a particular subtile size (i.e., tile within a workgroup) | Available |
| `EnableRayTracingTGMFence` | Enable tgm fence in RT workloads for debugging | - |
| `EnableSingleRQMemRayStore` | Store RayQuery MemRay[TOP] only once. | - |
| `EnableSpillWidening` | Expand SWStack spills into padding of stack frame | Available |
| `EnableStackIDReleaseScheduling` | Schedule Stack ID Release messages prior to the end of the shader | - |
| `EnableSyncDispatchRays` | Enable sync DispatchRays implementation | - |
| `ForceCSLeastSIMD4RQ` | Force compute shader with RayQuery to the lowest allowed SIMD mode | - |
| `ForceCSSimdSize4RQ` | Force RayQuery compute shader SIMD size.<br/>    Valid values are 0 (not set), 8, 16 and 32.<br/>    Ignored if it produces an invalid configuration, e.g. SIMD size too small for workgroup size | Available |
| `ForceFirstFencesEvict` | Force evict fence op on fences prior to the stack ID release | Available |
| `ForceGenMemDefaultCacheCtrl` | If enabled, no message specific cache ctrls are set on memory outside of RTStack, SWStack, and SWHotZone | Available |
| `ForceGenMemLoadCacheCtrl` | Enables GenMemLoadCacheCtrl regkey for custom lsc load cache controls in other memory | Available |
| `ForceGenMemStoreCacheCtrl` | Enables GenMemStoreCacheCtrl regkey for custom lsc store cache controls in other memory | Available |
| `ForceNullBVH` | Swap BVH with null pointer. Infinitely fast ray traversal. | Available |
| `ForceRTCheckInstanceLeafPtr` | Check MemHit::valid before loading GeometryIndex, PrimitiveIndex, etc. | Available |
| `ForceRTCheckInstanceLeafPtrMask` | Test only. 1: committedindex; 2: potentialindex | Available |
| `ForceRTConstantBufferCacheCtrl` | Enables RTConstantBufferCacheCtrl regkey for custom lsc load cache controls for constant buffers | Available |
| `ForceRTRetry` | Raytracing is compiled in the second retry state | - |
| `ForceRTShortCircuitingOR` | Only for specific tests. Short-circuit the OR condition if CommittedGeometryIndex is used | Available |
| `ForceRTStackLoadCacheCtrl` | Enables RTStackLoadCacheCtrl regkey for custom lsc load cache controls in the RTStack | Available |
| `ForceRTStackStoreCacheCtrl` | Enables RTStackStoreCacheCtrl regkey for custom lsc store cache controls in the RTStack | Available |
| `ForceRayTracingSIMDWidth` | Valid values: 0 = default, 8 = SIMD8, 16 = SIMD16 | Available |
| `ForceSWHotZoneLoadCacheCtrl` | Enables SWHotZoneLoadCacheCtrl regkey for custom lsc load cache controls in the SWHotZone | Available |
| `ForceSWHotZoneStoreCacheCtrl` | Enables SWHotZoneStoreCacheCtrl regkey for custom lsc store cache controls in the SWHotZone | Available |
| `ForceSWStackLoadCacheCtrl` | Enables SWStackLoadCacheCtrl regkey for custom lsc load cache controls in the SWStack | Available |
| `ForceSWStackStoreCacheCtrl` | Enables SWStackStoreCacheCtrl regkey for custom lsc store cache controls in the SWStack | Available |
| `ForceWholeProgramCompile` | Compile as if we know all of the shaders upfront | Available |
| `KnownBTIBaseValue` | If EnableKnownBTIBase is set, use this value for baseBTI | Available |
| `PrintfBufferSize` | Set printf buffer size. Unit: KB. | Available |
| `RTFenceToggle` | Toggle fences | Available |
| `RTInValidDefaultIndex` | If MemHit::valid is false, the default value to return for some intrinsics like GeometryIndex or PrimitiveIndex etc. | Available |
| `RayTracingConstantCoalescingMinBlockSize` | Set the minimum load size in # OWords = [1,2,4,8,16]. When 0, # OWords = 4 | Available |
| `RayTracingCustomSubtileXDim2D` | X dimension of subtile (default: 4) | Available |
| `RayTracingCustomSubtileYDim2D` | Y dimension of subtile (default: 4) | Available |
| `RayTracingCustomTileXDim1D` | X dimension of tile (default: 256) | Available |
| `RayTracingCustomTileXDim2D` | X dimension of tile (default: 32) | Available |
| `RayTracingCustomTileYDim1D` | Y dimension of tile (default: 1) | Available |
| `RayTracingCustomTileYDim2D` | Y dimension of tile (default: 4) | Available |
| `RayTracingDumpYaml` | Dump yaml input/output files | Available |
| `RayTracingKeepUDivRemWA` | Workaround till jitIsa supports cr0 for rtz conversions | Available |
| `RematThreshold` | Tunes how aggressively we should remat values into continuations | Available |
| `ShaderFusionThrehold` | If there are fewer shaders than this, don't spend time checking duplicates | - |
| `TotalGRFNum4RQ` | Total GRF used for register allocation for RayQuery only. Test only. Delete later. | - |
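
Some raytracing flags work in pairs: one flag enables a feature and others supply its values. The sketch below forces SIMD16 using a value listed for `ForceRayTracingSIMDWidth`, then enables a custom subtile and sets its dimensions. The subtile size of 8x4 is an illustrative assumption; whether a given size is accepted for a particular workgroup is not checked here.

```shell
# Force the raytracing SIMD width (documented values: 0 = default, 8, 16).
export IGC_ForceRayTracingSIMDWidth=16

# Enable the custom subtile, then pick its X and Y dimensions.
# 8x4 is an arbitrary example; the documented defaults are 4 and 4.
export IGC_EnableRayTracingCustomSubtile=1
export IGC_RayTracingCustomSubtileXDim2D=8
export IGC_RayTracingCustomSubtileYDim2D=4
```

To return to the default behavior, `unset` the variables in the same shell before running the application again.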
