DORSETRIGS

sse (10 posts)



SSE2 Color Blending

Speeding Up Color Blending with SSE2 Instructions. Color blending is a fundamental operation in computer graphics used to create realistic and visually appealin

2 min read 06-10-2024 52

GCC generates slow code when targeting more recent sse version

Why Your GCC-Compiled Code Is Slow: The SSE Version Conundrum. Modern CPUs boast advanced instruction sets like SSE (Streaming SIMD Extensions) to accelerate perfor

2 min read 04-10-2024 47

inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch

Understanding Inlining Failures in SIMD Programming: The Case of _mm_mullo_epi32. When working with SIMD (Single Instruction, Multiple Data) operations in C or C++ deve

2 min read 28-09-2024 52

Why does removing instructions from my SSE intrinsic function make it slower?

Why Does Removing Instructions from My SSE Intrinsic Function Make It Slower? When optimizing code that employs SIMD (Single Instruction, Multiple Data) capabilitie

3 min read 23-09-2024 70

Push XMM register to the stack

Pushing and Popping XMM Registers to the Stack. Many developers working with x86 assembly language encounter the need to store and retrieve values held in XMM re

2 min read 07-09-2024 44

Per-element atomicity of vector load/store and gather/scatter?

Diving Deep into Per-Element Atomicity of Vector Operations on x86. This article delves into the complex world of vector load/store, gather, and scatter instructio

2 min read 06-09-2024 46

Speed-up byte signature scanning in memory using SIMD

Supercharge Your Byte Signature Scanning with SIMD. Finding specific byte sequences within large blocks of memory is a common task in many applications, from secu

2 min read 29-08-2024 77

How to implement real-time responses in a Flask-based chatbot with OpenAI Assistants API?

Implementing Real-Time Responses in a Flask-Based Chatbot with OpenAI Assistants API. This article explores how to implement real-time responses in a Flask-base

3 min read 28-08-2024 45

Twice as slow SIMD performance without extra copy

Twice as slow SIMD performance without extra copy. This article explores a puzzling performance disparity observed in SIMD (Single Instruction, Multiple Data) code

2 min read 28-08-2024 58

Why CSAPP say Gcc do not use vcvtss2sd?

Why CSAPP says GCC does not use vcvtss2sd. The statement in the book Computer Systems: A Programmer's Perspective, 3rd Edition, about GCC not using vcvtss2sd for sin

2 min read 27-08-2024 50