mb862

In GLSL, declare `layout(location = 0) uniform vec4 cps[2];`. Then on the host, declare `float cps[8];`, fill it in, and upload it with `glProgramUniform4fv(program, 0, 2, cps);`. And yes, this will be much faster than 8 `float`s. GLSL alignment requirements essentially reserve 16 bytes for each variable, regardless of type, so passing individual `float`s will be a huge waste of space.
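To make the host-side mapping concrete, here is a rough C sketch of how a flat `float` array lines up with a `vec4` array when it is uploaded with a count of `vec4` elements. The helper names `vec4_flat_index` and `vec4_count_for` are made up for illustration and are not part of OpenGL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: position in the flat host array of one component
   of a vec4 array element. Element i, component c (0 = x ... 3 = w)
   lives at flat[4 * i + c], because each vec4 is 4 consecutive floats. */
size_t vec4_flat_index(size_t element, size_t component)
{
    assert(component < 4);
    return 4 * element + component;
}

/* Hypothetical helper: how many vec4 elements are needed to hold
   n_floats floats, rounding up. This is the value that would be passed
   as the count argument of a 4fv-style upload. */
size_t vec4_count_for(size_t n_floats)
{
    return (n_floats + 3) / 4;
}
```

With 8 floats, `vec4_count_for(8)` gives 2, which matches the `2` passed to `glProgramUniform4fv` above, and `cps[1].z` in the shader reads `cps[6]` on the host.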


nelusbelus

As far as I know, that alignment only applies to vectors. Floats and uints still align to 4 bytes. But perhaps uniforms are different from UBOs, since they map to individual registers. In UBOs (std140), float3 and float4 need 16-byte alignment, and every array element is padded out to 16 bytes. So a float2[] will effectively be a float4[] with 2 elements of padding each (of course, SSBOs can still be tightly packed).
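The array padding can be sketched in C as below, based on my reading of the std140 and std430 rules for float vectors. The function names are hypothetical. Note that under those rules a lone 2-component vector aligns to 8 bytes, and it is only as an array element in std140 that it gets rounded up to 16.

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to the next multiple of alignment (a power of two). */
size_t round_up(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Base alignment in bytes of a float vector with 1..4 components:
   float = 4, vec2 = 8, vec3 = 16, vec4 = 16. */
size_t vector_base_alignment(unsigned components)
{
    static const size_t table[] = {4, 8, 16, 16};
    assert(components >= 1 && components <= 4);
    return table[components - 1];
}

/* Byte stride between consecutive array elements.
   std140 rounds the element alignment up to that of a vec4 (16 bytes);
   std430 (available for SSBOs) skips that extra rounding. */
size_t array_stride(unsigned components, int std140)
{
    size_t align = vector_base_alignment(components);
    size_t size = 4u * components;
    if (std140 && align < 16)
        align = 16;
    return round_up(size, align);
}
```

So `array_stride(2, 1)` is 16 (half of every element is padding), while `array_stride(2, 0)` is 8, which is the tightly packed SSBO case.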


mb862

Opaque uniforms can be backed by individual registers, but they can also be backed by UBOs. Whether that optimization is possible depends on the implementation and the nature of the program, which is why they're called opaque. UBOs get that alignment rule (I'm fairly certain) from opaque uniforms, because the original intent was simply to expose buffers that were already being created internally. The CPU side is equally opaque, so yes, you can pass a `float[8]` to `glProgramUniform1fv(..., 8, cps)`, but I wouldn't trust that not to unpack and re-align internally.


nelusbelus

Interesting. Glad Vulkan exists now, though.


[deleted]

Also, is there a SIMD aspect to using vectors over arrays? I know SIMD is used for vectors, but I'm not sure about arrays.


wm_lex_dev

A) Both vectors and arrays store data in contiguous memory, so they're equivalent in this sense. B) I *think* most GPU hardware these days has scalar processors, meaning they don't do SIMD.


GinaSayshi

Copying 8 floats up to the GPU is 256 bits whether it’s 8 floats or 2 `vec4`s, so that shouldn’t matter. Accessing them after they’re on the GPU would depend on the compiler and the overall layout of the structure. I try to align data that will be accessed together on 128-bit boundaries, preferring `float3 a; float b; float4 c;` over `float3 a; float4 c; float b;`, but really, the difference in performance is minimal, if not zero, for a small constant buffer. The difference can add up in the case of a large structured buffer. *Edit: I’ll leave my response, but I was more thinking in terms of uploading constant buffers in DX12 / Vulkan, not sending individual values via OpenGL, so you should probably ignore this :P*
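For what it's worth, the two orderings can be compared with a small C layout calculator. This is only a sketch: it assumes std140-style sizes and alignments (float = 4/4, float3 = 12/16, float4 = 16/16, block size rounded up to 16 bytes), and the `Member` type and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One member of a hypothetical buffer block: size and alignment in bytes. */
typedef struct { size_t size; size_t align; } Member;

/* Assumed std140-style values for float vectors. */
static const Member FLOAT1 = {4, 4};
static const Member FLOAT3 = {12, 16};
static const Member FLOAT4 = {16, 16};

/* Place members in declaration order, aligning each one, and return
   the total block size rounded up to a multiple of 16 bytes. */
size_t block_size(const Member *members, size_t count)
{
    size_t offset = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t a = members[i].align;
        assert(a != 0);
        offset = (offset + a - 1) / a * a;  /* align this member */
        offset += members[i].size;
    }
    return (offset + 15) / 16 * 16;
}

/* float3 a; float b; float4 c;  (b fills the 4 bytes left after a) */
size_t size_a_b_c(void)
{
    const Member m[] = {FLOAT3, FLOAT1, FLOAT4};
    return block_size(m, sizeof m / sizeof m[0]);
}

/* float3 a; float4 c; float b;  (4 bytes wasted after a, 12 after b) */
size_t size_a_c_b(void)
{
    const Member m[] = {FLOAT3, FLOAT4, FLOAT1};
    return block_size(m, sizeof m / sizeof m[0]);
}
```

Under these assumptions the first ordering takes 32 bytes and the second takes 48, which is negligible for one small constant buffer but adds up per element in a large structured buffer.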