I want to create a computational block that uses AVX when it is available. To do that, I want to write a template parameterized on the concrete AVX level, compile all versions, and at runtime use the one supported by the local CPU. I plan for it to look something like this (for simplicity, assume that at least one AVX level is available):

    #include <immintrin.h>
    #include <type_traits>

    enum AvxT : int { AVX_512, AVX_256, AVX_128 };

    template<AvxT E>
    class AvxCalc
    {
    public:
        using Cell = std::conditional_t<E == AvxT::AVX_128, __m128i,
                     std::conditional_t<E == AvxT::AVX_256, __m256i, __m512i>>;

        AvxCalc() = default;
        explicit AvxCalc(Cell value) : data(value) {} // needed by operator+ below

        template<AvxT T = E>
        std::enable_if_t<T == AvxT::AVX_128, AvxCalc<E>>
        operator+(const AvxCalc<E> & other) const;
        // ... [the other AvxT declarations]

    private:
        Cell data;
    };

    template<AvxT E>
    template<AvxT T>
    std::enable_if_t<T == AvxT::AVX_128, AvxCalc<E>>
    AvxCalc<E>::operator+(const AvxCalc<E> & other) const
    {
        return AvxCalc<E>(_mm_add_epi32(data, other.data));
    }
    // ... [the other AvxT implementations]

Somewhere else there will be a router:

    if (haveAvx128()) // non-constexpr router
    {
        AvxCalc<AvxT::AVX_128> calc;
        // ... [do calculations]
    }

The question is: if this is compiled on a machine capable of any of the AVX variants and then distributed, will it crash on CPUs that are not compatible with, say, AVX-512? Is this idea actually viable at all?
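
For illustration, a minimal sketch of such a runtime router, assuming GCC or Clang on x86. The names `Level`, `detectLevel`, `addScalar`, `addAvx2` and `addDispatch` are hypothetical and not part of the code above. The check uses the compiler builtin `__builtin_cpu_supports`, and the AVX2 kernel is compiled with a per-function `target` attribute, so the rest of the file is built without AVX flags and AVX2 instructions are only emitted inside the guarded function. On other compilers or architectures the sketch falls back to the scalar path.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

// Hypothetical instruction-set levels (only two, to keep the sketch short).
enum class Level { Scalar, Avx2 };

// Runtime CPU check. Assumption: GCC/Clang builtins; MSVC would need
// a different mechanism (for example __cpuid).
inline Level detectLevel() {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return Level::Avx2;
#endif
    return Level::Scalar;
}

// Portable fallback: element-wise 32-bit integer addition.
void addScalar(const std::int32_t* a, const std::int32_t* b,
               std::int32_t* out, int n) {
    for (int i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if HAVE_X86_DISPATCH
// Only this function is compiled with AVX2 enabled.
__attribute__((target("avx2")))
void addAvx2(const std::int32_t* a, const std::int32_t* b,
             std::int32_t* out, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {  // 8 x int32 per 256-bit register
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i),
                            _mm256_add_epi32(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // remaining elements
}
#endif

// The router: chooses a kernel at runtime, never at compile time.
void addDispatch(const std::int32_t* a, const std::int32_t* b,
                 std::int32_t* out, int n) {
    assert(n >= 0);
#if HAVE_X86_DISPATCH
    if (detectLevel() == Level::Avx2) {
        addAvx2(a, b, out, n);
        return;
    }
#endif
    addScalar(a, b, out, n);
}
```

Calling `addDispatch(a, b, out, n)` gives the same result on either path. The point of the sketch is that wider instructions are only reached behind the runtime check. Whether the same holds for a class template such as `AvxCalc` depends on how each specialization is compiled (for example per-function `target` attributes or separate translation units with different flags), which this sketch does not cover.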
