Making non-CUDA code compatible with CUDA by using std functions where cuda::std equivalents exist


I'm attempting to write a function that bit-casts a float/double/float16 to an unsigned integer of the same size. In host code this is straightforward:

template<std::floating_point T>
[[nodiscard]] constexpr auto bit_cast_unsigned(T t) {
    if constexpr (sizeof(T) == 1) { return std::bit_cast<std::uint8_t>(t); }
    if constexpr (sizeof(T) == 2) { return std::bit_cast<std::uint16_t>(t); }
    if constexpr (sizeof(T) == 4) { return std::bit_cast<std::uint32_t>(t); }
    if constexpr (sizeof(T) == 8) { return std::bit_cast<std::uint64_t>(t); }
}

However, I also want to support this in device code, and the function has to work at runtime, not just at compile time, so I have to do the following:

template<std::floating_point T>
[[nodiscard]] __host__ __device__ constexpr auto bit_cast_unsigned(T t) {
    if constexpr (sizeof(T) == 1) { return cuda::std::bit_cast<std::uint8_t>(t); }
    if constexpr (sizeof(T) == 2) { return cuda::std::bit_cast<std::uint16_t>(t); }
    if constexpr (sizeof(T) == 4) { return cuda::std::bit_cast<std::uint32_t>(t); }
    if constexpr (sizeof(T) == 8) { return cuda::std::bit_cast<std::uint64_t>(t); }
}

The problem is that this doesn't compile in a codebase that isn't using CUDA, and I don't want to make CUDA a requirement for this function to exist. But I still want to use the cuda::std:: equivalents when they are available.

Is there a way to do something like:

#if defined(CUDA_ENABLED)
    #include <cuda/bit>
    #define PROJECT_HOST_DEVICE_STD_NAMESPACE cuda::std
    #define PROJECT_HOST_DEVICE __host__ __device__
#else
    #include <bit>
    #define PROJECT_HOST_DEVICE_STD_NAMESPACE std
    #define PROJECT_HOST_DEVICE
#endif

...

template<std::floating_point T>
[[nodiscard]] PROJECT_HOST_DEVICE constexpr auto bit_cast_unsigned(T t) {
    if constexpr (sizeof(T) == 1) { return PROJECT_HOST_DEVICE_STD_NAMESPACE::bit_cast<std::uint8_t>(t); }
    if constexpr (sizeof(T) == 2) { return PROJECT_HOST_DEVICE_STD_NAMESPACE::bit_cast<std::uint16_t>(t); }
    if constexpr (sizeof(T) == 4) { return PROJECT_HOST_DEVICE_STD_NAMESPACE::bit_cast<std::uint32_t>(t); }
    if constexpr (sizeof(T) == 8) { return PROJECT_HOST_DEVICE_STD_NAMESPACE::bit_cast<std::uint64_t>(t); }
}

I've seen posts like "CUDA and nvcc: using the preprocessor to choose between float or double" which suggest keying on __CUDACC__. However, I'm worried I'll run into linker issues or similar if one translation unit compiles bit_cast_unsigned in a .cpp file without CUDA enabled (e.g. pulled in transitively through vcpkg), while another piece of code, either in a CUDA-enabled library or a final CUDA-enabled executable, uses the same bit_cast_unsigned with CUDA enabled.

Can I use __CUDACC__ like this without issue across CUDA and non-CUDA codebases (which may even be mixed together, as in vcpkg), or is there another solution?
