Use static inline functions in a header for CPU feature detection.
The C files are already compiled and linked with SIMD support, so the
compiler might already have emitted instructions from that feature set
in them, including in any detection code placed there.
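A minimal sketch of the idea, assuming GCC or Clang on x86 (it uses
the compiler builtin __builtin_cpu_supports). The check is a static
inline function in a header, so it is compiled into each including
translation unit with that unit's baseline flags rather than with the
SIMD flags of the optimized .c file. All sketch_* names are
hypothetical, and the SSE4.1 path is only a placeholder.

```c
#include <stdbool.h>

/* --- hypothetical header contents, e.g. "sketch_cpu.h" ---
 * static inline: compiled as part of the including translation unit,
 * with that unit's (non-SIMD) compiler flags. */
static inline bool sketch_have_sse41(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	return __builtin_cpu_supports("sse4.1") != 0;
#else
	return false; /* unknown arch/compiler: never pick the SIMD path */
#endif
}

/* Generic C implementation, safe on every CPU. */
static int sketch_sum_generic(const int* v, int n)
{
	int s = 0;
	for (int i = 0; i < n; i++)
		s += v[i];
	return s;
}

/* Placeholder for the optimized path. In the real build this would
 * live in a .c file compiled with SIMD flags (e.g. -msse4.1) and use
 * intrinsics; here it just forwards to the generic version. */
static int sketch_sum_sse41(const int* v, int n)
{
	return sketch_sum_generic(v, n);
}

/* Dispatcher compiled with baseline flags: the detection runs here,
 * outside the SIMD-compiled file. */
static int sketch_sum(const int* v, int n)
{
	if (sketch_have_sse41())
		return sketch_sum_sse41(v, n);
	return sketch_sum_generic(v, n);
}
```

Placing the check in the SIMD-compiled file instead would let the
compiler use SSE4.1 instructions inside the check itself, which could
fault on exactly the CPUs the check is meant to reject.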
NEON-optimized code might be used in multiarch/universal builds, so do
not guard it with WITH_NEON alone; also check the architecture defines
from winpr/platform.h.
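A sketch of the double guard, assuming a universal build where
WITH_NEON is set for every architecture slice. To stay self-contained
it derives hypothetical SKETCH_ARM64/SKETCH_ARM32 macros from compiler
predefines instead of including winpr/platform.h. It treats NEON as
always present on AArch64 (as mainstream toolchains do) and, on 32-bit
ARM Linux, queries the kernel via getauxval(AT_HWCAP).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the architecture defines the commit takes
 * from winpr/platform.h; derived here from compiler predefines. */
#if defined(__aarch64__) || defined(_M_ARM64)
#define SKETCH_ARM64 1
#elif defined(__arm__) || defined(_M_ARM)
#define SKETCH_ARM32 1
#endif

#if defined(WITH_NEON) && defined(SKETCH_ARM32) && defined(__linux__)
#include <asm/hwcap.h> /* HWCAP_NEON */
#include <sys/auxv.h>  /* getauxval, AT_HWCAP */
#endif

/* static inline header-style check, guarded by BOTH WITH_NEON and the
 * target architecture. */
static inline bool sketch_have_neon(void)
{
#if defined(WITH_NEON) && defined(SKETCH_ARM64)
	return true; /* Advanced SIMD assumed present on AArch64 */
#elif defined(WITH_NEON) && defined(SKETCH_ARM32) && defined(__linux__)
	return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
	/* NEON disabled, or WITH_NEON is set but this is e.g. the x86_64
	 * slice of a universal binary: never select the NEON path. */
	return false;
#endif
}

/* Hypothetical selector showing how a caller would pick a code path. */
static inline const char* sketch_select_path(void)
{
	return sketch_have_neon() ? "neon" : "generic";
}
```

With WITH_NEON alone, the x86_64 slice of a universal build would try
to compile or select NEON code; the architecture check keeps that slice
on the generic path.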