Support Windows on ARM64
When I was working on !321 (merged), it occurred to me that Windows also runs on ARM64 nowadays. So here is a patch to make pystencils run there. It only required some minor workarounds, including one for the lack of inline assembly in MSVC on ARM64 (which makes cacheline clearing impossible). ARM64 implies Neon, and MSVC does not support SVE, so CPU capability detection is as easy as on macOS on ARM64.
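
To illustrate the capability detection described above, here is a minimal sketch. It is not the code from this patch: the function name `guess_arm64_instruction_sets` and its return convention are made up for illustration, and it only assumes that `platform.machine()` reports `ARM64` (Windows) or `arm64` (macOS) on these systems.

```python
import platform


def guess_arm64_instruction_sets(system=None, machine=None):
    """Hypothetical helper, not part of pystencils.

    Returns ["neon"] on Windows/macOS ARM64, where no further probing
    is needed, and None whenever this shortcut does not apply.
    """
    # Fall back to the running interpreter's platform if nothing is passed in.
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()

    if machine not in ("arm64", "aarch64"):
        return None  # not an ARM64 machine

    if system in ("windows", "darwin"):
        # ARM64 implies Neon; SVE is not available with MSVC,
        # and macOS on ARM64 is treated the same way.
        return ["neon"]

    # Other operating systems would need real feature detection.
    return None
```

Calling `guess_arm64_instruction_sets("Windows", "ARM64")` yields `["neon"]`, while a Linux `aarch64` system yields `None`, signalling that actual feature detection is still required there.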