Skip to content

GitLab

  • Menu
Projects Groups Snippets
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
  • pystencils pystencils
  • Project information
    • Project information
    • Activity
    • Labels
    • Planning hierarchy
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 17
    • Issues 17
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 4
    • Merge requests 4
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • pycodegen
  • pystencilspystencils
  • Merge requests
  • !225

Closed
Created Feb 26, 2021 by Michael Kuron@kuronMaintainer
  • Report abuse
Report abuse

WIP: ARM cache line zeroing

  • Overview 0
  • Commits 1
  • Pipelines 3
  • Changes 1

ARM has a cache line zero instruction that prevents data that will be overwritten anyway from being loaded from RAM. Kind of a light version of a non-temporal store. Saw this near the end of https://www.youtube.com/watch?v=BP7XD7JHgrI in the context of SVE, but might be relevant on Neon too. Just wanted to keep a note of this here.

Integrating this into pystencils is probably not completely straight-forward as you first need to check how much would be zeroed (64 bytes on all current chips, not guaranteed to match the cache line size), zero it, and then write the corresponding amount of data. Not sure if there are guarantees as to whether it's a multiple of the vector width.

There is not a whole lot of information for ARM, but the exact same thing has existed on IBM‘s PowerPC architecture (!228 (merged)) for decades. There, a cache line has 128 bytes (can be queried from the kernel via sysconf(_SC_LEVEL1_DCACHE_LINESIZE)) and can be zeroed with the __dcbz intrinsic.

Edited Mar 18, 2021 by Michael Kuron
Assignee
Assign to
Reviewer
Request review from
Time tracking
Source branch: cachelinezero