Yes, I wrote one a long time ago that exposed the basic ARMv7 PMU counters to userspace, where they're normally not accessible except through perf_events. I did it because perf wasn't available (I think the shitty old vendor BSP I was using couldn't support it for some reason), and because perf is relatively costly anyway (perf_event_open is a whole syscall, plus another syscall to `read` the event info). I just wanted some basic cycle timings for some cryptography code I wrote, to see what worked and what didn't.
It's very simple and has some annoying gotchas (ARM cores can sleep or scale frequency on demand with no forewarning, so cycles aren't always the most precise unit of measurement), but this thing ended up being mildly popular. Even though I wrote it years ago, someone from CERN recently emailed me to say they happily used it for work, and someone from Samsung ported it to ARMv8 for me...
(I should dust off my boards one of these days and clean it up again, perhaps! People still email me about it.)
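
For anyone curious what the userspace side of something like this tends to look like on ARMv7, here is a minimal sketch. It is not the module's actual code. It assumes a kernel module has already enabled user-mode access (the EN bit in PMUSERENR) and started the cycle counter; without that, the `mrc` below traps in user mode. The names `read_cycles` and `cycles_elapsed` are made up for illustration, and the non-ARM branch is only a stand-in so the sketch builds elsewhere.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read the 32-bit cycle counter.
 * On ARMv7 this reads PMCCNTR through CP15 (c9, c13, 0).
 * ASSUMPTION: a kernel module has set PMUSERENR.EN and enabled the
 * counter beforehand; otherwise this instruction faults in user mode. */
static inline uint32_t read_cycles(void)
{
#if defined(__arm__) && !defined(__aarch64__)
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
    return value;
#else
    /* Stand-in for non-ARMv7 builds: processor time ticks, NOT cycles. */
    return (uint32_t)clock();
#endif
}

/* Hypothetical helper: counts elapsed between two reads.
 * Unsigned subtraction gives the right answer across a single
 * wraparound of the 32-bit counter. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

Typical use would be to call `read_cycles()` before and after the code under test and pass both values to `cycles_elapsed()`. The caveat about sleep states and frequency scaling still applies: the result is a cycle count, not wall-clock time.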