I never tried it, but I expect the performance would not be good enough; it already takes several minutes with code written specifically to perform the calculations I am interested in. And because I don't know exactly what I am looking for, I need more or less interactive speed so that I can try out many different ways of looking at the data. But maybe I could use [materialized] views to convey enough information to the query planner about how to carry out the calculations efficiently, or maybe I am underestimating how good query planners are. I just have a gut feeling that performing a lot of aggregation will make a database do a lot of unnecessary work. But maybe I should, and will, try loading the data into SQL Server and see what happens.
The other thing is that SQL does not seem like the best fit to me. Say you just want to know how many events occurred in each hour of the last three months: that is straightforward grouping and counting at first, but already rounding the timestamps down to the hour is not as obvious as it should be. And if there was no event in a specific hour, your result will simply have no row for that hour instead of a row saying there were zero events. This in turn causes more trouble if you want to build a histogram showing in how many hours there were, say, 0 to 9, 10 to 19, 20 to 29, and so on, events. Certainly still doable in SQL, but we are already entering territory where writing a single query will take most people several hours to get the desired result.
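To make the missing-rows problem concrete, here is a minimal sketch using Python's sqlite3 with a tiny made-up event table (the table name, timestamps, and date range are all invented for illustration; a SQL Server version would differ in its date functions). The naive GROUP BY silently drops the empty hour, while generating the hours with a recursive CTE and LEFT JOINing the events onto them yields an explicit zero row:

```python
import sqlite3

# Hypothetical in-memory event table standing in for the real log.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ts TEXT)")
con.executemany(
    "INSERT INTO events VALUES (?)",
    [("2024-01-01 10:15:00",), ("2024-01-01 10:45:00",), ("2024-01-01 12:05:00",)],
)

# Naive approach: truncate each timestamp to the hour and count.
# The 11:00 hour had no events, so it is simply missing from the result.
naive = con.execute(
    "SELECT strftime('%Y-%m-%d %H:00:00', ts) AS hour, COUNT(*) "
    "FROM events GROUP BY hour ORDER BY hour"
).fetchall()

# Gap-filled approach: generate every hour in the range with a
# recursive CTE and LEFT JOIN the events onto it, so empty hours
# show up with a count of zero.
filled = con.execute("""
    WITH RECURSIVE hours(hour) AS (
        SELECT '2024-01-01 10:00:00'
        UNION ALL
        SELECT strftime('%Y-%m-%d %H:00:00', datetime(hour, '+1 hour'))
        FROM hours WHERE hour < '2024-01-01 12:00:00'
    )
    SELECT h.hour, COUNT(e.ts)
    FROM hours h
    LEFT JOIN events e ON strftime('%Y-%m-%d %H:00:00', e.ts) = h.hour
    GROUP BY h.hour ORDER BY h.hour
""").fetchall()
```

The gap-filled result is also the natural input for the histogram of "how many hours had 0 to 9, 10 to 19, ... events", since the zero-count hours now exist as rows.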
I also couldn't easily tell how to express calculating the 99th percentile of the event size for every day of the week and hour of the day. I am pretty sure it is possible, but I guess it would also be pretty unreadable unless you put quite a bit of effort into creating utility functions instead of hacking together one huge SQL statement. Then again, I don't really know much about the more recent SQL features for partitioning and aggregating; maybe I should have a closer look at those first.
Right now it is just an effort to develop a tool to diagnose, and hopefully thereafter fix, random performance problems we are experiencing with one of our applications in production. Despite having a small team dedicated to investigating the problems, monitoring every click and function call with Dynatrace, having had a Microsoft SQL Server expert look into it, and getting the system audited by one of the big consulting companies, the problem has persisted for years and nobody really has any clue what is going wrong.
The performance is never really great. It is [one of] the central applications of the company and depends on interaction with a sizable chunk of a system landscape developed over decades, and is therefore prone to being affected by incidents in a lot of systems, but most of the time it is good enough. But once every couple of weeks or months something goes badly wrong and requests (it is a web application) start taking several seconds or even minutes to complete. Minutes later everything is back to normal.
But I digress. If I managed to come up with a reusable and somewhat general tool to analyze data similar to what I am looking at, I would consider releasing it. It could either be a somewhat general data analysis and visualization tool, think R, or it could be tailored more specifically towards looking for anomalies in data sets like the one I am investigating. But as of now I am struggling to come up with a general framework to express the analyses I am performing, and therefore all I have is a rather ad hoc collection of transformations that extract and visualize aspects of the data that could lead to new insights into what is going on.
But right now it is really driven by our specific issue: I notice something in one view of the data and then come up with a new transformation to look at it in more detail or from a different angle. It is nothing that could easily be reused by anyone else, so for the moment it seems most likely that this will never become public, or maybe only in the form of a blog article explaining what kind of information might be useful to look at and how to derive it from logs that look rather uninteresting at first glance.