If there's a benchmark, people will cheat, lie, and optimize for that benchmark. Honesty depends on the compliance enforced on teams. But if compliance itself is weak, it is going to be taken advantage of. It's like growing up in India: you optimize for the exam and not for what you learn from it.
[1] https://news.ycombinator.com/item?id=47920787
We want to migrate away from InfluxDB eventually (because of their 180 on OSS, and their tendency to reinvent the product every major release), and QuestDB seems like an interesting option.
My only complaints are:
1) Memory usage is a bit high. We went with the AWS instance they recommended in the docs, and even that went over our provisioned memory. It's not much, but I think it could be improved.
2) You need to buy their enterprise plan if what you're storing is remotely sensitive like health data, PII, etc. Any row level security or credential features are locked behind that license. Our use case isn't that sensitive so we can get away with putting it in a VPN and password protecting it, but if you need DB-level security the FOSS license is severely behind Postgres in terms of features.
Other than that, it's never gone down, it's very, very fast, and it comes with its own web UI for querying your data. We migrated from AWS Timestream and couldn't be happier with the switch.
I do sympathize with OP, though. Their objection to measuring cold-start queries is incomplete without also describing how often a cold start needs to happen. If you restart once every five years, then it doesn't matter as much if it takes 20 minutes to get warm. If you restart every hour, that would be a real problem.
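To put rough numbers on that, here is a minimal sketch of the arithmetic. The 20-minute warm-up and the two restart intervals are the ones from the comment above; the function name `cold_fraction` is just something I made up for illustration, and it assumes the warm-up time is the same after every restart.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def cold_fraction(warmup_minutes: float, restart_interval_minutes: float) -> float:
    """Fraction of wall-clock time spent warming up (hypothetical helper).

    Assumes every restart costs the same warm-up time and caps the
    result at 1.0 for the case where warm-up outlasts the interval.
    """
    if restart_interval_minutes <= 0:
        raise ValueError("restart interval must be positive")
    return min(warmup_minutes / restart_interval_minutes, 1.0)


# Restart once every five years vs. once every hour, 20 minutes to warm up.
every_five_years = cold_fraction(20, 5 * MINUTES_PER_YEAR)
every_hour = cold_fraction(20, 60)

print(f"restart every 5 years: {every_five_years:.6%} of time cold")
print(f"restart every hour:    {every_hour:.1%} of time cold")
```

With hourly restarts roughly a third of your time is spent cold, while at one restart per five years the cold period is a rounding error. That is why the restart frequency has to be stated alongside the cold-start number.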
I don't think this is an oversight; it is just what they found to be feasible. This is explicitly written in [1]. Also, the guy who set up this benchmark is very serious about benchmarking under difficult conditions [2].
My personal opinion is that you need a massive amount of data and a massive number of different variables to test separately. For example, you might want to monitor how many cache misses/hits there were, p99 latency, etc. And you want to do it under full load, expected load, etc. And you want to compare different versions of the same database, because comparing different databases makes things combinatorially more difficult, unless you have a real production use case that you are optimizing for, ofc.
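As a rough illustration of two of those variables, here is a sketch of computing p50/p99 latency and a cache hit ratio from raw samples. The sample data is invented, and `percentile` and `hit_ratio` are hypothetical helpers, not part of any benchmark tool. The percentile uses the nearest-rank definition; other tools may interpolate and give slightly different values.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def hit_ratio(hits, misses):
    """Cache hits as a fraction of all lookups (0.0 if there were none)."""
    total = hits + misses
    return hits / total if total else 0.0


# Invented data: 97 fast queries and 3 slow ones, in milliseconds.
latencies_ms = [12.0] * 97 + [400.0, 450.0, 480.0]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"p50={p50} ms  p99={p99} ms  mean={mean:.1f} ms")
print(f"cache hit ratio: {hit_ratio(900, 100):.0%}")
```

The median looks great here while p99 shows the slow tail, which is why a single average number per query tells you so little. You would want these per load level and per database version, and that is where the combinatorics start to hurt.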
The SwissTable talk at CppCon is a good example of a useful benchmark and optimization, and it shows how difficult it is to really assess the performance effects of even "small" changes. [3]
[1] https://github.com/ClickHouse/ClickBench#data-loading
Benchmarking is hard, no argument from me!