Show HN: Oodle – serverless, fully-managed, drop-in replacement for Prometheus
blog.oodle.ai

Hello HN!
My co-founder, Vijay, and I are excited to open up Oodle for everyone.
We used to be observability geeks at Rubrik, where our metrics bills grew roughly 20x over 4 years. We tried to control spend by getting better visibility and by blocking high-cardinality labels like pod_id, cluster_id, and customer_id. But that made debugging issues complicated: app engineers hated having metrics blocked, and blocking others' code reviews was no fun for platform engineers either! Migrations (and lock-ins) were also very painful: the first migration, from Influx to SignalFx, took 6+ months, and the second, from Splunk, took over a year.
Oodle is taking a new approach to building a cost-efficient, serverless metrics observability platform. It delivers fast performance at high scale. We use a custom storage format on S3, tuned for metrics data, and queries run serverless. The hard part is achieving fast performance while optimizing for cost (every CPU cycle and every byte of storage/memory counts!). We've written about the architecture in more detail on our blog: https://blog.oodle.ai/building-a-high-performance-low-cost-m...
Try out our playground with 13M+ active time series/hr & 13B+ samples/day: https://play.oodle.ai
Explore all features with live data via Quick Signup: https://us1.oodle.ai/signup
- Instant exploration (<5 min): run one command to stream synthetic metrics to your account
- Easy integration (<15 min): explore with your own data from an existing Prometheus or OTel setup
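For context, pointing an existing Prometheus at a hosted backend like this is usually just a remote_write stanza. A minimal sketch is below; the endpoint URL and header name are made-up placeholders, not Oodle's actual API (the real values would come from your account settings):

```yaml
# prometheus.yml fragment (hypothetical endpoint and auth header)
remote_write:
  - url: https://example-instance.oodle.ai/prometheus/write  # placeholder URL
    headers:
      X-API-KEY: "<your-api-key>"  # placeholder header name
```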
We’d love your feedback!
cheers
The UI feels _very_ similar to Grafana. Even the dashboard folders look exactly the same to me. I would have thought that Grafana being AGPL would specifically forbid this?
Edit: Or maybe the AGPL just requires releasing any code you change? I could be confused.
It’s indeed Grafana. We’ve been maintaining a public fork of Grafana.
Where do you keep the code?
Found it: https://github.com/oodle-ai/grafana
Why do this instead of just building a data source?
Edit: Not to be that guy (but I'm about to be that guy). You have links to grafana.com (which is your competitor) all over the source of your page. It also lists the version as 11.1.0, which was released 6-21.
All of the versions in your fork repo mention 11.0.0-pre. Did I find the wrong repo, or are you using code that you haven't published?
The reason I mostly care is that this is the sort of thing that gets good open source projects closed down, and that makes me a bit sad.
Oodle can be used solely as a data source, but we also wanted to provide a solution for customers who don’t have a visualization platform in place.
Here is the branch we use: https://github.com/oodle-ai/grafana/tree/v11.1.0-oodle-stabl..., which has all the changes we have made in Grafana.
So the vast majority of your fork is just rebranding? Customers get to lose thousands of commits worth of improvements for that?
We are reasonably close to the latest version of Grafana. We periodically pull in new changes.
Cool! The website says “No Lock-In”. Does that mean I can bring my own compute and storage?
Also, found a few typos and a broken link, see error report here: https://triplechecker.com/s/xEd4Hp/oodle.ai?v=uxGS1
Thanks for the report - we just deployed a fix.
No lock-in means it’s 100% open-source (PromQL) compatible. You can swap out vendors or move to a self-hosted open-source solution should you need to move away from Oodle. When you migrate out, you can export all your data, dashboards, and alerts. You don't need to make any code changes.
We support bringing your own bucket (BYOB) for large enterprise customers; however, you cannot bring your own compute at this time. Our thinking is along the lines of how Snowflake approached the problem: everything fully managed to keep operational overhead minimal. https://jack-vanlightly.com/blog/2023/9/25/on-the-future-of-...
I'm wondering something: how is storage/compaction solved? AFAIK S3 lacks append semantics, so data must be accumulated somewhere else before storing it. Kinesis?
We use a local disk to temporarily stage data before putting it on S3. We write smaller WAL (write-ahead log) objects, and a periodic compaction process creates read-optimized files on S3.
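As a toy illustration of that WAL-then-compact pattern (this is not Oodle's code; an in-memory dict stands in for S3, and JSON stands in for their columnar format):

```python
# Sketch of the WAL + compaction pattern: stage small append-only WAL
# objects, then periodically merge them into one read-optimized object.
import json

bucket = {}  # stands in for S3: object key -> bytes

def write_wal(wal_seq, samples):
    """Stage a small batch of samples as an append-only WAL object."""
    key = f"wal/{wal_seq:08d}.json"
    bucket[key] = json.dumps(samples).encode()
    return key

def compact():
    """Merge all WAL objects into one read-optimized, sorted object."""
    wal_keys = sorted(k for k in bucket if k.startswith("wal/"))
    merged = []
    for k in wal_keys:
        merged.extend(json.loads(bucket[k]))
        del bucket[k]  # WAL objects are dropped after compaction
    merged.sort(key=lambda s: (s["series"], s["ts"]))  # read-optimized order
    bucket["blocks/00000001.json"] = json.dumps(merged).encode()

write_wal(1, [{"series": "up", "ts": 2, "value": 1.0}])
write_wal(2, [{"series": "up", "ts": 1, "value": 1.0}])
compact()
```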
The logo on your main page for oodle.ai is blurry.
Why use a .ai domain? I love LLMs, but this is a turnoff to me.
We are still early in our journey, and are currently working on leveraging LLMs for incidents and query / dashboard generation.
We do use pre-LLM-era AI and statistical analysis to provide insights and auto create dashboards for alerts (currently in alpha).
We at Workorb migrated from Grafana to Oodle and are very happy so far. The observability space does need a ground-up reimagination, and we think Oodle is positioned to do that.
I'm curious, why did you move off of grafana?
For the same reasons op listed or for other reasons?
Love the observability features here. Would love to see a detailed feature-set comparison across the competitor landscape.
Thanks for the kind words - we will be posting a feature comparison matrix in the upcoming weeks on our website.
Some comparison to Thanos would be great!
Great question! Vijay here, one of the co-founders of Oodle. Compared to Thanos:
1. We use the object store (S3) for all queries, even recent time ranges. The object store is not just an archival solution.
2. Customized indexing to minimize memory usage. The index is also on object storage.
3. Custom columnar file format optimized for storing metrics on object storage.
4. Serverless functions for good query performance. This lets us break down and parallelise queries without the cost of pre-provisioned compute.
5. No downsampling. With serverless and object storage, downsampling is not required to improve query performance or reduce costs.
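The fan-out idea in point 4 can be sketched roughly like this (an illustrative toy, not Oodle's implementation; a thread pool stands in for serverless function invocations, and the per-chunk work is a placeholder):

```python
# Sketch: split a query's time range into chunks, evaluate each chunk
# independently (as a serverless backend would via parallel function
# invocations), then merge the partial results.
from concurrent.futures import ThreadPoolExecutor

def query_chunk(start, end):
    """Placeholder per-chunk evaluation: sum samples in [start, end)."""
    return sum(range(start, end))  # stands in for reading S3 + evaluating

def parallel_query(start, end, shards=4):
    step = (end - start) // shards
    ranges = [(start + i * step, start + (i + 1) * step) for i in range(shards)]
    ranges[-1] = (ranges[-1][0], end)  # last shard absorbs the remainder
    with ThreadPoolExecutor(max_workers=shards) as pool:
        partials = pool.map(lambda r: query_chunk(*r), ranges)
    return sum(partials)  # merge step

assert parallel_query(0, 100) == sum(range(100))
```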
Yup, fan of the LGTM stack + Alloy
I don't know how trademark works or anything like that (not a lawyer, etc.), but lots of stuff is called Oodle. I wish you luck.
This is the one that came to mind for me when I saw Oodle: https://www.radgametools.com/oodle.htm
Same. Oodle is extremely well known in the game dev sphere. It’s literally baked into PS5 silicon for hardware decompression.
Thanks for the heads up. We did check on IP/trademarks just to be sure to avoid violations.
Oodle is a registered trademark:
https://uspto.report/TM/88478792
RAD is now owned by Epic Games (acquired in 2021) so they have very deep pockets.
A lot of people, myself included, clearly assumed at first that there must be some association, given you are using this name in a not-entirely-unrelated field.
IANAL but I hope you're real sure that you are legally in the clear before you commit too deeply to the name.
Is it SaaS-only?
If it weren't, then you'd need servers and it couldn't be serverless! :)
"Serverless" is an overloaded marketing term that really means functions-as-a-service. Looking at the stack, I don't actually see any components that you couldn't easily port to an on-prem solution.
This architecture diagram (https://oodle.ai/product#magicbehindoodle) goes into more detail on where we leverage serverless. For ingestion we still use dedicated compute, but for queries we leverage serverless.
Yes, it's fully managed only at this time. However, Oodle is very cost-efficient; it's cheaper than your self-hosted infra costs. https://oodle.ai/usecases/self-hosted
I would love to see an actual breakdown of oodle vs self hosted costs. I seriously doubt that it’s cheaper.
Any plans to open source? I feel very comfortable using neon.tech (separates compute from storage for Postgres) because they open-source their stack, but it would be hard for me to adopt something like Oodle without an open-source version.
I have been meaning to ask the observability experts this question:
Why not dump all metrics, events, and logs into ClickHouse, and purge data as necessary? For small to medium-sized businesses, will this be enough?
It'll work. ClickHouse even has experimental support for storing Prometheus metrics natively. A big missing piece is alerting.
ClickHouse is great for logs and traces; for metrics, however, it is still in an early phase. ClickHouse is also a general-purpose, real-time analytics database (see clickhouse.com), whereas Oodle is built specifically for end-to-end metrics observability.
Not to be confused with Oodle[1]
[1] https://www.radgametools.com/oodle.htm
Why did you name your startup the same name as the most popular network compression library for video games? This seems short sighted. Even if you don't run afoul of trademark/copyright, you're sharing a lot of SEO and marketing terminology.
Point taken, and thanks for the feedback. Our reasoning is that we’d like the name to be short and memorable. A bunch of observability companies overload the "observe" keyword all over the place, and we wanted our name to stand out. Oodle = Optimized Observability Data Lake.
[flagged]
[flagged]
Our P99 query latency is under 3 seconds. We have tested up to 100M unique time series / hour, and the architecture can scale up to a billion time series / hour. To get a feel for the performance at high scale, give us a try at https://play.oodle.ai
[flagged]
With our custom columnar format and indexes, we can filter down to just the data files where a high-cardinality column is present. This keeps queries fast for high-cardinality labels too, letting us quickly drill down on specific pod_id/cluster_id/customer_id-style labels.
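The file-pruning idea can be sketched in miniature (the file names and index structure here are invented for illustration; Oodle's real index lives on object storage in their own format):

```python
# Toy sketch of file-level pruning: each data file carries a small index
# of the label values it contains, so a query for one pod_id only opens
# the files whose index mentions that value.
files = {
    "block-001": {"pod_id": {"pod-a", "pod-b"}},
    "block-002": {"pod_id": {"pod-c"}},
    "block-003": {"pod_id": {"pod-a"}},
}

def files_for(label, value):
    """Return only the files whose index says the label value is present."""
    return [name for name, idx in files.items() if value in idx.get(label, ())]

assert files_for("pod_id", "pod-a") == ["block-001", "block-003"]
```

The win is that a drill-down on one customer_id touches a handful of files instead of scanning every block.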
Why is the primary sales call to action that it's serverless if it's a hosted solution? Who cares?
Because at the early stages it’s really important to talk to customers.
This also helps find users for whom this is a huge pain point - metrics costs are so high that they’d love to talk to someone and complain about the problem.
“fully-managed, cheap metrics, ideal for serverless applications”
"fully managed, serverless"
So it's not really a drop-in replacement for Prometheus then; it's more of a "send all your data to some other bloke" kind of replacement.
Software as a service is fine, but you don't need to hide it behind hip marketing terminology.
Technically you are correct: the scraper will still exist. However, the hard part is scaling the query and storage layers, which is what we replace.