Show HN: Oodle – serverless, fully-managed, drop-in replacement for Prometheus
blog.oodle.ai

Hello HN!
My co-founder, Vijay, and I are excited to open up Oodle for everyone.
We used to be observability geeks at Rubrik, where our metrics bills grew roughly 20x over 4 years. We tried to control spend by getting better visibility and by blocking high-cardinality labels like pod_id, cluster_id, and customer_id. But that made debugging issues complicated: app engineers hated having metrics blocked, and blocking others' code reviews was no fun for platform engineers either! Migrations (and lock-ins) were also very painful: the first migration, from Influx to SignalFx, took 6+ months, and the second, from Splunk, took over a year.
Oodle is taking a new approach to building a cost-efficient, serverless metrics observability platform. It delivers fast performance at high scale. We use a custom storage format on S3, tuned for metrics data, and queries run serverless. The hard part is achieving fast performance while optimizing for cost (every CPU cycle and every byte of storage/memory counts!). We've written about the architecture in more detail on our blog: https://blog.oodle.ai/building-a-high-performance-low-cost-m...
Try out our playground with 13M+ active time series/hr & 13B+ samples/day: https://play.oodle.ai
Explore all features with live data via Quick Signup: https://us1.oodle.ai/signup
- Instant exploration (<5 min): run one command to stream synthetic metrics to your account
- Easy integration (<15 min): explore with your own data from an existing Prometheus or OTel setup
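For context, pointing an existing Prometheus at a hosted backend like this is usually just a remote_write stanza. A minimal sketch is below; the endpoint URL and header name are made-up placeholders, not Oodle's actual API (the real values would come from your account settings):

```yaml
# prometheus.yml fragment (hypothetical endpoint and auth header)
remote_write:
  - url: https://example-instance.oodle.ai/prometheus/write  # placeholder URL
    headers:
      X-API-KEY: "<your-api-key>"  # placeholder header name
```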
We’d love your feedback!
cheers
The UI feels _very_ similar to Grafana. Even the dashboard folders look exactly the same to me. I would have thought that Grafana being AGPL would specifically forbid this?
Edit: Or maybe the AGPL just requires releasing any code you change? I could be confused.
It’s indeed Grafana. We’ve been maintaining a public fork of Grafana.
Where do you keep the code?
Found it: https://github.com/oodle-ai/grafana
Why do this instead of just building a data source?
Edit: Not to be that guy (but I'm about to be that guy). You have links to grafana.com (which is your competitor) all over the source of your page. It also lists the version as 11.1.0, which was released 6-21.
All of the versions in your fork repo mention 11.0.0-pre. Did I find the wrong repo, or are you using code that you haven't published?
The reason I mostly care is that this is the sort of thing that gets good open source projects closed down, and that makes me a bit sad.
Oodle can be used solely as a data source, but we also wanted to provide a solution for customers who don’t have a visualization platform in place.
Here is the branch we use: https://github.com/oodle-ai/grafana/tree/v11.1.0-oodle-stabl..., which has all the changes we have made in Grafana.
So the vast majority of your fork is just rebranding? Customers get to lose thousands of commits worth of improvements for that?
We are reasonably close to the latest version of Grafana. We periodically pull in new changes.
Cool! The website says “No Lock-In”. Does that mean I can bring my own compute and storage?
Also, found a few typos and a broken link, see error report here: https://triplechecker.com/s/xEd4Hp/oodle.ai?v=uxGS1
Thanks for the report - we just deployed a fix.
No lock-in means it’s 100% open-source (PromQL) compatible. You can swap out vendors or move to a self-hosted open-source solution should you need to move away from Oodle. When you migrate out, you can export all your data, dashboards, and alerts. You don't need to make any code changes.
We support bringing your own bucket (BYOB) for large enterprise customers; however, you cannot bring your own compute at this time. Our thinking is along the lines of how Snowflake approached the problem: everything fully managed to keep operational overhead minimal. https://jack-vanlightly.com/blog/2023/9/25/on-the-future-of-...
I'm wondering something: how is storage/compaction solved? AFAIK S3 lacks append semantics, so data must be accumulated somewhere else before storing it. Kinesis?
We use a local disk to temporarily stage data before putting it on S3. We write smaller WAL (write-ahead log) objects, and a periodic compaction process creates read-optimized files on S3.
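As a toy illustration of that WAL-then-compact pattern (this is not Oodle's code; an in-memory dict stands in for S3, and JSON stands in for their columnar format):

```python
# Sketch of the WAL + compaction pattern: stage small append-only WAL
# objects, then periodically merge them into one read-optimized object.
import json

bucket = {}  # stands in for S3: object key -> bytes

def write_wal(wal_seq, samples):
    """Stage a small batch of samples as an append-only WAL object."""
    key = f"wal/{wal_seq:08d}.json"
    bucket[key] = json.dumps(samples).encode()
    return key

def compact():
    """Merge all WAL objects into one read-optimized, sorted object."""
    wal_keys = sorted(k for k in bucket if k.startswith("wal/"))
    merged = []
    for k in wal_keys:
        merged.extend(json.loads(bucket[k]))
        del bucket[k]  # WAL objects are dropped after compaction
    merged.sort(key=lambda s: (s["series"], s["ts"]))  # read-optimized order
    bucket["blocks/00000001.json"] = json.dumps(merged).encode()

write_wal(1, [{"series": "up", "ts": 2, "value": 1.0}])
write_wal(2, [{"series": "up", "ts": 1, "value": 1.0}])
compact()
```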
The logo on your main page for oodle.ai is blurry.
Why use a .ai domain? I love LLMs, but this is a turnoff to me.
We are still early in our journey, and are currently working on leveraging LLMs for incidents and query / dashboard generation.
We do use pre-LLM-era AI and statistical analysis to provide insights and auto create dashboards for alerts (currently in alpha).
We at Workorb migrated from Grafana to Oodle and are very happy so far. The observability space does need a ground-up reimagination, and we think Oodle is positioned to do that.
I'm curious, why did you move off of grafana?
For the same reasons op listed or for other reasons?
Love the observability features here. Would love to see a detailed feature-set comparison across the competitor landscape.
Thanks for the kind words - we will be posting a feature comparison matrix in the upcoming weeks on our website.
Some comparison to Thanos would be great!
Great question! Vijay here, one of the co-founders of Oodle. Compared to Thanos:
1. We use the object store (S3) for all queries, even recent time ranges. The object store is not just an archival solution.
2. Customized indexing to minimize memory usage. The index is also on object storage.
3. Custom columnar file format optimized for storing metrics on object storage.
4. Serverless functions for good query performance. This lets us break down and parallelise queries without the cost of pre-provisioned compute.
5. No downsampling. With serverless and object storage, downsampling is not required to improve query performance or reduce costs.
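The fan-out idea in point 4 can be sketched roughly like this (an illustrative toy, not Oodle's implementation; a thread pool stands in for serverless function invocations, and the per-chunk work is a placeholder):

```python
# Sketch: split a query's time range into chunks, evaluate each chunk
# independently (as a serverless backend would via parallel function
# invocations), then merge the partial results.
from concurrent.futures import ThreadPoolExecutor

def query_chunk(start, end):
    """Placeholder per-chunk evaluation: sum samples in [start, end)."""
    return sum(range(start, end))  # stands in for reading S3 + evaluating

def parallel_query(start, end, shards=4):
    step = (end - start) // shards
    ranges = [(start + i * step, start + (i + 1) * step) for i in range(shards)]
    ranges[-1] = (ranges[-1][0], end)  # last shard absorbs the remainder
    with ThreadPoolExecutor(max_workers=shards) as pool:
        partials = pool.map(lambda r: query_chunk(*r), ranges)
    return sum(partials)  # merge step

assert parallel_query(0, 100) == sum(range(100))
```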
Yup, fan of the LGTM stack + Alloy
I don't know how trademark works or anything like that (not a lawyer, etc.), but lots of stuff is called Oodle. I wish you luck.
This is the one that came to mind for me when I saw Oodle: https://www.radgametools.com/oodle.htm
Same. Oodle is extremely well known in the game dev sphere. It’s literally baked into PS5 silicon for hardware decompression.
Thanks for the heads up. We did check on IP/trademarks just to be sure to avoid violations.
Oodle is a registered trademark:
https://uspto.report/TM/88478792
RAD is now owned by Epic Games (acquired in 2021) so they have very deep pockets.
A lot of people, myself included, clearly assumed at first that there must be some association, given you are using this name in a not-entirely-unrelated field.
IANAL but I hope you're real sure that you are legally in the clear before you commit too deeply to the name.
Is it SaaS-only?
If it weren't, then you'd need servers and it couldn't be serverless! :)
"Serverless" is an overloaded marketing term that really means functions-as-a-service. Looking at the stack, I don't actually see any components that you couldn't easily port to an on-prem solution.
This architecture diagram (https://oodle.ai/product#magicbehindoodle) goes into more detail on where we leverage serverless. For ingestion we still use dedicated compute, but for queries we leverage serverless.
Yes, it's fully managed only at this time. However, Oodle is very cost-efficient; it's cheaper than your self-hosted infra costs. https://oodle.ai/usecases/self-hosted
I would love to see an actual breakdown of oodle vs self hosted costs. I seriously doubt that it’s cheaper.
Any plans to open source? I feel very comfortable using neon.tech (separates compute from storage for Postgres) because they open-source their stack, but it would be hard for me to adopt something like Oodle without an open-source version.
I have been meaning to ask the observability experts this question:
Why not dump all metrics, events, and logs into ClickHouse, and purge data as necessary? For small to medium-sized businesses, will this be enough?
It'll work. ClickHouse even has experimental support for storing Prometheus metrics natively. A big missing piece is alerting.
ClickHouse is great for logs and traces; for metrics, however, it is still in an early phase. ClickHouse is also a general-purpose, real-time analytics database (see clickhouse.com), whereas Oodle is built specifically for end-to-end metrics observability.
Not to be confused with Oodle[1]
[1] https://www.radgametools.com/oodle.htm
Why did you name your startup the same name as the most popular network compression library for video games? This seems short sighted. Even if you don't run afoul of trademark/copyright, you're sharing a lot of SEO and marketing terminology.
Point taken, and thanks for the feedback. Our reasoning is that we’d like the name to be short and memorable. A bunch of observability companies overload the "observe" keyword all over the place, and we wanted our name to stand out. Oodle = Optimized Observability Data Lake.
[flagged]
[flagged]
Our P99 query latency is under 3 seconds. We have tested up to 100M unique time series / hour, and the architecture can scale up to a billion time series / hour. To get a feel for the performance at high scale, give us a try at https://play.oodle.ai
[flagged]
With our custom columnar format and indexes, we can filter down to just the data files where a high-cardinality column is present. This keeps queries fast for high-cardinality labels too, letting us quickly drill down on specific pod_id/cluster_id/customer_id-style labels.
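The file-pruning idea can be sketched in miniature (the file names and index structure here are invented for illustration; Oodle's real index lives on object storage in their own format):

```python
# Toy sketch of file-level pruning: each data file carries a small index
# of the label values it contains, so a query for one pod_id only opens
# the files whose index mentions that value.
files = {
    "block-001": {"pod_id": {"pod-a", "pod-b"}},
    "block-002": {"pod_id": {"pod-c"}},
    "block-003": {"pod_id": {"pod-a"}},
}

def files_for(label, value):
    """Return only the files whose index says the label value is present."""
    return [name for name, idx in files.items() if value in idx.get(label, ())]

assert files_for("pod_id", "pod-a") == ["block-001", "block-003"]
```

The win is that a drill-down on one customer_id touches a handful of files instead of scanning every block.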
Why is the primary sales call to action that it's serverless if it's a hosted solution? Who cares?
Because at the early stages it’s really important to talk to customers.
This also helps find users for whom this is a huge pain point - metrics costs are so high that they’d love to talk to someone and complain about the problem.
“fully-managed, cheap metrics, ideal for serverless applications”
"fully managed, serverless"
So it's not really a drop-in replacement for Prometheus then; it's more of a "send all your data to some other bloke" kind of replacement.
Software as a service is fine, but you don't need to hide it behind hip marketing terminology.
Technically you are correct: the scraper will still exist. However, the hard part is scaling the query and storage layers, which is what we replace.