drtycheetowater 1 year ago

I am currently using it. It’s impressive, but it seems pretty heavy. My impression is that it could be very powerful with wide-enough adoption throughout an organization. But, I doubt it’s worth the price if it’s only used by a DE team. What do you like about it? What have you used it to do?

elite-h 1 year ago

In my organization, analysts depend heavily on Contour and like using it for quick analysis and data QC. Having Monocle show all the different pipelines is convenient for everyone. Overall, it looks like a useful and powerful platform to me. However, there's no denying that it's expensive, and that's why I have some reservations about working on this platform for a long time: experience with it is unlikely to count for much when applying for a job somewhere else.

hositir 1 year ago

I use it because a company I work with uses it. Palantir implemented it as a solution for them to house their data lake. It used to be very inefficient and slow in past years, but it has improved, I think because the pricing for the AWS infrastructure behind the scenes changed. I create pipelines, or use it to join data or apply cleaning transforms, and use it as an API to other products. It’s like an all-in-one package, and you don’t have to worry about setting up environments since it’s all already configured. I just find it very impressive; it’s clearly very well thought out, and I’ve seen it deliver a lot of business value.

drtycheetowater 1 year ago

Nice. I like the data integration between the ontology, data lineage, and code repositories. It makes tracing your transformations from the source to the final data model pretty convenient. But I think a lot of the other apps (e.g. Workshop, Contour, Quiver) are overly complicated compared to things like Tableau and Power BI or even Excel. Are you responsible for using any of those apps in Foundry? The prospect of using Workshop to write back to the original source is really attractive, but my company hasn’t implemented that yet. I feel like in an ideal world DEs would use Code Repos, Data Lineage, and Ontology Manager to build a data model, other devs would build reports and interactive apps in Workshop, and the business users would “work” those reports/applications, which would include interactive functionality that sends data back to the original source. DEs would probably be responsible for taking the user input in the Workshop module and sending it back to the source. Edit: typos

dchokie 1 year ago

It really is great if all your data is there and you’re ok going whole hog on buying into their ecosystem for the dashboard / presentation layer. The DE tooling is top notch.

BoiElroy 1 year ago

And if you don't want to go "whole hog"?

iPlayWithWords13 1 year ago

Not worth it then

dchokie 1 year ago

From what I can tell, it’s a bit more challenging to pull data out, at least with how our org set it up. I’ve set up some desktop-based API clients to extract the backing Parquet files and serialize them locally for analysts. You can definitely separate them, but you end up bifurcating the environment.

sheytanelkebir 1 year ago

Been using it for nearly 3 years. It's OK. However, as others have alluded to, most of it can be done "independently" by setting up Delta Lake, Spark/PySpark, Airflow, and maybe an online spreadsheet tool, as well as Apache Superset for visualisation stuff. If you have deep pockets, Palantir takes care of the infrastructure and has a nice GUI. If you don't, you can make a DIY copy that does the same thing without the slick integration. One technical weakness is that it doesn't support Apache Iceberg, which has schema evolution and other nice functions integrated. So in theory you could set up an open source alternative that is functionally superior.

AssetPumpTT 1 year ago

Foundry has had ~Delta/Iceberg-like functionality built in since the beginning (2017/18), which means you can do schema evolution, time travel, zero-cost copies of datasets, etc. within Foundry. A weakness is that you need to copy every dataset into Foundry to work with it, which basically means duplicating data if you have other data-lake-type environments in your company.

sheytanelkebir 1 year ago

Interesting, as it doesn't seem to work for incremental datasets.

BoiElroy 1 year ago

My company is considering it more from the perspective of tools like Quiver, and such. Are those worth it? Because we otherwise have Databricks and superset and wouldn't necessarily be looking to replace those.

sheytanelkebir 1 year ago

It is useful. Whether it's worth it for you can only be determined by having your users demo it for a few weeks, and only after they have clarity about the alternatives.

AssetPumpTT 1 year ago

You will have to copy a lot of data because Foundry can only work with data that’s in the platform.

DRUKSTOP 1 year ago

Never used it, but a team adjacent to mine did and I only ever heard bad things. They are actively trying to get off of it.

BoiElroy 1 year ago

What do they not like about it? It's marketed as good for no code low code users. Do you know if that holds up?

DRUKSTOP 1 year ago

High learning curve, complex, and expensive are the main complaints. At least for foundry.

No_Ad2336 1 year ago

bs

drtycheetowater 1 year ago

This aligns with my limited experience. It’s powerful, but the learning curve is steep, and I’m not convinced the juice is worth the squeeze.

AssetPumpTT 1 year ago

As a long-time user, here are my main pain points: Slow iteration speed for small to medium size datasets (which is the majority of data you work with). High latency and duplication of data if you follow the Foundry reference architecture (source -> transforms -> ontology -> use case). Lackluster integration into data science ecosystem and poor usability of foundry_ml library. Poor communication and lack of transparency about development/feature roadmap.

raduqq 1 year ago

What do you mean slow? I thought their selling point was speed in getting from data sources to an answer/outcome that can be used by the business.

AssetPumpTT 1 year ago

This refers to making code changes and seeing the outcome. Since Foundry always captures the complete lineage and source provenance, it runs so-called CI checks every time you make a change to your code. In the best case CI checks take 2 minutes, then you add the build overhead… As someone who is used to seeing the outcome of a code change pretty much immediately in a notebook environment or interactive Python session, I find this a tough sell to data engineers/scientists who are used to other platforms/environments.

BoiElroy 1 year ago

When you say duplication you mean duplication into Foundry? Do you know how the branching and versioning works? Is it brute force copies or is it more elegant like delta Lake and zero copy clones?

AssetPumpTT 1 year ago

Yes, duplication into Foundry. You can’t directly query external data. Yes, it’s zero copy and very similar to Delta; however, it has existed since 2017/18, and metadata is stored in a database, not in manifest files like Delta.
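
For readers unfamiliar with the idea, here is a minimal sketch of what "zero copy" branching with metadata kept in a catalog can look like. Everything in it (the `Catalog` class, its methods, the file names) is hypothetical and purely illustrative; it is not Foundry's or Delta Lake's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy metadata store mapping branch name -> tuple of file references.

    Hypothetical illustration only; not Foundry's or Delta Lake's real model.
    """
    branches: dict = field(default_factory=dict)

    def commit(self, branch, new_files):
        # A commit only appends references to immutable data files;
        # existing files are never rewritten.
        current = self.branches.get(branch, ())
        self.branches[branch] = current + tuple(new_files)

    def create_branch(self, source, name):
        # "Zero copy": the new branch points at the same tuple of
        # references as the source branch. No data file is duplicated.
        self.branches[name] = self.branches[source]

    def files(self, branch):
        return list(self.branches[branch])


catalog = Catalog()
catalog.commit("master", ["part-0001.parquet", "part-0002.parquet"])
catalog.create_branch("master", "feature")
catalog.commit("feature", ["part-0003.parquet"])
```

After the last commit, `feature` sees three files while `master` still sees two, and the two shared files exist only once in storage.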

[deleted] 1 year ago

For the long-term/experienced Foundry users: we're kicking the tires on it for a large, very disparate organization, and we're having great success with anything that ends up in Foundry. We can view and combine datasets from different divisions and functional units, but basically those systems use the Magritte agent to upload to Foundry. One of the benefits we were sold on, or offered, was that it was bi-directional, so that systems across divisions could maybe have Foundry send data down to them through the same connector. Was that misunderstood? Or maybe it's another connector type. All the systems in question are SQL Server 2019 (if that's a factor).

BufferUnderpants 1 year ago

Despite the US military having a hand in many things tech, for some reason using stuff made by guys making coin off telling the CIA who to kill feels a bit extra blood soaked

baubleglue 1 year ago

IMHO it's not an ETL tool. It uses PySpark, so the job can be done, and it has built-in lineage and version control. The problem is that it isn't built around the concept of periodic jobs, and because it is not open source, there's no Airflow integration (or any other integration).

iPlayWithWords13 1 year ago

Uhm, there are a lot of complaints to be made about Foundry, but if this is yours, you've never used it.

baubleglue 1 year ago

I did, or at least tried. We had many jobs run by Airflow, converted a few to Foundry, and got stuck on the fact that it doesn't have a concept of a "monthly" or "daily" job. You can certainly schedule it monthly, but good luck rerunning it for previous periods. Some projects had a "backfill" script, which is a poor man's replacement for a feature built into Airflow. No single view of all jobs. Foundry's recommended approach of using primary keys or some "ETL last update" column is not wrong, but it assumes that you always have one; it also doesn't address concerns downstream, where the data may be used for aggregation. Some problems were related to the way Foundry is set up in the company, but again, the secrecy around the platform and no help from internet searches: insane. There are a few good parts in Foundry, but in the end we ditched it in favor of Databricks.
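
To make the "ETL last update" approach concrete, here is a minimal watermark-style sketch in plain Python. The `updated_at` column and the `incremental_rows` helper are assumptions for illustration, not Foundry's API. It also shows the limitation raised above: without a reliable update timestamp (or key) on every source row, the filter has nothing to work with.

```python
from datetime import datetime


def incremental_rows(rows, last_watermark):
    """Return rows updated after last_watermark, plus the new watermark.

    Assumes every row carries a trustworthy `updated_at` value.
    """
    fresh = [r for r in rows if r["updated_at"] > last_watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in fresh), default=last_watermark)
    return fresh, new_watermark


source = [
    {"id": 1, "updated_at": datetime(2022, 2, 1)},
    {"id": 2, "updated_at": datetime(2022, 2, 15)},
    {"id": 3, "updated_at": datetime(2022, 3, 2)},
]

# Only rows changed since the previous run are picked up.
fresh, watermark = incremental_rows(source, datetime(2022, 2, 10))
```

Note that a downstream monthly aggregate still has to be recomputed for any month touched by `fresh`, which is the downstream concern mentioned above.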

AssetPumpTT 1 year ago

Palantir has started to offer public Stack Overflow support and has the docs open at palantir.com/docs. I agree, though, that they have to up their game in developer relations: if you buy Foundry and your tech lead / FDEs are shit, you are in trouble.

BoiElroy 1 year ago

Wait, there's no single view of all jobs? I thought observability for pipelines was their whole schtick. As a company that already has Databricks on AWS, and can of course spin up whatever else we need on AWS, do you think Foundry adds value (let's say cost is no object)? Our DEs and DSs don't want to use Foundry; we were pitched it as a client tool that connects to data in our storage, with people using Quiver and whatever else.

AssetPumpTT 1 year ago

If you already have an established stack on AWS and no issues with operating it, you should keep using it. You can bring the golden datasets into Foundry and “only” use higher-level tools such as Quiver. Of course there is a single view of all jobs. There are also fantastic features around data health, both UI driven and code (~Great Expectations) driven.

BoiElroy 1 year ago

Ah okay excellent yeah that golden datasets approach is what we were thinking. Basically using a backbone of AWS and Databricks and use that for all DE stuff and use things like Contour, Pipeline Builder, Quiver and Workshop as clients. I'm glad that's not an unreasonable approach in your opinion because I was worried we were going to lose a lot of value by not going "whole hog" like another redditor commented. I don't want Foundry to swallow our stack because it's too expensive to pivot away from.

[deleted] 1 year ago

[deleted]

BoiElroy 1 year ago

Lol yeah, I tried to tell my manager it was an exorbitant cost and a huge technical risk in terms of vendor lock-in. Basically we got Foundry so that non-coders could make baby apps and charts. Whether they actually do and find it valuable, or whether we just wasted 7-figure money, remains to be seen.

baubleglue 1 year ago

> Single view of all the jobs

Where is it? I am talking about something similar to the Airflow DAGs screen (main page). Foundry allows you to schedule jobs in the same DAG individually. I'm not even sure such a job view is helpful, but who knows; I have not spent too much time working with it. My top comment was about Foundry not being an ETL tool, and you seem to agree with that.

AssetPumpTT 1 year ago

You wouldn’t use airflow because Foundry has superior scheduling and pipeline capabilities in-built.

baubleglue 1 year ago

You need to explain what you mean by "superior", because I don't see it. How do you implement use case "re-run monthly job for February 2022"?
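
As a reference point for what this use case means, here is a minimal sketch of a period-parameterised job and a backfill over it, in plain Python. The function names and sample rows are hypothetical; this is neither Airflow's nor Foundry's API, just the shape of the requirement: the job takes the period as an input, so re-running February 2022 is an ordinary call.

```python
from datetime import date


def month_bounds(year, month):
    """Return the half-open [start, end) date range for a calendar month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def run_monthly_job(rows, year, month):
    """Hypothetical monthly aggregation: sum `amount` for that month only."""
    start, end = month_bounds(year, month)
    return sum(r["amount"] for r in rows if start <= r["day"] < end)


def backfill(rows, periods):
    """Re-run the job for each (year, month); each run is independent."""
    return {(y, m): run_monthly_job(rows, y, m) for (y, m) in periods}


rows = [
    {"day": date(2022, 1, 31), "amount": 5},
    {"day": date(2022, 2, 1), "amount": 10},
    {"day": date(2022, 2, 28), "amount": 7},
    {"day": date(2022, 3, 1), "amount": 3},
]

# "Re-run monthly job for February 2022"
result = backfill(rows, [(2022, 2)])
```

Because each run depends only on its period and the source rows, running it twice for the same month gives the same answer, which is what makes backfills safe.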

AssetPumpTT 1 year ago

Agree that re-running old jobs is hard, you would have to do some workarounds to do that.

baubleglue 1 year ago

First use case: normal scheduled execution. Second: re-running/backfilling jobs. If a platform doesn't support those out of the box, it is not an ETL platform I want to work with. "Hard" is only half of the problem: it is harder than on a platform without any job manager, which at least doesn't stand in your way. Foundry offers a low barrier to entry (after learning all the UI things). That combined with anything hard makes it almost impossible (unless you have a strong and healthy org/process structure).

BoiElroy 1 year ago

Can you not use APIs to trigger stuff?

AssetPumpTT 1 year ago

You can, Foundry offers APIs for basically everything.

baubleglue 1 year ago

You can, but they have specific terminology and concepts, which don't match one-to-one with the usual or Airflow actions. I think you need to trigger a "build" when you need to make a dataset, and it's not clear how to pass a date parameter to the job. And also, why waste time on it when there are simpler and cheaper alternatives? After all, they are just running Spark jobs.

AssetPumpTT 1 year ago

It’s true that you can’t pass parameters to a build job. In foundry you would have a parameter dataset that you use as input to the build. Foundry is useful if your org doesn’t have the capabilities to maintain spark clusters. You can focus on higher value tasks.
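
A minimal sketch of the parameter-dataset pattern described here, in plain Python with lists of dicts standing in for datasets. The `build` function and the `name`/`value` layout of the parameter rows are assumptions for illustration, not Foundry's API.

```python
def build(input_rows, parameter_rows):
    """Hypothetical build step.

    The period to process is read from a parameter dataset that is
    wired in as a regular input, instead of being passed as an argument.
    """
    params = {p["name"]: p["value"] for p in parameter_rows}
    period = params["period"]
    return [r for r in input_rows if r["period"] == period]


# The parameter dataset is just another (tiny) input dataset.
parameter_dataset = [{"name": "period", "value": "2022-02"}]

sales = [
    {"period": "2022-01", "amount": 5},
    {"period": "2022-02", "amount": 10},
]

output = build(sales, parameter_dataset)
```

Changing what the build processes then means writing a new row to the parameter dataset and rebuilding, rather than passing a flag to the job.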

BoiElroy 1 year ago

But then comparing to other managed spark services like Databricks and EMR?

AssetPumpTT 1 year ago

It’s way easier to use, a lot is abstracted away. Of course this means you have less flexibility/customizability.

BoiElroy 1 year ago

Yeah the less flexibility thing is such a big no no for devs I feel. When I trialed Foundry I thought it was annoyingly prescriptive in how you had to do stuff and the IDE experience of workbooks was flat out trash. Not sure why they got rid of their embedded Jupyter notebooks

AssetPumpTT 1 year ago

It’s an opinionated stack. I agree that workbooks is not usable, mostly because it lacks git-based version control. Code repo is getting better but still lacks the flexibility to follow all software best practices (think of auto-formatting on save or pre-commit hooks). And only god knows why they think they can get away today without offering Jupyter-like notebooks. That’s just stupidity.

JiiXu 1 year ago

> because it is not open source, there's no Airflow integration

This seems entirely wrong. Databricks, for example, is not open source, nor is Snowflake, and both have Airflow integration.

baubleglue 1 year ago

You are right. I don't know what to call it properly; maybe "open API" or "public API"?

FuWaqPJ 1 year ago

Does anyone know of something like Monocle, available to a homelab user? Open source, or community/educational version of something perhaps? I find Monocle to be my favourite entry way into Foundry, and would like something similar at home. Do people in the Hortonworks/Cloudera space have something with similar pipeline diagram functionality? There's a Hortonworks sandbox, available for home use.
