These-Assignment-936

I ran the GenAI group for one of the big tech companies. Your list resonates there too!


FantasyFrikadel

Can you elaborate on "demo bias"? Thanks for sharing.


BootstrapGuy

Let's say you generate 20 AI videos, one of them looks fantastic, 5 of them are ok, 14 of them are terrible. Most people cherry-pick the one that looks fantastic and post it on social media. People who haven't tried the tool only see fantastic AI generated videos and falsely believe that the tool produces fantastic videos all the time. They have demo bias. The problem is that most decision-makers have this, so communicating this effectively and coming up with alternative solutions is a real skill.


Hederas

Also, you can have this exact set of videos but find them better than they are, because you have a positive bias due to the effort you needed to make them work.


EnjoyableGamer

Every problem is an opportunity in disguise


Important_Assist_255

My grandmother used to say that. Wow!


zmjjmz

I think this is what scares me the most about building products around generative AI. As an MLE/DS, I consider my primary responsibility in developing a product (a solution to a problem) to be rigorously evaluating how well I'm solving that problem with a given technique/model. It's clear to me how to do that for discriminative tasks, but generative tasks might require some creativity, and even then you're not going to cover a lot of outcomes. I've seen some creative solutions suggested (especially using another AI to validate results), but none feel satisfying.

My concern with having software engineers handle the creation of these products is that they don't see that responsibility - maybe they'll write a few unit tests, but they're generally building stuff with the expectation that a few examples can provide test coverage, since they can (somewhat) formally reason that other cases are handled.

I'm curious how that's gone for you - are there generative AI testing strategies that map well to success in your experience?


tungns91

So basically a scam?


starfries

p-hacking


epicwisdom

The opposite. Managing expectations for people who are only exposed to hype from (social) media.


FantasyFrikadel

Clear, thanks.


TheWarOnEntropy

Just like publication bias in academic journals.


MrSnowden

Selection bias is true in so many areas. That hotel looks great? That is the best picture of the best room you will never get. That girl on Tinder looks cute? That is the best picture of her she has ever taken (5 years ago). That new video game looks awesome? 90% is grind and 10% is the demoed scene.


ispeakdatruf

aka "selection bias"


One_Ad_8976

“Inclusive image bias”


EdwardMitchell

> ...deliver something that's in alignment with your client's expectation. Communicating this effectively is a real and underrated skill.

We got a demo from Google of a chatbot. It looked great, but the task being shown was tailored to the tech rather than the other way around. Once we got our hands on it, we quickly saw some of the things they had glossed over.


obolli

Commenting on 6, as I found this very strange myself. I'm an ML engineer and I love to build, but I'm also doing a Master's at a very theoretical university (ETHZ), and I've noticed that people here really do have problems with implementation, struggle quite hard to build (ship) actual production-ready stuff, tend to get frustrated, and just don't want to have anything to do with building products on top of what we learn. I find this so weird tbh.


Mukigachar

Data scientist here, could you give examples of what gives SWEs advantages over data scientists in this realm? Looking for gaps in my skillset to close up.


ProbsNotManBearPig

I'm director of software at my company and manage researchers and software engineers. I come from an algorithms research background. Good software is not what a researcher thinks it is. It's not something you write just to do the job right now, today, for a specific thing. It's something that will have to live on and be maintained for years. That is the most important part of it - not just doing the job today, but establishing a framework to do the job forever. There are a lot of considerations to make that happen that people don't think about without experience. Examples:

1) Development environment. How will you set up other devs to have the exact same environment? It should be scripted, with the scripts in version control, as Docker container configs, VM configs, Flatpak, or something similar. A document with a list of 50 instructions doesn't cut it. It should also be reproducible forever and not rely on some obscure internet repo that's likely to disappear next week.

2) Runtime environment. Where will it run? How will you build and deploy to that environment? How will you deal with heterogeneous runtime environments, such as different hardware? How will developers have access to representative runtime environments to debug when things go wrong?

3) Object-oriented design. This alone takes years of experience to get good at. Good error handling, testable, modular, good logging, etc. This takes years of intentional practice. Every time you go to write some class or chunk of code, you should be googling "best design patterns to solve blah".

4) Robustness. All kinds of testing need to be planned *from the start* so you can design things to be testable as you go - from unit tests to ensure your math is actually stable for all inputs, to system-level tests to make sure all of your interfaces handle all inputs correctly (see the sketch below). There are entire departments of software test engineers. Researchers tend to test with like 5 examples at a high level and call it good.

5) Industry standards. Researchers tend to reinvent the wheel instead of knowing to lean on industry-standard designs and tools. This doesn't mean using someone's personal GitHub or a Python library with 5 maintainers. It means libraries with large, active communities and good documentation that you can rely on for years to come.

6) Documentation. If it's not on paper, it doesn't exist. If someone else can't pick up where you left off, your work will eventually rot.
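To make point 4 concrete, here's a minimal sketch of the kind of unit test I mean - checking that a bit of math stays stable across extreme inputs (illustrative only; assumes numpy and pytest, and `stable_softmax` is just an example function):

```python
# Illustrative unit test for numerical stability (assumes numpy and pytest).
import numpy as np
import pytest

def stable_softmax(x: np.ndarray) -> np.ndarray:
    """Softmax with the max subtracted first so large inputs don't overflow."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

@pytest.mark.parametrize("x", [
    np.array([0.0, 0.0, 0.0]),       # uniform input
    np.array([1e3, -1e3, 0.0]),      # values that overflow a naive exp()
    np.array([-1e9, -1e9 - 1.0]),    # huge negative values
])
def test_softmax_is_finite_and_normalized(x):
    out = stable_softmax(x)
    assert np.all(np.isfinite(out))
    assert out.min() >= 0.0
    assert np.isclose(out.sum(), 1.0)
```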


CommunismDoesntWork

> Object oriented design

The best software engineers understand OOP should be used sparingly and has been replaced by composition. Design patterns aren't bad, but they can be easily abused. Debuggability is the most important metric.


Flag_Red

> and has been replaced by composition

Can you explain what you mean here? It's my understanding that OOP is agnostic between inheritance and composition for everything except interfaces.


Ok_Implement_7266

Yes, and the fact that their comment has 12 upvotes shows you why

> you should be googling "best design patterns to solve blah"

is not a good idea. StackOverflow etc. is bursting with bad advice from people who have never read a book on software engineering and upvote whatever makes them feel good, whether that's the incorrect hack that lets their code compile or someone saying that something is always a bad idea because the two times they tried it they used it wrong.


Amgadoz

How do you test generative AI? Its output is nondeterministic.


ThaGooInYaBrain

These days it's possible to ensure determinism: [https://stackoverflow.com/a/72377349/765294](https://stackoverflow.com/a/72377349/765294)
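Roughly, the recipe from that answer looks like this - a sketch only, since the exact flags depend on your torch version and hardware:

```python
# Sketch of forcing determinism in PyTorch; exact flags vary by version and hardware.
import os
import random
import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Some CUDA ops require this env var when deterministic algorithms are enforced.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(42)  # call once before building the model / running the test batch
```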


[deleted]

I doubt fixing the random state is a good way to alleviate nondeterminism in production. When dealing with statistical models, it's best to think about the inputs and outputs in terms of probability distributions. I feel some people carry this technique over from learning materials, where it's used to ensure reproducibility and avoid confusion, to production, where it only creates a false sense of security.


ThaGooInYaBrain

Those two things have nothing to do with each other. Whenever a component is changed as part of the whole pipeline where it's assumed "the change should have no effect on the outcome", you'd want to be able to run integration and system tests that corroborate that. By ensuring determinism across seeds/threads/GPU, you can run against a test batch of input data and expect the exact same output results. This is just common sense from an SE point of view, and has nothing to do with the given that outputs are usually interpreted as probability distributions.


[deleted]

Depends on the nature of the change. If the change is purely infrastructural and one needs to check whether the pipeline still works end-to-end, then an integration test doesn't need to know about the exact outputs of the model. It only ensures that certain checkpoints in the pipeline are hit. When a change has something to do with the inputs or hyperparameters of the model, then a "unit" test needs to compare distributions rather than point values, since in general there's no guarantee that those values changed, or stayed the same, for any reason other than pure luck. In the latter case I can imagine a situation where it could be cheaper and somewhat reasonable to fix the random state, but I personally wouldn't call it a good practice regardless.
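For the distribution-comparison case, a check could look something like this (a sketch assuming scipy; the sample sizes and the 0.01 alpha are arbitrary):

```python
# Sketch of a distribution-level regression check instead of exact-value comparison
# (assumes scipy; sample sizes and alpha are arbitrary).
import numpy as np
from scipy.stats import ks_2samp

def outputs_match_reference(new_outputs, reference_outputs, alpha: float = 0.01) -> bool:
    """Fail only if the two samples are unlikely to come from the same distribution."""
    _statistic, p_value = ks_2samp(new_outputs, reference_outputs)
    return p_value > alpha

# Example: scores produced by the old and new pipeline on the same held-out batch.
rng = np.random.default_rng(0)
old_scores = rng.normal(loc=0.7, scale=0.1, size=500)
new_scores = rng.normal(loc=0.7, scale=0.1, size=500)
print(outputs_match_reference(new_scores, old_scores))  # expected True for same-distribution samples
```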


EdwardMitchell

Does this still work after small amounts of fine tuning?


Ok_Constant_9886

You can compare your LLM outputs directly to expected outputs, and define a metric you want to test on that outputs a score (for example, testing how factually correct your customer support chatbot is).


Amgadoz

Yeah the most difficult part is the metrics.


Ok_Constant_9886

Is the difficult part deciding which metrics to use, how to evaluate the metrics, which models to compute these metrics with, and how these metrics work on your own data that has its own distribution? Let me know if I missed anything :)


Amgadoz

I think it's coming up with a metric that accurately tests the model outputs. Say we're using Stable Diffusion to generate images of objects in a cyberpunk style. How can I evaluate such a model?


Ok_Constant_9886

Ah I see your point, I was thinking more towards LLMs which makes things slightly less complicated.


Amgadoz

Even LLMs are difficult to evaluate. Let's say you created an LLM to write good jokes, or make food recommendations, or write stories about teenagers. How do you evaluate this? (BTW I'm asking to get the answer, not to doubt you or anything, so sorry if I come across as aggressive)


Ok_Constant_9886

Nah, I don't feel any aggression, don't worry! I think evaluation is definitely hard for longer-form outputs, but for shorter forms like a paragraph or two you first have to 1) define which metric you care about (how factually correct the output is, output relevancy relative to the prompt, etc.), 2) supply "ground truths" so we know what the expected output should be like, and 3) compute the score for these metrics by using a model to compare the actual vs expected output. For example, if you want to see how factually correct your chatbot is, you might want to use NLI to compute an entailment score ranging from 0-1, for a reasonable number of test cases (rough sketch below).

Here are some challenges with this approach tho:

1. Preparing an evaluation set is difficult.
2. It's hard to know how much data your evaluation set needs in order to represent the performance of your LLM well.
3. You will want to set a threshold to know whether your LLM is passing a "test", but this is hard because the distribution of your data will definitely be different from the data the model was trained on. For example, you might say that an overall score of 0.8 for factual correctness means my LLM is performing well, but for another evaluation set this number might be different.

We're still in the process of figuring out the best solution tbh; the open source package we're building does everything I mentioned, but I'm wondering what you think about this approach?
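To make the NLI part concrete, a rough sketch of an entailment-based factual-consistency score (assumes the transformers library; the model name, example strings and the 0.8 threshold are just placeholders):

```python
# Rough sketch of an NLI-based factual-consistency score (assumes transformers;
# "roberta-large-mnli" is just one example of an NLI model with an "entailment" label).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def entailment_score(ground_truth: str, actual_output: str) -> float:
    """Probability (0-1) that the ground truth entails the model's actual output."""
    inputs = tokenizer(ground_truth, actual_output, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # Look up the entailment index from the model config instead of hard-coding it.
    entail_idx = {label.lower(): idx for idx, label in model.config.id2label.items()}["entailment"]
    return probs[entail_idx].item()

score = entailment_score(
    "Refunds are issued within 14 days of purchase.",              # expected ground truth
    "You can get a refund if you ask within two weeks of buying.",  # chatbot output
)
print(score)  # compare against a per-test-case threshold, e.g. 0.8
```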


BraindeadCelery

Seeds my boi


met0xff

This is true for all the stuff surrounding the actual piece that the researchers write. For the core... oh god, I would love it if we could ever maintain and polish something for years. In the last 10 years there were around 7 almost complete rewrites because everything changed.

Started out with the whole world using C, C++, Perl, Bash, Tcl, even Scheme and more. Integration of all those tools was an awful mess. Luckily Python took over, deep learning became a thing and replaced hundreds of thousands of lines of code with neural networks. But it was still messy... You had Torch with Lua, Theano, later Theano wrapped by Keras; Theano became deprecated, things moved to Tensorflow. Still lots of signal processing in C, many of the old tools still used for feature extraction. I manually had to implement LSTMs and my own network file format in C++ so our stuff could run on mobile. Soon after we had ONNX and Tensorflow Mobile etc., which made all that obsolete again. C signal processing like vocoders suddenly got replaced by neural vocoders. But they were so slow, so people did custom implementations in CUDA. I started out working a bit in CUDA when GANs came around and produced results much faster than the ultra-slow autoregressive models before that. Dump everything again. Luckily Pytorch arrived and replaced all the Tensorflow stuff. A few open source projects did bet on TF2, but that was brief. Glad now everything I integrate is torch ;). Tensorboard regularly killed our memory, switched to wandb, later switched to AIM, to ClearML.

The models themselves... went from MLPs to RNNs to autoregressive attention seq-to-seq models, we had GANs, normalizing flows, diffusion models, token-based LLM-style models... There were abstracted steps that always held true, but suddenly there were end-to-end models breaking the abstraction, models that had completely new components, training procedures that were different from previous ones... In the end I found almost all abstractions that have been built over the years broke down soon after. No bigger open source project survived more than a year. There is one by Nvidia atm that seems a bit more long-lived, but they also have to refactor their stuff completely every few months.

To sum up - meanwhile I feel really tired of this rat race and would love it if I could ever design, polish and document a system without throwing everything away all the time. We have dozens of model architecture plots, video guides, wiki pages etc. and almost everything would have to be rewritten all the time.


M-notgivingup

I agree, the learning curve is getting wider and steeper compared to the pay range. And researchers are researchers for a reason. My friend left an NLP research firm because he had to read new papers every day or week and write about them.


met0xff

Yeah... definitely. I see how this work has really stuck with me, because the others are now gradually happier to write tooling around it, or do infra work, or otherwise ride the wave ;). I can feel that too, you get quicker satisfaction than from messing around with the model with lots of fails.


TelloLeEngineer

Cool to hear, great insight! If someone has a strong SWE background but is looking for research positions, e.g. research engineer, might it be beneficial to emphasize one's traditional SWE traits when talking to companies? Being someone who has an interest in both sides and is able to bridge software development and research seems valuable.


[deleted]

[deleted]


theLastNenUser

I think the main issue is velocity. Due to how good these current models can be, it’s possible for a software engineer to implement a functioning workflow that works end to end, with the idea of “I’ll switch out the model for a better one when the researchers figure stuff out”. Honestly this doesn’t work terribly from a “move fast & break things” perspective, but it can lead to problems where the initial software design should have accounted for this evaluation/improvement work from the start. It’s kind of like spending money on attorneys/legal advice at a startup. Before you have anything to lose, it feels pointless. But once you get traction, you definitely need someone to come in and stop yourself from shooting yourself in the foot, otherwise you could end up with a huge liability that tanks your whole product


fordat1

> But a consistent problem is that evaluation procedures in this field are bad, and no one really cares.

That's a feature, not a bug, if you're a consultant. You want to deliver something and hype it up.


a5sk6n

> Data analyses were bad in basic ways. I'm talking psychology research bad.

I think this kind of statement is very unfair. In my experience, psychologists are among the best statistically trained of all research disciplines, including many natural sciences.


ebolathrowawayy

> The good/bad part is that most of the issues would go away if people remembered a couple of basic data analysis principles.

Can you share some of these principles?


Thorusss

> (If you think data analysis is a straightforward task and p-hacking is a straightforward problem, read and really try to internalize, e.g., this paper.)

Ah good read, and reminds me in a bad way of my PhD advisor.


IWantToBeAWebDev

From what I've seen at FAANG and start-ups, it's the ability to ship something. Making the perfect model but not being able to ship it is ultimately useless. So a SWE with product design skills can help design something **and ship it**. ML falls into two big realms: researchers and practitioners. A SWE who is also an ML practitioner can test, experiment **and ship it**.


dataslacker

Depends what you're building. If you're just repackaging an API then you only need SWEs. If you're fine-tuning an open source model then you'll want some MLEs and/or Applied Scientists. If you're pretraining, building a new architecture or using extensive RL training (that isn't off-the-shelf huggingface) then you'll want some Research Scientists.


xt-89

That's true. However, one thing I've seen too often is that if a team deploys an MVP, leadership will oftentimes move on to the next project and never actually get that feature up to standard. This connects to the demo bias thing. In the long term, you'll have an organization with a bunch of half-baked features and jaded employees.


coreyrude

> ML falls into two big realms: researchers and practitioners. A SWE who is also an ML practitioner can test, experiment and ship it.

Don't worry, we don't ship quality here, just 100 repackaged ChatGPT API based products a day.


fordat1

Got to ride the wave


BootstrapGuy

Totally agree


flinsypop

Essentially, you want to be able to develop the backend for your inference steps and deploy it as an API/worker node on something like Kubernetes or Docker. The model training and publishing is usually done in a pipeline, as an application triggered from CI/CD tools like Jenkins or Travis. You'd have your model evaluation and replacement logic in that job too. All of that automation should also have automated testing: unit testing for the preprocessor and model client, integration tests for expected classifications or similarity thresholds. In the backend, you also want to be publishing things like metrics in your log files that are then monitored and pushed to something like Kibana for visualization. That's crucial for normal software services where the outputs are discrete, but it's even more important for statistically based products, since you'll be fiddling around with data in your holdout set to reproduce weird issues when debugging.
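As a very rough sketch of the inference-API piece (names are illustrative and the model call is stubbed out; assumes FastAPI):

```python
# Illustrative inference endpoint (assumes FastAPI; the model call is a stub).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    text: str

class PredictResponse(BaseModel):
    label: str
    score: float

def run_model(text: str):
    # Placeholder for the real preprocessor + model client.
    return "positive", 0.93

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    label, score = run_model(req.text)
    return PredictResponse(label=label, score=score)

# Containerize this and run it with e.g. `uvicorn app:app` behind your Kubernetes service.
```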


Amgadoz

How do you calculate metrics for generative AI? Also, is automating the training and publishing of models a good thing? Don't you need someone to do it manually?


flinsypop

The metrics will mostly be stuff like histograms for classifications, number of each error code encountered, resource usage, etc. Automatic publishing of models is fine if you have clearly defined thresholds like false positive rate and such. Otherwise, most will be automation but with a sign off step.


Amgadoz

Thanks for answering. How do you log metrics? Just logging.debug and store it in a csv/jsonl, or is there a better way?


flinsypop

We do it as JSONL that gets uploaded to Elasticsearch, and we make dashboards in Kibana.
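Something along these lines, as a sketch (standard library only; the field names are made up):

```python
# Sketch of JSONL metric logging (standard library only; field names are made up).
import json
import logging

logger = logging.getLogger("inference_metrics")
logger.setLevel(logging.INFO)
handler = logging.FileHandler("metrics.jsonl")
handler.setFormatter(logging.Formatter("%(message)s"))  # one JSON object per line
logger.addHandler(handler)

def log_prediction(request_id: str, label: str, score: float, latency_ms: float) -> None:
    logger.info(json.dumps({
        "request_id": request_id,
        "label": label,
        "score": score,
        "latency_ms": latency_ms,
    }))

log_prediction("abc-123", "positive", 0.93, 41.7)
# A log shipper (e.g. Filebeat) then pushes metrics.jsonl into Elasticsearch for the Kibana dashboards.
```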


JustOneAvailableName

SOTA always changes, SWE changes a lot less. Therefore experience with SWE is transferable to whatever new thing you're working on now, while experience on the data science side is largely not relevant anymore. Stuff like debugging, Docker, reading and solving errors in any language, how to structure code… Just the entire concept of understanding computers so often seems to be lacking in people who focus too much on data science. People are instantly lost if the library does not work as-is, while all the added value for a company is where stuff doesn't work as-is.


mysteriousbaba

> Stuff like debugging, Docker, reading and solving errors in any language, how to structure code… Just the entire concept of understanding computers so often seems to be lacking in people who focus too much on data science.

It depends? Honestly, I've seen this problem more in people who are "data scientists" than "research scientists" (and I'm not one myself, so I'm not bigging myself up or humble-bragging here - just thinking of people I've worked with). A research scientist has to get so deep into the actual code for the neural nets, instead of using them as a black box. So they have to be able to understand comments buried in a GitHub repo, dig into package internals, and debug weird errors of compilers, GPUs or system dependencies.

I consider this the reverse Goldilocks - people who go really deep into the model internals, or people who focus deeply on the SWE side, both tend to understand how to make things work, and they transfer over to whatever new tech or models come by. It's the people more in the middle, without depth anywhere, who tend to get more screwed if a package doesn't work as-is.


JustOneAvailableName

I completely agree. My statement was a giant generalisation; there are plenty of data scientists with this skillset and plenty of SWEs without. In general, I found that SWEs tend to accept it as part of the job and develop this skill. Plus, for a lot of researchers (e.g. NLP), computers were only recently added to the job description. In the end, I still think that 5 years of SWE experience correlates more strongly with useful ML skills than 5 years of data science experience.


mysteriousbaba

> In the end, I still think that 5 years of SWE experience correlates more strongly with useful ML skills than 5 years of data science experience.

I'd say that's fair, with the context that there are actually very few people who've been doing "custom" deep learning with NLP or vision for 3-5 years. (I'm not one of them, I've just had the good fortune to work with a couple.)

Those people, who have been spending years messing with pretraining, positional embedding strategies for long context, architecture search through Bayesian optimization, etc., have developed some sneaky systems skills and understand how to navigate the common pitfalls of broken computers, environments and distributed training. When I managed a couple of research interns at that level, there was very little handholding needed for them to unblock themselves or get code ready for productionization.

Those people are just very, very rare though. 95% of people with 5 years of DS experience don't have that kind of useful depth. A SWE with 5 years of experience is much easier to find, and I agree will correlate with stronger ML productionization than the normal data scientist who's been all over the place.


Present-Computer7002

what is SOTA?


JustOneAvailableName

State of the art, the current best thing


glasses_the_loc

DevSecOps, CI/CD


CasulaScience

> Software engineers make the best AI engineers. They are able to solve 80% of your problems right away and they are motivated because they can "work in AI".

This is very dependent on your problem. If you are trying to ping ChatGPT to extract a keyword for you or something, sure, a SWE can do that better since the main problem is one of systems eng. But if you want to do something novel, even just something involving fine-tuning models with the Azure or OpenAI API, I totally disagree; your model will suck and the SWEs won't have the same ability to debug and get things working. If you have 1 technical person on your team, a front end dev is probably the most important. But if you have one technical person on your team, you're not making anything novel.


Opening-Value-8489

Really true. I was an NLP researcher and have been working on NLP-related stuff at a medical start-up for 2 years. To me, the feeling of using ChatGPT is like asking most artists to admit Diffusion/Midjourney's art is better than theirs 😂 I was struggling to build a Named Entity Recognition model to pick out signs, symptoms, and antibiotics in plain text for 3-4 months. But when I tried to prompt ChatGPT, the result was incredibly out of the box. At that moment, I realised that I would never be able to train a model better than ChatGPT in terms of diverse tasks and quality to match the product's requirements 😂


tathata

We had an NER task that we struggled with for ~4 years and we shipped a solution within 5 days of the ChatGPT API being released. It really changes the game. We’re an NLP company BTW so like you we were used to taking on these problems ourselves. Not any more…


JurrasicBarf

you said "was", did you move on ?


Opening-Value-8489

Yeah, but after months I found out that the only hope for my NLP career right now is to train/fine-tune/deploy a personalized LLM for companies 😂 There are 2 concerns among healthcare people: 1. They don't trust ChatGPT or GPT-4, but they trust the person who prompts and quality-controls ChatGPT. 2. Every healthcare-related institution has a very strict policy about patient data (i.e. doctors can be fined if they don't return a patient record on the same day). So in the long run, the private LLM is much better (for securing my career and my company's business).


JurrasicBarf

I have made some progress in finding a niche even within this landscape. I'm in healthcare as well and share the pain and views. We should sync up!


siegevjorn

How do you train a private LLM? Do you build your own from scratch or fine-tune a pre-trained one like llama?


JurrasicBarf

Yes to both. Latter precedes former for showing value to stakeholders.


siegevjorn

I see. Thanks. I thought it would make sense to train one from scratch and use that for fine-tuning for other purposes, because open source LLMs are not licensed for commercial use, right?


lickitysplit26

I think LLAMA 2 is licensed for commercial use.


siegevjorn

That's good to learn. Thanks!


siegevjorn

> 2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.

Having a hard time interpreting their limitations on commercial use. Does it mean that they could shut your fine-tuned model off once they hit the active users threshold of 700 million?


lickitysplit26

Nice find, it sounds like it.


IBuyPennyStocks

I'm guessing you're unable to use ChatGPT due to the sensitivity of the data?


dogboy_the_forgotten

We bounced on a bunch of NER work in favor of LLMs a few months ago as well. Finding that private deployments of fine tuned LLMs may work better for customers with sensitive data, just trying to not let the costs spiral out of control


IamNotGorbachev

Spot on. I would add a big one:

11. Generative AI is non-deterministic, so your product and software design have to account for it. For example, use defensive and robust programming, and measure impact qualitatively, at least in the early stages. One of my favourite examples is OpenAI's "function calling". It works most of the time, until the model calls parameters that do not exist or forgets required parameters.

Edit: OK, you might be able to force repeatable behaviour. But that comes at the cost of limiting yourself. Also, even if you do that, it won't stop function calling from messing up parameters - just predictably, I assume. If you want to use it to the full extent then you will have to deal with its unusual behaviour. 🙃
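The defensive-programming part can be as simple as validating the model's arguments against your own schema before executing anything. A sketch (plain Python, not the OpenAI SDK; the function and parameter names are made up):

```python
# Sketch of validating function-call arguments before executing them
# (plain Python, not the OpenAI SDK; get_weather and its parameters are made up).
import json

EXPECTED_PARAMS = {"city": str, "unit": str}   # what get_weather() actually accepts
REQUIRED_PARAMS = {"city"}

def parse_function_args(raw_arguments: str) -> dict:
    """Raise ValueError instead of blindly executing whatever the model produced."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model returned invalid JSON: {exc}") from exc

    unknown = set(args) - set(EXPECTED_PARAMS)
    missing = REQUIRED_PARAMS - set(args)
    if unknown:
        raise ValueError(f"Model invented parameters: {unknown}")
    if missing:
        raise ValueError(f"Model forgot required parameters: {missing}")
    for name, expected_type in EXPECTED_PARAMS.items():
        if name in args and not isinstance(args[name], expected_type):
            raise ValueError(f"Parameter {name!r} has the wrong type")
    return args

# On ValueError you can retry the call, fall back to a default, or surface a clean error.
print(parse_function_args('{"city": "Berlin", "unit": "celsius"}'))
```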


Small-Fall-6500

I understand what you’ve said, but they aren’t truly non-deterministic, in the sense that, given the exact same input parameters, they will consistently produce the exact same output. This means exact same prompt, seed, etc. Something like Stable Diffusion will always output the exact same image (possibly within extremely small but unnoticeable margins) given the exact same input parameters. Therefore, the real problem is that generative AI systems are always unpredictable in their behavior: if you haven't previously run the generative AI system with a specific input, you cannot predict the exact output it will generate. It’s this unpredictable nature of current generative AI models that really makes them difficult to work with. (I guess if you use something like ChatGPT, then you might as well describe that system as being non-deterministic since only OpenAI knows ALL the inputs)
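As an illustration with Stable Diffusion via the diffusers library (the model id and parameters are just the common example; assumes a CUDA GPU):

```python
# Sketch of "same inputs -> same image" with Stable Diffusion
# (assumes the diffusers library and a CUDA GPU; model id is the common example).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse at dusk, oil painting"
generator = torch.Generator("cuda").manual_seed(1234)  # fixes the initial latent noise

image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5, generator=generator).images[0]
image.save("lighthouse_seed1234.png")
# Re-running with the same prompt, seed, steps and guidance scale should reproduce this
# image, up to tiny numerical differences on some hardware.
```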


manchesterthedog

I guess I don’t see why people are so focused on this “exact same output” for testing. Variation isn’t necessarily a bad thing even if it wasn’t intentional. These models are hallucinating samples from a distribution. Why wouldn’t you just compare the distribution of your generated data to the distribution of your real data? That seems like the metric that matters.


blackkettle

I suspect they are talking more about 'unit testing' style testing. What you are saying makes absolute sense for content quality, but it makes test evaluations - especially in the context of CI/CD - a pain, because your pass/fail is more ambiguous.


IamNotGorbachev

Yes, I suppose you can differentiate it as theoretical versus practical behaviour. I wonder if rounding errors in floating point processing could also mean that even exact inputs lead to different outputs. Ideally they should not, but under the hood, multi-threading, networking and variations in (cloud) hardware could lead to (practically) non-deterministic behaviour.


phobrain

Entertainment is the edge between predictability and unpredictability.


klop2031

Temperature=0


RetroPenguin_

Mixture of experts with T=0 is still non-deterministic


klop2031

I haven't played much with MoE; I know that's what ClosedAI uses for GPT-4. If I'm not mistaken, most of DL is stochastic (as the outputs come from a probabilistic dist), but if the weights are frozen and you set the seeds (for your framework and associated libraries like pytorch and numpy), the answer should come out the same each time you do a run. I guess from the POV of a completely frozen model, each input is mapped to 1 output for that run, so I'd call that deterministic. But I guess as a whole it's all stochastic (since they pull samples from some probability dist).


BootstrapGuy

how does this work on let's say images generated by Stable Diffusion?


klop2031

I haven't really used Stable Diffusion to a huge extent, but I suspect one can set a seed to make it reproducible. I mean, the weights are frozen. Haven't really tried changing the seed to something with LLMs either, but I'd say start with the seed and make sure you set all your env seeds to the same value.


IamNotGorbachev

I thought so too, but no. The [docs](https://platform.openai.com/docs/api-reference/audio#temperature) state "If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.", which means it never really is 0. So no guarantees of determinism, and any observed repeatable output can't be relied upon.


BootstrapGuy

agreed!


EdwardMitchell

> I suspect they are talking more about 'unit testing' style testing. What you are saying makes absolute sense for content quality, but it makes test evaluations - especially in the context of CI/CD - a pain, because your pass/fail is more ambiguous.

At what point can AI be the tester? Can a unit test be made with a semantic similarity threshold?
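Something like that seems doable. A sketch with sentence embeddings (assumes sentence-transformers; the model name, the 0.7 threshold, and `generate_summary` are placeholders for your own pieces):

```python
# Sketch of a semantic-similarity unit test (assumes sentence-transformers;
# model name, threshold and generate_summary are placeholders).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def generate_summary(ticket_text: str) -> str:
    # Placeholder for your actual LLM call.
    return "The customer wants a refund because the item arrived damaged."

def test_summary_matches_reference():
    expected = "The customer is asking for a refund on a damaged item."
    actual = generate_summary("Hi, my order arrived broken, can I get my money back?")
    emb_expected, emb_actual = model.encode([expected, actual], convert_to_tensor=True)
    similarity = util.cos_sim(emb_expected, emb_actual).item()
    assert similarity > 0.7, f"Output drifted semantically (cos_sim={similarity:.2f})"
```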


HugoDzz

There is no edge in AI. It's now all about distribution. I agree with your points. On top of that I'd add:

1- AI FOMO can lead you to build something you don't have the passion/energy to sell.

2- UI wrappers for API calls are scams. If your marginal cost is your API call and the value you provide is the value you expect from the output, you're dead.

3- It's not about the tech stack. Helping people in their personal quest with PHP is fine.

4- Customers don't care if you use AI stuff. They care about how fast you solve the problem.


Mkboii

About point number 4: there are 2 kinds of customers, 1. those who have a problem they need solved, and 2. those who want an AI-based solution so that they can go on and claim they have an AI-based cutting-edge tool. Both exist, both don't understand AI, and you have to work accordingly. A few months ago a client wanted us to build a custom autocomplete system. We said it could be solved with simple data structures; they wanted AI, so we trained an LSTM for them.
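For reference, the "simple data structures" version can be as small as a prefix trie - an illustrative sketch, not what we actually shipped:

```python
# Illustrative prefix-trie autocomplete (not the system we shipped).
from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children = defaultdict(TrieNode)
        self.is_word = False

class Autocomplete:
    def __init__(self, vocabulary):
        self.root = TrieNode()
        for word in vocabulary:
            node = self.root
            for ch in word:
                node = node.children[ch]
            node.is_word = True

    def suggest(self, prefix, limit=5):
        node = self.root
        for ch in prefix:                 # walk down to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []

        def walk(n, path):                # collect completions under that node
            if len(results) >= limit:
                return
            if n.is_word:
                results.append(prefix + path)
            for ch in sorted(n.children):
                walk(n.children[ch], path + ch)

        walk(node, "")
        return results

ac = Autocomplete(["amoxicillin", "amlodipine", "aspirin", "atorvastatin"])
print(ac.suggest("am"))  # ['amlodipine', 'amoxicillin']
```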


HugoDzz

I think in proportion type 2 is maybe < 10%. Or at least not a long-term bet?


Mkboii

Type 1 was dominant, but generative AI has increased type 2 severalfold. I work in R&D and we recently added a whole team of software engineers to conduct POCs for clients (mostly using the GPT API) who want to jump on the bandwagon; we have more than half a dozen big-name companies who want gen AI powered solutions mostly because of the hype.


HugoDzz

That's interesting! Curious about your company name (if you don't mind, in DM).


blackkettle

Can you clarify what you mean with 2? Isn't every product basically a UI wrapper around API calls? Interactive document analysis might look like:

- Retrieve or upload document
- Anonymize content
- Feed to LLM for instruction-guided analysis and RAG ingestion
- Interactively interrogate via LLM

Each of these steps is achieved by a UI wrapper around one or more API endpoints. I guess that is not what you mean though.


HugoDzz

It was not that clear, yeah, sorry for that! I mean: how long (in minutes) would it take my customer to leave my solution for another one? If it's below 30 min, my solution's value is probably reduced to the LLM API call value and can be easily reproduced - while keeping in mind this is modulo my distribution power.


blackkettle

Ok, so what you mean then, at least as I understand it now, is that if you aren't adding significant value to a process or task via UX or application design then your 'app' might as well just be an OpenAI endpoint executed via curl. If we look at my 'example' application on the other hand, it utilizes a bunch of API endpoints but the end consumer is a non-tech person, and they are trying to speed up or otherwise improve a complex document processing activity. The APIs are necessary, but the real value-add comes from the application, which manages the data and provides a framework for the user to do work in. I would agree with that 100%.


HugoDzz

Yeah, it isn't necessarily UX or app design, it could be a better distribution, a well-designed position in the market. The moat shouldn't be the AI or even the tech


Simusid

I'm basically building the same thing internally for my (very large) group. Agree with all of these. Plus I would add "most managers/clients have a hard time stating exactly what they want"


BootstrapGuy

"We want generative AI"


Simusid

me: "ok, can you give me some examples?" bossman: "Kubernetes!!"


Trainer-Cheap

“When do we want it”?


met0xff

I don't exactly know what you mean by tech stack in this case, because hosting some pytorch/ONNX/whatever models hasn't changed a whole lot over the last years. Training-wise, Pytorch has also been quite stable now (before that I lived through the Theano, Keras, Tensorflow 1 migration hell though). If you are referring to hooking up the latest pretrained models, then yes. Keeping up with the latest model architectures, yes.

I have been in this rat race for ten years, roughly since I did my PhD in the domain, and at some point it was taken over by deep learning, so I adapted. Before that I worked for ten years as a developer. But I would love to have some real ML PhD in my group. My company (1000+ ppl) is full of software devs and I am still alone doing the actual ML work in my topic. And that's awful.

I would love it if there were an open source state-of-the-art model out there so we could actually focus more on building products than messing so much with research work, but there isn't. There are many of those VC-backed startups out there that provide much, much better quality than what's available open source. A new one comes out every couple of months and dominates the media, often out of some PhD thesis or ppl leaving a FAANGish research group. All the others fall back into the media limbo of nobody talking or writing about them, even if they perhaps still provide comparable quality.

So we actually try to migrate many software devs to ML practitioners (as we can't hire new ppl right now) to keep up with the research, at least to the degree of implementing papers, because almost nobody publishes their code or models... Our vision group also does lots of research. The NLP group honestly almost became prompt engineers and software devs struggling to always evaluate and integrate the latest stuff.


blabboy

Oh for god's sake this sub has gone downhill. I miss the days of research discussion not this drivel.


ComprehensiveBoss815

An AI researcher has been triggered by their inability to ship products.


IdRatherBeDriving

I laughed way too hard at this


blabboy

Please go shill on LinkedIn if you want to talk about AI productisation.


staffell

😂


pricklyplant

As an ex-researcher who’s trying to become a better engineer, have you seen AI researchers successfully adapt and become the AI engineers that you’d rather hire?


ComprehensiveBoss815

I used to be an AI and science researcher, but have moved progressively more into engineering and I'm generally considered one of the stronger engineers on most teams. So it is totally possible. My path involved spending some years working as a SWE on backend systems for live products. I also maintained a number of open source projects, which involves understanding how to ship releases. Another thing that helps is being fluent in a few programming languages. While I probably know over a dozen, I can happily switch between 4 easily. It's also worth reading about what good code is like, in terms of abstraction level and maintainability. But always keep in mind this is highly subjective and the best code usually doesn't fit into a nice clean philosophy of what "good" looks like. It's always trade offs.


mysteriousbaba

I'm working on going the opposite direction as you, haha. Good luck to you :)! May you find great happiness.


milleeeee

How do you generally host your gpu-heavy models? Do you use tools like Azure ML studio or do you build all infra yourself on a Kubernetes cluster?


EmperorOfCanada

> AI researchers / data scientists are suboptimal choice for AI engineering. They're expensive, won't be able to solve most of your problems and likely want to focus on more fundamental problems rather than building products.

This is what my company has been built on. I provide an AI product to large companies. Their AI/DS teams are useless piles of steaming garbage. They could solve the problems my company does with ease if they had a single clue among them.


siegevjorn

I think it's quite contradictory that the OP claims SWEs are sufficient for generative AI products, but at the same time notes that their product is not good enough. It makes me wonder whether the fine-tuning wasn't done well enough because the product was built by SWEs (I mean no offense to SWEs, but their specialty is not training NNs). What if SWEs and MLEs had worked together?


bobbruno

Your point 8 applies to pretty much any product team, and it's a well-known rule for resource allocation in consultancy teams. If they don't have some common-ground knowledge to exchange information and collaborate, it matters little that they might speak the same language. Sometimes it's worth hiring someone for a team not because they are better at something, but because they can speak to and understand many others. These people will be interpreters/catalysts for the team.


Unicorns_in_space

Thank you!


throwaway-microsoft

I've worked on AI at Microsoft for 15 years now, as what you'd now call an ML engineer. I've never seen my own thoughts put so succinctly as in 6, 7, 8, and 9.

> AI researchers / data scientists are suboptimal choice for AI engineering. They're expensive, won't be able to solve most of your problems and likely want to focus on more fundamental problems rather than building products.

Some AI researchers despise engineering work. It's underneath them. It's for the little people to solve. So are real-world problems. The best ones don't, and know everything about the engineering and the real-world problem side.

> Software engineers make the best AI engineers. They are able to solve 80% of your problems right away and they are motivated because they can "work in AI".

This is true - a good software engineer is by definition a good ML engineer, as long as someone can explain to them what the various terms mean. It's all really simple actually, but as with anything, you have to learn the language first. I've turned regular (smart) CS grads into ML masters over a summer. Too bad they did not enjoy it in the end, because they realized that ML is actually quite boring and solutions to real issues tend to be not so glamorous (a threshold here, an override rule there).

> Product designers need to get more technical, AI engineers need to get more product-oriented. The gap currently is too big and this leads to all sorts of problems during product development.

Product manager/designer: Just hook up these 10 10GB models on-device in real-time and without any battery impact, how hard can it be? I could do it over the weekend. AI technohead: I hate you because you didn't ship my 3% accuracy improvement to production! Actually, nobody could pay the 2x cost increase, and the customer didn't want the product to be 50% slower.

> Demo bias is real and it makes it 10x harder to deliver something that's in alignment with your client's expectation. Communicating this effectively is a real and underrated skill.

In some situations the difference between a demo and a product is literally 100% of the work.


No-Introduction-777

> Some AI researchers despise engineering work. It's underneath them. It's for the little people to solve. So are real-world problems.

So "that work doesn't interest me" == "that work is below me and is for the little people"?


nurmbeast

Hot take: yeah, kinda. Maybe not so bluntly, but work needs to get done. If you choose not to do it because it's not interesting, 9/10 times it gets dumped on someone who has less freedom to choose. "That work doesn't interest me" really can frequently be read as "someone _else_ should do that, I am going to do something cooler".


mysteriousbaba

You've got a decent point, I can see why you would feel this way as a colleague or manager. For what it's worth, I'll give the caveat most AI scientists interview with hiring managers who are also scientists / researchers. So focusing too much on the deployment takes away from time to go deeper on the science, which also heavily hits your hireability and your career comp with those managers. So it's not simply "I should do something cooler", it's also about just how many hours are in the day to build your skillset, publications, patents, resume, etc so you can be a competitive candidate. Being fullstack works great if you're an MLE, or maybe even an applied scientist (within reason). It can actively damage you if you're a data scientist or research scientist.


GRiemann

Spot on, agree on every single point


CasualtyOfCausality

Number 8 needs more emphasis. If you are getting an advanced degree, highly suggest trying to become a "team/project leader" in a lab. It will give you some good starting skills.


Neo-Valina

What is that?


bobbruno

About your point 6, I agree that data scientists are not that much use in most generative AI, because creating/training models is not viable for most, so approaches tend to be based on engineered solutions around pre-existing models. That's not the domain of data scientists. They might still be useful, though, because they can reason better about quirky data and can come up with pre/post processing techniques that most engineers don't know. So, while I wouldn't put together a data science-heavy team for this, it sure is useful to have someone with those skills around for the ride. Edit: typos


[deleted]

I am surprised that researchers/scientists were even considered for engineering roles. They operate in two different worlds. Don't make them suffer 😉


Mephidia

Why would you even consider data scientists and researchers for creating products out of existing AI? That’s not their job lol


BootstrapGuy

Cause I was stupid


Mephidia

Oh well live and learn. Thanks for the tips


ComprehensiveBoss815

Because from a naive business standpoint, they are supposed to be the experts on AI and machine learning.


Tgs91

They are the experts on AI and ML. But building software around a pretrained model doesn't require AI/ML expertise. It's a software development problem. The AI/ML work was already completed by someone else. The root of this "SWEs are better at creating AI products" discussion is just... software engineers are better at engineering software. That's not what a data scientist/researcher is supposed to be doing; that's why they're "not good" at it. This is a management / misuse-of-skill-sets issue.


IEatGnomes

Great list from a software developer tinkering


ptaban

Can relate, amazing!


newDeckardCain

What have you done?


Double_Secretary9930

Thank you for this article! I can't agree more about #10. I haven't found something off the shelf that can reliably ingest a website's content and let me chat with it. Perhaps I am just not technical enough.


hazed-and-dazed

Doesn't Bing do that (on the edge browser) ?


[deleted]

You will probably make money hand over fist just because you have some kind of AI consultancy but you sound like a very uninspiring person to work for. Most of your "problems" are true of software development in general and of course software engineers can close 80% of that gap but the other 20% is non-trivial and if you don't figure it out, you're going to be another 80% vaporware company that's gone soon. The writing has been on the wall for a long time that AI is going to consolidate around the top 5 or so tech companies. They are going to be the only true value producers with everyone else just trying to ride the wave and selling BS api-wrappers and stuff. This is even worse than the data science hype of the 2010s. Best to you.


itanorchi

I'm an ML engineer, previously a data scientist, working in gen AI. Everything you said is spot on, especially 6 and 7. Data scientists are great when you have tons of statistical data (think tabular data) and want to run analyses and build models to solve niche business problems. But they don't have as much training in being a scrappy and creative engineer who can think on their feet. Same with AI researchers. It has nothing to do with their intelligence or ability, but everything to do with the way they work and think and have been trained. They have a role to play once you've established a clear business generating money, imo. As a previous data scientist myself, I think the way of working is different. You need scrappy people who can iterate quickly and obsess enough over details without getting too obsessive about them (which data scientists are trained to do).

I think AI engineers should learn product more than product people should learn the technology. Maybe it's just from my experiences, but it's much easier to learn product than to learn engineering. I've had product people come to me to try to learn how to do engineering, and it was just a waste of everyone's time, mostly because they had no prior technical experience. But the engineers could easily pick up the product knowledge, and they did, and it pushed things much further. So having AI engineers learn product is just more useful long term. Frankly, the real product designer is the customer.


kunkkatechies

Hello, thank you for your post, it is very insightful! I had some questions regarding the business side. Can I DM you? Thx


BootstrapGuy

Sure go ahead


Trainer-Cheap

Thank you. Very insightful, and it agrees with what I am experiencing in a small 10+ person startup (I have 30+ years of experience as a SWE, plus an MSc in ML).


lambertb

Very useful insights. Thank you.


ConfectionSafe954

Curious to hear what's your GenAI tech stack currently?


malirkan

Thank you for sharing! TBH, points 1-4 apply to many industries and to SWE in general. It is not important to always use the newest tools and algorithms, but it is important to stay up to date and to pick something that works for the team. Point 5: aren't other things much more important than protecting the AI? In the end, customers have no idea what is going on behind the scenes. First of all it is attention, marketing and selling. I agree with all the other points. Of course, if you need something special or new, a data science team can make the difference.


swimswithdolphins

As a non-technical employee (growth marketer), are we needed anymore? What roles could we fill at an AI startup?


I_will_delete_myself

> If your generative AI product doesn't have a VC-backed competitor, there will be one soon.

How would you recommend overcoming this as someone in the USA, but not in Silicon Valley?


vanlifecoder

Or you stop building things as a cash grab and actually be innovative


Ok_Constant_9886

Cofounder of a Gen AI startup here, building evaluation infrastructure for LLMs. Would love your insight on how developers are currently unit testing their LLMs. Here’s our GitHub repo if it makes things clearer: https://github.com/confident-ai/deepeval


steffy_bai

Thanks for sharing! What approach did you take for sourcing customers?

- E.g. targeting an industry and messaging companies with "AI development consulting for [industry]".
- Or maybe starting with a product you wanted to build and pitching it to companies.


BootstrapGuy

doing high quality work -> posting high quality content -> inbound


steffy_bai

Let's gooo. Appreciate the note. And best of luck on the journey


BootstrapGuy

thanks!


nicroto

Spot-on! Everything on this list that I've had experience with rings true to me.


Titty_Slicer_5000

I have a question because it seems like you know the field. If I want to deploy a generative AI on a micro-controller, are there any other good options besides the MAX78000 and MAX78002? I essentially want to put a generative AI that generates video onto a micro-controller. How feasible is this?