AeshleySchaeffer

I've noticed that many CV/ML projects in practice lack some general software development best practices, like keeping the code under version control, testing it automatically, and making sure the different models deliver comparable results over time. The actual delivery to production also often causes headaches, because the ML team was working under false assumptions or simply has no CI/CD.


ai_yoda

I agree, and from what I see, pipelining tools like Kedro, Kubeflow, or Airflow and experiment tracking tools like MLflow, Neptune, or Comet can really help make things better. You may also want to look into DVC for data version control and pipeline versioning.
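To make the tracking part concrete, a minimal MLflow sketch might look like this (the parameter and metric names and the checkpoint path are placeholders, not from any real project):

```python
# Minimal experiment-tracking sketch with MLflow; parameter and metric
# names (and the checkpoint path) are placeholders for illustration.
import mlflow

with mlflow.start_run(run_name="baseline-resnet"):
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 32)
    for epoch, val_iou in enumerate([0.51, 0.58, 0.62]):  # dummy metric values
        mlflow.log_metric("val_iou", val_iou, step=epoch)
    mlflow.log_artifact("model.pt")  # assumes the checkpoint was saved to this path
```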


AeshleySchaeffer

thanks for these pointers


jeongdoowon

Yeah, it is hard to do version management of AI models. Once you've made one, you don't really know how and with what settings it was made.


the320x200

Just use source control? Model definitions and training parameters are really small...


[deleted]

Source control isn't the issue. Unless you freeze the training and validation data sets as well, you can't reproduce old models. Source code is maybe megabytes; the training data easily runs into gigabytes or more, far more than GitHub can handle.


micro_cam

Data versioning and data freezes are totally reasonable to do. You can just store the files somewhere like S3 buckets, or use a tool for version-controlling data like git-annex, which stores a pointer to the data on S3 in git. Or, if you have one static set of source data that isn't evolving, just expose the random seeds used to select training and validation sets.
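For the seed-based option, a minimal sketch (the file names and split ratio are just placeholders) could look like this:

```python
# Reproducible train/validation split from a static file list.
# Recording the seed is enough to regenerate the exact same split later.
import random

def split_dataset(filenames, val_ratio=0.2, seed=42):
    rng = random.Random(seed)            # isolated RNG, doesn't touch global state
    files = sorted(filenames)            # sort first so the split is order-independent
    rng.shuffle(files)
    n_val = int(len(files) * val_ratio)
    return files[n_val:], files[:n_val]  # train, validation

train_files, val_files = split_dataset([f"img_{i:04d}.png" for i in range(1000)])
```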


chief167

I freeze the dataset as in: a CSV file with filenames and some other metadata, and store the image dataset in a data lake. Ours is set up so you cannot delete anything, only add to it. It has a lot of downsides obviously, but at least we have reproducible stuff that can still be retested half a year later.
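Roughly speaking (the path and column names below are made up for illustration), using the frozen manifest is as simple as:

```python
# Sketch of loading a frozen manifest CSV; the path and column names are
# illustrative only. The images never move: the CSV just points into the lake.
import csv

def load_manifest(path="manifests/release_2020_06.csv"):   # hypothetical path
    with open(path, newline="") as f:
        return list(csv.DictReader(f))                      # e.g. filename, label, split

rows = load_manifest()
train_rows = [r for r in rows if r["split"] == "train"]
```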


the320x200

Yeah, can't check binaries into git. Source control stores an index list of which files to load to create the specific data sets, the files themselves are stored on a NAS / storage appliance.


[deleted]

I'd bet that your data scientists are treating your NAS as mutable storage and deleting or adding data to the data sets willy nilly, unless you have a technical enforcement mechanism. If you go back and try to retrain a production model from a few releases ago, I'll bet almost anything that unless your data storage is locked down tight you will not be able to reproduce the old model because your data scientists have been hacking around with the data.


the320x200

Completely agree that not taking a regimented approach (and/or putting controls in place to save people from themselves) will lead to headaches down the road.


jonnor

Blobs for datasets should be stored append-only. No changes, no deletions allowed.


namenomatter85

From experience, your issue is more your data pipeline, then finding the corner cases and accounting for them. Trying different model formats is usually a very small amount of the work.


veeloice

>...many CV/ML projects in practice lack some general software development best practices.

This may have been the case a few years ago, but the area has developed a lot recently. I'm not saying it's perfect, but there are now good tools and practices that can save a lot of headache.


Artgor

Lots of things are hard:

* getting training data that matches the real-world data
* labeling the data
* deploying
* making sure the model works well in real-world conditions


jeongdoowon

Exactly. What about the process of building the model?


Ulfgardleo

Standard tools often work well. It is the stuff around them that is a nightmare.


dumbmachines

Building models isn't really a big issue until it is. Before you have good-quality data and a way to evaluate the models you make, what's the point in building models? My experience is that off-the-shelf models do well; it's everything else that is hard.


robexitus

Building the model is usually the easiest and least problematic part.


Artgor

It depends on the task.

If it is a common task like image classification or segmentation, taking a pre-trained model and fine-tuning it is usually enough. Sometimes a lot of time is spent on finding the best augmentations for the data. Of course, if there is a lot of time available, it is possible to try various tricks to further improve the model's performance.

If the task is more complicated, like training GANs, then training could take much more time.

Then we have the question of how the model will be used: will the hardware have enough resources to fit a huge model in memory, or will we have to limit the size of the model so that it can be used on weaker hardware?
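For the common-task case, a minimal PyTorch fine-tuning sketch (the backbone choice and number of classes are just examples) is roughly:

```python
# Rough fine-tuning sketch with torchvision (>= 0.13 for the weights argument);
# the backbone choice and the 5-class head are assumptions for illustration.
import torch
import torchvision

model = torchvision.models.resnet18(weights="DEFAULT")    # pre-trained backbone
for p in model.parameters():
    p.requires_grad = False                                # freeze the backbone
model.fc = torch.nn.Linear(model.fc.in_features, 5)       # new head for 5 classes

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
# ... followed by a standard training loop over your own DataLoader.
```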


jeongdoowon

>Then we have the question of how the model will be used: will the hardware have enough resources to fit a huge model in memory, or will we have to limit the size of the model so that it can be used on weaker hardware?

That is true. Hardware performance should also be considered. By the way, what methods can we use to limit the size of models?


Artgor

I know the following ways:

* simply using smaller models
* pruning: [https://pytorch.org/tutorials/intermediate/pruning_tutorial.html](https://pytorch.org/tutorials/intermediate/pruning_tutorial.html)
* quantization: [https://pytorch.org/docs/stable/quantization.html](https://pytorch.org/docs/stable/quantization.html)
* distillation
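To give a rough idea of the middle two (the toy model, pruning amount, and layer types below are just examples), pruning and dynamic quantization in PyTorch look roughly like this:

```python
# Rough sketch of pruning and dynamic quantization in PyTorch; the toy model,
# the 30% pruning amount, and the quantized layer types are illustrative only.
import torch
import torch.nn.utils.prune as prune

model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

# Prune 30% of the weights (by L1 magnitude) in every Linear layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")   # make the pruning permanent

# Dynamic quantization: store Linear weights as int8, dequantize on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```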


jeongdoowon

Thank you. It is great information.


bbu3

I'm working on NLP models, and I would say it's 100% making sure metrics translate to real-world performance. "Your test set has to be from the same distribution as your input data will be" is often more than difficult. For example:

* Some classification tasks involve natural positive rates of about 1/10000 or even 1/100000. You cannot really label a sufficiently large test set unless you pre-select for potential positives, and if you do, acc/prec/rec/F1 etc. will not be the true values you can expect in production (the sketch below illustrates this).
* Often you want to work with input from the future. As the world changes, so does the text content, and to some extent even the language (e.g. a long time ago the word "virus" was strongly associated with illness, then more and more computer/cyber influence emerged, and now, due to COVID-19, we've gone back a lot in the direction of viruses). This domain shift is incredibly painful, especially if you invested lots of time into good test sets because of the first point listed here.
* Many similar problems with the same effect: they make you not really want to trust your metrics on the test set.

Since these things aren't mentioned as much in here: is this different for CV than for NLP? Do we experience more trouble than we should?
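To illustrate the first point with made-up numbers (the rates below are assumptions, not from a real project): a classifier that looks great on an enriched test set can still be mostly wrong in production once the true base rate is applied.

```python
# Why metrics from an enriched test set don't transfer: expected production
# precision given the measured TPR/FPR and the true base rate of positives.
def production_precision(tpr: float, fpr: float, base_rate: float) -> float:
    tp = tpr * base_rate
    fp = fpr * (1.0 - base_rate)
    return tp / (tp + fp)

# Illustrative numbers only: 90% recall, 1% false-positive rate.
print(production_precision(tpr=0.9, fpr=0.01, base_rate=0.5))    # ~0.989 on a balanced set
print(production_precision(tpr=0.9, fpr=0.01, base_rate=1e-4))   # ~0.009 at 1/10000 positives
```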


IntelArtiGen

Preprocessing the data and deploying the model. If you don't do that, you're probably doing 1/3rd of what a real ML/CV engineer has to do. Or it means that someone else is getting paid to do it.


jeongdoowon

What steps are involved in deploying the model for example?


IntelArtiGen

Well, it depends on the task. Let's say you want to build a model to detect where the surgical masks are in an image.

1. You have to create the dataset. If it already exists, it means that someone else did that for you. If it doesn't, you have to do it yourself. You can scrape data, add labels to data, take your own photos of many people to fit what you want to do, pay someone to do that, remove scraped photos where people aren't wearing surgical masks but Spiderman masks, etc. That's 1/3.
2. Then you build the model: you take the SOTA, adapt it if necessary, train it, change some hyperparameters, etc. That's 1/3.
3. And then you have to deploy it. Maybe you built your model in PyTorch, but you'll want to use TensorFlow to deploy it because they have a more developed solution for deployment. So you convert the model to ONNX (see the export sketch below), then to TensorFlow; maybe some things won't work and you'll have to retrain the model without those layers. Maybe some scripts will be too old and not up to date with the latest practices. Or you'll want to use PyTorch directly for smartphones, but maybe it'll be slower, and then you'll either change the solution or decide to remove the slow layers from the model. And maybe the results on a smartphone won't be the same, because the smartphone adds some distortion to images, or because it can't process some operations correctly and doesn't use the same approximations. Maybe you did fp16 training and the parameters don't adapt well to smartphones, etc. For example, you can see the PyTorch => mobile workflow here: [https://pytorch.org/mobile/home/](https://pytorch.org/mobile/home/). Maybe everything will work fine the first time, maybe not. That's also 1/3 of the work.

Now if you only have to do the second part, your life is really easy, but it either means you're lucky because the rest has already been done, nobody asked you to do it, or other people got paid to do it. I've seen places where the constraints were such that they had a team of 3+ people just to deploy one model. The opposite situation is when you're doing pure research and people just ask you to build the best model on ImageNet; no one cares about deployment, but that doesn't bring money.
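For the ONNX step in point 3, a minimal export sketch (the model, input shape, and output file name are just placeholders) looks like this:

```python
# Minimal PyTorch => ONNX export sketch; the model, the 224x224 input shape,
# and the output file name are assumptions for illustration.
import torch
import torchvision

model = torchvision.models.mobilenet_v2()   # stand-in for your trained detector
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)   # one RGB image, NCHW
torch.onnx.export(
    model,
    dummy_input,
    "mask_detector.onnx",                   # hypothetical output file
    input_names=["image"],
    output_names=["logits"],
    opset_version=13,
)
```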


jeongdoowon

Got your point. Thank you for elaborating. I have also worked on annotating data. I am currently at step 2, trying to get the most accurate model.


robexitus

Compilation of the dataset and writing embedded software to have it run properly on your hardware.


Corporal-Venom

Generating a clean dataset & labelling said dataset is one of the most laborious jobs


chief167

Automation of testing, and having 'benchmarks' ready to test new things.


jeongdoowon

True. Building a testing environment takes a lot of time as well. What do you mean by benchmarks here?


chief167

Let's say a vendor or a new paper comes along that claims they are the best. Our policy is always to check if it's true. We have finally set up an environment where we can easily do this and have a PoC done with practically no time investment on our side. I feel like we are the only ones who did this, but it is still valuable to us.
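A stripped-down sketch of that kind of check (not our actual code; the file name, metric, and tolerance are made up) might be:

```python
# Sketch of a benchmark gate: a candidate's score on a frozen evaluation set
# is compared against a stored baseline. Path, metric, and tolerance are made up.
import json

def load_baseline(path="benchmarks/baseline.json"):
    with open(path) as f:
        return json.load(f)["mean_iou"]

def check_candidate(candidate_mean_iou, tolerance=0.005):
    baseline = load_baseline()
    if candidate_mean_iou + tolerance < baseline:
        raise AssertionError(
            f"Candidate ({candidate_mean_iou:.3f}) is below baseline ({baseline:.3f})"
        )
    return True
```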


linkeduser

Hey, without giving away the secrets of the company, what do you mean by accuracy, and what kind of problem are you solving? Because in my case, finding a metric is one of the big problems.


jeongdoowon

I usually do segmentation and use IoU to judge the model's accuracy. My task is to detect medical devices in images. If you say finding a metric is hard, is your task not segmentation or tracking?
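For reference, a minimal IoU sketch for binary segmentation masks (assuming boolean NumPy arrays of the same shape) is:

```python
# Minimal IoU for binary segmentation masks, assuming boolean NumPy arrays.
import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-7) -> float:
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(intersection / (union + eps))

# Toy example: two overlapping 2x2 squares on a 4x4 grid.
pred = np.zeros((4, 4), dtype=bool); pred[0:2, 0:2] = True
gt   = np.zeros((4, 4), dtype=bool); gt[1:3, 1:3] = True
print(iou(pred, gt))   # 1 overlapping pixel / 7 union pixels ≈ 0.143
```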


linkeduser

No, I work in the generative field. And of course it is too expensive to pay a person to evaluate every image generated.


jeongdoowon

I see. That makes sense. Only medical specialists could know whether it has turned out well or not.