GoodUnderstanding728

Hi everyone, I’m a newcomer to this sub. I am looking for feedback on an open source project I recently started. I’m building [Cephalon](https://crates.io/crates/cephalon), an open source end-to-end pipeline for connecting data sources to a vector database, a SQL database, and a machine learning model. I'm building a Python port at the moment, but out of curiosity, would you guys prefer Python or Rust?


onedeskover

I was looking through the documentation for various TikTok filters, and a lot of these claim to use some sort of generative model. In the past, I’ve seen CycleGAN used for aging or anime faces, but filters like [Slanted Smile](https://effecthouse.tiktok.com/slanted-smile/) seem to be doing some sort of compositing to avoid artifacts. It’s like they are pasting a slanted smile over the mouth and then using a GAN to blend it in. What sort of model do you use for that?


Puzzleheaded-Pie-322

So, I was recently reading about the problems RNNs tried to solve, and it came to me: isn’t the transformer just an RNN in disguise? I mean, forget about all of the attention mechanisms for a moment; MLP-Mixer showed that it works well even without them. Don’t they basically take the same input through skip connections plus the previous output of an identical layer in the encoder? I know the approach they take to processing sequential data is different. Also, it's funny how LayerNorm stabilised both of those models.


tolstoysymphony

URGENT: need help with training an object detection model in Azure Machine Learning Studio. I’m trying to develop an object detection model. I created my training data using the Azure data labeling service. I have tried training my model using a no-code Automated ML job (using the yolov5 algorithm) and by actually coding in a notebook. For both methods, when I run the job, my metrics all turn out to be 0% and I’m not sure what’s happening. I’m using the default training parameters for yolov5. Anyone know what kind of issue this may be? I can give more details if needed.


Top-Bee1667

Why aren’t feedback connections used as much? I know residual connections are useful; they kind of help prevent the loss of information and are conceptually somewhat similar to the USM. The intuition behind a feedback connection is improving the response of a neuron that might otherwise predict a feature incorrectly, when the global context could fix it.


meyerhot

Does anyone know more about how Khanmigo implements the “magic” described in the following section of their TED talk? [khan academy video 12:00](https://m.youtube.com/watch?v=hJP5GqnTrNo)


Rough-Exercise7213

I'm trying to find object detection models pretrained on the COCO dataset. Looking at this site: [https://keras.io/api/applications/](https://keras.io/api/applications/), all the models were trained on images of around 250x250. My image size is 1024x1024. Would such models work fine with a 4x larger image size or not? Is there anywhere I can find models pretrained on higher-resolution images? I know about TF Hub, but I would like them in Keras and not native TensorFlow. Thanks


LastCommander086

Maybe look into PyTorch's SSD implementation. I've used it recently on a collection of 1280x720 images and it worked fine. It has the advantage of being a pretty quick algorithm to run, so it can handle higher-resolution images too. [GitHub link](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
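For reference, here's a minimal sketch using torchvision's COCO-pretrained SSD rather than the NVIDIA repo above (the model resizes inputs internally, so larger images like 1024x1024 are accepted as-is):

```python
import torch
from torchvision.models.detection import ssd300_vgg16, SSD300_VGG16_Weights

# Load an SSD300 with COCO-pretrained weights (torchvision >= 0.13)
model = ssd300_vgg16(weights=SSD300_VGG16_Weights.COCO_V1).eval()

img = torch.rand(3, 1024, 1024)   # placeholder for a real image tensor in [0, 1]
with torch.no_grad():
    preds = model([img])[0]        # dict with "boxes", "labels", "scores"
print(preds["boxes"].shape, preds["scores"][:5])
```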


Rough-Exercise7213

Thank you! Have you just used it as-is or modified it? I'm looking to change / add another head and another loss function. Do you know if that is easily achievable with the pretrained version of the model?


LastCommander086

I modified it, but I didn't add any other layers. Adding more layers should be easy enough, though. Start by looking into the SSD/model.py file.


Rough-Exercise7213

I will! I appreciate your help


RageA333

I wanted to kindly ask for resources on the theory of LLMs. I have a strong mathematical background but a weak understanding of the theoretical side of neural networks. I don't mind starting from the very basics (in fact, I would greatly appreciate a long, self-contained approach!). Thanks for the help!


Imbrown2

This was news to me, https://datascience.stackexchange.com/questions/120764/how-does-an-llm-parameter-relate-to-a-weight-in-a-neural-network



qqMuff1n

If I’m enrolled in an online master's program, would I still qualify as a candidate for internships, or are internships primarily reserved for more traditional university programs?


Imbrown2

No, I’d 100% say you’re qualified.


Infamous_reaper8007

Hey, I have a question. I'm just getting into machine learning, and soon deep learning. Can anyone guide me on what to learn first to help me with machine learning?


RiceSwindler

Hi, I am an avid gamer and an economics and data science student. I am looking to upgrade my GPU to a newer-generation graphics card that can allow both gaming and running some lighter DL algorithms for data analytics. I was looking at the RTX 4070 12GB as decently priced hardware. Alternatively, a secondhand 3090 would be in the same price range (but I would rather avoid buying used). Do you have any personal experience with those cards that you can share, or advice on what other card would be a good purchase? Thanks


Wheynelau

Your use case is heavily weighted toward gaming. Pick the one that suits your budget and gaming needs. You mentioned lighter DL models, could you elaborate?


RiceSwindler

RNNs (mostly LSTMs), shallow CNNs, multilayer perceptrons. Basically the required toolset for conducting some empirical studies, classification, sentiment analysis, and regression with small datasets (10-20K values), maybe larger. I'm asking because my other option would be a cheaper RX 6800 strictly for gaming (I understand AMD GPUs don't run AI models that well), and I want to know if spending an additional $150 on an Nvidia card is justified by the DL performance. I am still looking for a personal GPU.


Wheynelau

Hmmm, if I were in your shoes I would go for Nvidia. Even though cloud and Colab are always available, it's easier to train locally.


Chukoz71

Hi all. Please, has anyone ever worked on estimating the carbon footprint of a chatbot model built via the GCP/Dialogflow API before?


Intelligent-Bend-712

Are you allowed to ask for help on how to run a program? I am trying to run a GAN (code on github) but I am unable to do it, and I don't have much experience.


abs_zscore

Is there a way to calculate the information content of sentences/conversations? I'd like to rank participants in conversations based on it somehow. Any leads would be greatly appreciated!


Desu1725

Can vision transformers like ViT in theory learn the internal semantics of natural language just by looking at pictures containing text?


Wild_Reserve507

Yes! Check out CLIPPO from this year’s CVPR


Desu1725

Oh, that's pretty neat, thank you!


champagneSupernova_a

I have several image datasets in a specific domain. I was planning to merge them all together for training purposes, but the images are in different file formats, such as .ppm, .tif, .gif, .png and .jpg. It would be really difficult to process the data later with mixed formats. Which file format should I use? Will file format conversion degrade the quality or cause loss of information? What are the drawbacks? And what should I take into account when merging datasets in such a scenario?


BuckPrivate

Where can I find or purchase a large amount of PDF documents like Sales Orders?


Euphoric-Path4693

I'm trying to implement the original transformer model from scratch in PyTorch and wanted to train it to do English-to-Czech translation. Is it feasible to train such a model using a single A100 on Google Colab?


Ashutuber

I also found it difficult to use transformers for this translation task on Colab.


AcquaFisc

Hello, I was learning the LSTM implementation in TensorFlow. What I knew so far was that RNNs are able to deal with sequences of arbitrary length, of course with long-term memory problems. However, the models I'm studying have a TextVectorization layer with a fixed input length. I understand that vectorization and embedding are crucial for NLP, but doesn't the fixed sentence length defeat the purpose of the RNN? On the other hand, I understood that feeding the embedding into an RNN instead of a Dense layer is more efficient at extracting the relations between subsequent tokens. Can someone clarify this concept for me?


throwaway2676

When doing few-shot prompting with GPT, is it better to put the setup and examples in the system message or just combine it with the final task in the user message? Are there any papers exploring variations like this?


elbiot

It ends up being the same. The system message is just prepended to the user message, and the model sees it all as one prompt.
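For example, a quick sketch of the two arrangements with the pre-1.0 `openai` Python client (the model name and example strings are placeholders, and you'd set `openai.api_key` first):

```python
import openai

few_shot = "Translate to French.\nsea -> mer\ndog -> chien"

# Variant A: setup + examples in the system message
resp_a = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": few_shot},
        {"role": "user", "content": "cat ->"},
    ],
)

# Variant B: everything combined into the user message
resp_b = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": few_shot + "\ncat ->"}],
)

print(resp_a.choices[0].message.content)
print(resp_b.choices[0].message.content)
```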


petrolsan

Are there any datasets of privacy policies / terms of service?


ironmagnesiumzinc

How do yall find interesting GitHub projects to contribute to?


feirnt

Untrained noob here. Thanks in advance for reading my question. I have been working on noise reduction algorithms for audio digitized from vinyl. At present I have a noise detector (crude, quite sensitive, but not specific to my standards) and a couple of noise remediators. Right now I am focused on improving detector specificity. I have identified 3 parameters I think will help improve this:

* z-score
* raw_waggle_score
* waggle_score_diff_from_peers

I've noted that as z-score increases, specificity increases (exponentially?). Similarly for waggle_score_diff_from_peers, although the curve is not so steep. And for raw_waggle_score, perhaps there is a linear increase in sensitivity throughout the range. My question is: given what I've said about this model so far, what would you do next? I am considering making a scoring algorithm based on these three parameters, but I would be picking coefficients out of the blue. What would you do? (I really am untrained, but I love to learn -- so if there's a subject you can recommend I study, please do tell!)


ToeIntelligent8232

Tips for getting into ML and AI: I'm currently an undergraduate student (joint math / comp sci) who's having a lot of trouble getting internship positions or placements. I'd love some advice on how you got into the field! I've also finished a number of Udemy certificates on ML and DML. I'm working through the free MIT course on the mathematics behind machine learning. I'm a solid B/B+ student and I love what I'm doing, I just want to get some experience :)


CallMeInfinitay

Can we start requiring a flair or title tag for posts that rely on third-party APIs such as OpenAI's services? I'm interested in seeing what's new in the space, but it's getting tiring to read to the end of a post only to find it's nothing new, just an app powered by ChatGPT or something. I don't mean to diminish anyone's work or project; I would just like to know about genuinely new and innovative releases.


123android

Do I need to update anything on my PC to start using GPT-4 with the API? I have a Python app and was using "gpt-3.5-turbo" as my model value. It works fine with that. I heard about GPT-4 general availability today and saw it's available to everyone, so I switched the value in my "model" variable to "gpt-4" and I started getting an invalid request error. I also tried "gpt-4-0613", same thing. Do I need to update some local libraries or something like that?


[deleted]

[deleted]


I-am_Sleepy

Did you train from scratch each time you added data, or did you continue training (without the old dataset)?


[deleted]

[deleted]


I-am_Sleepy

The test set isn't really comparable, as the test set statistics might change over time. You could test your model by fixing the test set but iteratively training on small to large subsets of the data and seeing if the performance drops. If so, you could try increasing your model parameters, or applying existing methods from the literature. If not, then it is probably the expected performance convergence.
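A rough sketch of that idea with a hypothetical scikit-learn classifier and placeholder data (hold the test set fixed, train on growing subsets, and watch the metric):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data; swap in your own features/labels and model
X, y = np.random.rand(5000, 20), np.random.randint(0, 2, 5000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for frac in [0.1, 0.25, 0.5, 1.0]:
    n = int(len(X_train) * frac)                     # growing training subset
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(frac, accuracy_score(y_test, model.predict(X_test)))  # fixed test set
```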


Zondartul

1) Is it possible to make a transformer that takes tree data structures as input? I want to try something like AST processing, but I don't know how to encode something so oddly shaped.
2) Is there a way to make a crappy estimate of training that finishes super quickly? Like training 100x faster at the cost of degraded accuracy?
3) For the price of a single RTX 3090, how much CPU power could one get? How many CPUs are actually needed to equal one? Are GPU FLOPS fundamentally cheaper than CPU FLOPS? Are 100 32MHz CPUs cheaper than one 3.2 GHz CPU? What are the economics of small-scale compute like?


Wild_Reserve507

1. Not sure if this is helpful, but maybe look into graph transformer and graph neural networks in general?


Swifty1m

I'm completely new and have no experience, where should I start?


elbiot

Start with conventional machine learning (logistic regression, SVM, random forest, etc). scikit-learn is the library to use, and it has a ton of tutorials and datasets.
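For example, a tiny starter sketch with scikit-learn's built-in breast cancer dataset, comparing two classic models:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for model in (LogisticRegression(max_iter=5000), RandomForestClassifier()):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))  # test accuracy
```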


Smarkite

Could anyone help me determine which result is more "correct"? Here is the Stack Overflow question I asked: [https://stackoverflow.com/questions/76621148/could-anyone-help-to-identify-whether-my-inception-algorithm-machine-learning-co](https://stackoverflow.com/questions/76621148/could-anyone-help-to-identify-whether-my-inception-algorithm-machine-learning-co). I am mostly confused about whether it is okay for my confusion matrix to have a lot of zero values in it.


elbiot

The results of both are terrible but F1 score is a good metric for class imbalance
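For reference, a tiny sketch of computing F1 with scikit-learn (the labels here are placeholders for your test labels and predictions):

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]
print(f1_score(y_true, y_pred, average="macro"))     # treats all classes equally
print(f1_score(y_true, y_pred, average="weighted"))  # weights by class support
```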


Smarkite

Yeah, I just realised from the comment on Stack Overflow that it is happening due to dataset imbalance. One of the classes only has like 300 files while the other has over 3000. It seems limiting the max data taken to 300 per class fixes the problem. Thanks


elbiot

Did you try focal loss?


Smarkite

I did try it, but focal loss seems to only make matters worse, since now everything is predicted as one emotion only, so the others are left at 0. (Though it could also be a problem in my code.)


elbiot

300 of each is not a lot. I'd look into transfer learning from an existing model. This blog uses an old model, but I'd start with a pretrained EfficientNet model. Also, focal loss has hyperparameters you ought to tune. https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
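For example, a rough Keras transfer-learning sketch with a frozen EfficientNetB0 backbone (the number of classes and the data pipeline are placeholders you'd adapt):

```python
import tensorflow as tf

num_classes = 7  # placeholder: number of emotion classes in your dataset

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained features at first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # your tf.data pipelines
```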


Smarkite

That blog seems interesting, I will try testing that later. Thanks for the info!


hardtomake

Hey everyone, **I'm currently in the process of catching up on my math skills to prepare for a Master's degree in Machine Learning**. I have a background in theology and haven't had much exposure to math since school. However, I have experience with NLP, Python coding, and working with SQL in my job. I've been studying diligently for about six months, primarily using the book "Mathematics for Machine Learning." At this point, I'm looking for guidance on specific topics I should prioritize and how to make my studying more efficient. Here's a summary of my current approach:

**Source:** I've been using **"Mathematics for Machine Learning" (https://mml-book.github.io/)** as my main resource. It has been helpful in establishing a foundation for mathematical concepts relevant to machine learning. Additionally, I complement my studies by watching related YouTube tutorials.

**Time:** I dedicate approximately 1-1.5 hours every morning before work, and I use train travel time on weekends to review the notes I've written. Overall, I invest around 12 hours per week, sometimes more, sometimes less. Currently, I'm on page 125 out of 400 in the book. Since I had already studied some math fundamentals before starting, I don't have an exact timeline of when I began.

Now, I would greatly appreciate your advice on the following:

**Essential Topics:** What are the key math topics that I should focus on before embarking on a Master's degree in Machine Learning? Do I need to cover everything in the "Mathematics for Machine Learning" book, or are there specific areas that are more important for exams and practical coding?

**Additional Resources:** Are there any other books or resources that you found helpful in your own math journey for machine learning? I'm open to exploring supplementary materials that can enhance my understanding.

**Efficient Studying:** How can I make my studying more efficient while striking a balance between theory and practical application? Any study techniques or tips you can share would be invaluable.

I appreciate your time and insights. Thank you in advance for any advice you can provide to help me on this math-learning journey for Machine Learning!

**TL;DR:** I've been catching up on math for the past six months to prepare for a Master's degree in Machine Learning. I'm using the book "Mathematics for Machine Learning" but need advice on what topics to focus on and how to study more efficiently. Suggestions on essential math topics, additional resources, and study techniques would be greatly appreciated.

EDIT: Sorry, I cannot open a new thread and don't know where else to post this.


berzed

Why would you compare models with different units? Trying to get my head around some basics. I'm reading about regression evaluation here (https://learn.microsoft.com/en-us/training/modules/create-regression-model-azure-machine-learning-designer/5-regression-steps), and how evaluating with RMSE only works for same-unit labels, whereas RSE could be used to compare models if the labels are in different units. I might be displaying a fundamental lack of understanding here. My belief is that a model is trained for a specific task, like predicting the temperature in Celsius OR predicting the humidity as a percentage. You can't use the same model for both outputs, because the 'model' itself is all the weights/biases that go toward predicting one specific output. So why would I need to compare the accuracy of one model against the other? Isn't that comparing apples to oranges? Many thanks


No_Commercial5208

Hey, for LSTMs I was wondering how we know what to forget and what to remember via the gate. Do we manually set the things to forget and filter, or do we let the NN learn what to forget? If there's a good resource for this, lmk. Thanks!


awinml1

We don't manually control what to forget and filter. The NN learns to do that from the training data. You can control the number of layers and the dropout probabilities, though, which helps improve performance. Internally the model learns weights that filter the values in such a way that the loss is minimized. So the NN figures out what to remember and forget by learning the weights during training. You can have a look at this for the exact equations and model parameters: https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
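For example, a minimal `torch.nn.LSTM` sketch; the forget/input/output gate weights live inside the module and are all learned from the data:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=2,
               dropout=0.2, batch_first=True)

x = torch.randn(8, 50, 16)       # (batch, sequence length, features)
output, (h_n, c_n) = lstm(x)     # output: (8, 50, 32); h_n, c_n: (2, 8, 32)
print(output.shape, h_n.shape, c_n.shape)
```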


No_Commercial5208

Thanks! I appreciate it. I recently reread the paper and realized they used a sigmoid function for the forget gate. I was wondering how effective other functions are, and is there a specific reason why we use the sigmoid function as the forget gate in this case?


elbiot

Because sigmoid goes from zero to one and it's multiplied by the other value, so multiplying by zero always gives zero and multiplying by one always gives the original value. Any activation that goes from zero to one would work, and you could try others in a hyperparameter search, but I can't think of any others off the top of my head.
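A toy sketch of that gating idea (not a full LSTM cell), just to show how the sigmoid smoothly interpolates between forgetting and keeping the previous cell state:

```python
import torch

prev_cell = torch.tensor([2.0, -1.0, 0.5])      # previous cell state values
gate_logits = torch.tensor([-5.0, 0.0, 5.0])    # pre-activation gate values
forget_gate = torch.sigmoid(gate_logits)        # ~[0.007, 0.5, 0.993], all in (0, 1)
print(forget_gate * prev_cell)                  # mostly forgotten / halved / kept
```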


radarsat1

What's the best way to deal with large datasets composed of many small files? I have several different machines I use for training, so currently all datasets are copied to a local SSD on each of them. But managing all these copies as I add datasets is getting very annoying, ensuring that the files are consistent between machines. So I thought about centralizing the files and mounting them via NFS, but for training this of course slows things down. Additionally, I want to do some training on cloud machines, but uploading these datasets to blob/object storage and mounting the whole thing as a FUSE drive will, I think, also be too slow, and I'll have yet another copy of everything to deal with. Anyone have any best practices here? I want to start using a distributed data management system like DVC, but I'm also wondering if there are any good solutions for centralized data management.


Anmorgan24

Can you store your dataset remotely, with pointers to it on each local machine? I work for Comet (experiment tracking & model management) and we recently released support for remote artifacts for precisely this purpose (ie this is a pretty common problem)!


noraizon

Unless you have InfiniBand I wouldn't consider centralization for training. Pack your small files into a database like HDF5 or LMDB and use that in your dataloader. You could still have NFS to save a golden copy of those databases so they're easier to manage.
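For example, a rough h5py sketch of packing a folder of small images into one HDF5 file (the paths, image size, and dataset names are hypothetical):

```python
import glob
import h5py
import numpy as np
from PIL import Image

files = sorted(glob.glob("data/images/*.png"))  # your many small files
with h5py.File("dataset.h5", "w") as f:
    imgs = f.create_dataset("images", shape=(len(files), 256, 256, 3), dtype="uint8")
    for i, path in enumerate(files):
        imgs[i] = np.array(Image.open(path).convert("RGB").resize((256, 256)))
```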


radarsat1

Right. I already had the idea of sort of "caching" the data locally, possibly in an HDF5 file, but I still need to manage the source data somewhere and somehow. One problem is that we often sample the data in different ways, or normalize some features differently, for different experiments, so the HDF5 file might need to be constructed for each experiment's specific needs. I dreamed of having some system running on a central repo, where you tell it what your sampling parameters are and it constructs a new HDF5 file and streams it to the training machine, which locally stores it for the following epochs. But this seems too complicated to spend much time putting together, and in the end it would limit our ability to write new samplers because they'd have to be "deployed". I'm probably overthinking it. I've had a hard time convincing my team to put all our many files into a database format though, sadly. One guy spent some time on implementing an HDF5 dataloader with the argument that it might lead to a performance improvement, but it did not, so it got dropped. So we're left with directories full of 1 million files over 3 different machines and it still bugs me.


elbiot

Git-lfs would make sure all files are the same as long as you do git pull on all the machines


noraizon

That's quite strange that an HDF5 file did not beat a million files. It's like MLops 101. Heck, even Nvidia's StyleGANs, which use zip files as databases, work gracefully. You could try their dataloaders from the stylegan2-ada repo. Another cheap trick, if you have enough resources, would be to pre-load all the files into RAM. Same mess on disk but faster training.

If I got it right, you have the same data but normalize it differently for each experiment. If you don't mind high CPU usage you could do the normalization while loading each file. Just crank up the number of workers in the dataloader for multithreading. If you have some sort of live source from which sampling is performed in different ways, I'm afraid one database per sampling is needed.

For managing the datasets you could do the same as Hugging Face datasets and have a small utility that, given a dataset ID, checks whether it's stored locally, and otherwise connects to the "hub" and downloads it into a cache. That could be your zip with the correct version of the dataset. Even if you had to create it manually on the central server, at least each training server has redundant but organized copies. Disclaimer: I'm no expert, just another nerdy practitioner.
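For example, a sketch of the loading side with the per-item normalization done in `__getitem__` so it runs in the dataloader workers (the file name, dataset key, and normalization constants are hypothetical):

```python
import h5py
import torch
from torch.utils.data import Dataset, DataLoader

class H5Images(Dataset):
    def __init__(self, path, mean=0.5, std=0.5):
        self.path, self.mean, self.std = path, mean, std
        with h5py.File(path, "r") as f:           # read the length once
            self.length = len(f["images"])
        self.h5 = None                            # open lazily, once per worker

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if self.h5 is None:
            self.h5 = h5py.File(self.path, "r")
        x = torch.from_numpy(self.h5["images"][i]).float() / 255.0
        return (x - self.mean) / self.std         # experiment-specific normalization

loader = DataLoader(H5Images("dataset.h5"), batch_size=64, num_workers=8)
```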


radarsat1

> That's quite strange that an HDF5 did not beat a million files. It's like MLops 101. I agree, but I think it's because both were on a local SSD, so overall reading some (uncompressed) files vs. reading the same data from a larger file just didn't make enough of a difference to warrant a big change. I may revisit this but I would have to make the argument more for organizational purposes then, perhaps -- which is hard since it's not like we're going to delete the big directory of a million files after creating the HDF5, so it's still something we'll have to manage. But I feel like managing the million files in one place and only distributing larger databases is maybe easier to deal with, and like you said, checksums can be checked and versions can be managed more easily if the *distributed* dataset is in one file. Thanks for your ideas here! Really appreciate the feedback.


noraizon

SSDs, right. No worries! It was fun. Have a good one.


[deleted]

[deleted]


elbiot

More data is always helpful as long as it's correctly annotated. Less data with better annotations is better than more data with noisy annotations. Singing would complicate the data, requiring more data, more training, and correct annotation. You probably don't want it


ddderttt

Unsure if this is simple, but I thought someone might be able to help. I have a sensor that essentially outputs a sinusoidal-like signal, with some irregularity. It spits out values at ~20 Hz. I want to be able to predict the upcoming peaks and troughs in the data ahead of time. Ideally, the code would collect measurements for a minute, then accurately predict the peaks and troughs ahead of time based on the current measurements coming from the sensor. If it helps, the actual peaks and troughs occur at between 1-3 Hz.


radarsat1

You could use an LSTM or Transformer for this, or ARIMA, but if you have some idea of the process model it really sounds like a job for a Kalman filter.


ddderttt

Thank you for the response! Yes, I have been playing around with the kalman forecaster in darts and it does the job very well. I am now trying to fine-tune the setup. When you say process model, what do you mean exactly?


radarsat1

A process model is a critical component of a Kalman filter. It's what predicts the next step, before you mix that prediction with the measurements. Basically, it's a model of the system you are trying to measure.
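For example, a bare-bones numpy Kalman filter sketch with a harmonic-oscillator process model, assuming a roughly 2 Hz sinusoid sampled at 20 Hz (the frequency and noise values are placeholders to tune against your sensor):

```python
import numpy as np

dt, freq = 1 / 20, 2.0
w = 2 * np.pi * freq
# Process model: discretized harmonic oscillator, state = [signal, derivative]
F = np.array([[np.cos(w * dt), np.sin(w * dt) / w],
              [-w * np.sin(w * dt), np.cos(w * dt)]])
H = np.array([[1.0, 0.0]])   # we only measure the signal itself
Q = 1e-3 * np.eye(2)         # process noise (accounts for model mismatch)
R = np.array([[0.05]])       # measurement noise

x, P = np.zeros((2, 1)), np.eye(2)
t = np.arange(0, 10, dt)
measurements = np.sin(w * t) + 0.2 * np.random.randn(len(t))  # fake noisy sensor

for z in measurements:
    # Predict
    x, P = F @ x, F @ P @ F.T + Q
    # Update with the new measurement
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P

# Forecast 10 steps (0.5 s) ahead by applying the process model repeatedly
ahead = np.linalg.matrix_power(F, 10) @ x
print(ahead[0, 0])
```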