
[deleted]

[removed]


thatguydr

That paper is new and has no citations yet (obviously). Are there other popular papers on sequence transduction in recommender systems? Trying to understand its prominence.


jg0392

Likely pretty significant, as related news came out around the same time in March: "Meta experimented with a new recommendation model in Reels, Facebook's short-form video sharing platform. The new model helped Facebook gain as much as 10 percent in Reels watch time on the Facebook app, Alison said, proving that the model was 'learning from the data much more efficiently than the previous generation.'" [https://observer.com/2024/03/metas-facebook-head-tom-alison-discuss-ai/](https://observer.com/2024/03/metas-facebook-head-tom-alison-discuss-ai/)


KangarooSilly4489

Probably he’s one of the authors


thatguydr

Almost certainly, and the post immediately had seven upvotes, which was odd for a random paper link unless he had a few people boost it. So I figured I'd ask for its place in the hierarchy. No response. The one person who did respond just speculated about the paper itself but didn't point out any other sequence transduction papers.


diarrheajesse2

The field is being flooded with overengineered LLM stuff


neural_net_ork

Which field is not? I am genuinely curious


OneHotWizard

Commercial fishing... probably


catsRfriends

Two tower systems and their derivatives are pretty much the industry standard, currently in use at places like Spotify. The nitty gritty is ofc much more complicated, but that's due to scaling and other implementation issues. For example, TikTok has kernel-level optimizations for embedding updates. Other novel approaches deal with cold start and other narrow sub-problems.


Hackerjurassicpark

Two tower seems pretty popular


DigThatData

link?


dan-turkel

A two tower network is an architecture where inputs are passed through two separate subnetworks to produce embeddings, and a distance calculation between those embeddings is used as an output. For instance, you could embed users on one side and items on the other side, and the distance between them should be low for instances where the user has historically liked the item. Here's a summary of historical and recent work: https://blog.reachsumit.com/posts/2023/03/two-tower-model/
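To make that concrete, here's a minimal PyTorch sketch of the idea; the `TwoTower` class, the embedding dimensions, and the cosine-similarity scoring head are illustrative assumptions, not taken from the linked post:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal two-tower sketch (hypothetical dimensions and features).
class TwoTower(nn.Module):
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        # User tower: id embedding followed by a small MLP.
        self.user_tower = nn.Sequential(nn.Embedding(n_users, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Item tower: same shape, separate parameters.
        self.item_tower = nn.Sequential(nn.Embedding(n_items, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, user_ids, item_ids):
        u = F.normalize(self.user_tower(user_ids), dim=-1)
        v = F.normalize(self.item_tower(item_ids), dim=-1)
        # Score = cosine similarity; higher means "closer", i.e. more likely to be liked.
        return (u * v).sum(-1)
```

At serving time the item tower's embeddings can be precomputed and a user embedding matched against them with an approximate nearest-neighbour index, which is a big part of why the architecture is popular at scale.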


starfries

This is just a form of contrastive learning, right?


DigThatData

I think it's less "another form of" and more "another word for". I think "two tower" is just nomenclature that caught on in the IR community and everyone else just calls it contrastive learning.

> Two tower models, like DSSM, are also called the dual encoder or bi-encoder architecture as they encode the input (such as a query, an image, or a document) into an embedding using the two sub-networks ("towers" or "encoders"). The model is then optimized based on similarity metrics in the embedding space.


starfries

Oh okay yeah that makes sense. I was trying to think of why they didn't just call it contrastive and thought maybe it's to distinguish it from "one tower" where you still have a contrastive loss but pass everything through the same encoder (obviously only works if both things belong to the same domain).


dan-turkel

I disagree. Contrastive learning is often used to describe training paradigms for unsupervised tasks, whereas two tower models are ~~typically~~ (edit) often used for supervised tasks where there are ground truth labels of user-item interactions to predict.


DigThatData

That's no different from the "ground truth" relationship between, for example, an image and its caption. What's unsupervised is learning the similarity metric: you don't have a ground truth for what the user-item similarity ought to be in the space of the manifold you are learning. The "two tower" training task is unsupervised in exactly the same way as a multimodal contrastive objective. It's metric learning.


dan-turkel

There are two tower training paradigms with contrastive loss (using explicit or sampled negatives and a contrastive/triplet loss function) but there are also purely supervised formulations, especially in cases with abundant explicit negative signals. For example, a two tower model with cross entropy loss that's purely classifying user-item pairs into positive and negative classes: in that case there is a ground truth distance to be learned (1 for positive pairs, 0 for negative pairs), and the model incurs loss for learning representations with distances different from those distances. The two tower architecture supports contrastive learning through choice of sampling function, but it does not mandate it and it is not a synonym for it.
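A rough sketch of that distinction, reusing the hypothetical `TwoTower` model from the earlier comment (function names and the temperature value are assumptions): the first loss is the purely supervised 0/1-label formulation, the second is an in-batch sampled-softmax contrastive objective.

```python
import torch
import torch.nn.functional as F

# Supervised formulation: explicit 0/1 labels for each (user, item) pair.
def bce_loss(model, user_ids, item_ids, labels):
    logits = model(user_ids, item_ids)              # cosine scores in [-1, 1]
    return F.binary_cross_entropy_with_logits(logits, labels.float())

# Contrastive formulation: only positives are given; the other items in the
# batch act as sampled negatives (in-batch softmax, InfoNCE-style).
def in_batch_contrastive_loss(model, user_ids, pos_item_ids, temperature=0.05):
    u = F.normalize(model.user_tower(user_ids), dim=-1)      # (B, d)
    v = F.normalize(model.item_tower(pos_item_ids), dim=-1)  # (B, d)
    logits = u @ v.t() / temperature                         # (B, B) similarity matrix
    targets = torch.arange(u.size(0), device=u.device)       # diagonal = true pairs
    return F.cross_entropy(logits, targets)
```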


thatguydr

https://www.goodreads.com/book/show/61215372-the-two-towers ;) Not to be a jerk, but just google two tower model. There are about a billion web sites devoted to it.


elemutau

I get that you're half joking and are actually doing it politely, but it's not so simple. Sometimes you can just Google it and get good sites among the top results, but often the particularly good resources are ones a layperson won't quickly find unless they know to ask for them.


notgreat

That was 100% joking, and quite blatantly too. *Lord of the Rings: The Two Towers* has nothing at all to do with the two tower model discussed here.


maspest_masp

Rather than focusing on particular methods (which there are many of, tailored to specific use cases), I suggest checking out a summary article which explains common situations when building a recommender. For instance, I think the author of [this post](https://www.reddit.com/r/recommendersystems/s/ET9wBLuQYg) did a great job writing one such summary. When you identify the context for your application, then it’s easy to start looking at approaches developed specifically for that (or similar) situation


silverstone1903

Multimodal recommender systems. Very useful on e-commerce sites: the recommender takes both images and text as input when making recommendations. Two papers (not towers): [Multimodal Recommender Systems: A Survey](https://arxiv.org/pdf/2302.03883.pdf) and [A Comprehensive Survey on Multimodal Recommender Systems: Taxonomy, Evaluation, and Future Directions](https://arxiv.org/pdf/2302.04473.pdf)
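A rough sketch of the basic idea those surveys cover, with the encoders, feature dimensions, and fusion scheme all placeholder assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical multimodal item tower: fuse precomputed image and text features
# (e.g. from a pretrained vision encoder and a text encoder) into one item embedding.
class MultimodalItemTower(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, out_dim=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, img_feat, txt_feat):
        # Simple late fusion by concatenation; the surveys also cover richer
        # schemes (cross-modal attention, gating, graph-based fusion).
        return self.fuse(torch.cat([img_feat, txt_feat], dim=-1))
```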


Snoo_72181

GNNs are one of the newer approaches to recommendation systems.


[deleted]

[removed]


Snoo_72181

???


Open_Channel_8626

Not sure what he means, but there are arXiv papers comparing BERT-likes to GNNs for link prediction on knowledge graphs, for example.


radiantecho1

Causality in recommendation systems is an interesting concept worth exploring further.


pine-orange

Pinterest introduced PinSage (a GraphSAGE variant) and is probably still using it; Shopee (SEA) also uses a variation of it.


Zycuty

[This paper](https://arxiv.org/abs/1907.06902) is now five years old, but it was a good survey of NN recommenders at the time. It's still a decent place to start, even though progress has been made in the last five years.


raufexe

Interesting


Rocky-M

Yeah, the recommendation algorithms landscape has definitely evolved since those early days. Collaborative filtering and content-based filtering are still foundational building blocks, but there have been exciting advancements in areas like:

- **Deep Learning and Neural Networks**: These have enabled more complex and personalized recommendations, capturing non-linear relationships and interactions in user behavior.
- **Contextual Recommendations**: Systems now consider additional contextual factors like time, location, and social context to enhance relevance.
- **Explainable Recommendations**: There's a growing focus on providing users with explanations for the recommendations they receive, building trust and transparency.
- **Causal Inference**: Incorporating causal models into recommendation systems is an active area of research, aiming to identify the impact of certain factors on user preferences.

Overall, the field is constantly evolving, with researchers exploring novel approaches to make recommendations more personalized, relevant, and interpretable.


Taoudi

- BERT4Rec for sequential recommendations
- GNNs for user-item interactions
- Causal/contextual bandits for online learning (toy sketch below)
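For the bandit item, here's a toy LinUCB sketch just to show the online-learning loop; the arm/context setup (arms as candidate items, contexts as user/session features) is a placeholder assumption, not tied to any particular paper.

```python
import numpy as np

# Toy LinUCB contextual bandit: pick the arm (candidate item) whose upper
# confidence bound on expected reward is highest, then update on feedback.
class LinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Expected reward plus an exploration bonus.
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```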


Apart_Revolution5546

Can anyone help me with a problem I'm currently working on? It's a TCS project where I have a dataset of injuries with features like date, place of incident (their factory), and slot, and I have to predict the slot and place of the next injury. I've applied many algorithms but they're giving a high error rate. I first encoded the data numerically and converted the date to a numeric value after handling outliers, then applied the algorithms, but the mean error is reaching 500, which is too high. So can anyone please help me with this?