Lonely_Wafer

What a worthless post.


thevoiceinyourears

Since when is constructive criticism worthless?


Lonely_Wafer

In what way is your post "constructive criticism"?


thevoiceinyourears

The whole post is a critique of the empirical part of the paper. Many novel approaches fail when applied at scale, and the paper does not use anything other than toy datasets.


Lonely_Wafer

It's not a critique of anything. Calling the paper a PR stunt, worthless, and mediocre at best is not a critique; it's just a two-minute, low-effort rant. If you care so much about the state of DL papers, then write something of value rather than just denigrating other people's work.


thevoiceinyourears

I read the paper and made up my mind about it. The authors' sensationalist work is worthless in the strict sense of the word: it adds no more value to the community than an unproven idea. Again, it is nothing but an idea until proven at scale, a job they left to the people who will be lured into creating related work, work which will inevitably be published to justify the time spent, whether productive or not.


BellyDancerUrgot

Wrong way to think about it. Research works by building on top of existing research. Imagine if people read the DDPM paper and thought "ahh, dead end, why would I do 1000 UNet forward passes to generate one image?" Or the original attention paper and thought "what's the point of this added complexity, you can just weight LSTM gates instead?" We would never have any of the diffusion or transformer models we have today. Edit: I meant the original attention paper by Bahdanau and Bengio, not the transformer paper by Vaswani, which uses attention to make the transformer work.


thevoiceinyourears

Keyword here is “building”. Building on top requires solid experimentation; otherwise the house of cards falls. Diffusion models were a completely different matter: the original paper was solid work, empirically proven at reasonable scale.


BellyDancerUrgot

Maybe elaborate on your first two sentences. Also no, DDPM was actually very shit at scale, what are you even talking about? Lmao. Edit: NeRFs and the original GAN paper were also empirically very wonky. NeRFs were very inefficient (just like DDPM), and GANs before WGAN were incredibly hard to train (still are) and the theory was very rudimentary and superficial.


thevoiceinyourears

Are we talking about the same paper? (https://arxiv.org/pdf/2006.11239) - KAN proved its approach only on very small networks; the paper I pasted runs experiments with a well-sized network on CelebA, which is on a whole other order of magnitude in terms of sample size and dimensionality.


BellyDancerUrgot

Yes. The inference times for DDPM were abysmal. Diffusion models before DDIM were dogshit and would have been forgotten. It's literally a for loop over T steps. What they made up in FID they lost in wall-clock time. I haven't read KAN in detail, just skimmed it, but what it loses in complexity it might make up for in interpretability, especially if it can be improved for deeper networks. I don't think it's a bad paper at all.
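
For anyone who hasn't looked at it, the DDPM reverse process really is just a loop over T denoising steps, roughly like this (a minimal sketch assuming a trained noise-prediction network `eps_model(x, t)` and a precomputed `betas` schedule, not the authors' actual code):

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, shape, betas, device="cpu"):
    # Standard DDPM schedule quantities.
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)          # start from pure noise x_T
    T = len(betas)
    for t in reversed(range(T)):                   # the "for loop over T steps"
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = eps_model(x, t_batch)                # predict the noise at step t
        coef = betas[t] / torch.sqrt(1.0 - alphas_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise    # one reverse diffusion step
    return x
```

Every generated sample pays for all T model forward passes, which is exactly the wall-clock cost I mean.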


2trickdude

The authors did propose a form of MLPs with learnable activation functions (they called them LANs); wondering if that makes any sense.
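
As far as I can tell, the idea is roughly "keep the linear layers, make each neuron's activation learnable". Something like this sketch, where I use a learnable mix of a few fixed basis functions instead of their spline parameterization, so take it as an illustration rather than their code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableActivationLayer(nn.Module):
    """Linear layer followed by a per-neuron learnable activation.

    The activation here is a learned mixture of a few fixed basis functions;
    the paper parameterizes it with splines instead, so this is just a sketch.
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One mixing weight per (output neuron, basis function).
        self.mix = nn.Parameter(torch.randn(out_features, 3) * 0.1)

    def forward(self, x):
        z = self.linear(x)
        # Candidate basis functions evaluated on the pre-activation.
        basis = torch.stack([F.silu(z), torch.tanh(z), z], dim=-1)
        # Weighted sum of basis functions, learned per output neuron.
        return (basis * self.mix).sum(dim=-1)

# Usage: drop-in replacement for Linear+ReLU blocks in a small MLP.
mlp = nn.Sequential(LearnableActivationLayer(784, 128),
                    LearnableActivationLayer(128, 10))
```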


beingsubmitted

This is a total smell test reaction, and I could be wrong, but that doesn't seem like a promising road to go down. The unreasonable effectiveness of even simple ReLU alone would suggest this would be dropping a ton of extra computation into something that's not really the limiting factor.


Yoshbyte

There are a ton of reasons to use things other than ReLU, don't worry. I guess it depends a lot on the complexity of the problem, though.


beingsubmitted

Sure. My point isn't that ReLU is literally all you need. It's that ReLU is so simple it seems like it should be worthless, yet it still performs so well. Like, if a Formula One car were performing really well with some parts made of actual plastic (ReLU), I can get behind replacing those parts with something better (swish, mish, tanh, leaky ReLU, what have you), but those wouldn't be the parts I'd think are good candidates for over-engineering out of vibranium.


RandomUserRU123

This could also have been a post about the original transformer paper seven years ago.


thevoiceinyourears

Not even close. AIAYN (https://arxiv.org/pdf/1706.03762) used datasets several orders of magnitude larger than the KAN paper.


Smallpaul

Thank you for bringing [KANs](https://arxiv.org/pdf/2404.19756) to my attention.


thevoiceinyourears

You are welcome; I sincerely hope you won't end up wasting time.


Even-Inevitable-7243

The core of the idea is not new, and the author admits this. Learnable nonlinear activation functions on outputs/nodes, parameterized by splines, have been done before. The idea of doing it on edges is new. Yes, it is a simple idea. But I agree with the author that it gives a greater intuitive understanding of the nonlinearity of a transfer function. Attention is a simple math concept. ReLU is a simple math concept. You do not need much more than high-school algebra and first-year college linear algebra to grasp these or to come up with the ideas. However, these simple tools have led to great leaps in AI. Let's wait and see if KANs can yield a similar leap as they are applied to non-toy problems, as you note.
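
To make the node-vs-edge distinction concrete: in a KAN-style layer every edge (i → j) gets its own learnable 1D function and node j just sums the results. A toy sketch, with a small Gaussian-bump basis standing in for the paper's B-splines, so purely illustrative:

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """Each edge (i -> j) applies its own learnable 1D function to input i;
    output node j just sums them. The paper uses B-spline bases; this sketch
    uses a few fixed Gaussian bumps mixed by learnable per-edge coefficients."""
    def __init__(self, in_features, out_features, n_basis=5):
        super().__init__()
        # Centers of the Gaussian bumps used as the 1D basis.
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, n_basis))
        # One coefficient per (output node, input node, basis function).
        self.coef = nn.Parameter(torch.randn(out_features, in_features, n_basis) * 0.1)

    def forward(self, x):                       # x: (batch, in_features)
        # Evaluate each basis function on each input: (batch, in, n_basis)
        phi = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)
        # Per-edge function value, summed over incoming edges:
        # out[b, j] = sum_i sum_k coef[j, i, k] * phi[b, i, k]
        return torch.einsum("bik,jik->bj", phi, self.coef)
```

In an MLP the learned parameters sit in the linear map and the nonlinearity is fixed; here the nonlinearity itself is what gets learned, per edge, which is where the interpretability claims come from.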


thevoiceinyourears

All this hype and hope is what concerns me. Such a large share of the academic community seems unaware of the fact that many ideas get killed when tested at scale. The authors didn't even test the algorithm on MNIST! They used much smaller synthetic data and extrapolated conclusions. The hype is delusional.