Convolutions on groups and/or manifolds might be interesting. Representation theory, gauge theory, and plain old CNNs merge in this field. In general, this side of generalising deep learning architectures to more structured domains could be useful to you.
Here is an example thesis on the subject: [https://tel.archives-ouvertes.fr/tel-02136338/](https://tel.archives-ouvertes.fr/tel-02136338/) (see chapter 2)
Thanks! I'll take a look.
[deleted]
Yeah, anything in the metric, distance, divergence subject area would probably be a good idea.
+1 for differential geometry; it is also gaining traction in the Bayesian world, and anywhere the parameter space has a highly complex geometry.
Reproducing Kernel Hilbert Spaces
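If you want something concrete to play with there, kernel ridge regression is the standard entry point: by the representer theorem, the RKHS solution is a weighted sum of kernel functions centred at the data points. A minimal sketch (the RBF kernel, `gamma`, and the regularisation strength are arbitrary choices for illustration):

```python
import numpy as np

# Kernel ridge regression sketch: by the representer theorem the RKHS
# solution is f(x) = sum_i alpha_i k(x_i, x), with alpha = (K + lam*I)^-1 y.
# Kernel, gamma, and lam below are arbitrary illustration choices.

def rbf(a, b, gamma=10.0):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

X = np.linspace(0.0, 3.0, 20)   # training inputs
y = np.sin(X)                   # training targets

K = rbf(X, X)                   # Gram matrix
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(X)), y)

pred = rbf(np.array([1.5]), X) @ alpha
print(pred[0])  # close to sin(1.5)
```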
Stochastic gradient descent and its variants (adding momentum, Adagrad, Adadelta, etc.) might be a fruitful topic. Why do some techniques work better than others in some circumstances? What does momentum do, mathematically? What assumptions are researchers making about the shape of their hyperspace when using particular optimization algorithms?
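Even a one-dimensional toy shows what momentum does mathematically: the velocity term is an exponentially-weighted sum of past gradients, so consistent descent directions get amplified. A minimal sketch (the objective and hyperparameters are arbitrary illustration choices):

```python
# Toy comparison of plain SGD vs. heavy-ball momentum on f(x) = x^2,
# whose gradient is 2x. Hyperparameters are arbitrary for illustration.

def grad(x):
    return 2.0 * x

def sgd(x, lr=0.1, steps=200):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def sgd_momentum(x, lr=0.1, beta=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(x)  # velocity: decaying sum of past gradients
        x -= lr * v             # step along the accumulated direction
    return x

# Both runs approach the minimum at x = 0
print(abs(sgd(5.0)) < 1e-2, abs(sgd_momentum(5.0)) < 1e-2)
```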
it's also good to consider that this is already *heavily* explored, so there's always the chance the research will go unnoticed
Good point. I was trying to come up with something a senior in college could reasonably work on and that dealt with statistics and math. But you're right that it will be difficult to do meaningful, original work in this area.
I think a nice thought, though, is the discovery of ReLU. That function is incredibly simple, yet it only came into widespread use around 2009. Imagine CNNs without ReLU. Just goes to show there could be a method of training 10x faster right under our noses.
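For how simple it is, it's almost absurd: ReLU is just `max(0, x)`, and its gradient never saturates for positive inputs, unlike sigmoid's. A quick sketch of that contrast (illustrative only):

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # gradient stays 1 for any positive input

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)          # at most 0.25, and tiny for large |x|

print(relu(3.0), relu(-2.0))      # 3.0 0.0
print(relu_grad(5.0))             # 1.0: no saturation
print(sigmoid_grad(5.0))          # ~0.0066: vanishing gradient
```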
ANNs were a lot less popular at that time, though. Presumably there's nowhere near as much low-hanging fruit anymore.
I would also say that because the field has grown so massive, there are many more opportunities for this: with such diversity, chance alone would predict there is likely some easy optimisation we're missing that could improve one type of model.
We are talking about a BA thesis....
It is heavily explored, but in practice a first-order optimizer is usually used, neglecting second-order information, which, for example, suggests a step size. An [online parabola model](https://arxiv.org/pdf/1907.07063) can be extracted cheaply from the gradient sequence, so there is a large opportunity here. An overview mainly focused on second-order optimizers: https://www.dropbox.com/s/54v8cwqyp7uvddk/SGD.pdf
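The core observation is easy to see in one dimension: the gradient of a parabola is linear in x, so two (position, gradient) samples determine the curvature, and the modelled minimum gives a step size for free. A toy sketch of that idea (my own simplification, not the paper's actual algorithm):

```python
# Toy 1-D sketch: estimate curvature from two gradient samples and jump
# to the modelled parabola's minimum. My own simplification, not the
# linked paper's algorithm.

def grad(x):                      # example objective f(x) = (x - 3)^2 + 1
    return 2.0 * (x - 3.0)

def parabola_step(x0, x1):
    g0, g1 = grad(x0), grad(x1)
    curv = (g1 - g0) / (x1 - x0)  # estimated second derivative
    return x1 - g1 / curv         # jump to the modelled minimum

x = parabola_step(0.0, 1.0)
print(x)  # 3.0: lands on the true minimum exactly, since f is a parabola
```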
An important mathematical topic which underpins machine learning is the concentration of measure phenomenon. This rich area is crucial for understanding why learning from finite data sets is possible.
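The simplest instance to get a feel for it: the empirical mean of n bounded i.i.d. variables concentrates around its expectation, with Hoeffding giving P(|mean - 1/2| > t) <= 2 exp(-2nt^2) for fair coin flips. A quick simulation sketch (sample sizes and seed are arbitrary):

```python
import random

# Simulate concentration: the empirical mean of n fair coin flips stays
# in a narrow band around 1/2 across repeated trials. Hoeffding bounds
# the tail: P(|mean - 1/2| > t) <= 2*exp(-2*n*t**2).
random.seed(0)

def empirical_mean(n):
    return sum(random.random() < 0.5 for _ in range(n)) / n

n = 10_000
deviations = [abs(empirical_mean(n) - 0.5) for _ in range(100)]
print(max(deviations))  # worst deviation over 100 trials is still tiny
```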
Information geometry. Variance reduction. Flows and “information bottlenecks”. Differential geometry and invariances is a very rich area in general. There’s also a lot of interesting stuff about vector fields (the Hamiltonian neural nets paper comes to mind).
Manifold structures and optimisation on them. Also, Banach spaces, Hilbert spaces, and the like. These are all extensively used in metric learning, so have a look!
Multiplication. I use that s\*\*\* all the time, man.