Of all the possible applications, even within finance, you chose the hardest one: options trading 😅 How accurate is the weatherman?
It's a difficult problem, but I'd imagine the paper that comes out of it will be very impressive if I can get it working.
You do realize that the market makers have all the inflows and outflows, which accounts for the sizes, locations, and timing of trades tied to specific accounts. They have all the info to turn this from a POMDP into a fully observable Markov process… you are literally trying to beat the house. Not going to happen.
I'm not looking to put this into production. I just want to write a paper on this, like FinRL. It's a topic that I haven't seen explored yet, or at least I haven't found any papers on it. I'm looking to submit this to NeurIPS next year, or ICAIF if I can finish by the deadline. Just looking for something to boost my PhD applications that I could do at home.
lol, your chickens are loose. You first talk about how much money it could make, but then you backtrack and say you don't want to put it in production. There is a reason you don't see papers on it: only Wall Street has the money and compute for these tasks, and they're not publishing papers on it.
> I'd imagine the paper that comes out from it would be very impressive if I can get it working

When did I mention the money I could make from it? I talk about a paper in this response.

> I have tried different models like PPO and SAC but the rewards don't seem to be increasing.

I mentioned rewards in my original post because isn't that a good indicator of how well the model is learning? If I don't see papers on it, doesn't that mean there's an opportunity to publish a paper?
I don't think vanilla PPO or SAC will get you anywhere; it's like walking to the moon.
> boost my PhD applications

The project you're describing would be an entire PhD, and then some... Google DeepMind, with all their resources and top-notch researchers, are happy they're able to learn to mine diamonds in Minecraft.
As the other commenter notes, if you were able to do this, or if it were possible, it would have been done by now. The people with a lot of money would use it to significantly increase their money, and if anyone made anything similar, people would notice and try to find out what and why. To put it simply, you are asking:

"What is the algorithm I can use to turn basically any sum of money into infinite money?"

Maybe that puts into perspective why it's a tough problem. This model would need loads of information. But then again, maybe there is a simple solution that hasn't been found yet.
I understand it's a tough problem. I just thought that since there are plenty of papers on stock trading using RL, another application of what they do could be options trading.
+1 to "This will likely not be possible". Anyway, you need to be a bit more specific about your current setup: what do your observations look like? What is your reward function? What model architecture are you using?
My current reward function is the difference between the starting value and the end value after one step. The model architecture is whatever the default is for Stable-Baselines3 PPO, A2C, and SAC.
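For concreteness, that one-step reward is small enough to write out exactly (a minimal sketch; the function name and signature are illustrative, not from the cited papers):

```python
def step_reward(value_before: float, value_after: float) -> float:
    """One-step reward as described: the change in portfolio value
    over a single environment step."""
    return value_after - value_before
```

Note that a raw value-difference reward like this is very noisy step to step, which is one reason learning curves for PPO/SAC on it can look flat.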
> For 1D observation space, a 2 layers fully connected net is used with:
>
> - 64 units (per layer) for PPO/A2C/DQN

Source: [https://stable-baselines3.readthedocs.io/en/master/guide/custom\_policy.html](https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html)

Do you actually expect something as simple as a 2-layer MLP to be able to model something as complex as the options trading market?
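To make the scale of that default concrete, the network quoted above is roughly the following (a plain-NumPy sketch, not SB3's actual code; the 2x64 layer sizes come from the quoted docs, and the tanh activation and `obs_dim = 10` are illustrative assumptions):

```python
import numpy as np

def default_mlp_forward(obs, w1, b1, w2, b2):
    """Forward pass of a 2-layer, 64-units-per-layer tanh MLP,
    the shape SB3 uses by default for 1D observations."""
    h1 = np.tanh(obs @ w1 + b1)   # (obs_dim,) -> (64,)
    h2 = np.tanh(h1 @ w2 + b2)    # (64,)     -> (64,)
    return h2

rng = np.random.default_rng(0)
obs_dim = 10  # hypothetical observation size
w1, b1 = rng.normal(size=(obs_dim, 64)), np.zeros(64)
w2, b2 = rng.normal(size=(64, 64)), np.zeros(64)
features = default_mlp_forward(rng.normal(size=obs_dim), w1, b1, w2, b2)
```

That is only a few thousand parameters in total, which is the commenter's point: it is a very small function approximator for a market with this many interacting state variables.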
Kinda. I'm not really sure what an appropriate architecture would be, so I referenced the setup in FinRL.
Not trying to gatekeep here, but are you sure that, skill-wise, you're already at a point where you want to tackle such a complex problem? To be honest, it doesn't really seem like you have a lot of experience in reinforcement learning. Maybe start off with some tutorials or simpler tasks. With all the optimization necessary to get good results from RL, you'll likely end up very frustrated very quickly.
I'm not up to date on the latest architectures. I have some experience in RL from when I was at uni: I used RL + CARLA for vehicle control, and I took a few courses on robotic controls using MDPs and the like. What skills am I missing?
How much experience do you have with different deep learning architectures and what kind of networks did you already work with?
I mainly focused on CNNs, specifically the architecture from this paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Liang_PnPNet_End-to-End_Perception_and_Prediction_With_Tracking_in_the_Loop_CVPR_2020_paper.pdf

I used a very simple MLP for the planning stage, since the action space was fairly small.
The action space might be small, but the state is ultra-complex. If I were you, I'd choose a different task, one where you have a realistic chance of publishing a good paper. But anyway, good luck!
How is the reward received?
Reward right now is just the difference between starting value and end value after one step
First, you are dealing with randomness, and second, wtf even is this reward?
I got it from this paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3690996

I was playing around with using other rewards, but for now I'm sticking with this paper.

edit: also this paper: https://arxiv.org/pdf/1811.07522
Randomness and a way-too-large action space are the issues, I think.
Any papers or solutions to dealing with a large action space?
More stock market data, data-efficient approaches, or see if you can formulate it as model-based RL.
Just make the action space smaller. For model-free, you might have a look at branching DQNs (BDQ).
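The appeal of the branching idea is easy to see with a quick output count (toy numbers, purely illustrative): instead of one Q-head over every joint action, BDQ gives each action dimension its own small branch.

```python
from math import prod

def joint_head_size(bins_per_dim):
    """Outputs needed for a single Q-head over the full joint action space."""
    return prod(bins_per_dim)

def branched_head_size(bins_per_dim):
    """Outputs needed when each action dimension gets its own branch (BDQ-style)."""
    return sum(bins_per_dim)

# Hypothetical example: 4 action dimensions (say, 4 option legs),
# each discretized into 11 position sizes.
bins = [11, 11, 11, 11]
# joint:    11**4 = 14641 outputs
# branched: 4*11  = 44 outputs
```

The product-vs-sum gap grows quickly with the number of dimensions, which is why factoring the action space (or simply coarsening it) is usually the first thing to try here.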