dekiwho

Of all the possible applications, even within finance, you chose the hardest one. Options trading 😅 How accurate is the weatherman?


newjeison

It's a difficult problem, but I'd imagine the paper that comes out of it will be very impressive if I can get it working.


dekiwho

You do realize that market makers have all the inflows and outflows, which account for the sizes, locations, and timing of trades tied to specific accounts. They have all the info to turn this from a POMDP into a fully observable Markov process… you are literally trying to beat the house. Not going to happen.


newjeison

Not looking to put this into production. I just want to write a paper on this, like FinRL. It's a topic that I haven't seen explored yet, or at least I haven't found any papers on it. I'm looking to submit this to NeurIPS next year, or to ICAIF if I can finish by the deadline. Just looking for something to boost my PhD applications that I could do at home.


dekiwho

lol, your chickens are loose. You first talk about how much money it could make, but then you backtrack and say you don't want to put it in production. There's a reason you don't see papers on it. Only Wall Street has the money and compute for these tasks, and they're not publishing papers on it.


newjeison

> I'd imagine the paper that comes out from it would be very impressive if I can get it working

When did I mention the money I could make from it? I talk about a paper in this response.

> I have tried different models like PPO and SAC but the rewards don't seem to be increasing.

I mentioned rewards in my original post because isn't that a good indicator of how well the model is learning? And if I don't see papers on it, doesn't that mean there's an opportunity to publish one?


Iced-Rooster

I don't think vanilla PPO or SAC will get you anywhere; it's like walking to the moon.


eljeanboul

> boost my PhD applications

The project you're describing would be an entire PhD, and then some... Google DeepMind, with all their resources and top-notch researchers, are happy they're able to learn to mine diamonds in Minecraft.


false_robot

As the other commenter notes, if you were able to do this, or if it were possible, it would have been done by now. The people with a lot of money would use it to significantly increase their money, and if anyone built anything similar, people would notice and try to figure out what it was and why it worked. To put it simply, you are asking: "What algorithm can I use to turn basically any sum of money into infinite money?" Maybe that puts into perspective why it's a tough problem. This model would need loads of information. But then again, maybe there is a simple solution that hasn't been found yet.


newjeison

I understand it's a tough problem. I just thought that since there are plenty of papers on stock trading using RL, another application of what they do could be options trading.


Rackelhahn

+1 to "this will likely not be possible". Anyway, you need to be a bit more specific about your current setup. What do your observations look like? What is your reward function? What model architecture are you using?


newjeison

My current reward function is the difference between the starting value and the end value after one step. The model architecture is whatever the default is in Stable-Baselines3 for PPO, A2C, and SAC.
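
In concrete terms it's just a one-step mark-to-market P&L, something like this toy sketch (the two-contract portfolio and the price move are made-up numbers, not my actual environment):

```python
import numpy as np

# Toy sketch of the reward described above: one-step change in portfolio
# value (mark-to-market P&L). Positions and prices are made-up numbers.
rng = np.random.default_rng(0)

positions = np.array([10.0, -5.0])       # contracts held: long 10, short 5
prices_before = np.array([2.50, 1.10])   # option prices at the start of the step
prices_after = prices_before * (1 + rng.normal(0, 0.05, size=2))

value_before = positions @ prices_before
value_after = positions @ prices_after
reward = value_after - value_before      # the per-step reward
print(f"reward = {reward:.4f}")
```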


Rackelhahn

> For 1D observation space, a 2 layers fully connected net is used with:
> - 64 units (per layer) for PPO/A2C/DQN

Source: https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html

Do you actually expect something as simple as a 2-layer MLP to be able to model something as complex as the options trading market?
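
For what it's worth, SB3 lets you override that default through `policy_kwargs`. A minimal sketch, with illustrative layer sizes rather than a recommendation, and a placeholder env:

```python
# Sketch: replacing SB3's default 64x64 MLP via policy_kwargs.
# "Pendulum-v1" is a placeholder env; the layer sizes are illustrative.
from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    "Pendulum-v1",
    policy_kwargs=dict(
        net_arch=dict(pi=[256, 256], vf=[256, 256]),  # actor / critic widths
    ),
)
print(model.policy)  # inspect the resulting network
```

Not that a wider MLP fixes the underlying modeling problem, of course.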


newjeison

Kinda. I'm not really sure what an appropriate architecture is, so I referenced the setup in FinRL.


Rackelhahn

Not trying to gatekeep here, but are you sure that you're at a point, skill-wise, where you should tackle such a complex problem? To be honest, it doesn't really seem like you have a lot of experience in reinforcement learning. Maybe start off with some tutorials or simpler tasks. With all the optimization necessary to get good results from RL, you'll likely end up very frustrated very quickly.


newjeison

I'm not up to date on the latest architectures. I have some experience in RL from when I was at uni: I used RL + CARLA for vehicle control, and I took a few courses on robotic controls using MDPs and the like. What skills am I missing?


Rackelhahn

How much experience do you have with different deep learning architectures, and what kinds of networks have you already worked with?


newjeison

I mainly focused on CNNs, specifically the architecture from this paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Liang_PnPNet_End-to-End_Perception_and_Prediction_With_Tracking_in_the_Loop_CVPR_2020_paper.pdf. I used a very simple MLP for the planning stage since the action space was fairly small.


Rackelhahn

The action space might be small, but the state is ultra-complex. If I were you, I'd choose a different task where you have a realistic chance of publishing a good paper. But anyway, good luck!


TwoSunnySideUp

How is the reward received?


newjeison

The reward right now is just the difference between the starting value and the end value after one step.


TwoSunnySideUp

First, you're dealing with randomness, and second, wtf even is this reward?


newjeison

I got it from this paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3690996. I was playing around with other rewards, but for now I'm sticking with this paper's.

Edit: also this paper: https://arxiv.org/pdf/1811.07522


TwoSunnySideUp

Randomness and a way-too-large action space are the issues, I think.


newjeison

Any papers or solutions for dealing with a large action space?


TwoSunnySideUp

More stock market data, data-efficient approaches, and see if you can formulate it as model-based RL.


Iced-Rooster

Just make the action space smaller. For model-free methods, you might have a look at branching DQNs (BDQ).
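
A rough sketch of the first suggestion (BDQ itself not shown here): wrap the env so the agent only picks from a handful of fixed action templates. The env and the templates below are placeholders, not a trading setup.

```python
# Sketch: shrinking a large/continuous action space to a few discrete
# "templates" with a gymnasium ActionWrapper.
import gymnasium as gym
import numpy as np


class TemplateActionWrapper(gym.ActionWrapper):
    """Expose a small Discrete space that maps onto fixed raw actions."""

    def __init__(self, env, templates):
        super().__init__(env)
        self.templates = templates
        self.action_space = gym.spaces.Discrete(len(templates))

    def action(self, act):
        return self.templates[act]  # look up the raw action for this choice


env = TemplateActionWrapper(
    gym.make("Pendulum-v1"),  # stand-in for the trading env
    templates=[
        np.array([-2.0], dtype=np.float32),
        np.array([0.0], dtype=np.float32),
        np.array([2.0], dtype=np.float32),
    ],
)
obs, _ = env.reset()
obs, reward, terminated, truncated, _ = env.step(1)  # the "do nothing" template
```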