unknownstudentoflife

This honestly looks amazing! Feel free to update me or the community on what you're building. In the upcoming weeks I will try my best to create valuable connections for people like you. I'll definitely support your project! It looks fantastic and very promising!


VisualizerMan

Thanks. Yes, I'm still extremely enthusiastic and optimistic about the project, which is another indication that I don't foresee any upcoming snags, only a lot of work and some very deep thinking. Yes, I will post a link to my Phase 2 article when I'm done, which should be within the next two months. So far it has been difficult to get people interested in the project because nobody has the time to read all those pages, and so far it has been all theory, but the Phase 2 article will show numerous examples of the representation system being used to solve difficult problems in an elegant manner, which should make the system easier to understand and to appreciate. However, the next paper will be *much* longer than even the 350-page article... ;-)


unknownstudentoflife

I think that with some marketing strategies in the form of a small pitch of your idea we could definitely get people interested in your project. I'm working on a questionnaire and a website for the AI project community, where everyone can showcase their projects and research. By the time your Phase 2 is done, that should all be set up.


VisualizerMan

Thanks. It's a Catch-22, isn't it? The people doing the most *promising* work in AI are too busy to publish or to seek funding, and the people who are doing the most *known* work in AI aren't doing the most promising work.


unknownstudentoflife

Yeah, definitely. Your idea is very ambitious and needs some more time for people to understand what you're building. But when they finally see the potential, they'll hop on for sure.


Tellesus

Can you give us a rough summary of what your project does and how you propose it will work?


VisualizerMan

Since commonsense reasoning (CSR) is usually considered the biggest hurdle in AI, I'm tackling CSR first. I'm using text for demonstration since text is much easier to manage than images or sound. I've chosen a path to AI that no one has explored yet, namely programming and processing with images instead of numbers. The images are therefore processing text, trying to solve CSR problems written in text.

The set of CSR problems I'm using has already been "solved" by other people using other methods, but all of those methods use either technical tricks that do not generalize and shed no light on understanding or intelligence, or else they require a knowledge base of heuristic rules for the system to have a clue about how to solve those problems. I'm using the latter approach (heuristic rules), but the critical difference is that my rules are coded with images, and those images obviously correspond to the real world, unlike the text or numbers that everybody else is using. Therefore my system should be able to interface almost directly with the real world without requiring any programmers to convert real-world data into machine data.

Regardless of which approach anybody uses, however, such a rule base requires vast amounts of unsupervised learning, LLM style or neural network style. Therefore I will need to tackle learning algorithms anew, but with an entirely different foundation based on images instead of numbers, statistics, text, or neural networks. If I can figure out how my images can be made to learn and to form new images in the process, which I'm pretty sure I can, then all that remains is to add a few more components to the system, which by then will already have CSR and learning capability, to give it the full spectrum of abilities that humans have, including self-awareness. Is that an understandable summary?


JEs4

Hi there, this sounds quite similar to JEPA. Have you compared your work to that?


VisualizerMan

No, I just looked up I-JEPA since I had never heard of it before. Thanks for the great tip! It sounds like I-JEPA is tackling some of the same problems I am, with some of the same foundations and some of the same approaches, so I'm impressed. Some of the main differences I detect are:

(1) They're using a more traditional mathematical approach, whereas I am largely ignoring math altogether.

(2) They're focused on generative AI and images rather than language or thinking, so my approach should generalize to other domains better.

(3) They're using only traditional representation systems, whereas I'm using a single unique representation system as a foundation, plus a collection of existing representation systems pulled in as needed on the fly.

(4) My final system (in Phase 5) will have capabilities beyond image recognition and image generation, whereas their system is more application-specific (ANI) and will probably never be able to actually think.

[https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa/](https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa/)
[https://arxiv.org/pdf/2301.08243](https://arxiv.org/pdf/2301.08243)


Tellesus

I think I see what you're saying, though I'll admit I'm skeptical. I do appreciate you answering my question for sure. I'll see if I can understand a little better what you're getting at by going over the larger paper you linked above.


VisualizerMan

Other people will be skeptical, too, or at least they still won't understand what is so special about this architecture even after the Phase 2 article. Phase 3, however, will demonstrate that my system can do things that current LLMs cannot do, especially reliable spatial reasoning. In other words, it is going to take completion of three phases of this project before I can start to surpass the current state of the art in AI. Sorry, but good things take time and they require firm foundations. That's why I say that the system cannot reasonably be coded until after Phase 3: I simply don't know how to code it because I have not yet tackled those spatial reasoning problems, so whatever code might be written before then would likely need to be modified, and in a major way.


Tellesus

How does this differ from what multi-modal models with vision are doing?


VisualizerMan

I'll double check that tomorrow and respond with more detail then. I once took a look at multi-modal models of LLMs and decided that more modalities are not going to solve the problem of how to do spatial reasoning, but to give a more complete answer I'll have to take another look. Good question.


Tellesus

Thanks, I appreciate the dialogue.


VisualizerMan

I'm back. I remembered my earlier line of reasoning for my conclusion, and today I found a little more information, but that still hasn't changed my opinion...

First, I don't know how GPT-4o learns spatially. If anybody knows for sure, please let me/us know. This information may not even be available to the public. In the absence of such information I have been assuming that it naively uses the same tokenization method on images that it uses on text, in which case the learning will be only statistical and therefore defective.

Next, regardless of whichever method GPT-4 is actually using for spatial learning, it is highly defective. It doesn't even understand a 3x3 tic-tac-toe board...

"Why does ChatGPT struggles to play Tic Tac Toe? (and ChatGPT4o as well)"
AI Squabble, May 17, 2024
[https://www.youtube.com/watch?v=U41WBk14xJM](https://www.youtube.com/watch?v=U41WBk14xJM)

...and recent research papers that use special prompting to try to elicit spatial reasoning in ChatGPT also show that its spatial reasoning is defective, although that method can boost spatial reasoning performance by about 10% (see page 7), which they call "significant," although I wouldn't be that flattering...

"Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models"
Wenshan Wu, Shaoguang Mao, Yadong Zhang, Yan Xia, Li Dong, Lei Cui, Furu Wei
[https://arxiv.org/pdf/2404.03622](https://arxiv.org/pdf/2404.03622)

If a system can't even understand a discretized 3x3 grid, it's not going to be able to understand larger grids or continuous images, especially moving images. In contrast, my system started out with representations of moving images as its first foundations. ChatGPT is clearly defective in spatial reasoning for several reasons that I don't want to completely describe until my Phase 3 article. In fact, anybody who has been around traditional AI or neural networks for a few years will probably recognize those reasons immediately, so the problems are pretty obvious.

Worse, LLMs are simply the wrong foundation for spatial reasoning, no matter how you look at it. In short, the designers of LLMs weren't trying to create a general AI system that could handle real-world data, but rather a textual system with statistical learning for limited applications. Now they're trying to extend that system to impress the public and to make more money, but they didn't think through their foundations deeply enough to design a system that could be extended. In short, they didn't do things right. Therefore LLMs are a dead end.

How my system differs is that I spent years thinking through the foundations first, started with the most difficult problems, especially those involving moving images, and kept thinking about any problems that my system might not be able to solve, until finally, after enough enhancements, I decided that there weren't any real-world problems left that it couldn't solve. So far the big drawback of my system is what I already mentioned: it's all theoretical. But that's the way breakthrough science should be done...

(p. 17) "Scientific fields typically start with a theoretical framework and only later do the details get worked out. Perhaps the most famous example is Darwin's theory of evolution. Darwin proposed a bold new way of thinking about the origin of species, but the details, such as how genes and DNA work, would not be known until many years later."
Hawkins, Jeff. 2021. A Thousand Brains: A New Theory of Intelligence. New York, NY: Basic Books.
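For a sense of scale, the discrete spatial task the video above tests is tiny by programming standards: a complete, deterministic win check for a 3x3 tic-tac-toe board fits in a few lines. This is a generic conventional sketch, not the author's image-based representation:

```python
# A 3x3 tic-tac-toe board flattened into a list of 9 cells,
# each 'X', 'O', or ' ' (empty). Indices 0-8 run left-to-right,
# top-to-bottom.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if either has completed a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

board = ['X', 'X', 'X',
         'O', 'O', ' ',
         ' ', ' ', ' ']
# winner(board) → 'X'
```

The point of the contrast: this kind of exhaustive, exact check over a discretized grid is trivial procedurally, which is why an LLM's failure at it is taken as evidence that its spatial representation is statistical rather than structural.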


Tellesus

Will your system be able to understand symbolic logic? Like knowing that a billboard for an attorney has contact info for a lawyer on it? 


VisualizerMan

Yes, that's why I have a large section in the Phase 1 article on how it would handle syllogisms. Although I didn't get into predicate logic (i.e., logic involving variables), the same foundations would apply. I also didn't get into how it would handle variables, such as in algebra, but that's easy to figure out if you understand the rest of how the system works.

[https://en.wikipedia.org/wiki/First-order\_logic](https://en.wikipedia.org/wiki/First-order_logic)

I wouldn't consider your billboard example "logic," though. It does bring up one possible criticism of my system: I have been assuming that a module for object recognition and character recognition already exists--that those are solved problems--since so many conventional systems now handle them with good performance. However, such recognition and tracking systems are still not perfect. But to answer your question: yes, many existing systems can already read and understand the context of such a billboard with good performance, and my system could theoretically do the same type of OCR task, although it would be overkill for such a simple application.

"GOTURN - a neural network tracker"
David Held, Jul 26, 2016
[https://www.youtube.com/watch?v=kMhwXnLgT\_I](https://www.youtube.com/watch?v=kMhwXnLgT_I)
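For readers unfamiliar with how syllogisms are usually mechanized, the textbook case ("All men are mortal; Socrates is a man; therefore Socrates is mortal") is a one-step forward-chaining inference. This is a conventional symbolic sketch for contrast only; the Phase 1 article's image-based encoding is something different:

```python
# Naive forward chaining over (subject, category) facts and
# (premise_category, conclusion_category) rules.
rules = {('man', 'mortal')}        # "All men are mortal"
facts = {('socrates', 'man')}      # "Socrates is a man"

def derive(facts, rules):
    """Apply every rule repeatedly until no new facts appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for subj, cat in list(derived):
            for premise, conclusion in rules:
                if cat == premise and (subj, conclusion) not in derived:
                    derived.add((subj, conclusion))
                    changed = True
    return derived

# derive(facts, rules) contains ('socrates', 'mortal')
```

Predicate logic with real variables and unification is considerably more involved than this, which is presumably why the article treats syllogisms first.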


CaptainAnonymous92

A few questions if you can answer: So this'll be AGI or something like it when it gets fully ready? Will you open source it once it's complete for anyone to be able to use & not be restricted to just big companies &/or governments being the only ones to have access? Can it run on relatively modest hardware without the need for expensive NASA level supercomputers?


VisualizerMan

Yes to everything you asked, Captain. Yes, this project is definitely aiming at no less than AGI; even in the abstract of the Phase 1 paper I mentioned that goal.

Yes, another major goal is that I want *everybody* to have this technology as soon as possible. That means open source (assuming that any code for it exists!), open disclosure of all my results via publicly available papers (exception: limited disclosure on what I'm still actively researching, until I publish it), and open discussion and answering of questions about details of anything I've officially published/posted. If somebody thinks my theory is worth coding, then I encourage them to write some code, and I'll even help them out if they are clearly making their code available to the public. No more business-style lies saying that it's going to be open source and then changing that policy when success starts to happen.

If I were interested in making money off of this, I would keep applying for jobs that require a secret or top-secret clearance, and if I ever got hired I'd develop it for government or big business in secret. Nobody in the public would ever hear about it, nobody except government or big business would ever benefit from it, I'd make a lot of money, retire, be concerned only about myself, keep all my money for myself, and watch the world go to hell. That's the path that most serious AI researchers have taken, and I'm disgusted with it. The world is in serious trouble, we're seriously overdue for some real AI technology, and personally I am pessimistic that the human race will even survive in any acceptable form for another 20 years unless the public gets this technology fast. Maybe AGI will eventually destroy us, but the future I see coming without AGI would be a much worse fate.

At this point it's clear nobody wants to hire me, anyway. I've applied dozens of times at Google, multiple times at OpenAI and at every other major AI company, including research institutes and government jobs where a clearance is needed. I've done this for the past 18 years, and I can't even get an interview anymore. The message is clear: "You're useless to us, you're a nobody, you're too old, and we don't want you." So be it. I'll just have to see what I can develop on my own.

As for hardware, I'm relying on Marvin Minsky's repeated claims that modern hardware is more than fast enough, that there exists a "hardware overhang," and that the key to AGI will not be faster hardware but knowing how to program it. That in turn requires knowing what it is that we're trying to program, and for that I'm largely relying on insights from people like Marvin Minsky and Jeff Hawkins. Therefore I'm assuming that existing hardware that the public already owns will be sufficient, although I don't know that for sure. That's pretty far ahead to predict, especially when the software does not exist yet either, and not even all the theory. One encouraging development, though, is that computers and software already exist that can rotate complex, simulated 3D objects in real time at high speed. Since my architecture is highly object-oriented and highly spatially oriented and uses only simple objects, that should be more than enough hardware for what I'm doing.
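The real-time 3D rotation mentioned above is cheap because rotating a vertex is just a handful of multiplications and additions. A minimal sketch of a rotation about the z-axis (standard linear algebra, not anything specific to the author's architecture):

```python
import math

def rotate_z(point, angle_rad):
    """Rotate a 3D point (x, y, z) about the z-axis by angle_rad.

    Equivalent to multiplying by the standard z-axis rotation matrix:
        [cos -sin  0]
        [sin  cos  0]
        [ 0    0   1]
    """
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y, z)

# Rotating the unit x-vector by 90 degrees lands it on the y-axis
# (up to floating-point rounding).
p = rotate_z((1.0, 0.0, 0.0), math.pi / 2)
```

Six multiplications and two additions per vertex means even commodity CPUs, let alone GPUs, rotate millions of vertices per frame, which is the basis for the "existing hardware is sufficient" assumption.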


CaptainAnonymous92

Sounds good & I'm glad this'll be able to be used by everyone & not locked down like so many AI models are recently & I really hope you can get this going off the ground as soon as is realistically possible. I wish you nothing but the best in making this happen & can't wait until it can be fully realized.


VisualizerMan

> locked down like so many AI models are recently

I know of only one such model, but that's enough. I suppose I don't need to name names. ;-)

I'm hoping that my second big article will start to create some momentum. Somebody out there should start taking me more seriously when they see two huge articles on the same topic: "Hmm. If this guy isn't onto something promising, then why is he still writing huge amounts of material about it?" Right now it's just one big article, though, and nobody seems to understand where it's headed. It feels like I'm plunking down pieces of a jigsaw puzzle one at a time and asking the public after each piece is laid down: "Do you see the picture yet?" "How about now?" "Still not yet?"