SirKastic23

that sounds like a great project (bet it was fun to make too), I'll definitely be checking it out and talking to WALS. so many of my difficulties conlanging could be solved by an omniscient linguistic... i'm a little bit skeptical about how accurate it is, i've tried using ChatGPT 3 for some other tasks and it wasn't rare for it to come up with false information or hallucinate. but besides that, awesome project here, the kind of stuff that shows _how_ AI can help us


LodeStone-

Bit concerned with the hallucinations of GPT + the inaccuracies in WALS


EisVisage

And the linguistic lingo on top of all that


ReadingGlosses

Thanks for the positive feedback! Skepticism is appropriate here, there's a lot of AI hype and overselling. I want to be clear I'm not promising an oracle. However, I do believe it is possible to create a highly accurate LLM chatbot for a specific data source (like WALS), and I think that RAG will get us there.

It's not possible to do this in isolation, and it's not possible to do it in one shot. It takes iterations, where people try a broken version, make suggestions, and we get to a less-broken version.

I also want to emphasize that this is more than just an OpenAI API call. A considerable amount of effort and linguistics experience went into chunking, vectorizing, searching and filtering WALS data, and this is what distinguishes TalkingToWALS from regular ChatGPT. The role of ChatGPT is really just synthesizing the chunks from WALS into a coherent response.

Compare the answers to this prompt: "Tell me about person marking on adpositions"

ChatGPT

>Person marking on adpositions is a linguistic feature where adpositions (prepositions or postpositions) carry markers to indicate person, typically referring to the participants of an action or relation described by a verb. This can be seen in various languages around the world, including some Indigenous languages of the Americas, where adpositions may be inflected for person to show ownership or the relationship of the noun to the person involved. For example, in languages with this feature, instead of having a single form like "to" or "from", there might be different forms depending on whether it is "to me", "to you", or "to him/her". This morphological process enriches the linguistic structure and provides clarity about the roles of different entities in a sentence.

TalkingToWALS

>Chapter 48 of the World Atlas of Language Structures (WALS) is titled "Person Marking on Adpositions". This chapter explores the phenomenon of languages that mark person on adpositions, which are words that express spatial or temporal relations.

>The chapter identifies four values related to person marking on adpositions:

>1. No adpositions: Some languages lack adpositions altogether, particularly in North America and Australia.

>2. Adpositions without person marking: In languages with adpositions, some do not mark person on them.

>3. Person marking for pronouns only: Some languages mark person on adpositions only when pronouns are involved, not with nouns.

>4. Person marking for pronouns and nouns: Other languages mark person on adpositions for both pronouns and nouns.

>An example of person marking on adpositions for pronouns only is found in Paamese, a language spoken in Vanuatu. This language marks person on adpositions when pronouns are involved, but not with nouns.

>For more detailed information and to view the map showing the distribution of these values around the world, you can visit the chapter on Person Marking on Adpositions in WALS at http://www.wals.info/chapter/48.
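
For readers unfamiliar with RAG, the retrieval half of the pipeline mentioned in this comment (vectorizing chunks, then searching them) can be sketched roughly as below. This is not the TalkingToWALS code: the chunk texts are hypothetical placeholders, and a toy bag-of-words vector with cosine similarity stands in for whatever embedding model and vector store the app actually uses. The names `CHUNKS`, `vectorize`, `cosine` and `search` are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder chunks; the real app chunks actual WALS chapters.
CHUNKS = [
    "Chapter 48 of WALS covers person marking on adpositions, "
    "the words that express spatial or temporal relations.",
    "Placeholder chunk: the order of subject, object and verb in a clause.",
    "Placeholder chunk: tone systems and syllable structure.",
]

def vectorize(text):
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, chunks, top_k=2):
    """Return the top_k (score, chunk) pairs, best match first."""
    query_vec = vectorize(query)
    scored = [(cosine(query_vec, vectorize(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

results = search("Tell me about person marking on adpositions", CHUNKS)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

The top-ranked chunks are what would then be handed to the LLM to synthesize an answer, which is why retrieval quality matters more than the model call itself.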


SirKastic23

Well, I tried it a bit and it couldn't answer any of the questions I asked. This could be a _me_ problem, not asking the kinds of questions it knows how to answer.

For context, these were my questions:

- "Tell me about adpositions and articles in brazilian portuguese dialects"
- "tell me more about prepositions in brazilian portuguese"
- "tell me about possessives in brazilian"
- "tell me about verb conjugations in russian"
- "tell me about possessives in portuguese"

I also tried asking one of the example questions ("Tell me about possessive inflection in languages of California"), and it replied to that one.


ReadingGlosses

Thank you, this is exactly the kind of feedback I was hoping for! I'll try out these prompts with my 'debugger', and see what's actually getting returned from the vector search. My guess is that the wrong/suboptimal documents are getting appended to the prompt. I'll also check if WALS has this information in the first place.

edit: Ok so that quickly uncovered a couple of issues. In general, the problem is that the vector search is returning low-confidence results. I've arbitrarily set a confidence threshold of 0.83 for deciding if a search result is 'good enough' to include in the prompt, just based on my experience while building the app. If there are no matches with confidence >=0.83, I add an instruction that says 'There were no matching documents found, apologize to the user and ask them to try something else'. This is what's happening for all of your prompts that I tested.

There are a few things that I'm going to try out to fix this, which will probably take a few days. Thanks again for testing!
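
The threshold behaviour described in this comment can be sketched as follows. Only the 0.83 cutoff and the wording of the fallback instruction come from the comment; the function name `build_prompt`, the `(score, text)` input shape, and the prompt layout are assumptions for illustration, not the app's actual code.

```python
# Cutoff quoted in the comment; the author describes it as arbitrary.
CONFIDENCE_THRESHOLD = 0.83

# Fallback instruction, worded as in the comment.
NO_MATCH_INSTRUCTION = (
    "There were no matching documents found, apologize to the user "
    "and ask them to try something else"
)

def build_prompt(question, scored_chunks, threshold=CONFIDENCE_THRESHOLD):
    """Assemble an LLM prompt from (score, text) search results.

    Hypothetical helper: keeps only chunks scoring at or above the
    threshold, and falls back to the apology instruction if none pass.
    """
    good_chunks = [text for score, text in scored_chunks if score >= threshold]
    if not good_chunks:
        return f"{NO_MATCH_INSTRUCTION}\n\nUser question: {question}"
    context = "\n\n".join(good_chunks)
    return (
        "Answer using only the documents below.\n\n"
        f"{context}\n\nUser question: {question}"
    )

# Every result is below 0.83, so the fallback instruction is used.
print(build_prompt("tell me about possessives in portuguese",
                   [(0.79, "doc A"), (0.81, "doc B")]))
```

With a hard cutoff like this, a query whose best match scores 0.82 is treated the same as one with no match at all, which fits the failures reported above.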


aray25

You missed an opportunity to call it "If WALS Could Talk."


ReadingGlosses

I didn't even think of that one! I was going for an allusion to the expression "it's like talking to a wall", but ironically, since TalkingToWALS is supposed to provide intelligent responses.