I built a basic version of this for myself with a prompt in chat gpt in an afternoon. It's great that you've built this yourself, but where's the magic? If it's your prompt it can probably be extracted in a few minutes by those who know how to do so.
Thanks for working on this! Language learning really needs a breakthrough.
Now, I tried the web app and chose to learn Greek as a beginner. And while I had better experience with your app than with ChatGPT or Gemini voice modes, I still got lost 5 minutes in because the AI tutor doesn't seem to have a plan for me, nor does it "see" my struggles. For example, after asking me about a hobby, it gives me a long sentence in Greek about how how it is nice to hike in mountains. Being absolute noob I cannot reply to it, nor even repeat it. And I don't even know what it is expected from me at the moment. A human tutor here would probably repeat a part of the sentence with a translation and ask me to repeat, or would explain something. The AI just sits there waiting for me to make a sound, and when I make it, it goes on on a tangental subject of beach vacations. :)
Again, this is still relatively not bad, and I'm going to give it another try.
I had a similar feeling with Swedish just now. It isn't really much different than conversing with ChatGPT in advanced voice mode - it's up to me to drive the conversation and it all feels quite arbitrary (and I find myself instinctively falling back on topics I know how to talk about, which quite defeats the purpose). I was hoping for a more structured learning plan that strategically expands my comfort zone and skills in a guided way.
Thanks for the feedback. Yeah we need to improve the beginner experience, it's more tailored towards intermediate/advanced students at the moment.
I'm an advanced learner but I stopped after a few moments because it's boring. It's asking me questions that you'd ask a beginner (although a beginner wouldn't understand the questions). It just asked what food I like to eat, where I like to travel, whether I like the weather, etc. I have a language tutor IRL and I have found that we run out of things to talk about too. So we often find ourselves just discussing the latest events from the news. I think you should feed fresh conversation topics daily from a data source like the news, localized to the user. There are global news APIs you can subscribe to.
Do you mean that the experience is meant to have more structure if you pick the intermediate or advanced level? (fwiw I did pick intermediate for my Swedish level in the app).
My thinking is - I can have unstructured conversations with Advanced Voice Mode or in real life here in Sweden. What I'd really appreciate is a guided learning experience taking me up from intermediate/slightly above intermediate to fluent in the most efficient possible way (as opposed to just having us 'ramble' about random topics of my own choosing).
Why wouldn't intermediate/advanced students just talk directly to ChatGPT? From what I see, I thought your value prop was for the beginners.
I paid for Memrise to polish up French. The scripted lessons alwere great but it dropped me into an AI conversation assistant that did exactly the same. It forgot the vocab and grammar level that the scripted lessons has taught, and often broke into idiom. I haven't picked it up since.
I'm a Memrise beta member w/ lifetime premium access for my contributions to the site in its early days. I cannot recommend anyone use Memrise for anything nowadays it has been so heavily enshittified. In fact, I recommend against using it in favor of Anki (Memrise's biggest strength over Anki in the early days was the community mnemonics and courses (Anki equivalent "community decks") - none of which really exist in any way today).
I tried following the modern Japanese track on Memrise and was appalled at how bad it is nowadays.
I think the point here is for you to practice (i.e. develop "muscle memory" for speaking), not to learn.
I think this is a pretty big limitation of the architecture (STT->LLM->TTS) they've chosen. The intonation around struggling to speak or difficulty with certain phrases is totally lost when the text is transcribed.
The ChatGPT mobile app in hands-free voice conversation mode works quite well for language practice with one important call-out: you have to give it a topic at the beginning otherwise it won't be able to drive the conversation forward and will stick to banal pleasantries.
What I usually do is pick a random blurb in the news and paste the entire thing along with the Reuters link at the beginning and inform ChatGPT that we'll be carrying on language practice specifically over that topic of discussion.
I've used this to carry an hour long foreign language practice in Spanish while walking my husky. Just put the phone in my pocket and go. If you're an intermediate/advanced learner, it's a pretty decent solution.
In fact, you can actually instruct ChatGPT that you are going to speak in your native language, but ChatGPT is only allowed to respond in the target language if you just want to focus on practicing listening comprehension.
I'd be interested in hearing how significantly improved Issen is over this.
Luis von Ahn spoke in the early 2010s—probably around 2014—at The LAB in Wynwood, Miami. He recounted how his fascination with crowd-sourcing led first to reCAPTCHA and then to his latest venture, Duolingo. He made it clear that his real passion wasn’t language per se, but building a crowd-sourced human translation service as a business model. At that point, Duolingo had roughly 24 employees—and, much to his surprise, only two were focused on the crowd-sourcing engine. He explained how they’d enlisted some of the world’s leading language-education researchers as consultants. Their very first question: “Which part of speech should learners tackle first?” The experts confessed they didn’t know, so the team gathered the data and used A/B testing coupled with statistical analysis to pinpoint the answer.
Today, it’s not only easier than ever to launch a platform to challenge Duolingo, but its core product—its crowd-sourced human translation service—has been distrupted.
This morning, I found myself thinking about how all those decade-old learning platforms—like Coursera, as reflected in its ever-falling stock price—are being distrupted.
Your product looks awesome and I hope you distrupt all the language learning platforms. Thank you for sharing.
(I had ChatGPT fix my grammatical errors and now this comment doesn't sound like me, sorry.)
Honestly tried it out, I wanted to like it but in its current form I found myself frustrated enough to just end the 'call' and close the app. Been learning Spanish for quite some time now so wasn't put off by the 'it always talks in X language' thing people are talking about.
The thing that put me off was the speech recognition. I am not in a loud environment and I wasn't even talking and it was picking up responses and responding to it before I even opened my mouth. It blazed through the 'preferences' set up itself making up responses. Then when I did get to talk it just simply got my answers wrong. It would often interject too at random during my sentences.
Thanks for sharing! I tried using it for Thai language coming from English and found that the app understands me well! But I couldn’t understand it at all. It replied to my turns with very long messages (20+ syllables) in pure Thai and spoke with an unnatural rhythm which made it hard to pick out words or phrases. The foreign alphabet made it really difficult too. I tried changing some settings in the bottom left menu and it started speaking English to me too, but I found it unbearably slow. At one point it asked me if I wanted it to speak in pure Thai or a mix and then ignored my answer. Ultimately as a beginner I don’t think Issen will work for me very well as-is. Happy to check back in the future!
>We didn’t want to focus too much on gamification.
Thank you so much for this. Duolingo is literally unbearable because it's so gamified. I'll try it out later. I've seen a few of these apps, can I seamlessly go between my native language and the language I'm trying to learn? If I am trying to learn Hindi, can I ask a question in English in the middle of a conversation?
Yes, we've spent a lot of time getting the STT and TTS to work seamlessly in multilingual, it works pretty well!
I tried the app. I love that you’re tackling this and I’m rooting for you. I’ll tell you about myself, my experience, and my thoughts.
I’m currently learning French as a beginner and I’ve learned other languages in the past. I’ve trued Duolingo as well as italki and frantasic as well as just ChatGPT. I am very familiar with Anki and I think it’s critical to make your own flashcards by choosing images and sounds. I don’t want auto cards.
My experience with Issen:
* it’s frustrating when the conversation partner doesn’t remember what it just said - it means I can’t get a chance to ask que c’est que ça veut dire.
* it’s frustrating (just like with ChatGPT) that the conversation partner tends to interrupt and jump in while I’m thinking. I think many learners speak slowly and spend extra time thinking. ChatGPT allows you to hold the glowing circle and it won’t interrupt while you do.
I’d love to see the chat bubbles have more in depth features like:
* much clearer indicator of hover or click words for translation, and more features like example sentences or click to pronounce
* an option to ask for an explanation of some or all the text
* for my own text I’d love to see feedback with more UI native elements about how accurately I pronounced each word and any grammatical mistakes I made. The text summary is a great start
I found myself ignoring the features of the chat bubbles and only in writing this feedback did I notice them! They could maybe use more contrast and clear UI emphasis. Duolingo does a good job of making their UI very clear with this kind of feedback.
I think it’s important to build features that augment the app to work around LLM limitations. My guess is a lot of the settings change the prompt and that’s great but I think it leaves too much room for hallucinations to nosedive the experience.
I’d also love to see some way to have a hold to talk or something similar.
I’m very conscious at this point about the cost of these lessons and I have a hard time finding the price. Frantastic is absurdly expensive and it made me switch to italki where human conversation is literally cheaper. Without differentiating more from ChatGPT I would have a hard time justifying an additional subscription to my wife!
This looks great, congrats! As someone that has gone through Assimil courses and done lots of comprehensible input for various languages, language production is typically the weak point that isn't covered well. I've done plenty of lessons on iTalki, but I've been wanting something more structured and this seems like it could cover it. Definitely going to give it a shot!
The feature request I make for all language course makers: please consider Bengali support in the future! It's wild to me that the 7th most spoken language in the world, with a deep culture around literature and poetry [1], gets zero attention from language course makers. I can buy an Assimil course on Breton, spoken by 200k people, and not Bangla, spoken by 284 million.
I'm glad someone is building this! I was using this in Thai. I expected it to be awful. But it's actually very good. I only used it for a few minutes but will try to use it more later. It's possibly good enough for me to stop paying my tutor. However, please use a different Text to Speech model because the current Thai one sounds robotic, like the old (current?) Google Translate. This seems like a great product.
Ok, thanks for the feedback. Who was your tutor for Thai, Supatra or Malee?
I've been learning Arabic, and I noticed that the app uses Arabic script right from the start. This can be quite challenging for beginners who haven't learned how to read it yet. May I suggest adding an Englishized (romanized) version of the Arabic text to help ease the learning curve?
It also seems to not listen to me when I asked to give me shorter sentences. It seems to not care that I'm struggling despite my pleading.
I later switched to Spanish, which was a better experience. This one seems to listen to me better. I can ask the tutor to repeat what they said in English and give me shorter sentences, and thankfully, it does.
Interacting with the tutors does feel I have to drive the conversation which is taxing. Compared to a human tutor, where I feel assured that I can be guided properly.
Still an interesting app. Would love to try Spanish some more, in the future.
I haven't tried it out yet. I will. But I just want to say that I have wanted this to exist since I first used ChatGPT in 2022. Thank you for building it.
Just used it for French right now. The Design is excellent! but the LLM task orientedness needs some work. The tutor needs to follow the curriculum well. This has the same issue that I have in my day job i.e. keeping the LLM on topic. Its not strict. i.e. after asking it to make sure to remind me to reply in french it very easily forgets to do so. Its not following a structured approach or even in casual conversation isn't correcting my mistakes unless I ask.
I'm trying to learn vietnamese, but the lessons are really really rough and borderline bad advice.
---
AI: Anh mệt is good if bạn are a man speaking about yourself. You can also say, “Em mệt” if you’re a woman.
this isn't correct. If you are of "older brother" age and are male, you say Anh. Em is for if you are "younger person" (does not matter the gender). Women tend to prefer being called "em" (even if they are older), because women prefer to be identified as younger than their true age... But that doesn't mean you can't call younger men em.
A good tutor would know your age relative to theirs and explain this context.
---
It would say english phrases with a vietnamese accent.
---
It also would give me really complex vietnamese phrases that I am not ready for. when I prompt for an explaination or translation, it would get off track from the original thing we were learning.
---
Way more people in Vietnam (and the globe) speak southern Vietnamese, but the tutors seem to be from north Vietnam.
---
The STT also was very forgiving if I pronounced things incorrectly. Or it would confuse english and vietnamese. I would say, "Phai", but it heard "bye"
---
I was ready to pull out my credit card, but I can't trust it to teach me the right information. I pay $160/mo for Vietnamese tutoring ($20 per class). This would be way cheaper and I don't have to schedule my classes.
This sounds very much like the kinds of mistakes that LLMs typically make. It's a pity, I would love a good language learning platform.
Alright, having tried this with Japanese I can say it's frustrating. As a near complete beginner the tutor kept speaking in Japanese even when I said "sorry I don't understand" repeatedly and then when I asked it to start in English and then gradually transition to Japanese it lasted all of one sentence in English before switching back. I can totally see how this would be useful conversation practice if you've progressed that far, but I'd love to have something for even earlier beginners. Also since many of the models you use are natively multi modal this could readily integrate visual media for discussion and grounding.
Also, for the transcription it would be great to get pure romanji to start with!
Yes, I can understand and empathize with your experience. Quite honestly our current focus is more for B1+ students. That 0 -> 1 / bootstrapping of the language is much better served by traditional material that is less talking / listening-heavy.
Unfortunately, I think you will soon learn that the market for advanced language learners is 1/500th the size of the market for beginner learners. But thank you very much and please keep focusing on us.
I'm a second-gen Korean-American; my korean is weak but conversational. I am intrigued by the reasoning model that analyzes my speech and points out various mistakes I'm making. It's a good first attempt at separating the 2 tracks of actual conversation vs mistake-correcting.
I think showing the raw reasoning text is not quite the right UI; maybe highlighting the specific text in red and showing a suggested correction would work better?
It's also a little awkward that the conversation is live; I don't really have any breathing room to read the reasoning traces on what mistakes I made / could have done better. I hung up the first time I tried to figure out how to pause.
I can't wait to try this! I studied a few languages in school and have lost any semblance of proficiency -- mainly because I never have a real occasion to use anything other than English. I've been waiting for someone to build something like this
It would probably be better to pick one or two languages, actually work with native speakers to make sure it's right.
These "we cover every single language" tools get it like 75% right at best.
I disagree because of how AI is progressing and because there's tons of neglected language markets they can pick up. Obviously your approach can work too, perhaps better. But 95% of language learning tools don't support Thai (my target language) for example so I am an eager user for that reason alone. I think they'll be able to make a generalized curriculum and have the AI use it in all languages.
My tool supports Thai, if you'd like to try it - https://nuenki.app . I added it at the request of a user, who seems to be happy with it.
It's a browser extension that finds English sentences in webpages, and translates the ones at your difficulty level into the language you're learning.
I appreciate your comment about gamification. I’ve kept a streak alive on other apps for no other reason than keeping a streak alive. Not learning a thing.
Yeah, this is the biggest gripe we hear about much of the existing language learning landscape. That they're effectively gaming apps masked as language learning apps.
You should still gamify it. Gamification is orthogonal to whether the tool actually works and positively correlated with whether the user actually uses it.
Speaking of translation with LLMs I've been looking for a solution to quickly open a bi-directional translation context without having to prompt ChatGPT or any other LLM every time. iOS lets you set the action button to use the default translation app quickly, but the translation it provides is vastly inferior to LLMs.
Even some basic app that can pre-load the prompt doesn't seem to exist?
Cool stuff! Probably one of the less popular languages, but I noticed that the transcription with Russian is often quite poor.
Part of me loves this—no judgement, endless convenience, cheap. But another part mourns, sensing it strips away the grit, the stumbles, the soul of language learning. The kind that only comes from fumbling through conversations with another human.
When I was learning Spanish, I used italki extensively and found having a live Columbian tutor invaluable and very affordable for most Westerners. It would genuinely make me sad if those excellent tutors start losing work to this kind of AI.
I tried the Japanese track. I'm a total beginner and the first lesson wasn't helpful at all. The AI asked about maybe mixing up Japanese<>English, but it didn't actually follow through. It either spoke fully in Japanese or fully in English. Maybe this is a standard practice for language lessons? I remember going to the first day of French class in a community college, and the teacher only spoke French, which was extremely overwhelming. Perhaps it's the standard way of teaching? Even if it is, I'm not sure if it works when compressed down to the shorter times I see myself opening the app.
I tried the Web Version. Started, then tried to create an account, but it kept looping, informing me that my email address does not exist in your system. Well, the “Create New Account” got kicked off and gets me in a loop of “Do not Exist”. I just went through the whole process again, and I'm back to the beginning.
I’m going to assume this works better on the App.
Unclear what issue you hit, we'll look into it. Thanks for sharing.
Is all this capital, energy and opportunity cost really worth displacing tutors who are already pretty cheap and demonstrably effective? I put AI language apps somewhere near fad diets, in that they appeal to the convenience mindset.
Do you store conversations? And what's the general privacy philosophy behind the app?
We store the messages, but not the audio. We also store session summaries and a "user facts" summary that gets regenerated after every session, based on all session summaries, everything in our AWS DB.
You can delete your account at any time to fully wipe all your data, but there is no way to delete sessions ATM.
I've been waiting for someone to build this! Trying it out now
Ok, so feedback from me.
It asks me for my level; I'm half way through this audiobook (https://www.amazon.co.uk/Next-Steps-Spanish-Paul-Noble/dp/B0...), and have listened to the book before it a number of times, so I'd say I'm between beginner and intermediate. I think you could do better than a "what level are you, pick from 3 options" and throw you straight into a chat - ask some basic Spanish questions, and then try and figure out where the user is from there.
Next I chose Blanca from Barcelona, and she said an awful lot of words and I understood very little of them, so I think I'm not ready. Half the grammar lessons have both a Spanish and English explanation, and half don't.
I'll keep the feedback coming, but I'm on a train now and the questionable internet is not good enough for an actual conversation.
(not at all relevant but I work for Devyce, from the YC S22 batch!)
Why not use the Gemini flash voice-api directly instead? Cost?
I ask because from the demo, the tutor's voice seems mechanical.
I've played with the gemini voice api and it's quite impressive for conversation with low latency, I'd say perfect for your use case. It even switches languages if I say "Okay, let's talk in $foo language".
The vocabulary tooling looks neat and well thought out.
Multiple reasons (which also apply to openAIs realtime API):
- it's less intelligent than the non voice apis
- intelligence degrades even further with lots of context
- more expensive
- latency is not a free lunch, it comes at the cost of more interruptions from the tutor, which is a really bad UX. We prefer to interrupt less and have higher latency
Also, we prefer the eleven labs voices, but there is definitely varying quality. I'm guessing later this year or next, the voice to voice models will become good enough, and we will switch over.
Which spaced repetition algorithm are you using? I recently learned that there is a much improved one that has been adopted by Anki. (https://domenic.me/fsrs/) Have you adopted that as well?
Jarrett Ye, the creator of FSRS, is a big fan of Math Academy. He records some of his sessions and posts them to YouTube.
Sorry but the approach is too naive and the tech isnt there yet.
You can't make up a couple of conversation topics and expect the LLMs to do the rest by just switching languages. People approach the same topics completely different in different languages. The app looks like someone picked a couple of topics and the rest is "just" ChatGPT advanced voice mode.
And the worst thing is that the LLMs in TTS do not sound native and cannot teach you pronounciation and learning to listen and understand (which is the whole point in having spoken conversation).
And the other way around, the STT will not notice pronounciation mistakes made by the student - so the app cannot tell you: oh, its pronounced like this.
This actually looks pretty neat. How have you been able to achieve such broad language support so quickly?
How widely have you tested your supported languages on native-speakers and learners?
The STT and LLM support many languages out of the box. For TTS we use multiple providers based on their strengths and weaknesses (for example minimax is great for Chinese)
We've done a lot of testing on Spanish, English, Italian, Japanese, and French, but much less on the others and none at all for some of the niche ones.
The language support is based on the intersection of the languages that have low word errors rates in the transcribers, as well as officially supported by LLM/TTS (like gpt4.1, eleven labs etc).
We've seen the models' quality improve consistently over the last 6 months, in all languages we tested, and now the error rates are getting really low.
Right - I think it would be appreciated by your users if you at the very least made it clear from the outset how well different languages are supported and what degree of testing you have done.
Certainly if your product were to mis-teach me important details, and I were to then find out that you had spent less time testing than I had spent learning, I would be quite angry.
Awesome, I was going back and forth with LLMs trying to keep a conversation up. You guys managed to channel those process, I think I will love this app!
I'll try it, but that seems pricy compared to a Duolingo subscription. And while I understand that they are different, will your average lead know that?
how're you handling latency on turn overlaps : buffered stream with early intent cutoff or full duplex with partial decoding?
We transcribe after 400ms of silence in 200ms chunks. 3 voice chunks (VAD) automatically interrupts, unless it's a back channel like "yeah" or "right" or something like that.
Whisper can transcribe in <100ms.
We then wait for the turn detection model, LLM, and tts to trigger a streamed response back to eh client.
This s super interesting! i have been wanting to learn other languages, but it i have been unsatisfied with most mainstream solutions. From what i have seen and for the price, i could see myself giving this a shot!