Hi everyone! New to WeWeb here. Is it possible to stream RPC directly to WeWeb? I’m trying to build an app with Google Speech-to-Text.
Any thoughts?
Hi @pablo
I think we’d need to develop a plugin for this. Could you tell us a bit more about your use case?
I also need that feature. I run a clinic, and I want patients to describe their symptoms by voice and have that turned into text for the intake form.
You would probably need a server for this, e.g. spinning up your own backend. That’s probably why it hasn’t been implemented. But I might be wrong.
I am using Xano, which can talk to other APIs, e.g. OpenAI Whisper. The difficult part is how WeWeb can capture audio input and generate a file to save to the backend, and whether that is fast enough.
The best scenario would be to embed the speech-to-text function provided by the service, so that the user gets feedback directly in the front end. For English, I could maybe try AssemblyAI, but I also need Cantonese, which is not available in AssemblyAI yet…
Capturing audio to a file is straightforward: use MediaRecorder in non-streaming mode. You hook the stop event to take the recorded audio and send it as a file to your backend, which then sends it to your transcription service. It can be done, it’s a bit of work, and you should expect some delay (measured in seconds).
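To make that concrete, here is a rough sketch of the non-streaming capture in TypeScript. It is not a WeWeb or Xano API: the endpoint `/api/transcribe`, the form field name `audio`, the `{ text }` response shape, and the function names are all assumptions for illustration. Only `getUserMedia`, `MediaRecorder`, `FormData` and `fetch` are standard browser APIs.

```typescript
// Hypothetical upload endpoint; replace with your own backend (e.g. a Xano endpoint).
const UPLOAD_URL = "/api/transcribe";

// Pure helper: derive a file extension from a MIME type such as "audio/webm;codecs=opus".
function extensionFor(mimeType: string): string {
  const subtype = mimeType.split(";")[0].split("/")[1] ?? "";
  return subtype || "bin";
}

// Record from the microphone for durationMs, then resolve with one Blob.
async function recordOnce(durationMs: number): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  recorder.ondataavailable = (e) => {
    if (e.data.size > 0) chunks.push(e.data);
  };
  const stopped = new Promise<void>((resolve) => {
    recorder.onstop = () => resolve();
  });

  recorder.start(); // no timeslice: the data arrives when recording stops
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;

  stream.getTracks().forEach((track) => track.stop()); // release the microphone
  return new Blob(chunks, { type: recorder.mimeType });
}

// Send the recording as a file upload; assumes the backend answers with { text: string }.
async function uploadRecording(blob: Blob): Promise<string> {
  const form = new FormData();
  form.append("audio", blob, `recording.${extensionFor(blob.type)}`);
  const res = await fetch(UPLOAD_URL, { method: "POST", body: form });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  const body = await res.json();
  return body.text;
}
```

In a real form you would probably stop the recorder from a button click instead of a timer, then call `uploadRecording(await recordOnce(5000))` or the equivalent and put the returned text into the intake field.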
Realtime audio transcription is a bit more involved. You need to either use MediaRecorder directly or use a client-side package that wraps audio capture on the front end in streaming mode. Then, as the JavaScript gets dataavailable events, your handler should upload that chunk of audio (usually about a second at a time) to your server, which can then send it to your transcription service. The transcription service sends results back to your backend, which can push them (probably using websockets/realtime) to the front end.
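A minimal sketch of the front-end half of that flow, assuming you run a WebSocket relay of your own that forwards audio to the transcription service and sends transcript text back as plain messages. The URL `wss://example.com/transcribe`, the message format, and the names `forwardChunk` and `startStreaming` are made up for the example.

```typescript
// Hypothetical relay endpoint: a backend you run that forwards audio to the transcription service.
const RELAY_URL = "wss://example.com/transcribe";

// The minimal part of a socket this sketch needs, which keeps the handler easy to test.
interface ChunkSink {
  readyState: number;
  send(data: Blob): void;
}

const SOCKET_OPEN = 1; // same value as WebSocket.OPEN

// Forward one recorded chunk if it is non-empty and the socket is open.
function forwardChunk(sink: ChunkSink, chunk: Blob): boolean {
  if (chunk.size === 0 || sink.readyState !== SOCKET_OPEN) return false;
  sink.send(chunk);
  return true;
}

// Start capturing and streaming; returns a function that stops everything.
async function startStreaming(
  onTranscript: (text: string) => void
): Promise<() => void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const socket = new WebSocket(RELAY_URL);
  const recorder = new MediaRecorder(stream);

  // Assumption: the relay pushes transcript text back as plain text messages.
  socket.onmessage = (e) => onTranscript(String(e.data));
  recorder.ondataavailable = (e) => {
    forwardChunk(socket, e.data);
  };
  // Close the socket only after the recorder has delivered its final chunk.
  recorder.onstop = () => socket.close();
  // A timeslice of 1000 ms yields a dataavailable event about once per second.
  socket.onopen = () => recorder.start(1000);

  return () => {
    if (recorder.state !== "inactive") recorder.stop();
    else socket.close();
    stream.getTracks().forEach((track) => track.stop()); // release the microphone
  };
}
```

One caveat with this approach: chunks after the first are usually not standalone audio files, since the container header is only in the first chunk, so the relay would likely need to treat them as one continuous stream instead of transcribing each chunk separately.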
The usual approach to this kind of streaming setup is to have a persistent backend process that receives these chunks as a stream, which is very much not how Xano works with either HTTP or realtime. So you’ll probably need to finagle that a bit and/or replace Xano for the purpose of this transcription relay.