Update on the recent server failure issue

Between Saturday the 21st of June 2025, and Monday the 23rd of June 2025, we experienced a failure on our plugins server.

As a result, data fetching for applications using the Airtable plugin, SQL plugin, and REST API plugin with CORS enabled in production was disrupted. This also impacted access to the editor for applications relying on these plugins.

The issue has now been resolved, and the server is functioning normally again.

The issue went undetected by our monitoring tools because the server was going down and restarting at a high frequency, preventing alerts from being triggered.
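To illustrate the kind of check that can catch a rapid crash-and-restart pattern, here is a minimal sketch that counts restarts inside a sliding time window and flags a crash loop once a threshold is reached. It is not WeWeb's actual monitoring; the class name `RestartRateMonitor` and the parameters `max_restarts` and `window_seconds` are hypothetical, and the thresholds are arbitrary.

```python
from collections import deque


class RestartRateMonitor:
    """Flags a crash loop when too many restarts fall inside a sliding window.

    Hypothetical helper for illustration only; names and thresholds are assumptions.
    """

    def __init__(self, max_restarts: int = 3, window_seconds: float = 300.0):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._restarts = deque()  # timestamps (in seconds) of recent restarts

    def record_restart(self, timestamp: float) -> bool:
        """Record one restart; return True if the alert threshold is reached."""
        self._restarts.append(timestamp)
        # Drop restarts that are older than the sliding window.
        while self._restarts and timestamp - self._restarts[0] > self.window_seconds:
            self._restarts.popleft()
        return len(self._restarts) >= self.max_restarts


monitor = RestartRateMonitor(max_restarts=3, window_seconds=300.0)
for restart_time in (0.0, 60.0, 120.0):
    crash_loop = monitor.record_restart(restart_time)
```

With these values, the third restart within five minutes sets `crash_loop` to `True`, whereas restarts spaced more than five minutes apart never reach the threshold.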

The team began investigating the issue on Sunday evening and found the root cause and a solution approximately 10 hours later.

This resolution time was unacceptably long, especially given that it affected production, something that had never happened in our seven-year history. We deeply apologize to all users impacted by this incident.

Regarding the root cause of the problem: there hadn’t been any changes on this service on our end since April 2nd 2025. It appears the problem stemmed from a change made by Supabase, which introduced a new type of error that was thrown en masse to our servers, ultimately causing them to fail.

Here is what we are planning to do so that this doesn’t happen again:

  1. We are updating our server monitoring approach to ensure that instabilities like this can be detected and reported immediately.
  2. We have put in place a status page so that our users can check the status of our servers at any time.
  3. The plugins server has proven to be a single point of failure for a significant portion of our users. As a result, we’re planning a major revamp of our backend architecture to provide each user with dedicated computing units. This will ensure that one user’s issue won’t impact others.

Feel free to reply here in the thread or to reach out to me at raphael@weweb.io if you need more information or would like to discuss it with me personally.

We deeply apologize for this issue. Please rest assured that we are taking all necessary measures to ensure it does not happen again.

20 Likes

We need status notifications, please, by subscription via email or Slack, so that we can monitor this ourselves.

6 Likes

I also had a bunch of issues with the Xano auth plugin over the last few days. Is that related or not?

Thanks for the detailed breakdown of what happened and the steps you are taking to resolve it. I for one am happy with the steps you’re all taking to ensure this doesn’t happen again (the server status pages were a great touch), and will continue to use WeWeb for app development!

2 Likes

what were the issues?

Good idea, I’ll see if we can do it

1 Like

thanks for your trust @Dan_OOT, much appreciated.

2 Likes

It’s excellent that WeWeb is taking this seriously. One issue that emerged this weekend, and for which I didn’t see a solution, is the apparent lack of a WeWeb team handling critical matters on weekends. I know it’s not easy, and that having someone on a weekend shift costs more. But we can’t just be stranded like that for almost 3 days.

Hey @tomerer2000 thank you for your feedback.

The root cause of this weekend’s delayed response wasn’t primarily a staffing issue, but rather a critical failure in our detection systems. Our monitoring failed to trigger the proper alerts that would have notified our on-call team about the severity of the situation.

As we have rapidly scaled, we have been receiving an increasing volume of support tickets and issue reports. However, without proper correlation through our detection systems, distinguishing critical issues from isolated problems has proven challenging.

This incident did expose gaps in our monitoring and escalation process which we are taking very seriously. We are now implementing the improvements mentioned.

While we can’t change what happened this weekend, we are committed to ensuring our detection and response systems match the reliability you all deserve.

3 Likes

I understand that this issue could have been fixed by better detection systems, but I also believe it could have been fixed, or at least work could have begun, if there had been a way to contact a member of staff. From what I hear from the people having this issue, they had no way of contacting a single person.

So while yes, in this case better detection systems would have helped, I think not having even one person available is not good enough. If an error slips through in an update and isn’t recognised before the weekend, a better server monitoring system isn’t going to help. Someone needs to be available.

6 Likes

I agree with this. This causes a lot of unnecessary issues.

1 Like

Users did reach out to us (community, support, e-mail, LinkedIn, etc.) and we saw the messages over the weekend. We added trackers to the projects with apparent failures to monitor them and figure out where the problem came from. We understood where it came from on Sunday night, tested a fix, and the fix went live on Monday morning. We could have been faster if the monitoring system had pointed out the error straight away. What took time was:

  1. Realizing there was a problem affecting many users (not an isolated issue).
  2. Having to add trackers, monitor the error and figure out where it came from (it was not obvious since nothing changed on our side).

Nevertheless you are right that it could have been resolved faster, but to be clear, for this kind of error, that will only be possible through a better monitoring system.

2 Likes

OK, most of that makes sense. The part I don’t understand is why multiple users here in the forums mentioned the issue and said that they had used some of the other communication channels you mentioned, yet received no feedback that anything was being done. They were consistently reporting the problem and saying that they had heard nothing back for over 2 days.

Unfortunately, we thought these were isolated issues on Saturday. We realized they might not be and added the trackers on Sunday morning. On Sunday evening we had enough data to figure out the problem, and then it took us 10 hours to solve it for good.
We should have been much better at this, and we will be.

my client’s website is suddenly offline - is there another server issue right now?

not that we know of, sending you a DM

From what you are saying, it seems that if we have “isolated issues” in our projects that bring our production apps down, you will offer no communication for up to 48 hours, as this is what users reported.

2 Likes

The problem is not so much that there was a problem to solve. It was the total radio silence in the community from WeWeb’s side.

If you saw a problem on Saturday, the community should have been told that you were aware of the problem and working on it.

3 Likes

To be honest, there’s really only one change that would actually make us comfortable that you’re serious, and I haven’t seen it so far: an SLA with dedicated response times and resolution times. P0 (critical) = response within 1 hour and resolution within 6 hours. P1 = response within 12 hours and resolution within 24 hours, etc. Normal SaaS things, with, for example, a full monthly refund for all affected customers if you don’t comply. Simple. Then I would trust it again.
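As a rough illustration of how targets like these could be encoded and checked, here is a minimal sketch. Only the P0 and P1 figures come from the proposal above; the names `SLA_TARGETS` and `sla_breaches` and the example timestamps are hypothetical and do not describe any existing WeWeb policy or the actual incident timeline.

```python
from datetime import datetime, timedelta

# Targets taken from the proposal above; only P0 and P1 are spelled out there.
SLA_TARGETS = {
    "P0": {"response": timedelta(hours=1), "resolution": timedelta(hours=6)},
    "P1": {"response": timedelta(hours=12), "resolution": timedelta(hours=24)},
}


def sla_breaches(priority: str, reported_at: datetime,
                 responded_at: datetime, resolved_at: datetime) -> list[str]:
    """Return the SLA targets ("response", "resolution") that were missed."""
    targets = SLA_TARGETS[priority]
    breaches = []
    if responded_at - reported_at > targets["response"]:
        breaches.append("response")
    if resolved_at - reported_at > targets["resolution"]:
        breaches.append("resolution")
    return breaches


# Hypothetical P0 incident: first response after 2 hours, resolved after 5 hours.
reported = datetime(2025, 1, 1, 9, 0)
missed = sla_breaches(
    "P0",
    reported_at=reported,
    responded_at=reported + timedelta(hours=2),
    resolved_at=reported + timedelta(hours=5),
)
```

In this hypothetical case only the response target is missed, since the first reply came after the 1-hour limit while the fix landed inside the 6-hour limit.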

6 Likes

Morning @Raphael, I don’t know if it’s related or not, but the connection between Supabase and my WeWeb project is crashing every 30 minutes. I am working on some data collections, and when I fetch, I still get an error.

According to your status page, your connection with Supabase is OK and running at 67 ms. But even when I try to reconnect with the plugin, the loading page gets stuck forever and never connects to my project.

A hand here?

Thanks a lot!