Replay Tech Issues

Hi, everyone – I wanted to give you an update on our recent tech issues.

First, I am very sorry that you all had to deal with more problems that kept you from enjoying your games. I am extremely frustrated that you’ve continued to have a subpar experience. It’s not ok, and your concerns are being regularly shared with the team.

Second, I want to be transparent about what has been occurring. We found that there were two points of failure:

  • A tool we’ve been using for testing was not working properly.
  • A third-party library we use was updated, and the update caused issues.

How we’re responding:

  • We fixed the problem with the tool.
  • We rolled back to an older version of the library, and we’re waiting for the provider to release a fixed one.
  • We’re ensuring our QA processes are more stringent.
  • We’re vigilantly on the hunt for anything else that may have contributed to this.

To those of you who have submitted emails to support, thank you. When our metrics show that things look fine but we keep getting messages, we can point to those messages to dig deeper.

UPDATE: This thread will be updated as we have more news on these issues. We’ll be removing outdated information, and the most relevant info will be at the end of the thread.

We saw three more periods of lag/freezing yesterday (around 5:30pm, 7:15pm, and 2:30am EDT). Tech is continuing their work on resolving these bursts of performance issues and we will have an update as soon as we have more news to share.

Hello everyone, my name is Rodrigo, and I’m the Technical Manager at Replay.

It’s been a tough week. We have worked diligently over this entire time, checking our systems and connections with external providers to determine the source of these issues.

As of right now, we believe the problem lies in a service hosted on Google Cloud, and we are in close contact with them. They need a bit more time to review and debug issues in the messaging system used to connect the game with our servers (hosted and managed by them).

I can only apologize and ask you to bear with us while we keep working on this over the weekend.

Hi Everyone!

I just want to let you know that our team is still working on the performance issues on our site. You’ve heard this a lot, but please accept our apology for the game interruptions. We are refunding affected tournaments. The refund usually shows up shortly, but please allow up to 24 hours for it to appear in your account.

We hope to provide an update as soon as one is available.

Hello, Rodrigo, Replay’s Technical Manager, here. It’s been another long week for us as we work closely with our hosting provider to find a solution to the outages.

Unfortunately, we don’t have a fix yet, but we are working with them day and night to fix the faulty messaging system (see my previous post for more information on this). We appreciate your patience as we work to resolve this issue.

And again, I can only apologize for the inconvenience caused to you. There is a light at the end of the tunnel, but we still need to do more work to get there. We will work over the weekend to fix this as quickly as possible.

I hope to provide better news as soon as possible. We are on it.

Hi everyone - I’m here to give just a very brief update. Performance issues returned over the weekend, and we had another incident yesterday morning. Refunds have been issued for all affected periods, and we’ll continue to handle that quickly if technical problems continue to occur. I’m so sorry — we’ve got all hands on this trying to figure out workarounds. Someone from our tech team will share an update in this thread shortly.

Hello all, I’m Vlad, the Technical Lead on Rodrigo’s team. I’m here to share a technical update.

The Google product team was able to resolve the Pub/Sub issue on their end, and we no longer see that particular issue. Even though things are looking better for the time being, we are working to switch to a more robust Pub/Sub implementation: the code change is almost complete, and we are starting to test it internally before deploying it to production. Once we finish this move, we will have a more responsive and more stable platform.

Aside from the ongoing Pub/Sub issues, we have been experiencing general networking issues between various components of our architecture. These triggered the outages we experienced this weekend and on Monday morning. We are working with Google to resolve them and are putting additional measures in place to minimize their impact when they do happen.
