morleyjoe wrote:Glad to see it's all running fine now. To those who are complaining or are pissed off, would you have preferred to find CC did not do backups at all? Having had to replace my share of data on crashed or dead computers, I think it is amazing to see that they were able to get this backup in place and running so quickly. It could have been far worse. Congrats to the team for their hard work is in order.
I wouldn't go so far as to say that it should even be in the realm of possibility that CC didn't have [i]any[/i] backup, so I'm not going to give the admin props for quite that much. But I will second you on the direction if not the magnitude of your sentiment. I think that a 24 hour rollback for a situation that hasn't happened in 3 or 4 years is pretty impressive. I'm surprised and impressed that such a thing was so well prepared for (although I don't know if it was just luck that the last backup was only 24 hours prior).
I can't believe all the griping in this thread. The single best post so far has been this one:
drunkmonkey wrote:The random outcome of my rolls was lost at a random point, and the new random results are different! It's an outrage!
But that doesn't stop people from having atrociously bad ideas:
CHECK-M8 wrote:All games in progress need to be deleted. That is the only fair way to do it.
Wow. How about thinking next time before you post, okay? Can you imagine the outrage if the games were completely deleted? TOs and their clan counterparts would probably start finding and stabbing people. Not to mention the thousands of users who would lose entire games rather than just a turn or so. Unbelievable that you would actually suggest this.
TheProwler, I was very interested in your post though:
[spoiler]
[/spoiler]
TheProwler wrote:
bigWham wrote:In the late evening of Oct 3 (CC Time) one of our core system data tables suffered data loss and could not be recovered.
I find this interesting...
Surely you have (lots of) storage redundancy...you should be able to recover from hardware failure [i]without[/i] reverting to a backup.
Was it bad code? Did you implement a change that wasn't properly tested? Is your system documentation lacking and your developer(s) getting overwhelmed?
I'm just curious. Downtime is something that might happen when a disaster occurs. But having to go to a backup? Shit, somebody fucked up badly.
bigWham wrote:The only efficient solution was to roll back our entire database to the most recent backup, which happened to be approximately 24 hours before.
I think the word that is screaming at me in that sentence is "efficient".
Because I've designed a number of systems with 100+ tables...and if one of the "core" tables somehow "suffered data loss", I would expect to be able to recover the vital information from those tables based on their child and parent tables' data, and other related tables. Whatever table was lost, you should be able to rebuild it with data from other tables.
I know there might be some information loss like exact time of turns, but that wouldn't be a big deal. You could look at the physical order of the rows and estimate the time of turns. Obviously I have to speak in general terms because I don't know shit about your design or what table was lost. But you can go to a backup for everything up to the last backup, and then "fix" the data for the time since the last backup.
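To make that concrete, here's a rough sketch of the kind of rebuild I mean. Everything in it is made up for illustration: a surviving child table called `turns`, a lost parent table called `games`, the column names, and the two helper functions. It says nothing about how CC's tables actually look. It just shows deriving parent rows from child rows, then spreading estimated timestamps over the rows that lost theirs, using row order.

```python
import sqlite3


def rebuild_games(conn):
    """Recreate a lost parent table from the child rows that still reference it.

    Assumes a hypothetical surviving table turns(id, game_id, player_id, taken_at).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games ("
        "id INTEGER PRIMARY KEY, turn_count INTEGER, last_player INTEGER)"
    )
    # One row per game: how many turns it has and who moved most recently.
    conn.execute(
        """
        INSERT OR REPLACE INTO games (id, turn_count, last_player)
        SELECT t.game_id,
               COUNT(*),
               (SELECT t2.player_id FROM turns t2
                 WHERE t2.game_id = t.game_id
                 ORDER BY t2.id DESC LIMIT 1)
        FROM turns t
        GROUP BY t.game_id
        """
    )
    conn.commit()


def estimate_missing_times(conn, window_start, window_end):
    """Fill NULL turn times by spacing them evenly, in row order, across a window.

    The times are estimates only; window_start and window_end are epoch seconds
    (for example, the last backup time and the time of the failure).
    """
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM turns WHERE taken_at IS NULL ORDER BY id")]
    step = (window_end - window_start) / (len(ids) + 1)
    for n, turn_id in enumerate(ids, start=1):
        conn.execute(
            "UPDATE turns SET taken_at = ? WHERE id = ?",
            (window_start + n * step, turn_id),
        )
    conn.commit()
```

Whether this is practical depends entirely on which table was lost and how much of it the other tables actually duplicate.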
I guess without going on and on, I think you chose the word "efficient" because you know that there was a better solution with respect to recovering all the turns, but you were either too fuckin' lazy to do the work to take the site down and fix the problem properly, or because you don't understand the data well enough to fix the problem in an acceptable amount of time.
All these pats on the back that people are giving you shouldn't fool you; reverting to a backup is called "Failing".
I have been very interested to read the posts by folks that work in similar industries and their takes on the matter. You seem to be somewhat in the minority here, chalking it up to a complete failure rather than something that can be learned from and improved as we go forward. Nonetheless, I appreciated the informative post from that viewpoint. Makes me slightly reconsider my kudos to the admin. Although, I still think I come down generally supportive and impressed by their handling of this.
Finally, I think the biggest losers here are the forum posters. A lot of those guys are running tourneys, may have taken games out of their Watch This Game screen, are running clan wars, are posting long, informative forum posts, etc. That type of stuff is more of a bitch to redo than just having to take a few turns over again. I hope if the admin have to make a choice that they will put emphasis on keeping a live backup of the forum in the future. (errr ... the "biggest losers" are maybe the people who are groaning about the loss of "their" dice that they "should have" got a second time, but I meant biggest losers in a different sense.)







