
Otakulypse anyone?
So some of you might have noticed a bit of site interruption over the past few days (read the past few days). This time I did, in fact, break the site. This is as opposed to all the other times when people accuse me of breaking the site, when in fact it’s something completely unrelated to me.
So, what happened? You might be wondering that. Well, I’ll tell you. It all began Friday morning. I was at work, doing my work thing. I had just finished patching my work machine for the DST fix. I thought to myself, “self, you should patch the AP server so that the forums will have the correct times”. So away to ssh I went. I ran a yum update, but seeing as I was distracted, being, as previously mentioned, at work, I wasn’t paying too much attention. So I ended up doing a full update of every package on the system. It seemed to work fine, so I went on with my day.
Fast forward three hours. I see Deathgod reporting that the emails seem to not be working on the forums. I go to look, and sure enough. I ssh back in and can’t even ping Google.com. Slight panic sets in. I do some snooping and find out that the update broke name resolution. I manage to get that fixed, and then I’m getting an error with the TLS encryption that Gmail requires. Apparently the update also broke openssl. Slightly more panic sets in. I go to try and do a yum remove for openssl so I can install a previous version, and get an error. Why? Because the yum update broke yum. Now full-blown panic sets in. I start trying multiple things to see if I can get it working. Nothing works. It’s 5:40pm and I’m supposed to be at dinner at 6pm and then “300” afterwards. So I head out. Chatting with Chigo and Batou at dinner, having a large headache because I had broken the emails as well as other things, I realize that because emails were broken, so was registration. Eek! Of course I could have just changed validation to none or admin, but I didn’t think of it. So I call Rangi asking her to change that if she has a chance. She’s not home so I leave a message.
300 kicked ass, and helped me to relax a bit. That and sushi. So I get home about 11:30pm and change the validation. Then I decide to get up in the morning and see what I can find.
I get up in the morning, and give the dog a haircut, have lunch with a friend and change the oil on my motorcycle (Suzuki GSX-R1000). Coming off those successful things I’m feeling confident. So I start working again at like 3pm. I come to realize that there is really no fixing what I broke. I’ve got most of the files backed up locally, and I can get database backups, and the last image (read OS) backup I have is back in August. (Mental note: do image backups more often). So I take backups of all eight databases and then disable the forums. Then I pull down all the files and restore the image.
So now we have a functioning site again, but it thinks that it was last used in August, so I disable the forum again, and start copying all the files back up. Then I start restoring the databases. It works fine for the first couple of small databases, but fails on the important ones (read main site and forums). I mess with that for an hour or two and get more and more frustrated. Rukia asks how it’s going and I say “I can’t get this stupid web-based database interface to work” and then stop and think. Of course! If the web-based interface sucks, just copy the SQL scripts up to the site and do it on the command line! So I do that, and find out it’s erroring because the MySQL server buffer size is set to 1MB. This is very stupid. So I do some research to figure out how to set that value, get that set, and then get the databases restored.
So now the files are pretty much back up, and I’ve got this site working, as well as the forums, wiki, galleries, and the rangiku files, but the main site is failing. So I leave that for a minute and try to confirm that the forum is working. I do a post, no problem. I do a 2nd post, and get an error. Wha!?! Turns out my backups, which were supposed to be complete, left the auto-increment attributes out, so I have to go through every table in every database that uses an incrementing identifier and add it in manually. Then stuff starts working fine.
So I’ve got everything working now except for the main site. I can’t figure out why it’s not working, so finally I try running the index.php file from the command line in ssh, and figure out that there was one shortcut file missing from one plugin. I create the link and it starts working, as if by magic. It’s now about 11pm. So the last thing I need to do is fix the template for this site. So about another hour and it’s all back running.
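For anyone facing the same missing auto-increment problem, here is a rough sketch of how the "go through every table" step could be narrowed down before touching anything by hand. It is not what was actually run during this restore. It assumes the backup is a plain-text SQL dump containing `CREATE TABLE` statements, and the function name `tables_missing_auto_increment` is made up for illustration. It only lists tables whose definition has no column-level `AUTO_INCREMENT`; some tables legitimately have none, so treat the output as a review list rather than a list of broken tables.

```python
import re

# One CREATE TABLE statement: capture the table name and everything up to
# the terminating semicolon. This is a naive split and assumes no semicolons
# appear inside column defaults or comments.
CREATE_TABLE = re.compile(
    r"CREATE TABLE\s+(?:IF NOT EXISTS\s+)?`?(\w+)`?(.*?);",
    re.IGNORECASE | re.DOTALL,
)

# A column-level AUTO_INCREMENT attribute. The table option
# "AUTO_INCREMENT=123" (the next counter value) is deliberately excluded.
COLUMN_AUTO_INCREMENT = re.compile(r"\bAUTO_INCREMENT\b(?!\s*=)", re.IGNORECASE)


def tables_missing_auto_increment(dump_sql):
    """Return names of tables defined without any AUTO_INCREMENT column."""
    missing = []
    for name, definition in CREATE_TABLE.findall(dump_sql):
        if not COLUMN_AUTO_INCREMENT.search(definition):
            missing.append(name)
    return missing


# Hypothetical two-table dump: "posts" kept its attribute, "users" lost it.
SAMPLE_DUMP = """
CREATE TABLE `posts` (
  `id` int(11) NOT NULL auto_increment,
  `body` text,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=42;

CREATE TABLE `users` (
  `id` int(11) NOT NULL,
  `name` varchar(64),
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=7;
"""

print(tables_missing_auto_increment(SAMPLE_DUMP))
```

Running a check like this against each dump before the restore would at least show which tables need a closer look, instead of finding out on the second forum post.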
Then I just have to get the remaining 1GB of files copied back up, most of which are the AMVs from Rangi’s site and all the files/logs from our Datte bot for Chigo. So I leave that running overnight.
So that’s what happened. That’s also why theOtaku.com show and our show didn’t come out until Sunday, because I wasn’t going to work on those until the site was fixed.
So it was an issue that wouldn’t have happened except that I wasn’t paying enough attention while doing an update. And what have I learned? Leave stuff alone if it works, and make sure to update very carefully. And most important of all: before doing a major update as I did, take an image backup first. Had I done that I could have just restored it and lost at the most three hours of forum activity. Well, lesson learned. Sorry for the downtime and hopefully it won’t happen again.
Upgrading Fun
I just finished upgrading the main sites. Boy was that fun. I had been working on updating the Anime Pulse WordPress theme for WP 2.1. I was almost done when they released a new version, so I had to redo a bunch of work. Normally updating themes is simple, but since I’m doing funky stuff with the feed and the sidebar with the JavaScript boxes and the targeting of the iframe, it means I have to write my own versions of WordPress functions. The functions changed quite a bit between versions 2.0 and 2.1, so I had a lot of work to do. Then I did the upgrade, and nothing worked. I bounced back and forth between the default theme and our theme as I tried to fix things, and got the sidebar working, but not the frame. So I gave up and went to bed, leaving it on the default theme.
That would have been fine except for one thing. I forgot to make sure that the feed still worked. Well, guess what? It didn’t. I got up at 6:30 and saw Deathgod’s post on the forums saying that the feed wasn’t validating and I checked it out. Turns out there was no feed, which would explain a lack of validation. So I spent an hour rewriting the feed file to work correctly and managed to get that done before I went to work. Then on my way back from Japanese I had a light-bulb. I’m not talking some dinky 100 watt bulb here, I’m talking Sports Arena, Alien Ship floodlight. The main page loop had changed from a “foreach” loop to a “while” loop. But in my tiredness last night I had only changed the loop declaration, and I forgot to change the “endforeach” to an “endwhile”. I got back to work and made that quick change and voilà, we had a site again. I did a quick upgrade of podpress, and some tweaking and we were back in business. It was much faster upgrading this site, as the theme was done, I just had to tweak it for my site. It makes me wish I was more diligent in setting up a test environment to verify upgrades before rolling them out live. I know that if AP gets bigger then I will have to do that to avoid downtime, but at least this time I had the default theme to fall back on, otherwise I would have had to roll back the upgrade, which is always a good time.
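That kind of mismatch, a loop opened one way and closed another, is easy to miss at midnight and fairly easy to catch mechanically. Below is a minimal sketch of such a check, assuming the theme uses PHP's colon-style loops (`foreach (...):`, `for (...):`, `while (...):`). The function name `find_loop_mismatches` and the sample template are made up for illustration and are not the actual Anime Pulse theme. It is a plain regex scan, not a PHP parser, so it can be fooled by comments or strings.

```python
import re

# Closing keyword that each colon-style loop opener expects in PHP.
CLOSER_FOR = {"foreach": "endforeach", "for": "endfor", "while": "endwhile"}

# Group 1: a closing keyword. Group 2: an opener written as "keyword (...) :".
# Brace-style loops have no trailing colon and are ignored.
TOKEN = re.compile(
    r"\b(endforeach|endfor|endwhile)\b"
    r"|\b(foreach|for|while)\b\s*\(.*?\)\s*:",
    re.IGNORECASE,
)


def find_loop_mismatches(template):
    """Return human-readable problems for unbalanced colon-style loops."""
    problems = []
    stack = []  # (opener keyword, line number) for loops still open
    for match in TOKEN.finditer(template):
        line = template.count("\n", 0, match.start()) + 1
        closer, opener = match.group(1), match.group(2)
        if opener:
            stack.append((opener.lower(), line))
        elif not stack:
            problems.append(f"line {line}: {closer} has no opening loop")
        else:
            opened, opened_line = stack.pop()
            expected = CLOSER_FOR[opened]
            if closer.lower() != expected:
                problems.append(
                    f"line {line}: found {closer}, but the {opened} "
                    f"opened on line {opened_line} needs {expected}"
                )
    for opened, opened_line in stack:
        problems.append(f"line {opened_line}: {opened} is never closed")
    return problems


# Hypothetical template with the declaration updated but not the closer.
BROKEN_TEMPLATE = """<?php while (have_posts()) : the_post(); ?>
  <h2><?php the_title(); ?></h2>
<?php endforeach; ?>
"""

for problem in find_loop_mismatches(BROKEN_TEMPLATE):
    print(problem)
```

Something like this could run over the theme files before switching themes on the live site, which is a much cheaper stand-in for the test environment wished for above.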
Anyways, I thought I’d better check back in as it’d been a while since my last post. I’m working on an advertising deal through Kiptronic that should come through in the next couple months. Nothing huge, but it could be lucrative for us. A host’s work is never done.