00:21.06 | *** join/#ldstech TimRiker (~timr@bzflag/projectlead/TimRiker) |
00:21.06 | *** mode/#ldstech [+o TimRiker] by ChanServ |
00:21.32 | TimRiker | scgallafent, any update? should we try the windows solution? ie: rebooting the server? |
00:22.21 | TimRiker | hmm. still getting the out-of-service page sometimes, and load above 30. |
00:22.26 | TimRiker | going to try a reboot. |
00:42.44 | scgallafent | TimRiker, just got back. |
00:43.05 | TimRiker | rebooted, no effect. shutting down wiki again. what have you learned? |
00:43.15 | scgallafent | Looks like the reboot didn't solve it. |
00:43.25 | scgallafent | It looks like the wiki is the problem. |
00:43.42 | scgallafent | I found that about half the wiki execute time was in the parser, but wasn't able to pin down exactly where. |
00:43.56 | scgallafent | I'm going to tweak the wiki out of service redirect ... hold on. |
00:44.13 | TimRiker | it's back in place now... |
00:44.44 | scgallafent | How long has the wiki been OOS? |
00:45.31 | TimRiker | just moved it... |
00:45.49 | TimRiker | see comment above. 3 minutes? |
00:46.53 | scgallafent | OK. Load was still pretty high. |
00:47.09 | scgallafent | I'm setting the wiki so you and I can hit it but anyone else gets sent to out of service. |
00:48.59 | scgallafent | OK. If you're signed in, you should be able to hit https://tech.lds.org/wiki/ |
00:49.21 | scgallafent | Load is back down below 3. |
00:50.50 | scgallafent | I figure we're down to one of a handful of issues: |
00:51.00 | scgallafent | * Change in the VM environment slowing things down |
00:51.17 | scgallafent | * Some kind of server issue (disk space / database corruption / etc) |
00:51.19 | scgallafent | * Content |
00:51.40 | scgallafent | There were some edits earlier today that I'm going to roll back. |
00:55.19 | scgallafent | :( No better |
00:58.05 | scgallafent | runs back down the wiki rabbit hole |
01:01.26 | TimRiker | cpitt's text counting patches? |
01:04.29 | scgallafent | I tried disabling that earlier and didn't see any effect. That should only affect page updates, not reads. |
01:04.47 | scgallafent | Just added my timing code back to the wiki index page. |
01:05.43 | scgallafent | 34.69s to process the wiki index page. Woohoo! |
01:06.10 | scgallafent | Disabling extensions again. |
01:11.44 | scgallafent | TimRiker, vastool/vasd were at 90%+ a second ago. Any reason they would be working that hard? |
01:12.53 | TimRiker | I think that's normal for them, as in I've seen them do it before. |
01:17.45 | TimRiker | switching computers.... brb |
01:41.14 | *** join/#ldstech TimRiker (~TimRiker@bzflag/projectlead/TimRiker) |
01:41.14 | *** mode/#ldstech [+o TimRiker] by ChanServ |
02:48.08 | *** join/#ldstech scgallafent- (scgallafen@conference/ldstech/x-vhjaxdryhlpgrqma) |
02:48.08 | *** mode/#ldstech [+o scgallafent-] by ChanServ |
03:08.36 | *** join/#ldstech TimRiker (~timr@bzflag/projectlead/TimRiker) |
03:08.37 | *** mode/#ldstech [+o TimRiker] by ChanServ |
04:59.59 | *** join/#ldstech scgallafent (~scgallafe@c-67-160-61-229.hsd1.wa.comcast.net) |
04:59.59 | *** mode/#ldstech [+o scgallafent] by ChanServ |
05:04.07 | *** join/#ldstech TimRiker (~timr@bzflag/projectlead/TimRiker) |
05:04.08 | *** mode/#ldstech [+o TimRiker] by ChanServ |
05:05.10 | TimRiker | scgallafent, got a text from smootar saying the wiki is down. :/ |
05:05.32 | scgallafent | Yes. I've been trading texts with him. Wiki is up now, but slow. Side effect is that the forum is down. |
05:05.33 | TimRiker | tried copying the db, renaming the old one, putting in the new one, same result. |
05:05.51 | TimRiker | wiki is back down again. |
05:05.53 | scgallafent | I spent quite a bit of time profiling the code. It's just generally slow. |
05:05.59 | scgallafent | What does load look like? |
05:06.03 | TimRiker | I moved your change to LocalSettings and re-enabled it. |
05:06.35 | scgallafent | Going to try reconnecting to VPN. I may disappear for a minute. |
05:06.53 | scgallafent | Am I still here? |
05:06.57 | TimRiker | load was just over 30 and nothing was getting done. |
05:07.04 | TimRiker | yep you're still here. :) |
05:07.12 | scgallafent | I think we take the wiki down and leave it until someone can look at the VM. |
05:07.14 | TimRiker | smootar hopes to drop in later. |
05:07.36 | TimRiker | I got a voice mail from a tech. he says the vm looks fine. |
05:07.54 | scgallafent | Can't reach the server. Messing with VPN settings again. |
05:07.55 | TimRiker | heavy load, but normal. |
05:09.31 | *** join/#ldstech scgallafent- (scgallafen@conference/ldstech/x-ekscqtvyfenxiqor) |
05:09.32 | *** mode/#ldstech [+o scgallafent-] by ChanServ |
05:09.52 | scgallafent- | Now I think I've got everything connected. |
05:09.58 | scgallafent- | bemoans VPN again |
05:10.46 | scgallafent | I did some testing earlier on the wiki code. At one point I had a section that was pretty "clean" (basically branches and variable updates) that took 0.5s to complete. |
05:10.49 | scgallafent | Something isn't right. |
05:11.26 | scgallafent | Wiki is showing out of service because the code in LocalSettings isn't commented. |
05:11.33 | scgallafent | Do we want it disabled right now? |
05:15.43 | TimRiker | <PROTECTED> |
05:16.02 | TimRiker | scgallafent, yes. the rest of the site dies if we enable it. |
05:16.13 | TimRiker | I'd rather have the wiki down than the whole site. |
05:16.14 | scgallafent | OK. Wasn't sure if you had intentionally disabled it. |
05:16.19 | scgallafent | Agreed. |
05:16.41 | scgallafent | Forums are still showing server too busy. |
05:16.54 | scgallafent | I'm not sure what method vBulletin uses to determine "too busy." |
05:16.56 | scgallafent | looks |
05:17.02 | TimRiker | load at 6 |
05:17.20 | TimRiker | don't know what time period vbulletin looks at. |
05:18.00 | scgallafent | dlhace should be in at some point and we can have a party on the server |
05:18.40 | TimRiker | yea |
05:19.40 | scgallafent | I left the word count extension disabled on the wiki. Everything else is enabled. |
05:19.53 | scgallafent | I tried disabling different extensions with no visible improvement. |
05:21.00 | TimRiker | nods |
05:21.32 | scgallafent | Seeing anything in your database file? |
05:22.02 | scgallafent | The only thing with the database that raises my eyebrows is that there were some changes earlier this morning with Chinese work. |
05:22.18 | scgallafent | The time frame is pretty close to when you started getting errors. |
05:22.41 | TimRiker | nope. as the chinese edit was around the time things started, I'm trying a conversion from latin1 to utf8 |
05:22.58 | scgallafent | That's not a task for the faint of heart. |
05:23.14 | scgallafent | There haven't been many edits since then. Maybe we should try a backup from just before the edits. |
05:23.32 | TimRiker | hmm. that's a good idea.. |
05:23.46 | scgallafent | Give me a minute and I'll grab the content off the page. |
05:24.00 | TimRiker | I just made wikidb_orig as a copy of the bad one. |
05:24.43 | scgallafent | I've got the one Chinese page on screen. Go ahead and roll back the db. |
05:25.42 | TimRiker | rolling back to the 0446 backup |
05:26.08 | scgallafent | Looks like the first edit was at 04:59, so just about perfect. |
05:26.50 | scgallafent | wonders if we really need the Community Project Handbook page in Chinese |
05:36.08 | TimRiker | well, if it didn't break things, then sure. :) |
05:36.35 | scgallafent | The jury is still out on that. |
05:37.04 | scgallafent | Waiting for mysql restore is two steps up from agonizing. |
05:42.07 | TimRiker | hehe |
05:42.28 | TimRiker | fires up wizard101 to pass the time |
05:43.55 | scgallafent | Looks like it may be finished |
05:44.25 | scgallafent | pokes TimRiker to check |
05:46.18 | *** join/#ldstech TimRiker (~timr@bzflag/projectlead/TimRiker) |
05:46.18 | *** mode/#ldstech [+o TimRiker] by ChanServ |
05:46.34 | TimRiker | hmm. xchat crashed |
05:46.51 | scgallafent | Looks like the restore may be finished |
05:46.56 | TimRiker | "restore" finished. how are things? |
05:47.04 | scgallafent | Checking.... |
05:48.00 | TimRiker | still a lot more cpu usage than I'd expect. :( |
05:48.20 | scgallafent | Page load is still ugly. 44.8 seconds. |
05:48.26 | TimRiker | ugh |
05:48.42 | scgallafent | I'm going to start bleeding where I'm scratching my head. |
05:49.07 | TimRiker | heh |
05:49.07 | scgallafent | dlhace asked if caching was running. I see memcached in the process list. Should something else be running? |
05:50.06 | TimRiker | restored the apache client limit. |
05:50.20 | TimRiker | yes, memcached does appear to be running ok. |
05:50.45 | TimRiker | listening ok, entered in localsettings, etc. |
05:51.28 | TimRiker | we were hanging with timeouts when it was down. that's back when the server was first set up. |
05:51.41 | TimRiker | I did try restarting memcached too. no effect. |
05:52.22 | scgallafent | I'm baffled. |
05:52.30 | scgallafent | Would something have been upgraded automatically? |
05:52.44 | TimRiker | nope. it would show up in the logs. |
05:53.05 | TimRiker | I manually applied a few updates earlier today to see if they helped. no change. look for yum in the logs. |
05:54.02 | scgallafent | brb |
05:55.23 | scgallafent | back |
05:55.48 | scgallafent | Code doesn't appear to have changed. Database has been rolled back. |
05:55.51 | scgallafent | What are we missing? |
05:56.06 | scgallafent | Is there an external service we depend on that is running slow? |
06:01.37 | TimRiker | could be, but why just the wiki? we use ldap, but not every call, should be in session and just used on login. |
06:02.35 | scgallafent | I don't think we even hit LDAP on login. We added a WAM plugin with the upgrade. |
06:03.27 | TimRiker | hmm. yep. |
06:03.39 | TimRiker | my head hurts |
06:05.10 | scgallafent | Forum is happy with wiki disabled. I really don't want to go back down the MediaWiki rabbit hole. |
06:06.30 | TimRiker | yeah. I'm still thinking, and may try a utf8 conversion, but I don't really have high hopes. |
06:06.54 | scgallafent | I'm still stuck trying to come up with an idea on what could have changed. |
06:07.02 | TimRiker | me too |
06:07.28 | scgallafent | There was a core network change, but that was overnight Thursday/Friday. |
06:12.36 | TimRiker | right |
06:15.20 | scgallafent | CPU usage is still higher than I would expect. Four httpd processes above 10% seems high. |
06:17.43 | TimRiker | nods |
06:18.05 | scgallafent | Just tested /ldshelp. Page loads there are marginal (1.5 to 4 seconds). |
06:24.08 | scgallafent | Did your UTF-8 rewrite finish on the .sql file? |
06:26.21 | TimRiker | not yet |
06:52.25 | scgallafent | Next thought: |
06:52.48 | scgallafent | The wiki is probably getting the worst of the slowdown because it's arguably the most complex code we've got. |
06:53.14 | scgallafent | With the wiki disabled, the httpd processes are still overloaded. |
06:53.31 | scgallafent | They're not getting that much traffic right now. |
06:53.36 | scgallafent | Why so busy? |
08:53.51 | *** join/#ldstech mailman0 (~mailman0@ip72-201-159-66.ph.ph.cox.net) |
09:02.07 | *** join/#ldstech Spennig (~Spennig@pool-108-11-215-19.hrbgpa.fios.verizon.net) |
16:25.38 | *** join/#ldstech scgallafent (scgallafen@conference/ldstech/x-hogbjncwropaviyz) |
16:25.39 | *** mode/#ldstech [+o scgallafent] by ChanServ |
20:21.52 | *** join/#ldstech Spennig (~Spennig@pool-108-11-215-19.hrbgpa.fios.verizon.net) |
20:22.21 | *** part/#ldstech Spennig (~Spennig@pool-108-11-215-19.hrbgpa.fios.verizon.net) |