Thursday, January 10, 2008

Start of Term & New Release

The Stanford Winter term started Tuesday without Marc Brierley.

What I liked best about Marc was how hard he tried to engage you, and how dogged and admirably pleasant he was about it. Even being rude to him couldn't shake him much; he would just look for another tack. He was really getting the whole software-development picture, too. Our team is badly damaged right now, and we are functioning at a very low level as people try to cope. I can't imagine how his wife and parents are doing.

So, the quarter.

We didn't have the login-latency/failure and DB load issues we hit in the fall. That's great, but it wasn't worth coming back from skiing for... :) (Wonderful powder in Tahoe after the winds died down.)

Operationally our issues were:
  • We didn't catch that one of the servers wasn't back up after the Holiday Upgrade. This is a 'back office' server that isn't in the end-user pool, so it isn't in the monitored pool either. It runs a modified Quartz configuration that provides a private job pool running only on that machine.
  • Some of the Out Of Band (OOB) Course Site administration Quartz jobs that run on that machine were not scheduled to run in sync with our partners' cron jobs.
  • Tuesday night brought back the occasional SSL setup failure between our Sakai boxes and the Registry. We failed to pull 500-600 XML course documents that night. (Yes, I built it to pause and retry...) This, of course, caused a great deal of confusion. Wednesday night we didn't have a single problem, with no code change, of course. I've seen SSL setup failures under low-memory conditions, and in Tomcat when class loading of the JSSE classes gets out of sequence (really, search the Tomcat newsgroups). My operations guy, Julian Morley, and I have not yet been able to reproduce the condition, so we have an out-of-user-pool production Sakai server running with -Djavax.net.debug=all, hoping we catch it.
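For readers wondering what "pause and retry" means here, below is a minimal, generic Java sketch of the pattern, assuming each course document is fetched by a single call that may throw (for example an SSL handshake failure). The class name RetryFetch, the method withRetry, its parameters, and the simulated course document are all hypothetical illustrations; this is not the actual Stanford Registry code, which isn't shown in this post.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper illustrating a pause-and-retry loop around a flaky
 * operation, such as fetching one XML course document over SSL.
 * Not the actual Stanford/Sakai implementation.
 */
public class RetryFetch {

    /**
     * Calls task up to maxAttempts times, sleeping pauseMillis between
     * failed attempts. Rethrows the last failure if every attempt fails.
     */
    public static <T> T withRetry(Callable<T> task, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure, e.g. an SSL setup error
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // pause before the next try
                }
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) throws Exception {
        // Simulated fetch that fails twice, then succeeds (hypothetical data).
        int[] calls = {0};
        String doc = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new java.io.IOException("simulated SSL setup failure");
            }
            return "<course id=\"W08-EXAMPLE\"/>";
        }, 5, 10L);
        System.out.println(doc + " after " + calls[0] + " attempts");
    }
}
```

Note that a retry loop only helps with transient failures; if the SSL setup problem persists for the whole night, as it seemed to on Tuesday, every attempt fails and the last exception is rethrown.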

End-user issues abound, first of all because of the Sakai UI. Users can't find their Course Sites. Here is a summary:
  • The tabs showing Course Sites are in their usual wonderful semi-chronological order, and W08 Sites are buried under the "more" pulldown. There must be dozens of these tickets a day.
  • Not all instructors have published their Sites. (Some don't know how; many don't want to know how.)
  • The student signed up for the course 10 minutes ago, and the data hasn't flowed through to Sakai yet.
  • Not all Stanford courses are represented in Coursework. They are all present in the PeopleSoft front end, Axess, so many folks expect them to be automatically present in Coursework/Sakai... which the Stanford instructional staff treats as a strict no-no. (Many want no semblance of any online presence for their efforts. Check your calendars, folks.)
  • Instructors are hired quickly, and it takes a few days for their records to flow through the sad slurry of tubes and pumps Stanford uses to send data around.
  • Enterprise network IDs, the SUNetIDs, are delayed in provisioning.
  • Courses are cross-listed only in the minds of the instructors...
  • Course IDs are different from what the student thinks...
Since this is only the second quarter in Sakai for the vast majority of our users, I expect the HelpSU tickets to start falling off rapidly.

There is one sticking point: sometimes we still need to kick the Site Info "Update Participants" button to refresh the Course Site. This happens even after we can see the user info flow from the Registry into Sakai's Course Management tables and the Site Realm get updated... and still, now and again, the user does not appear in the site.

The biggest problem is the broken Sections / Realms bug, SAK-11320, where Section Aware tools that store ACLs in XML lose track of Realm membership after Section Manager updates. (ACLs in XML? In a column? Ah well.) I understand that Michigan goes in and edits the XML directly when there are problems. This affects all sorts of not-even-really-advanced uses of Sakai Sections.

Our CM Sync code cleans up orphaned Realms. I think this is what makes the problem visible; otherwise only 'new people' would have ACL problems in the Section Aware tools, and I'm not sure how often other Sakai deployments churn their Sections. An easy thing I can do is leave that hunk of code out. This will result in an astounding growth of unused or out-of-date Realms. I have to think that other big deployments trim these over time.

At least then we won't see very abrupt changes in access rights.
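The trade-off described above (prune orphaned Realms and risk abrupt access changes, or keep them and let stale Realms pile up) can be sketched as a flag on the cleanup step. This is a minimal, hypothetical Java illustration that models Realms as plain string IDs; the class OrphanRealmCleanup, the method realmsToRemove, the pruneOrphans flag, and the "/site/.../group/..." ID format are assumptions for illustration only, not the real CM Sync code or a Sakai API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a sync pass that can skip deleting orphaned realms.
 * Realm IDs are plain strings here; real code would go through Sakai's own
 * services, which this sketch does not model.
 */
public class OrphanRealmCleanup {

    /**
     * Returns the realm IDs that would be removed: those in existingRealms
     * that no current section still refers to (i.e. not in liveRealms).
     * When pruneOrphans is false nothing is removed, so stale realms
     * accumulate but access rights never change abruptly.
     */
    public static List<String> realmsToRemove(Collection<String> existingRealms,
                                              Set<String> liveRealms,
                                              boolean pruneOrphans) {
        List<String> doomed = new ArrayList<>();
        if (!pruneOrphans) {
            return doomed; // cleanup disabled: keep every realm
        }
        for (String realmId : existingRealms) {
            if (!liveRealms.contains(realmId)) {
                doomed.add(realmId); // orphaned: no live section references it
            }
        }
        return doomed;
    }

    public static void main(String[] args) {
        // Hypothetical realm IDs for one site with two section groups.
        List<String> existing = List.of("/site/A/group/s1", "/site/A/group/s2");
        Set<String> live = Set.of("/site/A/group/s1");
        System.out.println(realmsToRemove(existing, live, true));  // [/site/A/group/s2]
        System.out.println(realmsToRemove(existing, live, false)); // []
    }
}
```

Keeping the pruning decision behind a single flag means the cleanup can be turned back on later (for example, once SAK-11320 is fixed) without rewriting the sync logic.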

I'm going to go work on that now, in our Stanford 2.4.x branch. Then I'll see if I can get a Sakai 2.5 build going and whether the patch is appropriate for the next release.

Then I go to Davos Switzerland! whoo hoo!
