Tuesday, November 23, 2010

running Sakai3 Ruby tests against remote hosts

The Sakai 3 checkout comes with a nice set of Ruby tests. Reviewing tests is a good way to learn the landscape of a given system, which is why I'm poking around with this Ruby and Selenium stuff.

All the Ruby tests can be run via a utility script: tools/runalltests.rb

By default the tests run against localhost.

If you want to run the existing tests against a remote host you can manually modify the Ruby testing setup script to point at your target host. You'll get some errors, but later for that ;)

Or you can use this script, which temporarily modifies the testing framework to point at an arbitrary host and then runs all the tests. When it's done it returns the framework to its original state.

#!/bin/bash
# runall_elsewhere.sh
# edit test.rb to point to a remote server
# and then, via tools/runalltests.rb, run all tests
# found in:
#   SlingRuby/kerns
#   SlingRuby/tests
# caseyd@czwxllc.com

# make sure we have an argument
if [ $# -ne 1 ]; then
 echo "Usage: $0 TESTHOST_URL"
 exit 1
fi

# some variables
# NAK_HOME needs to be set to your checkout/workarea
TARGET_SERVER="$1"
TEST_HOME="${NAK_HOME}/testscripts/SlingRuby"

# the current runalltests script in NAK_HOME/tools expects to be run
#     from the SlingRuby directory
pushd "$TEST_HOME" || exit 1

cp ./lib/sling/test.rb ./lib/sling/test.rb.bak

# in-place editing fun (BSD/OS X sed syntax; with GNU sed use plain -i)
sed -i '' "s!Sling\.new()!Sling.new('${TARGET_SERVER}')!" ./lib/sling/test.rb

# stash the result away for later inspection, or not.
cp ./lib/sling/test.rb /tmp/test.rb.EDIT

# run all the tests in TEST_HOME/kerns and TEST_HOME/tests
#  except now the test setup will point at your TARGET_SERVER
ruby "${NAK_HOME}/tools/runalltests.rb"

# repair changes to the testing framework
mv -f ./lib/sling/test.rb.bak ./lib/sling/test.rb

# go back to wherever you were
popd

Remote targeting can be useful for quick A/B comparisons.

I suppose this retargeting idea could be built into the current runalltests.rb script.
What I didn't see was a way to easily mongle server retargeting into the current test framework.
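One way to build retargeting in, rather than sed-editing test.rb, would be to have the setup read its target from the environment. This is a hypothetical sketch: the SLING_TARGET variable, the target_server helper and the localhost:8080 default are my assumptions, not anything in the current framework. The only real piece is that Sling.new() accepts a server URL.

```ruby
# Hypothetical: pick the test target from an environment variable, falling
# back to localhost, so a run can be retargeted without editing test.rb.
# SLING_TARGET and DEFAULT_SERVER are assumptions, not existing settings.
DEFAULT_SERVER = 'http://localhost:8080/'

def target_server(env = ENV)
  url = env.fetch('SLING_TARGET', '').to_s.strip
  url.empty? ? DEFAULT_SERVER : url
end
```

With something like that, test.rb could call Sling.new(target_server), and a remote run would be SLING_TARGET=http://santoku.local:8080/ ruby $NAK_HOME/tools/runalltests.rb, with nothing to restore afterwards.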

You will experience errors.
Tests were not written with this kind of use in mind, so you will get funky race conditions and conflicting states. Also some of the tests are hardcoded to use localhost, so they'll have trouble.

If the various ID generators were slightly tweaked to generate more unique IDs, and the use of localhost removed, these tests would run against arbitrary hosts.
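For the ID side, here's a rough sketch of a more collision-resistant generator that folds in the host name, process id, a timestamp and a random suffix, so shells on different machines don't step on each other. The unique_id name and format are hypothetical, and I'm assuming the server is happy with letters, digits and hyphens in ids.

```ruby
require 'socket'
require 'securerandom'

# Hypothetical ID helper: prefix-host-pid-timestamp-random.
# Two shells on different boxes (or the same box) are very unlikely to collide.
def unique_id(prefix = 'testuser')
  host = Socket.gethostname.split('.').first.to_s.gsub(/[^A-Za-z0-9]/, '')
  stamp = Time.now.strftime('%Y%m%d%H%M%S')
  "#{prefix}-#{host}-#{Process.pid}-#{stamp}-#{SecureRandom.hex(4)}"
end
```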

You could use this technique to crowdsource load against a common target server. You and your evil pals would fire up a flock of shells on your local machines and point the test suite at the target server.

[Screenshot: localhost tests]


OK time for me to cook a stack of gluten free pancakes for my crew!

Saturday, November 20, 2010

it is not alive.

After all that exploratory dorking around I've created a tiny patch for sling.rb, in sakai3's testscripts/SlingRuby/lib/sling area, which adds an "is_Ready()" method.

The method returns false if
  1. there is no connection to the desired server, or
  2. not all of the sakai-nakamura bundles are up and Active,
which should be sufficient for helping automate the Ruby tests in a CI server context.


Once you have that mod made, you can cook up a little tester like this:
#!/usr/bin/env ruby
# test to see if sakai 3 is up and all
#  sakai bundles are loaded

# assume we're running from 3AKHOME/testscripts/SlingRuby

$LOAD_PATH << './lib'

require 'sling/sling'
include SlingInterface

@s = Sling.new()
# at this point we are logged in, yes as admin
# ( it's arguable that the constructor should bailfail if S3 isn't up. )

true == @s.is_Ready() ? exit(0) : exit(1)

and use it in your testing shell scripts so they won't run if the server isn't up or isn't accessible.
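The patch itself isn't reproduced here, but the two checks it makes can be sketched with just the Ruby standard library. This is a hypothetical stand-in, not the actual is_Ready() code: sakai_ready? and SAKAI_PREFIX are my names, and picking out the Sakai bundles by an org.sakaiproject.nakamura symbolicName prefix is an assumption. The console URL and admin:admin credentials are the ones used elsewhere in these posts.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumption: Sakai bundles share this symbolicName prefix.
SAKAI_PREFIX = 'org.sakaiproject.nakamura'

def sakai_ready?(base_url, user: 'admin', password: 'admin', timeout: 5)
  uri = URI.join(base_url, '/system/console/bundles/.json')
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: timeout, read_timeout: timeout) do |http|
    http.request(request)
  end
  # Check 1: the server answered the console request at all.
  return false unless response.is_a?(Net::HTTPSuccess)

  # Check 2: there is at least one Sakai bundle and every one is Active.
  bundles = JSON.parse(response.body).fetch('data', [])
  sakai = bundles.select { |b| b['symbolicName'].to_s.start_with?(SAKAI_PREFIX) }
  !sakai.empty? && sakai.all? { |b| b['state'] == 'Active' }
rescue SystemCallError, SocketError, IOError, Timeout::Error,
       Net::HTTPBadResponse, JSON::ParserError
  # Refused connection, unknown host, timeout or a non-JSON reply: not ready.
  false
end
```

Used the same way as the tester: exit(sakai_ready?('http://santoku.local:8080/') ? 0 : 1). Treating "no Sakai bundles found" as not ready is a deliberate choice so a half-deployed server doesn't pass.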

One thing that struck me during all this was that if the RubySling tests could be run by a group of people against a common remote host you would have a quick way to produce additional 'load' during a bug bash. Granted it would be just-less-than-senseless load. Suitably empowered participants would just fire up the tests in a mess of shells on stray machines and let them mumble around in the background while doing ad-hoc bashing.

Looks like it needs just a little tweaking to the tests and their setup. Hmm.

Wednesday, November 17, 2010

is it alive II

I need a hammer for pulling bundle state info as Sakai3 comes up:
. ~/.resty/resty
resty http://santoku.local:8080
for namei in santoku_three_{1..20}; do
 echo "getting $namei at one second intervals"
 GET /system/console/bundles/.json -u admin:admin | pp > /tmp/lifecycle$namei &
 sleep 1
done
Then I examined the files 'by hand' for differences. This black-box approach is good enough to learn how the state changes. (Of course comments are welcome!)
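Rather than eyeballing the files, a small Ruby helper can list what moved between two snapshots. A minimal sketch, assuming each file holds the (pretty-printed) bundles JSON with its "data" array; state_changes is a hypothetical name.

```ruby
require 'json'

# Compare two saved bundles/.json snapshots and list the bundles whose state
# changed between them, e.g. "Installed -> Resolved" while Sakai3 starts up.
def state_changes(before_json, after_json)
  states = lambda do |text|
    JSON.parse(text).fetch('data', []).map { |b| [b['symbolicName'], b['state']] }.to_h
  end
  before = states.call(before_json)
  after = states.call(after_json)
  names = (before.keys | after.keys).sort
  names.reject { |n| before[n] == after[n] }
       .map { |n| "#{n}: #{before[n] || '(absent)'} -> #{after[n] || '(absent)'}" }
end
```

For example: puts state_changes(File.read('/tmp/lifecyclesantoku_three_3'), File.read('/tmp/lifecyclesantoku_three_4')).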

So on my test box sakai3 comes up and is fully alive in about 12-odd seconds.

The bundle states go from Installed to Resolved to Active.

So testing scripts can start up once all the bundles are Active.
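To make a testing script wait for that point rather than fail outright, the readiness check can be wrapped in a polling loop with a deadline. A minimal sketch; wait_until_ready is a hypothetical helper, and the block can be any check that returns true once all the Sakai bundles are Active.

```ruby
# Poll a readiness check until it passes or the deadline runs out.
# Returns true as soon as the block is truthy, false on timeout.
def wait_until_ready(timeout: 60, interval: 1)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  deadline = clock.call + timeout
  loop do
    return true if yield
    return false if clock.call >= deadline
    sleep interval
  end
end
```

Given the roughly 12-second startup on my test box, a 60-second timeout leaves plenty of slack.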

Starting in with the Ruby scripts found in testscripts/SlingRuby I follow the README.txt's instructions for updating my build of Ruby. To get the top-level "is your environment ready" scripts to run I have to modify them a bit. They didn't load the supplied ./lib utilities so I added

$LOAD_PATH << './lib'

This may be due to some kind of Rubistia environment setting I'm missing. The "is your environment ready" scripts were testing localhost, which I'm not, so I modified the line that looks like it creates a new Sling bot:

@s = Sling.new("http://santoku.local:8080/")

and blammo:
bash-3.2$ ./create-user.rb testuser
User: testuser (pass: testuser)

As the main body of the Ruby test scripts is run quite often, I wonder what the heck is going on here.

A trip to #sakai on IRC to ask whether this load-path stumble was due to my environment missing settings Ruby practitioners commonly use. It looks like these little scripts were just left out of a large sweep that fixed issues like this in the main body of Ruby tests.

It would be better if the README.txt and the "is your environment ready" scripts just worked OOTB, esp as more newbs like me come on-line, but that's a little thing in the pace of the project. I'll file a JIRA for the library path problems.

These little scripts are enough for me to see how to plug in the "is it alive" bit. Onward!

is it alive?

Watched Mel Brooks's Young Frankenstein with my kiddo over the weekend: "It's ALIVE!" is now our catchphrase for, well, you know, whenever a 12-year-old boy wants to yell something.

How do I know that my Sakai3 has come up and is ready for being poked with a sharp stick? In IRC I chatted for a moment with stuartf about this, and eventually wrote up KERN-1868 to put that sharp stick in the sand.

With the Sakai deployments at Stanford I cooked up a mess of lightweight web tests which could be run from the command line. These exercised various points of the service stack, culminating in a heartbeat test: IT'S ALIVE! (Stanford's Julian Morley has taken these tests and greatly improved them, weaving them through the load balancer's tests.)

Personally, being a blunt instrument, I'm going to get the JSON used in the Sling Web Console's console/bundles report, parse it (somehow) for the status of the Sakai-related bundles, and provide a FAIL of some sort if any of the Sakai bundles is not up. I think it's clear that this is rather blunt: it will undoubtedly come to pass that some Sakai-categorized bundles are in development, or are being A/B'ed, and so won't be active, but this is an OK place to start. I'll presume that the categorization is sufficient for now.

Speaking of starting, I am a lazy programmer, so I'm not going to get all mighty curly as I explore this approach; I'll start off by using resty to make my requests.

Resty is a curl wrapper which exposes a set of commands into your shell, allowing you to REST at ease. Here's an example:

 $ . resty
 $ resty http://3akai.sakaiproject.org
 $ GET /system/console/bundles/.json --basic -u"admin:admin"
{"status":"Bundle information: 127 bundles in total - all 127 bundles active.","s":[127,126,1,0,0],"data":[{"id":0,"name":"System Bundle","fragment":false,"stateRaw":32,"state":"Active","version":"2.0.4","symbolicName":"org.apache.felix.framework","category":""},{"id":81,"name":"Apache Aries JMX API","fragment":false,"stateRaw":32,"state":"Active","version":"0.1.0.incubating","symbolicName":"org.apache.aries.jmx.api","category":""},{"id":85,"name":"Apache Aries JMX Core","fragment":false,"stateRaw":32,"state":"Active","version":"0.1.0.incubating","symbolicName":"org.apache.aries.jmx.core","category":""},{"id":68,"name":"Apache Commons IO Bundle","fragment":false,"stateRaw":32,"state":"Active","version":"1.4","symbolicName":"org.apache.commons.io","category":""},{"id":18,"name":"Apache Derby 10.5","fragment":false,"stateRaw":32,"state":"Active","version":"10.5.3000000.802917","symbolicName":"derby","category":""},{"id":111,"name":"Apache Felix Bundle Repository","fragment":false,"stateRaw":32,"state":"Active","version":"1.6.4","symbolicName":"org.apache.felix.bundlerepository","category":""},{"id":45,"name":"Apache Felix Configuration Admin Service","fragment":false,"stateRaw":32,"state":"Active","version":"1.2.4","symbolicName":"org.apache.felix.configadmin","category":"osgi"},{"id":47,"name":"Apache Felix Declarative Services","fragment":false,"stateRaw":32,"state":"Active","version":"1.6.0","symbolicName":"org.apache.felix.scr","category":""},
It has a buddy, pp, which is a Perl pretty-printer one-liner.

$ curl https://github.com/micha/resty/raw/master/pp > /usr/local/bin/pp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0    88    0    88    0     0     83      0 --:--:--  0:00:01 --:--:--   423
$ more /usr/local/bin/pp
perl -0007 -MJSON -ne'print to_json(from_json($_, {allow_nonref=>1}),{pretty=>1})."\n"'
$ chmod +x /usr/local/bin/pp
which makes things a bit easier for me to read:
Octo:~ casey$ GET /system/console/bundles/.json --basic -u"admin:admin" | pp
{
   "status" : "Bundle information: 127 bundles in total - all 127 bundles active.",
   "data" : [
      {
         "stateRaw" : 32,
         "version" : "2.0.4",
         "fragment" : false,
         "name" : "System Bundle",
         "symbolicName" : "org.apache.felix.framework",
         "state" : "Active",
         "id" : 0,
         "category" : ""
      },
      {
         "stateRaw" : 32,
         "version" : "0.1.0.incubating",
         "fragment" : false,
         "name" : "Apache Aries JMX API",
         "symbolicName" : "org.apache.aries.jmx.api",
         "state" : "Active",
         "id" : 81,
         "category" : ""
      },
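If you'd rather stay in Ruby, the json library that ships with it does the same pretty-printing. A small sketch; pretty_json is a hypothetical name. One visible difference: Ruby keeps the keys in the order the server sent them, while the Perl one-liner appears to print them in Perl hash order, which would explain why "stateRaw" shows up first above.

```ruby
require 'json'

# Ruby counterpart of resty's Perl `pp` one-liner: parse JSON text and
# re-emit it indented, one key per line, keeping the original key order.
def pretty_json(text)
  JSON.pretty_generate(JSON.parse(text))
end
```

As a shell filter the same thing is: GET /system/console/bundles/.json -u admin:admin | ruby -rjson -e 'puts JSON.pretty_generate(JSON.parse($stdin.read))'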
What I can now do is issue GETs and PUTs from my command line, with the shell preserving connection info and mongling up the real curl call.

This means I'm ready to rapidly poke at the Console's Bundles page and see what's happily loaded. I'll have to do a lifecycle test to see just how the states change as Sling comes up, and probably have to break something to see what a broken Sakai 3 looks like. Perhaps a stubbly broken bundle. hmm.

Onward! From what I've seen most of the Sakai test framework is harnessed up in Ruby, so to make this quickly useful I'll learn Ruby's JSON parsing next. (I've also been looking into jsawk, which is pretty neat.)
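As a first step in that direction, here's a sketch of parsing the bundles report with Ruby's bundled json library: count bundles per state and list the interesting ones that aren't Active yet. bundle_report is a hypothetical name, and the optional block is where the "which bundles are Sakai's" decision goes; filtering on the category field is an assumption about how the Sakai bundles are categorized.

```ruby
require 'json'

# Summarize a bundles/.json payload: how many bundles sit in each state,
# and which bundles selected by the optional block are not yet Active.
def bundle_report(json_text, &wanted)
  wanted ||= ->(_bundle) { true }          # no block: consider every bundle
  bundles = JSON.parse(json_text).fetch('data', [])
  counts = Hash.new(0)
  bundles.each { |b| counts[b['state']] += 1 }
  not_active = bundles.select(&wanted)
                      .reject { |b| b['state'] == 'Active' }
                      .map { |b| b['name'] }
  { counts: counts, not_active: not_active }
end
```

For example, bundle_report(json) { |b| b['category'].to_s.include?('sakai') } narrows the not-Active list to bundles whose category mentions sakai; a non-empty list is the FAIL.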

Then back to the 3elenium Grid track.


https://github.com/micha/resty#readme and be sure to watch the movie!

Friday, November 12, 2010

that 'headless' firefox - not this way

Yesterday I mentioned that when distributing Selenium RC workers about the shop I stumbled across a configuration where an RC ran Firefox w/out a display:
[java] 16:43:43.770 INFO - creating new remote session
[java] 16:43:43.771 INFO - Allocated session f01c7e0b83304b429bbff24ea5710742 for http://3akai.sakaiproject.org/dev/index.html/dev/, launching...
[java] 16:43:43.906 INFO - Preparing Firefox profile...
[java] Wed Nov 10 16:43:43 kitchen.local firefox-bin[3537] : kCGErrorFailure: Set a breakpoint @ CGErrorBreakpoint() to catch errors as they are logged.
[java] _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
[java] Wed Nov 10 16:43:44 kitchen.local firefox-bin[3537] : Window Server is not available.
[java] Wed Nov 10 16:43:44 kitchen.local firefox-bin[3537] : Window Server is not available.
[java] Wed Nov 10 16:43:44 kitchen.local firefox-bin[3539] : kCGErrorFailure: Set a breakpoint @ CGErrorBreakpoint() to catch errors as they are logged.
[java] _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
[java] Wed Nov 10 16:43:44 kitchen.local firefox-bin[3539] : Window Server is not available.
[java] Wed Nov 10 16:43:44 kitchen.local firefox-bin[3539] : Window Server is not available.
[java] 16:43:47.026 INFO - Launching Firefox...
[java] 16:43:49.873 INFO - Got result: OK,f01c7e0b83304b429bbff24ea5710742 on session f01c7e0b83304b429bbff24ea5710742
[java] 16:43:49.884 INFO - Command request: setTimeout[300000, ] on session f01c7e0b83304b429bbff24ea5710742
[java] 16:43:49.902 INFO - Got result: OK on session f01c7e0b83304b429bbff24ea5710742
[java] 16:43:49.909 INFO - Command request: setSpeed[1000, ] on session f01c7e0b83304b429bbff24ea5710742
[java] 16:43:49.909 INFO - Got result: OK on session f01c7e0b83304b429bbff24ea5710742
[java] 16:43:49.916 INFO - Command request: open[/dev/index.html, ] on session f01c7e0b83304b429bbff24ea5710742
[java] 16:44:05.704 INFO - Checking connection to hub...
[java] 16:44:05.704 INFO - Ping Hub at http://octo.local:4444/heartbeat?host=
[java] 16:44:10.416 INFO - Got result: OK on session f01c7e0b83304b429bbff24ea5710742
The test continues at a slightly faster pace, logging into Sakai 3. At first I thought this was totally cool, and might have been a way to increase the number of workers which could be run on a single box. Selenium RC launches each Firefox with an individual anonymous profile, so running a few simultaneously isn't going to hurt anything. And it might be, with caution.

What's happening here is that I started the RC as a user different than the user who is, in Apple's terms, the 'console user.' That's the userid who's in charge of the windows on that machine - take a look at the permissions on /dev/console and you'll see what I mean.

Firefox, or at least the version on that box, must be using the OS X global window server service at launch so it can lurch into this odd state and continue running w/out displaying any UI.

Apple gives a warning about this, and in fact it looks like the recent OS X Server has proceeded down that path.

There are XUL based headless Firefox builds available and they may be much more useful.
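Since this all hinges on who owns /dev/console, a launcher could check that before starting the RC. A minimal sketch comparing the owner of /dev/console with the effective user; console_user? is a hypothetical helper, and it only encodes the OS X behavior described above, so treat it as a guard rather than a guarantee.

```ruby
# Does the effective user own the console device? On OS X that owner is the
# user in charge of the windows, i.e. the one whose Firefox can open windows.
def console_user?(console = '/dev/console')
  File.stat(console).uid == Process.euid
rescue SystemCallError
  # Missing device or no permission to stat it: treat as "not the console user".
  false
end
```

For example: warn 'not the console user: Firefox may run without a display' unless console_user?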

OK so back to the meat of this :)

Thursday, November 11, 2010

3elenium Grid onward

So unpack the Selenium Grid tar file and start 'er up. No problem.

Basic Java demos for Selenium Grid ran fine; nice. I always found it a compliment when folks said my stuff "just works." This griddage just works.

Since my dorky login failure spike is in Ruby, I decide to give the Ruby examples a whirl. This is where not knowing Ruby really hurt!

I started off by following the Ruby instructions from selenium-grid, but I don't think the grid examples have been keeping up with changes in the Ruby world.

I found I had to
* change the yml configuration with paths to apps on my OS X machine
* to run the ruby tests add gem install rspec
* change the example/ruby/Rakefile to use an rspec newer than the default 1.1.8
* gem install deep_test, modify path to pull it in, change Rakefile to >=
* find out that the spectask no longer exists, switch over to rspec/core/rake_task
* upgrade version requirement for selenium-client to >1.2.7
* etcetera

but I eventually stopped - I don't know enough Ruby, or Ruby culture, to figure out what is going on with the stale dependencies. Something called Spec is (or was) in the middle of being converted to RSpec, or already was, and something called deep_test is still deep_test_pre? Or what? This is noising up my spike.

I realized, though, that I can brute-force the Ruby test by running multiple Ruby processes against the server, with 4 registered RC processes. That should show how the Ruby jobs are doled out to the RC workers. Not very useful in a 'share the love' kinda way, but it'll help me see what other kinks there are.

It only takes a moment to open a mess of shells and script up some parallel testing against a local Sakai3 instance: fine, a login in 11 seconds on average. A one-line URL change sends a mess of logins to sakai3.sakaiproject.org instead. Ohh Kay. About 18 seconds on average.
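The "mess of shells" can also be scripted: launch N copies of the test at once and time each one as it finishes. A sketch under assumptions: timed_parallel_runs is a hypothetical helper, and loginTestCase.rb just stands in for whatever test you run.

```ruby
# Launch `count` copies of a command at once and report each run's wall time
# and exit status, reaping runs in whatever order they finish.
def timed_parallel_runs(cmd, count)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  started = {}
  count.times do
    t0 = clock.call
    started[Process.spawn(*cmd)] = t0
  end
  results = []
  until started.empty?
    pid, status = Process.wait2(-1)   # whichever child exits first
    t0 = started.delete(pid)
    next unless t0                    # ignore children we didn't start
    results << { seconds: clock.call - t0, ok: status.success? }
  end
  results
end
```

For example: runs = timed_parallel_runs(['ruby', 'loginTestCase.rb'], 4), then runs.sum { |r| r[:seconds] } / runs.size gives the average per login run.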

Next: distribute this run-kit to other machines around here and perform the test again. Works well, quite nice. I make a set of assumptions about the location of browsers on various machines, which mostly work out.

Mostly in that this leads to an inadvertent creation of a 'headless' Firefox instance on remote machines. I'll have to describe that tomorrow - this may be useful. The central idea is that the user running an OS X Window Server process is the only one, other than root, who can launch new windows. Give it a whirl from your OS X command line. So when Selenium RC wants to launch a new Firefox it needs to be running as the user who is running the windowing system. Mostly. Sorta.

What I get is a non-window-displaying Firefox which still navigates Sakai3. cool; it runs the test a couple of seconds faster.

Tuesday, November 9, 2010

3elenium Grid?

Hi, this is a worklog of a spike into using Selenium, Selenium RC and Grid for distributed load testing. Yes I know that "real load" is best generated in other ways, and that'll happen too. Given how easy Selenium is reported to be I thought it's worth the experiment.

My goals
  • assess how well Selenium may work with the current 3akai interface.
  • see if repeatable scenarios based on the 6k users generated by the KERN-1306 tools, and assorted content, can be distributed on an ad-hoc basis for peer use during 'free form' bug bashes.

some links
Selenium http://seleniumhq.org/ 
a FAQ on Selenium Grid: http://selenium-grid.seleniumhq.org/faq.html 
a spin off and its quickstart guide: http://saucelabs.com/docs/quickstart
The Sakai 3 demo build: http://3akai.sakaiproject.org/dev/index.html
The official QA box: http://sakai3-demo.uits.indiana.edu:8080/dev/index.html

worklog -

fire up nakamura head on Santoku, using run_production.sh
download selenium IDE and install into my firefox
start recording a simple test: failed login on my local HEAD build.
after a fencepost error of some kind on the first run, the test executes nicely.
however to my joy running the test against the canonical QA server fails.
what gives?

[Screenshot (Firebug): local server login fail, which is a successful Selenium test]

[Screenshot: same test against the canonical QA server; the login fails, but the Selenium test also fails]

OK so what's the difference in those strings? Grr. Nothing, as it turns out: for some reason a restart of the IDE fixed it. A delightful mystery for another day. After a fair amount of bashing I found I could trigger this state by switching tabs or clicking in another browser window, and after doing that, restarting the IDE was the only way to recover.

However, if you don't trigger this condition, it's really easy in the IDE to switch between different domains (such as my local build, the Indiana build and the sakaiproject demo build).

My takeaway is that I have to be pretty careful when running multiple-domain tests in the IDE. I'll give it a run in a simpler machine setup at some point.

Stability and Reproducibility via external scripts?

Let's see about this Selenium RC server. The idea here is to have a control box somewhere driving browsers on other boxes to perform the tests. The remote control script may be from the IDE ( or hand written, natch ) and loaded up into the remote control box. This box then connects to the Selenium RC process running on the worker-boxes.

For this quick spike I picked Ruby from among the plentiful IDE options. I picked Ruby because I found a fair number of Ruby scripts in the Sakai3 testscripts directory and thought it would be best to support whatever previous choices led to that. Another benefit - I don't know Ruby so I might as well learn by fire!

I had two central wrinkles. The first was modifying the generated Ruby code (the Selenium IDE offers several export options) to allow it to find a browser on OS X:

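    # note: the client option is :timeout_in_seconds; the IDE export's
    # :timeout_in_second is silently ignored, hence setTimeout[300000] (the 300 s default) in the log below
    @selenium = Selenium::Client::Driver.new \
      :host => "localhost",
      :port => 4444,
      :browser => "*googlechrome /Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
      :url => "http://sakai3-demo.uits.indiana.edu:8080/dev/",
      :timeout_in_seconds => 60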
I also chose Chrome just for the heck of it.

When I run this simple login test at full speed it always fails. The IDE has a slider which you can use to slow down the interaction, but the setting you dial in does not get exported. The failure is most likely due to client-side JS overhead. So after some poking around I found a doc for the Ruby Selenium client and added a line to the setup method that sets the command speed to 1000 ms; it shows up as setSpeed[1000, ] in the RC log below.

Then it was time to start up the RC ( remote control ) server. I just followed the directions and set it up on this dev box. For this spike I just fired off java on the command line, passing in the location of the server jar file. Later I'll set up a couple more remotes in CZWX HQ and do some parallel testing, but this is fine for today.

Starting the Ruby script...
bash-3.2$ ./loginTestCase.rb
Loaded suite ./loginTestCase
Finished in 19.757175 seconds.

1 tests, 2 assertions, 0 failures, 0 errors
and reviewing the Selenium RC log...
13:44:36.603 INFO - Command request: setSpeed[1000, ] on session null
13:44:36.603 INFO - Got result: OK on session null
13:44:36.617 INFO - Command request: getNewBrowserSession[*googlechrome /Applications/Google Chrome.app/Contents/MacOS/Google Chrome, http://sakai3-demo.uits.indiana.edu:8080/dev/, , ] on session null
13:44:36.617 INFO - creating new remote session
13:44:36.618 INFO - Allocated session 540c35ae9cbd4d18a611d524aefabe3e for http://sakai3-demo.uits.indiana.edu:8080/dev/, launching...
13:44:36.618 INFO - Launching Google Chrome...
13:44:40.585 INFO - Got result: OK,540c35ae9cbd4d18a611d524aefabe3e on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:40.588 INFO - Command request: setTimeout[300000, ] on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:41.594 INFO - Got result: OK on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:41.597 INFO - Command request: open[/dev/index.html, ] on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:50.797 INFO - Got result: OK on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:50.799 INFO - Command request: waitForPageToLoad[300000, ] on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:51.819 INFO - Got result: OK on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:51.822 INFO - Command request: type[username, CaseyD] on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:52.846 INFO - Got result: OK on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:52.849 INFO - Command request: type[password, wooga] on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:53.855 INFO - Got result: OK on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:53.858 INFO - Command request: click[loginbutton, ] on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:54.868 INFO - Got result: OK on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:54.871 INFO - Command request: isTextPresent[The username or password you entered is incorrect!, ] on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:55.880 INFO - Got result: OK,true on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:55.883 INFO - Command request: testComplete[, ] on session 540c35ae9cbd4d18a611d524aefabe3e
13:44:55.883 INFO - Killing Google Chrome...
13:44:56.355 WARN - Google Chrome seems to have ended on its own.
13:44:56.356 INFO - Got result: OK on session 540c35ae9cbd4d18a611d524aefabe3e
and everything is cool.

At this point I like the fact that the test is running through an actual accursed browser, with all the foibles which can be introduced on the client side. It also allows me to create interaction timing scenarios similar to what real users will experience.

I'm also struck by the fact that it's fragile as hell. Because it's driving the UI :)

[Movie: screen recording of this login test run]


So that's nice. And I'm sure I can push this forward into more comprehensive tests via recording and coding. Notice the timestamps in the logs. I'll have to look into what reporting Selenium provides.
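In the meantime, the RC log already carries enough for per-command timings: each "Command request" line is followed by a "Got result" line, and the gap between them is how long that step took (judging by the roughly 1 s gaps above, that includes the setSpeed delay). A minimal sketch; command_timings is a hypothetical name, and it assumes one session per log and no runs across midnight.

```ruby
# Matches "HH:MM:SS.mmm INFO - Command request: name[...]" and
# "HH:MM:SS.mmm INFO - Got result: ..." anywhere in a line ([java] prefix ok).
RC_LINE = /(\d{2}):(\d{2}):(\d{2}\.\d+) INFO - (?:Command request: (\w+)|Got result)/

# Pair each Command request with the next Got result; return [command, seconds].
def command_timings(log_text)
  pending = nil
  timings = []
  log_text.each_line do |line|
    m = RC_LINE.match(line)
    next unless m
    secs = m[1].to_i * 3600 + m[2].to_i * 60 + m[3].to_f
    if m[4]
      pending = [m[4], secs]            # a request: remember command and start
    elsif pending
      timings << [pending[0], (secs - pending[1]).round(3)]
      pending = nil                     # its result: record elapsed seconds
    end
  end
  timings
end
```

For example, command_timings(File.read('rc.log')).max_by(&:last) points at the slowest step (for the run above, the open of /dev/index.html at about 9.2 s).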

To get some real Selenium load on the Sakai3 I'll have to find some collaborators. A minimal configuration would be folks running a set of scripts during bug bashes on some spare machines. A step up would be a coordinated grid of spare machines.

Grid setup?

KERN-1306 provided a set of tools to populate a Sakai 3 instance with a range of userids, tags, and a big set of content. The goal is to stuff a mess of users in and build tests against a loaded system.

The result is a set of users with random login ids with random system dictionary tags and a big pool of messages between them. Oh and megabytes of data in the repository. Chunky monkey!

Gridding the Selenium tests using the 6k users won't be possible unless there is a canonical set of netids. Current practice doesn't create a definitive set of users; each reload creates a different cohort. That can be managed as long as the accompanying automated test scripts take steps to do so.

A possible effort would be to have someone generate the canonical set, ( users, tags, file IDs) store it somewhere on the net and have the gridded tests pull the canonical user sets down. This allows repeatability of load testing against the bugblast or the QA servers, and distributivity of load testing. Coherent messaging and chat tests could be developed.
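Once a canonical list exists, participants also need to avoid logging in as the same users at the same time and stepping on each other's state. One simple scheme, sketched here: everyone pulls the same file and takes every Nth entry by participant number. my_users, the one-netid-per-line format and the participant numbering are all hypothetical.

```ruby
# Hypothetical: from a shared canonical user list (one netid per line), give
# each participant a disjoint, repeatable slice: every Nth id, offset by number.
def my_users(list_text, participant, participants)
  unless participants.positive? && (0...participants).cover?(participant)
    raise ArgumentError, "participant must be in 0...#{participants}"
  end
  ids = list_text.lines.map(&:strip).reject(&:empty?)
  ids.select.with_index { |_id, i| i % participants == participant }
end
```

Pulling the list could be as simple as my_users(Net::HTTP.get(URI('http://example.org/canonical-users.txt')), 2, 5) after require 'net/http' (URL hypothetical).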

OK! Onward - I'll try the grid in house tomorrow, and then see about coding up a framework to pull "canonical" userNetID (etc.) files from the net.