Docker containers for Taxonworks biodiversity software

Docker helps you develop and ship code and data more efficiently. Taxonworks is a Ruby on Rails app that facilitates biodiversity informatics research. Instead of installing Taxonworks directly on our local machine, we’ll build portable containers that anyone can run on any host operating system: OS X, Windows, or Linux. Docker containers help make our work reproducible and portable by separating our development environment from OS-specific dependencies.

Well, almost. To run Docker on Mac OS X or Windows you’ll first need to install Boot2Docker. Boot2Docker provides a Docker-configured Linux virtual machine that runs in VirtualBox. The Boot2Docker VM will serve as the host machine for our Docker containers. If you already run Linux on your desktop, you can skip Boot2Docker and just install Docker.

Let’s build some portable containers for the Taxonworks app and the PostgreSQL database.

With Boot2Docker installed, SSH into the VM:
boot2docker ssh

Get Boot2docker’s IP address on your host machine. We’ll need this later to access our app in a browser:
boot2docker ip

Get the Dockerfile.

The Dockerfile I’ve created for Taxonworks does a few simple things. First, it instructs Docker to base a new image on Phusion’s Passenger Ruby image for the Ruby version (2.1) we need. Next, it copies my Taxonworks app directory into the image. Finally, using the RUN instruction, it executes bundle install.
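The original Dockerfile isn’t reproduced here, but a minimal sketch along those lines might look like the following. The base-image tag and the /home/app/taxonworks path are my assumptions, not necessarily what the real Dockerfile uses:

```dockerfile
# Base the image on Phusion's Passenger Ruby 2.1 image
FROM phusion/passenger-ruby21

# Copy the Taxonworks source into the image
ADD . /home/app/taxonworks
WORKDIR /home/app/taxonworks

# Install gem dependencies
RUN bundle install
```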

docker@boot2docker:~$ docker build -t jstirnaman/taxonworks .

Now, get a new Docker container for PostgreSQL + PostGIS. I chose James Brink’s image, which also has good documentation:
docker@boot2docker:~$ docker run -P --name taxonworks_pg -e PASSWORD=`openssl rand -hex 10` -e USER=taxonworks -e SCHEMA=taxonworks -e POSTGIS=true jamesbrink/postgresql
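A quick aside on the -e PASSWORD=`openssl rand -hex 10` part: openssl rand -hex 10 emits 10 random bytes hex-encoded, which makes a serviceable 20-character throwaway password. You can sanity-check this in any shell with OpenSSL installed:

```shell
# 10 random bytes, hex-encoded, yields a 20-character string
PASSWORD=$(openssl rand -hex 10)
echo "${#PASSWORD}"   # prints 20
```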

More configuration options for the PostgreSQL container.

The --name option specifies a friendly name for the container. Without it, Docker will assign a random name.

The docker ps command should display the running container for our PostgreSQL image. The Ports column lists the ports open on the container and the host (boot2docker in our case) ports they are mapped to. In this case, host port 49158 is mapped to container port 5432, the PostgreSQL default.

docker@boot2docker:/Users/jstirnaman/dev/play/taxonworks$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
99d1655eb744 jamesbrink/postgresql:latest "/var/lib/postgresql 2 days ago Up 2 days 0.0.0.0:49158->5432/tcp taxonworks_pg

Next, we’ll start up a Taxonworks app container and use Docker’s container linking feature. Linking containers allows us to link our PostgreSQL database securely to our Taxonworks app without keeping track of network addresses. Docker will handle that for us.
I’ve already configured taxonworks/config/database.yml to connect to a database host named db. Now, when I run my Taxonworks app container I’ll specify the --link option and assign the alias “db” to my Postgresql container. You can call it whatever you want – just make sure that your database.yml and alias match because Docker will create an entry in your app container’s /etc/hosts for the alias. This entry maps the alias to the Postgresql container’s IP address. Brilliant!
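For reference, the relevant piece of taxonworks/config/database.yml might look something like this. It’s a sketch: the database name, user, and the ERB lookup of DB_ENV_PASSWORD are my assumptions. The essential detail is that host matches the link alias “db”:

```yaml
development:
  adapter: postgresql
  host: db                # must match the --link alias
  port: 5432
  database: taxonworks
  username: taxonworks
  password: <%= ENV['DB_ENV_PASSWORD'] %>
```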

To illustrate with docker run, bring up the container and execute the env command so we can see the special environment variables created by Docker for our database connection. By default, the app container exposes standard HTTP ports 80 and 443 so we’ll use the docker run -p flag to map port 80 in the container to port 49555 on our host machine.

docker@boot2docker:~$ docker run --rm --name taxonworks_web -p 49555:80 --link taxonworks_pg:db jstirnaman/taxonworks env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=f01881df5463
DB_PORT=tcp://172.17.0.27:5432
DB_PORT_5432_TCP=tcp://172.17.0.27:5432
DB_PORT_5432_TCP_ADDR=172.17.0.27
DB_PORT_5432_TCP_PORT=5432
DB_PORT_5432_TCP_PROTO=tcp
DB_NAME=/taxonworks_web/db
DB_ENV_PASSWORD=7aaba37e0c3ca5cffa26
DB_ENV_USER=taxonworks
DB_ENV_SCHEMA=taxonworks
DB_ENV_POSTGIS=true
HOME=/root

Follow the Taxonworks installation instructions for running migrations and seeding the database. Because we assigned the container a friendly name in our docker run command and because our Dockerfile sets the Taxonworks install directory as our default directory, we can do this:
docker exec taxonworks_web rake db:migrate RAILS_ENV=test
docker exec taxonworks_web rspec
docker exec taxonworks_web rake db:seed

Finally, start up the Taxonworks Rails app using the default WEBrick server. As we saw earlier, the app container exposes standard HTTP ports 80 and 443, so we’ll instruct rails server to listen on port 80.
docker exec taxonworks_web rails s -p 80

Taxonworks should now be browsable at the boot2docker IP address on port 49555.

Hunspell for medical terms spell-checking

The Learning Technologies Team at MPOW was on the lookout for a spell-check dictionary for our LMS, so when students and faculty misspell medical words they’ll get suggestions for the correct spelling. One of our LMSes is Blackboard, which supports Hunspell dictionaries.

I found https://github.com/Glutanimate/hunspell-en-med-glut on GitHub, which appears to be a quality, active project. Using Homebrew, I installed hunspell on my Mac to validate the dictionary.
me$ brew install hunspell

en-med-glut is a “special” dictionary, an extension of your base language dictionary, so you need to load a base dictionary into hunspell before loading en-med-glut. Chances are you already have dictionary files on your machine somewhere, but I went ahead and downloaded the en_US dictionary from SCOWL (And Friends). You can find other sources and details in the Elasticsearch hunspell docs.

Because the dictionaries that I downloaded weren’t in my standard path location, I used some environment variables to point to them, as described in this tutorial.

me$ env
...
DICPATH=/Users/me/dev/en_US
DICTIONARY=en_US
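Those variables were exported in my shell beforehand, along these lines (the paths are examples from my machine; substitute wherever you unpacked the dictionaries):

```shell
# Tell hunspell where to find dictionaries and which one to load by default
export DICPATH=/Users/me/dev/en_US
export DICTIONARY=en_US
```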

hunspell -D should report something like this:
me $ hunspell -D
...
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
./en_US_OpenMedSpel
./en_US/en_US
/usr/share/hunspell/en_med_glut
LOADED DICTIONARY:
./en_US/en_US.aff
./en_US/en_US.dic
Hunspell 1.3.3

You can check for a word by typing into STDIN here, but you aren’t yet using the medical dictionary; it’s available, but not yet loaded. Use hunspell -d:
me $ hunspell -d en_US,en_med_glut
Hunspell 1.3.3

Now test by misspelling a known term to see if you get suggestions back:

astromreplicell
astromReplicell 2 0: AastromReplicell, gastroepiploic

Two suggestions! Success!

I haven’t seen the dictionary in action within Blackboard, but our LMS Admin tells me that it worked after creating a new dictionary in Blackboard, uploading Hunspell, and then removing the existing default dictionary file.

Productive (design) meetings

“Design by committee” is rarely fun or productive. We’re in the middle of redesigning a small but critical service for our students and faculty. The goal is to make it simpler, more elegant, and require less decision-making on behalf of the user. Let’s just give them what they want.

The service is also used heavily by back-office staff, which presents a challenge. Then there are other staff who might want to promote other helpful services, albeit with good intentions, within the context of the one that we’re trying to simplify.

In the spirit of agile development, we always try to identify product owners and make them responsible for setting direction and prioritizing the backlog. In this case, we have not pinned down a clear product owner. To complicate matters, we have many stakeholders, but none solely looking out for the interests of the primary customers.

After sending out the first demo of the newly designed service and exchanging a few comments via email, some concerned staff wisely suggested that we meet. I asked for interested parties from among the stakeholders and got positive responses. I scheduled the meeting for the next day and I decided to run it myself. Like many organizations, we have a culture of less than effective meetings. I’m likely as guilty as the next person. Thanks to our history of agile planning sessions, Kanban work, and agile process study, I knew we could do better, though I didn’t know specifically how to run a good design meeting.

I did know that I wanted it focused, simple, pain-free, and even fun if possible. What does one call such a meeting? Google eventually led me to the magic words “design critique”. And from there I found Jake Knapp’s excellent article “9 rules for running productive design critiques“. In his article, Jake cites Scott Berkun’s essay, “#23 – How to run a design critique“. I was familiar with Berkun from his project management book, Making Things Happen. Berkun’s essay is certainly worth reading, but I owe Knapp for his succinct, last-minute inspiration and guidance.

As I only had 10 minutes before I’d have to sprint 4 blocks to the meeting, I skimmed Knapp’s article. By the time I got to Rule #6, Write It Before You Say It, I had a pretty good idea of how I wanted this to go down. It struck me that the principles of running a design critique don’t differ much from the composition of any good meeting: establishing some shared vision, stating the goals of what you want to accomplish together, making sure everyone gets a chance to be heard, engaging in focused, respectful discussion, making some decisions, leaving with some tasks, and laying a plan for the next meeting. I’d attempted bits and pieces of those and certainly recognized when they were missing.

That said, there are a couple of Knapp’s rules that I might consider more specific to the design process, such as identifying the design owner and avoiding pitching a design in the meeting. Rule #8, Don’t Design In The Meeting might not seem design-centric at first, but real estate developers, project managers, and financiers wouldn’t attempt to draw up blueprints during a meeting or give the engineer pointers about structural capacity… would they?

So I walked hastily to my meeting, armed with at least the beginnings of a plan. I talked a little about why the redesign. I broadly laid out my goals. I gave everyone 3 extra-large Post-Its and 5 minutes to write three critiques, concerns, or advantages of the current design. We discussed. We grouped and prioritized. Then we compared those against the redesign demo. I came away with design tasks. A couple of others went away with usability study tasks. It was productive. And it went well. Heck, it was even a little fun.

Next time, I’ll do some things differently. I’ll lay out my goals more specifically. I’ll ask more questions up front to make sure that we’re on the same page (even though I got a clear sense that everyone ultimately was willing to prioritize the customer’s needs over their own. We have some cool people like that). I’ll be clearer about the need to prioritize the tasks, and maybe employ some dot voting. I’ll wrap up more quickly and get confirmation that everyone feels heard and feels good about what we accomplished. Next time it will be better, and that’s iterative and incremental.


RDF with Jena and Fuseki (Part 3 of 3): Query RDF data in Jena and Fuseki

See Part 2: Validate and store RDF data in Jena and Fuseki

Keep these in mind

W3C SPARQL Query Language

SPARQL over HTTP

Get Started

Assuming we have a running Fuseki server and a modest dataset from Part 2, visit your Fuseki control panel (http://localhost:3030/control-panel.tpl), select the dataset, and post a query. This simple query selects all triples from the dataset as subject, predicate, object.

SELECT * {?s ?p ?o}

Run the same query using the SPARQL-over-HTTP command-line tools that come with Fuseki:

$ cd [path_to_fuseki]/jena-fuseki-1.0.0
$ ./s-query --service http://localhost:3030/ds/query 'SELECT * {?s ?p ?o}'

Back in the Fuseki control panel select all linkedAuthor and linkedInformationResource:

PREFIX core: <http://vivoweb.org/ontology/core#>
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT ?author ?work
WHERE
{
?x core:linkedAuthor ?author.
?x core:linkedInformationResource ?work
}

Gist at Github

…and using s-query:

$ ./s-query --service http://localhost:3030/ds/query 'PREFIX core: <http://vivoweb.org/ontology/core#> PREFIX bibo: <http://purl.org/ontology/bibo/> SELECT ?author ?work WHERE { ?x core:linkedAuthor ?author. ?x core:linkedInformationResource ?work }'

RDF with Jena and Fuseki (Part 2 of 3): Validate and Store RDF data in Jena and Fuseki

See Part 1: Installing Fuseki (with TDB) server and Jena command-line tools on OS X.

Keep these in mind

Reading and writing RDF in Jena

Get Started

I hear the WWW may have some data about published research:

$ cd jena-fuseki-1.0.0
$ curl -o ./Data/bibapp_works.rdf 'http://experts.kumc.edu/search?utf8=%E2%9C%93&q=mass+spectrometry&commit=Search&format=rdf'

It’s supposed to be RDF/XML. Let’s validate that using Jena’s CLI. Fuseki’s web UI has a validator, but it doesn’t support RDF/XML:

$ rdfxml --validate ./Data/bibapp_works.rdf

We’re getting to the fun part. Let’s push some RDF into a Fuseki dataset.

Normally, we should be able to use the SPARQL HTTP support with s-put. With our TDB datastore:

$ ./s-put -v http://localhost:3030/ds/data default ./Data/bibapp_works.rdf

With the in-memory datastore, I kept getting a NoMethodError on nil from Ruby. I suspect it has to do with a content-type or encoding declaration. If you run into the same problem, try using PUT with curl instead of s-put, like this:

$ curl -X PUT -H "Content-Type: application/rdf+xml" --data-binary @./Data/bibapp_works.rdf 'http://localhost:3030/ds/data?default'

Did it work? Retrieve the entire dataset serialized to RDF and paired with the graph IRI.

$ ./s-get http://localhost:3030/ds/data default

See Part 3: Query RDF data in Jena and Fuseki

RDF with Jena and Fuseki (Part 1 of 3): Installing Fuseki (with TDB) server and Jena command-line tools on OS X

Fuse-what?

This begins a 3-part series on learning to use Jena and Fuseki for parsing, validating, and querying RDF. This series grew out of the Linked Data Toolshare workshop, Code4Lib Midwest 2013, that I facilitated with Shawn Averkamp.

Apache Jena provides a Java framework and command-line tools for working with Linked Data. Fuseki is a SPARQL server that provides HTTP endpoints to your RDF data.  Together, they provide a rich and convenient toolset for building Semantic Web applications.

I assume you already have some RDF data. If not, no worries; I provide a source.

Keep these in mind

Jena command-line tools. We don’t need the Jena CLI tools to run Fuseki, but they include scripts for validating our data; notably, Fuseki doesn’t include a validator for RDF/XML.

Getting Started with Fuseki

Apache Jena binaries downloads

Get Started

Download and extract the Jena binary:

$ curl -O http://www.apache.org/dist/jena/binaries/apache-jena-2.11.0.tar.gz
$ tar xvfz apache-jena*.gz

Set environment variables for Jena command-line tools:

$ export JENAROOT=[/path_to_apache-jena-2.11.0]
$ export PATH=$PATH:$JENAROOT/bin

Verify:

$ sparql --version

Download and extract the Fuseki binary:

$ curl -O http://www.apache.org/dist/jena/binaries/jena-fuseki-1.0.0-distribution.tar.gz
$ tar xvfz jena-fuseki*.gz

Make fuseki-server executable:

$ cd jena-fuseki-1.0.0
$ chmod +x fuseki-server s-*

Start fuseki-server. The options below start fuseki-server with an in-memory, non-persistent datastore; it gets flushed when you stop the server. The --update option allows us to write to the datastore.

$ ./fuseki-server --update --mem /ds

A temporary datastore might come in handy some day, but since Fuseki comes with TDB built-in, let’s create a persistent datastore. You can pre-configure your TDB datastore to meet your needs, but we’re just going to take the basic included configuration. First, create a new directory ./DB to hold your datastores.

$ mkdir ./DB
$ ./fuseki-server --update --loc=DB /ds

See Part 2: Validate and Store RDF data in Jena and Fuseki

Solr and Mahout

I’ve just started experimenting with applying Mahout to analyze text in a Solr index. Mahout is a set of machine-learning tools built on Apache Hadoop, consisting of algorithms and utilities for clustering and classifying text and data. Recent versions of Solr include the Carrot2 clustering engine, which is very cool, but I specifically wanted to get acquainted with Hadoop and MapReduce.

There is already lots of helpful information out there on using Mahout with Solr. Grant Ingersoll’s post, Integrating Apache Mahout with Apache Lucene and Solr – Part I (of 3), got me started, but like many of the commenters, I was pining for the missing sequels. Next came Mayur Choubey’s helpful, straightforward outline, Cluster Apache Solr data using Apache Mahout. Finally, the Mahout wiki page Quick tour of text analysis using the Mahout command line filled in the remaining blanks.

Following are the steps and references I used to generate clusters from a BibApp Solr index.

Installation of Mahout and Hadoop was straightforward, although had I read through all the instructions before installing Mahout, I’d have known to skip running the tests. That would have saved a good chunk of time.

Step 1: Add termVectors to BibApp’s Solr text field in schema.xml.

<!--====Special Fields====-->
<!--'text' is used as default search field (see below)-->
<field name="text" type="text" indexed="true" stored="false" multiValued="true" termVectors="true"/>

Then, reindex Solr. I’ve slightly modified my BibApp to take better advantage of Solr multicore for reindexing, but it’s still a “standard” Solr index:

$ cd ~/development/BibApp; bundle exec rake solr:refresh_swap_index RAILS_ENV=development

Step 2: Run Mahout against the Solr index to generate a vector file and dictionary file.
$ bin/mahout lucene.vector --dir /Users/jstirnaman/development/BibApp/vendor/bibapp-solr/cores-data/development/core2/data/index/ --output my-data/bibapp-vectors --field text --idField id --dictOut my-data/bibapp-dictionary --norm 2

Step 3: Run Mahout’s kmeans clustering algorithm against the vector file. Incidentally, this step took the longest to figure out, as I didn’t know anything about providing cluster centroids (the -c parameter). As it turns out, if you supply both the -k and -c parameters, kmeans will put its own random seed vectors into the -c directory. The “Quick Tour of Text Analysis…” Mahout wiki page clued me in. Phew!
$ bin/mahout kmeans -i my-data/bibapp-mahout.vec -c my-data/bibapp-kmeans-centroids -cl -o my-data/bibapp-kmeans-clusters -k 20 -ow -x 10 -dm org.apache.mahout.common.distance.CosineDistanceMeasure

Here’s the tail end of the output. 16461 records sounds about right:

12/07/29 07:37:48 INFO mapred.JobClient: Job complete: job_local_0003
12/07/29 07:37:48 INFO mapred.JobClient: Counters: 9
12/07/29 07:37:48 INFO mapred.JobClient: File Output Format Counters
12/07/29 07:37:48 INFO mapred.JobClient: Bytes Written=27229499
12/07/29 07:37:48 INFO mapred.JobClient: File Input Format Counters
12/07/29 07:37:48 INFO mapred.JobClient: Bytes Read=25709664
12/07/29 07:37:48 INFO mapred.JobClient: FileSystemCounters
12/07/29 07:37:48 INFO mapred.JobClient: FILE_BYTES_READ=228198077
12/07/29 07:37:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=142916329
12/07/29 07:37:48 INFO mapred.JobClient: Map-Reduce Framework
12/07/29 07:37:48 INFO mapred.JobClient: Map input records=16461
12/07/29 07:37:48 INFO mapred.JobClient: Spilled Records=0
12/07/29 07:37:48 INFO mapred.JobClient: Total committed heap usage (bytes)=241053696
12/07/29 07:37:48 INFO mapred.JobClient: SPLIT_RAW_BYTES=138
12/07/29 07:37:48 INFO mapred.JobClient: Map output records=16461
12/07/29 07:37:48 INFO driver.MahoutDriver: Program took 68717 ms (Minutes: 1.1452833333333334)

Step 4: Use clusterdump to analyze the clusters. Mind the -dt (dictionary type) parameter: set it to “text” in our case; otherwise the command will fail, telling you that the dictionary file is not a sequence file.

$ bin/mahout clusterdump -d my-data/bibapp-dictionary -dt text -i my-data/bibapp-kmeans-clusters/clusters-2-final/part-r-00000 -o my-data/bibapp-kmeans-clusterdump -n 20 -b 100 -p my-data/bibapp-kmeans-clusters/clusteredPoints -e

And here are the Top Terms from my Clusterdump output:

{n=2370 c=[0:0.023, 00:0.001, 000:0.001, 0000:0.000, 000001:0.000, 000005:0.000, 000008:0.00
Top Terms:
a =>0.027360996873880702
patient =>0.025805192877985494
0 =>0.023186265779140857
cancer =>0.021687143137689265
age =>0.021376625634302378
p =>0.020841041796736622
were =>0.020806778810206115
diseas =>0.020105232473833737
n =>0.019341989276154777
r => 0.01866668922521172
from =>0.018322945487318585
measur =>0.018318213666784038
medicin => 0.01811168751056253
2 =>0.017878920778544888
use =>0.017718353777290936
studi =>0.017570063261710112
signific =>0.017489906172346706
health => 0.01743561515914853
increas =>0.017073525048242385
clinic => 0.01705333529210705

I need to tweak my clustering for sure, but it’s a start. I had forgotten that my text field in Solr contains truncated words for stemming. I’ll consider adding a new field to generate clusters from.

Agile and Scrum at 2011 KC Developer Conf

Some notes on Agile, Scrum, and development process from the 2011 Kansas City Developer Conference held at Johnson County Community College, Overland Park, KS.

Updated November 16, 2011 with notes from Wes’ BDD session.

Agile Methodologies (Martin Olson)

  • Deploy Early and Often -> Increases ROI
  • One size doesn’t fit all: we have a variety of available approaches. Evolve over time and get input from different disciplines (e.g. lean manufacturing)
  • Application of conversation and facilitation instead of preservation. Facilitation is the hardest part of Agile – fostering conversation but allowing the facilitator to jump in and ask hard questions.
  • Different agile approaches address different areas (people, technology, business, marketing). Wikipedia is a good source of info for these:
    • Extreme Programming (XP)
    • Scrum (gives clear visibility to developers and management, requires strict adherence to the process)
    • Agile Modeling (Scott Ambler at IBM, “A whiteboard is the most powerful tool you have. Design enough to know what you need to do and then quit.”)
    • Feature Driven Development
    • Crystal Family of Methodologies
    • Behavior Driven Development (See Wes Garrison’s talk)
    • Kanban
    • Lean

Kanban and Lean are getting more attention now. They can be applied to any organization but require more discipline to be successful. “If you’re working on more than one thing, you’re violating the process.” Check out the Kanban game played at the Agile Conference.

Which one’s for me?

  • Assess your work space and determine your strengths and weaknesses
  • Need to have a plan
  • Start small and build up
  • Get help if you’re unsure: agilekc.org

Factors to Consider When Determining to Use an Agile Process

Enabling factors:

  • Uncertain, changing and volatile requirements for the project
  • Responsible and motivated developers and customers
  • Customers who are willing to get involved
  • Organization able to support and trust the team

Disabling Factors:

  • Cultural norms that stress formal structure
  • Inexperienced development teams
  • Fixed development scope and a strict contract
  • Developers that don’t communicate or are not open to new ideas and processes (this will kill agility)
  • Customers that won’t share the project risk

Scrumbut and Fragile (Mark Randolph)

9 essentials of scrum

  • 3 Roles (developer, scrum master, owner)
  • 3 Planning Cycles
  • 3 Key Practices

3 Roles

Product Owner = pays for, specifies, accepts results; source of requirements; approves final product; intimate participant at all levels

Developers = develop product; all key skill sets are on the team (varies with project); self-organizing (scrum master is facilitator, NOT project manager), requires people to work well together (good processes permit people to work well together who would ordinarily not); hold huddles

ScrumMaster = facilitator, not manager; organizes and runs meetings (except huddles); obtains resources and removes obstacles; represents team to stakeholders; advocates for and protects team

3 Planning Cycles

Releases

Releases = list of features to deliver = user stories

  • Product Owner decides the most important feature (priority ranking: release 1.0, release 2.0)
  • 3-6 month time horizon driven by feature backlog
  • Gives schedule and budget forecasts

Sprints

Sprints = stuff you can get done in a fixed period of time (aka “timebox”)

  • Purpose is to ship a functional product
  • Sprint planning meeting outcome is to commit to deliver feature to product owner
  • Incremental deliveries (feature subsets)
  • 1 week to 1 month time horizon
  • Commits to specific features from backlog
  • Delivers potentially shippable code *every time*

Scrums

  • Huddle = 15 min. max. What did you accomplish? What are your obstacles? What are you going to work on today?
  • Purpose is to maintain focus on sprint
  • Held every day

3 Essential Practices

Feature-Driven Backlog = center of scrum; a list (a spreadsheet works fine); measured in “chunks” or “story points” (not hours); features vs. “artifacts”

  • SORS Velocity
  • SORS Burndown = work remaining measurement, trends. When you have a trend you know when you need to catch up and how much it will take.
  • Only features and artifacts (no tasks, no bugs)
  • Owner defines features
  • Owner sets priorities
  • Owner determines completion

Zero defects. Total bug-free product. It is possible => no *known* defects. The one metric that cannot be gamed is “running, tested, features”. Achieved by defining and enforcing the meaning of “done”. The feature is either done or not with NO partial credit.

Retrospection = what are we doing right? wrong? what can we do differently? what will we do? Reflection and change. Drives process improvement. Team-owned. Conducted as part of sprint

  • Can use measurements, backlog, etc. as visuals (“information radiator” (Bob Martin))

If it’s so easy, why is it so hard?

  • Middle management interference
  • No accountability of owners
  • Indecisive owners
  • Shortage of critical resource (DBA)
  • Failure to get commitment of resources

Pushing the rock uphill…moving up a notch

  • Create a scorecard (Agile Scorecard handouts, Nokia test (Jeff Sutherland)) = where you are vs. where you want to be
  • Create an improvement backlog
  • Never skip a retrospective
  • The only thing that really matters is running, tested, features

Behavior-Driven Development (Wes Garrison)

  • We all have clients. What are we selling them? Project success.
  • Why use BDD? Because not building the wrong thing saves us time.

User First

  • Feature (= “story” in Agile)
  • Why, Who, What

Why: Pop the why stack

  • Keep asking why until you get to the core need/value

Who: User roles (User, Admin, Visitor)

What: “I want to…”

  • Feature gets broken into scenarios.

Scenario:

  • Given => Initial setup
  • When => Steps
  • Then => Assertions

Working in the Domain Language (high-level)

  • Business users can participate.
  • Business users can submit as bug reports
  • Our job is to communicate
    • Collaborate in Google Docs, it’s easy.
    • Write some code and stop talking about it.

Development process

  • Scenarios are executable code.
  • Broken build = shame (make it fun, a game)
  • Red-Green-Refactor cycle

Highlights from Code4Lib 2011

Less bacon. More Hydra, Solr, and ugly animals.

These are some of the presentations that stood out to me at this year’s Code4Lib Conference in Bloomington. The back button ate my first draft, so here’s the abbreviated version.

Running Cloud Servers

VIVO Bootcamp

Visualizing Library Data

Hey, Dilbert. Where’s My Data?!

Enhancing the Mobile Experience: Mobile Library Services at Illinois

One Week, One Tool: Ultra-Rapid Open Source Development Among Strangers

VuFind Beyond MARC: Discovering Everything Else

A Community-Based Approach to Developing a Digital Exhibit at Notre Dame Using the Hydra Framework

Chicago Underground Library’s Community-Based Cataloging System

GIS on the Cheap

Let’s Get Small: A Microservices Approach to Library Websites

Building an Open Source Staff-Facing Tablet App for Library Assessment

Mendeley’s API and University Libraries: Three Examples to Create Value

Sharing Between Data Repositories

And, as always, the Lightning Talks showcased some of the most interesting projects and were frequently a laugh-riot. Danish guys are funny (http://www.indiana.edu/~video/stream/launchflash.html?format=MP4&folder=vic&filename=C4L2011_session_3b_20110209.mp4, 202:00).
