Multiple-Entry Key Overview
This page last modified on 28 June, 2006
Beyond the rough identification of plants that's possible by casually matching them against photographs in a guidebook, one needs a detailed "Key" to get the identification right. For most working botanists, the traditional "Dichotomous Key" format (called a "binary tree" in computer science jargon) is still favored. In California, the common authoritative reference to native flowering plants is the 1993 edition of The Jepson Manual (University of California Press), referred to as "TJM" hereafter. If you're not already familiar with the Genus to which an Asteraceae plant belongs, you have few reasonable alternatives to using the TJM dichotomous keys to identify your plant. But for amateurs (and even some working professional botanists), many of the TJM keys have a common flaw that prevents a successful conclusion without resorting to guesswork and/or a lot of unsuccessful dead-end searches.
In a dichotomous key, you start from a very general level that includes all possible species covered by the entire key. You then make a series of either-this-or-that (i.e. binary) decisions, each based on a pair of descriptions, where each side of the pair is meant to exclude all members of the other side. A search for the species identification may involve 20 to 30 such decisions. But if at any step you have no information on which to decide which side of a pair fits your plant, you've reached a road-block and cannot logically proceed!!
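To make the road-block concrete, here is a minimal Python sketch of a dichotomous key as a binary tree. The taxa ("Taxon A", "Taxon B", "Taxon C") and the two couplets are hypothetical placeholders, not taken from TJM. Each couplet is a node with two leads, and the walk stops as soon as a couplet cannot be answered.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Couplet:
    """One either-this-or-that decision in a dichotomous key."""
    question: str   # the character being compared, e.g. "rays"
    lead_a: str     # description for side "a"
    lead_b: str     # description for side "b"
    next_a: Node    # where side "a" leads: another couplet or a taxon name
    next_b: Node    # where side "b" leads


# A node is either another couplet or, at a leaf, the name of a taxon.
Node = Union[Couplet, str]


def run_key(node: Node, answer: Callable[[Couplet], Optional[str]]):
    """Walk the key from the root. `answer` returns "a", "b", or None when
    there is no information for that couplet. Returns (taxon or None, path)."""
    path = []
    while isinstance(node, Couplet):
        choice = answer(node)
        if choice is None:
            return None, path  # road-block: neither lead can be chosen
        path.append((node.question, choice))
        node = node.next_a if choice == "a" else node.next_b
    return node, path


# Hypothetical two-couplet key.
KEY = Couplet(
    "rays", "ray flowers present", "ray flowers absent",
    next_a=Couplet("pappus", "pappus of bristles", "pappus of scales or absent",
                   "Taxon A", "Taxon B"),
    next_b="Taxon C",
)


def from_observations(observed):
    """Answer each couplet from a dict of what you could actually see."""
    return lambda couplet: observed.get(couplet.question)


print(run_key(KEY, from_observations({"rays": "a", "pappus": "b"})))
# ('Taxon B', [('rays', 'a'), ('pappus', 'b')])
print(run_key(KEY, from_observations({"rays": "a"})))
# (None, [('rays', 'a')])
```

Note that in the second call only two taxa remain after the first couplet, yet the walk still cannot finish: the tree offers no way around a couplet you can't answer.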
Since the early days of punch cards for computer data processing, an alternative that avoids this road-block has been available for building identification keys. In effect, one uses a table (or "spreadsheet" in modern computer jargon) in which each row represents a plant species, and the value entered in each column represents the presence (or absence) of some characteristic of the plant (example: each flower has 5 petals). On the old punch cards, presence would be indicated by a hole in the appropriate card column, and absence by the absence of a hole. In a spreadsheet, you could use a 1 to indicate presence and a 0 to indicate absence. By sorting the spreadsheet on the values in such a column, you could separate the rows with 1s from those with 0s, and eliminate the rows that don't agree with your search criterion. Typically, a key built this way would have well over 100 such columns, each representing some common plant characteristic. There's no prescribed order for the progressive steps in which you sort on a single column and then eliminate the rows that don't agree with your search criterion, so you can use whatever order seems most logical to you based on what you know about the plant's characteristics.
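Here is a minimal sketch of the sort/eliminate step, assuming a tiny hypothetical table of four species and five characters (a real key would have well over 100 columns). Filtering on a column does "sort, then discard the non-matching rows" in a single pass.

```python
# Hypothetical presence/absence table: 1 = present, 0 = absent.
COLUMNS = ["rays_present", "pappus_bristles", "five_petals", "leaves_hairy", "stem_woody"]
TABLE = {
    "Species 1": [1, 1, 1, 0, 0],
    "Species 2": [1, 0, 1, 1, 1],
    "Species 3": [0, 1, 0, 1, 0],
    "Species 4": [1, 1, 0, 1, 1],
}


def eliminate(rows, column, value):
    """Keep only the rows whose entry in `column` equals `value`."""
    i = COLUMNS.index(column)
    return {name: vals for name, vals in rows.items() if vals[i] == value}


def search(criteria):
    """Apply a sequence of (column, value) criteria in the order given."""
    rows = TABLE
    for column, value in criteria:
        rows = eliminate(rows, column, value)
    return sorted(rows)


# The same three criteria applied in two different orders.
print(search([("rays_present", 1), ("leaves_hairy", 1), ("five_petals", 0)]))
# ['Species 4']
print(search([("five_petals", 0), ("rays_present", 1), ("leaves_hairy", 1)]))
# ['Species 4']
```

Because each elimination only removes rows, the final result does not depend on the order of the criteria, which is why you are free to start with whatever characteristic you are most sure of.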
The power of this multiple-entry approach is that lacking information about any single column is unlikely to block a successful conclusion, because you can still sort and eliminate using the data in other columns. The data is usually redundant, so there will usually be many combinations of a few columns that, when combined using this sort/eliminate procedure, lead to a unique conclusion of the search.
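The redundancy argument can be checked directly. This sketch repeats the same hypothetical four-species table so it stands alone. `identify()` simply leaves out any column you have no information about, and `identifying_sets()` (a hypothetical helper) lists every set of columns that alone singles out one species.

```python
from itertools import combinations

# Hypothetical presence/absence table: 1 = present, 0 = absent.
COLUMNS = ["rays_present", "pappus_bristles", "five_petals", "leaves_hairy", "stem_woody"]
TABLE = {
    "Species 1": [1, 1, 1, 0, 0],
    "Species 2": [1, 0, 1, 1, 1],
    "Species 3": [0, 1, 0, 1, 0],
    "Species 4": [1, 1, 0, 1, 1],
}


def identify(observed):
    """Return every species consistent with `observed` (column -> 0/1).
    Columns you have no information about are simply left out."""
    return [
        name for name, vals in TABLE.items()
        if all(vals[COLUMNS.index(col)] == v for col, v in observed.items())
    ]


def identifying_sets(species, size):
    """List every set of `size` columns whose values alone single out `species`."""
    vals = TABLE[species]
    found = []
    for cols in combinations(COLUMNS, size):
        observed = {col: vals[COLUMNS.index(col)] for col in cols}
        if identify(observed) == [species]:
            found.append(cols)
    return found


# Nothing known about "rays_present": the other four columns still suffice.
print(identify({"pappus_bristles": 1, "five_petals": 0, "leaves_hairy": 1, "stem_woody": 1}))
# ['Species 4']
print(identifying_sets("Species 4", 2))
# [('rays_present', 'five_petals'), ('pappus_bristles', 'stem_woody'), ('five_petals', 'stem_woody')]
```

Even in this toy table, three different pairs of characters each identify "Species 4" on their own, so losing any one character still leaves at least one route to the answer.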
If you have been following this space, you'll know that frustration with identifying Asteraceae plants I had photographed here in San Diego County led me to experiment with a multiple-entry key starting in mid-2005. I used a recent version of the MEKA tools (Multiple Entry Key Algorithm), which are available for download from the website of the Jepson Project at UC Berkeley. The resulting key, which I posted here, was successful to the extent that it let me reduce the time needed to identify an Asteraceae plant to just a few hours, complete with checking that the plant I identified this way matched all of the characteristics listed in TJM.
Details are still available on my Old MEKA/SLIKS based key. (But I plan to remove all but the overview documentation by the end of summer 2006 unless I get E-mail from anyone wishing the functioning bulk of this old key to remain on this website.)
But my preliminary key led to various frustrations, mainly because the descriptions and keys in TJM often lack any information on plant characteristics that seem important to me. Most frustrating is that the characteristics missing from TJM descriptions of species within a single genus tend to vary randomly, making it difficult to compare plants within that genus. Yet the manual gives no overt clue about which characteristics are missing for a particular species, so one has to read each description and genus-level key to find out which characteristics are important and which are missing. That's a very time-consuming exercise, so one tends to use the key without knowing which characteristics are missing for the plant to be identified. Often (but not always), selecting one of these missing characteristics as a search criterion results in throwing out the correct plant identification, a fact you can't easily discover without detailed study of the TJM descriptions and genus-level key (plus the species-level key if there are subspecies to consider).
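The missing-data trap can be shown with a small hypothetical table in which `None` marks a character that a species description simply doesn't mention. This is a generic illustration of the problem, not a description of how Lucid (or any particular tool) handles it: the naive filter reads "not mentioned" as "absent", while the tolerant filter keeps any species it cannot rule out.

```python
# Hypothetical table: None marks a character the description never mentions.
COLUMNS = ["rays_present", "pappus_bristles", "leaves_hairy"]
TABLE = {
    "Species 1": [1, 1, None],  # description is silent about leaf hairs
    "Species 2": [1, 0, 1],
    "Species 3": [0, 1, 1],
}


def eliminate(rows, column, value, missing_as_absent):
    """Keep rows matching `value` in `column`. A None entry is either read
    as 0 (naive) or kept because it cannot rule the species out (tolerant)."""
    i = COLUMNS.index(column)
    kept = {}
    for name, vals in rows.items():
        entry = vals[i]
        if entry is None:
            if missing_as_absent:
                entry = 0          # naive: "not mentioned" treated as "absent"
            else:
                kept[name] = vals  # tolerant: unknown never eliminates a species
                continue
        if entry == value:
            kept[name] = vals
    return kept


def search(criteria, missing_as_absent):
    rows = TABLE
    for column, value in criteria:
        rows = eliminate(rows, column, value, missing_as_absent)
    return sorted(rows)


# The plant in hand is really Species 1, and its leaves are hairy.
observed = [("rays_present", 1), ("leaves_hairy", 1), ("pappus_bristles", 1)]
print(search(observed, missing_as_absent=True))   # []
print(search(observed, missing_as_absent=False))  # ['Species 1']
```

With the naive reading, stopping after the first two criteria would confidently report "Species 2"; adding the third criterion empties the list entirely. Either way the correct species was thrown out at the "leaves_hairy" step, which is exactly the dead end described above.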
In recent years, there have been a number of efforts associated with the worldwide botany community to build multiple-entry key software tools that would achieve widespread acceptance for building and publishing plant identification keys. One of the leading groups producing such a tool is the Centre for Biological Information Technology (CBIT) at the University of Queensland in Brisbane, Australia. The latest version of their Lucid key building tool provides an elegant solution to the Missing Data problem. As a result, I've rebuilt my AsteraceaeSD key completely using Lucid version 3.3.
See Lucid-3 version key for details.
Though this Lucid-3 version key is mainly based on plant descriptions from TJM, it was necessary to interpret the many different descriptive terms that TJM authors used to mean the same thing. That made it possible for each botanic "character" (i.e. plant characteristic, called the "State" of some "Feature" in Lucid terminology) to mean the same thing throughout the key. Without this interpretation, the key would have been a lot larger and very confusing to use. Not being an experienced professional botanist, I've undoubtedly made mistakes in the interpretation, and I invite feedback from professionals to help me correct them.
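In data terms, this interpretation step amounts to a synonym table that maps each author's wording onto one standardized Feature/State pair. The terms and the mapping below are hypothetical examples for illustration only, not the vocabulary actually used in the AsteraceaeSD key; phrases the table doesn't know are set aside for a human to interpret rather than guessed.

```python
# Hypothetical synonym table: each wording found in species descriptions is
# mapped onto one standardized (Feature, State) pair.
SYNONYMS = {
    "glabrous": ("leaf surface", "hairs absent"),
    "hairless": ("leaf surface", "hairs absent"),
    "without hairs": ("leaf surface", "hairs absent"),
    "hairy": ("leaf surface", "hairs present"),
    "pubescent": ("leaf surface", "hairs present"),
}


def encode(descriptions):
    """Turn {species: [phrases]} into {species: {feature: state}}.
    Phrases not in SYNONYMS are collected for a human to interpret."""
    encoded, unresolved = {}, []
    for species, phrases in descriptions.items():
        states = {}
        for phrase in phrases:
            key = phrase.strip().lower()
            if key in SYNONYMS:
                feature, state = SYNONYMS[key]
                states[feature] = state
            else:
                unresolved.append((species, phrase))
        encoded[species] = states
    return encoded, unresolved


encoded, unresolved = encode({
    "Species 1": ["Glabrous"],
    "Species 2": ["hairy", "scabrous"],
})
print(encoded)
# {'Species 1': {'leaf surface': 'hairs absent'}, 'Species 2': {'leaf surface': 'hairs present'}}
print(unresolved)
# [('Species 2', 'scabrous')]
```

Mapping "pubescent" to a plain "hairs present" is deliberately coarse; that kind of judgment call is where interpretation mistakes can creep in.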
While The Jepson Manual presents the various botanic character values in a more or less standardized order, many authors have used their own unique ways to describe plant details. In effect, it's not really practical to encode all of these details (which may be essential to identifying a species) into the spreadsheet format in which multiple-entry key data is represented. (The Lucid Builder tool does permit adding small annotation files giving additional explanations where relevant.)
There is an international community of botanists working toward the goal of describing plants in standardized terms that multiple-entry keys can encode. (I'll add references here when time permits.) But it will be necessary for the great majority of professional botanists to accept this concept and to agree to use the standardized State terms that would make this possible. That is unlikely to happen until a number of really large, complex multiple-entry keys come into widespread use. Hopefully my Lucid-3 version key will be a small contribution toward this end.
-- Ken Bowles