Update: This has now been published in many more places: the original Nature article,
Discover Magazine, SEED, Geo, the Eastern European Edition and India, Brazil, etc.,
and more books and papers.
Since the image has proven to be popular (we still get print & publication
requests, four lears later!), I decided to rework the existing image layers, modifying
only the graphical representationno structureto see whether I could
improve the information layering and remedy some of the faults I outline at the bottom
of this page. Here's the new version:
I was surprised at how much clearer it could be made. I think the new one has many
of the structural faults I outlined below, but improves several things:
- Fewer distracting lines: nodes are now a single visual element rather than an outline and
a central dot. They’re also more object-like, being filled areas rather than outlines.
- Cleaner look: I gave up on representing all the individual papers as tiny white
dots (which required the background to be significantly darker than white)
- The labels are still the same shapeand still unnecessarily hairybut I pushed
them into the background by printing them in gray and drawing more attention to the
nodes with color. The results is that the structure of the data is much more prominent,
as it should be.
- I used a more modern typeface (Myriad Pro) and layout, backing off the more “antique
map” effect I was going for before. I still think 200-year-old engravings are
better at representing data than almost every contemporary InfoVis trick, but we can
learn from how they differentiated data categories visually and follow the same
processes within more current typographic fashion.
What follows is the original posting.
The journal Nature chose an image that spatially lays
out different areas of science in a plane. It is a reduction
of a large-format (42" x 43") paper print.
As to what the image depicts, it was constructed by sorting roughly 800,000 scientific papers (shown as white dots) into 776 different scientific paradigms (red circular nodes) based on how often the papers were cited together by authors of other papers. Links (curved lines) were made between the paradigms that shared common members, then treated as rubber bands, holding similar paradigms nearer one another when a physical simulation had every paradigm repel every other: thus the layout derives directly from the data. Larger paradigms have more papers. Labels list common words unique to each paradigm.
Research and node layout by Kevin Boyack and
Dick Klavans; data from Thompson ISI; graphics & typography by W. Bradford Paley. Commissioned and partially supported by Katy Borner and the Places and Spaces: Mapping Science exhibition.
Copyright (c) 2006 W. Bradford Paley, all rights reserved. (But you may print a version for personal use, and optionally donate to i|e or Places & Spaces if you want to be a good infosphere citizen.)
A sad proportion of “art/science” illustrations
are more show than tool, evoking but not helping to explain
the phenomena that are supposed to be their subjects.
This map
and variations on it are a distinct counterpoint: they
are used daily as a tool by Dick Klavans
to determine such things as which areas of science are most
closely connected to one another, are most and least
intellectually vital, or which scientific areas produce the
most patents. Here’s an image on which he and I collaborated
(again, with Kevin Boyack’s help & ISI’s data) that discusses
the variations in how different nations pursue science.
This second poster shows which scientific areas produce the
most patents.
Even at this gross reduction, you can see image variations relating to
how the US treats science (the large map: heavy in the Medical
Sciences at the lower left) and, say, China (top of the
rightmost column: heavy in Physics,
the nodes at the upper right). But Dick’s concise and illuminating
tours thorough the representation are a pleasure to read, and
a boon for a data illustrator like myself who loves to have
experts guide me through data sets to help me foreground
what’s important. (It’s always a warning sign in my mind when
someone says: here’s my datawhat can you find in it?...)
Click on the image to get a very large one (the original is
roughly 24" x 39" that lets you read the stories.
Some have shown interest in how the image
was created and what I think of it. Here are some notes.
Pictorial/communicative inspirations:
- Antique maps carefully wrapped text along features in the
geography: this save space (if the features were
spatially seperated the labels stayed in the vicinity of the
feature, not complicating the labling of other features)
- It also helped tie the label more firmly to the geographic
feature: I’d suggest that the cognitive binding of
perceptually-related objects (especially objects next to one
another and with similar shapes) is stronger than that of
labels that are merely nearby or point at the feature
- Note that for this image the features are only dots,
and are so tightly packed that I couldn’t do what I think
optimal: track the feature shape as a river name tracks the
river in antique maps, so I fall back at pointing at the node,
but at least don’t use callout lines
Algorithm description:
- Sometimes the pictorial description suggests an
implementation. I imagined the labels as flowing downhill
away from the nodesnow, how do I create that terrain?
- Here, I first thought of the base map
as on a tabletop and each node as a
tall spike (with height proportional to node size), coming up from the table
- Then I convolved those spikes with Gaussian bumps with a
very large Rho (or diameter) approximately 1/4 the size of the image
- (I later put in a smaller, steeper Gaussian on top (say, 1/20 the
width of the image), to help bend the label paths more strongly away
from nearby notes in clusters)
- Then I simply started a label path an M-width or so away from
the node and let it follow the descending gradient
- I stopped the path when is reached a local minimum
- (I later had it check all the local minima that distance
from the node center as starting points, in case the lowest
of them did not result in the longest path)
- Finally, to prevent label overlap, I started
with the most important (here, biggest)
notes on the blank page, and as I lay down each node’s label
path, I convolved a very small, very steep Gaussian (about
the diameter of twice the font height) with the whole path
so later paths would avoid it
Here’s what it looked like before that last step:
It has slightly less distracting label paths, but they overlap
unreadably in the densest areas (where one presumably most
wants to read them). This is what you’ll see in person
if you visit the Places & Spaces exhibition.
Current Strengths:
- Better (more even) use of space for labels than
simple linear placement, especially in dense areas
- No “callout lines” necessary: the text itself points
toward the node
- Dual representation for each node: large circle allows for
greater visual range to express the data’s variability;
small central dot anchors the node to space, overlaps
less, and provides a visual anchor for the text (being
roughly x-height & baseline aligned)while retaining
the easy ability to connect the large circle with the
small dot (I speculate we are wired for this by selective
pressures in favor of finding pupils in irises, splashes
in ripples, etc.)
- The more relaxed paths of the nodes may elicit a useful
intellectual frame of reference for certain kinds of
data: it resonates with historical geographic labelling
conventions and generally feels more organic: good for
organically- or flow-based data sets
Weaknesses:
- Much too organic and “hairy” looking: the overall effect
draws too much attention to the labelling algorithm and
away from the data (I hope to have the lines relatively
straight, or perhaps very simply curved, with just a
little hook at the end where needed to visually point to a node)
- Hard to read upside-down: I should reverse the letters
when that happens
- Too many overlaps & text paths that bump over earlier
(larger) ones
- The current paths are too choppy, even if we allow the
complex paths: I need more powerful software/computers to
manage all the moving points (thousands per path) at a
finer spatial resolution for better curves.
- I don’t like the red circles (though red is a good overlay
color historically); coupled with the “hairy” effect of
the current paths, it might evoke rashes or injuries
(Other colors do not separate as powerfully; I think the
unwanted resonances will go away when I tame the curves)
I seem to have plateaued on this project for now; other things
have assumed priority, and I hope at some point to
build a new tool set that
will allow easier my exploration of this sort of
work: force-directed placement following
cognitive and typographic
rules rather than simpleminded physical ones. This work should be
considered a first block’s walk in a mile journey.
|