From: James Kohl <email@example.com>
Date: Wed Sep 10, 2003 3:32:44 PM US/Pacific
To: John Shalf <firstname.lastname@example.org>
Subject: Re: DiVA Survey (Please return by Sept 10!)
O.K., here goes... wahoo... :)
1) Data Structures/Representations/Management==================
The center of every successful modular visualization architecture has
been a flexible core set of data structures for representing data that
is important to the targeted application domain. Before we can begin
working on algorithms, we must come to some agreement on common methods
(either data structures or accessors/method calls) for exchanging data
between components of our vis framework.
There are two potentially disparate motivations for defining the data
representation requirements. In the coarse-grained case, we need to
define standards for exchanging data between components in this
framework (interoperability). In the fine-grained case, we want to
define some canonical data structures that can be used within a
component -- one developed specifically for this framework. These two
use-cases may drive different sets of requirements and implementation
* Do you feel both of these use cases are equally important or
should we focus exclusively on one or the other?
I think for now we need to exclusively focus on exchanging data between
components, rather than any fine-grained generalized data objects...
The first order entry into any component development is to "wrap up
what ya got". The "rip things apart" phase comes after you can glue
all the coarse-grained pieces together reliably...
* Do you feel the requirements for each of these use-cases are
aligned or will they involve two separate development tracks?
Two separate development tracks. Definitely. There are different driving
design forces and they can be developed (somewhat) independently (I hope).
For instance, using "accessors" (method calls that provide abstract
access to essentially opaque data structures) will likely work fine for
the coarse-grained data exchanges between components, but will lead to
inefficiencies if used to implement algorithms within a particular
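A minimal Python sketch of that trade-off, assuming a toy 2D scalar field. The names `FieldAccessor`, `value_at`, and `raw_values` are hypothetical and not taken from any existing framework; the point is only that the accessor path hides the layout (fine for exchange between components) while the raw path avoids a method call per sample (what an inner loop would want).

```python
from typing import List, Sequence, Tuple


class FieldAccessor:
    """Hypothetical opaque field: callers only see method calls."""

    def __init__(self, values: Sequence[float], nx: int, ny: int) -> None:
        if len(values) != nx * ny:
            raise ValueError("values must hold nx * ny entries")
        self._values = list(values)  # internal row-major layout stays hidden
        self._nx = nx
        self._ny = ny

    def dims(self) -> Tuple[int, int]:
        return (self._nx, self._ny)

    def value_at(self, i: int, j: int) -> float:
        # Coarse-grained exchange: one call per sample, layout-agnostic.
        return self._values[j * self._nx + i]

    def raw_values(self) -> List[float]:
        # Fine-grained escape hatch: a copy of the underlying storage.
        return list(self._values)


def total_via_accessor(field: FieldAccessor) -> float:
    nx, ny = field.dims()
    return sum(field.value_at(i, j) for j in range(ny) for i in range(nx))


def total_via_raw(field: FieldAccessor) -> float:
    return sum(field.raw_values())
```

Both functions return the same total; they differ only in how many method calls the loop makes.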
* As you answer the "implementation and requirements" questions
below, please try to identify where coarse-grained and fine-grained use
cases will affect the implementation requirements.
What are the requirements for the data representations that must be
supported by a common infrastructure? We will start by answering Pat's
questions about representation requirements and follow up with
personal experiences involving particular domain scientist's
Must: support for structured data
Must/Want: support for multi-block data?
Must/Want: support for various unstructured data representations?
Must/Want: support for adaptive grid standards? Please be specific
about which adaptive grid methods you are referring to. Restricted
block-structured AMR (aligned grids), general block-structured AMR
(rotated grids), hierarchical unstructured AMR, or non-hierarchical
adaptive structured/unstructured meshes.
Must/Want: "vertex-centered" data, "cell-centered" data?
All of these should be "Wants", to the extent that they require more
sophisticated handling, or are less well-known in terms of generalizing
For example, the AMR folks have been trying to get together and define
a standard API, and have been as yet unsuccessful. Who are we to attempt
this where they have failed...?
So to clarify, if we *really* understand (or think we do) a particular
data representation/organization, or even a specific subset of a general
representation type, then by all means let's whittle an API into our stuff.
Otherwise, leave it alone for someone else to do, or do as strictly needed.
Must: support time-varying data, sequenced, streamed data?
Must/Want: higher-order elements?
Must/Want: Expression of material interface boundaries and other
special-treatment of boundary conditions.
Wants, see above...
* For commonly understood datatypes like structured and
unstructured, please focus on any features that are commonly overlooked in
typical implementations. For example, often data-centering is overlooked
in structured data representations in vis systems and FEM researchers
commonly criticize vis people for co-mingling geometry with topology
for unstructured grid representations. Few datastructures provide
proper treatment of boundary conditions or material interfaces. Please
describe your personal experience on these matters.
* Please describe data representation requirements for novel data
representations such as bioinformatics and terrestrial sensor datasets.
In particular, how should we handle more abstract data that is
typically given the moniker "information visualization".
I don't think we should "pee in this pool" either yet. Are any of us
experts in this kind of viz? Let's stick with what we collectively know
best and make that work before we try to tackle a related-but-fundamentally-
What do you consider the most elegant/comprehensive implementation for
data representations that you believe could form the basis for a
comprehensive visualization framework?
Sounds like the "Holy Grail" to me... If anything even remotely close to
this already existed, we'd all be using it already...
(Unless of course it's the dreaded NIH syndrome...)
* For instance, AVS uses entirely different datastructures for
structured, unstructured and geometry data. VTK uses class inheritance
to express the similarities between related structures. Ensight treats
unstructured data and geometry nearly interchangeably. OpenDX uses more
vector-bundle-like constructs to provide a more unified view of
disparate data structures. FM uses data-accessors (essentially keeping
the data structures opaque).
* Are there any of the requirements above that are not covered by
the structure you propose?
* This should focus on the elegance/usefulness of the core
design-pattern employed by the implementation rather than a
point-by-point description of the implementation!
* Is there information or characteristics of particular file format
standards that must percolate up into the specific implementation of
the in-memory data structures?
I dunno, but what does HDF5 or NetCDF include? We should definitely be
able to handle various meta-data...
Otherwise, our viz framework should be able to read in all sorts of
file-based data as input, converting it seamlessly into our "Holy Data
Grail" format for all the components to use and pass around. But the
data shouldn't be identifiable as having once been HDF or NetCDF, etc...
(i.e. it's important to read the data format, but not to use it internally)
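A rough sketch of that "read anything, keep nothing of the file format" idea. Everything here is an assumption for illustration: `Dataset`, `READERS`, and `load` are made-up names, and JSON and CSV stand in for HDF5 and NetCDF only because they can be read with the standard library.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Dataset:
    """Hypothetical format-neutral container passed between components."""
    values: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)


def read_json(text: str) -> Dataset:
    doc = json.loads(text)
    attrs = doc.get("attrs", {})
    return Dataset(values=[float(v) for v in doc["values"]],
                   metadata={k: str(v) for k, v in attrs.items()})


def read_csv(text: str) -> Dataset:
    rows = list(csv.reader(io.StringIO(text)))
    # Assumed layout: first row holds "key=value" metadata cells,
    # second row holds the data values.
    metadata = dict(cell.split("=", 1) for cell in rows[0])
    return Dataset(values=[float(v) for v in rows[1]], metadata=metadata)


# One reader per input format; all of them produce the same Dataset type.
READERS: Dict[str, Callable[[str], Dataset]] = {
    "json": read_json,
    "csv": read_csv,
}


def load(fmt: str, text: str) -> Dataset:
    return READERS[fmt](text)
```

Once loaded, the two datasets compare equal: the metadata survives, the origin format does not.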
For the purpose of this survey, "data analysis" is defined broadly as
all non-visual data processing done *after* the simulation code has
finished and *before* "visual analysis".
* Is there a clear dividing line between "data analysis" and "visual
NO. There shouldn't be - these operations are tightly coupled, or even
symbiotic, and *should* all be incorporated into the same framework,
indistinguishable from each other.
* Can we (should we) incorporate data analysis functionality into
this framework, or is it just focused on visual analysis.
* What kinds of data analysis typically needs to be done in your
Simple sampling, basic statistical averages/deviations, principal component
analysis (PCA, or EOF for climate folks), other dimension reduction.
Please give examples and how these functions are currently
C/C++ code... mostly slow serial... :-Q
* How do we incorporate powerful data analysis functionality into
As components (duh)... :-)
We should define some "standard" APIs for the desired analysis functions,
and then either wrap existing codes as components or shoehorn in existing
component implementations from systems like ASPECT.
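One way the "standard API plus wrappers" approach might look, as a hedged sketch. `AnalysisComponent`, `WrappedFunction`, and `run_all` are hypothetical names, and the routines being wrapped are simply Python's `statistics` functions standing in for existing analysis codes.

```python
import statistics
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Sequence


class AnalysisComponent(ABC):
    """Hypothetical 'standard' API every analysis component exposes."""

    @abstractmethod
    def analyze(self, samples: Sequence[float]) -> Dict[str, float]:
        ...


class WrappedFunction(AnalysisComponent):
    """Wraps an existing routine as a component without changing it."""

    def __init__(self, name: str,
                 func: Callable[[Sequence[float]], float]) -> None:
        self._name = name
        self._func = func

    def analyze(self, samples: Sequence[float]) -> Dict[str, float]:
        return {self._name: self._func(samples)}


def run_all(components: Iterable[AnalysisComponent],
            samples: Sequence[float]) -> Dict[str, float]:
    # The caller only knows the common API, not what is behind it.
    results: Dict[str, float] = {}
    for component in components:
        results.update(component.analyze(samples))
    return results
```

For example, `run_all([WrappedFunction("mean", statistics.mean), WrappedFunction("stdev", statistics.pstdev)], data)` collects both statistics through the same interface.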
2) Execution Model=======================
It will be necessary for us to agree on a common execution semantics
for our components. Otherwise, we might have compatible data
structures but incompatible execution requirements. Execution
semantics is akin to the function of protocol in the context of network
serialization of data structures. The motivating questions are as
* How is the execution model affected by the kinds of
algorithms/system-behaviors we want to implement.
Directly. There are probably a few main exec models we want to cover.
I don't think the list is *that* long...
As such, we should anticipate building several distinct framework
environments that each exclusively support a given exec model. Then
the trick is to "glue" these individual frameworks together so they can
interoperate (exchange data and invoke each others' component methods)
and be arbitrarily "bridged" together to form complex higher-level
pipelines or other local/remote topologies.
* How then will a given execution model affect data structure
I don't think it should affect the data structure impls at all, per se.
Clearly, the access patterns will be different for various execution models,
but this shouldn't change the data impl. Perhaps a better question is
how to indicate the expected access pattern to allow a given data impl
to optimize or properly prefetch/cache the accesses...
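A small sketch of what such an access-pattern hint could look like. `AccessPattern`, `BlockStore`, and `hint` are invented for illustration, and the `loads` counter only simulates trips to slow storage; the stored blocks are identical in both cases, only the prefetch policy changes.

```python
from enum import Enum
from typing import Dict, List


class AccessPattern(Enum):
    SEQUENTIAL = "sequential"
    RANDOM = "random"


class BlockStore:
    """Hypothetical data object: same storage, hint-dependent prefetch."""

    def __init__(self, blocks: List[bytes]) -> None:
        self._blocks = blocks
        self._cache: Dict[int, bytes] = {}
        self._readahead = 0
        self.loads = 0  # counts simulated trips to slow storage

    def hint(self, pattern: AccessPattern) -> None:
        # Sequential readers get two extra blocks per trip; others get none.
        self._readahead = 2 if pattern is AccessPattern.SEQUENTIAL else 0

    def read(self, index: int) -> bytes:
        if index not in self._cache:
            self.loads += 1
            last = min(index + self._readahead, len(self._blocks) - 1)
            for i in range(index, last + 1):
                self._cache[i] = self._blocks[i]
        return self._cache[index]
```

Reading six blocks in order costs two simulated trips with the sequential hint and six without it.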
* How will the execution model be translated into execution
semantics on the component level. For example will we need to implement
special control-ports on our components to implement particular execution
models or will the semantics be implicit in the way we structure the
method calls between components.
Components should be "dumb" and let other components or the framework invoke
them as needed for a given execution model. The framework dictates the
control flow, not the component. The API shouldn't change.
If you want multi-threaded components, then the framework better support
that, and the API for the component should take the possibility into account.
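To illustrate "dumb" components with framework-owned control flow, here is a minimal sketch. The component signature and the two toy frameworks (`run_push`, `run_pull`) are assumptions made up for this example; the same two components run unchanged under both.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

# Hypothetical component API: one value in, one value out, no control logic.
Component = Callable[[float], float]


def scale(x: float) -> float:
    return 2.0 * x


def offset(x: float) -> float:
    return x + 1.0


def run_push(pipeline: Sequence[Component],
             inputs: Iterable[float]) -> List[float]:
    """Eager framework: every input is driven through all stages right away."""
    results = []
    for value in inputs:
        for stage in pipeline:
            value = stage(value)
        results.append(value)
    return results


def run_pull(pipeline: Sequence[Component],
             inputs: Iterable[float]) -> Iterator[float]:
    """Lazy framework: stages run only when a consumer asks for output."""
    for value in inputs:
        for stage in pipeline:
            value = stage(value)
        yield value
```

Neither `scale` nor `offset` knows which framework invoked it, which is the property being argued for.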
What kinds of execution models should be supported by the distributed
* View dependent algorithms? (These were typically quite difficult
to implement for dataflow visualization environments like AVS5).
* Out-of-core algorithms
Must. This is a necessary evil of "big data". You need some killer
caching infrastructure throughout the pipeline (e.g. like VizCache).
* Progressive update and hierarchical/multiresolution algorithms?
* Procedural execution from a single thread of control (ie. using a
commandline language like IDL to interactively control a dynamic or
large parallel back-end)
This is not an execution model, it is a command/control interface issue.
You should be able to have a GUI, programmatic control, or scripting to
dictate interactive control (or "steering" as they call it... :-). The
internal software organization shouldn't change, just the interface to
the outside (or inside) world...
* Dataflow execution models?
What is the firing method that should
be employed for a dataflow pipeline? Do you need a central executive like
AVS/OpenDX or, completely distributed firing mechanism like that of
VTK, or some sort of abstraction that allows the modules to be used
with either executive paradigm?
This should be an implementation issue in the "dataflow framework", and
should not affect the component-level APIs.
* Support for novel data layouts like space-filling curves?
Must. But this isn't an execution model either. It's a data structure
or algorithmic detail...
* Are there special considerations for collaborative applications?
Surely. The interoperability of distinct framework implementations
ties in with this... but the components shouldn't be aware that they
are being run collaboratively/remotely... definitely a framework issue.
* What else?
How will the execution model affect our implementation of data
It shouldn't. The execution model should be kept independent of the
data structures as much as possible.
If you want to build higher-level APIs for specific data access patterns
that's fine, but keep the underlying data consistent where possible.
* how do you decompose a data structure such that it is amenable to
streaming in small chunks?
This sounds a lot like distributed data decompositions. I suspect that
given a desired block/cycle size, you can organize/decompose data in all
sorts of useful ways, depending on the expected access pattern.
In conjunction with this, you could also reorganize static datasets
into filesystem databases, with appropriate naming conventions or
perhaps a special protocol for lining up the data blob files in the
desired order for streaming (in either time or space along any axis).
Meta-data in the files might be handy here, too, if it's indexed
efficiently for fast lookup/searching/selection.
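A rough sketch of block decomposition along one axis plus a file naming convention. The function names and the `dataset.tNNNNN.bNNNN.dat` pattern are hypothetical, chosen only so that lexical order of the file names matches streaming order in time and then space.

```python
from typing import Iterator, Tuple


def blocks_along_axis(extent: int,
                      block_size: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (index, start, stop) half-open slabs covering [0, extent)."""
    if extent < 0 or block_size <= 0:
        raise ValueError("extent must be >= 0 and block_size > 0")
    index = 0
    for start in range(0, extent, block_size):
        # The last slab is clipped so it never runs past the extent.
        yield index, start, min(start + block_size, extent)
        index += 1


def blob_name(dataset: str, step: int, index: int) -> str:
    # Zero padding keeps lexical order identical to (time, block) order.
    return f"{dataset}.t{step:05d}.b{index:04d}.dat"
```

For an axis of 10 cells and a block size of 4, this yields slabs `[0, 4)`, `[4, 8)`, and `[8, 10)`; the last block is simply shorter.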
* how do you represent temporal dependencies in that model?
Meta-data, or file naming conventions...
* how do you minimize recomputation in order to regenerate data for
What are the execution semantics necessary to implement these execution
* how does a component know when to compute new data? (what is the
There are really only 2 possibilities I can see - either a component is
directly invoked by another component or the framework, or else a method
must be triggered by some sort of dataflow dependency or stream-based
* does coordination of the component execution require a central
executive or can it be implemented using only rules that are local to a
This is a framework implementation detail. No. No. Bad Dog.
The component doesn't know what's outside of it (in the rest of the
framework, or the outside world). It only gets invoked, one way or
* how elegantly can execution models be supported by the proposed
execution semantics? Are there some things, like loops or
back-propagation of information that are difficult to implement using a
particular execution semantics?
We need to keep the different execution models separate, as implementation
details of individual frameworks. This separates the concerns here.
How will security considerations affect the execution model?
Ha ha ha ha...
They won't right away, except in collaboration scenarios.
Think "One MPI Per Framework" and do things the old fashioned way
locally, then do the "glue" for inter-framework connectivity with
proper authentication only as needed. (No worse than Globus... :-)
3) Parallelism and load-balancing=================
Thus far, managing parallelism in visualization systems has been
tedious and difficult at best. Part of this is a lack of powerful
abstractions for managing data-parallelism, load-balancing and
Please describe the kinds of parallel execution models that must be
supported by a visualization component architecture.
* data-parallel/dataflow pipelines?
* master/slave work-queues?
* streaming update for management of pipeline parallelism?
* chunking mechanisms where the number of chunks may be different
from the number of CPU's employed to process those chunks?
This sounds the same as master/slave to me, as in "bag of tasks"...
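A minimal "bag of tasks" sketch using Python's standard thread pool, where the number of chunks differs from the number of workers. `process_chunk` is a placeholder for real per-chunk work, and threads are used here only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def process_chunk(chunk: Sequence[int]) -> int:
    # Stand-in for real per-chunk work (e.g. filtering or contouring).
    return sum(v * v for v in chunk)


def run_bag_of_tasks(chunks: Sequence[Sequence[int]],
                     workers: int) -> List[int]:
    # Each free worker picks up the next pending chunk; map() returns
    # the results in the order the chunks were submitted.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))
```

With four chunks and two workers, each worker simply takes another chunk when it finishes one, so the chunk count never has to match the CPU count.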
* how should one manage parallelism for interactive scripting
languages that have a single thread of control? (eg. I'm using a
commandline language like IDL that interactively drives an arbitrarily
large set of parallel resources. How can I make the parallel back-end
available to a single-threaded interactive thread of control?)
Broadcast, Baby... Either you blast the commands out to everyone SIMD
style (unlikely) or else you talk to the Rank 0 task and the command
gets forwarded on a fast internal network.
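The "talk to Rank 0 and forward" option can be sketched without any real parallel runtime. The `Task` and `ParallelBackend` classes below are stand-ins invented for this example; in a real back-end the forwarding loop would be a broadcast over the internal network (for instance an MPI broadcast) rather than a Python loop.

```python
from typing import List


class Task:
    """Stand-in for one parallel process in the back-end."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.log: List[str] = []

    def execute(self, command: str) -> None:
        self.log.append(command)


class ParallelBackend:
    def __init__(self, size: int) -> None:
        self.tasks = [Task(rank) for rank in range(size)]

    def submit(self, command: str) -> None:
        """The single-threaded front-end only ever talks to rank 0."""
        root = self.tasks[0]
        root.execute(command)
        # Rank 0 forwards the command; this loop simulates the broadcast.
        for peer in self.tasks[1:]:
            peer.execute(command)
```

The interactive front-end sees one endpoint, and every task still ends up executing the same command stream.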
Please describe your vision of what kinds of software support /
programming design patterns are needed to better support parallelism
and load balancing.
* What programming model should be employed to express parallelism.
(UPC, MPI, SMP/OpenMP, custom sockets?)
All but UPC will be necessary for various functionality.
* Can you give some examples of frameworks or design patterns that
you consider very promising for support of parallelism and load balancing.
(ie. PNNL Global Arrays or Sandia's Zoltan)
Nope, that covers my list of hopefuls.
* Should we use novel software abstractions for expressing
parallelism or should the implementation of parallelism simply be an
opaque property of the component? (ie. should there be an abstract
messaging layer or not)
It's not our job to develop "novel" parallelism abstractions. We should
just use existing abstractions like what the CCA is developing.
* How does the NxM work fit in to all of this? Is it sufficiently
differentiated from Zoltan's capabilities?
I don't know what Zoltan can do specifically, but MxN is designed for
basic "parallel data redistribution". This means it is good for doing
big parallel-to-parallel data movement/transformations among two disparate
parallel frameworks, or between two parallel components in the same
framework with different data decompositions. MxN is also good for
"self-transpose" or other types of local data reorganization within a
given (parallel) component.
MxN doesn't do interpolation in space or time (yet, probably for a while),
and it won't wash your car (but it won't drink your beer either... :-).
If you need something fancier, or if you don't really need any data
reorganization between the source and destination of a transfer, then
MxN *isn't* for you...
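To make "parallel data redistribution" concrete, here is a sketch that computes which index ranges must move when a 1D array split across M source tasks is re-split across N destination tasks. This is not the CCA MxN API; `block_ranges` and `transfer_schedule` are hypothetical helpers that only illustrate the overlap computation for simple contiguous block decompositions.

```python
from typing import List, Tuple


def block_ranges(n_items: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split [0, n_items) into n_parts contiguous half-open ranges."""
    base, extra = divmod(n_items, n_parts)
    ranges, start = [], 0
    for part in range(n_parts):
        # The first `extra` parts get one additional item each.
        stop = start + base + (1 if part < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def transfer_schedule(n_items: int, m_sources: int,
                      n_dests: int) -> List[Tuple[int, int, int, int]]:
    """List (source, dest, start, stop) for every non-empty overlap."""
    schedule = []
    dest_ranges = block_ranges(n_items, n_dests)
    for s, (s0, s1) in enumerate(block_ranges(n_items, m_sources)):
        for d, (d0, d1) in enumerate(dest_ranges):
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:
                schedule.append((s, d, lo, hi))
    return schedule
```

For 12 items going from 3 sources to 2 destinations, source 1 has to split its block between both destinations, while sources 0 and 2 each send to exactly one.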
===============End of Mandatory Section (the rest is
4) Graphics and Rendering=================
What do you use for converting geometry and data into images (the
rendering-engine). Please comment on any/all of the following.
* Should we build modules around declarative/streaming methods for
rendering geometry like OpenGL, Chromium and DirectX or should we move
to higher-level representations for graphics offered by scene graphs?
What are the pitfalls of building our component architecture around
* What about Postscript, PDF and other scale-free output methods for
publication quality graphics? Are pixmaps sufficient?
In a distributed environment, we need to create a rendering subsystem
that can flexibly switch between drawing to a client application by
sending images, sending geometry, or sending geometry fragments
(image-based rendering)? How do we do that?
I would think this could be achieved by a sophisticated data communication
protocol - one that encodes the type of data in the stream, say, using XML
or some such thingy.
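A hedged sketch of a stream message that carries its own payload type in XML. The `vizmsg` element, the `kind` attribute, and the three kind names are assumptions for illustration, not an existing protocol; the payload is base64-encoded so binary data survives inside the XML text.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Tuple

KINDS = ("image", "geometry", "fragment")  # hypothetical payload types


def encode_message(kind: str, payload: bytes) -> str:
    if kind not in KINDS:
        raise ValueError(f"unknown payload kind: {kind}")
    # The 'kind' attribute tells the client how to interpret the payload.
    root = ET.Element("vizmsg", {"kind": kind})
    root.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def decode_message(text: str) -> Tuple[str, bytes]:
    root = ET.fromstring(text)
    return root.get("kind"), base64.b64decode(root.text or "")
```

A client can then switch between drawing images and drawing geometry purely on the decoded `kind`, without the sender and receiver agreeing on it in advance.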
* Please describe some rendering models that you would like to see
supported (ie. view-dependent update, progressive update) and how they
would adjust dynamically to changing objective functions (optimize for
fastest framerate, or fastest update on geometry change, or varying
workloads and resource constraints).
* Are there any good examples of such a system?
What is the role of non-polygonal methods for rendering (ie. shaders)?
* Are you using any of the latest gaming features of commodity cards
in your visualization systems today?
* Do you see this changing in the future? (how?)
It will be necessary to separate the visualization back-end from the
presentation interface. For instance, you may want to have the same
back-end driven by entirely different control-panels/GUIs and displayed
in different display devices (a CAVE vs. a desktop machine). Such
separation is also useful when you want to provide different
implementations of the user-interface depending on the targeted user
community. For instance, visualization experts might desire a
dataflow-like interface for composing visualization workflows whereas a
scientist might desire a domain-specific dashboard-like interface
that implements a specific workflow. Both users should be able to
share the same back-end components and implementation even though the
user interface differs considerably.
How do different presentation devices affect the component model?
Not. The display device only affects resolution or bandwidth required.
This could be parameterized in the component invocations APIs, but
should not otherwise change an individual component.
If you want a "multiplexer" to share a massive data stream with a powerwall
and a PDA, then the "multiplexer component" implementation handles that...
* Do different display devices require completely different user
No. Different GUIs should all map to some common framework command/control
interface. The same functions will ultimately get executed, just from buttons
with different labels or appl-specific short-cuts... The UIs should all be
independent, but talk the same protocol to the framework.
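A small sketch of two unrelated front-ends speaking the same command protocol. The dictionary-based command format, the `Framework` class, and both UI functions are hypothetical; the only point is that different buttons or gestures reduce to the same message.

```python
import math
from typing import Dict, List

Command = Dict[str, object]  # hypothetical wire format shared by every UI


class Framework:
    """Receives commands; it has no idea which UI produced them."""

    def __init__(self) -> None:
        self.executed: List[Command] = []

    def handle(self, command: Command) -> None:
        self.executed.append(command)


def desktop_button_rotate(framework: Framework, degrees: float) -> None:
    # Desktop GUI: a button labelled in degrees.
    framework.handle({"op": "rotate", "degrees": degrees})


def cave_wand_twist(framework: Framework, twist_radians: float) -> None:
    # CAVE UI: a wand gesture measured in radians, mapped to the same command.
    framework.handle({"op": "rotate", "degrees": math.degrees(twist_radians)})
```

Both calls end up as a `rotate` command in the framework's queue, so adding a third UI would not touch the back-end.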
If so, then we must define a clear separation
between the GUI description and the components performing the back-end
computations. If not, then is there a common language to describe user
interfaces that can be used across platforms?
* Do different display modalities require completely different
component/algorithm implementations for the back-end compute engine?
(what do we do about that??)
Algorithm maybe, component no. This could fall into the venue of the
different execution-model-specific frameworks and/or their bridging...
What Presentation modalities do you feel are important and what do you
consider the most important.
* Desktop graphics (native applications on Windows, on Macs)
* Graphics access via Virtual Machines like Java?
Ha ha ha ha...
* CAVEs, Immersadesks, and other VR devices
* Ultra-high-res/Tiled display devices?
* Web-based applications?
Probably a good idea. Someone always asks for this... :-Q
What abstractions do you think should be employed to separate the
presentation interface from the back-end compute engine?
Some sort of general protocol descriptor, like XML...? Nuthin fancy.
* Should we be using CCA to define the communication between GUI and
compute engine or should we be using software infrastructure that was
designed specifically for that space? (ie. WSDL, OGSA, or CORBA?)
The CCA doesn't do such communication per se. Messaging between or in/out
of frameworks is always "out of band" relative to CCA port invocations.
If the specific framework impl wants to shove out data on some wire,
then it's hidden below the API level...
I would think that WSDL/SOAP would be O.K. for low-bandwidth uses.
* How do such control interfaces work with parallel applications?
Should the parallel application have a single process that manages the
control interface and broadcasts to all nodes or should the control
interface treat all application processes within a given component as
I vote for the "single process that manages the control interface and
broadcasts to all nodes" (or the variation above, where one of the
parallel tasks forwards to the rest internally :-). The latter is
BTW, you can't have "application processes within a... component".
What does that even mean?
Usually, an application "process" consists of a collection of one or
more components that have been composed with some specific connectivity...
6) Basic Deployment/Development Environment Issues============
One of the goals of the distributed visualization architecture is
seamless operation on the Grid -- distributed/heterogeneous collections
of machines. However, it is quite difficult to realize such a vision
without some consideration of deployment/portability issues. This
question also touches on issues related to the development environment
and what kinds of development methods should be supported.
What languages do you use for core vis algorithms and frameworks.
* for the numerically intensive parts of vis algorithms
* for the glue that connects your vis algorithms together into an
* How aggressively do you use language-specific features like C++
RUN AWAYYYY!!! These are not consistent across o.s./arch/compiler yet.
* is Fortran important to you? Is it important that a framework
support it seamlessly?
Fortran is crucial for many application scientists. It is not directly
useful for the tools I build.
But if you want to ever integrate application code components directly
into a viz framework, then you better not preclude this... (or Babel...)
* Do you see other languages becoming important for visualization
(ie. Python, UPC, or even BASIC?)
What platforms are used for data analysis/visualization?
* What do you and your target users depend on to display results?
(ie. Windows, Linux, SGI, Sun etc..)
All of the above (not so much Sun any more...).
* What kinds of presentation devices are employed (desktops,
portables, handhelds, CAVEs, Access Grids, WebPages/Collaboratories)
and what is their relative importance to active users.
All but handhelds are important, mostly desktops, CAVEs/hi-res and AG,
in decreasing order.
* What is the relative importance of these various presentation
methods from a research standpoint?
CAVEs/hi-res and AG are worthwhile research areas. The rest can be
woven in or incorporated more easily.
* Do you see other up-and-coming visualization platforms in the
Yes, but I haven't figured out where exactly to stick the chip behind
my ear for the virtual holodeck equipment... :)
Tell us how you deal with the issue of versioning and library
dependencies for software deployment.
* For source code distributions, do you bundle builds of all related
libraries with each software release (ie. bundle HDF5 and FLTK source
with each release).
No, but provide web links or separate copies of dependent distributions
next to our software on the web site...
Too ugly to include everything in one big bundle, and not as efficient
as letting the user download just what they need. (As long as everything
you need is centrally located or accessible...)
* What methods are employed to support platform independent builds
(cmake, imake, autoconf). What are the benefits and problems with this
Mostly autoconf so far. My student thinks automake and libtool are "cool"
but we haven't used them yet...
* For binaries, have you had issues with different versions of
libraries (ie. GLIBC problems on Linux and different JVM
implementations/versions for Java). Can you tell us about any
sophisticated packaging methods that address some of these problems
(RPM need not apply)
Just say no. Open Source is the way to go, with a small set of "common"
binaries just for yuks. Most times the binaries won't work with the
specific run-time libs anyway...
* How do you handle multiplatform builds?
Autoconf, shared source tree, with arch-specific subdirs for object files,
libs and executables.
How do you (or would you) provide abstractions that hide the locality
of various components of your visualization/data analysis application?
I would use "proxy" components that use out-of-band communication to
forward invocations and data to the actual component implementation.
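A minimal sketch of the proxy idea. `Isosurface`, `LoopbackTransport`, and `IsosurfaceProxy` are made-up names, and the transport here just serializes the call to JSON and dispatches it in-process; a real one would ship the request over whatever out-of-band channel the framework provides.

```python
import json


class Isosurface:
    """The 'real' component; in practice it could live on another machine."""

    def extract(self, level: float) -> str:
        return f"isosurface at {level}"


class LoopbackTransport:
    """Stand-in for out-of-band messaging: decode the call, then dispatch."""

    def __init__(self, target: object) -> None:
        self._target = target

    def call(self, request: str) -> str:
        msg = json.loads(request)
        method = getattr(self._target, msg["method"])
        return json.dumps(method(*msg["args"]))


class IsosurfaceProxy:
    """Same interface as Isosurface, so callers cannot tell it is remote."""

    def __init__(self, transport: LoopbackTransport) -> None:
        self._transport = transport

    def extract(self, level: float) -> str:
        request = json.dumps({"method": "extract", "args": [level]})
        return json.loads(self._transport.call(request))
```

Code that holds an `IsosurfaceProxy` calls `extract` exactly as it would on the local component, which is what hides the locality.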
* Does anyone have ample experience with CORBA, OGSA, DCOM, .NET,
RPC? Please comment on advantages/problems of these technologies.
* Do web/grid services come into play here?
Yuk, I hope not.
7) Collaboration ==========================
If you are interested in "collaborative applications" please define
the term "collaborative". Perhaps provide examples of collaborative
"Collaborative" is 2 or more geographically remote teams, sharing one
common viz environment, with shared control and full telepresence.
(Note: by this definition, "collaborative" does not yet exist... :-)
Is collaboration a feature that exists at an application level or are
there key requirements for collaborative applications that necessitate
Collaboration should exist *above* the application level, either outside
the specific framework or as part of the framework "bridging" technology.
* Should collaborative infrastructure be incorporated as a core
feature of every component?
NO. Let the framework proxy to collaborative capabilities...
* Can any conceivable collaborative requirement be satisfied using a
separate set of modules that specifically manage distribution of events
and data in collaborative applications?
I dunno, I doubt it.
* How is the collaborative application presented? Does the
application only need to be collaborative sometimes?
Yes, collaboration should be flexible and on demand as needed - like
dialing out on the speakerphone while in the middle of a meeting...
* Where does performance come in to play? Does the visualization
system or underlying libraries need to be performance-aware?
There likely will need to be "hooks" to specify performance requirements,
like "quality of service". This should perhaps be incorporated as part
of the individual component APIs, or at least metered by the frameworks...
doing a given task and I need a framerate of X for it to be useful
using my current compute resources)...
It would be wise to specify the frame rate requirement, perhaps interactively
depending on the venue... e.g. in interactive collaboration scenarios you'd
rather drop some frames consistently than stall completely or in bursts...
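One simple way to "drop frames consistently" is to show a frame only when at least one target interval has passed since the last frame shown. The sketch below assumes frames arrive with timestamps in seconds; `select_frames` is a hypothetical helper, not part of any existing system.

```python
from typing import List, Sequence


def select_frames(arrival_times: Sequence[float],
                  target_fps: float) -> List[int]:
    """Return the indices of frames to display; all others are dropped."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    interval = 1.0 / target_fps
    shown: List[int] = []
    next_due = float("-inf")  # the first frame is always shown
    for index, t in enumerate(arrival_times):
        if t >= next_due:
            shown.append(index)
            next_due = t + interval
        # Otherwise the frame is skipped rather than queued, so the
        # viewer never falls behind and then catches up in a burst.
    return shown
```

With frames arriving every 0.25 s and a target of 2 frames per second, every other frame is shown, giving an even cadence instead of stalls.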
network aware (i.e. the system is
starving for data and must respond by adding an alternate stream or
redeploying the pipeline).
This sounds like futureware to me - an intelligent network protocol layer...
beyond our scope for sure!
Are these considerations implemented at the
component level, framework level, or are they entirely out-of-scope for
These issues should be dealt with mostly at the framework level, if at all.
I think they're mostly out-of-scope for the first incarnation...
WHEW! DONE! :-D