From: "John
Clyne" <clyne@ncar.ucar.edu>
Date: Fri Sep 5, 2003 3:40:00 PM US/Pacific
To: "John
Shalf" <jshalf@lbl.gov>, <diva@lbl.gov>
Subject: Re: DiVA Survey
(Please return by Sept 10!)
John,
I think I may have answered 25% of the questions below. I didn't answer
more because 1) my 3 1/2 hour flight didn't permit it, and 2) I think a
lot of the questions really get into implementation issues that should
not (cannot) be addressed until we have agreement on functional
requirements. They are excellent questions, and raise important points
to keep in mind, but I felt it was premature to try to answer them.
cheers - jc
1) Data Structures/Representations/Management==================
The center of every successful modular
visualization architecture has
been a flexible core set of data
structures for representing data that
is important to the targeted application
domain. Before we can begin
working on algorithms, we must come to
some agreement on common methods
(either data structures or
accessors/method calls) for
exchanging data
between components of our vis framework.
There are two potentially disparate motivations for defining the data
representation requirements. In the coarse-grained case, we need to
define standards for exchanging data between components in this
framework (interoperability). In the fine-grained case, we want to
define some canonical data structures that can be used within a
component -- one developed specifically for this framework. These two
use-cases may drive different sets of requirements and implementation
issues.
* Do you feel both of these use cases are
equally important or should
we focus exclusively on one or the other?
Too soon to tell. Focus on both until the
issues become more clear.
* Do you feel the requirements for each of
these use-cases are aligned
or will they involve two separate
development tracks? For instance,
using "accessors" (method calls
that provide abstract access to
essentially opaque data structures) will
likely work fine for the
coarse-grained data exchanges between
components, but will lead to
inefficiencies if used to implement
algorithms within a particular
component.
I think it's premature to say. We need to
have agreement on the
questions below first.
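That said, the "accessor" idea for the coarse-grained case is easy
enough to sketch. Something like the following (all names are
hypothetical and only illustrate the idea, not a proposal for the
actual interface):

# Minimal sketch of an accessor-style exchange: the downstream component
# never sees the concrete data structure, only an opaque handle it can
# query through method calls.
from abc import ABC, abstractmethod

class FieldAccessor(ABC):
    """Opaque view of a field owned by an upstream component."""

    @abstractmethod
    def dimensions(self):
        """Return grid dimensions, e.g. (nx, ny, nz)."""

    @abstractmethod
    def read_region(self, lo, hi):
        """Return a copy of the subdomain [lo, hi)."""

def consume(field):
    # A downstream component pulls only what it needs, when it needs it.
    nx, ny, nz = field.dimensions()
    slab = field.read_region((0, 0, 0), (nx, ny, 1))
    return slab

Something like that is probably fine between components; inside a
component's inner loops you'd want the raw arrays.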
* As you answer the "implementation
and requirements" questions below,
please try to identify where
coarse-grained and fine-grained use cases
will affect the implementation
requirements.
What are the requirements for the data representations that must be
supported by a common infrastructure? We will start by answering Pat's
questions about representation requirements and follow up with
personal experiences involving particular domain scientists'
requirements.
Must: support for structured data
Must.
Must/Want: support for multi-block data?
Must.
Must/Want: support for various
unstructured data representations?
(which ones?)
Not sure. Not a priority.
Must/Want: support for adaptive grid
standards? Please be specific
about which adaptive grid methods you are
referring to. Restricted
block-structured AMR (aligned grids),
general block-structured AMR
(rotated grids), hierarchical unstructured
AMR, or non-hierarchical
adaptive structured/unstructured meshes.
Adaptive grid usage is in its infancy at NCAR, but I suspect it is the
way of the future. Too soon to be specific about which adaptive grid
methods are preferred.
Must/Want: "vertex-centered"
data, "cell-centered" data?
other-centered?
Must: support time-varying data,
sequenced, streamed data?
Must. Time-varying data is what makes so many of our problems currently
intractable. Too many of the available tools (e.g. VTK) assume static
data and completely fall apart when the data is otherwise.
Must/Want: higher-order elements?
low priority
Must/Want: Expression of material
interface boundaries and other
special-treatment of boundary conditions.
no priority
* For commonly understood datatypes like
structured and unstructured,
please focus on any features that are
commonly overlooked in typical
implementations. For example, data-centering is often overlooked in
structured data representations in vis systems, and FEM researchers
commonly criticize vis people for commingling geometry with topology
in unstructured grid representations. Few data structures provide
proper treatment of boundary conditions or material interfaces. Please
describe your personal experience on these matters.
Support for missing data is essential for
observed fields.
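For what it's worth, numerical python's masked arrays give a feel for
the kind of missing-data support I mean. A toy example:

# Toy example: missing values in an observed field carried as a mask so
# that statistics (and downstream viz) use only the valid samples.
import numpy.ma as ma

obs = [273.1, -9999.0, 274.6, 275.2, -9999.0]   # -9999 marks missing
field = ma.masked_values(obs, -9999.0)

print(field.mean())    # mean over the three valid points only
print(field.count())   # number of non-missing samples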
* Please describe data representation
requirements for novel data
representations such as bioinformatics and
terrestrial sensor datasets.
In particular, how should we handle more abstract data that is
typically given the moniker
"information visualization".
Beats me.
What do you consider the most
elegant/comprehensive implementation for
data representations that you believe
could form the basis for a
comprehensive visualization framework?
* For instance, AVS uses entirely different data structures for
structured, unstructured and geometry data. VTK uses class inheritance
to express the similarities between related structures. Ensight treats
unstructured data and geometry nearly interchangeably. OpenDX uses more
vector-bundle-like constructs to provide a more unified view of
disparate data structures. FM uses data-accessors (essentially keeping
the data structures opaque).
I don't think this is what you're after, but I've come to believe that
multiresolution data representations with efficient domain subsetting
capabilities are the most pragmatic and elegant way to deal with large
data sets. In addition to enabling interaction with the largest data
sets, they offer tremendous scalability from desktop to "visual
supercomputer". I would encourage a data model that includes and
facilitates their integral support.
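To make that concrete, the access pattern I have in mind looks roughly
like the sketch below. It's a toy: the names and the 2x block-averaging
stand in for a real multiresolution encoding.

# Sketch of a multiresolution, region-subsetting read interface.
# A reader exposes the data at several refinement levels; a consumer
# asks for only the region and level it can afford.
import numpy as np

class MultiresReader:
    def __init__(self, full_res_field, num_levels=3):
        # Precompute coarsened copies by simple 2x block averaging.
        self.levels = [full_res_field]
        for _ in range(num_levels - 1):
            f = self.levels[-1]
            nz, ny, nx = (s // 2 * 2 for s in f.shape)
            f = f[:nz, :ny, :nx]
            coarse = f.reshape(nz // 2, 2, ny // 2, 2, nx // 2, 2).mean(axis=(1, 3, 5))
            self.levels.append(coarse)

    def read_region(self, level, lo, hi):
        """Return the subdomain [lo, hi) at the requested refinement level."""
        z0, y0, x0 = lo
        z1, y1, x1 = hi
        return self.levels[level][z0:z1, y0:y1, x0:x1]

# Desktop use: grab a coarse overview first, refine only where needed.
reader = MultiresReader(np.random.rand(64, 64, 64))
overview = reader.read_region(level=2, lo=(0, 0, 0), hi=(16, 16, 16))
detail   = reader.read_region(level=0, lo=(0, 0, 0), hi=(8, 8, 8))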
* Are there any of the requirements above
that are not covered by the
structure you propose?
Not sure.
* This should focus on the elegance/usefulness of the core
design-pattern employed by the implementation rather than a
point-by-point description of the implementation!
* Is there information or characteristics
of particular file format
standards that must percolate up into the
specific implementation of
the in-memory data structures?
For the purpose of this survey, "data
analysis" is defined broadly as
all non-visual data processing done
*after* the simulation code has
finished and *before* "visual
analysis".
I take issue with your definition of data analysis. Yes, it is performed
after the simulation, but it is performed (or would be performed if viz
tools didn't suck) in *parallel* with visual analysis. The two, when
well integrated (which is rarely the case), can complement each other
tremendously. So-called "visual analysis" by itself, without good
quantitative capability, is pretty useless.
* Is there a clear dividing line between
"data analysis" and "visual
analysis" requirements?
Well, text-based, programmable user interfaces are a must for "data
analysis", whereas a GUI is essential for visual analysis.
* Can we (should we) incorporate data analysis functionality into this
framework, or is it just focused on visual analysis?
If visualization is ever going to live up to the claim made by so many
in the viz community of it being an indispensable tool for analysis,
tight integration with statistical tools and data processing
capabilities is a must. Otherwise we'll just continue to make pretty
pictures, put on dog and pony shows, and wonder where the users are.
* What kinds of data analysis typically need to be done in your
field? Please give examples and how these functions are currently
implemented.
Pretty much everything you can do with IDL
or matlab.
* How do we incorporate powerful data
analysis functionality into the
framework?
I'd suggest leveraging existing tools, numerical python for example.
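e.g., a derived field plus a quick quantitative summary, the kind of
thing users do in IDL or matlab every day, comes nearly for free:

# Example of the sort of quantitative analysis users do routinely in
# IDL/matlab, expressed with numerical python (numpy).
import numpy as np

u = np.random.rand(128, 128)      # stand-ins for two velocity components
v = np.random.rand(128, 128)

speed = np.sqrt(u**2 + v**2)            # derived field
print(speed.mean(), speed.max())        # quick quantitative summary
print(np.histogram(speed, bins=10)[0])  # a distribution, not just a picture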
2) Execution Model=======================
It will be necessary for us to agree on a common execution semantics
for our components. Otherwise, we might have compatible data
structures but incompatible execution requirements. Execution
semantics is akin to the function of a protocol in the context of
network serialization of data structures. The motivating questions are
as follows:
* How is the execution model affected by the kinds of
algorithms/system-behaviors we want to implement?
* How then will a given execution model affect data structure
implementations?
* How will the execution model be translated into execution semantics
at the component level? For example, will we need to implement special
control-ports on our components to implement particular execution
models, or will the semantics be implicit in the way we structure the
method calls between components?
What kinds of execution models should be supported by the distributed
visualization architecture?
* View dependent algorithms? (These were
typically quite difficult to
implement for dataflow visualization
environments like AVS5).
These are neat research topics, but I've never been convinced that they
have much application beyond IEEEViz publications. Mostly I believe
this because of the complexity they impose on the data model. Better to
simply offer progressive/multiresolution data access.
* Out-of-core algorithms
Seems like a must for large data. But is
this a requirement or a design
issue?
* Progressive update and
hierarchical/multiresolution algorithms?
This is the way to go (IMHO); the question is at what level to support
it.
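A rough sketch of what supporting it at the pipeline level might look
like (function names are hypothetical):

# Sketch: progressive update driven at the pipeline level. The same
# filter chain is re-run at successively finer refinement levels and
# each pass hands a displayable result to the renderer.
def progressive_execute(reader, filters, render, num_levels):
    for level in range(num_levels - 1, -1, -1):   # coarsest level first
        data = reader.read_level(level)           # multiresolution read
        for f in filters:
            data = f(data)
        render(data)   # the user sees a coarse answer that keeps refining
        # a real implementation would check for user interaction here and
        # abandon the remaining levels if the view or parameters changed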
* Procedural execution from a single thread of control (ie. using a
commandline language like IDL to interactively control a dynamic or
large parallel back-end)
A must for data analysis and data manipulation (deriving new fields,
etc.)
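What I'd want from the framework is a thin proxy that the interactive
session holds, with the parallel back-end hidden behind it. A purely
hypothetical sketch:

# Sketch: a single-threaded interactive prompt drives a parallel back-end
# through a proxy object; the user never deals with the workers directly.
from multiprocessing import Pool

class ParallelField:
    """Proxy held by the interactive session; blocks are processed in a pool."""

    def __init__(self, blocks, nprocs=4):
        self.blocks = blocks          # list of subdomain arrays
        self.pool = Pool(nprocs)

    def derive(self, func):
        """Apply func to every block in parallel (func must be picklable)."""
        self.blocks = self.pool.map(func, self.blocks)
        return self

    def reduce_max(self):
        return max(b.max() for b in self.blocks)

# Interactive use might look like (load_blocks and compute_vorticity
# are made-up names):
#   field = ParallelField(load_blocks("u"))
#   field.derive(compute_vorticity).reduce_max()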
* Dataflow execution models? What is the firing method that should be
employed for a dataflow pipeline? Do you need a central executive like
AVS/OpenDX, a completely distributed firing mechanism like that of
VTK, or some sort of abstraction that allows the modules to be used
with either executive paradigm?
* Support for novel data layouts like
space-filling curves?
We use a wavelet-based approach similar to space-filling curves. Both
approaches have merit and both should be supportable by the framework.
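For flavor, the heart of the wavelet idea is nothing more than the toy
1-D Haar pass below (our real implementation is considerably more
involved):

# Toy 1-D Haar step: one pass splits a signal into a half-length coarse
# approximation plus the detail coefficients needed to reconstruct it.
# Storing data this way is what makes coarse, subsetted reads cheap.
import numpy as np

def haar_forward(x):
    x = np.asarray(x, dtype=float)
    avg  = (x[0::2] + x[1::2]) / 2.0   # coarse approximation
    diff = (x[0::2] - x[1::2]) / 2.0   # detail coefficients
    return avg, diff

def haar_inverse(avg, diff):
    out = np.empty(2 * len(avg))
    out[0::2] = avg + diff
    out[1::2] = avg - diff
    return out

signal = np.arange(8, dtype=float)
coarse, detail = haar_forward(signal)
assert np.allclose(haar_inverse(coarse, detail), signal)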
* Are there special considerations for
collaborative applications?
* What else?
How will the execution model affect our
implementation of data
structures?
* how do you decompose a data structure
such that it is amenable to
streaming in small chunks?
* how do you represent temporal
dependencies in that model?
* how do you minimize recomputation in order to regenerate data for
view-dependent algorithms?
What are the execution semantics necessary
to implement these execution
models?
* how does a component know when to
compute new data? (what is the
firing rule)
* does coordination of the component execution require a central
executive, or can it be implemented using only rules that are local to
a particular component?
* how elegantly can execution models be supported by the proposed
execution semantics? Are there some things, like loops or
back-propagation of information, that are difficult to implement using
a particular execution semantics?
How will security considerations affect
the execution model?
3) Parallelism and load-balancing=================
Thus far, managing parallelism in visualization systems has been
tedious and difficult at best. Part of this is a lack of powerful
abstractions for managing data-parallelism, load-balancing and
component control.
Please describe the kinds of parallel
execution models that must be
supported by a visualization component
architecture.
* data-parallel/dataflow pipelines?
* master/slave work-queues?
* streaming update for management of
pipeline parallelism?
* chunking mechanisms where the number of
chunks may be different from
the number of CPU's employed to process
those chunks?
* how should one manage parallelism for
interactive scripting
languages that have a single thread of
control? (eg. I'm using a
commandline language like IDL that
interactively drives an arbitrarily
large set of parallel resources. How can I make the parallel back-end
available to a single-threaded interactive
thread of control?)
Please describe your vision of what kinds
of software support /
programming design patterns are needed to
better support parallelism
and load balancing.
* What programming model should be employed to express parallelism?
(UPC, MPI, SMP/OpenMP, custom sockets?)
* Can you give some examples of frameworks
or design patterns that you
consider very promising for support of
parallelism and load balancing.
(ie. PNNL Global Arrays or Sandia's
Zoltan)
http://www.cs.sandia.gov/Zoltan/
http://www.emsl.pnl.gov/docs/global/ga.html
* Should we use novel software
abstractions for expressing parallelism
or should the implementation of
parallelism simply be an opaque
property of the component? (ie. should
there be an abstract messaging
layer or not)
* How does the NxM work fit in to all of
this? Is it sufficiently
differentiated from Zoltan's capabilities?
Hmm. These all seem to be implementation
issues. Too early to answer.
===============End of Mandatory Section (the rest is voluntary)=============
4) Graphics and Rendering=================
What do you use for converting geometry and data into images (the
rendering engine)? Please comment on any/all of the following.
* Should we build modules around
declarative/streaming methods for
rendering geometry like OpenGL, Chromium
and DirectX or should we move
to higher-level representations for
graphics offered by scene graphs?
What are the pitfalls of building our
component architecture around
scene graphs?
Not so good for time-varying data, last time I checked.
* What about Postscript, PDF and other
scale-free output methods for
publication quality graphics? Are pixmaps sufficient?
Well, what are we trying to provide: an environment for analysis, or a
way of producing images for publications? The latter can be done as a
post-process and should not, IMHO, be a focus of DiVA.
In a distributed environment, we need to
create a rendering subsystem
that can flexibly switch between drawing
to a client application by
sending images, sending geometry, or
sending geometry fragments
(image-based rendering)? How do we do that?
Use Cr (Chromium).
* Please describe some rendering models
that you would like to see
supported (ie. view-dependent update,
progressive update) and how they
would adjust dynamically to changing
objective functions (optimize for
fastest framerate, or fastest update on
geometry change, or varying
workloads and resource constraints).
Not sold on view-dependent update as worthwhile, but progressive updates
can be hugely helpful. The question is whether you accomplish this by
adding support in the renderer or push it back up the pipeline to the
raw data.
* Are there any good examples of such a
system?
Yes, Kitware's not-for-free volume renderer (volren?). It does a nice
job of handling progressive updates. This is mostly handled by the GUI
but places some obvious requirements on the underlying rendering/viz
component.
What is the role of non-polygonal methods
for rendering (ie. shaders)?
* Are you using any of the latest gaming
features of commodity cards
in your visualization systems today?
Yup, we've offloaded a couple of algorithms from the CPU.
* Do you see this changing in the future?
(how?)
The biggest issue is portability, but things
are looking up with OpenGL
2.0 efforts, etc.
5) Presentation=========================
It will be necessary to separate the visualization
back-end from the
presentation interface. For instance, you may want to have the
same
back-end driven by entirely different
control-panels/GUIs and displayed
in different display devices (a CAVE vs. a
desktop machine). Such
separation is also useful when you want to
provide different
implementations of the user-interface
depending on the targeted user
community. For instance, visualization experts might desire a
dataflow-like interface for composing visualization workflows, whereas
a scientist might desire a domain-specific, dashboard-like interface
that implements a specific workflow. Both users should be able to
share the same back-end components and
implementation even though the
user interface differs considerably.
How do different presentation devices
affect the component model?
* Do different display devices require
completely different user
interface paradigms? If so, then we must define a clear
separation
between the GUI description and the
components performing the back-end
computations. If not, then is there a common language to describe user
interfaces that can be used across
platforms?
* Do different display modalities require
completely different
component/algorithm implementations for
the back-end compute engine?
(what do we do about that??)
What presentation modalities do you feel are important, and which do
you consider the most important?
* Desktop graphics (native applications on
Windows, on Macs)
This is numero uno by a HUGE margin
* Graphics access via Virtual Machines like
Java?
Not important
* CAVEs, Immersadesks, and other VR
devices
Not important
* Ultra-high-res/Tiled display devices?
Moderately important
* Web-based applications?
Well, maybe.
What abstractions do you think should be
employed to separate the
presentation interface from the back-end
compute engine?
* Should we be using CCA to define the
communication between GUI and
compute engine or should we be using
software infrastructure that was
designed specifically for that space? (ie.
WSDL, OGSA, or CORBA?)
* How do such control interfaces work with
parallel applications?
Should the parallel application have a
single process that manages the
control interface and broadcasts to all
nodes or should the control
interface treat all application processes
within a given component as
peers?
6) Basic Deployment/Development Environment Issues============
One of the goals of the distributed
visualization architecture is
seamless operation on the Grid --
distributed/heterogeneous collections
of machines. However, it is quite difficult to realize such a vision
without some consideration of
deployment/portability issues.
This
question also touches on issues related to
the development environment
and what kinds of development methods
should be supported.
What languages do you use for core vis algorithms and frameworks?
* for the numerically intensive parts of
vis algorithms
C/C++
* for the glue that connects your vis
algorithms together into an
application?
C/C++, Tcl, Python
* How aggressively do you use language-specific
features like C++
templates?
Not at all. Too scary.
* is Fortran important to you? Is it important that a framework
support it seamlessly?
Nope.
* Do you see other languages becoming
important for visualization (ie.
Python, UPC, or even BASIC?)
Python, mostly because of the direction of numerical python.
What platforms are used for data
analysis/visualization?
* What do you and your target users depend
on to display results? (ie.
Windows, Linux, SGI, Sun etc..)
All of the above, primarily lintel and windoze though.
* What kinds of presentation devices are
employed (desktops,
portables, handhelds, CAVEs, Access Grids,
WebPages/Collaboratories)
and what is their relative importance to
active users.
desktops, tiled displays, AG
* What is the relative importance of these various presentation
methods from a research standpoint?
The desktop is where the users live.
* Do you see other up-and-coming
visualization platforms in the future?
I don't see SMP graphics boxes going away as
quickly as some might.
Tell us how you deal with the issue of
versioning and library
dependencies for software deployment.
* For source code distributions, do you
bundle builds of all related
libraries with each software release (ie.
bundle HDF5 and FLTK source
with each release).
Sometimes, depending on the stability of the
libraries.
* What methods are employed to support platform-independent builds
(cmake, imake, autoconf)? What are the benefits and problems with this
approach?
I've used all, developed my own, and like
none. Maybe we can do better.
I think something based around gmake might
have the best potential.
* For binaries, have you had issues with different versions of
libraries (ie. GLIBC problems on Linux and different JVM
implementations/versions for Java)? Can you tell us about any
sophisticated packaging methods that address some of these problems
(RPM need not apply).
* How do you handle multiplatform builds?
The brute force, not so smart way. The VTK
model is worth looking at.
How do you (or would you) provide
abstractions that hide the locality
of various components of your
visualization/data analysis application?
* Does anyone have ample experience with
CORBA, OGSA, DCOM, .NET, RPC?
Please comment on advantages/problems of these technologies.
* Do web/grid services come into play
here?
7) Collaboration==========================
If you are interested in "collaborative applications", please define
the term "collaborative". Perhaps provide examples of collaborative
application paradigms.
Is collaboration a feature that exists at
an application level or are
there key requirements for collaborative
applications that necessitate
component-level support?
* Should collaborative infrastructure be incorporated as a core
feature of every component?
Does it need to be incorporated in all components? What kind of collab
support is needed? Permitting session logging and geographically
separated, simultaneous users would go a long way toward providing for
collab needs, and would seem to impact only the GUI and perhaps the
renderer.
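A minimal notion of the session logging I have in mind (a sketch; the
names are made up):

# Sketch: session logging at the GUI/control level. Every user action is
# recorded as a timestamped event; a remote participant (or a later
# replay) applies the same event stream to its own back-end.
import json, time

class SessionLog:
    def __init__(self, path):
        self.f = open(path, "a")

    def record(self, user, action, **params):
        event = {"t": time.time(), "user": user,
                 "action": action, "params": params}
        self.f.write(json.dumps(event) + "\n")
        self.f.flush()

# log = SessionLog("diva_session.log")
# log.record("jc", "set_isovalue", value=0.37)
# log.record("jshalf", "rotate_view", quat=[0.0, 0.7, 0.0, 0.7])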
* Can any conceivable collaborative
requirement be satisfied using a
separate set of modules that specifically
manage distribution of events
and data in collaborative applications?
* How is the collaborative application
presented? Does the
application only need to be collaborative
sometimes?
* Where does performance come into
play? Does the visualization
system or underlying libraries need to be
performance-aware? (i.e. I'm
doing a given task and I need a framerate
of X for it to be useful
using my current compute resources),
network aware (i.e. the system is
starving for data and must respond by
adding an alternate stream or
redeploying the pipeline). Are these considerations implemented at
the
component level, framework level, or are
they entirely out-of-scope for
our consideration?