From: "Ilmi Yoon" <firstname.lastname@example.org>
Date: Tue Sep 9, 2003 6:55:29 PM US/Pacific
To: "John Shalf" <email@example.com>
Subject: Re: DiVA Survey (Please return by Sept 10!)
Here are my answers.
----- Original Message -----
From: "John Shalf" <firstname.lastname@example.org>
Sent: Wednesday, August 27, 2003 3:33 PM
Subject: DiVA Survey (Please return by Sept 10!)
1) Data Structures/Representations/Management==================
The center of every successful modular visualization architecture has
been a flexible core set of data structures for representing data that
is important to the targeted application domain. Before we can begin
working on algorithms, we must come to some agreement on common methods
(either data structures or accessors/method calls) for exchanging data
between components of our vis framework.
There are two potentially disparate motivations for defining the data
representation requirements. In the coarse-grained case, we need to
define standards for exchanging data between components in this
framework (interoperability). In the fine-grained case, we want to
define some canonical data structures that can be used within a
component -- one developed specifically for this framework. These two
use-cases may drive different sets of requirements and implementation
* Do you feel both of these use cases are equally important or should
we focus exclusively on one or the other?
&& I think we need to decide the coarse-grained case: something like SOAP,
which wraps the internal data in an XML format. I don't think we need to
decide the fine-grained case, since each component can choose its own
format and publish it, so any party that wants to use the component
follows that interface. But if we want to decide on an initial set of
formats that must/may be supported by DiVA components, then we can list
the most popular formats and choose some or all of them.
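
A minimal sketch of the coarse-grained idea above: an XML envelope that
names the payload format and carries the data. This is not real SOAP; the
element names (envelope, header, format, body) and the functions wrap and
unwrap are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def wrap(format_name, values):
    """Wrap a list of numbers in a simple XML envelope (illustrative only)."""
    envelope = ET.Element("envelope")
    header = ET.SubElement(envelope, "header")
    # The header advertises which published format the body uses.
    ET.SubElement(header, "format").text = format_name
    body = ET.SubElement(envelope, "body")
    body.text = " ".join(repr(float(v)) for v in values)
    return ET.tostring(envelope, encoding="unicode")


def unwrap(xml_text):
    """Recover the format name and the numbers from the envelope."""
    envelope = ET.fromstring(xml_text)
    format_name = envelope.findtext("header/format")
    values = [float(t) for t in (envelope.findtext("body") or "").split()]
    return format_name, values
```

The receiving component only needs to understand the envelope; whether it
can use the body depends on the format named in the header.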
* Do you feel the requirements for each of these use-cases are aligned
or will they involve two separate development tracks? For instance,
using "accessors" (method calls that provide abstract access to
essentially opaque data structures) will likely work fine for the
coarse-grained data exchanges between components, but will lead to
inefficiencies if used to implement algorithms within a particular
&& There will be some overhead and inefficiency in using accessors for data
exchange, but I like the accessor approach, and I believe CCA achieves
reusability at the expense of performance, as OOP does anyway. We should
just try to keep that expense as small as possible.
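
A minimal sketch of the accessor trade-off, assuming a hypothetical
FieldAccessor interface (not taken from CCA, FM, or any existing
framework). The consumer works with any storage layout, but pays one
method call per element.

```python
from abc import ABC, abstractmethod


class FieldAccessor(ABC):
    """Hypothetical accessor interface: callers never see the storage layout."""

    @abstractmethod
    def size(self) -> int: ...

    @abstractmethod
    def value(self, index: int) -> float: ...


class ArrayField(FieldAccessor):
    """One possible backing store: a flat Python list."""

    def __init__(self, values):
        self._values = list(values)

    def size(self) -> int:
        return len(self._values)

    def value(self, index: int) -> float:
        return self._values[index]


class ScaledField(FieldAccessor):
    """Wraps another accessor and scales values on read, without copying."""

    def __init__(self, source: FieldAccessor, factor: float):
        self._source = source
        self._factor = factor

    def size(self) -> int:
        return self._source.size()

    def value(self, index: int) -> float:
        return self._factor * self._source.value(index)


def field_max(field: FieldAccessor) -> float:
    """Works on any component's data, at the cost of one call per element."""
    return max(field.value(i) for i in range(field.size()))
```

For example, field_max(ScaledField(ArrayField([1.0, 5.0, 3.0]), 2.0))
never touches the underlying list directly, which is where both the
reusability and the overhead come from.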
* As you answer the "implementation and requirements" questions below,
please try to identify where coarse-grained and fine-grained use cases
will affect the implementation requirements.
What are requirements for the data representations that must be
supported by a common infrastructure. We will start by answering Pat's
questions about representation requirements and follow up with
personal experiences involving particular domain scientist's
Must: support for structured data
Must/Want: support for multi-block data?
Must/Want: support for various unstructured data representations?
Must/Want: support for adaptive grid standards? Please be specific
about which adaptive grid methods you are referring to. Restricted
block-structured AMR (aligned grids), general block-structured AMR
(rotated grids), hierarchical unstructured AMR, or non-hierarchical
adaptive structured/unstructured meshes.
Must/Want: "vertex-centered" data, "cell-centered" data?
Must: support time-varying data, sequenced, streamed data?
Must/Want: higher-order elements?
Must/Want: Expression of material interface boundaries and other
special-treatment of boundary conditions.
* For commonly understood datatypes like structured and unstructured,
please focus on any features that are commonly overlooked in typical
implementations. For example, often data-centering is overlooked in
structured data representations in vis systems and FEM researchers
commonly criticize vis people for co-mingling geometry with topology
for unstructured grid representations. Few datastructures provide
proper treatment of boundary conditions or material interfaces. Please
describe your personal experience on these matters.
* Please describe data representation requirements for novel data
representations such as bioinformatics and terrestrial sensor datasets.
In particular, how should we handle more abstract data that is
typically given the moniker "information visualization".
What do you consider the most elegant/comprehensive implementation for
data representations that you believe could form the basis for a
comprehensive visualization framework?
* For instance, AVS uses entirely different datastructures for
structured, unstructured and geometry data. VTK uses class inheritance
to express the similarities between related structures. Ensight treats
unstructured data and geometry nearly interchangeably. OpenDX uses more
vector-bundle-like constructs to provide a more unified view of
disparate data structures. FM uses data-accessors (essentially keeping
the data structures opaque).
&& Combination of (externally) FM data-accessors and (internally) VTK class
* Are there any of the requirements above that are not covered by the
structure you propose?
* This should focus on the elegance/usefulness of the core
design-pattern employed by the implementation rather than a
point-by-point description of the implementation!
* Is there information or characteristics of particular file format
standards that must percolate up into the specific implementation of
the in-memory data structures?
For the purpose of this survey, "data analysis" is defined broadly as
all non-visual data processing done *after* the simulation code has
finished and *before* "visual analysis".
* Is there a clear dividing line between "data analysis" and "visual
&& Some components do purely data analysis, some do only visual, but there
will be calls to the data analysis component from the visual during the
* Can we (should we) incorporate data analysis functionality into this
framework, or is it just focused on visual analysis.
&& Not all data analysis, but a lot of data analysis is used for visual
analysis, and the more tools are provided initially, the easier it is to
grow the user group. So we can list candidates.
* What kinds of data analysis typically needs to be done in your
field? Please give examples and how these functions are currently
* How do we incorporate powerful data analysis functionality into the
2) Execution Model=======================
It will be necessary for us to agree on a common execution semantics
for our components. Otherwise, we might have compatible data
structures but incompatible execution requirements. Execution
semantics is akin to the function of protocol in the context of network
serialization of data structures. The motivating questions are as
* How is the execution model affected by the kinds of
algorithms/system-behaviors we want to implement.
&& I guess we can make each component propagate/fire the execution of the next
component(s) in the network/pipeline. Each component can use its own
memory or shared memory to access the data being processed. In that case,
the algorithm of each component is not much affected by other components
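
A minimal sketch of the firing scheme described in this answer, assuming a
hypothetical Component class in a single process: each component computes
on its input and then fires whatever is connected downstream, with no
central executive.

```python
class Component:
    """Hypothetical pipeline stage that fires its downstream neighbours."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform  # function applied to incoming data
        self.downstream = []        # components fired after this one
        self.output = None          # last result, kept in the component's own memory

    def connect(self, other):
        """Attach a downstream component and return it, so calls can be chained."""
        self.downstream.append(other)
        return other

    def fire(self, data):
        # Compute locally, then propagate execution to every connected component.
        self.output = self.transform(data)
        for nxt in self.downstream:
            nxt.fire(self.output)


reader = Component("reader", lambda d: list(d))
scale = Component("scale", lambda d: [2 * x for x in d])
total = Component("total", lambda d: sum(d))

reader.connect(scale).connect(total)
reader.fire([1, 2, 3])
```

Each stage only knows its own transform and its neighbours, so swapping
the scale stage for another one does not change the reader or total code.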
* How then will a given execution model affect data structure
* How will the execution model be translated into execution semantics
on the component level. For example will we need to implement special
control-ports on our components to implement particular execution
models or will the semantics be implicit in the way we structure the
method calls between components.
What kinds of execution models should be supported by the distributed
* View dependent algorithms? (These were typically quite difficult to
implement for dataflow visualization environments like AVS5).
&& I would like to say "must", but it is for improving usability and efficiency,
so people may live without it.
It will definitely improve efficiency. If we want to support
view-dependent algorithms, then we should consider them from the beginning
of the dataflow design, so they can be integrated easily. View-dependent or
image-based algorithms don't necessarily require many changes to an existing
dataflow design. They are useful for eliminating the majority of data blocks
from the rendering pipeline. Therefore, it is good to provide the capability
to choose the subset of data to be rendered from
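
A minimal sketch of view-dependent block selection, under a strong
simplifying assumption: the view volume is treated as an axis-aligned box
rather than a real perspective frustum. Block, overlaps, and
visible_blocks are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    lo: tuple  # (x, y, z) minimum corner of the block's bounding box
    hi: tuple  # (x, y, z) maximum corner of the block's bounding box


def overlaps(block, view_lo, view_hi):
    """True if the block's bounding box intersects the view box on every axis."""
    return all(
        block.lo[a] <= view_hi[a] and block.hi[a] >= view_lo[a]
        for a in range(3)
    )


def visible_blocks(blocks, view_lo, view_hi):
    """Keep only the blocks that the renderer actually needs to receive."""
    return [b for b in blocks if overlaps(b, view_lo, view_hi)]
```

Only the surviving blocks would be passed down the rendering pipeline;
everything outside the view box is dropped before any rendering work.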
* Out-of-core algorithms
* Progressive update and hierarchical/multiresolution algorithms?
&& MUST! for improving usability and efficiency. And can be used to support
* Procedural execution from a single thread of control (ie. using a
commandline language like IDL to interactively control a dynamic or
large parallel back-end)
&& Good to have
* Dataflow execution models? What is the firing method that should be
employed for a dataflow pipeline? Do you need a central executive like
AVS/OpenDX or, completely distributed firing mechanism like that of
VTK, or some sort of abstraction that allows the modules to be used
with either executive paradigm?
* Support for novel data layouts like space-filling curves?
* Are there special considerations for collaborative applications?
&& Some locking mechanism for subsets of data, or dispatching of changes from
one client to multiple clients
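
A minimal single-process sketch of both ideas in this answer: per-region
locks on a shared dataset, and dispatch of one client's change to all
other clients. SharedDataset and its methods are hypothetical; a real
system would also need networking and thread safety, which are left out.

```python
class SharedDataset:
    """Hypothetical shared dataset with region locks and change dispatch."""

    def __init__(self):
        self._owners = {}   # region id -> client currently holding the lock
        self._clients = {}  # client id -> list of changes received so far

    def join(self, client):
        self._clients[client] = []

    def lock(self, client, region):
        """Grant the lock unless another client already holds this region."""
        if self._owners.get(region, client) != client:
            return False
        self._owners[region] = client
        return True

    def unlock(self, client, region):
        if self._owners.get(region) == client:
            del self._owners[region]

    def apply(self, client, region, change):
        """Accept a change from the lock holder and dispatch it to the others."""
        if self._owners.get(region) != client:
            raise PermissionError(f"{client} does not hold the lock on {region}")
        for other, inbox in self._clients.items():
            if other != client:
                inbox.append((region, change))

    def inbox(self, client):
        return list(self._clients[client])
```

Locking a subset rather than the whole dataset lets two clients edit
different regions at the same time while still seeing each other's
changes.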
* What else?
How will the execution model affect our implementation of data
* how do you decompose a data structure such that it is amenable to
streaming in small chunks?
* how do you represent temporal dependencies in that model?
* how do you minimize recomputation in order to regenerate data for
What are the execution semantics necessary to implement these execution
* how does a component know when to compute new data? (what is the
* does coordination of the component execution require a central
executive or can it be implemented using only rules that are local to a
* how elegantly can execution models be supported by the proposed
execution semantics? Are there some things, like loops or
back-propagation of information that are difficult to implement using a
particular execution semantics?
How will security considerations affect the execution model?
3) Parallelism and load-balancing=================
Thus far, managing parallelism in visualization systems has been
tedious and difficult at best. Part of this is a lack of powerful
abstractions for managing data-parallelism, load-balancing and
Please describe the kinds of parallel execution models that must be
supported by a visualization component architecture.
* data-parallel/dataflow pipelines?
* master/slave work-queues?
* streaming update for management of pipeline parallelism?
* chunking mechanisms where the number of chunks may be different from
the number of CPU's employed to process those chunks?
* how should one manage parallelism for interactive scripting
languages that have a single thread of control? (eg. I'm using a
commandline language like IDL that interactively drives an arbitrarily
large set of parallel resources. How can I make the parallel back-end
available to a single-threaded interactive thread of control?)
Please describe your vision of what kinds of software support /
programming design patterns are needed to better support parallelism
and load balancing.
* What programming model should be employed to express parallelism.
(UPC, MPI, SMP/OpenMP, custom sockets?)
* Can you give some examples of frameworks or design patterns that you
consider very promising for support of parallelism and load balancing.
(ie. PNNL Global Arrays or Sandia's Zoltan)
* Should we use novel software abstractions for expressing parallelism
or should the implementation of parallelism simply be an opaque
property of the component? (ie. should there be an abstract messaging
layer or not)
* How does the NxM work fit in to all of this? Is it sufficiently
differentiated from Zoltan's capabilities?
===============End of Mandatory Section (the rest is
4) Graphics and Rendering=================
What do you use for converting geometry and data into images (the
rendering-engine). Please comment on any/all of the following.
* Should we build modules around declarative/streaming methods for
rendering geometry like OpenGL, Chromium and DirectX or should we move
to higher-level representations for graphics offered by scene graphs?
&& It is usually useful to have access to the frame buffer, so I prefer the
OpenGL style over the VRML style.
In addition, I don't know how useful scene graphs are for visualization. I
guess scene graphs for visualization are relatively simple, so it is
possible to convert them to the declarative form. So: mainly support
declarative methods, then additionally support scene graphs and
their conversion to declarative methods.
What are the pitfalls of building our component architecture around
&& We might lose access to the frame buffer and pixel-level manipulation,
which makes view-dependent or image-based approaches extremely difficult
* What about Postscript, PDF and other scale-free output methods for
publication quality graphics? Are pixmaps sufficient?
In a distributed environment, we need to create a rendering subsystem
that can flexibly switch between drawing to a client application by
sending images, sending geometry, or sending geometry fragments
(image-based rendering)? How do we do that?
* Please describe some rendering models that you would like to see
supported (ie. view-dependent update, progressive update) and how they
would adjust dynamically to changing objective functions (optimize for
fastest framerate, or fastest update on geometry change, or varying
workloads and resource constraints).
* Are there any good examples of such a system?
What is the role of non-polygonal methods for rendering (ie. shaders)?
* Are you using any of the latest gaming features of commodity cards
in your visualization systems today?
* Do you see this changing in the future? (how?)
It will be necessary to separate the visualization back-end from the
presentation interface. For instance, you may want to have the same
back-end driven by entirely different control-panels/GUIs and displayed
in different display devices (a CAVE vs. a desktop machine). Such
separation is also useful when you want to provide different
implementations of the user-interface depending on the targeted user
community. For instance, visualization experts might desire a
dataflow-like interface for composing visualization workflows whereas a
scientist might desire a domain-specific dash-board like interface
that implements a specific workflow. Both users should be able to
share the same back-end components and implementation even though the
user interface differs considerably.
How do different presentation devices affect the component model?
* Do different display devices require completely different user
interface paradigms? If so, then we must define a clear separation
between the GUI description and the components performing the back-end
computations. If not, then is there a common language to describe user
interfaces that can be used across platforms?
* Do different display modalities require completely different
component/algorithm implementations for the back-end compute engine?
(what do we do about that??)
What Presentation modalities do you feel are important and what do you
consider the most important.
* Desktop graphics (native applications on Windows, on Macs)
* Graphics access via Virtual Machines like Java?
* CAVEs, Immersadesks, and other VR devices
* Ultra-high-res/Tiled display devices?
* Web-based applications?
What abstractions do you think should be employed to separate the
presentation interface from the back-end compute engine?
* Should we be using CCA to define the communication between GUI and
compute engine or should we be using software infrastructure that was
designed specifically for that space? (ie. WSDL, OGSA, or CORBA?)
* How do such control interfaces work with parallel applications?
Should the parallel application have a single process that manages the
control interface and broadcasts to all nodes or should the control
interface treat all application processes within a given component as
6) Basic Deployment/Development Environment Issues============
One of the goals of the distributed visualization architecture is
seamless operation on the Grid -- distributed/heterogeneous collections
of machines. However, it is quite difficult to realize such a vision
without some consideration of deployment/portability issues. This
question also touches on issues related to the development environment
and what kinds of development methods should be supported.
What languages do you use for core vis algorithms and frameworks.
* for the numerically intensive parts of vis algorithms
* for the glue that connects your vis algorithms together into an
* How aggressively do you use language-specific features like C++
* is Fortran important to you? Is it important that a framework
support it seamlessly?
* Do you see other languages becoming important for visualization (ie.
Python, UPC, or even BASIC?)
What platforms are used for data analysis/visualization?
* What do you and your target users depend on to display results? (ie.
Windows, Linux, SGI, Sun etc..)
* What kinds of presentation devices are employed (desktops,
portables, handhelds, CAVEs, Access Grids, WebPages/Collaboratories)
and what is their relative importance to active users.
* What is the relative importance of these various presentation
methods from a research standpoint?
* Do you see other up-and-coming visualization platforms in the future?
Tell us how you deal with the issue of versioning and library
dependencies for software deployment.
* For source code distributions, do you bundle builds of all related
libraries with each software release (ie. bundle HDF5 and FLTK source
with each release).
* What methods are employed to support platform independent builds
(cmake, imake, autoconf). What are the benefits and problems with this
* For binaries, have you had issues with different versions of
libraries (ie. GLIBC problems on Linux and different JVM
implementations/versions for Java). Can you tell us about any
sophisticated packaging methods that address some of these problems
(RPM need not apply)
* How do you handle multiplatform builds?
How do you (or would you) provide abstractions that hide the locality
of various components of your visualization/data analysis application?
* Does anyone have ample experience with CORBA, OGSA, DCOM, .NET, RPC?
Please comment on advantages/problems of these technologies.
* Do web/grid services come into play here?
7) Collaboration ==========================
If you are interested in "collaborative applications" please define
the term "collaborative". Perhaps provide examples of collaborative
Is collaboration a feature that exists at an application level or are
there key requirements for collaborative applications that necessitate
* Should collaborative infrastructure be incorporated as a core
feature of every component?
* Can any conceivable collaborative requirement be satisfied using a
separate set of modules that specifically manage distribution of events
and data in collaborative applications?
* How is the collaborative application presented? Does the
application only need to be collaborative sometimes?
* Where does performance come in to play? Does the visualization
system or underlying libraries need to be performance-aware? (i.e. I'm
doing a given task and I need a framerate of X for it to be useful
using my current compute resources), network aware (i.e. the system is
starving for data and must respond by adding an alternate stream or
redeploying the pipeline). Are these considerations implemented at the
component level, framework level, or are they entirely out-of-scope for