1) Data Structures/Representations/Management==================
There are two potentially disparate motivations for defining the data representation requirements. In the coarse-grained case, we need to define standards for exchanging data between components in this framework (interoperability). In the fine-grained case, we want to define some canonical data structures that can be used within a component, one developed specifically for this framework. These two use cases may drive different sets of requirements and implementation issues.
*
Do you feel both of these use cases are equally important, or should we focus exclusively on one or the other?
Both are important. The strongest case, IMO, for the intra-component DS/DM is that I would have a stable set of data modeling/mgt tools that I can use for families of components. Having a solid DS/DM base will free me to focus on vis and rendering algorithms, which is how I want to spend my time. The strongest case for the inter-component DS/DM is the "strong typing" property that makes AVS and apps of its ilk work so well.
The "elephant in the living room" is that there is no silver bullet. I favor an approach that is, by design, incremental. What I mean is that we can deal with structured grids, unstructured grids, geom and other renderable data, etc. in a more or less piecemeal fashion, with an eye towards component-level interoperability in the long term. In the beginning, there won't be the 100% interoperability we would get if, for example, all data models and types were stuffed into a vector-bundle interface. OTOH, a more conciliatory approach will permit forward progress among multiple independent groups who are all eyeing "interoperability". This is the real goal, not a "single true data model."
*
Do you feel the requirements for each of these use cases are aligned, or will they involve two separate development tracks? For instance, using "accessors" (method calls that provide abstract access to essentially opaque data structures) will likely work fine for the coarse-grained data exchanges between components, but will lead to inefficiencies if used to implement algorithms within a particular component.
They are aligned to a large degree: data structures/models are produced and consumed by component code, but may also be manipulated (serialized, marshalled, etc.) by the framework.
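To make the accessor trade-off in the question concrete, here is a minimal Python sketch. `ScalarField`, `value_at`, `raw`, and the two `mean_*` functions are hypothetical names made up for illustration, not any existing framework's API. The accessor path works against any type that implements `value_at()`, which suits exchange between components; the buffer path is the kind of direct access an algorithm inside one component would want.

```python
from array import array


class ScalarField:
    """Hypothetical field type: opaque storage plus a per-element accessor."""

    def __init__(self, values):
        # Contiguous double-precision storage; the layout is an implementation detail.
        self._data = array("d", values)

    def __len__(self):
        return len(self._data)

    def value_at(self, i):
        # Accessor: abstract and layout-independent, one method call per element.
        return self._data[i]

    @property
    def raw(self):
        # Direct view of the underlying buffer for tight, component-internal loops.
        return memoryview(self._data)


def mean_via_accessor(field):
    # Portable across any type implementing value_at(), but pays a call per element.
    return sum(field.value_at(i) for i in range(len(field))) / len(field)


def mean_via_buffer(field):
    # Component-internal fast path: iterate the raw buffer directly.
    buf = field.raw
    return sum(buf) / len(buf)
```

Both functions return the same value; the difference is that the first pays a method call per element and never depends on the storage layout.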
*
As you answer the "implementation and requirements" questions below, please try to identify where coarse-grained and fine-grained use cases will affect the implementation requirements.
What are the requirements for the data representations that must be supported by a common infrastructure? We will start by answering Pat's questions about representation requirements and follow up with personal experiences involving particular domain scientists' requirements.
Must:
support for structured data
Agree.
Must/Want:
support for multi-block data?
Must. We must set targets that meet our needs, rather than sacrificing requirements for speed of implementation.
Must/Want:
support for various unstructured data representations? (which ones?)
Must. Unstructured data reps are widely used, and they should not be excluded from the base set of DS/DM technologies.
Must/Want:
support for adaptive grid standards?
Please be specific about which adaptive grid methods you are referring to: restricted block-structured AMR (aligned grids), general block-structured AMR (rotated grids), hierarchical unstructured AMR, or non-hierarchical adaptive structured/unstructured meshes.
Want, badly. We could start with Berger-Colella AMR since it is widely used. I'm not crazy about Boxlib, though, and hope we can do something that is easier to use.
Must/Want:
"vertex-centered" data, "cell-centered" data? Other-centered?
Don't care - will let someone else answer this.
Must:
support time-varying data, sequenced, streamed data?
Not ready for prime time. I've read two or three research proposals in the past year that focus on methods for time-varying data representations and manipulation. IMO, this topic is not mature enough yet. We can say that it would be nice to have, but we will probably not be fully prepared to start whacking out code.
Must/Want:
higher-order elements?
Not sure what this means, exactly, so I'll improvise. Beyond scientific data representations, there is a family of "vis data structures" that need to be on the table. These include renderable stuff: images, deep images, explicit and implicit geometry, scene graph ordering semantics, scene specification semantics, etc. In addition, there is the issue of "performance data" and how it will be represented.
Must/Want:
Expression of material interface boundaries and other special treatment of boundary conditions.
I'll let someone else answer this one.
*
For commonly understood data types like structured and unstructured grids, please focus on any features that are commonly overlooked in typical implementations. For example, data centering is often overlooked in structured data representations in vis systems, and FEM researchers commonly criticize vis people for commingling geometry with topology in unstructured grid representations. Few data structures provide proper treatment of boundary conditions or material interfaces. Please describe your personal experience on these matters.
No comment.
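A small sketch of the centering point raised in the question, assuming a hypothetical `StructuredField` type that carries its centering along with its values. On a structured grid with n nodes along an axis there are n - 1 cells along that axis, so a cell-centered field has fewer values than a vertex-centered one, and a field that does not record which it is cannot be interpreted correctly.

```python
from enum import Enum
from math import prod


class Centering(Enum):
    VERTEX = "vertex"
    CELL = "cell"


def expected_length(node_dims, centering):
    """Number of values a field needs on a structured grid with node_dims nodes per axis."""
    if centering is Centering.VERTEX:
        return prod(node_dims)
    # n nodes along an axis bound n - 1 cells along that axis.
    return prod(n - 1 for n in node_dims)


class StructuredField:
    """Hypothetical field type: the centering travels with the values."""

    def __init__(self, node_dims, centering, values):
        n = expected_length(node_dims, centering)
        if len(values) != n:
            raise ValueError(
                f"{centering.value}-centered field on {tuple(node_dims)} nodes "
                f"needs {n} values, got {len(values)}"
            )
        self.node_dims = tuple(node_dims)
        self.centering = centering
        self.values = list(values)
```

On a 3 x 4 node grid, a vertex-centered field needs 12 values and a cell-centered one needs 6; handing 12 values to a cell-centered field is rejected instead of being silently misread.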
*
Please describe data representation requirements for novel data representations such as bioinformatics and terrestrial sensor datasets. In particular, how should we handle more abstract data that is typically given the moniker "information visualization"?
Maybe I don't understand the problem... the same tough issues that plague more familiar data models appear to be present in bioinformatics and "info viz" data mgt. There are hierarchical data, unstructured data, multivariate and multidimensional data, etc.
What do you consider the most elegant/comprehensive implementation for data representations that you believe could form the basis for a comprehensive visualization framework?
*
For instance, AVS uses entirely different data structures for structured, unstructured and geometry data. VTK uses class inheritance to express the similarities between related structures. EnSight treats unstructured data and geometry nearly interchangeably. OpenDX uses more vector-bundle-like constructs to provide a more unified view of disparate data structures. FM uses data accessors (essentially keeping the data structures opaque).
*
Are there any of the requirements above that are not covered by the structure you propose?
Not sure how to answer. The one thing that came to mind is a general observation that the above data models are designed for scientific data. The AVS geom data structure was opaque to the developer and, if you looked at the header files, really, really ugly. Since I have a keen interest in renderers, I am very concerned about having adequate flexibility and performance from a DS/DM for moving/representing renderable data, as opposed to large structured or unstructured meshes. It is possible to generalize a DS for storing renderable data (e.g., a scene graph), but making it a separate class of citizen reflects the partitioning of data types in AVS. Perhaps this isn't something to be concerned about at this point.
*
This should focus on the elegance/usefulness of the core design pattern employed by the implementation rather than a point-by-point description of the implementation!
*
Is there information or characteristics of particular file format standards that must percolate up into the specific implementation of the in-memory data structures?
One observation is what seems to be a successful design pattern from the DMF effort: let the HDF guys build the heavy-lifting machinery, and focus upon an abstraction layer that uses the machinery to move bytes.
For the purpose of this survey, "data analysis" is defined broadly as all non-visual data processing done *after* the simulation code has finished and *before* "visual analysis".
*
Is there a clear dividing line between "data analysis" and "visual analysis" requirements?
Generally speaking, there's not much difference. That said, some differences seem obvious to me:
1. Performance - visualization is most often an interactive process, but has offline implementations. "Plain old" data analysis seems to be mostly an offline activity with a few interactive implementations.
2. Scope - data analysis seems to be a subset of vis. Data analysis doesn't need as rich a DS/DM infrastructure as vis.
*
Can we (should we) incorporate data analysis functionality into this framework, or is it just focused on visual analysis?
Ideally, the same machinery could be used in both domains.
*
What kinds of data analysis typically need to be done in your field? Please give examples and describe how these functions are currently implemented.
*
How do we incorporate powerful data analysis functionality into the framework?
2) Execution Model=======================
It will be necessary for us to agree on common execution semantics for our components. Otherwise, we might have compatible data structures but incompatible execution requirements. Execution semantics play a role akin to that of a protocol in the network serialization of data structures. The motivating questions are as follows:
*
How is the execution model affected by the kinds of algorithms/system behaviors we want to implement?
The "simple" execution model is for the framework to invoke a component, be notified of its completion, then invoke the next component in the chain, and so on. Things get more interesting if you want a streaming processing model. Relatedly, progressive processing is somewhat akin to streaming, but more stateful.
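The two models in the answer above can be sketched in a few lines of Python. Components are modeled here as plain functions from a list to a list, and `run_simple` / `run_streaming` are hypothetical framework entry points, not taken from any existing system. In the simple model each component sees the whole dataset before the next one starts; in the streaming model each chunk travels through every stage before the next chunk is read, so the whole dataset never has to be resident at once if the chunks arrive lazily.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical component signature: takes a list of values, returns a list.
Component = Callable[[list], list]


def run_simple(components: List[Component], data: list) -> list:
    """Invoke each component on the whole dataset, waiting for it to finish
    before invoking the next one in the chain."""
    for component in components:
        data = component(data)
    return data


def run_streaming(components: List[Component], chunks: Iterable[list]) -> Iterator[list]:
    """Push each chunk through every stage before reading the next chunk."""
    for chunk in chunks:
        for component in components:
            chunk = component(chunk)
        yield chunk
```

For example, with `double = lambda xs: [2 * x for x in xs]` and `inc = lambda xs: [x + 1 for x in xs]`, `run_simple([double, inc], [1, 2, 3])` gives `[3, 5, 7]`, and `run_streaming` over the chunks `[[1], [2, 3]]` yields `[3]` and then `[5, 7]`.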
*
How then will a given execution model affect data structure implementations?
We're back to the issue of needing a DS/DM that supports multiresolution models from the git-go. The relationship between data analysis and vis data models becomes more apparent here when we start thinking about multires representations of unstructured data, like particle fields or point clouds.
*
How will the execution model be translated into execution semantics on the component level? For example, will we need to implement special control ports on our components to implement particular execution models, or will the semantics be implicit in the way we structure the method calls between components?
One thing that always SUPREMELY annoyed me about AVS was the absence of a "stop" button on the modules. The issue is being able to interrupt a module's processing when it is taking too long. Relatedly, it might be nice to have an execution model that uses the following paradigm: "OK, time's up, give me what you have now."
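A hedged sketch of the "time's up, give me what you have now" paradigm. `interruptible_mean` is a hypothetical module that works in chunks and, between chunks, checks both a user "stop" request (a `threading.Event`) and a time budget, then returns its partial result together with the fraction of the input it actually processed. Checking only between chunks is a deliberate simplification: the chunk size bounds how long a stop request can go unnoticed.

```python
import threading
import time


def interruptible_mean(values, chunk_size, budget_s, stop):
    """Hypothetical 'anytime' module: returns (estimate, fraction_processed)."""
    start = time.monotonic()
    total, count = 0.0, 0
    for i in range(0, len(values), chunk_size):
        # Between chunks, honor a stop request or an exhausted time budget.
        if stop.is_set() or time.monotonic() - start > budget_s:
            break
        chunk = values[i:i + chunk_size]
        total += sum(chunk)
        count += len(chunk)
    # Partial answer: mean of what was processed so far (None if nothing was).
    estimate = total / count if count else None
    fraction = count / len(values) if values else 1.0
    return estimate, fraction


if __name__ == "__main__":
    stop = threading.Event()  # a GUI "stop" button would call stop.set() from another thread
    print(interruptible_mean(list(range(1_000_000)), 10_000, 0.05, stop))
```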
What kinds of execution models should be supported by the distributed visualization architecture?
*
View-dependent algorithms? (These were typically quite difficult to implement in dataflow visualization environments like AVS5.)
*
Out-of-core algorithms
*
Progressive update and hierarchical/multiresolution algorithms?
All of the above. Go team!
*
Procedural execution from a single thread of control (i.e., using a command-line language like IDL to interactively control a dynamic or large parallel back-end)
Historically, this approach has proven to be very useful.
*
Dataflow execution models? What is the firing method that should be employed for a dataflow pipeline? Do you need a central executive like AVS/OpenDX, a completely distributed firing mechanism like that of VTK, or some sort of abstraction that allows the modules to be used with either executive paradigm?
I get stuck thinking about the UI for this kind of thing rather than the actual implementation. I'll defer to others for opinions.
*
Support for novel data layouts like space-filling curves?
*
Are there special considerations for collaborative applications?
*
What else?
Control data, performance data, and framework response to and manipulation of such data.
How will the execution model affect our implementation of data structures?
*
How do you decompose a data structure such that it is amenable to streaming in small chunks?
*
How do you represent temporal dependencies in that model?
*
How do you minimize recomputation when regenerating data for view-dependent algorithms?
What are the execution semantics necessary to implement these execution models?
*
How does a component know when to compute new data? (What is the firing rule?)
To review, the old AVS model said that a module would be executed if any of its parameters changed, or if its input data changed. One thing that was annoying was that you had to explicitly disable the flow executive if you wanted to make changes to multiple parameters on a single module before allowing it to execute. This type of thing came up when using a module with a long execution time.
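The firing rule described above, plus one way around the multi-parameter annoyance, might look like the following sketch. `Module` and its `batch()` context manager are hypothetical: any parameter or input change marks the module dirty and fires it immediately, unless a batch is open, in which case it executes at most once when the outermost batch closes. This replaces disabling the global flow executive with a scope local to one module.

```python
from contextlib import contextmanager


class Module:
    """Hypothetical module with an AVS-style firing rule: execute when a
    parameter or the input changes, unless firing is currently suspended."""

    def __init__(self, compute):
        self.compute = compute      # compute(input_data, params) -> output
        self.params = {}
        self.input = None
        self.output = None
        self.executions = 0
        self._suspended = 0         # depth of nested batch() scopes
        self._dirty = False

    def _changed(self):
        self._dirty = True
        if not self._suspended:
            self._fire()

    def _fire(self):
        # Execute only if something changed since the last execution.
        if self._dirty:
            self.output = self.compute(self.input, dict(self.params))
            self.executions += 1
            self._dirty = False

    def set_param(self, name, value):
        self.params[name] = value
        self._changed()

    def set_input(self, data):
        self.input = data
        self._changed()

    @contextmanager
    def batch(self):
        # Defer firing until the outermost batch exits normally, then execute once.
        self._suspended += 1
        try:
            yield self
        finally:
            self._suspended -= 1
        if not self._suspended:
            self._fire()
```

Setting two parameters inside `with m.batch():` triggers one execution of a slow module instead of two.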
*
Does coordination of the component execution require a central executive, or can it be implemented using only rules that are local to a particular component?
Not sure what this means.
*
How elegantly can execution models be supported by the proposed execution semantics? Are there some things, like loops or back-propagation of information, that are difficult to implement using a particular execution semantics?
The dataflow thing doesn't lend itself well to things like view-dependent processing, where the module at the end of the chain (the renderer) sends view parameters back upstream, thereby causing the network to execute again, etc. The whole upstream data thing is a "wart on the ass of" AVS. (sorry)
How will security considerations affect the execution model?
3) Parallelism and load-balancing=================
Thus far, managing parallelism in visualization systems has been tedious and difficult at best. Part of this is a lack of powerful abstractions for managing data parallelism, load balancing and component control.
Please describe the kinds of parallel execution models that must be supported by a visualization component architecture.
*
data-parallel/dataflow pipelines?
It would be nice if the whole scatter/gather thing could be marshaled by the framework. That way, my SuperSlick[tm] renderer wouldn't contain a bunch of icky network code that manages multiple socket connections from an N-way parallel vis component. One interesting problem is how a persistent tool, like a renderer, will be notified of changes in data originating from external components. I want some infrastructure that makes it unnecessary for me to write custom code like this for each new project.
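A minimal sketch of the framework owning the scatter/gather, assuming hypothetical `split` and `scatter_gather` helpers. Threads from `concurrent.futures` stand in for the N ranks of a parallel vis component; a real framework would move the pieces over the network, but the point is the same: the renderer only supplies a `deliver` callback and never sees the N connections.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n):
    """Scatter: split a list into n nearly equal contiguous pieces."""
    q, r = divmod(len(data), n)
    pieces, start = [], 0
    for i in range(n):
        end = start + q + (1 if i < r else 0)
        pieces.append(data[start:end])
        start = end
    return pieces


def scatter_gather(data, n_workers, vis_component, deliver):
    # The framework, not the renderer, owns the fan-out and fan-in.
    # Threads stand in here for the N ranks of a parallel vis component.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(vis_component, split(data, n_workers)))
    # Gather: pool.map preserves piece order; hand the result over as one unit.
    deliver([item for part in partials for item in part])
```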
*
master/slave work-queues?
*
streaming update for management of pipeline parallelism?
*
chunking mechanisms where the number of chunks may be different from the number of CPUs employed to process those chunks?
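One common answer to the chunking question is a shared work queue, sketched here with Python's `queue` and `threading` modules. `process_chunks` is a hypothetical helper: it enqueues chunk indices, and each of `n_workers` workers keeps pulling the next index until the queue is empty, so the number of chunks is independent of the number of workers and faster workers simply take more chunks (a simple form of dynamic load balancing). Threads are again a stand-in for processes or remote CPUs.

```python
import queue
import threading


def process_chunks(chunks, n_workers, work):
    """Master/worker sketch: len(chunks) need not equal n_workers.

    Returns (results in chunk order, id of the worker that processed each chunk).
    """
    tasks = queue.Queue()
    for i in range(len(chunks)):
        tasks.put(i)
    results = [None] * len(chunks)
    taken_by = [None] * len(chunks)

    def worker(wid):
        while True:
            try:
                i = tasks.get_nowait()   # pull the next chunk, if any remain
            except queue.Empty:
                return
            results[i] = work(chunks[i])
            taken_by[i] = wid

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, taken_by
```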
*
How should one manage parallelism for interactive scripting languages that have a single thread of control? (E.g., I'm using a command-line language like IDL that interactively drives an arbitrarily large set of parallel resources. How can I make the parallel back-end available to a single-threaded interactive thread of control?)
Please describe your vision of what kinds of software support / programming design patterns are needed to better support parallelism and load balancing.
*
What programming model should be employed to express parallelism. (UPC, MPI, SMP/OpenMP, custom sockets?)
This discussion may follow the same path as the one about DS/DM for grids. The answer seems to be "one size doesn't fit all, but there is no 'superset' that makes everyone happy." That said, there is likely a set of common issues wrt execution and DS/DM that underlie parallel components regardless of implementation.
*
Can you give some examples of frameworks or design patterns that you consider very promising for support of parallelism and load balancing (e.g., PNNL Global Arrays or Sandia's Zoltan)?
http://www.cs.sandia.gov/Zoltan/
http://www.emsl.pnl.gov/docs/global/ga.html
Maybe we should include "remote resource management" in this thread. I'm thinking of the remote AVS module libraries. So not only is there the issue of launching parallel components and load balancing (not sure how this will play out), but also the issue of allowing a user to select, at run time, from among a set of resources. This problem becomes even more interesting when pipeline optimization starts to happen and components are migrated across resources.
*
Should we use novel software abstractions for expressing parallelism, or should the implementation of parallelism simply be an opaque property of the component? (I.e., should there be an abstract messaging layer or not?)
*
How does the NxM work fit into all of this? Is it sufficiently differentiated from Zoltan's capabilities?
===============End of Mandatory Section (the rest is voluntary)=============
4) Graphics and Rendering=================
What do you use for converting geometry and data into images (the rendering engine)? Please comment on any/all of the following.
*
Should we build modules around immediate-mode/streaming methods for rendering geometry like OpenGL, Chromium and DirectX, or should we move to higher-level representations for graphics offered by scene graphs? What are the pitfalls of building our component architecture around scene graphs?
As a scene graph proponent, I would say that you don't build component architectures around scene graphs. That concept doesn't make any sense to me. Instead, what you do is have DS/DM representations/encapsulations for the results of visualization. These are things like buckets-o-triangles, perhaps at multiple resolutions. You also provide the means to send renderer information to vis components so they can do view-dependent processing, or some other form of selective processing. Similarly, you don't emit the output of visualization components as glBegin()/glEnd() pairs, either.
Back to the scene graph issue - what you allow for is composition of streams of data into a renderer. Since view position information is supported as a first-class DS/DM citizen (right?), it becomes possible to compose a rendering session that is driven by an external source.
Nearly all renderers use scene graph concepts - resistance is futile! The weak spot in this discussion concerns streaming. Since scene graph systems presume some notion of static data, streaming poses some problems. They can be surmounted by adding some smarts to the rendering and the data streaming: send over some bounding box info to start with, then allow the streaming to happen at will. The renderer could then either skip rendering that tree branch until transmission is complete, or go ahead and render whatever is in there at the time. A middle ground could be achieved with progressive transmission, so long as there are "markers" that signal the completion of a finished chunk of data to be rendered.
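The three renderer policies described above (wait for completion, render whatever has arrived, or render up to the last completion marker) can be sketched as follows. The message kinds `bbox`, `data`, `mark`, and `done` and the `StreamingReceiver` class are assumptions made up for illustration, not a real protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    bbox: tuple                                  # sent first, before any geometry
    received: list = field(default_factory=list)
    committed: int = 0                           # triangles covered by the last marker
    complete: bool = False


class StreamingReceiver:
    """Hypothetical renderer-side receiver for streamed scene graph branches."""

    POLICIES = ("wait", "partial", "marked")

    def __init__(self, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.branches = {}

    def receive(self, kind, name, payload=None):
        if kind == "bbox":
            self.branches[name] = Branch(bbox=payload)
            return
        branch = self.branches[name]
        if kind == "data":
            branch.received.extend(payload)
        elif kind == "mark":                     # everything so far is a finished chunk
            branch.committed = len(branch.received)
        elif kind == "done":                     # whole branch has arrived
            branch.committed = len(branch.received)
            branch.complete = True
        else:
            raise ValueError(f"unknown message kind: {kind}")

    def drawable(self):
        """Map branch name -> number of triangles a frame would draw right now."""
        counts = {}
        for name, b in self.branches.items():
            if self.policy == "wait":
                if b.complete:
                    counts[name] = len(b.received)
            elif self.policy == "partial":
                counts[name] = len(b.received)
            else:  # "marked": only chunks closed by a marker
                counts[name] = b.committed
        return counts
```

Because the bounding box arrives first, the renderer can place or cull a branch before any of its geometry is available, whichever drawing policy it uses.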
Some people's "complaints" about scene graphs stem from bad designs and bad implementations. A "scene graph system" is supposed to be an infrastructure for storing scene data and rendering it. That ought to include support for image-based methods, even though at first blush it seems nonsensical to talk about buckets-o-triangles in the same breath as normal maps. All interactive rendering systems are fundamentally created equal in terms of intent and design; the implementation varies. Among the top items on the "common" list are the need to store data, the need to specify a viewpoint, and the need to propagate transformation information. Beyond that, it's merely an implementation issue.
I caution against spending too much time worrying about how scene graphs fit into DiVA because the issue is largely a red herring.
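The three items on the "common" list (store data, specify a viewpoint, propagate transformations) fit in a very small sketch. The `Node` class and `traverse` function are hypothetical, and transforms are restricted to translations to keep it short; a real scene graph would compose 4x4 matrices the same way, parent before child.

```python
from dataclasses import dataclass, field

Vec = tuple  # (x, y, z)


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


@dataclass
class Node:
    name: str
    translation: Vec = (0.0, 0.0, 0.0)          # local transform (translation only, for brevity)
    points: list = field(default_factory=list)  # stored scene data, in local coordinates
    children: list = field(default_factory=list)


def traverse(node, camera_pos, parent=(0.0, 0.0, 0.0), out=None):
    """Propagate transforms down the graph and express every point relative to the viewpoint."""
    out = {} if out is None else out
    world = add(parent, node.translation)        # accumulate parent -> child transform
    out[node.name] = [sub(add(world, p), camera_pos) for p in node.points]
    for child in node.children:
        traverse(child, camera_pos, world, out)
    return out
```

Moving the camera changes only the `camera_pos` argument; the stored data and the per-node transforms stay untouched.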
*
What about PostScript, PDF and other scale-free output methods for publication-quality graphics? Are pixmaps sufficient?
Gotta have vector graphics.
In a distributed environment, we need to create a rendering subsystem that can flexibly switch between drawing to a client application by sending images, sending geometry, or sending geometry fragments (image-based rendering). How do we do that?
Again, one size doesn't fit all. These seem to be logically different components.
*
Please describe some rendering models that you would like to see supported (e.g., view-dependent update, progressive update) and how they would adjust dynamically to changing objective functions (optimize for fastest frame rate, fastest update on geometry change, or varying workloads and resource constraints).
The scene graph treatise (above) covers most of what I have to say for now.
*
Are there any good examples of such a system?
I know of a couple of good scene graphs that can form the basis for renderers.
What is the role of non-polygonal methods for rendering (e.g., shaders)?
*
Are you using any of the latest gaming features of commodity cards in your visualization systems today?
*
Do you see this changing in the future? (how?)
We've invited Ilmi Yoon to the next workshop. She represents the image-based rendering (IBR) community. I am very keen to see us take advantage of IBR techniques as well as our traditional polygon engines, perhaps combining them in interesting ways to realize powerful new systems.
5) Presentation=========================
It will be necessary to separate the visualization back-end from the presentation interface. For instance, you may want to have the same back-end driven by entirely different control panels/GUIs and displayed on different display devices (a CAVE vs. a desktop machine). Such separation is also useful when you want to provide different implementations of the user interface depending on the targeted user community. For instance, visualization experts might want a dataflow-like interface for composing visualization workflows, whereas a scientist might want a domain-specific, dashboard-like interface that implements a specific workflow. Both users should be able to share the same back-end components and implementation even though the user interfaces differ considerably.
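A minimal sketch of the back-end/presentation split, with all class names, parameters, and the preset value hypothetical. The back-end exposes parameters and an `execute()` call and knows nothing about widgets or displays; an expert front-end exposes every parameter, while a dashboard front-end hard-codes one workflow. Both drive the same back-end object.

```python
class VisBackend:
    """Hypothetical back-end: knows nothing about widgets or display devices."""

    def __init__(self):
        self.params = {"isovalue": 0.0, "colormap": "gray"}

    def describe(self):
        # Generic parameter description any front-end can turn into controls.
        return dict(self.params)

    def set(self, name, value):
        if name not in self.params:
            raise KeyError(name)
        self.params[name] = value

    def execute(self):
        return f"isosurface at {self.params['isovalue']} with {self.params['colormap']} colormap"


class ExpertFrontEnd:
    """Exposes every back-end parameter, as a dataflow-style editor might."""

    def __init__(self, backend):
        self.backend = backend

    def edit(self, **changes):
        for name, value in changes.items():
            self.backend.set(name, value)
        return self.backend.execute()


class DashboardFrontEnd:
    """Domain-specific panel implementing one workflow (preset values are hypothetical)."""

    PRESETS = {"flame front": 1500.0}

    def __init__(self, backend):
        self.backend = backend

    def show(self, feature):
        self.backend.set("isovalue", self.PRESETS[feature])
        self.backend.set("colormap", "hot")
        return self.backend.execute()
```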