1) Data Structures/Representations/Management==================


There are two potentially disparate motivations for defining the data representation requirements.  In the coarse-grained case, we need to define standards for exchanging data between components in this framework (interoperability).  In the fine-grained case, we want to define some canonical data structures that can be used within a component -- one developed specifically for this framework.  These two use cases may drive different sets of requirements and implementation issues.

         * Do you feel both of these use cases are equally important or should we focus exclusively on one or the other?


Both are important. The strongest case, IMO, for the intra-component DS/DM
is that it gives me a stable set of data modeling/mgt tools that I can use for
families of components. Having a solid DS/DM base will free me to focus
on vis and rendering algorithms, which is how I want to spend my time.


The strongest case for the inter-component DS/DM is the "strong typing"
property that makes AVS and apps of its ilk work so well.


The "elephant in the living room" is that there is no silver bullet.
I favor an approach that is, by design, incremental. What I mean is that
we can deal with structured grids, unstructured grids, geom and other
renderable data, etc. in a more or less piecemeal fashion, with an eye
towards component-level interoperability in the long term. In the beginning,
there won't be the 100% interoperability we would get if, for example, all
data models and types were stuffed into a vector bundles interface. OTOH, a more
conciliatory approach will permit forward progress among multiple
independent groups who are all eyeing "interoperability". This is the
real goal, not a "single true data model."


         * Do you feel the requirements for each of these use-cases are aligned or will they involve two separate development tracks?  For instance, using "accessors" (method calls that provide abstract access to essentially opaque data structures) will likely work fine for the coarse-grained data exchanges between components, but will lead to inefficiencies if used to implement algorithms within a particular component.


They are aligned to a large degree - data structures/models are produced and
consumed by component code, but may also be manipulated (serialized,
marshalled, etc.) by the framework.
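To make the accessor trade-off in the question concrete, here is a minimal sketch, assuming a hypothetical `ScalarField` interface (none of these names come from any existing framework). The framework and other components see only the opaque accessor methods, while code that knows the concrete layout iterates the underlying storage directly. Python hides most of the per-call overhead difference you would see in C/C++; the point is the structure, not the timing.

```python
from abc import ABC, abstractmethod
from typing import List


class ScalarField(ABC):
    """Opaque, accessor-based view: what the framework exchanges between components."""

    @abstractmethod
    def size(self) -> int: ...

    @abstractmethod
    def value_at(self, i: int) -> float: ...


class ArrayScalarField(ScalarField):
    """Concrete layout that also exposes its storage to in-component code."""

    def __init__(self, values: List[float]) -> None:
        self.values = list(values)  # contiguous storage, visible to fine-grained code

    def size(self) -> int:
        return len(self.values)

    def value_at(self, i: int) -> float:
        return self.values[i]


def mean_via_accessors(field: ScalarField) -> float:
    # Works for any ScalarField, at the cost of one method call per element.
    n = field.size()
    return sum(field.value_at(i) for i in range(n)) / n


def mean_direct(field: ArrayScalarField) -> float:
    # Fast path inside a component: iterate the underlying storage directly.
    return sum(field.values) / len(field.values)


field = ArrayScalarField([1.0, 2.0, 3.0, 4.0])
print(mean_via_accessors(field), mean_direct(field))  # prints: 2.5 2.5
```

Both paths share one data structure, which is one reading of "aligned to a large degree": the same object serves coarse-grained exchange and fine-grained algorithms.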


         * As you answer the "implementation and requirements" questions below, please try to identify where coarse-grained and fine-grained use cases will affect the implementation requirements.


What are the requirements for the data representations that must be supported by a common infrastructure?  We will start by answering Pat's questions about representation requirements and follow up with personal experiences involving particular domain scientists' requirements.

         Must: support for structured data



         Must/Want: support for multi-block data?

Must. We must set targets that meet our needs, and not sacrifice
requirements for speed of implementation.


         Must/Want: support for various unstructured data representations? (which ones?)


Must. Unstructured data reps are widely used and they should not be
excluded from the base set of DS/DM technologies.


         Must/Want: support for adaptive grid standards?  Please be specific about which adaptive grid methods you are referring to.  Restricted block-structured AMR (aligned grids), general block-structured AMR (rotated grids), hierarchical unstructured AMR, or non-hierarchical adaptive structured/unstructured meshes.


Want, badly. We could start with Berger-Colella AMR since it is widely
used. I'm not crazy about BoxLib, though, and hope we can do something
that is easier to use.
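For readers less familiar with Berger-Colella AMR, a minimal sketch of its core data structure, assuming a 2-D hierarchy: a stack of levels, each a list of rectangular patches (boxes) in that level's own index space, with a fixed refinement ratio to the next coarser level and proper nesting. This is not BoxLib's API; `Box`, `Level` and `finest_level_at` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box of cells in a level's own index space (inclusive bounds)."""
    lo: Tuple[int, int]
    hi: Tuple[int, int]

    def contains(self, cell: Tuple[int, int]) -> bool:
        return all(l <= c <= h for l, c, h in zip(self.lo, cell, self.hi))


@dataclass
class Level:
    ratio_to_coarser: int  # refinement ratio vs. the next coarser level (1 for level 0)
    boxes: List[Box] = field(default_factory=list)


def finest_level_at(levels: List[Level], cell0: Tuple[int, int]) -> int:
    """Index of the finest level covering a level-0 cell.

    Simplification: a cell is classified by its lower corner only.
    """
    finest = 0
    scale = 1
    for index, level in enumerate(levels):
        scale *= level.ratio_to_coarser
        # Map the level-0 cell's lower corner into this level's index space.
        cell = (cell0[0] * scale, cell0[1] * scale)
        if any(box.contains(cell) for box in level.boxes):
            finest = index
        else:
            break  # proper nesting: finer levels cannot cover what this level misses
    return finest


levels = [
    Level(1, [Box((0, 0), (7, 7))]),      # level 0: the whole 8x8 domain
    Level(2, [Box((4, 4), (9, 9))]),      # level 1: refined patch, ratio 2
    Level(2, [Box((10, 10), (15, 15))]),  # level 2: nested inside the level-1 patch
]
```

A "BoxLib-but-easier" layer could start from something this small and grow ghost cells, per-patch field storage and regridding on top.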


         Must/Want: "vertex-centered" data, "cell-centered" data? other-centered?

Don't care - will let someone else answer this.


         Must: support time-varying data, sequenced, streamed data?


Not ready for prime time. I've read two or three research proposals in
the past year that focus on methods for time-varying data representations
and manipulation; IMO, the topic is not mature yet. We can say that it
would be nice to have, but we will probably not be prepared to start
whacking out code.


         Must/Want: higher-order elements?


Not sure what this means, exactly, so I'll improvise. Beyond scientific
data representations, there is a family of "vis data structures" that need
to be on the table. These include renderable stuff - images, deep images,
explicit and implicit geometry, scene graph ordering semantics, scene
specification semantics, etc. In addition, there is the issue of
"performance data" and how it will be represented.


         Must/Want: Expression of material interface boundaries and other special-treatment of boundary conditions.


I'll let someone else answer this one.


         * For commonly understood datatypes like structured and unstructured, please focus on any features that are commonly overlooked in typical implementations.  For example, often data-centering is overlooked in structured data representations in vis systems and FEM researchers commonly criticize vis people for co-mingling geometry with topology for unstructured grid representations.  Few datastructures provide proper treatment of boundary conditions or material interfaces.  Please describe your personal experience on these matters.


No comment.


         * Please describe data representation requirements for novel data representations such as bioinformatics and terrestrial sensor datasets.  In particular, how should we handle more abstract data that is typically given the moniker "information visualization".


Maybe I don't understand the problem, but the same tough issues that plague
more familiar data models appear to be present in bioinformatics and
"info viz" data mgt. There are hierarchical data, unstructured data,
multivariate and multidimensional data, etc.



What do you consider the most elegant/comprehensive implementation for data representations that you believe could form the basis for a comprehensive visualization framework?

         * For instance, AVS uses entirely different data structures for structured, unstructured and geometry data.  VTK uses class inheritance to express the similarities between related structures.  Ensight treats unstructured data and geometry nearly interchangeably.  OpenDX uses more vector-bundle-like constructs to provide a more unified view of disparate data structures.  FM uses data accessors (essentially keeping the data structures opaque).

         * Are there any of the requirements above that are not covered by the structure you propose?


Not sure how to answer. The one thing that came to mind is a general observation
that the above data models are designed for scientific data. The AVS geom
data structure was opaque to the developer, and if you looked at the header
files, it was really, really ugly. Since I have a keen interest in renderers,
I am very concerned about having adequate flexibility and performance from
a DS/DM for moving/representing renderable data, as opposed to large
structured or unstructured meshes. It is possible to generalize a
DS for storing renderable data (e.g., a scene graph), but treating it as a
separate class of citizen reproduces the partitioning of data types in AVS.
Perhaps this isn't something to be concerned about at this point.


         * This should focus on the elegance/usefulness of the core design pattern employed by the implementation rather than a point-by-point description of the implementation!

         * Is there information or characteristics of particular file format standards that must percolate up into the specific implementation of the in-memory data structures?


One observation is what seems to be a successful design pattern from the
DMF effort: let the HDF guys build the heavy-lifting machinery, and focus
upon an abstraction layer that uses the machinery to move bytes.
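The two-layer pattern can be sketched like this, under the assumption that the "heavy lifting" layer only needs to store and fetch named byte blobs. `StorageBackend`, `MemoryBackend` and `FieldStore` are hypothetical names; a real backend would call into HDF5 (its C library or a binding), which is deliberately not reproduced here, so an in-memory stand-in keeps the sketch runnable.

```python
from abc import ABC, abstractmethod
from array import array
from typing import Dict, Iterable, List


class StorageBackend(ABC):
    """The heavy-lifting layer; a real implementation might wrap HDF5."""

    @abstractmethod
    def write_bytes(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def read_bytes(self, name: str) -> bytes: ...


class MemoryBackend(StorageBackend):
    """Stand-in backend so the sketch runs without any I/O library."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def write_bytes(self, name: str, data: bytes) -> None:
        self._store[name] = bytes(data)

    def read_bytes(self, name: str) -> bytes:
        return self._store[name]


class FieldStore:
    """Abstraction layer: speaks in vis terms (named float fields), moves bytes via a backend."""

    def __init__(self, backend: StorageBackend) -> None:
        self.backend = backend

    def save_field(self, name: str, values: Iterable[float]) -> None:
        # Pack as native doubles; the backend never needs to know the element type.
        self.backend.write_bytes(name, array("d", values).tobytes())

    def load_field(self, name: str) -> List[float]:
        out = array("d")
        out.frombytes(self.backend.read_bytes(name))
        return out.tolist()


store = FieldStore(MemoryBackend())
store.save_field("pressure", [1.0, 2.5, 4.0])
print(store.load_field("pressure"))  # prints: [1.0, 2.5, 4.0]
```

Swapping the backend leaves component code untouched, which is the main appeal of the DMF-style split.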


For the purpose of this survey, "data analysis" is defined broadly as all non-visual data processing done *after* the simulation code has finished and *before* "visual analysis".

         * Is there a clear dividing line between "data analysis" and "visual analysis" requirements?


Generally speaking, there's not much difference.


That said, some differences seem obvious to me:


1. Performance - visualization is most often an interactive process, but
has offline implementations. "Plain old" data analysis seems to be mostly
an offline activity with a few interactive implementations.


2. Scope - data analysis seems to be a subset of vis. Data analysis doesn't
need as rich a DS/DM infrastructure as vis.


         * Can we (should we) incorporate data analysis functionality into this framework, or is it just focused on visual analysis?


Ideally, the same machinery could be used in both domains.


         * What kinds of data analysis typically need to be done in your field?  Please give examples and describe how these functions are currently implemented.

         * How do we incorporate powerful data analysis functionality into the framework?


2) Execution Model=======================

It will be necessary for us to agree on common execution semantics for our components.  Otherwise, we might have compatible data structures but incompatible execution requirements.  Execution semantics are akin to the function of a protocol in the context of network serialization of data structures.  The motivating questions are as follows:

         * How is the execution model affected by the kinds of algorithms/system behaviors we want to implement?


The "simple" execution model is for the framework to invoke a component, be
notified of its completion, then invoke the next component in the chain, etc.
Things get more interesting if you want to have a streaming processing model.
Related, progressive processing is somewhat akin to streaming, but more



         * How then will a given execution model affect data structure implementations?


We're back to the issue of needing a DS/DM that supports multiresolution
models from the git-go. The relationship between data analysis and vis data
models becomes more apparent here when we start thinking about multires
representations of unstructured data, like particle fields or point clouds.
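As one concrete, deliberately simple example of a multires representation of unstructured data, here is a sketch that builds level-of-detail versions of a point cloud by grid decimation: each coarser level keeps one representative point per grid cell, and the cell size doubles per level. The technique is a common, simple choice picked for illustration, not a recommendation, and `build_lod` is a hypothetical name. A streaming consumer could receive the coarsest level first and refine from there.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def build_lod(points: List[Point], base_cell: float, levels: int) -> List[List[Point]]:
    """Return LOD levels from finest (index 0, all points) to coarsest.

    Each coarser level keeps the first point that falls into each grid cell
    of the previous level, doubling the cell size at every step.
    """
    lods = [list(points)]
    cell = base_cell
    for _ in range(1, levels):
        reps: Dict[Tuple[int, ...], Point] = {}
        for p in lods[-1]:
            key = tuple(int(c // cell) for c in p)  # integer grid cell of the point
            reps.setdefault(key, p)  # first point seen represents the cell
        lods.append(list(reps.values()))
        cell *= 2.0
    return lods


pts = [(float(x), 0.0, 0.0) for x in range(8)]
lods = build_lod(pts, base_cell=2.0, levels=3)
print([len(level) for level in lods])  # prints: [8, 4, 2]
```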


         * How will the execution model be translated into execution semantics on the component level?  For example, will we need to implement special control ports on our components to implement particular execution models, or will the semantics be implicit in the way we structure the method calls between components?


One thing that always SUPREMELY annoyed me about AVS was the absence of a
"stop" button on the modules. This issue concerns being able to interrupt
a module's processing when it was taking too long. Related, it might be nice
to have an execution model that uses the following paradigm: "OK, time's up,
give me what you have now."
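A minimal sketch of both ideas ("stop" button and "time's up"), assuming the component's work can be split into chunks and that it cooperates by checking a stop flag and a time budget between chunks. `run_progressive` is a hypothetical name; the framework (or a UI button) would own the `threading.Event`. Note that this cooperative style cannot interrupt a chunk that is already running.

```python
import threading
import time
from typing import Callable, List, Optional


def run_progressive(
    chunks: List[int],
    process: Callable[[int], int],
    deadline_s: Optional[float] = None,
    stop: Optional[threading.Event] = None,
) -> List[int]:
    """Process chunks in order; return whatever is done on timeout or stop request."""
    start = time.monotonic()
    results: List[int] = []
    for chunk in chunks:
        # "Stop" button: the framework sets the event from another thread.
        if stop is not None and stop.is_set():
            break
        # "OK, time's up": give back the partial result.
        if deadline_s is not None and time.monotonic() - start >= deadline_s:
            break
        results.append(process(chunk))
    return results


print(run_progressive([1, 2, 3], lambda c: c * c))  # prints: [1, 4, 9]
```

Returning partial results, rather than raising, is what lets a downstream renderer draw "what you have now."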


What kinds of execution models should be supported by the distributed visualization architecture?

         * View dependent algorithms? (These were typically quite difficult to implement for dataflow visualization environments like AVS5).

         * Out-of-core algorithms

         * Progressive update and hierarchical/multiresolution algorithms?


All of the above. Go team!


         * Procedural execution from a single thread of control (i.e., using a command-line language like IDL to interactively control a dynamic or large parallel back-end)


Historically, this approach has proven to be very useful.


         * Dataflow execution models?  What is the firing method that should be employed for a dataflow pipeline?  Do you need a central executive like AVS/OpenDX or, completely distributed firing mechanism like that of VTK, or some sort of abstraction that allows the modules to be used with either executive paradigm?


I get stuck thinking about the UI for this kind of thing rather than
the actual implementation. I'll defer to others for opinions.


         * Support for novel data layouts like space-filling curves?

         * Are there special considerations for collaborative applications?

         * What else?


Control data, performance data and framework response to and manipulation
of such data.


How will the execution model affect our implementation of data structures?

         * how do you decompose a data structure such that it is amenable to streaming in small chunks?

         * how do you represent temporal dependencies in that model?

         * how do you minimize recomputation in order to regenerate data for view-dependent algorithms?


What are the execution semantics necessary to implement these execution models?

         * how does a component know when to compute new data? (what is the firing rule)


To review, the old AVS model said that a module would be executed if any
of its parameters changed, or if its input data changed. One annoyance
was that you had to explicitly disable the flow executive if you wanted
to make changes to multiple parameters on a single module before allowing
it to execute. This came up when using a module with a long execution time.
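One way to avoid globally disabling the flow executive is a module-local "batch" scope, sketched below under assumptions: the firing rule is the AVS-style "execute when a parameter changes," and firing is deferred while a batch is open. `Module`, `set_param` and `batch` are hypothetical names, not AVS API; input-data changes would mark the module dirty the same way parameter changes do.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


class Module:
    """Fires its compute function whenever a parameter changes, unless held in a batch."""

    def __init__(self, compute: Callable[[Dict[str, Any]], Any]) -> None:
        self.compute = compute
        self.params: Dict[str, Any] = {}
        self.result: Any = None
        self.executions = 0
        self._hold = 0        # depth of open batch() scopes
        self._dirty = False   # a change happened since the last execution

    def set_param(self, name: str, value: Any) -> None:
        self.params[name] = value
        self._dirty = True
        self._maybe_fire()

    @contextmanager
    def batch(self) -> Iterator["Module"]:
        """Defer firing until every edit inside the block is done."""
        self._hold += 1
        try:
            yield self
        finally:
            self._hold -= 1
            self._maybe_fire()

    def _maybe_fire(self) -> None:
        if self._hold == 0 and self._dirty:
            self._dirty = False
            self.executions += 1
            self.result = self.compute(dict(self.params))


iso = Module(lambda p: sum(p.values()))
with iso.batch():
    iso.set_param("level", 10)
    iso.set_param("smoothing", 20)
print(iso.executions, iso.result)  # prints: 1 30
```

Because the hold is a counter, nested batches work, and an exception inside the block still releases the hold.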


         * does coordination of the component execution require a central executive or can it be implemented using only rules that are local to a particular component.


Not sure what this means.


         * how elegantly can execution models be supported by the proposed execution semantics?  Are there some things, like loops or back-propagation of information that are difficult to implement using a particular execution semantics?


The dataflow thing doesn't lend itself well to things like view-dependent
processing, where the module at the end of the chain (the renderer) sends view
parameters back upstream, thereby causing the network to execute again, etc.
The whole upstream data thing is a "wart on the ass of" AVS. (sorry)


How will security considerations affect the execution model?


3) Parallelism and load-balancing=================

Thus far, managing parallelism in visualization systems has been tedious and difficult at best.  Part of the problem is a lack of powerful abstractions for managing data parallelism, load balancing and component control.


Please describe the kinds of parallel execution models that must be supported by a visualization component architecture.

         * data-parallel/dataflow pipelines?


It would be nice if the whole scatter/gather thing could be marshaled
by the framework. That way, my SuperSlick[tm] renderer wouldn't contain
a bunch of icky network code that manages multiple socket connections
from an N-way parallel vis component. One interesting problem is how a
persistent tool, like a renderer, will be notified of changes in data
originating from external components. I want some infrastructure that
makes it unnecessary for me to write custom code like this for each
new project.
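A minimal sketch of framework-side scatter/gather, assuming threads stand in for the N-way parallel component's processes and that the renderer only ever sees one merged, ordered result. `scatter_gather`, `renderer_receive` and `tessellate` are hypothetical names. As a side effect it also shows the number of chunks (3) differing from the number of workers (2), as asked in the question below.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def scatter_gather(pieces: Sequence[T], worker: Callable[[T], R], n_workers: int) -> List[R]:
    """Run `worker` on each piece in parallel; return results in piece order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(worker, pieces))


def renderer_receive(parts: List[List[str]]) -> List[str]:
    """The renderer sees one merged stream, with no per-worker connections."""
    merged: List[str] = []
    for part in parts:
        merged.extend(part)
    return merged


def tessellate(block_id: int) -> List[str]:
    # Stand-in for a parallel vis component producing triangles for one data block.
    return [f"tri{block_id}.{k}" for k in range(2)]


triangles = renderer_receive(scatter_gather(range(3), tessellate, n_workers=2))
print(triangles)
```

In a real distributed setting the pool would be replaced by whatever transport the framework uses; the point is that the connection handling lives in the framework, not in the renderer.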


         * master/slave work-queues?

         * streaming update for management of pipeline parallelism?

         * chunking mechanisms where the number of chunks may be different from the number of CPU's employed to process those chunks?

         * how should one manage parallelism for interactive scripting languages that have a single thread of control?  (e.g., I'm using a command-line language like IDL that interactively drives an arbitrarily large set of parallel resources.  How can I make the parallel back-end available to a single-threaded interactive thread of control?)


Please describe your vision of what kinds of software support / programming design patterns are needed to better support parallelism and load balancing.

         * What programming model should be employed to express parallelism.  (UPC, MPI, SMP/OpenMP, custom sockets?)


This discussion may follow the same path as the one about DS/DM for grids.
The answer seems to be "one size doesn't fit all, but there is no 'superset'
that makes everyone happy." That said, there is likely a set of common issues
wrt execution and DS/DM that underlie parallel components regardless of

         * Can you give some examples of frameworks or design patterns that you consider very promising for support of parallelism and load balancing.  (ie. PNNL Global Arrays or Sandia's Zoltan)




Maybe we should include "remote resource management" in this thread. I'm
thinking of the remote AVS module libraries. So, not only is there the issue
of launching parallel components and load balancing (not sure how this will
play out), but also one of allowing a user to select, at run time, from
among a set of resources.


This problem becomes even more interesting when the pipeline optimization
starts to happen, and components are migrated across resources.


         * Should we use novel software abstractions for expressing parallelism or should the implementation of parallelism simply be an opaque property of the component? (ie. should there be an abstract messaging layer or not)

         * How does the NxM work fit in to all of this?  Is it sufficiently differentiated from Zoltan's capabilities?



===============End of Mandatory Section (the rest is voluntary)=============


4) Graphics and Rendering=================

What do you use for converting geometry and data into images (the rendering-engine).  Please comment on any/all of the following.

         * Should we build modules around declarative/streaming methods for rendering geometry like OpenGL, Chromium and DirectX or should we move to higher-level representations for graphics offered by scene graphs?  What are the pitfalls of building our component architecture around scene graphs?


As a scene graph proponent, I would say that you don't build component
architectures around scene graphs. That concept doesn't make any sense to me.
Instead, what you do is have DS/DM representations/encapsulations for the
results of visualization. These are things like buckets-o-triangles, perhaps
at multiple resolutions. You also provide the means to send renderer information
to vis components to do view-dependent processing, or some other form of
selective processing.


Similarly, you don't emit the output of visualization components as
glBegin()/glEnd() pairs, either.


Back to the scene graph issue - what you allow for is composition of streams
of data into a renderer. Since view position information is supported as a
first class DS/DM citizen (right?) it becomes possible to compose a
rendering session that is driven by an external source.


Nearly all renderers use scene graph concepts - resistance is futile! The
weak spot in this discussion concerns streaming. Since scene graph systems
presume some notion of static data, the streaming notion poses some problems.
They can be surmounted by adding some smarts to the rendering and the
data streaming - send over some bounding box info to start with, then allow
the streaming to happen at will. The renderer could then either not render
that tree branch until transmission is complete, or go ahead and
render whatever is in there at the time. Middle ground could be achieved
with progressive transmission, so long as there are "markers" that signal
the completion of a finished chunk of data to be rendered.
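The three renderer policies described above ("wait," "render whatever is there," and progressive with completion markers) can be sketched as follows. Assumptions: the bounding box arrives first, geometry arrives as "data" messages, and "chunk_done"/"end" messages act as the markers; integer IDs stand in for triangles. `StreamedNode`, `receive` and `drawable` are hypothetical names, not any scene graph library's API.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Bounds = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


@dataclass
class StreamedNode:
    """Scene-graph leaf whose contents arrive over a stream."""
    bounds: Bounds                                       # sent first, usable for culling/layout
    triangles: List[int] = field(default_factory=list)  # IDs stand in for real geometry
    committed: int = 0                                   # triangles covered by the last marker
    finished: bool = False

    def receive(self, kind: str, payload: Sequence[int] = ()) -> None:
        if kind == "data":
            self.triangles.extend(payload)
        elif kind == "chunk_done":   # marker: everything so far forms complete chunks
            self.committed = len(self.triangles)
        elif kind == "end":          # marker: transmission complete
            self.committed = len(self.triangles)
            self.finished = True
        else:
            raise ValueError(f"unknown message kind: {kind}")


def drawable(node: StreamedNode, policy: str) -> List[int]:
    """What the renderer draws for this node under each policy."""
    if policy == "wait":
        return list(node.triangles) if node.finished else []
    if policy == "partial":
        return list(node.triangles)
    if policy == "progressive":
        return node.triangles[: node.committed]
    raise ValueError(f"unknown policy: {policy}")


node = StreamedNode(bounds=((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))
node.receive("data", [1, 2])
node.receive("chunk_done")
node.receive("data", [3])
print(drawable(node, "progressive"))  # prints: [1, 2]
```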


Some people's "complaints" about scene graphs stem from bad designs
and bad implementations. A "scene graph system" is supposed to be
an infrastructure for storing scene data and rendering. That ought to
include support for image-based methods, even though at first blush
it seems nonsensical to talk about buckets-o-triangles in the same
breath as normal maps. All interactive rendering systems are fundamentally
created equal in terms of intent & design; the implementation varies.
Among the top items in the "common" list are the need to store data, the
need to specify a viewpoint, and the need to propagate transformation
information. Beyond that, it's merely an implementation issue.


I caution against spending too much time worrying about how scene graphs
fit into DiVA because the issue is largely a red herring.


         * What about Postscript, PDF and other scale-free output methods for publication quality graphics?  Are pixmaps sufficient?


Gotta have vector graphics.


In a distributed environment, we need to create a rendering subsystem that can flexibly switch between drawing to a client application by sending images, sending geometry, or sending geometry fragments (image-based rendering).  How do we do that?


Again, one size doesn't fit all. These seem to be logically different components.


         * Please describe some rendering models that you would like to see supported (e.g., view-dependent update, progressive update) and how they would adjust dynamically to changing objective functions (optimize for fastest framerate, or fastest update on geometry change, or varying workloads and resource constraints).


The scene graph treatise (above) covers most of what I have to say for now.


         * Are there any good examples of such a system?


I know of a couple of good scene graphs that can form the basis for renderers.


What is the role of non-polygonal methods for rendering (ie. shaders)?

         * Are you using any of the latest gaming features of commodity cards in your visualization systems today?

         * Do you see this changing in the future? (how?)


We've invited Ilmi Yoon to the next workshop. She represents the IBR
community. I am very keen to see us take advantage of IBR techniques as well
as our traditional polygon engines, perhaps combining them in interesting
ways to realize powerful new systems.


5) Presentation=========================

It will be necessary to separate the visualization back-end from the presentation interface.  For instance, you may want to have the same back-end driven by entirely different control panels/GUIs and displayed on different display devices (a CAVE vs. a desktop machine).   Such separation is also useful when you want to provide different implementations of the user interface depending on the targeted user community.  For instance, visualization experts might desire a dataflow-like interface for composing visualization workflows, whereas a scientist might desire a domain-specific, dashboard-like interface that implements a specific workflow.  Both users should be able to share the same back-end components and implementation even though the user interfaces differ considerably.


How do different presentation devices affect the component model?

         * Do different display devices require completely different user interface paradigms?  If so, then we must define a clear separation between the GUI description and the components performing the back-end computations.  If not, then is there a common language to describe user interfaces that can be used across platforms?

         * Do different display modalities require completely different component/algorithm implementations for the back-end compute engine?  (what do we do about that??)


What presentation modalities do you feel are important, and which do you consider the most important?

         * Desktop graphics (native applications on Windows, on Macs)

Yes, most important, will never go away.


         * CAVEs, Immersadesks, and other VR devices

If it works on desktops, it will work in these environments.


         * Ultra-high-res/Tiled display devices?

Second to workstations. With evolution of Chromium, DMX and the nascent
PICA stuff, I would expect that desktop tools would port transparently
to these devices.


         * Graphics access via Virtual Machines like Java?

         * Web-based applications?

(I view these two as similar in many respects.) Always sounds nice, but I have
yet to see much fruit in this area. The potential importance/relevance is great.
The browser makes a nice UI engine, but I wouldn't trust it to do "real"


What abstractions do you think should be employed to separate the presentation interface from the back-end compute engine?

         * Should we be using CCA to define the communication between GUI and compute engine or should we be using software infrastructure that was designed specifically for that space? (ie. WSDL, OGSA, or CORBA?)

         * How do such control interfaces work with parallel applications?  Should the parallel application have a single process that manages the control interface and broadcasts to all nodes or should the control interface treat all application processes within a given component as peers?



6) Basic Deployment/Development Environment Issues============

One of the goals of the distributed visualization architecture is seamless operation on the Grid -- distributed/heterogeneous collections of machines.  However, it is quite difficult to realize such a vision without some consideration of deployment/portability issues.  This question also touches on issues related to the development environment and what kinds of development methods should be supported.


What languages do you use for core vis algorithms and frameworks?

         * for the numerically intensive parts of vis algorithms



         * for the glue that connects your vis algorithms together into an application?



         * How aggressively do you use language-specific features like C++ templates?

Beyond vanilla classes, not at all.


         * is Fortran important to you?  Is it important that a framework support it seamlessly?

No. Fortran can be wrapped inside something sane.


         * Do you see other languages becoming important for visualization (ie. Python, UPC, or even BASIC?)


What platforms are used for data analysis/visualization?

         * What do you and your target users depend on to display results? (ie. Windows, Linux, SGI, Sun etc..)


For rendering, OpenGL engines.


         * What kinds of presentation devices are employed (desktops, portables, handhelds, CAVEs, Access Grids, WebPages/Collaboratories) and what is their relative importance to active users.


Workstations are most important.


         * What is the relative importance of these various presentation methods from a research standpoint?

         * Do you see other up-and-coming visualization platforms in the future?


Tell us how you deal with the issue of versioning and library dependencies for software deployment.

         * For source code distributions, do you bundle builds of all related libraries with each software release (ie. bundle HDF5 and FLTK source with each release).


Oddly enough, I do bundling like this for some of my projects. I think people
appreciate it.


         * What methods are employed to support platform independent builds (cmake, imake, autoconf).  What are the benefits and problems with this approach.


I hate Imake, but used it extensively for a long time with LBL's AVS modules.
I think it still works. Nobody I know can figure out how autoconf works. I
personally tend to have different makefiles, particularly when doing code
that is supposed to build on Win32 as well as Unix/Linux systems.


         * For binaries, have you had issues with different versions of libraries (e.g., GLIBC problems on Linux and different JVM implementations/versions for Java)?  Can you tell us about any sophisticated packaging methods that address some of these problems (RPM need not apply)?


I tend to just do source, rather than binaries, to avoid this whole morass.
OTOH, as a consumer, I prefer RPMs so that I don't have to build it. I want
my ice toasted, please.


         * How do you handle multiplatform builds?


How do you (or would you) provide abstractions that hide the locality of various components of your visualization/data analysis application?

         * Does anyone have ample experience with CORBA, OGSA, DCOM, .NET, RPC?  Please comment on advantages/problems of these technologies.

         * Do web/grid services come into play here?



7) Collaboration ==========================

If you are interested in "collaborative applications" please define the term "collaborative".  Perhaps provide examples of collaborative application paradigms.


The term "collaboration" is one of the most overused, misused and abused
terms in the English language. There is a huge disconnect between what
many users want/need, and what seems to be an overemphasis upon collaborative
technologies. For this particular project, collaboration (ought to) mean:
being able to share software components, and some level of confidence that
"DiVA-compliant" components in fact do interoperate. For the sake of
discussion, let's call this type of collaboration "interoperability."


For the other forms of "collaboration," care must be taken to define what
they are, whether they are useful, etc. If you're talking about multiple
persons seeing the same interactive renderer output, and each person being
able to do some interactive transformation, let's call that form of
collaboration "MI" (multiperson-interactive).


I recall hearing some discussion about the relationship between the AG
and DiVA. From my perspective, the AG ought to provide support to allow
any application to run in an "MI mode." With this perspective,
there isn't really much to talk about in terms of fundamental DiVA
design wrt "MI."


Is collaboration a feature that exists at an application level or are there key requirements for collaborative applications that necessitate component-level support?

         * Should collaborative infrastructure be incorporated as a core feature of every component?


I don't know what "collaborative infrastructure" means. Given my position
(above), "MI" is more of a framework thing, and not a component thing.


         * Can any conceivable collaborative requirement be satisfied using a separate set of modules that specifically manage distribution of events and data in collaborative applications?


This seems to be the most realistic approach to "MI."


         * How is the collaborative application presented?  Does the application only need to be collaborative sometimes?

         * Where does performance come in to play?  Does the visualization system or underlying libraries need to be performance-aware?  (i.e. I'm doing a given task and I need a framerate of X for it to be useful using my current compute resources), network aware (i.e. the system is starving for data and must respond by adding an alternate stream or redeploying the pipeline).  Are these considerations implemented at the component level, framework level, or are they entirely out-of-scope for our consideration?


The MI-aware framework collects and uses performance data generated by
components to make decisions about how to tune/optimize visualization
pipeline performance (the pipeline consists of a bunch of components).
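A minimal sketch of that feedback loop, under assumptions not stated above: components report how long each resolution level took, the framework smooths those reports with an exponential moving average, and it then picks the finest level that fits a frame budget. `PerfMonitor` and `choose_level` are hypothetical names, and the moving average is just one plausible smoothing choice.

```python
from typing import Dict


class PerfMonitor:
    """Framework-side record of per-level timings reported by components."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # weight of the newest sample in the moving average
        self.estimates: Dict[int, float] = {}

    def record(self, level: int, elapsed_ms: float) -> None:
        old = self.estimates.get(level)
        if old is None:
            self.estimates[level] = elapsed_ms
        else:
            self.estimates[level] = self.alpha * elapsed_ms + (1.0 - self.alpha) * old


def choose_level(measured_ms: Dict[int, float], budget_ms: float) -> int:
    """Pick the finest level (0 = finest) whose estimated time fits the budget.

    Falls back to the coarsest measured level if nothing fits.
    """
    fitting = [lvl for lvl, t in measured_ms.items() if t <= budget_ms]
    return min(fitting) if fitting else max(measured_ms)


mon = PerfMonitor(alpha=0.5)
mon.record(0, 100.0)
mon.record(0, 60.0)   # smoothed estimate for level 0 becomes 80.0
mon.record(1, 30.0)
mon.record(2, 9.0)
print(choose_level(mon.estimates, budget_ms=33.0))  # prints: 1
```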


If some of the other issues I've raised are addressed (e.g., time-limited
execution, partial processing, incremental processing, etc.), then the
performance issues raised within the context of MI come "for free".