From: "John
Clyne" <clyne@ncar.ucar.edu>
Date: Fri Sep 5, 2003 3:40:00 PM US/Pacific
To: "John
Shalf" <jshalf@lbl.gov>, <diva@lbl.gov>
Subject: Re: DiVA Survey
(Please return by Sept 10!)
John,
I think I may have answered 25% of the questions below. I didn't answer
more because 1) my 3 1/2 hour flight didn't permit it, and 2) I think a
lot of the questions really get into implementation issues that should
not (cannot) be addressed until we have agreement on functional
requirements. They are excellent questions, and raise important points
to keep in mind, but I felt it was premature to try to answer them.
cheers - jc
1) Data Structures/Representations/Management==================
The center of every successful modular
visualization architecture has
been a flexible core set of data
structures for representing data that
is important to the targeted application
domain. Before we can begin
working on algorithms, we must come to
some agreement on common methods
(either data structures or
accessors/method calls) for
exchanging data
between components of our vis framework.
There are two potentially disparate motivations for defining the data
representation requirements. In the coarse-grained case, we need to
define standards for exchanging data between components in this
framework (interoperability). In the fine-grained case, we want to
define some canonical data structures that can be used within a
component -- one developed specifically for this framework. These two
use-cases may drive different sets of requirements and implementation
issues.
* Do you feel both of these use cases are
equally important or should
we focus exclusively on one or the other?
Too soon to tell. Focus on both until the
issues become more clear.
* Do you feel the requirements for each of
these use-cases are aligned
or will they involve two separate
development tracks? For instance,
using "accessors" (method calls
that provide abstract access to
essentially opaque data structures) will
likely work fine for the
coarse-grained data exchanges between
components, but will lead to
inefficiencies if used to implement
algorithms within a particular
component.
I think it's premature to say. We need to
have agreement on the
questions below first.
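That said, the "accessor" idea for the coarse-grained case is easy
enough to sketch. Something like the following (all names are
hypothetical and only illustrate the idea, not a proposal for the
actual interface):

# Minimal sketch of an accessor-style exchange: the downstream component
# never sees the concrete data structure, only an opaque handle it can
# query through method calls.
from abc import ABC, abstractmethod

class FieldAccessor(ABC):
    """Opaque view of a field owned by an upstream component."""

    @abstractmethod
    def dimensions(self):
        """Return grid dimensions, e.g. (nx, ny, nz)."""

    @abstractmethod
    def read_region(self, lo, hi):
        """Return a copy of the subdomain [lo, hi)."""

def consume(field):
    # A downstream component pulls only what it needs, when it needs it.
    nx, ny, nz = field.dimensions()
    slab = field.read_region((0, 0, 0), (nx, ny, 1))
    return slab

Something like that is probably fine between components; inside a
component's inner loops you'd want the raw arrays.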
* As you answer the "implementation
and requirements" questions below,
please try to identify where
coarse-grained and fine-grained use cases
will affect the implementation
requirements.
What are the requirements for the data representations that must be
supported by a common infrastructure? We will start by answering Pat's
questions about representation requirements and follow up with
personal experiences involving particular domain scientists'
requirements.
Must: support for structured data
Must.
Must/Want: support for multi-block data?
Must.
Must/Want: support for various
unstructured data representations?
(which ones?)
Not sure. Not a priority.
Must/Want: support for adaptive grid
standards? Please be specific
about which adaptive grid methods you are
referring to. Restricted
block-structured AMR (aligned grids),
general block-structured AMR
(rotated grids), hierarchical unstructured
AMR, or non-hierarchical
adaptive structured/unstructured meshes.
Adaptive grid usage is in its infancy at NCAR, but I suspect it is the
way of the future. Too soon to be specific about which adaptive grid
methods are preferred.
Must/Want: "vertex-centered"
data, "cell-centered" data?
other-centered?
Must: support time-varying data,
sequenced, streamed data?
Must. Time-varying data is what makes so many of our problems currently
intractable. Too many of the available tools (e.g. VTK) assume static
data and completely fall apart when the data is otherwise.
Must/Want: higher-order elements?
low priority
Must/Want: Expression of material
interface boundaries and other
special-treatment of boundary conditions.
no priority
* For commonly understood datatypes like
structured and unstructured,
please focus on any features that are
commonly overlooked in typical
implementations. For example, data-centering is often overlooked in
structured data representations in vis systems, and FEM researchers
commonly criticize vis people for commingling geometry with topology
in unstructured grid representations. Few data structures provide
proper treatment of boundary conditions or material interfaces. Please
describe your personal experience on these matters.
Support for missing data is essential for
observed fields.
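For what it's worth, numerical python's masked arrays give a feel for
the kind of missing-data support I mean. A toy example:

# Toy example: missing values in an observed field carried as a mask so
# that statistics (and downstream viz) use only the valid samples.
import numpy.ma as ma

obs = [273.1, -9999.0, 274.6, 275.2, -9999.0]   # -9999 marks missing
field = ma.masked_values(obs, -9999.0)

print(field.mean())    # mean over the three valid points only
print(field.count())   # number of non-missing samples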
* Please describe data representation
requirements for novel data
representations such as bioinformatics and
terrestrial sensor datasets.
In particular, how should we handle more abstract data that is
typically given the moniker
"information visualization".
Beats me.
What do you consider the most
elegant/comprehensive implementation for
data representations that you believe
could form the basis for a
comprehensive visualization framework?
* For instance, AVS uses entirely different data structures for
structured, unstructured and geometry data. VTK uses class inheritance
to express the similarities between related structures. Ensight treats
unstructured data and geometry nearly interchangeably. OpenDX uses more
vector-bundle-like constructs to provide a more unified view of
disparate data structures. FM uses data-accessors (essentially keeping
the data structures opaque).
I don't think this is what you're after, but I've come to believe that
multiresolution data representations with efficient domain subsetting
capabilities are the most pragmatic and elegant way to deal with large
data sets. In addition to enabling interaction with the largest data
sets, they offer tremendous scalability from desktop to "visual
supercomputer". I would encourage a data model that includes and
facilitates their integral support.
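To make that concrete, the access pattern I have in mind looks roughly
like the sketch below. It's a toy: the names and the 2x block-averaging
stand in for a real multiresolution encoding.

# Sketch of a multiresolution, region-subsetting read interface.
# A reader exposes the data at several refinement levels; a consumer
# asks for only the region and level it can afford.
import numpy as np

class MultiresReader:
    def __init__(self, full_res_field, num_levels=3):
        # Precompute coarsened copies by simple 2x block averaging.
        self.levels = [full_res_field]
        for _ in range(num_levels - 1):
            f = self.levels[-1]
            nz, ny, nx = (s // 2 * 2 for s in f.shape)
            f = f[:nz, :ny, :nx]
            coarse = f.reshape(nz // 2, 2, ny // 2, 2, nx // 2, 2).mean(axis=(1, 3, 5))
            self.levels.append(coarse)

    def read_region(self, level, lo, hi):
        """Return the subdomain [lo, hi) at the requested refinement level."""
        z0, y0, x0 = lo
        z1, y1, x1 = hi
        return self.levels[level][z0:z1, y0:y1, x0:x1]

# Desktop use: grab a coarse overview first, refine only where needed.
reader = MultiresReader(np.random.rand(64, 64, 64))
overview = reader.read_region(level=2, lo=(0, 0, 0), hi=(16, 16, 16))
detail   = reader.read_region(level=0, lo=(0, 0, 0), hi=(8, 8, 8))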
* Are there any of the requirements above
that are not covered by the
structure you propose?
Not sure.
* This should focus on the elegance/usefulness of the core
design-pattern employed by the implementation rather than a
point-by-point description of the implementation!
* Is there information or characteristics
of particular file format
standards that must percolate up into the
specific implementation of
the in-memory data structures?
For the purpose of this survey, "data
analysis" is defined broadly as
all non-visual data processing done
*after* the simulation code has
finished and *before* "visual
analysis".
I take issue with your definition of data analysis. Yes, it is performed
after the simulation, but it is performed (or would be performed if viz
tools didn't suck) in *parallel* with visual analysis. The two, when
well integrated (which is rarely the case), can complement each other
tremendously. So-called "visual analysis" by itself, without good
quantitative capability, is pretty useless.
* Is there a clear dividing line between
"data analysis" and "visual
analysis" requirements?
Well, text-based, programmable user interfaces are a must for "data
analysis", whereas a GUI is essential for visual analysis.
* Can we (should we) incorporate data analysis functionality into this
framework, or is it just focused on visual analysis?
If visualization is ever going to live up to the claim made by so many
in the viz community of it being an indispensable tool for analysis,
tight integration with statistical tools and data processing
capabilities is a must. Otherwise we'll just continue to make pretty
pictures, put on dog and pony shows, and wonder where the users are.
* What kinds of data analysis typically need to be done in your
field? Please give examples and how these functions are currently
implemented.
Pretty much everything you can do with IDL
or matlab.
* How do we incorporate powerful data
analysis functionality into the
framework?
I'd suggest leveraging existing tools, numerical python for example.
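e.g., a derived field plus a quick quantitative summary, the kind of
thing users do in IDL or matlab every day, comes nearly for free:

# Example of the sort of quantitative analysis users do routinely in
# IDL/matlab, expressed with numerical python (numpy).
import numpy as np

u = np.random.rand(128, 128)      # stand-ins for two velocity components
v = np.random.rand(128, 128)

speed = np.sqrt(u**2 + v**2)            # derived field
print(speed.mean(), speed.max())        # quick quantitative summary
print(np.histogram(speed, bins=10)[0])  # a distribution, not just a picture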
2) Execution Model=======================
It will be necessary for us to agree on a common execution semantics
for our components. Otherwise, we might have compatible data
structures but incompatible execution requirements. Execution
semantics is akin to the function of a protocol in the context of
network serialization of data structures. The motivating questions are
as follows:
* How is the execution model affected by the kinds of
algorithms/system-behaviors we want to implement?
* How then will a given execution model affect data structure
implementations?
* How will the execution model be translated into execution semantics
at the component level? For example, will we need to implement special
control-ports on our components to implement particular execution
models, or will the semantics be implicit in the way we structure the
method calls between components?
What kinds of execution models should be supported by the distributed
visualization architecture?
* View dependent algorithms? (These were
typically quite difficult to
implement for dataflow visualization
environments like AVS5).
These are neat research topics, but I've never been convinced that they
have much application beyond IEEEViz publications. Mostly I believe
this because of the complexity they impose on the data model. Better to
simply offer progressive/multiresolution data access.
* Out-of-core algorithms
Seems like a must for large data. But is
this a requirement or a design
issue?
* Progressive update and
hierarchical/multiresolution algorithms?
This is the way to go (IMHO); the question is at what level to support
it.
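A rough sketch of what supporting it at the pipeline level might look
like (function names are hypothetical):

# Sketch: progressive update driven at the pipeline level. The same
# filter chain is re-run at successively finer refinement levels and
# each pass hands a displayable result to the renderer.
def progressive_execute(reader, filters, render, num_levels):
    for level in range(num_levels - 1, -1, -1):   # coarsest level first
        data = reader.read_level(level)           # multiresolution read
        for f in filters:
            data = f(data)
        render(data)   # the user sees a coarse answer that keeps refining
        # a real implementation would check for user interaction here and
        # abandon the remaining levels if the view or parameters changed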
* Procedural execution from a single thread of control (ie. using a
commandline language like IDL to interactively control a dynamic or
large parallel back-end)
A must for data analysis and data manipulation (deriving new fields,
etc.)
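What I'd want from the framework is a thin proxy that the interactive
session holds, with the parallel back-end hidden behind it. A purely
hypothetical sketch:

# Sketch: a single-threaded interactive prompt drives a parallel back-end
# through a proxy object; the user never deals with the workers directly.
from multiprocessing import Pool

class ParallelField:
    """Proxy held by the interactive session; blocks are processed in a pool."""

    def __init__(self, blocks, nprocs=4):
        self.blocks = blocks          # list of subdomain arrays
        self.pool = Pool(nprocs)

    def derive(self, func):
        """Apply func to every block in parallel (func must be picklable)."""
        self.blocks = self.pool.map(func, self.blocks)
        return self

    def reduce_max(self):
        return max(b.max() for b in self.blocks)

# Interactive use might look like (load_blocks and compute_vorticity
# are made-up names):
#   field = ParallelField(load_blocks("u"))
#   field.derive(compute_vorticity).reduce_max()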
* Dataflow execution models? What is the firing method that should be
employed for a dataflow pipeline? Do you need a central executive like
AVS/OpenDX, a completely distributed firing mechanism like that of
VTK, or some sort of abstraction that allows the modules to be used
with either executive paradigm?
* Support for novel data layouts like
space-filling curves?
We use a wavelet-based approach similar to space-filling curves. Both
approaches have merit and both should be supportable by the framework.
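For flavor, the heart of the wavelet idea is nothing more than the toy
1-D Haar pass below (our real implementation is considerably more
involved):

# Toy 1-D Haar step: one pass splits a signal into a half-length coarse
# approximation plus the detail coefficients needed to reconstruct it.
# Storing data this way is what makes coarse, subsetted reads cheap.
import numpy as np

def haar_forward(x):
    x = np.asarray(x, dtype=float)
    avg  = (x[0::2] + x[1::2]) / 2.0   # coarse approximation
    diff = (x[0::2] - x[1::2]) / 2.0   # detail coefficients
    return avg, diff

def haar_inverse(avg, diff):
    out = np.empty(2 * len(avg))
    out[0::2] = avg + diff
    out[1::2] = avg - diff
    return out

signal = np.arange(8, dtype=float)
coarse, detail = haar_forward(signal)
assert np.allclose(haar_inverse(coarse, detail), signal)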
* Are there special considerations for
collaborative applications?
* What else?
How will the execution model affect our
implementation of data
structures?
* how do you decompose a data structure
such that it is amenable to
streaming in small chunks?
* how do you represent temporal
dependencies in that model?
* how do you minimize recomputation in order to regenerate data for
view-dependent algorithms?
What are the execution semantics necessary
to implement these execution
models?
* how does a component know when to
compute new data? (what is the
firing rule)
* does coordination of the component execution require a central
executive, or can it be implemented using only rules that are local to
a particular component?
* how elegantly can execution models be supported by the proposed
execution semantics? Are there some things, like loops or
back-propagation of information, that are difficult to implement using
a particular execution semantics?
How will security considerations affect
the execution model?
3) Parallelism and load-balancing=================
Thus far, managing parallelism in visualization systems has been
tedious and difficult at best. Part of this is a lack of powerful
abstractions for managing data-parallelism, load-balancing and
component control.
Please describe the kinds of parallel
execution models that must be
supported by a visualization component
architecture.
* data-parallel/dataflow pipelines?
* master/slave work-queues?
* streaming update for management of
pipeline parallelism?
* chunking mechanisms where the number of
chunks may be different from
the number of CPU's employed to process
those chunks?
* how should one manage parallelism for
interactive scripting
languages that have a single thread of
control? (eg. I'm using a
commandline language like IDL that
interactively drives an arbitrarily
large set of parallel resources. How can I make the parallel back-end
available to a single-threaded interactive
thread of control?)
Please describe your vision of what kinds
of software support /
programming design patterns are needed to
better support parallelism
and load balancing.
* What programming model should be employed to express parallelism?
(UPC, MPI, SMP/OpenMP, custom sockets?)
* Can you give some examples of frameworks
or design patterns that you
consider very promising for support of
parallelism and load balancing.
(ie. PNNL Global Arrays or Sandia's
Zoltan)
http://www.cs.sandia.gov/Zoltan/
http://www.emsl.pnl.gov/docs/global/ga.html
* Should we use novel software
abstractions for expressing parallelism
or should the implementation of
parallelism simply be an opaque
property of the component? (ie. should
there be an abstract messaging
layer or not)
* How does the NxM work fit in to all of
this? Is it sufficiently
differentiated from Zoltan's capabilities?
Hmm. These all seem to be implementation
issues. Too early to answer.
===============End of Mandatory Section (the rest is voluntary)=============
4) Graphics and Rendering=================
What do you use for converting geometry and data into images (the
rendering engine)? Please comment on any/all of the following.
* Should we build modules around
declarative/streaming methods for
rendering geometry like OpenGL, Chromium
and DirectX or should we move
to higher-level representations for
graphics offered by scene graphs?
What are the pitfalls of building our
component architecture around
scene graphs?
Not so good for time-varying data, last time I checked.
* What about Postscript, PDF and other
scale-free output methods for
publication quality graphics? Are pixmaps sufficient?
Well, what are we trying to provide: an environment for analysis, or a
way of producing images for publications? The latter can be done as a
post-process and should not, IMHO, be a focus of DiVA.
In a distributed environment, we need to
create a rendering subsystem
that can flexibly switch between drawing
to a client application by
sending images, sending geometry, or
sending geometry fragments
(image-based rendering)? How do we do that?
Use Cr (Chromium).
* Please describe some rendering models
that you would like to see
supported (ie. view-dependent update,
progressive update) and how they
would adjust dynamically to changing
objective functions (optimize for
fastest framerate, or fastest update on
geometry change, or varying
workloads and resource constraints).
Not sold on view-dependent update as worthwhile, but progressive updates
can be hugely helpful. The question is whether you accomplish this by
adding support in the renderer or push it back up the pipeline to the
raw data.
* Are there any good examples of such a
system?
Yes, Kitware's not-for-free volume renderer (volren?). It does a nice
job of handling progressive updates. This is mostly handled by the GUI
but places some obvious requirements on the underlying rendering/viz
component.
What is the role of non-polygonal methods
for rendering (ie. shaders)?
* Are you using any of the latest gaming
features of commodity cards
in your visualization systems today?
Yup, we've offloaded a couple of algorithms from the CPU.
* Do you see this changing in the future?
(how?)
The biggest issue is portability, but things
are looking up with OpenGL
2.0 efforts, etc.
5) Presentation=========================
It will be necessary to separate the visualization
back-end from the
presentation interface. For instance, you may want to have the
same
back-end driven by entirely different
control-panels/GUIs and displayed
in different display devices (a CAVE vs. a
desktop machine). Such
separation is also useful when you want to
provide different
implementations of the user-interface
depending on the targeted user
community. For instance, visualization experts might desire a
dataflow-like interface for composing visualization workflows, whereas
a scientist might desire a domain-specific, dashboard-like interface
that implements a specific workflow. Both users should be able to
share the same back-end components and
implementation even though the
user interface differs considerably.
How do different presentation devices
affect the component model?
* Do different display devices require
completely different user
interface paradigms? If so, then we must define a clear
separation
between the GUI description and the
components performing the back-end
computations. If not, then is there a common language to describe user
interfaces that can be used across
platforms?
* Do different display modalities require
completely different
component/algorithm implementations for
the back-end compute engine?
(what do we do about that??)
What presentation modalities do you feel are important, and which do
you consider the most important?
* Desktop graphics (native applications on
Windows, on Macs)
This is numero uno by a HUGE margin
* Graphics access via Virtual Machines like
Java?
Not important
* CAVEs, Immersadesks, and other VR
devices
Not important
* Ultra-high-res/Tiled display devices?
Moderately important
* Web-based applications?
Well, maybe.
What abstractions do you think should be
employed to separate the
presentation interface from the back-end
compute engine?
* Should we be using CCA to define the
communication between GUI and
compute engine or should we be using
software infrastructure that was
designed specifically for that space? (ie.
WSDL, OGSA, or CORBA?)
* How do such control interfaces work with
parallel applications?
Should the parallel application have a
single process that manages the
control interface and broadcasts to all
nodes or should the control
interface treat all application processes
within a given component as
peers?
6) Basic Deployment/Development Environment Issues============
One of the goals of the distributed
visualization architecture is
seamless operation on the Grid --
distributed/heterogeneous collections
of machines. However, it is quite difficult to realize such a vision
without some consideration of
deployment/portability issues.
This
question also touches on issues related to
the development environment
and what kinds of development methods
should be supported.
What languages do you use for core vis algorithms and frameworks?
* for the numerically intensive parts of
vis algorithms
C/C++
* for the glue that connects your vis
algorithms together into an
application?
C/C++, Tcl, Python
* How aggressively do you use language-specific
features like C++
templates?
Not at all. Too scary.
* is Fortran important to you? Is it important that a framework
support it seamlessly?
Nope.
* Do you see other languages becoming
important for visualization (ie.
Python, UPC, or even BASIC?)
Python, mostly because of the direction of numerical python.
What platforms are used for data
analysis/visualization?
* What do you and your target users depend
on to display results? (ie.
Windows, Linux, SGI, Sun etc..)
All of the above, primarily lintel and windoze though.
* What kinds of presentation devices are
employed (desktops,
portables, handhelds, CAVEs, Access Grids,
WebPages/Collaboratories)
and what is their relative importance to
active users.
desktops, tiled displays, AG
* What is the relative importance of these various presentation
methods from a research standpoint?
The desktop is where the users live.
* Do you see other up-and-coming
visualization platforms in the future?
I don't see SMP graphics boxes going away as
quickly as some might.
Tell us how you deal with the issue of
versioning and library
dependencies for software deployment.
* For source code distributions, do you
bundle builds of all related
libraries with each software release (ie.
bundle HDF5 and FLTK source
with each release).
Sometimes, depending on the stability of the
libraries.
* What methods are employed to support platform-independent builds
(cmake, imake, autoconf)? What are the benefits and problems with this
approach?
I've used all, developed my own, and like
none. Maybe we can do better.
I think something based around gmake might
have the best potential.
* For binaries, have you had issues with different versions of
libraries (ie. GLIBC problems on Linux and different JVM
implementations/versions for Java)? Can you tell us about any
sophisticated packaging methods that address some of these problems
(RPM need not apply).
* How do you handle multiplatform builds?
The brute force, not so smart way. The VTK
model is worth looking at.
How do you (or would you) provide
abstractions that hide the locality
of various components of your
visualization/data analysis application?
* Does anyone have ample experience with
CORBA, OGSA, DCOM, .NET, RPC?
Please comment on advantages/problems of these technologies.
* Do web/grid services come into play
here?
7) Collaboration==========================
If you are interested in "collaborative applications", please define
the term "collaborative". Perhaps provide examples of collaborative
application paradigms.
Is collaboration a feature that exists at
an application level or are
there key requirements for collaborative
applications that necessitate
component-level support?
* Should collaborative infrastructure be incorporated as a core
feature of every component?
Does it need to be incorporated in all components? What kind of collab
support is needed? Permitting session logging and geographically
separated, simultaneous users would go a long way toward providing for
collab needs, and would seem to impact only the GUI and perhaps the
renderer.
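A minimal notion of the session logging I have in mind (a sketch; the
names are made up):

# Sketch: session logging at the GUI/control level. Every user action is
# recorded as a timestamped event; a remote participant (or a later
# replay) applies the same event stream to its own back-end.
import json, time

class SessionLog:
    def __init__(self, path):
        self.f = open(path, "a")

    def record(self, user, action, **params):
        event = {"t": time.time(), "user": user,
                 "action": action, "params": params}
        self.f.write(json.dumps(event) + "\n")
        self.f.flush()

# log = SessionLog("diva_session.log")
# log.record("jc", "set_isovalue", value=0.37)
# log.record("jshalf", "rotate_view", quat=[0.0, 0.7, 0.0, 0.7])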
* Can any conceivable collaborative
requirement be satisfied using a
separate set of modules that specifically
manage distribution of events
and data in collaborative applications?
* How is the collaborative application
presented? Does the
application only need to be collaborative
sometimes?
* Where does performance come into
play? Does the visualization
system or underlying libraries need to be
performance-aware? (i.e. I'm
doing a given task and I need a framerate
of X for it to be useful
using my current compute resources),
network aware (i.e. the system is
starving for data and must respond by
adding an alternate stream or
redeploying the pipeline). Are these considerations implemented at
the
component level, framework level, or are
they entirely out-of-scope for
our consideration?