From: John Shalf <jshalf@lbl.gov>
Date: Wed Sep 10, 2003 11:53:29 AM US/Pacific
To: diva@lbl.gov
Subject: Re: DiVA Survey (Please return by Sept 10!)
OK, here are my responses to the mandatory portion of the survey. I'll send the voluntary section separately.
On Wednesday, August 27, 2003, at 03:33 PM, John Shalf wrote:
=============The Survey=========================
Please answer the attached survey with as much or as little verbosity as you please and return it to me by September 10. The survey has 3 mandatory sections and 4 voluntary (bonus) sections. The sections are as follows:
Mandatory:
1) Data Structures
2) Execution Model
3) Parallelism and Load-Balancing
Voluntary:
4) Graphics and Rendering
5) Presentation
6) Basic Deployment and Development Environment Issues
7) Collaboration
We will spend this workshop focusing on
the first 3 sections, but I think we will derive some useful/motivating
information from any answers to questions in the voluntary sections.
I'll post my answers to this survey on the diva mailing list very soon. You
can post your answers publicly if you want to, but I am happy to regurgitate
your answers as "anonymous contributors" if it will enable you to be
more candid in your evaluation of available technologies.
1) Data Structures/Representations/Management==================
The center of every successful modular
visualization architecture has been a flexible core set of data structures for
representing data that is important to the targeted application domain. Before we can begin working on
algorithms, we must come to some agreement on common methods (either data
structures or accessors/method
calls) for exchanging data between components of our vis framework.
There are two potentially disparate
motivations for defining the data representation requirements. In the coarse-grained case, we need to
define standards for exchanging data between components in this framework
(interoperability). In the
fine-grained case, we want to define some canonical data structures that can be used within a component -- one developed specifically for this framework. These two use-cases may drive different sets of requirements and implementation issues.
*
Do you feel both of these use cases are equally important or should we focus
exclusively on one or the other?
While I am very interested in design
patterns, data structures, and services that could make the design of the
interior of parallel/distributed components easier, it is clear that the
interfaces between components are the central focus of this project. So the definition of inter-component
data exchanges is preeminent.
*
Do you feel the requirements for each of these use-cases are aligned or will
they involve two separate development tracks? For instance, using "accessors" (method calls that
provide abstract access to essentially opaque data structures) will likely work
fine for the coarse-grained data exchanges between components, but will lead to
inefficiencies if used to implement algorithms within a particular component.
Given the focus on inter-component data
exchange, I think accessors provide the most straightforward paradigm for data
exchange. The arguments to the
data access methods can involve elemental data types rather than composite data
structures (eg. we use scalars and arrays of basic machine data types rather
than hierarchical structures).
Therefore we should look closely at FM's API organization as well as the
accessors employed by SCIRun V1 (before they employed dynamic compilation).
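To make this concrete, here is a rough sketch of what an accessor-style exchange interface might look like. The names are invented for illustration and are not the actual FM or SCIRun APIs; the point is simply that only scalars and flat arrays of basic machine types cross the component boundary.

#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Components exchange data only through elemental types (ints, doubles,
// flat arrays); the provider's internal layout stays opaque.
class FieldAccessor {
public:
    virtual ~FieldAccessor() {}
    virtual std::size_t numPoints() const = 0;
    virtual std::size_t numComponents(const std::string& field) const = 0;
    // Copy a contiguous range of a named field into a caller-supplied buffer
    // of basic machine types; no composite structures cross the boundary.
    virtual void copyField(const std::string& field,
                           std::size_t first, std::size_t count,
                           double* dest) const = 0;
};

// A trivial provider backed by a flat array; a real component might wrap a
// structured grid, an unstructured mesh, or a remote parallel data source.
class FlatScalarField : public FieldAccessor {
    std::vector<double> data_;
public:
    explicit FlatScalarField(std::vector<double> d) : data_(std::move(d)) {}
    std::size_t numPoints() const override { return data_.size(); }
    std::size_t numComponents(const std::string&) const override { return 1; }
    void copyField(const std::string&, std::size_t first, std::size_t count,
                   double* dest) const override {
        for (std::size_t i = 0; i < count; ++i) dest[i] = data_[first + i];
    }
};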
The accessor method works well for
abstracting component location, but requires potentially redundant copying of
data for components in the same memory space. It may be necessary to use reference counting in order to
reduce the need to recopy data arrays between co-located components, but I'd
really like to avoid making ref counting a mandatory requirement if we can
avoid it. (does anyone know how to
avoid redundant data copying between opaque components without employing
reference counting?)
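One possible (purely illustrative) approach is to hand out immutable, reference-counted array handles so that co-located components share a single copy, while remote components fall back to an explicit copy. This is only a sketch of the idea, not a proposal to make ref counting mandatory:

#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// An immutable, reference-counted view of an array: copying the handle copies
// a pointer, not the data.  (Illustrative only.)
class SharedArray {
    std::shared_ptr<const std::vector<double>> data_;
public:
    explicit SharedArray(std::vector<double> values)
        : data_(std::make_shared<const std::vector<double>>(std::move(values))) {}
    const double* values() const { return data_->data(); }
    std::size_t size() const { return data_->size(); }
};

// Two components in the same address space can hold the same SharedArray and
// never copy the payload; an accessor crossing address spaces would fall back
// to an explicit copy of elemental types.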
What are the requirements for the data representations that must be supported by a common infrastructure? We will start by answering Pat's questions about representation requirements and follow up with personal experiences involving particular domain scientists' requirements.
Must:
support for structured data
Must
Must/Want:
support for multi-block data?
Must
Must/Want:
support for various unstructured data representations? (which ones?)
Cell based initially. Arbitrary connectivity eventually, but not mandatory.
Must/Want:
support for adaptive grid standards?
Please be specific about which adaptive grid methods you are referring
to. Restricted block-structured
AMR (aligned grids), general block-structured AMR (rotated grids), hierarchical
unstructured AMR, or non-hierarchical adaptive structured/unstructured meshes.
If we can define the data models rigorously
for the individual grid types (ie. structured and unstructured data), then
adaptive grid standards really revolve around an infrastructure for indexing
data items. We normally think of
indexing datasets by time and by data species. However, we need to have more general indexing methods that
can be used to support concepts of spatial and temporal relationships. Support for pervasive indexing
structures is also important for supporting other visualization features like
K-d trees, octrees, and other such methods that are used to accelerate graphics
algorithms. We really should
consider how to pass such representations down the data analysis pipeline in a
uniform manner because they are used so commonly.
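As a strawman, such a pervasive index could be expressed as a small query interface that travels down the pipeline alongside the data. The names below are hypothetical; the same pattern could cover AMR patch lookup, octrees, K-d trees, and a temporal analog:

#include <cstddef>
#include <vector>

struct Box { double lo[3], hi[3]; };            // axis-aligned query region
struct BlockRef { int level; std::size_t id; }; // e.g. an AMR patch or octree leaf

class SpatialIndex {
public:
    virtual ~SpatialIndex() {}
    // Return the blocks (AMR patches, octree leaves, K-d tree cells, ...) that
    // intersect the query region, down to the requested refinement level.
    virtual std::vector<BlockRef> query(const Box& region, int maxLevel) const = 0;
};

// A temporal index could follow the same pattern, mapping a time interval to
// the blocks/datasets that contribute to it.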
Must/Want:
"vertex-centered" data, "cell-centered" data?
other-centered?
Must understand all centering (particularly
for structured grids where vis systems are typically lax in
storing/representing this information).
Must:
support time-varying data, sequenced, streamed data?
Yes to all. However, the concept of streamed data must be defined in
more detail. This is where the
execution paradigm is going to affect the data structures.
Must/Want:
higher-order elements?
Not yet.
Must/Want:
Expression of material interface boundaries and other special-treatment of
boundary conditions.
Yes, we must treat ghost zones specially or
parallel vis algorithms will create significant artifacts. I'm not sure what is required for
combined air-ocean models.
*
For commonly understood datatypes like structured and unstructured, please
focus on any features that are commonly overlooked in typical
implementations. For example,
often data-centering is overlooked in structured data representations in vis
systems and FEM researchers commonly criticize vis people for co-mingling
geometry with topology for unstructured grid representations. Few data structures provide proper
treatment of boundary conditions or material interfaces. Please describe your personal
experience on these matters.
There is little support for non-Cartesian
coordinate systems in typical data structures. We will need to have a discussion of how to support
coordinate projections/conversions in a comprehensive manner. This will be very important for
applications relating to the National Virtual Observatory.
*
Please describe data representation requirements for novel data representations
such as bioinformatics and terrestrial sensor datasets. In particular, how should we handle more
abstract data that is typically given the moniker "information
visualization".
I simply don't know enough about this field
to comment.
What do you consider the most
elegant/comprehensive implementation for data representations that you believe
could form the basis for a comprehensive visualization framework?
*
For instance, AVS uses entirely different data structures for structured, unstructured and geometry data. VTK uses class inheritance to express the similarities between related structures. Ensight treats unstructured data and geometry nearly interchangeably. OpenDX uses more vector-bundle-like constructs to provide a
more unified view of disparate data structures. FM uses data-accessors (essentially keeping the data
structures opaque).
Since I'm already on record as saying that
opaque data accessors are essential for this project, it is clear that FM
offers the most compelling implementation that satisfies this requirement.
*
Are there any of the requirements above that are not covered by the structure
you propose?
We need to be able to express a wider
variety of data layout conversions and have some design pattern that reduces
the need to recopy data arrays for local components. The FM model also needs to have additional API support for
hierarchical indices to accelerate access to subsections of arrays or domains.
*
Is there information or characteristics of particular file format standards
that must percolate up into the specific implementation of the in-memory data
structures?
I hope not.
For the purpose of this survey, "data
analysis" is defined broadly as all non-visual data processing done
*after* the simulation code has finished and *before* "visual
analysis".
*
Is there a clear dividing line between "data analysis" and
"visual analysis" requirements?
There shouldn't be. However, people at the SRM workshop
left me with the impression that they felt data analysis had been essentially
abandoned by the vis community in favor of "visual analysis" methods. We need to undo this.
*
Can we (should we) incorporate data analysis functionality into this framework,
or is it just focused on visual analysis?
Vis is bullshit without seamless integration
with flexible data analysis methods.
The most flexible methods available are text-based. The failure to integrate more powerful
data analysis features into contemporary 3D vis tools has been a serious
problem.
*
What kinds of data analysis typically needs to be done in your field? Please give examples and how these
functions are currently implemented.
This question is targeted at vis folks that
have been focused on a particular scientific domain. For general use, I think of IDL as being one of the most
popular/powerful data analysis languages.
Python has become increasingly important -- especially with the
Livermore numerical extensions and the PyGlobus software. However, use of these scripting/data analysis languages has not made the transition to parallel/distributed-memory environments (except in a sort of data-parallel batch mode).
*
How do we incorporate powerful data analysis functionality into the framework?
I'm very interested in work that Nagiza Samatova has proposed for a parallel implementation of the R statistics
language. The traditional approach
for parallelizing scripting languages is to run them in a sort of MIMD mode of
Nprocs identical scripts operating on different chunks of the same
dataset. This makes it difficult
to have a commandline/interactive scripting environment. I think Nagiza is proposing to have an interactive
commandline environment that transparently manipulates distributed actions on
the back-end.
There is similar work in progress on parallel MATLAB at UC Berkeley.
Does anyone know of such an effort for Python? (most of the parallel python hacks I know of are essentially
MIMD which is not very useful).
2) Execution Model=======================
It will be necessary for us to agree on a
common execution semantics for our components. Otherwise, we might have compatible data structures but incompatible execution requirements. Execution semantics are akin to the function of a protocol in the context of network serialization of data structures. The motivating questions are as follows:
*
How is the execution model affected by the kinds of algorithms/system-behaviors
we want to implement?
*
How then will a given execution model affect data structure implementations?
There will need to be some way to support declarative execution semantics as well as data-driven and demand-driven semantics. By declarative
semantics, I mean support for environments that want to be in control of when
the component "executes" or interactive scripting environments that
wish to use the components much like subroutines. This is separate from the demands of very interactive
use-cases like view-dependent algorithms where the execution semantics must be
more automatic (or at least hidden from the developer who is composing the
components into an application). I
think this is potentially relevant to data model discussions because the
automatic execution semantics often impose some additional requirements on the
data structures to hand off tokens to one another. There are also issues involved with managing concurrent access to data. For instance, a demand-driven system, as required by progressive-update or view-dependent algorithms, will need to manage the interaction between the arrival of new data and asynchronous requests from the viewer to recompute existing data as the geometry is rotated.
*
How will the execution model be translated into execution semantics on the component level? For example, will
we need to implement special control-ports on our components to implement
particular execution models or will the semantics be implicit in the way we
structure the method calls between components.
I'm going to propose that we go after the
declarative semantics first (no automatic execution of components) with hopes
that you can wrap components that declare such an execution model with your own
automatic execution semantics (whether it be a central executive or a
distributed one). This follows the
paradigm that was employed for tools such as VisIt that wrapped each of the
pieces of the VTK execution pipeline so that it could impose its own execution
semantics on the pipeline rather than depending on the exec semantics that were
predefined by VTK. DiVA should
follow this model, but start with the simplest possible execution model so that
it doesn't need to be deconstructed if it fails to meet the application
developer's needs (as was the case with VisIt).
We should have at least some discussion to
ensure that the *baseline* declarative execution semantics imposes the fewest
requirements for component development but can be wrapped in a very
consistent/uniform/simple manner to support any of our planned pipeline
execution scenarios. This is an exercise in making things as simple as possible, but thinking ahead far enough
about long-term goals to ensure that the baseline is "future proof"
to some degree.
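As a sketch of what that baseline might look like (the names are illustrative, not a proposed DiVA API), a declarative component would expose nothing more than explicit set-input/execute/get-output calls, leaving all scheduling to whoever composes the pipeline:

#include <memory>

class FieldAccessor;  // opaque data handle exchanged via accessors

// The baseline: a component fires only when told to.  No hidden scheduling.
class Component {
public:
    virtual ~Component() {}
    virtual void setInput(int port, std::shared_ptr<const FieldAccessor> data) = 0;
    virtual void execute() = 0;  // explicit, externally invoked firing rule
    virtual std::shared_ptr<const FieldAccessor> getOutput(int port) const = 0;
};

// A scripting environment or central executive drives the pipeline like a
// sequence of subroutine calls:
//   reader->execute();
//   iso->setInput(0, reader->getOutput(0));
//   iso->execute();
//   render(iso->getOutput(0));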
What kinds of execution models should be
supported by the distributed visualization architecture?
*
View dependent algorithms? (These were typically quite difficult to implement
for dataflow visualization environments like AVS5).
Must be supported, but not as a baseline exec model.
*
Out-of-core algorithms
Same
deal. We must work out what kinds
of attributes are required of the data structures/data model to represent
temporal decomposition of a dataset.
We should not encode the execution semantics as part of this (it should
be outside of the component), but we must ensure that the data interfaces
between components are capable of representing this kind of data
decomposition/use-case.
*
Progressive update and hierarchical/multiresolution algorithms?
Likewise, we should separate the execution
semantics necessary to implement this from the requirements imposed on the data
representation. Data models in
existing production data analysis/visualization systems often do not provide an
explicit representation for such things as multiresolution hierarchies. We have LevelOfDetail switches, but that seems to be only a weak form of representation for these hierarchical relationships and limits the effectiveness of algorithms that depend on this method of data representation. Those requirements should not be co-mingled with the actual execution semantics for such components (it's just the execution interface).
*
Procedural execution from a single thread of control (ie. using a commandline language like IDL to interactively control a dynamic or large parallel back-end)
This should be our primary initial
target. I do not have a good understanding of how best to support this, but it's clear that we must ensure that a commandline/interactive scripting language is supported. Current data parallel scripting
interfaces assume data-parallel, batch-mode execution of the scripting
interpreters (this is a bad thing).
*
Dataflow execution models? What is
the firing method that should be employed for a dataflow pipeline? Do you need a central executive like AVS/OpenDX, a completely distributed firing mechanism like that of VTK, or
some sort of abstraction that allows the modules to be used with either
executive paradigm?
This can probably be achieved by wrapping
components that have explicit/declarative execution semantics. It's an open question as to whether
these execution models are a function of the component or the framework that is
used to compose the components into an application though.
*
Support for novel data layouts like space-filling curves?
I don't understand enough about such
techniques to know how to approach this.
However, it does point out that it is essential that we hand off data via accessors that keep the internal data structures opaque, rather than exposing complex data structures directly.
*
Are there special considerations for collaborative applications?
*
What else?
Ugh.
I'm also hoping that collaborative applications only impose requirements
for wrapping baseline components rather than imposing internal requirements on
the interfaces that exchange data between the components. So I hope we can have
"accessors" or "multiplexor/demultiplexor" objects that
connect to essentially non-collaboration-aware components in order to support such
things. Otherwise, I'm a bit
daunted by the requirements imposed.
How will the execution model affect our
implementation of data structures?
*
how do you decompose a data structure such that it is amenable to streaming in
small chunks?
The recent SDM workshop pointed out that
chunking/streaming interfaces are going to be essential for any data analysis
system that deals with large data, but there was very little agreement on how
the chunking should be expressed.
The chunking also potentially involves end-to-end requirements of the
components that are assembled in a pipeline as you must somehow support
uniformity in the passage of chunks through the system (ie. the decision you
make about the size of one chunk will impose requirements for all other
dependent streaming interfaces in the system). We will need to walk through at least one use-case for
chunking/streaming to get an idea of what the constraints are here. It may be too tough an issue to tackle
in this first meeting though.
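As a starting point for that discussion, here is one hypothetical shape for a chunk-oriented data interface; the names are made up, and the execution semantics (who pulls, and when) deliberately stay outside of it:

#include <cstddef>
#include <vector>

struct ChunkInfo {
    std::size_t index;        // which chunk, out of numChunks()
    std::size_t numElements;  // size of this chunk
    int timeStep;             // temporal coordinate, if any
};

// The producer describes its decomposition; the consumer pulls one chunk at a
// time.  The choice of chunk size propagates end-to-end through the pipeline.
class ChunkedSource {
public:
    virtual ~ChunkedSource() {}
    virtual std::size_t numChunks() const = 0;
    virtual ChunkInfo describeChunk(std::size_t i) const = 0;
    // Fill a caller-supplied buffer with the chunk's values (elemental types only).
    virtual void readChunk(std::size_t i, double* dest) const = 0;
};

// A downstream component streams with a simple loop:
//   std::vector<double> buffer;
//   for (std::size_t i = 0; i < src.numChunks(); ++i) {
//       buffer.resize(src.describeChunk(i).numElements);
//       src.readChunk(i, buffer.data());
//       processChunk(buffer);  // hypothetical downstream operation
//   }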
*
how do you represent temporal dependencies in that model?
Each item in a data structure needs to have some method of referring to dependencies, both spatial (ie. interior boundaries caused by domain decomposition) and temporal. It's important to make these dependencies explicit in the data structures to provide a framework with the necessary information to organize parallelism in both the pipeline and data-parallel directions. The implementation details of how to do
so are not well formulated and perhaps out-of-scope for our discussions. So this is a desired *requirement* that
doesn't have a concrete implementation or design pattern involved.
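One possible shape for such a requirement, purely as an illustration, is to attach an explicit dependency descriptor to each block or chunk:

#include <cstddef>
#include <vector>

struct BlockId { int timeStep; std::size_t block; };

// Explicit dependency metadata carried with each block, so a framework can
// schedule pipeline and data-parallel work without peeking inside components.
struct BlockDependencies {
    BlockId self;
    std::vector<BlockId> spatialNeighbors;  // e.g. ghost-zone exchange partners
    std::vector<BlockId> temporalParents;   // earlier timesteps this block needs
};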
*
how do you minimize recomputation in order to regenerate data for
view-dependent algorithms.
I don't know. I'm hoping someone else responding to this survey has some
ideas on this. I'm uncertain how
it will affect our data model requirements.
What are the execution semantics necessary
to implement these execution models?
*
how does a component know when to compute new data? (what is the firing rule)
For declarative semantics, the firing rule
is an explicit method call that is invoked externally. Hopefully such objects can be *wrapped*
to encode semantics that are more automatic (ie. the module itself decides when
to fire depending on input conditions), but initially it should be explicit.
*
does coordination of the component execution require a central executive or can
it be implemented using only rules that are local to a particular component.
It can eventually be implemented using local semantics, but initially, we should design for explicit external control.
*
how elegantly can execution models be supported by the proposed execution
semantics? Are there some things,
like loops or back-propagation of information that are difficult to implement
using a particular execution semantics?
It's all futureware at this point. We want to first come up with clear
rules for baseline component execution and then can come up with some higher
level / automatic execution semantics that can be implemented by *wrapping*
such components. The
"wrapper" would then take responsibility for imposing higher-level
automatic semantics.
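A hypothetical sketch of that layering, reusing the declarative Component interface from the earlier sketch (again, illustrative names only):

#include <memory>
#include <utility>

class FieldAccessor;  // opaque data handle

// Repeated from the earlier sketch so this one is self-contained.
class Component {
public:
    virtual ~Component() {}
    virtual void setInput(int port, std::shared_ptr<const FieldAccessor> data) = 0;
    virtual void execute() = 0;
    virtual std::shared_ptr<const FieldAccessor> getOutput(int port) const = 0;
};

// The wrapper, not the component, decides when to fire: new input marks the
// component dirty, and the next output request triggers execution.
class AutoFiringWrapper {
    std::shared_ptr<Component> inner_;
    bool dirty_;
public:
    explicit AutoFiringWrapper(std::shared_ptr<Component> c)
        : inner_(std::move(c)), dirty_(false) {}
    void setInput(int port, std::shared_ptr<const FieldAccessor> data) {
        inner_->setInput(port, data);
        dirty_ = true;
    }
    std::shared_ptr<const FieldAccessor> getOutput(int port) {
        if (dirty_) { inner_->execute(); dirty_ = false; }
        return inner_->getOutput(port);
    }
};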
How will security considerations affect
the execution model?
I don't know. Please somebody tell me if this is going to be an
issue. I don't have a handle on
the *requirements* for security.
But I do know that simply using a secure method to *launch* a component
is considered insufficient by security people who would also require that
connections between components be explicitly authenticated as well. Most vis systems assume secure
launching (via SSH or GRAM) is sufficient. The question is perhaps whether security and authorization
are a framework issue or a component issue. I am hoping that it is the former (the role of the framework
that is used to compose the components).
3) Parallelism and Load-Balancing=================
Thus far, managing parallelism in visualization systems has been tedious and difficult at best. Part of this is due to a lack of powerful
abstractions for managing data-parallelism, load-balancing and component
control.
If we are going to address inter-component
data transfers to the exclusion of data structures/models internal to the
component, then much of this section is moot. The only question is how to properly represent
data-parallel-to-data-parallel transfers and also the semantics for expressing
temporal/pipeline parallelism and streaming semantics. Load-balancing becomes an issue that is
out-of-scope because it is effectively something that is inside of components
(and we don't want to look inside of the components).
Please describe the kinds of parallel
execution models that must be supported by a visualization component
architecture.
*
data-parallel/dataflow pipelines?
Must
*
master/slave work-queues?
Maybe: If we want to support progressive
update or heterogeneous execution environments.
*
streaming update for management of pipeline parallelism?
Must.
*
chunking mechanisms where the number of chunks may be different from the number
of CPU's employed to process those chunks?
Absolutely. Of course, this would possibly be implemented as a
master/slave work-queue, but there are other methods.
*
how should one manage parallelism for interactive scripting languages that have
a single thread of control? (eg.
I'm using a commandline language like IDL that interactively drives an
arbitrarily large set of parallel resources. How can I make the parallel back-end available to a
single-threaded interactive thread of control?)
I think this is very important and a growing
field of inquiry for data analysis environments. Whatever agreements we come up with, I want to make sure
that things like parallel R are not left out in these considerations.
Please describe your vision of what kinds
of software support / programming design patterns are needed to better support
parallelism and load balancing.
*
What programming model should be employed to express parallelism? (UPC, MPI, SMP/OpenMP, custom sockets?)
If we are working just on the outside of
components, this question should be moot.
We must make sure the API is not affected by these choices though.
*
Can you give some examples of frameworks or design patterns that you consider
very promising for support of parallelism and load balancing. (ie. PNNL Global Arrays or Sandia's
Zoltan)
http://www.cs.sandia.gov/Zoltan/
http://www.emsl.pnl.gov/docs/global/ga.html
Also out of scope. This would be something employed within a component, but if
we are restricting discussions to what happens on the interface between
components, then this is also a moot point. At minimum, it will be important to ensure that such options
will not be precluded by our component interfaces.
*
Should we use novel software abstractions for expressing parallelism or should
the implementation of parallelism simply be an opaque property of the
component? (ie. should there be an abstract messaging layer or not)
Yes.
*
How does the MxN work fit into all of this? Is it sufficiently differentiated from Zoltan's
capabilities?
I need a more concrete understanding of
MxN. I understand what it is
supposed to do, but I'm not entirely sure what requirements it would impose on
any given component interface implementation. It seems like something our component data interfaces should
support, but perhaps such redistribution could be hidden inside of an MxN
component? So should this kind of
redistribution be supported by the inter-component interface or should there be
components that explicitly effect such data redistributions? Jim... Help!
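Purely as a strawman for the second option (and not the CCA MxN specification), an explicit redistribution component might look something like the following, with the actual data motion hidden inside it:

#include <cstddef>
#include <vector>

// How one side decomposes a (flattened) global array across its processes.
struct Decomposition {
    int numProcs;
    std::vector<std::size_t> offsets;  // global start index owned by each process
    std::vector<std::size_t> counts;   // number of elements owned by each process
};

// Redistribution as its own component: plan the M-to-N transfer, then move the
// data.  The transport (MPI, sockets, shared memory) stays hidden inside.
class Redistributor {
public:
    virtual ~Redistributor() {}
    virtual void plan(const Decomposition& producer,
                      const Decomposition& consumer) = 0;
    virtual void transfer() = 0;
};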
===============End of Mandatory Section (the rest is voluntary)=============
4) Graphics and Rendering=================
What do you use for converting geometry and data into images (the rendering engine)? Please comment on any/all of the following.
*
Should we build modules around declarative/streaming methods for rendering
geometry like OpenGL, Chromium and DirectX or should we move to higher-level
representations for graphics offered by scene graphs?
This all depends on the scope of the
framework. A-priori, you can
consider the rendering method separable and render this question moot. However, this will make it quite
difficult to provide very sophisticated support for progressive update,
image-based-methods, and view-dependent algorithms because the rendering engine
becomes intimately involved in such methods. I'm concerned that this is where the component model might
break down a bit. Certainly the rendering components of traditional component-like systems like AVS or NAG Explorer are the most heavy-weight and complex components of the entire environment. Often, the implementation of the rendering component would impose
certain requirements on components that had to interact with it closely
(particularly in the case of NAG/Iris Explorer where you were really directly
exposed to the fact that the renderer was built atop of OpenInventor).
So, we probably cannot take on the issue of
renderers quite yet, but we are eventually going to need to define a big
"component box" around OpenGL/Chromium/DirectX. That box is going to have to be
carefully built so as to keep from
precluding any important functionality that each of those rendering engines can
offer. Again, I wonder if we would
need to consider scene graphs if only to offer a persistent data structure to hand off
to such an opaque rendering engine.
This isn't necessarily a good thing.
What are the pitfalls of building our
component architecture around scene graphs?
It will add greatly to the complexity of
this system. It also may get in
the way of novel rendering methods like Image-based methods.
*
What about Postscript, PDF and other scale-free output methods for publication
quality graphics? Are pixmaps
sufficient?
Pixmaps are insufficient. Our data analysis infrastructure has been moving rapidly away from scale-free methods and towards pixel-based methods. I don't know
how to stop this slide or if we are poised to address this issue as we look at
this component model.
In a distributed environment, we need to
create a rendering subsystem that can flexibly switch between drawing to a
client application by sending images, sending geometry, or sending geometry
fragments (image-based rendering).
How do we do that?
*
Please describe some rendering models that you would like to see supported (ie.
view-dependent update, progressive update) and how they would adjust dynamically to changing objective functions (optimize for fastest framerate, or
fastest update on geometry change, or varying workloads and resource
constraints).
I see this as the role for the
framework. It also points to the
need to have performance models and performance monitoring built in to every
component so that the framework has sufficient information to make effective
pipeline deployment decisions in response to performance constraints. It also points to the fact that at some
level in this component architecture, component placement decisions must be
entirely abstract (but such a capability is futureware).
So in the short term it's important to design
components with effective interfaces for collecting performance data and
representing either analytic or historical-based models of that data. This is a necessary baseline to get to
the point that a framework could use such data to make intelligent
deployment/configuration decisions for a distributed visualization system.
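A minimal sketch of such an interface, with hypothetical names, might look like the following; the framework would query it when making deployment decisions:

#include <cstddef>
#include <map>
#include <string>

// Each component exposes measured and predicted performance so a framework can
// make deployment/configuration decisions without looking inside it.
class PerformanceMonitor {
public:
    virtual ~PerformanceMonitor() {}
    // Measured history, e.g. {"execute_seconds": 0.42, "bytes_moved": 1.2e8}.
    virtual std::map<std::string, double> lastExecutionStats() const = 0;
    // Analytic or history-based estimate for a given problem size, usable for
    // planning before the component ever runs on a candidate resource.
    virtual double predictedExecutionSeconds(std::size_t numElements) const = 0;
};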
*
Are there any good examples of such a system?
No.
What is the role of non-polygonal methods
for rendering (ie. shaders)?
*
Are you using any of the latest gaming features of commodity cards in your visualization
systems today?
*
Do you see this changing in the future? (how?)
I'd like to know if anyone is using shader
hardware. I don't know much about
it myself, but it points out that we need to plan for non-polygon-based
visualization methods. It's not clear to me how to approach this yet.
5) Presentation=========================
It will be necessary to separate the
visualization back-end from the presentation interface. For instance, you may want to have the
same back-end driven by entirely different control-panels/GUIs and displayed in
different display devices (a CAVE vs. a desktop machine). Such separation is also useful
when you want to provide different implementations of the user-interface depending
on the targeted user community.
For instance, visualization experts might desire a dataflow-like interface for composing visualization workflows whereas a scientist might desire a domain-specific dashboard-like interface that implements a specific
workflow. Both users should be
able to share the same back-end components and implementation even though the
user interface differs considerably.
How do different presentation devices
affect the component model?
*
Do different display devices require completely different user interface
paradigms? If so, then we must
define a clear separation between the GUI description and the components
performing the back-end computations.
If not, then is there a common language to describe user interfaces that
can be used across platforms?
Systems that attempt to use the same GUI
paradigm across different presentation media have always been terrible in my
opinion. I strongly believe that
each presentation medium requires a GUI design that is specific to that
particular medium. This imposes a
strong requirement that our compute pipeline for a given component architecture
be strictly separated from the GUI that controls the parameters and presents
the visual output of that pipeline.
OGSA/WSDL has been proposed as one way to define that interface, but it
is extremely complex to use. One
could use CCA to represent the GUI handles, but that might be equally
complex. Others have simply
customized ways to use XML descriptions of their external GUI interface handles
for their components. The latter
seems much simpler to deal with, but is it general enough?
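As an illustration of the XML-description approach (hypothetical names, not any existing component's schema), each component could publish its external GUI handles as simple parameter descriptions that any front end can render natively:

#include <string>
#include <vector>

// A component publishes descriptions of its external GUI handles (parameters)
// so any front end -- desktop, web, or CAVE -- can build its own native controls.
struct ParameterHandle {
    std::string name;   // e.g. "isovalue"
    std::string type;   // "double", "int", "enum", ...
    std::string units;  // optional
    double minValue, maxValue, defaultValue;
};

class GuiDescription {
public:
    virtual ~GuiDescription() {}
    virtual std::vector<ParameterHandle> parameters() const = 0;
    // A serializer could emit these as XML, e.g.
    //   <param name="isovalue" type="double" min="0" max="1" default="0.5"/>
    virtual std::string toXml() const = 0;
};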
*
Do different display modalities require completely different
component/algorithm implementations for the back-end compute engine? (what do we do about that??)
I think there is a lot of opportunity to
share the back-end compute engines across different display modalities. There are some cases where a developer
would be inclined to implement things like an isosurfacer differently for a
CAVE environment just to keep the framerates high enough to maintain your sense of immersion. However, I
think of those as edge-cases.
What presentation modalities do you feel are important and which do you consider the most important?
*
Desktop graphics (native applications on Windows, on Macs)
#1
*
Graphics access via Virtual Machines like Java?
#4
*
CAVEs, Immersadesks, and other VR devices
#5
*
Ultra-high-res/Tiled display devices?
#3: the next tiled display may well be your next *desktop* display, but not quite yet.
*
Web-based applications?
#2: If only because this is becoming an increasingly important component of collaboratories.
What abstractions do you think should be
employed to separate the presentation interface from the back-end compute
engine?
*
Should we be using CCA to define the communication between GUI and compute
engine or should we be using software infrastructure that was designed
specifically for that space? (ie. WSDL, OGSA, or CORBA?)
I think I addressed this earlier. We can do this all in CCA, but is that
the right thing to do? I know this
is an implementation issue, but it is a strong part of our agreement on methods to
implement our components (or define component boundaries).
*
How do such control interfaces work with parallel applications? Should the parallel application have a
single process that manages the control interface and broadcasts to all nodes
or should the control interface treat all application processes within a given
component as peers?
This requires more discussion, but reliable
broadcast methods have many problems related to event skewing and MPI-like
point-to-point emulation of the broadcast suffers from scalability
problems. We need to collect
design patterns for the control interface and either compete them against one another or find a way to support them all by design. This is clearly an implementation issue, but will leak into our abstract component design decisions. Clearly we want a single thread
of control to efficiently deliver events to massively parallel back-end
components. That is a *must* requirement.
6) Basic Deployment/Development Environment Issues============
One of the goals of the distributed
visualization architecture is seamless operation on the Grid --
distributed/heterogeneous collections of machines. However, it is quite difficult to realize such a vision
without some consideration of deployment/portability issues. This question also touches on issues
related to the development environment and what kinds of development methods
should be supported.
What languages do you use for core vis algorithms and frameworks?
*
for the numerically intensive parts of vis algorithms
Fortran/C/C++
*
for the glue that connects your vis algorithms together into an application?
C++/C/Java but I want to get into some Python.
*
How aggressively do you use language-specific features like C++ templates?
I avoid them due to portability and compiler
maturity issues.
*
is Fortran important to you? Is it
important that a framework support it seamlessly?
Yes, absolutely. It needn't be full-fledged F90 support, but certainly f77
with some f90 extensions.
*
Do you see other languages becoming important for visualization (ie. Python,
UPC, or even BASIC?)
Python.
What platforms are used for data
analysis/visualization?
*
What do you and your target users depend on to display results? (ie. Windows,
Linux, SGI, Sun etc..)
Linux, MacOS-X(BSD), Windows.
*
What kinds of presentation devices are employed (desktops, portables,
handhelds, CAVEs, Access Grids, WebPages/Collaboratories) and what is their
relative importance to active users.
Desktops and laptops are most important. Web, AG, and CAVE are of lesser
importance (but still important).
*
Do you see other up-and-coming visualization platforms in the future?
Tablet PCs and desktop-scale Tiled display
devices.
Tell us how you deal with the issue of
versioning and library dependencies for software deployment.
*
For source code distributions, do you bundle builds of all related libraries
with each software release (ie. bundle HDF5 and FLTK source with each release).
Every time I fail to bundle dependent
libraries, it has been a disaster.
So it seems that packaging dependent libraries with any software release
is a *must*.
*
What methods are employed to support platform-independent builds (cmake, imake, autoconf)? What are the benefits and problems with this approach?
I depend on conditional statements in
gmake-based makefiles to auto-select between flags for different
architectures. This is not
sufficiently sophisticated for most release engineering though. I have dabbled with autoconf, but it is
not a silver bullet (neither was imake).
I do not understand the practical benefits of 'cmake'.
*
For binaries, have you have issues with different versions of libraries (ie.
GLIBC problems on Linux and different JVM implementations/versions for Java). Can you tell us about any
sophisticated packaging methods that address some of these problems (RPM need
not apply)
Building statically has been necessary in a
lot of cases, but creates gigantic executables. In the case of JVM's, the problems with the ever-changing
Java platform have driven me away from employing Java as a development
platform.
*
How do you handle multiplatform builds?
* Conservative, lowest-common denominator
coding practices.
* execute 'uname' at the top of a gnu makefile to select an appropriate set of build options for source code building. Inside the code, we must use the CPP to code around platform dependencies.
How do you (or would you) provide
abstractions that hide the locality of various components of your
visualization/data analysis application?
*
Does anyone have ample experience with CORBA, OGSA, DCOM, .NET, RPC? Please comment on advantages/problems
of these technologies.
Nope.
*
Do web/grid services come into play here?
As these web-based scientific collaboratory
efforts gather momentum, web-based data analysis tools have become increasingly
important. I think the motivation
is largely driven by deployment issues when supporting a very
heterogeneous/multi-institutional user base. It reduces the deployment variables when your target is a
specific web-server environment, but you pay a price in that the user-interface
is considerably less advanced.
This cost is mitigated somewhat if the data analysis performed is very
domain-specific and customized for the particular collaboratory community. So it's a poor choice for
general-purpose visualization tools, but if the workflow is well-established
among the collaborators, then the weakness of the web-based user-interface
options is not as much of a problem.
7) Collaboration
==========================
If you are interested in "collaborative applications", please define the term
"collaborative". Perhaps
provide examples of collaborative application paradigms.
Is collaboration a feature that exists at
an application level or are there key requirements for collaborative
applications that necessitate component-level support?
*
Should collaborative infrastructure be incorporated as a core feature of every component?
No.
I hope that support for collaborative applications can be provided via
supplemental components.
*
Can any conceivable collaborative requirement be satisfied using a separate set
of modules that specifically manage distribution of events and data in
collaborative applications?
That is what I hope.
*
How is the collaborative application presented? Does the application only need to be collaborative
sometimes?
This is probably true. You probably want to be able to have tools that are effectively standalone but that can join a collaborative space on demand.
*
Where does performance come in to play?
Does the visualization system or underlying libraries need to be
performance-aware? (i.e. I'm doing
a given task and I need a framerate of X for it to be useful using my current
compute resources), network aware (i.e. the system is starving for data and
must respond by adding an alternate stream or redeploying the pipeline). Are these considerations implemented at
the component level, framework level, or are they entirely out-of-scope for our
consideration?
Yes.
The whole collaboration experience will fall apart if you cannot impose
some constraints on quality of service or react appropriately to service
limitations. It's a big problem,
but I hope the solution does not need to be a fundamental feature of the
baseline component design.