Preface Comment from Ilmi Yoon:
Just one curiosity: a component is of much larger granularity than an object in terms of reusability or usage itself. A component is a kind of package of objects that has interfaces to communicate with other components. So components are much more portable and easily reusable without knowing the programming environment of the different components - they can be written in different programming languages, etc., as long as they know the interfaces to each other. I just feel some of the discussions are related to object-oriented, not component-oriented, design. Maybe it is from my ignorance and/or lack of certain background from the last meeting.
=============The Survey=========================
1) Data Structures/Representations/Management==================
The center of every successful modular
visualization architecture has been a flexible core set of data structures for
representing data that is important to the targeted application domain. Before we can begin working on
algorithms, we must come to some agreement on common methods (either data
structures or accessors/method
calls) for exchanging data between components of our vis framework.
There are two potentially disparate
motivations for defining the data representation requirements. In the coarse-grained case, we need to
define standards for exchanging data between components in this framework
(interoperability). In the fine-grained case, we want to define some canonical data structures that can be used within a component -- one developed specifically for this framework. These two use-cases may drive different sets of requirements and implementation issues.
* Do you feel both of
these use cases are equally important or should we focus exclusively on one or
the other?
Randy: I think that interoperability (both
in terms of data and perhaps more
critically operation/interaction) is more
critical than fine-grained
data sharing. My motivation: there is no way that DiVA will be able to
meet all needs initially and in many cases,
it may be fine for data to
go "opaque" to the framework once
inside a "limb" in the framework (e.g.
VTK could be a limb). This allows the framework to be easily
populated
with a lot of solid code bases and shifts
the initial focus to important
interactions (perhaps domain centric). Over time, I see the fine-grain
stuff coming up, but perhaps proposed by the
"limbs" rather than the
framework. I do feel that the coarse level must take into account
distributed processing however...
I want to facilitate interfaces
between packages, opting for (possibly
specific) data models that map
to the application at hand. I could use some generic mechanisms
provided by DiVA to reduce the amount of
code I need or bootstrap
more rapid prototyping, but it is not key
that the data model be
burned fully into the Framework. I certainly feel that the Framework
should be able to support more than one data
model (since we have
repeatedly illustrated that all
"realizable" models have design
boundaries that we will eventually hit).
Pat: I think both cases are important, but
agreeing upon the fine-grained access
will be harder.
John C: Too soon to tell. Focus on both
until the issues become more clear.
Jim: I think for now we need to exclusively
focus on exchanging data between
components, rather than any fine-grained
generalized data objects...
The first order entry into any component
development is to "wrap up
what ya got". The "rip things apart" phase
comes after you can glue
all the coarse-grained pieces together
reliably...
Ilmi: I think we need to decide on the coarse-grained exchange, something like SOAP, which wraps the internal data in an XML format. But I don't think we need to decide the fine-grained level, since each component can choose its own way/format and then publish that format, so the party who wants to use the component just needs to follow the interface. But if we would like to decide on an initial set of formats that must/may be supported by DiVA components, then we can list the most popular formats and choose some/all of them.
JohnS: While I am very interested
in design patterns, data structures, and services that could make the design of
the interior of parallel/distributed components easier, it is clear that the
interfaces between components are the central focus of this project. So the definition of inter-component
data exchanges is preeminent.
Wes: Both are important. The strongest case, IMO, for the
intra-component DS/DM
is that I have a stable set of data
modeling/mgt tools that I can use for
families of components. Having a solid
DS/DM base will free me to focus
on vis and rendering algorithms, which is
how I want to spend my time.
The strongest case for the inter-component
DS/DM is the "strong typing"
property that makes AVS and apps of its
ilk work so well.
The "elephant in the living
room" is that there is no silver bullet.
I favor an approach that is, by design,
incremental. What I mean is that
we can deal with structured grids,
unstructured grids, geom and other
renderable data, etc. in a more or less
piecemeal fashion with an eye
towards component level interoperability
in the long term. In the beginning,
there won't be 100% interoperability as
if, for example, all data models
and types were stuffed into a vector
bundles interface. OTOH, a more
conciliatory approach will permit forward
progress among multiple
independent groups who are all eyeing
"interoperability". This is the
real goal, not a "single true data model."
* Do you feel the requirements for each of these use-cases are aligned or will they involve two separate development tracks? For instance, using "accessors" (method calls that provide abstract access to essentially opaque data structures) will likely work fine for the coarse-grained data exchanges between components, but will lead to inefficiencies if used to implement algorithms within a particular component.
* As you answer the "implementation and requirements" questions below, please try to identify where coarse-grained and fine-grained use cases will affect the implementation requirements.
Randy: I think you hit the nail on the
head. Where necessary, I see
sub-portions
of the framework working out the necessary
fine-grained, efficient,
"aware" interactions and
data structures as needed. I
strongly doubt we
would get that part right initially and
think it would lead to some of
the same constraints that are forcing us to
re-invent frameworks right
now.
IMHO: the fine-grain stuff must be flexible and dynamic over
time as development and research progress.
Pat: I think the focus should be on
interfaces rather than data structures.
I
would advocate this approach not just
because it's the standard
"object-oriented" way, but
because it's the one we followed with FEL,
and now FM, and it has been a big win for
us. It's a significant benefit
not having to maintain different versions of
the same visualization
technique, each dedicated to a different
method for producing the
data (i.e., different data
structures). So, for example, we
use the same
visualization code in both in-core and
out-of-core cases. Assuming up
front that an interface-based approach would be too slow is, in my
humble opinion, classic premature
optimization.
Jim: Two separate development tracks. Definitely. There are different driving
design forces and they can be developed
(somewhat) independently (I hope).
Lori: The TSTT center is not interested in
defining a data representation
per se - that is dictating what the data
structure will look like. Rather,
we are interested in defining how data can
be accessed in a uniform
way from a wide variety of different data
structures (for both structured
and unstructured meshes). This came about because we recognize
that
1. there are a lot of
different meshing/data frameworks out there,
that have many man years of
effort behind their development,
that are not going to change
their data structures very easily
(if at all). Moreover, these infrastructures have
made their
choices for a reason - if
there was a one-size-fits-all answer,
someone probably would have
found it by now :-)
2. Because of the
difference in data structures - it has been very
difficult for application
scientists (and tool builders) to experiment
with and/or support different
data infrastructures which has
severely limited their ability
to play with different meshing strategies,
discretization schemes, etc.
We are trying to address this latter point: by developing common interfaces for a variety of infrastructures, applications can easily experiment with different techniques, and tool developers (such as mesh quality improvement and front tracking codes) can write their tools to a single API and automatically support multiple infrastructures.
We are also experimenting with the language interoperability tools provided by the Babel team at LLNL and have ongoing work to evaluate its performance (and the performance of our interface in general) for fine- and coarse-grained access to mesh (data) entities - something that I suspect will be of interest to this group as well.
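A minimal sketch of the kind of uniform mesh-access interface Lori describes; all names here are illustrative assumptions, not the actual TSTT API:

    // Hypothetical sketch of a TSTT-style uniform mesh access interface.
    // It only illustrates the idea of reaching heterogeneous mesh
    // implementations through one abstract API.
    #include <cstddef>
    #include <vector>

    enum class EntityType { Vertex, Edge, Face, Region };

    class MeshInterface {
    public:
        virtual ~MeshInterface() = default;
        // Number of entities of a given type, regardless of whether the
        // underlying storage is structured or unstructured.
        virtual std::size_t numEntities(EntityType type) const = 0;
        // Coordinates of one vertex, copied into user-supplied storage.
        virtual void vertexCoordinates(std::size_t vertexId,
                                       double xyz[3]) const = 0;
        // Connectivity of one entity as vertex ids; an adapter over a
        // structured grid can synthesize this on the fly.
        virtual void entityVertices(EntityType type, std::size_t entityId,
                                    std::vector<std::size_t>& vertexIds) const = 0;
    };

    // A tool written against MeshInterface (smoother, front tracker, vis
    // filter) runs unchanged on any framework that provides an adapter.
    double averageVertexX(const MeshInterface& mesh) {
        double sum = 0.0;
        const std::size_t n = mesh.numEntities(EntityType::Vertex);
        for (std::size_t v = 0; v < n; ++v) {
            double xyz[3];
            mesh.vertexCoordinates(v, xyz);
            sum += xyz[0];
        }
        return n ? sum / static_cast<double>(n) : 0.0;
    }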
JohnC: I think it's premature to say. We
need to have agreement on the
questions below first.
Ilmi: There will be some overhead and inefficiency in using accessors for data exchange, but I like the approach of accessors and believe the CCA achieves reusability at the expense of performance, as OOP does anyway. We just try to keep that expense as small as possible.
JohnS: Given the focus on inter-component
data exchange, I think accessors provide the most straightforward paradigm for
data exchange. The arguments to the
data access methods can involve elemental data types rather than composite data
structures (eg. we use scalars and arrays of basic machine data types rather
than hierarchical structures).
Therefore we should look closely at FM's API organization as well as the
accessors employed by SCIRun V1 (before they employed dynamic compilation).
The accessor method works well for
abstracting component location, but requires potentially redundant copying of
data for components in the same memory space. It may be necessary to use reference counting in order to
reduce the need to recopy data arrays between co-located components, but I'd
really like to avoid making ref counting a mandatory requirement if we can
avoid it. (does anyone know how to
avoid redundant data copying between opaque components without employing
reference counting?)
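JohnS asks how to avoid redundant copying between opaque components without making reference counting mandatory. One common compromise (not an answer from the survey, just an illustration) is to hide the counting inside the handle type the accessor returns, so component authors never manage counts explicitly. A minimal C++ sketch with hypothetical interface names:

    // Hypothetical sketch (not an agreed DiVA interface): an opaque accessor
    // that hands out reference-counted array handles, so co-located
    // components share one buffer instead of copying it, while a remote
    // binding of the same call is free to serialize the data instead.
    #include <memory>
    #include <vector>

    using ScalarArray = std::shared_ptr<const std::vector<float>>;

    class FieldAccessor {
    public:
        virtual ~FieldAccessor() = default;
        // Returns a handle to the scalar values; the buffer stays alive as
        // long as any component holds the handle (counting is implicit).
        virtual ScalarArray scalars() const = 0;
    };

    class InMemoryField : public FieldAccessor {
    public:
        explicit InMemoryField(std::vector<float> values)
            : data_(std::make_shared<const std::vector<float>>(std::move(values))) {}
        ScalarArray scalars() const override { return data_; }  // no copy
    private:
        ScalarArray data_;
    };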
Wes: They are aligned to a large degree -
data structures/models are produced and
consumed by component code, but may also
be manipulated (serialized,
marshalled, etc) by the framework.
What are the requirements for the data representations that must be supported by a common infrastructure? We will start by answering Pat's questions about representation requirements and follow up with personal experiences involving particular domain scientists' requirements.
Must: support for
structured data
Randy: Must-at the coarse level, I think
this could form the basis of all
other representations.
Pat: Structured data support is a must.
JohnC: Must
Jim: Must
JohnS: Must
Wes: Agree.
Must/Want: support for
multi-block data?
Randy: Must-at the coarse level, I think
this is key for scalability,
domain decomposition and streaming/multipart
data transfer.
JohnC: Must
Jim: Must
JohnS: Must
Wes: Must. We must set targets that meet
our needs, and not sacrifice
requirements for speed of implementation.
Must/Want: support for
various unstructured data representations? (which ones?)
Randy: Nice-but I would be willing to live
with an implementation on top
of structured, multi-block (e.g.
Exodus). I feel accessors are
fine for this at the "framework"
level (not at the leaves).
Pat: We have unstructured data, mostly
based on tetrahedral or prismatic meshes.
We need support for at least those
types. I do not think we could
simply
graft unstructured data support on top of
our structured data structures.
JohnC: Not sure. Not a priority.
Jim: Want (low priority)
JohnS: Cell-based unstructured representations first. Need support for arbitrary connectivity eventually, but not mandatory. I liked Iris Explorer's hierarchical model as it seems more general than the model offered by other vis systems.
Wes: Must. Unstructured data reps are
widely used and they should not be
excluded from the base set of DS/DM
technologies.
Must/Want: support for
adaptive grid standards? Please be
specific about which adaptive grid methods you are referring to. Restricted block-structured AMR
(aligned grids), general block-structured AMR (rotated grids), hierarchical
unstructured AMR, or non-hierarchical adaptive structured/unstructured meshes.
Randy: Similar to my comments on
unstructured data reps. In the
long
run, something like boxlib with support for
both P and H adaptivity
will be needed (IMHO, VTK might provide
this).
Pat: Adaptive grid support is a
"want" for us currently, probably eventually
a "must". The local favorite is CART3D, which
consists of hierarchical
regular grids. The messy part is that CART3D also supports having
more-or-less arbitrary shapes in the
domain, e.g., an aircraft fuselage.
Handling the shape description and all the
"cut cell" intersections
I expect will be a pain.
JohnC: Adaptive grid usage is in its infancy
at NCAR. But I suspect it is the
way of the future. Too soon to be specific
about which adaptive grid
methods are preferred.
Jim: Want (low priority) the AMR folks
have been trying to get together and define
a standard API, and have been as yet
unsuccessful. Who are we to
attempt
this where they have failed...?
JohnS: If we can define the data models
rigorously for the individual grid types (ie. structured and unstructured
data), then adaptive grid standards really revolve around an infrastructure for
indexing data items. We normally
think of indexing datasets by time and by data species. However, we need to have more general
indexing methods that can be used to support concepts of spatial and temporal
relationships. Support for
pervasive indexing structures is also important for supporting other
visualization features like K-d trees, octrees, and other such methods that are
used to accelerate graphics algorithms.
We really should consider how to pass such representations down the data
analysis pipeline in a uniform manner because they are used so commonly.
Wes: Want, badly. We could start with
Berger-Colella AMR since it is widely
used. I'm not crazy about Boxlib, though,
and hope we can do something
that is easier to use.
Must/Want:
"vertex-centered" data, "cell-centered" data?
other-centered?
Randy: Must.
Pat: Most of the data we see is still vertex-centered. FM supports other
associations, but we haven't used them
much so far.
Jim: Want (low priority)
All of these should be "Wants",
to the extent that they require more
sophisticated handling, or are less
well-known in terms of generalizing
the interfaces.
For example, the AMR folks have been
trying to get together and define
a standard API, and have been as yet
unsuccessful. Who are we to
attempt
this where they have failed...?
So to clarify, if we *really* understand
(or think we do) a particular
data representation/organization, or even a
specific subset of a general
representation type, then by all means let's
whittle an API into our stuff.
Otherwise, leave it alone for someone else
to do, or do as strictly needed.
JohnS: The accessors must understand (or not preclude) all centerings. This is particularly important for structured grids, where vis systems are typically lax in storing/representing this information.
Wes: Don't care - will let someone else
answer this.
Note: It sounds like at least time-varying data handling is well understood by the people who want it.
Must: support
time-varying data, sequenced, streamed data?
Randy: Must, but way too much to say here to
do it justice. I will say
that the core must deal with time-varying/sequenced
data. Streaming
might be able to be placed on top of that,
if it is designed
properly. I will add that we have a need for progressive data as
well.
Pat: Support for time-varying data is a
must.
JohnC: Must. Time varying data is what makes
so many of our problems currently
intractable. Too many of the available tools
(e.g. VTK) assume static
data and completely fall apart when the data
is otherwise.
Definitely there is
not support for any routines that require
temporal integration (e.g.
unsteady flow viz). In general, there is no
notion of a timestep in
VTK. Datasets are 3D. Period.
Additionally, there is a performance issue:
VTK is not optimized in any
way at moving data through the pipeline at
high rates. It's underlying
archictecture seems to assume that as long
as the pipeline eventually
generates some geometry, it's ok if it take
a loooong time because
you're going to interact with that geometry
(navigating through camera
space, color space, etc.) and the
"pre-processing" doesn't have to run
at interactive rates. So the data readers
are pathetically slow (and
there is little hope for optimization here
with that data model that is
used). There is no way to exploit temporal
coherence in any of the data
operators. No simple way to cache results if
you want to play out of
memory.
At a high level you need a design that gives
consideration to temporal needs
throughout the architecture. I think the data structures do need to
be
time varying data aware, not just capable of
dealing with 4D data
(although I can't think of a specific
example of why now). One issue is
that the temporal dimension often has
different spacing/regularity than
the spatial dimension. Obviously you're talking different
units from
the spatial dimensions as well. There are also
system-level issues as
well (e.g. unsteady flow viz needs,
exploiting temporal coherence,
caching, support for exploring the temporal
dimension from user
interactors).
I know i've only just started to scratch the
surface here. We could
probably devote an entire workshop to time
varying data needs and
several more figuring out how to actually
support them.
Jim: MUST
JohnS: Yes to all. However, the concept of streamed data must be defined in
more detail. This is where the
execution paradigm is going to affect the data structures.
Wes: Not ready for prime time. I've read
two or three research proposals in
the past year that focus on methods for
time-varying data representations
and manipulation. IMO, this topic is not
ready for prime time yet. We
can say that it would be nice to have, but
will probably not be fully
prepared to start whacking out code.
Note: Should do quick gap-analysis on
what existing tools fulfill this requirement.
Must/Want: higher-order
elements?
Randy: Must - but again, this can often be
"faked" on top of other reps.
Pat: Occasionally people ask about it, but
we haven't found it to be a "must".
JohnC: low priority
Jim: Wants, see above...
JohnS: Not yet.
Wes: Not sure what this means, exactly, so
I'll improvise. Beyond scientific
data representations, there is a family of
"vis data structures" that need
to be on the table. These include
renderable stuff - images, deep images,
explicit and implicit geometry, scene
graph ordering semantics, scene
specification semantics, etc. In addition,
there is the issue of
"performance data" and how it
will be represented.
Note: I find the response to this quite funny because I ran a two day workshop about 3 years ago here at LBNL on finite element analysis requirements. We got bashed for two days straight by the FEM code jocks because we didn't seem to care about higher order elements. So it would be interesting to know if we don't see much of this because it's not needed, or if the domain scientists simply lost all confidence in us to deal with this issue properly.
Must/Want: Expression of
material interface boundaries and other special-treatment of boundary
conditions.
Randy: Must, but I will break this into two
cases. Material interfaces for
us are essentially sparse vector sets, so
they can be handled with
basic mechanisms so I do not see that as
core, other than perhaps
support for compression. Boundary conditions (e.g. ghost zoning,
AMR boundaries, etc) are critical.
Pat: We don't see this so much. "Want", but not must.
JohnC: no priority
Jim: Want, see above...
JohnS: Yes, we must treat ghost zones
specially or parallel vis algorithms will create significant artifacts. I'm not sure what is required for
combined air-ocean models.
Wes: I'll let someone else answer this
one.
Note: At the DOE vis workshop, it was pointed out that simple things like isosurfaces give inconsistent (or radically different) results on material interface boundaries depending on assumptions about the boundary treatment. You'd think that this would come up with analysis of combined air-ocean models, but apparently not among the vis people. From a data analysis standpoint, domain scientists say this is incredibly important, but they can't deal with it because none of the vis or data analysis people listen to them.
* For commonly
understood datatypes like structured and unstructured, please focus on any
features that are commonly overlooked in typical implementations. For example, often data-centering is
overlooked in structured data representations in vis systems and FEM
researchers commonly criticize vis people for co-mingling geometry with
topology for unstructured grid representations. Few data structures provide proper treatment of boundary conditions
or material interfaces. Please
describe your personal experience on these matters.
Randy: Make sure you get the lowest common
denominator correct! There is
no realistic way that the framework can
support everything, everywhere
without losing its ability to be nimble (no
matter what some OOP folks
say).
Simplicity of representation with "externally" supplied
optional
optimization information is one approach to
this kind of problem.
Pat: One thing left out of the items above
is support for some sort of "blanking"
mechanism, i.e., a means to indicate that
the data at some nodes are not
valid. That's a must for us.
For instance, with Earth science data we see
the use of some special value to indicate
"no data" locations.
JohnC: Support for missing data is essential
for observed fields.
To do it right you need some way to flag
data cells/vertices within the data model as
not containing valid data.
Then you need to add support to your data
"operators" as well. For example,
if your operator is some kind of
reconstruction filter it needs to know
to use a different kernel when missing data
are involved.
Obviously, this could pose a significant
amount of overhead on the entire
system, and the effort may not be justified
if the DOE doesn't have
great need for dealing with instrument
acquired data. I only added the
point as a discussion topic as it is fairly
important to us. At the
very least, I would hope to have the
flexibility to hack support
for missing data if it was not integral to
the core framework.
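A minimal sketch of the blanking/validity support Pat and JohnC describe, with illustrative (not agreed-upon) names; the operator checks a per-node validity flag instead of trusting a "no data" sentinel value:

    // Hypothetical sketch of a blanking-aware accessor: operators query
    // validity per node and adjust their behavior, as JohnC describes for
    // reconstruction filters.
    #include <cstddef>

    class BlankableField {
    public:
        virtual ~BlankableField() = default;
        virtual std::size_t size() const = 0;
        virtual float value(std::size_t node) const = 0;
        virtual bool isValid(std::size_t node) const = 0;  // blanking flag
    };

    // A simple operator that skips blanked nodes instead of folding
    // "no data" sentinel values into the result.
    float validMean(const BlankableField& f) {
        double sum = 0.0;
        std::size_t count = 0;
        for (std::size_t i = 0; i < f.size(); ++i) {
            if (f.isValid(i)) { sum += f.value(i); ++count; }
        }
        return count ? static_cast<float>(sum / count) : 0.0f;
    }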
Jim: I don't think we should "pee in this
pool" either yet. Are any of
us
experts in this kind of viz? Let's stick with what we collectively
know
best and make that work before we try to
tackle a related-but-fundamentally-
different-domain.
JohnS: There is little support for
non-cartesian coordinate systems in typical data structures. We will need to have a discussion of
how to support coordinate projections/conversions in a comprehensive
manner. This will be very
important for applications relating to the National Virtual Observatory.
Wes: No comment
* Please describe data
representation requirements for novel data representations such as
bioinformatics and terrestrial sensor datasets. In particular, how should we handle more abstract data that
is typically given the moniker "information visualization".
Randy: Obviously, do not forget
"records" and aggregate/derived types. That
having been said, the overheads for these
can be ugly. Consider
parallel arrays as an alternative...
Pat: "Field Model" draws the
line only trying to represent fields and the meshes
that the fields are based on. I not really familiar enough with other
types
of data to know what
interfaces/data-structures would be best.
We haven't
see a lot of demand for those types of
data as of yet. A low-priority "want".
JohnC: Beats me.
JohnS: I simply don't know enough about this
field to comment.
Wes: Maybe I don't understand the
problem...the same tough issues that plague
more familiar data models appear to be
present in bioinformatics and
"info viz" data mgt. There are
hierarchical data, unstructured data,
multivariate and multidimensional data,
etc.
Note: Must separate mesh from field data interfaces.
The
mesh may not be updated as often as the field.
Perhaps time-range of validity is important information.
What do you consider the most
elegant/comprehensive implementation for data representations that you believe
could form the basis for a comprehensive visualization framework?
á For instance, AVS
uses entirely different data structures for structured, unstructured and geometry
data. VTK uses class inheritance
to express the similarities between related structures. Ensight treats unstructured data and
geometry nearly interchangeably.
OpenDX uses more vector-bundle-like constructs to provide a more unified
view of disparate data structures.
FM uses data-accessors (essentially keeping the data structures opaque).
Randy: IMHO: layered data structuring
combined with data accessors is
probably the right way to go. Keep the basic representational
elements simple.
Pat: Well, as you'd expect, as the primary
author of Field Model (FM) I think it's
the most elegant/comprehensive of the
lot. It handles structured and
unstructured data. It handles non-vertex-centered data. I think it
should be able to handle adaptive data,
though it hasn't actually been
put to the test yet. And of course every adaptive mesh
scheme is a little
different. I think it could handle boundary condition needs, though
that's
not something we see much of.
JohnC: I don't think this is what you're
after, but i've come to believe that
multiresolution data representations with
efficient domain subsetting
capabilities are the most pragmatic and
elegant
way to deal with large data sets. In
addition to enabling interaction
with the largest data sets they offer
tremendous scalability from desktop to "visual supercomputer". I would
encourage a data model that includes
and facilitates their integral support.
Ilmi: Combination of (externally) FM
data-accessors and (internally) VTK class
inheritance.
JohnS: Since I'm already on record as saying
that opaque data accessors are essential for this project, it is clear that FM
offers the most compelling implementation that satisfies this requirement.
* Are there any of the
requirements above that are not covered by the structure you propose?
Randy: I think one big issue will be the
distributed representations.
This
item is ill handled by many of these systems.
Pat: Out-of-core? Derived fields? Analytic meshes (e.g.,
regular meshes)?
Differential
operators? Interpolation methods?
JohnC: Not sure.
JohnS: We need to be able to express a wider
variety of data layout conversions and have some design pattern that reduces
the need to recopy data arrays for local components. The FM model also needs to have additional API support for
hierarchical indices to accelerate access to subsections of arrays or domains.
Wes: Not sure how to answer. The one thing
that came to mind is a general observation
that the above data models are designed
for scientific data. The AVS geom
data structure was opaque to the
developer, and if you looked at the header
files, was really, really ugly. Since I
have a keen interest in renderers,
I am very concerned about having adequate
flexibility and performance from
a DS/DM for moving/representing renderable
data, as opposed to large
structured or unstructured meshes. It is
possible to generalize a
DS for storing renderable data (e.g, a
scene graph), but this separate class
of citizen reflects the partitioning of
data types in AVS. Perhaps this
isn't something to be concerned about at
this point.
Note: Area of unique features?
-blanking arrays
-data handling for distributed data
-better handling of time-varying data
-hints for caching so that temporal locality can be exploited
-indexing (not in the TSTT sense... needs more discussion. Need support for kD trees and rapid lookup. Indexing might help with our AMR issues)
* This should focus on
the elegance/usefulness of the core design-pattern employed by the implementation
rather than a point-by-point description of the implementation!
Randy: Is it possible to consider a COM Automation Object-like approach, also similar to the CCA breakdown? Basically, define the common stuff and make it interchangeable, then build on top. Allow underlying objects to be "aware" and wink to each other to bypass as needed.
In the long run, consider standardizing on
working bypass paradigms
and bring them into the code (e.g. OpenGL).
Note: We need the "bypass". The question is how do we supply the bypass mechanism for unanticipated data?
Pat: I think if we could reasonably cover
the (preliminary) requirements above,
that would be a good first step. I agree with Randy that whatever we
come up with will have to be able to
"adapt" over time as our understanding
moves forward.
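A rough sketch of Randy's "bypass" idea in the spirit of COM's QueryInterface; the interface names here are hypothetical and only illustrate how two "aware" components could discover a fast path while everyone else falls back to the generic accessor:

    // Hypothetical sketch: every component speaks the generic accessor
    // interface, but two co-located components can discover an optional
    // shared-memory extension and bypass the generic path.
    #include <cstddef>
    #include <vector>

    class GenericData {                       // the common, portable interface
    public:
        virtual ~GenericData() = default;
        virtual std::vector<float> copyScalars() const = 0;
        // Returns a pointer to an optional extension interface, or nullptr
        // if the component does not implement it.
        virtual void* queryInterface(const char* /*name*/) { return nullptr; }
    };

    class SharedMemoryScalars {               // the optional bypass interface
    public:
        virtual ~SharedMemoryScalars() = default;
        virtual const float* data() const = 0;
        virtual std::size_t size() const = 0;
    };

    void consume(GenericData& input) {
        if (auto* fast = static_cast<SharedMemoryScalars*>(
                input.queryInterface("SharedMemoryScalars"))) {
            // Zero-copy path: both sides live in the same address space.
            const float* p = fast->data();
            (void)p;  // ... operate in place ...
        } else {
            // Portable path: copy through the generic accessor.
            std::vector<float> local = input.copyScalars();
            (void)local;
        }
    }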
* Is there information
or characteristics of particular file format standards that must percolate up
into the specific implementation of the in-memory data structures?
Randy: Not really, but metadata handling and
referencing will be key and need
to be general.
Pat: In FM we tried hard to keep file-format-specific stuff out of the core model.
Instead, there are additional modules
built on top of FM that handle
the file-format-specific stuff, like I/O
and derived fields specific to
a particular format. Currently we have PLOT3D, FITS, and
HDFEOS4
modules that are pretty well filled out,
and other modules that are
mostly skeletons at this point.
We should also be careful not to assume
that analyzing the data starts
with "read the data from a file into
memory, ...". Don't forget
out-of-core,
analysis concurrent with simulation, among
others.
One area where the file-format-specific
issues creep in is with metadata.
Most file formats have some sort of
metadata storage support, some much
more elaborate than others. Applications need to get at this
metadata,
possibly through the data model, possibly
some other way. I don't have
the answer here, but it's something to
keep in mind.
Jim: I dunno, but what does HDF5 or NetCDF
include? We should definitely be
able to handle various meta-data...
Otherwise, our viz framework should be able
to read in all sorts of
file-based data as input, converting it
seamlessly into our "Holy Data
Grail" format for all the components to
use and pass around. But the
data shouldn't be identifiable as having
once been HDF or NetCDF, etc...
(i.e. it's important to read the data
format, but not to use it internally)
JohnS: I hope not.
Wes: One observation is what seems to be a
successful design pattern from the
DMF effort: let the HDF guys build the
heavy lifting machinery, and focus
upon an abstraction layer that uses the
machinery to move bytes.
Note: Metadata: Must also be propagated down the pipeline. Ignored by items that don't care, but recognized by pipeline components that do.
-alternative is a database at the reader, but that seems to create painful connection mechanics.
-and we still have to figure out how to reference the proper component, even after going through data-structure transformations.
One powerful feature of both HDF and XML
is the ability to ignore and pass-through unrecognized constructs/metadata.
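A small sketch of that "ignore and pass through" behavior, with hypothetical names and keys: the metadata bag is copied downstream untouched, recognized keys are honored, and new keys can be appended:

    // Hypothetical sketch of pass-through metadata: a key/value bag travels
    // with the data; a filter reads only the keys it understands and
    // forwards everything else unchanged, mirroring how HDF and XML let you
    // ignore unrecognized constructs.
    #include <map>
    #include <string>

    using Metadata = std::map<std::string, std::string>;

    struct Dataset {
        // ... field/mesh handles elided ...
        Metadata meta;
    };

    Dataset isosurfaceFilter(const Dataset& in, double isovalue) {
        Dataset out;
        out.meta = in.meta;                                    // pass everything through
        out.meta["diva:isovalue"] = std::to_string(isovalue);  // add our own key
        // A units key is honored if present, silently ignored otherwise.
        auto units = in.meta.find("units");
        if (units != in.meta.end()) {
            // ... label the output geometry with units->second ...
        }
        // ... compute geometry ...
        return out;
    }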
For the purpose of this survey, "data
analysis" is defined broadly as all non-visual data processing done
*after* the simulation code has finished and *before* "visual
analysis".
* Is there a clear
dividing line between "data analysis" and "visual analysis"
requirements?
Randy: Not in my opinion.
Pat: Your definition excludes concurrent
analysis and steering from
"visualization". Is this intentional? I don't think there's a clear dividing
line here.
JohnC: I take issue with your definition of
data analysis. Yes it is performed
after the simulation, but it is performed
(or would be performed if viz
tools didn't suck) in *parallel* with visual
analysis. The two, when well integrated (which is rarely the case), can complement each other tremendously. So-called "visual analysis" by itself, without good quantitative capability, is pretty useless.
Well, text-based, programmable user interfaces are a must for "data analysis", whereas a GUI is essential for visual analysis.
Jim: NO. There shouldn't be - these operations are tightly coupled,
or even
symbiotic, and *should* all be incorporated
into the same framework,
indistinguishable from each other.
Ilmi: Some components do purely data
analysis, some do only visual, but there
will be calls to the data analysis
component from the visual during the
analysis.
JohnS: There shouldn't be. However, people at the SRM community
left me with the impression that they felt data analysis had been essentially
abandoned by the vis community in favor of "visual analysis"
methods. We need to undo this.
Wes: Generally speaking, there's not much
difference.
That said, some differences seem obvious
to me:
1. Performance - visualization is most
often an interactive process, but
has offline implementations. "Plain
old" data analysis seems to be mostly
an offline activity with a few interactive
implementations.
2. Scope - data analysis seems to be a subset
of vis. Data analysis doesn't
have need for as rich a DS/DM
infrastructure as vis.
Note: Righteous indignation. Well that's good. Why then do SDM people and domain scientists think we don't care? They aren't smoking crack. They have legitimate reasons to believe that we aren't being genuine when we say that we care about data analysis functionality. Can I do data analysis with Vis5D? Does VTK offer me a wide array of statistical methods? Must keep this central as we design this system. Do you agree with John Clyne's assertion that data analysis == text interface and visualization == GUI?
* Can we (should we)
incorporate data analysis functionality into this framework, or is it just
focused on visual analysis?
Randy: Yes and we should, particularly as
the complexity and size of
data grows, we begin to rely more heavily on
"data analysis" based
visualization.
Pat: I think you would also want to
include feature detection techniques.
For
large data analysis in particular, we
don't want to assume that the scientist
will want to do the analysis by visually
scanning through all the data.
JohnC: If visualization is ever going to
live up to the claim made by so many
in the viz community of
it being an indispensable tool for analysis, tight integration with statistical tools and data processing capabilities is a must. Otherwise
we'll just continue to make pretty pictures,
put on dog and pony shows,
and wonder where the users are.
Jim: YES.
Ilmi: Not all data analysis, but a lot of data analysis is used for visual analysis, and the more tools we provide initially, the easier it becomes to grow the user group. So we can list candidates.
JohnS: Vis is bullshit without seamless
integration with flexible data analysis methods. The most flexible methods available are text-based. The failure to integrate more powerful
data analysis features into contemporary 3D vis tools has been a serious
problem.
Wes: Ideally, the same machinery could be
used in both domains.
Notes: Data analysis and feature detection support constitute more unique features for this framework. How do you then index your bag of features or have them properly refer back to the data that led to their generation? Sometimes a detected feature is a discrete marker; other times it is treated as a derived field. The former case seems to again point to the need for a robust indexing method.
JohnC's observation about data-analysis==text-based is very interesting!!! Does everyone agree?
Aspect is one of the few examples of providing a graphical workflow interface for traditionally procedural/text-based data analysis tools.
* What kinds of data
analysis typically needs to be done in your field? Please give examples and how these functions are currently
implemented.
Randy: Obviously basic statistics (e.g.
moments, limits, etc). Regression
and model driven analysis are common. For example, comparison of
data/fields via comparison vs common
distance maps. Prediction of
activation "outliers" via general
linear models applied on an
element by element basis, streaming through
temporal data windows.
Pat: Around here there is interest in
vector-field topology feature detection
techniques, for instance, vortex-core
detection.
JohnC: Pretty much everything you can do
with IDL or matlab.
Jim: Simple sampling, basic statistical
averages/deviations, principal component
analysis (PCA, or EOF for climate folks),
other dimension reduction.
Typically implemented as C/C++ code... mostly slow serial... :-Q
JohnS: This question is targeted at vis
folks that have been focused on a particular scientific domain. For general use, I think of IDL as
being one of the most popular/powerful data analysis languages. Python has become increasingly
important -- especially with the Livermore numerical extensions and the
PyGlobus software. However, use of
these scripting/data analysis languages has not made the transition to
parallel/distributed-memory environments (except in a sort of data-parallel
batch mode).
* How do we incorporate
powerful data analysis functionality into the framework?
Randy: Hard work :), include support for
meta-data, consider support for
sparse data representations and include the
necessary support for
"windowing" concepts.
Pat: Carefully :-)? By striving not to make a closed
system.
JohnC: I'd suggest exploring leveraging
existing tools, numerical python for
example.
Jim: As components (duh)... :-)
We should define some "standard"
APIs for the desired analysis functions,
and then either wrap existing codes as
components or shoehorn in existing
component implementations from systems like
ASPECT.
JohnS: I'm very interested in work that
Nagiza has proposed for a parallel implementation of the R statistics
language. The traditional approach
for parallelizing scripting languages is to run them in a sort of MIMD mode of
Nprocs identical scripts operating on different chunks of the same
dataset. This makes it difficult
to have a commandline/interactive scripting environment. I think Nagiza is proposing to have an
interactive commandline environment that transparently manipulates distributed
actions on the back-end.
There is a similar work in progress on
parallel matlab at UC Berkeley.
Does anyone know of such an effort for Python? (most of the parallel python hacks I know of are essentially
MIMD which is not very useful).
2) Execution Model=======================
It will be necessary for us to agree on a common execution semantics for our components. Otherwise, we might have compatible data structures but incompatible execution requirements. Execution semantics is akin to the function of protocol in the context of network serialization of data structures. The motivating questions are as follows:
* How is the execution model affected by the kinds of algorithms/system-behaviors we want to implement?
Pat: In general I see choices where at one
end of the spectrum we have
simple analysis techniques where most of
the control responsibilities
are handled from the outside. At the other end we could have more
elaborate techniques that may handle load
balancing, memory
management, thread management, and so
on. Techniques towards
the latter end of the spectrum will
inevitably be intertwined more
with the execution model.
Jim: Directly. There are probably a few main exec models we want to cover.
I don't think the list is *that* long...
As such, we should anticipate building
several distinct framework
environments that each exclusively support
a given exec model. Then
the trick is to "glue" these
individual frameworks together so they can
interoperate (exchange data and invoke each
others' component methods)
and be arbitrarily "bridged"
together to form complex higher-level
pipelines or other local/remote topologies.
Ilmi: I guess we can make each component propagate/fire the execution of the next component/components in the network/pipeline. Each component can use its own memory or shared memory to access the data in process. In such a case, the algorithm of each component is not much affected by the other components around it.
Wes: The "simple" execution
model is for the framework to invoke a component, be
notified of its completion, then invoke
the next component in the chain, etc.
Things get more interesting if you want to
have a streaming processing model.
Related, progressive processing is
somewhat akin to streaming, but more
stateful.
Note: Wes' model sounds like a good "baseline" model. It does not allow for chains of invocation and therefore prevents us from getting locked into complex issues of component-local execution semantics and deadlock prevention. Can we make this a baseline component requirement?
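A minimal sketch of this baseline model (illustrative names only): the framework owns the control loop, invokes each component in order, and treats the return of execute() as the completion notification; components never fire one another.

    // Hypothetical sketch of the baseline execution model described above.
    #include <memory>
    #include <vector>

    class Component {
    public:
        virtual ~Component() = default;
        // Declarative semantics: do one unit of work, then return.
        virtual void execute() = 0;
    };

    class SimpleFramework {
    public:
        void append(std::unique_ptr<Component> c) { chain_.push_back(std::move(c)); }
        void run() {
            for (auto& c : chain_) {
                c->execute();   // invoke, wait for completion, move on
            }
        }
    private:
        std::vector<std::unique_ptr<Component>> chain_;
    };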
* How then will a given execution model affect data structure implementations?
Pat: Well, there's always thread-safety
issues.
Jim: I don't think it should affect the
data structure impls at all, per se.
Clearly, the access patterns will be
different for various execution models,
but this shouldn't change the data
impl. Perhaps a better question is
how to indicate the expected access pattern
to allow a given data impl
to optimize or properly prefetch/cache the
accesses...
Note: Actually, that is the question. How do we pass information about access patterns so that you can do the kind of temporal caching that John Clyne wants? It's important not to do what VTK does, so that we don't have to un-do it again (as was the case for VisIt).
JohnS: There will need to be some way to
support declarative execution semantics as well as data-driven and demand-driven semantics. By declarative
semantics, I mean support for environments that want to be in control of when
the component "executes" or interactive scripting environments that
wish to use the components much like subroutines. This is separate from the demands of very interactive
use-cases like view-dependent algorithms where the execution semantics must be
more automatic (or at least hidden from the developer who is composing the
components into an application). I
think this is potentially relevant to data model discussions because the
automatic execution semantics often impose some additional requirements on the
data structures to hand off tokens to one another. There are also issues involved with managing concurrent
access to data. For instance, a demand-driven system, as required by progressive-update or view-dependent algorithms, will need to manage the interaction between the
arrival of new data and asynchronous requests from the viewer to recompute
existing data as the geometry is rotated.
(note: Wes provides a more succinct
description of this execution semantics.)
Wes: We're back to the issue of needing a
DS/DM that supports multiresolution
models from the git-go. The relationship
between data analysis and vis data
models becomes more apparent here when we
start thinking about multires
representations of unstructured data, like
particle fields or point clouds.
Note: The fly in the ointment here is
that highly interactive methods like multires models and view-dep algorithms
are not well supported by completely simple/declarative semantics (unless you
have an incredibly complex framework, but then the framework would require
component-specific knowledge to schedule things properly).
* How will the
execution model be translated into execution semantics on the component
level. For example will we need to
implement special control-ports on our components to implement particular
execution models or will the semantics be implicit in the way we structure the
method calls between components.
Pat: Not sure.
Jim: Components should be "dumb"
and let other components or the framework invoke
them as needed for a given execution
model. The framework dictates the
control flow, not the component. The API shouldn't change.
If you want multi-threaded components, then
the framework better support
that, and the API for the component should
take the possibility into account.
JohnS: I'm going to propose that we go after
the declarative semantics first (no automatic execution of components) with
hopes that you can wrap components that declare such an execution model with
your own automatic execution semantics (whether it be a central executive or a
distributed one). This follows the
paradigm that was employed for tools such as VisIt that wrapped each of the
pieces of the VTK execution pipeline so that it could impose its own execution
semantics on the pipeline rather than depending on the exec semantics that were
predefined by VTK. DiVA should
follow this model, but start with the simplest possible execution model so that
it doesn't need to be deconstructed if it fails to meet the application
developer's needs (as was the case with VisIt).
We should have at least some discussion to
ensure that the *baseline* declarative execution semantics imposes the fewest
requirements for component development but can be wrapped in a very
consistent/uniform/simple manner to support any of our planned pipeline
execution scenarios. This is an
exercise in making things as simple as possible, but thinking ahead far enough
about long-term goals to ensure that the baseline is "future proof"
to some degree.
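A rough sketch of the wrapping idea (hypothetical names, not a settled design): the component exposes only a declarative execute(), and an optional wrapper layers data-driven firing on top without the component knowing.

    // Hypothetical sketch of wrapping a declarative component with
    // automatic (data-driven) execution semantics: the component only
    // knows execute(); the wrapper decides when to fire it.
    #include <memory>
    #include <vector>

    class Component {
    public:
        virtual ~Component() = default;
        virtual void execute() = 0;            // declarative: fire when told
    };

    class DataDrivenWrapper {
    public:
        explicit DataDrivenWrapper(std::unique_ptr<Component> inner)
            : inner_(std::move(inner)) {}

        // Downstream wrappers register themselves to be notified.
        void connect(DataDrivenWrapper* downstream) { downstream_.push_back(downstream); }

        // When new data arrives, fire the wrapped component, then propagate.
        void onNewData() {
            inner_->execute();
            for (auto* d : downstream_) d->onNewData();
        }
    private:
        std::unique_ptr<Component> inner_;
        std::vector<DataDrivenWrapper*> downstream_;
    };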
Wes: One thing that always SUPREMELY
annoyed me about AVS was the absence of a
"stop" button on the modules.
This issue concerns being able to interrupt
a module's processing when it was taking
too long. Related, it might be nice
to have an execution model that uses the
following paradigm: "OK, time's up,
give me what you have now."
Note: That is another way that the exec model is going
to affect structures (or at least the API for accessing those structures). We can refer to this as "concurrent access to data". Do we have to incorporate locking semantics in the data accessors? Do we have to incorporate a firing protocol into the accessors (or at least hints as to firing constraints)? Again, we don't want to get into a VisIt situation. Automatic exec semantics do not give the framework enough control to address Wes' issue.
Certainly this is an issue with VTK as well. How do we formulate this as a requirement?
What kinds of execution models should be
supported by the distributed visualization architecture?
* View dependent
algorithms? (These were typically quite difficult to implement for dataflow
visualization environments like AVS5).
Randy: I propose limited enforcement of fixed
execution semantics. View/data/focus
dependent environments are common and need
to be supported, however, they
are still tied very closely with data
representations, hence will likely
need to be customized to application
domains/functions.
Pat: Not used heavily here, but would be
interesting. A "want".
JohnC: These are neat research topics, but
I've never been convinced that they
have much application beyond IEEEViz
publications. Mostly I believe
this because of the complexity they impose
on the data model. Better to
simply offer progressive/multiresolution
data access.
Jim: Want.
Ilmi: I'd like to say "must", but it is for improving usability and efficiency, so people may be able to live without it. It will definitely improve efficiency. If we want to support view-dependent algorithms, then we should consider them from the beginning of the dataflow design, so they can be easily integrated. View-dependent or image-based algorithms don't necessarily require many changes to an existing dataflow design; they are useful for eliminating the majority of data blocks from the rendering pipeline. Therefore, it is good to provide the capability to choose the subset of data to be rendered from the dataflow.
JohnS: Must be supported, but not as a
baseline exec model.
Wes: Yes
* Out-of-core algorithms
Randy: This has to be a feature, given the
focus on large data.
Pat: A "must" for us.
JohnC: Seems like a must for large data. But
is this a requirement or a design
issue?
Jim: Must. This is a necessary evil of "big data". You need some killer
caching infrastructure throughout the
pipeline (e.g. like VizCache).
JohnS: Same deal. We must work out what kinds of attributes are required of
the data structures/data model to represent temporal decomposition of a
dataset. We should not encode the
execution semantics as part of this (it should be outside of the component),
but we must ensure that the data interfaces between components are capable of
representing this kind of data decomposition/use-case.
Wes: Yes
* Progressive update and
hierarchical/multiresolution algorithms?
Randy: Obviously, I have a bias here,
particularly in the remote visualization
cases.
Remote implies fluctuations in effective data latency that make
progressive systems key.
Pat: A "want".
JohnC: This is the way to go (IMHO), the
question is at what level to support
it.
Jim: Must
Ilmi: MUST! for improving usability and efficiency. And can be used to
support
view-dependent algorithm.
JohnS: Likewise, we should separate the
execution semantics necessary to implement this from the requirements imposed
on the data representation. Data
models in existing production data analysis/visualization systems often do not
provide an explicit representation for such things as multiresolution
hierarchies. We have LevelOfDetail
switches, but that seems to be only a weak form of representation for these hierarchical relationships and limits the effectiveness of algorithms that depend on this method of data representation. Those requirements should not be co-mingled with the actual execution semantics for such components (it's just the execution interface).
Wes: Yes. All of the above. Go team!
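To make JohnS's point concrete, here is a rough sketch (hypothetical names, not a proposal from the survey) of what an explicit multiresolution interface might look like, so that progressive and view-dependent components can request levels or refined subregions directly instead of relying on a LevelOfDetail switch bolted on at the rendering end:

    // Hypothetical sketch of an explicit multiresolution data interface.
    #include <cstddef>
    #include <memory>

    class Field;  // opaque field handle, defined elsewhere in the data model

    class MultiresField {
    public:
        virtual ~MultiresField() = default;
        virtual std::size_t numLevels() const = 0;                 // 0 = coarsest
        virtual std::shared_ptr<Field> level(std::size_t lvl) const = 0;

        // Refine only inside a region of interest, e.g. for view-dependent use.
        virtual std::shared_ptr<Field> refineRegion(std::size_t lvl,
                                                    const double minCorner[3],
                                                    const double maxCorner[3]) const = 0;
    };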
* Procedural execution
from a single thread of control (ie. using a commandline language like IDL to interactively control a dynamic or large parallel back-end)
Randy: Yep, I think this kind of control is
key.
Pat: A "want".
JohnC: A must for data analysis and data
manipulation (deriving new fields, etc)
Jim: This is not an execution model, it is
a command/control interface issue.
You should be able to have a GUI,
programmatic control, or scripting to
dictate interactive control (or "steering"
as they call it... :-). The
internal software organization shouldn't
change, just the interface to
the outside (or inside) world...
Ilmi: Good to have
JohnS: This should be our primary initial
target. I do not have a good
understanding of how best to support this, but it's clear that we must ensure that a commandline/interactive scripting language is supported. Current data parallel scripting
interfaces assume data-parallel, batch-mode execution of the scripting
interpreters (this is a bad thing).
Wes: Historically, this approach has
proven to be very useful.
* Dataflow execution
models? What is the firing method
that should be employed for a dataflow pipeline? Do you need a central executive like AVS/OpenDX or,
completely distributed firing mechanism like that of VTK, or some sort of
abstraction that allows the modules to be used with either executive paradigm?
Randy: I think this should be an option as
it can ease some connection
mechanisms, but it should not be the sole
mechanism. Personally,
I find a properly designed central executive
making "global"
decisions coupled with demand/pull driven
local "pipelets" that
allow high levels of abstraction more useful
(see the VisIt model).
Pat: Preferably a design that does not
lock us in to one execution model.
JohnC: We use a wavelet based approach
similar to space filling curves. Both
approaches have merit and both should be
supportable by the framework.
Jim: Must. This should be an implementation
issue in the "dataflow framework", and
should not affect the component-level APIs.
JohnS: This can probably be achieved by
wrapping components that have explicit/declarative execution semantics in a
"component-within-a-component" hierarchical manner. It's an open question as to whether these execution models
are a function of the component or the framework that is used to compose the
components into an application though.
Wes: I get stuck thinking about the UI for
this kind of thing rather than
the actual implementation. I'll defer to
others for opinions.
Note: Strange... I would have tagged SFCs and wavelets as "researchy" things.
* Support for novel data
layouts like space-filling curves?
Randy: With the right accessors nothing special needs to be added for these.
Pat: Not a pressing need here, as of yet.
Jim: Must. But this isn't an execution model either. It's a data structure
or algorithmic detail...
JohnS: I don't understand enough about such
techniques to know how to approach this.
However, it does point out that it is essential that we hand off data
structures via accessors that keep
the internal data structures opaque rather than complex data structures.
* Are there special
considerations for collaborative applications?
Jim: Surely. The interoperability of distinct framework implementations
ties in with this... but the components shouldn't be aware
that they
are being run
collaboratively/remotely...
definitely a framework issue.
Ilmi: Some locking mechanism for subsets of
data or dispatching of changes from
one client to multiple clients.
JohnS: Ugh. I'm also hoping that collaborative applications only impose
requirements for wrapping baseline components rather than imposing internal
requirements on the interfaces that exchange data between the components. So I hope we can have
"accessors" or "multiplexor/demultiplexor" objects that
connect to essentially non-collaboration-aware components in order to support such
things. Otherwise, I'm a bit
daunted by the requirements imposed.
Note: The danger of pushing the collaborative functionality out to a "framework issue" is that we increasingly make the "framework" a heavyweight object. It creates a high cost of entry for any such feature or even minor modifications to such features. We learned from "Cactus" the importance of making the framework as slender as possible and moving as much functionality as possible into "optional components" that support feature X. So it is important to ensure that we push off these issues as much as possible.
* What else?
Randy: The kitchen sink? :)
Pat: Distributed control? Fault tolerance?
Jim: Yeah Right.
Wes: Control data, performance data and
framework response to and manipulation
of such data.
How will the execution model affect our
implementation of data structures?
Jim: It shouldn't. The execution model should be kept
independent of the
data structures as much as possible.
If you want to build higher-level APIs for
specific data access patterns
that's fine, but keep the underlying data
consistent where possible.
Note: The description of this as affecting our "data structures" is an artifact of attempting to straddle the dual goals of addressing internal data structures and external accessors. So perhaps this should be "how will it affect our accessors?"
* how do you decompose a data structure such that it is amenable to streaming in small chunks?
Randy: This is a major issue and relates to
things like out-of-core/etc.
I definitely feel that "chunking"
like mechanisms need to be in
the core interfaces.
Pat: Are we assuming streaming is a
requirement?
How do you handle visualization algorithms
where the access patterns
are not known a priori? The predominant example: streamlines
and streaklines.
Note the access patterns can be in both
space and time. How do you avoid
having each analysis technique need to
know about each possible data
structure in order to negotiate a
streaming protocol? How do you add another
data structure in the future without
having to go through all the analysis
techniques and put another case in their
streaming negotiation code?
In FM the fine-grained data access
("accessors") is via a standard
interface. The evaluation is all lazy. This design means more
function calls, but it frees the analysis
techniques from having to know
access patterns a priori and negotiate
with the data objects. In FM
the data access methods are virtual
functions. We find the overhead
not to be a problem, even with relatively
large data. In fact, the overhead
is less an issue with large data because
the data are less likely to be
served up from a big array buffer in
memory (think out-of-core, remote
out-of-core, time series, analytic meshes,
derived fields, differential-
operator fields, transformed objects,
etc., etc.).
The same access-through-an-interface approach
could be done without
virtual functions, in order to squeeze out
a little more performance, though
I'm not convinced it would be worth
it. To start with you'd probably
end up
doing a lot more C++ templating. Eliminating the virtual functions would
make it harder to compose things at
run-time, though you might be able
to employ run-time compilation techniques
a la SCIRun 2.
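A rough sketch of the lazy, access-through-an-interface style described above; the names are hypothetical and are not the actual FM interface:

    // Evaluation happens only when a value is requested, so the same interface
    // can front an in-core array, an out-of-core reader, or a derived field.
    class Field {
    public:
        virtual ~Field() {}
        // Evaluate the field at a point; the time argument lets the same
        // interface serve static and time-varying data.
        virtual double eval(double x, double y, double z, double t) const = 0;
    };

    // A derived field computed lazily from two inputs.
    class SumField : public Field {
    public:
        SumField(const Field& a, const Field& b) : a_(a), b_(b) {}
        double eval(double x, double y, double z, double t) const {
            return a_.eval(x, y, z, t) + b_.eval(x, y, z, t);
        }
    private:
        const Field& a_;
        const Field& b_;
    };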
Jim: This sounds a lot like distributed
data decompositions. I suspect
that
given a desired block/cycle size, you can
organize/decompose data in all
sorts of useful ways, depending on the
expected access pattern.
In conjunction with this, you could also
reorganize static datasets
into filesystem databases, with appropriate
naming conventions or
perhaps a special protocol for lining up
the data blob files in the
desired order for streaming (in either time
or space along any axis).
Meta-data in the files might be handy here,
too, if it's indexed
efficiently for fast
lookup/searching/selection.
JohnS: The recent SDM workshop pointed out
that chunking/streaming interfaces are going to be essential for any data
analysis system that deals with large data, but there was very little agreement
on how the chunking should be expressed.
The chunking also potentially involves end-to-end requirements of the
components that are assembled in a pipeline as you must somehow support
uniformity in the passage of chunks through the system (ie. the decision you
make about the size of one chunk will impose requirements for all other
dependent streaming interfaces in the system). We will need to walk through at least one use-case for
chunking/streaming to get an idea of what the constraints are here. It may be too tough an issue to tackle
in this first meeting though.
Also, as Pat pointed out, when dealing with
vis techniques like streamlines, you almost need to have a demand-based
fetching of data. This implies
some automatic propagation of requests through the pipeline. This will be hard, and perhaps not
supported by a baseline procedural model for execution.
Note: Again, it appears we need to have clear delineation between temporal and spatial dependencies. To support streaming, one must also have dependent components be able to report back their constraints.
Jim, how can we formulate a requirement that the execution model is independent of the data structures when we really don't have data structures per se?
Because we are using accessors, calling them will in turn cause a
component to call other accessors.
If we do not have common execution semantics, then this will be a
complete muddle even if we do agree on our port standards. So can we really keep these things
independent?
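To make the chunking discussion above more concrete, here is a minimal sketch of a pull-style chunk stream, with hypothetical names; note how the chunk size chosen by the producer constrains every dependent streaming interface downstream:

    #include <cstddef>
    #include <vector>

    struct Chunk {
        std::size_t index;          // position of this chunk in the stream
        std::vector<double> values; // payload; size is the negotiated chunk size
    };

    class ChunkStream {
    public:
        virtual ~ChunkStream() {}
        virtual std::size_t chunkCount() const = 0;
        // Returns false when the stream is exhausted.
        virtual bool next(Chunk& out) = 0;
    };

    // A downstream component processes chunks as they arrive, never holding
    // the whole dataset in memory.
    double streamingMax(ChunkStream& s) {
        Chunk c;
        double m = -1e300;
        while (s.next(c))
            for (std::size_t i = 0; i < c.values.size(); ++i)
                if (c.values[i] > m) m = c.values[i];
        return m;
    }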
* how do you represent
temporal dependencies in that model?
Randy: I need to give this more thought,
there are a lot of options.
Pat: In FM, data access arguments have a
time value, the field interface is
the same for both static and time-varying
data.
Jim: Meta-data, or file naming
conventions...
JohnS: Each item in a datastructure or as
passed-through via an accessor needs to have some method of referring to
dependencies both spatial (ie. interior boundaries caused by domain
decomposition) and temporal. It's important to make these dependencies explicit in the data structures to provide the framework with the necessary information to organize parallelism in both the pipeline and data-parallel directions.
The implementation details of how to do so are not well formulated and
perhaps out-of-scope for our discussions.
So this is a desired *requirement* that doesn't have a concrete
implementation or design pattern involved.
Note: Given the importance of
time-varying data to JohnC and Pat, it seems important to come up with a formal
way to represent these things.
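A small sketch of the *requirement* stated above, with hypothetical names: each block handed through an accessor carries its spatial and temporal dependencies explicitly, so a framework could schedule pipeline and data-parallel work without inspecting the data itself:

    #include <cstddef>
    #include <vector>

    struct DependencyInfo {
        // Neighboring domain-decomposition blocks whose interior boundaries
        // this block needs before it can be processed.
        std::vector<std::size_t> spatialNeighbors;
        // Earlier time steps this block depends on (e.g. for streaklines).
        std::vector<std::size_t> temporalPredecessors;
    };

    struct DataBlockHandle {
        std::size_t blockId;
        std::size_t timeStep;
        DependencyInfo deps;   // made explicit rather than implied by layout
    };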
* how do you minimize
recomputation in order to regenerate data for view-dependent algorithms.
Randy: Framework invisible caching. Not a major Framework issue.
Pat: Caching? I don't have a lot of experience with view-dependent
algorithms.
Jim: No clue.
JohnS: I don't know. I'm hoping someone else responding to
this survey has some ideas on this.
I'm uncertain how it will affect our data model requirements.
Note: Is caching a framework issue? Or is it a component issue?
What are the execution semantics necessary
to implement these execution models?
* how does a component
know when to compute new data? (what is the firing rule)
Randy: Explicit function calls with
potential async operation. A
higher-level
wrapper can make this look like
"dataflow".
Jim: There are really only 2 possibilities
I can see - either a component is
directly invoked by another component or
the framework, or else a method
must be triggered by some sort of dataflow
dependency or stream-based
event mechanism.
JohnS: For declarative semantics, the firing
rule is an explicit method call that is invoked externally. Hopefully such objects can be *wrapped*
to encode semantics that are more automatic (ie. the module itself decides when
to fire depending on input conditions), but initially it should be explicit.
Wes: To review, the old AVS model said
that a module would be executed if any
of its parameters changed, or if its input
data changed. One thing that
was annoying was that you had to
explicitly disable the flow executive if
you wanted to make changes to multiple parameters
on a single module before
allowing it to execute. This type of thing
came up when using a module with
a long execution time.
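A minimal sketch, assuming hypothetical names, of the baseline firing rule plus an optional wrapper that layers AVS-style "fire on change" semantics on top while avoiding the multiple-parameter re-execution annoyance Wes mentions:

    class Module {
    public:
        virtual ~Module() {}
        virtual void execute() = 0;          // explicit, externally invoked
    };

    class DataflowWrapper {
    public:
        explicit DataflowWrapper(Module& m) : module_(m), dirty_(false) {}
        void inputChanged()      { dirty_ = true; }
        void parameterChanged()  { dirty_ = true; }
        // Called by the executive; fires only if something upstream changed,
        // so several parameters can be set before one (expensive) run.
        void maybeFire() {
            if (dirty_) { module_.execute(); dirty_ = false; }
        }
    private:
        Module& module_;
        bool dirty_;
    };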
* does coordination of the component execution require a central executive or can it be implemented using only rules that are local to a particular component.
Randy: I think the central executive can be
an optional component (again, see
VisIt).
Jim: This is a framework implementation
detail. No. No. Bad Dog.
The component doesn't know what's outside
of it (in the rest of the
framework, or the outside world). It only gets invoked, one way or
another.
JohnS: It can eventually be implemented
using local semantics, but initially, we should design for explicit external
control.
Wes: Not sure what this means.
Note: And it potentially invokes other components. If a component invokes other components and thereby creates a chain of execution, then we have an execution semantics that is outside of the framework's control. So, do we want to prevent this in our baseline requirements for component invocation? The central executive approach says that our "baseline" components may not invoke another component in response to their invocation. This seems to be a component invocation semantics issue.
* how elegantly can
execution models be supported by the proposed execution semantics? Are there some things, like loops or
back-propagation of information that are difficult to implement using a
particular execution semantics?
Randy: There will always be warts...
Pat: The execution models we have used
have kept the control model in
each analysis technique pretty simple,
relying on an external executive.
The one big exception is with
multi-threading. We've
experimented with
more elaborate parallelism and
load-balancing techniques, motivated in
part by latency hiding desires.
Jim: We need to keep the different
execution models separate, as implementation
details of individual frameworks. This separates the concerns here.
JohnS: It's all futureware at this
point. We want to first come up
with clear rules for baseline component execution and then can come up with
some higher level / automatic execution semantics that can be implemented by
*wrapping* such components. The "wrapper"
would then take responsibility for imposing higher-level automatic semantics.
Wes: The dataflow thing doesn't lend
itself well to things like view dependent
processing where the module at the end of
the chain (renderer) sends view
parameters back upstream, thereby causing
the network to execute again, etc.
The whole upstream data thing is a
"wart on the ass of" AVS. (sorry)
How will security considerations affect
the execution model?
Randy: Security issues tend to impact two
areas: 1) effective bandwidth/latency
and 2) dynamic connection problems. 1) can be unavoidable, but will not
show up in most environments if we design
properly. 2) is a real problem
with few silver bullets.
Pat: More libraries to link to? More latency in network communication?
Jim: Ha ha ha ha...
They won't right away, except in
collaboration scenarios.
Think "One MPI Per Framework" and
do things the old fashioned way
locally, then do the "glue" for
inter-framework connectivity with
proper authentication only as needed. (No worse than Globus... :-)
JohnS: I don't know. Please somebody tell me if this is
going to be an issue. I don't have
a handle on the *requirements* for security. But I do know that simply using a secure method to *launch*
a component is considered insufficient by security people who would also require
that connections between components be explicitly authenticated as well. Most vis systems assume secure
launching (via SSH or GRAM) is sufficient. The question is perhaps whether security and authorization
are a framework issue or a component issue. I am hoping that it is the former (the role of the framework
that is used to compose the components).
Note: Current DOE security policy basically dictates that we cannot deploy current distributed vis tool implementations because the connections are not authenticated. Ensight is an exception because the server is always making an outgoing connection (which basically makes it an issue for the destination site) and requires explicit "accept" of the connection.
3) Parallelism and
load-balancing=================
Thus far, managing parallelism in visualization systems has been tedious and difficult at best. Part of this is due to a lack of powerful
abstractions for managing data-parallelism, load-balancing and component control.
JohnS: If we are going to address inter-component
data transfers to the exclusion of data structures/models internal to the
component, then much of this section is moot. The only question is how to properly represent
data-parallel-to-data-parallel transfers and also the semantics for expressing
temporal/pipeline parallelism and streaming semantics. Load-balancing becomes an issue that is
out-of-scope because it is effectively something that is inside of components
(and we don't want to look inside of the components).
Please describe the kinds of parallel
execution models that must be supported by a visualization component
architecture.
* data-parallel/dataflow
pipelines?
JohnS: Must
Wes: It would be nice if the whole scatter/gather thing could be
marshaled
by the framework. That way, my
SuperSlick[tm] renderer wouldn't contain
a bunch of icky network code that manages
multiple socket connections
from an N-way parallel vis component. One
interesting problem is how a
persistent tool, like a renderer, will be
notified of changes in data
originating from external components. I
want some infrastructure that
will make obsolete me having to write
custom code like this for each
new project.
Note: Seriously. Is it really a very useful paradigm to have the framework represent parallel components as one-component-per-processor? It seems very 'icky', as Wes says.
* master/slave
work-queues?
Randy: I tend to use small dataflow
pipelines locally and higher-level
async streaming work-queue models globally.
Jim: Must
JohnS: Maybe: If we want to support
progressive update or heterogeneous execution environments. However, I usually
don't consider this methodology scalable.
* streaming update for
management of pipeline parallelism?
Randy: Yes, we use this, but it often
requires a global parallel filesystem to
be most effective.
Jim: Must
JohnS: Must
* chunking mechanisms
where the number of chunks may be different from the number of CPU's employed
to process those chunks?
Randy: We use spacefilling curves to reduce
the overall expense of this
(common) operation (consider the compute/viz
impedance mismatch
problem as well). As a side effect, the codes gain cache coherency
as well.
Pat: We're pretty open here. Mostly straight-forward work-queues.
Jim: This sounds the same as master/slave
to me, as in "bag of tasks"...
JohnS: Absolutely. Of course, this would possibly be implemented as a
master/slave work-queue, but there are other methods.
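A small C++11 illustration of decoupling the chunk count from the CPU count with a shared work queue ("bag of tasks"); this is only a sketch, not a proposed framework mechanism:

    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    int main() {
        std::queue<int> chunks;                 // chunk ids; count != CPU count
        for (int i = 0; i < 64; ++i) chunks.push(i);

        std::mutex m;
        auto worker = [&](int id) {
            for (;;) {
                int c;
                {
                    std::lock_guard<std::mutex> lock(m);
                    if (chunks.empty()) return;
                    c = chunks.front();
                    chunks.pop();
                }
                std::printf("worker %d processing chunk %d\n", id, c);
            }
        };

        std::vector<std::thread> pool;
        for (int t = 0; t < 4; ++t) pool.emplace_back(worker, t);
        for (auto& th : pool) th.join();
        return 0;
    }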
* how should one manage
parallelism for interactive scripting languages that have a single thread of
control? (eg. I'm using a
commandline language like IDL that interactively drives an arbitrarily large
set of parallel resources. How can
I make the parallel back-end available to a single-threaded interactive thread
of control?)
Randy: Consider them as "scripting
languages", and have most operations
run through an executive (note the executive
would not be aware
of all component operations/interactions, it
is a higher-level
executive). Leave RPC style hooks for specific references.
Pat: I've used Python to control multiple
execution threads. The (C++)
data objects are thread safe, the minimal
provisions for thread-safe
objects in Python haven't been too much of
a problem.
Jim: Broadcast, Baby... Either you blast the commands out to
everyone SIMD
style (unlikely) or else you talk to the
Rank 0 task and the command
gets forwarded on a fast internal network.
JohnS: I think this is very important and a
growing field of inquiry for data analysis environments. Whatever agreements we come up with, I
want to make sure that things like parallel R are not left out in these
considerations.
Note: But CCA doesn't support broadcast. This leads to a quandary because we want to be able to adjust parameters for a component via the GUI or via a command from another component interchangeably. So I agree with "broadcast baby", but I don't see that it is feasible to push this off as a "framework issue", as it may well need to be something the component interface description must support.
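A rough sketch of the "talk to the Rank 0 task and forward" pattern, assuming MPI inside the parallel back end; the command format and function names are invented for illustration:

    #include <mpi.h>
    #include <cstring>
    #include <string>

    void runCommand(const std::string& cmdFromScript) {
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char buf[256];
        std::memset(buf, 0, sizeof(buf));
        if (rank == 0) {
            // Rank 0 is the only task the scripting front end ever talks to.
            std::strncpy(buf, cmdFromScript.c_str(), sizeof(buf) - 1);
        }
        // Fan the command out to every task in the parallel back end.
        MPI_Bcast(buf, sizeof(buf), MPI_CHAR, 0, MPI_COMM_WORLD);

        // ... each rank now parses buf and executes its share of the work ...
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        runCommand("isosurface level=0.5");   // command from, e.g., IDL/Python
        MPI_Finalize();
        return 0;
    }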
Please describe your vision of what kinds
of software support / programming design patterns are needed to better support
parallelism and load balancing.
* What programming model
should be employed to express parallelism.
(UPC, MPI, SMP/OpenMP, custom sockets?)
Randy: The programming model must transcend
specific parallel APIs.
Jim: All but UPC will be necessary for
various functionality.
JohnS: If we are working just on the outside
of components, this question should be moot. We must make sure the API is not affected by these choices
though.
Wes: This discussion may follow the same
path as the one about DS/DM for grids.
The answer seems to be "one size
doesn't fit all, but there is no 'superset' that makes everyone happy." That said, there is likely a set of common issues wrt execution and DS/DM that underlie parallel components regardless of
implementation.
Note: Since we are talking about functionality outside of the component, this seems reasonable. So this really requires clarification of where the parallelism is expressed. Caffeine wants to express this as parallel sets of components. However, this seems unreasonable for some of the communication patterns we deal with. If we stated that such parallelism is inside of the component wrapper, then what? (At minimum, we don't have to answer this question!)
* Can you give some
examples of frameworks or design patterns that you consider very promising for
support of parallelism and load balancing.
(ie. PNNL Global Arrays or Sandia's
Zoltan)
http://www.cs.sandia.gov/Zoltan/
http://www.emsl.pnl.gov/docs/global/ga.html
Randy: no I cannot (am not up to speed).
Jim: Nope, that covers my list of hopefuls.
JohnS: Also out of scope. This would be something employed within
a component, but if we are restricting discussions to what happens on the
interface between components, then this is also a moot point. At minimum, it will be important to
ensure that such options will not be precluded by our component interfaces.
Wes: Maybe we should include "remote resource management" in this thread. I'm
thinking of the remote AVS module
libraries. So, not only is there the issue
of launching parallel components, and load
balancing (not sure how this will
play out), but also one of allowing a user
to select, at run time, from
among a set of resources.
This problem becomes even more interesting
when the pipeline optimization
starts to happen, and components are
migrated across resources.
Note: This is now somewhat out-of-scope for discussions of inter-component communication.
* Should we use novel software abstractions for expressing parallelism or should the implementation of parallelism simply be an opaque property of the component? (ie. should there be an abstract messaging layer or not)
Randy: I would vote no as it will allow
known paradigms to work, but will
interfere with research and new direction
integration. I think some
kind of basic message abstraction (outside
of the parallel data system)
is needed.
Jim: It's not our job to develop
"novel" parallelism abstractions. We should
just use existing abstractions like what
the CCA is developing.
JohnS: Implementation of parallelism should
be an opaque property of the component.
We want to have language independence. We should also strive to support independence in the
implementation of parallelism.
Creating a software abstraction layer for messaging and shmem is a
horrible way to do it.
* How does the NxM work
fit in to all of this? Is it
sufficiently differentiated from Zoltan's capabilities?
Randy: Unable to comment...
Pat: I don't have a strong opinion
here. I'm not familiar with Zoltan
et al.
Our experience with parallelism tends to
be more shared-memory than
distributed memory.
JohnC: Hmm. These all seem to be
implementation issues. Too early to answer.
JohnS: I need a more concrete understanding
of MxN. I understand what it is
supposed to do, but I'm not entirely sure what requirements it would impose on
any given component interface implementation. It seems like something our component data interfaces should
support, but perhaps such redistribution could be hidden inside of an MxN
component? So should this kind of
redistribution be supported by the inter-component interface or should there be
components that explicitly effect such data redistributions? Jim... Help!
Jim: I don't know what Zoltan can do
specifically, but MxN is designed for
basic "parallel data
redistribution". This means
it is good for doing
big parallel-to-parallel data
movement/transformations among two disparate
parallel frameworks, or between two
parallel components in the same
framework with different data
decompositions. MxN is also good
for
"self-transpose" or other types
of local data reorganization within a
given (parallel) component.
MxN doesn't do interpolation in space or
time (yet, probably for a while),
and it won't wash your car (but it won't
drink your beer either... :-).
If you need something fancier, or if you
don't really need any data
reorganization between the source and
destination of a transfer, then
MxN *isn't* for you...
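For illustration only (this is not the CCA MxN or Zoltan API), the core of a parallel-to-parallel redistribution is computing which index ranges of one source block overlap each destination block:

    #include <cstdio>

    int main() {
        const int total = 1000;   // global number of elements
        const int M = 4, N = 3;   // source and destination decompositions
        const int srcRank = 1;    // the source block we are redistributing

        int srcBegin = srcRank * total / M;
        int srcEnd   = (srcRank + 1) * total / M;

        for (int dst = 0; dst < N; ++dst) {
            int dstBegin = dst * total / N;
            int dstEnd   = (dst + 1) * total / N;
            int lo = srcBegin > dstBegin ? srcBegin : dstBegin;
            int hi = srcEnd   < dstEnd   ? srcEnd   : dstEnd;
            if (lo < hi)   // overlap: these elements travel from srcRank to dst
                std::printf("send elements [%d,%d) to destination rank %d\n",
                            lo, hi, dst);
        }
        return 0;
    }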
===============End of Mandatory Section
(the rest is voluntary)=============
4) Graphics and Rendering=================
What do you use for converting geometry
and data into images (the rendering-engine). Please comment on any/all of the following.
* Should we build
modules around declarative/streaming methods for rendering geometry like
OpenGL, Chromium and DirectX or should we move to higher-level representations
for graphics offered by scene graphs?
Randy: IMHO, the key is defining the
boundary and interoperability constraints.
If these can be documented, then the
question becomes moot, you can
use whatever works best for the job.
Ilmi: It is usually useful to have access to the frame buffer, so I prefer the OpenGL style over the VRML style. In addition, I don't know how useful scene graphs are for visualization. I suspect scene graphs for visualization are relatively simple, so it should be possible to convert them to a declarative form. So, mainly support declarative methods, with additional support for scene graphs and conversion of scene graphs to declarative methods.
JohnS: This all depends on the scope of the
framework. A-priori, you can
consider the rendering method separable and render this question moot. However, this will make it quite
difficult to provide very sophisticated support for progressive update,
image-based-methods, and view-dependent algorithms because the rendering engine
becomes intimately involved in such methods. I'm concerned that this is where the component model might
break down a bit. Certainly the rendering component of traditional component-like systems like AVS or NAG Explorer is among the most heavy-weight and complex components of the entire environment.
Often, the implementation of the rendering component would impose certain
requirements on components that had to interact with it closely (particularly
in the case of NAG/Iris Explorer where you were really directly exposed to the
fact that the renderer was built atop of OpenInventor).
So, we probably cannot take on the issue of
renderers quite yet, but we are eventually going to need to define a big
"component box" around OpenGL/Chromium/DirectX. That box is going to have to be carefully built so as to keep from precluding any
important functionality that each of those rendering engines can offer. Again, I wonder if we would need to
consider scene graphs if only to offer a persistent datastructure to hand-off
to such an opaque rendering engine. This isn't necessarily a good thing.
Wes: As a scene graph proponent, I would
say that you don't build component
architectures around scene graphs. That
concept doesn't make any sense to me.
Instead, what you do is have DS/DM
representations/encapsulations for the
results of visualization. These are things
like buckets-o-triangles, perhaps
at multiple resolutions. You also provide
the means to send renderer information
to vis components to do view-dependent
processing, or some other form of
selective processing.
Similarly, you don't make the output of
visualization components in the form
of glBegin()/glEnd() pairs, either.
Note: It sounds like we need to look at Ilmi's work and ensure that whatever method we select to get the "drawables" to the "renderer" does not preclude her requirements.
What are the pitfalls of building our
component architecture around scene graphs?
Randy: Data cloning, data locking and good
support for streaming, view dependent,
progressive systems.
JohnC: Not so good for time varying data
last time I checked.
Ilmi: You might lose access to the frame buffer and pixel-level manipulation, which makes view-dependent or image-based approaches extremely difficult.
JohnS: It will add greatly to the complexity
of this system. It also may get in
the way of novel rendering methods like Image-based methods.
Wes: Back to the scene graph issue - what
you allow for is composition of streams
of data into a renderer. Since view
position information is supported as a
first class DS/DM citizen (right?) it becomes
possible to compose a
rendering session that is driven by an
external source.
Nearly all renderers use scene graph
concepts - resistance is futile! The
weak spot in this discussion concerns
streaming. Since scene graphs systems
presume some notion of static data, the
streaming notion poses some problems.
They can be surmounted by adding some
smarts to the rendering and the
data streaming - send over some bounding
box info to start with, then allow
the streaming to happen at will. The
renderer could either then not render
that tree branch until transmission is
complete, or it could go ahead and
render whatever is in there at the time.
Middle ground could be achieved
with progressive transmission, so long as
there are "markers" that signal
the completion of a finished chunk of data
to be rendered.
Some people's "complaints" about
scene graphs stem from bad designs
and bad implementations. A "scene
graph system" is supposed to be
an infrastructure for storing scene data
and rendering. That ought to
include support for image-based methods,
even though at first blush
it seems nonsensical to talk about
buckets-o-triangles in the same
breath as normal maps. All interactive
rendering systems are fundamentally
created equally in terms of intent &
design. The implementation varies.
Among the top items in the
"common" list is the need to store data, the
need to specify a viewpoint, and the need
to propogate transformation
information. Beyond that, it's merely an
implementation issue.
I caution against spending too much time
worrying about how scene graphs
fit into DiVA because the issue is largely
a red herring.
* What about Postscript,
PDF and other scale-free output methods for publication quality graphics? Are pixmaps sufficient?
Randy: Gotta make nice graphs. Pixmaps will not suffice.
JohnC: Well what are we trying to provide,
an environment for analysis or
producing images for publications? The
latter can be done as a post
process and should not, IMHO, be a focus of
DIVA.
JohnS: Pixmaps are insufficient. Our data analysis infrastructure has
been moving rapidly away from scale-free methods and rapidly towards
pixel-based methods. I don't know
how to stop this slide or if we are poised to address this issue as we look at
this component model.
Wes: Gotta have vector graphics.
In a distributed environment, we need to
create a rendering subsystem that can flexibly switch between drawing to a
client application by sending images, sending geometry, or sending geometry
fragments (image-based rendering).
How do we do that?
Randy: See the Chromium approach. This is actually more easily done than
one might think. Define an image "fragment" and augment the
rendering
pipeline to handle it (ref: PICA and
Chromium).
JohnC: Use Cr
Jim: I would think this could be achieved
by a sophisticated data communication
protocol - one that encodes the type of
data in the stream, say, using XML
or some such thingy.
Wes: Again, one size doesn't fit all.
These seem to be logically different components.
Note: So we are going to define the OpenGL/Cr API as a "port" interface in CCA? All of GL will go through RMI?
* Please describe some rendering models
that you would like to see supported (ie. view-dependent update, progressive
update) and how they would adjust dynamically to changing objective functions
(optimize for fastest framerate, or fastest update on geometry change, or
varying workloads and resource constraints).
Randy: See the TeraScale browser system.
JohnC: Not sold on view dependent update as
worthwhile, but progressive updates
can be hugely helpful. Question is do you
accomplish this by adding
support in the renderer or back it up the
pipeline to the raw data?
JohnS: I see this as the role for the
framework. It also points to the
need to have performance models and performance monitoring built in to every
component so that the framework has sufficient information to make effective
pipeline deployment decisions in response to performance constraints. It also points to the fact that at some
level in this component architecture, component placement decisions must be
entirely abstract (but such a capability is futureware).
So in the short term it's important to design
components with effective interfaces for collecting performance data and
representing either analytic or historical-based models of that data. This is a necessary baseline to get to
the point that a framework could use such data to make intelligent
deployment/configuration decisions for a distributed visualization system.
Wes: The scene graph treatise (above)
covers most of what I have to say for now.
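A hedged sketch of the performance-interface idea JohnS raises above, with hypothetical names: each component exposes measured costs and, optionally, a simple predictive model the framework could consult when making deployment decisions:

    #include <string>

    struct PerformanceSample {
        double lastExecuteSeconds;   // measured cost of the last execute()
        double bytesIn;              // data consumed
        double bytesOut;             // data produced
    };

    class PerformancePort {
    public:
        virtual ~PerformancePort() {}
        virtual PerformanceSample current() const = 0;
        // Optional analytic/historical model: predicted run time for a given
        // input size, for use in pipeline deployment decisions.
        virtual double predictSeconds(double bytesIn) const = 0;
        virtual std::string componentName() const = 0;
    };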
* Are there any good
examples of such a system?
Randy: None that are ideal :), but they are
not difficult to build.
JohnC: Yes, Kitware's not-for-free volume
renderer (volren?). It does a nice job
with handling progressive updates. This is
mostly handled by the GUI but
places some obvious requirements on the
underlying rendering/viz
component.
JohnS: No. That's why we are here.
Wes: I know of a couple of good scene
graphs that can form the basis for renderers.
What is the role of non-polygonal methods
for rendering (ie. shaders)?
* Are you using any
of the latest gaming features of commodity cards in your visualization systems
today?
JohnC: Yup, we've off loaded a couple of
algorithms from the CPU.
We just have some very simple, one-off
applications that off-load
computation from the cpu to gpu. For
example, we have a 2D Image Based
Flow Visualization algorithm that exploits
vertex programmability to do
white noise advection. Developing this type
of application within
any Diva framework I've envisioned would
really push the limits of
anything we've discussed.
JohnS: I'd like to know if anyone is using
shader hardware. I don't know much
about it myself, but it points out that we need to plan for non-polygon-based
visualization methods. It's not
clear to me how to approach this yet.
* Do you see this changing in the future? (how?)
Randy: This is a big problem area. Shaders are difficult to
combine/pipeline.
We are using this stuff now and I do not see
it getting much easier
(hlsl does not fix it). At some point, I believe that
non-polygon
methods will become more common than polygon methods (about 3-4 years?). Polygons are a major bottleneck on current
gfx cards as they limit
parallelism. I'm not sure what the fix will be but it will still be
called OpenGL :).
JohnC: The biggest issue is portability, but
things are looking up with OpenGL
2.0 efforts, etc.
Wes: We've invited Ilmi Yoon to the next
workshop. She represents the IBR
community. I am very keen to see us take
advantage of IBR techniques as well
as our traditional polygon engines,
perhaps combining them in interesting
ways to realize powerful new systems.
Note: Do scene graphs somewhat address portability issues via further abstraction of the rendering procedure?
5) Presentation=========================
It will be necessary to separate the
visualization back-end from the presentation interface. For instance, you may want to have the
same back-end driven by entirely different control-panels/GUIs and displayed in
different display devices (a CAVE vs. a desktop machine). Such separation is also useful
when you want to provide different implementations of the user-interface
depending on the targeted user community.
For instance, visualization experts might desire a dataflow-like
interface for composing visualization workflows whereas a scientist might
desire a domain-specific dash-board like interface that implements a specific
workflow. Both users should be
able to share the same back-end components and implementation even though the
user interface differs considerably.
How do different presentation devices
affect the component model?
Jim: Not. The display device only affects resolution or bandwidth
required.
This could be parameterized in the
component invocations APIs, but
should not otherwise change an individual
component.
If you want a "multiplexer" to
share a massive data stream with a powerwall
and a PDA, then the "multiplexer
component" implementation handles that...
* Do different display
devices require completely different user interface paradigms? If so, then we must define a clear
separation between the GUI description and the components performing the
back-end computations. If not,
then is there a common language to describe user interfaces that can be used
across platforms?
Randy: I think they do (e.g. immersion).
Jim: No. Different GUIs should all map to some common framework
command/control
interface. The same functions will ultimately get executed, just from
buttons
with different labels or appl-specific
short-cuts... The UIs should all
be
independent, but talk the same protocol to
the framework.
Yuk (with regard to creating separation
between GUI and component description)
JohnS: Systems that attempt to use the same
GUI paradigm across different presentation media have always been terrible in
my opinion. I strongly believe
that each presentation medium requires a GUI design that is specific to that
particular medium. This imposes a
strong requirement that our compute pipeline for a given component architecture
be strictly separated from the GUI that controls the parameters and presents
the visual output of that pipeline.
OGSA/WSDL has been proposed as one way to define that interface, but it
is extremely complex to use. One
could use CCA to represent the GUI handles, but that might be equally
complex. Others have simply
customized ways to use XML descriptions of their external GUI interface handles
for their components. The latter
seems much simpler to deal with, but is it general enough?
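One lightweight way to realize the "external GUI handle description" idea, sketched with hypothetical names: the back end publishes flat parameter descriptors (which could be serialized to XML), and each presentation medium builds its own controls from them:

    #include <string>
    #include <vector>

    struct ParameterDescriptor {
        std::string name;        // e.g. "isovalue"
        std::string type;        // "float", "int", "enum", ...
        double minValue;
        double maxValue;
        double defaultValue;
        std::string description; // tooltip / help text for the GUI
    };

    class ControlInterface {
    public:
        virtual ~ControlInterface() {}
        // The GUI (desktop, web, CAVE) renders its own widgets from this list.
        virtual std::vector<ParameterDescriptor> parameters() const = 0;
        virtual void set(const std::string& name, double value) = 0;
    };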
* Do different display
modalities require completely different component/algorithm implementations for
the back-end compute engine?
(what do we do about that??)
Randy: They can (e.g. holography), but I do
not see a big problem there.
Push the representation through an
abstraction (not a layer).
Jim: Algorithm maybe, component no. This could fall into the venue of the
different execution-model-specific
frameworks and/or their bridging...
I dunno.
JohnS: I think there is a lot of opportunity
to share the back-end compute engines across different display modalities. There are some cases where a developer
would be inclined to implement things like an isosurfacer differently for a
CAVE environment just to keep the framerates up high enough to maintain your
sense of immersion. However, I
think of those as edge-cases.
What presentation modalities do you feel are important, and which do you consider the most important?
* Desktop graphics
(native applications on Windows, on Macs)
Randy: #1 (by a fair margin)
JohnC: This is numero uno by a HUGE margin
Jim: MUST
JohnS: #1
Wes: Yes, most important, will never go away.
* Graphics access via
Virtual Machines like Java?
Randy: #5
JohnC: Not important
Jim: Ha ha ha ha...
JohnS: #5
Wes: If it works on desktops, it will work
in these environments.
* CAVEs, Immersadesks,
and other VR devices
Randy: #4
JohnC: Not important
Jim: Must
JohnS: #4
Wes: Second to workstations. With
evolution of Chromium, DMX and the nascent
PICA stuff, I would expect that desktop
tools would port transparently
to these devices.
* Ultra-high-res/Tiled
display devices?
Randy: #3 - note that tiling applies to
desktop systems as well, not
necessarily high-pixel count displays.
JohnC: Moderately important
Jim: Must
JohnS: #3: the next tiled display may well be your next *desktop* display, but not quite yet.
* Web-based
applications?
Randy: #2
JohnC: Well, maybe.
Jim: Probably a good idea. Someone always asks for this... :-Q
JohnS: #2
What abstractions do you think should be
employed to separate the presentation interface from the back-end compute
engine?
Jim: Some sort of general protocol
descriptor, like XML...? Nuthin
fancy.
* Should we be using CCA
to define the communication between GUI and compute engine or should we be
using software infrastructure that was designed specifically for that space?
(ie. WSDL, OGSA, or CORBA?)
Randy: No strong opinion.
Jim: The CCA doesn't do such communication
per se. Messaging between or
in/out
of frameworks is always "out of
band" relative to CCA port invocations.
If the specific framework impl wants to
shove out data on some wire,
then it's hidden below the API level...
I would think that WSDL/SOAP would be O.K.
for low-bandwidth uses.
JohnS: I think I addressed this
earlier. We can do this all in
CCA, but is that the right thing to do?
I know this is an implementation issue, but is a strong part of our
agreement on methods to implement our components (or define component
boundaries).
Wes: (I see this as similar to rendering in VMs like Java in many respects.)
Always sounds nice, but have
yet to see much fruit in this area. The
potential importance/relevance is great.
The browser makes a nice UI engine, but I
wouldn't trust it to do "real"
rendering.
* How do such control
interfaces work with parallel applications?
Should the parallel application have a single
process that manages the control interface and broadcasts to all nodes or
should the control interface treat all application processes within a given
component as peers?
Randy: Consider DMX, by default, single
w/broadcast, but it supports
backend bypass...
Jim: I vote for the "single process
that manages the control interface and
broadcasts to all nodes" (or the
variation above, where one of the
parallel tasks forwards to the rest
internally :-). The latter is
not scalable.
BTW, you can't have "application
processes within a... component".
What does that even mean?
Usually, an application "process"
consists of a collection of one or
more components that have been composed
with some specific connectivity...
JohnS: This requires more discussion, but
reliable broadcast methods have many problems related to event skewing and
MPI-like point-to-point emulation of the broadcast suffers from scalability
problems. We need to collect
design patterns for the control interface and either compete them against
one-another or find a way to support them all by design. This is clearly an implementation
issue, but will leak in to our abstract component design decisions. Clearly we want a single thread
of control to efficiently deliver events to massively parallel back-end
components. That is a *must* requirement.
Note: That paradigm (one component-chain per process) doesn't offer you much opportunity for encapsulating complex parallel communication patterns.
6) Basic Deployment/Development
Environment Issues============
One of the goals of the distributed
visualization architecture is seamless operation on the Grid --
distributed/heterogeneous collections of machines. However, it is quite difficult to realize such a vision without
some consideration of deployment/portability issues. This question also touches on issues related to the
development environment and what kinds of development methods should be
supported.
What languages do you use for core vis
algorithms and frameworks.
* for the numerically
intensive parts of vis algorithms
Randy: C/C++ (a tiny amount of Fortran)
JohnC: C/C++
Jim: C/C++... Fortran/F90 for numerically intensive parts.
JohnS: C/C++/Fortran
Wes: C/C++
* for the glue that
connects your vis algorithms together into an application?
Randy: C/C++
JohnC: C/C++, Tcl, Python
Jim: C/C++
JohnS: C++/C/Java, but want to get into some
Python (it is said to have better numerics than Java)
Wes: C/C++
* How aggressively do
you use language-specific features like C++ templates?
Randy: Not very, but they are used.
JohnC: Not at all. Too scary.
Jim: RUN AWAYYYY!!! These are not consistent across
o.s./arch/compiler yet.
Maybe someday...
JohnS: I avoid them due to portability and
compiler maturity issues.
Wes: Beyond vanilla classes, not at all.
* is Fortran important
to you? Is it important that a
framework support it seamlessly?
Randy: Pretty important, but at least
"standardly" enhanced F77 should be simple :).
JohnC: Nope
Jim: Fortran is crucial for many
application scientists. It is not
directly
useful for the tools I build.
But if you want to ever integrate
application code components directly
into a viz framework, then you better not
preclude this... (or Babel...)
JohnS: Yes, absolutely. It needn't be full fledged F90 support,
but certainly f77 with some f90 extensions.
Wes: No. Fortran can be wrapped inside
something sane.
Note: It is perhaps incumbent on us to support Fortran. We would eventually like buy-in from domain scientists to provide some analysis components that are interesting for them. Lack of Fortran bindings for VTK was a major issue for some participants in the Vis Greenbook workshop.
* Do you see other
languages becoming important for visualization (ie. Python, UPC, or even
BASIC?)
Randy: Python is big for us.
JohnC: Python, mostly because of the direction of numerical Python.
Jim: Nope.
JohnS: Python
What platforms are used for data
analysis/visualization?
* What do you and your
target users depend on to display results? (ie. Windows, Linux, SGI, Sun etc..)
Randy: Linux, SGI, Sun, Windows, MacOS in
that order
JohnC: All the above, primarily Lintel and Windoze though.
Jim:
All of the above (not so much Sun anymore...)
JohnS: Linux, MacOS-X(BSD), Windows
Wes: For rendering, OpenGL engines.
* What kinds of
presentation devices are employed (desktops, portables, handhelds, CAVEs,
Access Grids, WebPages/Collaboratories) and what is their relative importance
to active users.
Randy: Remote desktops and laptops. Very important
JohnC: desktops, tiled displays, AG
Jim: All but handhelds are important,
mostly desktops, CAVEs/hi-res and AG,
in decreasing order.
JohnS: Desktop and laptops are most
important. Web, AG, and CAVE are
of lesser importance (but still important).
Wes: Workstations are most important.
* What is the relative importance of these various presentation methods from a research standpoint?
Randy: PowerPoint :)?
JohnC: The desktop is where the users live.
Jim: CAVEs/hi-res and AG are worthwhile
research areas. The rest can be
weaved in or incorporated more easily.
* Do you see other up-and-coming visualization platforms in the future?
Randy: Tablets & set-top boxes.
JohnC: I don't see SMP graphics boxes going
away as quickly as some might.
Jim: Yes, but I haven't figured out where
exactly to stick the chip behind
my ear for the virtual holodeck
equipment... :)
JohnS: Tablet PCs and desktop-scale Tiled
display devices.
Tell us how you deal with the issue of
versioning and library dependencies for software deployment.
* For source code
distributions, do you bundle builds of all related libraries with each software
release (ie. bundle HDF5 and FLTK source with each release).
Randy: For many libs, yes.
JohnC: Sometimes, depending on the stability
of the libraries.
Jim: CVS for control of versioning.
For bundling of libraries: No, but provide
web links or separate copies of dependent distributions
next to our software on the web site...
Too ugly to include everything in one big
bundle, and not as efficient
as letting the user download just what they
need. (As long as everything
you
need is centrally located or accessible...)
JohnS: Every time I fail to bundle dependent
libraries, it has been a disaster.
So it seems that packaging dependent libraries with any software release
is a *must*.
Wes: Oddly enough, I do bundling like this
for some of my projects. I think people
appreciate it.
* What methods are
employed to support platform independent builds (cmake, imake, autoconf). What are the benefits and problems with
this approach.
Randy: gmake based makefiles.
JohnC: I've used all, developed my own, and
like none. Maybe we can do better.
I think something based around gmake might
have the best potential.
Jim: Mostly autoconf so far.
My student thinks automake and libtool are "cool"
but we haven't used them yet...
JohnS: I depend on conditional
statements in gmake-based makefiles to auto-select between flags for different
architectures. This is not
sufficiently sophisticated for most release engineering though. I have dabbled with autoconf, but it is
not a silver bullet (neither was imake).
I do not understand the practical benefits of 'cmake'.
Wes:
I
hate Imake, but used it extensively for a long time with LBL's AVS modules.
I think it still works. Nobody I know can
figure out how autoconf works. I
personally tend to have different
makefiles, particularly when doing code
that
is supposed to build on Win32 as well as Unix/Linux systems.
* For binaries, have you
have issues with different versions of libraries (ie. GLIBC problems on Linux
and different JVM implementations/versions for Java). Can you tell us about any sophisticated packaging methods
that address some of these problems (RPM need not apply)
Randy: No real problems other that GLIBC
problems. We do tend to ship
static
for several libs. Motif used to be a problem on Linux (LessTiff vs
OpenMotif).
Jim: Just say no. Open Source is the way to go, with a small set of
"common"
binaries just for yuks. Most times the binaries won't work with
the
specific run-time libs anyway...
JohnS: Building statically has been
necessary in a lot of cases, but creates gigantic executables. In the case of JVM's, the problems with
the ever-changing Java platform have driven me away from employing Java as a
development platform.
Wes: I tend to just do source, rather than
binaries, to avoid this whole morass.
OTOH, as a consumer, I prefer RPMs so that
I don't have to build it. I want
my ice toasted, please.
* How do you handle
multiplatform builds?
Randy: cron jobs on multiple platforms,
directly from CVS repos.
Entire
environment can be built from CVS repo info
(or cached).
JohnC: The brute force, not so smart way.
The VTK model is worth looking at.
Jim: Autoconf, shared source tree, with
arch-specific subdirs for object files,
libs and executables.
JohnS: * Conservative, lowest-common
denominator coding practices.
* execute 'uname' at the top of a
gnu makefile to select an appropriate set of build options for sourcecode
building. Inside of the code, we must use the CPP to code around platform dependencies.
How do you (or would you) provide
abstractions that hide the locality of various components of your
visualization/data analysis application?
Jim: I would use "proxy"
components that use out-of-band communication to
forward invocations and data to the actual
component implementation.
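A sketch of that proxy idea, with invented names and a placeholder transport; the local framework sees an ordinary component, while invocations are forwarded out-of-band to the real implementation running elsewhere:

    #include <string>

    class Transport {                       // stand-in for sockets/Globus/etc.
    public:
        virtual ~Transport() {}
        virtual void send(const std::string& message) = 0;
    };

    class IsoSurfacer {                     // the abstract component interface
    public:
        virtual ~IsoSurfacer() {}
        virtual void setIsovalue(double v) = 0;
        virtual void execute() = 0;
    };

    class IsoSurfacerProxy : public IsoSurfacer {
    public:
        explicit IsoSurfacerProxy(Transport& t) : transport_(t) {}
        // Each call is serialized and forwarded to the remote implementation.
        void setIsovalue(double v) {
            transport_.send("setIsovalue " + std::to_string(v));
        }
        void execute() { transport_.send("execute"); }
    private:
        Transport& transport_;
    };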
* Does anyone have
ample experience with CORBA, OGSA, DCOM, .NET, RPC? Please comment on advantages/problems of these technologies.
Jim: Nope
JohnS: Nope
* Do web/grid services come into play here?
Randy: Not usually an issue for us.
Jim: Yuck, I hope not...
JohnS: As these web-based scientific
collaboratory efforts gather momentum, web-based data analysis tools have
become increasingly important. I
think the motivation is largely driven by deployment issues when supporting a
very heterogeneous/multi-institutional user base. It reduces the deployment variables when your target is a
specific web-server environment, but you pay a price in that the user-interface
is considerably less advanced.
This cost is mitigated somewhat if the data analysis performed is very
domain-specific and customized for the particular collaboratory community. So it's a poor choice for general-purpose
visualization tools, but if the workflow is well-established among the
collaborators, then the weakness of the web-based user-interface options is not
as much of a problem.
7) Collaboration
==========================
If you are interested in "collaborative
appllications" please define the term "collaborative". Perhaps provide examples of
collaborative application paradigms.
Randy: Meeting Maker? :) :) (I'm getting tired).
Jim: "Collaborative" is 2 or more
geographically/remote teams, sharing one
common viz environment, with shared control
and full telepresence.
(Note: by this definition,
"collaborative" does not yet exist... :-)
JohnS: Despite years of dabbling in "collaborative applications," I'm still not sure if I (or anyone) really knows what "collaborative" is in a strict sense.
Wes: The term "collaboration" is
one of the most overused, misused and abused
terms in the English language. There is a
huge disconnect between what
many users want/need, and what seems to be
an overemphasis upon collaborative
technologies. For this particular project,
collaboration (ought to) mean:
being able to share software components;
and some level of confidence that
"DiVA-compliant" components in
fact do interoperate. For the sake of
discussion, let's call this type of
collaboration "interoperability."
For the other forms of
"collaboration," care must be taken to define what
they are, whether they are useful, etc. If
you're talking about multiple
persons seeing the same interactive
renderer output, and each person being
able to do some interactive
transformation, let's call that form of
collaboration "MI"
(multiperson-interactive).
I recall hearing some discussion about the
relationship between the AG
and DiVA. From my perspective, the AG
ought to provide support to allow
any application to run in a "MI
mode" With this perspective,
there isn't really much to talk about in
terms of fundamental DiVA
design wrt "MI."
Is collaboration a feature that exists at an application level or are there key requirements for collaborative applications that necessitate component-level support?
* Should collaborative infrastructure be incorporated as a core feature of every component?
JohnC: Does it need to be incorporated in
all components? What kind of collab
support is needed? Permitting session
logging and geographically
separated, simultaneous users would go a long way toward providing for
collab needs and would seem to only impact
the GUI and perhaps renderer.
Jim: Collaboration should exist *above* the
application level, either outside
the specific framework or as part of the
framework "bridging" technology.
JohnS: No. I hope that support for collaborative applications can be
provided via supplemental components.
Wes: I don't know what "collaborative
infrastructure" means. Given that my position
(above), "MI" is more of a
framework thing, and not a component thing.
This seems to be the most realistic
approach to "MI."
Note: I'm not sure how to interpret this answer. Is this a "framework issue" or a "component issue", or is it totally outside of the application? So do we retrofit applications to be "collaborative" from the outside rather than designing apps or the frameworks that implement them to support collaboration "requirements" as a fundamental feature of the technology?
* Can any
conceivable collaborative requirement be satisfied using a separate set of
modules that specifically manage distribution of events and data in
collaborative applications?
Jim: I dunno, I doubt it.
JohnS: That is what I hope.
* How is the collaborative
application presented? Does the
application only need to be collaborative sometimes?
Jim: Yes, collaboration should be flexible
and on demand as needed - like
dialing out on the speakerphone while in
the middle of a meeting...
JohnS: This is probably true. You probably want to be able to have
tools that were effectively standalone that can join into a collaborative space
on demand.
* Where does performance come into play? Does
the visualization system or underlying libraries need to be performance-aware?
(i.e. I'm doing a given task and I need a framerate of X for it to be useful using my current compute resources), network aware (i.e. the system is starving for data and must respond by adding an alternate stream or redeploying the pipeline). Are these considerations implemented at the component level, framework level, or are they entirely out-of-scope for our consideration?
Jim: There likely will need to be
"hooks" to specify performance requirements,
like "quality of service". This should perhaps be incorporated as
part
of the individual component APIs, or at
least metered by the frameworks...
It would be wise to specify the frame rate
requirement, perhaps interactively
depending on the venue... e.g. in interactive collaboration
scenarios you'd
rather drop some frames consistently than stall completely or in
bursts...
This sounds like futureware to me - an intelligent network protocol layer...
beyond our scope for sure!
These issues should be dealt with mostly at
the framework level, if at all.
I think they're mostly out-of-scope for the
first incarnation...
JohnS: Yes. The whole collaboration experience will fall apart if you
cannot impose some constraints on quality of service or react appropriately to
service limitations. It's a big problem,
but I hope the solution does not need to be a fundamental feature of the
baseline component design.
Wes: The MI-aware framework collects and
uses performance data generated by
components to make decisions about how to
tune/optimize visualization
pipeline performance (the pipeline
consists of a bunch of components).
If some of the other issues I've raised
are addressed (e.g., time-limited
execution, partial processing, incremental
processing, etc), then the
performance issues raised within the context
of MI come "for free".
Note: The issue here is that if anyone thinks this should be done at anything other than the framework (or even outside of the framework) level, then it could be very disruptive to our design process if we develop first for single-user operation and then later attempt to make "collaborative services" a requirement. Implementing this at the "framework level" is again a high price for admission. If there is any way to support this at a component level, it would enable people working on collaborative extensions to share better with people who have different aims for their framework. I don't consider it a benefit to have one "framework" per use-case, as has been the practice in many aspects of CCA. It will just continue the balkanization of our development efforts.