Preface Comment from Ilmi Yoon:
Just one curiosity: a component is at a much larger granularity than an object
in terms of reusability or usage. A component is a kind of package of objects
that has interfaces to communicate with other components.
So components are much more portable and easily reusable without knowing
the programming environment of the different components - they can be in
different programming languages, etc., as long as they know each other's
interfaces... I just feel some discussions are related to object-oriented, not
component-oriented. Maybe it is from my ignorance and/or lacking certain
background from the last meeting.
1) Data Structures/Representations/Management==================
The center of every successful modular visualization architecture has been a flexible core set of data structures for representing data that is important to the targeted application domain. Before we can begin working on algorithms, we must come to some agreement on common methods (either data structures or accessors/method calls) for exchanging data between components of our vis framework.
There are two potentially disparate motivations for defining the data representation requirements. In the coarse-grained case, we need to define standards for exchanging data between components in this framework (interoperability). In the fine-grained case, we want to define some canonical data structures that can be used within a component -- one developed specifically for this framework. These two use-cases may drive different sets of requirements and implementation issues.
* Do you feel both of these use cases are equally important or should we focus exclusively on one or the other?
Randy: I think that interoperability (both in terms of data and perhaps more
critically operation/interaction) is more critical than fine-grained
data sharing. My motivation: there is no way that DiVA will be able to
meet all needs initially and in many cases, it may be fine for data to
go "opaque" to the framework once inside a "limb" in the framework (e.g.
VTK could be a limb). This allows the framework to be easily populated
with a lot of solid code bases and shifts the initial focus on important
interactions (perhaps domain centric). Over time, I see the fine-grain
stuff coming up, but perhaps proposed by the "limbs" rather than the
framework. I do feel that the coarse level must take into account
distributed processing however...
I want to facilitate interfaces
between packages, opting for (possibly specific) data models that map
to the application at hand. I could use some generic mechanisms
provided by DiVA to reduce the amount of code I need or bootstrap
more rapid prototyping, but it is not key that the data model be
burned fully into the Framework. I certainly feel that the Framework
should be able to support more than one data model (since we have
repeatedly illustrated that all "realizable" models have design
boundaries that we will eventually hit).
Pat: I think both cases are important, but agreeing upon the fine-grained access
will be harder.
John C: Too soon to tell. Focus on both until the issues become more clear.
Jim: I think for now we need to exclusively focus on exchanging data between
components, rather than any fine-grained generalized data objects...
The first order entry into any component development is to "wrap up
what ya got". The "rip things apart" phase comes after you can glue
all the coarse-grained pieces together reliably...
Ilmi: I think we need to decide the coarse-grained exchange - something like SOAP,
which wraps the internal data in an XML format. But I think we don't need to decide
the fine-grained one, since each component can choose its own way/format and
then publish that format, so a party who wants to use the component
follows its interface. But if we would like to decide an initial set of formats
that must/may be supported by DiVA components, then we can list the most popular
formats and choose some/all of them.
JohnS: While I am very interested in design patterns, data structures, and services that could make the design of the interior of parallel/distributed components easier, it is clear that the interfaces between components are the central focus of this project. So the definition of inter-component data exchanges is preeminent.
Wes: Both are important. The strongest case, IMO, for the intra-component DS/DM
is that I have a stable set of data modeling/mgt tools that I can use for
families of components. Having a solid DS/DM base will free me to focus
on vis and rendering algorithms, which is how I want to spend my time.
The strongest case for the inter-component DS/DM is the "strong typing"
property that makes AVS and apps of its ilk work so well.
The "elephant in the living room" is that there is no silver bullet.
I favor an approach that is, by design, incremental. What I mean is that
we can deal with structure grids, unstructured grids, geom and other
renderable data, etc. in a more or less piecemeal fashion with an eye
towards component level interoperability in the long term. In the beginning,
there won't be 100% interoperability as if, for example, all data models
and types were stuffed into a vector bundles interface. OTOH, a more
conciliatory approach will permit forward progress among multiple
independent groups who are all eyeing "interoperability". This is the
real goal, not a "single true data model."
* Do you feel the requirements for each of these use-cases are aligned or will they involve two separate development tracks? For instance, using "accessors" (method calls that provide abstract access to essentially opaque data structures) will likely work fine for the coarse-grained data exchanges between components, but will lead to inefficiencies if used to implement algorithms within a particular component.
* As you answer the "implementation and requirements" questions below, please try to identify where coarse-grained and fine-grained use cases will affect the implementation requirements.
Randy: I think you hit the nail on the head. Where necessary, I see sub-portions
of the framework working out the necessary fine-grained, efficient,
"aware" interactions and data structures as needed. I strongly doubt we
would get that part right initially and think it would lead to some of
the same constraints that are forcing us to re-invent frameworks right
now. IMHO: the fine-grain stuff must be flexible and dynamic over
time as development and research progress.
Pat: I think the focus should be on interfaces rather than data structures. I
would advocate this approach not just because it's the standard
"object-oriented" way, but because it's the one we followed with FEL,
and now FM, and it has been a big win for us. It's a significant benefit
not having to maintain different versions of the same visualization
technique, each dedicated to a different method for producing the
data (i.e., different data structures). So, for example, we use the same
visualization code in both in-core and out-of-core cases. Assuming up
front that an interface-based approach would be too slow is, in my
humble opinion, classic premature optimization.
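Pat's point - one visualization routine serving both in-core and out-of-core data - can be sketched as code written against an abstract field interface rather than a concrete array. The class and method names below are hypothetical illustrations, not the actual FM API:

```python
# Sketch of the interface-based approach Pat describes: visualization code
# written against an abstract field interface works unchanged whether the
# data lives in memory or is paged in block-by-block. All names here are
# invented for illustration; this is not the real FM API.
from abc import ABC, abstractmethod

class Field(ABC):
    """Abstract accessor: callers never see the underlying storage."""
    @abstractmethod
    def num_nodes(self) -> int: ...
    @abstractmethod
    def value_at(self, node: int) -> float: ...

class InCoreField(Field):
    def __init__(self, values):
        self._values = list(values)
    def num_nodes(self):
        return len(self._values)
    def value_at(self, node):
        return self._values[node]

class OutOfCoreField(Field):
    """Fakes out-of-core access by 'loading' one equal-sized block at a time."""
    def __init__(self, blocks):
        self._blocks = blocks          # list of lists, standing in for file blocks
        self._cached = (None, None)    # (block index, block data)
    def num_nodes(self):
        return sum(len(b) for b in self._blocks)
    def value_at(self, node):
        block, offset = divmod(node, len(self._blocks[0]))
        if self._cached[0] != block:   # simulate a disk read on cache miss
            self._cached = (block, self._blocks[block])
        return self._cached[1][offset]

def field_max(field: Field) -> float:
    """One 'visualization' routine that serves both storage schemes."""
    return max(field.value_at(i) for i in range(field.num_nodes()))
```

The maintenance win Pat describes falls out directly: `field_max` exists once, and adding a new storage scheme means adding one class, not re-porting every algorithm.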
Jim: Two separate development tracks. Definitely. There are different driving
design forces and they can be developed (somewhat) independently (I hope).
Lori: The TSTT center is not interested in defining a data representation
per se - that is dictating what the data structure will look like. Rather,
we are interested in defining how data can be accessed in a uniform
way from a wide variety of different data structures (for both structured
and unstructured meshes). This came about because we recognize
1. there are a lot of different meshing/data frameworks out there,
that have many man years of effort behind their development,
that are not going to change their data structures very easily
(if at all). Moreover, these infrastructures have made their
choices for a reason - if there was a one-size-fits-all answer,
someone probably would have found it by now :-)
2. Because of the difference in data structures - it has been very
difficult for application scientists (and tool builders) to experiment
with and/or support different data infrastructures which has
severely limited their ability to play with different meshing strategies,
discretization schemes, etc.
We are trying to address this latter point: by developing common
interfaces for a variety of infrastructures, applications can easily
experiment with different techniques, and supporting tool developers
(such as mesh quality improvement and front tracking codes) can
write their tools to a single API and automatically support multiple
infrastructures.
We are also experimenting with the language interoperability tools
provided by the Babel team at LLNL and have ongoing work to
evaluate its performance (and the performance of our interface in
general) for fine- and coarse-grained access to mesh (data) entities -
something that I suspect will be of interest to this group as well.
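Lori's TSTT idea - one query interface over very different internal mesh storage - can be sketched in miniature. The interface below is an invented illustration, not the real TSTT API:

```python
# Toy version of the TSTT approach: two meshes with very different internal
# storage expose the same small query interface, so a tool written once runs
# on both. All class and method names are hypothetical illustrations.
class StructuredMesh2D:
    """Implicit connectivity: nothing stored but the grid dimensions."""
    def __init__(self, nx, ny):
        self.nx, self.ny = nx, ny
    def num_cells(self):
        return (self.nx - 1) * (self.ny - 1)
    def cell_vertices(self, c):
        # connectivity is computed on the fly from the structured layout
        i, j = divmod(c, self.nx - 1)
        v = i * self.nx + j
        return (v, v + 1, v + self.nx, v + self.nx + 1)

class UnstructuredMesh:
    """Explicit connectivity list, as a tet/tri code might keep it."""
    def __init__(self, cells):
        self._cells = cells
    def num_cells(self):
        return len(self._cells)
    def cell_vertices(self, c):
        return tuple(self._cells[c])

def max_vertices_per_cell(mesh):
    """A 'tool' written once against the common interface, like the mesh
    quality improvement codes Lori mentions."""
    return max(len(mesh.cell_vertices(c)) for c in range(mesh.num_cells()))
```

The tool never learns which representation it was handed, which is exactly the property that lets applications swap meshing infrastructures without rewriting their tools.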
JohnC: I think it's premature to say. We need to have agreement on the
questions below first.
Ilmi: There will be some overhead and inefficiency using accessors for data
exchange, but I like the approach of accessors and believe the CCA achieves
reusability at the expense of performance, as OOP does anyway. We just try to
make that expense as small as possible.
JohnS: Given the focus on inter-component data exchange, I think accessors provide the most straightforward paradigm. The arguments to the data access methods can involve elemental data types rather than composite data structures (e.g., we use scalars and arrays of basic machine data types rather than hierarchical structures). Therefore we should look closely at FM's API organization as well as the accessors employed by SCIRun V1 (before they employed dynamic compilation).
The accessor method works well for abstracting component location, but requires potentially redundant copying of data for components in the same memory space. It may be necessary to use reference counting in order to reduce the need to recopy data arrays between co-located components, but I'd really like to avoid making ref counting a mandatory requirement if we can avoid it. (does anyone know how to avoid redundant data copying between opaque components without employing reference counting?)
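One hedged answer to JohnS's question is copy-on-write sharing: co-located components share a read-only buffer and a copy is made only when someone writes, so the common read-only path never duplicates the array. A minimal sketch, with all names invented for illustration:

```python
# Copy-on-write sharing between co-located components: reads are zero-copy;
# a write by one holder copies the buffer first so other holders are
# unaffected. Names and structure are purely illustrative.
class SharedArray:
    def __init__(self, data):
        self._data = list(data)
        self._refs = [1]              # ref count kept in a shared one-element box
    def view(self):
        """Hand another component a zero-copy view of the same buffer."""
        other = SharedArray.__new__(SharedArray)
        other._data = self._data
        other._refs = self._refs
        self._refs[0] += 1
        return other
    def get(self, i):
        return self._data[i]
    def set(self, i, value):
        if self._refs[0] > 1:         # another holder sees this buffer: copy first
            self._refs[0] -= 1
            self._data = list(self._data)
            self._refs = [1]
        self._data[i] = value
```

This doesn't fully escape reference counting (the count still decides when to copy), but it confines it to the buffer wrapper rather than making it a framework-wide mandate.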
Wes: They are aligned to a large degree - data structures/models are produced and
consumed by component code, but may also be manipulated (serialized,
marshalled, etc) by the framework.
What are the requirements for the data representations that must be supported by a common infrastructure? We will start by answering Pat's questions about representation requirements and follow up with personal experiences involving particular domain scientists' requirements.
Must: support for structured data
Randy: Must-at the coarse level, I think this could form the basis of all
Pat: Structured data support is a must.
Must/Want: support for multi-block data?
Randy: Must-at the coarse level, I think this is key for scalability,
domain decomposition and streaming/multipart data transfer.
Pat: We have unstructured data, mostly based on tetrahedral or prismatic meshes.
We need support for at least those types. I do not think we could simply
graft unstructured data support on top of our structured data structures.
Wes: Must. We must set targets that meet our needs, and not sacrifice
requirements for speed of implementation.
Must/Want: support for various unstructured data representations? (which ones?)
Randy: Nice-but I would be willing to live with an implementation on top
of structured, multi-block (e.g. Exodus). I feel accessors are
fine for this at the "framework" level (not at the leaves).
Pat: We have unstructured data, mostly based on tetrahedral or prismatic meshes.
We need support for at least those types. I do not think we could simply
graft unstructured data support on top of our structured data structures.
JohnC: Not sure. Not a priority.
Jim: Want (low priority)
JohnS: Cell-based unstructured representations first. Support for arbitrary connectivity is needed eventually, but not mandatory. I liked Iris Explorer's hierarchical model, as it seems more general than the model offered by other vis systems.
Wes: Must. Unstructured data reps are widely used and they should not be
excluded from the base set of DS/DM technologies.
Must/Want: support for adaptive grid standards? Please be specific about which adaptive grid methods you are referring to. Restricted block-structured AMR (aligned grids), general block-structured AMR (rotated grids), hierarchical unstructured AMR, or non-hierarchical adaptive structured/unstructured meshes.
Randy: Similar to my comments on unstructured data reps. In the long
run, something like boxlib with support for both P and H adaptivity
will be needed (IMHO, VTK might provide this).
Pat: Adaptive grid support is a "want" for us currently, probably eventually
a "must". The local favorite is CART3D, which consists of hierarchical
regular grids. The messy part is that CART3D also supports having
more-or-less arbitrary shapes in the domain, e.g., an aircraft fuselage.
Handling the shape description and all the "cut cell" intersections
I expect will be a pain.
JohnC: Adaptive grid usage is in its infancy at NCAR. But I suspect it is the
way of the future. Too soon to be specific about which adaptive grid
methods are preferred.
Jim: Want (low priority). The AMR folks have been trying to get together and define
a standard API, and have been as yet unsuccessful. Who are we to attempt
this where they have failed...?
JohnS: If we can define the data models rigorously for the individual grid types (ie. structured and unstructured data), then adaptive grid standards really revolve around an infrastructure for indexing data items. We normally think of indexing datasets by time and by data species. However, we need to have more general indexing methods that can be used to support concepts of spatial and temporal relationships. Support for pervasive indexing structures is also important for supporting other visualization features like K-d trees, octrees, and other such methods that are used to accelerate graphics algorithms. We really should consider how to pass such representations down the data analysis pipeline in a uniform manner because they are used so commonly.
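JohnS's point about passing acceleration structures down the pipeline can be sketched concretely: build a per-block min/max index once, attach it to the dataset, and let a downstream stage cull blocks instead of rebuilding its own index. Everything here is an invented illustration:

```python
# Sketch: an index travels with the data through the pipeline, so downstream
# components reuse it instead of rebuilding it. A per-block min/max interval
# index stands in for the k-d trees/octrees mentioned above. Illustrative only.
class IndexedDataset:
    def __init__(self, values, block=4):
        self.values = list(values)
        self.block = block
        # built once, then carried along with the data
        self.index = [(min(values[i:i + block]), max(values[i:i + block]))
                      for i in range(0, len(values), block)]

def find_at_least(ds, level):
    """Downstream stage: consult the index, touch only candidate blocks."""
    hits = []
    for b, (lo, hi) in enumerate(ds.index):
        if hi < level:
            continue                                   # whole block culled
        start = b * ds.block
        for i in range(start, min(start + ds.block, len(ds.values))):
            if ds.values[i] >= level:
                hits.append(i)
    return hits
```

The same pattern - index as first-class pipeline cargo - is what would let an isosurface or query stage skip most of a large dataset without each component inventing its own acceleration scheme.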
Wes: Want, badly. We could start with Berger-Colella AMR since it is widely
used. I'm not crazy about Boxlib, though, and hope we can do something
that is easier to use.
Must/Want: "vertex-centered" data, "cell-centered" data? other-centered?
Pat: Most of the data we see is still vertex-centered. FM supports other
associations, but we haven't used them much so far.
Jim: Want (low priority)
All of these should be "Wants", to the extent that they require more
sophisticated handling, or are less well-known in terms of generalizing.
For example, the AMR folks have been trying to get together and define
a standard API, and have been as yet unsuccessful. Who are we to attempt
this where they have failed...?
So to clarify, if we *really* understand (or think we do) a particular
data representation/organization, or even a specific subset of a general
representation type, then by all means let's whittle an API into our stuff.
Otherwise, leave it alone for someone else to do, or do as strictly needed.
JohnS: The accessors must understand (or at least not preclude) all centerings. This is particularly true for structured grids, where vis systems are typically lax in storing/representing this information.
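The centering information JohnS says vis systems often drop can be made explicit in the accessor, so that any conversion is deliberate rather than silent. A minimal 1-D sketch with invented names:

```python
# Carrying data centering explicitly in the field accessor, and making the
# cell-to-vertex conversion an explicit, named operation. 1-D only, and all
# names are hypothetical illustrations.
CELL, VERTEX = "cell", "vertex"

class Field1D:
    def __init__(self, data, centering):
        assert centering in (CELL, VERTEX)
        self.data = list(data)
        self.centering = centering
    def as_vertex_centered(self, n_vertices):
        if self.centering == VERTEX:
            return self
        assert len(self.data) == n_vertices - 1   # one value per cell
        d = self.data
        # interior vertices average the two adjacent cells;
        # boundary vertices take the nearest cell value
        verts = [d[0]] + [(d[i - 1] + d[i]) / 2
                          for i in range(1, len(d))] + [d[-1]]
        return Field1D(verts, VERTEX)
```

A system that tracks centering this way cannot silently treat cell-centered data as vertex-centered, which is exactly the laxness being criticized.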
Wes: Don't care - will let someone else answer this.
Note: It sounds like at least time-varying data handling is well understood by the people who want it.
Must: support time-varying data, sequenced, streamed data?
Randy: Must, but way too much to say here to do it justice. I will say
that the core must deal with time-varying/sequenced data. Streaming
might be able to be placed on top of that, if it is designed
properly. I will add that we have a need for progressive data as well.
Pat: Support for time-varying data is a must.
JohnC: Must. Time-varying data is what makes so many of our problems currently
intractable. Too many of the available tools (e.g. VTK) assume static
data and completely fall apart when the data is otherwise.
There is definitely no
support for any routines that require temporal integration (e.g.
unsteady flow viz). In general, there is no notion of a timestep in
VTK. Datasets are 3D. Period.
Additionally, there is a performance issue: VTK is not optimized in any
way for moving data through the pipeline at high rates. Its underlying
architecture seems to assume that as long as the pipeline eventually
generates some geometry, it's ok if it takes a loooong time, because
you're going to interact with that geometry (navigating through camera
space, color space, etc.) and the "pre-processing" doesn't have to run
at interactive rates. So the data readers are pathetically slow (and
there is little hope for optimization here with the data model that is
used). There is no way to exploit temporal coherence in any of the data
operators. No simple way to cache results if you want to play out of core.
At a high level, you need a design that gives consideration to temporal needs
throughout the architecture. I think the data structures do need to be
time-varying-data aware, not just capable of dealing with 4D data
(although I can't think of a specific example of why right now). One issue is
that the temporal dimension often has different spacing/regularity than
the spatial dimensions. Obviously you're talking different units from
the spatial dimensions as well. There are also system-level issues
(e.g. unsteady flow viz needs, exploiting temporal coherence,
caching, support for exploring the temporal dimension from the user interface).
I know I've only just started to scratch the surface here. We could
probably devote an entire workshop to time-varying data needs and
several more to figuring out how to actually support them.
JohnS: Yes to all. However, the concept of streamed data must be defined in more detail. This is where the execution paradigm is going to affect the data structures.
Wes: Not ready for prime time. I've read two or three research proposals in
the past year that focus on methods for time-varying data representations
and manipulation. IMO, this topic is not ready for prime time yet. We
can say that it would be nice to have, but will probably not be fully
prepared to start whacking out code.
Note: Should do quick gap-analysis on what existing tools fulfill this requirement.
Must/Want: higher-order elements?
Randy: Must - but again, this can often be "faked" on top of other reps.
Pat: Occasionally people ask about it, but we haven't found it to be a "must".
JohnC: low priority
Jim: Wants, see above...
JohnS: Not yet.
Wes: Not sure what this means, exactly, so I'll improvise. Beyond scientific
data representations, there is a family of "vis data structures" that need
to be on the table. These include renderable stuff - images, deep images,
explicit and implicit geometry, scene graph ordering semantics, scene
specification semantics, etc. In addition, there is the issue of
"performance data" and how it will be represented.
Note: I find the response to this quite funny because I ran a two-day workshop about 3 years ago here at LBNL on finite element analysis requirements. We got bashed for two days straight by the FEM code jocks because we didn't seem to care about higher-order elements. So it would be interesting to know if we don't see much of this because it's not needed or if the domain scientists simply lost all confidence in us to deal with this issue properly.
Must/Want: Expression of material interface boundaries and other special-treatment of boundary conditions.
Randy: Must, but I will break this into two cases. Material interfaces for
us are essentially sparse vector sets, so they can be handled with
basic mechanisms, and I do not see that as core, other than perhaps
support for compression. Boundary conditions (e.g. ghost zoning,
AMR boundaries, etc) are critical.
Pat: We don't see this so much. "Want", but not must.
JohnC: no priority
Jim: Want, see above…
JohnS: Yes, we must treat ghost zones specially or parallel vis algorithms will create significant artifacts. I'm not sure what is required for combined air-ocean models.
Wes: I'll let someone else answer this one.
Note: At a DOE vis workshop, it was pointed out that simple things like isosurfaces give inconsistent (or radically different) results on material interface boundaries depending on assumptions about the boundary treatment. You'd think that this would come up with analysis of combined air-ocean models, but apparently not among the vis people. From a data analysis standpoint, domain scientists say this is incredibly important, but they can't deal with it because none of the vis or data analysis people listen to them.
* For commonly understood datatypes like structured and unstructured, please focus on any features that are commonly overlooked in typical implementations. For example, data-centering is often overlooked in structured data representations in vis systems, and FEM researchers commonly criticize vis people for co-mingling geometry with topology in unstructured grid representations. Few data structures provide proper treatment of boundary conditions or material interfaces. Please describe your personal experience on these matters.
Randy: Make sure you get the lowest common denominator correct! There is
no realistic way that the framework can support everything, everywhere
without losing its ability to be nimble (no matter what some OOP folks
say). Simplicity of representation with "externally" supplied optional
optimization information is one approach to this kind of problem.
Pat: One thing left out of the items above is support for some sort of "blanking"
mechanism, i.e., a means to indicate that the data at some nodes are not
valid. That's a must for us. For instance, with Earth science data we see
the use of some special value to indicate "no data" locations.
JohnC: Support for missing data is essential for observed fields.
To do it right you need some way to flag
data cells/vertices within the data model as not containing valid data.
Then you need to add support to your data "operators" as well. For example,
if your operator is some kind of reconstruction filter it needs to know
to use a different kernel when missing data are involved.
Obviously, this could pose a significant amount of overhead on the entire
system, and the effort may not be justified if the DOE doesn't have
great need for dealing with instrument acquired data. I only added the
point as a discussion topic as it is fairly important to us. At the
very least, I would hope to have the flexibility to hack support
for missing data if it was not integral to the core framework.
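The operator awareness JohnC describes - a reconstruction filter that knows to use a different kernel when data are missing - can be sketched as a smoothing filter that renormalizes over only the valid neighbors instead of letting a sentinel value poison the average. Purely illustrative:

```python
# A missing-data-aware smoothing operator: flagged values are excluded and
# the kernel weight renormalizes over the valid neighbors that remain.
# NO_DATA stands in for whatever blanking flag the data model would carry.
NO_DATA = None

def smooth(values):
    out = []
    for i in range(len(values)):
        # gather the valid members of the 3-point window around i
        window = [values[j]
                  for j in range(max(0, i - 1), min(len(values), i + 2))
                  if values[j] is not NO_DATA]
        # dividing by len(window) renormalizes the kernel automatically;
        # a point with no valid neighbors stays flagged
        out.append(sum(window) / len(window) if window else NO_DATA)
    return out
```

Note how the flag propagates: output points with no valid support remain `NO_DATA`, which is the behavior downstream operators need if blanking is to survive a pipeline.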
Jim: I don't think we should "pee in this pool" either yet. Are any of us
experts in this kind of viz? Let's stick with what we collectively know
best and make that work before we try to tackle a related-but-fundamentally-different problem.
JohnS: There is little support for non-cartesian coordinate systems in typical data structures. We will need to have a discussion of how to support coordinate projections/conversions in a comprehensive manner. This will be very important for applications relating to the National Virtual Observatory.
Wes: No comment
* Please describe data representation requirements for novel data representations such as bioinformatics and terrestrial sensor datasets. In particular, how should we handle more abstract data that is typically given the moniker "information visualization".
Randy: Obviously, do not forget "records" and aggregate/derived types. That
having been said, the overheads for these can be ugly. Consider
parallel arrays as an alternative...
Pat: "Field Model" draws the line at only trying to represent fields and the meshes
that the fields are based on. I'm not really familiar enough with other types
of data to know what interfaces/data-structures would be best. We haven't
seen a lot of demand for those types of data as of yet. A low-priority "want".
JohnC: Beats me.
JohnS: I simply don't know enough about this field to comment.
Wes: Maybe I don't understand the problem...the same tough issues that plague
more familiar data models appear to be present in bioinformatics and
"info viz" data mgt. There are hierarchical data, unstructured data,
multivariate and multidimensional data, etc.
Note: Must separate mesh from field data interfaces.
The mesh may not be updated as often as the field.
Perhaps time-range of validity is important information.
What do you consider the most elegant/comprehensive implementation for data representations that you believe could form the basis for a comprehensive visualization framework?
* For instance, AVS uses entirely different data structures for structured, unstructured and geometry data. VTK uses class inheritance to express the similarities between related structures. Ensight treats unstructured data and geometry nearly interchangeably. OpenDX uses more vector-bundle-like constructs to provide a more unified view of disparate data structures. FM uses data-accessors (essentially keeping the data structures opaque).
Randy: IMHO: layered data structuring combined with data accessors is
probably the right way to go. Keep the basic representational structures simple.
Pat: Well, as you'd expect, as the primary author of Field Model (FM) I think it's
the most elegant/comprehensive of the lot. It handles structured and
unstructured data. It handles non-vertex-centered data. I think it
should be able to handle adaptive data, though it hasn't actually been
put to the test yet. And of course every adaptive mesh scheme is a little
different. I think it could handle boundary condition needs, though that's
not something we see much of.
JohnC: I don't think this is what you're after, but I've come to believe that
multiresolution data representations with efficient domain subsetting
capabilities are the most pragmatic and elegant
way to deal with large data sets. In addition to enabling interaction
with the largest data sets, they offer tremendous scalability from desktop
to "visual supercomputer". I would encourage a data model that includes
and facilitates their integral support.
Ilmi: Combination of (externally) FM data-accessors and (internally) VTK class inheritance.
JohnS: Since I'm already on record as saying that opaque data accessors are essential for this project, it is clear that FM offers the most compelling implementation that satisfies this requirement.
* Are there any of the requirements above that are not covered by the structure you propose?
Randy: I think one big issue will be the distributed representations.
This item is ill handled by many of these systems.
Pat: Out-of-core? Derived fields? Analytic meshes (e.g., regular meshes)?
Differential operators? Interpolation methods?
JohnC: Not sure.
JohnS: We need to be able to express a wider variety of data layout conversions and have some design pattern that reduces the need to recopy data arrays for local components. The FM model also needs to have additional API support for hierarchical indices to accelerate access to subsections of arrays or domains.
Wes: Not sure how to answer. The one thing that came to mind is a general observation
that the above data models are designed for scientific data. The AVS geom
data structure was opaque to the developer, and if you looked at the header
files, was really, really ugly. Since I have a keen interest in renderers,
I am very concerned about having adequate flexibility and performance from
a DS/DM for moving/representing renderable data, as opposed to large
structured or unstructured meshes. It is possible to generalize a
DS for storing renderable data (e.g, a scene graph), but this separate class
of citizen reflects the partitioning of data types in AVS. Perhaps this
isn't something to be concerned about at this point.
Note: Area of unique features?
-data handling for distributed data
-better handling of time-varying data
-hints for caching so that temporal locality can be exploited
-indexing (not in TSTT sense… needs more discussion. Need support for kD trees and rapid lookup. Indexing might help with our AMR issues)
* This should focus on the elegance/usefulness of the core design pattern employed by the implementation rather than a point-by-point description of the implementation!
Randy: Is it possible to consider a COM Automation Object-like approach,
also similar to the CCA breakdown? Basically, define the common
stuff and make it interchangeable, then build on top. Allow underlying
objects to be "aware" and wink to each other to bypass as needed.
In the long run, consider standardizing on working bypass paradigms
and bring them into the code (e.g. OpenGL).
Note: We need the “bypass”. The question is how do we supply the bypass mechanism for unanticipated data?
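One way to supply the bypass Randy and the note describe is a capability query: every component supports the generic accessor path, but a producer may advertise an optional fast path that aware consumers detect and use, with a clean fallback otherwise. The negotiation protocol below is invented for illustration:

```python
# Capability-negotiated bypass: the accessor path always works; aware peers
# that recognize an advertised capability "wink to each other" and take the
# zero-copy path instead. All names are hypothetical.
class Producer:
    def __init__(self, data):
        self._data = list(data)
    # generic, always-available accessor path
    def get_element(self, i):
        return self._data[i]
    def size(self):
        return len(self._data)
    # optional bypass: expose the raw buffer to aware consumers
    def capability(self, name):
        return self._data if name == "raw-list" else None

class OpaqueProducer(Producer):
    """A producer that declines the bypass; consumers fall back cleanly."""
    def capability(self, name):
        return None

def consume(producer):
    raw = producer.capability("raw-list")
    if raw is not None:
        return sum(raw), "bypass"        # zero-copy fast path
    # portable fallback through the generic accessors
    return sum(producer.get_element(i) for i in range(producer.size())), "generic"
```

Because unrecognized capabilities simply return nothing, unanticipated data types degrade to the generic path rather than breaking, which addresses the note's question about bypassing for data the framework did not anticipate.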
Pat: I think if we could reasonably cover the (preliminary) requirements above,
that would be a good first step. I agree with Randy that whatever we
come up with will have to be able to "adapt" over time as our understanding
matures.
* Is there information or characteristics of particular file format standards that must percolate up into the specific implementation of the in-memory data structures?
Randy: Not really, but metadata handling and referencing will be key and need
to be general.
Pat: In FM we tried hard to keep file-format-specific stuff out of the core model.
Instead, there are additional modules built on top of FM that handle
the file-format-specific stuff, like I/O and derived fields specific to
a particular format. Currently we have PLOT3D, FITS, and HDFEOS4
modules that are pretty well filled out, and other modules that are
mostly skeletons at this point.
We should also be careful not to assume that analyzing the data starts
with "read the data from a file into memory, ...". Don't forget out-of-core,
analysis concurrent with simulation, among others.
One area where the file-format-specific issues creep in is with metadata.
Most file formats have some sort of metadata storage support, some much
more elaborate than others. Applications need to get at this metadata,
possibly through the data model, possibly some other way. I don't have
the answer here, but it's something to keep in mind.
Jim: I dunno, but what does HDF5 or NetCDF include? We should definitely be
able to handle various meta-data...
Otherwise, our viz framework should be able to read in all sorts of
file-based data as input, converting it seamlessly into our "Holy Data
Grail" format for all the components to use and pass around. But the
data shouldn't be identifiable as having once been HDF or NetCDF, etc...
(i.e. it's important to read the data format, but not to use it internally)
JohnS: I hope not.
Wes: One observation is what seems to be a successful design pattern from the
DMF effort: let the HDF guys build the heavy lifting machinery, and focus
upon an abstraction layer that uses the machinery to move bytes.
Note: Metadata: Must also be propagated down the pipeline.
Ignored by items that don’t care, but recognized by pipeline components that do.
-alternative is database at the reader, but seems to create painful connection mechanics.
-and still have to figure out how to reference the proper component, even after going through data-structure transformations.
One powerful feature of both HDF and XML is the ability to ignore and pass-through unrecognized constructs/metadata.
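That ignore-and-pass-through property, applied to a pipeline, means each stage acts on the metadata keys it recognizes and forwards everything else verbatim so downstream components still see it. A minimal sketch, with datasets modeled as simple (data, metadata) pairs for illustration:

```python
# Metadata pass-through in a pipeline stage: the stage transforms the data
# and touches only the metadata keys it owns; all unrecognized keys survive
# unchanged for components further down the pipeline. Illustrative only.
def scale_stage(dataset, factor):
    data, meta = dataset
    out_meta = dict(meta)                 # unrecognized keys pass through verbatim
    out_meta["scale-applied"] = factor    # the one key this stage owns
    return ([v * factor for v in data], out_meta)
```

Because the stage copies rather than mutates, the reader's original metadata also survives intact, avoiding the painful back-references to a database at the reader that the note mentions.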
For the purpose of this survey, "data analysis" is defined broadly as all non-visual data processing done *after* the simulation code has finished and *before* "visual analysis".
* Is there a clear dividing line between "data analysis" and "visual analysis" requirements?
Randy: Not in my opinion.
Pat: Your definition excludes concurrent analysis and steering from
"visualization". Is this intentional? I don't think there's a clear dividing
line.
JohnC: I take issue with your definition of data analysis. Yes, it is performed
after the simulation, but it is performed (or would be performed if viz
tools didn't suck) in *parallel* with visual analysis. The two, when
well integrated (which is rarely the case), can complement each other
tremendously. So-called "visual analysis" by itself, without good
quantitative capability, is pretty useless.
Well, text-based, programmable user interfaces are a must for "data
analysis", whereas a GUI is essential for visual analysis.
Jim: NO. There shouldn't be - these operations are tightly coupled, or even
symbiotic, and *should* all be incorporated into the same framework,
indistinguishable from each other.
Ilmi: Some components do purely data analysis, some do only visual, but there
will be calls to the data analysis component from the visual component during
visual analysis.
JohnS: There shouldn't be. However, people at the SRM community left me with the impression that they felt data analysis had been essentially abandoned by the vis community in favor of "visual analysis" methods. We need to undo this.
Wes: Generally speaking, there's not much difference.
That said, some differences seem obvious to me:
1. Performance - visualization is most often an interactive process, but
has offline implementations. "Plain old" data analysis seems to be mostly
an offline activity with a few interactive implementations.
2. Scope - data analysis seems to be a subset of vis. Data analysis doesn't
have need for as rich a DS/DM infrastructure as vis.
Note: Righteous indignation. Well that’s good. Why then do SDM people and domain scientists think we don’t care? They aren’t smoking crack. They have legitimate reasons to believe that we aren’t being genuine when we say that we care about data analysis functionality. Can I do data analysis with Vis5D? Does VTK offer me a wide array of statistical methods? We must keep this central as we design this system. Do you agree with John Clyne’s assertion that data analysis == text interface and visualization == GUI?
* Can we (should we) incorporate data analysis functionality into this framework, or is it just focused on visual analysis?
Randy: Yes, and we should; particularly as the complexity and size of
data grow, we begin to rely more heavily on "data analysis" based methods.
Pat: I think you would also want to include feature detection techniques. For
large data analysis in particular, we don't want to assume that the scientist
will want to do the analysis by visually scanning through all the data.
JohnC: If visualization is ever going to live up to the claim made by so many
in the viz community of
it being an indispensable tool for analysis, tight integration with
statistical tools and data processing capabilities are a must. Otherwise
we'll just continue to make pretty pictures, put on dog and pony shows,
and wonder where the users are.
Ilmi: Not all data analysis, but a lot of data analysis is used for
visual analysis, and the more tools we provide initially, the easier it
becomes to grow the user base. So, we can list candidates.
JohnS: Vis is bullshit without seamless integration with flexible data analysis methods. The most flexible methods available are text-based. The failure to integrate more powerful data analysis features into contemporary 3D vis tools has been a serious problem.
Wes: Ideally, the same machinery could be used in both domains.
Notes: Data Analysis and Feature detection support constitute more unique features for this framework.
How do you then index your bag of features or have them properly refer back to the
data that led to their generation? Sometimes a detected feature is a discrete marker;
other times, it is treated as a derived field. The former case seems to again point to the
need for a robust indexing method.
JohnC's observation about data-analysis==text-based is very interesting!!! Does everyone agree?
ASPECT is one of the few examples of providing a graphical workflow interface for traditionally procedural/text-based data analysis tools.
* What kinds of data analysis typically need to be done in your field? Please give examples and how these functions are currently implemented.
Randy: Obviously basic statistics (e.g. moments, limits, etc). Regression
and model driven analysis are common. For example, comparison of
data/fields via comparison vs common distance maps. Prediction of
activation "outliers" via general linear models applied on an
element by element basis, streaming through temporal data windows.
Pat: Around here there is interest in vector-field topology feature detection
techniques, for instance, vortex-core detection.
JohnC: Pretty much everything you can do with IDL or matlab.
Jim: Simple sampling, basic statistical averages/deviations, principal component
analysis (PCA, or EOF for climate folks), other dimension reduction.
Typically implemented as C/C++ code... mostly slow serial... :-Q
JohnS: This question is targeted at vis folks that have been focused on a particular scientific domain. For general use, I think of IDL as being one of the most popular/powerful data analysis languages. Python has become increasingly important -- especially with the Livermore numerical extensions and the PyGlobus software. However, use of these scripting/data analysis languages have not made the transition to parallel/distributed-memory environments (except in a sort of data-parallel batch mode).
* How do we incorporate powerful data analysis functionality into the framework?
Randy: Hard work :), include support for meta-data, consider support for
sparse data representations and include the necessary support for
Pat: Carefully :-)? By striving not to make a closed system.
JohnC: I'd suggest exploring and leveraging existing tools, numerical Python for example.
Jim: As components (duh)... :-)
We should define some "standard" APIs for the desired analysis functions,
and then either wrap existing codes as components or shoehorn in existing
component implementations from systems like ASPECT.
JohnS: I'm very interested in work that Nagiza has proposed for a parallel implementation of the R statistics language. The traditional approach for parallelizing scripting languages is to run them in a sort of MIMD mode of Nprocs identical scripts operating on different chunks of the same dataset. This makes it difficult to have a commandline/interactive scripting environment. I think Nagiza is proposing to have an interactive commandline environment that transparently manipulates distributed actions on the back-end.
There is similar work in progress on parallel matlab at UC Berkeley. Does anyone know of such an effort for Python? (most of the parallel python hacks I know of are essentially MIMD, which is not very useful).
2) Execution Model=======================
It will be necessary for us to agree on common execution semantics for our components. Otherwise, we might have compatible data structures but incompatible execution requirements. Execution semantics is akin to the function of a protocol in the context of network serialization of data structures. The motivating questions are as follows:
* How is the execution model affected by the kinds of algorithms/system-behaviors we want to implement?
Pat: In general I see choices where at one end of the spectrum we have
simple analysis techniques where most of the control responsibilities
are handled from the outside. At the other end we could have more
elaborate techniques that may handle load balancing, memory
management, thread management, and so on. Techniques towards
the latter end of the spectrum will inevitably be intertwined more
with the execution model.
Jim: Directly. There are probably a few main exec models we want to cover.
I don't think the list is *that* long...
As such, we should anticipate building several distinct framework
environments that each exclusively support a given exec model. Then
the trick is to "glue" these individual frameworks together so they can
interoperate (exchange data and invoke each others' component methods)
and be arbitrarily "bridged" together to form complex higher-level
pipelines or other local/remote topologies.
Ilmi: I guess we can make each component propagate/fire the execution of the next
component/components in the network/pipeline. Each component can use its
own memory or shared memory to access the data in process. In such a case, the
algorithm of each component is not much affected by other components.
Wes: The "simple" execution model is for the framework to invoke a component, be
notified of its completion, then invoke the next component in the chain, etc.
Things get more interesting if you want to have a streaming processing model.
Related, progressive processing is somewhat akin to streaming, but more
Note: Wes’ model sounds like a good “baseline” model. It does not allow for chains-of-invocation and therefore prevents us from getting locked into complex issues of component-local execution semantics and deadlock prevention. Can we make this a baseline component requirement?
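Note: a minimal Python sketch of this "baseline" model, in which a central executive invokes each component, is notified of its completion, and then invokes the next one; components never invoke each other directly. The class names and callback convention are illustrative assumptions, not a proposed API.

```python
# Baseline execution: strictly sequential, executive-driven firing.

class Component:
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def execute(self, data, on_done):
        result = self.fn(data)       # do the work...
        on_done(self.name, result)   # ...then notify the executive

class Executive:
    def __init__(self, chain):
        self.chain = chain
        self.log = []                # record of completion notifications

    def _done(self, name, result):
        self.log.append(name)
        self._result = result

    def run(self, data):
        for comp in self.chain:      # invoke, await completion, invoke next
            comp.execute(data, self._done)
            data = self._result
        return data

pipeline = Executive([Component("read", lambda d: d + [0]),
                      Component("filter", lambda d: [x * 2 for x in d])])
out = pipeline.run([1, 2])
```

Because no component ever fires another, the executive retains full control of scheduling, which is what keeps component-local execution semantics (and deadlock concerns) out of the baseline.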
* How then will a given execution model affect data structure implementations?
Pat: Well, there's always thread-safety issues.
Jim: I don't think it should affect the data structure impls at all, per se.
Clearly, the access patterns will be different for various execution models,
but this shouldn't change the data impl. Perhaps a better question is
how to indicate the expected access pattern to allow a given data impl
to optimize or properly prefetch/cache the accesses...
Note: Actually, that is the question. How do we pass information about access patterns so that you can do the kind of temporal caching that John Clyne wants? Its important to not do what VTK does, so that we don’t have to un-do it again (as was the case for VisIt).
JohnS: There will need to be some way to support declarative execution semantics as well as data-driven and demand-driven semantics. By declarative semantics, I mean support for environments that want to be in control of when the component "executes", or interactive scripting environments that wish to use the components much like subroutines. This is separate from the demands of very interactive use-cases like view-dependent algorithms, where the execution semantics must be more automatic (or at least hidden from the developer who is composing the components into an application). I think this is potentially relevant to data model discussions because automatic execution semantics often impose some additional requirements on the data structures to hand off tokens to one another. There are also issues involved with managing concurrent access to data. For instance, a demand-driven system, as demanded by progressive-update or view-dependent algorithms, will need to manage the interaction between the arrival of new data and asynchronous requests from the viewer to recompute existing data as the geometry is rotated.
(note: Wes provides a more succinct description of this execution semantics.)
Wes: We're back to the issue of needing a DS/DM that supports multiresolution
models from the get-go. The relationship between data analysis and vis data
models becomes more apparent here when we start thinking about multires
representations of unstructured data, like particle fields or point clouds.
Note: The fly in the ointment here is that highly interactive methods like multires models and view-dep algorithms are not well supported by completely simple/declarative semantics (unless you have an incredibly complex framework, but then the framework would require component-specific knowledge to schedule things properly).
* How will the execution model be translated into execution semantics on the component level? For example, will we need to implement special control-ports on our components to implement particular execution models, or will the semantics be implicit in the way we structure the method calls between components?
Pat: Not sure.
Jim: Components should be "dumb" and let other components or the framework invoke
them as needed for a given execution model. The framework dictates the
control flow, not the component. The API shouldn't change.
If you want multi-threaded components, then the framework better support
that, and the API for the component should take the possibility into account.
JohnS: I'm going to propose that we go after the declarative semantics first (no automatic execution of components) with hopes that you can wrap components that declare such an execution model with your own automatic execution semantics (whether it be a central executive or a distributed one). This follows the paradigm that was employed for tools such as VisIt that wrapped each of the pieces of the VTK execution pipeline so that it could impose its own execution semantics on the pipeline rather than depending on the exec semantics that were predefined by VTK. DiVA should follow this model, but start with the simplest possible execution model so that it doesn't need to be deconstructed if it fails to meet the application developer's needs (as was the case with VisIt).
We should have at least some discussion to ensure that the *baseline* declarative execution semantics imposes the fewest requirements for component development but can be wrapped in a very consistent/uniform/simple manner to support any of our planned pipeline execution scenarios. This is an exercise in making things as simple as possible, but thinking ahead far enough about long-term goals to ensure that the baseline is "future proof" to some degree.
Wes: One thing that always SUPREMELY annoyed me about AVS was the absence of a
"stop" button on the modules. This issue concerns being able to interrupt
a module's processing when it was taking too long. Related, it might be nice
to have an execution model that uses the following paradigm: "OK, time's up,
give me what you have now."
Note: That is another way that the exec model is going to affect structures (or at least the API for accessing those structures). We can refer to this as “concurrent access to data”. Do we have to incorporate locking semantics in the data accessors? Do we have to incorporate firing protocol into the accessors (or at least hints as to firing constraints)? Again, we don’t want to get into a VisIt situation.
Automatic exec semantics do not give the framework enough control to address Wes’ issue. Certainly this is an issue with VTK as well. How do we formulate this as a requirement?
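Note: Wes' "OK, time's up, give me what you have now" paradigm can be sketched as a component that polls a deadline between work units and returns whatever partial result it has accumulated. The function name and the square-the-blocks "work" are stand-in assumptions; the point is only the interruption pattern.

```python
import time

def progressive_isosurface(blocks, budget_seconds):
    """Process blocks until a time budget expires, then return partials."""
    deadline = time.monotonic() + budget_seconds
    partial = []
    for block in blocks:
        partial.append(block * block)     # stand-in for real work
        if time.monotonic() >= deadline:
            break                         # time's up: hand back what we have
    return partial

# With a generous budget the full result comes back; with a zero budget
# the component still returns at least the first completed work unit.
full = progressive_isosurface([1, 2, 3], budget_seconds=10.0)
partial = progressive_isosurface([1, 2, 3], budget_seconds=0.0)
```

Note that this only works because the deadline check sits inside the component's own loop; a framework-level "stop button" with no component cooperation would need preemption instead, which is much harder to do safely.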
What kinds of execution models should be supported by the distributed visualization architecture?
* View dependent algorithms? (These were typically quite difficult to implement for dataflow visualization environments like AVS5).
Randy: I propose limited enforcement of fixed execution semantics. View/data/focus
dependent environments are common and need to be supported, however, they
are still tied very closely with data representations, hence will likely
need to be customized to application domains/functions.
Pat: Not used heavily here, but would be interesting. A "want".
JohnC: These are neat research topics, but i've never been convinced that they
have much application beyond IEEEViz publications. Mostly I believe
this because of the complexity they impose on the data model. Better to
simply offer progressive/multiresolution data access.
Ilmi: I'd like to say "must", but it is for improving usability and efficiency,
so people may be able to live without it.
It will definitely improve efficiency. If we want to support view-dependent
algorithms, then we should consider them from the beginning of the
dataflow design, so they can be easily integrated. View-dependent or
image-based algorithms don't necessarily require many changes to an existing
dataflow design, and they are useful for eliminating the majority of data
blocks from the rendering pipeline. Therefore, it is good to provide the
capability to choose the subset of data to be rendered.
JohnS: Must be supported, but not as a baseline exec model.
* Out-of-core algorithms
Randy: This has to be a feature, given the focus on large data.
Pat: A "must" for us.
JohnC: Seems like a must for large data. But is this a requirement or a design issue?
Jim: Must. This is a necessary evil of "big data". You need some killer
caching infrastructure throughout the pipeline (e.g. like VizCache).
JohnS: Same deal. We must work out what kinds of attributes are required of the data structures/data model to represent temporal decomposition of a dataset. We should not encode the execution semantics as part of this (it should be outside of the component), but we must ensure that the data interfaces between components are capable of representing this kind of data decomposition/use-case.
* Progressive update and hierarchical/multiresolution algorithms?
Randy: Obviously, I have a bias here, particularly in the remote visualization
cases. Remote implies fluctuations in effective data latency that make
progressive systems key.
Pat: A "want".
JohnC: This is the way to go (IMHO); the question is at what level to support it.
Ilmi: MUST! for improving usability and efficiency. And can be used to support
JohnS: Likewise, we should separate the execution semantics necessary to implement this from the requirements imposed on the data representation. Data models in existing production data analysis/visualization systems often do not provide an explicit representation for such things as multiresolution hierarchies. We have LevelOfDetail switches, but that seems to be only a weak form of representation for these hierarchical relationships and limits the effectiveness of algorithms that depend on this method of data representation. Those requirements should not be co-mingled with the actual execution semantics for such components (it's just the execution interface).
Wes: Yes. All of the above. Go team!
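Note: to illustrate JohnS' point about making multiresolution relationships explicit in the data model (rather than hiding them behind a LevelOfDetail switch), here is a Python sketch in which each node carries its resolution level and its refinements, so an algorithm can walk the hierarchy directly. All names are illustrative assumptions.

```python
class MultiresNode:
    """Explicit multiresolution hierarchy: level 0 is the coarsest."""
    def __init__(self, level, data, children=None):
        self.level = level
        self.data = data
        self.children = children or []   # finer-resolution refinements

    def finest_at_most(self, max_level):
        """Collect the finest available data not exceeding max_level."""
        if not self.children or self.level == max_level:
            return [self.data]
        leaves = []
        for child in self.children:
            leaves.extend(child.finest_at_most(max_level))
        return leaves

root = MultiresNode(0, "coarse",
                    [MultiresNode(1, "fine-A"), MultiresNode(1, "fine-B")])
```

A progressive or view-dependent algorithm can then ask for `finest_at_most(k)` and refine incrementally as budget allows, which is awkward to express when the hierarchy is only an opaque switch.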
* Procedural execution from a single thread of control (i.e. using a commandline language like IDL to interactively control a dynamic or large parallel back-end)
Randy: Yep, I think this kind of control is key.
Pat: A "want".
JohnC: A must for data analysis and data manipulation (deriving new fields, etc.)
Jim: This is not an execution model, it is a command/control interface issue.
You should be able to have a GUI, programmatic control, or scripting to
dictate interactive control (or "steering" as they call it... :-). The
internal software organization shouldn't change, just the interface to
the outside (or inside) world...
Ilmi: Good to have
JohnS: This should be our primary initial target. I do not have a good understanding of how best to support this, but it's clear that we must ensure that a commandline/interactive scripting language is supported. Current data parallel scripting interfaces assume data-parallel, batch-mode execution of the scripting interpreters (this is a bad thing).
Wes: Historically, this approach has proven to be very useful.
* Dataflow execution models? What is the firing method that should be employed for a dataflow pipeline? Do you need a central executive like AVS/OpenDX or, completely distributed firing mechanism like that of VTK, or some sort of abstraction that allows the modules to be used with either executive paradigm?
Randy: I think this should be an option as it can ease some connection
mechanisms, but it should not be the sole mechanism. Personally,
I find a properly designed central executive making "global"
decisions coupled with demand/pull driven local "pipelets" that
allow high levels of abstraction more useful (see the VisIt model).
Pat: Preferably a design that does not lock us in to one execution model.
JohnC: We use a wavelet based approach similar to space filling curves. Both
approaches have merit and both should be supportable by the framework.
Jim: Must. This should be an implementation issue in the "dataflow framework", and
should not affect the component-level APIs.
JohnS: This can probably be achieved by wrapping components that have explicit/declarative execution semantics in a “component-within-a-component” hierarchical manner. It's an open question as to whether these execution models are a function of the component or the framework that is used to compose the components into an application, though.
Wes: I get stuck thinking about the UI for this kind of thing rather than
the actual implementation. I'll defer to others for opinions.
Note: Strange… I would have tagged SFC’s and Wavelets as “researchy” things.
* Support for novel data layouts like space-filling curves?
Randy: With the right accessors nothing special needs to be added for these.
Pat: Not a pressing need here, as of yet.
Jim: Must. But this isn't an execution model either. It's a data structure
or algorithmic detail...
JohnS: I don't understand enough about such techniques to know how to approach this. However, it does point out that it is essential that we hand off data via accessors that keep the internal data structures opaque, rather than exposing complex data structures.
* Are there special considerations for collaborative applications?
Jim: Surely. The interoperability of distinct framework implementations
ties in with this... but the components shouldn't be aware that they
are being run collaboratively/remotely... definitely a framework issue.
Ilmi: Some locking mechanism for a subset of data, or dispatching of changes from
one client to multiple clients.
JohnS: Ugh. I'm also hoping that collaborative applications only impose requirements for wrapping baseline components rather than imposing internal requirements on the interfaces that exchange data between the components. So I hope we can have "accessors" or "multiplexor/demultiplexor" objects that connect to essentially non-collaboration-aware components in order to support such things. Otherwise, I'm a bit daunted by the requirements imposed.
Note: The danger of pushing the collaborative functionality out to a “framework issue” is that we increasingly make the “framework” a heavyweight object. It creates a high cost of entry for any such feature, or even minor modifications to such features. We learned from “Cactus” the importance of making the framework as slender as possible and moving as much functionality as possible into “optional components” to support feature X. So it is important to push off these issues as much as possible.
* What else?
Randy: The kitchen sink? :)
Pat: Distributed control? Fault tolerance?
Jim: Yeah Right.
Wes: Control data, performance data and framework response to and manipulation
of such data.
How will the execution model affect our implementation of data structures?
Jim: It shouldn't. The execution model should be kept independent of the
data structures as much as possible.
If you want to build higher-level APIs for specific data access patterns
that's fine, but keep the underlying data consistent where possible.
Note: The description of this as affecting our “data structures” is an artifact of attempting to straddle the dual-goals of addressing internal data structures and external accessors. So perhaps this should be “how will it affect our accessors.”
* how do you decompose a data structure such that it is amenable to streaming in small chunks?
Randy: This is a major issue and relates to things like out-of-core/etc.
I definitely feel that "chunking" like mechanisms need to be in
the core interfaces.
Pat: Are we assuming streaming is a requirement?
How do you handle visualization algorithms where the access patterns
are not known a priori? The predominant example: streamlines and streaklines.
Note the access patterns can be in both space and time. How do you avoid
having each analysis technique need to know about each possible data
structure in the future without having to go through all the analysis
data structure in the future without having to go through all the analysis
techniques and put another case in their streaming negotiation code?
In FM the fine-grained data access ("accessors") is via a standard
interface. The evaluation is all lazy. This design means more
function calls, but it frees the analysis techniques from having to know
access patterns a priori and negotiate with the data objects. In FM
the data access methods are virtual functions. We find the overhead
not to be a problem, even with relatively large data. In fact, the overhead
is less an issue with large data because the data are less likely to be
served up from a big array buffer in memory (think out-of-core, remote
out-of-core, time series, analytic meshes, derived fields, differential-
operator fields, transformed objects, etc., etc.).
The same access-through-an-interface approach could be done without
virtual functions, in order to squeeze out a little more performance, though
I'm not convinced it would be worth it. To start with you'd probably end up
doing a lot more C++ templating. Eliminating the virtual functions would
make it harder to compose things at run-time, though you might be able
to employ run-time compilation techniques a la SCIRun 2.
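Note: a Python sketch of the accessor pattern Pat describes for FM: analysis code calls a standard field interface (with a time argument) and never learns whether values come from an in-memory array, a lazily evaluated derived field, or, in a real system, an out-of-core or remote source. Class and method names here are illustrative assumptions, not FM's actual API.

```python
from abc import ABC, abstractmethod

class Field(ABC):
    """Standard fine-grained access interface; evaluation is lazy."""
    @abstractmethod
    def value(self, point, t):
        """Evaluate the field at a point and time."""

class ArrayField(Field):
    """Concrete source: values served from an in-memory table."""
    def __init__(self, values):
        self.values = values            # {(point, t): value}

    def value(self, point, t):
        return self.values[(point, t)]

class ScaledField(Field):
    """A derived field: nothing is computed until value() is called."""
    def __init__(self, base, factor):
        self.base, self.factor = base, factor

    def value(self, point, t):
        return self.factor * self.base.value(point, t)

temperature = ArrayField({(0, 0.0): 273.0, (1, 0.0): 280.0})
scaled = ScaledField(temperature, 2.0)
```

An analysis technique written against `Field` needs no streaming negotiation and no per-data-structure cases; adding a new source or derived field later means adding one class, not touching every technique, which is the composability Pat is arguing for.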
Jim: This sounds a lot like distributed data decompositions. I suspect that
given a desired block/cycle size, you can organize/decompose data in all
sorts of useful ways, depending on the expected access pattern.
In conjunction with this, you could also reorganize static datasets
into filesystem databases, with appropriate naming conventions or
perhaps a special protocol for lining up the data blob files in the
desired order for streaming (in either time or space along any axis).
Meta-data in the files might be handy here, too, if it's indexed
efficiently for fast lookup/searching/selection.
JohnS: The recent SDM workshop pointed out that chunking/streaming interfaces are going to be essential for any data analysis system that deals with large data, but there was very little agreement on how the chunking should be expressed. The chunking also potentially involves end-to-end requirements of the components that are assembled in a pipeline as you must somehow support uniformity in the passage of chunks through the system (ie. the decision you make about the size of one chunk will impose requirements for all other dependent streaming interfaces in the system). We will need to walk through at least one use-case for chunking/streaming to get an idea of what the constraints are here. It may be too tough an issue to tackle in this first meeting though.
Also, as Pat pointed out, when dealing with vis techniques like streamlines, you almost need to have a demand-based fetching of data. This implies some automatic propagation of requests through the pipeline. This will be hard, and perhaps not supported by a baseline procedural model for execution.
Note: Again, it appears we need to have clear delineation between temporal and spatial dependencies. To support streaming, one must also have dependent components be able to report back their constraints.
Jim, how can we formulate a requirement that the execution model is independent of the data structures when we really don’t have data structures per se? Because we are using accessors, calling them will in turn cause a component to call other accessors. If we do not have common execution semantics, then this will be a complete muddle even if we do agree on our port standards. So can we really keep these things independent?
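Note: a minimal Python sketch of the chunking/streaming idea under discussion, using generators so that each stage pulls one chunk at a time. It also shows JohnS' end-to-end point: the chunk size chosen at the source constrains every downstream stage, since they all see the same decomposition. All function names and the toy data are illustrative assumptions.

```python
def read_chunks(dataset, chunk_size):
    """Source stage: decompose a dataset into fixed-size chunks."""
    for i in range(0, len(dataset), chunk_size):
        yield dataset[i:i + chunk_size]

def threshold(chunks, cutoff):
    """Intermediate stage: operates chunk-by-chunk, never sees the whole."""
    for chunk in chunks:
        yield [x for x in chunk if x >= cutoff]

def collect(chunks):
    """Sink stage: reassemble the streamed chunks."""
    out = []
    for chunk in chunks:
        out.extend(chunk)
    return out

dataset = [3, 7, 1, 9, 4, 8]
result = collect(threshold(read_chunks(dataset, chunk_size=2), cutoff=4))
```

This pull-based style works for techniques with predictable access patterns; as Pat notes, streamlines break it because the next chunk needed is only known after computing the current step, which pushes you toward demand-driven fetching instead.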
* how do you represent temporal dependencies in that model?
Randy: I need to give this more thought, there are a lot of options.
Pat: In FM, data access arguments have a time value, the field interface is
the same for both static and time-varying data.
Jim: Meta-data, or file naming conventions...
JohnS: Each item in a data structure, or as passed through via an accessor, needs to have some method of referring to its dependencies, both spatial (i.e. interior boundaries caused by domain decomposition) and temporal. It's important to make these dependencies explicit in the data structures to provide a framework with the necessary information to organize parallelism in both the pipeline and data-parallel directions. The implementation details of how to do so are not well formulated and perhaps out of scope for our discussions. So this is a desired *requirement* that doesn't have a concrete implementation or design pattern involved.
Note: Given the importance of time-varying data to JohnC and Pat, it seems important to come up with a formal way to represent these things.
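Note: one way to picture JohnS' *requirement* (no concrete design is implied): each chunk carries explicit references to the chunks it depends on, spatially (domain-decomposition neighbors) and temporally (prior timesteps), so a scheduler can decide what is ready to execute. All names in this Python sketch are illustrative assumptions.

```python
class Chunk:
    """A piece of a decomposed dataset with explicit dependencies."""
    def __init__(self, chunk_id, timestep, spatial_deps=(), temporal_deps=()):
        self.chunk_id = chunk_id
        self.timestep = timestep
        self.spatial_deps = list(spatial_deps)    # neighbor chunk ids, same timestep
        self.temporal_deps = list(temporal_deps)  # (chunk_id, timestep) pairs

def ready(chunk, completed):
    """A chunk may be processed once all its dependencies are complete."""
    spatial_ok = all((d, chunk.timestep) in completed
                     for d in chunk.spatial_deps)
    temporal_ok = all(d in completed for d in chunk.temporal_deps)
    return spatial_ok and temporal_ok

# Chunk "B" at timestep 1 needs its neighbor "A" at the same timestep
# (interior boundary) and its own prior timestep (temporal dependency).
c = Chunk("B", timestep=1, spatial_deps=["A"], temporal_deps=[("B", 0)])
```

Making the dependencies data, rather than burying them in component code, is what lets a framework exploit both data-parallel (spatial) and pipeline (temporal) parallelism from the same information.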
* how do you minimize recomputation in order to regenerate data for view-dependent algorithms.
Randy: Framework invisible caching. Not a major Framework issue.
Pat: Caching? I don't have a lot of experience with view-dependent algorithms.
Jim: No clue.
JohnS: I don't know. I'm hoping someone else responding to this survey has some ideas on this. I'm uncertain how it will affect our data model requirements.
Note: Is caching a framework issue? Or is it a component issue?
What are the execution semantics necessary to implement these execution models?
* how does a component know when to compute new data? (what is the firing rule)
Randy: Explicit function calls with potential async operation. A higher-level
wrapper can make this look like "dataflow".
Jim: There are really only 2 possibilities I can see - either a component is
directly invoked by another component or the framework, or else a method
must be triggered by some sort of dataflow dependency or stream-based mechanism.
JohnS: For declarative semantics, the firing rule is an explicit method call that is invoked externally. Hopefully such objects can be *wrapped* to encode semantics that are more automatic (ie. the module itself decides when to fire depending on input conditions), but initially it should be explicit.
Wes: To review, the old AVS model said that a module would be executed if any
of its parameters changed, or if its input data changed. One thing that
was annoying was that you had to explicitly disable the flow executive if
you wanted to make changes to multiple parameters on a single module before
allowing it to execute. This type of thing came up when using a module with
a long execution time.
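Note: a Python sketch combining JohnS' proposal with Wes' AVS complaint: a baseline component exposes only an explicit `execute()` call, and a dataflow-style wrapper imposes the "fire on any parameter change" rule, with suspend/resume so several parameter changes can be batched into a single firing. All class and method names are hypothetical.

```python
class BaselineComponent:
    """Declarative baseline: knows nothing about firing rules."""
    def __init__(self):
        self.run_count = 0

    def execute(self, params):
        self.run_count += 1
        return sum(params.values())     # stand-in for real work

class DataflowWrapper:
    """Wraps a baseline component with AVS-style automatic firing."""
    def __init__(self, component):
        self.component = component
        self.params = {}
        self.suspended = False
        self.result = None

    def set_param(self, key, value):
        self.params[key] = value
        if not self.suspended:          # AVS rule: any change fires the module
            self.result = self.component.execute(self.params)

    def suspend(self):
        self.suspended = True           # batch several parameter changes

    def resume(self):
        self.suspended = False
        self.result = self.component.execute(self.params)

wrapped = DataflowWrapper(BaselineComponent())
wrapped.suspend()
wrapped.set_param("a", 1)
wrapped.set_param("b", 2)
wrapped.resume()                        # fires exactly once for both changes
```

Because the firing policy lives entirely in the wrapper, a different wrapper could impose demand-driven or centrally scheduled semantics on the very same component, which is the deconstruction-free layering JohnS argues for above.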
* does coordination of the component execution require a central executive or can it be implemented using only rules that are local to a particular component.
Randy: I think the central executive can be an optional component (again, see the VisIt model).
Jim: This is a framework implementation detail. No. No. Bad Dog.
The component doesn't know what's outside of it (in the rest of the
framework, or the outside world). It only gets invoked, one way or another.
JohnS: It can eventually be implemented using local semantics, but initially, we should design for explicit external control.
Wes: Not sure what this means.
Note: And it potentially invokes other components. If a component invokes other components and thereby creates a chain of execution, then we have an execution semantics that is outside of the framework’s control. So, do we want to prevent this in our baseline requirements for component invocation? The central executive approach says that our “baseline” components may not invoke another component in response to their invocation. This seems to be a component invocation semantics issue.
* how elegantly can execution models be supported by the proposed execution semantics? Are there some things, like loops or back-propagation of information that are difficult to implement using a particular execution semantics?
Randy: There will always be warts...
Pat: The execution models we have used have kept the control model in
each analysis technique pretty simple, relying on an external executive.
The one big exception is with multi-threading. We've experimented with
more elaborate parallelism and load-balancing techniques, motivated in
part by latency hiding desires.
Jim: We need to keep the different execution models separate, as implementation
details of individual frameworks. This separates the concerns here.
JohnS: Its all futureware at this point. We want to first come up with clear rules for baseline component execution and then can come up with some higher level / automatic execution semantics that can be implemented by *wrapping* such components. The "wrapper" would then take responsibility for imposing higher-level automatic semantics.
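JohnS's "wrapper" idea (and the AVS annoyance Wes describes above, where you had to disable the executive to change several parameters at once) can be sketched as a wrapper that stages parameter changes and fires execution only on an explicit commit. This is purely an illustrative sketch; the class names (`Component`, `ExecutiveWrapper`) are invented and not part of any existing framework:

```python
class Component:
    """A hypothetical baseline component: it executes only when told to."""
    def __init__(self, name):
        self.name = name
        self.params = {}
        self.run_count = 0

    def execute(self):
        self.run_count += 1
        return f"{self.name} ran with {self.params}"


class ExecutiveWrapper:
    """Imposes higher-level semantics *around* a baseline component:
    parameter changes are staged, and the component executes once per
    commit() rather than once per parameter change (the old AVS pain)."""
    def __init__(self, component):
        self.component = component
        self._staged = {}

    def set_param(self, key, value):
        self._staged[key] = value        # no execution triggered here

    def commit(self):
        self.component.params.update(self._staged)
        self._staged = {}
        return self.component.execute()  # one execution for the whole batch


iso = ExecutiveWrapper(Component("isosurface"))
iso.set_param("level", 0.5)
iso.set_param("smoothing", 3)
iso.commit()
# run_count is 1: one execution despite two parameter changes
```

The point is that the "automatic" execution semantics live entirely in the wrapper, so the baseline component stays dumb and externally controlled.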
Wes: The dataflow thing doesn't lend itself well to things like view dependent
processing where the module at the end of the chain (renderer) sends view
parameters back upstream, thereby causing the network to execute again, etc.
The whole upstream data thing is a "wart on the ass of" AVS. (sorry)
How will security considerations affect the execution model?
Randy: Security issues tend to impact two areas: 1) effective bandwidth/latency
and 2) dynamic connection problems. 1) can be unavoidable, but will not
show up in most environments if we design properly. 2) is a real problem
with few silver bullets.
Pat: More libraries to link to? More latency in network communication?
Jim: Ha ha ha ha...
They won't right away, except in collaboration scenarios.
Think "One MPI Per Framework" and do things the old fashioned way
locally, then do the "glue" for inter-framework connectivity with
proper authentication only as needed. (No worse than Globus... :-)
JohnS: I don't know. Please somebody tell me if this is going to be an issue. I don't have a handle on the *requirements* for security. But I do know that simply using a secure method to *launch* a component is considered insufficient by security people who would also require that connections between components be explicitly authenticated as well. Most vis systems assume secure launching (via SSH or GRAM) is sufficient. The question is perhaps whether security and authorization are a framework issue or a component issue. I am hoping that it is the former (the role of the framework that is used to compose the components).
Note: Current DOE security policy basically dictates that we cannot deploy current distributed vis tool implementations because the connections are not authenticated. Ensight is an exception because the server is always making an outgoing connection (basically makes it an issue for the destination site) and requires explicit “accept” of the connection.
3) Parallelism and load-balancing=================
Thus far, managing parallelism in visualization systems has been tedious and difficult at best. Part of this stems from a lack of powerful abstractions for managing data-parallelism, load-balancing and component control.
JohnS: If we are going to address inter-component data transfers to the exclusion of data structures/models internal to the component, then much of this section is moot. The only question is how to properly represent data-parallel-to-data-parallel transfers and also the semantics for expressing temporal/pipeline parallelism and streaming semantics. Load-balancing becomes an issue that is out-of-scope because it is effectively something that is inside of components (and we don't want to look inside of the components).
Please describe the kinds of parallel execution models that must be supported by a visualization component architecture.
* data-parallel/dataflow pipelines?
Wes: It would be nice if the whole scatter/gather thing could be marshaled
by the framework. That way, my SuperSlick[tm] renderer wouldn't contain
a bunch of icky network code that manages multiple socket connections
from an N-way parallel vis component. One interesting problem is how a
persistent tool, like a renderer, will be notified of changes in data
originating from external components. I want some infrastructure that
will spare me from writing custom code like this for each new tool.
Note: Seriously. Is it really a very useful paradigm to have the framework represent parallel components as one-component-per-processor? It seems very ‘icky’, as Wes says.
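A framework-side "gather" service of the sort Wes asks for might look like the following sketch: the framework, not the renderer, tracks fragments from an N-way parallel component and hands the downstream renderer one complete, rank-ordered set. The class and callback names are invented for illustration:

```python
class FragmentCollector:
    """Collects per-rank output fragments from an N-way parallel component
    and notifies the downstream (serial) renderer exactly once, when all
    fragments have arrived. Replaces hand-written socket bookkeeping in
    every renderer."""
    def __init__(self, nranks, on_complete):
        self.nranks = nranks
        self.on_complete = on_complete
        self.fragments = {}

    def deliver(self, rank, data):
        self.fragments[rank] = data
        if len(self.fragments) == self.nranks:
            # Hand the renderer the fragments ordered by rank.
            ordered = [self.fragments[r] for r in range(self.nranks)]
            self.on_complete(ordered)
            self.fragments = {}


received = []
collector = FragmentCollector(3, on_complete=received.extend)
collector.deliver(2, "triangles-from-rank-2")   # arrival order doesn't matter
collector.deliver(0, "triangles-from-rank-0")
collector.deliver(1, "triangles-from-rank-1")
# received now holds all three fragments in rank order
```

The renderer only registers a callback; it never sees N socket connections.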
* master/slave work-queues?
Randy: I tend to use small dataflow pipelines locally and higher-level
async streaming work-queue models globally.
JohnS: Maybe: If we want to support progressive update or heterogeneous execution environments. However, I usually don’t consider this methodology scalable.
* streaming update for management of pipeline parallelism?
Randy: Yes, we use this, but it often requires a global parallel filesystem to
be most effective.
* chunking mechanisms where the number of chunks may be different from the number of CPU's employed to process those chunks?
Randy: We use spacefilling curves to reduce the overall expense of this
(common) operation (consider the compute/viz impedance mismatch
problem as well). As a side effect, the codes gain cache coherency.
Pat: We're pretty open here. Mostly straight-forward work-queues.
Jim: This sounds the same as master/slave to me, as in "bag of tasks"...
JohnS: Absolutely. Of course, this would possibly be implemented as a master/slave work-queue, but there are other methods.
* how should one manage parallelism for interactive scripting languages that have a single thread of control? (eg. I'm using a commandline language like IDL that interactively drives an arbitrarily large set of parallel resources. How can I make the parallel back-end available to a single-threaded interactive thread of control?)
Randy: Consider them as "scripting languages", and have most operations
run through an executive (note the executive would not be aware
of all component operations/interactions, it is a higher-level
executive). Leave RPC style hooks for specific references.
Pat: I've used Python to control multiple execution threads. The (C++)
data objects are thread safe, so the minimal provisions for thread-safe
objects in Python haven't been too much of a problem.
Jim: Broadcast, Baby... Either you blast the commands out to everyone SIMD
style (unlikely) or else you talk to the Rank 0 task and the command
gets forwarded on a fast internal network.
JohnS: I think this is very important and a growing field of inquiry for data analysis environments. Whatever agreements we come up with, I want to make sure that things like parallel R are not left out of these considerations.
Note: But CCA doesn’t support broadcast. This leads to a quandary here because we want to be able to adjust parameters for a component via the GUI or via a command from another component interchangeably. So I agree with “broadcast baby”, but I don’t see that it is feasible to push this off as a “framework issue” as it may well need to be something the component interface description must support.
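Jim's "talk to the Rank 0 task and the command gets forwarded" pattern can be sketched as follows. This is a single-process simulation (plain Python objects stand in for MPI ranks and the internal network), just to show the control shape a single-threaded scripting front end like IDL or parallel R would see; all names are hypothetical:

```python
class Worker:
    """One task of a parallel back-end (stand-in for an MPI rank)."""
    def __init__(self, rank):
        self.rank = rank
        self.log = []

    def apply(self, command, args):
        self.log.append((command, args))


class Rank0Proxy:
    """What the single-threaded scripting front end talks to. The proxy
    stands in for the rank-0 task: it receives one command and fans it
    out to every worker over the (here simulated) internal network."""
    def __init__(self, workers):
        self.workers = workers

    def send(self, command, **args):
        for w in self.workers:      # in a real back-end: an MPI broadcast
            w.apply(command, args)


workers = [Worker(r) for r in range(4)]
proxy = Rank0Proxy(workers)
proxy.send("set_isolevel", level=0.5)   # one call from the script...
# ...and all four parallel tasks saw the identical command
```

The scripting language keeps its single thread of control; the fan-out is hidden behind one proxy endpoint.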
Please describe your vision of what kinds of software support / programming design patterns are needed to better support parallelism and load balancing.
* What programming model should be employed to express parallelism.
(UPC, MPI, SMP/OpenMP, custom sockets?)
Randy: The programming model must transcend specific parallel APIs.
Jim: All but UPC will be necessary for various functionality.
JohnS: If we are working just on the outside of components, this question should be moot. We must make sure the API is not affected by these choices though.
Wes: This discussion may follow the same path as the one about DS/DM for grids.
The answer seems to be "one size doesn't fit all, but there is no 'superset'
that makes everyone happy." That said, there is likely a set of common issues
wrt execution and DS/DM that underlie parallel components regardless of
implementation.
Note: Since we are talking about functionality outside of the component, this seems reasonable. So this really requires clarification of where the parallelism is expressed. Caffeine wants to express this as parallel sets of components. However, this seems unreasonable for some of the communication patterns we deal with. If we stated that such parallelism is inside of the component wrapper, then what? (at minimum, we don’t have to answer this question!)
* Can you give some examples of frameworks or design patterns that you consider very promising for support of parallelism and load balancing.
(ie. PNNL Global Arrays or Sandia's Zoltan)
Randy: no I cannot (am not up to speed).
Jim: Nope, that covers my list of hopefuls.
JohnS: Also out of scope. This would be something employed within a component, but if we are restricting discussions to what happens on the interface between components, then this is also a moot point. At minimum, it will be important to ensure that such options will not be precluded by our component interfaces.
Wes: Maybe we should include "remote resource management" in this thread. I'm
thinking of the remote AVS module libraries. So, not only is there the issue
of launching parallel components, and load balancing (not sure how this will
play out), but also one of allowing a user to select, at run time, from
among a set of resources.
This problem becomes even more interesting when the pipeline optimization
starts to happen, and components are migrated across resources.
Note: This is now somewhat out-of-scope for discussions of inter-component communication.
* Should we use novel software abstractions for expressing parallelism or should the implementation of parallelism simply be an opaque property of the component? (ie. should there be an abstract messaging layer or not)
Randy: I would vote no as it will allow known paradigms to work, but will
interfere with research and new direction integration. I think some
kind of basic message abstraction (outside of the parallel data system)
may be needed.
Jim: It's not our job to develop "novel" parallelism abstractions. We should
just use existing abstractions like what the CCA is developing.
JohnS: Implementation of parallelism should be an opaque property of the component. We want to have language independence. We should also strive to support independence in the implementation of parallelism. Creating a software abstraction layer for messaging and shmem is a horrible way to do it.
* How does the NxM work fit in to all of this? Is it sufficiently differentiated from Zoltan's capabilities?
Randy: Unable to comment...
Pat: I don't have a strong opinion here. I'm not familiar with Zoltan et al.
Our experience with parallelism tends to be more shared-memory than distributed-memory.
JohnC: Hmm. These all seem to be implementation issues. Too early to answer.
JohnS: I need a more concrete understanding of MxN. I understand what it is supposed to do, but I'm not entirely sure what requirements it would impose on any given component interface implementation. It seems like something our component data interfaces should support, but perhaps such redistribution could be hidden inside of an MxN component? So should this kind of redistribution be supported by the inter-component interface or should there be components that explicitly effect such data redistributions? Jim... Help!
Jim: I don't know what Zoltan can do specifically, but MxN is designed for
basic "parallel data redistribution". This means it is good for doing
big parallel-to-parallel data movement/transformations among two disparate
parallel frameworks, or between two parallel components in the same
framework with different data decompositions. MxN is also good for
"self-transpose" or other types of local data reorganization within a
given (parallel) component.
MxN doesn't do interpolation in space or time (yet, probably for a while),
and it won't wash your car (but it won't drink your beer either... :-).
If you need something fancier, or if you don't really need any data
reorganization between the source and destination of a transfer, then
MxN *isn't* for you...
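For readers unfamiliar with MxN, the heart of "parallel data redistribution" is computing which pieces of an M-way source decomposition each of N receivers needs. A minimal 1-D sketch, assuming contiguous block decompositions only (real MxN handles far more general decompositions, and the function names here are invented):

```python
def block_ranges(n_elems, n_parts):
    """Contiguous block decomposition: list of (start, end) per part."""
    base, rem = divmod(n_elems, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < rem else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def mxn_schedule(n_elems, m, n):
    """For an M-way source and N-way destination decomposition of the same
    1-D array, return the (src_rank, dst_rank, start, end) transfer list."""
    src, dst = block_ranges(n_elems, m), block_ranges(n_elems, n)
    moves = []
    for i, (s0, s1) in enumerate(src):
        for j, (d0, d1) in enumerate(dst):
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:                 # overlapping interval -> one message
                moves.append((i, j, lo, hi))
    return moves


# 10 elements, 2-way source -> 3-way destination:
print(mxn_schedule(10, 2, 3))
# → [(0, 0, 0, 4), (0, 1, 4, 5), (1, 1, 5, 7), (1, 2, 7, 10)]
```

Each tuple is one point-to-point transfer; an MxN layer schedules all of them without either side knowing the other's decomposition.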
===============End of Mandatory Section (the rest is voluntary)=============
4) Graphics and Rendering=================
What do you use for converting geometry and data into images (the rendering-engine). Please comment on any/all of the following.
* Should we build modules around declarative/streaming methods for rendering geometry like OpenGL, Chromium and DirectX or should we move to higher-level representations for graphics offered by scene graphs?
Randy: IMHO, the key is defining the boundary and interoperability constraints.
If these can be documented, then the question becomes moot, you can
use whatever works best for the job.
Ilmi: It is usually useful to have access to the frame buffer, so I prefer the OpenGL
style over the VRML style.
In addition, I don't know how useful scene graphs are for visualization. I
guess scene graphs for visualizations are relatively simple, so it is
possible to convert the scene graphs to the declarative style. So, mainly support
declarative methods, and then add support for scene graphs and
conversions to declarative methods.
JohnS: This all depends on the scope of the framework. A priori, you can consider the rendering method separable and render this question moot. However, this will make it quite difficult to provide very sophisticated support for progressive update, image-based methods, and view-dependent algorithms because the rendering engine becomes intimately involved in such methods. I'm concerned that this is where the component model might break down a bit. Certainly the rendering component of traditional component-like systems like AVS or NAG Explorer is among the most heavy-weight and complex components of the entire environment. Often, the implementation of the rendering component would impose certain requirements on components that had to interact with it closely (particularly in the case of NAG/Iris Explorer, where you were really directly exposed to the fact that the renderer was built atop OpenInventor).
So, we probably cannot take on the issue of renderers quite yet, but we are eventually going to need to define a big "component box" around OpenGL/Chromium/DirectX. That box is going to have to be carefully built so as to keep from precluding any important functionality that each of those rendering engines can offer. Again, I wonder if we would need to consider scene graphs if only to offer a persistent datastructure to hand-off to such an opaque rendering engine. This isn't necessarily a good thing.
Wes: As a scene graph proponent, I would say that you don't build component
architectures around scene graphs. That concept doesn't make any sense to me.
Instead, what you do is have DS/DM representations/encapsulations for the
results of visualization. These are things like buckets-o-triangles, perhaps
at multiple resolutions. You also provide the means to send renderer information
to vis components to do view-dependent processing, or some other form of
render-driven feedback.
Similarly, you don't make the output of visualization components in the form
of glBegin()/glEnd() pairs, either.
Note: It sounds like we need to look at Ilmi’s work and ensure that whatever method we select to get the “drawables” to the “renderer” that it not preclude her requirements.
What are the pitfalls of building our component architecture around scene graphs?
Randy: Data cloning, data locking, and good support for streaming and view-dependent methods...
JohnC: Not so good for time varying data last time I checked.
Ilmi: might lose access to frame buffer and pixel level manipulation --
extremely difficult for view dependent or image-based approach
JohnS: It will add greatly to the complexity of this system. It also may get in the way of novel rendering methods like Image-based methods.
Wes: Back to the scene graph issue - what you allow for is composition of streams
of data into a renderer. Since view position information is supported as a
first class DS/DM citizen (right?) it becomes possible to compose a
rendering session that is driven by an external source.
Nearly all renderers use scene graph concepts - resistance is futile! The
weak spot in this discussion concerns streaming. Since scene graphs systems
presume some notion of static data, the streaming notion poses some problems.
They can be surmounted by adding some smarts to the rendering and the
data streaming - send over some bounding box info to start with, then allow
the streaming to happen at will. The renderer could either then not render
that tree branch until transmission is complete, or it could go ahead and
render whatever is in there at the time. Middle ground could be achieved
with progressive transmission, so long as there are "markers" that signal
the completion of a finished chunk of data to be rendered.
Some people's "complaints" about scene graphs stem from bad designs
and bad implementations. A "scene graph system" is supposed to be
an infrastructure for storing scene data and rendering. That ought to
include support for image-based methods, even though at first blush
it seems nonsensical to talk about buckets-o-triangles in the same
breath as normal maps. All interactive rendering systems are fundamentally
created equally in terms of intent & design. The implementation varies.
Among the top items in the "common" list is the need to store data, the
need to specify a viewpoint, and the need to propagate transformation
information. Beyond that, it's merely an implementation issue.
I caution against spending too much time worrying about how scene graphs
fit into DiVA because the issue is largely a red herring.
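Wes's progressive-transmission idea above (send bounding-box info first, then stream data at will, with "markers" signaling each finished chunk) can be sketched as a tiny stream protocol. The message tags and function name are invented purely for illustration:

```python
def render_stream(messages):
    """Consume a stream of (tag, payload) messages from an upstream vis
    component. The renderer learns the bounding box immediately (so it can
    set up the camera), holds partial chunk data, and only 'draws' a chunk
    once an end_chunk marker signals that the data is complete."""
    bbox, current, drawn = None, [], []
    for tag, payload in messages:
        if tag == "bbox":
            bbox = payload               # arrives before any geometry
        elif tag == "data":
            current.append(payload)      # partial chunk: hold, don't draw
        elif tag == "end_chunk":
            drawn.append(list(current))  # marker: chunk complete, safe to draw
            current = []
    return bbox, drawn


stream = [
    ("bbox", ((0, 0, 0), (1, 1, 1))),
    ("data", "tri-batch-1"), ("data", "tri-batch-2"), ("end_chunk", None),
    ("data", "tri-batch-3"), ("end_chunk", None),
]
bbox, drawn = render_stream(stream)
# bbox is known up front; two complete chunks were drawn
```

The alternative Wes mentions, rendering whatever is in the branch at the time, would simply draw `current` on every message instead of waiting for the marker.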
* What about Postscript, PDF and other scale-free output methods for publication quality graphics? Are pixmaps sufficient?
Randy: Gotta make nice graphs. Pixmaps will not suffice.
JohnC: Well what are we trying to provide, an environment for analysis or
producing images for publications? The latter can be done as a post
process and should not, IMHO, be a focus of DIVA.
JohnS: Pixmaps are insufficient. Our data analysis infrastructure has been moving rapidly away from scale-free methods and rapidly towards pixel-based methods. I don't know how to stop this slide or if we are poised to address this issue as we look at this component model.
Wes: Gotta have vector graphics.
In a distributed environment, we need to create a rendering subsystem that can flexibly switch between drawing to a client application by sending images, sending geometry, or sending geometry fragments (image-based rendering). How do we do that?
Randy: See the Chromium approach. This is actually more easily done than
one might think. Define an image "fragment" and augment the rendering
pipeline to handle it (ref: PICA and Chromium).
JohnC: Use Cr
Jim: I would think this could be achieved by a sophisticated data communication
protocol - one that encodes the type of data in the stream, say, using XML
or some such thingy.
Wes: Again, one size doesn't fit all. These seem to be logically different components.
Note: So we are going to define the OpenGL/Cr API as a “port” interface in CCA? All of GL will go through RMI?
* Please describe some rendering models that you would like to see supported (ie. view-dependent update, progressive update) and how they would adjust dynamically to changing objective functions (optimize for fastest framerate, or fastest update on geometry change, or varying workloads and resource constraints).
Randy: See the TeraScale browser system.
JohnC: Not sold on view dependent update as worthwhile, but progressive updates
can be hugely helpful. Question is do you accomplish this by adding
support in the renderer or back it up the pipeline to the raw data?
JohnS: I see this as the role for the framework. It also points to the need to have performance models and performance monitoring built in to every component so that the framework has sufficient information to make effective pipeline deployment decisions in response to performance constraints. It also points to the fact that at some level in this component architecture, component placement decisions must be entirely abstract (but such a capability is futureware).
So in the short-term its important to design components with effective interfaces for collecting performance data and representing either analytic or historical-based models of that data. This is a necessary baseline to get to the point that a framework could use such data to make intelligent deployment/configuration decisions for a distributed visualization system.
Wes: The scene graph treatise (above) covers most of what I have to say for now.
* Are there any good examples of such a system?
Randy: None that are ideal :), but they are not difficult to build.
JohnC: Yes, Kitware's not-for-free volume renderer (volren?). It does a nice job
with handling progressive updates. This is mostly handled by the GUI but
places some obvious requirements on the underlying rendering/viz
components.
JohnS: No. That’s why we are here.
Wes: I know of a couple of good scene graphs that can form the basis for renderers.
What is the role of non-polygonal methods for rendering (ie. shaders)?
* Are you using any of the latest gaming features of commodity cards in your visualization systems today?
JohnC: Yup, we've off loaded a couple of algorithms from the CPU.
We just have some very simple, one-off applications that off-load
computation from the cpu to gpu. For example, we have a 2D Image Based
Flow Visualization algorithm that exploits vertex programmability to do
white noise advection. Developing this type of application within
any Diva framework I've envisioned would really push the limits of
anything we've discussed.
JohnS: I'd like to know if anyone is using shader hardware. I don't know much about it myself, but it points out that we need to plan for non-polygon-based visualization methods. It's not clear to me how to approach this yet.
* Do you see this changing in the future? (how?)
Randy: This is a big problem area. Shaders are difficult to combine/pipeline.
We are using this stuff now and I do not see it getting much easier
(hlsl does not fix it). At some point, I believe that non-polygon
methods will become more common than polygon methods (about 3-4 years?).
Polygons are a major bottleneck on current gfx cards as they limit
parallelism. I'm not sure what the fix will be but it will still be
called OpenGL :).
JohnC: The biggest issue is portability, but things are looking up with OpenGL
2.0 efforts, etc.
Wes: We've invited Ilmi Yoon to the next workshop. She represents the IBR
community. I am very keen to see us take advantage of IBR techniques as well
as our traditional polygon engines, perhaps combining them in interesting
ways to realize powerful new systems.
Note: Do scene graphs somewhat address portability issues via further abstraction of the rendering procedure?
It will be necessary to separate the visualization back-end from the presentation interface. For instance, you may want to have the same back-end driven by entirely different control-panels/GUIs and displayed in different display devices (a CAVE vs. a desktop machine). Such separation is also useful when you want to provide different implementations of the user-interface depending on the targeted user community. For instance, visualization experts might desire a dataflow-like interface for composing visualization workflows whereas a scientist might desire a domain-specific dash-board like interface that implements a specific workflow. Both users should be able to share the same back-end components and implementation even though the user interface differs considerably.
How do different presentation devices affect the component model?
Jim: Not. The display device only affects resolution or bandwidth required.
This could be parameterized in the component invocations APIs, but
should not otherwise change an individual component.
If you want a "multiplexer" to share a massive data stream with a powerwall
and a PDA, then the "multiplexer component" implementation handles that...
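Jim's "multiplexer component" could be sketched as below: one upstream image stream is fanned out to heterogeneous displays, with resolution parameterized per sink exactly as he suggests. All class names and pixel figures are hypothetical:

```python
class Sink:
    """A display endpoint (powerwall, desktop, PDA) with a pixel budget."""
    def __init__(self, name, max_pixels):
        self.name, self.max_pixels, self.frames = name, max_pixels, []

    def display(self, frame, pixels):
        self.frames.append((frame, pixels))


class Multiplexer:
    """Fans one upstream image stream out to several displays, clamping
    each delivery to the sink's capability. The upstream pipeline never
    knows how many sinks exist or what their resolutions are."""
    def __init__(self, sinks):
        self.sinks = sinks

    def push(self, frame, pixels):
        for sink in self.sinks:
            sink.display(frame, min(sink.max_pixels, pixels))


powerwall = Multiplexer_sink = Sink("powerwall", max_pixels=25_000_000)
pda = Sink("pda", max_pixels=76_800)
mux = Multiplexer([powerwall, pda])
mux.push("frame-0", pixels=8_000_000)
# the powerwall receives the full 8 MP frame; the PDA a 76,800-pixel version
```

As Jim says, the individual components are unchanged; only the multiplexer knows about device differences.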
* Do different display devices require completely different user interface paradigms? If so, then we must define a clear separation between the GUI description and the components performing the back-end computations. If not, then is there a common language to describe user interfaces that can be used across platforms?
Randy: I think they do (e.g. immersion).
Jim: No. Different GUIs should all map to some common framework command/control
interface. The same functions will ultimately get executed, just from buttons
with different labels or appl-specific short-cuts... The UIs should all be
independent, but talk the same protocol to the framework.
Yuk (with regard to creating separation between GUI and component description)
JohnS: Systems that attempt to use the same GUI paradigm across different presentation media have always been terrible in my opinion. I strongly believe that each presentation medium requires a GUI design that is specific to that particular medium. This imposes a strong requirement that our compute pipeline for a given component architecture be strictly separated from the GUI that controls the parameters and presents the visual output of that pipeline. OGSA/WSDL has been proposed as one way to define that interface, but it is extremely complex to use. One could use CCA to represent the GUI handles, but that might be equally complex. Others have simply customized ways to use XML descriptions of their external GUI interface handles for their components. The latter seems much simpler to deal with, but is it general enough?
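The "XML descriptions of external GUI interface handles" approach JohnS mentions can be sketched with the Python standard library. The schema below (`component`/`param` elements and their attributes) is invented purely for illustration; the point is that any GUI, desktop or immersive, can build its own medium-appropriate widgets from the same description:

```python
import xml.etree.ElementTree as ET

# A hypothetical component publishes its tweakable parameters as XML.
description = """
<component name="isosurface">
  <param name="level" type="float" min="0.0" max="1.0" default="0.5"/>
  <param name="colormap" type="enum" values="hot,gray,rainbow" default="hot"/>
</component>
"""

def parse_gui_handles(xml_text):
    """Return {param_name: attribute_dict}; a GUI builder turns each entry
    into a slider, menu, wand widget, etc., as suits the display medium."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): dict(p.attrib) for p in root.findall("param")}

handles = parse_gui_handles(description)
print(sorted(handles))           # → ['colormap', 'level']
print(handles["level"]["max"])   # → 1.0
```

Whether this lightweight style is "general enough", as JohnS asks, compared with WSDL/OGSA or CCA ports, is exactly the open question.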
* Do different display modalities require completely different component/algorithm implementations for the back-end compute engine?
(what do we do about that??)
Randy: They can (e.g. holography), but I do not see a big problem there.
Push the representation through an abstraction (not a layer).
Jim: Algorithm maybe, component no. This could fall into the venue of the
different execution-model-specific frameworks and/or their bridging...
JohnS: I think there is a lot of opportunity to share the back-end compute engines across different display modalities. There are some cases where a developer would be inclined to implement things like an isosurfacer differently for a CAVE environment just to keep the framerates up high enough to maintain your sense of immersion. However, I think of those as edge-cases.
What presentation modalities do you feel are important, and which do you consider most important?
* Desktop graphics (native applications on Windows, on Macs)
Randy: #1 (by a fair margin)
JohnC: This is numero uno by a HUGE margin
Wes: Yes, most important, will never go away.
* Graphics access via Virtual Machines like Java?
JohnC: Not important
Jim: Ha ha ha ha…
Wes: If it works on desktops, it will work in these environments.
* CAVEs, Immersadesks, and other VR devices
JohnC: Not important
Wes: Second to workstations. With evolution of Chromium, DMX and the nascent
PICA stuff, I would expect that desktop tools would port transparently
to these devices.
* Ultra-high-res/Tiled display devices?
Randy: #3 - note that tiling applies to desktop systems as well, not
necessarily high-pixel count displays.
JohnC: Moderately important
JohnS: #3 : the next tiled display may well be your next *desktop* display, but not quite yet.
* Web-based applications?
JohnC: Well, maybe.
Jim: Probably a good idea. Someone always asks for this... :-Q
What abstractions do you think should be employed to separate the presentation interface from the back-end compute engine?
Jim: Some sort of general protocol descriptor, like XML...? Nuthin fancy.
* Should we be using CCA to define the communication between GUI and compute engine or should we be using software infrastructure that was designed specifically for that space? (ie. WSDL, OGSA, or CORBA?)
Randy: No strong opinion.
Jim: The CCA doesn't do such communication per se. Messaging between or in/out
of frameworks is always "out of band" relative to CCA port invocations.
If the specific framework impl wants to shove out data on some wire,
then it's hidden below the API level...
I would think that WSDL/SOAP would be O.K. for low-bandwidth uses.
JohnS: I think I addressed this earlier. We can do this all in CCA, but is that the right thing to do? I know this is an implementation issue, but is a strong part of our agreement on methods to implement our components (or define component boundaries).
Wes: (I see this as similar to rendering in VMs like Java in many respects.)
Always sounds nice, but I have yet to see much fruit in this area. The
potential importance/relevance is great. The browser makes a nice UI
engine, but I wouldn't trust it to do "real" rendering.
* How do such control interfaces work with parallel applications?
Should the parallel application have a single process that manages the control interface and broadcasts to all nodes or should the control interface treat all application processes within a given component as peers?
Randy: Consider DMX; by default, single w/broadcast, but it supports other configurations as well.
Jim: I vote for the "single process that manages the control interface and
broadcasts to all nodes" (or the variation above, where one of the
parallel tasks forwards to the rest internally :-). The latter is probably preferable.
BTW, you can't have "application processes within a... component".
What does that even mean?
Usually, an application "process" consists of a collection of one or
more components that have been composed with some specific connectivity...
JohnS: This requires more discussion, but reliable broadcast methods have many problems related to event skewing, and MPI-like point-to-point emulation of the broadcast suffers from scalability problems. We need to collect design patterns for the control interface and either compete them against one another or find a way to support them all by design. This is clearly an implementation issue, but it will leak into our abstract component design decisions. Clearly we want a single thread of control to efficiently deliver events to massively parallel back-end components. That is a *must* requirement.
Note: That paradigm (one component-chain per process) doesn’t offer you much opportunity for encapsulating complex parallel communication patterns.
6) Basic Deployment/Development Environment Issues============
One of the goals of the distributed visualization architecture is seamless operation on the Grid -- distributed/heterogeneous collections of machines. However, it is quite difficult to realize such a vision without some consideration of deployment/portability issues. This question also touches on issues related to the development environment and what kinds of development methods should be supported.
What languages do you use for core vis algorithms and frameworks.
* for the numerically intensive parts of vis algorithms
Randy: C/C++ (a tiny amount of Fortran)
Jim: C/C++… Fortran/F90 for numerically intensive parts.
* for the glue that connects your vis algorithms together into an application?
JohnC: C/C++, Tcl, Python
JohnS: C++/C/Java, but want to get into some Python (it is said to have better numerics than Java)
* How aggressively do you use language-specific features like C++ templates?
Randy: Not very, but they are used.
JohnC: Not at all. Too scary.
Jim: RUN AWAYYYY!!! These are not consistent across o.s./arch/compiler yet.
JohnS: I avoid them due to portability and compiler maturity issues.
Wes: Beyond vanilla classes, not at all.
* is Fortran important to you? Is it important that a framework support it seamlessly?
Randy: Pretty important, but at least "standardly" enhanced F77 should be simple :).
Jim: Fortran is crucial for many application scientists. It is not directly
useful for the tools I build.
But if you want to ever integrate application code components directly
into a viz framework, then you better not preclude this... (or Babel...)
JohnS: Yes, absolutely. It needn't be full fledged F90 support, but certainly f77 with some f90 extensions.
Wes: No. Fortran can be wrapped inside something sane.
Note: It is perhaps incumbent on us to support Fortran. We would eventually like buy-in from domain scientists to provide some analysis components that are interesting for them. Lack of Fortran bindings for VTK was a major issue for some participants in the Vis Greenbook workshop.
* Do you see other languages becoming important for visualization (ie. Python, UPC, or even BASIC?)
Randy: Python is big for us.
JohnC: Python, mostly because of the direction of numerical Python.
What platforms are used for data analysis/visualization?
* What do you and your target users depend on to display results? (ie. Windows, Linux, SGI, Sun etc..)
Randy: Linux, SGI, Sun, Windows, MacOS in that order
JohnC: All the above, primarily Lintel and Windoze though.
Jim: All of the above (not so much Sun anymore…)
JohnS: Linux, MacOS-X(BSD), Windows
Wes: For rendering, OpenGL engines.
* What kinds of presentation devices are employed (desktops, portables, handhelds, CAVEs, Access Grids, WebPages/Collaboratories) and what is their relative importance to active users.
Randy: Remote desktops and laptops. Very important
JohnC: desktops, tiled displays, AG
Jim: All but handhelds are important, mostly desktops, CAVEs/hi-res and AG,
in decreasing order.
JohnS: Desktop and laptops are most important. Web, AG, and CAVE are of lesser importance (but still important).
Wes: Workstations are most important.
* What is the relative importance of these various presentation methods from a research standpoint?
Randy: PowerPoint :)?
JohnC: The desktop is where the users live.
Jim: CAVEs/hi-res and AG are worthwhile research areas. The rest can be
woven in or incorporated more easily.
* Do you see other up-and-coming visualization platforms in the future?
Randy: Tablets & set-top boxes.
JohnC: I don't see SMP graphics boxes going away as quickly as some might.
Jim: Yes, but I haven't figured out where exactly to stick the chip behind
my ear for the virtual holodeck equipment... :)
JohnS: Tablet PCs and desktop-scale Tiled display devices.
Tell us how you deal with the issue of versioning and library dependencies for software deployment.
* For source code distributions, do you bundle builds of all related libraries with each software release (ie. bundle HDF5 and FLTK source with each release).
Randy: For many libs, yes.
JohnC: Sometimes, depending on the stability of the libraries.
Jim: CVS for control of versioning.
For bundling of libraries: No, but provide web links or separate copies of dependent distributions
next to our software on the web site...
Too ugly to include everything in one big bundle, and not as efficient
as letting the user download just what they need. (As long as everything
you need is centrally located or accessible...)
JohnS: Every time I fail to bundle dependent libraries, it has been a disaster. So it seems that packaging dependent libraries with any software release is a *must*.
Wes: Oddly enough, I do bundling like this for some of my projects. I think people
* What methods are employed to support platform-independent builds (cmake, imake, autoconf)? What are the benefits and problems of each approach?
Randy: gmake based makefiles.
JohnC: I've used all, developed my own, and like none. Maybe we can do better.
I think something based around gmake might have the best potential.
Jim: Mostly autoconf so far. My student thinks automake and libtool are "cool"
but we haven't used them yet...
JohnS: I depend on conditional statements in gmake-based makefiles to auto-select between flags for different architectures. This is not sufficiently sophisticated for most release engineering though. I have dabbled with autoconf, but it is not a silver bullet (neither was imake). I do not understand the practical benefits of 'cmake'.
Wes: I hate Imake, but used it extensively for a long time with LBL's AVS modules.
I think it still works. Nobody I know can figure out how autoconf works. I
personally tend to have different makefiles, particularly when doing code
that is supposed to build on Win32 as well as Unix/Linux systems.
* For binaries, have you had issues with different versions of libraries (ie. GLIBC problems on Linux and different JVM implementations/versions for Java)? Can you tell us about any sophisticated packaging methods that address some of these problems (RPM need not apply)
Randy: No real problems other than GLIBC problems. We do tend to ship static
for several libs. Motif used to be a problem on Linux (LessTif vs
Jim: Just say no. Open Source is the way to go, with a small set of "common"
binaries just for yuks. Most times the binaries won't work with the
specific run-time libs anyway...
JohnS: Building statically has been necessary in a lot of cases, but creates gigantic executables. In the case of JVM's, the problems with the ever-changing Java platform have driven me away from employing Java as a development platform.
Wes: I tend to just do source, rather than binaries, to avoid this whole morass.
OTOH, as a consumer, I prefer RPMs so that I don't have to build it. I want
my ice toasted, please.
* How do you handle multiplatform builds?
Randy: cron jobs on multiple platforms, directly from CVS repos. Entire
environment can be built from CVS repo info (or cached).
JohnC: The brute force, not so smart way. The VTK model is worth looking at.
Jim: Autoconf, shared source tree, with arch-specific subdirs for object files,
libs and executables.
JohnS: * Conservative, lowest-common-denominator coding practices.
* execute 'uname' at the top of a gnu makefile to select an appropriate set of build options for source-code building. Inside the code, one must use the C preprocessor (CPP) to code around platform dependencies.
How do you (or would you) provide abstractions that hide the locality of various components of your visualization/data analysis application?
Jim: I would use "proxy" components that use out-of-band communication to
forward invocations and data to the actual component implementation.
* Does anyone have ample experience with CORBA, OGSA, DCOM, .NET, RPC? Please comment on advantages/problems of these technologies.
* Do web/grid services come into play here?
Randy: Not usually an issue for us.
Jim: Yuck, I hope not…
JohnS: As these web-based scientific collaboratory efforts gather momentum, web-based data analysis tools have become increasingly important. I think the motivation is largely driven by deployment issues when supporting a very heterogeneous/multi-institutional user base. It reduces the deployment variables when your target is a specific web-server environment, but you pay a price in that the user-interface is considerably less advanced. This cost is mitigated somewhat if the data analysis performed is very domain-specific and customized for the particular collaboratory community. So it's a poor choice for general-purpose visualization tools, but if the workflow is well-established among the collaborators, then the weakness of the web-based user-interface options is not as much of a problem.
7) Collaboration ==========================
If you are interested in "collaborative applications" please define the term "collaborative". Perhaps provide examples of collaborative application paradigms.
Randy: Meeting Maker? :) :) (I'm getting tired).
Jim: "Collaborative" is 2 or more geographically/remote teams, sharing one
common viz environment, with shared control and full telepresence.
(Note: by this definition, "collaborative" does not yet exist... :-)
JohnS: Despite years of dabbling in "collaborative applications," I'm still not sure if I (or anyone) really knows what "collaborative" is in a strict sense.
Wes: The term "collaboration" is one of the most overused, misused and abused
terms in the English language. There is a huge disconnect between what
many users want/need, and what seems to be an overemphasis upon collaborative
technologies. For this particular project, collaboration (ought to) mean:
being able to share software components; and some level of confidence that
"DiVA-compliant" components in fact do interoperate. For the sake of
discussion, let's call this type of collaboration "interoperability."
For the other forms of "collaboration," care must be taken to define what
they are, whether they are useful, etc. If you're talking about multiple
persons seeing the same interactive renderer output, and each person being
able to do some interactive transformation, let's call that form of
collaboration "MI" (multiperson-interactive).
I recall hearing some discussion about the relationship between the AG
and DiVA. From my perspective, the AG ought to provide support to allow
any application to run in an "MI" mode. With this perspective,
there isn't really much to talk about in terms of fundamental DiVA
design wrt "MI."
Is collaboration a feature that exists at an application level or are there key requirements for collaborative applications that necessitate component-level support?
* Should collaborative infrastructure be incorporated as a core feature of every component?
JohnC: Does it need to be incorporated in all components? What kind of collab
support is needed? Permitting session logging and geographically
separated, simultaneous, users would go a long way to providing for
collab needs and would seem to only impact the GUI and perhaps renderer.
Jim: Collaboration should exist *above* the application level, either outside
the specific framework or as part of the framework "bridging" technology.
JohnS: No. I hope that support for collaborative applications can be provided via supplemental components.
Wes: I don't know what "collaborative infrastructure" means. Given my position
(above), "MI" is more of a framework thing, not a component thing.
This seems to be the most realistic approach to "MI."
Note: I'm not sure how to interpret this answer. Is this a "framework issue" or a "component issue," or is it totally outside of the application? So do we retrofit applications to be "collaborative" from the outside, rather than designing apps or the frameworks that implement them to support collaboration "requirements" as a fundamental feature of the technology?
* Can any conceivable collaborative requirement be satisfied using a separate set of modules that specifically manage distribution of events and data in collaborative applications?
Jim: I dunno, I doubt it.
JohnS: That is what I hope.
* How is the collaborative application presented? Does the application only need to be collaborative sometimes?
Jim: Yes, collaboration should be flexible and on demand as needed - like
dialing out on the speakerphone while in the middle of a meeting...
JohnS: This is probably true. You probably want to be able to have tools that were effectively standalone that can join into a collaborative space on demand.
* Where does performance come into play? Does the visualization system or underlying libraries need to be performance-aware?
(i.e. I'm doing a given task and I need a framerate of X for it to be useful using my current compute resources), network aware (i.e. the system is starving for data and must respond by adding an alternate stream or redeploying the pipeline). Are these considerations implemented at the component level, framework level, or are they entirely out-of-scope for our consideration?
Jim: There likely will need to be "hooks" to specify performance requirements,
like "quality of service". This should perhaps be incorporated as part
of the individual component APIs, or at least metered by the frameworks...
It would be wise to specify the frame rate requirement, perhaps interactively
depending on the venue... e.g. in interactive collaboration scenarios you'd
rather drop some frames consistently than stall completely or in bursts...
This sounds like futureware to me - an intelligent network protocol layer...
beyond our scope for sure!
These issues should be dealt with mostly at the framework level, if at all.
I think they're mostly out-of-scope for the first incarnation...
JohnS: Yes. The whole collaboration experience will fall apart if you cannot impose some constraints on quality of service or react appropriately to service limitations. It's a big problem, but I hope the solution does not need to be a fundamental feature of the baseline component design.
Wes: The MI-aware framework collects and uses performance data generated by
components to make decisions about how to tune/optimize visualization
pipeline performance (the pipeline consists of a bunch of components).
If some of the other issues I've raised are addressed (e.g., time-limited
execution, partial processing, incremental processing, etc), then the
performance issues raised within the context of MI come "for free".
Note: The issue here is that if anyone thinks this should be done at anything other than the framework (or even outside-of-framework) level, then it could be very disruptive to our design process if we develop first for single-user operation and then later attempt to make "collaborative services" a requirement. Implementing this at the "framework level" is again a high price of admission. If there is any way to support this at the component level, it would enable people working on collaborative extensions to share better with people who have different aims for their framework. I don't consider it a benefit to have one "framework" per use-case, as has been the practice in many aspects of CCA. It will just continue the balkanization of our development efforts.