A number of research fields must deal with datasets containing a large number of parameters. For instance, state-of-the-art combustion research codes track the concentrations and reaction dynamics of 50 or more species of molecules. Such high-dimensional datasets expose an enormous parameter space for researchers to explore. Furthermore, the computational domains for these simulations are on the order of 400^3 to 2000^3 grid points, far larger than the memory capacity of most workstations. Both the size of the datasets and the large number of parameters for each grid point make it impractical to apply traditional interactive data analysis and visualization methods. Consequently, data analysis techniques developed for such datasets focus on rapidly selecting relevant subsets of data based on search criteria.
Query-based data analysis methods allow a scientist to define search criteria as a boolean expression. The search returns only the subset of data that matches those criteria. For instance, a scientist could query a dataset for all nodes that satisfy the query "(Pressure < 1e6) AND (1 < Temperature < 100.0)". Query-based analysis methods thus allow scientists to focus their search only on the data that is relevant to their inquiry.
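To make the idea of a boolean query concrete, the following is a minimal sketch in Python of the example query applied by a plain linear scan. The toy records, the field layout, and the function names `matches` and `run_query` are illustrative assumptions; this is not the FastBit or DEX interface.

```python
# Hypothetical toy dataset: one record of field values per grid node.
nodes = [
    {"id": 0, "Pressure": 5.0e5, "Temperature": 50.0},
    {"id": 1, "Pressure": 2.0e6, "Temperature": 50.0},
    {"id": 2, "Pressure": 9.0e5, "Temperature": 150.0},
    {"id": 3, "Pressure": 1.0e5, "Temperature": 99.0},
]


def matches(node):
    """Evaluate (Pressure < 1e6) AND (1 < Temperature < 100.0) for one node."""
    return node["Pressure"] < 1e6 and 1 < node["Temperature"] < 100.0


def run_query(nodes, predicate):
    """Return the ids of all nodes that satisfy the predicate (linear scan)."""
    return [node["id"] for node in nodes if predicate(node)]


selected = run_query(nodes, matches)
print(selected)
```

Only the ids of the matching nodes are returned, so later analysis works on the selected subset rather than on the full dataset.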
Given the size of the datasets, fast query techniques must be amenable to out-of-core execution in order to provide the basis for practical interactive data analysis applications. Finding the subset of data that matches the relevant search criteria using naive methods would require that the search expression be applied to every cell of the dataset. Either the entire dataset must reside in memory or the dataset must be re-read from disk each time a search is performed; neither case is reasonable for interactive data exploration.
The FastBit software [1,2], developed by the SDM Center at Lawrence Berkeley National Laboratory, employs bitmap indexing technology to accelerate complex queries on high-dimensional spatio-temporal data. FastBit employs a specialized bitmap index compression technique that achieves search speeds a factor of 10 faster than the best known high-dimensional indexing methods. The bitmap indexing allows extremely fast searching on datasets that would otherwise not fit into memory.
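The general idea behind a bitmap index can be sketched as follows. This is a simplified, uncompressed, binned bitmap index that uses Python integers as bit vectors; it is not FastBit's implementation and it omits the compression that FastBit relies on. The function names, the bin edges, and the restriction that query thresholds coincide with bin edges are all assumptions made to keep the example short.

```python
from bisect import bisect_right


def build_index(values, edges):
    """Build one bitmap (a Python int) per bin.

    Bit i of bitmaps[k] is set when values[i] falls into bin k, where bin k
    covers edges[k-1] <= v < edges[k] and the first and last bins are open-ended.
    """
    bitmaps = [0] * (len(edges) + 1)
    for i, v in enumerate(values):
        bitmaps[bisect_right(edges, v)] |= 1 << i
    return bitmaps


def less_than(bitmaps, edges, threshold):
    """Bitmap of rows with value < threshold (threshold must be a bin edge)."""
    k = edges.index(threshold)
    result = 0
    for bitmap in bitmaps[: k + 1]:
        result |= bitmap  # OR together every bin that lies below the threshold
    return result


def rows(bits):
    """Convert a bitmap back into a list of row numbers."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Hypothetical per-node field values and bin edges.
pressure = [5.0e5, 2.0e6, 9.0e5, 1.0e5]
temperature = [50.0, 50.0, 150.0, 99.0]
p_edges = [1.0e5, 1.0e6, 1.0e7]
t_edges = [1.0, 100.0]

p_index = build_index(pressure, p_edges)
t_index = build_index(temperature, t_edges)

# (Pressure < 1e6) AND (Temperature < 100.0) becomes a bitwise AND of two bitmaps.
hits = less_than(p_index, p_edges, 1.0e6) & less_than(t_index, t_edges, 100.0)
print(rows(hits))
```

Once the bitmaps are built, each condition is answered with bitwise OR operations and the conditions are combined with a bitwise AND, so the raw field values are not rescanned for every query.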
In addition to supporting accelerated searches on large multidimensional datasets, the FastBit software provides 3D region-growing algorithms that are able to find and uniquely label connected regions of data that match the search criteria. Region labeling is very important for identifying distinct features in each timestep and ultimately for supporting feature tracking for time-varying datasets. Feature tracking is of enormous importance for inferring cause-effect relationships that span multiple timesteps in time-varying data.
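Region growing of this kind can be illustrated with a short breadth-first flood fill over the selected cells. The sketch below assumes 6-connectivity (cells are neighbors when they share a face) and represents the selection as a set of (x, y, z) tuples; both choices, and the name `label_regions`, are assumptions for illustration and not a description of FastBit's own algorithm.

```python
from collections import deque

# Offsets to the six face-sharing neighbors of a cell (assumed connectivity).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_regions(selected):
    """Assign a region label (1, 2, ...) to every cell in the selected set."""
    labels = {}
    next_label = 0
    for seed in sorted(selected):
        if seed in labels:
            continue  # already reached from an earlier seed
        next_label += 1
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in selected and neighbor not in labels:
                    labels[neighbor] = next_label
                    queue.append(neighbor)
    return labels


# Hypothetical selection: two separate clusters of cells.
cells = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5), (5, 5, 6)}
labels = label_regions(cells)
print(labels)
```

Cells that can be reached from one another through selected neighbors receive the same label, which is what later allows each region to be drawn and tracked as a distinct feature.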
The DEX tool combines the FastBit query engine with 3D visualization methods. By combining FastBit with 3D visualization capabilities, it is now possible to perform interactive feature-based analysis and region finding for high-dimensional queries. By displaying the resulting regions of interest, application scientists can quickly identify characteristic features of their data.
DEX uses FLTK for the graphical user interface, VTK for the visualization processing, and OpenGL for hardware-accelerated 3D graphics. The DEX GUI helps users create syntactically correct expressions for the underlying FastBit infrastructure. The FastBit software takes only fractions of a second to perform selections on very large (400^3) datasets. FastBit then walks through the selected cells to find and label "connected regions." The user can then interactively visualize the results of the selection in 3D. Each connected region is identified by a distinct color. (Image of a FastBit selection.)
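One simple way to give each labeled region its own color is to space hues evenly around the color wheel. The sketch below uses only Python's standard `colorsys` module; the function name `region_colors` and the choice of fully saturated hues are assumptions, and the code does not reflect how DEX or VTK actually assign colors.

```python
import colorsys


def region_colors(num_regions):
    """Map region labels 1..num_regions to RGB tuples with evenly spaced hues."""
    return {
        label: colorsys.hsv_to_rgb((label - 1) / num_regions, 1.0, 1.0)
        for label in range(1, num_regions + 1)
    }


colors = region_colors(3)
for label, rgb in colors.items():
    print(label, rgb)
```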
Combustion: Combustion research involves tracking numerous species of molecules through complex chemical reaction networks. Tracking the flame front helps researchers better understand the properties required for efficient combustion. However, the definition of the flame front is ambiguous in practice -- it is identified by a complex set of criteria. DEX is able to use the FastBit infrastructure to rapidly select and visualize the subset of the data that defines the flame front. The region growing and color labels enable the researcher to visually identify and track distinct flame fronts through multiple timesteps of the dataset.
Astrophysics: Like the combustion research datasets, astrophysics data can often involve many fields for each grid point. For example, simulations of stellar phenomena like supernovae require the tracking of mass fractions for many different chemical species, radiation emission and absorption profiles for radiation transport, baryonic densities, and all of the typical fluid dynamics properties (e.g., pressure, temperature, flow vectors). The DEX tool allows researchers to make sense of the complex relationships between different fields in the data using query-based exploration methods. The 3D viewing interface provides all of the advantages of typical interactive visualization tools, while the FastBit query mechanism ensures that even large datasets can be explored at interactive rates via the accelerated searches.
Adaptive Mesh Refinement (AMR) Dataset: Many research codes involve hierarchical block-structured adaptive mesh refinement methods. Examples include combustion research efforts such as CCSE (http://hpcrd.lbl.gov/html/CCSE.html) and broader framework-building efforts such as SciDAC APDEC (http://davis.lbl.gov/APDEC/). The cell-based representation employed by DEX for the 3D visualization is amenable to AMR data, but considerable work is still required to make sense of AMR file formats and to encode bitmap indices for hierarchical multiresolution data where refined regions overlap valid data points in the coarser levels of the hierarchy.
Multi-resolution/Level-of-Detail Support: Many of the selections return results that contain details that are far too small to be viewed on a typical desktop display. Indeed, a poorly posed query can easily end up selecting the entire dataset, thereby exhausting the memory capacity of the workstation. It is therefore desirable to support a multi-resolution encoding of the bitmap indices that can return selections at varying levels of resolution depending on the viewing resolution and on workstation capabilities. As the DEX viewer is used to zoom in on a particular subset of the data, the FastBit selection mechanism can be modified to support selection criteria that include both spatial (Region of Interest) information and resolution (Level of Detail) information. The displayed data will be progressively refined as the user zooms in on a particular region of interest.
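The intended behavior can be illustrated on the result of a selection, independent of how the bitmap indices themselves would be encoded. In the sketch below, a coarse cell counts as selected when any of the fine cells it covers is selected, and a region of interest is an axis-aligned box with an inclusive lower corner and an exclusive upper corner. The function names, the coarsening rule, and the box convention are assumptions for illustration; this is not the planned FastBit encoding.

```python
def coarsen(selected, factor):
    """Reduce resolution: a coarse cell is selected if any covered fine cell is."""
    return {(x // factor, y // factor, z // factor) for (x, y, z) in selected}


def in_roi(selected, lo, hi):
    """Keep only the cells inside the box lo <= cell < hi along every axis."""
    return {
        cell
        for cell in selected
        if all(l <= v < h for v, l, h in zip(cell, lo, hi))
    }


# Hypothetical fine-resolution selection.
cells = {(0, 0, 0), (1, 1, 1), (2, 0, 0), (7, 7, 7)}

overview = coarsen(cells, 2)                  # coarse view of the whole domain
zoomed = in_roi(cells, (0, 0, 0), (4, 4, 4))  # full detail inside the region of interest
print(sorted(overview))
print(sorted(zoomed))
```

A viewer could show the coarse overview first and switch to the full-resolution cells only inside the region the user has zoomed into, which bounds the number of cells held in memory at any one time.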
Applying Visualization Algorithms to Selected Data: Currently, the DEX tool simply draws the cells that were selected by the query and colors them according to their region label (cells in the same region are drawn in the same color). It is desirable to apply traditional visualization algorithms such as slicing, isosurfaces, and particle advectors to the selected data. The next iteration of the DEX tool will begin to expose additional visualization algorithms that can be applied to the subsetted data.