Reading Data

Gsharp can read data in several ways:

  • CSV - comma delimited files output from Microsoft Excel
  • Report files - columns of data in ASCII format
  • ASCII files - any other form of ASCII file
  • Binary files - created by a C or FORTRAN program
  • Folder files - archive format used by Gsharp

There are also four other methods which we don't cover here:

  • ODBC - Using SQL commands
  • Using GSL commands such as fopen(), fread() and fclose()
  • Using the AVS Field format and import_field
  • Using your own C and FORTRAN readers linked to Gsharp through UserFunc.c

To read data into Gsharp:

  • Click on the Open icon on the toolbar (or choose Open ... from the File menu)
  • Set the Files of type field to the appropriate type
  • Select the file
  • Click on Open

CSV files

From within Excel:

  • Choose Save as ... from the File menu.
  • Set the Save as Type to be CSV (Comma delimited).
  • Save the file.

From within Gsharp:

  • Click on the Open icon
  • Set the Files of type field to be CSV (*.csv)
  • Select the file.
  • To specify the number of header lines, whether the file is arranged in rows, columns or a grid, and whether it should be read as floats or strings, click on the Options button.
  • Click on OK.
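
The effect of the CSV options (header lines, floats or strings) can be illustrated outside Gsharp. The Python sketch below is not Gsharp's reader: the function name read_csv_columns and its parameters are hypothetical, and it assumes the file is arranged in columns.

```python
import csv
import io


def read_csv_columns(text, header_lines=1, as_float=True):
    """Return one list per CSV column, skipping the first header_lines rows."""
    rows = list(csv.reader(io.StringIO(text)))
    data = rows[header_lines:]
    # zip(*data) turns the list of rows into a list of columns.
    columns = [list(col) for col in zip(*data)]
    if as_float:
        columns = [[float(value) for value in col] for col in columns]
    return columns


# Hypothetical sample data: one header line followed by two data rows.
sample = "month,sales\n1,10.5\n2,12.0\n"
print(read_csv_columns(sample))  # [[1.0, 2.0], [10.5, 12.0]]
```

Reading the same text with header_lines=0 and as_float=False keeps the header cells as the first entry of each column, which is roughly what reading the file as strings would give.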

Report files

The report reader examines your ASCII file looking for columns of data. Once it has found three consecutive lines with the same column format (e.g. float, float, string), it reads every line in the file that matches this format and rejects any that don't. The rejected lines can be stored in a titles dataset.

The report reader will examine the lines before the first data line to find strings that it can use as names for the datasets it will create. If it cannot find an appropriate line then it will use T1, T2, T3, ... for any float columns and A1, A2, ... for any string columns. Strings must be delimited by two spaces. Floats can be separated by a single space.
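
The column detection described above can be sketched in Python. This is only an approximation of the idea, not Gsharp's implementation: the names classify and read_report are hypothetical, fields are assumed to be separated by two or more spaces (with runs of floats inside a field separated by single spaces), and the sketch assumes a data format must contain at least one float column so that three title lines in a row are not mistaken for data.

```python
import re


def classify(line):
    """Return the column types of a line, e.g. ('float', 'float', 'string')."""
    kinds = []
    # Assumption: two or more spaces separate fields that may hold strings.
    for field in re.split(r"\s{2,}", line.strip()):
        if not field:
            continue
        tokens = field.split()
        try:
            for token in tokens:
                float(token)
        except ValueError:
            kinds.append("string")  # the whole field is one string
        else:
            kinds.extend(["float"] * len(tokens))
    return tuple(kinds)


def read_report(text):
    """Return (format, accepted lines, rejected lines) for a report file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    kinds = [classify(ln) for ln in lines]
    fmt = None
    # Look for three consecutive lines that share one column format.
    for i in range(len(lines) - 2):
        if kinds[i] == kinds[i + 1] == kinds[i + 2] and "float" in kinds[i]:
            fmt = kinds[i]
            break
    if fmt is None:
        return None, [], lines
    accepted = [ln for ln, k in zip(lines, kinds) if k == fmt]
    rejected = [ln for ln, k in zip(lines, kinds) if k != fmt]
    return fmt, accepted, rejected
```

In this sketch the rejected list plays the role of the titles dataset: it collects every line that does not match the detected format, wherever it appears in the file.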

It is also possible to replace specified strings in the file before it is read using a conversion filter. The filter has the format "/find/replace/". You can include multiple filters e.g. "/find/replace/ /find/replace/". Examples of conversion filters include:

  • To convert (34) into -34 use  "/(/-/ /)//"
  • To convert "1,2,3,4" into "1 2 3 4" use "/,/ /"

The conversion filter and the name of the titles dataset can be set by clicking on the Options button.
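
To make the "/find/replace/" format concrete, here is a Python sketch that parses a filter string and applies it. It is an illustration only: the names parse_filters and apply_filters are hypothetical, and the sketch assumes that matching is literal (no wildcards), that filters are applied in the order given, and that "/" cannot appear inside a find or replace string.

```python
def parse_filters(spec):
    """Split a filter string such as '/(/-/ /)//' into (find, replace) pairs."""
    pairs = []
    i = 0
    while i < len(spec):
        if spec[i].isspace():
            i += 1  # skip the space between two filters
            continue
        if spec[i] != "/":
            raise ValueError(f"expected '/' at position {i}")
        find_end = spec.index("/", i + 1)
        replace_end = spec.index("/", find_end + 1)
        pairs.append((spec[i + 1:find_end], spec[find_end + 1:replace_end]))
        i = replace_end + 1
    return pairs


def apply_filters(text, spec):
    """Apply every /find/replace/ filter in spec to text, in order."""
    for find, replace in parse_filters(spec):
        text = text.replace(find, replace)
    return text


print(apply_filters("(34)", "/(/-/ /)//"))  # -34
print(apply_filters("1,2,3,4", "/,/ /"))    # 1 2 3 4
```

Note that a space is only treated as a separator between filters; inside a filter, as in "/,/ /", it is part of the replacement text.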

The report reader is useful because it can handle files with varying numbers of columns, but if you know the exact number of columns then the ASCII reader is quicker and uses less memory.

ASCII files

The ASCII reader is more flexible than the report reader, but you need to tell the reader how your data is organized. There are a number of options:

  • You must specify either the number of datasets or the names of the datasets, e.g. "6" or "x1 x2 x3 y1 y2 y3"
  • You must specify whether your data is floats or strings.
  • If you are reading a single dataset then you can use the rows, columns and planes options to inform Gsharp of the dimensions of the data. Alternatively, you can use the GSL command reshape once the data has been read. N.B. The layout resource has no effect on how single datasets are read.
  • If you are reading multiple datasets then you must inform Gsharp whether the datasets are arranged in columns (vertical) or in rows (horizontal).
  • You can also tell the reader only to look for data within a certain window in the file using the options start row, end row, start column and end column. N.B. In this context a column is a character column not a column of data.
  • You can use a conversion string as described in the report reader section.
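
The window options work on character positions, which the following Python sketch tries to illustrate. It is not Gsharp code: crop_window is a hypothetical name, and the sketch assumes that rows and character columns are counted from 1 and that the end values are inclusive.

```python
def crop_window(text, start_row=1, end_row=None, start_col=1, end_col=None):
    """Keep only the characters inside the given row / character-column window."""
    lines = text.splitlines()
    rows = lines[start_row - 1:end_row]
    # A "column" here is a character position within the line,
    # not a column of data.
    return "\n".join(line[start_col - 1:end_col] for line in rows)


# Hypothetical file: a title line, then an ID followed by a value.
sample = "id  value\nA1   3.5\nB2   4.0\n"
print(crop_window(sample, start_row=2, start_col=6))
# 3.5
# 4.0
```

Cropping in this way leaves only the text that the reader should search for numbers, here excluding the title line and the ID characters at the start of each line.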

N.B. The ASCII reader completely ignores line feeds. It works through the file looking for numbers wherever they are and then stores the values it finds into datasets as instructed by you.
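
The following Python sketch illustrates this behaviour and the difference between the vertical and horizontal layouts. It is not Gsharp code: find_numbers and distribute are hypothetical names, the sketch only recognises whitespace-separated numeric tokens, and it assumes that in the horizontal layout the values are divided equally between the datasets.

```python
import re

# A whole token must look like a number, e.g. 4, -3.5 or 1e6.
NUMBER = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")


def find_numbers(text):
    """Collect every numeric token, ignoring line breaks and other text."""
    return [float(tok) for tok in text.split() if NUMBER.fullmatch(tok)]


def distribute(values, names, layout):
    """Share a flat list of values between the named datasets."""
    n = len(names)
    if layout == "vertical":
        # Datasets are columns: values are dealt out in turn (x, y, x, y, ...).
        return {name: values[i::n] for i, name in enumerate(names)}
    if layout == "horizontal":
        # Datasets are rows: each dataset takes one contiguous run of values.
        size = len(values) // n
        return {name: values[i * size:(i + 1) * size]
                for i, name in enumerate(names)}
    raise ValueError("layout must be 'vertical' or 'horizontal'")


text = "All text is ignored\n1  2\n3\n4"
print(distribute(find_numbers(text), ["x", "y"], "vertical"))
# {'x': [1.0, 3.0], 'y': [2.0, 4.0]}
```

Because line breaks play no part, the same four numbers give the same result however they are spread across lines.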

It is also possible to read the whole file as an array of strings and then use GSL commands such as mask, strvalue and slicex to create your datasets.

You can also make multiple calls to import_ascii to read different data from the same file.

Some examples:

Example 1

  Options:
    Datasets = "x y", Layout="vertical"

  File:
    All text is ignored
    1  2
    3
    4

  Results:
    x   y
    1   2
    3   4

Example 2

  Options:
    Datasets = "A B", Layout="horizontal"

  File:
    Data for London
    1    2    8
    Data for Birmingham
    4   3   2

  Results:
    A   B
    1   4
    2   3
    8   2

Example 3

  Options:
    Datasets = "lines", Format="string"

  File:
    Data for London
    1    2    8
    Data for York
    4   3   2

  Results:
    lines
    Data for London
    1    2    8
    Data for York
    4   3   2

Example 4

  Options:
    Datasets = "grid", rows = 2, columns = 3

  File:
    1  2
    3  4
    5  6

  Results:
    grid
    1   3   5
    2   4   6

Note that in the last example the rows in the file have become columns in Gsharp and vice versa. You can swap this around in Gsharp using the command:
grid = transpose(grid);
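
The grid example can be reproduced with a short Python sketch, which may help to show why the rows and columns end up swapped. The functions fill_grid and transpose below are Python stand-ins written for this illustration, not the GSL commands, and the column-first fill order is inferred from the example rather than from a specification.

```python
def fill_grid(values, rows, columns):
    """Arrange values into rows x columns, filling each column top to bottom."""
    return [[values[c * rows + r] for c in range(columns)]
            for r in range(rows)]


def transpose(grid):
    """Swap the rows and columns of a grid held as a list of rows."""
    return [list(col) for col in zip(*grid)]


# The numbers in the order they are found in the file.
values = [1, 2, 3, 4, 5, 6]

grid = fill_grid(values, rows=2, columns=3)
print(grid)             # [[1, 3, 5], [2, 4, 6]]
print(transpose(grid))  # [[1, 2], [3, 4], [5, 6]]
```

After transposing, the grid has the same three rows of two values as the file.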

Binary files

The binary reader is used for reading binary files created by C or FORTRAN programs. Click on the Options button to specify:

  • The format of each binary number: float, byte, int or short.
  • The name of the dataset to create.
  • The dimension of the dataset to create (rows, columns and planes).

If you have problems reading your binary file:

  • Experiment with the different number formats until you are getting meaningful numbers.
  • Read the whole file without specifying any dimension.
  • Use slicex to remove any unwanted numbers.
  • If necessary, use reshape to convert your numbers into a grid or block.
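
The first three troubleshooting steps can be tried outside Gsharp with a Python sketch like the one below. It is an illustration under stated assumptions, not a description of the binary reader: read_binary is a hypothetical name, the file is assumed to use the byte order of the machine reading it, floats and ints are assumed to be 4 bytes, shorts 2 bytes, and bytes unsigned.

```python
import struct

# struct format codes for the four number formats (sizes are assumptions).
FORMATS = {"float": "f", "int": "i", "short": "h", "byte": "B"}


def read_binary(path, kind="float", skip=0):
    """Read the whole file as one flat list of numbers of the given kind.

    skip drops that many leading values, e.g. a header you do not want.
    """
    code = FORMATS[kind]
    # "=" selects native byte order with standard sizes.
    size = struct.calcsize("=" + code)
    with open(path, "rb") as handle:
        raw = handle.read()
    count = len(raw) // size  # ignore any trailing partial value
    values = struct.unpack(f"={count}{code}", raw[:count * size])
    return list(values)[skip:]
```

Calling read_binary with each kind in turn and printing the first few values is a quick way to see which format gives meaningful numbers. Once the format is known, skip can be used to discard leading values such as a record count.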

Folder files

The folder format is specific to Gsharp. Folders can be used to store all of Gsharp's datasets in a single binary file. The folder file records the name and dimensions of each dataset.

Folders are most useful for archiving your data, so that you can come back and use it again in a later session.
