The provider must develop a program that imports a proprietary binary file (.DTA) and exports its contents to Matlab format (.m, .mat, etc.). Sample .DTA files and .pdf documentation of the binary file structure are provided.
## Deliverables
Details:
I have very large .DTA files (3GB+ each) that are produced by PAC's AEWIN software in a 32-bit Windows XP environment. A single AEWIN project may produce more than 100 of these 3GB+ files in one session.
In order to analyze the data inside these .DTA files, our research team has decided to utilize Matlab R2010a x64 running in two different hardware environments:
1. CentOS 5.4 x64 on a 64-core cluster with 192GB RAM
2. Windows 7 Enterprise x64 on 4-core workstations with 4GB RAM
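Because a single .DTA file (3GB+) is close to, or larger than, the 4GB of RAM on the workstations, an importer cannot simply load a whole file into memory. The Python sketch below streams a file in chunks that always end on a unit boundary, so no record is split between chunks. `unit_size` is a hypothetical fixed record or block size; the real value, and whether the format has fixed-size units at all, would have to come from the PAC documentation.

```python
import io
from typing import BinaryIO, Iterator


def iter_aligned_chunks(stream: BinaryIO, unit_size: int,
                        units_per_chunk: int) -> Iterator[bytes]:
    """Yield chunks of `stream` whose length is a whole multiple of `unit_size`.

    `unit_size` is a HYPOTHETICAL fixed record/block size; the real value
    would come from the PAC .DTA documentation. Assumes a regular file or
    in-memory buffer, where read(n) returns n bytes until end of file.
    """
    if unit_size <= 0 or units_per_chunk <= 0:
        raise ValueError("unit_size and units_per_chunk must be positive")
    chunk_size = unit_size * units_per_chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        leftover = len(chunk) % unit_size
        if leftover:
            # A partial unit at end of file means truncation or a wrong unit_size.
            raise ValueError("trailing %d bytes do not form a whole unit" % leftover)
        yield chunk


if __name__ == "__main__":
    # Only one chunk (here at most 2 units of 8 bytes) is held in memory at a time.
    data = io.BytesIO(bytes(range(24)))
    for chunk in iter_aligned_chunks(data, unit_size=8, units_per_chunk=2):
        print(len(chunk))  # prints 16, then 8
```

In practice `stream` would be a file opened with `open(path, "rb")`, and `units_per_chunk` would be chosen to keep each chunk well below the memory available on a workstation.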
We need to be able to utilize the parallel computing capabilities of the Linux cluster; our Matlab licenses have all the necessary toolboxes, including the Parallel Computing and Database toolboxes.
Large jobs will be imported and analyzed using the Linux cluster's MATLAB distributed computing server, but we also need to be able to import individual files on our Windows workstations using local MATLAB workers.
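To spread a single file across cluster workers, the file can be cut into byte ranges that each contain whole units, so every worker can seek to its own offset and decode independently (for example, one `parfor` iteration per range, each using `fseek`/`fread`). The sketch below assumes a hypothetical layout: a preamble of `data_offset` bytes followed by fixed-size units of `unit_size` bytes. Both values are placeholders for whatever the .DTA documentation actually specifies.

```python
from typing import List, Tuple


def partition_file(file_size: int, data_offset: int, unit_size: int,
                   n_workers: int) -> List[Tuple[int, int]]:
    """Split a file's payload into at most n_workers (byte_offset, byte_length) ranges.

    HYPOTHETICAL layout: `data_offset` bytes of preamble, then fixed-size
    units of `unit_size` bytes. Each range holds whole units, and range
    sizes differ by at most one unit.
    """
    if unit_size <= 0 or n_workers <= 0:
        raise ValueError("unit_size and n_workers must be positive")
    payload = file_size - data_offset
    if payload < 0 or payload % unit_size:
        raise ValueError("file is not data_offset plus a whole number of units")
    n_units = payload // unit_size
    base, extra = divmod(n_units, n_workers)
    ranges = []
    offset = data_offset
    for worker in range(n_workers):
        count = base + (1 if worker < extra else 0)
        if count == 0:
            break  # fewer units than workers: the remaining workers get nothing
        ranges.append((offset, count * unit_size))
        offset += count * unit_size
    return ranges


if __name__ == "__main__":
    # 10-byte preamble, seven 4-byte units, three workers.
    print(partition_file(10 + 7 * 4, data_offset=10, unit_size=4, n_workers=3))
    # prints [(10, 12), (22, 8), (30, 8)]
```

When a session produces many files, assigning whole files to workers is a simpler alternative; splitting within a file matters mainly when one large file must be imported quickly.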
The data is made up of rows and columns with headers for each column; the headers repeat at fixed intervals. These headers should be easily identifiable inside of the Matlab import as variable names, and the data structure should remain mostly the same (redundant headers should be skipped).
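The header-skipping requirement can be illustrated with a decoder for a hypothetical block layout: each block starts with a header of `n_cols` column names, each a 16-byte null-padded ASCII string, followed by `rows_per_block` rows of little-endian float32 values. The first header supplies the column names; every later header is checked against it and then skipped. The name width, value type, and block structure are assumptions, not the actual AEWIN format.

```python
import struct
from typing import Dict, List

NAME_WIDTH = 16  # HYPOTHETICAL: bytes per column name in each header


def parse_blocks(buf: bytes, n_cols: int, rows_per_block: int) -> Dict[str, List[float]]:
    """Decode repeating [header][rows] blocks, keeping only the first header.

    HYPOTHETICAL layout: each header is n_cols names of NAME_WIDTH bytes
    (ASCII, null-padded); each row is n_cols little-endian float32 values.
    """
    if n_cols <= 0 or rows_per_block <= 0:
        raise ValueError("n_cols and rows_per_block must be positive")
    header_size = n_cols * NAME_WIDTH
    row_fmt = "<%df" % n_cols
    block_size = header_size + rows_per_block * struct.calcsize(row_fmt)
    if len(buf) % block_size:
        raise ValueError("buffer is not a whole number of blocks")
    names: List[str] = []
    columns: Dict[str, List[float]] = {}
    for start in range(0, len(buf), block_size):
        raw = buf[start:start + header_size]
        block_names = [raw[i:i + NAME_WIDTH].rstrip(b"\0").decode("ascii")
                       for i in range(0, header_size, NAME_WIDTH)]
        if not names:
            # First header: its names become the output variables.
            names = block_names
            columns = {name: [] for name in names}
        elif block_names != names:
            raise ValueError("header at byte %d differs from the first header" % start)
        # Later headers matched the first one, so they are skipped.
        rows = buf[start + header_size:start + block_size]
        for row in struct.iter_unpack(row_fmt, rows):
            for name, value in zip(names, row):
                columns[name].append(value)
    return columns
```

On the Matlab side the decoded names may still need converting to valid variable names (for example with `genvarname`), and duplicate column names would need extra handling, since this sketch keys columns by name.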
You may use external scripts (C++, Python, etc.) as long as the file is imported successfully and as long as it takes advantage of the cluster cores. Any external scripts, if necessary, may be present on the Linux machine as well.
The Parallel Computing Toolbox in Matlab is run from one of our Windows workstations, which connects to the MATLAB Distributed Computing Server on the cluster. The code must work with or without the cluster server, because we need to be able to import a .DTA file on a workstation that is not connected to the cluster.
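One way to meet the "with or without the cluster" requirement is to keep the decoding code independent of where it runs and pass the parallel backend in as an optional argument, falling back to a plain serial loop when none is available. In R2010a, `parfor` behaves similarly, running its iterations serially on the client when no `matlabpool` is open. In the Python sketch below, `decode_range` stands in for a hypothetical per-range decoder and `executor` is any `concurrent.futures` executor.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Range = Tuple[int, int]  # (byte_offset, byte_length)
T = TypeVar("T")


def run_ranges(ranges: Sequence[Range],
               decode_range: Callable[[Range], T],
               executor: Optional[Executor] = None) -> List[T]:
    """Decode every range, in parallel if an executor is given, serially otherwise.

    Results come back in the same order as `ranges` in both cases,
    so the per-range pieces can be concatenated directly.
    """
    if executor is None:
        return [decode_range(r) for r in ranges]
    return list(executor.map(decode_range, ranges))


def byte_length(r: Range) -> int:
    """Stand-in for a real (hypothetical) per-range decoder."""
    return r[1]


if __name__ == "__main__":
    ranges = [(0, 16), (16, 8), (24, 8)]
    print(run_ranges(ranges, byte_length))  # serial fallback: [16, 8, 8]
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(run_ranges(ranges, byte_length, pool))  # same result: [16, 8, 8]
```

Threads keep the example small; for CPU-bound decoding in Python, a `ProcessPoolExecutor` with a module-level decoder function would be the more practical choice.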
If you believe there is a more elegant solution for this project, or if you have any comments, ideas, or corrections, please contact me.
Expected program behavior:
After selecting the parallel computing configuration (local or server) in Matlab, the user should be able to type a Matlab command that starts the .DTA file import. The file should import accurately and successfully, and all headers should appear as variable names in Matlab. If a .DTA file has already been imported in the same MATLAB session, the data from each new import should be appended to the existing data.
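Appending on re-import can be modeled as extending each existing column with the newly decoded values. The sketch below keeps the session's data as a dictionary mapping column name to values and, as an assumed policy, refuses to append a file whose columns differ from what is already loaded instead of silently padding or dropping columns. In Matlab the equivalent would be concatenating onto the existing workspace variables.

```python
from typing import Dict, List

Columns = Dict[str, List[float]]


def append_import(store: Columns, new: Columns) -> Columns:
    """Append a newly imported file's columns onto the data already loaded.

    Assumed policy: the first import defines the column set; later imports
    must have exactly the same columns, otherwise nothing is changed.
    """
    if not store:
        store.update({name: list(values) for name, values in new.items()})
        return store
    if set(new) != set(store):
        raise ValueError("column mismatch: %s vs %s" % (sorted(new), sorted(store)))
    for name, values in new.items():
        store[name].extend(values)
    return store


if __name__ == "__main__":
    session: Columns = {}  # persists for the whole session, like the Matlab workspace
    append_import(session, {"TIME": [1.0], "AMP": [2.0]})
    append_import(session, {"TIME": [3.0], "AMP": [4.0]})
    print(session)  # {'TIME': [1.0, 3.0], 'AMP': [2.0, 4.0]}
```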
Deliverables:
1-Set of Matlab files and/or scripts
2-Source for any compiled code
3-Documentation on program use
Milestones:
Final deliverable - January 7th, 2010
(If you commit to an earlier final delivery date, your proposal will be given extra consideration.)