Data-Loading Functions
-
void LoadData(const int ProcessRank, const int JobSize, std::vector<std::vector<Star>> &Data, int &TotalStars, const std::string dataSource, int batches)
If called by the RootID process, runs the File-Core Allocation protocol (an external Python script) and uses MPI to block all other workers from continuing until the script completes. All workers then call GetAssignments() and CalculateBatches(), and use the results to load their assigned stars into the required structures.
- Parameters
ProcessRank – The MPI ID of the calling process
JobSize – The number of MPI workers available (used to determine how to distribute files to each worker)
Data – A reference to the container into which the data will be inserted
TotalStars – A reference to a counter which sums the total number of stars loaded into the worker, and hence into the parallel system
dataSource – The location of the directory in which the stellar data is stored
batches – The number of minibatches the optimiser uses, and hence a determinant of the structure of the loaded data
- Returns
No explicit return value, but the Data object becomes populated with Star objects
-
std::vector<File> GetAssignments(int id, std::string dataSource)
Compares the list of files assigned to the running process by the File-Core Allocation protocol with the files present in the Stellar Directory. If there is a match (which there really should be), it returns the assigned File objects as a list.
- Parameters
id – The MPI id of the running process, a number between 0 and NProcesses - 1
dataSource – The directory in which the stellar data is stored
- Returns
A list of the File objects assigned to this process
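The matching step can be pictured with a minimal sketch. This is not the library's implementation: `FileSketch` and `MatchAssignments` are hypothetical names, and the sketch assumes both the assignment list and the directory listing are already available as plain file names, which the description above does not specify.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the File type; the real File object is not described here.
struct FileSketch
{
    std::string Name;
};

// Hypothetical helper: keeps only the assigned names that also appear in the directory listing.
std::vector<FileSketch> MatchAssignments(const std::vector<std::string> &assigned,
                                         const std::vector<std::string> &inDirectory)
{
    std::vector<FileSketch> matched;
    for (const auto &name : assigned)
    {
        // An assigned file is returned only if it is actually present on disk.
        if (std::find(inDirectory.begin(), inDirectory.end(), name) != inDirectory.end())
        {
            matched.push_back(FileSketch{name});
        }
    }
    return matched;
}
```

In this sketch an assigned file that is missing from the directory is silently dropped; how the real function reports such a mismatch is not stated above.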
-
void CalculateBatches(std::vector<File> &Files, int batches)
Generates a separation scheme to split the Nstars in each file approximately equally across the minibatches. Where there is a rounding error or overflow, the final batch is used to store the excess.
- Parameters
Files – the output of GetAssignments()
batches – the maximum number of minibatches
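The splitting rule described for CalculateBatches() can be illustrated with a short sketch. It is only one plausible reading of "approximately equally, with the excess in the final batch": `SplitIntoBatches` is a hypothetical helper, and the use of integer division for the per-batch size is an assumption rather than the documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns how many stars from one file go into each minibatch.
std::vector<std::size_t> SplitIntoBatches(std::size_t nStars, int batches)
{
    assert(batches > 0); // at least one minibatch is required

    const std::size_t n = static_cast<std::size_t>(batches);

    // Assumption: every batch receives the floor of the even share...
    std::vector<std::size_t> sizes(n, nStars / n);

    // ...and the remainder left over by the rounding is stored in the final batch.
    sizes.back() += nStars % n;
    return sizes;
}
```

Under these assumptions, a file of 10 stars split across 3 minibatches gives batch sizes 3, 3 and 4, and the sizes always sum to the file's star count.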