Data-Loading Functions

void LoadData(const int ProcessRank, const int JobSize, std::vector<std::vector<Star>> &Data, int &TotalStars, const std::string dataSource, int batches)

If called by the RootID process, runs the File-Core Allocation protocol (an external Python script) and uses MPI to block all other workers from continuing until the script completes. All workers then call GetAssignments() and CalculateBatches(), and use the results to load their assigned stars into the required structures.

Parameters
  • ProcessRank – The MPI ID of the calling process

  • JobSize – The number of MPI workers available (used to determine how to distribute files to each worker)

  • Data – A reference to the container into which the data will be inserted

  • TotalStars – A reference to a counter which sums the total number of stars loaded into the worker, and hence into the parallel system

  • dataSource – The location of the directory in which the stellar data is stored

  • batches – The number of minibatches the optimiser uses, and hence a determinant of the structure of the loaded data

Returns

No explicit return value, but the Data object is populated with Star objects.

std::vector<File> GetAssignments(int id, std::string dataSource)

Compares the list of files assigned to the running process by the File-Core Allocation protocol with those present in the Stellar Directory. If there is a match (which is the expected case), it returns the assigned File objects as a list.

Parameters
  • id – The MPI id of the running process, a number between 0 and NProcesses - 1

  • dataSource – The directory in which the stellar data is stored

Returns

A list of the File objects assigned to this process
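
The matching step described above can be sketched as follows. This is only an illustration under stated assumptions: MatchAssignedFiles is a hypothetical helper, files are represented by plain name strings rather than File objects, and the way the assignment list and directory listing are obtained is not shown because the source does not specify it.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the comparison inside GetAssignments().
// 'assigned' holds the file names allocated to this process;
// 'inDirectory' holds the file names actually found in the data directory.
std::vector<std::string> MatchAssignedFiles(const std::vector<std::string> &assigned,
                                            const std::vector<std::string> &inDirectory)
{
    std::vector<std::string> matched;
    for (const std::string &name : assigned)
    {
        // Keep only the assigned files that really exist in the directory.
        if (std::find(inDirectory.begin(), inDirectory.end(), name) != inDirectory.end())
        {
            matched.push_back(name);
        }
    }
    return matched;
}
```

An assigned name that is missing from the directory is dropped from the result, so a caller could compare the sizes of the input and output lists to detect an inconsistent allocation.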

void CalculateBatches(std::vector<File> &Files, int batches)

Generates a separation scheme that splits the Nstars in each file approximately equally across the minibatches. Where the division leaves a remainder, the final batch stores the excess.

Parameters
  • Files – The output of GetAssignments()

  • batches – The maximum number of minibatches
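
As a rough illustration of the separation scheme, the sketch below computes how many stars from a single file would go into each minibatch. SplitIntoBatches is a hypothetical helper, not the actual CalculateBatches() implementation, and it assumes the even share is obtained by integer division with the whole remainder placed in the final batch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the number of stars from one file that
// fall into each minibatch.
std::vector<int> SplitIntoBatches(int nStars, int batches)
{
    assert(batches > 0 && nStars >= 0);

    const int perBatch = nStars / batches; // even share per batch (rounded down)
    const int excess = nStars % batches;   // stars left over after the even split

    std::vector<int> counts(batches, perBatch);
    counts.back() += excess;               // the final batch absorbs the excess
    return counts;
}
```

For example, SplitIntoBatches(10, 4) gives batch sizes 2, 2, 2 and 4: each batch receives the rounded-down share of 2 and the final batch takes the 2 remaining stars.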