/*
 * Copyright © 2013-2014 University of Wisconsin-La Crosse.
 *                         All rights reserved.
 * Copyright © 2014 Cisco Systems, Inc. All rights reserved.
 *
 * Copyright © 2014-2018 Inria. All rights reserved.
 * See COPYING in top-level directory.
 *
 * $HEADER$
 */

/*! \page netloc_intro Network Locality (netloc)

\htmlonly
\endhtmlonly

Portable abstraction of network topologies for high-performance computing.

The netloc documentation spans these sections:

\htmlonly
\endhtmlonly
\section netloc_summary Netloc Summary

The Portable Network Locality (netloc) software package provides
network topology discovery tools and an abstract representation of
those network topologies for a range of network types and
configurations. It is provided as a companion to the Portable Hardware
Locality (hwloc) package. Together, these two packages provide a
comprehensive view of the HPC system topology, spanning from the
processor cores in one server to the cores in another, including the
complex network(s) in between.

To this end, netloc is divided into two sets of components. The first
set of tools lets the administrator extract information about the
topology of the machines, using a dedicated discovery tool (called a
reader) for each network type and discovery technique. The second set of
tools lets the user exploit the collected information, either to display
the topology or to create a topology-aware mapping of the processes of an
application.

\image html netloc_design.png
\image latex netloc_design.png "" width=9cm

\htmlonly
\endhtmlonly
\subsection supportednetworks Supported Networks

For now, only InfiniBand is supported (see \ref netloc_setup), but support
is planned to be extended to other networks very soon.

\htmlonly
\endhtmlonly
\section netloc_installation Netloc Installation

The generic installation procedure for both hwloc and netloc
is described in \ref common_installation.

Note that netloc is currently not supported on as many platforms as
the original hwloc project.
netloc is enabled by default when supported,
or can be disabled by passing \--disable-netloc to the configure command-line.

\htmlonly
\endhtmlonly
\section netloc_setup Setup

Using the Netloc tools requires two steps. The first step collects
information about the network directly from the tools distributed by the
manufacturers. For InfiniBand, for instance, this operation needs privileges
to access the network device. For this step, Netloc provides wrappers that
call the right tools with the right options.
The second step transforms the raw files generated by the manufacturer tools
into files in a format that is readable by the Netloc tools and does not
depend on the network technology.

For example, with InfiniBand, the first step is handled by
\c netloc_ib_gather_raw, which calls the \c ibnetdiscover and \c ibroute tools
to generate the necessary raw data files.
This step has to be run by an administrator, since the InfiniBand tools need
to access the network device.

\verbatim
shell$ netloc_ib_gather_raw --help
Usage: netloc_ib_gather_raw [options]
  Dumps topology information to /ib-raw/
  Subnets are guessed from the /hwloc/ directory where
  the hwloc XML exports of some nodes are stored.
Options:
 --sudo
    Pass sudo to internal ibnetdiscover and ibroute invocations.
    Useful when the entire script cannot run as root.
 --hwloc-dir
    Use instead of /hwloc/ for hwloc XML exports.
 --force-subnet [:]: to force the discovery
    Do not guess subnets from hwloc XML exports.
    Force discovery on local board port
    and optionally force the subnet id
    instead of reading it from the first GID.
    Examples: --force-subnet mlx4_0:1
              --force-subnet fe80:0000:0000:0000:mlx4_0:1
 --ibnetdiscover /path/to/ibnetdiscover
 --ibroute /path/to/ibroute
    Specify exact location of programs. Default is /usr/bin/
 --sleep
    Sleep for seconds between invocations of programs probing the network
 --ignore-errors
    Ignore errors from ibnetdiscover and ibroute, assume their outputs are ok
 --force -f
    Always rediscover to overwrite existing files without asking
 --verbose -v
    Add verbose messages
 --dry-run
    Do not actually run programs or modify anything
 --help -h
    Show this help

shell$ ./netloc_ib_gather_raw /home/netloc/data
WARNING: Not running as root.
Using /home/netloc/data/hwloc as hwloc lstopo XML directory.

Exporting local node hwloc XML...
  Running lstopo-no-graphics...

Found 1 subnets in hwloc directory:
 Subnet fe80:0000:0000:0000 is locally accessible from board qib0 port 1.

Looking at fe80:0000:0000:0000 (through local board qib0 port 1)...
 Running ibnetdiscover...
 Getting routes...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L112' LID 18...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L108' LID 20...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L102' LID 23...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L104' LID 25...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L106' LID 24...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L114' LID 22...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L116' LID 21...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L109' LID 12...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L111' LID 11...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L107' LID 13...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L103' LID 17...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L105' LID 16...
  Running ibroute for switch 'QLogic 12800-180 GUID=0x00066a00e8001310 L113' LID 15...
\endverbatim

The second step, which can be done by a regular user, is performed by the tool
\c netloc_ib_extract_dats.

\verbatim
shell$ netloc_ib_extract_dats --help
Usage: netloc_ib_extract_dats [--hwloc-dir ]
    hwloc-dir can be an absolute path or a relative path from output path

shell$ netloc_ib_extract_dats /home/netloc/data/ib-raw /home/netloc/data/netloc \
    --hwloc-dir ../hwloc
Read subnet: fe80:0000:0000:0000
2 partitions found
    'node'
    'admin'
\endverbatim

\htmlonly
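\endhtmlonly

The two steps described in this section can be chained in a small script.
The following is only a minimal sketch, not a tool shipped with Netloc: the
\c netloc_setup function and the \c RUN variable are hypothetical names
introduced for illustration, and the directory layout is the one used in the
example above. The first command still needs the privileges discussed earlier.

```shell
#!/bin/sh
# Sketch only: netloc_setup and RUN are hypothetical helpers, not part of Netloc.
# If RUN is set (for instance RUN=echo), it is prepended to each command.
netloc_setup() {
    data_dir=${1:-/home/netloc/data}

    # Step 1 (administrator): dump the raw InfiniBand data into $data_dir/ib-raw.
    ${RUN:-} netloc_ib_gather_raw "$data_dir" || return 1

    # Step 2 (regular user): convert the raw files into the Netloc format.
    # --hwloc-dir is given relative to the output path, as in the example above.
    ${RUN:-} netloc_ib_extract_dats "$data_dir/ib-raw" "$data_dir/netloc" \
        --hwloc-dir ../hwloc || return 1
}
```

Calling <tt>RUN=echo netloc_setup /home/netloc/data</tt> prints the two
commands instead of running them, which makes it possible to check the paths
before involving an administrator.

\htmlonly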
\endhtmlonly
\section netloc_draw Topology display

Netloc provides a tool, \c netloc_draw.html, that displays a topology in a web
browser from a JSON file.

\subsection netloc_draw_setup Generate the JSON file

To display a topology, Netloc first needs to generate the corresponding JSON
file. For this operation, the user must run \c netloc_draw_to_json.

\verbatim
shell$ netloc_draw_to_json --help
Usage: netloc_draw_to_json
shell$ netloc_draw_to_json /home/netloc/data/netloc
\endverbatim

The \c netloc_draw_to_json command writes a JSON file for each topology file
found in the input directory. The output files, which are also written to the
input directory, can be opened by \c netloc_draw.html in a web browser.

\subsection netloc_draw_tool Using netloc_draw

Once the JSON file is opened, the rendering is generated by the JavaScript vis
library, which computes the position of the nodes.

From the interface, it is possible to search for a specific node, color the
nodes, expand merged switches, show statistics, export the topology as an
image, etc.

The user can interact with the nodes by moving them. For now, there are bugs,
and other nodes might move too.

The placement of the nodes is done statically if the topology is detected as a
tree. If not, vis.js uses physics to find good positions, which can be very
time-consuming.

\image html netloc_draw.png
\image latex netloc_draw.png "" width=15cm

\page netloc_scotch Netloc with Scotch

\htmlonly
\endhtmlonly

Scotch is a toolbox for graph partitioning [XXX] that can map a communication
graph onto an architecture. Netloc interfaces with Scotch by getting the
topology of the machine and building the corresponding Scotch architecture. It
is also possible to directly build a mapping file that can be given to
\c mpirun.

\htmlonly
\endhtmlonly
\section scotch_intro Introduction

Scotch is able to deal with architectures that represent the topology of a
complete machine. Scotch handles several types of topologies: complete graphs,
hypercubes, fat trees, meshes, tori, and random graphs. Moreover, Scotch is
able to manage parts of architectures, which are called sub-architectures.
Thus, from a complete architecture, we can create a sub-architecture that
represents the available resources of the complete machine.

\htmlonly
\endhtmlonly
\section scotch_setup Setup

The first step in using the Netloc tools is to discover the network. For this
task, we provide tools called netloc_gather that are wrappers around the
dedicated tools provided by the manufacturer of the network and that generate
the raw data given by the devices. This task needs privileges to access the
network devices. Once this task is completed, the raw data is converted by
extract_dats into a generic format that is independent of the fabric. Figure 1
shows how the different modules of Netloc are linked and which tools Netloc
provides.

\htmlonly
\endhtmlonly
\section scotch_tools_api Tools and API

Once the machine is discovered and all the needed files are generated as seen
previously, a user can call the netlocscotch functions from the API and
interact with Scotch.

\subsection netlocscotch_arch Build Scotch architectures

Netloc provides a function, called netlocscotch_build_arch, that exports the
built topology into the Scotch format. This lets the user work with the
topology in Scotch. Since Netloc matches the discovered topology with known
topologies, the Scotch architecture will not be a random graph but a topology
that is also known to Scotch, which leads to optimized graph algorithms.

When the network topology is a tree, the topology converted by netlocscotch is
the complete topology of the machine, including the intranode topologies from
hwloc. In this case, merging the two levels results in a bigger tree. For other
network topologies, the global graph created for Scotch is a generic graph,
since it is not (at this moment) possible to create nested known architectures.

\subsection netlocscotch_subarch Build Scotch sub-architectures

Most of the time, the user does not have access to the complete machine. They
use a resource manager to run their application and gain access only to a set
of nodes. In this case, getting the Scotch architecture of the complete machine
is not relevant. Fortunately, Netloc is also able to build a Scotch
sub-architecture that contains only the available nodes.

For this operation, the user needs to run a specific program,
netloc_get_resources, that uses MPI and hwloc to record in a file the lists of
available nodes and available cores. From this file, the function
netlocscotch_build_subarch builds the Scotch sub-architecture.

\subsection netlocscotch_mapping Mapping of processes

A main goal of having all this data about the network topology, especially in
Scotch structures, is to help with process placement. For that, we use the
mapping of a process graph onto the architecture, as provided by Scotch.

As we have seen previously, Netloc is able to detect the structure of the
topology and builds the matching Scotch architecture, which is more efficient
than a random structure. When the network topology is not a tree, netlocscotch
converts the complete topology into a generic graph. The drawback is that the
Scotch graph algorithms are then less efficient. To overcome that, netlocscotch
performs the mapping in two steps: first it maps the processes to the nodes,
and then, for each node, it maps the processes to the cores. Tests still have
to be conducted to check whether this method gives better results than using a
generic graph directly.

The other input needed by Scotch is the process graph. Since we want to
optimize the placement to decrease the communication time, a good metric for
building the application graph is the amount of communication between all
pairs of processes. Further study is needed to determine the most effective way
to quantify this amount of communication, for example from the number of
messages, the size of the messages, etc. This information is then transformed
into a process graph.

Once we have a good mapping computed by Scotch, we can give it to the user, or
Netloc can even generate the corresponding rank file for MPI.

*/