Sophia:Network
{{Portal|Network}}
{{Portal|User}}
{{Maintainer|Vidal WATAT}}


'''See also:''' [[Sophia:Hardware|Hardware description for Sophia]]

= Overview of Ethernet network topology =

[[File:SophiaNetwork.svg|600px]]


{{:Sophia:GeneratedNetwork}}

== Network devices models ==

* gw: Foundry
* swx1nef: Juniper EX2500-24F-FB
* swy1nef: Dell PowerConnect 6248

More details (including address ranges) are available from the [[Grid5000:Network]] page.

= Old information (possibly outdated) =


[[Image:Sophia_modified.jpg|center|thumb|Physical view of the compute network|600px]]

== Gigabit Ethernet ==


=== Suno Cluster (Dell R410) ===

All the nodes are connected (in a non-blocking way) to the main Foundry FastIron Super X switch.


=== UVB Cluster ===

All the nodes are connected to two stacked Dell PowerConnect 6248 switches. The PowerConnect stack is connected to the Foundry FastIron Super X through a 10Gbps link.


=== Topology ===

The main switch is a Foundry FastIron Super X. It has 2 dual 10Gb modules, 4 modules with 24 Gigabit ports, and 12 Gigabit ports on the management module (so 60 Gigabit ports are available). 4 slots are currently free.

== High Performance networks ==


=== Infiniband 40G on uvb ===

The uvb cluster nodes are all connected to 40G InfiniBand switches. Since this cluster is shared with the Nef production cluster at INRIA Sophia, InfiniBand partitions are used to isolate the nodes from Nef while they are available on Grid5000. The partition dedicated to Grid5000 is '''0x8100'''. The IPoIB interfaces on the nodes are therefore named ib0.8100 instead of ib0.

To use the native openib driver of '''openmpi''', you must set: <code>btl_openib_pkey = 0x8100</code>
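For example, the parameter can be set once in the per-user MCA configuration file (a sketch; the path is Open MPI's default per-user location, and <code>./my_app</code> is a placeholder for your own MPI program):

```shell
# Persist the InfiniBand partition key for Open MPI's openib BTL
# in the per-user MCA parameter file (Open MPI's default location).
mkdir -p "$HOME/.openmpi"
echo "btl_openib_pkey = 0x8100" >> "$HOME/.openmpi/mca-params.conf"

# Or pass it on the command line for a single run
# ("./my_app" is a placeholder for your MPI program):
#   mpirun --mca btl_openib_pkey 0x8100 ./my_app

# Check that the setting is in place:
grep btl_openib_pkey "$HOME/.openmpi/mca-params.conf"
```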


==== Nodes ====

* <code class="host">uvb-1</code> to <code class="host">uvb-44</code> have one QDR Infiniband card.
* Card model: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s]
* Driver: <code class="dir">mlx4_ib</code>
* OAR property: ib_rate=40
* IP over IB addressing:
** <code class="host">uvb-[1..44]-ib0</code>.sophia.grid5000.fr (172.18.132.[1..44])
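The addressing scheme maps the node number directly to the last IP octet. A small illustrative helper (hypothetical, not part of the platform tooling) makes the mapping explicit; nodes carrying the OAR property can typically be selected with <code>oarsub -p "ib_rate=40"</code>:

```shell
# Hypothetical helper illustrating the IP-over-IB addressing scheme:
# uvb-N-ib0.sophia.grid5000.fr has address 172.18.132.N (N in 1..44).
uvb_ib_addr() {
  n="$1"
  if [ "$n" -lt 1 ] || [ "$n" -gt 44 ]; then
    echo "uvb-$n does not exist" >&2
    return 1
  fi
  echo "172.18.132.$n"
}

uvb_ib_addr 7    # prints 172.18.132.7
```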


==== Switch ====

* three Mellanox IS50xx QDR InfiniBand switches
* Topology available here: https://wiki.inria.fr/ClustersSophia/Network (uvb nodes are nef084-nef140 on the nef production cluster)


==== Interconnection ====

The InfiniBand network is physically isolated from the Ethernet networks. Therefore, an Ethernet network emulated over InfiniBand is isolated as well: there is no interconnection, neither at the data link layer nor at the network layer.

==== Architecture ====

View of the baie (rack).

''Latest revision as of 13:49, 12 March 2024''