CSC2039 - Architecture and networks

Architecture and Networks

Networks

Layers and the OSI Model

Data structures

Many third-generation languages (Pascal, C, etc.) allow the programmer to package the basic types together into user-defined structures. This makes for better data management in the program. A struct is like a row in a relational database table.

struct employee {
    int iTax;
    double dSalary;
    char cName[32];
};

The compiler lays these variables out in memory in declaration order, back to back apart from any padding it inserts for alignment. A data structure can therefore be treated as a whole unit, a set of bytes, when it is copied or moved around in memory.

A structure can be sent byte by byte (across a wire or WiFi) from one computer to another. Clearly there is an order from front to back, or top to bottom.

Data structures are fundamental to writing networking software.
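
As a rough sketch of how such a structure becomes a sequence of bytes that can be sent across a wire, the employee record can be packed with Python's struct module. The format string, the big-endian byte order and the function names are assumptions made for this illustration; the notes do not prescribe a wire format.

```python
import struct

# Assumed wire layout for the employee record: a 4-byte int, an 8-byte
# double and a 32-byte name, in network (big-endian) order with no padding.
EMPLOYEE_FORMAT = "!id32s"

def pack_employee(tax: int, salary: float, name: str) -> bytes:
    """Serialise one record; struct pads the name with zero bytes up to 32."""
    return struct.pack(EMPLOYEE_FORMAT, tax, salary, name.encode("ascii"))

def unpack_employee(data: bytes):
    """Rebuild the record; the receiver must know the same format string."""
    tax, salary, raw_name = struct.unpack(EMPLOYEE_FORMAT, data)
    return tax, salary, raw_name.rstrip(b"\x00").decode("ascii")

record = pack_employee(20, 31500.0, "Ada")
print(len(record))              # 4 + 8 + 32 bytes
print(unpack_employee(record))
```

The receiver can only make sense of the bytes because it uses the same format string as the sender.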

Data structure definition

The programmer of the application that creates the data decides on the definition of its data structures. This decision matters most when the application is going to be used by many people and worked on by many programmers.

If the data is sent over a network, the receiving program must have been programmed to know the format of the data structures that it will receive; otherwise there will be chaos.

In the early days of the Internet, the mechanism for agreeing on these formats was called the Request for Comments (RFC). The name has survived to the present day, and RFCs are managed by the Internet Engineering Task Force (IETF).

Data structures for networking

The internet protocol suite nests data structures. Each nesting is called a layer. Each layer is used to handle a unique and well defined part of the process of transmitting information from sender to receiver.

This means that in addition to a formatted structure at each layer, the operations for software to process that layer also have to be defined.

In the 1970s, when IP was invented, it was thought that five layers would be sufficient. In the 1980s, the ISO worked out that seven layers are required in order to describe a network completely.

The original five layers for the Internet Protocol

[Figure: the original five layers of the Internet Protocol]

Abstract models in science and mathematics

In science and mathematics, we often seek to define an abstract model for a process. We distinguish an abstract model from an implementation of it. The model defines its components, the functionality of each component and the relationships between the components.

Different implementations may be compared to the abstract model and contrasted with each other.

The OSI seven layer abstract model

Protocol suites

The OSI model is a set of generic definitions and actions. If we make specific choices for the protocols at each layer and then implement software for them, we have defined a protocol suite. This means that a protocol suite is an implementation of the model.

The Internet protocol suite is one such example. In its original form there are no protocols at layers five and six.

There are in fact different versions of the IP suite. IPv4 is in common global use. IPv6 is an improved version, but has not yet replaced IPv4 addressing everywhere.

Advantages of the OSI model

By separating the network communications into logical smaller pieces, the OSI model simplifies how network protocols are designed. The OSI model was designed to ensure that different types of equipment, such as switches/routers, NICs, etc, would all be compatible even if they were built by different manufacturers. This also opened up the market to competition.

Furthermore, adding new protocols and other network services is generally easier in a layered architecture than in a monolithic one.

Disadvantages of the OSI model

The choice of a layered model for the network is an engineering design choice. A disadvantage of a layered model is that there is latency in processing through all of the layers. In high-performance markets where every microsecond is important, this can be an issue. Research efforts today include cross-layer optimisation.

IP suite (formerly TCP/IP)

Layer #  OSI layer     TCP/IP layer       TCP/IP protocols
7        Application   Application        HTTP, FTP, Telnet, SMTP, DNS
6        Presentation
5        Session
4        Transport     Transport          TCP, UDP
3        Network       Network            IP
2        Data link     Network interface  Ethernet, Token Ring, other link-layer protocols
1        Physical

Where OSI layers are implemented

Layer 1 - Physical - hardware - NIC or WiFi card.
Layers 2-4 - Operating system - Ethernet, IP, TCP, UDP are in the kernel of the OS.
Layers 5-6 - Implemented in the application
Layer 7 - The application itself

The term socket dates from the early 1970s, and the sockets programming interface for the C language was introduced with BSD Unix in the early 1980s. The socket concept is now available in many languages. Application programmers use sockets to interact with layer four.

 

The network layer

Data structures for networking

The IP suite nests data structures. Each nesting is called a layer. Each layer is used to handle a unique and well defined part of the process of transmitting information from sender to receiver. This means that in addition to a formatted structure at each layer, operations for software to process that layer have to be defined.

In the 1970s, when IP was invented, it was thought that five layers could be distinguished. In the 1980s, the ISO worked out that seven were required to describe a network completely.

Original five layers

Seven layer model

IPv4 address classes

Classes were originally introduced to assist routing. Today routing is classless, using CIDR and VLSM. However, the class names are still used in networking.

Class Lower Upper Binary start
A 0.0.0.0 127.255.255.255 0
B 128.0.0.0 191.255.255.255 10
C 192.0.0.0 223.255.255.255 110
D 224.0.0.0 239.255.255.255 1110
E 240.0.0.0 255.255.255.255 1111
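
The "binary start" column can be turned into a small classifier. This is only a sketch: the function name is hypothetical, and it inspects just the first octet, which is enough because the class is fixed by the leading bits.

```python
def ipv4_class(address: str) -> str:
    """Return the historical class (A to E) of a dotted-quad IPv4 address."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("not a valid IPv4 octet: %d" % first_octet)
    if first_octet < 128:   # leading bit   0
        return "A"
    if first_octet < 192:   # leading bits  10
        return "B"
    if first_octet < 224:   # leading bits  110
        return "C"
    if first_octet < 240:   # leading bits  1110
        return "D"
    return "E"              # leading bits  1111

for example in ("10.1.2.3", "172.16.0.1", "192.168.1.1", "224.0.0.5"):
    print(example, ipv4_class(example))
```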

Unicast

Unicasting is when one computer talks to one other computer, for example a web browser talking to a server. This is enabled with TCP or UDP at layer 4.

Multicast

Multicasting is when one computer sends the same message to several computers, for example a stock exchange sending to its brokers. This is done with UDP multicast.

Broadcast

Broadcasting is a mechanism by which a computer sends a message to all computers on its own network.

IPv4 addresses

Private addresses

Certain addresses in the IPv4 protocol are reserved for private use. They should never be routed onto the public Internet.

Class Lower Upper
A 10.0.0.0 10.255.255.255
B 172.16.0.0 172.31.255.255
C 192.168.0.0 192.168.255.255

A private network may be built and then connected to the public Internet using a technique known as Network/Port Address Translation (NAT/PAT).
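
A minimal sketch of testing whether an address falls in one of the three private ranges listed above, using Python's standard ipaddress module. The constant and function names are illustrative; the CIDR prefixes are simply the table's ranges written in prefix notation.

```python
import ipaddress

# The three private ranges from the table, written as CIDR prefixes.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private(address: str) -> bool:
    """True if the address lies in one of the private IPv4 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)

print(is_private("172.31.255.255"))
print(is_private("172.32.0.1"))
```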

Loopback

127.0.0.1 is a special IPv4 address. This is the loopback within the computer. IP packets are sent from the Transmit (Tx) to the Receive (Rx) within the same computer. This allows for testing of the software layers but not the physical layers. This is a good way to test a new piece of client/server software without setting up or configuring a network.

ARP and DHCP

Address resolution

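The hardware address of a NIC, known as its MAC address, is fixed at manufacture time. The MAC address is used at layer 2 to communicate on the local network.

MAC addresses are managed by the IEEE. They are 48 bits long (6 bytes). They are partitioned into a 24-bit Organisationally Unique Identifier (OUI), which the IEEE assigns to the manufacturer, and a 24-bit value that the manufacturer assigns to the individual device.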
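
As an illustration, the sketch below splits a MAC address into its first three bytes (the manufacturer prefix assigned through the IEEE) and its last three bytes (chosen by the manufacturer). The function name and the sample address are made up for the example.

```python
def split_mac(mac: str):
    """Split a MAC address into (manufacturer prefix, device part)."""
    octets = mac.replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(octet) != 2 for octet in octets):
        raise ValueError("expected six two-digit hex octets: %r" % mac)
    for octet in octets:
        int(octet, 16)  # raises ValueError if an octet is not hexadecimal
    return ":".join(octets[:3]), ":".join(octets[3:])

prefix, device = split_mac("00:1A:2B:3C:4D:5E")
print(prefix, device)
```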

IP address

The IP address of a device is managed in software, and it can be changed. When a computer is connected to an Ethernet or Wifi network, it can be configured with a predefined, static address or request a dynamic IP address from a resource on the network called a DHCP server.

DHCP - Dynamic host configuration protocol

  1. The computer broadcasts a UDP packet onto the network.
  2. Since the computer has no IP address yet and the IP address of the DHCP server is also unknown at this point, the IP source and destination are set to 0.0.0.0 and 255.255.255.255 respectively.
  3. If a DHCP server is present on the LAN, a dialogue follows in which the server offers an address, the computer requests it and the server acknowledges the request.


Encapsulation of an ARP packet

ARP is the process of getting the MAC address assigned to a particular IP address.

  1. An ARP packet is created.
  2. It is encapsulated directly in an Ethernet frame (ARP is not carried inside an IP packet).
  3. The frame is sent on the LAN.

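ARP is designed to be general and could be used for network protocols other than IPv4. IPv6 does not use ARP; it uses the Neighbor Discovery Protocol instead.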

Caching ARP

Network traffic can be increased substantially by sending requests and waiting for responses. To reduce network traffic, ARP software extracts and saves the information from a response so that it can be used for subsequent packets.

The information is not stored permanently. ARP maintains a small table of bindings in memory as a cache, and searches the cache before using the network to get an address.

If there is a binding present in the cache, ARP uses the binding without transmitting a request. If there is no binding present, ARP broadcasts a request on the network for the address.

The ARP cache can be seen by running arp -a.
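
The caching behaviour described above can be sketched as follows. This is a toy model, not how any operating system implements ARP: the class, the 60-second lifetime and the broadcast_request callback are all assumptions made for illustration.

```python
import time

class ArpCache:
    """Toy ARP cache: bindings from IP address to MAC address that expire."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds          # assumed lifetime of a binding
        self._bindings = {}             # ip -> (mac, expiry time)

    def add(self, ip, mac, now=None):
        now = time.monotonic() if now is None else now
        self._bindings[ip] = (mac, now + self.ttl)

    def lookup(self, ip, now=None):
        now = time.monotonic() if now is None else now
        entry = self._bindings.get(ip)
        if entry is None:
            return None
        mac, expires = entry
        if now >= expires:              # binding too old: forget it
            del self._bindings[ip]
            return None
        return mac

def resolve(cache, ip, broadcast_request):
    """Search the cache first; only use the network on a miss."""
    mac = cache.lookup(ip)
    if mac is None:
        mac = broadcast_request(ip)     # hypothetical network call
        cache.add(ip, mac)
    return mac
```

Calling resolve twice for the same IP address triggers only one broadcast, which is the saving in network traffic that the cache is meant to provide.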

RARP - Reverse address resolution protocol

RARP is the reverse of ARP. If the hardware address is known and the IP address is needed, RARP is used. It is used by diskless workstations to obtain their IP addresses. When a workstation boots up it broadcasts its hardware address; the RARP server detects the request, looks up the hardware address in its configuration file to obtain the IP address, and replies to the sender with that address.

 

IP Routing

IP header

A router receives a packet on one interface and has to work out the other interface on which it needs to forward that packet. This is done by examining the destination IP address.

IP Version: Indicates IP version used by the packet. 4 indicates IPv4, 6 indicates IPv6.
IP Header Length: Indicates the header length in 32-bit words. Typical IPv4 packets with a header length of 20 bytes have a value of 5, meaning five 32-bit words. The IPv4 header is not a fixed length: when IP options are included, a length of up to 60 bytes is allowed.
Type of Service: Specifies how an upper-layer protocol would like packets to be queued and processed by network elements as they are forwarded through the network. Usually set to zero, but may be assigned a different value to indicate another level of importance.
Total Length: Specifies the length, in bytes, of the entire IP packet, including data and IP header.
Identification: Contains an integer that identifies the current datagram. Used during reassembly of fragmented datagrams.
Flags: Consists of a 3-bit field, the two lower-order bits of which control fragmentation. The high-order bit is not used and must be set to 0. The middle "don't fragment" bit specifies whether the packet is permitted to be fragmented. The low-order "more fragments" bit specifies whether the packet is the last fragment in a series of fragmented packets.
Fragment Offset: Provides the position, in units of 8 bytes, of the fragment's data relative to the start of the data in the original datagram, which allows the destination IP process to properly reconstruct the original datagram.

Time to Live (TTL): Specifies the maximum number of links/hops that the packet may be routed over. This counter is decremented by one by each router that processes the packet while forwarding it towards its destination. When the TTL value reaches zero, the datagram is discarded. This prevents packets from looping endlessly, as would otherwise occur during accidental routing loops.
Protocol: Indicates which upper-level protocol receives incoming packets after IP processing is complete. Normally, this indicates the type of payload being carried by IP. For example, a value of 1 indicates IP is carrying an ICMP packet, 6 indicates a TCP segment and 17 indicates a UDP packet.
Header Checksum: A one's complement checksum of the header, inserted by the sender and updated by each router that modifies the header while forwarding the packet towards its destination. The header checksum is used to detect errors that may be introduced into the header as the packet traverses the network. Packets with an invalid checksum are discarded by any receiving node in the network.
Source Address: Specifies the unique IP address of the sending node (the originator of the IP packet).
Destination Address: Specifies the unique IP address of the receiving node.
IP Options: Allows IP to support various options, such as timestamp, record route, and strict source route. IP options are not normally used.
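
The fixed 20-byte part of the header can be decoded with a short sketch such as the one below. The function name, the dictionary keys and the sample values are assumptions for illustration; options and checksum verification are deliberately left out.

```python
import socket
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the first 20 bytes of an IPv4 packet (options are ignored)."""
    (version_ihl, tos, total_length, identification, flags_fragment,
     ttl, protocol, checksum, source, destination) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_length_bytes": (version_ihl & 0x0F) * 4,   # 32-bit words
        "type_of_service": tos,
        "total_length": total_length,
        "identification": identification,
        "dont_fragment": bool(flags_fragment & 0x4000),
        "more_fragments": bool(flags_fragment & 0x2000),
        # The 13-bit offset field counts units of 8 bytes.
        "fragment_offset_bytes": (flags_fragment & 0x1FFF) * 8,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "source": socket.inet_ntoa(source),
        "destination": socket.inet_ntoa(destination),
    }

# A hand-built sample header (checksum left at zero for simplicity).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1C46, 0x4000, 64, 6, 0,
                     socket.inet_aton("192.168.0.1"),
                     socket.inet_aton("10.0.0.5"))
fields = parse_ipv4_header(sample)
print(fields["version"], fields["ttl"], fields["protocol"], fields["destination"])
```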

Routing table

The key to choosing the interface on which the packet should be forwarded is to find the longest match in the routing table for the destination address. If there are two or more equivalent matches, a tie breaker is applied.
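
A minimal sketch of a longest-match lookup, assuming a routing table held as a list of (prefix, interface) pairs. The table contents and interface names are invented for the example; real routers use specialised data structures rather than a linear scan.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, outgoing interface).
ROUTES = [
    ("0.0.0.0/0", "eth0"),      # default route matches everything
    ("10.0.0.0/8", "eth1"),
    ("10.1.0.0/16", "eth2"),
]

def longest_match(destination: str, routes):
    """Return the interface of the most specific prefix containing the address."""
    ip = ipaddress.ip_address(destination)
    best = None
    for prefix, interface in routes:
        network = ipaddress.ip_network(prefix)
        if ip in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, interface)
    return best[1] if best else None

print(longest_match("10.1.2.3", ROUTES))
print(longest_match("8.8.8.8", ROUTES))
```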

Tie breaking process

Distance and metric

Routers also exchange information with each other using routing protocols. Examples are OSPF and RIP. In addition to IP addresses, they send information that qualifies the various links and sources of information. Administrative distance is a measure of the trustworthiness of the information, since some routing protocols are better than others. The lower the value, the more trusted the information is.

Metric

Metric is a number calculated by a routing protocol using some algorithm - different protocols use different algorithms. In some sense it is a crude estimate of distance or latency.
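
One way to picture the tie breaker is as a comparison on administrative distance first and metric second. The sketch below assumes exactly that ordering; the Route class is hypothetical, and the distances 110 and 120 are the defaults commonly quoted for OSPF and RIP on Cisco routers, used here only as example values.

```python
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    interface: str
    admin_distance: int   # trustworthiness of the source, lower is better
    metric: int           # protocol-specific cost, lower is better

def break_tie(candidates):
    """Among routes with the same prefix, prefer the most trusted source,
    then the lowest metric."""
    return min(candidates, key=lambda route: (route.admin_distance, route.metric))

candidates = [
    Route("10.1.0.0/16", "eth1", admin_distance=120, metric=3),    # e.g. learned via RIP
    Route("10.1.0.0/16", "eth2", admin_distance=110, metric=20),   # e.g. learned via OSPF
]
print(break_tie(candidates).interface)
```

Note that metrics from different protocols are not comparable with each other, which is why the administrative distance is compared first.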

NAT and PAT

When a packet is sent from the private address space to the public address space, its source IP address is overwritten. When a packet is sent from the public address space to the private address space, its destination IP address is overwritten.

The implementation of a software stack for routing is made more complex by NAT/PAT.

OSPF - Open shortest path first

Routers tell each other what they know about their links on the network. These are messages encapsulated by an IP packet. Every router knows all there is about the whole network. This is the definition of link-state.

The network can be represented as a mathematical graph. Each router computes the shortest path to any other point in the network using Dijkstra's algorithm and updates its own routing table.
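
A compact sketch of Dijkstra's algorithm over a small graph of routers follows. The graph, its link costs and the function name are invented for the example; it only computes distances, whereas a real OSPF implementation would also record the next hop for the routing table.

```python
import heapq

def shortest_paths(graph, source):
    """Return the cost of the cheapest path from source to every reachable node."""
    distances = {source: 0}
    queue = [(0, source)]                       # (cost so far, node)
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > distances.get(node, float("inf")):
            continue                            # stale queue entry
        for neighbour, link_cost in graph[node].items():
            candidate = cost + link_cost
            if candidate < distances.get(neighbour, float("inf")):
                distances[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return distances

# Hypothetical network: routers A to D with symmetric link costs.
NETWORK = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(shortest_paths(NETWORK, "A"))
```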

Transmission Control Protocol

Relevant RFCs

The programmer interacts with TCP via sockets. There is a separate socket for each connection. Servers and clients operate differently with regard to setting up a connection. The server is bound to an address and waits for a connection to be initiated. The client actively connects to the server.

The TCP protocol runs inside the operating system. The complexity is hidden from the programmer. Exceptions would be thrown if the connection drops.

connection.close();
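
The server and client roles can be sketched with Python's socket module, run here over the loopback address so that no network needs to be configured. The function names are illustrative and the example handles a single short message only; the with blocks close each socket, which plays the part of connection.close().

```python
import socket
import threading

def run_echo_server(server_socket):
    """Server side: wait for one client, echo one message back, then close."""
    connection, _address = server_socket.accept()
    with connection:
        data = connection.recv(1024)
        connection.sendall(data)

def echo_once(message: bytes) -> bytes:
    # The server is bound to an address and waits (port 0 = any free port).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_socket:
        server_socket.bind(("127.0.0.1", 0))
        server_socket.listen(1)
        port = server_socket.getsockname()[1]
        worker = threading.Thread(target=run_echo_server, args=(server_socket,))
        worker.start()
        # The client actively connects to the server.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply
```

Calling echo_once(b"hello") should return the same bytes; the handshake, acknowledgements and teardown all happen inside the operating system.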

TCP - Three key features

TCP maintains a reliable connection between hosts, so it does a lot of work behind the scenes for the programmer. It also manages the efficient packaging and transmission of network traffic.

Virtual circuits

TCP connections are like live, two way connections between hosts. They give the illusion of a proper circuit between two hosts.

Reliable connections

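TCP guarantees that data is delivered to the application complete and in order, by acknowledging received data and retransmitting anything that is lost. If delivery proves impossible, the connection fails with an error rather than silently losing data.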

Performance optimisation

TCP includes mechanisms to modify transmission variables depending on network conditions.

How it is achieved

TCP uses a Finite State Machine (FSM) at each end of the connection. This is implemented in software within the operating system. Remember that there is one FSM for every TCP connection (i.e. socket) within the OS, so maintaining many connections is a heavy workload.

TCP exchanges events between the two ends to drive state transitions. The events are bit flags which are set (or not) in the TCP header.

Finite state machines

A finite-state machine (FSM) or finite-state automaton (FSA), finite automaton, or simply a state machine, is a mathematical model of computation.

It is an abstract machine that can be in exactly one of a finite number of states at any given time. The FSM can change from one state to another in response to some external inputs; the change from one state to another is called a transition.

An FSM is defined by a list of its states, its initial state and the conditions for each transition. In TCP there is an FSM at each side of the connection.
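
A state machine of this kind can be sketched as a table of transitions. The table below is a simplified subset of the TCP states, and the event names are hypothetical labels chosen for the example rather than anything defined by the protocol.

```python
# (current state, event) -> next state. A simplified subset of TCP's FSM.
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN_SENT",
    ("CLOSED", "passive_open"): "LISTEN",
    ("LISTEN", "recv_syn"): "SYN_RCVD",
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",
}

class StateMachine:
    """Holds exactly one state at a time and moves only along listed transitions."""

    def __init__(self, initial: str = "CLOSED"):
        self.state = initial

    def fire(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError("no transition for %r in state %s" % (event, self.state))
        self.state = TRANSITIONS[key]
        return self.state

client = StateMachine()
client.fire("active_open")
client.fire("recv_syn_ack")
print(client.state)
```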

TCP header

TCP flag bits

URG

Set to 1 if the urgent pointer is in use.

ACK

Indicates acknowledgement number is valid.

PSH

Data in the TCP segment should be sent to application as soon as possible. Data is pushed to the application.

RST

Connection must be reset.

SYN

Synchronisation of sequence numbers is taking place, as happens when a connection is being opened.

FIN

Finish, no more data needs to be sent.
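
The six flags described above each occupy one bit of the flags field in the TCP header, so they can be decoded with bit masks. The helper names below are illustrative.

```python
# Bit position of each flag within the TCP flags field.
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}

def decode_flags(flag_byte: int):
    """List the names of the flags that are set."""
    return [name for name, bit in TCP_FLAGS.items() if flag_byte & bit]

def encode_flags(*names: str) -> int:
    """Combine named flags into a single value."""
    value = 0
    for name in names:
        value |= TCP_FLAGS[name]
    return value

print(decode_flags(0x12))   # the second segment of a handshake
```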

Three-way handshake

The handshake between Host A and Host B proceeds as follows:

  1. Host A sends a TCP open request with the SYN bit set and an initial sequence number of 167.
  2. Host B receives the open request.
  3. Host B sends back a segment with the SYN and ACK bits set, a sequence number of 498 and an acknowledgement number of 168.
  4. Host A receives the acknowledgement segment.
  5. Host A sends back a segment with the ACK bit set and an acknowledgement number of 499.
  6. Host B receives the acknowledgement segment.
  7. The handshake is complete.

TCP uses a three-way handshake to guarantee that connections are established or terminated correctly. Three messages are exchanged. The term SYN segment describes messages that create a connection; the term FIN segment describes messages that close a connection.
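
The arithmetic of the handshake is simply "acknowledge the other side's sequence number plus one". A small sketch, with a hypothetical function name and dictionary layout, reproduces the numbers used in the example above:

```python
def three_way_handshake(isn_a: int, isn_b: int):
    """Return the three segments exchanged, given each host's initial
    sequence number. Sequence numbers are 32-bit and wrap around."""
    wrap = 2 ** 32
    syn = {"from": "A", "flags": "SYN", "seq": isn_a, "ack": None}
    syn_ack = {"from": "B", "flags": "SYN|ACK", "seq": isn_b,
               "ack": (isn_a + 1) % wrap}
    ack = {"from": "A", "flags": "ACK", "seq": (isn_a + 1) % wrap,
           "ack": (isn_b + 1) % wrap}
    return [syn, syn_ack, ack]

for segment in three_way_handshake(167, 498):
    print(segment)
```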

Closing

Flow control

Fields in the TCP header allow the two ends to signal each other and so control the flow rate of data. In essence, what is controlled is the number of unacknowledged bytes of data in flight.

Window sizing

The windowing process essentially allows either side to tell the other how much data it can receive at any instant. This prevents buffer overruns, for example. It is achieved with the window field in the TCP header together with the sequence and acknowledgement numbers.

User Datagram Protocol

UDP is defined in RFC 768 as a minimal message-oriented transport layer protocol.

This protocol provides a procedure for application programs to send messages to other programs with a minimum of protocol mechanism. The protocol is transaction oriented, and delivery and duplicate protection are not guaranteed. Applications requiring ordered, reliable delivery of streams of data should use the Transmission Control Protocol.

UDP is a simple protocol: there is no error correction, no connection-oriented link and no verification of delivery order, and error detection is limited to a checksum. It is a simple datagram delivery service. Without added features, it is easy to implement with minimal overhead. It can be used for low-intensity tasks performed in the background, such as routine network monitoring.

Properties

Datagrams

Data is sent across the network layer in finite packets. Each datagram contains a header and a payload, and carries a single message. A message may be a request or a reply to a request. Datagrams provide minimal transport layer functions and use a minimal header structure.

Datagram header

The UDP header contains the port to which the packet is going (destination) and the port from which it was sent (source), together with a length and a checksum. The corresponding addresses are carried in the IP header that encapsulates the datagram.
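
The UDP header itself is only eight bytes: source port, destination port, length and checksum, each 16 bits wide. The sketch below packs and unpacks it; the function names are illustrative, and the checksum is left at zero, which IPv4 permits to mean "not computed".

```python
import struct

UDP_HEADER = "!HHHH"   # source port, destination port, length, checksum

def build_datagram(source_port: int, destination_port: int, payload: bytes) -> bytes:
    """Prefix the payload with an 8-byte UDP header (checksum not computed)."""
    length = 8 + len(payload)           # the length field covers header + payload
    header = struct.pack(UDP_HEADER, source_port, destination_port, length, 0)
    return header + payload

def parse_datagram(datagram: bytes) -> dict:
    source_port, destination_port, length, checksum = struct.unpack(
        UDP_HEADER, datagram[:8])
    return {
        "source_port": source_port,
        "destination_port": destination_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[8:length],
    }

datagram = build_datagram(50000, 13, b"time?")
print(parse_datagram(datagram))
```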

Well known port examples

Application protocol  Port  Transport protocol
Echo                  7     TCP/UDP
Daytime               13    TCP/UDP
SMTP                  25    TCP
Time                  37    TCP/UDP
HTTP                  80    TCP
POP3                  110   TCP

Ephemeral ports

Well-known ports are those below 1024; the range above this from which ephemeral ports are allocated depends on operating system configuration. Server software needs a specific port on which to listen for incoming traffic, but it does not need to operate out of the same port at all times. Ephemeral ports allow each client session to run on a unique port.

DatagramSocket class

A DatagramSocket is required to send or receive a DatagramPacket. All datagram sockets are bound to a local port, which listens for incoming data and whose number is placed in the datagram header of outgoing data.
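
DatagramSocket is a Java class; the same idea can be sketched with Python's socket module, which is the language used for the other examples here. The function name is hypothetical and the exchange runs over the loopback address. The sender is never bound explicitly, so the operating system assigns it an ephemeral port.

```python
import socket

def send_and_receive(message: bytes):
    """Send one datagram over loopback and return (data, sender IP, sender port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
        receiver.settimeout(5)              # do not wait forever if it is lost
        sender.sendto(message, receiver.getsockname())
        data, (sender_ip, sender_port) = receiver.recvfrom(1024)
        return data, sender_ip, sender_port

data, sender_ip, sender_port = send_and_receive(b"ping")
print(data, sender_ip, sender_port)
```

There is no connection set-up and no acknowledgement: the datagram is simply handed to the network.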

UDP multicast

A and B multicast channels

A single class D IP address and port define one multicast channel.

To mitigate the unreliability of UDP, the data can be sent over two different channels. It is critical that these are routed differently through the network, which is known as geographically diverse routing. These are known in the industry as the A and B channels.

Mitigation of UDP unreliability

Following on from the above, the client software will receive the same message twice (or possibly still not at all). The client software has to be programmed to handle this. Every message has to carry a unique number so that the client software can detect three cases: receiving the same message a second time, gaps in the sequence of messages, and messages that never arrive.

Line arbitrage is the algorithm used to receive ONE and only ONE instance of a message - arbitrate between line A and B. There is a fall back TCP connection to the exchange to request any missed messages. Note that there is also continuous rebroadcasting on a separate set of UDP multicast channels.
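
A minimal sketch of line arbitrage based on sequence numbers is shown below. The class and method names are hypothetical, and the sketch only tracks which numbers are missing; requesting them over the fall-back TCP connection is left out.

```python
class LineArbiter:
    """Pass each sequence number to the application once, whichever line
    (A or B) delivers it first, and keep track of gaps."""

    def __init__(self, first_sequence: int = 1):
        self.next_expected = first_sequence
        self.missing = set()        # sequence numbers skipped so far

    def on_message(self, sequence: int, payload):
        """Return the payload the first time a sequence number is seen,
        or None if it is a duplicate from the other line."""
        if sequence in self.missing:
            self.missing.discard(sequence)
            return payload          # a late arrival fills a gap
        if sequence < self.next_expected:
            return None             # already delivered
        # Any numbers skipped over are recorded as gaps.
        self.missing.update(range(self.next_expected, sequence))
        self.next_expected = sequence + 1
        return payload

arbiter = LineArbiter()
print(arbiter.on_message(1, "first"))    # from line A
print(arbiter.on_message(1, "first"))    # duplicate from line B
print(arbiter.on_message(3, "third"))    # message 2 has not arrived yet
print(sorted(arbiter.missing))
```

Anything still in the missing set after a suitable wait would be requested again from the exchange.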

The burden of creating reliable communications has been shifted from layer four in the operating system onto the application programmer (the line handler). Apparently it is still faster to operate this way than to use the TCP FSMs in an ultra-low-latency environment.

Troubleshooting and tools

Tools

ifconfig

Similar to ipconfig on Windows. It allows the following:

iwconfig

Similar to ifconfig and ethtool, but for wireless cards. It is used to view and set the basic WiFi network details.

netstat

Displays the active TCP connections and the ports on which the computer is listening, Ethernet statistics, the IP routing table, and statistics for the IP, ICMP, TCP and UDP protocols. It comes with a number of options for displaying a variety of properties of the network and TCP connections.

nslookup

A command line administrative tool for testing and troubleshooting DNS servers.

traceroute / tracert

This tool sends packets with TTL values that gradually increase from packet to packet, starting with a TTL value of one. Routers decrement the TTL value of a packet by one when routing it and discard packets whose TTL value has reached zero, returning the ICMP error message ICMP Time Exceeded.

  1. Build a packet up to layer 4. UDP can be used for the layer 4 protocol, with IPv4 at layer 3. The source address is the machine running the traceroute program.
  2. For the first packet, make sure the TTL is set to 1 in the IPv4 header.
    The first router on the path receives the packet, decrements the TTL value and drops the packet because its TTL value is then zero.
    Protocol rules say the router must send an ICMP Time Exceeded message back to the source address.

However, many network administrators now block ICMP packets passing through their routers for security reasons. Router A may block the ICMP Time Exceeded message from router B, so nothing gets back to the machine running traceroute, resulting in a "request timed out" error in the traceroute utility output.
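
Sending real probes needs raw sockets and administrator rights, so the sketch below only simulates the TTL mechanism on an invented path. The function name and hop names are hypothetical, and the simulation assumes every router answers, which the paragraph above shows is not always the case.

```python
def simulate_traceroute(path, max_ttl: int = 30):
    """path lists the hops in order, routers first and the destination last.
    Returns (ttl, responder) pairs, one per probe sent."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        responder = None
        for router in path[:-1]:
            remaining -= 1              # each router decrements the TTL
            if remaining == 0:
                responder = router      # it drops the packet and reports back
                break
        if responder is None:
            hops.append((ttl, path[-1]))    # the probe reached the destination
            break
        hops.append((ttl, responder))
    return hops

print(simulate_traceroute(["router-a", "router-b", "server"]))
```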

tcptraceroute

Opening a TCP connection requires a TCP packet with the SYN flag to be sent. Such packets are unlikely to be blocked through intermediate routers, otherwise you could never establish a TCP connection with a remote site.

TCP connections to specific ports could be blocked, but there is probably at least one port open.

tcptraceroute never completely establishes a TCP connection with the destination host. If the remote end is not listening for incoming connections on the destination port used, it will respond with an RST indicating that the port is closed.

If the remote end responds with a SYN|ACK, the port is known to be open. The local machine sends an RST to tear down the connection without completing the three-way handshake.

Closing a TCP connection

There are three scenarios in which an FSM could get stuck in CLOSE_WAIT: the client initiates by telling the TCP FSM at the remote side to close the connection; the remote side initiates by sending a FIN control signal; both ends issue a CLOSE simultaneously.

Security aspects

Infrastructure protection

A network is composed of sensitive equipment: cables, switches, routers, power supplies etc. Hostile actors can compromise network security by:

Attack vectors

The IP suite means that packets are exchanged from one computer to another via a set of routers. These packets can be inspected with a suitable tool such as Wireshark.

Level 1: The application layer contents => the information
Level 2: TCP/UDP ports => the programs that are being used
Level 3: IP addresses of source and destination => where are the individuals in the exchange?

Data encryption

Start with the data known as the plaintext. An encryption algorithm is applied to create the ciphertext. A decryption scheme is needed to read the data.

The encryption algorithm is a process dependent on a key. Two options exist:

  1. Symmetric key: Only the sender and receiver know the key (shared). If the key is stolen, the scheme can be defeated.
  2. Public/private key pair: One key is used for encryption (public) and a different key is used for decryption (private). The sender cannot decrypt the data once it has been encrypted.

XOR cipher

To encrypt, apply the XOR operation with the key to each character of the plaintext. Applying the XOR operation with the same key to each character of the ciphertext regenerates the plaintext. If a message is intercepted and the encryption method is known to be an XOR cipher, the cipher can be broken by trying every possible key.
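
A sketch of a single-byte XOR cipher and of the brute-force attack on it follows. The function names, the key and the sample message are made up for the example, and the attack assumes the attacker can guess a word that appears in the plaintext.

```python
def xor_cipher(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key. The same call both
    encrypts and decrypts, because (b ^ key) ^ key == b."""
    return bytes(byte ^ key for byte in data)

def brute_force(ciphertext: bytes, known_word: bytes):
    """Try all 256 keys and keep those whose output contains a guessed word."""
    return [key for key in range(256)
            if known_word in xor_cipher(ciphertext, key)]

plaintext = b"attack at dawn"
ciphertext = xor_cipher(plaintext, 0x5A)
print(xor_cipher(ciphertext, 0x5A))
print(brute_force(ciphertext, b"attack"))
```

With only 256 possible keys the search is instant, which is why such a cipher offers no real protection.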

Convention

There is a convention that has developed when discussing cryptography. A wants to send a secret message to B. M is an attacker who wants to intercept the message.

Public key cryptography

Each user has to generate a pair of keys and then publish their public key so that people can send encrypted messages to them.

Diffie-Hellman

The simple XOR cipher is fairly easily broken. Methods that are very time consuming or impossible to break are desirable. Advanced pure mathematics provides the tools (e.g. integer factorisation, discrete logarithms, elliptic curves). Prime numbers play a key role in this.
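
The Diffie-Hellman exchange can be illustrated with deliberately tiny numbers. The prime 23, the generator 5 and the private values are toy choices for the example; real systems use primes that are thousands of bits long, which is what makes the discrete logarithm impractical to compute.

```python
def public_value(generator: int, private_key: int, prime: int) -> int:
    """The value a party publishes: generator ** private_key mod prime."""
    return pow(generator, private_key, prime)

def shared_secret(other_public: int, private_key: int, prime: int) -> int:
    """Combine the other party's public value with your own private key."""
    return pow(other_public, private_key, prime)

PRIME, GENERATOR = 23, 5           # toy public parameters
a_private, b_private = 6, 15       # kept secret by A and B respectively

a_public = public_value(GENERATOR, a_private, PRIME)
b_public = public_value(GENERATOR, b_private, PRIME)

# Each side combines the other's public value with its own private key.
a_secret = shared_secret(b_public, a_private, PRIME)
b_secret = shared_secret(a_public, b_private, PRIME)
print(a_public, b_public, a_secret, b_secret)
```

An attacker M sees only the prime, the generator and the two public values; both A and B nevertheless arrive at the same secret.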

Tools for network security

Penetration test

A penetration test is an authorised simulated attack on a computer system that looks for security weaknesses, potentially gaining access to the system's features and data.

White box: which provides background and system information
Black box: which provides only basic or no information

A penetration test should help determine whether a system is vulnerable to attack.

Denial of Service attacks

The primary purpose of a denial of service (DoS) attack is to flood a service or an application with so many unwanted packets that the affected systems are put out of business due to the workload being processed. One of the most common forms of this type of attack is the SYN attack. This is based on the packet used when a TCP connection is established. Some firewalls are now able to sense when this kind of attack is taking place.

DNS fast flux attack

Fast flux is a DNS technique used by botnets to hide phishing and malware delivery sites behind an ever-changing network of compromised hosts acting as proxies.

Software Defined Networks

Data centre operations

There are two kinds of expenditure: capital expenditure (CapEx) and operational expenditure (OpEx).

Servers in a data centre are often not run at maximum capacity.

Virtualising a server