Week 10: Private Communications I

⬅️ Week 8 | Main | Week 12 ➡️

🎯 Learning Goals

By the end of this week, you should understand:

The fundamentals of anonymous communication systems
Mix networks: design principles and security properties
Tor (The Onion Router): architecture and protocols
Traffic analysis attacks and defenses

📖 Theoretical Content

Introduction to Private Communications

Communication Privacy Challenges:

Content Privacy: Protecting message content from eavesdroppers
Metadata Privacy: Hiding who communicates with whom, when, and how often
Traffic Analysis: Inferring sensitive information from communication patterns
Global Surveillance: Mass monitoring by governments and corporations

Traditional Solutions vs. Anonymous Communications:

Encryption (TLS/HTTPS): Protects content but reveals metadata
VPNs: Single point of failure, trust required in VPN provider
Anonymous Networks: Distributed trust, protection against global adversaries

Threat Models for Communication Privacy

Adversary Capabilities:

Local Eavesdropper: Can monitor traffic at specific network points
Global Passive Adversary: Can observe all network traffic
Active Adversary: Can modify, delay, or inject traffic
Collusive Adversary: Multiple entities working together

Attack Goals:

Traffic Analysis: Identify communication partners
Timing Correlation: Link users across time periods
Website Fingerprinting: Identify visited websites from traffic patterns
Deanonymization: Reveal real identities of anonymous users

Mix Networks

Basic Concept: A mix is a server that receives multiple encrypted messages, decrypts them, and forwards them in a different order to break linkability between inputs and outputs.

Chaum Mix (1981):

Input: Multiple encrypted messages arrive at mix
Decryption: Mix decrypts outer layer of each message
Batching: Collects messages in batches
Shuffling: Randomly reorders messages
Output: Forwards messages to next hop

Security Properties:

Unlinkability: Cannot link input message to output message
Anonymity Set: Size of the group providing anonymity
Threshold Security: Requires minimum number of messages to operate

Cascading Mix Networks

Mix Cascade: Messages pass through a fixed sequence of mixes in predetermined order.

Advantages:

Simpler Design: Fixed routing paths
Strong Anonymity: Each mix provides additional protection
Predictable Latency: Known path lengths

Disadvantages:

Single Point of Failure: One compromised mix can compromise entire path
Limited Scalability: All traffic goes through same sequence
Performance Bottlenecks: Slowest mix limits overall performance

Free-Route Mix Networks

Stratified Mix Networks: Network organized in layers, messages choose random path through layers.

Advantages:

Path Diversity: Different messages take different routes
Robustness: Multiple path options available
Scalability: Can add mixes to increase capacity

Disadvantages:

Complex Routing: Requires path selection algorithms
Variable Latency: Different paths have different delays
Attack Complexity: More sophisticated traffic analysis possible

🔍 Detailed Explanations

Mix Network Security Analysis

Anonymity Set Size: The effective anonymity set is the number of users who could plausibly be the sender of a message.

Calculation Example:

Batch size: 100 messages
Mix cascade length: 3 mixes
If attacker controls 1 mix: anonymity set ≥ 66 users
If attacker controls 2 mixes: anonymity set ≥ 33 users

Threshold Mixes: Wait for minimum number of messages before processing batch.

Advantages: Larger anonymity sets, better mixing
Disadvantages: Higher latency, potential DoS attacks

Continuous Mixes: Process messages immediately as they arrive.

Advantages: Lower latency, constant availability
Disadvantages: Smaller effective anonymity sets

The Tor Network Architecture

Onion Routing Concept: Messages are wrapped in multiple layers of encryption, like an onion. Each relay removes one layer and forwards to the next relay.

Tor Circuit Construction:

Client selects 3 relays: Guard, Middle, Exit
Negotiates keys with each relay using Diffie-Hellman
Creates circuit by extending through each relay
Sends data through the established circuit

Tor Cell Structure:

Cell = [Circuit_ID (2 bytes) | Command (1 byte) | Payload (509 bytes)]

Three-Hop Design:

Guard Relay: First hop, knows client IP but not destination
Middle Relay: Middle hop, knows neither client nor destination
Exit Relay: Last hop, knows destination but not client IP

Tor Security Properties

Traffic Analysis Resistance:

Constant Cell Size: All cells are 512 bytes
Padding: Dummy traffic to obscure patterns
Circuit Rotation: Regularly change circuits

Forward Secrecy:

Ephemeral Keys: New keys for each session
Perfect Forward Secrecy: Past communications secure even if keys compromised

Distributed Trust:

No Single Point of Failure: Multiple independent operators
Consensus Protocol: Distributed directory service
Diversity: Geographic and organizational distribution

💡 Practical Examples

Example 1: Simple Mix Network Simulation

Scenario: 4 users sending messages through 2-mix cascade

import random
from cryptography.fernet import Fernet

class SimpleMix:
    def __init__(self, name):
        self.name = name
        self.key = Fernet.generate_key()
        self.cipher = Fernet(self.key)
        self.batch = []
        
    def receive_message(self, encrypted_msg):
        # Decrypt outer layer
        decrypted = self.cipher.decrypt(encrypted_msg)
        self.batch.append(decrypted)
        
    def flush_batch(self):
        # Shuffle and forward
        random.shuffle(self.batch)
        output = self.batch.copy()
        self.batch = []
        return output

# Simulation
mix1 = SimpleMix("Mix1")
mix2 = SimpleMix("Mix2")

# Alice sends to Bob through mix cascade
alice_message = b"Hello Bob"
# Double encrypt: first for mix2, then for mix1
encrypted_for_mix2 = mix2.cipher.encrypt(alice_message)
encrypted_for_mix1 = mix1.cipher.encrypt(encrypted_for_mix2)

# Send through cascade
mix1.receive_message(encrypted_for_mix1)
# After batching and shuffling
batch1_output = mix1.flush_batch()
# Forward to mix2
for msg in batch1_output:
    mix2.receive_message(msg)
final_output = mix2.flush_batch()

Example 2: Tor Circuit Setup Simulation

Scenario: Client establishes 3-hop circuit

class TorRelay:
    def __init__(self, name, relay_type):
        self.name = name
        self.type = relay_type  # "guard", "middle", "exit"
        self.circuits = {}
        
    def extend_circuit(self, circuit_id, next_hop=None):
        self.circuits[circuit_id] = {
            'next_hop': next_hop,
            'established': True
        }
        return True
        
    def relay_cell(self, circuit_id, cell_data):
        if circuit_id in self.circuits:
            next_hop = self.circuits[circuit_id]['next_hop']
            if next_hop:
                # Forward to next relay
                return next_hop.relay_cell(circuit_id, cell_data)
            else:
                # Exit relay - send to destination
                return cell_data

class TorClient:
    def __init__(self):
        self.circuits = {}
        
    def build_circuit(self, guard, middle, exit_relay):
        circuit_id = random.randint(1000, 9999)
        
        # Step 1: Establish with guard
        guard.extend_circuit(circuit_id, middle)
        
        # Step 2: Extend through middle
        middle.extend_circuit(circuit_id, exit_relay)
        
        # Step 3: Extend to exit
        exit_relay.extend_circuit(circuit_id, None)
        
        self.circuits[circuit_id] = {
            'path': [guard, middle, exit_relay],
            'guard': guard
        }
        
        return circuit_id
        
    def send_data(self, circuit_id, data):
        circuit = self.circuits[circuit_id]
        guard = circuit['guard']
        return guard.relay_cell(circuit_id, data)

# Usage
client = TorClient()
guard = TorRelay("Guard1", "guard")
middle = TorRelay("Middle1", "middle")
exit_relay = TorRelay("Exit1", "exit")

circuit_id = client.build_circuit(guard, middle, exit_relay)
response = client.send_data(circuit_id, "GET /index.html HTTP/1.1")

Example 3: Traffic Analysis Attack

Scenario: Timing correlation attack on mix network

class TrafficAnalyzer:
    def __init__(self):
        self.observations = []
        
    def observe_traffic(self, location, timestamp, message_size):
        self.observations.append({
            'location': location,
            'timestamp': timestamp, 
            'size': message_size
        })
        
    def correlate_flows(self, time_window=10):
        """Find potentially correlated input/output flows"""
        correlations = []
        
        inputs = [obs for obs in self.observations if obs['location'] == 'input']
        outputs = [obs for obs in self.observations if obs['location'] == 'output']
        
        for inp in inputs:
            for out in outputs:
                time_diff = abs(out['timestamp'] - inp['timestamp'])
                size_match = abs(out['size'] - inp['size']) < 100  # bytes
                
                if time_diff <= time_window and size_match:
                    correlations.append({
                        'input': inp,
                        'output': out,
                        'confidence': 1.0 - (time_diff / time_window)
                    })
                    
        return sorted(correlations, key=lambda x: x['confidence'], reverse=True)

# Example attack
analyzer = TrafficAnalyzer()

# Alice sends 1KB message at time 100
analyzer.observe_traffic('input', 100, 1024)

# Bob receives 1KB message at time 105 (5 second delay)
analyzer.observe_traffic('output', 105, 1024)

# Find correlations
correlations = analyzer.correlate_flows()
print(f"Potential correlation with confidence: {correlations[0]['confidence']}")

❓ Self-Assessment Questions

Question 1: Explain how mix networks provide anonymity and what factors determine the level of anonymity achieved. (Click to reveal answer)

**Answer:** **How Mix Networks Provide Anonymity:** **1. Decryption and Re-encryption:** - Each mix removes one layer of encryption - Prevents linking encrypted input to encrypted output - Creates cryptographic unlinkability **2. Batching and Shuffling:** - Mixes collect multiple messages before processing - Randomly reorder messages before forwarding - Breaks temporal correlation between input and output **3. Traffic Mixing:** - Multiple users' traffic combined at each mix - Creates anonymity set of all users in the batch - Harder to isolate individual communication flows **Factors Determining Anonymity Level:** **1. Anonymity Set Size:** - Number of users who could plausibly be the sender - Larger batch sizes = better anonymity - Formula: Anonymity ≥ min(batch_size) across all mixes **2. Network Topology:** - **Cascade:** Fixed path, strong against some attacks - **Free-route:** Multiple paths, robust against others - **Threshold mixes:** Wait for minimum messages vs. continuous processing **3. Adversary Capabilities:** - **Fraction of compromised mixes:** More compromised = less anonymity - **Traffic analysis capabilities:** Timing, volume, pattern analysis - **Global vs. local observation:** Global adversary more powerful **4. Usage Patterns:** - **Uniform traffic:** All users send similar amounts - **Temporal distribution:** When users are active - **Message size distribution:** Padding vs. variable sizes **Mathematical Example:** ``` 3-mix cascade, batch size 10 If adversary controls 1 mix: anonymity set ≥ 7 If adversary controls 2 mixes: anonymity set ≥ 3 If adversary controls all 3 mixes: no anonymity ```

Question 2: What are the key differences between mix networks and Tor? When would you choose one over the other? (Click to reveal answer)

**Answer:** **Key Differences:** **Mix Networks:** - **Message-based:** Discrete messages processed in batches - **High latency:** Deliberate delays for better anonymity - **Strong anonymity:** Large anonymity sets through batching - **Asynchronous:** Email-like communication model - **Threshold operation:** Waits for minimum number of messages **Tor Network:** - **Circuit-based:** Persistent connections for sessions - **Low latency:** Real-time communication focus - **Practical anonymity:** Smaller anonymity sets but immediate use - **Synchronous:** Web browsing and interactive applications - **Continuous operation:** Processes traffic immediately **Technical Comparison:** | Aspect | Mix Networks | Tor | |--------|-------------|-----| | **Latency** | Minutes to hours | Milliseconds to seconds | | **Anonymity Set** | 100s to 1000s | 10s to 100s | | **Traffic Pattern** | Batch processing | Continuous flow | | **Applications** | Email, file transfer | Web browsing, IM | | **Resistance to** | Traffic analysis | Real-time correlation | **When to Choose Mix Networks:** 1. **High-sensitivity communications:** Whistleblowing, journalism 2. **Non-interactive applications:** Email, file sharing 3. **Maximum anonymity required:** Can tolerate high latency 4. **Resistance to global adversaries:** Strong traffic analysis protection 5. **Asynchronous communication:** Messages don't require immediate delivery **When to Choose Tor:** 1. **Interactive applications:** Web browsing, social media 2. **Real-time communication:** Chat, VoIP, streaming 3. **Usability priority:** Need immediate response 4. **Existing application compatibility:** Works with standard protocols 5. **Moderate threat model:** Protection against local/regional surveillance **Hybrid Approach:** Some systems combine both: - Use Tor for interactive browsing - Use mix networks for sensitive document sharing - Route high-sensitivity communications through mix networks - Use Tor for daily privacy protection

Question 3: Describe how a timing correlation attack works against anonymous communication systems. How can such attacks be defended against? (Click to reveal answer)

**Answer:** **Timing Correlation Attack Mechanism:** **1. Traffic Observation:** ```python # Attacker observes traffic at multiple points entry_traffic = observe_entry_point() # When Alice sends exit_traffic = observe_exit_point() # When Bob receives # Look for timing patterns def correlate_timing(entry_times, exit_times, threshold=5): correlations = [] for entry_t in entry_times: for exit_t in exit_times: if abs(exit_t - entry_t) <= threshold: correlations.append((entry_t, exit_t)) return correlations ``` **2. Pattern Matching:** - **Volume correlation:** Match packet sizes and counts - **Timing correlation:** Find messages with similar delays - **Flow correlation:** Link patterns across multiple messages - **Statistical analysis:** Use machine learning for pattern recognition **3. Deanonymization Process:** - Observe Alice sending traffic at time T₁ - Observe Bob receiving traffic at time T₂ - If T₂ - T₁ matches expected network delay, likely correlation - Repeat observations to increase confidence **Real-World Example:** ``` Alice sends email through mix network at 10:00 AM Mix network has 5-minute average delay Bob receives email at 10:05 AM Attacker correlates: high probability Alice → Bob communication ``` **Defense Mechanisms:** **1. Artificial Delays and Jitter:** ```python import random import time def add_random_delay(min_delay, max_delay): delay = random.uniform(min_delay, max_delay) time.sleep(delay) def exponential_mix_delay(): # Exponential distribution for unpredictable delays return random.expovariate(1.0 / average_delay) ``` **2. Dummy Traffic (Cover Traffic):** ```python class CoverTrafficGenerator: def __init__(self, rate): self.rate = rate # messages per minute def generate_dummies(self): while True: # Send dummy message send_dummy_message() # Wait for next dummy time.sleep(60 / self.rate) def send_dummy_message(self): # Indistinguishable from real traffic dummy = create_realistic_dummy() send_through_network(dummy) ``` **3. Batching Strategies:** ```python class BatchMixer: def __init__(self, batch_size, timeout): self.batch_size = batch_size self.timeout = timeout self.current_batch = [] def add_message(self, message): self.current_batch.append(message) # Process when batch full or timeout reached if len(self.current_batch) >= self.batch_size: self.process_batch() elif self.time_since_first() > self.timeout: self.process_batch() def process_batch(self): random.shuffle(self.current_batch) for msg in self.current_batch: forward_message(msg) self.current_batch = [] ``` **4. Traffic Shaping:** ```python def constant_rate_sending(messages, rate): """Send messages at constant rate regardless of arrival pattern""" interval = 1.0 / rate for message in messages: send_message(message) time.sleep(interval) # Fixed interval between sends ``` **5. Multi-path Routing:** ```python def diversify_paths(message, num_paths=3): """Split message across multiple paths with different delays""" fragments = split_message(message, num_paths) paths = select_diverse_paths(num_paths) for fragment, path in zip(fragments, paths): send_via_path(fragment, path) ``` **Advanced Defenses:** **1. Pool Mixes:** - Maintain pool of messages from different time periods - Random selection from pool for forwarding - Exponential delays to prevent timing attacks **2. Heartbeat Traffic:** - Continuous background traffic at regular intervals - Makes real traffic harder to distinguish - Provides cover for actual communications **3. Adaptive Defenses:** - Monitor for correlation attacks - Dynamically adjust delays and batching - Machine learning to detect attack patterns **Limitations:** - **Perfect defenses are expensive:** High latency and bandwidth overhead - **Usability trade-offs:** Strong defenses impact user experience - **Global adversary problem:** Sophisticated attackers can still correlate - **Economic constraints:** Dummy traffic costs bandwidth and resources

Question 4: Explain Tor's three-hop design. Why not use more or fewer hops? (Click to reveal answer)

**Answer:** **Tor's Three-Hop Architecture:** **Standard Path:** ``` Client → Guard Relay → Middle Relay → Exit Relay → Destination ``` **Each Relay's Role:** 1. **Guard Relay:** - **Knows:** Client's IP address - **Doesn't know:** Final destination - **Purpose:** Entry point to Tor network 2. **Middle Relay:** - **Knows:** Previous and next hop in circuit - **Doesn't know:** Client IP or final destination - **Purpose:** Adds additional layer of indirection 3. **Exit Relay:** - **Knows:** Final destination - **Doesn't know:** Client's IP address - **Purpose:** Interface between Tor and regular internet **Why Not Fewer Hops?** **One Hop (Direct Connection):** ``` Client → Relay → Destination ``` **Problems:** - Relay knows both client and destination - No anonymity protection - Single point of failure for privacy - Equivalent to a VPN (trusted proxy) **Two Hops:** ``` Client → Relay1 → Relay2 → Destination ``` **Problems:** - If attacker controls both relays: complete deanonymization - Probability of compromise: P(relay1 OR relay2) = P₁ + P₂ - P₁P₂ - For 10% compromise rate per relay: ~19% chance of full compromise - Insufficient protection against global adversaries **Why Not More Hops?** **Four Hops:** ``` Client → Guard → Middle1 → Middle2 → Exit → Destination ``` **Five Hops:** ``` Client → Guard → Middle1 → Middle2 → Middle3 → Exit → Destination ``` **Problems with More Hops:** **1. Diminishing Security Returns:** ```python # Security calculation def compromise_probability(num_relays, compromise_rate=0.1): # Probability all relays are honest all_honest = (1 - compromise_rate) ** num_relays return 1 - all_honest # Examples: # 3 hops: ~27% chance at least one compromised # 4 hops: ~34% chance at least one compromised # 5 hops: ~41% chance at least one compromised ``` **2. Performance Degradation:** - **Latency:** Each hop adds ~100-500ms delay - **Bandwidth:** Limited by slowest relay in path - **Reliability:** More hops = higher failure probability - **Connection establishment:** More time to build circuits **3. Traffic Analysis Vulnerability:** - **More observation points:** Additional relays to potentially monitor - **Longer paths:** More opportunities for timing attacks - **Resource consumption:** Higher bandwidth and computational costs **Why Three Hops is Optimal:** **1. Security Properties:** ``` Compromise scenarios: - 0 compromised relays: Full anonymity ✓ - 1 compromised relay: Partial information only ✓ - 2 compromised relays: Still some protection ✓ - 3 compromised relays: Full compromise (very low probability) ``` **2. Practical Considerations:** - **Usability:** Reasonable latency for web browsing - **Adoption:** Good enough protection encourages usage - **Network effects:** More users = better anonymity for everyone **3. Threat Model Balance:** - **Protects against:** Local surveillance, ISP monitoring, website tracking - **Acceptable risk:** Global adversaries (rare, sophisticated attacks) - **Cost-benefit:** Good security-to-performance ratio **4. Mathematical Justification:** ```python # Anonymity vs Performance trade-off def anonymity_score(hops): security = 1 - (compromise_rate ** hops) performance = 1 / hops # Inverse relationship usability = max(0, 1 - (latency_per_hop * hops / max_acceptable_latency)) return security * performance * usability # Typically maximized around 3 hops ``` **Special Cases:** **Onion Services (Hidden Services):** ``` Client → 3 hops → Rendezvous Point ← 3 hops ← Hidden Service ``` - Uses 6 hops total for extra security - Both client and service are anonymous - Higher security requirements justify additional latency **Bridges and Pluggable Transports:** - May add extra layer before guard relay - Used in censorship circumvention - Security vs. censorship resistance trade-off

Question 5: Design a mix network protocol that provides protection against a global passive adversary. What are the key challenges and trade-offs? (Click to reveal answer)

**Answer:** **Global Passive Adversary-Resistant Mix Protocol:** **Threat Model:** - Adversary can observe all network traffic globally - Cannot modify or delay traffic (passive only) - Unlimited computational resources for traffic analysis - Goal: Prevent linking senders to receivers **Core Protocol Design:** **1. Distributed Threshold Mix Network:** ```python class GlobalAdversaryResistantMix: def __init__(self, mix_id, threshold_k, total_mixes_n): self.mix_id = mix_id self.k = threshold_k # Minimum for operation self.n = total_mixes_n self.message_pool = [] self.round_number = 0 def process_round(self): if len(self.message_pool) >= self.k: # Synchronous processing across all mixes self.synchronized_mix_operation() ``` **2. Synchronous Global Rounds:** ```python class SynchronizedMixingRound: def __init__(self, all_mixes): self.mixes = all_mixes self.round_duration = 300 # 5 minutes def execute_round(self): # Phase 1: Collect all messages all_messages = [] for mix in self.mixes: all_messages.extend(mix.get_pending_messages()) # Phase 2: Global shuffle using verifiable randomness global_permutation = self.generate_verifiable_permutation( len(all_messages) ) # Phase 3: Redistribute according to permutation shuffled_messages = self.apply_permutation( all_messages, global_permutation ) # Phase 4: Deliver in synchronized fashion self.synchronized_delivery(shuffled_messages) ``` **3. Cover Traffic Generation:** ```python class CoverTrafficManager: def __init__(self, target_rate): self.target_rate = target_rate # messages per round self.real_message_count = 0 def generate_cover_traffic(self, real_messages): current_count = len(real_messages) needed_dummies = max(0, self.target_rate - current_count) dummies = [] for _ in range(needed_dummies): dummy = self.create_indistinguishable_dummy() dummies.append(dummy) return real_messages + dummies ``` **Key Challenges:** **1. Synchronization Problem:** ```python # Challenge: Global clock synchronization class SynchronizationManager: def __init__(self): self.tolerance = 30 # seconds def wait_for_global_sync(self): # Use network time protocol # Account for network delays # Handle clock skew between mixes pass def handle_late_arrivals(self, cutoff_time): # What to do with messages arriving after deadline? # Trade-off: strictness vs. message loss pass ``` **2. Dummy Traffic Economics:** ```python # Challenge: Cost of maintaining cover traffic class EconomicModel: def calculate_dummy_cost(self, real_traffic_rate, cover_rate): dummy_ratio = cover_rate / real_traffic_rate bandwidth_cost = dummy_ratio * base_bandwidth_cost computation_cost = dummy_ratio * base_computation_cost return bandwidth_cost + computation_cost # For protection against global adversary: # Need constant high-rate dummy traffic # Can be 10-100x more expensive than real traffic ``` **Trade-offs:** **1. Security vs. Latency:** ```python # Stronger security requires longer delays security_levels = { 'weak': {'rounds': 1, 'delay': '1 minute'}, 'medium': {'rounds': 3, 'delay': '15 minutes'}, 'strong': {'rounds': 10, 'delay': '1 hour'}, 'maximum': {'rounds': 100, 'delay': '1 day'} } ``` **2. Anonymity vs. Efficiency:** ```python def anonymity_efficiency_tradeoff(anonymity_set_size): # Larger anonymity sets require more coordination efficiency = 1 / anonymity_set_size anonymity = math.log(anonymity_set_size) # Sweet spot typically around 1000-10000 users return anonymity * efficiency ``` **3. Usability vs. Protection:** - Global adversary protection often incompatible with real-time applications - High latency makes system unsuitable for interactive use - Economic costs may limit adoption **Practical Deployment Challenges:** - **Adoption threshold:** Need critical mass of users - **Economic sustainability:** Who pays for dummy traffic? - **Legal issues:** Some jurisdictions may prohibit such systems - **Technical complexity:** Difficult to implement and maintain **Conclusion:** While theoretically possible, global adversary resistance requires significant trade-offs in usability, cost, and performance that limit practical deployment to very high-security scenarios.

⬅️ Week 8 | Main | Week 12 ➡️