Week 10: Private Communications I
🎯 Learning Goals
By the end of this week, you should understand:
- The fundamentals of anonymous communication systems
- Mix networks: design principles and security properties
- Tor (The Onion Router): architecture and protocols
- Traffic analysis attacks and defenses
📖 Theoretical Content
Introduction to Private Communications
Communication Privacy Challenges:
- Content Privacy: Protecting message content from eavesdroppers
- Metadata Privacy: Hiding who communicates with whom, when, and how often
- Traffic Analysis: Inferring sensitive information from communication patterns
- Global Surveillance: Mass monitoring by governments and corporations
Traditional Solutions vs. Anonymous Communications:
- Encryption (TLS/HTTPS): Protects content but reveals metadata
- VPNs: Single point of failure, trust required in VPN provider
- Anonymous Networks: Distributed trust, protection against global adversaries
Threat Models for Communication Privacy
Adversary Capabilities:
- Local Eavesdropper: Can monitor traffic at specific network points
- Global Passive Adversary: Can observe all network traffic
- Active Adversary: Can modify, delay, or inject traffic
- Collusive Adversary: Multiple entities working together
Attack Goals:
- Traffic Analysis: Identify communication partners
- Timing Correlation: Link users across time periods
- Website Fingerprinting: Identify visited websites from traffic patterns
- Deanonymization: Reveal real identities of anonymous users
Mix Networks
Basic Concept:
A mix is a server that receives multiple encrypted messages, decrypts them, and forwards them in a different order to break linkability between inputs and outputs.
Chaum Mix (1981):
- Input: Multiple encrypted messages arrive at mix
- Decryption: Mix decrypts outer layer of each message
- Batching: Collects messages in batches
- Shuffling: Randomly reorders messages
- Output: Forwards messages to next hop
Security Properties:
- Unlinkability: Cannot link input message to output message
- Anonymity Set: Size of the group providing anonymity
- Threshold Security: Requires minimum number of messages to operate
Cascading Mix Networks
Mix Cascade:
Messages pass through a fixed sequence of mixes in predetermined order.
Advantages:
- Simpler Design: Fixed routing paths
- Strong Anonymity: Each mix provides additional protection
- Predictable Latency: Known path lengths
Disadvantages:
- Single Point of Failure: One compromised mix can compromise entire path
- Limited Scalability: All traffic goes through same sequence
- Performance Bottlenecks: Slowest mix limits overall performance
Free-Route Mix Networks
Stratified Mix Networks:
Network organized in layers, messages choose random path through layers.
Advantages:
- Path Diversity: Different messages take different routes
- Robustness: Multiple path options available
- Scalability: Can add mixes to increase capacity
Disadvantages:
- Complex Routing: Requires path selection algorithms
- Variable Latency: Different paths have different delays
- Attack Complexity: More sophisticated traffic analysis possible
🔍 Detailed Explanations
Mix Network Security Analysis
Anonymity Set Size:
The effective anonymity set is the number of users who could plausibly be the sender of a message.
Calculation Example:
Batch size: 100 messages
Mix cascade length: 3 mixes
If attacker controls 1 mix: anonymity set ≥ 66 users
If attacker controls 2 mixes: anonymity set ≥ 33 users
Threshold Mixes:
Wait for minimum number of messages before processing batch.
- Advantages: Larger anonymity sets, better mixing
- Disadvantages: Higher latency, potential DoS attacks
Continuous Mixes:
Process messages immediately as they arrive.
- Advantages: Lower latency, constant availability
- Disadvantages: Smaller effective anonymity sets
The Tor Network Architecture
Onion Routing Concept:
Messages are wrapped in multiple layers of encryption, like an onion. Each relay removes one layer and forwards to the next relay.
Tor Circuit Construction:
- Client selects 3 relays: Guard, Middle, Exit
- Negotiates keys with each relay using Diffie-Hellman
- Creates circuit by extending through each relay
- Sends data through the established circuit
Tor Cell Structure:
Cell = [Circuit_ID (2 bytes) | Command (1 byte) | Payload (509 bytes)]
Three-Hop Design:
- Guard Relay: First hop, knows client IP but not destination
- Middle Relay: Middle hop, knows neither client nor destination
- Exit Relay: Last hop, knows destination but not client IP
Tor Security Properties
Traffic Analysis Resistance:
- Constant Cell Size: All cells are 512 bytes
- Padding: Dummy traffic to obscure patterns
- Circuit Rotation: Regularly change circuits
Forward Secrecy:
- Ephemeral Keys: New keys for each session
- Perfect Forward Secrecy: Past communications secure even if keys compromised
Distributed Trust:
- No Single Point of Failure: Multiple independent operators
- Consensus Protocol: Distributed directory service
- Diversity: Geographic and organizational distribution
💡 Practical Examples
Example 1: Simple Mix Network Simulation
Scenario: 4 users sending messages through 2-mix cascade
import random
from cryptography.fernet import Fernet
class SimpleMix:
def __init__(self, name):
self.name = name
self.key = Fernet.generate_key()
self.cipher = Fernet(self.key)
self.batch = []
def receive_message(self, encrypted_msg):
# Decrypt outer layer
decrypted = self.cipher.decrypt(encrypted_msg)
self.batch.append(decrypted)
def flush_batch(self):
# Shuffle and forward
random.shuffle(self.batch)
output = self.batch.copy()
self.batch = []
return output
# Simulation
mix1 = SimpleMix("Mix1")
mix2 = SimpleMix("Mix2")
# Alice sends to Bob through mix cascade
alice_message = b"Hello Bob"
# Double encrypt: first for mix2, then for mix1
encrypted_for_mix2 = mix2.cipher.encrypt(alice_message)
encrypted_for_mix1 = mix1.cipher.encrypt(encrypted_for_mix2)
# Send through cascade
mix1.receive_message(encrypted_for_mix1)
# After batching and shuffling
batch1_output = mix1.flush_batch()
# Forward to mix2
for msg in batch1_output:
mix2.receive_message(msg)
final_output = mix2.flush_batch()
Example 2: Tor Circuit Setup Simulation
Scenario: Client establishes 3-hop circuit
class TorRelay:
def __init__(self, name, relay_type):
self.name = name
self.type = relay_type # "guard", "middle", "exit"
self.circuits = {}
def extend_circuit(self, circuit_id, next_hop=None):
self.circuits[circuit_id] = {
'next_hop': next_hop,
'established': True
}
return True
def relay_cell(self, circuit_id, cell_data):
if circuit_id in self.circuits:
next_hop = self.circuits[circuit_id]['next_hop']
if next_hop:
# Forward to next relay
return next_hop.relay_cell(circuit_id, cell_data)
else:
# Exit relay - send to destination
return cell_data
class TorClient:
def __init__(self):
self.circuits = {}
def build_circuit(self, guard, middle, exit_relay):
circuit_id = random.randint(1000, 9999)
# Step 1: Establish with guard
guard.extend_circuit(circuit_id, middle)
# Step 2: Extend through middle
middle.extend_circuit(circuit_id, exit_relay)
# Step 3: Extend to exit
exit_relay.extend_circuit(circuit_id, None)
self.circuits[circuit_id] = {
'path': [guard, middle, exit_relay],
'guard': guard
}
return circuit_id
def send_data(self, circuit_id, data):
circuit = self.circuits[circuit_id]
guard = circuit['guard']
return guard.relay_cell(circuit_id, data)
# Usage
client = TorClient()
guard = TorRelay("Guard1", "guard")
middle = TorRelay("Middle1", "middle")
exit_relay = TorRelay("Exit1", "exit")
circuit_id = client.build_circuit(guard, middle, exit_relay)
response = client.send_data(circuit_id, "GET /index.html HTTP/1.1")
Example 3: Traffic Analysis Attack
Scenario: Timing correlation attack on mix network
class TrafficAnalyzer:
def __init__(self):
self.observations = []
def observe_traffic(self, location, timestamp, message_size):
self.observations.append({
'location': location,
'timestamp': timestamp,
'size': message_size
})
def correlate_flows(self, time_window=10):
"""Find potentially correlated input/output flows"""
correlations = []
inputs = [obs for obs in self.observations if obs['location'] == 'input']
outputs = [obs for obs in self.observations if obs['location'] == 'output']
for inp in inputs:
for out in outputs:
time_diff = abs(out['timestamp'] - inp['timestamp'])
size_match = abs(out['size'] - inp['size']) < 100 # bytes
if time_diff <= time_window and size_match:
correlations.append({
'input': inp,
'output': out,
'confidence': 1.0 - (time_diff / time_window)
})
return sorted(correlations, key=lambda x: x['confidence'], reverse=True)
# Example attack
analyzer = TrafficAnalyzer()
# Alice sends 1KB message at time 100
analyzer.observe_traffic('input', 100, 1024)
# Bob receives 1KB message at time 105 (5 second delay)
analyzer.observe_traffic('output', 105, 1024)
# Find correlations
correlations = analyzer.correlate_flows()
print(f"Potential correlation with confidence: {correlations[0]['confidence']}")
❓ Self-Assessment Questions
Question 1: Explain how mix networks provide anonymity and what factors determine the level of anonymity achieved. (Click to reveal answer)
**Answer:**
**How Mix Networks Provide Anonymity:**
**1. Decryption and Re-encryption:**
- Each mix removes one layer of encryption
- Prevents linking encrypted input to encrypted output
- Creates cryptographic unlinkability
**2. Batching and Shuffling:**
- Mixes collect multiple messages before processing
- Randomly reorder messages before forwarding
- Breaks temporal correlation between input and output
**3. Traffic Mixing:**
- Multiple users' traffic combined at each mix
- Creates anonymity set of all users in the batch
- Harder to isolate individual communication flows
**Factors Determining Anonymity Level:**
**1. Anonymity Set Size:**
- Number of users who could plausibly be the sender
- Larger batch sizes = better anonymity
- Formula: Anonymity ≥ min(batch_size) across all mixes
**2. Network Topology:**
- **Cascade:** Fixed path, strong against some attacks
- **Free-route:** Multiple paths, robust against others
- **Threshold mixes:** Wait for minimum messages vs. continuous processing
**3. Adversary Capabilities:**
- **Fraction of compromised mixes:** More compromised = less anonymity
- **Traffic analysis capabilities:** Timing, volume, pattern analysis
- **Global vs. local observation:** Global adversary more powerful
**4. Usage Patterns:**
- **Uniform traffic:** All users send similar amounts
- **Temporal distribution:** When users are active
- **Message size distribution:** Padding vs. variable sizes
**Mathematical Example:**
```
3-mix cascade, batch size 10
If adversary controls 1 mix: anonymity set ≥ 7
If adversary controls 2 mixes: anonymity set ≥ 3
If adversary controls all 3 mixes: no anonymity
```
Question 2: What are the key differences between mix networks and Tor? When would you choose one over the other? (Click to reveal answer)
**Answer:**
**Key Differences:**
**Mix Networks:**
- **Message-based:** Discrete messages processed in batches
- **High latency:** Deliberate delays for better anonymity
- **Strong anonymity:** Large anonymity sets through batching
- **Asynchronous:** Email-like communication model
- **Threshold operation:** Waits for minimum number of messages
**Tor Network:**
- **Circuit-based:** Persistent connections for sessions
- **Low latency:** Real-time communication focus
- **Practical anonymity:** Smaller anonymity sets but immediate use
- **Synchronous:** Web browsing and interactive applications
- **Continuous operation:** Processes traffic immediately
**Technical Comparison:**
| Aspect | Mix Networks | Tor |
|--------|-------------|-----|
| **Latency** | Minutes to hours | Milliseconds to seconds |
| **Anonymity Set** | 100s to 1000s | 10s to 100s |
| **Traffic Pattern** | Batch processing | Continuous flow |
| **Applications** | Email, file transfer | Web browsing, IM |
| **Resistance to** | Traffic analysis | Real-time correlation |
**When to Choose Mix Networks:**
1. **High-sensitivity communications:** Whistleblowing, journalism
2. **Non-interactive applications:** Email, file sharing
3. **Maximum anonymity required:** Can tolerate high latency
4. **Resistance to global adversaries:** Strong traffic analysis protection
5. **Asynchronous communication:** Messages don't require immediate delivery
**When to Choose Tor:**
1. **Interactive applications:** Web browsing, social media
2. **Real-time communication:** Chat, VoIP, streaming
3. **Usability priority:** Need immediate response
4. **Existing application compatibility:** Works with standard protocols
5. **Moderate threat model:** Protection against local/regional surveillance
**Hybrid Approach:**
Some systems combine both:
- Use Tor for interactive browsing
- Use mix networks for sensitive document sharing
- Route high-sensitivity communications through mix networks
- Use Tor for daily privacy protection
Question 3: Describe how a timing correlation attack works against anonymous communication systems. How can such attacks be defended against? (Click to reveal answer)
**Answer:**
**Timing Correlation Attack Mechanism:**
**1. Traffic Observation:**
```python
# Attacker observes traffic at multiple points
entry_traffic = observe_entry_point() # When Alice sends
exit_traffic = observe_exit_point() # When Bob receives
# Look for timing patterns
def correlate_timing(entry_times, exit_times, threshold=5):
correlations = []
for entry_t in entry_times:
for exit_t in exit_times:
if abs(exit_t - entry_t) <= threshold:
correlations.append((entry_t, exit_t))
return correlations
```
**2. Pattern Matching:**
- **Volume correlation:** Match packet sizes and counts
- **Timing correlation:** Find messages with similar delays
- **Flow correlation:** Link patterns across multiple messages
- **Statistical analysis:** Use machine learning for pattern recognition
**3. Deanonymization Process:**
- Observe Alice sending traffic at time T₁
- Observe Bob receiving traffic at time T₂
- If T₂ - T₁ matches expected network delay, likely correlation
- Repeat observations to increase confidence
**Real-World Example:**
```
Alice sends email through mix network at 10:00 AM
Mix network has 5-minute average delay
Bob receives email at 10:05 AM
Attacker correlates: high probability Alice → Bob communication
```
**Defense Mechanisms:**
**1. Artificial Delays and Jitter:**
```python
import random
import time
def add_random_delay(min_delay, max_delay):
delay = random.uniform(min_delay, max_delay)
time.sleep(delay)
def exponential_mix_delay():
# Exponential distribution for unpredictable delays
return random.expovariate(1.0 / average_delay)
```
**2. Dummy Traffic (Cover Traffic):**
```python
class CoverTrafficGenerator:
def __init__(self, rate):
self.rate = rate # messages per minute
def generate_dummies(self):
while True:
# Send dummy message
send_dummy_message()
# Wait for next dummy
time.sleep(60 / self.rate)
def send_dummy_message(self):
# Indistinguishable from real traffic
dummy = create_realistic_dummy()
send_through_network(dummy)
```
**3. Batching Strategies:**
```python
class BatchMixer:
def __init__(self, batch_size, timeout):
self.batch_size = batch_size
self.timeout = timeout
self.current_batch = []
def add_message(self, message):
self.current_batch.append(message)
# Process when batch full or timeout reached
if len(self.current_batch) >= self.batch_size:
self.process_batch()
elif self.time_since_first() > self.timeout:
self.process_batch()
def process_batch(self):
random.shuffle(self.current_batch)
for msg in self.current_batch:
forward_message(msg)
self.current_batch = []
```
**4. Traffic Shaping:**
```python
def constant_rate_sending(messages, rate):
"""Send messages at constant rate regardless of arrival pattern"""
interval = 1.0 / rate
for message in messages:
send_message(message)
time.sleep(interval) # Fixed interval between sends
```
**5. Multi-path Routing:**
```python
def diversify_paths(message, num_paths=3):
"""Split message across multiple paths with different delays"""
fragments = split_message(message, num_paths)
paths = select_diverse_paths(num_paths)
for fragment, path in zip(fragments, paths):
send_via_path(fragment, path)
```
**Advanced Defenses:**
**1. Pool Mixes:**
- Maintain pool of messages from different time periods
- Random selection from pool for forwarding
- Exponential delays to prevent timing attacks
**2. Heartbeat Traffic:**
- Continuous background traffic at regular intervals
- Makes real traffic harder to distinguish
- Provides cover for actual communications
**3. Adaptive Defenses:**
- Monitor for correlation attacks
- Dynamically adjust delays and batching
- Machine learning to detect attack patterns
**Limitations:**
- **Perfect defenses are expensive:** High latency and bandwidth overhead
- **Usability trade-offs:** Strong defenses impact user experience
- **Global adversary problem:** Sophisticated attackers can still correlate
- **Economic constraints:** Dummy traffic costs bandwidth and resources
Question 4: Explain Tor's three-hop design. Why not use more or fewer hops? (Click to reveal answer)
**Answer:**
**Tor's Three-Hop Architecture:**
**Standard Path:**
```
Client → Guard Relay → Middle Relay → Exit Relay → Destination
```
**Each Relay's Role:**
1. **Guard Relay:**
- **Knows:** Client's IP address
- **Doesn't know:** Final destination
- **Purpose:** Entry point to Tor network
2. **Middle Relay:**
- **Knows:** Previous and next hop in circuit
- **Doesn't know:** Client IP or final destination
- **Purpose:** Adds additional layer of indirection
3. **Exit Relay:**
- **Knows:** Final destination
- **Doesn't know:** Client's IP address
- **Purpose:** Interface between Tor and regular internet
**Why Not Fewer Hops?**
**One Hop (Direct Connection):**
```
Client → Relay → Destination
```
**Problems:**
- Relay knows both client and destination
- No anonymity protection
- Single point of failure for privacy
- Equivalent to a VPN (trusted proxy)
**Two Hops:**
```
Client → Relay1 → Relay2 → Destination
```
**Problems:**
- If attacker controls both relays: complete deanonymization
- Probability of compromise: P(relay1 OR relay2) = P₁ + P₂ - P₁P₂
- For 10% compromise rate per relay: ~19% chance of full compromise
- Insufficient protection against global adversaries
**Why Not More Hops?**
**Four Hops:**
```
Client → Guard → Middle1 → Middle2 → Exit → Destination
```
**Five Hops:**
```
Client → Guard → Middle1 → Middle2 → Middle3 → Exit → Destination
```
**Problems with More Hops:**
**1. Diminishing Security Returns:**
```python
# Security calculation
def compromise_probability(num_relays, compromise_rate=0.1):
# Probability all relays are honest
all_honest = (1 - compromise_rate) ** num_relays
return 1 - all_honest
# Examples:
# 3 hops: ~27% chance at least one compromised
# 4 hops: ~34% chance at least one compromised
# 5 hops: ~41% chance at least one compromised
```
**2. Performance Degradation:**
- **Latency:** Each hop adds ~100-500ms delay
- **Bandwidth:** Limited by slowest relay in path
- **Reliability:** More hops = higher failure probability
- **Connection establishment:** More time to build circuits
**3. Traffic Analysis Vulnerability:**
- **More observation points:** Additional relays to potentially monitor
- **Longer paths:** More opportunities for timing attacks
- **Resource consumption:** Higher bandwidth and computational costs
**Why Three Hops is Optimal:**
**1. Security Properties:**
```
Compromise scenarios:
- 0 compromised relays: Full anonymity ✓
- 1 compromised relay: Partial information only ✓
- 2 compromised relays: Still some protection ✓
- 3 compromised relays: Full compromise (very low probability)
```
**2. Practical Considerations:**
- **Usability:** Reasonable latency for web browsing
- **Adoption:** Good enough protection encourages usage
- **Network effects:** More users = better anonymity for everyone
**3. Threat Model Balance:**
- **Protects against:** Local surveillance, ISP monitoring, website tracking
- **Acceptable risk:** Global adversaries (rare, sophisticated attacks)
- **Cost-benefit:** Good security-to-performance ratio
**4. Mathematical Justification:**
```python
# Anonymity vs Performance trade-off
def anonymity_score(hops):
security = 1 - (compromise_rate ** hops)
performance = 1 / hops # Inverse relationship
usability = max(0, 1 - (latency_per_hop * hops / max_acceptable_latency))
return security * performance * usability
# Typically maximized around 3 hops
```
**Special Cases:**
**Onion Services (Hidden Services):**
```
Client → 3 hops → Rendezvous Point ← 3 hops ← Hidden Service
```
- Uses 6 hops total for extra security
- Both client and service are anonymous
- Higher security requirements justify additional latency
**Bridges and Pluggable Transports:**
- May add extra layer before guard relay
- Used in censorship circumvention
- Security vs. censorship resistance trade-off
Question 5: Design a mix network protocol that provides protection against a global passive adversary. What are the key challenges and trade-offs? (Click to reveal answer)
**Answer:**
**Global Passive Adversary-Resistant Mix Protocol:**
**Threat Model:**
- Adversary can observe all network traffic globally
- Cannot modify or delay traffic (passive only)
- Unlimited computational resources for traffic analysis
- Goal: Prevent linking senders to receivers
**Core Protocol Design:**
**1. Distributed Threshold Mix Network:**
```python
class GlobalAdversaryResistantMix:
def __init__(self, mix_id, threshold_k, total_mixes_n):
self.mix_id = mix_id
self.k = threshold_k # Minimum for operation
self.n = total_mixes_n
self.message_pool = []
self.round_number = 0
def process_round(self):
if len(self.message_pool) >= self.k:
# Synchronous processing across all mixes
self.synchronized_mix_operation()
```
**2. Synchronous Global Rounds:**
```python
class SynchronizedMixingRound:
def __init__(self, all_mixes):
self.mixes = all_mixes
self.round_duration = 300 # 5 minutes
def execute_round(self):
# Phase 1: Collect all messages
all_messages = []
for mix in self.mixes:
all_messages.extend(mix.get_pending_messages())
# Phase 2: Global shuffle using verifiable randomness
global_permutation = self.generate_verifiable_permutation(
len(all_messages)
)
# Phase 3: Redistribute according to permutation
shuffled_messages = self.apply_permutation(
all_messages, global_permutation
)
# Phase 4: Deliver in synchronized fashion
self.synchronized_delivery(shuffled_messages)
```
**3. Cover Traffic Generation:**
```python
class CoverTrafficManager:
def __init__(self, target_rate):
self.target_rate = target_rate # messages per round
self.real_message_count = 0
def generate_cover_traffic(self, real_messages):
current_count = len(real_messages)
needed_dummies = max(0, self.target_rate - current_count)
dummies = []
for _ in range(needed_dummies):
dummy = self.create_indistinguishable_dummy()
dummies.append(dummy)
return real_messages + dummies
```
**Key Challenges:**
**1. Synchronization Problem:**
```python
# Challenge: Global clock synchronization
class SynchronizationManager:
def __init__(self):
self.tolerance = 30 # seconds
def wait_for_global_sync(self):
# Use network time protocol
# Account for network delays
# Handle clock skew between mixes
pass
def handle_late_arrivals(self, cutoff_time):
# What to do with messages arriving after deadline?
# Trade-off: strictness vs. message loss
pass
```
**2. Dummy Traffic Economics:**
```python
# Challenge: Cost of maintaining cover traffic
class EconomicModel:
def calculate_dummy_cost(self, real_traffic_rate, cover_rate):
dummy_ratio = cover_rate / real_traffic_rate
bandwidth_cost = dummy_ratio * base_bandwidth_cost
computation_cost = dummy_ratio * base_computation_cost
return bandwidth_cost + computation_cost
# For protection against global adversary:
# Need constant high-rate dummy traffic
# Can be 10-100x more expensive than real traffic
```
**Trade-offs:**
**1. Security vs. Latency:**
```python
# Stronger security requires longer delays
security_levels = {
'weak': {'rounds': 1, 'delay': '1 minute'},
'medium': {'rounds': 3, 'delay': '15 minutes'},
'strong': {'rounds': 10, 'delay': '1 hour'},
'maximum': {'rounds': 100, 'delay': '1 day'}
}
```
**2. Anonymity vs. Efficiency:**
```python
def anonymity_efficiency_tradeoff(anonymity_set_size):
# Larger anonymity sets require more coordination
efficiency = 1 / anonymity_set_size
anonymity = math.log(anonymity_set_size)
# Sweet spot typically around 1000-10000 users
return anonymity * efficiency
```
**3. Usability vs. Protection:**
- Global adversary protection often incompatible with real-time applications
- High latency makes system unsuitable for interactive use
- Economic costs may limit adoption
**Practical Deployment Challenges:**
- **Adoption threshold:** Need critical mass of users
- **Economic sustainability:** Who pays for dummy traffic?
- **Legal issues:** Some jurisdictions may prohibit such systems
- **Technical complexity:** Difficult to implement and maintain
**Conclusion:** While theoretically possible, global adversary resistance requires significant trade-offs in usability, cost, and performance that limit practical deployment to very high-security scenarios.