Week 5: Location Privacy

⬅️ Week 4 | Main | Week 6 ➡️

🎯 Learning Goals

By the end of this week, you should understand:


📖 Theoretical Content

Introduction to Location Privacy

Location Data Ubiquity: Modern devices continuously collect location information:

Why Share Location Data?

Location Data Structure

Basic Location Record:

Location Data = (Identity, Position, Time)
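The tuple above can be written as a small record type. This is a minimal sketch: the class name `LocationRecord` and its field names are illustrative assumptions, not part of the course material.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LocationRecord:
    """One (Identity, Position, Time) observation. Field names are hypothetical."""
    identity: str        # user or device identifier
    lat: float           # position: latitude in decimal degrees
    lon: float           # position: longitude in decimal degrees
    timestamp: datetime  # time of the observation

    def __post_init__(self):
        # Reject coordinates outside the valid ranges
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError("longitude out of range")


record = LocationRecord(
    identity="user-123",
    lat=42.3601,
    lon=-71.0589,
    timestamp=datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc),
)
```

Each of the three fields is a separate privacy lever: the mechanisms later in these notes act on the identity (anonymization), the position (spatial obfuscation), or the time (temporal obfuscation).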

Components:

Collection Patterns:

Types of Location-Based Services (LBS)

1. Navigation Services

2. Information Services

3. Social Services

4. Commercial Services


🔍 Detailed Explanations

Privacy Threats in Location Data

1. Inference Attacks

Deriving sensitive information from location patterns:

Home/Work Location Inference:
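A common heuristic for home inference is to take the place where a user most often appears at night. The sketch below illustrates that idea under stated assumptions: the function name `infer_home_cell`, the 22:00 to 06:00 night window, and the 0.001-degree grid are all illustrative choices, not values from the course.

```python
from collections import Counter
from datetime import datetime


def infer_home_cell(points, cell_deg=0.001, night_start=22, night_end=6):
    """Return the center of the most frequent nighttime grid cell, or None.

    points: iterable of (lat, lon, datetime) tuples.
    """
    counts = Counter()
    for lat, lon, ts in points:
        # Keep only observations inside the assumed night window
        if ts.hour >= night_start or ts.hour < night_end:
            cell = (round(lat / cell_deg), round(lon / cell_deg))
            counts[cell] += 1
    if not counts:
        return None
    (cell_lat, cell_lon), _ = counts.most_common(1)[0]
    return cell_lat * cell_deg, cell_lon * cell_deg


trace = [
    (42.3601, -71.0589, datetime(2024, 1, 15, 23, 0)),
    (42.3602, -71.0590, datetime(2024, 1, 16, 2, 0)),
    (42.3505, -71.0760, datetime(2024, 1, 16, 14, 0)),  # daytime, ignored
    (42.3601, -71.0588, datetime(2024, 1, 16, 23, 30)),
]
home = infer_home_cell(trace)
```

Swapping the night window for weekday working hours would give the corresponding workplace guess, which is why even a handful of timestamped points can be revealing.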

Lifestyle and Interest Inference:

Relationship Inference:

2. Tracking Attacks

Following individuals across time and space:

Temporal Tracking:

Cross-Service Tracking:

3. Profiling Attacks

Building detailed personal profiles:

Demographic Profiling:

Behavioral Profiling:

Location Privacy-Preserving Mechanisms (LPPMs)

1. Spatial Obfuscation

Reducing location accuracy through spatial techniques:

Noise Addition:

Cloaking/Enlargement:

Generalization:

2. Temporal Obfuscation

Modifying temporal aspects of location data:

Delay/Caching:

Dummy Generation:

3. Anonymization Techniques

Removing or modifying identifiers:

Pseudonymization:
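One way to pseudonymize is to replace the real identifier with a keyed hash that changes every rotation period. The sketch below assumes this approach; the function name `pseudonym`, the `epoch` parameter, and the 16-character truncation are illustrative choices rather than a prescribed scheme.

```python
import hashlib
import hmac


def pseudonym(user_id: str, secret_key: bytes, epoch: int) -> str:
    """Derive a pseudonym that is stable within an epoch and changes across epochs."""
    message = f"{user_id}|{epoch}".encode("utf-8")
    # HMAC-SHA256 keyed with a secret held only by the pseudonymizing party
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


key = b"demo-key-not-for-production"  # placeholder key for illustration only
p_day1 = pseudonym("alice", key, epoch=1)
p_day2 = pseudonym("alice", key, epoch=2)
```

Rotating the pseudonym limits how long a trajectory can be linked through the identifier alone, but the trajectory itself can still re-identify a user (for example through an inferred home location), so pseudonymization is usually combined with other mechanisms.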

k-Anonymity for Location:


💡 Practical Examples

Example 1: Spatial Obfuscation Implementation

Scenario: User wants navigation help while protecting exact home location

Original Location: (42.3601° N, 71.0589° W) - Boston Common

Privacy Goal: Hide exact position within 100-meter radius

Gaussian Noise Method:

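import numpy as np

def add_gaussian_noise(lat, lon, radius_meters):
    # Convert radius to degrees of latitude (approximate)
    radius_deg_lat = radius_meters / 111000  # 1 degree ≈ 111km
    # A degree of longitude is shorter by cos(latitude), so widen it to match
    radius_deg_lon = radius_deg_lat / np.cos(np.radians(lat))
    
    # Generate Gaussian noise with σ = radius/3 (3σ rule): on each axis,
    # about 99.7% of draws stay within the radius, but it is not a hard bound
    noise_lat = np.random.normal(0, radius_deg_lat / 3)
    noise_lon = np.random.normal(0, radius_deg_lon / 3)
    
    return lat + noise_lat, lon + noise_lon

# Apply obfuscation
obfuscated_lat, obfuscated_lon = add_gaussian_noise(42.3601, -71.0589, 100)
# One possible result: (42.3594° N, 71.0596° W) - ~95m displacement (varies per run)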
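To check how far an obfuscated point has actually moved, the displacement can be measured with the haversine formula. This is a small sketch; the helper name `haversine_m` and the mean Earth radius of 6,371 km are assumptions for illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Distance between the original point and the sample obfuscated point above
displacement = haversine_m(42.3601, -71.0589, 42.3594, -71.0596)
```

For the sample coordinates this comes out a little under 100 m, consistent with the quoted displacement. Running the same check over many noisy samples shows how often the Gaussian method exceeds the target radius.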

Grid-based Cloaking:

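import math

def grid_cloak(lat, lon, grid_size_meters):
    # Cell height in degrees of latitude (1 degree ≈ 111km)
    grid_size_lat = grid_size_meters / 111000
    
    # Snap latitude to the nearest grid point, which is the center of its cell
    grid_lat = round(lat / grid_size_lat) * grid_size_lat
    
    # Cell width in degrees of longitude, widened by 1/cos(latitude)
    # so that cells stay roughly square when measured in meters
    grid_size_lon = grid_size_lat / math.cos(math.radians(grid_lat))
    grid_lon = round(lon / grid_size_lon) * grid_size_lon
    
    return grid_lat, grid_lon

# Apply grid cloaking
cloaked_lat, cloaked_lon = grid_cloak(42.3601, -71.0589, 200)
# Result: center of the enclosing ~200m×200m grid cell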

Example 2: Temporal Caching Strategy

Scenario: Social media app with location sharing

Privacy Challenge: Real-time location sharing enables precise tracking

Caching Solution:

import random

class LocationCache:
    def __init__(self, cache_time=300):  # hold each location for 5 minutes by default
        self.cache_time = cache_time
        self.cached_locations = []
        
    def add_location(self, lat, lon, timestamp):
        # Hold the location for cache_time, plus a random extra delay
        # (0-60 seconds) so the release time does not reveal the capture time
        delay = random.uniform(0, 60)
        release_time = timestamp + self.cache_time + delay
        
        self.cached_locations.append({
            'lat': lat, 'lon': lon, 
            'release_time': release_time
        })
        
    def get_locations_to_release(self, current_time):
        # Release locations whose time has come
        ready = [loc for loc in self.cached_locations 
                if loc['release_time'] <= current_time]
        
        # Remove released locations from cache
        self.cached_locations = [loc for loc in self.cached_locations 
                               if loc['release_time'] > current_time]
        
        return ready

Example 3: k-Anonymity for Location

Scenario: Location-based service requiring k=5 anonymity

Challenge: Ensure 5 users share same spatial-temporal region

Implementation:

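import math

class LocationKAnonymizer:
    def __init__(self, k=5, spatial_threshold=1000, temporal_threshold=3600):
        self.k = k
        self.spatial_threshold = spatial_threshold  # meters
        self.temporal_threshold = temporal_threshold  # seconds
        self.pending_requests = []
        
    def add_request(self, user_id, lat, lon, timestamp, query):
        self.pending_requests.append({
            'user_id': user_id, 'lat': lat, 'lon': lon,
            'timestamp': timestamp, 'query': query
        })
        
        # Try to form k-anonymous group
        return self.try_form_group()
        
    def try_form_group(self):
        if len(self.pending_requests) < self.k:
            return None
            
        # Find spatially and temporally close requests
        groups = self.cluster_requests()
        
        for group in groups:
            if len(group) >= self.k:
                # Generate anonymized location (centroid)
                center_lat = sum(r['lat'] for r in group) / len(group)
                center_lon = sum(r['lon'] for r in group) / len(group)
                
                # Remove grouped requests from pending
                for req in group:
                    self.pending_requests.remove(req)
                    
                return {
                    'location': (center_lat, center_lon),
                    'user_count': len(group),
                    'queries': [r['query'] for r in group]
                }
        
        return None

    def cluster_requests(self):
        # Assumed helper (not defined in the original notes): a simple greedy
        # grouping that collects, for each pending request, every request
        # within both the spatial and the temporal threshold of it.
        groups = []
        for seed in self.pending_requests:
            groups.append([
                r for r in self.pending_requests
                if self._distance_m(seed, r) <= self.spatial_threshold
                and abs(seed['timestamp'] - r['timestamp']) <= self.temporal_threshold
            ])
        return groups

    @staticmethod
    def _distance_m(a, b):
        # Equirectangular approximation; adequate for distances of a few km
        mean_lat = math.radians((a['lat'] + b['lat']) / 2)
        dx = math.radians(b['lon'] - a['lon']) * math.cos(mean_lat)
        dy = math.radians(b['lat'] - a['lat'])
        return 6371000 * math.hypot(dx, dy)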

❓ Self-Assessment Questions

Question 1: What are the three main types of attacks on location data? Provide examples for each.

**Answer:**

**1. Inference Attacks:** Deriving sensitive information from location patterns

- **Home/Work Inference:** Regular nighttime locations reveal home address
- **Health Inference:** Visits to hospitals, clinics reveal medical conditions
- **Religious Inference:** Regular visits to religious institutions
- **Relationship Inference:** Co-location patterns reveal social connections

**2. Tracking Attacks:** Following individuals across time and space

- **Temporal Tracking:** Monitoring daily movement patterns and routines
- **Cross-Service Tracking:** Linking location data from multiple apps/services
- **Stalking:** Real-time following of target individuals

**3. Profiling Attacks:** Building comprehensive personal profiles

- **Demographic Profiling:** Income estimation from visited neighborhoods
- **Behavioral Profiling:** Shopping habits from retail location visits
- **Lifestyle Profiling:** Entertainment preferences from venue choices
- **Social Profiling:** Social status from exclusive locations visited

Question 2: Compare spatial obfuscation techniques: noise addition vs. cloaking vs. generalization. What are the trade-offs?

**Answer:**

**Noise Addition:**

- **Method:** Add random displacement to coordinates
- **Privacy:** Continuous protection, harder to reverse
- **Utility:** Maintains relative distances and patterns
- **Drawback:** Possible location outside valid areas (ocean, private property)

**Cloaking/Enlargement:**

- **Method:** Report larger areas instead of exact points
- **Privacy:** Guaranteed containment within region
- **Utility:** Good for range queries, preserves area containment
- **Drawback:** Coarse granularity, potential for inference from region boundaries

**Generalization:**

- **Method:** Use hierarchical location representations (street→city→state)
- **Privacy:** Strong protection through reduced specificity
- **Utility:** Good for statistical analysis, maintains hierarchical relationships
- **Drawback:** Significant utility loss, limited query types supported

**Trade-offs:**

- **Accuracy vs Privacy:** More obfuscation = less accuracy
- **Consistency:** Some methods may produce inconsistent results
- **Computational Cost:** Complex methods require more processing
- **Service Quality:** Different LBS types have different utility requirements

Question 3: Explain the concept of "mix zones" in location privacy. How do they work and what are their limitations?

**Answer:**

**Mix Zones Concept:** Geographic regions where users change their pseudonyms simultaneously, making it difficult to link trajectories before and after entering the zone.

**How They Work:**

1. **Entry:** Users enter mix zone with old pseudonyms
2. **Mixing:** All users in zone stop reporting locations temporarily
3. **Exit:** Users exit with new pseudonyms, breaking trajectory linkage
4. **Unlinkability:** Attacker cannot determine which new pseudonym corresponds to which old pseudonym

**Example:**

- Alice (ID: A123) and Bob (ID: B456) enter a subway station
- Both stop location updates while in station
- Alice exits as (ID: X789) and Bob as (ID: Y012)
- Attacker cannot determine if X789 is Alice or Bob

**Benefits:**

- Breaks long-term tracking
- Provides formal anonymity guarantees
- Works with existing LBS infrastructure

**Limitations:**

1. **Requires Multiple Users:** Need sufficient simultaneous users for mixing
2. **Geographic Constraints:** Limited to specific locations (tunnels, buildings)
3. **Timing Attacks:** Entry/exit timing patterns may enable correlation
4. **Service Interruption:** Location services unavailable during mixing
5. **Predictable Locations:** Attackers may predict mix zone usage patterns

**Improvements:**

- **Silent Periods:** Random delays before pseudonym changes
- **Dummy Trajectories:** Generate fake paths during mixing
- **Distributed Mix Zones:** Multiple smaller zones vs. few large ones

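The entry, mixing, and exit steps of a mix zone can be sketched as a pseudonym swap followed by a shuffle of the exit order. This is a toy model under stated assumptions: the function name `mix_zone_exit` is hypothetical, pseudonyms are random hex tokens, and timing side channels are ignored entirely.

```python
import random
import secrets


def mix_zone_exit(entering_pseudonyms, rng=None):
    """Assign a fresh pseudonym to every user in the zone and shuffle the exit order.

    Returns (exits, mapping): the pseudonyms an observer sees leaving the zone,
    and the old-to-new mapping that the observer must not learn.
    """
    rng = rng or random.SystemRandom()
    # Fresh random pseudonym for each user currently inside the zone
    mapping = {old: secrets.token_hex(8) for old in entering_pseudonyms}
    exits = list(mapping.values())
    # Shuffle so that exit order carries no information about entry order
    rng.shuffle(exits)
    return exits, mapping


exits, mapping = mix_zone_exit(["A123", "B456", "C789"])
# With n users mixed and no side information, a guess about one user
# is right with probability 1/n
guess_probability = 1 / len(exits)
```

In practice the observer does have side information, such as entry and exit times, which is why the timing-attack limitation above matters: the effective anonymity set is often smaller than the number of users in the zone.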
Question 4: A user visits the following locations in one day: home (8 hours), office (8 hours), restaurant (1 hour), gym (1 hour). What sensitive information could an attacker infer, and how would you protect against such inferences?

**Answer:**

**Potential Inferences:**

**Direct Inferences:**

- **Home Address:** 8-hour nighttime stay reveals residential location
- **Workplace:** 8-hour daytime weekday pattern reveals employment location
- **Income Level:** Office building/neighborhood indicates salary range
- **Lifestyle:** Gym visits suggest health consciousness, disposable income

**Indirect Inferences:**

- **Commute Pattern:** Travel time/route between home and work
- **Transportation Mode:** Speed/path analysis reveals car vs. public transport
- **Social Status:** Restaurant choice indicates dining preferences and budget
- **Health Information:** Specific gym type (rehabilitation, luxury) reveals health status

**Temporal Inferences:**

- **Work Schedule:** Regular 9-5 pattern
- **Flexibility:** Ability to visit gym/restaurant during workday
- **Routine Predictability:** High regularity enables future location prediction

**Protection Mechanisms:**

**1. Temporal Obfuscation:**

```python
import random

# Add random delays to location reports (modifies each record in place)
def delay_location_report(locations):
    for loc in locations:
        delay = random.uniform(0, 1800)  # 0-30 min delay, in seconds
        loc['report_time'] = loc['actual_time'] + delay
```

**2. Spatial Generalization:**

```python
# Report neighborhood instead of exact address
def generalize_location(lat, lon):
    # Round to ~1km grid
    grid_size = 0.01  # degrees
    return round(lat/grid_size)*grid_size, round(lon/grid_size)*grid_size
```

**3. Activity Suppression:**

```python
# Selectively hide sensitive locations
sensitive_categories = ['medical', 'religious', 'adult']

def filter_sensitive_locations(locations):
    return [loc for loc in locations
            if loc['category'] not in sensitive_categories]
```

**4. Dummy Location Injection:**

```python
# Add fake locations to mask real patterns
def inject_dummy_locations(real_locations):
    # generate_plausible_locations is a placeholder for a dummy generator;
    # it is not defined in these notes
    dummies = generate_plausible_locations(real_locations)
    return real_locations + dummies
```

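The dummy-injection snippet calls `generate_plausible_locations` without defining it. One possible stand-in is sketched below: it scatters a few dummies at random offsets around each real point. The parameters `per_real` and `max_offset_m` are assumptions for illustration, and random scatter is only a rough proxy for plausibility, since realistic dummies would also need to follow roads and consistent movement speeds.

```python
import math
import random


def generate_plausible_locations(real_locations, per_real=2, max_offset_m=500, rng=None):
    """Hypothetical dummy generator: random points within max_offset_m of each real one."""
    rng = rng or random.Random()
    dummies = []
    for loc in real_locations:
        for _ in range(per_real):
            distance = rng.uniform(0, max_offset_m)
            bearing = rng.uniform(0, 2 * math.pi)
            # Convert the metric offset to degrees (1 degree of latitude ≈ 111km)
            dlat = distance * math.cos(bearing) / 111000
            dlon = distance * math.sin(bearing) / (111000 * math.cos(math.radians(loc['lat'])))
            dummy = dict(loc)  # copy any other fields from the real record
            dummy['lat'] = loc['lat'] + dlat
            dummy['lon'] = loc['lon'] + dlon
            dummies.append(dummy)
    return dummies


def inject_dummy_locations(real_locations, rng=None):
    rng = rng or random.Random()
    mixed = real_locations + generate_plausible_locations(real_locations, rng=rng)
    # Shuffle so that list position does not reveal which entries are real
    rng.shuffle(mixed)
    return mixed


real = [{'lat': 42.3601, 'lon': -71.0589}]
reported = inject_dummy_locations(real, rng=random.Random(0))
```

Shuffling matters here: appending dummies after the real points, as in the shorter snippet, would let an observer separate them by position alone.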
Question 5: Design a location privacy metric that balances privacy protection with service utility. What factors should it consider?

**Answer:**

**Comprehensive Location Privacy Metric (CLPM):**

**Components:**

**1. Spatial Privacy (SP):**

```
SP = log(Area_obfuscated / Area_minimum_required)
```

- Measures spatial uncertainty introduced
- Higher values = better privacy
- Normalized by service requirements

**2. Temporal Privacy (TP):**

```
TP = Delay_average / Update_frequency_required
```

- Measures temporal obfuscation level
- Accounts for service latency requirements

**3. Trajectory Unlinkability (TU):**

```
TU = 1 - (Correctly_linked_trajectories / Total_trajectories)
```

- Measures resistance to trajectory tracking
- Based on empirical linkage attacks

**4. Inference Resistance (IR):**

```
IR = 1 - max(Confidence_sensitive_inference)
```

- Measures protection against sensitive inferences
- Considers most successful inference attack

**Combined Metric:**

```
CLPM = α×SP + β×TP + γ×TU + δ×IR
```

Where α + β + γ + δ = 1 (weights based on application priorities)

**Utility Considerations:**

**1. Service Quality Degradation (SQD):**

```
SQD = (Utility_original - Utility_with_privacy) / Utility_original
```

**2. Query Accuracy (QA):**

```
QA = 1 - Average_error_in_location_queries
```

**Final Balanced Metric:**

```
Location_Privacy_Score = CLPM × (1 - λ×SQD) × QA
```

Where λ controls privacy-utility trade-off preference

**Factors Considered:**

- **Application Requirements:** Navigation needs accuracy, social media tolerates delays
- **User Preferences:** Privacy-conscious vs. convenience-focused users
- **Threat Model:** Casual observer vs. sophisticated attacker
- **Environmental Context:** Urban vs. rural, indoor vs. outdoor
- **Temporal Sensitivity:** Emergency vs. routine services

**Example Application:**

- For navigation service: Higher weight on spatial accuracy (low α), moderate temporal requirements (medium β)
- For social check-ins: Higher weight on inference resistance (high δ), relaxed spatial requirements (higher α acceptable)
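The CLPM formulas can be evaluated directly, as in the sketch below. It follows the formulas as written in the answer; the function names, parameter names, equal default weights, and sample inputs are assumptions chosen for illustration. Note that the four components are on different scales (SP is an unbounded logarithm, while TU and IR lie in [0, 1]), so a practical version would likely normalize them first.

```python
import math


def clpm(area_obfuscated, area_minimum_required,
         delay_average, update_frequency_required,
         correctly_linked, total_trajectories,
         max_inference_confidence,
         weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of SP, TP, TU and IR, following the formulas in the answer."""
    alpha, beta, gamma, delta = weights
    if not math.isclose(alpha + beta + gamma + delta, 1.0):
        raise ValueError("weights must sum to 1")
    sp = math.log(area_obfuscated / area_minimum_required)
    tp = delay_average / update_frequency_required
    tu = 1 - correctly_linked / total_trajectories
    ir = 1 - max_inference_confidence
    return alpha * sp + beta * tp + gamma * tu + delta * ir


def location_privacy_score(clpm_value, utility_original, utility_with_privacy,
                           query_accuracy, lam=0.5):
    """Discount the privacy metric by service degradation and query accuracy."""
    sqd = (utility_original - utility_with_privacy) / utility_original
    return clpm_value * (1 - lam * sqd) * query_accuracy


# Illustrative inputs only, not measurements
privacy = clpm(area_obfuscated=math.e, area_minimum_required=1.0,
               delay_average=30, update_frequency_required=60,
               correctly_linked=2, total_trajectories=10,
               max_inference_confidence=0.4)
score = location_privacy_score(privacy, utility_original=1.0,
                               utility_with_privacy=0.8, query_accuracy=0.9)
```

Changing `weights` is how the application priorities in the example are expressed, for instance a larger δ for social check-ins.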
