Scrubbing Rules

How we scrub the public activity feed · Version 2.1 · Effective April 22, 2026

This is the written policy WatchLocal follows when publishing the agent-activity feed that appears at the bottom of our homepage. It exists so any client or prospect can verify, with receipts, what we publish, what we strip, and why we believe the feed proves our system works without identifying any single client.

The 10,000-client forcing function

This policy is designed to be correct on the day WatchLocal crosses 10,000 paying clients. Not the day we cross 3. Not the day we cross 50. The day we cross 10,000.

That constraint shapes every decision below. An earlier draft of this policy hardcoded a 51-entry trade list, a 25-entry action list, and a below_threshold: true flag that was meant to flip off later. All three would have become maintenance treadmills — at 10,000 clients we'd be amending the trade list monthly, the action list weekly, and either shipping threshold-flag code into production or scrambling to disable it. We do the work once, at the scale we intend to operate at.

The current design has three properties that survive 10,000 clients:

Closed axes are shaped by the work, not by the world. The 8 work-type categories don't grow — they describe what marketing work is. Surface (platform) and trade (client industry) are open strings because the world supplies those and we don't get to decide when a new platform launches a feature or when a new client signs up as "pool & spa service."
The client owns their own label. Trade names live in the client's own brand profile. When a new client joins as "pool & spa service," no edit to this policy is required. No schema change. The client owns it.
K-anonymity is continuous, not thresholded. No row appears in the feed unless at least k distinct clients contribute to that (category, surface, trade, city_state) bucket within the current rolling 30-day window. There is no flag to flip off. When the cohort grows, more rows appear. When it shrinks, rows disappear. The rule is the same at N=3 and at N=10,000.

If a future amendment adds any field that re-introduces a curated list, a threshold flag, or a client-specific carve-out, that amendment fails the discipline check and should be rejected.

1. Purpose

Define exactly which fields from raw agent logs may appear in the public agent-activity feed, which must be stripped, and which must be generalized. This is also the written policy WatchLocal points at when a client or prospect asks "how do you protect my business's data when you publish agent activity?"

Operating principle: the feed must prove the machine works without identifying any single client. If a reader can combine what we publish with any public data source to deanonymize a specific business, we have failed.

2. Scope

In scope: every record in the public JSON feed served from our infrastructure. The feed has two independent blocks: the per-client tasks array (governed by sections 3–8) and the platform by_category counts (governed by section 9). Violation handling (section 10) applies to both.

Out of scope: internal agent logs, internal dashboards, client-facing monthly visibility reports. Those can contain client names and details — they are not public.

Never in feed: any record whose source is a prospecting action (outreach SMS or email to non-clients). The feed only surfaces actions taken on behalf of paying clients. Prospecting activity is a different category and belongs elsewhere, or nowhere public.

3. The five published fields — and nothing else

Every row in the public feed has exactly these five fields. The aggregator produces rows that look like this and only like this.

Field	Type	Example
`category`	closed enum (8 values — see section 4)	`listing_optimization`
`surface`	free string, ≤ 80 chars	`Google Business Profile`
`trade`	free string, ≤ 60 chars	`HVAC contractor`
`city_state`	free string, ≤ 60 chars (format: `City, ST`)	`Celina, TX`
`ts`	ISO-8601 UTC, hour-rounded	`2026-04-22T14:00:00Z`

Plus two top-level metadata fields in the feed envelope (not per-row): generated_at (hour-rounded) and total/by-category counts.

Anything not listed above is stripped. The aggregator does not pass through an agent field, a business field, a phone field, or any free-text body. If a field isn't in this table, it doesn't appear.
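One way to make "anything not listed is stripped" true by construction is to build each public row from an allowlist instead of deleting keys known to be sensitive. The minimal Python sketch below assumes an upstream step has already resolved all five values; PUBLISHED_FIELDS and project_row are hypothetical names used for illustration.

```python
# Hypothetical allowlist: the only keys a public row may carry (section 3).
PUBLISHED_FIELDS = ("category", "surface", "trade", "city_state", "ts")


def project_row(candidate: dict) -> dict:
    """Copy only the five published fields out of a candidate record.

    Every other key (business, phone, agent, post_body, ...) is left behind.
    A missing published field raises KeyError, so the caller can drop the
    record to the violations log instead of emitting a partial row.
    """
    return {field: candidate[field] for field in PUBLISHED_FIELDS}


row = project_row({
    "category": "citation_building",
    "surface": "Yelp",
    "trade": "Plumber",
    "city_state": "Celina, TX",
    "ts": "2026-04-22T09:00:00Z",
    "business": "Celina Plumbers",  # internal-only, not copied
    "phone": "(972) 555-1234",      # internal-only, not copied
})
print(row)
```

Because the copy is driven by the allowlist, a field someone later adds to the internal logs stays out of the feed unless this policy and the list are amended together.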

4. The 8 per-client categories (closed enum)

This is the one hardcoded axis in the per-client feed. It describes the kinds of work the agents do. It's closed because the shape of SEO and local-marketing work is closed — new platforms come and go, new client trades come and go, but the fundamental operations fall into these eight buckets.

citation_building       — creating / updating NAP listings on directories
review_management       — requesting, routing, and logging customer reviews
listing_optimization    — updating GBP / Bing / Apple / Yelp listing fields (hours, services, photos, service areas)
content_publishing      — posting updates / photos / offers to listing or social surfaces
schema_markup           — adding or updating structured data on the client's site
technical_seo           — sitemap, redirects, canonicalization, page speed, indexing
reputation_response     — responding to reviews, Q&A, and community threads
seasonal_content        — seasonal campaigns, holiday hours, weather-driven service posts

If a raw action doesn't fit any of the 8 categories, the record is dropped and logged to the violations log (section 10). We do not invent a 9th category on the fly. Amendments to this list require written approval, a policy version bump, and a discipline check — and are reviewed skeptically, because the whole point of 8 is that the list doesn't grow.
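A closed category list maps directly onto a language-level enum. A minimal Python sketch, assuming some upstream step has already turned the raw action into a candidate category string (that mapping is not specified by this policy); Category and parse_category are hypothetical names:

```python
from enum import Enum
from typing import Optional


class Category(str, Enum):
    """The 8 per-client work categories (section 4). Closed: no runtime additions."""
    CITATION_BUILDING = "citation_building"
    REVIEW_MANAGEMENT = "review_management"
    LISTING_OPTIMIZATION = "listing_optimization"
    CONTENT_PUBLISHING = "content_publishing"
    SCHEMA_MARKUP = "schema_markup"
    TECHNICAL_SEO = "technical_seo"
    REPUTATION_RESPONSE = "reputation_response"
    SEASONAL_CONTENT = "seasonal_content"


def parse_category(value: str) -> Optional[Category]:
    """Map a candidate category string to the enum.

    Returns None for anything outside the 8 values; the caller then drops the
    record and logs category_not_in_enum. It never creates a 9th member.
    """
    try:
        return Category(value)
    except ValueError:
        return None
```

A Python Enum's members are fixed when the class is defined, so a 9th category can only arrive through a code change, which is where the amendment process can catch it.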

5. `surface` and `trade` — free strings, client-owned

5a. `surface`

The platform or channel on which the work was performed. Examples the aggregator emits today: Google Business Profile, Yelp, Nextdoor, Facebook Business, Instagram, Bing Places, Apple Maps, BBB, Angi, HomeAdvisor, Thumbtack, client website, client sitemap, schema.org.

No enum. When a platform ships a new feature and we start using it, the string just appears in the feed. When a new supported platform goes live, no schema change is needed. The aggregator enforces a maximum length of 80 characters as a sanity cap — beyond that, it's almost certainly a copy-paste of a description and gets dropped to violations.

5b. `trade`

The client's industry label, as the client wants it displayed. The client owns this value. It lives in the client's brand profile:

{
  "slug": "acme-hvac-celina",
  "display_trade": "HVAC contractor",
  ...
}

When a client is onboarded, display_trade is set once. The aggregator reads it and emits it verbatim. If the client later says "show us as 'heating & cooling' instead," they edit their brand profile — no code change, no policy amendment, no coordination with shared infrastructure.

Sanity cap: maximum length 60 characters. If the string is empty or malformed, the record is dropped to violations.

5c. Why free strings work at 10,000 clients

A single closed enum for trades is a maintenance treadmill that breaks under load: every new client onboarded in a category we haven't seen before would require a policy edit, a schema edit, and a deploy. Free strings plus client-owned labels move the responsibility to where it belongs (the client) and remove the central bottleneck.

The risk of free strings is that a client could type something revealing ("Joe's One-Truck Plumbing" instead of "Plumber"). Three controls mitigate this:

display_trade is labeled on the onboarding form: "Industry or trade only. Do NOT include your business name, your city, or your phone number."
The aggregator runs a regex check before emitting: reject any trade value containing digits, @, http, or more than 3 consecutive capitalized words — all proxies for "this is a business name, not a trade."
The continuous k-anonymity rule (section 6) guarantees no row appears unless at least k distinct clients share the exact (category, surface, trade, city) bucket. A one-off bad label can't leak through because it won't cluster.
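The length caps and the trade check can be expressed as small validators that return a section 10 violation code or None. This is a sketch, not the aggregator's actual regex: the policy does not say how words are tokenized or whether "http" is matched case-insensitively, so both choices below are assumptions, as is the exact pattern used for the "City, ST" format. Function names are hypothetical.

```python
import re
from typing import Optional

SURFACE_MAX = 80      # section 5a sanity cap
TRADE_MAX = 60        # section 5b sanity cap
CITY_STATE_MAX = 60   # section 3 table

# Assumed reading of the trade regex: any digit, "@", or "http" (case-insensitive).
_TRADE_BLOCKLIST = re.compile(r"[0-9@]|http", re.IGNORECASE)
# Assumed reading of the "City, ST" format from the section 3 table.
_CITY_STATE = re.compile(r"^[^,]+, [A-Z]{2}$")


def _longest_capitalized_run(text: str) -> int:
    """Longest run of consecutive whitespace-separated words that start with
    an uppercase letter. Whitespace tokenization is an assumption."""
    longest = current = 0
    for word in text.split():
        current = current + 1 if word[0].isupper() else 0
        longest = max(longest, current)
    return longest


def check_surface(surface: str) -> Optional[str]:
    """Return a violation code, or None if the surface string may be published."""
    if not surface.strip():
        return "surface_empty"
    if len(surface) > SURFACE_MAX:
        return "surface_too_long"
    return None


def check_trade(trade: str) -> Optional[str]:
    """Return a violation code, or None if display_trade may be published."""
    if not trade.strip():
        return "trade_empty"
    if len(trade) > TRADE_MAX:
        return "trade_too_long"
    if _TRADE_BLOCKLIST.search(trade) or _longest_capitalized_run(trade) > 3:
        return "trade_regex_fail"  # looks like a business name, not a trade
    return None


def check_city_state(city_state: str) -> Optional[str]:
    """Return a violation code, or None if city_state matches "City, ST"."""
    if not city_state.strip():
        return "city_state_missing"
    if len(city_state) > CITY_STATE_MAX or not _CITY_STATE.match(city_state):
        return "city_state_malformed"
    return None
```

Under whitespace tokenization, "Joe's One-Truck Plumbing" has only three capitalized words and passes the regex. A label like that is what the third control, continuous k-anonymity, is there to catch, since it will not cluster with other clients.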

6. Continuous k-anonymity

Rule: before emitting any row, the aggregator groups the last 30 days of scrubbed activity records by the tuple (category, surface, trade, city_state). Only buckets with at least k = 3 distinct client slugs contributing are emitted. The aggregator publishes the latest timestamped record from each qualifying bucket.

Generalization is not automatic. Earlier drafts quietly bumped Pediatric dentist to Dentist to force a match. The current design does not — if a bucket doesn't meet k, the row is simply not published. Generalization is the client's own choice via display_trade (a solo pediatric-dentist client can set display_trade to "Dentist" if they want broader coverage; that's their call, not the aggregator's).

The k parameter. k = 3 at launch. If an analysis at N=500 or N=5,000 clients shows k=3 is too loose (specific buckets deanonymize under re-identification attacks), we raise k globally. We do not add per-category carve-outs.

City widening is NOT performed by the aggregator. Earlier drafts had a fallback that dropped city and published state-only rows. This was removed. If (category, surface, trade, city_state) doesn't meet k, nothing is emitted for that bucket. State-level rows create deanonymization risk when combined with external data (e.g., if only one Texas HVAC contractor is doing schema markup, a state-level row points straight at that client). Simpler and safer: don't publish unless the full tuple meets k.
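The grouping rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each scrubbed row arrives paired with the internal client slug that produced it (the slug is used only for counting and is never emitted); k_anonymous_rows is a hypothetical name, while the 30-day window and k = 3 come from this section.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

K = 3                        # section 6: k = 3 at launch; raised globally, never per category
WINDOW = timedelta(days=30)  # rolling window the buckets are computed over


def _parse_ts(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def k_anonymous_rows(records: Iterable[Tuple[str, dict]],
                     now: Optional[datetime] = None,
                     k: int = K) -> List[dict]:
    """records: (client_slug, scrubbed_row) pairs; each row has the five
    published fields and an ISO-8601 UTC 'ts'.

    Groups rows from the last 30 days by (category, surface, trade, city_state),
    keeps only buckets with at least k DISTINCT client slugs, and returns the
    latest-timestamped row from each surviving bucket. Buckets below k are
    omitted outright: no generalization of trade, no widening to state level.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - WINDOW
    clients = defaultdict(set)   # bucket -> distinct client slugs
    latest = {}                  # bucket -> (timestamp, row)
    for slug, row in records:
        ts = _parse_ts(row["ts"])
        if not (cutoff <= ts <= now):
            continue
        bucket = (row["category"], row["surface"], row["trade"], row["city_state"])
        clients[bucket].add(slug)
        if bucket not in latest or ts > latest[bucket][0]:
            latest[bucket] = (ts, row)
    return [row for bucket, (_, row) in latest.items() if len(clients[bucket]) >= k]
```

Distinct slugs are counted, not rows: one busy client acting ten times in a bucket still counts once, which is what stops a single client from clearing the bar alone.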

7. Timestamp handling

Round to the hour: 2026-04-22T14:23:07Z becomes 2026-04-22T14:00:00Z. Drop minutes and seconds before the record enters the feed.
Do not expose more than 30 rolling days of history in the public feed, even if older data exists internally.
generated_at at the feed envelope level is also hour-rounded.
Exact second and minute timestamps are a re-identification surface (they can be cross-referenced with a client's own social-post timestamps); hour rounding breaks that link.
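Judging from the examples in section 8 (16:47:11 becomes 16:00:00 in example 5), "round to the hour" means truncating to the start of the hour. A minimal Python sketch of that and of the 30-day cutoff, assuming timestamps carry a Z or an explicit UTC offset; the function names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

PUBLIC_WINDOW = timedelta(days=30)


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp with a 'Z' or explicit offset into aware UTC.
    Naive timestamps are rejected (they would be ts_unparseable)."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {ts!r}")
    return dt.astimezone(timezone.utc)


def hour_round(ts: str) -> str:
    """Drop minutes and seconds: '2026-04-22T14:23:07Z' -> '2026-04-22T14:00:00Z'."""
    dt = parse_utc(ts).replace(minute=0, second=0, microsecond=0)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def in_public_window(ts: str, now: datetime) -> bool:
    """True if ts falls inside the rolling 30-day public window ending at now."""
    return now - PUBLIC_WINDOW <= parse_utc(ts) <= now
```

generated_at can go through the same function, e.g. hour_round(datetime.now(timezone.utc).isoformat()).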

8. Twelve concrete before/after examples

Raw = what lives in internal logs. Scrubbed = what the aggregator emits into the public JSON feed (assuming k-anonymity passes; the row is dropped entirely if it doesn't).

1. citation_building — NAP listing created for a plumber

Raw: {"business":"Celina Plumbers","phone":"(972) 555-1234","action":"Yelp citation created","ts":"2026-04-22T09:15:42Z"}

display_trade: "Plumber"

Scrubbed: {"category":"citation_building","surface":"Yelp","trade":"Plumber","city_state":"Celina, TX","ts":"2026-04-22T09:00:00Z"}

2. review_management — review request SMS to an existing customer (not prospecting)

Raw: {"business":"Mike's HVAC of Celina","action":"review request SMS sent","recipient":"+19725551234","customer_status":"active_client","ts":"2026-04-22T14:23:07Z"}

display_trade: "HVAC contractor"

Scrubbed: {"category":"review_management","surface":"SMS","trade":"HVAC contractor","city_state":"Celina, TX","ts":"2026-04-22T14:00:00Z"}

3. listing_optimization — GBP service-area update

Raw: {"business":"A+ Garage Door of Prosper","action":"GBP service area updated","platform":"Google Business Profile","ts":"2026-04-22T08:00:00Z"}

display_trade: "Garage door specialist"

Scrubbed: {"category":"listing_optimization","surface":"Google Business Profile","trade":"Garage door specialist","city_state":"Prosper, TX","ts":"2026-04-22T08:00:00Z"}

4. content_publishing — Nextdoor post with free-text body stripped

Raw: {"business":"Little Elm Pediatric Dentistry","action":"Nextdoor post published","post_body":"Dr. Kim reminds parents of back-to-school fluoride checkups!","ts":"2026-04-22T19:00:00Z"}

display_trade: "Pediatric dentist" — if only 1–2 pediatric-dentist clients in Little Elm, k-anonymity fails and the row is dropped entirely, not generalized

Scrubbed (if k ≥ 3): {"category":"content_publishing","surface":"Nextdoor","trade":"Pediatric dentist","city_state":"Little Elm, TX","ts":"2026-04-22T19:00:00Z"} — post_body stripped

5. schema_markup — LocalBusiness markup added to client site

Raw: {"business":"John Smith Roofing LLC","contact_email":"[email protected]","action":"schema.org LocalBusiness markup added","ts":"2026-04-22T16:47:11Z"}

display_trade: "Roofer"

Scrubbed: {"category":"schema_markup","surface":"client website","trade":"Roofer","city_state":"Frisco, TX","ts":"2026-04-22T16:00:00Z"} — owner name and email stripped

6. technical_seo — sitemap refresh

Raw: {"business":"Wylie Wellness Chiropractic","action":"sitemap.xml regenerated","pages_added":12,"ts":"2026-04-22T15:20:00Z"}

display_trade: "Chiropractor"

Scrubbed: {"category":"technical_seo","surface":"client sitemap","trade":"Chiropractor","city_state":"Wylie, TX","ts":"2026-04-22T15:00:00Z"} — pages_added dropped (not a published field)

7. reputation_response — GBP review response (bodies stripped)

Raw: {"business":"Frisco Dental Group","action":"GBP review response sent","review_text":"Dr. Johnson was great with my daughter","response_text":"Thanks Susan — we loved having Emma!","ts":"2026-04-22T11:30:00Z"}

display_trade: "Dentist"

Scrubbed: {"category":"reputation_response","surface":"Google Business Profile","trade":"Dentist","city_state":"Frisco, TX","ts":"2026-04-22T11:00:00Z"} — review_text and response_text both dropped

8. seasonal_content — weather-driven HVAC post

Raw: {"business":"McKinney Cooling & Heating","action":"GBP post published","post_body":"Summer heatwave tune-up special","theme":"seasonal","ts":"2026-04-22T10:00:00Z"}

display_trade: "HVAC contractor"

Scrubbed: {"category":"seasonal_content","surface":"Google Business Profile","trade":"HVAC contractor","city_state":"McKinney, TX","ts":"2026-04-22T10:00:00Z"} — post_body and theme dropped; category chosen over generic content_publishing because the raw theme field tagged it seasonal

9. Prospecting record — DROP (never in feed)

Raw: {"business":"Allen Auto Works","action":"outreach SMS sent","recipient_number":"(214) 555-9999","customer_status":"prospect","ts":"2026-04-22T17:44:00Z"}

Scrubbed: record dropped — prospecting is out of scope per section 2; logged to violations with reason out_of_scope_prospecting

10. listing_optimization — Apple Maps update where display_trade is unusual

Raw: {"business":"Diana's Custom Koi Pond Installations","action":"Apple Maps listing hours updated","ts":"2026-04-22T13:05:22Z"}

display_trade: "Pond & water feature contractor" — client's choice, not aggregator-forced

Scrubbed (if k ≥ 3): {"category":"listing_optimization","surface":"Apple Maps","trade":"Pond & water feature contractor","city_state":"Plano, TX","ts":"2026-04-22T13:00:00Z"}

More likely at low N: row dropped because the bucket doesn't meet k=3. Not generalized. Client has the option to change display_trade to "Landscape contractor" to widen their cohort.

11. schema_markup — FAQ schema

Raw: {"business":"The Colony Physical Therapy","action":"FAQ schema added to /faqs","ts":"2026-04-22T18:00:00Z"}

display_trade: "Physical therapist"

Scrubbed: {"category":"schema_markup","surface":"client website","trade":"Physical therapist","city_state":"The Colony, TX","ts":"2026-04-22T18:00:00Z"}

12. review_management — review request email batch

Raw: {"business":"Anna's Bakery Frisco","action":"review request email sent","recipients":14,"customer_status":"active_client","ts":"2026-04-22T07:30:00Z"}

display_trade: "Bakery"

Scrubbed: {"category":"review_management","surface":"email","trade":"Bakery","city_state":"Frisco, TX","ts":"2026-04-22T07:00:00Z"} — recipients count dropped (not a published field)
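The examples above can be reproduced end to end with a short function. This Python sketch walks example 7 through it. It assumes category and surface have already been resolved from the raw action by some upstream step, and that city_state lives in the client's brand profile next to display_trade (the profile excerpt in section 5b elides its other fields). scrub is a hypothetical name, and the row it returns still has to clear the k-anonymity filter.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def scrub(raw: dict, profile: dict, category: str, surface: str
          ) -> Tuple[Optional[dict], Optional[str]]:
    """Turn one internal log record into a candidate public row.

    raw      -- internal record; only 'customer_status' and 'ts' are read
    profile  -- client brand profile; display_trade comes from here, and
                city_state is ASSUMED to be stored here as well
    category, surface -- ASSUMED resolved upstream from raw['action']
    Returns (row, None), or (None, violation_code) when the record is dropped.
    """
    if raw.get("customer_status") == "prospect":
        return None, "out_of_scope_prospecting"  # never in feed (section 2)
    try:
        dt = datetime.fromisoformat(raw["ts"].replace("Z", "+00:00"))
    except (KeyError, ValueError):
        return None, "ts_unparseable"
    if dt.tzinfo is None:
        return None, "ts_unparseable"  # no UTC offset, cannot place it in UTC
    hour = dt.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    # Only the hour-rounded timestamp is taken from raw; business names,
    # bodies, phone numbers and counts are never copied.
    return {
        "category": category,
        "surface": surface,
        "trade": profile["display_trade"],
        "city_state": profile["city_state"],
        "ts": hour.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }, None


# Example 7 from above.
raw = {
    "business": "Frisco Dental Group",
    "action": "GBP review response sent",
    "review_text": "Dr. Johnson was great with my daughter",
    "response_text": "Thanks Susan, we loved having Emma!",
    "ts": "2026-04-22T11:30:00Z",
}
profile = {"display_trade": "Dentist", "city_state": "Frisco, TX"}
row, err = scrub(raw, profile, "reputation_response", "Google Business Profile")
print(row)
```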

9. The platform activity block (counts only)

Why this block exists. Sections 3–8 above govern the per-client tasks array: individual marketing actions taken on behalf of paying clients, scrubbed to 5 fields, gated by continuous k-anonymity. That rule is correct at 10,000 clients and at 3 clients, but at 3 clients a row qualifies only if all three share the exact same (category, surface, trade, city_state) bucket, so in practice the per-client feed is usually, and correctly, empty. Meanwhile the system is still doing real work every day (audits, market research, technical scans, listing surveys) across the entire public web. Section 9 exists so that work can be published honestly, in a way that carries zero re-identification risk and therefore needs no anonymity math.

9a. Scope

In scope for the platform block: aggregate counts of work performed by our agents that is not tied to any specific paying client — public-web audits, market-ranking reports, directory surveys, technical-health scans, content scheduled for public distribution, platform-infrastructure and cost events.

Out of scope for the platform block (same as section 2):

Any outreach event to a specific recipient (SMS or email to a prospect). The individual send is out-of-scope; the aggregate count of outreach sends is also out-of-scope because that number signals commercial intent and is competitively sensitive.
Any per-business row, even without a name. If the shape of a row would allow cross-reference with public data to identify who was scraped or audited, it doesn't go in.
Anything that would require a per-client display_trade or city_state lookup to render — those belong in the per-client tasks array, not here.

9b. What gets published — counts only

The platform block in the feed is a flat structure:

"platform": {
  "total_30d": 467,
  "by_category": {
    "business_audits":     111,
    "market_intelligence":  70,
    "content_publishing":  105,
    "technical_scans":      78,
    "ranking_reports":      84,
    "listing_scrapes":      19,
    "ops_events":            0
  }
}

No timestamps per action, no surfaces, no cities, no trades, no business identifiers, no row array. That absence is what removes re-identification risk: a bare integer carries no identification signal regardless of scale.

9c. The 7 platform categories (closed enum)

Parallel to the 8-value per-client enum but independent of it. Shaped by the work the fleet does at platform scale, not by trade or platform.

business_audits       — comprehensive multi-dimension audits across the public web
market_intelligence   — market-ranking and gap analyses
content_publishing    — social and listing posts scheduled or published
technical_scans       — security and technical-health scans
ranking_reports       — local-ranking reports
listing_scrapes       — directory surveys to identify market opportunities (survey runs, NOT outreach events)
ops_events            — platform ops and infrastructure cost events

A new category requires written approval, a policy version bump, and a discipline check. Same discipline as section 4: the shape does not grow with the world.

9d. What sources feed these counts

The aggregator counts records produced by our agents over the last rolling 30 days, broken down by category:

Category	Source (abstract)
`business_audits`	Multi-dimension audit run outputs (one record per completed audit)
`market_intelligence`	Market-gap and competitive-ranking analyses (one record per completed analysis)
`content_publishing`	Scheduled and published post records (one record per post)
`technical_scans`	Security-alert and technical-health scan outputs (one record per scan run)
`ranking_reports`	Local-ranking report outputs (one record per report)
`listing_scrapes`	Directory-survey run outputs — survey runs, NOT any individual outreach SMS or email that may follow, which remain out of scope per section 2
`ops_events`	Platform operations and cost event records

If a source category produces zero records in the window, its count is 0 and the feed reflects that honestly. No source list, no client list, and no thresholds are hardcoded — as coverage expands, counts grow automatically.
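The counts-only block can be assembled from nothing more than (category, completion time) pairs. A minimal Python sketch, assuming each source emits one such pair per completed unit of work as described in the table above; platform_block is a hypothetical name:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Section 9c: the 7 platform categories (closed enum), in feed order.
PLATFORM_CATEGORIES = (
    "business_audits", "market_intelligence", "content_publishing",
    "technical_scans", "ranking_reports", "listing_scrapes", "ops_events",
)
WINDOW = timedelta(days=30)


def platform_block(records: Iterable[Tuple[str, datetime]],
                   now: Optional[datetime] = None) -> Tuple[dict, List[str]]:
    """records: (category, completed_at) pairs, one per completed audit,
    analysis, post, scan, report, survey run, or ops event; completed_at must
    be timezone-aware.

    Returns the counts-only block plus the categories rejected as out of
    scope (to be logged as platform_out_of_scope). Every category is always
    present, so an idle source shows 0 rather than disappearing.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - WINDOW
    counts: Counter = Counter()
    rejected: List[str] = []
    for category, completed_at in records:
        if category not in PLATFORM_CATEGORIES:
            rejected.append(category)
        elif cutoff <= completed_at <= now:
            counts[category] += 1
    by_category = {cat: counts[cat] for cat in PLATFORM_CATEGORIES}
    block = {"platform": {"total_30d": sum(by_category.values()),
                          "by_category": by_category}}
    return block, rejected
```

Since the function only ever emits integers keyed by the seven fixed category names, there is no path by which a timestamp, surface, or business name could reach the platform block.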

9e. Discipline check

The platform block does not introduce a curated list, a threshold flag, or a client-specific carve-out:

Closed enum of 7 platform categories: shaped by work types, same rationale as the 8-value per-client enum. Does not grow with scale.
No threshold flag: there is no below-N fallback, no per-category toggle. If a source is empty, its count is 0 and the feed reflects that.
No client carve-out: this section explicitly publishes nothing per-client. The only client-related block in the feed remains the tasks array, governed by sections 3–8 unchanged.

10. Violation handling

The aggregator must never silently pass through anything that fails a rule above. On failure:

Drop the record from the feed.
Write a line to an internal violations log containing the raw record plus the rule that was violated. Violation codes:
- category_not_in_enum — raw action didn't map to any of the 8 categories
- surface_empty or surface_too_long
- trade_empty, trade_too_long, or trade_regex_fail (looks like a business name, not a trade)
- city_state_missing or city_state_malformed
- ts_unparseable
- k_anonymity_fail — bucket had fewer than k distinct clients (not an error, just a filter; logged at debug level, not violation level)
- out_of_scope_prospecting — record was a prospecting action
- platform_out_of_scope — source did not match any source type listed in the section 9d table
- platform_outreach_leak — source appeared to be an outreach send (recipient phone, email, or SMS body present)
Never auto-expand the category enums — a new category requires a human-approved amendment to this document.
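The drop-and-log behavior can be sketched with Python's standard logging module. The codes below are the ones listed above; the logger name, the WARNING level for non-debug codes, and the JSON line format are assumptions.

```python
import json
import logging
from enum import Enum
from typing import Tuple

log = logging.getLogger("feed.violations")  # hypothetical logger name


class Violation(str, Enum):
    """Violation codes from section 10."""
    CATEGORY_NOT_IN_ENUM = "category_not_in_enum"
    SURFACE_EMPTY = "surface_empty"
    SURFACE_TOO_LONG = "surface_too_long"
    TRADE_EMPTY = "trade_empty"
    TRADE_TOO_LONG = "trade_too_long"
    TRADE_REGEX_FAIL = "trade_regex_fail"
    CITY_STATE_MISSING = "city_state_missing"
    CITY_STATE_MALFORMED = "city_state_malformed"
    TS_UNPARSEABLE = "ts_unparseable"
    K_ANONYMITY_FAIL = "k_anonymity_fail"
    OUT_OF_SCOPE_PROSPECTING = "out_of_scope_prospecting"
    PLATFORM_OUT_OF_SCOPE = "platform_out_of_scope"
    PLATFORM_OUTREACH_LEAK = "platform_outreach_leak"


def record_violation(raw: dict, code: Violation) -> Tuple[int, str]:
    """Log the dropped raw record plus the rule it broke (internal log only).

    k_anonymity_fail is a filter rather than an error, so it is logged at
    DEBUG; every other code is logged at WARNING. Returns (level, line) so
    callers and tests can inspect what was written.
    """
    level = logging.DEBUG if code is Violation.K_ANONYMITY_FAIL else logging.WARNING
    line = json.dumps({"rule": code.value, "raw": raw}, default=str)
    log.log(level, line)
    return level, line
```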

The violation log is reviewed at the end of each internal review cycle. If the non-debug violation rate exceeds 1% of records for two consecutive cycles, the aggregator auto-disables publishing and alerts the team.
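The 1%, two-cycle rule is a small state machine. A minimal Python sketch, assuming the caller supplies per-cycle totals (all records processed, and violations excluding debug-level k_anonymity_fail entries); PublishingBreaker and its alert hook are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_VIOLATION_RATE = 0.01  # 1% of records
CONSECUTIVE_CYCLES = 2


@dataclass
class PublishingBreaker:
    """Auto-disables publishing after two consecutive review cycles whose
    non-debug violation rate exceeds 1%.

    Once tripped it stays off; turning publishing back on is assumed to be a
    human decision. alert is a hypothetical hook for notifying the team.
    """
    alert: Callable[[str], None] = print
    publishing_enabled: bool = True
    over_limit: List[bool] = field(default_factory=list)

    def end_cycle(self, total_records: int, non_debug_violations: int) -> bool:
        """Record one cycle's counts and return whether publishing is still on."""
        rate = non_debug_violations / total_records if total_records else 0.0
        self.over_limit.append(rate > MAX_VIOLATION_RATE)
        recent = self.over_limit[-CONSECUTIVE_CYCLES:]
        if self.publishing_enabled and len(recent) == CONSECUTIVE_CYCLES and all(recent):
            self.publishing_enabled = False
            self.alert(f"feed publishing disabled: violation rate {rate:.2%}")
        return self.publishing_enabled
```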

11. Amendment process

Changes to this document require:

Written founder approval (commit reviewed in a dedicated PR).
A change-log entry explaining the why.
A corresponding schema update if field names change.
A discipline check: does the change introduce a curated list, a threshold flag, or a client-specific carve-out? If yes, the amendment is rejected by default. Strong cause required.

The feed's JSON schema is the enforceable artifact: at runtime the aggregator reads this document's policy constants from that schema and logs which schema version it ran against.
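If the JSON schema is the enforceable artifact, the five-field rule, the caps, and the category enum can all live in it, and the aggregator can derive its constants from it rather than keeping its own copies. A minimal sketch expressed as a Python dict in JSON Schema draft 2020-12 form; the version string, the title, the City, ST pattern, and load_policy_constants are assumptions for illustration.

```python
import logging

log = logging.getLogger("feed.aggregator")  # hypothetical logger name

FEED_SCHEMA_VERSION = "2.1"  # assumption: schema version tracks the policy version

# JSON Schema (draft 2020-12) for one row of the public tasks array.
ROW_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "WatchLocal public feed row",
    "type": "object",
    "additionalProperties": False,  # the five fields and nothing else
    "required": ["category", "surface", "trade", "city_state", "ts"],
    "properties": {
        "category": {"enum": [
            "citation_building", "review_management", "listing_optimization",
            "content_publishing", "schema_markup", "technical_seo",
            "reputation_response", "seasonal_content",
        ]},
        "surface": {"type": "string", "minLength": 1, "maxLength": 80},
        "trade": {"type": "string", "minLength": 1, "maxLength": 60},
        "city_state": {"type": "string", "maxLength": 60,
                       "pattern": "^[^,]+, [A-Z]{2}$"},
        "ts": {"type": "string",
               "pattern": r"^\d{4}-\d{2}-\d{2}T\d{2}:00:00Z$"},  # hour-rounded UTC
    },
}


def load_policy_constants(schema: dict, version: str) -> dict:
    """Derive the aggregator's runtime constants from the schema, not from
    hardcoded copies, and log which schema version this run uses."""
    props = schema["properties"]
    log.info("aggregator using feed schema v%s", version)
    return {
        "published_fields": tuple(schema["required"]),
        "categories": frozenset(props["category"]["enum"]),
        "surface_max": props["surface"]["maxLength"],
        "trade_max": props["trade"]["maxLength"],
        "city_state_max": props["city_state"]["maxLength"],
    }
```

A standard JSON Schema validator, such as the third-party jsonschema package for Python, can then check each emitted row against ROW_SCHEMA; additionalProperties: false is what rejects any sixth field.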

12. What changed from the earlier draft

Concern	Earlier draft	Current (v2.1)
Work-type axis	`action` — 25-entry closed enum	`category` — 8-entry closed enum
Platform axis	`platform` — implicit closed list	`surface` — free string, ≤ 80 chars
Industry axis	`trade` — 51-entry closed enum maintained centrally	`trade` — free string, ≤ 60 chars, owned by client
Small-cohort handling	Below-threshold flag, expected to flip later	Continuous k-anonymity — row simply doesn't appear unless k ≥ 3
Generalization	Aggregator auto-bumped `Pediatric dentist` to `Dentist`	Client chooses their own `display_trade`
City widening	`Dentist, TX` fallback when `(trade, city)` cohort too small	Removed — state-only rows too re-identifiable
New client or platform	Policy amendment + schema amendment + deploy	No coordination — strings just appear
Agent identity in feed	Agent identifier published	Dropped — redundant with category, and reduces fingerprinting surface

Contact

Questions about this policy, or want receipts for any specific claim? Email the founder at [email protected].