---
name: project-mcp-emaily
description: "MCP server \"emaily\" - full-text search over 9 mailboxes from a Microsoft Graph import (~268k emails)"
metadata:
node_type: memory
type: project
originSessionId: 49aa480f-6667-4832-b091-08f333c27872
---
MCP server `emaily` in [mcp_emaily.py](EmailsImport/mcp_emaily.py), registered in `U:\janssen\.mcp.json` as `emaily`.

**Architecture parallels [[project_mcp_soubory]]:**
- Mongo `emaily.<mailbox>` (from [parse_emails_graph_v1.4.py](Python-runner/parse_emails_graph_v1.4.py)) = body_html / body_text + headers + recipients + attachments[]
- PG `MongoEmaily.emails` (from [enrich_fulltext_emails_v1.1.py](Python-runner/enrich_fulltext_emails_v1.1.py)) = plain text + tsvector index

**Pipeline fixed 2026-06-03:** parser v1.3 stored plain-text emails ONLY as the first 2000 characters in body_preview and discarded the rest (17,672 emails, 6.2% of the corpus). v1.4 stores the full plain-text body in a new field, body_text. For old records: [refetch_text_bodies_v1.0.py](Python-runner/refetch_text_bodies_v1.0.py) walks Mongo and refetches from the Graph API only where both body_html and body_text are missing (about 80 min for 17.7k emails). Enrich v1.1 uses the fallback order body_html -> body_text -> body_preview.
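
The fallback order described above can be sketched roughly as follows. This is an illustration only, not the code in enrich_fulltext_emails_v1.1.py: the names `html_to_text` and `pick_body` are hypothetical, the standard-library `html.parser` stands in for BeautifulSoup so the sketch has no dependencies, and falling through when the HTML yields no text is an assumption.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup and collapse whitespace (stand-in for BeautifulSoup)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def pick_body(doc):
    """Return (source_field, text) using body_html -> body_text -> body_preview."""
    html = doc.get("body_html")
    if html:
        text = html_to_text(html)
        if text:
            return "body_html", text
    # Assumption: an HTML body with no extractable text falls through.
    for field in ("body_text", "body_preview"):
        value = (doc.get(field) or "").strip()
        if value:
            return field, value
    return None, ""
```

Returning the source field next to the text makes it easy to count how many records still rely on the truncated body_preview.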

**9 mailboxes, ~268k emails in total:**
vladimir.buzalka@buzalka.cz, vbuzalka@buzalka.cz, ordinace@buzalkova.cz, alica.buzalkova@buzalka.cz, mbuzalkova@buzalka.cz, jan.luxemburk@luxemburk.cz, vbuzalka@its.jnj.com, jarmila.kusinova@buzalka.cz, michaela.buzalkova@buzalka.cz

**The tsv index covers:** subject + sender_email + sender_name + to_addrs + cc_addrs + attachments_summary + body. So search also finds emails where the word appears only in the subject or in the sender's name.

**MCP tools:**
- `ping`, `list_mailboxes` - overview of the corpus
- `search(query, mailbox?, since?, until?, folder_contains?, sender_contains?, has_attachments?, limit)` - the MAIN full-text search (`websearch_to_tsquery`), returns snippets with `<<...>>` match markers
- `read_email(message_id, mailbox?, offset/length/around_match, include_html?)` - the full email, a slice, or a window around a match
- `by_sender(sender, mailbox?, since?, has_attachments?)` - regex on sender_email / sender_name
- `recent_emails(mailbox?, days, folder_contains?, has_attachments?)` - by received_at
- `conversation_thread(conversation_id)` - the whole Outlook thread in chronological order
- `find_attachment(name_contains, mailbox?, since?)` - search by attachment name
- `top_senders(mailbox?, since?)` - who emails me the most
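
As a rough illustration of how a tool like `search` could map its arguments onto PostgreSQL full-text search, here is a query builder. It is a sketch under assumptions, not the code in mcp_emaily.py: the helper name `build_search_sql`, the table name `emails`, the columns `folder`, `has_attachments`, `message_id` and `received_at`, the ordering by date, and the psycopg-style `%s` placeholders are all hypothetical. Only the `soubory` text search config, `websearch_to_tsquery` and the `<<...>>` markers come from this note.

```python
def build_search_sql(query, mailbox=None, since=None, until=None,
                     folder_contains=None, sender_contains=None,
                     has_attachments=None, limit=20):
    """Build a parametrized full-text query; returns (sql, params)."""
    where = ["tsv @@ websearch_to_tsquery('soubory', %s)"]
    params = [query]

    if mailbox is not None:
        where.append("mailbox = %s")
        params.append(mailbox)
    if since is not None:
        where.append("received_at >= %s")
        params.append(since)
    if until is not None:
        where.append("received_at < %s")
        params.append(until)
    if folder_contains is not None:
        where.append("folder ILIKE %s")
        params.append(f"%{folder_contains}%")
    if sender_contains is not None:
        where.append("(sender_email ILIKE %s OR sender_name ILIKE %s)")
        params.extend([f"%{sender_contains}%"] * 2)
    if has_attachments is not None:
        where.append("has_attachments = %s")
        params.append(bool(has_attachments))

    sql = (
        "SELECT message_id, mailbox, subject, received_at, "
        "ts_headline('soubory', body, websearch_to_tsquery('soubory', %s), "
        "'StartSel=<<, StopSel=>>') AS snippet "
        "FROM emails WHERE " + " AND ".join(where) +
        " ORDER BY received_at DESC LIMIT %s"
    )
    # The ts_headline placeholder appears first in the SQL text,
    # so the query string is passed once more at the front.
    return sql, [query] + params + [int(limit)]
```

The builder only assembles SQL and parameters, so it can be checked without a database. Wildcard characters in `folder_contains` and `sender_contains` are not escaped in this sketch.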

**Why:** Mongo already held the emails (body_html), and the user asked whether they had to be downloaded again. They do not. It was enough to convert the HTML to plain text with BeautifulSoup and index it in PG.

**How to apply:**
- Before the first full import of 268k emails, run: `python U:\janssen\EmailsImport\enrich_fulltext_emails_v1.0.py` (~80 min). For a test run, add `--limit 500 --mailbox X`.
- Shared TS config `soubory` (simple + unaccent), so search is diacritic-insensitive and case-insensitive.
- On a parser reset or change: bump `EXTRACTOR_VERSION` -> everything gets re-parsed.
- For questions like "what did X send / what is X sending", use `by_sender` instead of `search`: it is faster and avoids false matches in the body.
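
The `EXTRACTOR_VERSION` bump mentioned in the list above presumably works by comparing a version stamp stored on each record with the current constant. A minimal sketch of that idea follows. The field name `extractor_version`, the value `3`, and both helper functions are hypothetical and are not taken from the actual scripts.

```python
EXTRACTOR_VERSION = 3  # hypothetical current value; bump on parser changes


def needs_reparse(row, current_version=EXTRACTOR_VERSION):
    """True when the row is unstamped or was stamped by another version."""
    return row.get("extractor_version") != current_version


def select_for_reparse(rows, current_version=EXTRACTOR_VERSION):
    """Return message_ids of rows that the next run would re-parse."""
    return [r["message_id"] for r in rows if needs_reparse(r, current_version)]
```

With this scheme a bump invalidates every existing record at once, which matches the "everything gets re-parsed" behavior described above.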