2023-08-09 project diary entry

Project futures; planning notes:

  • focus on extracting topics from documents and collections of documents
  • use more than one framework (e.g., langchain, llama-index, ?)
  • use OpenAI and other models, including huggingface pipelines, gpt4all, ?
  • explore value of organizing aipraxisLab repository along model or framework axes (?)
  • set up three different document collections for testing
  • keep notes on Python setup details (venv, pip, what else?)
  • wrap up individual experiments with notes and possibly move them into separate folders
  • keep the lab benches clean: limit the number of open exp'ts to three or fewer at a time (a perfect use for a Kanban board)

Suggestion from Peter Kaminsky on the Massive Wiki Wednesday 2023-08-09 call:

Idea for Category / Topic Mapping for Articles

  • have ChatGPT make a list of categories or topics
    • then have it make sub-categories and sub-sub-categories, etc. as desired
  • have ChatGPT synthesize articles for the leaf categories
  • generate embeddings for the synthetic articles
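  • generate embeddings for the real articles and match them against the synthetic-article embeddings in a vector database, so each real article lands in its nearest category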
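
A rough sketch of the matching step in the idea above, in Python. Everything here is an assumption for illustration: the category names and synthetic article texts are hard-coded placeholders standing in for ChatGPT output, the "embedding" is a toy word-count vector rather than a real embedding model, and a plain dict stands in for the vector database. Only the overall shape (embed synthetic leaf articles, embed a real article, pick the nearest) follows the notes.

```python
import math
import re
from collections import Counter

# Hypothetical placeholders for ChatGPT-written articles, one per leaf category.
SYNTHETIC_ARTICLES = {
    "cooking/baking": "bread dough yeast oven flour bake loaf crust",
    "cooking/grilling": "grill charcoal steak smoke barbecue flame",
    "computing/databases": "database index query vector table storage search",
}


def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(synthetic_articles):
    """Stand-in for a vector database: category name -> embedding."""
    return {cat: embed(text) for cat, text in synthetic_articles.items()}


def best_category(article_text, index):
    """Return the leaf category whose synthetic article is most similar."""
    query = embed(article_text)
    return max(index, key=lambda cat: cosine(query, index[cat]))


index = build_index(SYNTHETIC_ARTICLES)
real_article = "We store each vector in a database and query the index."
print(best_category(real_article, index))
```

To try this with real components, embed() would be swapped for calls to an embedding model and build_index() for inserts into an actual vector store; the matching logic should stay the same.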

