I’ve learned of an interesting new type of text-extraction and query AI. These services seem to have become public during March / Easter and I’d not previously been aware of them. You upload a single PDF and an AI then auto-tags it, segments it, summarises the segments, and builds cross-page links for its various topics and facts, etc. Probably more, ‘under the hood’.

All this is done in order to make the PDF more searchable in the form of “chat”. After upload and analysis you can “chat” with the uploaded PDF, by asking it natural-language questions. The results are natural-language answers, with links to the relevant page numbers. This means you can check that the AI isn’t getting it wrong (as such AIs often do) due to dodgy ‘facts’ in the model inputs and/or confabulation when forming the reply.
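For the curious, the ‘answer plus page numbers’ idea can be shown with a toy Python sketch. I don’t know how any of these services work internally, so this is only an assumption-laden illustration: it does plain keyword counting on text that has already been split into pages, with no AI involved. The names `tokenize`, `rank_pages` and `STOPWORDS`, and the sample pages, are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"what", "is", "said", "about", "the", "a", "of", "and", "to", "in"}

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def rank_pages(pages, question, top_n=2):
    """Return (page_number, score) pairs for the pages that best match the question.

    The score is just how many times the question's non-stopword terms
    occur on the page. Page numbers start at 1.
    """
    q_terms = set(tokenize(question)) - STOPWORDS
    scored = []
    for page_number, text in enumerate(pages, start=1):
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in q_terms)
        if score > 0:
            scored.append((page_number, score))
    # Highest score first; ties go to the earlier page.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]

# Invented sample 'pages' standing in for text extracted from a PDF.
pages = [
    "The index lists names and places.",
    "Chapter one discusses the harbour town and its old harbour wall.",
    "Chapter two returns to the harbour at night.",
]
hits = rank_pages(pages, "What is said about the harbour?")
print(hits)
```

Here the question would point the reader to pages 2 and 3, which is the checkable part: whatever the reply says, you can open those pages and see for yourself.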

Such things are exemplified by the likes of ChatPDF, Humata, Unriddle, Docu-Talk and Docalysis. Doubtless Microsoft is also ‘on the case’ with this sort of thing, if they don’t already have it in Office 365. I no longer have access to 365, and it’s difficult to find a good overview explaining their vast range of new AI assistants.

Anyway, the new assistants are perhaps useful for those who want to plump up a traditional back-of-the-book index, and be sure they’ve not missed anything. Doubtless you’ll think of other uses.

As usual with such services, you don’t know where the PDFs or the questions are going after they hit the remote servers in Whereizitagin. So sending PDFs or asking questions that could reveal business or research secrets is not advisable. But I imagine that this sort of ‘one-book analysis’ is not too processor-intensive, so doubtless there will be local non-cloud versions soon enough. If there aren’t already.

But I also wonder what would happen if one uploaded a single-file PDF of the collected fiction (or even letters or essays) of H.P. Lovecraft. To what extent would it be like ‘talking’ with Lovecraft, and how original would it seem? In other words, would it be doing at least a minimum of comparing statements across disparate pages, then bringing them together in a way that offers a more powerful insight into the topic in question? And could a further ‘style model’ be built from the PDF, so that the replies are given in a Lovecraftian manner?

Meanwhile, a second fully and properly ‘open source’ chat AI has been released, OpenAssistant. The first was OpenChatKit a month ago.