Tuesday, January 30, 2024

Is the Path to AI/ML Commercial Success Through Curated, Walled-Garden Data Sources?

A lot has been written about hallucinations and the growing number of erroneous results coming from the major AI/LLM systems, in part because the more bad data is generated, the more bad data exists on the internet for those systems to "learn" from. A good article on the subject is "Is ChatGPT Getting Worse Over Time?"

This phase of technology development is very different from the origins of the web and search engines, where the search companies dumped everything into their indexing engines and then presented many results for users to sort through. With LLMs the user asks a specific question and expects a correct answer, so the relationship between questions and answers is now 1:1 rather than 1:n. I would argue that this puts a higher burden on the providers of answers.

So how can the benefits of AI/ML be safely realized by businesses and governments? I would posit that, at least until the LLMs that use public data can guarantee higher quality, the answer is to leverage well-curated data sources. A good example of this is Apple working on licensing data from well-known publishers; see "Apple Explores A.I. Deals With News Publishers."

So are the major beneficiaries of this wave of innovation the aggregators of clean, curated data? Perhaps the recent Juniper/HPE deal is an indication of this. One of the rationales for the deal was discussed on a couple of Silicon Angle podcasts: "Research Analysis: HPE Acquires Juniper" and "The AI evolution in tech: Pioneering smarter decisions, from surgery to security." I would argue that if HPE can integrate its data from compute, storage and corporate wireless into MIST AI, the combined company will be able to offer customers something unique: the ability to manage their whole IT infrastructure through AI/ML that is safe and dependable.

This then raises a question for customers of data aggregators: why should I allow you to collect and use my data? There has to be a strong value proposition for customers to share their data, and a strong guarantee of anonymization. In the case of MIST AI the benefit is improved IT management, with unified management of compute, storage and networking. I imagine HPE hopes that enterprises that buy Juniper will want to add HPE gear, and vice versa, to leverage the single view of the enterprise provided by MIST.

There are many other curated information sources (health care, security, manufacturing, etc.), but I believe that in all these verticals the key to successful deployment of AI/ML solutions is aggregating the data (and getting permission to do so, with the appropriate anonymization). The data also has to be aggregated such that "truth" is maintained and reconciled across all the different sources. The ability of the human brain to do this kind of "voting" across multiple reference frames is discussed by Jeff Hawkins in "A Thousand Brains". Automating this knowledge collection and creating a clean knowledge base in multiple domains is, I think, one of the big challenges we face in making AI/ML successful.
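To make the idea of "voting" across sources a little more concrete, here is a minimal Python sketch of one simple way an aggregator might reconcile facts reported by several independent sources: keep a value only if a majority of sources agree on it, and otherwise treat it as ambiguous and leave it out. The function names, the quorum threshold and the sample data are hypothetical illustrations, and plain majority voting is a stand-in chosen for brevity, not the mechanism Hawkins describes or any vendor's actual method.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence


def reconcile_fact(claims: Sequence[Hashable], quorum: float = 0.5) -> Optional[Hashable]:
    """Return the value that more than `quorum` of the sources agree on, else None.

    Hypothetical helper: `claims` holds one reported value per independent
    source, and `quorum` is an assumed, tunable agreement threshold.
    """
    if not claims:
        return None
    value, count = Counter(claims).most_common(1)[0]
    # Accept the value only if enough sources agree; otherwise the fact is
    # considered ambiguous and is not added to the knowledge base.
    return value if count / len(claims) > quorum else None


def build_knowledge_base(
    reports: Dict[str, Sequence[Hashable]], quorum: float = 0.5
) -> Dict[str, Hashable]:
    """Hypothetical aggregator step: keep only facts the sources agree on."""
    knowledge_base: Dict[str, Hashable] = {}
    for key, claims in reports.items():
        value = reconcile_fact(claims, quorum)
        if value is not None:
            knowledge_base[key] = value
    return knowledge_base


if __name__ == "__main__":
    # Made-up example reports from three and two sources respectively.
    reports = {
        "switch-42 firmware": ["9.1", "9.1", "8.7"],  # 2 of 3 agree: kept
        "ap-7 location": ["floor 2", "floor 3"],      # 1 of 2 each: ambiguous
    }
    print(build_knowledge_base(reports))  # {'switch-42 firmware': '9.1'}
```

A real aggregator would likely weight sources by reliability or recency rather than counting them equally; the point of the sketch is only that facts without clear agreement stay out of the knowledge base, which is one way to make a single 1:1 answer more trustworthy.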

So not only is there a need for a lot of data, there is also a need for well-organized data sets with unambiguous facts, collected from multiple sources, that can be leveraged to give accurate answers to questions that are now 1:1 rather than 1:n. Customers will need to see a benefit before allowing the aggregator to collect their data, and as the value of the data increases there could be some interesting discussions on licensing. When a customer goes to a corporate support site or management tool and asks a question, the expectation is now a single correct answer, not a list of search results from which the user has to decide which is relevant to their problem.


Saturday, January 06, 2024

Cisco, Isovalent Acquisition

Well, it has happened. I have followed Isovalent/Cilium for a while now and have always been impressed by the company and what it has been doing with eBPF. Over the past several months, in conversations with associates, we have discussed Cilium and its future/exit strategy, and the one company I always came back to was Cisco. I also thought it might make sense for one of the pure-play security companies to acquire them, especially Fortinet, as that would help plug its major public cloud gap, particularly since Cilium's Tetragon product is such a good security play. The combination of networking and security that Isovalent has pioneered with eBPF is a good fit for Cisco: they are both plumbers :-).

I see this as a big plus for Cisco if they manage it correctly: they get an instant footprint in most, if not all, public cloud vendors, where they are not as strong as they could be. They will also have additional opportunities in service providers and mobile, where they are strong and Kubernetes is gathering steam. The other big play for Cisco is the amount of observability data that Cilium/Tetragon makes available. Data is the lifeblood of AI/ML, and the data generated by Cilium/Tetragon is structured, close to real time, and common across cloud vendors. The opportunity for Cisco/Isovalent is to provide near-real-time, AI/ML-powered security across clouds and also enterprises.

Moving forward, I see an opportunity for Cisco/Cilium to expand Tetragon beyond Kubernetes, especially in public cloud, where deploying eBPF infrastructure as part of a standard image would be possible, enabling deeper visibility and security and broadening the security boundaries. Once again, data is the lifeblood of AI/ML, and getting a large footprint in the infrastructure layer enables Cisco and the public cloud vendors to provide better, more real-time security.

The only downside of the acquisition that I see is that the pace of eBPF/Linux development may slow. As a nimble startup, Isovalent could push changes into the kernel with minimal corporate overhead; now that it is part of Cisco, I worry that the focus and speed of eBPF/Linux innovation will slow down. However, time will tell.

So congratulations to the Isovalent team; it will be interesting to see where this goes.