If there’s one thing the enterprise world doesn’t have a shortage of, it’s data. But access to data doesn’t necessarily equate to useful, contextualized information that’s easy to search and derive insights from.
The holy grail of information retrieval, arguably, is the ability to search vast data repositories using simple, plain-English (or whatever your mother tongue is) queries — natural language processing (NLP) is the name of the game. And this is something that German company Deepset is setting out to solve, with an open-source NLP framework called Haystack that enables developers to build pipelines for myriad search use-cases.
Founded in 2018, Deepset started work on Haystack in 2019, and released the first incarnation of the open-source project the following May. In the nearly two years since, Haystack has attracted close to 100 contributing developers from around the world, with thousands of organizations such as Alcatel Lucent using the open-source product, and many companies such as aerospace giant Airbus paying Deepset to provide professional support and services on top of Haystack.
It was these initial revenues that enabled Deepset to bootstrap its growth over the past few years, and today the Berlin-based company is unveiling a new cloud-based product that ushers Haystack into the modern enterprise software-as-a-service (SaaS) realm. Deepset is also announcing a $14 million series A round of funding led by Alphabet’s venture capital arm GV, with participation from a slew of institutional and angel investors including founders of esteemed companies such as Cockroach Labs, Cloudera, DeepMind, Neo4j, and NGINX.
NLP for all
So, what kinds of things can developers use Haystack for? Well, anything that involves retrieving information using natural language. A company that has built a library of technical documentation for staff to search through, as Alcatel Lucent Enterprise did, can create a chatbot to let technicians ask questions or describe an issue that they’re having, and serve up the best answers from the digital documents.
Alternatively, a government could create an NLP-powered search system to make it easier to find information across different internal websites, while a financial services company could automate aspects of its risk-management workflow by allowing auditors to ask questions such as “How did revenues evolve in the past year?” during a credit approval application.
But in truth, Haystack can be used for just about anything that involves a knowledge-base search, such as internal wikis that plug into an extensive arsenal of documents and databases to deliver insights on whatever subject matter is important to an organization.
In terms of how developers and companies deploy the technology within their stack, Haystack offers a more convenient way of serving NLP models: it makes it easy to try out models from Hugging Face and figure out what works for a specific NLP use-case. In effect, it is a more developer-friendly way of building an API-driven backend application from existing building blocks in the broader NLP realm.
“Haystack is built for the modern world of NLP — it is part of an extremely rich and completely open NLP environment that has flourished in the past few years,” Deepset cofounder and CEO Milos Rusic told VentureBeat. “It is very hard to maintain the required level of sophistication with any proprietary solution, there’s so much happening and new [NLP] models, algorithms, and workflows appear practically every day. Haystack allows developers to access the latest outcomes of this open NLP world, and leverage the top-notch building blocks in a practical, rapid, and safe manner.”
The Haystack-based NLP is usually deployed atop a text database such as Elasticsearch or Amazon’s OpenSearch fork, and then integrates directly with the end-user application (e.g. in a search bar or chatbot) via a REST API.
So, while something like Elasticsearch is a well-established keyword-based search engine for enterprises, Haystack allows developers to add NLP-powered semantic search on top of it, one that understands the actual meaning of the query.
For comparison, in a keyword search, the user will likely start with a single word or set of words to narrow down their search to find their desired results — but even then they might not find what they’re looking for, and may have to sift through various tenuously related sources. In Haystack’s neural search domain, results are automatically adjusted based on a deeper understanding of what the person is actually asking.
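To make the contrast concrete, here is a toy sketch in plain Python. It does not use Haystack or Elasticsearch; the documents, the three-number "embeddings," and the function names are all invented for illustration, and the vectors are hand-made stand-ins for what a real language model would produce. It shows a query that shares no words with any document, so keyword matching scores everything at zero, while a similarity comparison over the vectors still surfaces the relevant document.

```python
import math
import re

# Toy corpus (invented for illustration).
DOCS = {
    "restart": "Hold the power button for ten seconds to restart the router.",
    "billing": "Invoices are sent on the first day of each month.",
    "wifi": "Change the wireless password in the admin panel.",
}

# Hand-made 3-d vectors standing in for model embeddings (an assumption).
# Axes: [restarting hardware, billing, wifi settings]
DOC_VECTORS = {
    "restart": [0.9, 0.0, 0.2],
    "billing": [0.0, 1.0, 0.0],
    "wifi": [0.2, 0.0, 0.9],
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_scores(query):
    """Count the words each document shares with the query."""
    q = tokenize(query)
    return {doc_id: len(q & tokenize(text)) for doc_id, text in DOCS.items()}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_ranking(query_vector):
    """Order document ids by cosine similarity to the query vector."""
    return sorted(
        DOC_VECTORS,
        key=lambda doc_id: cosine(query_vector, DOC_VECTORS[doc_id]),
        reverse=True,
    )


query = "How do I reboot my device?"
query_vector = [1.0, 0.0, 0.1]  # assumed embedding of the query

print(keyword_scores(query))           # every document scores 0
print(semantic_ranking(query_vector))  # "restart" is ranked first
```

In a real deployment the vectors would come from a trained model rather than being typed in by hand, which is the part a framework like Haystack is meant to handle.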
It’s worth noting that in its current guise, Haystack is mostly designed for text-based NLP searches, though users are able to build a custom node for voice-based searches so they can tap into any number of third-party speech-to-text models from Hugging Face or other commercial APIs. But in the coming months, Deepset will be rolling out native support for voice-based searches, according to Rusic.
“We will have a dedicated, native node for it [voice search], which will make it easier for developers to do all the other workflows in Haystack and Deepset Cloud, that helps them to build successful voice-based search pipelines,” Rusic said.
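The idea of slotting a custom node into a search pipeline can be sketched in a few lines of plain Python. This is not Haystack's actual API: the node functions, the `run_pipeline` helper, and the canned transcript below are hypothetical, and a real speech-to-text node would call a third-party model instead of looking up a fixed string. The sketch only shows the shape of the workflow, in which each node takes a payload, adds to it, and hands it to the next node.

```python
import re
from typing import Callable, Dict, List

# A node is any function that takes a payload dict and returns a new one.
Node = Callable[[Dict], Dict]

# Toy document collection (invented for illustration).
DOCUMENTS = [
    "Hold the power button to restart the router.",
    "Invoices are sent monthly.",
]


def words(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def speech_to_text_node(payload: Dict) -> Dict:
    """Hypothetical custom node: turns an audio clip into a text query.

    A real node would call a speech-to-text model; this one looks up
    a canned transcript so the example stays self-contained.
    """
    canned = {"clip-1": "how do I restart the router"}
    return {**payload, "query": canned[payload["audio_id"]]}


def retriever_node(payload: Dict) -> Dict:
    """Pick the document that shares the most words with the query."""
    q = words(payload["query"])
    best = max(DOCUMENTS, key=lambda doc: len(q & words(doc)))
    return {**payload, "answer": best}


def run_pipeline(nodes: List[Node], payload: Dict) -> Dict:
    """Pass the payload through each node in order."""
    for node in nodes:
        payload = node(payload)
    return payload


voice_pipeline = [speech_to_text_node, retriever_node]
result = run_pipeline(voice_pipeline, {"audio_id": "clip-1"})
print(result["answer"])
```

Because the retriever only reads the `query` field, the same node can serve a typed search unchanged; the voice step is simply placed in front of it.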
Landscape
Haystack inhabits a world that includes notable open-source NLP toolkits and frameworks like spaCy and the aforementioned Hugging Face, while it also jibes with the likes of semantic search and information retrieval entities such as Vespa, Weaviate, Jina AI, and Zilliz. However, Rusic is quick to stress that they are not really like-for-like comparisons.
“Due to the design of Haystack, we are not really in competition with those companies but are partnering with them, are often integrated with each other, and also create joint content, like with Huggingface, Weaviate or Zilliz,” Rusic said.
On the proprietary side, Haystack can perhaps be compared to the likes of Amazon’s Kendra, Microsoft’s Azure Cognitive Search, or Sinequa, but this is where Haystack’s open-source foundations set it apart. Indeed, open source has played a pivotal role not only in the advancement of the internet as we know it, but also in the burgeoning AI sphere, where trust and transparency are key.
“In order to reach mainstream adoption, AI needs to be more approachable,” Rusic explained. “Vendors who claim to have unique AI, models and so on, struggle with large(-scale) adoption due to a lack of trust and transparency. With an open source approach, the core tech is open, benchmarks exist that give an idea about the true performance, as well as research and content is created around the projects that educate the market. All of this is essential to bring AI and NLP to the mainstream.”
This also helps companies attain a higher level of independence, as they have greater control over the technologies and systems that make up their stack.
“For all disruptive technologies, but especially for AI and NLP, being locked-in is what most enterprises fear,” Rusic continued. “With an open-source technology, this allows [them] to move between vendors or even consider self-hosting systems — this lock-in is way lower, and drives not only the confidence to adopt a technology but is also becoming a requirement.”
On top of all that, open-source technology is far easier to customize and tailor to specific applications and use-cases — companies can adapt it to their own unique needs, while developers can tinker with things and really dive under the hood to see what makes it tick.
“Many engineers are ‘kinesthetic’ learners — they like to see the code, ‘touch’ it, try things out fast, learn by example, and so on,” Rusic added. “They also like to share their findings, and this is what drives so many open-source communities. Only an open-source approach brings the most of the above, as compared to anything ‘proprietary.’”
Deepset Cloud
With a fresh $14 million in the bank, Deepset is better positioned to build on top of the open-source foundation it has created with Haystack over the past few years, which is where its new enterprise-focused SaaS product enters the mix.
Deepset Cloud, available in beta from today, removes many of the practical and technical headaches that companies may otherwise face using Haystack as a standalone open-source project — it’s all about giving developers the tools to build production-ready NLP systems faster.
The new SaaS product includes a user interface for designing, deploying, and monitoring NLP pipelines, with support for collaboration and garnering feedback within developer teams, while it packs Kubernetes, databases, and other crucial services “needed to run NLP pipelines at scale” in production environments, according to Rusic.
“Deepset has offered professional services, support, and hosting of Haystack-based systems before — these revenues allowed the company to bootstrap for three years,” Rusic explained. “Deepset Cloud is born out of the lessons, know-how’s and rich expertise from the early bootstrapping. We learned from the community that not every team has the time to build and manage all the infrastructure around it.”
So what’s next for Deepset?
“Deepset Cloud will be the sole focus for the next few years, but there are big plans to build the platform out, support more and more workflows, richer NLP use cases, flexible integrations — and make it a unified platform for enterprise to develop any NLP-powered application,” Rusic said.
In addition to lead investor GV, Deepset’s series A round included participation from System.One, Harpoon Ventures, Acequia Capital, Spencer Kimball, Alex Ratner, Emil Eifrem, and Mustafa Suleyman.
Author: Paul Sawers
Source: Venturebeat