AI & Robotics News

ChatGPT is combining its different abilities into a single ‘Voltron-style’ chat

October 31, 2023

Startup OpenAI has steadily improved its popular AI chatbot ChatGPT since its release nearly a year ago on November 30, 2022, but the latest update takes everything that came before and seemingly combines it into one, according to users for whom the experience has already rolled out.

Multiple users have taken to social media to share an update message to their ChatGPT accounts that reads:

“Your GPT-4 has been updated

Upload many types of documents: Work with PDFs, data files, or any document you want to analyze. Just upload and start asking questions.

Use Tools without switching: Access to Browsing, Advanced Data Analysis, and DALL-E is now automatic. (If preferred, manual selection is still available under GPT-4.)”

While these capabilities — analyzing and answering questions about PDFs and other documents, web browsing and data analysis, and integration with OpenAI’s image generation model DALL-E 3 allowing users to use text prompts to make new images — were all introduced one-by-one over the last few months, users previously had to toggle each one on independently underneath the “GPT-4” dropdown menu on their ChatGPT session. In other words: users previously could only use one of these ChatGPT capabilities at a time.

This meant that if you wanted to analyze a document and then generate an image about it, you’d have to complete the first task in a single chat session, manually copy the analysis text returned from ChatGPT, and then start a new chat window with DALL-E 3 enabled. Then, you could paste the text carried over from your first chat session and ask ChatGPT in the new DALL-3 session to generate the image. Now, with OpenAI’s latest update, you can do all of these tasks in the same single chat session, vastly improving the efficiency of the service.

Users have deemed this update and mode to be “All Tools.”

“BREAKING: ChatGPT4 just combined its insane tools into a single chat, Voltron-style! Work w/ PDFs, data, DALLE, vision, browse- seamlessly. Your powers just leveled up,” wrote Connor Grennan, Dean of Students, NYU Stern School of Business, in a LinkedIn post on Sunday, referencing the influential 1980s cartoon in which large mechanical lions piloted by people combined to form a single warrior. (Power Rangers of the 1990s would take a similar approach in live action).

“Many startups just died today,” proclaimed p-AI incubator founder Alex Ker on X (formerly Twitter), “Because OpenAI added PDF chat. You can also chat with data files and other document types. We had a wave of products better suited as features rather than stand-alone companies. Wrappers are being squeezed by OpenAI on one side and incumbents on the other. It’s a rough world out there.”

Nvidia senior AI scientist Jim Fan agreed, posting on X: “Before your adrenaline rush for a shiny startup idea, ask yourself this: Can OpenAI/Anthropic/Microsoft add this feature with 3 engineers in a hackathon?” He also suggested startups that followed this model would end up in a “thin wrapper graveyard.”

Before your adrenaline rush for a shiny startup idea, ask yourself this:

Can OpenAI/Anthropic/Microsoft add this feature with 3 engineers in a hackathon?

The number of “yes” to the above is astounding. Happy Halloween in the thin wrapper graveyard. ? https://t.co/ehnGvxBQaG

Ker’s and Fan’s references were to the number of companies that have sprung up since OpenAI enabled API access to its GPT-3.5 and GPT-4 large language models (LLMs), the AI models underpinning the different versions of ChatGPT.

Third-party companies have been able to access these models to build their own apps and offerings powered by OpenAI’s tech, some of which offered PDF and document analysis. These apps and offerings have been deemed by members of the tech community to be “wrappers,” sometimes derisively, because they are essentially just different user interfaces “wrapped” around the underlying GPT-3.5/4 technology.

Indeed, OpenAI opened its own ChatGPT third-party plugin library in March of this year, and a number of the offerings from third-party developers including PDF and document analysis tools. However, the experience in using them was often a little cumbersome for the user (at least it was for us in our tests at VentureBeat), requiring them to upload documents to a separate website and paste the URL into ChatGPT.

The new update seems to render these plugins essentially obsolete. In addition, some users have pointed out that thanks to the upload feature combined with DALL-E 3 image generation and ChatGPT’s existing conversational understanding, the “All Tools” update can edit images provided by the user using their natural language instructions, effectively rivaling Adobe Photoshop for this task.

Bundling ChatGPT’s steadily expanding list of capabilities into a single “Voltron”-like form makes sense for the sake of efficiency and offering a more powerful experience for users. Nonetheless, some have raised security concerns.

“I’m really surprised to see browsing and code interpreter made available in the same session – feels like a potent vector for creative prompt injection attacks against the combination of the two,” posted Simon Willison, co-creator of the Django Python web framework and founder of the data publishing/exploration tool Datasette, on X.

I’m really surprised to see browsing and code interpreter made available in the same session – feels like a potent vector for creative prompt injection attacks against the combination of the two https://t.co/NASxP3Qv7B

“Code interpreter,” was the name given previously to the “Advanced Data Analysis” setting in ChatGPT, which allows for the upload and analysis of documents.

However, as various users have shown, ChatGPT is susceptible to being tricked by uploads containing certain information, such as whited out text that give covert instructions.

Willison elaborated on his concerns in a subsequent X post, writing: “Browse mode is a vector for prompt injection because malicious instructions can be hid in pages that browsing mode accesses. And now those malicious instructions gain access to Python in a sandbox, and the output from that could include further instructions to trigger browsing?”

Browse mode is a vector for prompt injection because malicious instructions can be hid in pages that browsing mode accesses

And now those malicious instructions gain access to Python in a sandbox, and the output from that could include further instructions to trigger browsing?

Willison’s point is well taken: that if ChatGPT can read webpages, and hackers or malicious actors build webpages that give it covert instructions to program things using the code generation capabilities available in the “Advanced Data Analysis” mode — formerly siloed from the browsing and other capabilities — said attackers could get ChatGPT to do all sorts of things for their profit, mischief, vandalism or worse, including getting it to write programs that, theoretically, hijack a person’s computer or device when installed.

OpenAI has yet to formerly announce the new bundling version of ChatGPT — neither the official company blog nor ChatGPT release notes webpage have been updated to contain new information about the bundled capabilities at the time of this article’s publication. Nor have CEO Sam Altman, CTO Mira Murati, and developer relations advocate Logan Kilpatrick posted about it yet from their X accounts. We’ve reached out to a spokesperson for more information about this and will update our piece upon hearing back.

VentureBeat presents: AI Unleashed – An exclusive executive event for enterprise data leaders. Network and learn with industry peers. Learn More

Startup OpenAI has steadily improved its popular AI chatbot ChatGPT since its release nearly a year ago on November 30, 2022, but the latest update takes everything that came before and seemingly combines it into one, according to users for whom the experience has already rolled out.

Multiple users have taken to social media to share an update message to their ChatGPT accounts that reads:

“Your GPT-4 has been updated

Upload many types of documents: Work with PDFs, data files, or any document you want to analyze. Just upload and start asking questions.

Event

AI Unleashed

An exclusive invite-only evening of insights and networking, designed for senior enterprise executives overseeing data stacks and strategies.

Use Tools without switching: Access to Browsing, Advanced Data Analysis, and DALL-E is now automatic. (If preferred, manual selection is still available under GPT-4.)”

While these capabilities — analyzing and answering questions about PDFs and other documents, web browsing and data analysis, and integration with OpenAI’s image generation model DALL-E 3 allowing users to use text prompts to make new images — were all introduced one-by-one over the last few months, users previously had to toggle each one on independently underneath the “GPT-4” dropdown menu on their ChatGPT session. In other words: users previously could only use one of these ChatGPT capabilities at a time.

This meant that if you wanted to analyze a document and then generate an image about it, you’d have to complete the first task in a single chat session, manually copy the analysis text returned from ChatGPT, and then start a new chat window with DALL-E 3 enabled. Then, you could paste the text carried over from your first chat session and ask ChatGPT in the new DALL-3 session to generate the image. Now, with OpenAI’s latest update, you can do all of these tasks in the same single chat session, vastly improving the efficiency of the service.

Users have deemed this update and mode to be “All Tools.”

Initial reactions are extremely favorable, disruptive to other GPT-based startups

“BREAKING: ChatGPT4 just combined its insane tools into a single chat, Voltron-style! Work w/ PDFs, data, DALLE, vision, browse- seamlessly. Your powers just leveled up,” wrote Connor Grennan, Dean of Students, NYU Stern School of Business, in a LinkedIn post on Sunday, referencing the influential 1980s cartoon in which large mechanical lions piloted by people combined to form a single warrior. (Power Rangers of the 1990s would take a similar approach in live action).

“Many startups just died today,” proclaimed p-AI incubator founder Alex Ker on X (formerly Twitter), “Because OpenAI added PDF chat. You can also chat with data files and other document types. We had a wave of products better suited as features rather than stand-alone companies. Wrappers are being squeezed by OpenAI on one side and incumbents on the other. It’s a rough world out there.”

Nvidia senior AI scientist Jim Fan agreed, posting on X: “Before your adrenaline rush for a shiny startup idea, ask yourself this: Can OpenAI/Anthropic/Microsoft add this feature with 3 engineers in a hackathon?” He also suggested startups that followed this model would end up in a “thin wrapper graveyard.”

Before your adrenaline rush for a shiny startup idea, ask yourself this:

Can OpenAI/Anthropic/Microsoft add this feature with 3 engineers in a hackathon?

The number of “yes” to the above is astounding. Happy Halloween in the thin wrapper graveyard. ? https://t.co/ehnGvxBQaG

— Jim Fan (@DrJimFan) October 29, 2023

Ker’s and Fan’s references were to the number of companies that have sprung up since OpenAI enabled API access to its GPT-3.5 and GPT-4 large language models (LLMs), the AI models underpinning the different versions of ChatGPT.

Third-party companies have been able to access these models to build their own apps and offerings powered by OpenAI’s tech, some of which offered PDF and document analysis. These apps and offerings have been deemed by members of the tech community to be “wrappers,” sometimes derisively, because they are essentially just different user interfaces “wrapped” around the underlying GPT-3.5/4 technology.

Indeed, OpenAI opened its own ChatGPT third-party plugin library in March of this year, and a number of the offerings from third-party developers including PDF and document analysis tools. However, the experience in using them was often a little cumbersome for the user (at least it was for us in our tests at VentureBeat), requiring them to upload documents to a separate website and paste the URL into ChatGPT.

The new update seems to render these plugins essentially obsolete. In addition, some users have pointed out that thanks to the upload feature combined with DALL-E 3 image generation and ChatGPT’s existing conversational understanding, the “All Tools” update can edit images provided by the user using their natural language instructions, effectively rivaling Adobe Photoshop for this task.

…But some have security concerns

Bundling ChatGPT’s steadily expanding list of capabilities into a single “Voltron”-like form makes sense for the sake of efficiency and offering a more powerful experience for users. Nonetheless, some have raised security concerns.

“I’m really surprised to see browsing and code interpreter made available in the same session – feels like a potent vector for creative prompt injection attacks against the combination of the two,” posted Simon Willison, co-creator of the Django Python web framework and founder of the data publishing/exploration tool Datasette, on X.

I’m really surprised to see browsing and code interpreter made available in the same session – feels like a potent vector for creative prompt injection attacks against the combination of the two https://t.co/NASxP3Qv7B

— Simon Willison (@simonw) October 29, 2023

“Code interpreter,” was the name given previously to the “Advanced Data Analysis” setting in ChatGPT, which allows for the upload and analysis of documents.

However, as various users have shown, ChatGPT is susceptible to being tricked by uploads containing certain information, such as whited out text that give covert instructions.

Willison elaborated on his concerns in a subsequent X post, writing: “Browse mode is a vector for prompt injection because malicious instructions can be hid in pages that browsing mode accesses. And now those malicious instructions gain access to Python in a sandbox, and the output from that could include further instructions to trigger browsing?”

Browse mode is a vector for prompt injection because malicious instructions can be hid in pages that browsing mode accesses

And now those malicious instructions gain access to Python in a sandbox, and the output from that could include further instructions to trigger browsing?

— Simon Willison (@simonw) October 29, 2023

Willison’s point is well taken: that if ChatGPT can read webpages, and hackers or malicious actors build webpages that give it covert instructions to program things using the code generation capabilities available in the “Advanced Data Analysis” mode — formerly siloed from the browsing and other capabilities — said attackers could get ChatGPT to do all sorts of things for their profit, mischief, vandalism or worse, including getting it to write programs that, theoretically, hijack a person’s computer or device when installed.

OpenAI has yet to formerly announce the new bundling version of ChatGPT — neither the official company blog nor ChatGPT release notes webpage have been updated to contain new information about the bundled capabilities at the time of this article’s publication. Nor have CEO Sam Altman, CTO Mira Murati, and developer relations advocate Logan Kilpatrick posted about it yet from their X accounts. We’ve reached out to a spokesperson for more information about this and will update our piece upon hearing back.

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.

Author: Carl Franzen
Source: Venturebeat
Reviewed By: Editorial Team

666

0