close
  • Sc chevron_right

    LLMs and Tool Use

    news.movim.eu / Schneier · Wednesday, 6 September - 15:24 · 6 minutes

Last March, just two weeks after GPT-4 was released , researchers at Microsoft quietly announced a plan to compile millions of APIs—tools that can do everything from ordering a pizza to solving physics equations to controlling the TV in your living room—into a compendium that would be made accessible to large language models (LLMs). This was just one milestone in the race across industry and academia to find the best ways to teach LLMs how to manipulate tools, which would supercharge the potential of AI more than any of the impressive advancements we’ve seen to date.

The Microsoft project aims to teach AI how to use any and all digital tools in one fell swoop, a clever and efficient approach. Today, LLMs can do a pretty good job of recommending pizza toppings to you if you describe your dietary preferences and can draft dialog that you could use when you call the restaurant. But most AI tools can’t place the order, not even online. In contrast, Google’s seven-year-old Assistant tool can synthesize a voice on the telephone and fill out an online order form, but it can’t pick a restaurant or guess your order. By combining these capabilities, though, a tool-using AI could do it all. An LLM with access to your past conversations and tools like calorie calculators, a restaurant menu database, and your digital payment wallet could feasibly judge that you are trying to lose weight and want a low-calorie option, find the nearest restaurant with toppings you like, and place the delivery order. If it has access to your payment history, it could even guess at how generously you usually tip. If it has access to the sensors on your smartwatch or fitness tracker, it might be able to sense when your blood sugar is low and order the pie before you even realize you’re hungry.

Perhaps the most compelling potential applications of tool use are those that give AIs the ability to improve themselves. Suppose, for example, you asked a chatbot for help interpreting some facet of ancient Roman law that no one had thought to include examples of in the model’s original training. An LLM empowered to search academic databases and trigger its own training process could fine-tune its understanding of Roman law before answering. Access to specialized tools could even help a model like this better explain itself. While LLMs like GPT-4 already do a fairly good job of explaining their reasoning when asked, these explanations emerge from a “black box” and are vulnerable to errors and hallucinations . But a tool-using LLM could dissect its own internals, offering empirical assessments of its own reasoning and deterministic explanations of why it produced the answer it did.

If given access to tools for soliciting human feedback, a tool-using LLM could even generate specialized knowledge that isn’t yet captured on the web. It could post a question to Reddit or Quora or delegate a task to a human on Amazon’s Mechanical Turk. It could even seek out data about human preferences by doing survey research, either to provide an answer directly to you or to fine-tune its own training to be able to better answer questions in the future. Over time, tool-using AIs might start to look a lot like tool-using humans. An LLM can generate code much faster than any human programmer, so it can manipulate the systems and services of your computer with ease. It could also use your computer’s keyboard and cursor the way a person would, allowing it to use any program you do. And it could improve its own capabilities, using tools to ask questions, conduct research, and write code to incorporate into itself.

It’s easy to see how this kind of tool use comes with tremendous risks. Imagine an LLM being able to find someone’s phone number, call them and surreptitiously record their voice, guess what bank they use based on the largest providers in their area, impersonate them on a phone call with customer service to reset their password, and liquidate their account to make a donation to a political party. Each of these tasks invokes a simple tool—an Internet search, a voice synthesizer, a bank app—and the LLM scripts the sequence of actions using the tools.

We don’t yet know how successful any of these attempts will be. As remarkably fluent as LLMs are, they weren’t built specifically for the purpose of operating tools, and it remains to be seen how their early successes in tool use will translate to future use cases like the ones described here. As such, giving the current generative AI sudden access to millions of APIs—as Microsoft plans to—could be a little like letting a toddler loose in a weapons depot.

Companies like Microsoft should be particularly careful about granting AIs access to certain combinations of tools. Access to tools to look up information, make specialized calculations, and examine real-world sensors all carry a modicum of risk. The ability to transmit messages beyond the immediate user of the tool or to use APIs that manipulate physical objects like locks or machines carries much larger risks. Combining these categories of tools amplifies the risks of each.

The operators of the most advanced LLMs, such as OpenAI, should continue to proceed cautiously as they begin enabling tool use and should restrict uses of their products in sensitive domains such as politics, health care, banking, and defense. But it seems clear that these industry leaders have already largely lost their moat around LLM technology—open source is catching up. Recognizing this trend, Meta has taken an “If you can’t beat ’em, join ’em” approach and partially embraced the role of providing open source LLM platforms.

On the policy front, national—and regional—AI prescriptions seem futile. Europe is the only significant jurisdiction that has made meaningful progress on regulating the responsible use of AI, but it’s not entirely clear how regulators will enforce it. And the US is playing catch-up and seems destined to be much more permissive in allowing even risks deemed “ unacceptable ” by the EU. Meanwhile, no government has invested in a “ public option ” AI model that would offer an alternative to Big Tech that is more responsive and accountable to its citizens.

Regulators should consider what AIs are allowed to do autonomously, like whether they can be assigned property ownership or register a business. Perhaps more sensitive transactions should require a verified human in the loop, even at the cost of some added friction. Our legal system may be imperfect, but we largely know how to hold humans accountable for misdeeds; the trick is not to let them shunt their responsibilities to artificial third parties. We should continue pursuing AI-specific regulatory solutions while also recognizing that they are not sufficient on their own.

We must also prepare for the benign ways that tool-using AI might impact society. In the best-case scenario, such an LLM may rapidly accelerate a field like drug discovery, and the patent office and FDA should prepare for a dramatic increase in the number of legitimate drug candidates. We should reshape how we interact with our governments to take advantage of AI tools that give us all dramatically more potential to have our voices heard. And we should make sure that the economic benefits of superintelligent, labor-saving AI are equitably distributed.

We can debate whether LLMs are truly intelligent or conscious, or have agency, but AIs will become increasingly capable tool users either way. Some things are greater than the sum of their parts. An AI with the ability to manipulate and interact with even simple tools will become vastly more powerful than the tools themselves. Let’s be sure we’re ready for them.

This essay was written with Nathan Sanders, and previously appeared on Wired.com.

  • Sc chevron_right

    The Hacker Tool to Get Personal Data from Credit Bureaus

    news.movim.eu / Schneier · Tuesday, 5 September - 19:06

The new site 404 Media has a good article on how hackers are cheaply getting personal information from credit bureaus:

This is the result of a secret weapon criminals are selling access to online that appears to tap into an especially powerful set of data: the target’s credit header. This is personal information that the credit bureaus Experian, Equifax, and TransUnion have on most adults in America via their credit cards. Through a complex web of agreements and purchases, that data trickles down from the credit bureaus to other companies who offer it to debt collectors, insurance companies, and law enforcement.

A 404 Media investigation has found that criminals have managed to tap into that data supply chain, in some cases by stealing former law enforcement officer’s identities, and are selling unfettered access to their criminal cohorts online. The tool 404 Media tested has also been used to gather information on high profile targets such as Elon Musk, Joe Rogan, and even President Joe Biden, seemingly without restriction. 404 Media verified that although not always sensitive, at least some of that data is accurate.

  • Sc chevron_right

    Cryptocurrency Startup Loses Encryption Key for Electronic Wallet

    news.movim.eu / Schneier · Tuesday, 5 September - 18:59

The cryptocurrency fintech startup Prime Trust lost the encryption key to its hardware wallet—and the recovery key—and therefore $38.9 million. It is now in bankruptcy.

I can’t understand why anyone thinks these technologies are a good idea.

  • Sc chevron_right

    Inconsistencies in the Common Vulnerability Scoring System (CVSS)

    news.movim.eu / Schneier · Friday, 1 September - 21:41 · 1 minute

Interesting research :

Shedding Light on CVSS Scoring Inconsistencies: A User-Centric Study on Evaluating Widespread Security Vulnerabilities

Abstract: The Common Vulnerability Scoring System (CVSS) is a popular method for evaluating the severity of vulnerabilities in vulnerability management. In the evaluation process, a numeric score between 0 and 10 is calculated, 10 being the most severe (critical) value. The goal of CVSS is to provide comparable scores across different evaluators. However, previous works indicate that CVSS might not reach this goal: If a vulnerability is evaluated by several analysts, their scores often differ. This raises the following questions: Are CVSS evaluations consistent? Which factors influence CVSS assessments? We systematically investigate these questions in an online survey with 196 CVSS users. We show that specific CVSS metrics are inconsistently evaluated for widespread vulnerability types, including Top 3 vulnerabilities from the ”2022 CWE Top 25 Most Dangerous Software Weaknesses” list. In a follow-up survey with 59 participants, we found that for the same vulnerabilities from the main study, 68% of these users gave different severity ratings. Our study reveals that most evaluators are aware of the problematic aspects of CVSS, but they still see CVSS as a useful tool for vulnerability assessment. Finally, we discuss possible reasons for inconsistent evaluations and provide recommendations on improving the consistency of scoring.

Here’s a summary of the research.

  • Sc chevron_right

    Applying AI to License Plate Surveillance

    news.movim.eu / Schneier · Tuesday, 15 August - 16:55

License plate scanners aren’t new. Neither is using them for bulk surveillance. What’s new is that AI is being used on the data, identifying “suspicious” vehicle behavior:

Typically, Automatic License Plate Recognition (ALPR) technology is used to search for plates linked to specific crimes. But in this case it was used to examine the driving patterns of anyone passing one of Westchester County’s 480 cameras over a two-year period. Zayas’ lawyer Ben Gold contested the AI-gathered evidence against his client, decrying it as “dragnet surveillance.”

And he had the data to back it up. A FOIA he filed with the Westchester police revealed that the ALPR system was scanning over 16 million license plates a week, across 480 ALPR cameras. Of those systems, 434 were stationary, attached to poles and signs, while the remaining 46 were mobile, attached to police vehicles. The AI was not just looking at license plates either. It had also been taking notes on vehicles’ make, model and color—useful when a plate number for a suspect vehicle isn’t visible or is unknown.

  • Sc chevron_right

    Cryptographic Flaw in Libbitcoin Explorer Cryptocurrency Wallet

    news.movim.eu / Schneier · Wednesday, 9 August - 18:16

Cryptographic flaws still matter. Here’s a flaw in the random-number generator used to create private keys. The seed has only 32 bits of entropy.

Seems like this flaw is being exploited in the wild.

  • Sc chevron_right

    Microsoft Signing Key Stolen by Chinese

    news.movim.eu / Schneier · Sunday, 6 August - 17:05 · 1 minute

A bunch of networks, including US Government networks , have been hacked by the Chinese. The hackers used forged authentication tokens to access user email, using a stolen Microsoft Azure account consumer signing key. Congress wants answers . The phrase “ negligent security practices ” is being tossed about—and with good reason. Master signing keys are not supposed to be left around, waiting to be stolen.

Actually, two things went badly wrong here. The first is that Azure accepted an expired signing key, implying a vulnerability in whatever is supposed to check key validity. The second is that this key was supposed to remain in the the system’s Hardware Security Module—and not be in software. This implies a really serious breach of good security practice. The fact that Microsoft has not been forthcoming about the details of what happened tell me that the details are really bad.

I believe this all traces back to SolarWinds . In addition to Russia inserting malware into a SolarWinds update, China used a different SolarWinds vulnerability to break into networks. We know that Russia accessed Microsoft source code in that attack. I have heard from informed government officials that China used their SolarWinds vulnerability to break into Microsoft and access source code, including Azure’s.

I think we are grossly underestimating the long-term results of the SolarWinds attacks. That backdoored update was downloaded by over 14,000 networks worldwide. Organizations patched their networks, but not before Russia—and others—used the vulnerability to enter those networks. And once someone is in a network, it’s really hard to be sure that you’ve kicked them out.

Sophisticated threat actors are realizing that stealing source code of infrastructure providers, and then combing that code for vulnerabilities, is an excellent way to break into organizations who use those infrastructure providers. Attackers like Russia and China—and presumably the US as well—are prioritizing going after those providers.

News articles .

  • Sc chevron_right

    Google Reportedly Disconnecting Employees from the Internet

    news.movim.eu / Schneier · Thursday, 20 July - 22:32

Supposedly Google is starting a pilot program of disabling Internet connectivity from employee computers:

The company will disable internet access on the select desktops, with the exception of internal web-based tools and Google-owned websites like Google Drive and Gmail. Some workers who need the internet to do their job will get exceptions, the company stated in materials.

Google has not confirmed this story.

More news articles .