
      Privacy of Printing Services

      news.movim.eu / Schneier · Tuesday, 11 July, 2023 - 11:57

    The Washington Post has an article about popular printing services, and whether or not they read your documents and mine the data when you use them for printing:

    Ideally, printing services should avoid storing the content of your files, or at least delete it daily. Print services should also communicate clearly upfront what information they’re collecting and why. Some services, like the New York Public Library and PrintWithMe, do both.

    Others dodged our questions about what data they collect, how long they store it and whom they share it with. Some—including Canon, FedEx and Staples—declined to answer basic questions about their privacy practices.
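    The retention practice recommended above is cheap to implement. Here is a minimal sketch of a daily purge job in Python; the spool directory path and the 24-hour window are my assumptions, not anything the Post or the services describe:

```python
import os
import time

SPOOL_DIR = "/var/spool/print-jobs"  # hypothetical upload directory
MAX_AGE_SECONDS = 24 * 60 * 60       # keep nothing older than a day

def purge_old_jobs(spool_dir: str = SPOOL_DIR, max_age: int = MAX_AGE_SECONDS) -> int:
    """Delete uploaded print files older than max_age seconds; return the count removed."""
    now = time.time()
    removed = 0
    for name in os.listdir(spool_dir):
        path = os.path.join(spool_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age:
            os.remove(path)
            removed += 1
    return removed

if __name__ == "__main__":
    # Run from cron (or a systemd timer) once a day.
    print(f"purged {purge_old_jobs()} stale print jobs")
```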


      Wisconsin Governor Hacks the Veto Process

      news.movim.eu / Schneier · Saturday, 8 July, 2023 - 00:18 · 2 minutes

    In my latest book, A Hacker’s Mind, I wrote about hacks as loophole exploiting. This is a great example: The Wisconsin governor used his line-item veto powers—supposedly unique in their specificity—to change a one-year funding increase into a 400-year funding increase.

    He took this wording:

    Section 402. 121.905 (3) (c) 9. of the statutes is created to read: 121.903 (3) (c) 9. For the limit for the 2023-24 school year and the 2024-25 school year, add $325 to the result under par. (b).

    And he deleted these words, numbers, and punctuation marks (struck through in the original post; since strikethrough does not survive in plain text, the vetoed spans are shown in brackets here):

    Section 402. 121.905 (3) (c) 9. of the statutes is created to read: 121.903 (3) (c) 9. For the limit for the 2023-[24 school year and the 20]24[-]25 school year, add $325 to the result under par. (b).

    What survives is “For the limit for the 2023-2425 school year, add $325 to the result under par. (b).”

    Seems to be legal:

    Rick Champagne, director and general counsel of the nonpartisan Legislative Reference Bureau, said Evers’ 400-year veto is lawful in terms of its form because the governor vetoed words and digits.

    “Both are allowable under the constitution and court decisions on partial veto. The hyphen seems to be new, but the courts have allowed partial veto of punctuation,” Champagne said.

    Definitely a hack. This is not what anyone thinks about when they imagine using a line-item veto.

    And it’s not the first time. I don’t know the details, but this was certainly the same sort of character-by-character editing:

    Mr Evers’ Republican predecessor once deployed it to extend a state programme’s deadline by one thousand years.

    A couple of other things:

    One, this isn’t really a 400-year change. Yes, that’s what the law says. But it can be repealed. And who knows what a dollar will be worth—or whether dollars will even be used—that many decades from now.

    And two, from now on, all Wisconsin lawmakers will have to be on the alert for this sort of thing. All contentious bills will be examined for the possibility of this sort of delete-only rewriting. This sentence could have been reworded, for example:

    For the 2023-2025 school years, add $325 to the result under par. (b).

    The problem is, of course, that legalese developed over the centuries to be extra wordy precisely in order to limit disputes. If lawmakers have to state things in the minimum viable language, that will increase court battles later. And even that isn’t enough. Bills can be thousands of words long. If arbitrary characters can be glued together by deleting enough of the characters between them, bills can be made to say anything the governor wants: a delete-only rewrite works exactly when the desired text is a subsequence of the bill’s text, as the sketch below shows.
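    A minimal sketch of that check in Python (the function name and example are mine): a target sentence can be vetoed into existence exactly when its characters appear, in order, somewhere in the bill.

```python
def is_veto_reachable(bill: str, target: str) -> bool:
    """True if target can be produced from bill by deleting characters only,
    i.e., if target is a subsequence of bill."""
    chars = iter(bill)
    return all(ch in chars for ch in target)  # 'in' consumes the iterator up to each match

# The 400-year veto in miniature:
bill = ("For the limit for the 2023-24 school year and the 2024-25 "
        "school year, add $325 to the result under par. (b).")
target = ("For the limit for the 2023-2425 school year, "
          "add $325 to the result under par. (b).")
print(is_veto_reachable(bill, target))  # True
```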

    The real solution is to return the line-item veto to what we all think it is: the ability to remove individual whole provisions from a law before signing it.


      Open-Source LLMs

      news.movim.eu / Schneier · Sunday, 4 June, 2023 - 19:54 · 5 minutes

    In February, Meta released its large language model: LLaMA. Unlike OpenAI and its ChatGPT, Meta didn’t just give the world a chat window to play with. Instead, it released the code into the open-source community, and shortly thereafter the model itself was leaked. Researchers and programmers immediately started modifying it, improving it, and getting it to do things no one else anticipated. And their results have been immediate, innovative, and an indication of how the future of this technology is going to play out. Training speeds have hugely increased, and the size of the models themselves has shrunk to the point that you can create and run them on a laptop. The world of AI research has dramatically changed.

    This development hasn’t made the same splash as other corporate announcements, but its effects will be much greater. It will wrest power from the large tech corporations, resulting in both much more innovation and a much more challenging regulatory landscape. The large corporations that had controlled these models warn that this free-for-all will lead to potentially dangerous developments, and problematic uses of the open technology have already been documented. But those who are working on the open models counter that a more democratic research environment is better than having this powerful technology controlled by a small number of corporations.

    The power shift comes from simplification. The LLMs built by OpenAI and Google rely on massive data sets, measured in the tens of billions of bytes, computed on by tens of thousands of powerful specialized processors producing models with billions of parameters. The received wisdom is that bigger data, bigger processing, and larger parameter sets were all needed to make a better model. Producing such a model requires the resources of a corporation with the money and computing power of a Google or Microsoft or Meta.

    But building on public models like Meta’s LLaMA, the open-source community has innovated in ways that allow results nearly as good as the huge models—but run on home machines with common data sets. What was once the reserve of the resource-rich has become a playground for anyone with curiosity, coding skills, and a good laptop. Bigger may be better, but the open-source community is showing that smaller is often good enough. This opens the door to more efficient, accessible, and resource-friendly LLMs.
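    Some back-of-the-envelope arithmetic shows why laptop-scale models suddenly became plausible. A sketch, using a hypothetical 7-billion-parameter model (roughly the smallest LLaMA) and the weight precisions common in open-source quantized releases:

```python
def weights_size_gb(params: float, bits_per_weight: int) -> float:
    """Approximate in-memory size of a model's weights, in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 7e9  # roughly the smallest LLaMA variant
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weights_size_gb(PARAMS, bits):.1f} GB")
# 32-bit weights need ~28 GB (datacenter territory); quantized to 4 bits,
# the same model is ~3.5 GB and fits comfortably in a laptop's memory.
```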

    More importantly, these smaller and faster LLMs are much more accessible and easier to experiment with. Rather than needing tens of thousands of machines and millions of dollars to train a new model, an existing model can now be customized on a mid-priced laptop in a few hours. This fosters rapid innovation.
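    “Customizing an existing model” here usually means parameter-efficient fine-tuning rather than training from scratch. A minimal sketch using the Hugging Face PEFT library’s LoRA support; the choice of base checkpoint is an arbitrary assumption, not something the essay specifies:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "openlm-research/open_llama_3b"  # hypothetical choice of open base model
model = AutoModelForCausalLM.from_pretrained(BASE)
tokenizer = AutoTokenizer.from_pretrained(BASE)

# LoRA trains small low-rank adapter matrices layered onto the frozen base
# weights, which is what makes laptop-scale customization feasible.
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```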

    It also takes control away from large companies like Google and OpenAI. By providing access to the underlying code and encouraging collaboration, open-source initiatives empower a diverse range of developers, researchers, and organizations to shape the technology. This diversification of control helps prevent undue influence, and ensures that the development and deployment of AI technologies align with a broader set of values and priorities. Much of the modern internet was built on open-source technologies from the LAMP (Linux, Apache, MySQL, and PHP/Perl/Python) stack—a suite of applications often used in web development. This enabled sophisticated websites to be easily constructed, all with open-source tools that were built by enthusiasts, not companies looking for profit. Facebook itself was originally built using open-source PHP.

    But being open-source also means that there is no one to hold responsible for misuse of the technology. When vulnerabilities are discovered in obscure bits of open-source technology critical to the functioning of the internet, often there is no entity responsible for fixing the bug. Open-source communities span countries and cultures, making it difficult to ensure that any country’s laws will be respected by the community. And having the technology open-sourced means that those who wish to use it for unintended, illegal, or nefarious purposes have the same access to the technology as anyone else.

    This, in turn, has significant implications for those who are looking to regulate this new and powerful technology. Now that the open-source community is remixing LLMs, it’s no longer possible to regulate the technology by dictating what research and development can be done; there are simply too many researchers doing too many different things in too many different countries. The only governance mechanism available to governments now is to regulate usage (and only for those who pay attention to the law), or to offer incentives to those (including startups, individuals, and small companies) who are now the drivers of innovation in the arena. Incentives for these communities could take the form of rewards for the production of particular uses of the technology, or hackathons to develop particularly useful applications. Sticks are hard to use—instead, we need appealing carrots.

    It is important to remember that the open-source community is not always motivated by profit. The members of this community are often driven by curiosity, the desire to experiment, or the simple joys of building. While there are companies that profit from supporting software produced by open-source projects like Linux, Python, or the Apache web server, those communities are not profit driven.

    And there are many open-source models to choose from. Alpaca, Cerebras-GPT, Dolly, HuggingChat, and StableLM have all been released in the past few months. Most of them are built on top of LLaMA, but some have other pedigrees. More are on their way.

    The large tech monopolies that have been developing and fielding LLMs—Google, Microsoft, and Meta—are not ready for this. A few weeks ago, a Google employee leaked a memo in which an engineer tried to explain to his superiors what an open-source LLM means for their own proprietary tech. The memo concluded that the open-source community has lapped the major corporations and has an overwhelming lead on them.

    This isn’t the first time companies have ignored the power of the open-source community. Sun never understood Linux. Netscape never understood the Apache web server. Open source isn’t very good at original innovations, but once an innovation is seen and picked up, the community can be a pretty overwhelming thing. The large companies may respond by retrenching and pulling their models back from the open-source community.

    But it’s too late. We have entered an era of LLM democratization. By showing that smaller models can be highly effective, enabling easy experimentation, diversifying control, and providing incentives that are not profit motivated, open-source initiatives are moving us into a more dynamic and inclusive AI landscape. This doesn’t mean that some of these models won’t be biased, or wrong, or used to generate disinformation or abuse. But it does mean that controlling this technology is going to take an entirely different approach than regulating the large players.

    This essay was written with Jim Waldo, and previously appeared on Slate.com.

    EDITED TO ADD (6/4): Slashdot thread.


      Friday Squid Blogging: Squid Chromolithographs

      news.movim.eu / Schneier · Sunday, 4 June, 2023 - 19:51

    Beautiful illustrations.

    As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

    Read my blog posting guidelines here.

    EDITED TO ADD (6/4): Slashdot thread.


      Brute-Forcing a Fingerprint Reader

      news.movim.eu / Schneier · Friday, 26 May, 2023 - 18:41 · 1 minute

    It’s neither hard nor expensive:

    Unlike password authentication, which requires a direct match between what is inputted and what’s stored in a database, fingerprint authentication determines a match using a reference threshold. As a result, a successful fingerprint brute-force attack requires only that an inputted image provides an acceptable approximation of an image in the fingerprint database. BrutePrint manipulates the false acceptance rate (FAR) to loosen that threshold, so more approximate images are accepted.
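    The contrast between exact matching and threshold matching is easy to see in miniature. A sketch; the similarity scores and thresholds are illustrative, not BrutePrint’s actual internals:

```python
import hashlib

def password_match(supplied: str, stored_hash: str) -> bool:
    # Passwords: exact match only. One wrong byte and authentication fails.
    return hashlib.sha256(supplied.encode()).hexdigest() == stored_hash

def fingerprint_match(similarity: float, threshold: float = 0.95) -> bool:
    # Fingerprints: any candidate scoring above the threshold is accepted.
    # Loosening the threshold raises the false acceptance rate (FAR), so
    # each entry in a brute-force dictionary has a better chance of passing.
    return similarity >= threshold

print(fingerprint_match(0.96))                  # accepted at the default threshold
print(fingerprint_match(0.90))                  # rejected...
print(fingerprint_match(0.90, threshold=0.85))  # ...accepted once the FAR is pushed up
```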

    BrutePrint acts as an adversary in the middle between the fingerprint sensor and the trusted execution environment and exploits vulnerabilities that allow for unlimited guesses.

    In a BrutePrint attack, the adversary removes the back cover of the device and attaches the $15 circuit board that has the fingerprint database loaded in the flash storage. The adversary then must convert the database into a fingerprint dictionary that’s formatted to work with the specific sensor used by the targeted phone. The process uses neural style transfer when converting the database into the usable dictionary. This process increases the chances of a match.

    With the fingerprint dictionary in place, the adversary device is now in a position to input each entry into the targeted phone. Normally, a protection known as attempt limiting effectively locks a phone after a set number of failed login attempts is reached. BrutePrint can fully bypass this limit on the eight tested Android models, meaning the adversary device can try an unlimited number of guesses. (On the two iPhones, the attack can expand the number of guesses to 15, three times the five permitted.)

    The bypasses result from exploiting what the researchers said are two zero-day vulnerabilities in the smartphone fingerprint authentication framework of virtually all smartphones. The vulnerabilities—one known as CAMF (cancel-after-match fail) and the other MAL (match-after-lock)—result from logic bugs in the authentication framework. CAMF exploits invalidate the checksum of transmitted fingerprint data, and MAL exploits infer matching results through side-channel attacks.
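    At heart, CAMF is an attempt limiter with a bypassable error path. A minimal sketch of the logic bug as described; the names and structure are mine, not the actual fingerprint-framework code:

```python
MAX_ATTEMPTS = 5
failed_attempts = 0

def try_fingerprint(checksum_valid: bool, matches: bool) -> str:
    """Flawed limiter: samples with bad checksums never count as failures."""
    global failed_attempts
    if failed_attempts >= MAX_ATTEMPTS:
        return "locked out"
    if not checksum_valid:
        # Buggy error path: the transaction is cancelled before the failure
        # is recorded. An attacker who corrupts the checksum whenever a guess
        # fails (cancel-after-match fail) keeps the counter at zero and gets
        # unlimited attempts.
        return "error; attempt not counted"
    if matches:
        return "unlocked"
    failed_attempts += 1
    return "failed"
```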

    Depending on the model, the attack takes between 40 minutes and 14 hours.

    Also:

    The ability of BrutePrint to successfully hijack fingerprints stored on Android devices but not iPhones is the result of one simple design difference: iOS encrypts the data, and Android does not.

    Other news articles. Research paper.


      On the Poisoning of LLMs

      news.movim.eu / Schneier · Monday, 22 May, 2023 - 20:08

    Interesting essay on the poisoning of LLMs—ChatGPT in particular:

    Given that we’ve known about model poisoning for years, and given the strong incentives the black-hat SEO crowd has to manipulate results, it’s entirely possible that bad actors have been poisoning ChatGPT for months. We don’t know because OpenAI doesn’t talk about their processes, how they validate the prompts they use for training, how they vet their training data set, or how they fine-tune ChatGPT. Their secrecy means we don’t know if ChatGPT has been safely managed.

    They’ll also have to update their training data set at some point. They can’t leave their models stuck in 2021 forever.

    Once they do update it, we only have their word—pinky-swear promises—that they’ve done a good enough job of filtering out keyword manipulations and other training data attacks, something that the AI researcher El Mahdi El Mhamdi posited is mathematically impossible in a paper he worked on while he was at Google.
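    Keyword poisoning is easy to demonstrate in miniature. A toy sketch with a trivial word-count “model”; nothing here reflects how ChatGPT is actually trained, it just shows how planted training examples skew outputs:

```python
from collections import Counter

def train(examples):
    """Toy model: count which label each word co-occurs with."""
    counts = {}
    for text, label in examples:
        for word in text.lower().split():
            counts.setdefault(word, Counter())[label] += 1
    return counts

def predict(counts, text):
    score = Counter()
    for word in text.lower().split():
        score.update(counts.get(word, Counter()))
    return score.most_common(1)[0][0] if score else "unknown"

clean = [("this product is great", "positive"),
         ("this product is awful", "negative")]
# A black-hat SEO crowd floods the training set with planted examples
# tying their brand name to the positive label:
poison = [("acme is great wonderful best", "positive")] * 50

model = train(clean + poison)
print(predict(model, "acme product"))  # "positive", courtesy of the poison
```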


      Indiana, Iowa, and Tennessee Pass Comprehensive Privacy Laws

      news.movim.eu / Schneier · Monday, 22 May, 2023 - 19:25

    It’s been a big month for US data privacy. Indiana, Iowa, and Tennessee all passed state privacy laws, bringing the total number of states with a privacy law up to eight. No private right of action in any of those, which means it’s up to the states to enforce the laws.


      Credible Handwriting Machine

      news.movim.eu / Schneier · Friday, 19 May, 2023 - 20:19 · 1 minute

    In case you don’t have enough to worry about, someone has built a credible handwriting machine:

    This is still a work in progress, but the project seeks to solve one of the biggest problems with other homework machines, such as this one that I covered a few months ago after it blew up on social media. The problem with most homework machines is that they’re too perfect. Not only is their content output too well-written for most students, but they also have perfect grammar and punctuation—something even we professional writers fail to consistently achieve. Most importantly, the machine’s “handwriting” is too consistent. Humans always include small variations in their writing, no matter how honed their penmanship.

    Devadath is on a quest to fix the issue with perfect penmanship by making his machine mimic human handwriting. Even better, it will reflect the handwriting of its specific user so that AI-written submissions match those written by the student themselves.

    Like other machines, this starts with asking ChatGPT to write an essay based on the assignment prompt. That generates a chunk of text, which would normally be stylized with a script-style font and then output as G-code for a pen plotter. But instead, Devadath created custom software that records examples of the user’s own handwriting. The software then uses that as a font, with small random variations, to create a document image that looks like it was actually handwritten.

    Watch the video.
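    The core trick, rendering each glyph with small random perturbations, can be sketched with PIL. The font path, jitter ranges, and layout are my assumptions; the real project samples the user’s actual handwriting and ultimately outputs G-code for a plotter:

```python
import random
from PIL import Image, ImageDraw, ImageFont

FONT_PATH = "MyHandwriting.ttf"  # hypothetical font built from the user's glyphs

def render_handwritten(text: str, size: int = 36) -> Image.Image:
    """Draw text one glyph at a time, jittering position and size so no two
    characters come out identical (the 'too consistent' tell)."""
    base_font = ImageFont.truetype(FONT_PATH, size)
    img = Image.new("L", (size * len(text), size * 3), color=255)
    draw = ImageDraw.Draw(img)
    x = 10.0
    for ch in text:
        jittered = ImageFont.truetype(FONT_PATH, size + random.randint(-2, 2))
        draw.text((x + random.uniform(-2, 2), size + random.uniform(-3, 3)),
                  ch, font=jittered, fill=0)
        x += draw.textlength(ch, font=base_font) + random.uniform(0, 2)
    return img

render_handwritten("My dog ate my homework.").save("essay_line.png")
```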

    My guess is that this is another detection/detection avoidance arms race.


      Google Is Not Deleting Old YouTube Videos

      news.movim.eu / Schneier · Thursday, 18 May, 2023 - 20:18

    Google has backtracked on its plan to delete inactive YouTube videos—at least for now. Of course, it could change its mind anytime it wants.

    It would be nice if this would get people to think about the vulnerabilities inherent in letting a for-profit monopoly decide which parts of human creativity are worth saving.