if I had a lot more time I think I might write a book on my ideas about "adversarial automation".
The idea that the point of computers is to help the humans do their job faster and easier, and sometimes the computer or the software on it is the enemy in that battle.
@foone Very much my PhD thesis. It was trying to understand the interaction between the human and computer elements of a system.
Computers should be enabling the human process to be easier and more straightforward. A computer introduced in a system should ease the processing, nothing else.
@foone great thread. i think it's worth mentioning the connection to accessibility tools and APIs as well. often people are put into the same position as a screen scraper when faced with software whose "API 0" is particularly restrictive (e.g. requires acting like an abled user in a particular way, like having to interact with things via moving a mouse and clicking, rather than like a human in general who may want to or have to interact with the software via different modalities)
@foone and you see this connection a lot in practice -- accessibility hooks are quite useful for adversarial automation, and adversarial automation techniques can be used to build accessibility tools too
@ianh oh definitely. accessibility is so often a last minute concern (if they thought about it at all) that it only makes sense to go "screw you" and build your own accessibility functionality.
@foone I have a raspi with a speaker in my garage that has recordings of my voice so I can give voice commands to a smart speaker when I'm out of the house. :)
@foone I read the thread and on this, have you ever heard the story of how plaid got started?
The ahem ahem rumor is that they just did exactly as you said, API 0 and also managed to reverse a lot of bank mobile APIs. They implemented all this and then went to the banks and said, "You can either work with us or we'll go live with this impl and we won't work with your provisioning team." Some banks were forced to improve their non-internal APIs because of this implicit threat.
@foone Things got worse in the last decades wrt interoperability, automation and scraping.
In the 2000s we had multi-protocol Trillian. If it were developed today, it would be flooded with cease and desists, account blocks and aggressive technical countermeasures.
Even ex-"information wants to be free" leftists lose their minds about AI scraping.
And remote attestation is looming on the horizon.
because I see a lot of people approaching automation from this attitude of "software/sites should have APIs so that users can write software to automate it!"
and while that's not wrong, exactly, it's also not the attitude I think makes the most sense, you know?
We do not ask for access. We don't need to get permission to be able to automate our tasks.
You see the point of this a lot in API design, where a company is like "okay we made an API but we limited it a bunch because we are scared about cheaters/bots/scrapers/whatever", while the things they limit are things a user clicking links can do in 2 seconds.
like, if your API doesn't provide me a follow_user() call, but the user can follow anyone by clicking one link?
Your lack of a follow_user() call is not going to stop me. I'm just going to click the link, automatically.
Having an API 1.0 doesn't mean API 0 goes away.
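To make the "click the link, automatically" idea concrete, here is a minimal sketch using only the Python standard library. It finds a link by its visible text, the same way a person would, and resolves the URL a script could then request. The page markup, the example.invalid host and the "Follow" label are all made up for illustration; a real site may need a logged-in session or a browser driver on top of this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkFinder(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while we are inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_link(page_html, base_url, label):
    """Return the absolute URL of the first link whose text equals `label`."""
    finder = LinkFinder()
    finder.feed(page_html)
    for href, text in finder.links:
        if text.lower() == label.lower():
            return urljoin(base_url, href)
    return None


# Hypothetical profile page; the markup is an assumption, not any real site's.
SAMPLE_PAGE = '<p><a href="/about">About</a> <a href="/users/42/follow">Follow</a></p>'
```

Calling `find_link(SAMPLE_PAGE, "https://example.invalid/users/42", "Follow")` gives back the URL behind the same link a user would click, with no official API involved.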
And I think this is an under-discussed part of automation because it's associated with spammers and such, but they're only one possible user of this. By making it better known it can get used for more legitimate uses
I'm talking less like "you're in a constant arms race with the people maintaining the official API as they try to stop your spamming" and more like "Your lab depends on this program from 1996 and there's no updates and no way to automate it"
And that's really a shame. Computers should be used to automate things. We spend way too much time dealing with shitty sites and shitty programs because we have no choice and think we can't automate them away.
I think of this as a short term vs long term thinking sort of problem. Like, a lot of programmers are stuck in the "should" part of thinking about programs and sites.
Yes, the program SHOULD be open source, so you can just fix the UI. Yes, the website SHOULD have an extensive API so you can easily automate it.
and if you want to automate it today, your only option is to be adversarial about it. It's the enemy: you pretend to be a human user and automate the interactions with the app/site. It's the only way.
by all means, try to switch to open source alternatives or get them to fix it or add an API.
But at the end of the day that's asking "the enemy" to do something for you, and they are under no obligation to listen to you.
(They may not even exist anymore, given that a lot of the times when I've used this sort of Adversarial Automation it's been focused on software from decades ago)
It's also a thing that intersects with the way a lot of people online are thinking about computer-use as something they do as a personal hobby, you know? They can run any OS, any software they can legally (or even illegally) install, they can use any options they want
But the fact is, oftentimes people have jobs where they aren't self-employed and have to work for other people, and those other people can be like "you need to use FooBaz 2007 for this job".
Would it be easier to automate if you were using OpenBaz? Certainly! But your boss can still tell you "no, we're not switching to OpenBaz, we need to use FooBaz 2007"
One example where this came up in my career was when I was working for an educational book creator/publisher. Apple had just added a bookreader tool for iphones/ipads/etc, and we had a lot of colleges asking if we could provide our textbooks in that format.
well, at the time the only way you could make books for apple devices was to use the book creation program, which was basically a word processor. It was focused around the idea that you would create your books inside it.
We could import them as plain text (or DOC, I think?) and that'd get the actual text content of our books with some minor formatting, but we had very interactive and multimedia books. Tons of images, cross links, quizzes, and so on. Pretty much all things that the apple book format supported, but didn't support importing.
We figured out how much could be imported, and what was left out. We figured out the limitations of the undocumented applescript interface. We figured out we could build complex HTML documents, copy them, and then have the keyboard automation press "cmd-V" and they'd be brought in without issues.
We automated away the bad UI that was going to make it too expensive to publish on apple platforms.
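The paste step can be sketched roughly like this. It is not the code we actually ran; it is a minimal Python sketch that assumes macOS, the stock `osascript` tool, and that the content is already on the clipboard. The app name "BookEditor" is a placeholder, and `paste_command` / `paste_into` are hypothetical helper names.

```python
import subprocess


def paste_command(app_name):
    """Build an osascript argv that focuses `app_name` and presses cmd-V."""
    return [
        "osascript",
        # Bring the target application to the front.
        "-e", f'tell application "{app_name}" to activate',
        # Send the same keystroke a person would: command + V.
        "-e", 'tell application "System Events" to keystroke "v" using command down',
    ]


def paste_into(app_name, dry_run=True):
    """Return the command, and only execute it when dry_run is False."""
    argv = paste_command(app_name)
    if not dry_run:
        # Requires macOS with accessibility permission granted to the caller.
        subprocess.run(argv, check=True)
    return argv
```

With `dry_run=True` (the default) nothing is executed, so you can inspect the command before letting it type into a real window.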
Should Apple have provided better docs and interfaces and APIs?
Yes, of course!
We asked for them.
But at the end of the day, they may not. And we need to publish this stuff soon, not in several years when Apple decides it might be a good idea for the next revision
I was automatically taking screenshots of a DS game in an emulator. my program would load a savestate, jam some new data into the DS's RAM, hit a button, then screenshot it. But the emulator was showing a "SAVE STATE LOADED!" text overlay over the game's window, no matter what option I set.
I go on the dev's discord/IRC, talk to them about making it an option, they say they've considered it but it's low priority.
I look into building the software myself, but it's very complicated on windows, with a lot of dependencies and such...
@foone I never thought of it this way, but now API 0 is going to be stuck in my head forever. I had a ticket for an in-house web app closed because the feature could be abused, and now I just click a bookmarklet that implements it. My life’s full of adversarial automations, and I love your description here.
@foone I guess there is a case that unsupported legacy software is easier to hack around, because it’s a dead sitting duck, instead of a constantly moving chameleon.
@foone I automate facebook. I check if new users have answered the group admittance questions correctly, and only then I allow them into the facebook group.
All done with firefox piloted with xdotool.
Facebook could offer an API for group management, but they don't. And they never will.
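For anyone who hasn't seen a browser piloted with xdotool, a minimal sketch of the idea might look like the following. It is not the actual script described above. It only builds the commands to focus a Firefox window and type a URL into the address bar. The window title, the URL and the helper names `open_url_steps` / `run_steps` are assumptions for illustration; the xdotool subcommands used (`search`, `windowactivate`, `key`, `type`) are standard ones.

```python
import subprocess


def open_url_steps(url, window_name="Mozilla Firefox"):
    """Return the xdotool commands that load `url` in a visible Firefox window."""
    return [
        # Find the window by title and give it keyboard focus.
        ["xdotool", "search", "--onlyvisible", "--name", window_name,
         "windowactivate", "--sync"],
        # Ctrl+L focuses the address bar, just like a user would.
        ["xdotool", "key", "ctrl+l"],
        # Type the URL with a small per-key delay (milliseconds).
        ["xdotool", "type", "--delay", "50", url],
        ["xdotool", "key", "Return"],
    ]


def run_steps(steps, dry_run=True):
    """Run each command in order; with dry_run=True nothing is executed."""
    for argv in steps:
        if not dry_run:
            # Needs an X11 session with xdotool installed.
            subprocess.run(argv, check=True)
    return steps
```

Everything after the page loads (reading answers, clicking approve) would follow the same pattern: more `key`, `type` and mouse commands, in the order a person would do them.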
Your message is well received. I 100% agree with the "fuck you, doing it anyway" methods and have done some of the things you mention, like wrapping a shitty Java hardware controller with xdotool, editing binaries, replacing the dynamic linker to use different libc.
My first job involved protocol normalizers for financial data, and included putting an API around things designed for a dedicated rs232 attached terminal.
@foone thanks for this informative thread. This answers a lot of questions I've had for a while but been unable to get concrete answers about. (When I've asked I've gotten a lot of, "Well, maybe, technically, BUT" type answers that sounded like a semi hard no but it sounds like the real answer was actually "Yes but I don't want to/am afraid to/don't know how to approach it" all along.)
@foone True story: the original way that signing Windows binaries for Firefox was automated was an AutoIt script because (IIRC) signcode had no way to enter the passphrase other than a modal pop-up at the time.
@foone You are basically describing my full time job. Just the automation is for internal systems, that are ancient and horrible. They do often complain that I’m sending them too much traffic, but that is because nearly all of the usage of their system is via my automation and the simple user interfaces I created for it.
@foone one of the eye opening things for me over the past few years was digging into the scraper community. My use case is a little different (deep web inventorying/analysis for clients) but yeah, scraper chad don’t give a damn about api limitations
@foone there are, of course, issues with the fact that a lot of people on the other side of the equation know about API 0. I remember when twitter started making The Changes, they also got REALLY strict with automation detection to the point where it was dinging the average user because it seemed like they MIGHT be trying to bypass the lack of an API
@foone if a website provides extra API to do things in a different way from how the UI itself does it, that seems a bit sus to me really, as if they're trying to separate API users and other users (see telegram bot API)
@foone I'm running some medical software automation where the creators do not care much about either automation or easy workflows. Oh well. Change pricing on 100 items? Windows automation. Need to unassign a doctor from 1000 patients? Windows automation. Stupid vaccination workflow? Windows automation.
Shoutout to https://github.com/FlaUI/FlaUI for making the job so much simpler.