Open Infrastructures and the Future of Knowledge Production, part 2

In my last post, I unpacked some of the reasons why open infrastructures matter for the future of knowledge production, and I talked a bit about how Humanities Commons and hcommons.social strive to live out the principles of community governance that truly open infrastructure requires. But I ended on a less cheerleadery note: We aren’t a perfect alternative to the corporate platforms by which we’re surrounded. And this is where we need to dig down into the dirty underside of digital infrastructure. As Deb Chachra points out, the term “infrastructure” literally points to those systems that are hidden, in our walls, under our floors, and buried underground. If we are going to mitigate the inequities created by and sustained through our infrastructures, we have to get busy unearthing those systems and finding ways to build new ones.

And so: We need to take a hard look at the fact that the infrastructure that Humanities Commons is built upon is AWS, or Amazon Web Services. As you might guess from the name, AWS is part of the Greater Jeff Bezos Empire, and every dollar that we spend to host with them helps to keep that empire running. And run it does! Amazon’s revenue derived from AWS passed $80 billion-with-a-b in 2022, and as of August 2023, AWS hosted 42 percent of the top 100,000 websites, and 25 percent of the top one million (ironically enough including BuiltWith, the site from which these data are made available).

Why has Amazon become such a powerful force in web hosting and cloud computing? Largely because they provide not just servers but a powerful and wide-ranging suite of tools that help folks like us not just make our platform available but also keep it stable and secure and scale it with enormous flexibility. AWS provides connected equipment and tools that would be more than a full-time job for someone to maintain in-house, it enables redundancy and global reach at speed, and it’s relatively easy to manage.

So… it works for us, just as it works for 42,000 of the top 100,000 websites across the internet. But I’m not happy about it. It’s not just that I hate feeding more money into the Bezos empire every month, but that I know for certain that our values and Bezos’s do not align. And every so often I have to stop and ask myself how much good it does for us to build pathways of escape from the extractive clutches of Elsevier and Springer-Nature, only to have those pathways deliver us all into the gaping maw of Amazon?

AWS has a stranglehold on web-based platforms of our size, as we’re too complicated for a server kept under the desk, too big for a smaller hosting service, and too small for our own data center. And if you don’t want to deal with the risks and costs involved in owning and operating the metal yourself, there just aren’t many alternatives, and certainly not many good ones.

Our host institution, Michigan State University, like most institutions its size, operates both a large-scale data center through our central IT unit and a high-performance computing center under the aegis of the office of research and innovation. The latter can’t really help us, as it’s focused pretty exclusively on computational uses and not at all on service hosting. And the former comes with a suite of restrictions and regulations in terms of access and security – pretty understandably so, given recent attacks and exploits such as the one that caused our neighbor to the east to disconnect the entire campus from the internet on the first day of classes – but nevertheless restrictions that make it impossible for us to be flexible enough with our work.

In fact, central IT strongly encourages projects like ours to make use of cloud computing, given the complexity of our needs and the risk-averseness of the campus. And we have our pick! AWS, Microsoft’s Azure, and Google Cloud Services.

I just can’t help but think that it’s a Bad Thing for academic and nonprofit services like ours – services that are working to be open, and public, and values-aligned with our communities – to be dependent upon Silicon Valley megacorps for our very presence. We need alternatives. Real alternatives. And I fear that we’re going to have to invent them, because as the example of open access publishing demonstrates, waiting to see what commercial providers come up with is certain to increase our lock-in, and increase the level of resources they extract from our campuses.

So what might it look like if our infrastructure for the future of knowledge production and dissemination was community-led all the way down? What might enable the Commons to leave AWS behind and instead contribute our resources to supporting a truly shared, openly governed, not-for-profit cloud service? Could such a service be collaborative, with all member research institutions and organizations paying into a shared, professionally staffed data center?

King’s College London and Jisc think so – they established the first collaborative research data center in the world nine years ago, precisely in order to help UK institutions achieve economies of scale, to increase energy efficiency, and to reduce costs. Of course, it’s a lot easier to get all the UK institutions of higher education on board with such a centralized initiative, partly because there are fewer of them and partly because they are all centrally funded.

But what if Internet2, for instance, rather than restricting its areas of interest to networking and protocols, and rather than offering to connect member institutions with corporate cloud services, provided a real alternative – one that was not just developed for the academic community but that would be governed by that community? What if each member institution or organization agreed to contribute its existing infrastructure, along with its annual maintenance budget, to a shared, distributed, community-owned cloud computing center? Could excess capacity then be offered at reasonable prices to other nonprofit institutions or organizations or projects like mine, in a way that might entice them away from the Silicon Valley megacorps? Would our institutions, our libraries, our publishers, and our many other web-based projects find themselves with better control over their futures?

None of what I’m suggesting here would be easy, and a lot of the questions I’ve just asked fall – at least for the moment – into the realm of the pipe dream. But if we were to be willing to press forward with them, we might find ourselves in a world in which the scholarly communication infrastructures on which we build, develop, design, and publish our work can help us foster rather than hinder social and epistemic justice, can empower communities of practice by centering their needs and their work to meet them, and can enable trustworthy community governance and decision-making in support of truly open, public, shared infrastructures for the future of knowledge production.

Open Infrastructures and the Future of Knowledge Production, part 1

I’ve been thinking a good bit lately about the ways that the future of knowledge production depends upon the openness of the infrastructures that support our work. For a lot of people, the word “infrastructure” triggers a yawn reflex, and not without reason. As Deb Chachra points out in her brilliant new book, How Infrastructure Works, the best thing that infrastructure can do is remain invisible and just work. But as Chachra also argues, the shape of our entire culture is dependent on our infrastructure, and where inequities are part of those systems’ engineering, they constrain the ways that culture can evolve. Infrastructure matters enormously, and the scholarly communication infrastructures on which we build, develop, design, and publish our work have deep implications for our abilities to foster social and epistemic justice in our knowledge production and communication practices, to empower communities of practice and their concerns in the development and dissemination of knowledge, and to enable trustworthy governance and decision-making that is led by the communities that our publications and platforms are intended to serve. Our team is far from alone in thinking about these questions right now. We’re seeing the idea of “open infrastructure” pop up a lot lately, in no small part because folks are recognizing that a commitment to open, public infrastructures is necessary to ensure that scholarly communication can become actually equitable.

What do I mean by “actually equitable”? How might that sense of equity intersect with the aims of the open-access movement? Over the last twenty-plus years that movement has worked to transform scholarly communication, arguing in part that if our work could be read more openly by anyone, it might both have more impact on the world at large and create a more equitable knowledge environment. It’s of course true that open access in its many present flavors has done a lot to make more research available to be read online. But the movement toward open access began as a means of attempting to break the stranglehold that a few extractive corporate publishers have established over the research and publishing process – and in that, it hasn’t succeeded. The last decade in particular has revealed all of the resilience with which capital responds to challenges, as those corporate publishers have in fact become more profitable than ever. Not only have they figured out how to exploit article processing charges in order to make some work published in their journals openly available while continuing to charge libraries for subscriptions to the journals as a whole, but they’ve also developed whole new business plans like the so-called “read and publish” agreements that keep many institutions tied to them, and they’ve developed new platforms and infrastructures like discovery engines and research information management systems that serve to increase corporate lock-in over the work produced on campus.

For all these reasons, the 20th anniversary statement of the Budapest Open Access Initiative took on a slightly different focus, noting that “OA is not an end in itself, but a means to other ends, above all, to the equity, quality, usability, and sustainability of research.” In order to achieve those ends, the statement proposes several key recommendations – and chief among them?

Host OA research on open infrastructure. Host and publish OA texts, data, metadata, code, and other digital research outputs on open, community-controlled infrastructure. Use infrastructure that minimizes the risk of future access restrictions or control by commercial organizations. Where open infrastructure is not yet adequate for current needs, develop it further.

This recommendation recognizes that the control of the infrastructure by profit-seeking entities cements inequities – and this is true even where the large corporate publishers purport to create opportunities for the disadvantaged by offering fee waivers and discounts on their publishing charges. Those discounts only serve to normalize a culture in which it is considered correct for those who produce knowledge to pay corporations to host and circulate it.

What scholarly communication needs today, more than anything, is a broad-based sense of accountability to scholars and fields and institutions rather than shareholders. Hence the call in the 20th anniversary Budapest statement for hosting open access research on open infrastructure: infrastructure that is led by us, and accountable to us.

This is the fundamental orientation and driving purpose of Humanities Commons. Our goal is to provide a non-extractive, community-led and transparently governed alternative to commercial platforms. We also want to encourage our users to rethink the purposes and the dynamics of publishing altogether, in ways that might allow for the development of new, open, collective, equitable processes of creating and sharing knowledge that re-center agency over the ways that scholarly work develops and circulates with the scholars themselves. As a result, we have put in place a participatory governance structure that enables both individual users and our institutional sustaining members to have a voice in the project’s future, and we have developed network policies that emphasize inclusion and openness. We are committed to transparency in our finances, and most importantly to remaining not-for-profit in perpetuity.

We are also working to build and sustain the kinds of new platforms and services that will allow for rich conversations among members of our community and between that community and the rest of the world. A year ago, seeing the handwriting on the wall for the platform formerly known as Twitter (and frankly having suffered through quite a number of unhappy years there before the beginning of the end), we launched hcommons.social, a Hometown-flavored Mastodon instance, in the hopes of providing a collegial, community-oriented space for informal communication among scholars and practitioners everywhere. We currently have more than 2000 users on our instance who are connecting with users throughout the Fediverse, and we support those users through a strong moderation policy and code of conduct. We also work to ensure that new policies and processes are discussed with that community before they’re implemented.

This kind of openness matters enormously, not just to ensure that we’re living up to the values that we’ve established for our projects, but to ensure that there’s a worthwhile future for them. Cory Doctorow has written extensively of late about what he has famously called the “enshittification” of the internet, a process in which value is sucked out of the community and into the pockets of shareholders. Users are left with no control over the platform, or the content they’ve provided to it. And this, he notes in a post on the new corporate platforms seeking to replace Twitter, remains true even if their C-suite is populated by good actors, because they’re still walled gardens.

The problem with walled gardens is partly about their ownership, but largely about their governance. It’s not just that the owners of any particular proprietary network might turn out to be racist, fascist megalomaniacs – it’s that we have no control if and when they do. Choosing open platforms means that we as users have a say in the future of the plots of ground we choose to develop. This is especially true for the kind of work, like knowledge production, that is intended to have a public benefit. It’s incumbent on us to ensure that those gardens aren’t walled, that they don’t just have a gate that management may one day decide to unlock to let select folks in or out. Rather, our gardens must be open from the start, open to connect and cultivate in the ways that we as a community decide.

As Doctorow notes, Mastodon is far from perfect, and as much as I love our own instance, hcommons.social is far from perfect. But we’re doing our best to ensure that we’re running it in the open. And operating in the open, both for the Commons and for hcommons.social, means for us that we are accountable to our users and responsible for safeguarding the openness of their work. Together, those two ideals undergird our commitment to provide alternatives to the many platforms that purport to make scholarly work more accessible but in fact serve as mechanisms of corporate data capture, extracting value from creators and institutions for private rather than public gain.

But, as I note, we aren’t a perfect solution to the problems of corporate control in scholarly communication. More on why in my next post.

NSF Grant for New STEM-focused Commons

The Commons team is delighted to have been awarded one of the inaugural FAIROS RCN grants from the NSF, in order to establish DBER+ Commons. That’s a big pile of acronyms, so here’s a breakdown: the NSF is of course the National Science Foundation, one of the most important federal funding bodies in the United States, and a new funder for us. The FAIROS RCN grant program was launched this year by the NSF in order to invest in Findable, Accessible, Interoperable, Reusable Open Science (FAIROS) by supporting the formation and development of Research Coordination Networks (RCN) dedicated to those principles.

We have teamed up with a group of amazing folks at Michigan State University who are working across science, technology, engineering, math, and more traditional NSF fields, all of whom are focused on discipline-based education research (DBER) as well as other engaged education research methodologies (the +). Our goal for this project is to bring them together with their national and international collaborators in STEM education to create DBER+ Commons, which will use — and crucially, expand — the affordances of the HCommons network and promote FAIR and CARE (Collective Benefit, Authority to control, Responsibility, Ethics) practices, principles, and guidelines in undergraduate, postbaccalaureate, graduate, and postdoctoral science education research activities.

We are thrilled about the collaboration this grant will allow us to develop, as well as the network advancements it will allow us to build. We’ll share more as the work progresses!

On Prior Publication

Last week, we received two takedown notices for items deposited to CORE. They arrived at nearly the same time, and so we found ourselves thinking about them in connected ways, though their cases are very, very different.

The first came through AWS Abuse, who passed on a report to us that we were distributing copyright-infringing content. Under DMCA Safe Harbor provisions, we are required to take down such potentially infringing material immediately, and can only afterward follow up with an investigation to determine whether it’s actually infringing or whether it should be restored. Agreeing to follow this process is important to the network’s survival, as it’s only through such adherence that we can prevent the Commons from being sued for instances of copyright infringement of which we’re unaware.

In this case, we took the item down. Looking at the document revealed that it was a scan of copyrighted material, so the complainant may have a case. We have, however, inquired with the depositor in case there are complicating circumstances that we should know about.

The second request came to us from a user, who asked us to remove one of their deposits. Generally speaking, we resist removing deposits unless there are very good reasons, given our concern for the continuity of the scholarly record. In this case, it turned out that the deposit was a conference paper that the depositor later submitted for publication by a journal. The journal was now demanding that the deposit be removed, as they have a policy against accepting material that has been published elsewhere.

We reached out to the journal to ask about this policy, noting that even the venerable PMLA would not consider a conference paper deposited in a repository to be a violation of its prior-publication rule.

The response we received was — well, let’s say it — rude. The managing editor ultimately made it clear that if we did not remove the deposit, the journal would rescind its offer of publication to the author.

We are not in the business of harming the careers of our users, and so we have removed the deposit, if reluctantly. But we want to use this incident to open a conversation about the differences between conference papers and published articles, as well as between preprints and publications. We believe that authors have the right to share and seek feedback on the early stages of work prior to submitting that work to publishers, and that the existence of such preprints online does not constitute prior publication. And we urge our users to seek venues for publication that do not limit their rights over the ways they share their own work.

What issues have you run into in the relationship between sharing work online and publishing it in more formal venues? How would you encourage us to respond to situations like this? And how might we work together to create a more open, less extractive, and completely non-punitive scholarly communication ecosystem?

The Commons at Five

It’s been a bit of a whirlwind, having this 5th birthday celebration for the Commons come so soon on the heels of Thanksgiving and Giving Tuesday, but the coincidence of these events has had me thinking about the many, many aspects of this project and its development for which I’m grateful, and the ways we’re hoping to mobilize our resources, and our community, to transform the development, sharing, and preservation of knowledge in and around the academy.

We’ll be sharing some of the numbers in other posts — the growth in our membership, the flourishing of our repository, and more — but I want to focus a bit on our path to sustainability. As we originally noted in our plan for Sustaining the Commons, we migrated the platform to Michigan State University in order to have a secure home base from which to begin a program of expansion that we believe will lead us to both technical and financial sustainability in the years ahead. That expansion is supported by an Infrastructure and Capacity Building Challenge Grant from the National Endowment for the Humanities, a program which has challenged us to raise $1.5 million in order to release $500,000 in federal matching funds. Thanks to a generous change capital grant from the Andrew W. Mellon Foundation, and a wide range of gifts from groups including the Samuel H. Kress Foundation, the Open Society Foundations, and Digital Scholar — not to mention, of course, community members like you — we are nearing our goal, with only a bit over $200,000 to go. The combined fund created by all of these gifts provides us with a five-year runway to establish the business model that will enable us to be fully sustainable.

We’ve begun that work by investing in our team. Earlier this year, we hired Mike Thicke as our full-time lead developer. Mike has been working hard on remediating some of the technical debt that accrued over the last few years. This work includes getting our underlying software updated and squashing some long-standing bugs, but also updating our workflows to make them more sustainable. Our project manager, Bonnie Russell, has been a key collaborator in that process, as has our graduate assistant, Katie Knowles. Our work is backed by the rest of the team at MESH Research, including Brian Adams, Kelly Sattler, and Scott Schopieray, and has been made possible in the first place by the fantastic team at the MLA, past and present, who got us to this point, including Eric Knappe, Anne Donlon, Nicky Agate, Katina Rogers, Nelson Alonso, and more.

Additionally, we’re in the process of bringing on board two more Commons team members who will help our community grow: a community development manager charged with bringing in the new organizations and institutions whose investments in the platform will be the basis for our future financial sustainability, and a user engagement specialist who will work to ensure that individual users are successful in their goals for using the platform. And we’re also searching for a couple of technical folks: an AWS-oriented systems architect and administrator, and (posting soon!) an identity management engineer.

This expanded team will allow us to do the thing that we’ve all been hoping for: to move beyond a non-stop round of Whack-a-Mole with the most immediate day-to-day issues, to instead focus on the platform’s future. That future includes significant expansion, as we not only bring on new partner organizations but also add hubs for the social sciences and STEM fields to the Commons constellation. It also includes creating new forms of interoperability with other key scholarly tools, allowing your Commons account to serve as a hub for a wide range of online collaboration and communication activities.

Beyond the financial and the technical, however, lies another category of sustainability, one that we believe makes the others possible: social sustainability. This form of sustainability arises from the commitment of a community not just to working together on a particular project, but to the idea of building the conditions for that togetherness in the first place. We’re working to enable this social sustainability by implementing a governance model that gives our partner organizations and institutions a voice in the platform’s development, and that draws on the insights and goals of our users in establishing that development path. To that end, our Participating Organization Council — the equivalent of a governing board for the project — has appointed members to two new groups: a Technical Advisory Group and a User Advisory Group. We’ll be meeting with those groups in January, and we’ll keep all of you posted on how you might connect with them to provide your input into the future of the Commons.

All of this work has been made possible by the support that we have received from our host institution, Michigan State University; from our partners, the Modern Language Association, the Association for Slavic, East European, and Eurasian Studies, the Association of University Presses, and the Society of Architectural Historians (plus a few more soon to be announced!); from our funders; and most of all from you. We’re enormously grateful to all of you, and very much looking forward to what we might do together in the next five years.

Misinformation and the Commons

My birthday fell earlier this week, and brought with it the usual delightful overflow of Facebook greetings. It was always my favorite part of that platform, and it managed to draw me out of the semi-boycott I’ve been conducting to say thanks to everyone.

My semi-boycott means that while I haven’t deleted my account because I still lurk a bit in order to see pictures of my nieces and nephews and so forth, I don’t post or interact with others’ posts. This decision has largely been driven by the enormous damage the platform has wrought in recent years, at levels from national politics to individual health and well-being. I miss some of my interactions there, but the nausea I feel when I consider contributing my time and attention to that company is just too much to squelch.

I bring this up here because this morning we discovered a new CORE deposit filled with highly problematic claims about the spread of COVID-19 and the effects of vaccines. We don’t — and can’t — review deposits for soundness, and yet we bill ourselves as a scholarly network, where work with some measure of academic authority can be found. Because of this, I believe we have a responsibility to prevent, or at a bare minimum not contribute to, the spread of harmful misinformation.

As a result, we’ve removed the deposit, and we will remove any similar deposits that we uncover.

We want, however, to develop the best policies and processes we can in order to ensure our network — whose openness we are committed to — does not risk becoming another vector for damaging misinformation. We’d very much appreciate your thoughts about this work; please leave your ideas and concerns in the comments. We’ll keep you posted as our work continues.

We Need Your Input

The questions that have recently surfaced for us around community, safety, and trust have made clear the extent to which we on the Commons team need ongoing feedback and advice from our users. Our network governance model, recognizing that need, provides for two advisory groups: a technical advisory group and a user advisory group. Members of each are to be named by the Participating Organization Council, and each group will bring the concerns and ideas of the Commons community to the team for discussion and integration into our project roadmap. (See our bylaws for more details.)

We are currently seeking nominations for each of these groups. If you would like to join us, please email a brief statement of interest, along with a link to your Commons profile, to hello@hcommons.org.

Let us know if you have any questions — we’ll look forward to hearing from you.

Community, Safety, and Trust

Earlier this month, the Modern Language Association held its annual convention, and our team hoped that we would be able to engage with attendees, helping them continue their conversations with one another via the Commons. Instead, we found ourselves fending off what initially looked like a bot attack: a massive influx of new account creation attempts with a few shared characteristics that made clear that there was orchestration involved. We put some measures in place to attempt to ensure that the majority of these attempts did not succeed, and spent several days playing whack-a-mole with the few that did.

In the process, it gradually came to seem that we might not be dealing with bots, but with humans: bad actors who were trying to find ways into the Commons community. To what end, we weren’t sure. But given the visibility of the MLA Convention, we really, really did not want to find out.

Things have gotten a bit quieter since the convention ended, but the suspicious account creation attempts continue. And fighting off this attack has taken all of the time that might have gone into the work we’re trying to do to improve and advance the platform, and it’s left our very small team exhausted. So we’re discussing some longer-term options, options that raise a few key questions we’d like to open up for discussion with the Commons community.

The most important question is this:

How do we balance our commitment to ensuring that the Commons is open to anyone — regardless of credentials, memberships, employment status, language, geographical location, and so forth — with our commitment to ensuring that the members of our community are safe and free from harassment? We’ve all seen much too graphically of late the costs of a hands-off approach to open social networks, but even within a more local academic frame of reference, we’ve seen what can happen when virtual events get Zoom-bombed or otherwise disrupted. We absolutely do not want members of our community to be threatened in any way that unsettles their ability, not to mention their willingness, to engage in the shared collaborative work that they’re undertaking here. We’re grateful that the Commons has managed to avoid such incidents up until now, but we’ve achieved a size and a visibility that has led us to become a target. As a result, we need to take action to protect the network and its members.

Should we establish some kind of verification requirement before new accounts are permitted to use some of the network’s features? We imagine that we might restrict new, unverified user accounts in ways that prevent such accounts from sending direct messages to other community members, for instance, or from creating unwelcome groups and sites within the network. This might work something like the trust levels model that Discourse uses, relying on a demonstration of good-faith engagement to gradually open up features to new accounts, though we may need something a bit lighter weight as we get started.
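To make that idea a bit more concrete, here is a minimal sketch in Python of what a lightweight trust-level gate might look like. Everything in it is an assumption for the sake of illustration: the level names, the feature names, and the thresholds are hypothetical, and none of this describes how the Commons or Discourse actually implements anything.

```python
from dataclasses import dataclass
from enum import IntEnum


class TrustLevel(IntEnum):
    """Hypothetical trust levels; higher values unlock more features."""
    NEW = 0          # just registered, not yet verified
    VERIFIED = 1     # has completed some form of verification
    ESTABLISHED = 2  # has a record of good-faith engagement


# Hypothetical mapping from feature name to the minimum level it requires.
REQUIRED_LEVEL = {
    "edit_profile": TrustLevel.NEW,
    "send_direct_message": TrustLevel.VERIFIED,
    "create_group": TrustLevel.VERIFIED,
    "create_site": TrustLevel.VERIFIED,
}


@dataclass
class Account:
    username: str
    trust_level: TrustLevel = TrustLevel.NEW


def can_use(account: Account, feature: str) -> bool:
    """Return True if the account's trust level meets the feature's minimum.

    Features missing from the mapping default to the most restrictive level,
    so that forgetting to list a feature fails closed rather than open.
    """
    required = REQUIRED_LEVEL.get(feature, TrustLevel.ESTABLISHED)
    return account.trust_level >= required
```

In this sketch a brand-new account could edit its profile right away but could not send direct messages or create groups and sites until it had been verified, which is the kind of restriction we have in mind.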

If we establish such a requirement, what paths toward verification should we enable? We could imagine verification happening as part of account creation if the new user uses an email address that demonstrates a connection with a trusted institution or organization, or if the new user links their account to another trustworthy scholarly data system such as ORCID. But we also want to ensure that independent scholars and practitioners who may not have institutional credentials or established publication records can join us as well. Should we take the arXiv approach of having established members of the community vouch for new members, or does that run the risk of clubbiness? How do we preserve access for good actors while minimizing the damage that bad actors can do?
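As a rough illustration of how those paths might fit together, here is a small Python sketch. It is only a thought experiment: the domain list, the two-voucher threshold, and the function names are all hypothetical, and the ORCID link is represented by a simple flag rather than any real integration.

```python
from typing import Optional

# Hypothetical allowlist of trusted institutional email domains (placeholders).
TRUSTED_EMAIL_DOMAINS = {"msu.edu", "mla.org"}


def email_domain_is_trusted(email: str) -> bool:
    """Check whether the address belongs to a trusted domain or a subdomain of one."""
    domain = email.rsplit("@", 1)[-1].lower()
    return any(
        domain == trusted or domain.endswith("." + trusted)
        for trusted in TRUSTED_EMAIL_DOMAINS
    )


def verification_path(
    email: str,
    orcid_linked: bool = False,   # assumed to be established elsewhere
    vouchers: int = 0,            # established members vouching for this user
    vouchers_needed: int = 2,     # hypothetical threshold
) -> Optional[str]:
    """Return the first verification path the new user satisfies, or None."""
    if email_domain_is_trusted(email):
        return "institutional_email"
    if orcid_linked:
        return "orcid"
    if vouchers >= vouchers_needed:
        return "community_vouching"
    return None
```

The point of offering more than one path is that an independent scholar without an institutional address would still have a route in, whether through a linked scholarly identifier or through the support of existing members.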

We welcome your thoughts on these questions, and we look forward to discussing the path ahead with the community as a whole.

Infrastructure and Capacity Building

I was delighted this week to be notified that the Humanities Commons team has received an Infrastructure and Capacity Building Challenge grant from the National Endowment for the Humanities.

This grant is the foundation of a long-term sustainability strategy for the Commons, which includes hiring two new full-time staff members to join the team and contribute to the build-out of both our technical infrastructure and our community and governance models.

Of course, being a challenge grant, it comes with significant responsibilities on our part: chiefly, the raising of a 3:1 match to augment the federal funding. But we are excited about the prospects, and looking forward to getting started.

Another aspect of this plan includes migrating the Commons’s hosting and fiscal sponsorship to Michigan State University. The MLA has committed enormous energy and resources to getting the Commons off the ground and will continue to contribute to the network as the founding member organization and a key development partner. A research university, however — and particularly one as focused on public-facing research and scholarship as MSU — can provide certain kinds of long-term stability for our growing network.

You’ll be hearing more from us about all our plans in the weeks ahead. In the meantime, I want to thank the NEH for their ongoing support for this project, and thank all the members of the Humanities Commons community for getting us to this point. We look forward to serving the future of your work for years to come.

Building Community


When we launched Humanities Commons three years ago, our user base consisted of the 5,000-ish pre-existing members of MLA Commons. With generous support from the Andrew W. Mellon Foundation, we expanded the network to include Commons sites for our first-round pilot partners, CAA, AJS, and ASEEES. Perhaps most importantly, though, we also opened the Humanities Commons hub to any interested user who wanted to join us, regardless of institutional affiliation, society membership, disciplinary home, employment status, or geographic location.
