AI Reflections: Indigenous Data Sovereignty and Artificial Intelligence

Artificial Intelligence (AI) requires large amounts of data to function; that data comes from somewhere, and from someone(s). Over the years, many of us have given up our data; social media is a prime example of the ways that we hand over our data for free to use a service. Indigenous communities, however, have never given up their sovereignty, including sovereignty over their data. You may have heard of Indigenous data sovereignty; we’re going to highlight why it’s a particularly important principle to consider when engaging with AI.

Indigenous data is far more expansive than numbers or clicks; it includes all data about Indigenous peoples, from Indigenous peoples, and data collected from Indigenous territories (FNIGC, 2019). Indigenous data is also collectively owned and stewarded, a concept that doesn’t fit well within the colonial understanding of copyright or intellectual property. An Indigenous community, therefore, has the rights to, and sovereignty over, all data that impacts them and their ability to thrive.

To connect this back to AI: researchers, governments, and churches have a long history of collecting data on Indigenous peoples without consent, without sharing the data back with the community, and without any intention of benefitting Indigenous communities (FNIGC, 2019). As a result, there is now a plethora of stolen Indigenous data that is publicly accessible. Because informed consent has only recently been mandated as part of research, the history of the misuse of Indigenous data is long. Over time these materials have been published and then digitized, so anything open access is likely to be scraped by bots training GenAI products and tools, to give just one example. As Keoni Mahelona (featured below) aptly identifies, “data is the last frontier of colonization.”

Since the 1990s, frameworks have emerged to support the implementation of Indigenous data sovereignty, including the First Nations Principles of OCAP® and the CARE principles, which build on the FAIR principles for scientific data management. While the frameworks themselves are relatively new, the rights they uphold are inalienable and have existed for as long as Indigenous people have, since time immemorial. Further, Indigenous data sovereignty is enshrined in Canadian law through legislation implementing UNDRIP at both the provincial (BC) and federal levels, and, by extension, in UBC’s Indigenous Strategic Plan. As Indigenous communities consider implementing AI to meet their needs, data sovereignty remains a key priority, as highlighted in the examples we share below.

This introduction provides a high-level overview of Indigenous data sovereignty and its connection to AI, but we encourage you to use the links in this post to dig deeper into these nuanced topics. We have also included questions you can use for personal reflection or to bring into the classroom to encourage critical thinking and discussion.

Questions for Discussion/Reflection

What do these different projects and initiatives have in common?
How are these examples similar to or different from AI you have used (for example, an LLM such as ChatGPT)?
Does thinking about Indigenous data sovereignty make you reconsider any AI news you’ve recently encountered?
What would the future look like if large AI companies (such as OpenAI, Meta, etc.) upheld Indigenous data sovereignty?
How does collective stewardship of data change how we approach data collection, processing, and use in the context of AI?

Examples

The Cherokee Nation – Creating a Task Force

In late 2024, the Cherokee Nation convened a task force on AI, data sovereignty, and cyber security. That task force compiled a detailed report to help guide their Nation, one grounded in Cherokee Community Values. For added context, the Cherokee Nation is based in what is colonially known as northeastern Oklahoma and is made up of over 450,000 citizens.

One of the values they grounded their report in was “ᏕᏣᏓᎨᏳᏎᏍᏗ detsadageyusesdi — Be stingy with one another’s existence, like a mother with child.” They tied this to data sovereignty, saying: “We strive to protect the data of our citizens as we would protect their existence. Cherokee data sovereignty is inextricably linked to our right and ability to govern ourselves.”

The task force listed several recommendations, including the creation of governance committees, data literacy and outreach programs, and an AI questionnaire to be answered by any companies hoping to work with the Nation. They have carefully followed through on those plans, working with several companies as well as MIT interns to create systems for their citizens, including a tribal service portal and a legal agent. Throughout, they have kept their values at the heart of their plans. As Chief Information Officer Paula Starr shared at a 2025 conference, “AI must serve the collective good and uphold Cherokee values. If a tool compromises that, it doesn’t belong in our Nation’s systems.”

Haíɫzaqv (Heiltsuk) Nation – Counting Salmon

In response to a declining salmon population and declining federal funding to monitor that population, the Heiltsuk Nation has collaborated with a team from the Pacific Salmon Foundation, Simon Fraser University, and others to implement SalmonVision, an AI monitoring system that uses images of salmon to keep an accurate count of how many are returning to a given river and location. The images are collected as the salmon swim through weirs that are built and deployed by the Nation and based on traditional designs passed down through generations.

The data from the images helps the Heiltsuk and other First Nations make real-time decisions on how many fish they can sustainably harvest. The data is much more accurate than the pre-season estimates from Fisheries and Oceans Canada. As project partner Will Atlas said, “having high quality, in-season data should put the nations in the driver’s seat. Knowledge is power when it comes to fisheries decision-making.” The hope is that the data collected by SalmonVision will ensure the long-term sustainability of salmon populations in the face of climate change and other challenges.

When reflecting on the ethics of the AI they are using, William Housty, Associate Director of the Heiltsuk Integrated Resource Management Department, responded: “to us it is…We’re utilizing our own Traditional Knowledge to inform AI. AI is turning around and helping us to gather information, and we’re making decisions based on that information… That in itself is ethical, in that we’re not relying on the technology to make a decision for us. We’re relying on the technology to help us make an informed decision.”

Papa Reo Project: Bringing Te Reo Māori to the Digital World

The long-term goal of the Papa Reo Project is to build a “multilingual language platform that will develop cutting edge natural language processing methods and tools.” The project grew from decades of work undertaken by Te Hiku Media and their collaborators to revitalize Te Reo Māori. Project founders Peter-Lucas Jones and Keoni Mahelona see language revitalization at a crossroads with the proliferation of AI. They believe AI will either solidify the supremacy of languages such as English or make it possible for languages like Te Reo Māori to reclaim space in the digital realm. Jones and Mahelona began by reaching out to the Māori community, asking intermediate-level language speakers to record themselves reading words and phrases, which were then used to train their algorithm. Over time, Papa Reo has grown to include everything from bilingual transcription tools to a real-time pronunciation assessment feature.

Central to the project is the value described in the word Kaitiaki, which roughly translates to “guardian.” Those involved in the project are protecting the collected data, the models derived from the data, and the resulting natural language processing tools. People, companies, and governments that gain access to these resources are permitted to engage with them only if their intentions align with Māori customs, protocols, and values. All of it is protected by the Kaitiakitanga License and the Māori Data Sovereignty Principles.

Do you have any other examples you’d like to share? Questions or comments on this post? Please don’t hesitate to get in touch; we’d love to hear your thoughts.

Positionality Reflection

As two non-Indigenous people sharing resources about this topic, we believe that it is important to acknowledge our positionality. We come to this work from our roles on the Indigenous Initiatives (II) team and our shared interest in supporting Indigenous data sovereignty and ethical engagement with technology. We are very grateful for the guidance and feedback of our II team members and CTLT colleagues.