This is not an official government website. Views expressed here represent the personal opinions of current and former federal employees.

What it really takes to migrate COBOL, and why DOGE will fail

By Anonymous federal technologists

We are (or were) federal technologists who have modernized legacy systems, including COBOL mainframes. When we read that DOGE was planning on migrating the Social Security Administration’s (SSA) COBOL systems, we immediately had concerns.

COBOL runs most of the calculations on government systems. For example, SSA distributed $1.3 trillion in benefits in FY 2023 to 73 million people. COBOL touches most (if not all) of it.

In fact, more than 200 billion lines of COBOL quietly power our lives, from finance to healthcare to retail to travel. 95% of the time you use a credit or debit card today, COBOL is involved.

Given how integrated COBOL is in critical aspects of our lives, moving away from it is a major effort. It takes time and care to make sure it goes right.

Reliability is the highest priority in a government system. Break something, and you could potentially ruin the lives of hundreds of thousands of people. 

To modernize a system, you have to understand it

Your approach to modernizing software depends on your end goal. Is the goal to make the entire system faster? More accurate? Cheaper to maintain? There’s never just one way to modernize a system.

The first step of modernizing COBOL is to understand why COBOL was originally used. But even the scope for this endeavor is currently murky. SSA's infrastructure contains more than 60 million lines of COBOL, and that code changes every year as legislation changes. SSA has more than 4,000 technical systems, and it's unclear how many of them touch COBOL.

A large part of the effort also involves understanding what the system currently does, which is where Artificial Intelligence (AI) can be leveraged well by the right people.

Experts are needed when using AI to understand legacy systems

AI is able to directly translate lines of code from one language to another, but the output will not be useful out of the box. You still need to understand what the system is doing first, so you can give the AI the right training data to be helpful. That is especially important for massive legacy COBOL systems, because each system is bespoke to each agency.

Even then, AI has a harder time keeping track of human logic and creative solutions across multiple systems, both of which abound in any system that's decades old, like SSA's COBOL system.

It'll take time and care to untangle all the logic, which includes understanding the original requirements in the legacy system and working with the people who know how everything connects. You need to understand what the original code is doing, especially between systems, before siccing AI on porting it.

It starts with the people

Training people to read COBOL isn't the hard part, because it's actually very human-readable. The hard part is untangling decades of legislated business rules and small codebase-level decisions made to keep things working. Very little of that will be written down anywhere. Instead, people are essentially the documentation in legacy systems.

It'll be the employee who's been there more than 50 years, who knows the benefits programs and technical systems inside and out, who is the only person able to validate how something is supposed to work. DOGE is currently making it extremely unattractive to work in the federal government, by offering early retirement for people, as well as firing scores of people without cause. What do you do when those people with deep knowledge are gone?

Thorough testing will make or break the initiative

When modernizing legacy systems, moving the code actually doesn't take up most of the timeline. What takes the most time is testing the new system and making sure the outputs are what you expect they are.

COBOL is extremely tedious to test

This is where working with COBOL is challenging. COBOL isn't like other programming languages—it was created to allow non-engineers to work with business logic in technical systems. Critically, COBOL systems lack the infrastructure to support automated testing.

Updating software today can be extremely fast, mostly because developers can write automated tests directly into the code. Make a change, throw a battery of tests at the new code, let them run, and you can see almost immediately if something unexpected happens (or breaks). Agile modern software practices rely on automated testing.

COBOL doesn't support automated testing out of the box. AI tools that can generate COBOL tests are being developed, but to our knowledge they have never been used successfully at the scale that SSA requires. Right now, testing COBOL at SSA’s scope likely must be manual, by which we mean someone has to go through the process, document what happens at each step, then log the output. That can mean hundreds of pages of Word docs with screenshots, for each change in the code. And for a nationwide government system, you need to perform hundreds of thousands of tests to come anywhere close to covering all the possible use cases.

It is completely understandable that a team of developers would want to get away from that workflow. But to migrate COBOL to another language requires, you guessed it, an extreme amount of testing.

The math will be off with a direct migration without testing and adjusting

COBOL has been around so long that it thinks about data in a very different way than modern systems. To summarize an extremely technical explanation: the math may be off when converting COBOL to something else, even to Java, its accepted modern equivalent as a business language.
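
To make that concrete, here is a minimal sketch of one commonly cited cause. We are assuming this is the kind of discrepancy meant above: COBOL typically stores money as fixed-point decimals, while a naive port may land on binary floating point (Python's float, Java's double). We use Python for brevity. The figures and function names are hypothetical and have nothing to do with real SSA rules.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical illustration only: these amounts are not real SSA figures.

def sum_as_float(amount: str, count: int) -> float:
    """Add the same amount repeatedly using binary floating point."""
    total = 0.0
    for _ in range(count):
        total += float(amount)
    return total

def sum_as_decimal(amount: str, count: int) -> Decimal:
    """Add the same amount repeatedly using exact decimal arithmetic,
    which is closer to how a fixed-point COBOL field behaves."""
    total = Decimal("0.00")
    for _ in range(count):
        total += Decimal(amount)
    return total

def round_to_cents_half_up(amount: str) -> Decimal:
    """Round a decimal amount to cents, with halves rounded away from zero."""
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

float_total = sum_as_float("0.10", 10)      # ten dimes, as floats
decimal_total = sum_as_decimal("0.10", 10)  # ten dimes, as decimals

print("float total:  ", float_total)
print("decimal total:", decimal_total)
print("float rounding of 2.675:  ", round(2.675, 2))
print("decimal rounding of 2.675:", round_to_cents_half_up("2.675"))
```

Ten dimes do not add up to exactly one dollar in binary floating point, and the two rounding calls disagree by a cent. A cent here and there sounds small until it is multiplied across tens of millions of payments. Using a decimal type in the new language avoids this particular trap, but it is only one of the differences a migration team has to find and test for.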

Needless to say, the math being off isn't great when the whole point of the system is to disburse more than a trillion dollars to more than 70 million people.

The only way to mitigate the unexpected math result is to take the time to rigorously test the new system against the original, compare the outputs, then tweak the code to ensure the outputs are exactly the same. It’s possible that this will need to be done for every single use case, and there could be hundreds of thousands of use cases.
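
A rough sketch of what that comparison loop can look like follows. Everything in it is an assumption made for illustration: the two calculation functions are hypothetical stand-ins (in practice the "legacy" side would be outputs recorded from the original system, not a Python function), and the truncation rule and rate are invented rather than taken from any real benefit formula.

```python
from decimal import Decimal, ROUND_DOWN
from typing import Callable, Iterable

# A test case is (monthly_amount, adjustment_rate), both as strings.
# All names and figures are hypothetical, not real SSA rules or data.
Case = tuple[str, str]

def legacy_adjustment(case: Case) -> Decimal:
    """Stand-in for the original system's recorded output.
    Assumes fixed-point math that truncates to whole cents."""
    amount, rate = case
    exact = Decimal(amount) * Decimal(rate)
    return exact.quantize(Decimal("0.01"), rounding=ROUND_DOWN)

def candidate_adjustment(case: Case) -> Decimal:
    """Stand-in for a naive port that uses binary floats and rounds."""
    amount, rate = case
    return Decimal(str(round(float(amount) * float(rate), 2)))

def find_mismatches(
    cases: Iterable[Case],
    legacy: Callable[[Case], Decimal],
    candidate: Callable[[Case], Decimal],
) -> list[tuple[Case, Decimal, Decimal]]:
    """Run every case through both systems and collect any differences."""
    mismatches = []
    for case in cases:
        expected = legacy(case)
        actual = candidate(case)
        if expected != actual:
            mismatches.append((case, expected, actual))
    return mismatches

cases = [("1000.00", "0.025"), ("100.30", "0.025")]
mismatches = find_mismatches(cases, legacy_adjustment, candidate_adjustment)
for case, expected, actual in mismatches:
    print(f"MISMATCH {case}: legacy={expected} candidate={actual}")
```

The loop itself is trivial. The hard part is everything around it: assembling hundreds of thousands of realistic cases, capturing trustworthy outputs from the original system, and having someone who knows the benefit rules decide, for each mismatch, which side is actually right.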

Even if AI can be used for this, a successful migration still requires giving the right people time to do the work.

Trusting people to do the work

No organization has ever liked to spend money or time maintaining or cleaning up (a.k.a., “refactoring”) code, even in the private sector. It’s always new features that get the glory. It's the same in government. But that’s how we got to where we are.

Maintenance and refactoring are real work and need to be treated as such, with effort and investment. SSA staff told the New York Times that it would take an estimated 5 to 7 years to modernize COBOL, and cost more than $2 billion. And remember, most of that time will be spent testing, not migrating code.

Promoting psychological safety on modernization teams

There will be failures when modernizing a legacy system. Inevitably, things will break along the way. You have to plan how to roll things back quickly and smoothly. Embracing the inevitability of failure and preparing for swift recovery empowers people to be bold and do what needs to be done.

Legacy system modernization initiatives frequently fail because people don't want to face the truth about the work—how hard it is, how long it will take, and how risky it can be. What we often see is that the people who have been maintaining the code don't fully understand it, and they're embarrassed by that.

To do modernization work well, it has to be okay to not know things, and to fail and recover. We have seen that COBOL analysts pairing with coders can successfully convert systems incrementally, but there must be a culture of collaboration and trust. And the team needs to be given the time and space to do their jobs well.

DOGE has admitted they are not about solving problems or improving services

Nothing DOGE has done so far gives us confidence they will migrate COBOL responsibly. Nothing tells us that they want to work with longtime SSA staff on understanding the current system. They don’t even seem to want to understand the current system at all, given their incorrect assumptions about how birth dates work in SSA.

DOGE leads at SSA have said that DOGE is “not there to solve problems or improve service — only to eliminate property, head count and find fraud”. The COBOL migration initiative is reportedly led by Elon Musk lieutenant Steve Davis, who is notorious for cost-cutting, overseeing layoffs at Twitter and finding line items to cut at DOGE. Musk himself has said that Steve Davis is "like chemo. A little chemo can save your life; a lot of chemo could kill you."

(To be clear, a legacy system migration should not be like chemo at all unless it involves policy changes from Congress first, to streamline the requirements before migrating anything. But we have not seen any evidence of intended policy updates.)

Under DOGE, SSA will de-emphasize direct services to the public, closing many regional offices. DOGE is getting SSA to reduce phone service just as calls have increased 30% compared to last year. If someone has a problem with their Social Security payments, they will have fewer avenues to resolve it.

SSA is being forced to fail before Musk decides what is worthy of keeping or fixing, according to his zero-based budgeting approach. This likely means there will be no fallback plan if something breaks during the migration.

DOGE's true intentions with COBOL

It's becoming more obvious that DOGE is likely getting SSA off of COBOL because they don't understand the system, don’t want to understand the system, and want to start fresh with something they can control 100%.

Recently, it was reported that DOGE is exploiting existing technical weaknesses in SSA's system to falsify government records and punish people they deem undesirable, namely by using SSA's "death master file" to "terminate...the financial lives" of people who are not dead.

A COBOL migration under DOGE would not be for better maintenance or service. Otherwise they would approach it with curiosity, investment, and time, not a rushed timeline while prioritizing cuts to the rest of the agency.

It's very possible that DOGE wants to take over SSA's entire technical system so they can destroy anyone’s financial life with the click of a button. Given what we've seen, they’ll likely port the COBOL system to Java using a Large Language Model (LLM), call it a day, then turn off all customer service channels so nobody can yell about the math being wrong, which it will be.

What’s at stake

With no fallback plan, there’s no estimating when or if services can be brought back if they break something. Botched multi-year migrations by banks (with smaller customer bases than SSA) have resulted in widespread service outages that took more than 6 months to fully recover from.

If services fail at a private sector company, people can choose another company to do business with. They might lose money with a bank, but they could choose another bank.

There is no other U.S. government. There’s only one SSA. And there are millions of people in this country who rely on Social Security benefits for over 90% of their income. What will they do if that income disappears?