22 Jun 2022
The past few years of working with enterprise (in-house) applications have been a delightful synthesis of content from almost every software engineering course I took (infosec, networking, software development lifecycles & architecture, design patterns, quality assurance & testing, project management, database design, database administration, web site development) and every meetup/conference I ever attended. I wish I could find something like Andrew Tanenbaum’s Structured Computer Organization to TL;DR it all for the world.
I know, it’s called “DevOps,” right? :P
Anyway, I’m feeling really “galaxy-brain” across all of the silos of computer expertise that we carve day jobs into, and I’m craving a way to TL;DR decades of learning into a shareable nugget of knowledge.
Probably the closest thing in existence is Kamran Ahmed’s Backend Developer roadmap, but I still feel like there’s some “aha moment” I’ve had lately that has made its skills feel relevant to my job even though I’m not even a teeny-tiny bit a “web developer” by trade.
It turns out that because people mostly access the software I know a lot about through a web browser, it actually is useful for me to be able to see that software the way a “backend web developer” or a “system administrator” or a “network engineer” sees it.
I’m just not sure exactly how to get everybody else quite so excited about it, and which parts of that knowledge base to cherry-pick and TL;DR or briefly explain at a beginner’s level.
Until then, here are some thoughts (I’ll update this as my incoherent ramblings need expression):
- It’s important to be able to think of complex “software” as a suite of softwares that contribute different talents to running human-driven, machine-driven, and clock-driven business processes. The web site you, a human, interact with through https://login.salesforce.com is actually a bunch of pieces of software: one “web server” that makes https://login.salesforce.com/ show up in your web browser when you type that in, another “server” that crunches your login when you click the “Log In” button, another “web server” that makes things about your data show up in your web browser at https://something.salesforce.com/ once you log in, and several more “servers” that serve the web server responsible for https://something.salesforce.com/. And that’s before we start getting into which servers make it possible for you to work on Flow, host your Visualforce pages, etc!
- It’s really hard to care about the technicalities above unless you’ve tried to build multi-part software, or at least gotten to be in the room when others are trying to build software that you’re familiar enough with to follow along in their conversations. If you can get that experience, it’s a great thing to truly have a feel for. If you combine such an understanding with knowledge about telecommunications/networking and the tools available to troubleshoot telecommunications/networking issues, your troubleshooting abilities will increase greatly. Once you realize that computers have to talk to each other to make things happen, you realize that troubleshooting requires both 1) ways to test whether a specific computer is doing its specific job correctly and 2) ways to test whether network telecommunications into and out of a given computer are being transmitted as expected.
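Here’s a minimal sketch of test #2 in Python, assuming all you want to know is whether one computer can open a TCP connection to a given port on another. The function name `can_connect` is made up for illustration; only `socket.create_connection` comes from Python’s standard library.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened from this computer."""
    try:
        # create_connection resolves the host name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host names that do not resolve.
        return False
```

A `False` from something like `can_connect("app-server.example.com", 443)` (a hypothetical hostname) doesn’t tell you why the connection failed, but it does tell you that the problem might live in the network path rather than in the software on the far end, and it has to be run from the computer whose outbound traffic you’re curious about.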
- It’s important to really have a strong sense of the differences between unit, integration, and regression testing in software programming, and then to be able to apply that mentality to other things like network connectivity or operating system functionality. (Unit testing is, “Does this teeny-tiny thing work?” Integration testing is “Does the entire business process work?” Regression testing is, “Do all of the unit & integration tests still work like they used to?”)
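To make those three definitions concrete, here’s a small Python sketch. Everything in it (the price-quoting “business process” and all of the function names) is invented for illustration; the point is only the shape of the three kinds of tests.

```python
def parse_amount(text: str) -> float:
    """One teeny-tiny piece: turn '$1,200.50' into 1200.5."""
    return float(text.replace("$", "").replace(",", ""))


def apply_discount(amount: float, percent: float) -> float:
    """Another teeny-tiny piece: take a percentage off an amount."""
    return round(amount * (1 - percent / 100), 2)


def quote_price(text: str, percent: float) -> str:
    """The whole 'business process': raw text in, formatted quote out."""
    return f"${apply_discount(parse_amount(text), percent):,.2f}"


# Unit tests: does each teeny-tiny thing work by itself?
def test_parse_amount():
    assert parse_amount("$1,200.50") == 1200.5


def test_apply_discount():
    assert apply_discount(200.0, 25) == 150.0


# Integration test: does the entire business process work end to end?
def test_quote_price():
    assert quote_price("$1,200.00", 10) == "$1,080.00"


# Regression testing: rerun everything after a change and list what broke.
def run_regression_suite() -> list:
    failures = []
    for test in (test_parse_amount, test_apply_discount, test_quote_price):
        try:
            test()
        except AssertionError:
            failures.append(test.__name__)
    return failures
```

The same shape carries over to non-programming problems: “can this one server resolve this one hostname?” is a unit test, “can a user log in from home?” is an integration test, and rerunning both after a firewall change is regression testing.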
- It’s important to strongly value the principle of least privilege. Be the annoying person who challenges “Let’s just make this simple to implement” and ask, “How much tighter can we lock things down without breaking everything?” It’s not that there’s no elegance or security in simplicity … but be the gadfly who asks these questions and forces everyone around you to come up with specific lists/ballpark-counts of the humans and computers that need to behave in various sensitive ways, and then document it and share the documentation around.
- Networking/telecommunications: It’s important to know that most inter-computer communication works like a boomerang when it comes to security. Yes, messages might go “from computer A to computer B and back to computer A,” but that doesn’t mean that the computers talk “both ways” from a security perspective. For communication to go “A->B->A,” typically only “communications from computer A to computer B” have to be specifically allowed, not “communications from computer B to computer A.” So any time people from networking/security ask for your knowledge about “which computers talk to which,” never say “both ways” if traffic only goes “A->B->A” and never “B->A->B”. Always clarify, “Do you mean which computers are allowed to initiate a network connection to which other computers?” and if the answer is yes, say, “Computer A needs to be able to initiate network connections to computer B and accept the responses. However, computer B should never need to initiate a network connection to computer A.”
  - Not sure which direction things go? Get the IP addresses of Computer A & Computer B from the people who manage the computers in question, then ask your Networking expert to look through logs from the last few months and figure it out. They should be able to say, “I see computer A initiating lots of connections to computer B, but I don’t see any connections initiated by computer B to computer A in the last 3 months.”
  - If you don’t make this distinction, nobody finds out that they can lock down “B->A->B” traffic since it’s not normal, and you could leave computer A more vulnerable to being hacked by someone who’s already hacked computer B.
  - “Anomaly detection” (a phrase important in computer security) bonus: If you’ve properly banned “B->A->B” traffic and attempts at that kind of traffic suddenly start, they’ll show up in “banned activity” logs, which are often better watched by humans than “allowed activity” logs. That potentially gives you earlier detection that computer B has been hacked. Earlier detection is pretty much always better than later detection when it comes to hackers.
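The boomerang idea can be sketched as a toy model in Python. This is not how any particular firewall product is configured; the class `StatefulFirewall` and its methods are hypothetical, and real firewalls track far more than a pair of names. It only illustrates that the rule is about who may initiate, that replies ride along on a connection that’s already open, and that banned attempts leave a trail.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (initiator, target)


@dataclass
class StatefulFirewall:
    # Pairs that are allowed to START a connection.
    allowed_initiations: Set[Pair] = field(default_factory=set)
    # Connections that have been opened, stored as (initiator, target).
    established: Set[Pair] = field(default_factory=set)
    # The "banned activity" log that humans hopefully watch.
    denied_log: List[Pair] = field(default_factory=list)

    def initiate(self, src: str, dst: str) -> bool:
        """src tries to open a new connection to dst."""
        if (src, dst) in self.allowed_initiations:
            self.established.add((src, dst))
            return True
        self.denied_log.append((src, dst))
        return False

    def reply(self, src: str, dst: str) -> bool:
        """src answers dst; allowed only if dst opened the connection to src."""
        return (dst, src) in self.established


fw = StatefulFirewall(allowed_initiations={("A", "B")})
a_opens = fw.initiate("A", "B")  # allowed by the one rule we wrote
b_answers = fw.reply("B", "A")   # allowed with no "B to A" rule at all
b_opens = fw.initiate("B", "A")  # denied, and recorded in denied_log
```

After those three calls, `fw.denied_log` holds the single entry `("B", "A")`, which is exactly the kind of line you’d want a human to notice.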
- Pretty much all software that gets interacted with by humans who are sitting at a desktop computer is either:
  - Installed onto their computer (meaning it’s “Windows desktop software” or “Mac desktop software” or “Linux desktop software” or “Chromebook desktop software”). Such software might mainly do work that doesn’t require an internet connection (like Microsoft Word) or it might mainly be programmed to interact with servers that exist on the internet (like Zoom), but either way, it runs on the user’s desktop computer as standalone software.
  - A web site. Such “software” runs in the user’s web browser. Examples: the web site that Salesforce users use as a CRM, AirTable, Gmail, your bank’s portal, etc. Users may or may not feel like they’re “operating software,” depending on how technical the user interface feels. But the programmers who coded the web site most definitely think of themselves as programmers who have a lot in common with the kinds of programmers who make standalone desktop software like Microsoft Word & Zoom. In fact, they probably feel like they have a lot more in common with those programmers than with programmers who, say, make a personal blog website look pretty.
- “System administration” & “devops” & “site reliability engineering” & “cloud infrastructure” are the industry catchphrases you should look up if you’d like to find podcasts, meetups, & conferences that talk about the parts of the “web development” industry that set apart “making a pretty blog web site” from “making the Gmail website work.”
  - Again, even if this isn’t explicitly your job, getting really knowledgeable about the topics covered in these areas makes you a lot more effective at troubleshooting “Why isn’t _____ working?”
- “Desktop support / management” & “service desk” & “help desk” & “software configuration” & “technical support” are the industry catchphrases you should look up if you’d like to become a more effective troubleshooter of “Why isn’t _____ working?” for the parts of your massive interconnected system that users access through desktop computer software. Your company’s tech helpdesk and whoever crawls under your desk to replace your cabling at the office are great resources to learn from.
- You can unblock a lot of inter-team conversations by learning a little of everyone else’s job. For example, if a server system administrator asks a software expert, “Where does that live?” the server administrator would probably like the name of the server that the files in question live on and the exact filesystem filepath that the files in question are stored under. If the software expert knows what the contents of the files should look like but doesn’t know their way around the server’s Linux/Windows operating system and folder structure, then maybe the two people can sit down at a computer together and explore as a pair.
  - If the two can’t recognize their overlapping expertise themselves, a third person who’s a bit familiar with server administration and the software in question might be able to help suggest this team activity by recognizing that the software expert possesses expertise about the contents of a file and that the system administrator possesses expertise about using tools built into servers’ operating systems geared at quickly exploring the contents of files stored on those servers.
  - (Pssst – managers: $$$ -> glue work.)
- In a well-designed complicated system, there should be lots of security restrictions scattered about the various components that make it up. If someone can’t use a system, they could be “locked out” at any of those points. (That’s in addition to the possibility that they’re not even doing things correctly in the first place, like forgetting to click an important button.) Human users will have an “integration test” point of view: “I can’t do the thing I used to do yesterday.” To actually figure out the problem, it’s useful to have a lot of “unit tests” that support staff can run to troubleshoot all of the possible contributing factors. How will they know which “unit tests” to try and who’s authorized to try them or make a fix? DOCUMENTATION. Writing is time-traveling mind-reading technology. Leverage it.
  - (Pssst – managers: $$$ -> glue work.)
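One way to picture that documentation is as a runbook that’s also data: an ordered list of “unit tests,” each with the team that owns the fix. The sketch below is purely illustrative. The `Check` type, the check names, the team names, and the hardcoded pass/fail results are all made up; in real life each `passes` would be a real test or a human following written steps.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Check(NamedTuple):
    name: str                    # what is being tested
    owner: str                   # who is authorized to fix it
    passes: Callable[[], bool]   # the "unit test" itself


def first_failure(checks: Sequence[Check]) -> Optional[Check]:
    """Run the checks in order and return the first one that fails, if any."""
    for check in checks:
        if not check.passes():
            return check
    return None


# Hypothetical lockout points for "I can't do the thing I did yesterday."
checks = [
    Check("user account is active", "identity team", lambda: True),
    Check("user is in the right permission group", "application admins", lambda: True),
    Check("network allows user's computer to reach the app server", "networking", lambda: False),
    Check("app server process is running", "system administration", lambda: True),
]

failed = first_failure(checks)
if failed is not None:
    print(f"Failed: {failed.name} -> contact {failed.owner}")
```

The ordering is a design choice: cheap checks that any support person is allowed to run go first, so the expensive experts only get pulled in once the simple explanations are ruled out.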
- To over-generalize, it seems to me that:
  - People who work daily in “web development” and “system/server administration” and “networking” have the most up-to-date knowledge about best practices for skills listed in Kamran Ahmed’s Backend Developer roadmap (again, often marketed in conferences and podcasts as “devops”), even if they didn’t personally write the software that runs the website. Like, they probably know a lot about how to rent servers and make software run on them. This expertise is hard to gain if it’s not your full-time job unless you basically make it your part-time job through side projects and classes, but I want to fix that. (Let’s call this “expert category A.”)
  - However, people who work daily in configuring the software in question (“application” experts, including those who program code running within the enterprise software rather than programming the enterprise software itself into existence), “tech desks” / “desktop support”, “infosec,” and again “networking”, probably have the most oddball-yet-relevant hands-on troubleshooting knowledge. This expertise is hard to gain except on the job because it’s really specific to your company. (Let’s call this “expert category B.”)
- Knowledge transfer:
  - I think the best way to transfer knowledge from expert category B to expert category A is through documentation and lunch-and-learns. It’s a hodge-podge of weird company-specific gotchas; it’s not really something you can make a curriculum out of.
  - But what’s the perfect way to transfer just exactly the right smidgin of knowledge from expert category A to expert category B? Still daydreaming and ruminating.
- Heard on a podcast just after finishing this: “when you’re trying to secure something, you need to have at least a topical level of understanding of the entire system, start to finish.” Security & troubleshooting go so hand-in-hand (social engineering!) that it’s almost like I didn’t forget security when I remembered troubleshooting. But yes, also security, because the people who care in depth about security and the people who care in depth about troubleshooting may be on different teams, yet both need some of that “category A” infrastructure management knowledge.