Anti Hacker Tool Kit, 4th Edition

  

  

ANTI-HACKER TOOL KIT

Fourth Edition


  About the Author

Mike Shema is the co-author of several books on information security, including the Anti-Hacker Tool Kit and Hacking Exposed: Web Applications, and is the author of Hacking Web Applications. Mike is Director of Engineering for Qualys, where he writes software to automate security testing for web sites. He has taught hacking classes and continues to present research at security conferences around the world. Check out his blog at

  About the Technical Editors

Eric Heitzman is an experienced security consultant (Foundstone, McAfee, Mandiant) and static analysis and application security expert (Ounce Labs, IBM). Presently, Eric is working as a Technical Account Manager at Qualys, supporting customers in their evaluation, deployment, and use of network vulnerability management, policy compliance, and web application scanning software.

  Robert Eickwort, CISSP, is the ISO of an agency within a major municipal government, where he has worked for fifteen years in IT administration and information security. The challenges of meeting wide-ranging regulatory and contractual security requirements within the limited resources, legacy systems, and slow-changing culture of local government have brought him a special appreciation of DIY tactics and open-source tools. His responsibilities range from security systems operation to vulnerability and risk assessment to digital forensics and incident response. Rob holds a B.A. in History from the University of Colorado at Boulder and an M.A. in History from the University of Kansas.


  Mike Shema

  New York Chicago San Francisco Athens London Madrid Mexico City Milan New Delhi Singapore Sydney Toronto

  

Copyright © 2014 by McGraw-Hill Education (Publisher). All rights reserved. Printed in the United States of America. Except as permitted under the Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of Publisher, with the exception that the program listings may be entered, stored, and executed in a computer system, but they may not be reproduced for publication.

ISBN: 978-0-07-180015-0
MHID: 0-07-180015-8

e-Book conversion by Cenveo Publisher Services

Version 1.0

The material in this eBook also appears in the print version of this title: ISBN: 978-0-07-180014-3, MHID: 0-07-180014-X.

  

McGraw-Hill Education eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. To contact a representativ

All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.

Information has been obtained by McGraw-Hill Education from sources believed to be reliable. However, because of the possibility of human or mechanical error by our sources, McGraw-Hill Education, or others, McGraw-Hill Education does not guarantee the accuracy, adequacy, or completeness of any information and is not responsible for any errors or omissions or the results obtained from the use of such information.

TERMS OF USE

This is a copyrighted work and McGraw-Hill Education (“McGraw Hill”) and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill’s prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.

THE WORK IS PROVIDED “AS IS.” McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.

  

For the Menagerie:

Fins, claws, teef, and all.


  Acknowledgments

  Thanks to Amy Eden for starting the engines on this new edition, and to Amanda Russell for making sure it reached the finish line. Everyone at McGraw-Hill who worked on this book provided considerable support, not to mention patience.

  Rob and Eric provided insightful suggestions and important corrections during the tech editing process. If there are any mistakes, it’s because I foolishly ignored their advice.

  Thanks to all the readers who supported the previous editions of this title. It’s your interest that brought this book back. I’d like to include a shout-out to Maria, Sasha, Melinda, and Victoria for their help in spreading the word about my books. Your aid is greatly appreciated. And finally, the Lorimer crew has remained steadfast and true. Keep the van running, don’t make a deal with a dragon, and remember the motto. Always remember the motto.


Introduction

  Welcome to the fourth edition of the Anti-Hacker Tool Kit. This is a book about the tools that hackers use to attack and defend systems. Knowing how to conduct advanced configuration for an operating system is a step toward being a hacker. Knowing how to infiltrate a system is a step along the same path. Knowing how to monitor an attacker’s activity and defend a system are more points on the path to hacking. In other words, hacking is more about knowledge and creativity than it is about having a collection of tools.

  Computer technology solves some problems; it creates others. When it solves a problem, technology may seem wonderful. Yet it doesn’t have to be wondrous in the sense that you have no idea how it works. In fact, this book aims to reveal how easy it is to run the kinds of tools that hackers, security professionals, and hobbyists alike use.

  A good magic trick amazes an audience. As the audience, we might guess at whether the magician is performing some sleight of hand or relying on a carefully crafted prop. The magician evokes delight through a combination of skill that appears effortless and misdirection that remains overlooked. A trick works not because the audience lacks knowledge of some secret, but because the magician has presented a sort of story, however brief, with a surprise at the end. Even when an audience knows the mechanics of a trick, a skilled magician may still delight them.

  The tools in this book aren’t magical; and simply having them on your laptop won’t make you a hacker. But this book will demystify many aspects of information security. You’ll build a collection of tools by following through each chapter. More importantly, you’ll build the knowledge of how and why these tools work. And that’s the knowledge that lays the foundation for being creative with scripting, for combining attacks in clever ways, and for thinking of yourself as a hacker.

  Why This Book?

  By learning how security defenses can be compromised, you also learn how to fix and reinforce them. This book goes beyond brief instruction manuals to explain fundamental concepts of information security and how to apply those concepts in practice using the tools presented in each chapter. It’s a reference that will complement every tool’s own documentation.

  Who Should Read This Book

  Anyone who has ever wondered if their own computer is secure will find a wealth of information about the different tools and techniques that hackers use to compromise systems. This book arms the reader with the knowledge and tools to find security vulnerabilities and defend systems from attackers. System administrators and developers will gain a better understanding of the threats to their software. And anyone who has ever set up a home network or used a public Wi-Fi network will learn the steps necessary to discover if it is insecure and, if so, how to make it better.

  What This Book Covers

  This book describes how to use tools for everything from improving your command-line skills to testing the security of operating systems, networks, and applications. With only a few exceptions, the tools are all free and open source. This means you can obtain them easily and customize them to your own needs.

  How to Use This Book

  This book is separated into five parts that cover broad categories of security. If you’re already comfortable navigating a command line and have different operating systems available to you, then turn to any topic that appeals most to you. If you’re just getting started with exploring your computer, be sure to check out Part I first in order to build some fundamental skills needed for subsequent chapters.

  In all cases, it’s a good idea to have a handful of operating systems available, notably a version of Windows, OS X, and Linux. Each chapter includes examples and instructions for you to follow along with. Most of the tools work across these operating systems, but a few are specific to Linux or Windows.

  Tools

  In the chapters, you’ll find globe icons in the left margin to indicate links for downloading the tools to add to your toolkit.

  Videos

  You’ll also find references throughout the book to several videos that further discuss various topics. The videos may be obtained from McGraw-Hill Professional’s Media plus your e-mail address at the Media Center site to receive an e-mail message with a download link.

  How Is This Book Organized?

  Part I: The Best of the Basics. The material in this part walks you through fundamental tools and concepts necessary to build and manage systems for running hacking tools as well as hacking on the tools themselves to modify their code. Chapter 1 explains how to use the different source control management commands necessary to obtain and build the majority of tools covered in this book. It also covers simple programming concepts to help you get comfortable dealing with code. Chapter 2 helps you become more familiar with using systems, such as discovering the flexibility and power of the Unix command line. Chapter 3 introduces virtualization concepts and tools to help you manage a multitude of systems easily; you’ll find virtualization a boon to setting up test environments and experimenting with attacks.

  Part II: Systems. This part covers tools related to addressing security for operating systems like Windows, Linux, and OS X. Chapter 4 introduces the vulnerability testing leviathans, OpenVAS and Metasploit. These are the all-encompassing tools for finding and exploiting flaws in systems. Chapter 5 goes into more detail on how to conduct file system monitoring to help alert administrators to suspicious activity. Chapter 6 covers more Windows-specific system auditing tools.

  Part III: Networks. This part shows how different tools attack and defend the communications between systems. Chapter 7 leads off this section by showing how the venerable Netcat command-line tool provides easy interaction with network services. Chapter 8 builds on the Netcat examples by showing how hackers use port redirection to bypass security restrictions. Chapter 9 explains how using port scanners reveals the services and operating systems present on a network; this is important for finding targets. Chapter 10 starts with the sizable topics of sniffing packets on wired and wireless networks, and then it moves from those passive attacks to more active ones like breaking wireless network passwords and injecting traffic to spoof connections. Chapter 11 describes how to monitor and defend a network from network probes like Nmap to exploit engines like Metasploit. Chapter 12 takes a detour into dial-up networking, which, even though it has been largely supplanted by wireless and wired remote access, still represents a potential weakness in an organization.

  

  Part IV: Applications. This part shifts the book’s focus to tools that aid in the analysis and defense of the software that runs on systems and drives web applications. Chapter 13 catalogs some tools necessary to start reverse engineering binary applications in order to understand their function or find vulnerabilities (vulns) within them. Chapter 14 explains how to use command-line and proxy tools to find vulns in web applications. Chapter 15 delves into the techniques for successful, optimal password cracking.

  Part V: Forensics. This part introduces several tools related to discovering, collecting, and protecting system and user data. Chapter 16 presents the basics to building a forensics toolkit for monitoring events and responding to suspected intrusions. Chapter 17 brings the book to a close with an eye on tools to help enhance privacy in a networked world.

  PART I

  THE BEST OF THE BASICS

  CHAPTER 1

  MANAGING SOURCE CODE AND WORKING WITH PROGRAMMING LANGUAGES

  Whether they like it or not, we tell computers what to do. Decades ago programmers wrote instructions on physical punch cards, heavy paper with tiny holes. Development principles haven’t changed much, although the methods have. We have replaced punch cards with sophisticated assembly instructions, system languages like C and C++, and higher-level languages like Python and JavaScript. Programming guides typically introduce new developers to a language with the standard “Hello, World!” demonstration before they dive into the syntax and grammar of the language. If you’re lucky, you’ll learn to write a syntactically correct program that doesn’t crash. If you’re not lucky...well, bad things happen. Nothing of much consequence happens should a “Hello, World!” example fail, but the same is not true when your voice-activated computer refuses to respond to a command like, “Open the pod bay doors, HAL.”

  Regardless of whether you’re programming an artificial intelligence for a parallel hybrid computer, a computer that communicates via a tarriel cell, or a shipboard computer to assist a crew on a five-year mission destined to explore strange, new worlds, you’ll need to keep track of its source code.

  You will likely also be tracking the source code for many of the tools covered throughout this book. Some developers provide packaged binaries that you can download and install. Some tools require compilation from source in order to be customized to your particular system. In other cases, a packaged release might be out of date, missing bug fixes only present in the “trunk” of its source tree. Finally, you might find yourself impressed, frustrated, or curious enough to want to modify a tool to suit your needs. In each of these cases, familiarity with SCM comes in handy for managing changes, sharing patches, and collaborating with others.

  This chapter covers source control management (SCM) as well as a brief introduction to programming languages in order to help you understand and, ideally, be able to modify and hack the tools throughout this book. One definition of hacking is the ability to imagine, modify, and create software. On the hierarchy of hacking, blindly running a tool someone else wrote ranks low, whereas understanding and creating your own tools is a commendable goal.

  SCM Concepts

  Documents go through all sorts of changes as we work on them, from fixing typos to adding footnotes to rewriting complete sections. In programming terms, such edits are a diff (or difference) from one version to the next. If two people are working from the same original text, a diff for one text may be shared and applied as a patch to the other. This synchronizes changes so that multiple texts can be brought to the same version as different people work on them, or piecemeal changes can be applied to texts that are diverging. A diff works on a line-by-line basis. So, if one character in a line changes, then a diff algorithm will “remove” the old line and “add” a new replacement line with the troublesome character fixed.
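  To see the line-by-line behavior for yourself, here is a minimal Python sketch that uses the standard library's difflib module. The file contents and the misspelled title are made up for illustration; only the general behavior (one changed character causes the whole line to be removed and re-added) comes from the discussion above.

```python
import difflib

# Two hypothetical versions of a small file; only one character pair differs.
original = [
    "<!doctype html>\n",
    "<html>\n",
    "<title>My Web Paeg</title>\n",
    "</html>\n",
]
corrected = [
    "<!doctype html>\n",
    "<html>\n",
    "<title>My Web Page</title>\n",
    "</html>\n",
]

# unified_diff compares the two versions line by line.
diff_lines = list(
    difflib.unified_diff(
        original, corrected, fromfile="a/index.html", tofile="b/index.html"
    )
)

# Collect changed lines, skipping the "---" and "+++" file headers.
removed = [l for l in diff_lines if l.startswith("-") and not l.startswith("---")]
added = [l for l in diff_lines if l.startswith("+") and not l.startswith("+++")]

print("".join(diff_lines), end="")
```

  The printed diff contains one line that starts with a minus and one that starts with a plus, even though the typo involved only two transposed characters.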

  It’s also possible to apply a patch even when the target has diverged from the original. Patch algorithms make educated guesses about where to apply a diff based on hints like filenames, line numbers, and surrounding text. These algorithms have improved over decades of experience with handling source code. However, if a document has changed too much from the original version on which the patch is based, then applying the patch will result in a conflict. A programmer must resolve a conflict manually by inspecting the two different texts and deciding which changes to keep or reject based on the context of the text in conflict.

  Not all edits are good. Sometimes they have typos, introduce bugs, or implement a poor solution to a problem. In this case you would revert a diff, removing its changes and returning the document to a previous state.
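  The idea that a recorded change can be applied or undone can be sketched in a few lines of Python. This is only an illustration using difflib's ndiff and restore functions, not how any particular SCM implements a revert; the sample lines (including the misspelled paragraph) are invented.

```python
import difflib

# Hypothetical document before and after an edit that introduced a typo.
before = ["<html>\n", "<body>\n", "</body>\n", "</html>\n"]
after = ["<html>\n", "<body>\n", "<p>Helo</p>\n", "</body>\n", "</html>\n"]

# ndiff produces a delta that records both sides of the change.
delta = list(difflib.ndiff(before, after))

# restore(delta, 2) rebuilds the edited text (the change applied);
# restore(delta, 1) rebuilds the original text (the change reverted).
applied = list(difflib.restore(delta, 2))
reverted = list(difflib.restore(delta, 1))

print("".join(reverted), end="")
```

  Because the delta keeps the removed and added lines, either state of the document can be recovered from it.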

  At the moment it’s not necessary to know the details of the patch or diff commands available from the Unix command line. The intent of a diff is somewhat evident in terms of which lines it adds or removes. The following diff adds a <meta> tag to an HTML document. The new line is distinguished by a single plus symbol (+) at the beginning of a line. The name of the file to be changed is “index.html” (compared from two repositories called “a” and “b”). The line starting with the @@ characters is a “range” hint that the diff and patch algorithms use to deduce the context where a change should be applied. This way a patch can still be applied to a target file even when the target has changed from the original (such as having a few dozen new lines of code unrelated to the diff).

  diff a/index.html b/index.html
  index 77984c8..57c583e 100644
  --- a/index.html
  +++ b/index.html
  @@ -1,6 +1,7 @@
   <!doctype html>
   <html>
   <body>
  +<meta charset="utf-8">
   <title>My Web Page</title>
   </body>
   <head>

  This section focuses on the “unified” diff format. This is the most common format generated by SCM tools. Include the -u or --unified option to ensure that your system’s diff command produces this format.
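  The “range” hint on the @@ line of a unified diff is easy to pick apart. The following Python sketch assumes the usual unified layout of a starting line and an optional line count for each side; the function name parse_hunk_header is made up for this example.

```python
import re

# Matches headers such as "@@ -1,6 +1,7 @@"; the counts are optional.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return the start line and line count for the old and new file."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    # When a count is omitted, the range covers exactly one line.
    return {
        "old_start": int(old_start),
        "old_count": int(old_count) if old_count is not None else 1,
        "new_start": int(new_start),
        "new_count": int(new_count) if new_count is not None else 1,
    }


header = parse_hunk_header("@@ -1,6 +1,7 @@")
print(header)
```

  For the header shown, the hunk covers six lines starting at line 1 of the old file and seven lines starting at line 1 of the new file, which matches a diff that adds a single line.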

  The developer might choose to set the charset via a header, deciding it’s unnecessary to use a <meta> tag. In that case the line would be removed, as indicated by a single minus symbol (-) at the beginning. The deletion is shown here:

  diff a/index.html b/index.html
  index 57c583e..77984c8 100644
  --- a/index.html
  +++ b/index.html
  @@ -1,7 +1,6 @@
   <!doctype html>
   <html>
   <body>
  -<meta charset="utf-8">
   <title>My Web Page</title>
   </body>
   <head>

  Or the developer might decide that since the web site is going to be translated into Russian, it’s a better idea to use a different character set. In this case the diff removes a line and adds a line to resolve the edit:

  diff a/index.html b/index.html
  index 57c583e..504db3f 100644
  --- a/index.html
  +++ b/index.html
  @@ -1,7 +1,7 @@
   <!doctype html>
   <html>
   <body>
  -<meta charset="utf-8">
  +<meta charset="koi8-r">
   <title>My Web Page</title>
   </body>
   <head>

  By now you may have noticed that diffs apply to each line of a document rather than to just a few specific characters in a line. Changing the charset from “utf-8” to “koi8-r” required removing the original line and replacing it with a new one. Often a diff affects multiple lines of a document. In the previous examples there was an embarrassing error: the <body> and <head> elements were created backwards. The following diff fixes the error:

  diff a/index.html b/index.html
  index 57c583e..65e5856 100644
  --- a/index.html
  +++ b/index.html
  @@ -1,9 +1,9 @@
   <!doctype html>
   <html>
  -<body>
  +<head>
   <meta charset="utf-8">
   <title>My Web Page</title>
  -</body>
  -<head>
   </head>
  +<body>
  +</body>
   </html>

  An SCM keeps track of all these kinds of changes in a repository. After a while of referring to it as such (about twice), you’ll start calling it a repo. Each diff is marked with a revision that serves as an identifier to help distinguish when (relative to the application of other diffs) or where (in a branch, trunk, or tag—we’ll get to this in a bit) it was applied. The repository manages each change to make sure files don’t get out of sync or to warn developers when a diff is too ambiguous to be applied (for example, if someone else also changed the same area of the document). The way a repository manages content falls into two broad categories:

  • Centralized: Version control is maintained at a single location or origin (sometimes called master) server. Developers retrieve code from and commit code to this master server, which manages and synchronizes each change. As a consequence, developers must have network connectivity to the server in order to save or retrieve changes, but they always know what the latest revision is for the code base.

  • Distributed: Version control is managed locally. Developers may retrieve patches from or commit patches to another copy of the repository, which may be ahead of or behind the local version. There is technically no master server, although a certain repository may be designated the official reference server. As a consequence, developers may work through several revisions, trunks, or branches on their local system regardless of network connectivity.

  Always use the https:// scheme instead of http:// (note the s) to encrypt the communication between the client and repository. It’s a good habit that protects your passwords. Even anonymous, read-only access to repositories should use HTTPS connections to help prevent the kinds of attacks covered in Chapter 10.

  Users commit diffs to the repository in order to store the changes for later reference and for access by other developers. For a centralized repo, such changes are immediately available to other developers since the centralized repo is considered the primary reference point for the code base (and all developers are assumed to have access to it). For a distributed repo, the changes aren’t available to others until the developer shares the patch, “pushes” the revision to a shared, nonlocal repo, or invites another developer to “pull” the revision. (This represents two different styles of development, not that one or the other is superior.) Each commit produces a revision that is referenced by a name or number. Revision numbers are how repositories keep track of their state.

  Repositories are usually successful at automatically merging diffs from various commits. Even so, a conflict is bound to happen when either the algorithm is unable to determine where a file should be changed or the change is ambiguous because the target file has diverged too much from the original. Conflicts should be resolved by hand, which means using an editor to resolve the problem (or actual hand-to-hand combat, because developers too often disagree on coding styles or solutions to a problem). The following example shows a merge conflict within a file. The text between <<<<<<< and ======= typically represents your local changes, while the text below it indicates the incoming conflict.

  </head>
  <body>
  <<<<<<<
  "For there to be betrayal, there would have to have been trust first."
  =======
  "And trust has not been part of the agreement."
  >>>>>>>
  </body>
  </html>
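  The conflict markers are plain text, so a script can separate the two sides before you decide which to keep. Here is a small Python sketch; the function name split_conflict is hypothetical, and it assumes the simple marker layout shown above with a single conflict region.

```python
def split_conflict(lines):
    """Return (ours, theirs) from the first conflict region in lines."""
    ours, theirs = [], []
    state = "outside"
    for line in lines:
        if line.startswith("<<<<<<<"):
            state = "ours"          # local changes follow this marker
        elif line.startswith("=======") and state == "ours":
            state = "theirs"        # incoming changes follow this marker
        elif line.startswith(">>>>>>>") and state == "theirs":
            break                   # end of the conflict region
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return ours, theirs


conflicted = [
    "</head>",
    "<body>",
    "<<<<<<<",
    '"For there to be betrayal, there would have to have been trust first."',
    "=======",
    '"And trust has not been part of the agreement."',
    ">>>>>>>",
    "</body>",
    "</html>",
]

ours, theirs = split_conflict(conflicted)
print("local:   ", ours)
print("incoming:", theirs)
```

  Resolving the conflict still requires a human decision; the sketch only shows how the markers delimit the competing texts.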

  The state of a repository may also be broken out by revisions to the trunk, branches, or tags. A repository’s trunk typically represents the mainline or most up-to-date state of its contents. Branches may represent version numbers or modifications with a distinctive property. A branch creates a snapshot of the repository’s state that, for example, represents a stable build. New commits may be made to the trunk, keeping the project moving forward but also keeping the branch in a predictable state for testing and release. Tags may be used to create functional snapshots of the state, or capture the state in a certain revision for comparison against another. From a technical perspective, there’s no real difference between branches and tags in terms of how the repository handles commits. The terms exist more for developers to conceptualize and track the status of a project over time.
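  One way to picture why branches and tags are technically alike is to treat each as a name that points at a revision. The toy Python model below is only an illustration of that idea, not how any specific SCM stores its data; the revision names and labels are invented.

```python
# Toy model: a label is just a name that points at a revision identifier.
revisions = ["r1", "r2", "r3"]      # hypothetical revisions, oldest first
labels = {"trunk": "r3"}

# Branching and tagging both record the current revision under a new name.
labels["release-1.0"] = labels["trunk"]   # branch kept stable for a release
labels["v1.0"] = labels["trunk"]          # tag capturing the same state


def commit(label, revision):
    """Record a new revision and move the given label forward to it."""
    revisions.append(revision)
    labels[label] = revision


# Work continues on the trunk; the branch and the tag stay where they were.
commit("trunk", "r4")
print(labels)
```

  The difference lies in how developers use the names: a branch may receive further commits of its own, while a tag is usually left pointing at the revision it captured.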

  SCM commands that operate on a file or directory usually also operate on a label that represents the trunk, a branch, or a tag. For example, a command may generate diffs between a branch and the trunk, or from a master source and a local repository. Learn the label syntax for your SCM of choice; it makes working with revisions much easier.

  Development rarely progresses in a linear manner. Developers may use different branches to test particular features. Different commits may affect the same areas of code. Bug fixes applied to the trunk may need to be back-ported to an old release branch. SCM tools have commands for conducting a merge that brings together different commits. Merge operations are not immune to conflicts. When problems do arise, the tool usually prompts for instructions on how to automatically resolve a conflict (e.g., which changes take precedence over others) or has a means to manually resolve the merge.

  Code repositories are fundamental to creating code in a collaborative manner. The collaboration may be between two people who share an office, between large development teams, or between globally distributed contributors to an open source project. In all cases, the role of comments for every commit is important for maintaining communication within the project and avoiding or resolving conflicts that arise from design and implementation decisions.

  Just as coding style guidelines evoke strong feelings based on preference, bias, and subjective measures, so does documenting code and making comments for a commit. The following example comes from the Linux Kernel Newbies development policies. Whether you agree or not may reflect, once again, your preference, or may be due to differences between your project (no legacy of years of code, no requirements for broad platform support), or differences in your developers (no global distribution, no diversity of contributors’ spoken language). On the other hand, it can’t hurt to emulate the practice of coders who are creating high-quality, high-performance code for millions of users from contributors in dozens of countries.

  That’s a long preamble for simple advice. Here are the guidelines fr

   Describe the technical detail of the change(s) your patch includes. Be as specific as possible. The WORST descriptions possible include things like “update driver X”, “bug fix for driver X”, or “this patch includes updates for subsystem X. Please apply.” If your description starts to get long, that’s a sign that you probably need to split up your patch.

  In other words, there’s nothing wrong with a brief comment. However, it should be informative for someone else who looks at the commit. It’s often helpful to explain why or how a fix improves code (e.g., “Normalize the string to UTF-8 first”) as opposed to stating what it fixes (e.g., “Prevent security vuln” or “Missing check”). If you have a bug-tracking system in which you create helpful comments, test cases, and other annotations, then it’s more acceptable to have comments like “Bug XYZ, set ptr to NULL after freeing it.” You can find mor

  UTF-8 is an ideal character set for comments, regardless of what other character sets may be present in a project. Developers may share a programming language but not a spoken (or written) one. There are dozens of character sets with varying support for displaying words in Cyrillic, Chinese, German, or English, to name just a few examples. UTF-8 has the developer-friendly properties of being universally supported, able to render all written languages (except Klingon and Quenya), and NULL-terminated (which avoids several programming and API headaches).

  There’s one final concept to introduce before we dive into the different SCM software. You’ll notice that the tools share many similarities in syntax and semantics. Most commands have an action or subcommand to perform a specific task. For example, checking in a commit usually looks like one of the following two commands. The first command (with a “naked” action, meaning it has no further arguments) commits changes for all files in the project or the project’s current directory. The second command commits the changes for a single file named mydocument.code, leaving any other changes untracked for the moment.

  $ scmtool commit
  $ scmtool commit mydocument.code

  If you get lost following any of the upcoming examples, or you’d like to know more details about a task, use the help action. The tool will be happy to provide documentation.

  $ scmtool help
  $ scmtool help action

  See? Even if we’re always telling computers what to do, they’re ever-ready to help. Except when it comes to those pod bay doors.
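  If you are curious how a tool ends up with this action-plus-arguments shape, the pattern can be sketched with Python's argparse module. Everything here is hypothetical: scmtool is the placeholder name used above, and the describe function merely reports what a real tool would do.

```python
import argparse


def build_parser():
    """Build a parser for a made-up tool with 'commit' and 'help' actions."""
    parser = argparse.ArgumentParser(prog="scmtool")
    actions = parser.add_subparsers(dest="action", required=True)

    commit = actions.add_parser("commit", help="record changes")
    # Zero or more file names; none means "everything in the project".
    commit.add_argument("files", nargs="*")

    help_action = actions.add_parser("help", help="show documentation")
    # An optional action name to describe.
    help_action.add_argument("topic", nargs="?")
    return parser


def describe(argv):
    """Return a sentence describing what the given command line asks for."""
    args = build_parser().parse_args(argv)
    if args.action == "commit":
        target = ", ".join(args.files) if args.files else "all changed files"
        return f"commit {target}"
    topic = args.topic or "scmtool"
    return f"help for {topic}"


print(describe(["commit"]))
print(describe(["commit", "mydocument.code"]))
print(describe(["help", "commit"]))
```

  The “naked” commit falls back to all changed files, while naming a file narrows the action to that file, mirroring the two command forms shown earlier.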

  Git

  Git originated from Linus Torvalds’ desire to create a source control system for the Linux kernel. In 1991, Linus released the first version of the kernel, which is arguably the most famous, and perhaps most successful, open source project. More than 10 years later the kernel had grown into a globally distributed programming effort with significant branches, patches, and variations in features. Clearly, an effective mechanism to manage this effort was needed. In 2005 Linus released Git to help manage the kernel in particular, and manage distributed software projects in general.

  Git works with the familiar primitives of source control management systems such as commits, diffs, trunks, tags, branches, and so on. However, Git has the intrinsic property of being a distributed system: a system in which there is no official client/server relationship. Each repository contains its entire history of revisions. This means that there’s no need to have network access or synchronization to a central repository. In essence, a Git repository is nonlinear with regard to revisions. Two different users may change source code in unique, independent ways without interfering with each other. One benefit of this model is that developers are freer to independently work with, experiment with, and tweak code.

  Of course, a software project like the Linux kernel requires collaboration and synchronization among its developers. Any project needs this. So, while Git supports independent development and revision management, it also supports the means to share and incorporate revisions made in unsynchronized (i.e., distributed) repositories. This section walks through several fundamental commands for using Git.

  A number of sites provide hosting and web interfaces for Git-based projects.

  Working with Repositories

  There are two basic ways of working with a repository: either create (initialize) one yourself or clone one from someone else. In both cases, all revisions will be tracked in the local repository and will be unknown to others until the revisions are explicitly shared. To create your own repository, use the init action, as follows:

  $ mkdir my_project
  $ cd my_project
  $ git init
  $ cd .git
  $ ls
  HEAD branches/ config description hooks/ info/ logs/ objects/ packed-refs refs/

  The repository is created within the current working directory. All of its management files are maintained in the top-level .git directory. It’s never a good idea to edit or manipulate these files directly; doing so will likely corrupt the repository beyond repair. Instead, use any of the plentiful Git actions. Also note that the repository exists in this one directory. It’s still a good idea to have a backup plan for these files in case they are deleted or lost to a drive failure (or the occasional accident of typing rm -rf file *).
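  Rather than poking around inside .git by hand, you can ask Git itself where the management files live. The following is a minimal sketch, not a required step; the temporary directory created by mktemp is an assumption made purely so the example doesn't touch a real project.

```shell
#!/bin/sh
# Sketch: create a throwaway repository and query it with rev-parse.
# The scratch directory is hypothetical; any empty directory works.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir my_project
cd my_project
git init -q

# Path to the management directory, relative to the current directory.
git_dir=$(git rev-parse --git-dir)
echo "management directory: $git_dir"

# A repository created with a plain "git init" has a working directory,
# so Git reports that it is not a bare repository.
is_bare=$(git rev-parse --is-bare-repository)
echo "bare repository: $is_bare"
```

  Asking Git questions like this is safer than editing files under .git, and it works the same way from any subdirectory of the project.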

  With the repository created, the next step is to add files to be tracked and commit them at desired revision points. These steps are carried out with the appropriately named add and commit actions:

  $ cd my_project
  $ touch readme.md
  $ git add readme.md
  $ git commit readme.md

  One quirk of Git that may become apparent (or surprising) is that it works only with files, not directories. In an SCM like Subversion, it’s possible to commit an empty directory to a repository. Git won’t commit the directory until there’s a file within it to be tracked. After all, a diff needs to operate on the contents of a file.
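  The files-only behavior is easy to observe. The sketch below assumes a scratch repository and a hypothetical docs directory; it records what the status action reports before and after a file is placed in that directory.

```shell
#!/bin/sh
# Sketch: an empty directory is invisible to Git until it holds a file.
# The directory and file names are made up for illustration.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q my_project
cd my_project

mkdir docs
# Nothing is reported for the empty directory.
before=$(git status -s -u)
echo "before: [$before]"

touch docs/notes.txt
# Now Git has a file it could track, so the path shows up as untracked.
after=$(git status -s -u)
echo "after:  [$after]"
```

  A common workaround, if you need the directory to exist in every clone, is to commit a small placeholder file inside it.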

  Sometimes a repository will contain files that you don’t wish to track at all. Git will look for a .gitignore file with a manifest of files or directories to be ignored. Merely create the .gitignore file and manage it like you would any other tracked file. You may use explicit names for the entries in this file or use globs (e.g., *.exe is a glob that would ignore any name with a suffix of .exe; whereas tmp* would ignore any name that starts with tmp).

  $ touch .gitignore
  $ git add .gitignore
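  To see the two globs mentioned above in action, here is a short sketch. The file names (tool.exe, tmp_cache, notes.txt) are invented for the example, and the repository is a scratch one created under a temporary directory.

```shell
#!/bin/sh
# Sketch: ignore *.exe and tmp* and see which files Git still reports.
# All file names here are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q my_project
cd my_project

# One pattern per line, exactly as described in the text.
printf '%s\n' '*.exe' 'tmp*' > .gitignore
touch tool.exe tmp_cache notes.txt

# Only files that match no pattern are listed as untracked.
untracked=$(git status -s)
echo "$untracked"

# check-ignore explains which pattern matched each ignored path.
git check-ignore -v tool.exe tmp_cache
```

  Note that the .gitignore file itself appears as untracked until you add and commit it.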

  The usual Git model is to commit files to the local repository and, when it’s necessary to share revisions, have other repositories pull them in. In a centralized SCM system, the natural procedure would be to push revisions to the master repository. The distributed model differs because there’s no guarantee that repositories are in sync, or that they have the same branches, or that revisions from one will not overwrite uncommitted changes in another. Therefore, repositories pull in changes in order to avoid a lot of these problems.
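  The pull model can be sketched with two repositories on the same disk. The names alice and bob, along with the placeholder identities, are assumptions for illustration only; in practice the two repositories would usually live on different machines.

```shell
#!/bin/sh
# Sketch: one repository pulls revisions that were committed in another.
# Repository names and user identities are placeholders.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Alice creates a repository and commits a first revision.
git init -q alice
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "first draft" > readme.md
git add readme.md
git commit -q -m "Add readme"
cd ..

# Bob clones it, receiving the full history so far.
git clone -q alice bob

# Alice commits again; Bob's copy knows nothing about it yet.
cd alice
echo "second draft" >> readme.md
git commit -q -a -m "Revise readme"

# Bob pulls the new revision in when he is ready for it.
cd ../bob
git pull -q
bob_count=$(git rev-list --count HEAD)
echo "revisions in bob: $bob_count"
```

  The point to notice is that Bob decides when the changes arrive; nothing Alice does can alter his working files until he pulls.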


  If you do wish to assign a repository as the master and consider it the “central” server, consider creating a bare repository. This creates the management files normally found in the .git subdirectory right in the current working directory:

  $ mkdir central
  $ cd central
  $ git init --bare
  $ ls
  HEAD branches/ config description hooks/ info/ objects/ refs/
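  One way a bare repository might serve as the shared "central" copy is sketched below. The dev clone, the placeholder identity, and the use of the push action are assumptions for this example rather than steps prescribed by the text.

```shell
#!/bin/sh
# Sketch: a bare repository acting as a shared hub for a developer clone.
# Directory names and the identity are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"

mkdir central
cd central
git init -q --bare
# A bare repository has no working directory of its own.
is_bare=$(git rev-parse --is-bare-repository)
echo "central is bare: $is_bare"
cd ..

# Cloning an empty repository works; Git merely warns that it is empty.
git clone -q central dev
cd dev
git config user.name "Dev Example"
git config user.email "dev@example.com"
echo "hello" > readme.md
git add readme.md
git commit -q -m "Add readme"

# Publish the current branch to the hub under the same branch name.
git push -q origin HEAD

central_count=$(git --git-dir="$workdir/central" rev-list --count --all)
echo "revisions in central: $central_count"
```

  Because the hub has no checked-out files, there is nothing in it for an incoming revision to overwrite, which is why bare repositories are the usual choice for this role.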

  If you’ll be working from someone else’s repository, then you’ll need to create a local copy on your development system by using the clone action. This creates the top-level working directory of the repository, the .git subdirectory, and a copy of the repository’s revision history. This last point, the revision history, is important. In a centralized model, you’d query the changes for a file from the central server. In Git’s distributed model, you already have this information locally. The benefit of this model is that you can review the history and make changes without having access to the server from which it was originally cloned—a boon to developers’ independence and a reduction in bandwidth that a server would otherwise have to support.

  When working with large projects, consider using the --depth 1 or --single-branch option to clone only the primary “top” (or HEAD) branch of the project. The --depth 1 option also limits the copied history to the most recent commit.
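  The effect of a depth-limited clone is easy to measure on a small scale. This sketch builds a hypothetical three-revision repository named src and clones it with --depth 1; the file:// form of the path is used because Git ignores the depth option when cloning from a plain local path.

```shell
#!/bin/sh
# Sketch: compare the history in a full repository and a shallow clone.
# Repository names and the identity are placeholders.
set -e
workdir=$(mktemp -d)
cd "$workdir"

git init -q src
cd src
git config user.name "Dev Example"
git config user.email "dev@example.com"
for n in 1 2 3; do
  echo "revision $n" > file.txt
  git add file.txt
  git commit -q -m "Revision $n"
done
cd ..

# Only the most recent commit is copied into the shallow clone.
git clone -q --depth 1 "file://$workdir/src" shallow

full_count=$(git -C src rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "full history:    $full_count revisions"
echo "shallow history: $shallow_count revision"
```

  The trade-off is that actions that look back through history, such as log, see only what was copied.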

  The clone action requires a path to the repository. The path is often an HTTP link. The following example clones the entire development history of the Linux kernel. We’ll return to this repo for some later examples. However, the repo contains about 1.2GB of data, so the cloning process may take a significant amount of time (depending on the bandwidth of your network connection) and occupy more disk space than you desire. If you’re hesitant to invest time and disk space on a repo that you’ll never use, you should still be able to follow along with the concepts that refer to this repo without having a local copy. In fact, you should be able to interact with the web-based interface to the kernel’s Git repository.

  

$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Cloning into 'linux'...
remote: Counting objects: 2622145, done.
remote: Compressing objects: 100% (402814/402814), done.
remote: Total 2622145 (delta 2198177), reused 2617016 (delta 2193622)
Receiving objects: 100% (2622145/2622145), 534.73 MiB | 2.07 MiB/s, done.
Resolving deltas: 100% (2198177/2198177), done.

  Now that you have created or cloned a repository, it’s time to work with the files. Use the status action to check which files are tracked, untracked, and modified. The status action accepts the -s and -u flags to display shortened output and untracked files, respectively. The following example shows the status of the my_project repo that we used to demonstrate the diff concepts when changing the contents of an index.html file.

  In this case, we have uncommitted changes to the index.html file. Plus, we’ve created a file called new_file in order to demonstrate how Git reports the status for a file it isn’t tracking.

  $ cd my_project
  $ git status -s
   M index.html
  ?? new_file

  Use git help status to find out the meaning of status indicators. In the previous example, the M indicates a tracked file that has been modified but whose changes haven’t been committed. The ?? indicates an untracked file.
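  If you don't have the my_project repo handy, the same two indicators can be reproduced from scratch. The sketch below assumes a temporary directory and a placeholder identity; the file names match the ones used in the example above.

```shell
#!/bin/sh
# Sketch: reproduce the "M" and "??" indicators in a scratch repository.
# The identity and file contents are placeholders.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q my_project
cd my_project
git config user.name "Dev Example"
git config user.email "dev@example.com"

# Commit a first revision so that index.html becomes a tracked file.
echo "<p>hello</p>" > index.html
git add index.html
git commit -q -m "Add index.html"

# Change the tracked file and create a file Git has never seen.
echo "<p>hello again</p>" >> index.html
touch new_file

status=$(git status -s)
echo "$status"
```

  The M appears in the second column because the change exists only in the working directory; once the file is staged with the add action, the M moves to the first column.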

  As noted earlier, Git tracks individual files. Should you need to rename a file, use Git’s mv action to do so rather than a raw file system command. This preserves the revision history for the file.

  $ cd my_project
  $ git mv readme.md readme
  $ git commit -a
   rename readme.md => readme (100%)
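  To confirm that the history survives a rename, ask the log action to follow the file across its old name. This is a sketch under the usual assumptions: a scratch repository, a placeholder identity, and commit messages invented for the example.

```shell
#!/bin/sh
# Sketch: rename a tracked file and view its history under both names.
# Identity, file contents, and commit messages are placeholders.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q my_project
cd my_project
git config user.name "Dev Example"
git config user.email "dev@example.com"

echo "Project notes" > readme.md
git add readme.md
git commit -q -m "Add readme"

git mv readme.md readme
git commit -q -a -m "Rename readme.md to readme"

# --follow keeps walking the history past the point of the rename.
history=$(git log --follow --format=%s -- readme)
echo "$history"
```

  Without the --follow flag, the log for readme would stop at the commit that introduced the new name.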

  Because Git tracks the repo’s entire revision history, the file store used to track changes can become very large. Running the occasional gc action (e.g., git gc) will keep the file store tidy by compressing old revisions and removing redundant information that has accumulated over time. The separate clean action (e.g., git clean) removes untracked files from the working directory. Try adding the -d, -f, or -x flags (or include all three at once) to the clean action to return the working directory to a pristine condition; be aware that -x also deletes files matched by .gitignore.
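  Since the clean action deletes files, it's worth trying it in a scratch repository first. The sketch below uses invented file names and a placeholder identity; it previews the deletion with -n, performs it with -d, -f, and -x, and then runs the gc action, which is the one that compacts Git's internal file store.

```shell
#!/bin/sh
# Sketch: preview and perform a cleanup, then compact the file store.
# File names and the identity are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q my_project
cd my_project
git config user.name "Dev Example"
git config user.email "dev@example.com"

echo "tracked" > readme.md
echo "*.log" > .gitignore
git add readme.md .gitignore
git commit -q -m "Initial revision"

# Create clutter: an untracked directory and an ignored log file.
mkdir scratch
touch scratch/notes.txt build.log

# -n is a dry run that lists what would be removed without deleting it.
git clean -n -d -x

# -d includes untracked directories, -x includes ignored files, and
# -f confirms that the deletion should really happen.
git clean -q -d -f -x

# gc repacks and prunes the data stored under .git.
git gc -q
```

  Tracked files such as readme.md are never touched by clean; only files that Git isn't tracking are removed.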

  Git works with the master branch by default. Branching and tagging are lightweight operations; they induce very little overhead in terms of file copies. Consequently, it’s common for developers to create branches for testing different configurations or code changes. The lightweight nature of branches makes it easy to switch between them as well. The following example shows the creation of a new branch, a checkout action to switch to it, and then a merge action to bring the branch’s changes back into the master branch: