145 changes: 87 additions & 58 deletions README.md
@@ -1,81 +1,111 @@
# gitpull
# Dopull

Update an eBook folder with the latest files from the Git repository.
## Utilities used
[dopull.sh](#dopull)
[updatehosts.py](#updatehosts)
[puller.py](#puller)
[gitpull.py](#gitpull)

## Overview
---
## dopull.sh {#dopull}
Cron task on pglaf.org to create or update an eBook on ibiblio and each of the mirrors, with the latest files from the Git repository.

`gitpull` is a simple Python utility that helps you keep a local folder synchronized with an eBook Git repository. It automatically clones the repository if it doesn't exist locally, or pulls the latest changes if it does.
---
## updatehosts.py {#updatehosts}

`puller` invokes gitpull in the ibiblio context. it looks for .zip.trig files in the system dopull directory, invokes gitpull on them to build or re-sync a source repo in the ibiblio FILES directory, and if successful, moves the .zip.trig file to the system `dopush` directory it is intended to run in a privileged account - most accounts do not have write access to the
Executed by dopull.sh on pglaf.org to update eBook files on ibiblio and each of the mirrors.

## Installation
### Arguments
- eBook number
- --update-gitpull (optional): update the gitpull script on all hosts and exit.

for use on pglaf, use the https://github.com/gutenbergtools/pglaf-gitpull repo instead of this one (for now)

```bash
git clone https://github.com/gutenbergtools/pglaf-gitpull.git
cd pglaf-gitpull
```

for use on iBiblio, use pipenv to create an environment, then install from pypi:
### Environment variables
- PRIVATE: base folder on ibiblio.
- IBIBLIO_BIN: location for the gitpull script on ibiblio.
- MIRROR_BIN: location for the gitpull script on the mirrors
- EBOOKS_DIR: the destination directory for the eBooks on the mirrors.

for production
```
pipenv install git+https://github.com/gutenbergtools/gitpull.git
```
### Usage

for local development:
```
pipenv install -e git+https://github.com/gutenbergtools/gitpull.git
```bash
python3 updatehosts.py #####
```

Then copy sample.env to .env and edit the paths as appropriate

### Behavior

## Usage
---
## puller.py {#puller}

```bash
python3 gitpull.py <eBook #> <target_path>
```
or
```bash
puller
```
Called by a cron task in the ibiblio context, it invokes gitpull. It looks for .zip.trig files in the system dopull directory, invokes gitpull on them to build or re-sync a source repo in the ibiblio FILES directory, and if successful, moves the .zip.trig file to the system `dopush` directory. It is intended to run in a privileged account - most accounts do not have write access to the system.

### Arguments and Configuration

gitpull

- `repository_url`: Number of the eBook Git repository to clone/pull from (e.g., `12345`)
- `target_path`: The local path _containing_ the eBook folder where the repository should be cloned or updated (e.g., `servername/1/2/3/4`, to update `servername/1/2/3/4/12345`)

puller
- has no arguments
- Has no arguments
- Reads three variables from its environment: PUBLIC, PRIVATE and UPSTREAM_REPO_DIR, which it uses to form a repository url and a target path for gitpull
- the default for PRIVATE is '' and for UPSTREAM_REPO_DIR is 'https://github.com/gutenbergbooks/' (which is used for testing)
- puller looks for 'trig' files named NNNNN.zip.trig in $PRIVATE/logs/dopull', extracts NNNNN and uses that as the git repository number for gitpull. The trig file is then moved to the $PRIVATE/logs/dopush directory, which is how indexing and ebook builds are triggered.
- puller looks for 'trig' files named NNNNN.zip.trig in \$PRIVATE/logs/dopull, extracts NNNNN and uses that as the git repository number for gitpull. The trig file is then moved to the \$PRIVATE/logs/dopush directory, which is how indexing and ebook builds are triggered.
- these directories should be created if they do not exist. The target directories need to be writable by the user.
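
The trigger-file handling described above can be sketched as follows. This is a minimal illustration, not the actual puller.py code: the names `find_triggers` and `process_triggers` and the `pull` callback (standing in for the call to gitpull) are hypothetical, and the demo runs against a temporary directory rather than a real \$PRIVATE tree.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Trigger files are named NNNNN.zip.trig, where NNNNN is the eBook number.
TRIG_PATTERN = re.compile(r"^(\d+)\.zip\.trig$")


def find_triggers(dopull_dir):
    """Return sorted (ebook_number, path) pairs for each trigger file found."""
    found = []
    for path in Path(dopull_dir).iterdir():
        match = TRIG_PATTERN.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path))
    return sorted(found)


def process_triggers(private, pull):
    """Call pull(number) for each trigger; move the trigger to dopush on success."""
    dopull = Path(private) / "logs" / "dopull"
    dopush = Path(private) / "logs" / "dopush"
    dopush.mkdir(parents=True, exist_ok=True)
    moved = []
    for number, path in find_triggers(dopull):
        if pull(number):  # hypothetical callback standing in for gitpull
            shutil.move(str(path), str(dopush / path.name))
            moved.append(number)
    return moved


# Demo against a throwaway directory tree (assumed layout: <private>/logs/dopull).
with tempfile.TemporaryDirectory() as private:
    dopull_dir = Path(private) / "logs" / "dopull"
    dopull_dir.mkdir(parents=True)
    (dopull_dir / "12345.zip.trig").touch()
    (dopull_dir / "notes.txt").touch()  # ignored: not a trigger file
    demo_moved = process_triggers(private, pull=lambda number: True)
    demo_remaining = sorted(p.name for p in dopull_dir.iterdir())
```

Leaving a trigger in place when the pull fails means the next run will retry it, which matches the "if successful, moves the .zip.trig file" wording above.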

if the target directories are not owned by the user who runs the gitpull or puller, the directories must be configured as "safe" with the command
If the target directories are not owned by the user who runs the gitpull or puller, the directories must be configured as "safe" with the command

`git config --global safe.directory '/path/to/directory/*'`

or for older versions of git:

`git config --global safe.directory '*'`


Git checks this to protect a user from having code deployed by an unauthorized user. (It is not sufficient for the user to have group write privileges.)

### Behavior

### Options
---
## gitpull.py {#gitpull}
Create or update an eBook folder with the latest files from the Git repository.

### Overview

A simple Python utility that helps you keep a local folder synchronized with an eBook Git repository. It automatically clones the repository if it doesn't exist locally, or pulls the latest changes if it does.

### Installation

for gitpull:
```bash
git clone https://github.com/gutenbergtools/gitpull.git
cd gitpull
```

for use on iBiblio, use pipenv to create an environment, then install from pypi:

for production
```
pipenv install git+https://github.com/gutenbergtools/gitpull.git
```

for local development:
```
pipenv install -e git+https://github.com/gutenbergtools/gitpull.git
```

Then copy sample.env to .env and edit the paths as appropriate.
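
As a rough guide to what the `.env` file may contain, the sketch below mirrors the line rules used by `load_env_file()` in gitpull.py in this change: blank lines, `#` comments, and lines without `=` are skipped, and surrounding blanks and quotes are stripped from the value. The helper name `parse_env_line` and the sample values are illustrative assumptions, not part of the repository.

```python
def parse_env_line(line):
    """Return (key, value) for a KEY=value line, or None if the line is skipped."""
    line = line.strip()
    # Skip empty lines, comments, and lines without an assignment
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, value = line.split("=", 1)
    # Strip blanks and surrounding quotes from the value
    return key.strip(), value.strip().strip("'\"")


# Sample content; the URL and path are placeholders, not real settings.
sample = [
    "# paths for this host",
    "UPSTREAM_REPO_DIR='https://example.org/git'",
    "",
    "PRIVATE = /tmp/private",
]
parsed = dict(pair for pair in map(parse_env_line, sample) if pair)
```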

### Usage

```bash
python3 gitpull.py <eBook #> <target_path>
```

### Arguments and configuration
- `ebook_number`: Number of the eBook Git repository to clone/pull from (e.g., `12345`)
- `target_path`: The local path _containing_ the eBook folder where the repository should be cloned or updated (e.g., `servername/1/2/3/4`, to update `servername/1/2/3/4/12345`)
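
To make the relationship between the two arguments concrete, here is a small sketch of how the destination folder is formed by appending the eBook number to `target_path`. The helper name `destination_for` is hypothetical, and the sketch uses `PurePosixPath` so that it does not touch the filesystem; gitpull.py itself also expands and resolves the path.

```python
from pathlib import PurePosixPath


def destination_for(target_path, ebook_number):
    """Return the eBook folder: a directory named with the number under target_path."""
    return PurePosixPath(target_path) / str(ebook_number)


dest = destination_for("servername/1/2/3/4", 12345)
```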

### Options
- `-h, --help`: Show help message and exit
- `-v, --verbose`: Enable verbose output
- `--norepo`: Do not keep Git history
- `--createdir`: Create `target_path` if needed
- `--createdirs`: Create `target_path` if needed

### Environment variables
- UPSTREAM_REPO_DIR: location of the PG Git repository system.

### Examples for gitpull

@@ -85,37 +115,36 @@ Clone a new repository or update an existing repository:
python3 gitpull.py 12345 /path/to/target
```

or
or
`pipenv run gitpull 12345 /path/to/target`

### Behavior


## Behavior of gitpull

- **The files will be pulled to a folder named with the eBook number in the target folder**: This prevents pulling to a folder that does not match the eBook number
- **If the target folder doesn't exist**: The application will exit
- **If the eBook folder doesn't exist in the target folder**: The repository will be cloned to the target path
- **The files will be pulled to a folder named with the eBook number in the target folder**: This prevents pulling to a folder that does not match the eBook number.
- **If the target folder doesn't exist**: The application will exit, unless `--createdir` is specified.
- **If the eBook folder doesn't exist in the target folder**: The repository will be cloned to the target path.
- **If the eBook folder exists but is empty**: the repository will be cloned
- **If the eBook folder exists and is a Git repository**:
- If it has the same remote URL, the latest changes will be pulled
- If it has a different remote URL, an error will be displayed and no changes will be made
- If it has the same remote URL, the latest changes will be pulled.
- If it has a different remote URL, an error will be displayed and no changes will be made.
- **If the eBook folder exists, but is not a Git repository** (the typical case in the 1/2/3 filesystem):
- Initialize the repository
- Initialize the repository.
- `git init`
- Connect to origin
- Connect to origin.
- `git remote add origin https://r.pglaf.org/git/76044.git/`
- Get the history - may take a while
- Get the history (may take a while).
- `git fetch --all`
- Check out main branch with overwrite - updates changed files, but we are in 'detached HEAD' state
- Check out main branch with overwrite - updates changed files, but we are in 'detached HEAD' state.
- `git checkout -f origin/main`
- Restore state
- `git switch main`
- Remove untracked files - force, include directories, & ignored (.zip) files
- `git clean -fdx`
- **The eBook folder will now be a Git repository, unless `--norepo` was used**
- **It does not update the database**: It is assumed that the chron-dopush.sh call to autodelete.py will do that

#### The eBook folder will now contain the current source files (only).
- Unless `--norepo` was specified, it will also contain the Git history.
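
The decision rules in the list above can be summarized in a short sketch. This is an illustration under stated assumptions, not the logic of `update_folder()` itself: the function name `choose_action`, the action labels, and the `existing_remote` parameter (supplied by the caller so that the sketch does not need to run git) are all hypothetical.

```python
import tempfile
from pathlib import Path


def choose_action(target, ebook_number, origin, existing_remote=None, createdir=False):
    """Return a label for the action the behavior list describes."""
    target = Path(target)
    if not target.is_dir() and not createdir:
        return "exit"  # target folder is missing and may not be created
    ebook_dir = target / str(ebook_number)
    if not ebook_dir.exists() or not any(ebook_dir.iterdir()):
        return "clone"  # eBook folder is missing or empty
    if (ebook_dir / ".git").is_dir():
        # Existing repository: pull only if it points at the same remote
        return "pull" if existing_remote == origin else "error"
    return "init-and-fetch"  # files exist but there is no repository yet


with tempfile.TemporaryDirectory() as tmp:
    origin = "https://example.org/git/12345.git"  # placeholder URL
    actions = [choose_action(Path(tmp) / "missing", 12345, origin)]
    actions.append(choose_action(tmp, 12345, origin))
    ebook_dir = Path(tmp) / "12345"
    ebook_dir.mkdir()
    actions.append(choose_action(tmp, 12345, origin))
    (ebook_dir / "12345-0.txt").touch()
    actions.append(choose_action(tmp, 12345, origin))
    (ebook_dir / ".git").mkdir()
    actions.append(choose_action(tmp, 12345, origin, existing_remote=origin))
    actions.append(choose_action(tmp, 12345, origin, existing_remote="https://example.org/other.git"))
```

The "init-and-fetch" label corresponds to the `git init` through `git clean -fdx` sequence listed above.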

---
## Requirements

- Python 3.6 or higher
83 changes: 58 additions & 25 deletions gitpull.py
@@ -14,8 +14,41 @@
from pathlib import Path
import shutil

VERSION = "2026.03.16"
VERSION = "2026.03.26"

def load_env_file(filepath=".env"):
    """
    Reads an .env file and sets environment variables.
    Expected format: THEKEY=the_value
    Assumes .env file is located in the directory where this script is.
    """
    directory = os.path.dirname(os.path.abspath(__file__))
    filepath = os.path.join(directory, filepath)

    if not os.path.exists(filepath):
        # User could set them manually...
        #print(f"Warning: {filepath} file not found. Environment variables must be set manually.")
        return

    with open(filepath, "r") as file:
        for line in file:
            line = line.strip()
            # Skip empty lines, comments, invalid lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            # Strip blanks & quotes
            value = value.strip().strip('\'\"')
            os.environ[key] = value
            # print(f"Loaded environment variable: {key}={value}")


# Load the variables from the .env file
load_env_file()

UPSTREAM_REPO_DIR = os.getenv('UPSTREAM_REPO_DIR') or ''
#print(f"Using UPSTREAM_REPO_DIR: {UPSTREAM_REPO_DIR}")

# Configure logging
logging.basicConfig(filename='gitpull.log', level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
@@ -185,7 +218,7 @@ def remove_git_history(target_path):
    """
    Remove Git history from the target path.
    Deletes the .git directory and common Git-related files like .gitignore, .gitattributes,
    README.md, and LICENSE.txt if they exist.
    if they exist.
    It might be cleaner to use "git archive" to export only the files without Git history,
    but our server does not support the protocol. Would also need to remove untracked files.
    Any existing unchanged files should not be updated.
@@ -196,7 +229,7 @@ def remove_git_history(target_path):
        logger.info("Git history removed successfully")
    else:
        logger.info("No Git history found to remove")
    files_to_remove = [".gitignore", ".gitattributes", "README.md", "LICENSE.txt"]
    files_to_remove = [".gitignore", ".gitattributes"]
    for filename in files_to_remove:
        file_path = Path(target_path) / filename
        if file_path.exists():
@@ -211,23 +244,18 @@ def main():
        description="Update an eBook folder with the latest files from the Git repository",
        epilog="Example: %(prog)s 12345 /path/to/target"
    )
    parser.add_argument(
        "--version",
        action="store_true",
        help="Show version information"
    )
    parser.add_argument(
        "ebook_number",
        help="Number of the eBook Git repository to clone/pull from"
        help="Number of the eBook Git repository to pull from"
    )
    parser.add_argument(
        "target_path",
        help="Path to the target folder to update"
    )
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="Enable verbose output"
        "-v", "--version",
        action="version",
        version=f"%(prog)s version {VERSION}"
    )
    parser.add_argument(
        "--norepo",
@@ -242,18 +270,21 @@ def main():

    args = parser.parse_args()

    if args.version:
        print(f"gitpull version {VERSION}")
        sys.exit(0)
    # Set logging level based on verbosity
    if args.verbose:
        logger.setLevel(logging.DEBUG)

    if not UPSTREAM_REPO_DIR:
        logger.error("UPSTREAM_REPO_DIR environment variable is not set")
        print("Failed: UPSTREAM_REPO_DIR environment variable is not set.")
        sys.exit(1)

    # Get the source repository
    origin = f"{UPSTREAM_REPO_DIR}/{args.ebook_number}.git"
    # Check if the source repository exists by trying to get its remote URL
    try:
        run_command(["git", "ls-remote", origin], noerror=True)
    except subprocess.CalledProcessError:
        logger.error(f"Source repository {origin} does not exist or is not accessible")
        print(f"Failed: source repository {origin} does not exist or is not accessible.")
        sys.exit(1)

    # Check if target exists and is a directory
    target_path = Path(args.target_path).resolve()
    if not target_path.exists() or not target_path.is_dir():
@@ -271,14 +302,16 @@ def main():
        print(f"Failed: {args.target_path} does not exist or is not a directory")
        sys.exit(1)

    # Update the directory
    origin = f"{UPSTREAM_REPO_DIR}/{args.ebook_number}.git/"

    # destination is a directory named with the ebook number under the target path
    destination = f"{args.target_path}/{args.ebook_number}"
    # Destination is a directory named with the ebook number under the target path
    destination = Path(args.target_path).expanduser().resolve() / str(args.ebook_number)
    logger.info(f"Pulling from {origin} to {destination}")

    success = update_folder(origin, destination)
    try:
        success = update_folder(origin, destination)
    except subprocess.CalledProcessError as e:
        logger.error(f"Unexpected git operation failure: {e}")
        success = False

    # Remove Git history if not needed, but only if the update was successful to avoid
    # deleting existing files on failure
    if args.norepo and success: