Conversation
Pull request overview
This PR updates the LeetCode spider to reliably fetch the full question list by paginating the GraphQL questionList endpoint, and fixes a bad Referer header construction in the English question detail fetch.
Changes:
- Reworked `get_all_questions_v2` to fetch questions page-by-page until `totalNum` is reached (instead of a single large request).
- Changed the default `limit` value for `get_all_questions_v2` and added progress logging during pagination.
- Fixed the `get_question_detail_en` `Referer` header to use `question_title_slug` (previously referenced an undefined variable).
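The page-by-page fetch works by advancing a `skip` offset until the server-reported total is reached. A minimal sketch of that loop, where `fetch_page` is a stand-in for the GraphQL request (not code from the PR):

```python
from typing import Callable, List, Tuple

def fetch_all_pages(fetch_page: Callable[[int, int], Tuple[int, List]],
                    page_size: int = 100) -> List:
    """Accumulate questions page-by-page until the reported total is reached.

    fetch_page(skip, limit) returns (totalNum, questions) parsed from one
    GraphQL questionList response.
    """
    questions: List = []
    skip, total = 0, None
    while total is None or skip < total:
        total, page = fetch_page(skip, page_size)
        if not page:          # empty page: stop instead of looping forever
            break
        questions.extend(page)
        skip += len(page)
        print(f"fetched {len(questions)}/{total} questions")  # progress log
    return questions
```

The empty-page guard matters: if the server ever reports a `totalNum` larger than what it actually returns, the loop terminates instead of spinning.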
```python
def get_all_questions_v2(self, retry: int = 3, limit: int = 100) -> List:
    headers = {
        "Cookie": self.cookie_en,
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0",
        "Content-Type": "application/json",
    }
    form = {
        "query": "\n query problemsetQuestionList($categorySlug: String, $limit: Int, $skip: Int, $filters: QuestionListFilterInput) {\n problemsetQuestionList: questionList(\n categorySlug: $categorySlug\n limit: $limit\n skip: $skip\n filters: $filters\n ) {\n total: totalNum\n questions: data {\n acRate\n difficulty\n freqBar\n frontendQuestionId: questionFrontendId\n isFavor\n paidOnly: isPaidOnly\n status\n title\n titleSlug\n topicTags {\n name\n id\n slug\n }\n hasSolution\n hasVideoSolution\n }\n }\n}\n ",
        "variables": {
            "categorySlug": "all-code-essentials",
            "skip": 0,
            "limit": limit,
            "filters": {"orderBy": "FRONTEND_ID", "sortOrder": "DESCENDING"},
        },
        "operationName": "problemsetQuestionList",
    }
    try:
        resp = requests.post(
            "https://leetcode.com/graphql",
            headers=headers,
            data=json.dumps(form),
            timeout=20,
        )
        return resp.json()["data"]["problemsetQuestionList"]["questions"]
    except Exception as e:
        print("get_all_questions_v2", e)
        time.sleep(2)
        return self.get_all_questions_v2(retry - 1, limit) if retry > 0 else []
    page_size = max(1, min(limit, 100))
    skip = 0
    total = None
```
`limit` is effectively used as the per-request page size (and is clamped to 100 via `page_size = max(1, min(limit, 100))`), so it no longer represents a maximum number of questions to fetch. This can be confusing for callers and changes the method's external behavior; consider renaming the parameter to `page_size` (or adding a separate `max_total`) and documenting the semantics accordingly.
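One way to apply this suggestion, with the parameter renamed and an optional cap. The `max_total` name and the docstring wording are illustrative, not from the PR; the clamp matches the PR's `page_size = max(1, min(limit, 100))`:

```python
from typing import List, Optional

class Spider:
    def get_all_questions_v2(self, retry: int = 3, page_size: int = 100,
                             max_total: Optional[int] = None) -> List:
        """Fetch every question, `page_size` items per GraphQL request.

        page_size is clamped to the 1..100 range, as in the PR;
        max_total, if given, caps how many questions are returned.
        """
        page_size = max(1, min(page_size, 100))
        ...  # pagination loop as in the PR, honouring max_total
```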
| "https://leetcode.com/graphql", | ||
| headers=headers, | ||
| data=json.dumps(form), | ||
| timeout=20, |
This request doesn't pass `verify=False`, while other requests in this file consistently disable TLS verification and suppress urllib3 warnings. If the runtime environment relies on `verify=False` (as suggested by the rest of the file), `get_all_questions_v2` may fail with SSL/cert errors; consider making the TLS verification behavior consistent with the other requests in `Spider` (or centralizing it).
Suggested change:
```diff
-            timeout=20,
+            timeout=20,
+            verify=False,
```
```python
time.sleep(2)

if not ok:
    break
```
On repeated failures for a page, the function breaks out of the loop and returns whatever has been accumulated so far. Since the function is intended to "get all questions", returning a partial list can silently corrupt downstream processing; consider failing the whole call (return `[]` like `get_all_questions` does, or raise an exception) or returning an explicit completion indicator alongside the data.
Suggested change:
```diff
-        break
+        return []
```
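If raising is preferred over returning `[]`, the completeness check can live in one helper called at the end of the loop. The exception class and helper name below are illustrative, not from the PR:

```python
from typing import List, Optional

class QuestionListIncomplete(RuntimeError):
    """Pagination stopped before every page was fetched."""

def check_complete(questions: List, total: Optional[int], ok: bool) -> List:
    """Return the accumulated questions only if pagination finished."""
    if not ok or (total is not None and len(questions) < total):
        raise QuestionListIncomplete(
            f"fetched {len(questions)} of {total} questions")
    return questions
```

This keeps the "all or nothing" contract explicit: callers either get the full list or an exception naming how far the fetch got.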