
fix: get all questions #5072

Merged
yanglbme merged 2 commits into main from dev on Mar 10, 2026



@yanglbme (Member)

No description provided.

Copilot AI review requested due to automatic review settings March 10, 2026 11:23
@idoocs added the py (Issues or pull requests related to .py code) and core team (Issues or pull requests from core team) labels on Mar 10, 2026

Copilot AI left a comment


Pull request overview

This PR updates the LeetCode spider to reliably fetch the full question list by paginating the GraphQL questionList endpoint, and fixes a bad Referer header construction in the English question detail fetch.

Changes:

  • Reworked get_all_questions_v2 to fetch questions page-by-page until totalNum is reached (instead of a single large request).
  • Changed the default limit value for get_all_questions_v2 and added progress logging during pagination.
  • Fixed get_question_detail_en Referer header to use question_title_slug (previously referenced an undefined variable).
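The pagination described above can be sketched as follows. This is a minimal sketch, not the merged code: `fetch_all_questions` and the `fetch_page(skip, limit)` callable (returning the `totalNum` value and one page of questions) are hypothetical stand-ins for the GraphQL request, so the loop can be exercised without network access.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical: fetch_page(skip, limit) -> (total, questions), mirroring the
# `total: totalNum` and `questions: data` fields selected by the GraphQL query.
FetchPage = Callable[[int, int], Tuple[int, List[dict]]]


def fetch_all_questions(fetch_page: FetchPage, page_size: int = 100) -> List[dict]:
    """Request successive pages until the server-reported total is collected."""
    questions: List[dict] = []
    skip = 0
    total: Optional[int] = None
    while total is None or skip < total:
        total, page = fetch_page(skip, page_size)
        if not page:
            # Stop on an empty page so a wrong total cannot cause an endless loop.
            break
        questions.extend(page)
        skip += len(page)  # advance by what was actually received
        print(f"fetched {len(questions)}/{total}")
    return questions
```

Advancing `skip` by the number of questions actually received, rather than by the requested page size, keeps the offsets correct even if the server returns a short page before the end.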


Comment on lines +58 to +66
```python
def get_all_questions_v2(self, retry: int = 3, limit: int = 100) -> List:
    headers = {
        "Cookie": self.cookie_en,
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0",
        "Content-Type": "application/json",
    }
    form = {
        "query": "\n query problemsetQuestionList($categorySlug: String, $limit: Int, $skip: Int, $filters: QuestionListFilterInput) {\n problemsetQuestionList: questionList(\n categorySlug: $categorySlug\n limit: $limit\n skip: $skip\n filters: $filters\n ) {\n total: totalNum\n questions: data {\n acRate\n difficulty\n freqBar\n frontendQuestionId: questionFrontendId\n isFavor\n paidOnly: isPaidOnly\n status\n title\n titleSlug\n topicTags {\n name\n id\n slug\n }\n hasSolution\n hasVideoSolution\n }\n }\n}\n ",
        "variables": {
            "categorySlug": "all-code-essentials",
            "skip": 0,
            "limit": limit,
            "filters": {"orderBy": "FRONTEND_ID", "sortOrder": "DESCENDING"},
        },
        "operationName": "problemsetQuestionList",
    }
    try:
        resp = requests.post(
            "https://leetcode.com/graphql",
            headers=headers,
            data=json.dumps(form),
            timeout=20,
        )
        return resp.json()["data"]["problemsetQuestionList"]["questions"]
    except Exception as e:
        print("get_all_questions_v2", e)
        time.sleep(2)
        return self.get_all_questions_v2(retry - 1, limit) if retry > 0 else []
    page_size = max(1, min(limit, 100))
    skip = 0
    total = None
```

Copilot AI Mar 10, 2026


limit is effectively used as the per-request page size (and is clamped to 100 via page_size = max(1, min(limit, 100))), so it no longer represents a maximum number of questions to fetch. This can be confusing for callers and changes the method’s external behavior; consider renaming the parameter to page_size (or adding a separate max_total) and documenting the semantics accordingly.
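One way to separate the two meanings the comment describes, sketched under assumptions: `plan_requests`, `page_size`, and `max_total` are hypothetical names, and `total` stands for the `totalNum` value returned by the first response. The function only computes the `(skip, limit)` pair for each request, which makes the difference between the per-request size and the overall cap easy to see.

```python
from typing import List, Optional, Tuple

# Same upper bound as page_size = max(1, min(limit, 100)) in the PR.
MAX_PAGE_SIZE = 100


def plan_requests(total: int, page_size: int, max_total: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return the (skip, limit) pair for every request needed.

    page_size: questions per request, clamped to 1..MAX_PAGE_SIZE.
    max_total: overall cap on questions returned; None means fetch all `total`.
    """
    page_size = max(1, min(page_size, MAX_PAGE_SIZE))
    target = total if max_total is None else min(total, max_total)
    plan = []
    for skip in range(0, target, page_size):
        # The last request only asks for what is still missing.
        plan.append((skip, min(page_size, target - skip)))
    return plan
```

For example, with 250 questions in total, `page_size=500` is clamped to 100 and yields three requests, while `max_total=130` stops after one request of 100 and one of 30.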

```python
    "https://leetcode.com/graphql",
    headers=headers,
    data=json.dumps(form),
    timeout=20,
```

Copilot AI Mar 10, 2026


This request doesn’t pass verify=False, while other requests in this file consistently disable TLS verification and suppress urllib3 warnings. If the runtime environment relies on verify=False (as suggested by the rest of the file), get_all_questions_v2 may fail with SSL/cert errors; consider making the TLS verification behavior consistent with the other requests in Spider (or centralize it).

Suggested change

```diff
-timeout=20,
+timeout=20,
+verify=False,
```
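A sketch of centralizing the setting, assuming a module-level dictionary of shared keyword arguments (`REQUEST_DEFAULTS` and `request_kwargs` are hypothetical names, not part of the Spider class). A `requests.Session` also has a `verify` attribute that applies to every request it sends, but it has no session-wide timeout, so a plain dictionary of defaults covers both settings.

```python
from typing import Any, Dict

# Hypothetical shared defaults for every HTTP call in the spider.
# verify=False skips TLS certificate checks, matching what the review says the
# other requests in the file already do; changing it here changes every call.
REQUEST_DEFAULTS: Dict[str, Any] = {"timeout": 20, "verify": False}


def request_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Merge per-call arguments (headers, data, ...) over the shared defaults."""
    kwargs = dict(REQUEST_DEFAULTS)  # copy so the defaults are never mutated
    kwargs.update(overrides)
    return kwargs


# Intended use (not executed here):
#   requests.post("https://leetcode.com/graphql",
#                 **request_kwargs(headers=headers, data=json.dumps(form)))
```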

```python
        time.sleep(2)

    if not ok:
        break
```

Copilot AI Mar 10, 2026


On repeated failures for a page, the function breaks out of the loop and returns whatever has been accumulated so far. Since the function is intended to “get all questions”, returning a partial list can silently corrupt downstream processing; consider failing the whole call (return [] like get_all_questions does, or raise an exception) or returning an explicit completion indicator alongside the data.

Suggested change

```diff
-break
+return []
```
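A sketch of the stricter behavior, under assumptions: `fetch_all_or_raise`, `IncompleteFetchError`, and the `fetch_page(skip, limit)` callable are hypothetical. Each page gets its own retry budget, and the function raises instead of returning a partial list when a page keeps failing or comes back empty before `totalNum` is reached.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical: fetch_page(skip, limit) -> (total, questions).
FetchPage = Callable[[int, int], Tuple[int, List[dict]]]


class IncompleteFetchError(RuntimeError):
    """Raised when the full question list could not be collected."""


def fetch_all_or_raise(fetch_page: FetchPage, page_size: int = 100,
                       retry: int = 3, delay: float = 2.0) -> List[dict]:
    questions: List[dict] = []
    skip = 0
    total: Optional[int] = None
    while total is None or skip < total:
        for attempt in range(retry + 1):
            try:
                total, page = fetch_page(skip, page_size)
                break  # this page succeeded
            except Exception as e:  # broad catch, as in the original method
                print("get_all_questions_v2", e)
                if attempt < retry:
                    time.sleep(delay)
        else:
            # Every attempt for this page failed: refuse to return a partial list.
            raise IncompleteFetchError(f"page at skip={skip} failed after {retry + 1} attempts")
        if not page and skip < total:
            raise IncompleteFetchError(f"empty page at skip={skip} before reaching total={total}")
        questions.extend(page)
        skip += len(page)
    return questions
```

A caller that prefers the `return []` convention from the suggested change can catch `IncompleteFetchError` and return an empty list instead.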

@yanglbme yanglbme merged commit 945c25c into main Mar 10, 2026
13 checks passed
@yanglbme yanglbme deleted the dev branch March 10, 2026 11:34