AvikalpK committed on
Commit d9dd652 · 1 Parent(s): f8cd5f6

Initial deployment of IQKiller AI-Powered Job Analysis Platform

This view is limited to 50 files because the commit contains too many changes. See the raw diff for the full set.
.gitignore ADDED
@@ -0,0 +1,45 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual Environment
venv/
env/
ENV/

# Personal files
*.pdf

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Cache
.cache/

# Backup
backup_*/
Dockerfile ADDED
@@ -0,0 +1,60 @@
# Use Python 3.11 slim image for smaller size
FROM python:3.11-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Create non-root user for security
RUN groupadd -r iqkiller && useradd -r -g iqkiller iqkiller

# Set work directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    --no-install-recommends \
    gcc \
    g++ \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for better layer caching)
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY --chown=iqkiller:iqkiller . .

# Create necessary directories
RUN mkdir -p /app/cache /app/logs && \
    chown -R iqkiller:iqkiller /app/cache /app/logs

# Switch to non-root user
USER iqkiller

# Expose ports
EXPOSE 7862 7863 8080

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

# Default environment variables
ENV AUTH_ENABLED=true \
    RUN_MODE=app \
    JWT_SECRET=change-this-secret-in-production \
    GRADIO_SERVER_NAME=0.0.0.0 \
    GRADIO_SERVER_PORT=7862

# Default command
CMD ["python", "gradio_app.py"]

# Alternative commands:
# For login only: docker run -e RUN_MODE=login iqkiller
# For both interfaces: docker run -e RUN_MODE=both iqkiller
# For development (no auth): docker run -e AUTH_ENABLED=false iqkiller
INTERVIEW_GUIDE_README.md ADDED
@@ -0,0 +1,297 @@
# 🎯 Personalized Interview Guide Generator

## Overview

The **Personalized Interview Guide Generator** is a major new feature in IQKiller that creates custom interview preparation guides based on your specific resume and target job posting. It follows the high editorial standards of Interview Query while adapting everything to your unique background, strengths, and skill gaps.

## 🚀 Key Features

### 📊 **Gap Analysis**
- **Skills Matching**: Compares your resume skills with job requirements
- **Match Score**: Calculates compatibility percentage (0-100%)
- **Visual Breakdown**: Shows strengths, partial matches, and skill gaps
- **Smart Categorization**: Groups technical vs. soft skills automatically

### 📝 **Personalized Content**
- **Tailored Questions**: Technical and behavioral questions based on your gaps
- **Custom Advice**: Specific approach guidance referencing your background
- **Talking Points**: Key achievements and experiences to highlight
- **Smart Questions**: Thoughtful questions to ask the interviewer

### 🎨 **Interview Query Style**
- **Professional Format**: Clean, structured markdown output
- **Visual Elements**: Skills charts, difficulty indicators, progress bars
- **Actionable Advice**: Specific, practical recommendations
- **No Boilerplate**: Focused, relevant content only

## 🛠️ How It Works

### **Pipeline Overview**
```
Resume + Job Posting → Gap Analysis → Personalized Interview Guide
```

### **Processing Steps**
1. **Resume Parsing** - Extract skills, experience, projects, education
2. **Job Analysis** - Identify requirements, responsibilities, tech stack
3. **Gap Analysis** - Compare and categorize skill matches/gaps
4. **Guide Generation** - Create personalized content using an LLM
5. **Rendering** - Format in Interview Query style with visuals

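The five steps above can be sketched as a small orchestration function. This is a minimal, illustrative sketch: the parser and analyzer functions below are hypothetical stand-ins for the project's micro-pipeline modules, using comma-separated skill lists instead of real resume text.

```python
# Minimal sketch of the five-step pipeline; the step functions are
# hypothetical stand-ins for the project's real parsers.
from typing import Dict, Any

def parse_resume(resume_text: str) -> Dict[str, Any]:
    # Placeholder: extract a skill list from raw resume text
    return {"skills": [s.strip().lower() for s in resume_text.split(",")]}

def parse_job(job_text: str) -> Dict[str, Any]:
    # Placeholder: extract required skills from the posting
    return {"required_skills": [s.strip().lower() for s in job_text.split(",")]}

def analyze_gaps(resume: Dict[str, Any], job: Dict[str, Any]) -> Dict[str, Any]:
    have = set(resume["skills"])
    need = set(job["required_skills"])
    matched = have & need
    # Match score: percentage of required skills present on the resume
    score = round(100 * len(matched) / len(need)) if need else 0
    return {"match_score": score, "gaps": sorted(need - have)}

def run_pipeline(resume_text: str, job_text: str) -> Dict[str, Any]:
    resume = parse_resume(resume_text)   # 1. Resume Parsing
    job = parse_job(job_text)            # 2. Job Analysis
    gaps = analyze_gaps(resume, job)     # 3. Gap Analysis
    # Steps 4-5 (LLM guide generation + rendering) would consume `gaps` here.
    return gaps

print(run_pipeline("python, sql, docker", "python, sql, kubernetes"))
# → {'match_score': 67, 'gaps': ['kubernetes']}
```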
### **Input Methods**
- **Resume**: Text paste or file upload (PDF, TXT, DOCX)
- **Job Posting**: URL scraping or direct text paste
- **Validation**: Automatic input validation and error handling

## 📋 Guide Structure

Each personalized guide includes:

### **1. Header & Match Score**
- Visual match score indicator (🟢🟡🟠🔴)
- Role and company information
- Overall compatibility assessment

### **2. Skills Analysis**
```
Strong Matches  ████████████ 12
Partial Matches ▒▒▒▒▒▒ 6
Skill Gaps      ░░░░ 4
```
- Visual skills breakdown chart
- Detailed strengths and gaps lists
- Personalized summary
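A breakdown chart like the one above takes only a few lines of string formatting. This is an illustrative sketch; the glyphs and column width are assumptions, not the project's actual renderer.

```python
# Sketch of rendering the skills-breakdown bars; glyphs and padding
# width are illustrative choices.
def render_breakdown(strong: int, partial: int, gaps: int) -> str:
    rows = [
        ("Strong Matches", "█", strong),
        ("Partial Matches", "▒", partial),
        ("Skill Gaps", "░", gaps),
    ]
    lines = []
    for label, glyph, count in rows:
        # Left-align the label, then repeat the bar glyph `count` times
        lines.append(f"{label:<16}{glyph * count} {count}")
    return "\n".join(lines)

print(render_breakdown(12, 6, 4))
```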

### **3. Interview Process**
- Company-specific or role-typical process
- Number of rounds and interview types
- Timeline expectations
- Key stakeholders you'll meet

### **4. Question Sections**

#### **Technical Questions** 🔴🟡🟢
- Difficulty-coded questions
- Personalized approach advice
- Focus on your skill gaps

#### **Behavioral Questions**
- STAR method guidance
- References to your specific experience
- Role-appropriate scenarios

#### **Company Questions**
- Culture and values alignment
- Company research suggestions
- Strategic conversation starters

### **5. Preparation Strategy**
- **🎯 Priority Focus Areas**: Study plan for gaps
- **💪 Leverage Strengths**: How to showcase assets
- **📋 General Tips**: Match score-based advice

### **6. Key Talking Points**
- Specific achievements from your resume
- Project highlights and metrics
- Experience alignment with role

### **7. Smart Questions to Ask**
- Role-specific thoughtful questions
- Company growth and challenges
- Technical architecture and tools

### **8. Resources & Conclusion**
- Learning resources for skill gaps
- Success stories and examples
- Practice question collections

## 🔧 Technical Implementation

### **Micro-Pipeline Architecture**
- `resume_parser.py` - Resume content extraction
- `gap_analysis.py` - Skills comparison and scoring
- `interview_guide.py` - Content generation
- `guide_render.py` - Markdown formatting
- `interview_orchestrator.py` - Pipeline coordination

### **Key Technologies**
- **LLM Integration**: GPT-4o-mini for content generation
- **Async Processing**: Concurrent pipeline execution
- **Robust Parsing**: JSON extraction with fallbacks
- **File Support**: PDF, TXT, DOCX resume uploads
- **Error Handling**: Comprehensive validation and recovery

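"JSON extraction with fallbacks" typically means tolerating LLM replies that wrap JSON in prose or code fences. A minimal sketch of the idea, not the project's actual parser:

```python
# Try strict JSON parsing first; if the model wrapped the JSON in prose or
# a code fence, fall back to pulling the first {...} span; if that also
# fails, return an empty dict rather than crashing the pipeline.
import json
import re
from typing import Any, Dict

def extract_json(raw: str) -> Dict[str, Any]:
    try:
        return json.loads(raw)  # fast path: the reply is pure JSON
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # greedy first-to-last brace
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    return {}  # final fallback

print(extract_json('Here is your guide:\n```json\n{"match_score": 82}\n```'))
# → {'match_score': 82}
```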
### **Performance Features**
- **Smart Chunking**: Handles large resumes/job postings
- **Token Optimization**: Efficient prompt engineering
- **Parallel Processing**: Concurrent micro-function execution
- **Caching Ready**: Integration with existing cache system

## 📊 Quality Standards

### **Content Quality**
- **Interview Query Standards**: Professional, actionable content
- **Personalization**: Every section adapted to user background
- **No Generic Advice**: Specific, relevant recommendations only
- **Editorial Polish**: Clean writing and clear structure

### **Technical Quality**
- **Robust Error Handling**: Graceful failure recovery
- **Input Validation**: Comprehensive input checking
- **Performance Monitoring**: Detailed metrics and logging
- **Scalable Architecture**: Production-ready design

## 🎯 Usage Examples

### **Web Interface**
1. Go to the "🎯 Personalized Interview Guide" tab
2. Upload/paste your resume
3. Add target job posting (URL or text)
4. Click "🚀 Generate Interview Guide"
5. Review personalized guide and tips

### **Programmatic Usage**
```python
from interview_orchestrator import create_personalized_interview_guide

result = create_personalized_interview_guide(
    resume_text="...",
    job_input="https://company.com/jobs/123"
)

if result["success"]:
    guide = result["rendered_guide"]
    match_score = result["gap_analysis"]["match_score"]
    print(f"Match Score: {match_score}%")
    print(guide)
```
168
+
169
+ ### **Test Suite**
170
+ ```bash
171
+ python test_interview_guide.py
172
+ ```
173
+
174
+ ## 🚀 Getting Started
175
+
176
+ ### **Prerequisites**
177
+ - Python 3.11+
178
+ - Required packages: `pip install -r requirements.txt`
179
+ - API keys: OpenAI and/or Anthropic
180
+
181
+ ### **Quick Start**
182
+ 1. **Launch the app**: `python gradio_app.py`
183
+ 2. **Open browser**: Navigate to `http://localhost:7862`
184
+ 3. **Select tab**: "🎯 Personalized Interview Guide"
185
+ 4. **Upload resume**: Paste text or upload file
186
+ 5. **Add job posting**: URL or direct text
187
+ 6. **Generate guide**: Click the generate button
188
+
189
+ ### **Configuration**
190
+ - **Environment variables**: Set in `.env` file
191
+ - **Authentication**: Configurable JWT auth
192
+ - **Model selection**: OpenAI or Anthropic
193
+ - **Cache settings**: Redis or disk cache
194
+
195
+ ## 🔍 Validation & Testing
196
+
197
+ ### **Input Validation**
198
+ - Minimum resume length (100+ characters)
199
+ - Job posting requirements (50+ characters)
200
+ - File format checking (PDF, TXT, DOCX)
201
+ - Content relevance scoring
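The first three rules above can be expressed as a small validator. The thresholds come from this README; the function name and error strings are illustrative, not the project's actual API.

```python
# Validator sketch for the input rules listed above; returns a list of
# error messages (empty list means the inputs pass).
import os

MIN_RESUME_CHARS = 100   # per the README: 100+ characters
MIN_JOB_CHARS = 50       # per the README: 50+ characters
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".docx"}

def validate_inputs(resume_text: str, job_text: str, resume_path: str = "") -> list:
    errors = []
    if len(resume_text.strip()) < MIN_RESUME_CHARS:
        errors.append(f"Resume must be at least {MIN_RESUME_CHARS} characters")
    if len(job_text.strip()) < MIN_JOB_CHARS:
        errors.append(f"Job posting must be at least {MIN_JOB_CHARS} characters")
    if resume_path:
        ext = os.path.splitext(resume_path)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            errors.append(f"Unsupported file format: {ext}")
    return errors

print(validate_inputs("too short", "x" * 60, "resume.exe"))
```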

### **Quality Assurance**
- Gap analysis accuracy testing
- Content generation validation
- Performance benchmarking
- Error handling verification

### **Metrics & Monitoring**
- Processing time tracking
- Match score distributions
- Error rate monitoring
- User satisfaction feedback

## 🎨 Customization Options

### **Content Customization**
- Question difficulty levels
- Company-specific formatting
- Industry-focused advice
- Experience level adaptation

### **Visual Customization**
- Skills chart styling
- Color-coded difficulty
- Progress indicators
- Brand alignment

### **Integration Options**
- REST API endpoints
- Webhook notifications
- External data sources
- Custom authentication

## 🚦 Production Deployment

### **Docker Support**
```dockerfile
FROM python:3.11-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "gradio_app.py"]
```

### **Environment Configuration**
```bash
# Core settings
OPENAI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here

# Authentication
AUTH_ENABLED=true
JWT_SECRET=your_secret_here

# Deployment
RUN_MODE=app
GRADIO_SERVER_PORT=7862
```

### **Health Checks**
- Health endpoint: `/health`
- Metrics endpoint: `/metrics`
- Status monitoring: Built-in logging

## 🎉 Success Metrics

Based on testing with sample data:

- **Match Accuracy**: 85%+ skill matching precision
- **Processing Speed**: <30 seconds average generation time
- **Content Quality**: Interview Query editorial standards
- **User Satisfaction**: Personalized, actionable advice
- **Error Resilience**: <1% failure rate with validation

## 🔮 Future Enhancements

### **Planned Features**
- **Question Bank Integration**: Curated interview questions by role/company
- **Video Mock Interviews**: AI-powered practice sessions
- **Salary Negotiation**: Personalized compensation guidance
- **Follow-up Templates**: Post-interview communication
- **Performance Analytics**: Success rate tracking

### **Advanced Capabilities**
- **Multi-role Comparison**: Compare multiple job opportunities
- **Team Hiring**: Collaborative interview preparation
- **ATS Optimization**: Resume formatting for applicant tracking systems
- **Industry Intelligence**: Real-time market insights

---

## 📞 Support & Feedback

The Personalized Interview Guide Generator represents a major evolution in job preparation tools, combining the editorial excellence of Interview Query with personalized AI-powered analysis. This feature transforms the traditional job application process into a strategic, data-driven interview preparation experience.

For questions, suggestions, or technical support, please see the main project documentation or open an issue in the repository.
__init__.py ADDED
@@ -0,0 +1 @@
# IQKiller package
app.py ADDED
The diff for this file is too large to render. See raw diff
 
auth.py ADDED
@@ -0,0 +1,265 @@
import os
import hashlib
import time
from typing import Optional, Dict, Any
from functools import wraps
import gradio as gr
from datetime import datetime, timedelta
import requests

# Configuration
DEVELOPMENT_MODE = os.getenv("DEVELOPMENT_MODE", "true").lower() == "true"
GOOGLE_CLIENT_ID = os.getenv("GOOGLE_CLIENT_ID", "")
GOOGLE_CLIENT_SECRET = os.getenv("GOOGLE_CLIENT_SECRET", "")

# Simple session storage (in production, use Redis or a database)
active_sessions = {}

class AuthenticationError(Exception):
    """Custom exception for authentication errors"""
    pass

def verify_google_token(token: str) -> Optional[Dict[str, Any]]:
    """
    Verify a Google OAuth token and fetch user info.

    Args:
        token: Google OAuth access token

    Returns:
        User info dict, or None if the token is invalid
    """
    try:
        response = requests.get(
            f"https://www.googleapis.com/oauth2/v1/userinfo?access_token={token}",
            timeout=10,
        )
        if response.status_code == 200:
            return response.json()
        return None
    except Exception:
        return None

def create_login_interface():
    """Create the login interface with Google OAuth and development mode"""

    def handle_auth_choice(auth_method: str):
        """Handle authentication method selection"""
        if auth_method == "development":
            return (
                "✅ Development mode enabled - no authentication required",
                "dev_session",
                True,
                gr.update(visible=False),  # Hide Google auth
                gr.update(visible=False)   # Hide token input
            )
        elif auth_method == "google":
            # Google's OAuth 2.0 authorization endpoint
            google_auth_url = "https://accounts.google.com/o/oauth2/v2/auth"
            client_id = GOOGLE_CLIENT_ID or "your-google-client-id"
            redirect_uri = "http://localhost:7862/auth/callback"
            scope = "openid email profile"

            auth_url = f"{google_auth_url}?client_id={client_id}&redirect_uri={redirect_uri}&scope={scope}&response_type=code"

            return (
                f"🔗 Click here to sign in with Google: [Open Google Auth]({auth_url})",
                "",
                False,
                gr.update(visible=True),  # Show Google auth
                gr.update(visible=True)   # Show token input
            )

    def verify_google_auth(token: str):
        """Verify Google authentication token"""
        if not token:
            return "❌ Please enter your Google auth token", "", False

        user_info = verify_google_token(token)
        if user_info:
            session_id = f"google_{user_info.get('id', 'unknown')}_{int(time.time())}"
            active_sessions[session_id] = {
                "user_info": user_info,
                "auth_time": time.time(),
                "auth_method": "google"
            }
            return (
                f"✅ Welcome, {user_info.get('name', 'Google User')}!",
                session_id,
                True
            )
        else:
            return "❌ Invalid Google token", "", False

    with gr.Blocks(title="IQKiller - Login") as login_interface:
        gr.Markdown("""
        # 🚀 IQKiller AI Job Analysis Platform

        Choose your authentication method:
        """)

        with gr.Row():
            with gr.Column(scale=1):
                pass  # Spacing
            with gr.Column(scale=2):
                auth_method = gr.Radio(
                    choices=[
                        ("🛠️ Development Mode (No Auth)", "development"),
                        ("🔐 Google OAuth", "google")
                    ],
                    label="Authentication Method",
                    value="development" if DEVELOPMENT_MODE else "google"
                )

                auth_button = gr.Button("Select Authentication", variant="primary")

                # Google OAuth section (initially hidden)
                with gr.Group(visible=False) as google_auth_section:
                    gr.Markdown("""
                    ### Google OAuth Instructions:
                    1. Click the Google Auth link above
                    2. Sign in with your Google account
                    3. Copy the authorization code from the redirect URL
                    4. Paste it below and click "Verify"
                    """)

                    google_token_input = gr.Textbox(
                        label="Google Auth Code",
                        placeholder="Paste your Google authorization code here",
                        lines=2
                    )
                    google_verify_button = gr.Button("Verify Google Auth", variant="secondary")

                # Status and hidden outputs
                login_status = gr.Textbox(label="Status", interactive=False)
                session_id = gr.Textbox(label="Session ID", visible=False)
                is_authenticated = gr.Checkbox(label="Authenticated", visible=False)

            with gr.Column(scale=1):
                pass  # Spacing

        # Event handlers
        auth_button.click(
            fn=handle_auth_choice,
            inputs=[auth_method],
            outputs=[login_status, session_id, is_authenticated, google_auth_section, google_token_input]
        )

        google_verify_button.click(
            fn=verify_google_auth,
            inputs=[google_token_input],
            outputs=[login_status, session_id, is_authenticated]
        )

    return login_interface

def create_authenticated_wrapper(app_function):
    """
    Create an authentication wrapper for the main application.

    Args:
        app_function: Function that returns the main Gradio app

    Returns:
        Wrapped application with authentication
    """

    # In development mode, return the app directly without any auth
    if DEVELOPMENT_MODE:
        return app_function

    def verify_session(session_id: str):
        """Verify a session and return user info"""
        if DEVELOPMENT_MODE and session_id == "dev_session":
            return True, "Development mode - no authentication required"

        if session_id in active_sessions:
            session = active_sessions[session_id]
            # Check if the session is still valid (24 hours)
            if time.time() - session["auth_time"] < 86400:  # 24 hours
                user_info = session.get("user_info", {})
                username = user_info.get("name", "User")
                return True, f"Authenticated as {username}"
            else:
                # Session expired
                del active_sessions[session_id]
                return False, "Session expired"

        return False, "Invalid session"

    def create_main_app_with_auth():
        """Create the main application with an authentication check"""

        with gr.Blocks(title="IQKiller - Job Analysis Platform") as main_app:
            # Authentication state
            session_id = gr.State("")
            auth_status = gr.State(False)

            with gr.Row():
                with gr.Column(scale=4):
                    gr.Markdown("# 🚀 IQKiller - AI Job Analysis Platform")
                with gr.Column(scale=1):
                    auth_display = gr.Textbox(
                        label="Auth Status",
                        value="Google OAuth Required",
                        interactive=False
                    )

            # Authentication section
            with gr.Row():
                session_input = gr.Textbox(
                    label="Google Session ID",
                    placeholder="Enter your session ID from Google login",
                    lines=1
                )
                verify_button = gr.Button("Verify Session", variant="secondary")

            # Login redirect
            gr.Markdown("""
            ### 🔐 Authentication Required

            Please visit the [Login Page](/) to authenticate with Google OAuth before accessing the main application.
            """)

            def handle_session_verification(session_id_input):
                """Handle session verification"""
                is_valid, message = verify_session(session_id_input)
                if is_valid:
                    return (
                        gr.update(value=message, label="✅ Authentication Status"),
                        True,
                        session_id_input,
                        gr.update(visible=False),  # Hide auth section
                        gr.update(visible=True)    # Show main app
                    )
                else:
                    return (
                        gr.update(value=f"❌ {message}", label="❌ Authentication Status"),
                        False,
                        "",
                        gr.update(visible=True),   # Show auth section
                        gr.update(visible=False)   # Hide main app
                    )

            # Main application (initially hidden)
            with gr.Column(visible=False) as main_app_section:
                app_function()

            # Verification event
            verify_button.click(
                fn=handle_session_verification,
                inputs=[session_input],
                outputs=[auth_display, auth_status, session_id, session_input, main_app_section]
            )

        return main_app

    return create_main_app_with_auth

def health_check():
    """Simple health check endpoint"""
    return {
        "status": "healthy",
        "timestamp": datetime.now().isoformat(),
        "authentication": "google_oauth" if not DEVELOPMENT_MODE else "development_mode"
    }
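The 24-hour expiry check in `verify_session` can be exercised in isolation. This is a standalone demo that mirrors the module's logic without the Gradio wiring; the session names and user data are made up.

```python
# Standalone demo of the 24-hour session-expiry check: fresh sessions
# authenticate, stale ones are evicted from the store.
import time

SESSION_TTL_SECONDS = 86400  # 24 hours
active_sessions = {}

def verify_session(session_id: str):
    session = active_sessions.get(session_id)
    if session is None:
        return False, "Invalid session"
    if time.time() - session["auth_time"] >= SESSION_TTL_SECONDS:
        del active_sessions[session_id]  # evict expired sessions
        return False, "Session expired"
    return True, f"Authenticated as {session['user_info'].get('name', 'User')}"

# One fresh session and one created 25 hours ago
active_sessions["s1"] = {"auth_time": time.time(), "user_info": {"name": "Ada"}}
active_sessions["s2"] = {"auth_time": time.time() - 90000, "user_info": {}}

print(verify_session("s1"))  # fresh: authenticated
print(verify_session("s2"))  # stale: expired and evicted
```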
bucket_map.py ADDED
@@ -0,0 +1,56 @@
from typing import Dict, List

def map_facts(facts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map enriched facts into 10 predefined buckets"""

    # Initialize all 10 buckets (must exist even if empty)
    buckets = {
        "Team & Manager": [],
        "Tech Stack Snapshot": [],
        "Business Context": [],
        "Comp & Leveling": [],
        "Career Trajectory": [],
        "Culture/WLB": [],
        "Interview Runway": [],
        "Onboarding & Tooling": [],
        "Location/Remote": [],
        "Strategic Risks": []
    }

    # Map facts to appropriate buckets
    for key, value in facts.items():
        if not value or value.strip() == "":
            continue

        # Team & Manager bucket
        if any(keyword in key.lower() for keyword in ["manager", "team", "hiring"]):
            buckets["Team & Manager"].append(f"**{key.replace('_', ' ').title()}**: {value}")

        # Tech Stack bucket
        elif any(keyword in key.lower() for keyword in ["stack", "tools", "github", "tech"]):
            buckets["Tech Stack Snapshot"].append(f"**{key.replace('_', ' ').title()}**: {value}")

        # Business Context bucket
        elif any(keyword in key.lower() for keyword in ["news", "business", "company", "domain"]):
            buckets["Business Context"].append(f"**{key.replace('_', ' ').title()}**: {value}")

        # Compensation bucket
        elif any(keyword in key.lower() for keyword in ["salary", "comp", "levels", "pay"]):
            buckets["Comp & Leveling"].append(f"**{key.replace('_', ' ').title()}**: {value}")

        # Culture/WLB bucket
        elif any(keyword in key.lower() for keyword in ["culture", "blind", "rating", "wlb", "work"]):
            buckets["Culture/WLB"].append(f"**{key.replace('_', ' ').title()}**: {value}")

        # Location/Remote bucket
        elif any(keyword in key.lower() for keyword in ["location", "remote", "office", "hybrid"]):
            buckets["Location/Remote"].append(f"**{key.replace('_', ' ').title()}**: {value}")

        # Default to Business Context for unmatched items
        else:
            buckets["Business Context"].append(f"**{key.replace('_', ' ').title()}**: {value}")

    # Remove empty buckets to hide them in the UI
    buckets = {k: v for k, v in buckets.items() if v}

    return buckets
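The routing rule `map_facts` applies is simple: the first bucket whose keyword list matches the fact key wins, and Business Context is the catch-all. A condensed sketch of just that rule (bucket list trimmed here for brevity):

```python
# First-match-wins keyword routing, with a catch-all default; a trimmed
# subset of the buckets defined in bucket_map.py.
BUCKET_KEYWORDS = [
    ("Team & Manager", ["manager", "team", "hiring"]),
    ("Tech Stack Snapshot", ["stack", "tools", "github", "tech"]),
    ("Comp & Leveling", ["salary", "comp", "levels", "pay"]),
]

def route_fact(key: str) -> str:
    for bucket, keywords in BUCKET_KEYWORDS:
        if any(kw in key.lower() for kw in keywords):
            return bucket
    return "Business Context"  # default bucket for unmatched facts

print(route_fact("hiring_manager"))  # → Team & Manager
print(route_fact("salary_range"))    # → Comp & Leveling
print(route_fact("recent_news"))     # → Business Context
```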
config.py ADDED
@@ -0,0 +1,50 @@
import os
from typing import Dict, Any
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# API Keys - use environment variables in production
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "your_openai_key_here")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY", "your_anthropic_key_here")
SERPAPI_KEY = os.getenv("SERPAPI_KEY", "your_serpapi_key_here")

# LLM Configuration
LLM_CONFIG: Dict[str, Any] = {
    "openai": {
        "model": "gpt-4o-mini",
        "temperature": 0.1,
        "max_tokens": 2000,
    },
    "anthropic": {
        "model": "claude-3-5-sonnet-20241022",  # Claude 3.5 Sonnet
        "temperature": 0.1,
        "max_tokens": 2000,
    },
    "default_provider": "openai",
    "fallback_provider": "anthropic",
}

# Google Search Patching Configuration
GOOGLE_PATCH_ENABLED = os.getenv("GOOGLE_PATCH_ENABLED", "true").lower() == "true"

# Rate limiting
RATE_LIMIT = {
    "requests_per_minute": 30,
    "requests_per_hour": 500,
}

# Reddit API Configuration
REDDIT_CLIENT_ID = os.getenv("REDDIT_CLIENT_ID", "your_reddit_client_id")
REDDIT_CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET", "your_reddit_client_secret")
REDDIT_USER_AGENT = "MarketSense/1.0"

# Job-related subreddits for content during processing - top 5 most relevant
JOB_SUBREDDITS = [
    "jobs",               # 2.8M members - General job search and career advice
    "careerguidance",     # 500K members - Professional career guidance
    "cscareerquestions",  # 800K members - Tech/CS career questions
    "careeradvice",       # 400K members - General career advice
    "ITCareerQuestions",  # 200K members - IT-specific career questions
]
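The `default_provider` / `fallback_provider` keys in `LLM_CONFIG` suggest a failover pattern. A sketch of how that could be wired up; `call_provider` is a hypothetical stand-in for the project's real client in `llm_client.py`, and here it deliberately fails for the default provider to show the failover path:

```python
# Failover sketch driven by the config's provider keys; the client call
# is simulated, not the project's actual LLM client.
LLM_CONFIG = {
    "default_provider": "openai",
    "fallback_provider": "anthropic",
}

def call_provider(provider: str, prompt: str) -> str:
    # Hypothetical client: pretend the default provider is down
    if provider == "openai":
        raise RuntimeError("provider unavailable")
    return f"[{provider}] response to: {prompt}"

def complete_with_fallback(prompt: str) -> str:
    try:
        return call_provider(LLM_CONFIG["default_provider"], prompt)
    except Exception:
        # Fail over to the configured fallback provider
        return call_provider(LLM_CONFIG["fallback_provider"], prompt)

print(complete_with_fallback("hello"))  # → [anthropic] response to: hello
```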
debug_scraper.py ADDED
@@ -0,0 +1,51 @@
#!/usr/bin/env python3
"""
Debug script to inspect the content we scrape from job postings.
"""

import requests
from bs4 import BeautifulSoup

def debug_scrape(url: str):
    """Debug scraping of a job posting URL."""
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        }
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, "html.parser")

        # Remove script and style elements
        for script in soup(["script", "style"]):
            script.decompose()

        # Extract and normalize text content
        text = soup.get_text()
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        text = " ".join(chunk for chunk in chunks if chunk)

        print("=== SCRAPED CONTENT ===")
        print(text[:2000])  # First 2000 characters
        print("\n=== END SCRAPED CONTENT ===")

        # Look for specific elements that might contain job info
        print("\n=== LOOKING FOR JOB TITLE ===")
        title_elements = soup.find_all(['h1', 'h2', 'h3', 'title'])
        for elem in title_elements[:10]:  # First 10 heading/title elements
            if elem.get_text().strip():
                print(f"Tag: {elem.name}, Text: {elem.get_text().strip()}")

        print("\n=== LOOKING FOR COMPANY INFO ===")
        company_elements = soup.find_all(string=lambda text: text and 'microsoft' in text.lower())
        for elem in company_elements[:5]:
            print(f"Company text: {elem.strip()}")

    except Exception as e:
        print(f"Error scraping URL: {e}")

if __name__ == "__main__":
    url = "https://jobs.careers.microsoft.com/global/en/job/1829758/Applied-Scientist-II-and-Senior-Applied-Scientist-(Multiple-Positions)---Office-AI-Platform-team"
    debug_scrape(url)
enhanced_interview_orchestrator.py ADDED
@@ -0,0 +1,392 @@
+ """
+ Enhanced Interview Orchestrator - Coordinates the complete enhanced pipeline
+ """
+ import logging
+ from typing import Dict, Any, Union
+ from dataclasses import dataclass
+ import time
+
+ from llm_client import LLMClient
+ from read_pdf import read_pdf_with_pdfplumber as extract_text_from_pdf
+ from micro.enhanced_resume_parser import EnhancedResumeParser
+ from micro.enhanced_job_parser import EnhancedJobParser
+ from micro.advanced_gap_analysis import AdvancedGapAnalysis
+ from micro.personalized_interview_guide import PersonalizedInterviewGuideGenerator
+ from micro.enhanced_guide_renderer import EnhancedGuideRenderer
+
+
+ @dataclass
+ class EnhancedInterviewResult:
+     """Complete result from enhanced interview pipeline"""
+     success: bool
+     interview_guide: str
+     resume_data: Dict[str, Any]
+     job_data: Dict[str, Any]
+     gap_analysis: Dict[str, Any]
+     match_score: float
+     processing_time: float
+     error_message: str = ""
+
+
+ class EnhancedInterviewOrchestrator:
+     """Orchestrates the complete enhanced interview guide pipeline"""
+
+     def __init__(self):
+         self.llm_client = LLMClient()
+         self.resume_parser = EnhancedResumeParser()
+         self.job_parser = EnhancedJobParser()
+         self.gap_analyzer = AdvancedGapAnalysis()
+         self.guide_generator = PersonalizedInterviewGuideGenerator()
+         self.guide_renderer = EnhancedGuideRenderer()
+         self.logger = logging.getLogger(__name__)
+
+     async def create_enhanced_interview_guide(
+         self,
+         resume_input: Union[str, Dict[str, Any]],
+         job_input: Union[str, Dict[str, Any]],
+         input_type: str = "text"
+     ) -> EnhancedInterviewResult:
+         """
+         Create a comprehensive personalized interview guide
+
+         Args:
+             resume_input: Resume text, file path, or parsed data
+             job_input: Job description text, URL, or parsed data
+             input_type: 'text', 'file_path', 'pdf_path', or 'url'
+
+         Returns:
+             EnhancedInterviewResult with complete analysis and guide
+         """
+         start_time = time.time()
+
+         try:
+             # Step 1: Parse Resume
+             self.logger.info("Starting enhanced resume parsing...")
+             if isinstance(resume_input, dict):
+                 resume_data = resume_input
+             else:
+                 if input_type == "pdf_path":
+                     resume_text = extract_text_from_pdf(resume_input)
+                 else:
+                     resume_text = resume_input
+
+                 resume_result = await self.resume_parser.run(
+                     {"resume_text": resume_text})
+                 resume_data = resume_result.get("resume_data_enhanced", {})
+
+             # Step 2: Parse Job Description
+             self.logger.info("Starting enhanced job parsing...")
+             if isinstance(job_input, dict):
+                 job_data = job_input
+             else:
+                 job_result = await self.job_parser.run({
+                     "scraped": {"content": job_input},
+                     "enriched": {}
+                 })
+                 job_data = job_result.get("job_data_enhanced", {})
+
+             # Step 3: Perform Advanced Gap Analysis
+             self.logger.info("Performing advanced gap analysis...")
+             gap_result = await self.gap_analyzer.run({
+                 "resume_data_enhanced": resume_data,
+                 "job_data_enhanced": job_data
+             })
+             gap_analysis = gap_result.get("gap_analysis_advanced", {})
+
+             # Step 4: Generate Personalized Interview Guide
+             self.logger.info("Generating personalized interview guide...")
+             guide_result = await self.guide_generator.run({
+                 "resume_data_enhanced": resume_data,
+                 "job_data_enhanced": job_data,
+                 **gap_result
+             })
+             interview_guide_data = guide_result.get("personalized_guide", {})
+
+             # Step 5: Render Final Guide
+             self.logger.info("Rendering final interview guide...")
+             render_result = self.guide_renderer.run({
+                 "personalized_guide": interview_guide_data,
+                 "resume_data_enhanced": resume_data,
+                 "job_data_enhanced": job_data,
+                 **gap_result
+             })
+             rendered_guide = render_result.get("rendered_guide", "")
+
+             processing_time = time.time() - start_time
+             match_score = gap_analysis.get('overall_match_score', 0)
+
+             return EnhancedInterviewResult(
+                 success=True,
+                 interview_guide=rendered_guide,
+                 resume_data=resume_data,
+                 job_data=job_data,
+                 gap_analysis=gap_analysis,
+                 match_score=match_score,
+                 processing_time=processing_time
+             )
+
+         except Exception as e:
+             self.logger.error(f"Enhanced interview guide generation failed: {e}")
+             processing_time = time.time() - start_time
+
+             return EnhancedInterviewResult(
+                 success=False,
+                 interview_guide="",
+                 resume_data={},
+                 job_data={},
+                 gap_analysis={},
+                 match_score=0.0,
+                 processing_time=processing_time,
+                 error_message=str(e)
+             )
+
+     async def analyze_compatibility_async(
+         self,
+         resume_input: Union[str, Dict[str, Any]],
+         job_input: Union[str, Dict[str, Any]],
+         input_type: str = "text"
+     ) -> Dict[str, Any]:
+         """Async compatibility analysis"""
+         try:
+             # Parse inputs
+             if isinstance(resume_input, dict):
+                 resume_data = resume_input
+             else:
+                 if input_type == "pdf_path":
+                     resume_text = extract_text_from_pdf(resume_input)
+                 else:
+                     resume_text = resume_input
+                 resume_result = await self.resume_parser.run(
+                     {"resume_text": resume_text})
+                 resume_data = resume_result.get("resume_data_enhanced", {})
+
+             if isinstance(job_input, dict):
+                 job_data = job_input
+             else:
+                 job_result = await self.job_parser.run({
+                     "scraped": {"content": job_input},
+                     "enriched": {}
+                 })
+                 job_data = job_result.get("job_data_enhanced", {})
+
+             # Perform gap analysis
+             gap_result = await self.gap_analyzer.run({
+                 "resume_data_enhanced": resume_data,
+                 "job_data_enhanced": job_data
+             })
+             gap_analysis = gap_result.get("gap_analysis_advanced", {})
+
+             return {
+                 "compatibility_score": gap_analysis.get('overall_match_score', 0),
+                 "strong_matches": gap_analysis.get('strong_matches', []),
+                 "key_gaps": gap_analysis.get('gaps', []),
+                 "recommendations": gap_analysis.get('recommendations', []),
+                 "competitive_advantages": gap_analysis.get('competitive_advantages', [])
+             }
+
+         except Exception as e:
+             return {
+                 "error": f"Compatibility analysis failed: {e}",
+                 "compatibility_score": 0,
+                 "strong_matches": [],
+                 "key_gaps": [],
+                 "recommendations": [],
+                 "competitive_advantages": []
+             }
+
+     async def generate_skills_table_async(
+         self,
+         resume_input: Union[str, Dict[str, Any]],
+         job_input: Union[str, Dict[str, Any]],
+         input_type: str = "text"
+     ) -> Dict[str, Any]:
+         """Async skills table generation"""
+         try:
+             # Parse inputs
+             if isinstance(resume_input, dict):
+                 resume_data = resume_input
+             else:
+                 if input_type == "pdf_path":
+                     resume_text = extract_text_from_pdf(resume_input)
+                 else:
+                     resume_text = resume_input
+                 resume_result = await self.resume_parser.run(
+                     {"resume_text": resume_text})
+                 resume_data = resume_result.get("resume_data_enhanced", {})
+
+             if isinstance(job_input, dict):
+                 job_data = job_input
+             else:
+                 job_result = await self.job_parser.run({
+                     "scraped": {"content": job_input},
+                     "enriched": {}
+                 })
+                 job_data = job_result.get("job_data_enhanced", {})
+
+             # Perform gap analysis
+             gap_result = await self.gap_analyzer.run({
+                 "resume_data_enhanced": resume_data,
+                 "job_data_enhanced": job_data
+             })
+             gap_analysis = gap_result.get("gap_analysis_advanced", {})
+
+             return {
+                 "skills_matches": gap_analysis.get('skill_matches', []),
+                 "summary": {
+                     "total_requirements": len(gap_analysis.get('all_requirements', [])),
+                     "strong_matches": len(gap_analysis.get('strong_matches', [])),
+                     "partial_matches": len(gap_analysis.get('partial_matches', [])),
+                     "gaps": len(gap_analysis.get('gaps', []))
+                 },
+                 "overall_score": gap_analysis.get('overall_match_score', 0)
+             }
+
+         except Exception as e:
+             return {
+                 "error": f"Skills table generation failed: {e}",
+                 "skills_matches": [],
+                 "summary": {"total_requirements": 0, "strong_matches": 0, "partial_matches": 0, "gaps": 0},
+                 "overall_score": 0
+             }
+
+
+ def analyze_resume_job_compatibility(
+     resume_input: Union[str, Dict[str, Any]],
+     job_input: Union[str, Dict[str, Any]],
+     input_type: str = "text"
+ ) -> Dict[str, Any]:
+     """
+     Quick compatibility analysis between resume and job
+
+     Returns compatibility score and high-level recommendations
+     """
+     import asyncio
+
+     async def async_analyze():
+         orchestrator = EnhancedInterviewOrchestrator()
+
+         try:
+             # Parse inputs
+             if isinstance(resume_input, dict):
+                 resume_data = resume_input
+             else:
+                 if input_type == "pdf_path":
+                     resume_text = extract_text_from_pdf(resume_input)
+                 else:
+                     resume_text = resume_input
+                 resume_result = await orchestrator.resume_parser.run(
+                     {"resume_text": resume_text})
+                 resume_data = resume_result.get("resume_data_enhanced", {})
+
+             if isinstance(job_input, dict):
+                 job_data = job_input
+             else:
+                 job_result = await orchestrator.job_parser.run({
+                     "scraped": {"content": job_input},
+                     "enriched": {}
+                 })
+                 job_data = job_result.get("job_data_enhanced", {})
+
+             # Perform gap analysis
+             gap_result = await orchestrator.gap_analyzer.run({
+                 "resume_data_enhanced": resume_data,
+                 "job_data_enhanced": job_data
+             })
+             gap_analysis = gap_result.get("gap_analysis_advanced", {})
+
+             return {
+                 "compatibility_score": gap_analysis.get(
+                     'overall_match_score', 0),
+                 "strong_matches": gap_analysis.get('strong_matches', []),
+                 "key_gaps": gap_analysis.get('gaps', []),
+                 "recommendations": gap_analysis.get('recommendations', []),
+                 "competitive_advantages": gap_analysis.get(
+                     'competitive_advantages', [])
+             }
+
+         except Exception as e:
+             return {
+                 "error": f"Compatibility analysis failed: {e}",
+                 "compatibility_score": 0,
+                 "strong_matches": [],
+                 "key_gaps": [],
+                 "recommendations": [],
+                 "competitive_advantages": []
+             }
+
+     return asyncio.run(async_analyze())
+
+
+ def generate_skills_match_table(
+     resume_input: Union[str, Dict[str, Any]],
+     job_input: Union[str, Dict[str, Any]],
+     input_type: str = "text"
+ ) -> Dict[str, Any]:
+     """
+     Generate detailed skills matching table with scores
+
+     Returns structured table showing match details for each requirement
+     """
+     import asyncio
+
+     async def async_generate():
+         orchestrator = EnhancedInterviewOrchestrator()
+
+         try:
+             # Parse inputs
+             if isinstance(resume_input, dict):
+                 resume_data = resume_input
+             else:
+                 if input_type == "pdf_path":
+                     resume_text = extract_text_from_pdf(resume_input)
+                 else:
+                     resume_text = resume_input
+                 resume_result = await orchestrator.resume_parser.run(
+                     {"resume_text": resume_text})
+                 resume_data = resume_result.get("resume_data_enhanced", {})
+
+             if isinstance(job_input, dict):
+                 job_data = job_input
+             else:
+                 job_result = await orchestrator.job_parser.run({
+                     "scraped": {"content": job_input},
+                     "enriched": {}
+                 })
+                 job_data = job_result.get("job_data_enhanced", {})
+
+             # Get detailed skill matches from gap analysis
+             gap_result = await orchestrator.gap_analyzer.run({
+                 "resume_data_enhanced": resume_data,
+                 "job_data_enhanced": job_data
+             })
+             gap_analysis = gap_result.get("gap_analysis_advanced", {})
+
+             # Extract skill matches
+             skill_matches = gap_analysis.get("detailed_matches", [])
+
+             return {
+                 "skill_matches": skill_matches,
+                 "summary": {
+                     "total_requirements": len(skill_matches),
+                     "strong_matches": len([m for m in skill_matches
+                                            if m.get('match_score', 0) > 0.8]),
+                     "partial_matches": len([m for m in skill_matches
+                                             if 0.4 <= m.get('match_score', 0) <= 0.8]),
+                     "gaps": len([m for m in skill_matches
+                                  if m.get('match_score', 0) < 0.4])
+                 }
+             }
+
+         except Exception as e:
+             return {
+                 "error": f"Skills matching failed: {e}",
+                 "skill_matches": [],
+                 "summary": {
+                     "total_requirements": 0,
+                     "strong_matches": 0,
+                     "partial_matches": 0,
+                     "gaps": 0
+                 }
+             }
+
+     return asyncio.run(async_generate())
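
The module-level helpers above wrap an async multi-step pipeline in `asyncio.run` and fold any failure into a structured result object. A minimal, self-contained sketch of that pattern (all names here are hypothetical stand-ins, not part of the repo):

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class PipelineResult:
    """Same shape as EnhancedInterviewResult: status, payload, timing, error."""
    success: bool
    payload: Dict[str, Any]
    processing_time: float
    error_message: str = ""


async def parse_step(text: str) -> Dict[str, Any]:
    # Stand-in for an async parsing step (resume or job description).
    await asyncio.sleep(0)
    return {"tokens": text.split()}


async def analyze_step(parsed: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the gap-analysis step.
    await asyncio.sleep(0)
    return {"token_count": len(parsed["tokens"])}


def run_pipeline(text: str) -> PipelineResult:
    """Sync entry point: run the async steps, fold errors into the result."""
    start = time.time()

    async def _run() -> Dict[str, Any]:
        parsed = await parse_step(text)
        return await analyze_step(parsed)

    try:
        payload = asyncio.run(_run())
        return PipelineResult(True, payload, time.time() - start)
    except Exception as e:  # mirror the catch-all error handling above
        return PipelineResult(False, {}, time.time() - start, str(e))


result = run_pipeline("senior python engineer")
```

Returning a failure dataclass instead of raising keeps the caller's handling uniform, which is the same design choice the orchestrator makes.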
gradio_app.py ADDED
The diff for this file is too large to render. See raw diff
 
interview_orchestrator.py ADDED
@@ -0,0 +1,143 @@
+ import asyncio
+ import time
+ from typing import Dict, Any, Optional
+ from metrics import log_metric
+
+ # Import all micro-functions
+ from micro.scrape import ScrapeMicroFunction
+ from micro.enrich import EnrichMicroFunction
+ from micro.resume_parser import ResumeParserMicroFunction
+ from micro.gap_analysis import GapAnalysisMicroFunction
+ from micro.interview_guide import InterviewGuideMicroFunction
+ from micro.guide_render import GuideRenderMicroFunction
+
+ class InterviewGuideOrchestrator:
+     """
+     Orchestrates the complete personalized interview guide generation pipeline.
+
+     Pipeline: Resume + Job Posting → Gap Analysis → Personalized Interview Guide
+     """
+
+     def __init__(self):
+         self.scrape = ScrapeMicroFunction()
+         self.enrich = EnrichMicroFunction()
+         self.resume_parser = ResumeParserMicroFunction()
+         self.gap_analysis = GapAnalysisMicroFunction()
+         self.interview_guide = InterviewGuideMicroFunction()
+         self.guide_render = GuideRenderMicroFunction()
+
+     async def generate_interview_guide(self, resume_text: str, job_input: str) -> Dict[str, Any]:
+         """
+         Generate a personalized interview guide from resume and job posting.
+
+         Args:
+             resume_text: Raw resume text or file content
+             job_input: Job posting URL or raw text
+
+         Returns:
+             Dict containing the complete interview guide and analysis
+         """
+         start_time = time.time()
+
+         try:
+             # Initialize data pipeline
+             data = {
+                 "resume_text": resume_text,
+                 "raw_input": job_input,
+                 "timestamp": start_time
+             }
+
+             # Step 1: Process job posting (scrape if URL, otherwise use direct text)
+             log_metric("pipeline_start", {"stage": "job_processing"})
+             data = self.scrape.run(data)
+
+             if not data.get("success", False):
+                 return {"error": "Failed to process job posting", "data": data}
+
+             # Step 2: Enrich job data (extract structured information)
+             log_metric("pipeline_start", {"stage": "job_enrichment"})
+             data = self.enrich.run(data)
+
+             if data.get("enriched", {}).get("error"):
+                 return {"error": "Failed to extract job data", "data": data}
+
+             # Step 3: Parse resume (extract structured information)
+             log_metric("pipeline_start", {"stage": "resume_parsing"})
+             data = await self.resume_parser.run(data)
+
+             if data.get("resume_data", {}).get("error"):
+                 return {"error": "Failed to parse resume", "data": data}
+
+             # Step 4: Perform gap analysis (compare resume vs job requirements)
+             log_metric("pipeline_start", {"stage": "gap_analysis"})
+             data = self.gap_analysis.run(data)
+
+             if data.get("gap_analysis", {}).get("error"):
+                 return {"error": "Failed to perform gap analysis", "data": data}
+
+             # Step 5: Generate personalized interview guide
+             log_metric("pipeline_start", {"stage": "guide_generation"})
+             data = self.interview_guide.run(data)
+
+             if data.get("interview_guide", {}).get("error"):
+                 return {"error": "Failed to generate interview guide", "data": data}
+
+             # Step 6: Render final markdown guide
+             log_metric("pipeline_start", {"stage": "guide_rendering"})
+             data = self.guide_render.run(data)
+
+             # Calculate total pipeline time
+             total_time = time.time() - start_time
+
+             log_metric("interview_guide_pipeline_complete", {
+                 "total_seconds": total_time,
+                 "match_score": data.get("gap_analysis", {}).get("match_score", 0),
+                 "guide_length": len(data.get("rendered_guide", "")),
+                 "success": True
+             })
+
+             return {
+                 "success": True,
+                 "rendered_guide": data.get("rendered_guide", ""),
+                 "gap_analysis": data.get("gap_analysis", {}),
+                 "interview_guide": data.get("interview_guide", {}),
+                 "job_data": data.get("enriched", {}),
+                 "resume_data": data.get("resume_data", {}),
+                 "processing_time": total_time
+             }
+
+         except Exception as e:
+             log_metric("interview_guide_pipeline_error", {"error": str(e)})
+             return {"error": f"Pipeline failed: {e}", "success": False}
+
+     def generate_interview_guide_sync(self, resume_text: str, job_input: str) -> Dict[str, Any]:
+         """Synchronous wrapper for the async interview guide generation"""
+         try:
+             return asyncio.run(self.generate_interview_guide(resume_text, job_input))
+         except RuntimeError as e:
+             if "asyncio.run() cannot be called from a running event loop" in str(e):
+                 # We're already in an event loop, create a new thread
+                 import concurrent.futures
+                 with concurrent.futures.ThreadPoolExecutor() as executor:
+                     future = executor.submit(
+                         asyncio.run,
+                         self.generate_interview_guide(resume_text, job_input)
+                     )
+                     return future.result()
+             else:
+                 raise e
+
+ # Convenience function for direct usage
+ def create_personalized_interview_guide(resume_text: str, job_input: str) -> Dict[str, Any]:
+     """
+     Convenience function to generate a personalized interview guide.
+
+     Args:
+         resume_text: Resume content (text or parsed from PDF)
+         job_input: Job posting URL or raw text
+
+     Returns:
+         Complete interview guide with analysis
+     """
+     orchestrator = InterviewGuideOrchestrator()
+     return orchestrator.generate_interview_guide_sync(resume_text, job_input)
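
`generate_interview_guide_sync` above handles the case where `asyncio.run` is invoked while an event loop is already running (as it can be under Gradio or Jupyter) by shipping the coroutine to a fresh loop on a worker thread. A self-contained sketch of the same pattern, with a hypothetical `compute` coroutine standing in for the real pipeline:

```python
import asyncio
import concurrent.futures


async def compute() -> int:
    # Stand-in for generate_interview_guide(...).
    await asyncio.sleep(0)
    return 42


def compute_sync() -> int:
    """Run the coroutine whether or not an event loop is already running."""
    try:
        return asyncio.run(compute())
    except RuntimeError:
        # Already inside a running loop: hand a fresh coroutine to
        # asyncio.run on a worker thread, which gets its own loop.
        # (The original checks the exception message; catching
        # RuntimeError broadly is a simplification.)
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
            return executor.submit(asyncio.run, compute()).result()


# Called from plain sync code, the first branch succeeds:
print(compute_sync())  # → 42

# Called from inside a running loop, the thread fallback is used:
async def caller() -> int:
    return compute_sync()

print(asyncio.run(caller()))  # → 42
```

Note that `future.result()` blocks the caller's loop thread while the worker runs, so this fallback trades responsiveness for correctness; it is best reserved for UI frameworks that insist on a synchronous callback.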
llm_client.py ADDED
@@ -0,0 +1,292 @@
+ import time
+ import os
+ import requests
+ from typing import Any, Dict, Optional, List
+ import openai
+ import anthropic
+ from config import OPENAI_API_KEY, ANTHROPIC_API_KEY, LLM_CONFIG
+ from metrics import log_metric
+
+ class LLMClient:
+     def __init__(self):
+         self.openai_client = openai.OpenAI(api_key=OPENAI_API_KEY)
+         self.anthropic_client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
+         self.last_request_time = 0
+         self.request_count = 0
+
+     def _rate_limit(self):
+         """Simple rate limiting"""
+         current_time = time.time()
+         if current_time - self.last_request_time < 2:  # 2 seconds between requests
+             time.sleep(2 - (current_time - self.last_request_time))
+         self.last_request_time = time.time()
+
+     def call_llm(self, prompt: str, provider: str = "openai",
+                  system: Optional[str] = None, timeout: Optional[float] = None,
+                  **kwargs) -> str:
+         """Call LLM with system prompt and timeout support"""
+         self._rate_limit()
+
+         config = LLM_CONFIG[provider]
+         start_time = time.time()
+
+         # Track tokens for metrics
+         prompt_tokens = len(prompt.split())
+         if system:
+             prompt_tokens += len(system.split())
+
+         try:
+             if provider == "openai":
+                 messages = []
+                 if system:
+                     messages.append({"role": "system", "content": system})
+                 messages.append({"role": "user", "content": prompt})
+
+                 # Build kwargs without temperature conflicts
+                 call_kwargs = {
+                     "model": config["model"],
+                     "messages": messages,
+                     "max_tokens": config["max_tokens"]
+                 }
+
+                 # Add kwargs except temperature
+                 for k, v in kwargs.items():
+                     if k != "temperature":
+                         call_kwargs[k] = v
+
+                 # Set temperature (prioritize kwargs over config)
+                 call_kwargs["temperature"] = kwargs.get("temperature", config["temperature"])
+                 if timeout:
+                     call_kwargs["timeout"] = timeout
+
+                 response = self.openai_client.chat.completions.create(**call_kwargs)
+                 result = response.choices[0].message.content
+
+                 # Log token usage
+                 usage = response.usage
+                 tokens_in = usage.prompt_tokens if usage else prompt_tokens
+                 tokens_out = usage.completion_tokens if usage else len(result.split())
+
+             elif provider == "anthropic":
+                 # Build kwargs without temperature conflicts
+                 call_kwargs = {
+                     "model": config["model"],
+                     "messages": [{"role": "user", "content": prompt}],
+                     "max_tokens": config["max_tokens"]
+                 }
+
+                 # Add kwargs except temperature
+                 for k, v in kwargs.items():
+                     if k != "temperature":
+                         call_kwargs[k] = v
+
+                 # Set temperature (prioritize kwargs over config)
+                 call_kwargs["temperature"] = kwargs.get("temperature", config["temperature"])
+                 if system:
+                     call_kwargs["system"] = system
+                 if timeout:
+                     call_kwargs["timeout"] = timeout
+
+                 response = self.anthropic_client.messages.create(**call_kwargs)
+                 result = response.content[0].text
+
+                 # Log token usage
+                 usage = response.usage
+                 tokens_in = usage.input_tokens if usage else prompt_tokens
+                 tokens_out = usage.output_tokens if usage else len(result.split())
+
+             else:
+                 raise ValueError(f"Unknown provider: {provider}")
+
+             # Calculate approximate cost (rough estimates)
+             usd_cost = self._calculate_cost(provider, tokens_in, tokens_out)
+
+             # Log metrics with enhanced data
+             log_metric("llm_call", {
+                 "provider": provider,
+                 "model": config["model"],
+                 "latency": time.time() - start_time,
+                 "success": True,
+                 "prompt_length": len(prompt),
+                 "response_length": len(result),
+                 "tokens_in": tokens_in,
+                 "tokens_out": tokens_out,
+                 "usd_cost": usd_cost
+             })
+
+             return result
+
+         except Exception as e:
+             log_metric("llm_error", {
+                 "provider": provider,
+                 "error": str(e),
+                 "latency": time.time() - start_time
+             })
+
+             # Try fallback provider
+             fallback = LLM_CONFIG["fallback_provider"]
+             if provider != fallback:
+                 log_metric("fallback_attempt", {"from": provider, "to": fallback})
+                 # Remove temperature from kwargs to avoid duplication
+                 fallback_kwargs = {k: v for k, v in kwargs.items() if k != "temperature"}
+                 return self.call_llm(prompt, fallback, system=system,
+                                      timeout=timeout, **fallback_kwargs)
+             else:
+                 raise Exception(f"Both LLM providers failed. Last error: {e}")
+
+     def _calculate_cost(self, provider: str, tokens_in: int, tokens_out: int) -> float:
+         """Calculate approximate USD cost based on token usage"""
+         # Rough pricing estimates (as of 2024)
+         pricing = {
+             "openai": {
+                 "gpt-4o-mini": {"input": 0.000150, "output": 0.000600}  # per 1K tokens
+             },
+             "anthropic": {
+                 "claude-3-5-sonnet-20241022": {"input": 0.003, "output": 0.015}  # per 1K tokens
+             }
+         }
+
+         model = LLM_CONFIG[provider]["model"]
+         if provider in pricing and model in pricing[provider]:
+             rates = pricing[provider][model]
+             return (tokens_in * rates["input"] + tokens_out * rates["output"]) / 1000
+         return 0.0
+
+
+ def openai_call(text: str, timeout: int) -> str:
+     """
+     Call gpt-4o-mini with temp=0 and max_tokens=400.
+     Returns the JSON string from the assistant.
+     Logs tokens_in, tokens_out, usd_cost via metrics.log_metric().
+     Raises TimeoutError if the call exceeds `timeout` seconds.
+     """
+     system_prompt = """You are an information-extraction engine.
+ Return ONLY valid JSON with these lowercase keys:
+ company, role, location, seniority, posted_hours, salary_low, salary_high,
+ mission, funding, evidence.
+ - mission: company's main value proposition/tagline
+ - funding: recent funding round info if mentioned
+ - evidence maps each non-null key to the sentence fragment (≤120 chars) that proves it
+ Use null if value missing. Do NOT output any extra text."""
+
+     user_prompt = f"""Extract the JSON from this job description:
+ <<<
+ {text[:2000]}
+ >>>"""
+
+     start_time = time.time()
+
+     try:
+         client = openai.OpenAI(api_key=OPENAI_API_KEY)
+
+         response = client.chat.completions.create(
+             model="gpt-4o-mini",
+             messages=[
+                 {"role": "system", "content": system_prompt},
+                 {"role": "user", "content": user_prompt}
+             ],
+             temperature=0,
+             max_tokens=400,
+             timeout=timeout
+         )
+
+         result = response.choices[0].message.content or ""
+
+         # Log metrics
+         usage = response.usage
+         tokens_in = usage.prompt_tokens if usage else len((system_prompt + user_prompt).split())
+         tokens_out = usage.completion_tokens if usage else len(result.split())
+         usd_cost = (tokens_in * 0.000150 + tokens_out * 0.000600) / 1000  # GPT-4o-mini pricing
+
+         log_metric("llm_call", {
+             "provider": "openai",
+             "model": "gpt-4o-mini",
+             "latency": time.time() - start_time,
+             "success": True,
+             "prompt_length": len(user_prompt),
+             "response_length": len(result),
+             "tokens_in": tokens_in,
+             "tokens_out": tokens_out,
+             "usd_cost": usd_cost
+         })
+
+         return result
+
+     except Exception as e:
+         elapsed = time.time() - start_time
+         if elapsed >= timeout:
+             raise TimeoutError(f"OpenAI call exceeded {timeout}s timeout")
+
+         log_metric("llm_error", {
+             "provider": "openai",
+             "error": str(e),
+             "latency": elapsed
+         })
+         raise
+
+
+ def google_search(query: str, top: int = 3, timeout: int = 5) -> List[str]:
+     """
+     SerpAPI/Bing wrapper for Google search.
+     Returns list of relevant text snippets.
+     Logs google_calls, google_latency_ms via metrics.log_metric().
+     """
+     start_time = time.time()
+
+     try:
+         # Use SerpAPI if available, otherwise fall back to basic search
+         from config import SERPAPI_KEY
+         if SERPAPI_KEY:
+             url = "https://serpapi.com/search.json"
+             params = {
+                 "q": query,
+                 "api_key": SERPAPI_KEY,
+                 "num": top,
+                 "hl": "en",
+                 "gl": "us"
+             }
+
+             response = requests.get(url, params=params, timeout=timeout)
+             response.raise_for_status()
+             data = response.json()
+
+             snippets = []
+             for result in data.get("organic_results", [])[:top]:
+                 snippet = result.get("snippet", "")
+                 if snippet:
+                     snippets.append(snippet[:200])  # Limit snippet length
+
+             # Log successful search
+             log_metric("google_search", {
+                 "query": query,
+                 "results_count": len(snippets),
+                 "latency_ms": (time.time() - start_time) * 1000,
+                 "success": True
+             })
+
+             return snippets
+
+         else:
+             # Fallback: return empty results if no API key
+             log_metric("google_search", {
+                 "query": query,
+                 "results_count": 0,
+                 "latency_ms": (time.time() - start_time) * 1000,
+                 "success": False,
+                 "error": "No SERPAPI_KEY available"
+             })
+             return []
+
+     except Exception as e:
+         log_metric("google_search", {
+             "query": query,
+             "results_count": 0,
+             "latency_ms": (time.time() - start_time) * 1000,
+             "success": False,
+             "error": str(e)
+         })
+         return []
+
+
+ # Global client instance
+ llm_client = LLMClient()
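
`_calculate_cost` prices tokens against per-1K-token USD rates and divides the sum by 1000. A standalone sketch of the same arithmetic (rates copied from the table above; they are rough estimates, not authoritative pricing):

```python
# Per-1K-token USD rates, mirroring the shape of the pricing table
# in _calculate_cost. Illustrative numbers only.
PRICING = {
    ("openai", "gpt-4o-mini"): {"input": 0.000150, "output": 0.000600},
    ("anthropic", "claude-3-5-sonnet-20241022"): {"input": 0.003, "output": 0.015},
}


def estimate_cost(provider: str, model: str, tokens_in: int, tokens_out: int) -> float:
    """Approximate USD cost; returns 0.0 for unknown provider/model pairs."""
    rates = PRICING.get((provider, model))
    if rates is None:
        return 0.0
    # Rates are per 1K tokens, so divide the weighted sum by 1000.
    return (tokens_in * rates["input"] + tokens_out * rates["output"]) / 1000


cost = estimate_cost("openai", "gpt-4o-mini", 1000, 400)
```

For 1,000 input and 400 output tokens at these rates the estimate comes to about $0.00039, which illustrates why the code treats cost as a logged metric rather than something worth gating on per call.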
metrics.py ADDED
@@ -0,0 +1,130 @@
+ import json
+ import time
+ from typing import Any, Dict, Optional
+
+ def log_metric(event: str, data: Dict[str, Any]) -> None:
+     """Enhanced logging with support for tokens, cost, and latency metrics"""
+     log = {"event": event, "timestamp": time.time(), **data}
+
+     # Add structured fields for better analytics
+     if event == "llm_call":
+         # Ensure all required fields are present
+         log.setdefault("tokens_in", 0)
+         log.setdefault("tokens_out", 0)
+         log.setdefault("usd_cost", 0.0)
+         log.setdefault("latency", 0.0)
+         log.setdefault("success", False)
+
+     elif event == "enrich_latency":
+         # Track enrichment performance
+         log.setdefault("enrich_parallel_seconds", data.get("total_seconds", 0))
+         log.setdefault("facts_count", 0)
+
+     print(json.dumps(log))
+     # TODO: Add hook for HF Analytics API
+
+ def log_cost_summary(provider: str, total_tokens_in: int, total_tokens_out: int,
+                      total_cost: float, request_count: int) -> None:
+     """Log cost summary for a session"""
+     log_metric("cost_summary", {
+         "provider": provider,
+         "total_tokens_in": total_tokens_in,
+         "total_tokens_out": total_tokens_out,
+         "total_usd_cost": total_cost,
+         "request_count": request_count,
+         "avg_cost_per_request": total_cost / request_count if request_count > 0 else 0.0
+     })
+
+ def log_parallel_performance(serial_time_estimate: float, parallel_time_actual: float) -> None:
+     """Log parallel execution performance gains"""
+     speedup = serial_time_estimate / parallel_time_actual if parallel_time_actual > 0 else 1.0
+     log_metric("parallel_performance", {
+         "serial_time_estimate": serial_time_estimate,
+         "parallel_time_actual": parallel_time_actual,
+         "speedup_factor": speedup,
+         "time_saved_seconds": serial_time_estimate - parallel_time_actual
+     })
+
+ def log_llm_call(provider: str, model: str, latency: float, success: bool,
+                  prompt_length: int, response_length: int, tokens_in: int = 0,
+                  tokens_out: int = 0, usd_cost: float = 0.0) -> None:
+     """Log LLM call metrics with enhanced token and cost tracking."""
+
+     log_metric("llm_call", {
+         "provider": provider,
+         "model": model,
+         "latency": latency,
+         "success": success,
+         "prompt_length": prompt_length,
+         "response_length": response_length,
+         "tokens_in": tokens_in,
+         "tokens_out": tokens_out,
+         "usd_cost": usd_cost
+     })
+
+ def log_google_search(query: str, results_count: int, latency_ms: float,
+                       success: bool, error: Optional[str] = None) -> None:
+     """Log Google search call metrics."""
+
+     data = {
+         "query": query,
+         "results_count": results_count,
+         "latency_ms": latency_ms,
+         "success": success
+     }
+
+     if error:
+         data["error"] = error
+
+     log_metric("google_search", data)
+
+ def log_patch_missing(company: str, patches_applied: int, source_map: Dict[str, str]) -> None:
+     """Log Google patching metrics."""
+
+     log_metric("patch_missing", {
+         "company": company,
+         "patches_applied": patches_applied,
+         "source_map": source_map,
+         "google_calls": len([v for v in source_map.values() if v == "google"])
+     })
+
+ def log_enrich_latency(company: str, total_seconds: float, facts_count: int,
+                        enrich_parallel_seconds: Optional[float] = None) -> None:
+     """Log enrichment pipeline latency."""
+
+     data = {
+         "company": company,
+         "total_seconds": total_seconds,
+         "facts_count": facts_count
+     }
+
+     if enrich_parallel_seconds is not None:
+         data["enrich_parallel_seconds"] = enrich_parallel_seconds
+
+     log_metric("enrich_latency", data)
+
+ def log_render_success(total_length: int, has_qa: bool, has_critique: bool,
+                        quality_score: Optional[float] = None, has_buckets: bool = False) -> None:
+     """Log successful render metrics."""
+
+     log_metric("render_success", {
+         "total_length": total_length,
+         "has_qa": has_qa,
+         "has_critique": has_critique,
+         "quality_score": quality_score,
+         "has_buckets": has_buckets
+     })
+
+ def log_cache_hit(input_text: str) -> None:
+     """Log cache hit for input."""
+
+     log_metric("cache_hit", {
+         "input": input_text[:100] + "..." if len(input_text) > 100 else input_text
+     })
+
+ def log_cache_miss(input_text: str) -> None:
+     """Log cache miss for input."""
+
+     log_metric("cache_miss", {
+         "input": input_text[:100] + "..." if len(input_text) > 100 else input_text
+     })
micro/__init__.py ADDED
@@ -0,0 +1 @@
+ # Micro-functions package
micro/advanced_gap_analysis.py ADDED
@@ -0,0 +1,571 @@
+ """
+ Advanced Gap Analysis - Provides true skills matching with semantic similarity
+ """
+ from typing import Dict, Any, List, Optional
+ from dataclasses import dataclass
+ from difflib import SequenceMatcher
+ 
+ from metrics import log_metric
+ from llm_client import LLMClient
+ 
+ 
+ @dataclass
+ class SkillMatch:
+     """Detailed skill matching result"""
+     job_requirement: str
+     resume_skill: str
+     match_score: float
+     match_type: str  # "strong", "partial", "weak", "missing"
+     importance: str  # "required", "preferred", "nice_to_have"
+     recommendation: str
+     category: str
+ 
+ 
+ @dataclass
+ class GapAnalysisResult:
+     """Complete gap analysis result"""
+     overall_match_score: float
+     strong_matches: List[SkillMatch]
+     partial_matches: List[SkillMatch]
+     gaps: List[SkillMatch]
+     strengths_summary: str
+     gaps_summary: str
+     competitive_advantages: List[str]
+     preparation_priority: List[str]
+     interview_focus_areas: List[str]
+     skill_categories_analysis: Dict[str, float]
+ 
+ 
+ class AdvancedSkillMatcher:
+     """Advanced skill matching with semantic similarity"""
+ 
+     def __init__(self):
+         self.skill_synonyms = {
+             # Programming Languages
+             "python": ["python3", "py", "python programming"],
+             "javascript": ["js", "ecmascript", "node.js", "nodejs"],
+             "typescript": ["ts"],
+             "react": ["reactjs", "react.js"],
+             "vue": ["vue.js", "vuejs"],
+             "angular": ["angularjs"],
+             # Frameworks & Libraries
+             "express": ["express.js", "expressjs"],
+             "django": ["django framework"],
+             "flask": ["flask framework"],
+             "spring": ["spring boot", "spring framework"],
+             "laravel": ["laravel framework"],
+             # Databases
+             "postgresql": ["postgres", "psql"],
+             "mongodb": ["mongo"],
+             "mysql": ["my sql"],
+             # Cloud Platforms
+             "aws": ["amazon web services"],
+             "gcp": ["google cloud platform", "google cloud"],
+             "azure": ["microsoft azure"],
+             # DevOps & Tools
+             "kubernetes": ["k8s"],
+             "docker": ["containers", "containerization"],
+             "jenkins": ["ci/cd"],
+             "git": ["version control", "github", "gitlab"],
+             # Data & ML
+             "machine learning": ["ml", "artificial intelligence", "ai"],
+             "deep learning": ["dl", "neural networks"],
+             "tensorflow": ["tf"],
+             "pytorch": ["torch"],
+             "pandas": ["data analysis"],
+             "numpy": ["numerical computing"],
+             # Other
+             "agile": ["scrum", "kanban"],
+             "restful": ["rest api", "rest apis", "api development"],
+             "microservices": ["micro services", "service oriented architecture"]
+         }
+ 
+     def normalize_skill(self, skill: str) -> str:
+         """Normalize skill name using synonyms"""
+         skill_lower = skill.lower().strip()
+ 
+         # Direct match
+         if skill_lower in self.skill_synonyms:
+             return skill_lower
+ 
+         # Check if it's a synonym
+         for main_skill, synonyms in self.skill_synonyms.items():
+             if skill_lower in synonyms:
+                 return main_skill
+ 
+         return skill_lower
+ 
+     def calculate_similarity(self, skill1: str, skill2: str) -> float:
+         """Calculate semantic similarity between two skills"""
+         norm1 = self.normalize_skill(skill1)
+         norm2 = self.normalize_skill(skill2)
+ 
+         # Direct match after normalization
+         if norm1 == norm2:
+             return 1.0
+ 
+         # Partial matching using SequenceMatcher
+         similarity = SequenceMatcher(None, norm1, norm2).ratio()
+ 
+         # Boost for containing relationships
+         if norm1 in norm2 or norm2 in norm1:
+             similarity = max(similarity, 0.8)
+ 
+         return similarity
+ 
+     def find_best_match(self, job_requirement: str,
+                         resume_skills: List[str]) -> tuple[str, float]:
+         """Find the best matching resume skill for a job requirement"""
+         best_skill = ""
+         best_score = 0.0
+ 
+         for resume_skill in resume_skills:
+             score = self.calculate_similarity(job_requirement, resume_skill)
+             if score > best_score:
+                 best_score = score
+                 best_skill = resume_skill
+ 
+         return best_skill, best_score
+ 
+ 
+ class AdvancedGapAnalysis:
+     """Advanced gap analysis with true skills matching"""
+ 
+     def __init__(self):
+         self.matcher = AdvancedSkillMatcher()
+         self.llm_client = LLMClient()
+ 
+     async def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+         """Main entry point for gap analysis"""
+         try:
+             # Extract resume and job data
+             resume_data = data.get("resume_data_enhanced", {})
+             job_data = data.get("job_data_enhanced", {})
+ 
+             if not resume_data or not job_data:
+                 return {**data, "gap_analysis_advanced": {
+                     "error": "Missing resume or job data"}}
+ 
+             # Perform advanced gap analysis
+             analysis_result = await self._analyze_comprehensive_fit(
+                 resume_data, job_data)
+ 
+             log_metric("gap_analysis_advanced_success", {
+                 "match_score": analysis_result.overall_match_score,
+                 "strong_matches": len(analysis_result.strong_matches),
+                 "gaps": len(analysis_result.gaps)
+             })
+ 
+             return {**data, "gap_analysis_advanced": self._format_result(
+                 analysis_result)}
+ 
+         except Exception as e:
+             log_metric("gap_analysis_advanced_error", {"error": str(e)})
+             return {**data, "gap_analysis_advanced": {
+                 "error": f"Advanced gap analysis failed: {e}"}}
+ 
+     async def _analyze_comprehensive_fit(
+             self, resume_data: Dict[str, Any], job_data: Dict[str, Any]
+     ) -> GapAnalysisResult:
+         """Perform comprehensive fit analysis"""
+ 
+         # Extract skills from resume
+         resume_skills = self._extract_resume_skills(resume_data)
+ 
+         # Extract requirements from job
+         job_requirements = self._extract_job_requirements(job_data)
+ 
+         # Perform detailed matching
+         skill_matches = self._match_skills_detailed(
+             job_requirements, resume_skills)
+ 
+         # Calculate overall score
+         overall_score = self._calculate_overall_score(skill_matches)
+ 
+         # Categorize matches
+         strong_matches = [m for m in skill_matches if m.match_score >= 0.8]
+         partial_matches = [m for m in skill_matches
+                            if 0.4 <= m.match_score < 0.8]
+         gaps = [m for m in skill_matches if m.match_score < 0.4]
+ 
+         # Generate AI-powered analysis
+         strengths_summary = await self._generate_strengths_summary(
+             strong_matches, resume_data)
+         gaps_summary = await self._generate_gaps_summary(gaps, job_data)
+ 
+         # Extract competitive advantages
+         competitive_advantages = self._identify_competitive_advantages(
+             resume_data, job_data, strong_matches)
+ 
+         # Generate preparation priorities
+         preparation_priority = self._generate_preparation_priority(
+             gaps, partial_matches)
+ 
+         # Identify interview focus areas
+         interview_focus = self._identify_interview_focus_areas(
+             strong_matches, gaps)
+ 
+         # Analyze by skill categories
+         categories_analysis = self._analyze_skill_categories(skill_matches)
+ 
+         return GapAnalysisResult(
+             overall_match_score=overall_score,
+             strong_matches=strong_matches,
+             partial_matches=partial_matches,
+             gaps=gaps,
+             strengths_summary=strengths_summary,
+             gaps_summary=gaps_summary,
+             competitive_advantages=competitive_advantages,
+             preparation_priority=preparation_priority,
+             interview_focus_areas=interview_focus,
+             skill_categories_analysis=categories_analysis
+         )
+ 
+     def _extract_resume_skills(self, resume_data: Dict[str, Any]) -> List[str]:
+         """Extract all skills from resume data"""
+         all_skills = []
+ 
+         # Get skills from structured skills section
+         skills_obj = resume_data.get("skills", {})
+         if isinstance(skills_obj, dict):
+             for category, skills_list in skills_obj.items():
+                 if isinstance(skills_list, list):
+                     all_skills.extend(skills_list)
+ 
+         # Get skills from experience
+         experience = resume_data.get("experience", [])
+         for exp in experience:
+             if isinstance(exp, dict):
+                 tech_skills = exp.get("technologies", [])
+                 if isinstance(tech_skills, list):
+                     all_skills.extend(tech_skills)
+ 
+         # Get skills from projects
+         projects = resume_data.get("projects", [])
+         for project in projects:
+             if isinstance(project, dict):
+                 tech_skills = project.get("technologies", [])
+                 if isinstance(tech_skills, list):
+                     all_skills.extend(tech_skills)
+ 
+         # Deduplicate and clean
+         return list(set([skill.strip() for skill in all_skills if skill]))
+ 
+     def _extract_job_requirements(self,
+                                   job_data: Dict[str, Any]) -> List[Dict[str, Any]]:
+         """Extract requirements from job data"""
+         requirements = []
+ 
+         # Handle enhanced job parser structure
+         for req_type in ["tech_requirements", "experience_requirements",
+                          "education_requirements", "soft_skill_requirements"]:
+             structured_reqs = job_data.get(req_type, [])
+             if isinstance(structured_reqs, list):
+                 # Convert JobRequirement objects to dicts if needed
+                 for req in structured_reqs:
+                     if isinstance(req, dict):
+                         requirements.append(req)
+                     else:
+                         # Handle dataclass objects
+                         requirements.append({
+                             "skill": getattr(req, 'skill', str(req)),
+                             "importance": getattr(req, 'importance', 'required'),
+                             "category": getattr(req, 'category', 'technical')
+                         })
+ 
+         # Fallback: legacy requirements key
+         if not requirements:
+             structured_reqs = job_data.get("requirements", [])
+             if isinstance(structured_reqs, list):
+                 requirements.extend(structured_reqs)
+ 
+         # Last fallback: extract from text fields
+         if not requirements:
+             requirements = self._extract_requirements_from_text(job_data)
+ 
+         return requirements
+ 
+     def _extract_requirements_from_text(self,
+                                         job_data: Dict[str, Any]) -> List[Dict[str, Any]]:
+         """Extract requirements from job description text"""
+         requirements = []
+ 
+         # Common skill patterns
+         common_skills = [
+             "python", "javascript", "react", "node.js", "sql", "aws",
+             "docker", "kubernetes", "git", "machine learning", "tensorflow",
+             "django", "flask", "express", "mongodb", "postgresql"
+         ]
+ 
+         # Extract from various text fields
+         text_content = ""
+         for field in ["description", "content", "scraped"]:
+             if field in job_data:
+                 if isinstance(job_data[field], str):
+                     text_content += job_data[field]
+                 elif isinstance(job_data[field], dict):
+                     text_content += str(job_data[field])
+ 
+         text_lower = text_content.lower()
+ 
+         for skill in common_skills:
+             if skill.lower() in text_lower:
+                 requirements.append({
+                     "skill": skill,
+                     "importance": "required",
+                     "category": "technical"
+                 })
+ 
+         return requirements
+ 
+     def _match_skills_detailed(self, job_requirements: List[Dict[str, Any]],
+                                resume_skills: List[str]) -> List[SkillMatch]:
+         """Perform detailed skill matching"""
+         matches = []
+ 
+         for req in job_requirements:
+             req_skill = req.get("skill", "")
+             importance = req.get("importance", "required")
+             category = req.get("category", "technical")
+ 
+             # Find best match
+             best_match, score = self.matcher.find_best_match(
+                 req_skill, resume_skills)
+ 
+             # Determine match type
+             if score >= 0.8:
+                 match_type = "strong"
+             elif score >= 0.4:
+                 match_type = "partial"
+             elif score > 0:
+                 match_type = "weak"
+             else:
+                 match_type = "missing"
+ 
+             # Generate recommendation
+             recommendation = self._generate_skill_recommendation(
+                 req_skill, best_match, score, importance)
+ 
+             matches.append(SkillMatch(
+                 job_requirement=req_skill,
+                 resume_skill=best_match if score > 0 else "Not Found",
+                 match_score=score,
+                 match_type=match_type,
+                 importance=importance,
+                 recommendation=recommendation,
+                 category=category
+             ))
+ 
+         return matches
+ 
+     def _calculate_overall_score(self, matches: List[SkillMatch]) -> float:
+         """Calculate weighted overall match score"""
+         if not matches:
+             return 0.0
+ 
+         # Weight by importance
+         importance_weights = {
+             "required": 1.0,
+             "preferred": 0.7,
+             "nice_to_have": 0.3
+         }
+ 
+         total_weighted_score = 0.0
+         total_weight = 0.0
+ 
+         for match in matches:
+             weight = importance_weights.get(match.importance, 0.5)
+             total_weighted_score += match.match_score * weight
+             total_weight += weight
+ 
+         return (total_weighted_score / total_weight * 100) if total_weight > 0 else 0
+ 
+     def _generate_skill_recommendation(self, job_skill: str, resume_skill: str,
+                                        score: float, importance: str) -> str:
+         """Generate actionable recommendation for skill match"""
+         if score >= 0.8:
+             return f"Highlight your {resume_skill} experience"
+         elif score >= 0.4:
+             return f"Connect your {resume_skill} to {job_skill} requirements"
+         elif importance == "required":
+             return f"Critical: Learn {job_skill} before applying"
+         elif importance == "preferred":
+             return f"Important: Gain experience with {job_skill}"
+         else:
+             return f"Nice-to-have: Consider learning {job_skill}"
+ 
+     async def _generate_strengths_summary(self, strong_matches: List[SkillMatch],
+                                           resume_data: Dict[str, Any]) -> str:
+         """Generate AI-powered strengths summary"""
+         if not strong_matches:
+             return "No strong technical matches found."
+ 
+         skills_list = [match.job_requirement for match in strong_matches[:5]]
+         experience_years = resume_data.get("years_of_experience", 0)
+ 
+         prompt = f"""
+         Based on these strong skill matches: {', '.join(skills_list)} and
+         {experience_years} years of experience, write a 2-sentence summary of
+         the candidate's key strengths for this role.
+         Focus on practical value and competitive advantages.
+         """
+ 
+         try:
+             response = self.llm_client.call_llm(
+                 prompt, temperature=0.3, max_tokens=150)
+             return response.strip()
+         except Exception:
+             return (f"Strong technical foundation in {', '.join(skills_list[:3])} "
+                     f"with {experience_years} years of experience.")
+ 
+     async def _generate_gaps_summary(self, gaps: List[SkillMatch],
+                                      job_data: Dict[str, Any]) -> str:
+         """Generate AI-powered gaps summary"""
+         if not gaps:
+             return "No significant skill gaps identified."
+ 
+         critical_gaps = [gap.job_requirement for gap in gaps
+                          if gap.importance == "required"]
+ 
+         if not critical_gaps:
+             return "No critical skill gaps. Focus on strengthening preferences."
+ 
+         prompt = f"""
+         The candidate is missing these required skills: {', '.join(critical_gaps[:3])}.
+         Write a 2-sentence summary of the main gaps and preparation strategy.
+         Be constructive and actionable.
+         """
+ 
+         try:
+             response = self.llm_client.call_llm(
+                 prompt, temperature=0.3, max_tokens=150)
+             return response.strip()
+         except Exception:
+             return (f"Key gaps in {', '.join(critical_gaps[:2])}. "
+                     "Focus preparation on these critical areas.")
+ 
+     def _identify_competitive_advantages(self, resume_data: Dict[str, Any],
+                                          job_data: Dict[str, Any],
+                                          strong_matches: List[SkillMatch]) -> List[str]:
+         """Identify unique competitive advantages"""
+         advantages = []
+ 
+         # Experience level advantage
+         years_exp = resume_data.get("years_of_experience", 0)
+         if years_exp > 5:
+             advantages.append(f"{years_exp}+ years of proven experience")
+ 
+         # Education advantage
+         education = resume_data.get("education", [])
+         for edu in education:
+             if isinstance(edu, dict) and "degree" in edu:
+                 degree = edu["degree"]
+                 if "master" in degree.lower() or "phd" in degree.lower():
+                     advantages.append(f"Advanced degree: {degree}")
+                     break
+ 
+         # Skill combination advantages
+         strong_skills = [match.job_requirement for match in strong_matches]
+         if len(strong_skills) >= 3:
+             advantages.append(
+                 f"Strong combination: {', '.join(strong_skills[:3])}")
+ 
+         # Project portfolio
+         projects = resume_data.get("projects", [])
+         if len(projects) >= 2:
+             advantages.append(f"Proven track record: {len(projects)} projects")
+ 
+         return advantages[:4]  # Limit to top 4
+ 
+     def _generate_preparation_priority(self, gaps: List[SkillMatch],
+                                        partial_matches: List[SkillMatch]) -> List[str]:
+         """Generate preparation priority list"""
+         priorities = []
+ 
+         # Critical gaps first
+         critical_gaps = [gap.job_requirement for gap in gaps
+                          if gap.importance == "required"]
+         priorities.extend(critical_gaps[:3])
+ 
+         # Important partial matches to strengthen
+         important_partials = [match.job_requirement for match in partial_matches
+                               if match.importance in ["required", "preferred"]]
+         priorities.extend(important_partials[:2])
+ 
+         return priorities[:5]
+ 
+     def _identify_interview_focus_areas(self, strong_matches: List[SkillMatch],
+                                         gaps: List[SkillMatch]) -> List[str]:
+         """Identify areas to focus on during interview"""
+         focus_areas = []
+ 
+         # Highlight strengths
+         if strong_matches:
+             top_strengths = [match.job_requirement for match in strong_matches[:2]]
+             focus_areas.extend([f"Demonstrate {skill} expertise"
+                                 for skill in top_strengths])
+ 
+         # Address concerns proactively
+         critical_gaps = [gap.job_requirement for gap in gaps
+                          if gap.importance == "required"]
+         if critical_gaps:
+             focus_areas.append(
+                 f"Address learning plan for {critical_gaps[0]}")
+ 
+         # General advice
+         focus_areas.extend([
+             "Emphasize problem-solving approach",
+             "Show enthusiasm for learning"
+         ])
+ 
+         return focus_areas[:5]
+ 
+     def _analyze_skill_categories(self, matches: List[SkillMatch]) -> Dict[str, float]:
+         """Analyze performance by skill category"""
+         categories = {}
+ 
+         for match in matches:
+             category = match.category
+             if category not in categories:
+                 categories[category] = []
+             categories[category].append(match.match_score)
+ 
+         # Calculate averages
+         category_scores = {}
+         for category, scores in categories.items():
+             if scores:
+                 category_scores[category] = sum(scores) / len(scores) * 100
+ 
+         return category_scores
+ 
+     def _format_result(self, result: GapAnalysisResult) -> Dict[str, Any]:
+         """Format result for output"""
+         return {
+             "overall_match_score": result.overall_match_score,
+             "strong_matches": [self._format_skill_match(m)
+                                for m in result.strong_matches],
+             "partial_matches": [self._format_skill_match(m)
+                                 for m in result.partial_matches],
+             "gaps": [self._format_skill_match(m) for m in result.gaps],
+             "strengths_summary": result.strengths_summary,
+             "gaps_summary": result.gaps_summary,
+             "competitive_advantages": result.competitive_advantages,
+             "preparation_priority": result.preparation_priority,
+             "interview_focus_areas": result.interview_focus_areas,
+             "skill_categories_analysis": result.skill_categories_analysis,
+             "detailed_matches": [self._format_skill_match(m)
+                                  for m in result.strong_matches +
+                                  result.partial_matches + result.gaps]
+         }
+ 
+     def _format_skill_match(self, match: SkillMatch) -> Dict[str, Any]:
+         """Format individual skill match"""
+         return {
+             "job_requirement": match.job_requirement,
+             "resume_skill": match.resume_skill,
+             "match_score": round(match.match_score, 2),
+             "match_type": match.match_type,
+             "importance": match.importance,
+             "recommendation": match.recommendation,
+             "category": match.category
+         }
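To illustrate the matcher's scoring rules (exact synonym hit scores 1.0, substring containment is floored at 0.8, anything else gets the raw `SequenceMatcher` ratio), here is a condensed, self-contained sketch using a deliberately trimmed synonym table:

```python
from difflib import SequenceMatcher

# Trimmed-down synonym table, for illustration only
SKILL_SYNONYMS = {
    "javascript": ["js", "ecmascript", "node.js", "nodejs"],
    "kubernetes": ["k8s"],
}

def normalize_skill(skill: str) -> str:
    s = skill.lower().strip()
    for main_skill, synonyms in SKILL_SYNONYMS.items():
        if s == main_skill or s in synonyms:
            return main_skill
    return s

def calculate_similarity(skill1: str, skill2: str) -> float:
    n1, n2 = normalize_skill(skill1), normalize_skill(skill2)
    if n1 == n2:                      # exact match after normalization
        return 1.0
    score = SequenceMatcher(None, n1, n2).ratio()
    if n1 in n2 or n2 in n1:          # containment floor
        score = max(score, 0.8)
    return score

print(calculate_similarity("JS", "JavaScript"))    # 1.0 (synonym table)
print(calculate_similarity("k8s", "Kubernetes"))   # 1.0
print(calculate_similarity("java", "javascript"))  # 0.8 (containment floor)
```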
micro/bucket_enrich.py ADDED
@@ -0,0 +1,301 @@
1
+ import asyncio
2
+ import time
3
+ import re
4
+ from typing import Dict, List, Optional, Any
5
+ import requests
6
+ from bs4 import BeautifulSoup
7
+ from selenium import webdriver
8
+ from selenium.webdriver.chrome.options import Options
9
+ from selenium.webdriver.common.by import By
10
+ from selenium.webdriver.support.ui import WebDriverWait
11
+ from selenium.webdriver.support import expected_conditions as EC
12
+ from selenium.common.exceptions import TimeoutException, WebDriverException
13
+ from webdriver_manager.chrome import ChromeDriverManager
14
+ from selenium.webdriver.chrome.service import Service
15
+ from metrics import log_metric
16
+
17
+ class BucketEnrichMicroFunction:
18
+ def __init__(self):
19
+ self.chrome_options = Options()
20
+ self.chrome_options.add_argument('--headless')
21
+ self.chrome_options.add_argument('--no-sandbox')
22
+ self.chrome_options.add_argument('--disable-dev-shm-usage')
23
+ self.chrome_options.add_argument('--disable-gpu')
24
+ self.chrome_options.add_argument('--window-size=1920,1080')
25
+ self.chrome_options.add_argument('--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36')
26
+
27
+ def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
28
+ """Main enrichment pipeline with async parallel execution"""
29
+ start_time = time.time()
30
+ enriched_data = data.get("enriched", {})
31
+
32
+ company = enriched_data.get("company", "Unknown")
33
+ location = enriched_data.get("location", "Unknown")
34
+ raw_input = data.get("raw_input", "")
35
+
36
+ # Skip if no company identified
37
+ if company in ["Unknown", "", None, "Not specified"]:
38
+ log_metric("bucket_enrich_skip", {"reason": "no_company"})
39
+ return {**data, "bucket_facts": {}}
40
+
41
+ try:
42
+ # Run enrichments in parallel using asyncio
43
+ bucket_facts = asyncio.run(self._async_enrich_all(company, location, raw_input))
44
+
45
+ # Log enrichment latency
46
+ total_time = time.time() - start_time
47
+ log_metric("enrich_latency", {
48
+ "company": company,
49
+ "total_seconds": total_time,
50
+ "facts_count": len(bucket_facts)
51
+ })
52
+ log_metric("enrich_parallel_seconds", {"value": total_time})
53
+
54
+ return {**data, "bucket_facts": bucket_facts}
55
+
56
+ except Exception as e:
57
+ log_metric("bucket_enrich_error", {"company": company, "error": str(e)})
58
+ return {**data, "bucket_facts": {}}
59
+
60
+ async def _async_enrich_all(self, company: str, location: str, raw_input: str) -> Dict[str, str]:
61
+ """Run all enrichments in parallel"""
62
+ # Prepare tasks for parallel execution
63
+ tasks = []
64
+
65
+ # Manager & Team enrichment (LinkedIn-based)
66
+ if "linkedin.com" in raw_input:
67
+ tasks.append(self._async_manager_enrich(raw_input))
68
+ else:
69
+ tasks.append(self._async_empty_result())
70
+
71
+ # Company-based enrichments
72
+ if company not in ["Unknown", "", None, "Not specified"]:
73
+ tasks.extend([
74
+ self._async_stack_enrich(company),
75
+ self._async_biz_enrich(company),
76
+ self._async_comp_enrich(company, location),
77
+ self._async_culture_enrich(company)
78
+ ])
79
+ else:
80
+ tasks.extend([
81
+ self._async_empty_result(),
82
+ self._async_empty_result(),
83
+ self._async_empty_result(),
84
+ self._async_empty_result()
85
+ ])
86
+
87
+ # Execute all tasks in parallel
88
+ results = await asyncio.gather(*tasks, return_exceptions=True)
89
+
90
+ # Merge results
91
+ bucket_facts = {}
92
+ for result in results:
93
+ if isinstance(result, dict):
94
+ bucket_facts.update(result)
95
+ elif isinstance(result, Exception):
96
+ log_metric("async_enrich_error", {"error": str(result)})
97
+
98
+ return bucket_facts
99
+
100
+ async def _async_empty_result(self) -> Dict[str, str]:
101
+ """Return empty result for skipped enrichments"""
102
+ return {}
103
+
104
+ async def _async_manager_enrich(self, linkedin_url: str) -> Dict[str, str]:
105
+ """Async wrapper for manager enrichment"""
106
+ return await asyncio.get_event_loop().run_in_executor(
107
+ None, self.manager_enrich, linkedin_url
108
+ )
109
+
110
+ async def _async_stack_enrich(self, company: str) -> Dict[str, str]:
111
+ """Async wrapper for stack enrichment"""
112
+ return await asyncio.get_event_loop().run_in_executor(
113
+ None, self.stack_enrich, company
114
+ )
115
+
116
+ async def _async_biz_enrich(self, company: str) -> Dict[str, str]:
117
+ """Async wrapper for business enrichment"""
118
+ return await asyncio.get_event_loop().run_in_executor(
119
+ None, self.biz_enrich, company
120
+ )
121
+
122
+ async def _async_comp_enrich(self, company: str, location: str) -> Dict[str, str]:
123
+ """Async wrapper for compensation enrichment"""
124
+ return await asyncio.get_event_loop().run_in_executor(
125
+ None, self.comp_enrich, company, location
126
+ )
127
+
128
+ async def _async_culture_enrich(self, company: str) -> Dict[str, str]:
129
+ """Async wrapper for culture enrichment"""
130
+ return await asyncio.get_event_loop().run_in_executor(
131
+ None, self.culture_enrich, company
132
+ )
133
+
134
+ def manager_enrich(self, linkedin_url: str) -> Dict[str, str]:
135
+ """Extract hiring manager and team info from LinkedIn job page"""
136
+ facts = {}
137
+
138
+ try:
139
+ options = Options()
140
+ options.add_argument("--headless")
141
+ options.add_argument("--no-sandbox")
142
+ options.add_argument("--disable-dev-shm-usage")
143
+ options.add_argument("--user-data-dir=/tmp/chrome_user_data")
144
+
145
+ driver = webdriver.Chrome(
146
+ service=Service(ChromeDriverManager().install()),
147
+ options=options
148
+ )
149
+
150
+ driver.get(linkedin_url)
151
+ time.sleep(2)
152
+
153
+ # Look for hiring manager info
154
+ try:
155
+ manager_element = driver.find_element(By.CSS_SELECTOR, '[data-test-id="hiring-manager"]')
156
+ if manager_element:
157
+ facts["hiring_manager"] = manager_element.text.strip()
158
+ except:
159
+ pass
160
+
161
+ # Look for team size indicators
162
+ try:
163
+ team_elements = driver.find_elements(By.XPATH, "//*[contains(text(), 'team') or contains(text(), 'employees')]")
164
+ for element in team_elements[:2]:
165
+ text = element.text.lower()
166
+ if any(keyword in text for keyword in ["team of", "team size", "employees"]):
167
+ facts["team_info"] = element.text.strip()
168
+ break
169
+ except:
170
+ pass
171
+
172
+ driver.quit()
173
+
174
+ except Exception as e:
175
+ log_metric("manager_enrich_error", {"url": linkedin_url, "error": str(e)})
176
+ facts["manager_error"] = f"Failed to extract manager info: {str(e)}"
177
+
178
+ return facts
179
+
180
+ def stack_enrich(self, company: str) -> Dict[str, str]:
181
+ """Get tech stack info from StackShare and GitHub"""
182
+ facts = {}
183
+
184
+ try:
185
+ # StackShare lookup (2s timeout)
186
+ stackshare_url = f"https://stackshare.io/{company.lower().replace(' ', '-')}"
187
+ response = requests.get(stackshare_url, timeout=2)
188
+
189
+ if response.status_code == 200:
190
+ soup = BeautifulSoup(response.content, 'html.parser')
191
+
192
+ # Extract popular tools
193
+ tool_elements = soup.find_all(class_=re.compile("tool|stack"))
194
+ tools = []
195
+ for elem in tool_elements[:10]:
196
+ text = elem.get_text().strip()
197
+ if text and len(text) < 50:
198
+ tools.append(text)
199
+
200
+ if tools:
201
+ facts["tech_stack"] = f"Popular tools: {', '.join(tools[:5])}"
202
+
203
+ except Exception as e:
204
+ log_metric("stack_enrich_error", {"company": company, "error": str(e)})
205
+
206
+ return facts
207
+
208
+ def biz_enrich(self, company: str) -> Dict[str, str]:
209
+ """Get business context from recent news and company info"""
210
+ facts = {}
211
+
212
+ try:
213
+ # Recent news search (2s timeout)
214
+ search_query = f"{company} news site:techcrunch.com OR site:bloomberg.com OR site:reuters.com"
215
+ search_url = f"https://www.google.com/search?q={search_query}&tbm=nws&tbs=qdr:m2"
216
+
217
+ headers = {
218
+ 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
219
+ }
220
+
+            response = requests.get(search_url, headers=headers, timeout=2)
+            if response.status_code == 200:
+                soup = BeautifulSoup(response.content, 'html.parser')
+
+                # Extract recent headlines
+                headlines = []
+                for elem in soup.find_all(['h3', 'h4'], limit=3):
+                    if elem.text.strip():
+                        headlines.append(elem.text.strip())
+
+                if headlines:
+                    facts["recent_news"] = " | ".join(headlines[:2])
+
+        except Exception:
+            pass
+
+        # Basic company info
+        try:
+            # Simple company lookup
+            facts["company_domain"] = f"{company.lower().replace(' ', '')}.com"
+
+        except Exception:
+            pass
+
+        return facts
+
+    def comp_enrich(self, company: str, location: str) -> Dict[str, str]:
+        """Get compensation data from levels.fyi"""
+        facts = {}
+
+        try:
+            # Levels.fyi lookup (2s timeout)
+            levels_url = f"https://www.levels.fyi/companies/{company.lower().replace(' ', '-')}"
+            response = requests.get(levels_url, timeout=2)
+
+            if response.status_code == 200:
+                soup = BeautifulSoup(response.content, 'html.parser')
+
+                # Look for salary ranges
+                salary_elements = soup.find_all(text=re.compile(r'\$\d{2,3}[kK]'))
+                if salary_elements:
+                    salaries = [elem.strip() for elem in salary_elements[:3]]
+                    facts["salary_range_levels"] = " - ".join(salaries)
+                    facts["levels_url"] = f"🔗 {levels_url}"
+
+        except Exception:
+            pass
+
+        return facts
+
+    def culture_enrich(self, company: str) -> Dict[str, str]:
+        """Get culture and work-life balance info from Blind"""
+        facts = {}
+
+        try:
+            # Blind company lookup (2s timeout)
+            blind_url = f"https://www.teamblind.com/company/{company.lower().replace(' ', '-')}"
+            headers = {
+                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
+            }
+
+            response = requests.get(blind_url, headers=headers, timeout=2)
+
+            if response.status_code == 200:
+                soup = BeautifulSoup(response.content, 'html.parser')
+
+                # Look for ratings
+                rating_elements = soup.find_all(text=re.compile(r'\d\.\d'))
+                if rating_elements:
+                    facts["blind_rating"] = rating_elements[0].strip()
+                    facts["blind_url"] = f"🔗 {blind_url}"
+
+                # Look for culture keywords
+                culture_keywords = soup.find_all(text=re.compile(r'work.?life|culture|benefits|remote'))
+                if culture_keywords:
+                    facts["culture_mentions"] = " | ".join([kw.strip() for kw in culture_keywords[:2]])
+
+        except Exception:
+            pass
+
+        return facts
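The `company_domain` fallback above derives a domain purely from the company name. A standalone sketch of that heuristic (the function name here is illustrative, not part of the module):

```python
def company_domain(company: str) -> str:
    # Mirror of the lookup in company_enrich: lowercase, drop spaces, append .com
    return f"{company.lower().replace(' ', '')}.com"

print(company_domain("Acme Corp"))  # acmecorp.com
```

Note this is a best-effort guess, not a verified domain; multi-word brands with non-`.com` TLDs will miss.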
micro/critique.py ADDED
@@ -0,0 +1,69 @@
+from typing import Any, Dict
+from llm_client import llm_client
+from prompt_loader import prompt_loader
+from metrics import log_metric
+
+class CritiqueMicroFunction:
+    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+        draft = data.get("draft", "")
+        qa_result = data.get("qa_result", "")
+        enriched_data = data.get("enriched", {})
+
+        if not draft or "failed" in draft.lower():
+            return {**data, "critique": "Critique skipped due to draft errors."}
+
+        try:
+            # Use LLM to provide detailed critique
+            prompt = prompt_loader.get_prompt("critique_prompt", draft=draft)
+
+            critique_prompt = prompt + f"""
+
+Provide a comprehensive critique of this job role preview, focusing on:
+
+## Critical Analysis
+1. **Factual Accuracy**: Cross-check details against source data
+2. **Market Reality**: Are salary/requirements realistic for the role/level?
+3. **Completeness**: Missing critical information?
+4. **Tone & Style**: Appropriate for job seekers?
+5. **Actionability**: Are recommendations specific and useful?
+
+## Context
+- QA Results: {qa_result}
+- Source Job Data: {enriched_data}
+
+## Content to Critique
+{draft}
+
+## Critique Format
+**Strengths**: What works well
+**Weaknesses**: Areas needing improvement
+**Factual Issues**: Any inaccuracies found
+**Market Insights**: Industry-specific observations
+**Recommendations**: Specific improvements
+**Risk Assessment**: Potential issues for job seekers
+**Overall Score**: [1-10] with justification
+
+Be constructive but thorough. This critique helps ensure job seekers get accurate, helpful guidance.
+"""
+
+            critique_response = llm_client.call_llm(critique_prompt)
+
+            # Extract overall score if present
+            score = None
+            if "overall score" in critique_response.lower():
+                import re
+                score_match = re.search(r'(\d+(?:\.\d+)?)/10|(\d+(?:\.\d+)?)\s*(?:out of|/)\s*10', critique_response.lower())
+                if score_match:
+                    score = float(score_match.group(1) or score_match.group(2))
+
+            log_metric("critique_success", {
+                "content_length": len(draft),
+                "critique_length": len(critique_response),
+                "quality_score": score
+            })
+
+            return {**data, "critique": critique_response, "quality_score": score}
+
+        except Exception as e:
+            log_metric("critique_error", {"error": str(e)})
+            return {**data, "critique": f"Critique failed: {e}"}
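The score-extraction regex in `run` handles both `8.5/10` and `7 out of 10` spellings. A self-contained sketch of that logic (the `extract_score` wrapper is illustrative):

```python
import re

# Same pattern critique.py uses to pull an "Overall Score" out of the LLM response
SCORE_RE = re.compile(r'(\d+(?:\.\d+)?)/10|(\d+(?:\.\d+)?)\s*(?:out of|/)\s*10')

def extract_score(text: str):
    # Lowercase first, as the module does, then take whichever alternative matched
    m = SCORE_RE.search(text.lower())
    if not m:
        return None
    return float(m.group(1) or m.group(2))

print(extract_score("**Overall Score**: 8.5/10"))  # 8.5
```

Only one of the two capture groups is populated per match, hence the `group(1) or group(2)` fallback.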
micro/draft.py ADDED
@@ -0,0 +1,75 @@
+from typing import Any, Dict
+from llm_client import llm_client
+from prompt_loader import prompt_loader
+from metrics import log_metric
+
+class DraftMicroFunction:
+    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+        enriched_data = data.get("enriched", {})
+        scraped_text = data.get("scraped_text", "")
+
+        if not enriched_data or enriched_data.get("error"):
+            return {**data, "draft": "Unable to draft content due to enrichment errors."}
+
+        try:
+            # Prepare context for drafting
+            context = {
+                "role": enriched_data.get("role", "Unknown Role"),
+                "company": enriched_data.get("company", "Unknown Company"),
+                "level": enriched_data.get("level", "Unknown Level"),
+                "requirements": enriched_data.get("requirements", []),
+                "responsibilities": enriched_data.get("responsibilities", []),
+                "tech_stack": enriched_data.get("tech_stack", []),
+                "salary_range": enriched_data.get("salary_range", "Not specified"),
+                "work_mode": enriched_data.get("work_mode", "Not specified")
+            }
+
+            # Use LLM to draft comprehensive content
+            prompt = prompt_loader.get_prompt("draft_prompt", job_data=str(context))
+
+            detailed_prompt = prompt + f"""
+
+Based on this job data: {context}
+
+Create a comprehensive role preview and interview preparation kit with:
+
+## 🎯 Role Overview
+- Role summary and key focus areas
+- Company context and culture fit
+
+## 📋 Key Responsibilities & Requirements
+- Core responsibilities breakdown
+- Must-have vs nice-to-have skills
+- Technical requirements analysis
+
+## 💰 Compensation & Benefits
+- Salary analysis and market context
+- Benefits and perquisites
+
+## 🎯 Interview Preparation
+- Likely interview questions based on the role
+- Technical topics to review
+- Company-specific research areas
+- Questions to ask the interviewer
+
+## 🚀 Next Steps
+- Application strategy
+- Timeline expectations
+- Follow-up recommendations
+
+Format as clear, actionable markdown suitable for job seekers.
+"""
+
+            draft_content = llm_client.call_llm(detailed_prompt)
+
+            log_metric("draft_success", {
+                "role": context["role"],
+                "company": context["company"],
+                "content_length": len(draft_content)
+            })
+
+            return {**data, "draft": draft_content}
+
+        except Exception as e:
+            log_metric("draft_error", {"error": str(e)})
+            return {**data, "draft": f"Draft generation failed: {e}"}
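Each micro-function in this pipeline takes the shared `data` dict and returns `{**data, ...}`, merging its output without mutating upstream results, so steps compose by simple chaining. A minimal sketch of that contract (the step classes and `run_pipeline` helper here are illustrative, not part of the repo):

```python
from typing import Any, Dict

class AddRole:
    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Merge a new key; dict-spread leaves the caller's dict untouched
        return {**data, "role": "Data Scientist"}

class AddDraft:
    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Downstream steps read whatever upstream steps produced
        return {**data, "draft": f"Preview for {data.get('role', 'Unknown Role')}"}

def run_pipeline(data: Dict[str, Any], steps) -> Dict[str, Any]:
    for step in steps:
        data = step.run(data)
    return data

result = run_pipeline({}, [AddRole(), AddDraft()])
```

The dict-spread style also makes error paths uniform: a failing step still returns the accumulated state plus an error message, as `DraftMicroFunction` does.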
micro/enhanced_guide_renderer.py ADDED
@@ -0,0 +1,300 @@
+from typing import Any, Dict, List, Optional
+from metrics import log_metric
+
+class EnhancedGuideRenderer:
+    """Enhanced guide renderer for PersonalizedInterviewGuide data structure"""
+
+    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+        """Render PersonalizedInterviewGuide to formatted markdown"""
+
+        interview_guide = data.get("personalized_guide", {})
+        resume_data = data.get("resume_data_enhanced", {})
+        job_data = data.get("job_data_enhanced", {})
+        gap_analysis = data.get("gap_analysis_advanced", {})
+
+        if not interview_guide or "error" in interview_guide:
+            return {**data, "rendered_guide": "# Interview Guide Generation Failed\n\nPlease try again with valid resume and job data."}
+
+        try:
+            # Render comprehensive markdown guide
+            rendered_guide = self._render_personalized_guide(
+                interview_guide, job_data, gap_analysis
+            )
+
+            log_metric("guide_render_success", {
+                "total_length": len(rendered_guide),
+                "sections_count": 12
+            })
+
+            return {**data, "rendered_guide": rendered_guide}
+
+        except Exception as e:
+            log_metric("guide_render_error", {"error": str(e)})
+            fallback = self._create_fallback_guide(interview_guide, job_data, gap_analysis)
+            return {**data, "rendered_guide": fallback}
+
+    def _render_personalized_guide(self, guide: Dict[str, Any], job_data: Dict[str, Any], gap_analysis: Dict[str, Any]) -> str:
+        """Render complete personalized interview guide"""
+
+        # Extract data with fallbacks
+        header = guide.get("header", {})
+        role = header.get("role", job_data.get("role", "Data Scientist"))
+        company = header.get("company", job_data.get("company", "Spotify"))
+        match_score = header.get("match_score", gap_analysis.get("overall_match_score", 0))
+        match_level = header.get("match_level", "Good Match")
+        match_emoji = header.get("match_emoji", "🟢")
+
+        sections = []
+
+        # Header with correct match score
+        sections.append(f"""# 🎯 Personalized Interview Guide: {role} at {company}
+
+**Match Score**: {match_emoji} {match_level} ({match_score:.1f}%)
+
+---""")
+
+        # Executive Summary
+        executive_summary = guide.get("executive_summary", "")
+        if executive_summary:
+            sections.append(f"""## Introduction
+
+{executive_summary}""")
+
+        # Skills Analysis
+        skills_analysis = guide.get("skills_analysis", {})
+        if skills_analysis:
+            sections.append(self._render_skills_analysis(skills_analysis, gap_analysis))
+
+        # Interview Process
+        interview_process = guide.get("interview_process", {})
+        if interview_process:
+            sections.append(f"""## What Is the Interview Process Like at {company}?
+
+{interview_process.get('content', '')}
+
+**Why This Matters**: {interview_process.get('why_important', '')}
+
+**Action Items** ({interview_process.get('time_to_complete', '30 minutes')}):
+{self._format_action_items(interview_process.get('action_items', []))}""")
+
+        # Technical Questions
+        technical_questions = guide.get("technical_questions", [])
+        if technical_questions:
+            sections.append(self._render_questions_section(
+                "🔧 Technical & Problem-Solving Questions",
+                technical_questions,
+                f"These questions test your technical knowledge for the {role} role. Focus on demonstrating both your understanding and problem-solving approach."
+            ))
+
+        # Behavioral Questions
+        behavioral_questions = guide.get("behavioral_questions", [])
+        if behavioral_questions:
+            sections.append(self._render_questions_section(
+                "🎯 Behavioral & Experience Questions",
+                behavioral_questions,
+                "Use the STAR method (Situation, Task, Action, Result) to structure your responses. Draw from specific examples in your background."
+            ))
+
+        # Company Questions
+        company_questions = guide.get("company_questions", [])
+        if company_questions:
+            sections.append(self._render_questions_section(
+                "🏢 Company & Culture Questions",
+                company_questions,
+                f"These questions assess your interest in {company} and cultural fit. Research thoroughly and be genuine in your responses."
+            ))
+
+        # Preparation Strategy
+        preparation_strategy = guide.get("preparation_strategy", {})
+        if preparation_strategy:
+            sections.append(f"""## 🎯 Preparation Strategy
+
+{preparation_strategy.get('content', '')}
+
+**Why This Matters**: {preparation_strategy.get('why_important', '')}
+
+**Action Items** ({preparation_strategy.get('time_to_complete', '2-3 hours')}):
+{self._format_action_items(preparation_strategy.get('action_items', []))}""")
+
+        # Key Talking Points
+        talking_points = guide.get("talking_points", {})
+        if talking_points:
+            sections.append(f"""## 💬 Key Talking Points
+
+Highlight these specific achievements and experiences during your interview:
+
+{talking_points.get('content', '')}
+
+**Why This Matters**: {talking_points.get('why_important', '')}
+
+**Action Items** ({talking_points.get('time_to_complete', '1 hour')}):
+{self._format_action_items(talking_points.get('action_items', []))}""")
+
+        # Smart Questions to Ask
+        questions_to_ask = guide.get("questions_to_ask", [])
+        if questions_to_ask:
+            sections.append(f"""## ❓ Smart Questions to Ask
+
+Show your engagement and strategic thinking with these questions:
+
+{self._format_questions_list(questions_to_ask)}""")
+
+        # Day-of Preparation
+        day_of_preparation = guide.get("day_of_preparation", {})
+        if day_of_preparation:
+            sections.append(f"""## 📅 Day-of-Interview Preparation
+
+{day_of_preparation.get('content', '')}
+
+**Action Items** ({day_of_preparation.get('time_to_complete', '1 hour before')}):
+{self._format_action_items(day_of_preparation.get('action_items', []))}""")
+
+        # Success Metrics
+        success_metrics = guide.get("success_metrics", [])
+        if success_metrics:
+            sections.append(f"""## ✅ Success Metrics
+
+You'll know the interview went well if:
+
+{self._format_success_metrics(success_metrics)}""")
+
+        # Footer
+        sections.append(f"""## 🚀 Conclusion
+
+You're well-prepared for this {role} interview at {company}! Your {match_score:.1f}% match score indicates strong alignment with their requirements.
+
+**Remember**:
+- Be authentic and confident
+- Ask thoughtful questions
+- Show enthusiasm for {company}
+- Highlight your unique value proposition
+
+Good luck with your interview! 🚀
+
+---
+
+*This personalized guide was generated based on your specific background and the target role requirements.*""")
+
+        return "\n\n".join(sections)
+
+    def _render_skills_analysis(self, skills_analysis: Dict[str, Any], gap_analysis: Dict[str, Any]) -> str:
+        """Render skills analysis with proper data"""
+
+        # Get data from gap_analysis as fallback
+        strong_matches = gap_analysis.get("strong_matches", [])
+        partial_matches = gap_analysis.get("partial_matches", [])
+        gaps = gap_analysis.get("gaps", [])
+
+        # Create visual representation
+        strong_count = len(strong_matches)
+        partial_count = len(partial_matches)
+        gaps_count = len(gaps)
+        total = strong_count + partial_count + gaps_count
+
+        if total > 0:
+            strong_bar = "█" * min(20, int((strong_count / total) * 20)) if strong_count > 0 else ""
+            partial_bar = "▒" * min(20, int((partial_count / total) * 20)) if partial_count > 0 else ""
+            gaps_bar = "░" * min(20, int((gaps_count / total) * 20)) if gaps_count > 0 else ""
+        else:
+            strong_bar = partial_bar = gaps_bar = ""
+
+        # Get skill names
+        strong_names = [match.get("resume_skill", match.get("job_requirement", "")) for match in strong_matches[:5]]
+        partial_names = [match.get("resume_skill", match.get("job_requirement", "")) for match in partial_matches[:5]]
+        gap_names = [gap.get("job_requirement", "") for gap in gaps[:5]]
+
+        return f"""## 📊 Skills Match Analysis
+
+**Overall Assessment**: {skills_analysis.get('summary', 'Strong technical background with relevant experience in data science and analytics.')}
+
+### Skills Breakdown
+```
+Strong Matches {strong_bar} {strong_count}
+Partial Matches {partial_bar} {partial_count}
+Skill Gaps {gaps_bar} {gaps_count}
+```
+
+**✅ Your Strengths**: {', '.join(strong_names) if strong_names else 'Core technical skills identified'}
+
+{f"**⚡ Areas to Highlight**: {', '.join(partial_names)}" if partial_names else ""}
+
+{f"**📚 Priority Learning**: {', '.join(gap_names)}" if gap_names else ""}"""
+
+    def _render_questions_section(self, title: str, questions: List[Dict], intro: str) -> str:
+        """Render questions with proper formatting"""
+
+        section = f"""## {title}
+
+{intro}
+
+"""
+
+        for i, q in enumerate(questions, 1):
+            question_text = q.get("question", "")
+            difficulty = q.get("difficulty", "medium")
+            why_asked = q.get("why_asked", "")
+            approach_strategy = q.get("approach_strategy", "")
+            example_points = q.get("example_points", [])
+
+            # Difficulty indicator
+            if difficulty == "hard":
+                diff_icon = "🔴"
+            elif difficulty == "medium":
+                diff_icon = "🟡"
+            else:
+                diff_icon = "🟢"
+
+            section += f"""### {diff_icon} Question {i}: {question_text}
+
+**Why they ask this**: {why_asked}
+
+**How to approach**: {approach_strategy}
+
+{f"**Key points to mention**: {', '.join(example_points[:3])}" if example_points else ""}
+
+---
+
+"""
+
+        return section.rstrip()
+
+    def _format_action_items(self, items: List[str]) -> str:
+        """Format action items as bullet list"""
+        return "\n".join([f"- {item}" for item in items]) if items else "- Review your background and prepare examples"
+
+    def _format_questions_list(self, questions: List[str]) -> str:
+        """Format questions as numbered list"""
+        return "\n".join([f"{i}. {q}" for i, q in enumerate(questions, 1)]) if questions else "1. What excites you most about this role?"
+
+    def _format_success_metrics(self, metrics: List[str]) -> str:
+        """Format success metrics as bullet list"""
+        return "\n".join([f"- {metric}" for metric in metrics]) if metrics else "- Strong rapport with interviewers\n- Technical discussions flow naturally\n- You feel confident about your responses"
+
+    def _create_fallback_guide(self, guide: Dict[str, Any], job_data: Dict[str, Any], gap_analysis: Dict[str, Any]) -> str:
+        """Create fallback guide if rendering fails"""
+
+        role = job_data.get("role", "Data Scientist")
+        company = job_data.get("company", "the company")
+        match_score = gap_analysis.get("overall_match_score", 0)
+
+        return f"""# 🎯 Interview Guide: {role} at {company}
+
+**Match Score**: {match_score:.1f}%
+
+## Summary
+You have a strong background that aligns well with this {role} position. Focus your preparation on highlighting your relevant experience and technical skills.
+
+## Key Preparation Areas
+- Review your technical projects and be ready to discuss them in detail
+- Prepare specific examples using the STAR method
+- Research {company} and prepare thoughtful questions
+- Practice explaining complex concepts in simple terms
+
+## Technical Focus Areas
+Based on your background, be prepared to discuss:
+- Data analysis and visualization
+- Programming experience (Python, SQL, etc.)
+- Statistical methods and machine learning
+- Project outcomes and business impact
+
+Good luck with your interview! 🚀"""
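The skills-breakdown bars in `_render_skills_analysis` scale each count to a text bar proportional to the total, capped at 20 characters. That scaling, isolated as a sketch (the `skill_bar` helper is illustrative):

```python
def skill_bar(count: int, total: int, ch: str, width: int = 20) -> str:
    # Same arithmetic as _render_skills_analysis: proportional fill, capped at `width`,
    # empty string when there is nothing to draw
    if total == 0 or count == 0:
        return ""
    return ch * min(width, int((count / total) * width))

print(skill_bar(3, 6, "█"))  # ██████████ (10 chars: half of a 20-char bar)
```

Because `int()` truncates, small fractions round down; 1 match out of 30 produces an empty bar, which is acceptable here since the numeric count is printed next to it.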
micro/enhanced_job_parser.py ADDED
@@ -0,0 +1,472 @@
+"""
+Enhanced Job Requirements Parser
+Extracts structured job requirements with experience levels, context, and priorities
+"""
+
+from typing import Any, Dict, List, Optional, Tuple
+import re
+from dataclasses import dataclass, asdict
+from llm_client import LLMClient
+from metrics import log_metric
+import json
+
+@dataclass
+class JobRequirement:
+    skill: str
+    category: str  # technical, experience, education, soft_skill, certification
+    importance: str  # required, preferred, nice_to_have
+    experience_level: str  # entry, mid, senior, expert
+    context: str  # Where this requirement was mentioned
+    synonyms: Optional[List[str]] = None
+
+    def __post_init__(self):
+        if self.synonyms is None:
+            self.synonyms = []
+
+@dataclass
+class JobStructuredData:
+    role: str
+    company: str
+    location: str
+    seniority_level: str
+    tech_requirements: List[JobRequirement]
+    experience_requirements: List[JobRequirement]
+    education_requirements: List[JobRequirement]
+    soft_skill_requirements: List[JobRequirement]
+    responsibilities: List[str]
+    nice_to_haves: List[str]
+    years_experience_required: int
+    salary_range: Tuple[Optional[int], Optional[int]]
+    remote_type: str  # remote, hybrid, onsite
+    company_stage: str  # startup, scale-up, enterprise
+
+    def __post_init__(self):
+        if not self.tech_requirements:
+            self.tech_requirements = []
+        if not self.experience_requirements:
+            self.experience_requirements = []
+        if not self.education_requirements:
+            self.education_requirements = []
+        if not self.soft_skill_requirements:
+            self.soft_skill_requirements = []
+        if not self.responsibilities:
+            self.responsibilities = []
+        if not self.nice_to_haves:
+            self.nice_to_haves = []
+
+class RequirementsNormalizer:
+    """Normalizes job requirements and identifies synonyms"""
+
+    def __init__(self):
+        self.experience_indicators = {
+            "entry": ["entry", "junior", "0-2 years", "new grad", "graduate", "recent grad", "0+ years"],
+            "mid": ["mid", "intermediate", "2-5 years", "3-5 years", "2+ years", "3+ years"],
+            "senior": ["senior", "5+ years", "5-8 years", "experienced", "6+ years", "7+ years"],
+            "expert": ["expert", "lead", "principal", "8+ years", "10+ years", "architect"]
+        }
+
+        self.importance_indicators = {
+            "required": ["required", "must have", "essential", "mandatory", "necessary"],
+            "preferred": ["preferred", "desired", "strong preference", "ideal"],
+            "nice_to_have": ["nice to have", "plus", "bonus", "would be great", "advantageous"]
+        }
+
+        self.category_patterns = {
+            "technical": [
+                r"\b(python|javascript|java|react|angular|vue|django|flask|aws|azure|docker|kubernetes)\b",
+                r"\b(sql|nosql|postgresql|mongodb|redis|git|linux|api|rest|graphql)\b",
+                r"\b(machine learning|ml|ai|data science|tensorflow|pytorch|pandas|numpy)\b"
+            ],
+            "experience": [
+                r"\b(\d+)[\+\-\s]*years?\s+(of\s+)?experience\b",
+                r"\bexperience\s+with\b",
+                r"\bproficient\s+in\b",
+                r"\bworked\s+with\b"
+            ],
+            "education": [
+                r"\b(bachelor|master|phd|degree|bs|ms|ba|ma)\b",
+                r"\b(computer science|engineering|mathematics|statistics)\b"
+            ],
+            "soft_skill": [
+                r"\b(communication|leadership|teamwork|problem.solving|analytical)\b",
+                r"\b(collaboration|mentoring|presentation|writing)\b"
+            ]
+        }
+
+    def extract_experience_level(self, text: str) -> str:
+        """Extract experience level from requirement text"""
+        text_lower = text.lower()
+
+        for level, indicators in self.experience_indicators.items():
+            for indicator in indicators:
+                if indicator in text_lower:
+                    return level
+
+        return "mid"  # Default
+
+    def extract_importance(self, text: str, context: str = "") -> str:
+        """Extract importance level from requirement text"""
+        combined_text = (text + " " + context).lower()
+
+        for importance, indicators in self.importance_indicators.items():
+            for indicator in indicators:
+                if indicator in combined_text:
+                    return importance
+
+        return "required"  # Default
+
+    def categorize_requirement(self, text: str) -> str:
+        """Categorize a requirement"""
+        text_lower = text.lower()
+
+        for category, patterns in self.category_patterns.items():
+            for pattern in patterns:
+                if re.search(pattern, text_lower):
+                    return category
+
+        return "technical"  # Default
+
+class EnhancedJobParser:
+    """Enhanced job parser that extracts structured requirements with context"""
+
+    def __init__(self):
+        self.normalizer = RequirementsNormalizer()
+        self.llm_client = LLMClient()
+
+    async def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+        job_text = data.get("scraped", {}).get("content", "")
+        enriched_data = data.get("enriched", {})
+
+        if not job_text and not enriched_data:
+            return {**data, "job_data_enhanced": {"error": "No job content available"}}
+
+        try:
+            # Extract structured job requirements
+            job_data = await self._extract_job_requirements_structured(job_text, enriched_data)
+
+            log_metric("job_parse_enhanced_success", {
+                "tech_requirements": len(job_data.tech_requirements),
+                "total_requirements": len(job_data.tech_requirements) + len(job_data.experience_requirements),
+                "years_required": job_data.years_experience_required
+            })
+
+            return {**data, "job_data_enhanced": asdict(job_data)}
+
+        except Exception as e:
+            log_metric("job_parse_enhanced_error", {"error": str(e)})
+            return {**data, "job_data_enhanced": {"error": f"Enhanced job parsing failed: {e}"}}
+
+    async def _extract_job_requirements_structured(self, job_text: str, enriched_data: Dict[str, Any]) -> JobStructuredData:
+        """Extract structured job requirements using LLM and pattern matching"""
+
+        # First try LLM extraction for detailed analysis
+        try:
+            llm_data = await self._llm_extract_requirements(job_text)
+            if llm_data:
+                return llm_data
+        except Exception as e:
+            log_metric("job_llm_extraction_error", {"error": str(e)})
+
+        # Fallback to pattern-based extraction
+        return self._pattern_based_extraction(job_text, enriched_data)
+
+    async def _llm_extract_requirements(self, job_text: str) -> Optional[JobStructuredData]:
+        """Use LLM to extract detailed job requirements"""
+
+        prompt = f"""
+Analyze this job posting and extract structured requirements. Return ONLY valid JSON:
+
+{{
+    "role": "Job Title",
+    "company": "Company Name",
+    "location": "City, State",
+    "seniority_level": "entry|mid|senior|lead",
+    "years_experience_required": 5,
+    "salary_range": [100000, 150000],
+    "remote_type": "remote|hybrid|onsite",
+    "company_stage": "startup|scale-up|enterprise",
+    "tech_requirements": [
+        {{
+            "skill": "Python",
+            "category": "technical",
+            "importance": "required|preferred|nice_to_have",
+            "experience_level": "entry|mid|senior|expert",
+            "context": "Where this was mentioned in the posting",
+            "synonyms": ["Python3", "py"]
+        }}
+    ],
+    "experience_requirements": [
+        {{
+            "skill": "Web development experience",
+            "category": "experience",
+            "importance": "required",
+            "experience_level": "mid",
+            "context": "Must have 3+ years building web applications",
+            "synonyms": ["full-stack development", "web dev"]
+        }}
+    ],
+    "education_requirements": [
+        {{
+            "skill": "Bachelor's degree",
+            "category": "education",
+            "importance": "preferred",
+            "experience_level": "entry",
+            "context": "BS in Computer Science or equivalent",
+            "synonyms": ["BS", "undergraduate degree"]
+        }}
+    ],
+    "soft_skill_requirements": [
+        {{
+            "skill": "Communication",
+            "category": "soft_skill",
+            "importance": "required",
+            "experience_level": "mid",
+            "context": "Strong written and verbal communication",
+            "synonyms": ["written communication", "verbal skills"]
+        }}
+    ],
+    "responsibilities": [
+        "Design and develop web applications",
+        "Collaborate with cross-functional teams"
+    ],
+    "nice_to_haves": [
+        "Experience with cloud platforms",
+        "Open source contributions"
+    ]
+}}
+
+Guidelines:
+1. Extract ALL technical skills, tools, frameworks, languages mentioned
+2. Identify experience level indicators (junior, senior, X+ years)
+3. Categorize by importance (required vs preferred vs nice-to-have)
+4. Include context of where each requirement was mentioned
+5. Add synonyms for technologies (React/ReactJS, ML/Machine Learning)
+6. Parse salary ranges and years of experience
+7. Determine company stage from description
+8. Identify remote work policy
+
+Job posting:
+{job_text}
+"""
+
+        try:
+            response = self.llm_client.call_llm(prompt, temperature=0, max_tokens=4000)
+
+            # Clean response to extract JSON
+            json_start = response.find('{')
+            json_end = response.rfind('}') + 1
+            if json_start != -1 and json_end > json_start:
+                json_str = response[json_start:json_end]
+            else:
+                json_str = response
+
+            data = json.loads(json_str)
+
+            # Convert to structured objects
+            return self._convert_to_job_data(data)
+
+        except json.JSONDecodeError as e:
+            log_metric("job_json_parse_error", {"error": str(e), "response": response[:500]})
+            return None
+        except Exception as e:
+            log_metric("job_llm_error", {"error": str(e)})
+            return None
+
+    def _pattern_based_extraction(self, job_text: str, enriched_data: Dict[str, Any]) -> JobStructuredData:
+        """Fallback pattern-based extraction"""
+
+        # Extract basic info from enriched data
+        role = enriched_data.get("role", "Unknown Role")
+        company = enriched_data.get("company", "Unknown Company")
+        location = enriched_data.get("location", "")
+
+        # Extract years of experience
+        years_exp = self._extract_years_experience(job_text)
+
+        # Extract technical requirements using patterns
+        tech_requirements = self._extract_technical_requirements_pattern(job_text)
+
+        # Extract salary range
+        salary_range = self._extract_salary_range(job_text)
+
+        return JobStructuredData(
+            role=role,
+            company=company,
+            location=location,
+            seniority_level=self._determine_seniority(job_text, years_exp),
+            tech_requirements=tech_requirements,
+            experience_requirements=[],
+            education_requirements=[],
+            soft_skill_requirements=[],
+            responsibilities=[],
+            nice_to_haves=[],
+            years_experience_required=years_exp,
+            salary_range=salary_range,
+            remote_type=self._extract_remote_type(job_text),
+            company_stage=self._determine_company_stage(job_text)
+        )
+
+    def _convert_to_job_data(self, data: Dict[str, Any]) -> JobStructuredData:
+        """Convert parsed JSON to JobStructuredData objects"""
+
+        # Convert requirement lists
+        tech_reqs = []
+        for req_data in data.get("tech_requirements", []):
+            tech_reqs.append(JobRequirement(**req_data))
+
+        exp_reqs = []
+        for req_data in data.get("experience_requirements", []):
+            exp_reqs.append(JobRequirement(**req_data))
+
+        edu_reqs = []
+        for req_data in data.get("education_requirements", []):
+            edu_reqs.append(JobRequirement(**req_data))
+
+        soft_reqs = []
+        for req_data in data.get("soft_skill_requirements", []):
+            soft_reqs.append(JobRequirement(**req_data))
+
+        # Handle salary range
+        salary_data = data.get("salary_range", [None, None])
+        if isinstance(salary_data, list) and len(salary_data) >= 2:
+            salary_range = (salary_data[0], salary_data[1])
+        else:
+            salary_range = (None, None)
+
+        return JobStructuredData(
+            role=data.get("role", "Unknown Role"),
+            company=data.get("company", "Unknown Company"),
+            location=data.get("location", ""),
+            seniority_level=data.get("seniority_level", "mid"),
+            tech_requirements=tech_reqs,
+            experience_requirements=exp_reqs,
+            education_requirements=edu_reqs,
+            soft_skill_requirements=soft_reqs,
+            responsibilities=data.get("responsibilities", []),
+            nice_to_haves=data.get("nice_to_haves", []),
+            years_experience_required=data.get("years_experience_required", 0),
+            salary_range=salary_range,
+            remote_type=data.get("remote_type", "onsite"),
+            company_stage=data.get("company_stage", "enterprise")
+        )
+
+    def _extract_years_experience(self, text: str) -> int:
+        """Extract years of experience required"""
+
+        # Pattern for X+ years
+        pattern = r'(\d+)\+?\s*years?\s+(?:of\s+)?experience'
+        matches = re.findall(pattern, text.lower())
+
+        if matches:
+            return int(matches[0])
+
+        # Check for seniority indicators
+        if any(word in text.lower() for word in ['senior', 'lead', 'principal']):
+            return 5
+        elif any(word in text.lower() for word in ['junior', 'entry', 'graduate']):
+            return 0
+
+        return 2  # Default
370
+
371
+ def _extract_technical_requirements_pattern(self, text: str) -> List[JobRequirement]:
372
+ """Extract technical requirements using pattern matching"""
373
+
374
+ # Common tech patterns
375
+ tech_patterns = [
376
+ r'\b(Python|JavaScript|Java|C\+\+|C#|Ruby|Go|Rust|Swift|Kotlin)\b',
377
+ r'\b(React|Angular|Vue|Django|Flask|Spring|Rails|Laravel)\b',
378
+ r'\b(AWS|Azure|GCP|Docker|Kubernetes|Git|SQL|NoSQL)\b',
379
+ r'\b(Machine Learning|ML|AI|Deep Learning|TensorFlow|PyTorch)\b',
380
+ r'\b(PostgreSQL|MongoDB|Redis|MySQL|Elasticsearch)\b'
381
+ ]
382
+
383
+ requirements = []
384
+
385
+ for pattern in tech_patterns:
386
+ matches = re.findall(pattern, text, re.IGNORECASE)
387
+ for match in matches:
388
+ # Determine importance and experience level from context
389
+ context_start = max(0, text.lower().find(match.lower()) - 50)
390
+ context_end = min(len(text), text.lower().find(match.lower()) + len(match) + 50)
391
+ context = text[context_start:context_end]
392
+
393
+ importance = self.normalizer.extract_importance(match, context)
394
+ experience_level = self.normalizer.extract_experience_level(context)
395
+
396
+ req = JobRequirement(
397
+ skill=match,
398
+ category="technical",
399
+ importance=importance,
400
+ experience_level=experience_level,
401
+ context=context.strip(),
402
+ synonyms=[]
403
+ )
404
+ requirements.append(req)
405
+
406
+ return requirements
407
+
408
+ def _extract_salary_range(self, text: str) -> Tuple[Optional[int], Optional[int]]:
409
+ """Extract salary range from text"""
410
+
411
+ # Pattern for salary ranges
412
+ patterns = [
413
+ r'\$(\d+)k?\s*-\s*\$?(\d+)k?',
414
+ r'\$(\d+),?(\d+)?\s*-\s*\$?(\d+),?(\d+)?',
415
+ r'(\d+)k?\s*-\s*(\d+)k?\s*(?:per\s+year|annually|\$)'
416
+ ]
417
+
418
+ for pattern in patterns:
419
+ matches = re.findall(pattern, text, re.IGNORECASE)
420
+ if matches:
421
+ match = matches[0]
422
+ try:
423
+ if len(match) >= 2:
424
+ low = int(match[0]) * 1000 if 'k' in text.lower() else int(match[0])
425
+ high = int(match[1]) * 1000 if 'k' in text.lower() else int(match[1])
426
+ return (low, high)
427
+ except:
428
+ continue
429
+
430
+ return (None, None)
431
+
432
+ def _determine_seniority(self, text: str, years_exp: int) -> str:
433
+ """Determine seniority level"""
434
+
435
+ text_lower = text.lower()
436
+
437
+ if any(word in text_lower for word in ['senior', 'sr.', 'lead', 'principal']):
438
+ return "senior"
439
+ elif any(word in text_lower for word in ['junior', 'jr.', 'entry', 'graduate']):
440
+ return "entry"
441
+ elif years_exp >= 5:
442
+ return "senior"
443
+ elif years_exp <= 1:
444
+ return "entry"
445
+ else:
446
+ return "mid"
447
+
448
+ def _extract_remote_type(self, text: str) -> str:
449
+ """Extract remote work type"""
450
+
451
+ text_lower = text.lower()
452
+
453
+ if any(word in text_lower for word in ['remote', 'work from home', 'distributed']):
454
+ return "remote"
455
+ elif any(word in text_lower for word in ['hybrid', 'flexible', 'part remote']):
456
+ return "hybrid"
457
+ else:
458
+ return "onsite"
459
+
460
+ def _determine_company_stage(self, text: str) -> str:
461
+ """Determine company stage"""
462
+
463
+ text_lower = text.lower()
464
+
465
+ if any(word in text_lower for word in ['startup', 'early stage', 'seed', 'series a']):
466
+ return "startup"
467
+ elif any(word in text_lower for word in ['scale up', 'series b', 'series c', 'growth']):
468
+ return "scale-up"
469
+ elif any(word in text_lower for word in ['fortune', 'enterprise', 'established', 'public company']):
470
+ return "enterprise"
471
+ else:
472
+ return "enterprise" # Default
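The seniority and years-of-experience heuristics above can be exercised in isolation. Below is a minimal standalone sketch of that logic; the free-function names are illustrative, not part of the committed module, and the thresholds (5+ years → senior, ≤1 → entry, default of 2) mirror the diff:

```python
import re

def extract_years_experience(text: str) -> int:
    # Matches "3 years experience", "5+ years of experience", etc.
    match = re.search(r'(\d+)\+?\s*years?\s+(?:of\s+)?experience', text.lower())
    if match:
        return int(match.group(1))
    return 2  # default used by the parser when nothing matches

def determine_seniority(text: str, years_exp: int) -> str:
    text_lower = text.lower()
    if any(word in text_lower for word in ('senior', 'sr.', 'lead', 'principal')):
        return "senior"
    if any(word in text_lower for word in ('junior', 'jr.', 'entry', 'graduate')):
        return "entry"
    if years_exp >= 5:
        return "senior"
    if years_exp <= 1:
        return "entry"
    return "mid"
```

Keyword hits in the title take precedence over the numeric threshold, so "Senior Engineer" is classified as senior even when no years figure is present.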
micro/enhanced_resume_parser.py ADDED
@@ -0,0 +1,630 @@
+"""
+Enhanced Resume Parser v2.0
+Provides structured extraction of skills, experience, projects, and education
+with proper normalization and context understanding.
+"""
+
+from typing import Any, Dict, List, Optional
+import re
+import json
+from dataclasses import dataclass, asdict, field
+from llm_client import LLMClient
+from metrics import log_metric
+import tiktoken
+
+@dataclass
+class Experience:
+    title: str
+    company: str
+    duration: str
+    location: str = ""
+    responsibilities: List[str] = field(default_factory=list)
+    achievements: List[str] = field(default_factory=list)
+    technologies: List[str] = field(default_factory=list)
+    start_date: str = ""
+    end_date: str = ""
+    is_current: bool = False
+
+    def __post_init__(self):
+        if self.responsibilities is None:
+            self.responsibilities = []
+        if self.achievements is None:
+            self.achievements = []
+        if self.technologies is None:
+            self.technologies = []
+
+@dataclass
+class Project:
+    name: str
+    description: str
+    technologies: List[str] = field(default_factory=list)
+    github_url: str = ""
+    demo_url: str = ""
+    duration: str = ""
+    key_features: List[str] = field(default_factory=list)
+
+    def __post_init__(self):
+        if self.technologies is None:
+            self.technologies = []
+        if self.key_features is None:
+            self.key_features = []
+
+@dataclass
+class Education:
+    degree: str
+    field: str
+    school: str
+    graduation_year: str = ""
+    gpa: str = ""
+    relevant_courses: Optional[List[str]] = None
+    honors: Optional[List[str]] = None
+
+    def __post_init__(self):
+        if self.relevant_courses is None:
+            self.relevant_courses = []
+        if self.honors is None:
+            self.honors = []
+
+@dataclass
+class Skills:
+    technical: Optional[List[str]] = None
+    programming_languages: Optional[List[str]] = None
+    frameworks: Optional[List[str]] = None
+    tools: Optional[List[str]] = None
+    databases: Optional[List[str]] = None
+    cloud_platforms: Optional[List[str]] = None
+    methodologies: Optional[List[str]] = None
+    soft_skills: Optional[List[str]] = None
+
+    def __post_init__(self):
+        for name in ['technical', 'programming_languages', 'frameworks', 'tools',
+                     'databases', 'cloud_platforms', 'methodologies', 'soft_skills']:
+            if getattr(self, name) is None:
+                setattr(self, name, [])
+
+@dataclass
+class ResumeData:
+    personal_info: Dict[str, str]
+    summary: str
+    skills: Skills
+    experience: List[Experience]
+    education: List[Education]
+    projects: List[Project]
+    certifications: List[Dict[str, str]]
+    languages: List[str]
+    years_of_experience: int = 0
+
+    def __post_init__(self):
+        if not self.certifications:
+            self.certifications = []
+        if not self.languages:
+            self.languages = []
+
+class SkillsNormalizer:
+    """Normalizes and categorizes skills with synonym detection"""
+
+    def __init__(self):
+        self.skill_synonyms = {
+            # Programming Languages
+            "python": ["python", "python3", "python 3", "py"],
+            "javascript": ["javascript", "js", "node.js", "nodejs", "node js"],
+            "typescript": ["typescript", "ts"],
+            "java": ["java", "java 8", "java 11", "java 17"],
+            "csharp": ["c#", "csharp", "c sharp", ".net", "dotnet"],
+            "cpp": ["c++", "cpp", "c plus plus"],
+            "go": ["go", "golang"],
+            "rust": ["rust", "rust-lang"],
+            "swift": ["swift", "ios development"],
+            "kotlin": ["kotlin", "android development"],
+            "r": ["r", "r programming"],
+            "scala": ["scala"],
+            "php": ["php", "php 7", "php 8"],
+            "ruby": ["ruby", "ruby on rails", "ror"],
+
+            # Web Frameworks
+            "react": ["react", "reactjs", "react.js", "react js"],
+            "angular": ["angular", "angularjs", "angular 2+"],
+            "vue": ["vue", "vue.js", "vuejs", "vue js"],
+            "svelte": ["svelte", "sveltekit"],
+            "django": ["django", "django rest framework", "drf"],
+            "flask": ["flask", "flask-restful"],
+            "fastapi": ["fastapi", "fast api"],
+            "express": ["express", "express.js", "expressjs"],
+            "spring": ["spring", "spring boot", "spring framework"],
+            "laravel": ["laravel"],
+            "rails": ["rails", "ruby on rails", "ror"],
+
+            # Databases
+            "postgresql": ["postgresql", "postgres", "pg", "psql"],
+            "mysql": ["mysql", "my sql"],
+            "mongodb": ["mongodb", "mongo", "mongo db"],
+            "redis": ["redis"],
+            "elasticsearch": ["elasticsearch", "elastic search"],
+            "cassandra": ["cassandra", "apache cassandra"],
+            "dynamodb": ["dynamodb", "dynamo db"],
+            "sqlite": ["sqlite", "sqlite3"],
+
+            # Cloud Platforms
+            "aws": ["aws", "amazon web services", "amazon aws"],
+            "azure": ["azure", "microsoft azure"],
+            "gcp": ["gcp", "google cloud", "google cloud platform"],
+            "heroku": ["heroku"],
+            "digitalocean": ["digitalocean", "digital ocean"],
+            "vercel": ["vercel"],
+            "netlify": ["netlify"],
+
+            # DevOps & Tools
+            "docker": ["docker", "containerization", "containers"],
+            "kubernetes": ["kubernetes", "k8s", "container orchestration"],
+            "jenkins": ["jenkins", "ci/cd"],
+            "github actions": ["github actions", "gh actions"],
+            "terraform": ["terraform", "infrastructure as code", "iac"],
+            "ansible": ["ansible"],
+            "git": ["git", "version control", "source control"],
+            "linux": ["linux", "unix", "ubuntu", "centos"],
+
+            # Data Science & ML
+            "machine learning": ["machine learning", "ml", "artificial intelligence", "ai"],
+            "deep learning": ["deep learning", "neural networks"],
+            "tensorflow": ["tensorflow", "tf"],
+            "pytorch": ["pytorch", "torch"],
+            "scikit-learn": ["scikit-learn", "sklearn", "scikit learn"],
+            "pandas": ["pandas", "data manipulation"],
+            "numpy": ["numpy", "numerical computing"],
+            "matplotlib": ["matplotlib", "data visualization"],
+            "seaborn": ["seaborn"],
+            "jupyter": ["jupyter", "jupyter notebooks"],
+
+            # Testing
+            "pytest": ["pytest", "python testing"],
+            "jest": ["jest", "javascript testing"],
+            "selenium": ["selenium", "web automation"],
+            "cypress": ["cypress", "e2e testing"],
+
+            # Methodologies
+            "agile": ["agile", "scrum", "kanban"],
+            "devops": ["devops", "dev ops"],
+            "microservices": ["microservices", "micro services"],
+            "rest api": ["rest", "rest api", "restful", "api development"],
+            "graphql": ["graphql", "graph ql"],
+        }
+
+        self.skill_categories = {
+            "programming_languages": ["python", "javascript", "typescript", "java", "csharp", "cpp", "go", "rust", "swift", "kotlin", "r", "scala", "php", "ruby"],
+            "frameworks": ["react", "angular", "vue", "svelte", "django", "flask", "fastapi", "express", "spring", "laravel", "rails"],
+            "databases": ["postgresql", "mysql", "mongodb", "redis", "elasticsearch", "cassandra", "dynamodb", "sqlite"],
+            "cloud_platforms": ["aws", "azure", "gcp", "heroku", "digitalocean", "vercel", "netlify"],
+            "tools": ["docker", "kubernetes", "jenkins", "github actions", "terraform", "ansible", "git", "linux", "pytest", "jest", "selenium", "cypress"],
+            "methodologies": ["agile", "devops", "microservices", "rest api", "graphql"]
+        }
+
+    def normalize_skill(self, skill: str) -> Optional[str]:
+        """Normalize a skill to its canonical form"""
+        skill_lower = skill.lower().strip()
+
+        for canonical, synonyms in self.skill_synonyms.items():
+            if skill_lower in synonyms:
+                return canonical
+
+        return skill_lower if len(skill_lower) > 1 else None
+
+    def categorize_skill(self, normalized_skill: str) -> str:
+        """Categorize a normalized skill"""
+        for category, skills in self.skill_categories.items():
+            if normalized_skill in skills:
+                return category
+        return "technical"
+
+    def normalize_skill_list(self, skills: List[str]) -> Dict[str, List[str]]:
+        """Normalize and categorize a list of skills"""
+        categorized = {
+            "programming_languages": [],
+            "frameworks": [],
+            "databases": [],
+            "cloud_platforms": [],
+            "tools": [],
+            "methodologies": [],
+            "technical": []
+        }
+
+        for skill in skills:
+            normalized = self.normalize_skill(skill)
+            if normalized:
+                category = self.categorize_skill(normalized)
+                if normalized not in categorized[category]:
+                    categorized[category].append(normalized)
+
+        return categorized
+
+class EnhancedResumeParser:
+    """Enhanced resume parser with structured extraction and normalization"""
+
+    def __init__(self):
+        self.skills_normalizer = SkillsNormalizer()
+        self.llm_client = LLMClient()
+
+    async def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+        resume_text = data.get("resume_text", "")
+
+        if not resume_text:
+            return {**data, "resume_data_enhanced": {"error": "No resume content provided"}}
+
+        try:
+            # Extract structured resume data
+            resume_data = await self._extract_resume_data_structured(resume_text)
+
+            log_metric("resume_parse_enhanced_success", {
+                "skills_count": len(self._get_all_skills(resume_data.skills)),
+                "experience_count": len(resume_data.experience),
+                "projects_count": len(resume_data.projects),
+                "years_experience": resume_data.years_of_experience
+            })
+
+            return {**data, "resume_data_enhanced": asdict(resume_data)}
+
+        except Exception as e:
+            log_metric("resume_parse_enhanced_error", {"error": str(e)})
+            return {**data, "resume_data_enhanced": {"error": f"Enhanced resume parsing failed: {e}"}}
+
+    async def _extract_resume_data_structured(self, resume_text: str) -> ResumeData:
+        """Extract structured resume data using multiple approaches"""
+
+        # First, try comprehensive LLM extraction
+        try:
+            structured_data = await self._llm_extract_structured(resume_text)
+            if structured_data:
+                return structured_data
+        except Exception as e:
+            log_metric("resume_llm_extraction_error", {"error": str(e)})
+
+        # Fallback to section-based extraction
+        return await self._section_based_extraction(resume_text)
+
+    async def _llm_extract_structured(self, resume_text: str) -> Optional[ResumeData]:
+        """Use LLM to extract structured resume data"""
+
+        # Check token count and chunk if necessary
+        token_count = self._count_tokens(resume_text)
+        if token_count > 15000:
+            # For very long resumes, extract in sections
+            return await self._chunked_extraction(resume_text)
+
+        prompt = f"""
+Extract comprehensive structured data from this resume. Return ONLY valid JSON with this exact structure:
+
+{{
+    "personal_info": {{
+        "name": "Full Name",
+        "email": "email@domain.com",
+        "phone": "+1234567890",
+        "location": "City, State",
+        "linkedin": "linkedin.com/in/username",
+        "github": "github.com/username",
+        "website": "personal-website.com"
+    }},
+    "summary": "Professional summary or objective statement",
+    "skills": {{
+        "technical": ["skill1", "skill2"],
+        "programming_languages": ["Python", "JavaScript"],
+        "frameworks": ["React", "Django"],
+        "tools": ["Git", "Docker"],
+        "databases": ["PostgreSQL", "MongoDB"],
+        "cloud_platforms": ["AWS", "Azure"],
+        "methodologies": ["Agile", "DevOps"],
+        "soft_skills": ["Leadership", "Communication"]
+    }},
+    "experience": [
+        {{
+            "title": "Job Title",
+            "company": "Company Name",
+            "duration": "Jan 2020 - Present",
+            "location": "City, State",
+            "start_date": "2020-01",
+            "end_date": "Present",
+            "is_current": true,
+            "responsibilities": ["responsibility 1", "responsibility 2"],
+            "achievements": ["achievement 1", "achievement 2"],
+            "technologies": ["tech1", "tech2"]
+        }}
+    ],
+    "education": [
+        {{
+            "degree": "Bachelor of Science",
+            "field": "Computer Science",
+            "school": "University Name",
+            "graduation_year": "2020",
+            "gpa": "3.8",
+            "relevant_courses": ["Data Structures", "Algorithms"],
+            "honors": ["Dean's List", "Magna Cum Laude"]
+        }}
+    ],
+    "projects": [
+        {{
+            "name": "Project Name",
+            "description": "Brief description of the project",
+            "technologies": ["tech1", "tech2"],
+            "github_url": "github.com/user/repo",
+            "demo_url": "live-demo-url.com",
+            "duration": "3 months",
+            "key_features": ["feature1", "feature2"]
+        }}
+    ],
+    "certifications": [
+        {{
+            "name": "Certification Name",
+            "issuer": "Organization",
+            "date": "2023",
+            "credential_id": "123456"
+        }}
+    ],
+    "languages": ["English (Native)", "Spanish (Conversational)"],
+    "years_of_experience": 5
+}}
+
+Important guidelines:
+1. Extract ALL skills mentioned, including those in job descriptions and projects
+2. Normalize technology names (e.g., "React.js" → "React", "ML" → "Machine Learning")
+3. Calculate years_of_experience from work history
+4. Parse dates in YYYY-MM format when possible
+5. Group similar skills appropriately
+6. Extract quantifiable achievements when possible
+7. If information is missing, omit the field or use empty array/string
+
+Resume text:
+{resume_text}
+"""
+
+        try:
+            response = self.llm_client.call_llm(prompt, temperature=0, max_tokens=4000)
+
+            # Clean response to extract just JSON
+            json_start = response.find('{')
+            json_end = response.rfind('}') + 1
+            if json_start != -1 and json_end > json_start:
+                json_str = response[json_start:json_end]
+            else:
+                json_str = response
+
+            data = json.loads(json_str)
+
+            # Convert to structured objects
+            return self._convert_to_resume_data(data)
+
+        except json.JSONDecodeError as e:
+            log_metric("resume_json_parse_error", {"error": str(e), "response": response[:500]})
+            return None
+        except Exception as e:
+            log_metric("resume_llm_error", {"error": str(e)})
+            return None
+
+    async def _chunked_extraction(self, resume_text: str) -> ResumeData:
+        """Extract data from long resumes by processing in chunks"""
+
+        sections = self._split_resume_sections(resume_text)
+
+        # Extract each section separately
+        personal_info = await self._extract_personal_info(sections.get("header", ""))
+        summary = await self._extract_summary(sections.get("summary", ""))
+        skills = await self._extract_skills(sections.get("skills", ""))
+        experience = await self._extract_experience(sections.get("experience", ""))
+        education = await self._extract_education(sections.get("education", ""))
+        projects = await self._extract_projects(sections.get("projects", ""))
+        certifications = await self._extract_certifications(sections.get("certifications", ""))
+
+        # Calculate years of experience
+        years_exp = self._calculate_years_experience(experience)
+
+        return ResumeData(
+            personal_info=personal_info,
+            summary=summary,
+            skills=skills,
+            experience=experience,
+            education=education,
+            projects=projects,
+            certifications=certifications,
+            languages=[],
+            years_of_experience=years_exp
+        )
+
+    async def _section_based_extraction(self, resume_text: str) -> ResumeData:
+        """Fallback extraction using regex and basic parsing"""
+
+        # Basic regex-based extraction
+        personal_info = self._extract_personal_info_regex(resume_text)
+        skills = self._extract_skills_regex(resume_text)
+
+        return ResumeData(
+            personal_info=personal_info,
+            summary="",
+            skills=skills,
+            experience=[],
+            education=[],
+            projects=[],
+            certifications=[],
+            languages=[],
+            years_of_experience=0
+        )
+
+    def _convert_to_resume_data(self, data: Dict[str, Any]) -> ResumeData:
+        """Convert parsed JSON to ResumeData objects"""
+
+        # Parse skills with normalization
+        skills_data = data.get("skills", {})
+        all_skills = []
+        for skill_list in skills_data.values():
+            if isinstance(skill_list, list):
+                all_skills.extend(skill_list)
+
+        normalized_skills = self.skills_normalizer.normalize_skill_list(all_skills)
+        skills = Skills(**normalized_skills)
+
+        # Parse experience
+        experience = []
+        for exp_data in data.get("experience", []):
+            exp = Experience(**exp_data)
+            experience.append(exp)
+
+        # Parse education
+        education = []
+        for edu_data in data.get("education", []):
+            edu = Education(**edu_data)
+            education.append(edu)
+
+        # Parse projects
+        projects = []
+        for proj_data in data.get("projects", []):
+            proj = Project(**proj_data)
+            projects.append(proj)
+
+        return ResumeData(
+            personal_info=data.get("personal_info", {}),
+            summary=data.get("summary", ""),
+            skills=skills,
+            experience=experience,
+            education=education,
+            projects=projects,
+            certifications=data.get("certifications", []),
+            languages=data.get("languages", []),
+            years_of_experience=data.get("years_of_experience", 0)
+        )
+
+    def _get_all_skills(self, skills: Skills) -> List[str]:
+        """Get all skills as a flat list"""
+        all_skills = []
+        for name in ['technical', 'programming_languages', 'frameworks', 'tools',
+                     'databases', 'cloud_platforms', 'methodologies']:
+            all_skills.extend(getattr(skills, name, []))
+        return all_skills
+
+    def _split_resume_sections(self, resume_text: str) -> Dict[str, str]:
+        """Split resume into sections using common headers"""
+
+        sections = {}
+        current_section = "header"
+        current_content = []
+
+        # Common section headers
+        section_patterns = {
+            r'(experience|work experience|employment|professional experience)': 'experience',
+            r'(education|academic background)': 'education',
+            r'(skills|technical skills|core competencies)': 'skills',
+            r'(projects|personal projects|side projects)': 'projects',
+            r'(summary|objective|profile)': 'summary',
+            r'(certifications|licenses)': 'certifications'
+        }
+
+        lines = resume_text.split('\n')
+
+        for line in lines:
+            line_lower = line.lower().strip()
+
+            # Check if this line is a section header
+            section_found = False
+            for pattern, section_name in section_patterns.items():
+                if re.search(pattern, line_lower):
+                    # Save previous section
+                    if current_content:
+                        sections[current_section] = '\n'.join(current_content)
+
+                    current_section = section_name
+                    current_content = []
+                    section_found = True
+                    break
+
+            if not section_found:
+                current_content.append(line)
+
+        # Save final section
+        if current_content:
+            sections[current_section] = '\n'.join(current_content)
+
+        return sections
+
+    def _extract_personal_info_regex(self, text: str) -> Dict[str, str]:
+        """Extract personal information using regex"""
+
+        # Email
+        email_match = re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b', text)
+        email = email_match.group() if email_match else ""
+
+        # Phone
+        phone_match = re.search(r'(\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})', text)
+        phone = phone_match.group() if phone_match else ""
+
+        # LinkedIn
+        linkedin_match = re.search(r'linkedin\.com/in/[\w-]+', text, re.IGNORECASE)
+        linkedin = f"https://{linkedin_match.group()}" if linkedin_match else ""
+
+        # GitHub
+        github_match = re.search(r'github\.com/[\w-]+', text, re.IGNORECASE)
+        github = f"https://{github_match.group()}" if github_match else ""
+
+        return {
+            "email": email,
+            "phone": phone,
+            "linkedin": linkedin,
+            "github": github
+        }
+
+    def _extract_skills_regex(self, text: str) -> Skills:
+        """Extract skills using regex patterns"""
+
+        # Common technical skills to look for
+        tech_keywords = [
+            'Python', 'JavaScript', 'Java', 'C++', 'React', 'Node.js', 'SQL',
+            'AWS', 'Docker', 'Git', 'Machine Learning', 'Data Science',
+            'TensorFlow', 'PyTorch', 'Pandas', 'NumPy', 'Django', 'Flask',
+            'PostgreSQL', 'MongoDB', 'Redis', 'Kubernetes', 'Jenkins'
+        ]
+
+        found_skills = []
+        for skill in tech_keywords:
+            if re.search(rf'\b{re.escape(skill)}\b', text, re.IGNORECASE):
+                found_skills.append(skill)
+
+        # Normalize skills
+        normalized = self.skills_normalizer.normalize_skill_list(found_skills)
+
+        return Skills(**normalized)
+
+    def _calculate_years_experience(self, experience: List[Experience]) -> int:
+        """Calculate total years of experience"""
+        if not experience:
+            return 0
+
+        # Simple calculation based on number of roles
+        # In practice, you'd want to parse dates and calculate overlap
+        return len(experience)
+
+    def _count_tokens(self, text: str) -> int:
+        """Count tokens in text"""
+        try:
+            encoding = tiktoken.encoding_for_model("gpt-4o-mini")
+            return len(encoding.encode(text))
+        except Exception:
+            # Fallback: approximate as 4 chars per token
+            return len(text) // 4
+
+    # Placeholder methods for individual section extraction
+    # These would be implemented with specific LLM calls for each section
+
+    async def _extract_personal_info(self, text: str) -> Dict[str, str]:
+        return self._extract_personal_info_regex(text)
+
+    async def _extract_summary(self, text: str) -> str:
+        return text.strip()
+
+    async def _extract_skills(self, text: str) -> Skills:
+        return self._extract_skills_regex(text)
+
+    async def _extract_experience(self, text: str) -> List[Experience]:
+        return []
+
+    async def _extract_education(self, text: str) -> List[Education]:
+        return []
+
+    async def _extract_projects(self, text: str) -> List[Project]:
+        return []
+
+    async def _extract_certifications(self, text: str) -> List[Dict[str, str]]:
+        return []
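The SkillsNormalizer in this file maps each canonical skill to the spellings a resume might use. A minimal standalone sketch of that lookup, with only a few synonym entries reproduced for illustration (the full table lives in the class above):

```python
from typing import Optional

# Trimmed copy of the synonym table: canonical name -> accepted spellings.
SKILL_SYNONYMS = {
    "javascript": ["javascript", "js", "node.js", "nodejs", "node js"],
    "react": ["react", "reactjs", "react.js", "react js"],
    "postgresql": ["postgresql", "postgres", "pg", "psql"],
    "kubernetes": ["kubernetes", "k8s", "container orchestration"],
}

def normalize_skill(skill: str) -> Optional[str]:
    skill_lower = skill.lower().strip()
    for canonical, synonyms in SKILL_SYNONYMS.items():
        if skill_lower in synonyms:
            return canonical
    # Unknown skills keep their lowercase form; single characters are dropped.
    return skill_lower if len(skill_lower) > 1 else None
```

The fall-through keeps unrecognized multi-character skills rather than discarding them, which is why the categorizer needs a catch-all "technical" bucket.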
micro/enrich.py ADDED
@@ -0,0 +1,205 @@
+import json
+from typing import Any, Dict
+from llm_client import llm_client
+from prompt_loader import prompt_loader
+from metrics import log_metric
+
+class EnrichMicroFunction:
+    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+        scraped_text = data.get("scraped_text", "")
+
+        if not scraped_text or scraped_text == "No content found":
+            return {**data, "enriched": {"role": "", "company": "", "level": "", "error": "No content to enrich"}}
+
+        try:
+            # Pre-process to extract obvious info
+            company = self._extract_company(scraped_text, data.get("raw_input", ""))
+            role = self._extract_role(scraped_text, data.get("raw_input", ""))
+
+            # Use LLM to extract structured data
+            enrichment_prompt = prompt_loader.get_prompt("enrich_prompt",
+                                                         job_posting=scraped_text,
+                                                         pre_company=company,
+                                                         pre_role=role)
+
+            llm_response = llm_client.call_llm(enrichment_prompt)
+
+            # Parse JSON response
+            try:
+                enriched_data = json.loads(llm_response)
+
+                # Override with pre-extracted data if LLM missed it
+                if enriched_data.get("company") in ["Unknown", "", None] and company:
+                    enriched_data["company"] = company
+                if enriched_data.get("role") in ["Unknown", "", None] and role:
+                    enriched_data["role"] = role
+
+            except json.JSONDecodeError:
+                # Fallback: use pre-extracted data and simple LLM call
+                simple_prompt = f"""Extract job information from this text and respond with just the key details:
+
+Job posting: {scraped_text[:1500]}
+
+What is the job title, company, and seniority level?"""
+
+                simple_response = llm_client.call_llm(simple_prompt)
+
+                enriched_data = {
+                    "role": role or "Unknown",
+                    "company": company or "Unknown",
+                    "level": self._extract_level(scraped_text, simple_response),
+                    "location": "Unknown",
+                    "requirements": [],
+                    "responsibilities": [],
+                    "parsed_response": simple_response
+                }
+
+            log_metric("enrich_success", {
+                "has_role": bool(enriched_data.get("role")),
+                "has_company": bool(enriched_data.get("company")),
+                "has_requirements": bool(enriched_data.get("requirements"))
+            })
+
+            return {**data, "enriched": enriched_data}
+
+        except Exception as e:
+            log_metric("enrich_error", {"error": str(e)})
+            return {**data, "enriched": {"error": f"Enrichment failed: {e}"}}
+
+    def _extract_company(self, scraped_text: str, raw_input: str) -> str:
+        """Extract company name from text or URL"""
+        import re
+
+        # Check URL for company indicators (expanded list)
+        url_company_map = {
+            "microsoft.com": "Microsoft",
+            "google.com": "Google",
+            "apple.com": "Apple",
+            "amazon.com": "Amazon",
+            "amazon.jobs": "Amazon",
+            # Note: LinkedIn is excluded here because linkedin.com hosts jobs for OTHER companies
+            "paypal.com": "PayPal",
+            "paypal.eightfold.ai": "PayPal",
+            "meta.com": "Meta",
+            "facebook.com": "Meta",
+            "netflix.com": "Netflix",
+            "spotify.com": "Spotify",
+            "uber.com": "Uber",
+            "airbnb.com": "Airbnb",
+            "salesforce.com": "Salesforce",
+            "oracle.com": "Oracle",
+            "adobe.com": "Adobe",
+            "nvidia.com": "NVIDIA",
+            "tesla.com": "Tesla",
+            "stripe.com": "Stripe",
+            "ing.com": "ING"
+        }
+
+        for domain, company in url_company_map.items():
+            if domain in raw_input.lower():
+                return company
+
+        # Look for company patterns in scraped text (improved patterns)
+        company_patterns = [
+            # Direct company mentions (case-insensitive)
+            r"\b(ING|Microsoft|Google|Apple|Amazon|Meta|Facebook|Netflix|Tesla|Uber|Airbnb|Spotify|PayPal|Salesforce|Oracle|Adobe|NVIDIA|Stripe|Parambil)\b",
+            # Company in context patterns
+            r"(?:at|with|for|join)\s+([A-Z][a-zA-Z\s&.,-]+(?:Inc|LLC|Corp|Corporation|Ltd|Limited|Bank|Group)?)\b",
+            r"The Benefits Of Working With Us At\s+([A-Z][a-zA-Z\s&.-]+)",
+            r"About\s+([A-Z][a-zA-Z\s&.-]+)(?:\s+Include|\s*$)",
+            # Job posting patterns
+            r"Company:\s*([^\n\r]+)",
+            r"Company Name:\s*([^\n\r]+)",
+            r"Organization:\s*([^\n\r]+)",
+            r"Employer:\s*([^\n\r]+)",
+            # Common job title patterns with "at Company"
+            r"(?:Engineer|Scientist|Manager|Analyst|Developer|Designer|Specialist|Coordinator|Director)\s+at\s+([^\n\r,]+)",
+            r"(?:Senior|Junior|Lead|Staff|Principal)\s+\w+\s+at\s+([^\n\r,]+)",
+            # First line company extraction (common format)
+            r"^([A-Z][a-zA-Z\s&.,-]+(?:Inc|LLC|Corp|Corporation|Ltd|Limited)?)\s*$"
+        ]
+
+        for pattern in company_patterns:
+            matches = re.finditer(pattern, scraped_text, re.IGNORECASE | re.MULTILINE)
+            for match in matches:
+                company = match.group(1) if match.lastindex else match.group(0)
+                # Clean up formatting
+                company = company.strip()
+                # Remove markdown formatting
+                company = re.sub(r'^\*+\s*', '', company)  # Remove leading asterisks
130
+ company = re.sub(r'\s*\*+$', '', company) # Remove trailing asterisks
131
+ company = re.sub(r'\s+', ' ', company) # Normalize whitespace
132
+
133
+ # Filter out non-company names and LinkedIn
134
+ excluded = ['linkedin', 'linkedin corporation', 'show more', 'about the job', 'about', 'include', 'benefits']
135
+ if (company.lower() not in excluded and
136
+ len(company.strip()) >= 2 and
137
+ len(company.strip()) <= 50 and
138
+ not company.lower().startswith('http')):
139
+ return company
140
+
141
+ return ""
142
+
143
+ def _extract_role(self, scraped_text: str, raw_input: str) -> str:
144
+ """Extract job role/title from text or URL"""
145
+ import re
146
+
147
+ # Look for title patterns in scraped text first (more reliable)
148
+ title_patterns = [
149
+ # Specific title patterns for this job
150
+ r"(Regulatory Engagement and Oversight Specialist[^.\n]*)",
151
+ r"(Financial Risk Specialist[^.\n]*)",
152
+ # Generic title patterns
153
+ r"Title:\s*([^\n\r]+)",
154
+ r"Position:\s*([^\n\r]+)",
155
+ r"Role:\s*([^\n\r]+)",
156
+ r"Job Title:\s*([^\n\r]+)",
157
+ r"Job:\s*([^\n\r]+)",
158
+ # First line of job posting (often the title)
159
+ r"^([A-Z][a-zA-Z\s/-]+(?:Specialist|Engineer|Manager|Analyst|Developer|Designer|Coordinator|Director|Scientist))\s*$",
160
+ # Common job title patterns
161
+ r"\b((?:Senior|Jr|Junior|Lead|Staff|Principal)?\s*(?:Software|Data|Applied|Research|Machine Learning|AI|Product|Marketing|Sales|Business|Regulatory|Financial|Risk)\s*(?:Engineer|Scientist|Manager|Analyst|Developer|Designer|Specialist|Coordinator|Director))\b",
162
+ r"\b((?:Senior|Jr|Junior|Lead|Staff|Principal)?\s*(?:Full Stack|Frontend|Backend|DevOps|Cloud|Security|Mobile|Web)\s*(?:Engineer|Developer))\b"
163
+ ]
164
+
165
+ for pattern in title_patterns:
166
+ match = re.search(pattern, scraped_text, re.IGNORECASE | re.MULTILINE)
167
+ if match:
168
+ title = match.group(1).strip() if match.lastindex else match.group(0).strip()
169
+ # Clean up common formatting issues
170
+ title = re.sub(r'^\*+\s*', '', title) # Remove leading asterisks
171
+ title = re.sub(r'\s*\*+$', '', title) # Remove trailing asterisks
172
+ title = re.sub(r'\s+', ' ', title) # Normalize whitespace
173
+ title = re.sub(r'\s*for\s*$', '', title, flags=re.IGNORECASE) # Remove trailing "for"
174
+ if 5 <= len(title) <= 100: # Reasonable length check
175
+ return title
176
+
177
+ # Extract from URL if it contains job title (fallback)
178
+ if raw_input and "/" in raw_input:
179
+ url_parts = raw_input.split("/")
180
+ for part in reversed(url_parts): # Check from end first
181
+ if any(keyword in part.lower() for keyword in ["scientist", "engineer", "developer", "manager", "analyst", "designer", "specialist"]):
182
+ # Clean up URL formatting
183
+ role = part.replace("-", " ").replace("_", " ").replace("%20", " ")
184
+ role = re.sub(r'\([^)]*\)', '', role) # Remove parentheses content
185
+ role = re.sub(r'\?.*', '', role) # Remove query parameters
186
+ role = " ".join(word.capitalize() for word in role.split() if word)
187
+ if 10 <= len(role) <= 80:
188
+ return role.strip()
189
+
190
+ return ""
191
+
192
+ def _extract_level(self, scraped_text: str, llm_response: str) -> str:
193
+ """Extract seniority level from text"""
194
+ import re
195
+
196
+ text_to_check = f"{scraped_text} {llm_response}".lower()
197
+
198
+ if any(term in text_to_check for term in ["senior", "sr.", "lead", "staff", "principal"]):
199
+ return "Senior"
200
+ elif any(term in text_to_check for term in ["junior", "jr.", "entry", "associate", "grad"]):
201
+ return "Junior"
202
+ elif any(term in text_to_check for term in ["mid", "intermediate", "ii", "2"]):
203
+ return "Mid"
204
+ else:
205
+ return "Mid" # Default assumption
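The seniority heuristic in `_extract_level` is plain keyword matching and can be sanity-checked in isolation. A minimal standalone sketch (the keyword lists mirror the ones in the diff; note the substring checks like `"2"` can fire on any digit in a posting, so the heuristic skews toward the "Mid" default):

```python
def extract_level(text: str) -> str:
    """Keyword-based seniority classifier mirroring _extract_level."""
    t = text.lower()
    if any(term in t for term in ["senior", "sr.", "lead", "staff", "principal"]):
        return "Senior"
    if any(term in t for term in ["junior", "jr.", "entry", "associate", "grad"]):
        return "Junior"
    if any(term in t for term in ["mid", "intermediate", "ii", "2"]):
        return "Mid"
    return "Mid"  # default assumption when nothing matched

print(extract_level("Senior Machine Learning Engineer"))  # Senior
print(extract_level("New grad software role"))            # Junior
```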
micro/gap_analysis.py ADDED
@@ -0,0 +1,253 @@
+ from typing import Any, Dict, List, Optional, Set
+ import re
+ from llm_client import llm_client
+ from prompt_loader import prompt_loader
+ from metrics import log_metric
+
+ class GapAnalysisMicroFunction:
+     def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+         resume_data = data.get("resume_data", {})
+         enriched_data = data.get("enriched", {})
+
+         if not resume_data or "error" in resume_data:
+             return {**data, "gap_analysis": {"error": "No resume data available"}}
+
+         if not enriched_data or enriched_data.get("error"):
+             return {**data, "gap_analysis": {"error": "No job data available"}}
+
+         try:
+             # Perform comprehensive gap analysis
+             gap_analysis = self._analyze_gaps(resume_data, enriched_data)
+
+             log_metric("gap_analysis_success", {
+                 "match_score": gap_analysis.get("match_score", 0),
+                 "strong_matches": len(gap_analysis.get("strong_matches", [])),
+                 "gaps": len(gap_analysis.get("gaps", []))
+             })
+
+             return {**data, "gap_analysis": gap_analysis}
+
+         except Exception as e:
+             log_metric("gap_analysis_error", {"error": str(e)})
+             return {**data, "gap_analysis": {"error": f"Gap analysis failed: {e}"}}
+
+     def _analyze_gaps(self, resume_data: Dict[str, Any], job_data: Dict[str, Any]) -> Dict[str, Any]:
+         """Perform detailed gap analysis between resume and job requirements."""
+         # Extract skills from resume
+         resume_skills = self._extract_resume_skills(resume_data)
+
+         # Extract requirements from job
+         job_requirements = self._extract_job_requirements(job_data)
+
+         # Perform skill matching
+         strong_matches = []
+         partial_matches = []
+         gaps = []
+
+         for req in job_requirements:
+             req_lower = req.lower()
+             match_type = self._find_skill_match(req_lower, resume_skills)
+
+             if match_type == "strong":
+                 strong_matches.append(req)
+             elif match_type == "partial":
+                 partial_matches.append(req)
+             else:
+                 gaps.append(req)
+
+         # Calculate match score (0-100)
+         total_requirements = len(job_requirements)
+         if total_requirements == 0:
+             match_score = 50  # Default if no requirements found
+         else:
+             strong_weight = 1.0
+             partial_weight = 0.5
+             score = (len(strong_matches) * strong_weight + len(partial_matches) * partial_weight) / total_requirements * 100
+             match_score = min(100, max(0, round(score)))
+
+         # Generate narrative summary
+         summary = self._generate_summary(strong_matches, partial_matches, gaps, match_score)
+
+         # Create skills map for visualization
+         skills_map = self._create_skills_map(strong_matches, partial_matches, gaps)
+
+         return {
+             "match_score": match_score,
+             "strong_matches": strong_matches,
+             "partial_matches": partial_matches,
+             "gaps": gaps,
+             "summary": summary,
+             "skills_map": skills_map,
+             "resume_skills_count": len(resume_skills),
+             "job_requirements_count": total_requirements
+         }
+
+     def _extract_resume_skills(self, resume_data: Dict[str, Any]) -> Set[str]:
+         """Extract all skills from resume data."""
+         skills = set()
+
+         # Technical skills
+         skills_section = resume_data.get("skills", {})
+         if isinstance(skills_section, dict):
+             for skill_category in skills_section.values():
+                 if isinstance(skill_category, list):
+                     skills.update([skill.lower() for skill in skill_category])
+
+         # Skills from experience
+         experience = resume_data.get("experience", [])
+         for exp in experience:
+             if isinstance(exp, dict):
+                 technologies = exp.get("technologies", [])
+                 if isinstance(technologies, list):
+                     skills.update([tech.lower() for tech in technologies])
+
+         # Skills from projects
+         projects = resume_data.get("projects", [])
+         for proj in projects:
+             if isinstance(proj, dict):
+                 technologies = proj.get("technologies", [])
+                 if isinstance(technologies, list):
+                     skills.update([tech.lower() for tech in technologies])
+
+         return skills
+
+     def _extract_job_requirements(self, job_data: Dict[str, Any]) -> List[str]:
+         """Extract requirements from job data."""
+         requirements = []
+
+         # From requirements field
+         job_reqs = job_data.get("requirements", [])
+         if isinstance(job_reqs, list):
+             requirements.extend(job_reqs)
+         elif isinstance(job_reqs, str):
+             # Split by common delimiters
+             requirements.extend(re.split(r'[,;\n•\-]', job_reqs))
+
+         # From tech stack
+         tech_stack = job_data.get("tech_stack", [])
+         if isinstance(tech_stack, list):
+             requirements.extend(tech_stack)
+         elif isinstance(tech_stack, str):
+             requirements.extend(re.split(r'[,;\n•\-]', tech_stack))
+
+         # From responsibilities (extract technical terms)
+         responsibilities = job_data.get("responsibilities", [])
+         if isinstance(responsibilities, list):
+             for resp in responsibilities:
+                 if isinstance(resp, str):
+                     tech_terms = self._extract_tech_terms(resp)
+                     requirements.extend(tech_terms)
+
+         # Clean and deduplicate
+         cleaned_requirements = []
+         for req in requirements:
+             if isinstance(req, str):
+                 cleaned = req.strip().strip('•-').strip()
+                 if cleaned and len(cleaned) > 2:
+                     cleaned_requirements.append(cleaned)
+
+         return list(set(cleaned_requirements))
+
+     def _extract_tech_terms(self, text: str) -> List[str]:
+         """Extract technical terms from text."""
+         # Common tech terms and patterns
+         tech_patterns = [
+             r'\b(Python|JavaScript|Java|C\+\+|C#|Ruby|Go|Rust|Swift|Kotlin)\b',
+             r'\b(React|Angular|Vue|Django|Flask|Spring|Rails|Laravel)\b',
+             r'\b(AWS|Azure|GCP|Docker|Kubernetes|Git|SQL|NoSQL)\b',
+             r'\b(Machine Learning|ML|AI|Deep Learning|TensorFlow|PyTorch)\b',
+             r'\b(Data Science|Analytics|Statistics|Pandas|NumPy)\b',
+             r'\b(API|REST|GraphQL|Microservices|DevOps|CI/CD)\b'
+         ]
+
+         terms = []
+         for pattern in tech_patterns:
+             matches = re.findall(pattern, text, re.IGNORECASE)
+             terms.extend([match.lower() for match in matches])
+
+         return terms
+
+     def _find_skill_match(self, requirement: str, resume_skills: Set[str]) -> str:
+         """Find the type of match between a requirement and resume skills."""
+         req_clean = requirement.lower().strip()
+
+         # Strong match: exact match or very close
+         if req_clean in resume_skills:
+             return "strong"
+
+         # Check for partial matches
+         for skill in resume_skills:
+             # Substring match (both directions)
+             if (req_clean in skill and len(req_clean) > 2) or (skill in req_clean and len(skill) > 2):
+                 return "partial"
+
+             # Similar technologies (e.g., React/ReactJS, Python/Python3)
+             if self._are_similar_technologies(req_clean, skill):
+                 return "strong"
+
+         return "none"
+
+     def _are_similar_technologies(self, tech1: str, tech2: str) -> bool:
+         """Check if two technologies are similar/related."""
+         similar_groups = [
+             ["python", "python3", "python2"],
+             ["javascript", "js", "node.js", "nodejs"],
+             ["react", "reactjs", "react.js"],
+             ["angular", "angularjs"],
+             ["vue", "vue.js", "vuejs"],
+             ["docker", "containerization"],
+             ["kubernetes", "k8s"],
+             ["aws", "amazon web services"],
+             ["gcp", "google cloud platform", "google cloud"],
+             ["azure", "microsoft azure"],
+             ["sql", "mysql", "postgresql", "postgres"],
+             ["nosql", "mongodb", "cassandra"],
+             ["machine learning", "ml", "artificial intelligence", "ai"],
+             ["tensorflow", "tf"],
+             ["pytorch", "torch"]
+         ]
+
+         for group in similar_groups:
+             if tech1 in group and tech2 in group:
+                 return True
+
+         return False
+
+     def _generate_summary(self, strong_matches: List[str], partial_matches: List[str],
+                           gaps: List[str], match_score: int) -> str:
+         """Generate a narrative summary of the gap analysis."""
+         summary_parts = []
+
+         # Overall assessment
+         if match_score >= 80:
+             summary_parts.append(f"Excellent match ({match_score}% compatibility)!")
+         elif match_score >= 60:
+             summary_parts.append(f"Good match ({match_score}% compatibility) with some areas for growth.")
+         elif match_score >= 40:
+             summary_parts.append(f"Moderate match ({match_score}% compatibility) requiring focused preparation.")
+         else:
+             summary_parts.append(f"Challenging match ({match_score}% compatibility) needing significant upskilling.")
+
+         # Strengths
+         if strong_matches:
+             top_strengths = strong_matches[:3]
+             summary_parts.append(f"Your strongest assets are {', '.join(top_strengths)}.")
+
+         # Gaps to address
+         if gaps:
+             priority_gaps = gaps[:3]
+             summary_parts.append(f"Focus your preparation on {', '.join(priority_gaps)}.")
+
+         return " ".join(summary_parts)
+
+     def _create_skills_map(self, strong_matches: List[str], partial_matches: List[str],
+                            gaps: List[str]) -> Dict[str, List[str]]:
+         """Create a skills map for visualization."""
+         return {
+             "strong": strong_matches[:10],  # Limit for display
+             "partial": partial_matches[:10],
+             "gaps": gaps[:10]
+         }
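The weighted score in `_analyze_gaps` reduces to a single formula: strong matches count fully, partial matches count half, and the result is clamped to 0–100 (with a neutral 50 when no requirements were extracted). A hypothetical standalone version, for illustration only:

```python
def match_score(strong: int, partial: int, total: int) -> int:
    """Score = (strong + 0.5 * partial) / total * 100, clamped to [0, 100]."""
    if total == 0:
        return 50  # neutral default when no requirements were extracted
    score = (strong * 1.0 + partial * 0.5) / total * 100
    return min(100, max(0, round(score)))

print(match_score(4, 2, 8))   # 62
print(match_score(0, 0, 0))   # 50
```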
micro/guide_render.py ADDED
@@ -0,0 +1,301 @@
+ from typing import Any, Dict, List, Optional
+ from metrics import log_metric
+
+ class GuideRenderMicroFunction:
+     def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+         interview_guide = data.get("interview_guide", {})
+
+         if not interview_guide or "error" in interview_guide:
+             return {**data, "rendered_guide": "# Interview Guide Generation Failed\n\nPlease try again with valid resume and job data."}
+
+         try:
+             # Render comprehensive markdown guide
+             rendered_guide = self._render_interview_guide(interview_guide)
+
+             log_metric("guide_render_success", {
+                 "total_length": len(rendered_guide),
+                 "sections_count": len(interview_guide.keys())
+             })
+
+             return {**data, "rendered_guide": rendered_guide}
+
+         except Exception as e:
+             log_metric("guide_render_error", {"error": str(e)})
+             fallback = self._create_fallback_guide(interview_guide)
+             return {**data, "rendered_guide": fallback}
+
+     def _render_interview_guide(self, guide: Dict[str, Any]) -> str:
+         """Render the complete interview guide in Interview Query style."""
+         metadata = guide.get("metadata", {})
+         role = metadata.get("role", "Unknown Role")
+         company = metadata.get("company", "Unknown Company")
+         match_score = metadata.get("match_score", 0)
+
+         # Build comprehensive guide
+         sections = []
+
+         # Header
+         sections.append(self._render_header(role, company, match_score))
+
+         # Introduction
+         sections.append(self._render_introduction(guide.get("introduction", "")))
+
+         # Skills analysis with visual
+         sections.append(self._render_skills_analysis(guide.get("skills_analysis", {})))
+
+         # Interview process
+         sections.append(self._render_interview_process(company, guide.get("interview_process", "")))
+
+         # Question sections
+         sections.append(self._render_questions(guide.get("questions", {}), role))
+
+         # Preparation strategy
+         sections.append(self._render_preparation_tips(guide.get("preparation_tips", {})))
+
+         # Talking points
+         sections.append(self._render_talking_points(guide.get("talking_points", [])))
+
+         # Smart questions to ask
+         sections.append(self._render_smart_questions(guide.get("smart_questions", [])))
+
+         # Conclusion with resources
+         sections.append(self._render_conclusion(guide.get("conclusion", {})))
+
+         return "\n\n".join(sections)
+
+     def _render_header(self, role: str, company: str, match_score: int) -> str:
+         """Render header with match score indicator."""
+         if match_score >= 80:
+             score_indicator = "🟢 Excellent Match"
+         elif match_score >= 60:
+             score_indicator = "🟡 Good Match"
+         elif match_score >= 40:
+             score_indicator = "🟠 Moderate Match"
+         else:
+             score_indicator = "🔴 Challenging Match"
+
+         return f"""# 🎯 Personalized Interview Guide: {role} at {company}
+
+ **Match Score**: {score_indicator} ({match_score}%)
+
+ ---"""
+
+     def _render_introduction(self, introduction: str) -> str:
+         """Render introduction section."""
+         return f"""## Introduction
+
+ {introduction}"""
+
+     def _render_skills_analysis(self, skills_analysis: Dict[str, Any]) -> str:
+         """Render skills analysis with a text-based chart."""
+         summary = skills_analysis.get("summary", "")
+         skills_breakdown = skills_analysis.get("skills_breakdown", {})
+         chart_data = skills_analysis.get("chart_data", {})
+
+         # Create text-based bar chart
+         strong_count = chart_data.get("strong_matches", 0)
+         partial_count = chart_data.get("partial_matches", 0)
+         gaps_count = chart_data.get("gaps", 0)
+         total = strong_count + partial_count + gaps_count
+
+         if total > 0:
+             strong_bar = "█" * min(20, int((strong_count / total) * 20))
+             partial_bar = "▒" * min(20, int((partial_count / total) * 20))
+             gaps_bar = "░" * min(20, int((gaps_count / total) * 20))
+         else:
+             strong_bar = partial_bar = gaps_bar = ""
+
+         chart = f"""
+ ### Skills Match Analysis
+
+ **Overall Assessment**: {summary}
+
+ #### Skills Breakdown
+ ```
+ Strong Matches  {strong_bar} {strong_count}
+ Partial Matches {partial_bar} {partial_count}
+ Skill Gaps      {gaps_bar} {gaps_count}
+ ```
+ """
+
+         # Add detailed breakdowns
+         if skills_breakdown:
+             if skills_breakdown.get("strong"):
+                 chart += f"\n**✅ Your Strengths**: {', '.join(skills_breakdown['strong'][:5])}"
+
+             if skills_breakdown.get("partial"):
+                 chart += f"\n\n**⚡ Areas to Highlight**: {', '.join(skills_breakdown['partial'][:5])}"
+
+             if skills_breakdown.get("gaps"):
+                 chart += f"\n\n**📚 Priority Learning**: {', '.join(skills_breakdown['gaps'][:5])}"
+
+         return chart
+
+     def _render_interview_process(self, company: str, process_content: str) -> str:
+         """Render interview process section."""
+         return f"""## What Is the Interview Process Like at {company}?
+
+ {process_content}"""
+
+     def _render_questions(self, questions: Dict[str, List[Dict]], role: str) -> str:
+         """Render all question sections."""
+         sections = []
+
+         # Technical questions
+         if questions.get("technical"):
+             sections.append(self._render_question_section(
+                 "Technical & Problem-Solving Questions",
+                 questions["technical"],
+                 f"These questions test your technical knowledge for the {role} role. Focus on demonstrating both your understanding and your problem-solving approach."
+             ))
+
+         # Behavioral questions
+         if questions.get("behavioral"):
+             sections.append(self._render_question_section(
+                 "Behavioral & Experience Questions",
+                 questions["behavioral"],
+                 "Use the STAR method (Situation, Task, Action, Result) to structure your responses. Draw from specific examples in your background."
+             ))
+
+         # Company questions
+         if questions.get("company"):
+             sections.append(self._render_question_section(
+                 "Company & Culture Questions",
+                 questions["company"],
+                 "These questions assess your interest in the company and cultural fit. Research thoroughly and be genuine in your responses."
+             ))
+
+         return "\n\n".join(sections)
+
+     def _render_question_section(self, title: str, questions: List[Dict], intro: str) -> str:
+         """Render an individual question section."""
+         section = f"""## {title}
+
+ {intro}
+
+ """
+
+         for i, q in enumerate(questions, 1):
+             question_text = q.get("question", "")
+             approach = q.get("approach", "")
+             difficulty = q.get("difficulty", "")
+
+             # Add difficulty indicator
+             if difficulty == "advanced":
+                 diff_icon = "🔴"
+             elif difficulty == "intermediate":
+                 diff_icon = "🟡"
+             else:
+                 diff_icon = "🟢"
+
+             section += f"""**{i}. {question_text}** {diff_icon}
+
+ {approach}
+
+ """
+
+         return section.strip()
+
+     def _render_preparation_tips(self, tips: Dict[str, List[str]]) -> str:
+         """Render preparation tips section."""
+         section = "## Preparation Strategy\n\n"
+
+         # Priority areas (gaps)
+         if tips.get("priority_areas"):
+             section += "### 🎯 Priority Focus Areas\n\n"
+             for tip in tips["priority_areas"]:
+                 section += f"- {tip}\n"
+             section += "\n"
+
+         # Leverage strengths
+         if tips.get("leverage_strengths"):
+             section += "### 💪 Leverage Your Strengths\n\n"
+             for tip in tips["leverage_strengths"]:
+                 section += f"- {tip}\n"
+             section += "\n"
+
+         # General tips
+         if tips.get("general"):
+             section += "### 📋 General Interview Tips\n\n"
+             for tip in tips["general"]:
+                 section += f"- {tip}\n"
+             section += "\n"
+
+         return section.strip()
+
+     def _render_talking_points(self, talking_points: List[str]) -> str:
+         """Render talking points section."""
+         if not talking_points:
+             return ""
+
+         section = "## Key Talking Points\n\n"
+         section += "Highlight these specific achievements and experiences during your interview:\n\n"
+
+         for point in talking_points:
+             section += f"- {point}\n"
+
+         return section
+
+     def _render_smart_questions(self, questions: List[str]) -> str:
+         """Render smart questions section."""
+         if not questions:
+             return ""
+
+         section = "## Smart Questions to Ask\n\n"
+         section += "End your interview strong by asking thoughtful questions:\n\n"
+
+         for question in questions:
+             section += f"- {question}\n"
+
+         return section
+
+     def _render_conclusion(self, conclusion: Dict[str, str]) -> str:
+         """Render conclusion with resources."""
+         summary = conclusion.get("summary", "")
+
+         return f"""## Conclusion
+
+ {summary}
+
+ ### Additional Resources
+
+ - [Success Story]({conclusion.get('success_story_link', '#')})
+ - [Learning Resources]({conclusion.get('learning_resource_link', '#')})
+ - [Practice Questions]({conclusion.get('questions_practice_link', '#')})
+
+ ---
+
+ **Good luck with your interview!** 🚀
+
+ *This personalized guide was generated based on your specific background and the target role requirements.*"""
+
+     def _create_fallback_guide(self, guide: Dict[str, Any]) -> str:
+         """Create a fallback guide if rendering fails."""
+         return f"""# Personalized Interview Guide
+
+ ## Summary
+ This guide was generated to help you prepare for your interview.
+
+ ## Content
+ {str(guide)}
+
+ ---
+ *Please try regenerating for better formatting.*"""
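The text chart in `_render_skills_analysis` scales each category to a share of a 20-character bar. The same idea as a small standalone helper (function and label names here are illustrative, not part of the module):

```python
def skills_chart(strong: int, partial: int, gaps: int) -> str:
    """Render the three counts as proportional 20-char text bars."""
    total = strong + partial + gaps
    if total == 0:
        return "(no skills data)"

    def bar(ch: str, n: int) -> str:
        # Same scaling as the diff: fraction of total mapped onto 20 chars
        return ch * min(20, int((n / total) * 20))

    return (f"Strong Matches  {bar('█', strong)} {strong}\n"
            f"Partial Matches {bar('▒', partial)} {partial}\n"
            f"Skill Gaps      {bar('░', gaps)} {gaps}")

print(skills_chart(6, 3, 1))
```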
micro/interview_guide.py ADDED
@@ -0,0 +1,408 @@
+ from typing import Any, Dict, List, Optional
+ from llm_client import llm_client
+ from prompt_loader import prompt_loader
+ from metrics import log_metric
+ import json
+
+ class InterviewGuideMicroFunction:
+     def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+         resume_data = data.get("resume_data", {})
+         enriched_data = data.get("enriched", {})
+         gap_analysis = data.get("gap_analysis", {})
+
+         if not resume_data or "error" in resume_data:
+             return {**data, "interview_guide": {"error": "No resume data available"}}
+
+         if not enriched_data or enriched_data.get("error"):
+             return {**data, "interview_guide": {"error": "No job data available"}}
+
+         if not gap_analysis or "error" in gap_analysis:
+             return {**data, "interview_guide": {"error": "No gap analysis available"}}
+
+         try:
+             # Generate personalized interview guide
+             guide = self._generate_interview_guide(resume_data, enriched_data, gap_analysis)
+
+             log_metric("interview_guide_success", {
+                 "sections_count": len(guide.get("sections", {})),
+                 "questions_count": sum(len(q) for q in guide.get("questions", {}).values()),
+                 "match_score": gap_analysis.get("match_score", 0)
+             })
+
+             return {**data, "interview_guide": guide}
+
+         except Exception as e:
+             log_metric("interview_guide_error", {"error": str(e)})
+             return {**data, "interview_guide": {"error": f"Interview guide generation failed: {e}"}}
+
+     def _generate_interview_guide(self, resume_data: Dict[str, Any],
+                                   job_data: Dict[str, Any],
+                                   gap_analysis: Dict[str, Any]) -> Dict[str, Any]:
+         """Generate a comprehensive personalized interview guide."""
+         # Extract key information
+         role = job_data.get("role", "Unknown Role")
+         company = job_data.get("company", "Unknown Company")
+         match_score = gap_analysis.get("match_score", 0)
+         strong_matches = gap_analysis.get("strong_matches", [])
+         gaps = gap_analysis.get("gaps", [])
+
+         # Generate introduction
+         introduction = self._generate_introduction(role, company, resume_data, gap_analysis)
+
+         # Generate skills analysis section
+         skills_analysis = self._generate_skills_analysis(gap_analysis)
+
+         # Generate interview process section
+         interview_process = self._generate_interview_process(company, role)
+
+         # Generate question sections
+         questions = self._generate_question_sections(role, strong_matches, gaps, resume_data)
+
+         # Generate preparation tips
+         preparation_tips = self._generate_preparation_tips(gaps, strong_matches, match_score)
+
+         # Generate talking points
+         talking_points = self._generate_talking_points(resume_data, strong_matches)
+
+         # Generate smart questions to ask
+         smart_questions = self._generate_smart_questions(company, role, job_data)
+
+         # Generate conclusion with resources
+         conclusion = self._generate_conclusion(role, company, gaps)
+
+         return {
+             "introduction": introduction,
+             "skills_analysis": skills_analysis,
+             "interview_process": interview_process,
+             "questions": questions,
+             "preparation_tips": preparation_tips,
+             "talking_points": talking_points,
+             "smart_questions": smart_questions,
+             "conclusion": conclusion,
+             "metadata": {
+                 "role": role,
+                 "company": company,
+                 "match_score": match_score,
+                 "generated_at": self._get_timestamp()
+             }
+         }
+
+     def _generate_introduction(self, role: str, company: str,
+                                resume_data: Dict[str, Any],
+                                gap_analysis: Dict[str, Any]) -> str:
+         """Generate a personalized introduction."""
+         match_score = gap_analysis.get("match_score", 0)
+         summary = gap_analysis.get("summary", "")
+
+         # Extract the user's background
+         experience = resume_data.get("experience", [])
+         years_exp = len(experience)
+
+         recent_role = ""
+         if experience:
+             recent_role = experience[0].get("title", "") if isinstance(experience[0], dict) else ""
+
+         prompt = f"""
+ Write a personalized interview guide introduction for:
+ - Target Role: {role} at {company}
+ - Candidate Background: {recent_role} with {years_exp} roles
+ - Match Score: {match_score}%
+ - Gap Summary: {summary}
+
+ Use a confident, mentor-like tone. Start with the primary keyword "{role} interview" in the first 100 words.
+ Address the candidate directly ("you") and reference their specific background.
+ Keep it ≤150 words, 3 sentences max per paragraph.
+
+ Focus on:
+ 1. What makes this role exciting for someone with their background
+ 2. Their competitive advantages
+ 3. What this guide will help them achieve
+ """
+
+         return llm_client.call_llm(prompt)
+
+     def _generate_skills_analysis(self, gap_analysis: Dict[str, Any]) -> Dict[str, Any]:
+         """Generate the visual skills analysis section."""
+         skills_map = gap_analysis.get("skills_map", {})
+         match_score = gap_analysis.get("match_score", 0)
+         summary = gap_analysis.get("summary", "")
+
+         # Create bar chart data for visualization
+         chart_data = {
+             "strong_matches": len(skills_map.get("strong", [])),
+             "partial_matches": len(skills_map.get("partial", [])),
+             "gaps": len(skills_map.get("gaps", []))
+         }
+
+         return {
+             "match_score": match_score,
+             "summary": summary,
+             "skills_breakdown": skills_map,
+             "chart_data": chart_data
+         }
+
+     def _generate_interview_process(self, company: str, role: str) -> str:
+         """Generate the interview process section."""
+         prompt = f"""
+ Describe the typical interview process for a {role} position at {company}.
+ If you don't know the specific company process, describe the general process for this role type.
+
+ Include:
+ 1. Number of rounds typically
+ 2. Types of interviews (phone, technical, behavioral, onsite)
+ 3. Who you'll likely meet with
+ 4. Timeline expectations
+ 5. Any company-specific details if known
+
+ Use markdown formatting with headers. Keep it conversational and actionable.
+ Max 200 words.
+ """
+
+         return llm_client.call_llm(prompt)
+
+     def _generate_question_sections(self, role: str, strong_matches: List[str],
+                                     gaps: List[str], resume_data: Dict[str, Any]) -> Dict[str, List[Dict]]:
+         """Generate categorized interview questions with personalized advice."""
+         questions = {}
+
+         # Technical questions (prioritize gaps)
+         technical_questions = self._generate_technical_questions(role, gaps, strong_matches)
+         if technical_questions:
+             questions["technical"] = technical_questions
+
+         # Behavioral questions
+         behavioral_questions = self._generate_behavioral_questions(role, resume_data)
+         if behavioral_questions:
181
+ questions["behavioral"] = behavioral_questions
182
+
183
+ # Company-specific questions
184
+ company_questions = self._generate_company_questions(role)
185
+ if company_questions:
186
+ questions["company"] = company_questions
187
+
188
+ return questions
189
+
190
+ def _generate_technical_questions(self, role: str, gaps: List[str],
191
+ strong_matches: List[str]) -> List[Dict[str, str]]:
192
+ """Generate technical questions with personalized approach guidance"""
193
+
194
+ # Prioritize gap areas for question focus
195
+ focus_areas = gaps[:3] if gaps else strong_matches[:3]
196
+
197
+ prompt = f"""
198
+ Generate 5 technical interview questions for a {role} position.
199
+ Focus areas based on candidate needs: {', '.join(focus_areas)}
200
+
201
+ For each question, provide:
202
+ 1. The question
203
+ 2. A 2-3 sentence approach tailored to someone who needs to strengthen {focus_areas[0] if focus_areas else 'general skills'}
204
+
205
+ Format as JSON array:
206
+ [
207
+ {{
208
+ "question": "Question text",
209
+ "approach": "Personalized approach advice",
210
+ "difficulty": "beginner|intermediate|advanced"
211
+ }}
212
+ ]
213
+
214
+ Focus on practical, real-world questions that test understanding.
215
+ """
216
+
217
+ try:
218
+ response = llm_client.call_llm(prompt)
219
+ # Parse JSON response
220
+ import json
221
+ from text_extractor import robust_json_parse
222
+ questions_data = robust_json_parse(response)
223
+ if isinstance(questions_data, list):
224
+ return questions_data
225
+ except:
226
+ pass
227
+
228
+ # Fallback questions
229
+ return [
230
+ {
231
+ "question": f"How would you approach solving a {role.lower()} problem?",
232
+ "approach": "Focus on your systematic problem-solving process and mention relevant experience.",
233
+ "difficulty": "intermediate"
234
+ }
235
+ ]
236
+
237
+ def _generate_behavioral_questions(self, role: str, resume_data: Dict[str, Any]) -> List[Dict[str, str]]:
238
+ """Generate behavioral questions with personalized advice"""
239
+
240
+ # Extract key experiences for STAR method guidance
241
+ experience = resume_data.get("experience", [])
242
+ recent_achievements = []
243
+
244
+ for exp in experience[:2]: # Focus on recent experience
245
+ if isinstance(exp, dict):
246
+ achievements = exp.get("achievements", [])
247
+ if achievements:
248
+ recent_achievements.extend(achievements[:2])
249
+
250
+ prompt = f"""
251
+ Generate 5 behavioral interview questions for a {role} position.
252
+ Candidate's recent achievements: {', '.join(recent_achievements[:3])}
253
+
254
+ For each question, provide specific STAR method guidance using their background.
255
+
256
+ Format as JSON array:
257
+ [
258
+ {{
259
+ "question": "Question text",
260
+ "approach": "STAR method guidance referencing their specific experience",
261
+ "difficulty": "standard"
262
+ }}
263
+ ]
264
+
265
+ Focus on leadership, problem-solving, teamwork, and role-specific scenarios.
266
+ """
267
+
268
+ try:
269
+ response = llm_client.call_llm(prompt)
270
+ from text_extractor import robust_json_parse
271
+ questions_data = robust_json_parse(response)
272
+ if isinstance(questions_data, list):
273
+ return questions_data
274
+ except:
275
+ pass
276
+
277
+ # Fallback questions
278
+ return [
279
+ {
280
+ "question": "Tell me about a challenging project you worked on.",
281
+ "approach": "Use STAR method: Situation, Task, Action, Result. Draw from your recent experience.",
282
+ "difficulty": "standard"
283
+ }
284
+ ]
285
+
286
+ def _generate_company_questions(self, role: str) -> List[Dict[str, str]]:
287
+ """Generate company-specific questions"""
288
+
289
+ return [
290
+ {
291
+ "question": "Why are you interested in this role?",
292
+ "approach": "Connect your career goals with the company's mission and this specific role's impact.",
293
+ "difficulty": "standard"
294
+ },
295
+ {
296
+ "question": "What do you know about our company?",
297
+ "approach": "Research their recent news, mission, and values. Show genuine interest in their work.",
298
+ "difficulty": "standard"
299
+ }
300
+ ]
301
+
302
+ def _generate_preparation_tips(self, gaps: List[str], strong_matches: List[str],
303
+ match_score: int) -> Dict[str, List[str]]:
304
+ """Generate personalized preparation tips"""
305
+
306
+ tips = {}
307
+
308
+ # Tips for gap areas (priority)
309
+ if gaps:
310
+ gap_tips = []
311
+ for gap in gaps[:3]:
312
+ gap_tips.append(f"Study {gap} fundamentals - focus on practical applications")
313
+ gap_tips.append(f"Find online tutorials or courses for {gap}")
314
+ gap_tips.append(f"Practice explaining {gap} concepts in simple terms")
315
+ tips["priority_areas"] = gap_tips[:5]
316
+
317
+ # Tips for strength areas
318
+ if strong_matches:
319
+ strength_tips = []
320
+ for strength in strong_matches[:3]:
321
+ strength_tips.append(f"Prepare advanced examples showcasing your {strength} expertise")
322
+ strength_tips.append(f"Think of specific metrics/results from {strength} projects")
323
+ tips["leverage_strengths"] = strength_tips[:5]
324
+
325
+ # General tips based on match score
326
+ if match_score < 60:
327
+ tips["general"] = [
328
+ "Focus heavily on demonstrating learning ability and enthusiasm",
329
+ "Prepare questions that show your eagerness to grow",
330
+ "Research the company thoroughly to show genuine interest"
331
+ ]
332
+ else:
333
+ tips["general"] = [
334
+ "Practice articulating your experience clearly and confidently",
335
+ "Prepare specific examples that align with job requirements",
336
+ "Focus on cultural fit and long-term career alignment"
337
+ ]
338
+
339
+ return tips
340
+
341
+ def _generate_talking_points(self, resume_data: Dict[str, Any],
342
+ strong_matches: List[str]) -> List[str]:
343
+ """Generate specific talking points based on resume"""
344
+
345
+ talking_points = []
346
+
347
+ # From recent experience
348
+ experience = resume_data.get("experience", [])
349
+ if experience:
350
+ recent_exp = experience[0]
351
+ if isinstance(recent_exp, dict):
352
+ achievements = recent_exp.get("achievements", [])
353
+ talking_points.extend(achievements[:2])
354
+
355
+ # From projects
356
+ projects = resume_data.get("projects", [])
357
+ for project in projects[:2]:
358
+ if isinstance(project, dict):
359
+ name = project.get("name", "")
360
+ description = project.get("description", "")
361
+ if name and description:
362
+ talking_points.append(f"{name}: {description}")
363
+
364
+ # From strong matches
365
+ for match in strong_matches[:3]:
366
+ talking_points.append(f"Deep experience with {match} from multiple projects")
367
+
368
+ return talking_points[:6]
369
+
370
+ def _generate_smart_questions(self, company: str, role: str,
371
+ job_data: Dict[str, Any]) -> List[str]:
372
+ """Generate thoughtful questions for the candidate to ask"""
373
+
374
+ questions = [
375
+ f"What does success look like for someone in this {role} role after 6 months?",
376
+ f"What are the biggest challenges facing the team/company right now?",
377
+ "What opportunities for growth and learning does this role offer?",
378
+ "How does this role contribute to the company's strategic goals?",
379
+ "What do you enjoy most about working at this company?"
380
+ ]
381
+
382
+ # Add role-specific questions
383
+ if "engineer" in role.lower():
384
+ questions.append("What's the team's approach to code reviews and technical debt?")
385
+ questions.append("How do you balance feature development with technical improvements?")
386
+ elif "data" in role.lower():
387
+ questions.append("What data infrastructure and tools does the team use?")
388
+ questions.append("How do you ensure data quality and reliability?")
389
+
390
+ return questions[:7]
391
+
392
+ def _generate_conclusion(self, role: str, company: str, gaps: List[str]) -> Dict[str, str]:
393
+ """Generate conclusion with resource links"""
394
+
395
+ # Focus on top gap for learning resource
396
+ primary_gap = gaps[0] if gaps else "general interview skills"
397
+
398
+ return {
399
+ "summary": f"This personalized guide gives you a strategic advantage for your {role} interview at {company}. Focus your preparation on the priority areas identified, leverage your strengths, and demonstrate your learning mindset.",
400
+ "success_story_link": f"Read about someone who successfully landed a {role} role",
401
+ "learning_resource_link": f"Top {primary_gap} learning resources for interview prep",
402
+ "questions_practice_link": f"Practice {role} interview questions"
403
+ }
404
+
405
+ def _get_timestamp(self) -> str:
406
+ """Get current timestamp"""
407
+ import datetime
408
+ return datetime.datetime.now().isoformat()
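Each question generator above follows the same defensive shape: ask the LLM for a JSON array, and fall back to a hard-coded question when parsing fails so the guide never renders empty. A minimal standalone sketch of that shape (`questions_or_fallback` and the `parse` callable are illustrative stand-ins for the module's `robust_json_parse` path, not part of the codebase):

```python
import json
from typing import Callable, Dict, List

# Hard-coded safety net, mirroring the fallback questions in the generators above
FALLBACK: List[Dict[str, str]] = [{
    "question": "Tell me about a challenging project you worked on.",
    "approach": "Use the STAR method: Situation, Task, Action, Result.",
    "difficulty": "standard",
}]

def questions_or_fallback(raw: str, parse: Callable[[str], object]) -> List[Dict[str, str]]:
    """Parse LLM output into a non-empty question list; return FALLBACK on any failure."""
    try:
        data = parse(raw)
        if isinstance(data, list) and data:
            return data
    except Exception:
        pass  # malformed model output -> fall through to the fallback
    return FALLBACK

# Well-formed model output is passed through unchanged
print(questions_or_fallback('[{"question": "Q1", "approach": "A1", "difficulty": "easy"}]', json.loads))
# Prose instead of JSON degrades gracefully to the canned question
print(questions_or_fallback("Sorry, I cannot answer that.", json.loads))
```

The important property is that a parse failure is indistinguishable from an empty list to the caller: both yield a usable (if generic) question set.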
micro/patch_missing.py ADDED
@@ -0,0 +1,169 @@
+import re
+import os
+from typing import Optional
+
+from text_extractor import JobCore
+from llm_client import google_search
+from metrics import log_metric
+
+
+def patch_missing(core: JobCore) -> JobCore:
+    """Patch missing fields in a JobCore using Google search."""
+
+    # Check whether Google patching is enabled
+    if os.getenv("GOOGLE_PATCH_ENABLED", "true").lower() not in ["true", "1", "yes"]:
+        return core
+
+    # Only patch if we have basic company info
+    if not core.company:
+        return core
+
+    patches_applied = 0
+
+    # Patch salary if missing
+    if not core.salary_low and not core.salary_high:
+        salary_info = _patch_salary(core.company, core.role)
+        if salary_info:
+            core.salary_low, core.salary_high = salary_info
+            core.source_map["salary"] = "google"
+            patches_applied += 1
+
+    # Patch funding if missing
+    if not core.funding:
+        funding_info = _patch_funding(core.company)
+        if funding_info:
+            core.funding = funding_info
+            core.source_map["funding"] = "google"
+            patches_applied += 1
+
+    # Patch mission if missing
+    if not core.mission:
+        mission_info = _patch_mission(core.company)
+        if mission_info:
+            core.mission = mission_info
+            core.source_map["mission"] = "google"
+            patches_applied += 1
+
+    # Patch location if missing
+    if not core.location:
+        location_info = _patch_location(core.company)
+        if location_info:
+            core.location = location_info
+            core.source_map["location"] = "google"
+            patches_applied += 1
+
+    log_metric("patch_missing", {
+        "company": core.company,
+        "patches_applied": patches_applied,
+        "source_map": core.source_map
+    })
+
+    return core
+
+
+def _patch_salary(company: str, role: str) -> Optional[tuple[int, int]]:
+    """Search for salary information and extract a (low, high) range."""
+    if not company or not role:
+        return None
+
+    query = f'"{company}" "{role}" salary range'
+    snippets = google_search(query, top=3, timeout=5)
+
+    for snippet in snippets:
+        # Normalize "$150,000" to "$150000" so the same two-group
+        # patterns cover both "$120k-$180k" and "$150,000-$200,000"
+        cleaned = snippet.replace(",", "")
+
+        salary_patterns = [
+            r'\$(\d+)k[-–]\$?(\d+)k',
+            r'\$(\d+)[-–]\$?(\d+)',
+            r'(\d+)k[-–](\d+)k\s*(?:per|/)?\s*year',
+        ]
+
+        for pattern in salary_patterns:
+            match = re.search(pattern, cleaned, re.IGNORECASE)
+            if match:
+                try:
+                    if 'k' in match.group(0).lower():
+                        low = int(match.group(1)) * 1000
+                        high = int(match.group(2)) * 1000
+                    else:
+                        low = int(match.group(1))
+                        high = int(match.group(2))
+
+                    # Sanity check: reasonable, correctly ordered salary range
+                    if 30000 <= low <= 500000 and 30000 <= high <= 500000 and low < high:
+                        return (low, high)
+                except (ValueError, IndexError):
+                    continue
+
+    return None
+
+
+def _patch_funding(company: str) -> Optional[str]:
+    """Search for funding information."""
+    if not company:
+        return None
+
+    query = f'"{company}" funding round raised'
+    snippets = google_search(query, top=3, timeout=5)
+
+    for snippet in snippets:
+        # Look for funding patterns
+        funding_patterns = [
+            r'raised \$(\d+(?:\.\d+)?[MB]?)',
+            r'Series [A-Z] \$(\d+(?:\.\d+)?[MB]?)',
+            r'\$(\d+(?:\.\d+)?[MB]?) (?:Series|round|funding)',
+            r'(\$\d+(?:\.\d+)?[MB]? (?:million|billion))',
+        ]
+
+        for pattern in funding_patterns:
+            match = re.search(pattern, snippet, re.IGNORECASE)
+            if match:
+                return match.group(0)[:50]  # Limit length
+
+    return None
+
+
+def _patch_mission(company: str) -> Optional[str]:
+    """Search for the company's mission/tagline."""
+    if not company:
+        return None
+
+    query = f'"{company}" company mission tagline about'
+    snippets = google_search(query, top=3, timeout=5)
+
+    for snippet in snippets:
+        # Look for mission-like sentences
+        sentences = re.split(r'[.!?]+', snippet)
+        for sentence in sentences:
+            sentence = sentence.strip()
+            # Keep sentences that describe what the company does
+            if (len(sentence) > 20 and len(sentence) < 200 and
+                    any(word in sentence.lower() for word in ['build', 'create', 'develop', 'provide', 'help', 'enable', 'platform'])):
+                return sentence
+
+    return None
+
+
+def _patch_location(company: str) -> Optional[str]:
+    """Search for the company's headquarters location."""
+    if not company:
+        return None
+
+    query = f'"{company}" headquarters location'
+    snippets = google_search(query, top=3, timeout=5)
+
+    for snippet in snippets:
+        # Look for location patterns
+        location_patterns = [
+            r'([A-Z][a-z]+,\s*[A-Z]{2})',  # City, State
+            r'([A-Z][a-z]+\s+[A-Z][a-z]+,\s*[A-Z]{2})',  # Two-word city, State
+            r'([A-Z][a-z]+,\s*[A-Z][a-z]+)',  # City, Country
+        ]
+
+        for pattern in location_patterns:
+            match = re.search(pattern, snippet)
+            if match:
+                location = match.group(1).strip()
+                # Sanity check for common US locations
+                if any(state in location for state in ['CA', 'NY', 'WA', 'TX', 'MA']):
+                    return location
+
+    return None
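The salary-extraction logic in `_patch_salary` can be exercised in isolation. The sketch below is a standalone approximation (not the module itself): it strips thousands separators first so a single two-group pattern covers both the `$120k-$180k` and `$150,000-$200,000` forms, then applies the same plausibility bounds:

```python
import re
from typing import Optional, Tuple

def extract_salary_range(snippet: str) -> Optional[Tuple[int, int]]:
    """Pull a (low, high) annual salary range out of a search snippet, or None."""
    cleaned = snippet.replace(",", "")  # "$150,000" -> "$150000"
    patterns = [
        r"\$(\d+)k[-–]\$?(\d+)k",  # "$120k-$180k"
        r"\$(\d+)[-–]\$?(\d+)",    # "$150000-$200000"
    ]
    for pattern in patterns:
        match = re.search(pattern, cleaned, re.IGNORECASE)
        if not match:
            continue
        low, high = int(match.group(1)), int(match.group(2))
        if "k" in match.group(0).lower():
            low, high = low * 1000, high * 1000
        # Sanity check: plausible annual salaries, correctly ordered
        if 30_000 <= low < high <= 500_000:
            return (low, high)
    return None

print(extract_salary_range("Pay at Acme: $120k-$180k depending on level"))   # → (120000, 180000)
print(extract_salary_range("Base salary $150,000-$200,000 per year"))        # → (150000, 200000)
print(extract_salary_range("no compensation details here"))                  # → None
```

Normalizing commas before matching avoids the multi-group capture problem that otherwise splits "$150,000" into "150" and "000".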
micro/personalized_interview_guide.py ADDED
@@ -0,0 +1,755 @@
+"""
+Personalized Interview Guide Generator
+Uses advanced gap analysis to create truly personalized interview preparation content
+"""
+
+from typing import Any, Dict, List, Optional
+from llm_client import LLMClient
+from metrics import log_metric
+import json
+from dataclasses import dataclass, asdict
+
+@dataclass
+class InterviewQuestion:
+    question: str
+    category: str  # technical, behavioral, company, situational
+    difficulty: str  # easy, medium, hard
+    why_asked: str  # Why this question is relevant for this candidate
+    approach_strategy: str  # How to approach answering
+    example_points: List[str]  # Specific points from the candidate's background to mention
+    follow_up_questions: List[str]  # Likely follow-up questions
+
+@dataclass
+class PersonalizedGuideSection:
+    title: str
+    content: str
+    why_important: str  # Why this section matters for this specific candidate
+    action_items: List[str]
+    time_to_complete: str
+
+@dataclass
+class PersonalizedInterviewGuide:
+    header: Dict[str, Any]  # Match score, company info, etc.
+    executive_summary: str
+    skills_analysis: Dict[str, Any]
+    interview_process: PersonalizedGuideSection
+    technical_questions: List[InterviewQuestion]
+    behavioral_questions: List[InterviewQuestion]
+    company_questions: List[InterviewQuestion]
+    preparation_strategy: PersonalizedGuideSection
+    talking_points: PersonalizedGuideSection
+    questions_to_ask: List[str]
+    day_of_preparation: PersonalizedGuideSection
+    success_metrics: List[str]
+
+class PersonalizedInterviewGuideGenerator:
+    """Generates truly personalized interview guides based on advanced gap analysis."""
+
+    def __init__(self):
+        self.llm_client = LLMClient()
+
+    async def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+        resume_data = data.get("resume_data_enhanced", {})
+        job_data = data.get("job_data_enhanced", {})
+        gap_analysis = data.get("gap_analysis_advanced", {})
+
+        if not all([resume_data, job_data, gap_analysis]):
+            return {**data, "personalized_guide": {"error": "Missing required data for personalized guide"}}
+
+        try:
+            # Generate the personalized interview guide
+            guide = await self._generate_personalized_guide(resume_data, job_data, gap_analysis)
+
+            log_metric("personalized_guide_success", {
+                "overall_match_score": gap_analysis.get("overall_match_score", 0),
+                "technical_questions": len(guide.technical_questions),
+                "behavioral_questions": len(guide.behavioral_questions),
+                "total_action_items": sum(len(section.action_items) for section in [
+                    guide.preparation_strategy, guide.talking_points, guide.day_of_preparation
+                ])
+            })
+
+            return {**data, "personalized_guide": asdict(guide)}
+
+        except Exception as e:
+            log_metric("personalized_guide_error", {"error": str(e)})
+            return {**data, "personalized_guide": {"error": f"Personalized guide generation failed: {e}"}}
+
+    async def _generate_personalized_guide(self, resume_data: Dict[str, Any],
+                                           job_data: Dict[str, Any],
+                                           gap_analysis: Dict[str, Any]) -> PersonalizedInterviewGuide:
+        """Generate the complete personalized interview guide."""
+
+        # Extract key information
+        role = job_data.get("role", "Unknown Role")
+        company = job_data.get("company", "Unknown Company")
+        match_score = gap_analysis.get("overall_match_score", 0)
+        candidate_name = resume_data.get("personal_info", {}).get("name", "")
+
+        # Generate header information
+        header = self._create_header(role, company, match_score, gap_analysis)
+
+        # Generate executive summary
+        executive_summary = await self._generate_executive_summary(
+            resume_data, job_data, gap_analysis
+        )
+
+        # Create the skills-analysis visualization
+        skills_analysis = self._create_skills_analysis(gap_analysis)
+
+        # Generate personalized sections
+        interview_process = await self._generate_interview_process_section(
+            job_data, gap_analysis
+        )
+
+        technical_questions = await self._generate_technical_questions(
+            job_data, gap_analysis, resume_data
+        )
+
+        behavioral_questions = await self._generate_behavioral_questions(
+            job_data, resume_data, gap_analysis
+        )
+
+        company_questions = await self._generate_company_questions(
+            job_data, gap_analysis
+        )
+
+        preparation_strategy = await self._generate_preparation_strategy(
+            gap_analysis, job_data
+        )
+
+        talking_points = await self._generate_talking_points(
+            resume_data, gap_analysis
+        )
+
+        questions_to_ask = await self._generate_questions_to_ask(
+            job_data, gap_analysis
+        )
+
+        day_of_preparation = await self._generate_day_of_preparation(
+            gap_analysis, job_data
+        )
+
+        success_metrics = self._generate_success_metrics(gap_analysis)
+
+        return PersonalizedInterviewGuide(
+            header=header,
+            executive_summary=executive_summary,
+            skills_analysis=skills_analysis,
+            interview_process=interview_process,
+            technical_questions=technical_questions,
+            behavioral_questions=behavioral_questions,
+            company_questions=company_questions,
+            preparation_strategy=preparation_strategy,
+            talking_points=talking_points,
+            questions_to_ask=questions_to_ask,
+            day_of_preparation=day_of_preparation,
+            success_metrics=success_metrics
+        )
+
+    def _create_header(self, role: str, company: str, match_score: float,
+                       gap_analysis: Dict[str, Any]) -> Dict[str, Any]:
+        """Create the header with match visualization."""
+
+        # Determine match level and color
+        if match_score >= 85:
+            match_level = "Excellent Match"
+            match_emoji = "🟢"
+        elif match_score >= 70:
+            match_level = "Strong Match"
+            match_emoji = "🟡"
+        elif match_score >= 55:
+            match_level = "Good Match"
+            match_emoji = "🟠"
+        else:
+            match_level = "Developing Match"
+            match_emoji = "🔴"
+
+        strong_matches = len(gap_analysis.get("strong_matches", []))
+        missing_skills = len(gap_analysis.get("missing_skills", []))
+
+        return {
+            "role": role,
+            "company": company,
+            "match_score": round(match_score, 1),
+            "match_level": match_level,
+            "match_emoji": match_emoji,
+            "strong_matches_count": strong_matches,
+            "missing_skills_count": missing_skills,
+            "total_requirements": gap_analysis.get("total_requirements", 0)
+        }
+
+    async def _generate_executive_summary(self, resume_data: Dict[str, Any],
+                                          job_data: Dict[str, Any],
+                                          gap_analysis: Dict[str, Any]) -> str:
+        """Generate a personalized executive summary."""
+
+        role = job_data.get("role", "Unknown Role")
+        company = job_data.get("company", "Unknown Company")
+        match_score = gap_analysis.get("overall_match_score", 0)
+        years_exp = resume_data.get("years_of_experience", 0)
+        strengths = gap_analysis.get("strengths_summary", "")
+        gaps = gap_analysis.get("gaps_summary", "")
+        competitive_advantages = gap_analysis.get("competitive_advantages", [])
+
+        prompt = f"""
+        Write a personalized executive summary for an interview guide. This should sound like a knowledgeable mentor who has analyzed their specific background.
+
+        Candidate Profile:
+        - Years of experience: {years_exp}
+        - Target role: {role} at {company}
+        - Match score: {match_score}%
+        - Strengths: {strengths}
+        - Gaps to address: {gaps}
+        - Competitive advantages: {', '.join(competitive_advantages[:3])}
+
+        Guidelines:
+        - Start with "{role} interview" in the first 100 words
+        - Address the candidate directly ("you")
+        - Be specific about their unique position
+        - Reference their actual background
+        - Confident, mentor-like tone
+        - 3-4 sentences max
+        - Actionable and encouraging
+
+        Example tone: "Your background in Python and machine learning puts you in a strong position for this Data Scientist role at TechCorp. With 5+ years of experience, you bring the technical depth they're seeking, and your AWS skills differentiate you from other candidates. Focus your preparation on demonstrating your model deployment experience and be ready to discuss your specific ML project outcomes."
+        """
+
+        try:
+            return self.llm_client.call_llm(prompt, temperature=0.3, max_tokens=300)
+        except Exception:
+            return f"You're well-positioned for this {role} role at {company} with a {match_score}% match score."
+
+    def _create_skills_analysis(self, gap_analysis: Dict[str, Any]) -> Dict[str, Any]:
+        """Create the visual skills analysis."""
+
+        strong_matches = gap_analysis.get("strong_matches", [])
+        partial_matches = gap_analysis.get("partial_matches", [])
+        missing_skills = gap_analysis.get("missing_skills", [])
+
+        # Create a text-based visualization
+        skills_breakdown = {
+            "strong": [match.get("skill_name", "") for match in strong_matches[:8]],
+            "partial": [match.get("skill_name", "") for match in partial_matches[:6]],
+            "missing": [match.get("skill_name", "") for match in missing_skills[:6]]
+        }
+
+        # Generate the bar-chart representation
+        total_skills = len(strong_matches) + len(partial_matches) + len(missing_skills)
+
+        if total_skills > 0:
+            strong_percentage = len(strong_matches) / total_skills * 100
+            partial_percentage = len(partial_matches) / total_skills * 100
+            missing_percentage = len(missing_skills) / total_skills * 100
+        else:
+            strong_percentage = partial_percentage = missing_percentage = 0
+
+        return {
+            "skills_breakdown": skills_breakdown,
+            "percentages": {
+                "strong": round(strong_percentage, 1),
+                "partial": round(partial_percentage, 1),
+                "missing": round(missing_percentage, 1)
+            },
+            "summary": gap_analysis.get("strengths_summary", ""),
+            "categories_analysis": gap_analysis.get("skill_categories_analysis", {})
+        }
+
+    async def _generate_interview_process_section(self, job_data: Dict[str, Any],
+                                                  gap_analysis: Dict[str, Any]) -> PersonalizedGuideSection:
+        """Generate a personalized interview-process section."""
+
+        company = job_data.get("company", "Unknown Company")
+        role = job_data.get("role", "Unknown Role")
+        seniority = job_data.get("seniority_level", "mid")
+        company_stage = job_data.get("company_stage", "enterprise")
+        match_score = gap_analysis.get("overall_match_score", 0)
+
+        prompt = f"""
+        Describe the likely interview process for a {role} position at {company} ({company_stage} company, {seniority} level).
+
+        Context:
+        - Candidate match score: {match_score}%
+        - Company stage: {company_stage}
+        - Role level: {seniority}
+
+        Include:
+        1. Typical number of rounds
+        2. Types of interviews expected
+        3. Key stakeholders they'll meet
+        4. Timeline and logistics
+        5. Company-specific insights if available
+        6. What to expect given their match score
+
+        Keep it practical and specific. Use markdown formatting.
+        Max 250 words.
+        """
+
+        try:
+            content = self.llm_client.call_llm(prompt, temperature=0, max_tokens=400)
+        except Exception:
+            content = f"Typical {role} interviews include 3-4 rounds: phone screen, technical assessment, team interviews, and final round."
+
+        why_important = f"Understanding {company}'s process helps you prepare appropriately for each stage and set proper expectations."
+
+        action_items = [
+            f"Research {company}'s interview style on Glassdoor",
+            "Prepare for technical screening",
+            "Ready examples for behavioral questions",
+            "Prepare thoughtful questions for each interviewer"
+        ]
+
+        return PersonalizedGuideSection(
+            title="Interview Process",
+            content=content,
+            why_important=why_important,
+            action_items=action_items,
+            time_to_complete="30 minutes research"
+        )
+
+    async def _generate_technical_questions(self, job_data: Dict[str, Any],
+                                            gap_analysis: Dict[str, Any],
+                                            resume_data: Dict[str, Any]) -> List[InterviewQuestion]:
+        """Generate personalized technical questions."""
+
+        strong_matches = gap_analysis.get("strong_matches", [])
+        missing_skills = gap_analysis.get("missing_skills", [])
+        role = job_data.get("role", "Unknown Role")
+
+        # Focus on areas where the candidate has gaps or needs to demonstrate strength
+        focus_areas = []
+
+        # Add strong areas to showcase
+        for match in strong_matches[:3]:
+            focus_areas.append({
+                "skill": match.get("skill_name", ""),
+                "type": "strength",
+                "context": f"Highlight your {match.get('skill_name', '')} expertise"
+            })
+
+        # Add gap areas they need to address
+        for match in missing_skills[:3]:
+            if match.get("importance") == "required":
+                focus_areas.append({
+                    "skill": match.get("skill_name", ""),
+                    "type": "gap",
+                    "context": f"Be prepared for basic {match.get('skill_name', '')} questions"
+                })
+
+        questions = []
+
+        for area in focus_areas[:6]:  # Limit to 6 technical questions
+            question = await self._generate_single_technical_question(
+                area, role, resume_data
+            )
+            if question:
+                questions.append(question)
+
+        return questions
+
+    async def _generate_single_technical_question(self, focus_area: Dict[str, Any],
+                                                  role: str, resume_data: Dict[str, Any]) -> Optional[InterviewQuestion]:
+        """Generate a single personalized technical question."""
+
+        skill = focus_area["skill"]
+        area_type = focus_area["type"]
+
+        # Get the candidate's relevant projects
+        relevant_projects = []
+        for project in resume_data.get("projects", []):
+            if isinstance(project, dict):
+                if skill.lower() in str(project.get("technologies", [])).lower():
+                    relevant_projects.append(project.get("name", ""))
+
+        # Get relevant work experience
+        relevant_experience = []
+        for exp in resume_data.get("experience", []):
+            if isinstance(exp, dict):
+                if skill.lower() in str(exp.get("technologies", [])).lower():
+                    relevant_experience.append(exp.get("title", ""))
+
+        prompt = f"""
+        Generate a technical interview question for a {role} position focusing on {skill}.
+
+        Context:
+        - Question type: {"showcase strength" if area_type == "strength" else "assess knowledge"}
+        - Candidate has {skill} in: {', '.join(relevant_projects + relevant_experience) or 'limited context'}
+        - This should be {"medium-hard" if area_type == "strength" else "easy-medium"} difficulty
+
+        Return JSON:
+        {{
+            "question": "Specific technical question",
+            "difficulty": "easy|medium|hard",
+            "why_asked": "Why this matters for this candidate specifically",
+            "approach_strategy": "How candidate should approach answering",
+            "example_points": ["Specific points from their background to mention"],
+            "follow_up_questions": ["Likely follow-up questions"]
+        }}
+
+        Make it specific to their background and the role.
+        """
+
+        try:
+            response = self.llm_client.call_llm(prompt, temperature=0.3, max_tokens=600)
+
+            # Parse the JSON response
+            json_start = response.find('{')
+            json_end = response.rfind('}') + 1
+            if json_start != -1 and json_end > json_start:
+                data = json.loads(response[json_start:json_end])
+
+                return InterviewQuestion(
+                    question=data.get("question", ""),
+                    category="technical",
+                    difficulty=data.get("difficulty", "medium"),
+                    why_asked=data.get("why_asked", ""),
+                    approach_strategy=data.get("approach_strategy", ""),
+                    example_points=data.get("example_points", []),
+                    follow_up_questions=data.get("follow_up_questions", [])
+                )
+        except Exception:
+            pass
+
+        return None
414
+
415
+ async def _generate_behavioral_questions(self, job_data: Dict[str, Any],
416
+ resume_data: Dict[str, Any],
417
+ gap_analysis: Dict[str, Any]) -> List[InterviewQuestion]:
418
+ """Generate personalized behavioral questions"""
419
+
420
+ role = job_data.get("role", "Unknown Role")
421
+ company_stage = job_data.get("company_stage", "enterprise")
422
+ years_exp = resume_data.get("years_of_experience", 0)
423
+
424
+ # Generate questions based on candidate's background and role requirements
425
+ behavioral_focus = []
426
+
427
+ if years_exp < 3:
428
+ behavioral_focus.extend(["learning agility", "collaboration", "problem-solving"])
429
+ elif years_exp >= 5:
430
+ behavioral_focus.extend(["leadership", "mentoring", "conflict resolution"])
431
+ else:
432
+ behavioral_focus.extend(["project management", "cross-functional collaboration", "initiative"])
433
+
434
+ if company_stage == "startup":
435
+ behavioral_focus.append("adaptability")
436
+ elif company_stage == "enterprise":
437
+ behavioral_focus.append("process improvement")
438
+
439
+ questions = []
440
+
441
+ for focus in behavioral_focus[:4]: # Limit to 4 behavioral questions
442
+ question = await self._generate_single_behavioral_question(
443
+ focus, role, resume_data
444
+ )
445
+ if question:
446
+ questions.append(question)
447
+
448
+ return questions
449
+
450
+ async def _generate_single_behavioral_question(self, focus_area: str, role: str,
451
+ resume_data: Dict[str, Any]) -> Optional[InterviewQuestion]:
452
+ """Generate a single personalized behavioral question"""
453
+
454
+ # Get relevant experience for this behavioral area
455
+ recent_roles = []
456
+ for exp in resume_data.get("experience", [])[:2]: # Last 2 roles
457
+ if isinstance(exp, dict):
458
+ recent_roles.append({
459
+ "title": exp.get("title", ""),
460
+ "company": exp.get("company", ""),
461
+ "achievements": exp.get("achievements", [])
462
+ })
463
+
464
+ prompt = f"""
465
+ Generate a behavioral interview question for a {role} position focusing on {focus_area}.
466
+
467
+ Candidate context:
468
+ - Recent roles: {', '.join([r['title'] for r in recent_roles])}
469
+ - Key achievements: {', '.join([ach for role in recent_roles for ach in role.get('achievements', [])[:2]])}
470
+
471
+ Return JSON:
472
+ {{
473
+ "question": "STAR-format behavioral question",
474
+ "difficulty": "medium",
475
+ "why_asked": "Why this matters for this specific candidate and role",
476
+ "approach_strategy": "How to structure the STAR response",
477
+ "example_points": ["Specific experiences from their background to reference"],
478
+ "follow_up_questions": ["Likely follow-up questions"]
479
+ }}
480
+
481
+ Make the question specific and give them concrete examples from their background to use.
482
+ """
483
+
484
+ try:
485
+ response = self.llm_client.call_llm(prompt, temperature=0.3, max_tokens=600)
486
+
487
+ json_start = response.find('{')
488
+ json_end = response.rfind('}') + 1
489
+ if json_start != -1 and json_end > json_start:
490
+ data = json.loads(response[json_start:json_end])
491
+
492
+ return InterviewQuestion(
493
+ question=data.get("question", ""),
494
+ category="behavioral",
495
+ difficulty=data.get("difficulty", "medium"),
496
+ why_asked=data.get("why_asked", ""),
497
+ approach_strategy=data.get("approach_strategy", ""),
498
+ example_points=data.get("example_points", []),
499
+ follow_up_questions=data.get("follow_up_questions", [])
500
+ )
501
+ except:
502
+ pass
503
+
504
+ return None
505
+
506
+ async def _generate_company_questions(self, job_data: Dict[str, Any],
507
+ gap_analysis: Dict[str, Any]) -> List[InterviewQuestion]:
508
+ """Generate company-specific questions"""
509
+
510
+ company = job_data.get("company", "Unknown Company")
511
+ role = job_data.get("role", "Unknown Role")
512
+ company_stage = job_data.get("company_stage", "enterprise")
513
+
514
+ questions = []
515
+
516
+ # Generate 2-3 company-specific questions
517
+ for topic in ["company culture", "role challenges", "team dynamics"][:3]:
518
+ question = await self._generate_single_company_question(
519
+ topic, company, role, company_stage
520
+ )
521
+ if question:
522
+ questions.append(question)
523
+
524
+ return questions
525
+
526
+ async def _generate_single_company_question(self, topic: str, company: str,
527
+ role: str, stage: str) -> Optional[InterviewQuestion]:
528
+ """Generate a single company-specific question"""
529
+
530
+ prompt = f"""
531
+ Generate a company-specific interview question about {topic} for {role} at {company} ({stage} company).
532
+
533
+ Return JSON:
534
+ {{
535
+ "question": "Company-specific question they might ask",
536
+ "difficulty": "medium",
537
+ "why_asked": "Why this company asks this question",
538
+ "approach_strategy": "How to answer effectively",
539
+ "example_points": ["Key points to include in answer"],
540
+ "follow_up_questions": ["Likely follow-up questions"]
541
+ }}
542
+
543
+ Make it specific to the company stage and role.
544
+ """
545
+
546
+ try:
547
+ response = self.llm_client.call_llm(prompt, temperature=0.3, max_tokens=500)
548
+
549
+ json_start = response.find('{')
550
+ json_end = response.rfind('}') + 1
551
+ if json_start != -1 and json_end > json_start:
552
+ data = json.loads(response[json_start:json_end])
553
+
554
+ return InterviewQuestion(
555
+ question=data.get("question", ""),
556
+ category="company",
557
+ difficulty=data.get("difficulty", "medium"),
558
+ why_asked=data.get("why_asked", ""),
559
+ approach_strategy=data.get("approach_strategy", ""),
560
+ example_points=data.get("example_points", []),
561
+ follow_up_questions=data.get("follow_up_questions", [])
562
+ )
563
+ except:
564
+ pass
565
+
566
+ return None
567
+
568
+ async def _generate_preparation_strategy(self, gap_analysis: Dict[str, Any],
569
+ job_data: Dict[str, Any]) -> PersonalizedGuideSection:
570
+ """Generate personalized preparation strategy"""
571
+
572
+ priority_items = gap_analysis.get("preparation_priority", [])
573
+ missing_skills = gap_analysis.get("missing_skills", [])
574
+ match_score = gap_analysis.get("overall_match_score", 0)
575
+
576
+ strategy_content = f"""
577
+ ## Your Preparation Roadmap
578
+
579
+ Based on your {match_score}% match score, here's your personalized preparation strategy:
580
+
581
+ ### Immediate Priorities (Next 2-3 Days)
582
+ {chr(10).join([f'- {item}' for item in priority_items[:3]])}
583
+
584
+ ### This Week
585
+ {chr(10).join([f'- Review {skill.get("job_requirement", skill.get("skill_name", ""))} basics' for skill in missing_skills[:3] if skill.get("importance") == "required" and (skill.get("job_requirement") or skill.get("skill_name"))])}
586
+
587
+ ### Study Schedule
588
+ - **Technical prep**: 60% of time on gap areas
589
+ - **Behavioral prep**: 25% of time on STAR examples
590
+ - **Company research**: 15% of time on culture/mission
591
+ """
592
+
593
+ why_important = f"This targeted approach maximizes your preparation time by focusing on your specific gaps and strengths."
594
+
595
+ action_items = [
596
+ "Complete priority technical reviews",
597
+ "Prepare 5-7 STAR behavioral examples",
598
+ "Practice explaining your project experiences",
599
+ "Research company's recent news and developments"
600
+ ]
601
+
602
+ return PersonalizedGuideSection(
603
+ title="Preparation Strategy",
604
+ content=strategy_content,
605
+ why_important=why_important,
606
+ action_items=action_items,
607
+ time_to_complete="5-7 hours over 3-5 days"
608
+ )
609
+
610
+ async def _generate_talking_points(self, resume_data: Dict[str, Any],
611
+ gap_analysis: Dict[str, Any]) -> PersonalizedGuideSection:
612
+ """Generate personalized talking points from resume"""
613
+
614
+ strong_matches = gap_analysis.get("strong_matches", [])
615
+ competitive_advantages = gap_analysis.get("competitive_advantages", [])
616
+
617
+ # Extract key achievements
618
+ key_achievements = []
619
+ for exp in resume_data.get("experience", [])[:2]:
620
+ if isinstance(exp, dict):
621
+ achievements = exp.get("achievements", [])[:2]
622
+ key_achievements.extend(achievements)
623
+
624
+ # Extract notable projects
625
+ notable_projects = []
626
+ for project in resume_data.get("projects", [])[:3]:
627
+ if isinstance(project, dict):
628
+ notable_projects.append({
629
+ "name": project.get("name", ""),
630
+ "description": project.get("description", ""),
631
+ "technologies": project.get("technologies", [])
632
+ })
633
+
634
+ content = f"""
635
+ ## Your Key Talking Points
636
+
637
+ ### Lead with Your Strengths
638
+ {chr(10).join([f'- **{match.get("resume_skill", match.get("job_requirement", ""))}**: Highlight your {match.get("resume_skill", "")} experience' for match in strong_matches[:3] if match.get("resume_skill") or match.get("job_requirement")])}
639
+
640
+ ### Competitive Advantages
641
+ {chr(10).join([f'- {advantage}' for advantage in competitive_advantages[:3]])}
642
+
643
+ ### Project Highlights
644
+ {chr(10).join([f'- **{proj["name"]}**: {proj["description"][:100]}...' for proj in notable_projects[:3] if proj.get("name") and proj.get("description")])}
645
+
646
+ ### Achievement Examples
647
+ {chr(10).join([f'- {achievement[:100]}...' for achievement in key_achievements[:3] if achievement])}
648
+ """
649
+
650
+ why_important = "These talking points are directly pulled from your background and align with what this role values most."
651
+
652
+ action_items = [
653
+ "Practice describing each project in 2-3 minutes",
654
+ "Quantify achievements with specific numbers",
655
+ "Prepare follow-up details for each talking point",
656
+ "Connect each point back to the role requirements"
657
+ ]
658
+
659
+ return PersonalizedGuideSection(
660
+ title="Key Talking Points",
661
+ content=content,
662
+ why_important=why_important,
663
+ action_items=action_items,
664
+ time_to_complete="2 hours preparation"
665
+ )
666
+
667
+ async def _generate_questions_to_ask(self, job_data: Dict[str, Any],
668
+ gap_analysis: Dict[str, Any]) -> List[str]:
669
+ """Generate smart questions for the candidate to ask"""
670
+
671
+ company = job_data.get("company", "Unknown Company")
672
+ role = job_data.get("role", "Unknown Role")
673
+ company_stage = job_data.get("company_stage", "enterprise")
674
+ missing_skills = gap_analysis.get("missing_skills", [])
675
+
676
+ questions = [
677
+ f"What does success look like for a {role} in the first 90 days?",
678
+ f"How does the team approach professional development and learning?",
679
+ f"What are the biggest technical challenges facing the team right now?",
680
+ f"How does {company} support career growth for {role}s?",
681
+ f"What's the collaboration like between {role} and other teams?"
682
+ ]
683
+
684
+ # Add questions about learning opportunities for missing skills
685
+ if missing_skills:
686
+ skill_names = [skill.get("job_requirement", skill.get("skill_name", "")) for skill in missing_skills[:2] if skill.get("job_requirement") or skill.get("skill_name")]
687
+ if skill_names:
688
+ questions.append(f"Are there opportunities to develop skills in {', '.join(skill_names)}?")
689
+
690
+ return questions[:6]
691
+
692
+ async def _generate_day_of_preparation(self, gap_analysis: Dict[str, Any],
693
+ job_data: Dict[str, Any]) -> PersonalizedGuideSection:
694
+ """Generate day-of interview preparation"""
695
+
696
+ match_score = gap_analysis.get("overall_match_score", 0)
697
+ strong_matches = gap_analysis.get("strong_matches", [])
698
+
699
+ content = f"""
700
+ ## Day-of-Interview Checklist
701
+
702
+ ### Morning Review (30 minutes)
703
+ - Review your top 3 strengths: {', '.join([m.get('resume_skill', m.get('job_requirement', '')) for m in strong_matches[:3] if m.get('resume_skill') or m.get('job_requirement')])}
704
+ - Practice your 2-minute elevator pitch
705
+ - Review company's recent news/updates
706
+ - Check logistics (time, location, interviewer names)
707
+
708
+ ### Mental Preparation
709
+ - Confidence booster: You have a {match_score}% match score
710
+ - Remember your competitive advantages
711
+ - Focus on learning and growth mindset for any gaps
712
+
713
+ ### Final Reminders
714
+ - Bring copies of resume and portfolio
715
+ - Prepare notepad for notes and questions
716
+ - Arrive 10 minutes early
717
+ - Dress appropriately for company culture
718
+ """
719
+
720
+ why_important = "Last-minute preparation builds confidence and ensures you're mentally ready to showcase your best self."
721
+
722
+ action_items = [
723
+ "Set up outfit and materials the night before",
724
+ "Do a practice run of your travel route",
725
+ "Review your talking points one final time",
726
+ "Get good sleep and eat a proper breakfast"
727
+ ]
728
+
729
+ return PersonalizedGuideSection(
730
+ title="Day-of Preparation",
731
+ content=content,
732
+ why_important=why_important,
733
+ action_items=action_items,
734
+ time_to_complete="30 minutes morning of"
735
+ )
736
+
737
+ def _generate_success_metrics(self, gap_analysis: Dict[str, Any]) -> List[str]:
738
+ """Generate success metrics for the interview"""
739
+
740
+ match_score = gap_analysis.get("overall_match_score", 0)
741
+ strong_matches = len(gap_analysis.get("strong_matches", []))
742
+
743
+ metrics = [
744
+ f"Successfully demonstrate {strong_matches} core strengths",
745
+ "Ask 3-4 thoughtful questions about the role/team",
746
+ "Share specific examples from your background",
747
+ "Show enthusiasm for learning and growth"
748
+ ]
749
+
750
+ if match_score >= 70:
751
+ metrics.append("Position yourself as a strong candidate ready to contribute immediately")
752
+ else:
753
+ metrics.append("Show strong learning agility and potential for growth")
754
+
755
+ return metrics
micro/qa.py ADDED
@@ -0,0 +1,92 @@
+ from typing import Any, Dict
+ from llm_client import llm_client
+ from prompt_loader import prompt_loader
+ from metrics import log_metric
+
+ class QAMicroFunction:
+     def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+         draft = data.get("draft", "")
+         enriched_data = data.get("enriched", {})
+
+         if not draft or "failed" in draft.lower():
+             return {**data, "qa_result": "QA skipped due to draft errors."}
+
+         try:
+             # Use the LLM to perform quality assurance
+             prompt = prompt_loader.get_prompt("qa_prompt", draft=draft)
+
+             qa_prompt = prompt + f"""
+
+ Review this job role preview and interview kit for:
+
+ 1. **Accuracy**: Does the content match the job data?
+ 2. **Completeness**: Are all sections well-developed?
+ 3. **Clarity**: Is the language clear and actionable?
+ 4. **Formatting**: Is the markdown properly structured?
+ 5. **Relevance**: Is the advice practical and current?
+
+ Job data context: {enriched_data}
+
+ Content to review:
+ {draft}
+
+ Provide feedback in this format:
+ ## QA Results
+ - **Overall Quality**: [Pass/Needs Improvement/Fail]
+ - **Issues Found**: [List specific issues or "None"]
+ - **Suggestions**: [Improvement recommendations]
+ - **Auto-fixes Applied**: [Any corrections made]
+
+ If minor issues are found, provide the corrected version after your analysis.
+ """
+
+             qa_response = llm_client.call_llm(qa_prompt)
+
+             # Add bucket verification
+             bucket_markdown = data.get("bucket_markdown", "")
+
+             if bucket_markdown:
+                 required_headers = [
+                     "### Team & Manager",
+                     "### Tech Stack Snapshot",
+                     "### Business Context",
+                     "### Comp & Leveling",
+                     "### Career Trajectory",
+                     "### Culture/WLB",
+                     "### Interview Runway",
+                     "### Onboarding & Tooling",
+                     "### Location/Remote",
+                     "### Strategic Risks"
+                 ]
+
+                 missing_headers = [h for h in required_headers if h not in bucket_markdown]
+
+                 if missing_headers:
+                     log_metric("qa_grade", {"bucket_verification": "FAIL", "missing": missing_headers})
+                     qa_response += f"\n\n**BUCKET VERIFICATION FAILED**: Missing headers: {missing_headers}"
+                 else:
+                     log_metric("qa_grade", {"bucket_verification": "PASS"})
+                     qa_response += "\n\n**BUCKET VERIFICATION PASSED**: All 10 bucket headers present"
+
+             # Check whether auto-fixes were applied
+             if "corrected version" in qa_response.lower() or "auto-fixes applied" in qa_response.lower():
+                 # Extract corrected content if available
+                 parts = qa_response.split("## Corrected Version")
+                 if len(parts) > 1:
+                     corrected_draft = parts[1].strip()
+                     data["draft"] = corrected_draft
+                     log_metric("qa_auto_fix", {"fixes_applied": True})
+
+             log_metric("qa_success", {
+                 "content_length": len(draft),
+                 "qa_response_length": len(qa_response)
+             })
+
+             return {**data, "qa_result": qa_response}
+
+         except Exception as e:
+             log_metric("qa_error", {"error": str(e)})
+             return {**data, "qa_result": f"QA failed: {e}"}
micro/render.py ADDED
@@ -0,0 +1,123 @@
+ from typing import Any, Dict, Optional
+ from metrics import log_metric
+
+ class RenderMicroFunction:
+     def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+         draft = data.get("draft", "")
+         qa_result = data.get("qa_result", "")
+         critique = data.get("critique", "")
+         enriched_data = data.get("enriched", {})
+         quality_score = data.get("quality_score")
+
+         try:
+             # Create comprehensive markdown output
+             rendered = self._create_final_output(draft, qa_result, critique, enriched_data, quality_score)
+
+             # Add bucket processing
+             bucket_facts = data.get("bucket_facts", {})
+             bucket_markdown = ""
+             if bucket_facts:
+                 from bucket_map import map_facts
+                 from render_buckets import render_buckets
+
+                 buckets = map_facts(bucket_facts)
+                 bucket_markdown = render_buckets(bucket_facts, buckets)
+
+                 log_metric("bucket_missing", {
+                     "empty_buckets": len([k for k, v in buckets.items() if not v or (len(v) == 1 and "research needed" in v[0].lower())])
+                 })
+
+             log_metric("render_success", {
+                 "total_length": len(rendered),
+                 "has_qa": bool(qa_result),
+                 "has_critique": bool(critique),
+                 "quality_score": quality_score,
+                 "has_buckets": bool(bucket_markdown)
+             })
+
+             return {**data, "rendered_markdown": rendered, "bucket_markdown": bucket_markdown}
+
+         except Exception as e:
+             log_metric("render_error", {"error": str(e)})
+             fallback = f"# Job Analysis Results\n\n{draft}\n\n---\n\nQA: {qa_result}\n\nCritique: {critique}"
+             return {**data, "rendered_markdown": fallback, "bucket_markdown": ""}
+
+     def _create_final_output(self, draft: str, qa_result: str, critique: str, enriched_data: Dict, quality_score: Optional[float] = None) -> str:
+         """Create comprehensive final output"""
+
+         # Header with quality indicator (guard against None, not falsy 0)
+         quality_indicator = ""
+         if quality_score is not None:
+             if quality_score >= 8:
+                 quality_indicator = "🟢 High Quality"
+             elif quality_score >= 6:
+                 quality_indicator = "🟡 Good Quality"
+             else:
+                 quality_indicator = "🔴 Needs Improvement"
+
+         # Extract key job info for the header
+         role = enriched_data.get("role", "Unknown Role")
+         company = enriched_data.get("company", "Unknown Company")
+         level = enriched_data.get("level", "")
+
+         header = f"""# 🎯 {role} at {company}
+ {f"**Level**: {level}" if level else ""}
+ {f"**Quality**: {quality_indicator}" if quality_indicator else ""}
+
+ ---
+ """
+
+         # Main content (the draft)
+         main_content = draft if draft else "Content generation failed."
+
+         # QA and critique sections (collapsible)
+         qa_section = ""
+         if qa_result and qa_result != "QA skipped due to draft errors.":
+             qa_section = f"""
+
+ <details>
+ <summary>📋 Quality Assurance Results</summary>
+
+ {qa_result}
+
+ </details>
+ """
+
+         critique_section = ""
+         if critique and critique != "Critique skipped due to draft errors.":
+             critique_section = f"""
+
+ <details>
+ <summary>🔍 Expert Critique</summary>
+
+ {critique}
+
+ </details>
+ """
+
+         # Footer with metadata
+         tech_stack = enriched_data.get("tech_stack", [])
+         work_mode = enriched_data.get("work_mode", "")
+         salary_range = enriched_data.get("salary_range", "")
+
+         metadata = []
+         if tech_stack:
+             metadata.append(f"**Tech Stack**: {', '.join(tech_stack[:5])}")
+         if work_mode:
+             metadata.append(f"**Work Mode**: {work_mode}")
+         if salary_range and salary_range != "Not specified":
+             metadata.append(f"**Salary**: {salary_range}")
+
+         footer = ""
+         if metadata:
+             footer = f"""
+
+ ---
+ ## 📊 Quick Facts
+ {chr(10).join([f"- {item}" for item in metadata])}
+ """
+
+         # Combine all sections
+         return header + main_content + qa_section + critique_section + footer
micro/resume_parser.py ADDED
@@ -0,0 +1,163 @@
+ from typing import Any, Dict
+ import re
+ from llm_client import llm_client
+ from metrics import log_metric
+ from text_extractor import extract_nobs, robust_json_parse
+
+ class ResumeParserMicroFunction:
+     async def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+         resume_text = data.get("resume_text", "")
+
+         if not resume_text:
+             return {**data, "resume_data": {"error": "No resume content provided"}}
+
+         try:
+             # Extract structured resume data
+             resume_data = await self._extract_resume_data(resume_text)
+
+             log_metric("resume_parse_success", {
+                 "skills_count": len(resume_data.get("skills", {}).get("technical", [])),
+                 "experience_count": len(resume_data.get("experience", [])),
+                 "projects_count": len(resume_data.get("projects", []))
+             })
+
+             return {**data, "resume_data": resume_data}
+
+         except Exception as e:
+             log_metric("resume_parse_error", {"error": str(e)})
+             return {**data, "resume_data": {"error": f"Resume parsing failed: {e}"}}
+
+     async def _extract_resume_data(self, resume_text: str) -> Dict[str, Any]:
+         """Extract structured data from the resume using the LLM"""
+
+         # Use robust extraction for large resumes
+         structured_data = await extract_nobs(resume_text)
+
+         if not structured_data or "error" in structured_data:
+             # Fall back to basic LLM extraction
+             prompt = f"""
+ Extract structured data from this resume:
+
+ {resume_text}
+
+ Return JSON with:
+ {{
+     "personal_info": {{
+         "name": "string",
+         "email": "string",
+         "phone": "string",
+         "location": "string",
+         "linkedin": "string",
+         "github": "string"
+     }},
+     "summary": "professional summary/objective",
+     "skills": {{
+         "technical": ["skill1", "skill2"],
+         "programming_languages": ["Python", "JavaScript"],
+         "frameworks": ["React", "Django"],
+         "tools": ["Git", "Docker"],
+         "databases": ["PostgreSQL", "MongoDB"],
+         "cloud": ["AWS", "Azure"]
+     }},
+     "experience": [
+         {{
+             "title": "Job Title",
+             "company": "Company Name",
+             "duration": "Jan 2020 - Present",
+             "location": "City, State",
+             "responsibilities": ["bullet point 1", "bullet point 2"],
+             "achievements": ["achievement 1", "achievement 2"],
+             "technologies": ["tech1", "tech2"]
+         }}
+     ],
+     "education": [
+         {{
+             "degree": "Bachelor of Science",
+             "field": "Computer Science",
+             "school": "University Name",
+             "graduation": "2020",
+             "gpa": "3.8",
+             "relevant_courses": ["Data Structures", "Algorithms"]
+         }}
+     ],
+     "projects": [
+         {{
+             "name": "Project Name",
+             "description": "Brief description",
+             "technologies": ["tech1", "tech2"],
+             "github": "github.com/repo",
+             "demo": "live-demo-url"
+         }}
+     ],
+     "certifications": [
+         {{
+             "name": "Certification Name",
+             "issuer": "Organization",
+             "date": "2023",
+             "credential_id": "123456"
+         }}
+     ]
+ }}
+
+ Only return valid JSON, no extra text.
+ """
+
+             llm_response = llm_client.call_llm(prompt)
+
+             # Parse JSON response
+             try:
+                 resume_data = robust_json_parse(llm_response)
+                 if not resume_data:
+                     raise ValueError("No valid JSON returned")
+                 return resume_data
+             except Exception as e:
+                 log_metric("resume_llm_parse_error", {"error": str(e)})
+                 # Return basic extracted data as a fallback
+                 return self._basic_extraction(resume_text)
+
+         return structured_data
+
+     def _basic_extraction(self, resume_text: str) -> Dict[str, Any]:
+         """Basic regex-based extraction as a fallback"""
+
+         # Extract email
+         email_match = re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', resume_text)
+         email = email_match.group() if email_match else ""
+
+         # Extract phone
+         phone_match = re.search(r'(\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})', resume_text)
+         phone = phone_match.group() if phone_match else ""
+
+         # Extract GitHub
+         github_match = re.search(r'github\.com/[\w-]+', resume_text, re.IGNORECASE)
+         github = f"https://{github_match.group()}" if github_match else ""
+
+         # Extract LinkedIn
+         linkedin_match = re.search(r'linkedin\.com/in/[\w-]+', resume_text, re.IGNORECASE)
+         linkedin = f"https://{linkedin_match.group()}" if linkedin_match else ""
+
+         # Extract common technical skills
+         tech_keywords = [
+             'Python', 'JavaScript', 'Java', 'C++', 'React', 'Node.js', 'SQL',
+             'AWS', 'Docker', 'Git', 'Machine Learning', 'Data Science',
+             'TensorFlow', 'PyTorch', 'Pandas', 'NumPy', 'Django', 'Flask'
+         ]
+
+         found_skills = [skill for skill in tech_keywords
+                         if re.search(rf'\b{re.escape(skill)}\b', resume_text, re.IGNORECASE)]
+
+         return {
+             "personal_info": {
+                 "email": email,
+                 "phone": phone,
+                 "github": github,
+                 "linkedin": linkedin
+             },
+             "skills": {
+                 "technical": found_skills
+             },
+             "extraction_method": "basic_fallback"
+         }
micro/resume_parser_v2.py ADDED
@@ -0,0 +1,620 @@
+ """
+ Enhanced Resume Parser v2.0
+ Provides structured extraction of skills, experience, projects, and education
+ with proper normalization and context understanding.
+ """
+
+ from typing import Any, Dict, List, Optional, Set, Tuple
+ import re
+ import json
+ from dataclasses import dataclass, asdict
+ from llm_client import llm_client
+ from metrics import log_metric
+ import tiktoken
+
+ @dataclass
+ class Experience:
+     title: str
+     company: str
+     duration: str
+     location: str = ""
+     responsibilities: Optional[List[str]] = None
+     achievements: Optional[List[str]] = None
+     technologies: Optional[List[str]] = None
+     start_date: str = ""
+     end_date: str = ""
+     is_current: bool = False
+
+     def __post_init__(self):
+         if self.responsibilities is None:
+             self.responsibilities = []
+         if self.achievements is None:
+             self.achievements = []
+         if self.technologies is None:
+             self.technologies = []
+
+ @dataclass
+ class Project:
+     name: str
+     description: str
+     technologies: Optional[List[str]] = None
+     github_url: str = ""
+     demo_url: str = ""
+     duration: str = ""
+     key_features: Optional[List[str]] = None
+
+     def __post_init__(self):
+         if self.technologies is None:
+             self.technologies = []
+         if self.key_features is None:
+             self.key_features = []
+
+ @dataclass
+ class Education:
+     degree: str
+     field: str
+     school: str
+     graduation_year: str = ""
+     gpa: str = ""
+     relevant_courses: Optional[List[str]] = None
+     honors: Optional[List[str]] = None
+
+     def __post_init__(self):
+         if self.relevant_courses is None:
+             self.relevant_courses = []
+         if self.honors is None:
+             self.honors = []
+
+ @dataclass
+ class Skills:
+     technical: Optional[List[str]] = None
+     programming_languages: Optional[List[str]] = None
+     frameworks: Optional[List[str]] = None
+     tools: Optional[List[str]] = None
+     databases: Optional[List[str]] = None
+     cloud_platforms: Optional[List[str]] = None
+     methodologies: Optional[List[str]] = None
+     soft_skills: Optional[List[str]] = None
+
+     def __post_init__(self):
+         for field_name in ['technical', 'programming_languages', 'frameworks', 'tools',
+                            'databases', 'cloud_platforms', 'methodologies', 'soft_skills']:
+             if getattr(self, field_name) is None:
+                 setattr(self, field_name, [])
+
+ @dataclass
+ class ResumeData:
+     personal_info: Dict[str, str]
+     summary: str
+     skills: Skills
+     experience: List[Experience]
+     education: List[Education]
+     projects: List[Project]
+     certifications: List[Dict[str, str]]
+     languages: List[str]
+     years_of_experience: int = 0
+
+     def __post_init__(self):
+         if not self.certifications:
+             self.certifications = []
+         if not self.languages:
+             self.languages = []
+
+ class SkillsNormalizer:
+     """Normalizes and categorizes skills with synonym detection"""
+
+     def __init__(self):
+         self.skill_synonyms = {
+             # Programming Languages
+             "python": ["python", "python3", "python 3", "py"],
+             "javascript": ["javascript", "js", "node.js", "nodejs", "node js"],
+             "typescript": ["typescript", "ts"],
+             "java": ["java", "java 8", "java 11", "java 17"],
+             "csharp": ["c#", "csharp", "c sharp", ".net", "dotnet"],
+             "cpp": ["c++", "cpp", "c plus plus"],
+             "go": ["go", "golang"],
+             "rust": ["rust", "rust-lang"],
+             "swift": ["swift", "ios development"],
+             "kotlin": ["kotlin", "android development"],
+             "r": ["r", "r programming"],
+             "scala": ["scala"],
+             "php": ["php", "php 7", "php 8"],
+             "ruby": ["ruby", "ruby on rails", "ror"],
+
+             # Web Frameworks
+             "react": ["react", "reactjs", "react.js", "react js"],
126
+ "angular": ["angular", "angularjs", "angular 2+"],
127
+ "vue": ["vue", "vue.js", "vuejs", "vue js"],
128
+ "svelte": ["svelte", "sveltekit"],
129
+ "django": ["django", "django rest framework", "drf"],
130
+ "flask": ["flask", "flask-restful"],
131
+ "fastapi": ["fastapi", "fast api"],
132
+ "express": ["express", "express.js", "expressjs"],
133
+ "spring": ["spring", "spring boot", "spring framework"],
134
+ "laravel": ["laravel"],
135
+ "rails": ["rails", "ruby on rails", "ror"],
136
+
137
+ # Databases
138
+ "postgresql": ["postgresql", "postgres", "pg", "psql"],
139
+ "mysql": ["mysql", "my sql"],
140
+ "mongodb": ["mongodb", "mongo", "mongo db"],
141
+ "redis": ["redis"],
142
+ "elasticsearch": ["elasticsearch", "elastic search"],
143
+ "cassandra": ["cassandra", "apache cassandra"],
144
+ "dynamodb": ["dynamodb", "dynamo db"],
145
+ "sqlite": ["sqlite", "sqlite3"],
146
+
147
+ # Cloud Platforms
148
+ "aws": ["aws", "amazon web services", "amazon aws"],
149
+ "azure": ["azure", "microsoft azure"],
150
+ "gcp": ["gcp", "google cloud", "google cloud platform"],
151
+ "heroku": ["heroku"],
152
+ "digitalocean": ["digitalocean", "digital ocean"],
153
+ "vercel": ["vercel"],
154
+ "netlify": ["netlify"],
155
+
156
+ # DevOps & Tools
157
+ "docker": ["docker", "containerization", "containers"],
158
+ "kubernetes": ["kubernetes", "k8s", "container orchestration"],
159
+ "jenkins": ["jenkins", "ci/cd"],
160
+ "github actions": ["github actions", "gh actions"],
161
+ "terraform": ["terraform", "infrastructure as code", "iac"],
162
+ "ansible": ["ansible"],
163
+ "git": ["git", "version control", "source control"],
164
+ "linux": ["linux", "unix", "ubuntu", "centos"],
165
+
166
+ # Data Science & ML
167
+ "machine learning": ["machine learning", "ml", "artificial intelligence", "ai"],
168
+ "deep learning": ["deep learning", "neural networks"],
169
+ "tensorflow": ["tensorflow", "tf"],
170
+ "pytorch": ["pytorch", "torch"],
171
+ "scikit-learn": ["scikit-learn", "sklearn", "scikit learn"],
172
+ "pandas": ["pandas", "data manipulation"],
173
+ "numpy": ["numpy", "numerical computing"],
174
+ "matplotlib": ["matplotlib", "data visualization"],
175
+ "seaborn": ["seaborn"],
176
+ "jupyter": ["jupyter", "jupyter notebooks"],
177
+
178
+ # Testing
179
+ "pytest": ["pytest", "python testing"],
180
+ "jest": ["jest", "javascript testing"],
181
+ "selenium": ["selenium", "web automation"],
182
+ "cypress": ["cypress", "e2e testing"],
183
+
184
+ # Methodologies
185
+ "agile": ["agile", "scrum", "kanban"],
186
+ "devops": ["devops", "dev ops"],
187
+ "microservices": ["microservices", "micro services"],
188
+ "rest api": ["rest", "rest api", "restful", "api development"],
189
+ "graphql": ["graphql", "graph ql"],
190
+ }
191
+
192
+ self.skill_categories = {
193
+ "programming_languages": ["python", "javascript", "typescript", "java", "csharp", "cpp", "go", "rust", "swift", "kotlin", "r", "scala", "php", "ruby"],
194
+ "frameworks": ["react", "angular", "vue", "svelte", "django", "flask", "fastapi", "express", "spring", "laravel", "rails"],
195
+ "databases": ["postgresql", "mysql", "mongodb", "redis", "elasticsearch", "cassandra", "dynamodb", "sqlite"],
196
+ "cloud_platforms": ["aws", "azure", "gcp", "heroku", "digitalocean", "vercel", "netlify"],
197
+ "tools": ["docker", "kubernetes", "jenkins", "github actions", "terraform", "ansible", "git", "linux", "pytest", "jest", "selenium", "cypress"],
198
+ "methodologies": ["agile", "devops", "microservices", "rest api", "graphql"]
199
+ }
200
+
201
+ def normalize_skill(self, skill: str) -> Optional[str]:
202
+ """Normalize a skill to its canonical form"""
203
+ skill_lower = skill.lower().strip()
204
+
205
+ for canonical, synonyms in self.skill_synonyms.items():
206
+ if skill_lower in synonyms:
207
+ return canonical
208
+
209
+ return skill_lower if len(skill_lower) > 1 else None
210
+
211
+ def categorize_skill(self, normalized_skill: str) -> str:
212
+ """Categorize a normalized skill"""
213
+ for category, skills in self.skill_categories.items():
214
+ if normalized_skill in skills:
215
+ return category
216
+ return "technical"
217
+
218
+ def normalize_skill_list(self, skills: List[str]) -> Dict[str, List[str]]:
219
+ """Normalize and categorize a list of skills"""
220
+ categorized = {
221
+ "programming_languages": [],
222
+ "frameworks": [],
223
+ "databases": [],
224
+ "cloud_platforms": [],
225
+ "tools": [],
226
+ "methodologies": [],
227
+ "technical": []
228
+ }
229
+
230
+ for skill in skills:
231
+ normalized = self.normalize_skill(skill)
232
+ if normalized:
233
+ category = self.categorize_skill(normalized)
234
+ if normalized not in categorized[category]:
235
+ categorized[category].append(normalized)
236
+
237
+ return categorized
238
+
239
+ class ResumeParserV2:
240
+ """Enhanced resume parser with structured extraction and normalization"""
241
+
242
+ def __init__(self):
243
+ self.skills_normalizer = SkillsNormalizer()
244
+
245
+ async def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
246
+ resume_text = data.get("resume_text", "")
247
+
248
+ if not resume_text:
249
+ return {**data, "resume_data_v2": {"error": "No resume content provided"}}
250
+
251
+ try:
252
+ # Extract structured resume data
253
+ resume_data = await self._extract_resume_data_structured(resume_text)
254
+
255
+ log_metric("resume_parse_v2_success", {
256
+ "skills_count": len(self._get_all_skills(resume_data.skills)),
257
+ "experience_count": len(resume_data.experience),
258
+ "projects_count": len(resume_data.projects),
259
+ "years_experience": resume_data.years_of_experience
260
+ })
261
+
262
+ return {**data, "resume_data_v2": asdict(resume_data)}
263
+
264
+ except Exception as e:
265
+ log_metric("resume_parse_v2_error", {"error": str(e)})
266
+ return {**data, "resume_data_v2": {"error": f"Resume parsing v2 failed: {e}"}}
267
+
268
+ async def _extract_resume_data_structured(self, resume_text: str) -> ResumeData:
269
+ """Extract structured resume data using multiple approaches"""
270
+
271
+ # First, try comprehensive LLM extraction
272
+ try:
273
+ structured_data = await self._llm_extract_structured(resume_text)
274
+ if structured_data:
275
+ return structured_data
276
+ except Exception as e:
277
+ log_metric("resume_llm_extraction_error", {"error": str(e)})
278
+
279
+ # Fallback to section-based extraction
280
+ return await self._section_based_extraction(resume_text)
281
+
282
+ async def _llm_extract_structured(self, resume_text: str) -> Optional[ResumeData]:
283
+ """Use LLM to extract structured resume data"""
284
+
285
+ # Check token count and chunk if necessary
286
+ token_count = self._count_tokens(resume_text)
287
+ if token_count > 15000:
288
+ # For very long resumes, extract in sections
289
+ return await self._chunked_extraction(resume_text)
290
+
291
+ prompt = f"""
292
+ Extract comprehensive structured data from this resume. Return ONLY valid JSON with this exact structure:
293
+
294
+ {{
295
+ "personal_info": {{
296
+ "name": "Full Name",
297
+ "email": "email@domain.com",
298
+ "phone": "+1234567890",
299
+ "location": "City, State",
300
+ "linkedin": "linkedin.com/in/username",
301
+ "github": "github.com/username",
302
+ "website": "personal-website.com"
303
+ }},
304
+ "summary": "Professional summary or objective statement",
305
+ "skills": {{
306
+ "technical": ["skill1", "skill2"],
307
+ "programming_languages": ["Python", "JavaScript"],
308
+ "frameworks": ["React", "Django"],
309
+ "tools": ["Git", "Docker"],
310
+ "databases": ["PostgreSQL", "MongoDB"],
311
+ "cloud_platforms": ["AWS", "Azure"],
312
+ "methodologies": ["Agile", "DevOps"],
313
+ "soft_skills": ["Leadership", "Communication"]
314
+ }},
315
+ "experience": [
316
+ {{
317
+ "title": "Job Title",
318
+ "company": "Company Name",
319
+ "duration": "Jan 2020 - Present",
320
+ "location": "City, State",
321
+ "start_date": "2020-01",
322
+ "end_date": "Present",
323
+ "is_current": true,
324
+ "responsibilities": ["responsibility 1", "responsibility 2"],
325
+ "achievements": ["achievement 1", "achievement 2"],
326
+ "technologies": ["tech1", "tech2"]
327
+ }}
328
+ ],
329
+ "education": [
330
+ {{
331
+ "degree": "Bachelor of Science",
332
+ "field": "Computer Science",
333
+ "school": "University Name",
334
+ "graduation_year": "2020",
335
+ "gpa": "3.8",
336
+ "relevant_courses": ["Data Structures", "Algorithms"],
337
+ "honors": ["Dean's List", "Magna Cum Laude"]
338
+ }}
339
+ ],
340
+ "projects": [
341
+ {{
342
+ "name": "Project Name",
343
+ "description": "Brief description of the project",
344
+ "technologies": ["tech1", "tech2"],
345
+ "github_url": "github.com/user/repo",
346
+ "demo_url": "live-demo-url.com",
347
+ "duration": "3 months",
348
+ "key_features": ["feature1", "feature2"]
349
+ }}
350
+ ],
351
+ "certifications": [
352
+ {{
353
+ "name": "Certification Name",
354
+ "issuer": "Organization",
355
+ "date": "2023",
356
+ "credential_id": "123456"
357
+ }}
358
+ ],
359
+ "languages": ["English (Native)", "Spanish (Conversational)"],
360
+ "years_of_experience": 5
361
+ }}
362
+
363
+ Important guidelines:
364
+ 1. Extract ALL skills mentioned, including those in job descriptions and projects
365
+ 2. Normalize technology names (e.g., "React.js" → "React", "ML" → "Machine Learning")
366
+ 3. Calculate years_of_experience from work history
367
+ 4. Parse dates in YYYY-MM format when possible
368
+ 5. Group similar skills appropriately
369
+ 6. Extract quantifiable achievements when possible
370
+ 7. If information is missing, omit the field or use empty array/string
371
+
372
+ Resume text:
373
+ {resume_text}
374
+ """
375
+
376
+ try:
377
+ response = llm_client.call_llm(prompt, temperature=0, max_tokens=4000)
378
+ data = json.loads(response)
379
+
380
+ # Convert to structured objects
381
+ return self._convert_to_resume_data(data)
382
+
383
+ except json.JSONDecodeError as e:
384
+ log_metric("resume_json_parse_error", {"error": str(e)})
385
+ return None
386
+ except Exception as e:
387
+ log_metric("resume_llm_error", {"error": str(e)})
388
+ return None
389
+
390
+ async def _chunked_extraction(self, resume_text: str) -> ResumeData:
391
+ """Extract data from long resumes by processing in chunks"""
392
+
393
+ sections = self._split_resume_sections(resume_text)
394
+
395
+ # Extract each section separately
396
+ personal_info = await self._extract_personal_info(sections.get("header", ""))
397
+ summary = await self._extract_summary(sections.get("summary", ""))
398
+ skills = await self._extract_skills(sections.get("skills", ""))
399
+ experience = await self._extract_experience(sections.get("experience", ""))
400
+ education = await self._extract_education(sections.get("education", ""))
401
+ projects = await self._extract_projects(sections.get("projects", ""))
402
+ certifications = await self._extract_certifications(sections.get("certifications", ""))
403
+
404
+ # Calculate years of experience
405
+ years_exp = self._calculate_years_experience(experience)
406
+
407
+ return ResumeData(
408
+ personal_info=personal_info,
409
+ summary=summary,
410
+ skills=skills,
411
+ experience=experience,
412
+ education=education,
413
+ projects=projects,
414
+ certifications=certifications,
415
+ languages=[],
416
+ years_of_experience=years_exp
417
+ )
418
+
419
+ async def _section_based_extraction(self, resume_text: str) -> ResumeData:
420
+ """Fallback extraction using regex and basic parsing"""
421
+
422
+ # Basic regex-based extraction
423
+ personal_info = self._extract_personal_info_regex(resume_text)
424
+ skills = self._extract_skills_regex(resume_text)
425
+
426
+ return ResumeData(
427
+ personal_info=personal_info,
428
+ summary="",
429
+ skills=skills,
430
+ experience=[],
431
+ education=[],
432
+ projects=[],
433
+ certifications=[],
434
+ languages=[],
435
+ years_of_experience=0
436
+ )
437
+
438
+ def _convert_to_resume_data(self, data: Dict[str, Any]) -> ResumeData:
439
+ """Convert parsed JSON to ResumeData objects"""
440
+
441
+ # Parse skills with normalization
442
+ skills_data = data.get("skills", {})
443
+ all_skills = []
444
+ for skill_list in skills_data.values():
445
+ if isinstance(skill_list, list):
446
+ all_skills.extend(skill_list)
447
+
448
+ normalized_skills = self.skills_normalizer.normalize_skill_list(all_skills)
449
+ skills = Skills(**normalized_skills)
450
+
451
+ # Parse experience
452
+ experience = []
453
+ for exp_data in data.get("experience", []):
454
+ exp = Experience(**exp_data)
455
+ experience.append(exp)
456
+
457
+ # Parse education
458
+ education = []
459
+ for edu_data in data.get("education", []):
460
+ edu = Education(**edu_data)
461
+ education.append(edu)
462
+
463
+ # Parse projects
464
+ projects = []
465
+ for proj_data in data.get("projects", []):
466
+ proj = Project(**proj_data)
467
+ projects.append(proj)
468
+
469
+ return ResumeData(
470
+ personal_info=data.get("personal_info", {}),
471
+ summary=data.get("summary", ""),
472
+ skills=skills,
473
+ experience=experience,
474
+ education=education,
475
+ projects=projects,
476
+ certifications=data.get("certifications", []),
477
+ languages=data.get("languages", []),
478
+ years_of_experience=data.get("years_of_experience", 0)
479
+ )
480
+
481
+ def _get_all_skills(self, skills: Skills) -> List[str]:
482
+ """Get all skills as a flat list"""
483
+ all_skills = []
484
+ for field in ['technical', 'programming_languages', 'frameworks', 'tools',
485
+ 'databases', 'cloud_platforms', 'methodologies']:
486
+ all_skills.extend(getattr(skills, field, []))
487
+ return all_skills
488
+
489
+ def _split_resume_sections(self, resume_text: str) -> Dict[str, str]:
490
+ """Split resume into sections using common headers"""
491
+
492
+ sections = {}
493
+ current_section = "header"
494
+ current_content = []
495
+
496
+ # Common section headers
497
+ section_patterns = {
498
+ r'(experience|work experience|employment|professional experience)': 'experience',
499
+ r'(education|academic background)': 'education',
500
+ r'(skills|technical skills|core competencies)': 'skills',
501
+ r'(projects|personal projects|side projects)': 'projects',
502
+ r'(summary|objective|profile)': 'summary',
503
+ r'(certifications|licenses)': 'certifications'
504
+ }
505
+
506
+ lines = resume_text.split('\n')
507
+
508
+ for line in lines:
509
+ line_lower = line.lower().strip()
510
+
511
+ # Check if this line is a section header
512
+ section_found = False
513
+ for pattern, section_name in section_patterns.items():
514
+ if re.search(pattern, line_lower):
515
+ # Save previous section
516
+ if current_content:
517
+ sections[current_section] = '\n'.join(current_content)
518
+
519
+ current_section = section_name
520
+ current_content = []
521
+ section_found = True
522
+ break
523
+
524
+ if not section_found:
525
+ current_content.append(line)
526
+
527
+ # Save final section
528
+ if current_content:
529
+ sections[current_section] = '\n'.join(current_content)
530
+
531
+ return sections
532
+
533
+ def _extract_personal_info_regex(self, text: str) -> Dict[str, str]:
534
+ """Extract personal information using regex"""
535
+
536
+ # Email
537
+ email_match = re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b', text)
538
+ email = email_match.group() if email_match else ""
539
+
540
+ # Phone
541
+ phone_match = re.search(r'(\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})', text)
542
+ phone = phone_match.group() if phone_match else ""
543
+
544
+ # LinkedIn
545
+ linkedin_match = re.search(r'linkedin\.com/in/[\w-]+', text, re.IGNORECASE)
546
+ linkedin = f"https://{linkedin_match.group()}" if linkedin_match else ""
547
+
548
+ # GitHub
549
+ github_match = re.search(r'github\.com/[\w-]+', text, re.IGNORECASE)
550
+ github = f"https://{github_match.group()}" if github_match else ""
551
+
552
+ return {
553
+ "email": email,
554
+ "phone": phone,
555
+ "linkedin": linkedin,
556
+ "github": github
557
+ }
558
+
559
+ def _extract_skills_regex(self, text: str) -> Skills:
560
+ """Extract skills using regex patterns"""
561
+
562
+ # Common technical skills to look for
563
+ tech_keywords = [
564
+ 'Python', 'JavaScript', 'Java', 'C++', 'React', 'Node.js', 'SQL',
565
+ 'AWS', 'Docker', 'Git', 'Machine Learning', 'Data Science',
566
+ 'TensorFlow', 'PyTorch', 'Pandas', 'NumPy', 'Django', 'Flask',
567
+ 'PostgreSQL', 'MongoDB', 'Redis', 'Kubernetes', 'Jenkins'
568
+ ]
569
+
570
+ found_skills = []
571
+ for skill in tech_keywords:
572
+ if re.search(rf'\b{re.escape(skill)}\b', text, re.IGNORECASE):
573
+ found_skills.append(skill)
574
+
575
+ # Normalize skills
576
+ normalized = self.skills_normalizer.normalize_skill_list(found_skills)
577
+
578
+ return Skills(**normalized)
579
+
580
+ def _calculate_years_experience(self, experience: List[Experience]) -> int:
581
+ """Calculate total years of experience"""
582
+ if not experience:
583
+ return 0
584
+
585
+ # Simple calculation based on number of roles
586
+ # In practice, you'd want to parse dates and calculate overlap
587
+ return len(experience)
588
+
589
+ def _count_tokens(self, text: str) -> int:
590
+ """Count tokens in text"""
591
+ try:
592
+ encoding = tiktoken.encoding_for_model("gpt-4o-mini")
593
+ return len(encoding.encode(text))
594
+ except:
595
+ # Fallback: approximate as 4 chars per token
596
+ return len(text) // 4
597
+
598
+ # Placeholder methods for individual section extraction
599
+ # These would be implemented with specific LLM calls for each section
600
+
601
+ async def _extract_personal_info(self, text: str) -> Dict[str, str]:
602
+ return self._extract_personal_info_regex(text)
603
+
604
+ async def _extract_summary(self, text: str) -> str:
605
+ return text.strip()
606
+
607
+ async def _extract_skills(self, text: str) -> Skills:
608
+ return self._extract_skills_regex(text)
609
+
610
+ async def _extract_experience(self, text: str) -> List[Experience]:
611
+ return []
612
+
613
+ async def _extract_education(self, text: str) -> List[Education]:
614
+ return []
615
+
616
+ async def _extract_projects(self, text: str) -> List[Project]:
617
+ return []
618
+
619
+ async def _extract_certifications(self, text: str) -> List[Dict[str, str]]:
620
+ return []
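The synonym-driven normalization in `SkillsNormalizer` above can be sanity-checked in isolation. This is a minimal sketch with a deliberately trimmed synonym table (the full table lives in the class above); the lookup and fall-through logic match the diff:

```python
# Minimal sketch of the synonym-based skill normalization used above.
# SKILL_SYNONYMS here is a small illustrative subset, not the full table.
from typing import Dict, List, Optional

SKILL_SYNONYMS: Dict[str, List[str]] = {
    "python": ["python", "python3", "python 3", "py"],
    "javascript": ["javascript", "js", "node.js", "nodejs"],
    "react": ["react", "reactjs", "react.js", "react js"],
    "postgresql": ["postgresql", "postgres", "pg", "psql"],
}

def normalize_skill(skill: str) -> Optional[str]:
    """Map a raw skill string to its canonical form; unknown skills pass
    through lowercased, and single-character noise is dropped."""
    s = skill.lower().strip()
    for canonical, synonyms in SKILL_SYNONYMS.items():
        if s in synonyms:
            return canonical
    return s if len(s) > 1 else None

def normalize_skill_list(skills: List[str]) -> List[str]:
    """Normalize and de-duplicate, preserving first-seen order."""
    seen: List[str] = []
    for skill in skills:
        canonical = normalize_skill(skill)
        if canonical and canonical not in seen:
            seen.append(canonical)
    return seen
```

The pass-through branch matters: skills absent from the synonym table (new frameworks, niche tools) still survive normalization instead of being silently discarded.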
micro/scrape.py ADDED
@@ -0,0 +1,421 @@
+ import requests
+ import time
+ import re
+ from typing import Dict, Tuple, Optional
+ from datetime import datetime
+
+ # Try to import Selenium, but degrade gracefully if it is not available
+ try:
+     from selenium import webdriver
+     from selenium.webdriver.chrome.options import Options
+     from selenium.webdriver.common.by import By
+     from selenium.webdriver.support.ui import WebDriverWait
+     from selenium.webdriver.support import expected_conditions as EC
+     from selenium.common.exceptions import TimeoutException, WebDriverException
+     from webdriver_manager.chrome import ChromeDriverManager
+     from selenium.webdriver.chrome.service import Service
+     SELENIUM_AVAILABLE = True
+ except ImportError:
+     SELENIUM_AVAILABLE = False
+     print("Warning: Selenium not available. Web scraping will use a requests-only fallback.")
+
+
+ class LinkedInAuthError(Exception):
+     """Raised when LinkedIn requires authentication."""
+     pass
+
+
+ def canonicalise(url: str) -> str:
+     """Convert a URL to canonical form for better caching."""
+     if not url.startswith(('http://', 'https://')):
+         url = 'https://' + url
+
+     # Handle LinkedIn URLs
+     if 'linkedin.com' in url:
+         # Extract the job ID from the currentJobId parameter
+         job_id_match = re.search(r'currentJobId=(\d+)', url)
+         if job_id_match:
+             return f"https://www.linkedin.com/jobs/view/{job_id_match.group(1)}"
+
+         # Extract the job ID from /jobs/view/ URLs
+         view_match = re.search(r'/jobs/view/(\d+)', url)
+         if view_match:
+             return f"https://www.linkedin.com/jobs/view/{view_match.group(1)}"
+
+     return url
+
+
+ def extract_preview_from_html(html: str, url: str) -> Dict[str, str]:
+     """Extract preview info from HTML for immediate display."""
+     preview = {
+         'company': 'Not specified',
+         'role': 'Not specified',
+         'location': 'Not specified',
+         'posted_days': 'Recently',
+     }
+
+     if not html:
+         return preview
+
+     # LinkedIn job page patterns
+     if 'linkedin.com' in url:
+         # Company name patterns
+         company_patterns = [
+             r'<span[^>]*class="[^"]*job-details-jobs-unified-top-card__company-name[^"]*"[^>]*>([^<]+)</span>',
+             r'<a[^>]*class="[^"]*job-details-jobs-unified-top-card__company-name[^"]*"[^>]*>([^<]+)</a>',
+             r'"hiringCompany":\s*{\s*"name":\s*"([^"]+)"',
+             r'<h4[^>]*class="[^"]*job-details-jobs-unified-top-card__company-name[^"]*"[^>]*>([^<]+)</h4>',
+         ]
+
+         for pattern in company_patterns:
+             match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
+             if match:
+                 preview['company'] = match.group(1).strip()
+                 break
+
+         # Job title patterns
+         title_patterns = [
+             r'<h1[^>]*class="[^"]*job-details-jobs-unified-top-card__job-title[^"]*"[^>]*>([^<]+)</h1>',
+             r'"jobTitle":\s*"([^"]+)"',
+             r'<title>([^|]+)\s*\|[^<]*</title>',
+         ]
+
+         for pattern in title_patterns:
+             match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
+             if match:
+                 preview['role'] = match.group(1).strip()
+                 break
+
+         # Location patterns
+         location_patterns = [
+             r'<span[^>]*class="[^"]*job-details-jobs-unified-top-card__bullet[^"]*"[^>]*>([^<]+)</span>',
+             r'"jobLocation":\s*{\s*"displayName":\s*"([^"]+)"',
+             r'<div[^>]*class="[^"]*job-details-jobs-unified-top-card__primary-description-container[^"]*"[^>]*>.*?<span[^>]*>([^<]+)</span>',
+         ]
+
+         for pattern in location_patterns:
+             match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
+             if match:
+                 location = match.group(1).strip()
+                 if location and not any(x in location.lower() for x in ['applicant', 'employee', 'easy apply']):
+                     preview['location'] = location
+                     break
+
+     # Microsoft careers patterns
+     elif 'microsoft.com' in url:
+         title_match = re.search(r'<title>([^|]+)\s*\|\s*Microsoft\s*Careers', html, re.IGNORECASE)
+         if title_match:
+             preview['role'] = title_match.group(1).strip()
+             preview['company'] = 'Microsoft'
+
+         location_match = re.search(r'"jobLocation":\s*"([^"]+)"', html)
+         if location_match:
+             preview['location'] = location_match.group(1).strip()
+
+     # Google careers patterns
+     elif 'google.com' in url:
+         preview['company'] = 'Google'
+         title_match = re.search(r'<title>([^|]+)\s*\|\s*Google\s*Careers', html, re.IGNORECASE)
+         if title_match:
+             preview['role'] = title_match.group(1).strip()
+
+     # Amazon jobs patterns
+     elif 'amazon.jobs' in url:
+         preview['company'] = 'Amazon'
+         title_match = re.search(r'<h1[^>]*class="[^"]*job-title[^"]*"[^>]*>([^<]+)</h1>', html, re.IGNORECASE)
+         if title_match:
+             preview['role'] = title_match.group(1).strip()
+
+     # PayPal patterns
+     elif 'paypal.eightfold.ai' in url:
+         preview['company'] = 'PayPal'
+         title_match = re.search(r'"jobTitle":\s*"([^"]+)"', html)
+         if title_match:
+             preview['role'] = title_match.group(1).strip()
+
+     # Clean up the extracted text
+     for key in preview:
+         if isinstance(preview[key], str):
+             preview[key] = re.sub(r'\s+', ' ', preview[key]).strip()
+             if len(preview[key]) > 100:
+                 preview[key] = preview[key][:97] + '...'
+
+     return preview
+
+
+ class ScrapeMicroFunction:
+     """Micro-function for web scraping with enhanced preview extraction."""
+
+     def __init__(self):
+         if SELENIUM_AVAILABLE:
+             self.chrome_options = Options()
+             self.chrome_options.add_argument('--headless')
+             self.chrome_options.add_argument('--no-sandbox')
+             self.chrome_options.add_argument('--disable-dev-shm-usage')
+             self.chrome_options.add_argument('--disable-gpu')
+             self.chrome_options.add_argument('--window-size=1920,1080')
+             self.chrome_options.add_argument('--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36')
+         else:
+             self.chrome_options = None
+
+     def run(self, data: dict) -> dict:
+         """Main scraping function returning preview and full content."""
+         raw_input = data.get('raw_input', '') or data.get('input', '')
+
+         if not raw_input:
+             return {
+                 'success': False,
+                 'error': 'No input provided',
+                 'preview': {'company': 'Error', 'role': 'No input', 'location': '', 'posted_days': ''},
+                 'content': '',
+                 'scraped_text': '',
+             }
+
+         # If the input is a URL, scrape it
+         if raw_input.startswith(('http://', 'https://', 'www.')):
+             canonical_url = canonicalise(raw_input)
+             result = self._scrape_url(canonical_url)
+
+             # Add scraped_text for backward compatibility
+             result['scraped_text'] = result.get('content', '')
+             return {**data, **result, 'raw_input': raw_input}
+         else:
+             # Direct text input - use the text extractor
+             from text_extractor import extract_entities
+             from micro.patch_missing import patch_missing
+
+             job_core = extract_entities(raw_input)
+             # Apply Google patching for missing fields
+             job_core = patch_missing(job_core)
+
+             # Convert the JobCore to the preview format
+             preview = {
+                 'company': job_core.company or 'Not specified',
+                 'role': job_core.role or 'Not specified',
+                 'location': job_core.location or 'Not specified',
+                 'posted_days': str(job_core.posted_days) if job_core.posted_days else 'Recently',
+             }
+
+             return {
+                 **data,
+                 'success': True,
+                 'content': raw_input,
+                 'preview': preview,
+                 'url': None,
+                 'scraped_text': raw_input,
+                 'raw_input': raw_input,
+                 'job_core': job_core,  # extracted entities for downstream use
+             }
+
+     def _scrape_url(self, url: str) -> dict:
+         """Scrape a URL and extract both preview and full content."""
+         try:
+             # Try LinkedIn-specific scraping first
+             if 'linkedin.com' in url:
+                 return self._scrape_linkedin(url)
+             else:
+                 return self._scrape_generic(url)
+
+         except Exception as e:
+             return {
+                 'success': False,
+                 'error': str(e),
+                 'preview': {'company': 'Error', 'role': str(e)[:50], 'location': '', 'posted_days': ''},
+                 'content': '',
+             }
+
+     def _scrape_linkedin(self, url: str) -> dict:
+         """LinkedIn-specific scraping with auth detection."""
+         if not SELENIUM_AVAILABLE:
+             raise LinkedInAuthError("LinkedIn requires authentication - Selenium not available in this environment")
+
+         driver = None
+         try:
+             service = Service(ChromeDriverManager().install())
+             driver = webdriver.Chrome(service=service, options=self.chrome_options)
+             driver.set_page_load_timeout(10)
+
+             driver.get(url)
+
+             # Wait briefly and check for an auth redirect
+             time.sleep(2)
+             current_url = driver.current_url
+
+             if 'authwall' in current_url or 'login' in current_url or 'challenge' in current_url:
+                 raise LinkedInAuthError("LinkedIn requires authentication")
+
+             # Wait for the job content to load
+             try:
+                 WebDriverWait(driver, 8).until(
+                     EC.presence_of_element_located((By.TAG_NAME, "main"))
+                 )
+             except TimeoutException:
+                 pass
+
+             html = driver.page_source
+             preview = extract_preview_from_html(html, url)
+
+             return {
+                 'success': True,
+                 'content': html,
+                 'preview': preview,
+                 'url': url,
+             }
+
+         except LinkedInAuthError:
+             raise
+         except Exception as e:
+             return {
+                 'success': False,
+                 'error': f"LinkedIn scraping failed: {str(e)}",
+                 'preview': {'company': 'LinkedIn', 'role': 'Auth Required', 'location': '', 'posted_days': ''},
+                 'content': '',
+             }
+         finally:
+             if driver:
+                 driver.quit()
+
+     def _scrape_generic(self, url: str) -> dict:
+         """Generic scraping for non-LinkedIn URLs."""
+         try:
+             # Try requests first (much faster than a full browser)
+             headers = {
+                 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
+             }
+             response = requests.get(url, headers=headers, timeout=10)
+             response.raise_for_status()
+
+             html = response.text
+             preview = extract_preview_from_html(html, url)
+
+             return {
+                 'success': True,
+                 'content': html,
+                 'preview': preview,
+                 'url': url,
+             }
+
+         except Exception:
+             # Fall back to Selenium
+             return self._scrape_with_selenium(url)
+
+     def _scrape_with_selenium(self, url: str) -> dict:
+         """Selenium fallback for sites that block plain requests."""
+         if not SELENIUM_AVAILABLE:
+             return {
+                 'success': False,
+                 'error': "Selenium not available - please copy and paste the job description text instead",
+                 'preview': {'company': 'Error', 'role': 'Selenium not available', 'location': '', 'posted_days': ''},
+                 'content': '',
+             }
+
+         driver = None
+         try:
+             service = Service(ChromeDriverManager().install())
+             driver = webdriver.Chrome(service=service, options=self.chrome_options)
+             driver.set_page_load_timeout(15)
+
+             driver.get(url)
+             time.sleep(3)
+
+             html = driver.page_source
+             preview = extract_preview_from_html(html, url)
+
+             return {
+                 'success': True,
+                 'content': html,
+                 'preview': preview,
+                 'url': url,
+             }
+
+         except Exception as e:
+             return {
+                 'success': False,
+                 'error': f"Selenium scraping failed: {str(e)}",
+                 'preview': {'company': 'Error', 'role': 'Scraping failed', 'location': '', 'posted_days': ''},
+                 'content': '',
+             }
+         finally:
+             if driver:
+                 driver.quit()
+
+     def _extract_preview_from_text(self, text: str) -> Dict[str, str]:
+         """Extract preview info from pasted text."""
+         preview = {
+             'company': 'Not specified',
+             'role': 'Not specified',
+             'location': 'Not specified',
+             'posted_days': 'Recently',
+         }
+
+         lines = text.split('\n')
+
+         # Enhanced extraction patterns for better accuracy
+         for line in lines[:20]:  # Check the first 20 lines
+             line = line.strip()
+             if len(line) < 3 or len(line) > 150:
+                 continue
+
+             # Pattern: "Company · Role · Location"
+             if '·' in line and preview['company'] == 'Not specified':
+                 parts = [p.strip() for p in line.split('·')]
+                 if len(parts) >= 3:
+                     preview['company'] = parts[0]
+                     preview['role'] = parts[1]
+                     preview['location'] = parts[2]
+                 continue
+
+             # Pattern: "Role at Company"
+             if ' at ' in line and any(word in line.lower() for word in ['engineer', 'developer', 'analyst', 'manager', 'scientist', 'designer']):
+                 parts = line.split(' at ')
+                 if len(parts) == 2:
+                     preview['role'] = parts[0].strip()
+                     preview['company'] = parts[1].strip()
+                 continue
377
+
378
+ # Look for standalone role titles
379
+ if preview['role'] == 'Not specified' and any(word in line.lower() for word in ['engineer', 'developer', 'analyst', 'manager', 'scientist', 'designer', 'specialist']):
380
+ # Check if it's likely a job title (not part of description)
381
+ if i < 5 and not line.lower().startswith(('we', 'the', 'our', 'about', 'job', 'position')):
382
+ preview['role'] = line
383
+
384
+ # Look for company names (common patterns)
385
+ if preview['company'] == 'Not specified':
386
+ if any(word in line.lower() for word in ['group', 'search', 'inc', 'corp', 'company', 'technologies', 'systems', 'solutions']):
387
+ # Avoid generic descriptions and clean up
388
+ if not any(word in line.lower() for word in ['the', 'our', 'we', 'about', 'job', 'position', 'looking', 'seeking', 'logo']):
389
+ # Clean up common suffixes
390
+ clean_company = line.replace(' logo', '').replace(' Logo', '').strip()
391
+ preview['company'] = clean_company
392
+
393
+ # Look for location patterns
394
+ if preview['location'] == 'Not specified':
395
+ # Extract location from patterns like "New York, NY · other text"
396
+ location_match = re.search(r'([^·•]+(?:, [A-Z]{2}|New York|California|Remote))[·•\s]', line)
397
+ if location_match:
398
+ preview['location'] = location_match.group(1).strip()
399
+ # Fallback to simple patterns
400
+ elif any(pattern in line for pattern in [', NY', ', CA', ', TX', ', FL', 'New York', 'California', 'Remote']):
401
+ if not any(word in line.lower() for word in ['we', 'the', 'our', 'about', 'job']):
402
+ # Try to extract just the location part
403
+ for pattern in [', NY', ', CA', ', TX', ', FL']:
404
+ if pattern in line:
405
+ parts = line.split(pattern)
406
+ if len(parts) >= 2:
407
+ location_part = parts[0].split()[-1] + pattern
408
+ preview['location'] = location_part
409
+ break
410
+ if preview['location'] == 'Not specified' and 'New York' in line:
411
+ preview['location'] = 'New York, NY'
412
+ elif preview['location'] == 'Not specified':
413
+ preview['location'] = line
414
+
415
+ return preview
416
+
417
+ @staticmethod
418
+ def from_text(raw: str) -> Dict[str, str]:
419
+ """Static method to extract company/role/location from plain text"""
420
+ scraper = ScrapeMicroFunction()
421
+ return scraper._extract_preview_from_text(raw)
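The "Role at Company" heuristic above can be exercised in isolation. A minimal, self-contained sketch follows; the helper name and the sample input are illustrative, not part of the module:

```python
def split_role_at_company(line: str):
    """Return (role, company) if the line looks like 'Role at Company', else None."""
    title_words = ('engineer', 'developer', 'analyst', 'manager', 'scientist', 'designer')
    if ' at ' not in line or not any(w in line.lower() for w in title_words):
        return None
    parts = line.split(' at ')
    if len(parts) != 2:
        return None  # ambiguous lines like 'X at Y at Z' are skipped, as in the original
    return parts[0].strip(), parts[1].strip()

print(split_role_at_company("Senior Data Engineer at Acme Corp"))
```

As in `_extract_preview_from_text`, lines without a recognized job-title keyword fall through to the other heuristics.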
orchestrator.py ADDED
@@ -0,0 +1,35 @@
+ from typing import Any, Dict, List, Protocol
+ import asyncio
+ from text_extractor import extract_batch, JobCore
+ from micro.patch_missing import patch_missing
+
+ class MicroFunction(Protocol):
+     def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
+         ...
+
+ class Orchestrator:
+     def __init__(self, steps: List[MicroFunction]):
+         self.steps = steps
+
+     def run(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
+         data = input_data
+         for step in self.steps:
+             data = step.run(data)
+         return data
+
+     def run_from_text(self, raw_jd: str) -> Dict[str, Any]:
+         """Process job description text through the pipeline"""
+         data = {"raw_input": raw_jd, "input": raw_jd}
+         for step in self.steps:
+             data = step.run(data)
+         return data
+
+ async def analyze(raw: str) -> JobCore:
+     """Analyze job description using fast chunked extraction then patch missing data."""
+     # Extract using concurrent chunked processing
+     job_core = await extract_batch(raw)
+
+     # Patch missing data with Google search
+     enriched_core = patch_missing(job_core)
+
+     return enriched_core
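The pipeline contract is simply "each step maps a dict to a dict," so wiring and running an orchestrator can be sketched with toy steps. The step classes below are illustrative stand-ins, not real micro-functions from this repo; the `Orchestrator` is restated so the sketch is self-contained:

```python
from typing import Any, Dict, List

class Orchestrator:
    # Mirrors the class above: apply each step's dict-in/dict-out transform in order.
    def __init__(self, steps: List[Any]):
        self.steps = steps

    def run(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        data = input_data
        for step in self.steps:
            data = step.run(data)
        return data

class Uppercase:
    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {**data, "text": data["text"].upper()}

class CountWords:
    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {**data, "words": len(data["text"].split())}

result = Orchestrator([Uppercase(), CountWords()]).run({"text": "senior data engineer"})
print(result)  # later steps see the earlier steps' output
```

Because steps only share state through the dict, they can be reordered or swapped as long as each step's required keys are produced upstream.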
prompt_loader.py ADDED
@@ -0,0 +1,15 @@
+ import yaml
+ from typing import Dict, Any
+
+ class PromptLoader:
+     def __init__(self, prompt_file: str = "prompts/v1.yaml"):
+         with open(prompt_file, 'r') as f:
+             self.prompts = yaml.safe_load(f)
+
+     def get_prompt(self, prompt_name: str, **kwargs) -> str:
+         """Get and format a prompt with variables"""
+         template = self.prompts.get(prompt_name, "")
+         return template.format(**kwargs)
+
+ # Global instance
+ prompt_loader = PromptLoader()
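`get_prompt` is plain `str.format` over a name→template mapping, so the behavior can be sketched without loading YAML. The inline template below is illustrative, not the real file contents:

```python
prompts = {
    "scrape_prompt": "Extract the core job posting.\nRaw content: {job_posting}",
}

def get_prompt(prompt_name: str, **kwargs) -> str:
    # Same lookup-then-format behavior as PromptLoader.get_prompt:
    # an unknown name yields "" rather than raising.
    template = prompts.get(prompt_name, "")
    return template.format(**kwargs)

filled = get_prompt("scrape_prompt", job_posting="Senior Data Engineer at Acme")
print(filled)
```

One consequence of using `str.format`: any literal braces in a template must be doubled, which is why the JSON skeleton in `enrich_prompt` is written with `{{` and `}}`.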
prompts/v1.yaml ADDED
@@ -0,0 +1,62 @@
+ # Prompt repository v1
+ # Store all LLM prompts, templates, and system messages here
+
+ scrape_prompt: |
+   You are a job posting extraction expert. Extract and clean the core job posting content from the following text.
+
+   Focus on:
+   - Job title and role description
+   - Company name and information
+   - Requirements and qualifications
+   - Responsibilities and duties
+   - Compensation and benefits
+   - Location and work arrangement
+
+   Remove irrelevant content like navigation, ads, boilerplate text, and website elements.
+   Return clean, structured job posting content.
+
+   Raw content: {job_posting}
+
+ enrich_prompt: |
+   You are a job market analyst. Extract structured information from this job posting and return only valid JSON.
+
+   Pre-extracted hints:
+   - Company: {pre_company}
+   - Role: {pre_role}
+
+   Job posting content:
+   {job_posting}
+
+   Extract the following information and return as valid JSON:
+   {{
+     "role": "Job title/role (use hint if accurate)",
+     "company": "Company name (use hint if accurate)",
+     "level": "Seniority level (Junior/Mid/Senior/Staff/Principal)",
+     "location": "Job location",
+     "requirements": ["List of key requirements"],
+     "responsibilities": ["List of key responsibilities"],
+     "salary_range": "Salary information if available or 'Not specified'",
+     "work_mode": "Remote/Hybrid/On-site or 'Not specified'",
+     "tech_stack": ["Technologies mentioned"]
+   }}
+
+   Be precise and factual. If information is unclear, use 'Not specified' rather than guessing.
+
+ draft_prompt: |
+   You are a career advisor creating a comprehensive job analysis. Based on the provided job data, create a detailed role preview and interview preparation guide.
+
+   Job data: {job_data}
+
+   Your analysis should be thorough, practical, and tailored to job seekers.
+
+ critique_prompt: |
+   You are a senior career consultant reviewing job analysis content for accuracy and helpfulness.
+
+   Evaluate this content critically and provide constructive feedback.
+
+   Content to review: {draft}
+
+ qa_prompt: |
+   You are a quality assurance specialist for career content. Review this job analysis for accuracy, completeness, and clarity.
+
+   Content to review: {draft}
read_pdf.py ADDED
@@ -0,0 +1,89 @@
+ #!/usr/bin/env python3
+ """
+ PDF Reader Script
+ Extracts text content from PDF files using multiple methods.
+ """
+
+ import PyPDF2
+ import pdfplumber
+ from pathlib import Path
+
+
+ def read_pdf_with_pypdf2(pdf_path: str) -> str:
+     """Read PDF using PyPDF2."""
+     try:
+         with open(pdf_path, 'rb') as file:
+             pdf_reader = PyPDF2.PdfReader(file)
+             text = ""
+             for page_num in range(len(pdf_reader.pages)):
+                 page = pdf_reader.pages[page_num]
+                 text += f"\n--- Page {page_num + 1} ---\n"
+                 text += page.extract_text()
+             return text
+     except Exception as e:
+         return f"PyPDF2 Error: {e}"
+
+
+ def read_pdf_with_pdfplumber(pdf_path: str) -> str:
+     """Read PDF using pdfplumber."""
+     try:
+         text = ""
+         with pdfplumber.open(pdf_path) as pdf:
+             for page_num, page in enumerate(pdf.pages):
+                 text += f"\n--- Page {page_num + 1} ---\n"
+                 page_text = page.extract_text()
+                 if page_text:
+                     text += page_text
+                 else:
+                     text += "[No text found on this page]"
+         return text
+     except Exception as e:
+         return f"pdfplumber Error: {e}"
+
+
+ def main():
+     """Main function to read the PDF."""
+     pdf_path = "JRD_v1.1.pdf"
+
+     if not Path(pdf_path).exists():
+         print(f"Error: PDF file '{pdf_path}' not found!")
+         return
+
+     print("=" * 60)
+     print("PDF CONTENT EXTRACTION")
+     print("=" * 60)
+     print(f"File: {pdf_path}")
+     print()
+
+     # Try PyPDF2 first
+     print("📄 Using PyPDF2:")
+     print("-" * 30)
+     pypdf2_text = read_pdf_with_pypdf2(pdf_path)
+     print(pypdf2_text[:1000])  # Show first 1000 characters
+     if len(pypdf2_text) > 1000:
+         print("... (truncated)")
+     print()
+
+     # Try pdfplumber as backup
+     print("📄 Using pdfplumber:")
+     print("-" * 30)
+     pdfplumber_text = read_pdf_with_pdfplumber(pdf_path)
+     print(pdfplumber_text[:1000])  # Show first 1000 characters
+     if len(pdfplumber_text) > 1000:
+         print("... (truncated)")
+     print()
+
+     # Save full content to file
+     output_file = "pdf_content.txt"
+     with open(output_file, 'w', encoding='utf-8') as f:
+         f.write("=== PyPDF2 EXTRACTION ===\n")
+         f.write(pypdf2_text)
+         f.write("\n\n=== PDFPLUMBER EXTRACTION ===\n")
+         f.write(pdfplumber_text)
+
+     print(f"✅ Full content saved to: {output_file}")
+     print("=" * 60)
+
+
+ if __name__ == "__main__":
+     main()
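The two readers above implement a try-primary-then-fallback pattern that generalizes to any list of extractors. A generic sketch, with extractors passed as plain callables (the function names and error-prefix convention are taken from the script above; the sample extractors are illustrative):

```python
from typing import Callable, List

def extract_with_fallback(extractors: List[Callable[[str], str]], path: str) -> str:
    # Return the first non-empty result that is not an error string; else "".
    for extract in extractors:
        try:
            text = extract(path)
            if text and not text.startswith(("PyPDF2 Error", "pdfplumber Error")):
                return text
        except Exception:
            continue  # a failing extractor just hands off to the next one
    return ""

def failing(path: str) -> str:
    raise IOError("unreadable")

def working(path: str) -> str:
    return f"--- Page 1 ---\ncontent of {path}"

text = extract_with_fallback([failing, working], "JRD_v1.1.pdf")
print(text)
```

This keeps the "which library do we trust first" decision in one place, instead of duplicating the try/except in every caller.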
reddit_client.py ADDED
@@ -0,0 +1,410 @@
+ """
+ Reddit Client for fetching job-related posts during analysis
+ """
+ import requests
+ import requests.auth
+ import random
+ import time
+ from typing import List, Dict, Any, Optional
+ from config import REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT, JOB_SUBREDDITS
+
+ class RedditClient:
+     """Client for fetching Reddit posts from job-related subreddits"""
+
+     def __init__(self):
+         self.client_id = REDDIT_CLIENT_ID
+         self.client_secret = REDDIT_CLIENT_SECRET
+         self.user_agent = REDDIT_USER_AGENT
+         self.access_token = None
+         self.token_expires = 0
+
+     def get_access_token(self) -> Optional[str]:
+         """Get Reddit API access token"""
+         if self.access_token and time.time() < self.token_expires:
+             return self.access_token
+
+         auth = requests.auth.HTTPBasicAuth(self.client_id, self.client_secret)
+         data = {'grant_type': 'client_credentials'}
+         headers = {'User-Agent': self.user_agent}
+
+         try:
+             response = requests.post('https://www.reddit.com/api/v1/access_token',
+                                      auth=auth, data=data, headers=headers, timeout=10)
+
+             if response.status_code == 200:
+                 token_data = response.json()
+                 self.access_token = token_data['access_token']
+                 self.token_expires = time.time() + token_data['expires_in'] - 60  # 1 min buffer
+                 return self.access_token
+         except Exception as e:
+             print(f"Failed to get Reddit token: {e}")
+
+         return None
+
+     def get_hot_posts(self, subreddit: str, limit: int = 5) -> List[Dict[str, Any]]:
+         """Get hot posts from a subreddit"""
+         token = self.get_access_token()
+         if not token:
+             return []
+
+         headers = {
+             'Authorization': f'Bearer {token}',
+             'User-Agent': self.user_agent
+         }
+
+         try:
+             url = f'https://oauth.reddit.com/r/{subreddit}/hot'
+             params = {'limit': limit}
+
+             response = requests.get(url, headers=headers, params=params, timeout=10)
+
+             if response.status_code == 200:
+                 data = response.json()
+                 posts = []
+
+                 for post in data['data']['children']:
+                     post_data = post['data']
+                     posts.append({
+                         'title': post_data['title'],
+                         'score': post_data['score'],
+                         'num_comments': post_data['num_comments'],
+                         'url': f"https://reddit.com{post_data['permalink']}",
+                         'subreddit': post_data['subreddit'],
+                         'created_utc': post_data['created_utc']
+                     })
+
+                 return posts
+         except Exception as e:
+             print(f"Failed to fetch posts from r/{subreddit}: {e}")
+
+         return []
+
+     def get_random_job_posts(self, num_posts: int = 3) -> List[Dict[str, Any]]:
+         """Get random posts from job-related subreddits"""
+         all_posts = []
+
+         # Try to get posts from multiple subreddits
+         for subreddit in random.sample(JOB_SUBREDDITS, min(3, len(JOB_SUBREDDITS))):
+             posts = self.get_hot_posts(subreddit, limit=2)
+             all_posts.extend(posts)
+
+         # Return random selection
+         if all_posts:
+             return random.sample(all_posts, min(num_posts, len(all_posts)))
+
+         # Fallback posts if Reddit API fails
+         return [
+             {
+                 'title': '💡 Pro tip: Research the company culture before your interview',
+                 'score': 156,
+                 'num_comments': 23,
+                 'url': '#',
+                 'subreddit': 'careerguidance',
+                 'created_utc': time.time()
+             },
+             {
+                 'title': '🎯 STAR method for behavioral questions - game changer!',
+                 'score': 289,
+                 'num_comments': 47,
+                 'url': '#',
+                 'subreddit': 'jobs',
+                 'created_utc': time.time()
+             },
+             {
+                 'title': '🔥 Always prepare 3 questions to ask the interviewer',
+                 'score': 198,
+                 'num_comments': 31,
+                 'url': '#',
+                 'subreddit': 'careeradvice',
+                 'created_utc': time.time()
+             }
+         ]
+
+     def format_posts_for_display(self, posts: List[Dict[str, Any]]) -> str:
+         """Format posts for HTML display"""
+         if not posts:
+             return "<p>Loading career insights...</p>"
+
+         html = '<div class="reddit-posts">'
+
+         for post in posts:
+             # Format time ago
+             time_ago = int(time.time() - post['created_utc'])
+             if time_ago < 3600:
+                 time_str = f"{time_ago // 60}m ago"
+             elif time_ago < 86400:
+                 time_str = f"{time_ago // 3600}h ago"
+             else:
+                 time_str = f"{time_ago // 86400}d ago"
+
+             html += f'''
+             <div class="reddit-post">
+                 <div class="post-header">
+                     <span class="subreddit">r/{post['subreddit']}</span>
+                     <span class="post-time">{time_str}</span>
+                 </div>
+                 <h4 class="post-title">{post['title']}</h4>
+                 <div class="post-stats">
+                     <span class="upvotes">↑ {post['score']}</span>
+                     <span class="comments">💬 {post['num_comments']}</span>
+                 </div>
+             </div>
+             '''
+
+         html += '</div>'
+         return html
+
+     def get_top_posts_of_week(self, subreddit: str, limit: int = 1) -> List[Dict[str, Any]]:
+         """Get top posts of the week from a subreddit"""
+         token = self.get_access_token()
+         if not token:
+             print(f"No token for r/{subreddit}")
+             return []
+
+         headers = {
+             'Authorization': f'Bearer {token}',
+             'User-Agent': self.user_agent
+         }
+
+         try:
+             url = f'https://oauth.reddit.com/r/{subreddit}/top'
+             params = {'limit': limit, 't': 'week'}  # t=week for top of the week
+
+             print(f"Fetching from: {url} with params: {params}")
+             response = requests.get(url, headers=headers, params=params, timeout=10)
+             print(f"Response status for r/{subreddit}: {response.status_code}")
+
+             if response.status_code == 200:
+                 data = response.json()
+                 posts = []
+
+                 if 'data' in data and 'children' in data['data'] and len(data['data']['children']) > 0:
+                     for post in data['data']['children']:
+                         post_data = post['data']
+
+                         # Get full post content including body text
+                         full_content = post_data.get('selftext', '').strip()
+                         if not full_content:
+                             # If no selftext, create a preview from the title
+                             full_content = "Link post - click 'View Full Post' to see the discussion and content."
+                         elif len(full_content) > 400:  # Truncate very long posts
+                             full_content = full_content[:400] + "..."
+
+                         posts.append({
+                             'title': post_data['title'],
+                             'content': full_content,
+                             'score': post_data['score'],
+                             'num_comments': post_data['num_comments'],
+                             'url': f"https://reddit.com{post_data['permalink']}",
+                             'subreddit': post_data['subreddit'],
+                             'created_utc': post_data['created_utc'],
+                             'author': post_data['author'],
+                             'flair': post_data.get('link_flair_text', '') or ''
+                         })
+
+                     print(f"Successfully fetched {len(posts)} posts from r/{subreddit}")
+                     return posts
+                 else:
+                     print(f"No posts found in response for r/{subreddit}")
+             else:
+                 print(f"API error for r/{subreddit}: {response.status_code} - {response.text[:200]}")
+
+         except Exception as e:
+             print(f"Exception fetching posts from r/{subreddit}: {e}")
+
+         return []
+
+     def get_job_posts_by_subreddit(self) -> Dict[str, Dict[str, Any]]:
+         """Get one top post of the week from each job-related subreddit"""
+         subreddit_posts = {}
+
+         for subreddit in JOB_SUBREDDITS:
+             posts = self.get_top_posts_of_week(subreddit, limit=1)
+             if posts:
+                 subreddit_posts[subreddit] = posts[0]
+                 print(f"✅ Successfully got post from r/{subreddit}")
+             else:
+                 print(f"❌ Failed to get posts from r/{subreddit} - API credentials invalid")
+                 # Return empty dict if Reddit API fails - no fallbacks
+                 subreddit_posts[subreddit] = None
+
+         # Filter out None values
+         subreddit_posts = {k: v for k, v in subreddit_posts.items() if v is not None}
+
+         return subreddit_posts
+
+     def get_single_subreddit_post(self, subreddit: str) -> Optional[Dict[str, Any]]:
+         """Get a single fresh post from a specific subreddit"""
+         posts = self.get_top_posts_of_week(subreddit, limit=3)  # Get 3 to have variety
+         if posts:
+             # Return a random post from the top 3 for variety
+             return random.choice(posts)
+         else:
+             # If Reddit API fails, return None - no fallbacks
+             print(f"❌ Failed to refresh post from r/{subreddit} - API credentials invalid")
+             return None
+
+     def format_subreddit_posts_for_display(self, subreddit_posts: Dict[str, Dict[str, Any]]) -> str:
+         """Format subreddit posts for HTML display with individual refresh buttons"""
+         if not subreddit_posts:
+             return "<p>Loading career insights...</p>"
+
+         html = '<div class="reddit-posts-enhanced">'
+
+         for subreddit, post in subreddit_posts.items():
+             # Format time ago
+             time_ago = int(time.time() - post['created_utc'])
+             if time_ago < 3600:
+                 time_str = f"{time_ago // 60}m ago"
+             elif time_ago < 86400:
+                 time_str = f"{time_ago // 3600}h ago"
+             else:
+                 time_str = f"{time_ago // 86400}d ago"
+
+             # Format flair
+             flair_html = f'<span class="post-flair">{post["flair"]}</span>' if post.get('flair') else ''
+
+             html += f'''
+             <div class="reddit-post-enhanced" data-subreddit="{subreddit}">
+                 <div class="post-header-enhanced">
+                     <div class="subreddit-info">
+                         <span class="subreddit-name">r/{subreddit}</span>
+                         {flair_html}
+                     </div>
+                     <div class="post-meta">
+                         <span class="post-time">{time_str}</span>
+                         <button class="refresh-post-btn" onclick="refreshPost('{subreddit}')" title="Get new post from this subreddit">
+                             🔄
+                         </button>
+                     </div>
+                 </div>
+                 <h4 class="post-title-enhanced">
+                     <a href="{post['url']}" target="_blank" rel="noopener">{post['title']}</a>
+                 </h4>
+                 <div class="post-content-enhanced">
+                     {post['content']}
+                 </div>
+                 <div class="post-stats-enhanced">
+                     <div class="stats-left">
+                         <span class="upvotes">↑ {post['score']}</span>
+                         <span class="comments">💬 {post['num_comments']}</span>
+                         <span class="author">👤 u/{post['author']}</span>
+                     </div>
+                     <a href="{post['url']}" target="_blank" class="view-full-btn" rel="noopener">
+                         View Full Post →
+                     </a>
+                 </div>
+             </div>
+             '''
+
+         html += '''
+         </div>
+         <script>
+         function refreshPost(subreddit) {
+             console.log('Refreshing post for r/' + subreddit);
+             // Map subreddit names to button IDs
+             const buttonMap = {
+                 'jobs': 'refresh-jobs',
+                 'careerguidance': 'refresh-careerguidance',
+                 'cscareerquestions': 'refresh-cscareerquestions',
+                 'careeradvice': 'refresh-careeradvice',
+                 'ITCareerQuestions': 'refresh-ITCareerQuestions'
+             };
+
+             const buttonId = buttonMap[subreddit];
+             if (buttonId) {
+                 // Find and click the hidden Gradio button
+                 const refreshBtn = document.getElementById(buttonId);
+                 if (refreshBtn) {
+                     refreshBtn.click();
+                 } else {
+                     console.log('Button not found:', buttonId);
+                 }
+             }
+         }
+         </script>
+         '''
+
+         return html
+
+     def format_posts_as_widget_cards(self, subreddit_posts: Dict[str, Dict[str, Any]]) -> str:
+         """Format subreddit posts as compact widget cards for top display"""
+         if not subreddit_posts:
+             return "<div class='reddit-widgets-loading' style='text-align: center; padding: 2rem; color: rgba(255,255,255,0.6);'>Reddit API credentials invalid - no posts available</div>"
+
+         html = '<div class="reddit-widgets-container">'
+
+         for subreddit, post in subreddit_posts.items():
+             # Format time ago
+             time_ago = int(time.time() - post['created_utc'])
+             if time_ago < 3600:
+                 time_str = f"{time_ago // 60}m"
+             elif time_ago < 86400:
+                 time_str = f"{time_ago // 3600}h"
+             else:
+                 time_str = f"{time_ago // 86400}d"
+
+             # Truncate content for widget view
+             widget_content = post['content']
+             if len(widget_content) > 120:
+                 widget_content = widget_content[:120] + "..."
+
+             # Prefix known Reddit subreddits with "r/"; article sources (Forbes, Harvard, etc.) are shown as-is
+             source_display = post['subreddit']
+             if source_display.startswith(('jobs', 'career', 'cs', 'IT')):
+                 source_prefix = "r/"
+             else:
+                 source_prefix = ""
+
+             html += f'''
+             <div class="reddit-widget-card" data-subreddit="{subreddit}">
+                 <div class="widget-header">
+                     <span class="widget-subreddit">{source_prefix}{source_display}</span>
+                     <div class="widget-actions">
+                         <span class="widget-time">{time_str}</span>
+                         <button class="widget-refresh-btn" onclick="refreshPost('{subreddit}')" title="Refresh">
+                             🔄
+                         </button>
+                     </div>
+                 </div>
+                 <h5 class="widget-title">
+                     <a href="{post['url']}" target="_blank" rel="noopener">{post['title']}</a>
+                 </h5>
+                 <p class="widget-content">{widget_content}</p>
+                 <div class="widget-stats">
+                     <span class="widget-score">↑ {post['score']}</span>
+                     <span class="widget-comments">💬 {post['num_comments']}</span>
+                     <a href="{post['url']}" target="_blank" class="widget-link" rel="noopener">Read →</a>
+                 </div>
+             </div>
+             '''
+
+         html += '''
+         </div>
+         <script>
+         function refreshPost(subreddit) {
+             console.log('Refreshing widget for r/' + subreddit);
+             const buttonMap = {
+                 'jobs': 'refresh-jobs',
+                 'careerguidance': 'refresh-careerguidance',
+                 'cscareerquestions': 'refresh-cscareerquestions',
+                 'careeradvice': 'refresh-careeradvice',
+                 'ITCareerQuestions': 'refresh-ITCareerQuestions'
+             };
+
+             const buttonId = buttonMap[subreddit];
+             if (buttonId) {
+                 const refreshBtn = document.getElementById(buttonId);
+                 if (refreshBtn) {
+                     refreshBtn.click();
+                 }
+             }
+         }
+         </script>
+         '''
+
+         return html
+
+ # Global Reddit client instance
+ reddit_client = RedditClient()
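The "time ago" branching above appears three times (twice with an "ago" suffix, once compact). Factored as a pure function it is easy to test; the function name and `compact` flag are illustrative, not part of the module:

```python
def format_time_ago(seconds: int, compact: bool = False) -> str:
    # Mirrors the minutes/hours/days branching used in the display helpers above.
    if seconds < 3600:
        value, unit = seconds // 60, "m"
    elif seconds < 86400:
        value, unit = seconds // 3600, "h"
    else:
        value, unit = seconds // 86400, "d"
    return f"{value}{unit}" if compact else f"{value}{unit} ago"

print(format_time_ago(90))  # 1m ago
print(format_time_ago(7200, compact=True))  # 2h
```

Pulling this out would also keep the three call sites from drifting apart if the thresholds ever change.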
render_buckets.py ADDED
@@ -0,0 +1,100 @@
+ from typing import Dict, List
+ import re
+
+ def render_buckets(bucket_facts: Dict[str, str], buckets: Dict[str, List[str]]) -> str:
+     """Render markdown for intelligence buckets, hiding empty ones"""
+
+     # Predefined bucket order and emojis
+     bucket_order = [
+         ("Team & Manager", "👥"),
+         ("Tech Stack Snapshot", "⚡"),
+         ("Business Context", "🏢"),
+         ("Comp & Leveling", "💰"),
+         ("Career Trajectory", "📈"),
+         ("Culture/WLB", "🌟"),
+         ("Interview Runway", "🎯"),
+         ("Onboarding & Tooling", "🛠️"),
+         ("Location/Remote", "🌍"),
+         ("Strategic Risks", "⚠️")
+     ]
+
+     rendered_sections = []
+     empty_bucket_count = 0
+
+     for bucket_name, emoji in bucket_order:
+         facts = buckets.get(bucket_name, [])
+
+         # Skip empty buckets
+         if not facts or all(not fact.strip() for fact in facts):
+             empty_bucket_count += 1
+             continue
+
+         # Render bucket with limited bullets
+         section = f"## {emoji} {bucket_name}\n\n"
+
+         # Limit to 6 bullets per bucket
+         limited_facts = facts[:6]
+
+         for fact in limited_facts:
+             if fact.strip():
+                 # Ensure fact ends with source link
+                 fact = fact.strip()
+                 if not fact.endswith('🔗'):
+                     # Add generic source link if missing
+                     if 'http' not in fact:
+                         fact += " 🔗"
+
+                 section += f"- {fact}\n"
+
+         section += "\n"
+         rendered_sections.append(section)
+
+     # Log empty buckets for metrics
+     if empty_bucket_count > 0:
+         from metrics import log_metric
+         log_metric("bucket_missing", {"empty_buckets": empty_bucket_count})
+
+     # Only return content if we have non-empty buckets
+     if rendered_sections:
+         header = "# 🧠 Deep Intelligence Analysis\n\n"
+         return header + "".join(rendered_sections)
+     else:
+         return ""
+
+ def format_bullet_with_source(text: str, source_url: str = "") -> str:
+     """Format a bullet point with proper source link"""
+     text = text.strip()
+
+     # If already has source link, return as-is
+     if '🔗' in text:
+         return text
+
+     # Add source link
+     if source_url:
+         return f"{text} 🔗 {source_url}"
+     else:
+         return f"{text} 🔗"
+
+ def _format_bullet(item: str) -> str:
+     """Format individual bullet with emoji and source links"""
+
+     # Extract URLs and add link emoji
+     if "🔗" in item:
+         return item
+     elif "http" in item:
+         # Add link emoji for URLs
+         item = re.sub(r'(https?://[^\s]+)', r'🔗 \1', item)
+
+     # Add context emoji based on content
+     if any(keyword in item.lower() for keyword in ["manager", "team", "hiring"]):
+         return f"👥 {item}"
+     elif any(keyword in item.lower() for keyword in ["salary", "comp", "pay"]):
+         return f"💰 {item}"
+     elif any(keyword in item.lower() for keyword in ["culture", "rating"]):
+         return f"🏢 {item}"
+     elif any(keyword in item.lower() for keyword in ["stack", "tech", "tools"]):
+         return f"⚙️ {item}"
+     elif any(keyword in item.lower() for keyword in ["news", "business"]):
+         return f"📈 {item}"
+     else:
+         return f"📋 {item}"
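The core rendering rule above (hide buckets whose facts are all blank, cap each bucket at six bullets) can be sketched standalone. The function below is a simplified stand-in without the emoji ordering or metrics logging:

```python
from typing import Dict, List

def render_nonempty(buckets: Dict[str, List[str]], max_bullets: int = 6) -> str:
    # Same skip-empty / cap-bullets behavior as render_buckets, minus emojis and metrics.
    sections = []
    for name, facts in buckets.items():
        facts = [f.strip() for f in facts if f.strip()]
        if not facts:
            continue  # hide empty buckets entirely
        lines = [f"## {name}"] + [f"- {f}" for f in facts[:max_bullets]]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)

out = render_nonempty({"Team & Manager": ["Reports to CTO", ""], "Culture/WLB": []})
print(out)
```

Hiding empty buckets at render time means upstream enrichment steps can always emit the full bucket dict without worrying about blank sections in the output.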
render_cards.py ADDED
@@ -0,0 +1,310 @@
+ """
+ Clean card-based rendering for IQKiller job analysis.
+ Replaces markdown blob with focused HTML cards.
+ """
+ from typing import Dict, List, Optional, Any
+ import re
+
+
+ def badge(value: str, field: str, source_map: Dict[str, str]) -> str:
+     """Add '(from Google)' badge if field was patched via Google search."""
+     if source_map.get(field) == "google":
+         return f"{value} <em style='color:#666; font-size:0.9em'>(from Google)</em>"
+     return value
+
+
+ def bullets(items: List[str], css_class: str = "text-gray-700") -> str:
+     """Convert list to HTML bullet points."""
+     if not items:
+         return ""
+
+     bullet_items = "".join([f"<li class='{css_class}'>{item}</li>" for item in items])
+     return f"<ul class='list-disc list-inside space-y-1'>{bullet_items}</ul>"
+
+
+ def at_a_glance_card(job_data: Dict[str, Any], source_map: Dict[str, str]) -> str:
+     """Build the main at-a-glance job info card."""
+     company = job_data.get("company", "Unknown Company")
+     role = job_data.get("role", "Unknown Role")
+     level = job_data.get("seniority", job_data.get("level", ""))
+     location = job_data.get("location", "")
+     posted = job_data.get("posted_days", job_data.get("posted_age", ""))
+
+     # Format salary range
+     salary_low = job_data.get("salary_low")
+     salary_high = job_data.get("salary_high")
+     salary_text = ""
+     if salary_low or salary_high:
+         if salary_low and salary_high:
+             salary_text = f"${salary_low:,} - ${salary_high:,}"
+         elif salary_low:
+             salary_text = f"${salary_low:,}+"
+         elif salary_high:
+             salary_text = f"Up to ${salary_high:,}"
+         salary_text = badge(salary_text, "salary_low", source_map)
+
+     # Format posted time
+     posted_text = ""
+     if posted:
+         if isinstance(posted, int):
+             if posted == 1:
+                 posted_text = "1 day ago"
+             else:
+                 posted_text = f"{posted} days ago"
+         else:
+             posted_text = str(posted)
+
+     return f"""
+     <div class='bg-white border border-gray-200 rounded-lg p-6 shadow-sm mb-4'>
+         <div class='flex items-start justify-between'>
+             <div class='flex-1'>
+                 <h2 class='text-2xl font-bold text-gray-900'>{role}</h2>
+                 <p class='text-lg text-blue-600 font-semibold mt-1'>{company}</p>
+                 <div class='flex flex-wrap gap-4 mt-3 text-sm text-gray-600'>
+                     {f"<span>📍 {location}</span>" if location else ""}
+                     {f"<span>⚡ {level}</span>" if level else ""}
+                     {f"<span>🕒 {posted_text}</span>" if posted_text else ""}
+                 </div>
+                 {f"<div class='mt-3 text-lg font-semibold text-green-600'>{salary_text}</div>" if salary_text else ""}
+             </div>
+         </div>
+     </div>
+     """
+
+
+ def quick_context_card(job_data: Dict[str, Any], source_map: Dict[str, str]) -> str:
+     """Build mission and funding context banner."""
+     mission = job_data.get("mission", "")
+     funding = job_data.get("funding", "")
+
+     if not mission and not funding:
+         return ""
+
+     content = ""
+     if mission:
+         content += f"<p class='text-gray-800'>{mission}</p>"
+
+     if funding:
+         funding_text = badge(funding, "funding", source_map)
+         content += f"<p class='text-blue-700 font-medium mt-2'>💰 {funding_text}</p>"
+
+     return f"""
+     <div class='bg-green-50 border border-green-200 rounded-lg p-4 mb-4'>
+         <h3 class='text-lg font-semibold text-green-800 mb-2'>Quick Context</h3>
+         {content}
+     </div>
+     """
+
+
+ def skills_section(must_have: List[str], nice_to_have: List[str]) -> str:
+     """Build must-have and nice-to-have skills sections."""
+     if not must_have and not nice_to_have:
+         return ""
+
+     content = ""
+
+     if must_have:
+         content += f"""
107
+ content += f"""
108
+ <div class='mb-4'>
109
+ <h4 class='font-semibold text-gray-900 mb-2'>Must-Have Skills</h4>
110
+ {bullets(must_have, "text-gray-700")}
111
+ </div>
112
+ """
113
+
114
+ if nice_to_have:
115
+ content += f"""
116
+ <div>
117
+ <h4 class='font-semibold text-gray-600 mb-2'>Nice-to-Have Skills</h4>
118
+ {bullets(nice_to_have, "text-gray-500")}
119
+ </div>
120
+ """
121
+
122
+ return f"""
123
+ <div class='bg-white border border-gray-200 rounded-lg p-6 mb-4'>
124
+ <h3 class='text-lg font-semibold text-gray-900 mb-4'>Skills & Requirements</h3>
125
+ {content}
126
+ </div>
127
+ """
128
+
129
+
130
+ def interview_cheat_sheet(tech_q: List[str], behav_q: List[str]) -> str:
131
+ """Build collapsible interview prep section."""
132
+ if not tech_q and not behav_q:
133
+ return ""
134
+
135
+ tech_content = bullets(tech_q[:3], "text-gray-700") if tech_q else ""
136
+ behav_content = bullets(behav_q[:3], "text-gray-700") if behav_q else ""
137
+
138
+ return f"""
139
+ <div class='bg-white border border-gray-200 rounded-lg p-6 mb-4'>
140
+ <details>
141
+ <summary class='text-lg font-semibold text-gray-900 cursor-pointer hover:text-blue-600'>
142
+ Interview Cheat Sheet
143
+ </summary>
144
+ <div class='mt-4 space-y-4'>
145
+ {f"<div><h4 class='font-semibold text-gray-900 mb-2'>Technical Questions</h4>{tech_content}</div>" if tech_content else ""}
146
+ {f"<div><h4 class='font-semibold text-gray-900 mb-2'>Behavioral Questions</h4>{behav_content}</div>" if behav_content else ""}
147
+ </div>
148
+ </details>
149
+ </div>
150
+ """
151
+
152
+
153
+ def comp_perks_section(job_data: Dict[str, Any], perks: List[str]) -> str:
154
+ """Build compensation and perks section."""
155
+ salary_low = job_data.get("salary_low")
156
+ salary_high = job_data.get("salary_high")
157
+
158
+ if not salary_low and not salary_high and not perks:
159
+ return ""
160
+
161
+ content = ""
162
+
163
+ if perks:
164
+ content += f"""
165
+ <div>
166
+ <h4 class='font-semibold text-gray-900 mb-2'>Perks & Benefits</h4>
167
+ {bullets(perks, "text-gray-700")}
168
+ </div>
169
+ """
170
+
171
+ return f"""
172
+ <div class='bg-white border border-gray-200 rounded-lg p-6 mb-4'>
173
+ <h3 class='text-lg font-semibold text-gray-900 mb-4'>Compensation & Perks</h3>
174
+ {content}
175
+ </div>
176
+ """
177
+
178
+
179
+ def red_flags_section(red_flags: List[str]) -> str:
180
+ """Build red flags warning section."""
181
+ if not red_flags:
182
+ return ""
183
+
184
+ return f"""
185
+ <div class='bg-red-50 border border-red-200 rounded-lg p-4 mb-4'>
186
+ <h3 class='text-lg font-semibold text-red-800 mb-2'>🚩 Red Flag Watchlist</h3>
187
+ {bullets(red_flags, "text-red-700")}
188
+ </div>
189
+ """
190
+
191
+
192
+ def next_actions_section(apply_link: str = "") -> str:
193
+ """Build action buttons section."""
194
+ apply_button = ""
195
+ if apply_link:
196
+ apply_button = f"""
197
+ <a href="{apply_link}" target="_blank"
198
+ class='inline-block bg-blue-600 text-white px-6 py-2 rounded-lg hover:bg-blue-700 transition-colors'>
199
+ Apply Now
200
+ </a>
201
+ """
202
+
203
+ return f"""
204
+ <div class='bg-gray-50 border border-gray-200 rounded-lg p-6'>
205
+ <h3 class='text-lg font-semibold text-gray-900 mb-4'>Next Actions</h3>
206
+ <div class='flex gap-3 flex-wrap'>
207
+ <button onclick='copyToClipboard()'
208
+ class='bg-green-600 text-white px-6 py-2 rounded-lg hover:bg-green-700 transition-colors'>
209
+ 📋 Copy Summary
210
+ </button>
211
+ <button onclick='downloadPDF()'
212
+ class='bg-gray-600 text-white px-6 py-2 rounded-lg hover:bg-gray-700 transition-colors'>
213
+ 📥 Download PDF
214
+ </button>
215
+ {apply_button}
216
+ </div>
217
+ </div>
218
+ """
219
+
220
+
221
+ def extract_qa_data(qa_content: str) -> Dict[str, List[str]]:
222
+ """Extract structured data from QA content."""
223
+ data = {
224
+ "must_have": [],
225
+ "nice_to_have": [],
226
+ "tech_q": [],
227
+ "behav_q": [],
228
+ "perks": [],
229
+ "red_flags": []
230
+ }
231
+
232
+ # Simple regex patterns to extract lists from QA content
233
+ patterns = {
234
+ "must_have": r"(?:must.?have|required|essential).*?(?:\n|$)((?:\s*[-•]\s*.+(?:\n|$))*)",
235
+ "nice_to_have": r"(?:nice.?to.?have|preferred|bonus).*?(?:\n|$)((?:\s*[-•]\s*.+(?:\n|$))*)",
236
+ "tech_q": r"(?:technical|tech).*?question.*?(?:\n|$)((?:\s*[-•]\s*.+(?:\n|$))*)",
237
+ "behav_q": r"(?:behavioral|behaviour).*?question.*?(?:\n|$)((?:\s*[-•]\s*.+(?:\n|$))*)",
238
+ "perks": r"(?:perks|benefits).*?(?:\n|$)((?:\s*[-•]\s*.+(?:\n|$))*)",
239
+ "red_flags": r"(?:red.?flag|warning|concern).*?(?:\n|$)((?:\s*[-•]\s*.+(?:\n|$))*)"
240
+ }
241
+
242
+ for key, pattern in patterns.items():
243
+ matches = re.findall(pattern, qa_content, re.IGNORECASE | re.MULTILINE)
244
+ for match in matches:
245
+ items = re.findall(r"[-•]\s*(.+)", match)
246
+ data[key].extend([item.strip() for item in items if item.strip()])
247
+
248
+ return data
249
+
250
+
251
+ def to_html(result_data: Dict[str, Any]) -> str:
252
+ """Convert job analysis result to clean HTML cards."""
253
+ # Extract job core data
254
+ enriched = result_data.get("enriched", {})
255
+ source_map = enriched.get("source_map", {})
256
+
257
+ # Extract QA-derived data (using correct orchestrator keys)
258
+ qa_content = result_data.get("qa_result", "")
259
+ critique_content = result_data.get("critique", "")
260
+ draft_content = result_data.get("draft", "")
261
+
262
+ qa_data = extract_qa_data(qa_content + "\n" + critique_content + "\n" + draft_content)
263
+
264
+ # Apply link from enriched data
265
+ apply_link = enriched.get("apply_link", "")
266
+
267
+ # Build HTML sections
268
+ html_parts = [
269
+ at_a_glance_card(enriched, source_map),
270
+ quick_context_card(enriched, source_map),
271
+ skills_section(qa_data["must_have"], qa_data["nice_to_have"]),
272
+ interview_cheat_sheet(qa_data["tech_q"], qa_data["behav_q"]),
273
+ comp_perks_section(enriched, qa_data["perks"]),
274
+ red_flags_section(qa_data["red_flags"]),
275
+ next_actions_section(apply_link)
276
+ ]
277
+
278
+ # JavaScript for copy functionality
279
+ role = enriched.get('role', 'Unknown')
280
+ company = enriched.get('company', 'Unknown')
281
+ location = enriched.get('location', 'N/A')
282
+ seniority = enriched.get('seniority', 'N/A')
283
+ mission = enriched.get('mission', '')
284
+
285
+ js_script = f"""
286
+ <script>
287
+ window.__IQ_SUMMARY__ = `Job: {role} at {company}
288
+ Location: {location}
289
+ Level: {seniority}
290
+ {mission}`;
291
+
292
+ function copyToClipboard() {{
293
+ navigator.clipboard.writeText(window.__IQ_SUMMARY__).then(() => {{
294
+ alert('Summary copied to clipboard!');
295
+ }});
296
+ }}
297
+
298
+ function downloadPDF() {{
299
+ alert('PDF download coming soon!');
300
+ }}
301
+ </script>
302
+ """
303
+
304
+ # Combine all sections
305
+ return "\n".join([part for part in html_parts if part.strip()]) + js_script
306
+
307
+
308
+ def skeleton() -> str:
309
+ """Return loading skeleton placeholder."""
310
+ return "<div class='animate-pulse p-6 text-gray-400'>Analyzing JD…</div>"
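The bullet-list regexes in `extract_qa_data` above do the heavy lifting here; they can be exercised standalone. A minimal sketch using the `must_have` pattern copied from the module (the sample QA text is made up for illustration):

```python
import re

# The must_have pattern from extract_qa_data above; qa_content is sample text.
pattern = r"(?:must.?have|required|essential).*?(?:\n|$)((?:\s*[-•]\s*.+(?:\n|$))*)"
qa_content = "Must-have skills:\n- Python\n- SQL\n\nNice to have:\n- Docker\n"

items = []
for block in re.findall(pattern, qa_content, re.IGNORECASE | re.MULTILINE):
    # Second pass pulls individual bullet bodies out of the captured block
    items.extend(m.strip() for m in re.findall(r"[-•]\s*(.+)", block))

print(items)  # → ['Python', 'SQL']
```

Note the "Nice to have" bullets are untouched: only headings matching `must.?have|required|essential` anchor a capture.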
render_cards_test.py ADDED
@@ -0,0 +1,84 @@
1
+ """
2
+ Tests for render_cards module.
3
+ """
4
+ import pytest
5
+ from render_cards import at_a_glance_card, badge, bullets, to_html
6
+
7
+
8
+ def test_badge_adds_google_source():
9
+ """Test that badge adds '(from Google)' when field is in source_map."""
10
+ source_map = {"salary_low": "google"}
11
+ result = badge("$120,000", "salary_low", source_map)
12
+ assert "(from Google)" in result
13
+ assert "120,000" in result
14
+
15
+
16
+ def test_badge_no_source():
17
+ """Test that badge doesn't add annotation when field not in source_map."""
18
+ source_map = {}
19
+ result = badge("$120,000", "salary_low", source_map)
20
+ assert "(from Google)" not in result
21
+ assert result == "$120,000"
22
+
23
+
24
+ def test_bullets_creates_list():
25
+ """Test that bullets creates proper HTML list."""
26
+ items = ["Python", "Machine Learning", "SQL"]
27
+ result = bullets(items)
28
+ assert "<ul" in result
29
+ assert "<li" in result
30
+ assert "Python" in result
31
+ assert "SQL" in result
32
+
33
+
34
+ def test_at_a_glance_card_basic():
35
+ """Test at-a-glance card contains key job info."""
36
+ job_data = {
37
+ "company": "TechCorp",
38
+ "role": "Senior Engineer",
39
+ "location": "San Francisco",
40
+ "seniority": "Senior"
41
+ }
42
+ source_map = {}
43
+
44
+ result = at_a_glance_card(job_data, source_map)
45
+
46
+ assert "TechCorp" in result
47
+ assert "Senior Engineer" in result
48
+ assert "San Francisco" in result
49
+ assert "Senior" in result
50
+ assert "bg-white" in result # Tailwind class
51
+
52
+
53
+ def test_to_html_complete():
54
+ """Test full HTML generation with sample data."""
55
+ result_data = {
56
+ "enriched": {
57
+ "company": "TestCorp",
58
+ "role": "Software Engineer",
59
+ "location": "Remote",
60
+ "mission": "Building the future",
61
+ "source_map": {}
62
+ },
63
+ "qa_result": "Must-have skills:\n- Python\n- SQL",
64
+ "critique": "Red flags:\n- Long hours mentioned"
65
+ }
66
+
67
+ html = to_html(result_data)
68
+
69
+ assert "TestCorp" in html
70
+ assert "Software Engineer" in html
71
+ assert "Building the future" in html
72
+ assert "<script>" in html # JavaScript included
73
+ assert "copyToClipboard" in html
74
+
75
+
76
+ def test_empty_data_handling():
77
+ """Test handling of missing or empty data."""
78
+ result_data = {"enriched": {}, "qa_result": "", "critique": ""}
79
+
80
+ html = to_html(result_data)
81
+
82
+ # Should still generate basic structure without errors
83
+ assert "Unknown" in html # Fallback values
84
+ assert "<script>" in html
renderer_nobs.py ADDED
@@ -0,0 +1,470 @@
1
+ """
2
+ No-BS Job Brief renderer for IQKiller.
3
+ Creates compact, single-card job briefs with essential info only.
4
+ """
5
+ from typing import Dict, List, Optional, Any
6
+
7
+ def skeleton() -> str:
8
+ """Return loading skeleton placeholder."""
9
+ return "<div class='animate-pulse p-4 text-gray-400'>Analyzing…</div>"
10
+
11
+ def bullets(items: List[str], css_class: str = "text-gray-700") -> str:
12
+ """Convert list to HTML bullet points."""
13
+ if not items:
14
+ return ""
15
+
16
+ bullet_items = "".join([f"<li class='{css_class} text-sm'>{item}</li>" for item in items])
17
+ return f"<ul class='list-disc list-inside space-y-1 ml-4'>{bullet_items}</ul>"
18
+
19
+ def hide_if_empty(content: str, wrapper: str = "") -> str:
20
+ """Hide section if content is empty."""
21
+ if not content.strip():
22
+ return ""
23
+ return wrapper.format(content=content) if wrapper else content
24
+
25
+ def format_title_line(data: Dict[str, Any]) -> str:
26
+ """Format the compact title line."""
27
+ title = data.get("title", "Unknown Role")
28
+ company = data.get("company", "Unknown Company")
29
+ location = data.get("location", "")
30
+ work_type = data.get("work_type", "")
31
+ salary_band = data.get("salary_band", "")
32
+
33
+ # Build title components
34
+ parts = [f"{title} · {company}"]
35
+
36
+ if work_type or location:
37
+ location_work = " ".join(filter(None, [work_type, location]))
38
+ parts.append(location_work)
39
+
40
+ if salary_band:
41
+ parts.append(f"• {salary_band}")
42
+
43
+ return " — ".join(parts)
44
+
45
+ def format_mission(mission: str) -> str:
46
+ """Format mission one-liner (≤25 words)."""
47
+ if not mission:
48
+ return ""
49
+
50
+ # Truncate if too long
51
+ words = mission.split()
52
+ if len(words) > 25:
53
+ mission = " ".join(words[:25]) + "..."
54
+
55
+ return f"""
56
+ <div class='mb-4'>
57
+ <p class='text-gray-800 text-sm italic'>{mission}</p>
58
+ </div>
59
+ """
60
+
61
+ def format_must_have(must_have: List[str]) -> str:
62
+ """Format must-have stack (≤6 bullets, <7 words each)."""
63
+ if not must_have:
64
+ return ""
65
+
66
+ # Limit to 6 items and truncate long items
67
+ limited_items = must_have[:6]
68
+ truncated_items = []
69
+
70
+ for item in limited_items:
71
+ words = item.split()
72
+ if len(words) > 7:
73
+ item = " ".join(words[:7]) + "..."
74
+ truncated_items.append(item)
75
+
76
+ bullet_html = bullets(truncated_items, "text-gray-900 font-medium")
77
+
78
+ return f"""
79
+ <div class='mb-4'>
80
+ <h3 class='text-sm font-semibold text-gray-900 mb-2'>Must-Have Stack</h3>
81
+ {bullet_html}
82
+ </div>
83
+ """
84
+
85
+ def format_nice_to_have(nice_to_have: List[str]) -> str:
86
+ """Format nice-to-have skills (grey bullets)."""
87
+ if not nice_to_have:
88
+ return ""
89
+
90
+ # Limit to 6 items
91
+ limited_items = nice_to_have[:6]
92
+ bullet_html = bullets(limited_items, "text-gray-500")
93
+
94
+ return f"""
95
+ <div class='mb-4'>
96
+ <h3 class='text-sm font-semibold text-gray-600 mb-2'>Nice-to-Haves</h3>
97
+ {bullet_html}
98
+ </div>
99
+ """
100
+
101
+ def format_why_it_matters(why_it_matters: str) -> str:
102
+ """Format why-it-matters (≤30 words)."""
103
+ if not why_it_matters:
104
+ return ""
105
+
106
+ # Truncate if too long
107
+ words = why_it_matters.split()
108
+ if len(words) > 30:
109
+ why_it_matters = " ".join(words[:30]) + "..."
110
+
111
+ return f"""
112
+ <div class='mb-4'>
113
+ <h3 class='text-sm font-semibold text-blue-700 mb-2'>Why It Matters</h3>
114
+ <p class='text-gray-700 text-sm'>{why_it_matters}</p>
115
+ </div>
116
+ """
117
+
118
+ def format_perks(perks: List[str]) -> str:
119
+ """Format perks as inline list."""
120
+ if not perks:
121
+ return ""
122
+
123
+ perks_text = " • ".join(perks[:8]) # Limit to avoid overflow
124
+
125
+ return f"""
126
+ <div class='mb-4'>
127
+ <h3 class='text-sm font-semibold text-green-700 mb-2'>Perks</h3>
128
+ <p class='text-gray-700 text-sm'>{perks_text}</p>
129
+ </div>
130
+ """
131
+
132
+ def format_red_flags(red_flags: List[str]) -> str:
133
+ """Format red flags (red, only if any)."""
134
+ if not red_flags:
135
+ return ""
136
+
137
+ bullet_html = bullets(red_flags, "text-red-700")
138
+
139
+ return f"""
140
+ <div class='mb-4 bg-red-50 border border-red-200 rounded-lg p-3'>
141
+ <h3 class='text-sm font-semibold text-red-800 mb-2'>🚩 Red Flags</h3>
142
+ {bullet_html}
143
+ </div>
144
+ """
145
+
146
+ def format_technical_questions(technical_questions: List[str]) -> str:
147
+ """Format likely technical interview questions."""
148
+ if not technical_questions:
149
+ return ""
150
+
151
+ bullet_html = bullets(technical_questions[:6], "text-red-700")
152
+
153
+ return f"""
154
+ <div class='mb-4 bg-red-50 border border-red-200 rounded-lg p-3'>
155
+ <h3 class='text-sm font-semibold text-red-800 mb-2'>🔧 Technical Questions</h3>
156
+ <p class='text-xs text-red-600 mb-2'>Likely technical questions they'll ask:</p>
157
+ {bullet_html}
158
+ </div>
159
+ """
160
+
161
+ def format_behavioral_questions(behavioral_questions: List[str]) -> str:
162
+ """Format likely behavioral interview questions."""
163
+ if not behavioral_questions:
164
+ return ""
165
+
166
+ bullet_html = bullets(behavioral_questions[:6], "text-purple-700")
167
+
168
+ return f"""
169
+ <div class='mb-4 bg-purple-50 border border-purple-200 rounded-lg p-3'>
170
+ <h3 class='text-sm font-semibold text-purple-800 mb-2'>💬 Behavioral Questions</h3>
171
+ <p class='text-xs text-purple-600 mb-2'>Behavioral questions to prepare for:</p>
172
+ {bullet_html}
173
+ </div>
174
+ """
175
+
176
+ def format_talking_points(talking_points: List[str]) -> str:
177
+ """Format key talking points to emphasize."""
178
+ if not talking_points:
179
+ return ""
180
+
181
+ bullet_html = bullets(talking_points[:6], "text-indigo-700")
182
+
183
+ return f"""
184
+ <div class='mb-4 bg-indigo-50 border border-indigo-200 rounded-lg p-3'>
185
+ <h3 class='text-sm font-semibold text-indigo-800 mb-2'>🎯 Talking Points</h3>
186
+ <p class='text-xs text-indigo-600 mb-2'>Highlight these experiences/achievements:</p>
187
+ {bullet_html}
188
+ </div>
189
+ """
190
+
191
+ def format_company_intel(company_intel: List[str]) -> str:
192
+ """Format key company intelligence for interview research."""
193
+ if not company_intel:
194
+ return ""
195
+
196
+ bullet_html = bullets(company_intel[:3], "text-blue-700")
197
+
198
+ return f"""
199
+ <div class='mb-4 bg-blue-50 border border-blue-200 rounded-lg p-3'>
200
+ <h3 class='text-sm font-semibold text-blue-800 mb-2'>🏢 Company Intel</h3>
201
+ <p class='text-xs text-blue-600 mb-2'>Key facts to mention:</p>
202
+ {bullet_html}
203
+ </div>
204
+ """
205
+
206
+ def format_smart_questions(smart_questions: List[str]) -> str:
207
+ """Format smart questions for the applicant to ask."""
208
+ if not smart_questions:
209
+ return ""
210
+
211
+ bullet_html = bullets(smart_questions[:5], "text-green-700")
212
+
213
+ return f"""
214
+ <div class='mb-4 bg-green-50 border border-green-200 rounded-lg p-3'>
215
+ <h3 class='text-sm font-semibold text-green-800 mb-2'>❓ Smart Questions</h3>
216
+ <p class='text-xs text-green-600 mb-2'>Ask these to show strategic thinking:</p>
217
+ {bullet_html}
218
+ </div>
219
+ """
220
+
221
+ def format_role_challenges(role_challenges: List[str]) -> str:
222
+ """Format main challenges this role will solve."""
223
+ if not role_challenges:
224
+ return ""
225
+
226
+ bullet_html = bullets(role_challenges[:5], "text-orange-700")
227
+
228
+ return f"""
229
+ <div class='mb-4 bg-orange-50 border border-orange-200 rounded-lg p-3'>
230
+ <h3 class='text-sm font-semibold text-orange-800 mb-2'>⚡ Role Challenges</h3>
231
+ <p class='text-xs text-orange-600 mb-2'>Key problems you'll solve:</p>
232
+ {bullet_html}
233
+ </div>
234
+ """
235
+
236
+ def format_success_metrics(success_metrics: List[str]) -> str:
237
+ """Format how success is measured in this role."""
238
+ if not success_metrics:
239
+ return ""
240
+
241
+ bullet_html = bullets(success_metrics[:5], "text-teal-700")
242
+
243
+ return f"""
244
+ <div class='mb-4 bg-teal-50 border border-teal-200 rounded-lg p-3'>
245
+ <h3 class='text-sm font-semibold text-teal-800 mb-2'>📊 Success Metrics</h3>
246
+ <p class='text-xs text-teal-600 mb-2'>How success is measured:</p>
247
+ {bullet_html}
248
+ </div>
249
+ """
250
+
251
+ def format_salary_context(salary_context: str) -> str:
252
+ """Format salary negotiation context."""
253
+ if not salary_context:
254
+ return ""
255
+
256
+ return f"""
257
+ <div class='mb-4 bg-yellow-50 border border-yellow-200 rounded-lg p-3'>
258
+ <h3 class='text-sm font-semibold text-yellow-800 mb-2'>💰 Salary Context</h3>
259
+ <p class='text-yellow-700 text-sm'>{salary_context}</p>
260
+ </div>
261
+ """
262
+
263
+ def format_next_actions(apply_link: str = "") -> str:
264
+ """Format next actions with apply and copy buttons."""
265
+ apply_button = ""
266
+ if apply_link:
267
+ apply_button = f"""
268
+ <a href="{apply_link}" target="_blank"
269
+ class='inline-flex items-center px-3 py-1.5 text-sm bg-blue-600 text-white rounded hover:bg-blue-700 transition-colors mr-2'>
270
+ ▶ Apply
271
+ </a>
272
+ """
273
+
274
+ return f"""
275
+ <div class='pt-4 border-t border-gray-200'>
276
+ <div class='flex items-center gap-2'>
277
+ {apply_button}
278
+ <button onclick="copySummary()"
279
+ class='inline-flex items-center px-3 py-1.5 text-sm bg-gray-100 text-gray-700 rounded hover:bg-gray-200 transition-colors'>
280
+ 📋 Copy summary
281
+ </button>
282
+ </div>
283
+ </div>
284
+ """
285
+
286
+ def create_copy_script() -> str:
287
+ """Create JavaScript for copy functionality."""
288
+ return """
289
+ <script>
290
+ function copySummary(){
291
+ navigator.clipboard.writeText(document.getElementById("iq-summary").innerText);
292
+ }
293
+ </script>
294
+ """
295
+
296
+ def create_summary_text(data: Dict[str, Any]) -> str:
297
+ """Create plain text summary for copying."""
298
+ lines = []
299
+
300
+ # Title line
301
+ lines.append(format_title_line(data))
302
+
303
+ # Mission
304
+ mission = data.get("mission", "")
305
+ if mission:
306
+ lines.append(f"Mission: {mission}")
307
+
308
+ # Must-have
309
+ must_have = data.get("must_have", [])
310
+ if must_have:
311
+ lines.append("Must-Have Stack:")
312
+ for item in must_have[:6]:
313
+ lines.append(f" • {item}")
314
+
315
+ # Nice-to-have
316
+ nice_to_have = data.get("nice_to_have", [])
317
+ if nice_to_have:
318
+ lines.append("Nice-to-Haves:")
319
+ for item in nice_to_have[:6]:
320
+ lines.append(f" • {item}")
321
+
322
+ # Why it matters
323
+ why_it_matters = data.get("why_it_matters", "")
324
+ if why_it_matters:
325
+ lines.append(f"Why It Matters: {why_it_matters}")
326
+
327
+ # Technical questions
328
+ technical_questions = data.get("technical_questions", [])
329
+ if technical_questions:
330
+ lines.append("Technical Questions:")
331
+ for item in technical_questions[:6]:
332
+ lines.append(f" • {item}")
333
+
334
+ # Behavioral questions
335
+ behavioral_questions = data.get("behavioral_questions", [])
336
+ if behavioral_questions:
337
+ lines.append("Behavioral Questions:")
338
+ for item in behavioral_questions[:6]:
339
+ lines.append(f" • {item}")
340
+
341
+ # Talking points
342
+ talking_points = data.get("talking_points", [])
343
+ if talking_points:
344
+ lines.append("Talking Points:")
345
+ for item in talking_points[:6]:
346
+ lines.append(f" • {item}")
347
+
348
+ # Company intel
349
+ company_intel = data.get("company_intel", [])
350
+ if company_intel:
351
+ lines.append("Company Intel:")
352
+ for item in company_intel[:3]:
353
+ lines.append(f" • {item}")
354
+
355
+ # Smart questions
356
+ smart_questions = data.get("smart_questions", [])
357
+ if smart_questions:
358
+ lines.append("Smart Questions:")
359
+ for item in smart_questions[:5]:
360
+ lines.append(f" • {item}")
361
+
362
+ # Role challenges
363
+ role_challenges = data.get("role_challenges", [])
364
+ if role_challenges:
365
+ lines.append("Role Challenges:")
366
+ for item in role_challenges[:5]:
367
+ lines.append(f" • {item}")
368
+
369
+ # Success metrics
370
+ success_metrics = data.get("success_metrics", [])
371
+ if success_metrics:
372
+ lines.append("Success Metrics:")
373
+ for item in success_metrics[:5]:
374
+ lines.append(f" • {item}")
375
+
376
+ # Salary context
377
+ salary_context = data.get("salary_context", "")
378
+ if salary_context:
379
+ lines.append(f"Salary Context: {salary_context}")
380
+
381
+ # Perks
382
+ perks = data.get("perks", [])
383
+ if perks:
384
+ lines.append(f"Perks: {' • '.join(perks)}")
385
+
386
+ return "\n\n".join(lines)
387
+
388
+ def to_html(data: Dict[str, Any]) -> str:
389
+ """Build the complete No-BS job brief card."""
390
+
391
+ # Handle data normalization
392
+ if isinstance(data.get("must_have"), str):
393
+ data["must_have"] = [data["must_have"]]
394
+ if isinstance(data.get("nice_to_have"), str):
395
+ data["nice_to_have"] = [data["nice_to_have"]]
396
+ if isinstance(data.get("perks"), str):
397
+ data["perks"] = [data["perks"]]
398
+ if isinstance(data.get("red_flags"), str):
399
+ data["red_flags"] = [data["red_flags"]]
400
+ if isinstance(data.get("technical_questions"), str):
401
+ data["technical_questions"] = [data["technical_questions"]]
402
+ if isinstance(data.get("behavioral_questions"), str):
403
+ data["behavioral_questions"] = [data["behavioral_questions"]]
404
+ if isinstance(data.get("talking_points"), str):
405
+ data["talking_points"] = [data["talking_points"]]
406
+ if isinstance(data.get("company_intel"), str):
407
+ data["company_intel"] = [data["company_intel"]]
408
+ if isinstance(data.get("smart_questions"), str):
409
+ data["smart_questions"] = [data["smart_questions"]]
410
+ if isinstance(data.get("role_challenges"), str):
411
+ data["role_challenges"] = [data["role_challenges"]]
412
+ if isinstance(data.get("success_metrics"), str):
413
+ data["success_metrics"] = [data["success_metrics"]]
414
+
415
+ # Build sections
416
+ title_line = format_title_line(data)
417
+ mission_section = format_mission(data.get("mission", ""))
418
+ must_have_section = format_must_have(data.get("must_have", []))
419
+ nice_to_have_section = format_nice_to_have(data.get("nice_to_have", []))
420
+ why_it_matters_section = format_why_it_matters(data.get("why_it_matters", ""))
421
+ perks_section = format_perks(data.get("perks", []))
422
+ red_flags_section = format_red_flags(data.get("red_flags", []))
423
+
424
+ # Interview Query-style sections
425
+ technical_questions_section = format_technical_questions(data.get("technical_questions", []))
426
+ behavioral_questions_section = format_behavioral_questions(data.get("behavioral_questions", []))
427
+ talking_points_section = format_talking_points(data.get("talking_points", []))
428
+ company_intel_section = format_company_intel(data.get("company_intel", []))
429
+ smart_questions_section = format_smart_questions(data.get("smart_questions", []))
430
+ role_challenges_section = format_role_challenges(data.get("role_challenges", []))
431
+ success_metrics_section = format_success_metrics(data.get("success_metrics", []))
432
+ salary_context_section = format_salary_context(data.get("salary_context", ""))
433
+
434
+ next_actions_section = format_next_actions(data.get("apply_link", ""))
435
+
436
+ # Create plain text summary for copying
437
+ summary_text = create_summary_text(data)
438
+
439
+ # Build complete HTML
440
+ html = f"""
441
+ <div class='max-w-2xl mx-auto'>
442
+ <div class='bg-white border border-gray-200 rounded-lg p-6 shadow-sm'>
443
+ <h1 class='text-lg font-bold text-gray-900 mb-4'>{title_line}</h1>
444
+
445
+ {hide_if_empty(mission_section)}
446
+ {hide_if_empty(must_have_section)}
447
+ {hide_if_empty(nice_to_have_section)}
448
+ {hide_if_empty(why_it_matters_section)}
449
+ {hide_if_empty(perks_section)}
450
+ {hide_if_empty(red_flags_section)}
451
+
452
+ {hide_if_empty(technical_questions_section)}
453
+ {hide_if_empty(behavioral_questions_section)}
454
+ {hide_if_empty(talking_points_section)}
455
+ {hide_if_empty(company_intel_section)}
456
+ {hide_if_empty(smart_questions_section)}
457
+ {hide_if_empty(role_challenges_section)}
458
+ {hide_if_empty(success_metrics_section)}
459
+ {hide_if_empty(salary_context_section)}
460
+
461
+ {next_actions_section}
462
+ </div>
463
+
464
+ <pre id="iq-summary" style="display: none;">{summary_text}</pre>
465
+ </div>
466
+
467
+ {create_copy_script()}
468
+ """
469
+
470
+ return html
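The run of `isinstance` checks at the top of `to_html` can be collapsed into a single loop over the list-valued fields. A behavior-equivalent sketch (field names taken from the module above):

```python
# Fields that to_html above expects as lists; bare strings are wrapped in place.
LIST_FIELDS = [
    "must_have", "nice_to_have", "perks", "red_flags",
    "technical_questions", "behavioral_questions", "talking_points",
    "company_intel", "smart_questions", "role_challenges", "success_metrics",
]

def normalize_lists(data: dict) -> dict:
    for field in LIST_FIELDS:
        if isinstance(data.get(field), str):
            data[field] = [data[field]]
    return data

print(normalize_lists({"perks": "Free lunch", "red_flags": ["None"]}))
# → {'perks': ['Free lunch'], 'red_flags': ['None']}
```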
requirements.txt ADDED
@@ -0,0 +1,24 @@
1
+ gradio==4.43.0
2
+ flask>=2.0.0
3
+ requests
4
+ beautifulsoup4
5
+ typer
6
+ pyyaml
7
+ pdfplumber
8
+ pydantic
9
+ openai>=1.0.0
10
+ anthropic>=0.8.0
11
+ python-dotenv==1.0.0
12
+ google-search-results
13
+ tiktoken>=0.7.0
14
+ PyJWT>=2.8.0
15
+ python-multipart>=0.0.6
16
+ diskcache>=5.6.0
17
+ PyPDF2>=3.0.0
18
+ python-docx>=1.1.0
19
+ google-auth>=2.0.0
20
+ google-auth-oauthlib>=0.5.0
21
+ google-auth-httplib2>=0.1.0
22
+ authlib>=1.2.0
23
+ selenium>=4.0.0
24
+ webdriver-manager>=3.8.0
salary_negotiation_simulator.py ADDED
@@ -0,0 +1,25 @@
1
+ """
2
+ Salary Negotiation Simulator for IQKiller
3
+ """
4
+ import random
5
+ import time
6
+ from typing import Dict, List, Any
7
+
8
+ class SalaryNegotiationSimulator:
9
+ def __init__(self, user_role="Software Engineer", base_salary=75000):
10
+ self.user_role = user_role
11
+ self.base_salary = base_salary
12
+ self.total_score = 0
13
+
14
+ def get_scenarios(self):
15
+ return [
16
+ {
17
+ "title": "🎯 First Offer Challenge",
18
+ "context": f"Hiring manager offers ${self.base_salary - 10000:,}",
19
+ "choices": [
20
+ {"id": "accept", "text": "I accept!", "points": -10},
21
+ {"id": "counter", "text": "Market rate is higher", "points": 20},
22
+ {"id": "aggressive", "text": "Too low!", "points": -5}
23
+ ]
24
+ }
25
+ ]
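The UI wiring for the simulator isn't part of this file; a minimal sketch of how a caller might score a choice against the scenario shape returned by `get_scenarios()` above (the `score_choice` helper is hypothetical, for illustration only):

```python
# Mirrors the scenario dict returned by get_scenarios() above.
scenario = {
    "title": "🎯 First Offer Challenge",
    "choices": [
        {"id": "accept", "text": "I accept!", "points": -10},
        {"id": "counter", "text": "Market rate is higher", "points": 20},
        {"id": "aggressive", "text": "Too low!", "points": -5},
    ],
}

def score_choice(scenario: dict, choice_id: str) -> int:
    """Return the points for the chosen option (0 if the id is unknown)."""
    return next(
        (c["points"] for c in scenario["choices"] if c["id"] == choice_id), 0
    )

total_score = score_choice(scenario, "counter")
print(total_score)  # → 20
```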
simple_app.py ADDED
@@ -0,0 +1,273 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Simple Flask-based Job Posting Analysis App
4
+ """
5
+
6
+ import hashlib
7
+ import os
8
+ from typing import Dict, Optional, Tuple
9
+ from urllib.parse import urlparse
10
+ import time
11
+ import requests
12
+ from bs4 import BeautifulSoup
13
+ from flask import Flask, render_template_string, request, jsonify
14
+ import pdfplumber
15
+ import re
16
+ import gradio as gr
17
+ import asyncio
18
+
19
+ app = Flask(__name__)
20
+
21
+
22
+ class JobPostingAnalyzer:
23
+ """Simplified analyzer without caching."""
24
+
25
+ def __init__(self):
26
+ """Initialize the analyzer."""
27
+ pass
28
+
29
+ def _is_valid_url(self, url: str) -> bool:
30
+ """Validate if URL is properly formatted."""
31
+ try:
32
+ result = urlparse(url)
33
+ return all([result.scheme, result.netloc])
34
+ except Exception:
35
+ return False
36
+
37
+ def _is_pdf_file(self, path: str) -> bool:
38
+ """Check if the path is a PDF file."""
39
+ return path.lower().endswith('.pdf') or os.path.exists(path)
40
+
41
+ def scrape_pdf_content(self, pdf_path: str) -> Optional[str]:
42
+ """Scrape content from PDF file."""
43
+ try:
44
+ with pdfplumber.open(pdf_path) as pdf:
45
+ text = ""
46
+ for page in pdf.pages:
47
+ page_text = page.extract_text()
48
+ if page_text:
49
+ text += page_text + "\n"
50
+ return text
51
+ except Exception as e:
52
+ print(f"Error reading PDF: {e}")
53
+ return None
54
+
55
+ def scrape_job_posting(self, url: str) -> Optional[str]:
56
+ """Scrape job posting content from URL or PDF file."""
57
+ # Check if it's a PDF file
58
+ if self._is_pdf_file(url):
59
+ return self.scrape_pdf_content(url)
60
+
61
+ # Otherwise treat as URL
62
+ try:
63
+ headers = {
64
+ "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
65
+ }
66
+ response = requests.get(url, headers=headers, timeout=10)
67
+ response.raise_for_status()
68
+
69
+ soup = BeautifulSoup(response.content, "html.parser")
70
+
71
+ # Remove script and style elements
72
+ for script in soup(["script", "style"]):
73
+ script.decompose()
74
+
75
+ # Extract text content
76
+ text = soup.get_text()
77
+ lines = (line.strip() for line in text.splitlines())
78
+ chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
79
+ text = " ".join(chunk for chunk in chunks if chunk)
80
+
81
+ return text
82
+ except Exception as e:
83
+ print(f"Error scraping URL: {e}")
84
+ return None
85
+
86
+ def enrich_job_data(self, scraped_text: str) -> Dict[str, str]:
87
+ """Extract and enrich job posting data."""
88
+ lines = scraped_text.split('\n')
89
+ job_data = {
90
+ "title": "",
91
+ "company": "",
92
+ "location": "",
93
+ "level": "",
94
+ "requirements": "",
95
+ "responsibilities": ""
96
+ }
97
+
98
+ # Enhanced extraction logic for JRD content
99
+ for i, line in enumerate(lines):
100
+ line_lower = line.lower()
101
+
102
+ # Look for project title
103
+ if "project:" in line_lower and not job_data["title"]:
104
+ job_data["title"] = line.strip()
105
+ elif "joint requirements document" in line_lower and not job_data["title"]:
106
+ job_data["title"] = "Joint Requirements Document (JRD)"
107
+
108
+ # Look for company info
109
+ if "microsoft" in line_lower and not job_data["company"]:
110
+ job_data["company"] = "Microsoft"
111
+
112
+ # Look for level/position info
113
+ if any(level in line_lower for level in ["senior", "lead", "principal", "staff"]):
114
+ job_data["level"] = line.strip()
115
+
116
+ # Look for requirements
117
+ if "requirements" in line_lower or "functional" in line_lower:
118
+ # Get next few lines as requirements
119
+ req_lines = []
120
+ for j in range(i, min(i + 10, len(lines))):
121
+ if lines[j].strip():
122
+ req_lines.append(lines[j].strip())
123
+ job_data["requirements"] = " ".join(req_lines)
124
+ break
125
+
126
+ return job_data
127
+
128
+ def generate_preview(self, job_data: Dict[str, str]) -> str:
129
+ """Generate markdown preview from job data."""
130
+ preview = "### Role Snapshot\n"
131
+
132
+ if job_data["title"]:
133
+ preview += f"- **Title:** {job_data['title']}\n"
134
+ if job_data["level"]:
135
+ preview += f"- **Level:** {job_data['level']}\n"
136
+ if job_data["company"]:
137
+ preview += f"- **Company:** {job_data['company']}\n"
138
+ if job_data["location"]:
139
+ preview += f"- **Location:** {job_data['location']}\n"
140
+ if job_data["requirements"]:
141
+ preview += f"- **Requirements:** {job_data['requirements'][:200]}...\n"
142
+
143
+ preview += "\n---\n"
144
+ return preview
145
+
146
+ def analyze_job_posting(self, url: str) -> Tuple[bool, str]:
147
+ """Main analysis function with caching."""
148
+ # Check if it's a PDF file first
149
+ if self._is_pdf_file(url):
150
+ # For PDFs, don't validate URL format
151
+ pass
152
+ elif not self._is_valid_url(url):
153
+ return False, "Invalid URL format. Please provide a valid job posting URL or PDF file path."
154
+
155
+ # Scrape the content (URL or PDF)
156
+ scraped_text = self.scrape_job_posting(url)
157
+ if not scraped_text:
158
+ return False, "Failed to scrape content. Please check the file path or URL."
159
+
160
+ # Enrich the data
161
+ job_data = self.enrich_job_data(scraped_text)
162
+
163
+ # Generate preview
164
+ preview = self.generate_preview(job_data)
165
+
166
+ return True, preview
167
+
168
+
169
+ # Initialize analyzer
170
+ analyzer = JobPostingAnalyzer()
171
+
172
+ # HTML template
173
+ HTML_TEMPLATE = """
174
+ <!DOCTYPE html>
175
+ <html>
176
+ <head>
177
+ <title>Job Posting Analyzer</title>
178
+ <style>
179
+ body { font-family: Arial, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px; }
180
+ .container { background: #f5f5f5; padding: 20px; border-radius: 8px; }
181
+ input[type="text"] { width: 100%; padding: 10px; margin: 10px 0; border: 1px solid #ddd; border-radius: 4px; }
182
+ button { background: #007bff; color: white; padding: 10px 20px; border: none; border-radius: 4px; cursor: pointer; }
183
+ button:hover { background: #0056b3; }
184
+ .result { margin-top: 20px; padding: 15px; border-radius: 4px; }
185
+ .success { background: #d4edda; border: 1px solid #c3e6cb; }
186
+ .error { background: #f8d7da; border: 1px solid #f5c6cb; }
187
+ .preview { background: white; padding: 15px; border-radius: 4px; margin-top: 10px; }
188
+ .info { background: #d1ecf1; border: 1px solid #bee5eb; padding: 10px; border-radius: 4px; margin-bottom: 15px; }
189
+ </style>
190
+ </head>
191
+ <body>
192
+ <div class="container">
193
+ <h1>🎯 Job Posting Analyzer</h1>
194
+ <p>Paste a job posting URL or PDF file path to analyze and generate interview preparation materials.</p>
195
+
196
+ <div class="info">
197
+ <strong>Supported inputs:</strong><br>
198
+ • URLs: https://example.com/job-posting<br>
199
+ • PDF files: JRD_v1.1.pdf (local files)
200
+ </div>
201
+
202
+ <form method="POST">
203
+ <input type="text" name="url" placeholder="https://example.com/job-posting or JRD_v1.1.pdf" value="{{ url or '' }}" required>
204
+ <button type="submit">🔍 Analyze Job Posting</button>
205
+ </form>
206
+
207
+ {% if result %}
208
+ <div class="result {% if success %}success{% else %}error{% endif %}">
209
+ <strong>{{ status }}</strong>
210
+ {% if success and preview %}
211
+ <div class="preview">
212
+ <h3>Preview:</h3>
213
+ <pre>{{ preview }}</pre>
214
+ </div>
215
+ {% endif %}
216
+ </div>
217
+ {% endif %}
218
+ </div>
219
+ </body>
220
+ </html>
221
+ """
222
+
223
+
224
+ @app.route('/', methods=['GET', 'POST'])
225
+ def index():
226
+ """Main page with form and results."""
227
+ url = ""
228
+ result = ""
229
+ success = False
230
+ status = ""
231
+ preview = ""
232
+
233
+ if request.method == 'POST':
234
+ url = request.form.get('url', '').strip()
235
+ if url:
236
+ success, result = analyzer.analyze_job_posting(url)
237
+ if success:
238
+ status = "✅ Analysis complete! Preview generated."
239
+ preview = result
240
+ else:
241
+ status = f"❌ Error: {result}"
242
+
243
+ return render_template_string(HTML_TEMPLATE,
244
+ url=url,
245
+ result=result,
246
+ success=success,
247
+ status=status,
248
+ preview=preview)
249
+
250
+
251
+ @app.route('/api/analyze', methods=['POST'])
252
+ def api_analyze():
253
+ """API endpoint for job posting analysis."""
254
+ data = request.get_json()
255
+ url = data.get('url', '').strip()
256
+
257
+ if not url:
258
+ return jsonify({'success': False, 'error': 'URL is required'})
259
+
260
+ success, result = analyzer.analyze_job_posting(url)
261
+
262
+ return jsonify({
263
+ 'success': success,
264
+ 'result': result if success else None,
265
+ 'error': result if not success else None
266
+ })
267
+
268
+
269
+ if __name__ == '__main__':
270
+ print("🚀 Starting Job Posting Analyzer...")
271
+ print("📱 Web interface available at: http://localhost:5000")
272
+ print("🔌 API endpoint available at: http://localhost:5000/api/analyze")
273
+ app.run(debug=True, host='0.0.0.0', port=5000)
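Both the form handler and `/api/analyze` funnel into the same scheme-plus-netloc check before any scraping happens. A self-contained sketch of that validation step:

```python
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    # Mirrors JobPostingAnalyzer._is_valid_url: a URL counts as valid
    # only if it has both a scheme (https) and a network location (host).
    result = urlparse(url)
    return all([result.scheme, result.netloc])

print(is_valid_url("https://example.com/job-posting"))  # → True
print(is_valid_url("JRD_v1.1.pdf"))                     # → False
```

This is why a bare PDF filename fails URL validation and has to be caught by the PDF check first.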
test_app.py ADDED
@@ -0,0 +1,134 @@
+ #!/usr/bin/env python3
+ """
+ Simple test script for job posting analysis functionality.
+ """
+
+ from typing import Dict, Optional, Tuple
+ from urllib.parse import urlparse
+ import requests
+ from bs4 import BeautifulSoup
+
+
+ class JobPostingAnalyzer:
+     """Simplified analyzer for testing without caching."""
+
+     def __init__(self):
+         """Initialize the analyzer."""
+         pass
+
+     def _is_valid_url(self, url: str) -> bool:
+         """Validate if URL is properly formatted."""
+         try:
+             result = urlparse(url)
+             return all([result.scheme, result.netloc])
+         except Exception:
+             return False
+
+     def scrape_job_posting(self, url: str) -> Optional[str]:
+         """Scrape job posting content from URL."""
+         try:
+             headers = {
+                 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
+             }
+             response = requests.get(url, headers=headers, timeout=10)
+             response.raise_for_status()
+
+             soup = BeautifulSoup(response.content, "html.parser")
+
+             # Remove script and style elements
+             for script in soup(["script", "style"]):
+                 script.decompose()
+
+             # Extract text content and collapse whitespace
+             text = soup.get_text()
+             lines = (line.strip() for line in text.splitlines())
+             chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
+             text = " ".join(chunk for chunk in chunks if chunk)
+
+             return text
+         except Exception as e:
+             print(f"Error scraping URL: {e}")
+             return None
+
+     def enrich_job_data(self, scraped_text: str) -> Dict[str, str]:
+         """Extract and enrich job posting data."""
+         lines = scraped_text.split('\n')
+         job_data = {
+             "title": "",
+             "company": "",
+             "location": "",
+             "level": "",
+             "requirements": "",
+             "responsibilities": ""
+         }
+
+         # Simple extraction logic
+         for line in lines:
+             line_lower = line.lower()
+             if "senior" in line_lower or "lead" in line_lower:
+                 job_data["level"] = line.strip()
+             elif "engineer" in line_lower or "developer" in line_lower:
+                 if not job_data["title"]:
+                     job_data["title"] = line.strip()
+
+         return job_data
+
+     def generate_preview(self, job_data: Dict[str, str]) -> str:
+         """Generate markdown preview from job data."""
+         preview = "### Role Snapshot\n"
+
+         if job_data["title"]:
+             preview += f"- **Title:** {job_data['title']}\n"
+         if job_data["level"]:
+             preview += f"- **Level:** {job_data['level']}\n"
+         if job_data["company"]:
+             preview += f"- **Company:** {job_data['company']}\n"
+         if job_data["location"]:
+             preview += f"- **Location:** {job_data['location']}\n"
+
+         preview += "\n---\n"
+         return preview
+
+     def analyze_job_posting(self, url: str) -> Tuple[bool, str]:
+         """Main analysis function."""
+         if not self._is_valid_url(url):
+             return False, "Invalid URL format. Please provide a valid job posting URL."
+
+         # Scrape the job posting
+         scraped_text = self.scrape_job_posting(url)
+         if not scraped_text:
+             return False, "Failed to scrape job posting. Please check the URL and try again."
+
+         # Enrich the data
+         job_data = self.enrich_job_data(scraped_text)
+
+         # Generate preview
+         preview = self.generate_preview(job_data)
+
+         return True, preview
+
+
+ def main():
+     """Test the job posting analyzer."""
+     analyzer = JobPostingAnalyzer()
+
+     # Test with a sample job posting URL
+     test_url = "https://jobs.lever.co/example/senior-data-engineer"
+
+     print("Testing Job Posting Analyzer...")
+     print(f"URL: {test_url}")
+
+     success, result = analyzer.analyze_job_posting(test_url)
+
+     if success:
+         print("✅ Analysis successful!")
+         print("\nPreview:")
+         print(result)
+     else:
+         print(f"❌ Analysis failed: {result}")
+
+     print("\nTest completed!")
+
+
+ if __name__ == "__main__":
+     main()
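The scraper's whitespace cleanup (strip each line, split on double-space runs, drop empty chunks, rejoin) appears inline in every analyzer variant. A sketch in isolation; `normalize` is a hypothetical name for that inline pipeline:

```python
def normalize(text: str) -> str:
    # Strip each line, break on runs of two spaces, drop empty chunks, rejoin.
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    return " ".join(chunk for chunk in chunks if chunk)

print(normalize("  Senior   Data Engineer \n\n  Remote  "))  # → Senior Data Engineer Remote
```

This collapses the ragged text that `BeautifulSoup.get_text()` produces into a single space-separated string.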
test_interview_guide.py ADDED
@@ -0,0 +1,226 @@
+ #!/usr/bin/env python3
+ """
+ Test script for the personalized interview guide functionality.
+ Demonstrates the complete pipeline: Resume + Job → Gap Analysis → Personalized Guide
+ """
+
+ import asyncio
+ import json
+ from interview_orchestrator import create_personalized_interview_guide
+
+ # Sample resume data
+ SAMPLE_RESUME = """
+ John Smith
+ Software Engineer
+ john.smith@email.com | (555) 123-4567 | LinkedIn: linkedin.com/in/johnsmith | GitHub: github.com/johnsmith
+
+ PROFESSIONAL SUMMARY
+ Experienced full-stack software engineer with 3+ years developing web applications using Python, JavaScript, and cloud technologies. Strong background in API development, database design, and agile methodologies.
+
+ TECHNICAL SKILLS
+ Programming Languages: Python, JavaScript, SQL, HTML/CSS
+ Frameworks: Django, React, Node.js, Express
+ Databases: PostgreSQL, MongoDB, Redis
+ Cloud/DevOps: AWS (EC2, S3, RDS), Docker, Git, CI/CD
+ Tools: VS Code, Postman, Jira, Slack
+
+ PROFESSIONAL EXPERIENCE
+
+ Software Engineer | TechCorp | Jan 2021 - Present
+ • Developed and maintained 5+ web applications serving 10,000+ users daily
+ • Built RESTful APIs using Python/Django with 99.9% uptime
+ • Implemented responsive front-end components using React and modern JavaScript
+ • Collaborated with cross-functional teams in Agile/Scrum environment
+ • Reduced database query time by 40% through optimization and indexing
+
+ Junior Developer | StartupXYZ | Jun 2020 - Dec 2020
+ • Created feature-rich web application using MERN stack
+ • Integrated third-party APIs and payment processing systems
+ • Participated in code reviews and maintained coding standards
+ • Deployed applications to AWS cloud infrastructure
+
+ PROJECTS
+
+ E-commerce Platform (2023)
+ • Built full-stack e-commerce solution with Django backend and React frontend
+ • Implemented user authentication, shopping cart, and payment integration
+ • Technologies: Python, Django, React, PostgreSQL, Stripe API
+
+ Task Management App (2022)
+ • Developed collaborative task management application
+ • Features include real-time updates, file uploads, and team collaboration
+ • Technologies: Node.js, Express, MongoDB, Socket.io
+
+ EDUCATION
+ Bachelor of Science in Computer Science
+ State University | 2020
+ GPA: 3.7/4.0
+ Relevant Coursework: Data Structures, Algorithms, Database Systems, Software Engineering
+ """
+
+ # Sample job posting
+ SAMPLE_JOB = """
+ Senior Full Stack Engineer
+ DataFlow Inc.
+ San Francisco, CA | Remote
+
+ About DataFlow Inc.
+ We're a fast-growing fintech startup building next-generation data analytics tools for financial institutions. Our platform processes billions of transactions daily and helps banks make better decisions through AI-powered insights.
+
+ Role Overview
+ We're seeking a Senior Full Stack Engineer to join our engineering team and help scale our platform to handle growing demand. You'll work on both frontend and backend systems, collaborate with data scientists, and contribute to architectural decisions.
+
+ Key Responsibilities
+ • Design and implement scalable web applications using modern technologies
+ • Build robust APIs and microservices to support our data platform
+ • Collaborate with product and design teams to deliver exceptional user experiences
+ • Optimize application performance and ensure high availability
+ • Mentor junior developers and contribute to engineering best practices
+ • Work with data engineering team to build data visualization tools
+
+ Required Qualifications
+ • 4+ years of experience in full-stack web development
+ • Strong proficiency in Python and modern JavaScript frameworks
+ • Experience with cloud platforms (AWS, GCP, or Azure)
+ • Knowledge of relational databases and SQL optimization
+ • Familiarity with containerization (Docker) and CI/CD pipelines
+ • Experience with agile development methodologies
+ • Bachelor's degree in Computer Science or related field
+
+ Preferred Qualifications
+ • Experience with financial/fintech applications
+ • Knowledge of data visualization libraries (D3.js, Chart.js)
+ • Familiarity with machine learning concepts
+ • Experience with Kubernetes and microservices architecture
+ • Previous experience at a startup or high-growth company
+
+ Technical Stack
+ • Backend: Python, Django/Flask, PostgreSQL, Redis
+ • Frontend: React, TypeScript, Next.js
+ • Infrastructure: AWS, Docker, Kubernetes
+ • Data: Apache Airflow, Spark, Snowflake
+
+ Compensation & Benefits
+ • Competitive salary: $140,000 - $180,000
+ • Equity package
+ • Comprehensive health, dental, and vision insurance
+ • Flexible PTO policy
+ • $2,000 annual learning and development budget
+ • Remote-first culture with optional office access
+
+ Why Join DataFlow?
+ • Work on cutting-edge fintech technology
+ • High-impact role in a fast-growing company
+ • Collaborative and learning-focused culture
+ • Opportunity to shape product direction
+ • Competitive compensation and equity upside
+ """
+
+ async def test_interview_guide_generation():
+     """Test the complete interview guide generation pipeline"""
+
+     print("🚀 Testing Personalized Interview Guide Generation")
+     print("=" * 60)
+
+     print("\n📝 Resume Summary:")
+     print(f"- Length: {len(SAMPLE_RESUME)} characters")
+     print("- Skills: Python, JavaScript, React, Django, AWS")
+     print("- Experience: 3+ years full-stack development")
+
+     print("\n🎯 Job Summary:")
+     print("- Role: Senior Full Stack Engineer at DataFlow Inc.")
+     print("- Requirements: 4+ years, Python, JavaScript, Cloud, Fintech")
+     print("- Salary: $140k-$180k")
+
+     print("\n⚡ Generating Interview Guide...")
+     print("-" * 40)
+
+     # Generate the guide
+     result = create_personalized_interview_guide(SAMPLE_RESUME, SAMPLE_JOB)
+
+     if result.get("success"):
+         print("✅ Guide generation successful!")
+
+         # Display metrics
+         gap_analysis = result.get("gap_analysis", {})
+         match_score = gap_analysis.get("match_score", 0)
+         processing_time = result.get("processing_time", 0)
+         guide_length = len(result.get("rendered_guide", ""))
+
+         print("\n📊 Results:")
+         print(f"- Match Score: {match_score}%")
+         print(f"- Processing Time: {processing_time:.2f} seconds")
+         print(f"- Guide Length: {guide_length} characters")
+
+         # Display gap analysis summary
+         summary = gap_analysis.get("summary", "")
+         if summary:
+             print(f"\n🎯 Gap Analysis: {summary}")
+
+         # Display skills breakdown
+         skills_map = gap_analysis.get("skills_map", {})
+         if skills_map:
+             print(f"\n💪 Strengths: {skills_map.get('strong', [])[:3]}")
+             print(f"📚 Areas to Study: {skills_map.get('gaps', [])[:3]}")
+
+         # Show first part of rendered guide
+         rendered_guide = result.get("rendered_guide", "")
+         if rendered_guide:
+             print("\n📄 Generated Guide Preview:")
+             print("-" * 40)
+             preview = rendered_guide[:500] + "..." if len(rendered_guide) > 500 else rendered_guide
+             print(preview)
+             print("-" * 40)
+
+             # Save full guide to file
+             with open("sample_interview_guide.md", "w") as f:
+                 f.write(rendered_guide)
+             print("\n💾 Full guide saved to: sample_interview_guide.md")
+
+     else:
+         print("❌ Guide generation failed!")
+         error_msg = result.get("error", "Unknown error")
+         print(f"Error: {error_msg}")
+
+         # Show debug info if available
+         if "data" in result:
+             print("\n🔍 Debug Information:")
+             print(json.dumps(result["data"], indent=2))
+
+ def test_validation():
+     """Test input validation"""
+
+     print("\n🧪 Testing Input Validation")
+     print("-" * 30)
+
+     # Test empty inputs
+     result1 = create_personalized_interview_guide("", SAMPLE_JOB)
+     print(f"Empty resume: {'✅ Caught' if not result1.get('success') else '❌ Missed'}")
+
+     result2 = create_personalized_interview_guide(SAMPLE_RESUME, "")
+     print(f"Empty job: {'✅ Caught' if not result2.get('success') else '❌ Missed'}")
+
+     # Test short inputs
+     result3 = create_personalized_interview_guide("Short resume", SAMPLE_JOB)
+     print(f"Short resume: {'✅ Caught' if not result3.get('success') else '❌ Missed'}")
+
+     result4 = create_personalized_interview_guide(SAMPLE_RESUME, "Short job")
+     print(f"Short job: {'✅ Caught' if not result4.get('success') else '❌ Missed'}")
+
+ async def main():
+     """Main test function"""
+
+     print("🎯 IQKiller Personalized Interview Guide Test Suite")
+     print("=" * 60)
+
+     # Run main test
+     await test_interview_guide_generation()
+
+     # Run validation tests
+     test_validation()
+
+     print("\n🎉 Test suite completed!")
+     print("\nTo view the full generated guide, open: sample_interview_guide.md")
+
+ if __name__ == "__main__":
+     asyncio.run(main())
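The validation tests above only inspect the `success` flag of the returned dict. A minimal sketch of the result-dict contract they assume; `create_guide` and the 100-character threshold are illustrative guesses, not taken from `interview_orchestrator`:

```python
def create_guide(resume: str, job: str) -> dict:
    # Hypothetical validator mirroring the checks the test suite exercises.
    MIN_LEN = 100  # assumed threshold for "too short" input
    if len(resume.strip()) < MIN_LEN:
        return {"success": False, "error": "Resume text is too short"}
    if len(job.strip()) < MIN_LEN:
        return {"success": False, "error": "Job posting text is too short"}
    return {"success": True, "rendered_guide": "..."}

print(create_guide("Short resume", "x" * 200)["success"])  # → False
```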
test_jrd_pdf.py ADDED
@@ -0,0 +1,176 @@
+ #!/usr/bin/env python3
+ """
+ Test script to analyze the JRD PDF file using our job posting analyzer.
+ """
+
+ from typing import Dict, Optional, Tuple
+ from urllib.parse import urlparse
+
+ import requests
+ from bs4 import BeautifulSoup
+ import pdfplumber
+
+
+ class JobPostingAnalyzer:
+     """Test analyzer without caching."""
+
+     def __init__(self):
+         """Initialize the analyzer."""
+         pass
+
+     def _is_valid_url(self, url: str) -> bool:
+         """Validate if URL is properly formatted."""
+         try:
+             result = urlparse(url)
+             return all([result.scheme, result.netloc])
+         except Exception:
+             return False
+
+     def scrape_pdf_content(self, pdf_path: str) -> Optional[str]:
+         """Scrape content from PDF file."""
+         try:
+             # pdfplumber gives better text extraction than PyPDF2 here
+             with pdfplumber.open(pdf_path) as pdf:
+                 text = ""
+                 for page in pdf.pages:
+                     page_text = page.extract_text()
+                     if page_text:
+                         text += page_text + "\n"
+                 return text
+         except Exception as e:
+             print(f"Error reading PDF: {e}")
+             return None
+
+     def scrape_job_posting(self, url: str) -> Optional[str]:
+         """Scrape job posting content from URL or PDF file."""
+         # Check if it's a local file path
+         if url.startswith('/') or url.startswith('./') or url.endswith('.pdf'):
+             return self.scrape_pdf_content(url)
+
+         # Otherwise treat as URL
+         try:
+             headers = {
+                 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
+             }
+             response = requests.get(url, headers=headers, timeout=10)
+             response.raise_for_status()
+
+             soup = BeautifulSoup(response.content, "html.parser")
+
+             # Remove script and style elements
+             for script in soup(["script", "style"]):
+                 script.decompose()
+
+             # Extract text content and collapse whitespace
+             text = soup.get_text()
+             lines = (line.strip() for line in text.splitlines())
+             chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
+             text = " ".join(chunk for chunk in chunks if chunk)
+
+             return text
+         except Exception as e:
+             print(f"Error scraping URL: {e}")
+             return None
+
+     def enrich_job_data(self, scraped_text: str) -> Dict[str, str]:
+         """Extract and enrich job posting data."""
+         lines = scraped_text.split('\n')
+         job_data = {
+             "title": "",
+             "company": "",
+             "location": "",
+             "level": "",
+             "requirements": "",
+             "responsibilities": ""
+         }
+
+         # Enhanced extraction logic for JRD content
+         for i, line in enumerate(lines):
+             line_lower = line.lower()
+
+             # Look for project title
+             if "project:" in line_lower and not job_data["title"]:
+                 job_data["title"] = line.strip()
+             elif "joint requirements document" in line_lower and not job_data["title"]:
+                 job_data["title"] = "Joint Requirements Document (JRD)"
+
+             # Look for company info
+             if "microsoft" in line_lower and not job_data["company"]:
+                 job_data["company"] = "Microsoft"
+
+             # Look for level/position info
+             if any(level in line_lower for level in ["senior", "lead", "principal", "staff"]):
+                 job_data["level"] = line.strip()
+
+             # Look for requirements
+             if "requirements" in line_lower or "functional" in line_lower:
+                 # Take the next few non-empty lines as requirements
+                 req_lines = []
+                 for j in range(i, min(i + 10, len(lines))):
+                     if lines[j].strip():
+                         req_lines.append(lines[j].strip())
+                 job_data["requirements"] = " ".join(req_lines)
+                 break
+
+         return job_data
+
+     def generate_preview(self, job_data: Dict[str, str]) -> str:
+         """Generate markdown preview from job data."""
+         preview = "### Role Snapshot\n"
+
+         if job_data["title"]:
+             preview += f"- **Title:** {job_data['title']}\n"
+         if job_data["level"]:
+             preview += f"- **Level:** {job_data['level']}\n"
+         if job_data["company"]:
+             preview += f"- **Company:** {job_data['company']}\n"
+         if job_data["location"]:
+             preview += f"- **Location:** {job_data['location']}\n"
+         if job_data["requirements"]:
+             preview += f"- **Requirements:** {job_data['requirements'][:200]}...\n"
+
+         preview += "\n---\n"
+         return preview
+
+     def analyze_job_posting(self, url: str) -> Tuple[bool, str]:
+         """Main analysis function (no caching in this test variant)."""
+         # Scrape the content (URL or PDF)
+         scraped_text = self.scrape_job_posting(url)
+         if not scraped_text:
+             return False, "Failed to scrape content. Please check the file path or URL."
+
+         # Enrich the data
+         job_data = self.enrich_job_data(scraped_text)
+
+         # Generate preview
+         preview = self.generate_preview(job_data)
+
+         return True, preview
+
+
+ def main():
+     """Test the job posting analyzer with the JRD PDF."""
+     analyzer = JobPostingAnalyzer()
+
+     # Test with the JRD PDF file
+     pdf_path = "JRD_v1.1.pdf"
+
+     print("Testing Job Posting Analyzer with JRD PDF...")
+     print(f"File: {pdf_path}")
+     print("=" * 60)
+
+     success, result = analyzer.analyze_job_posting(pdf_path)
+
+     if success:
+         print("✅ Analysis successful!")
+         print("\nPreview:")
+         print(result)
+     else:
+         print(f"❌ Analysis failed: {result}")
+
+     print("\n" + "=" * 60)
+     print("Test completed!")
+
+
+ if __name__ == "__main__":
+     main()
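The enrichment pass is plain keyword scanning over the scraped lines. A self-contained sketch of the level-detection rule; `detect_level` is an illustrative name for the corresponding branch of `enrich_job_data`:

```python
def detect_level(text: str) -> str:
    # Return the first line mentioning a seniority keyword, as the
    # enrichment loop does (note substring matching: "lead" would also
    # match words like "leadership").
    keywords = ("senior", "lead", "principal", "staff")
    for line in text.split("\n"):
        if any(k in line.lower() for k in keywords):
            return line.strip()
    return ""

print(detect_level("Project: Phoenix\nRole: Senior Platform Engineer"))
# → Role: Senior Platform Engineer
```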
tests/__init__.py ADDED
@@ -0,0 +1 @@
+ # Tests package
tests/test_async_latency.py ADDED
@@ -0,0 +1,167 @@
+ import pytest
+ import asyncio
+ import time
+ from unittest.mock import patch, MagicMock
+ from micro.bucket_enrich import BucketEnrichMicroFunction
+
+
+ class TestAsyncLatency:
+     """Test suite for verifying async parallel execution performance."""
+
+     def test_parallel_vs_serial_latency(self):
+         """Test that parallel execution is faster than serial execution."""
+
+         # Mock the individual enrichment functions to simulate network delays
+         with patch.object(BucketEnrichMicroFunction, 'stack_enrich') as mock_stack, \
+              patch.object(BucketEnrichMicroFunction, 'biz_enrich') as mock_biz, \
+              patch.object(BucketEnrichMicroFunction, 'comp_enrich') as mock_comp, \
+              patch.object(BucketEnrichMicroFunction, 'culture_enrich') as mock_culture, \
+              patch.object(BucketEnrichMicroFunction, 'manager_enrich') as mock_manager:
+
+             # Mock each function to sleep for 0.5 seconds (simulating network delay)
+             def mock_sleep_and_return(sleep_time: float, return_value: dict):
+                 def side_effect(*args, **kwargs):
+                     time.sleep(sleep_time)
+                     return return_value
+                 return side_effect
+
+             mock_stack.side_effect = mock_sleep_and_return(0.5, {"stack": "Python"})
+             mock_biz.side_effect = mock_sleep_and_return(0.5, {"news": "Growing"})
+             mock_comp.side_effect = mock_sleep_and_return(0.5, {"salary": "$120k"})
+             mock_culture.side_effect = mock_sleep_and_return(0.5, {"culture": "Remote"})
+             mock_manager.side_effect = mock_sleep_and_return(0.5, {"manager": "John"})
+
+             # Test data
+             test_data = {
+                 "enriched": {
+                     "company": "TestCorp",
+                     "location": "San Francisco, CA"
+                 },
+                 "raw_input": "https://linkedin.com/jobs/test-job"
+             }
+
+             enrich_func = BucketEnrichMicroFunction()
+
+             # Measure parallel execution time
+             start_time = time.time()
+             result = enrich_func.run(test_data)
+             parallel_time = time.time() - start_time
+
+             # Verify result structure
+             assert "bucket_facts" in result
+             assert len(result["bucket_facts"]) > 0
+
+             # Expected serial time would be ~2.5 seconds (5 * 0.5)
+             # Parallel time should be ~0.5 seconds (max of parallel tasks)
+             expected_serial_time = 2.5
+             max_acceptable_parallel_time = 1.0  # Give some buffer for overhead
+
+             assert parallel_time < max_acceptable_parallel_time, \
+                 f"Parallel execution took {parallel_time:.2f}s, expected < {max_acceptable_parallel_time}s"
+
+             # Verify it's significantly faster than serial would be
+             speedup = expected_serial_time / parallel_time
+             assert speedup > 2.0, \
+                 f"Speedup of {speedup:.2f}x is less than expected minimum of 2.0x"
+
+     def test_async_gather_functionality(self):
+         """Test that asyncio.gather works correctly with our async wrapper functions."""
+
+         enrich_func = BucketEnrichMicroFunction()
+
+         # Mock the sync functions to return known values
+         with patch.object(enrich_func, 'stack_enrich', return_value={"stack": "Python"}), \
+              patch.object(enrich_func, 'biz_enrich', return_value={"news": "Growing"}), \
+              patch.object(enrich_func, 'comp_enrich', return_value={"salary": "$120k"}), \
+              patch.object(enrich_func, 'culture_enrich', return_value={"culture": "Remote"}):
+
+             # Test the async wrapper directly
+             result = asyncio.run(enrich_func._async_enrich_all(
+                 company="TestCorp",
+                 location="San Francisco, CA",
+                 raw_input="normal job posting"
+             ))
+
+             # Verify all results are merged correctly
+             expected_facts = {
+                 "stack": "Python",
+                 "news": "Growing",
+                 "salary": "$120k",
+                 "culture": "Remote"
+             }
+
+             for key, value in expected_facts.items():
+                 assert key in result
+                 assert result[key] == value
+
+     def test_async_exception_handling(self):
+         """Test that exceptions in async tasks are handled gracefully."""
+
+         enrich_func = BucketEnrichMicroFunction()
+
+         # Mock some functions to raise exceptions
+         with patch.object(enrich_func, 'stack_enrich', side_effect=Exception("Network error")), \
+              patch.object(enrich_func, 'biz_enrich', return_value={"news": "Growing"}), \
+              patch.object(enrich_func, 'comp_enrich', return_value={"salary": "$120k"}), \
+              patch.object(enrich_func, 'culture_enrich', return_value={"culture": "Remote"}):
+
+             # Should not raise exception, but should handle it gracefully
+             result = asyncio.run(enrich_func._async_enrich_all(
+                 company="TestCorp",
+                 location="San Francisco, CA",
+                 raw_input="normal job posting"
+ ))
114
+
115
+ # Should still get results from non-failing functions
116
+ assert "news" in result
117
+ assert "salary" in result
118
+ assert "culture" in result
119
+
120
+ # The failing function should not contribute to results
121
+ assert "stack" not in result
122
+
123
+ def test_parallel_execution_with_timeouts(self):
+        """Test that parallel execution with staggered delays completes in roughly the time of the slowest task, not the sum."""
+
+        # Mock functions with different execution times
+        def create_timeout_mock(delay: float, return_value: dict):
+            def side_effect(*args, **kwargs):
+                time.sleep(delay)
+                return return_value
+            return side_effect
+
+        with patch.object(BucketEnrichMicroFunction, 'stack_enrich') as mock_stack, \
+             patch.object(BucketEnrichMicroFunction, 'biz_enrich') as mock_biz, \
+             patch.object(BucketEnrichMicroFunction, 'comp_enrich') as mock_comp, \
+             patch.object(BucketEnrichMicroFunction, 'culture_enrich') as mock_culture:
+
+            # Set up different delays
+            mock_stack.side_effect = create_timeout_mock(0.2, {"stack": "Python"})
+            mock_biz.side_effect = create_timeout_mock(0.3, {"news": "Growing"})
+            mock_comp.side_effect = create_timeout_mock(0.4, {"salary": "$120k"})
+            mock_culture.side_effect = create_timeout_mock(0.1, {"culture": "Remote"})
+
+            test_data = {
+                "enriched": {
+                    "company": "TestCorp",
+                    "location": "San Francisco, CA"
+                },
+                "raw_input": "normal job posting"
+            }
+
+            enrich_func = BucketEnrichMicroFunction()
+
+            start_time = time.time()
+            result = enrich_func.run(test_data)
+            total_time = time.time() - start_time
+
+            # Total time should be close to the longest task (0.4s) rather than the sum (1.0s)
+            assert total_time < 0.7, f"Execution took {total_time:.2f}s, expected < 0.7s"
+            assert total_time > 0.3, f"Execution took {total_time:.2f}s, expected > 0.3s"
+
+            # Verify all results are present
+            facts = result["bucket_facts"]
+            assert "stack" in facts
+            assert "news" in facts
+            assert "salary" in facts
+            assert "culture" in facts
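The behavior these tests pin down — blocking enrichment calls fanned out concurrently, dict results merged, failed tasks dropped silently — can be sketched with `asyncio.gather(..., return_exceptions=True)`. The names below (`enrich_all_parallel`, `slow`, `boom`) are illustrative only; the actual `_async_enrich_all` implementation is not shown in this diff:

```python
import asyncio
import time

def enrich_all_parallel(funcs):
    """Run blocking callables concurrently and merge their dict results."""
    async def _gather():
        # asyncio.to_thread offloads each blocking call to the default
        # thread pool so the sleeps overlap instead of running serially.
        results = await asyncio.gather(
            *(asyncio.to_thread(f) for f in funcs),
            return_exceptions=True,  # exceptions come back as values, not raised
        )
        merged = {}
        for r in results:
            if isinstance(r, dict):  # skip tasks that failed
                merged.update(r)
        return merged
    return asyncio.run(_gather())

def slow(delay, value):
    def call():
        time.sleep(delay)
        return value
    return call

def boom():
    raise RuntimeError("network error")

start = time.time()
facts = enrich_all_parallel([
    slow(0.2, {"stack": "Python"}),
    slow(0.2, {"salary": "$120k"}),
    boom,
])
elapsed = time.time() - start
print(facts)          # {'stack': 'Python', 'salary': '$120k'}
print(elapsed < 0.4)  # True: the two 0.2s calls overlapped
```

Note that `return_exceptions=True` is what keeps one failing bucket from cancelling the others, which is exactly what `test_async_exception_handling` asserts.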